asks - Useful idioms and tricks.

Sanely making many requests (with semaphores)

A (bounded) semaphore is like a sofa (sofaphore?). It can only fit so many tasks at once. If we have a semaphore whose maximum size is 5, then only 5 tasks can sit on it at a time. When one task finishes, another task can take its seat. This is an extremely simple and effective way to manage the resources asks uses when making large numbers of requests.

If we wanted to request two thousand urls, we wouldn’t want to spawn two thousand tasks and have them all fight for CPU time.

import asks
import curio

async def worker(sema, url):
    async with sema:
        r = await asks.get(url)
        print('got ', url)

async def main(url_list):
    sema = curio.BoundedSemaphore(value=2)  # Set sofa size.
    tasks = []
    for url in url_list:
        tasks.append(await curio.spawn(worker(sema, url)))
    # Join the workers so the program doesn't exit before they finish.
    for task in tasks:
        await task.join()

url_list = ['http://httpbin.org/delay/5',
            'http://httpbin.org/delay/1',
            'http://httpbin.org/delay/2']

curio.run(main(url_list))

This method of limiting works for the single-request asks functions and for any of a session's methods (a sketch using a session follows the timeline below).

The result of running this is that the first and second urls (delay/5 and delay/1) start immediately. delay/1 finishes first, freeing a seat on the sofa, which lets the third url, delay/2, run.

  • After one second, delay/1 finishes.
  • After three seconds, delay/2 finishes.
  • After five seconds, delay/5 finishes.
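
The same pattern works with a session, as mentioned above. Here's a minimal sketch (the connections value is just an illustrative choice, not a recommendation):

import asks
import curio

async def worker(s, sema, url):
    async with sema:
        r = await s.get(url)
        print('got ', url)

async def main(url_list):
    s = asks.Session(connections=10)
    sema = curio.BoundedSemaphore(value=2)  # Same sofa, now with a session.
    tasks = []
    for url in url_list:
        tasks.append(await curio.spawn(worker(s, sema, url)))
    for task in tasks:
        await task.join()

url_list = ['http://httpbin.org/delay/5',
            'http://httpbin.org/delay/1',
            'http://httpbin.org/delay/2']

curio.run(main(url_list))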

Maintaining Order

Due to the nature of async, if you feed a list of urls to asks in some fashion, and store the responses in a list, there is no guarantee the responses will be in the same order as the list of urls.

A handy way of dealing with this is to turn the url_list into a dict with dict(enumerate(url_list)), key each response by its index, and then build a sorted list from the response dict. This sounds more confusing in writing than it is in code. Take a look:

import asks
import curio

results = {}

url_list = ['a', 'big', 'list', 'of', 'urls']

async def worker(key, url):
    r = await s.get(url)
    results[key] = r

async def main(url_list):
    url_dict = dict(enumerate(url_list))
    tasks = []
    for key, url in url_dict.items():
        tasks.append(await curio.spawn(worker(key, url)))
    # Join the workers so every response is in `results` before we sort.
    for task in tasks:
        await task.join()

s = asks.Session(connections=10)
curio.run(main(url_list))

# Only build the sorted list after curio.run has completed every request.
sorted_results = [response for _, response in sorted(results.items())]

In the above example, sorted_results is a list of response objects in the same order as url_list.

There are of course many ways to achieve this, but the above is noob friendly. Other ways of handling order would be to use a heapq, or to collect results while iterating one of curio's TaskGroups. Here's an example of the latter:

import asks
import curio

results = []
url_list = ["https://www.httpbin.org/get" for _ in range(50)]

s = asks.Session()

async def worker(key, url):
    r = await s.get(url)
    results.append((key, r.body))

async def main():
    async with curio.TaskGroup() as g:
        for key, url in enumerate(url_list):
            await g.spawn(worker, key, url)
        # Here we iterate the TaskGroup, getting results as they come.
        async for _ in g:
            print(f"done with {results[-1][0]}")

    sorted_results = [response for _, response in sorted(results)]
    print(sorted_results)

curio.run(main())
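
Since a heapq was also mentioned, here's a minimal sketch of that approach, assuming the same Session and url_list as the TaskGroup example above. Workers push (key, body) pairs onto a heap, and popping afterwards yields the bodies in url order:

import heapq

import asks
import curio

heap = []
url_list = ["https://www.httpbin.org/get" for _ in range(50)]

s = asks.Session()

async def worker(key, url):
    r = await s.get(url)
    # heappush keeps the smallest key at the front of the heap.
    heapq.heappush(heap, (key, r.body))

async def main():
    async with curio.TaskGroup() as g:
        for key, url in enumerate(url_list):
            await g.spawn(worker, key, url)
    # The TaskGroup has joined every worker by the time the block exits.
    sorted_results = [heapq.heappop(heap)[1] for _ in range(len(heap))]
    print(sorted_results)

curio.run(main())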

Handling response body content (downloads etc.)

The recommended way to handle this sort of thing is by streaming. The following examples use a context manager on the response body to ensure the underlying connection is always handled properly:

import asks
import curio

async def main():
    r = await asks.get('http://httpbin.org/image/png', stream=True)
    with open('our_image.png', 'ab') as out_file:
        async with r.body: # you can do the usual "as x" here if you like.
            async for bytechunk in r.body:
                out_file.write(bytechunk)

curio.run(main())

An example of multiple downloads with streaming:

import asks
import curio

from functools import partial

async def downloader(filename, url):
    r = await asks.get(url, stream=True)
    async with curio.aopen(filename, 'ab') as out_file:
        async with r.body:
            async for bytechunk in r.body:
                # Async file objects must be awaited on write.
                await out_file.write(bytechunk)

async def main():
    tasks = []
    for indx, url in enumerate(['http://placehold.it/1000x1000',
                                'http://httpbin.org/image/png']):
        func = partial(downloader, str(indx) + '.png')
        tasks.append(await curio.spawn(func(url)))
    # Join the downloaders so both files are written before we exit.
    for task in tasks:
        await task.join()

curio.run(main())

The callback argument lets you pass a function that will be run on each byte chunk of the response body as the request is processed. A simple use case for this is downloading a file.

Below you’ll find an example of a single download of an image with a given filename, and multiple downloads with sequential numeric filenames. They are very similar to the streaming examples above.

We define a callback function downloader that takes bytes and saves ‘em, and pass it in.

import asks
import curio

async def downloader(bytechunk):
    async with curio.aopen('our_image.png', 'ab') as out_file:
        await out_file.write(bytechunk)

async def main():
    r = await asks.get('http://httpbin.org/image/png', callback=downloader)

curio.run(main())

What about downloading a whole bunch of images, and naming them sequentially?

import asks
import curio

from functools import partial

async def downloader(filename, bytechunk):
    async with curio.aopen(filename, 'ab') as out_file:
        await out_file.write(bytechunk)

async def main():
    tasks = []
    for indx, url in enumerate(['http://placehold.it/1000x1000',
                                'http://httpbin.org/image/png']):
        func = partial(downloader, str(indx) + '.png')
        tasks.append(await curio.spawn(asks.get(url, callback=func)))
    # Join the spawned requests so every download completes before exit.
    for task in tasks:
        await task.join()

curio.run(main())
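
If you'd rather not collect and join the tasks yourself, the same downloads can be run under a curio.TaskGroup (as in the ordering example earlier), which waits for every spawned task when the block exits. A minimal sketch:

import asks
import curio

from functools import partial

async def downloader(filename, bytechunk):
    async with curio.aopen(filename, 'ab') as out_file:
        await out_file.write(bytechunk)

async def main():
    async with curio.TaskGroup() as g:
        for indx, url in enumerate(['http://placehold.it/1000x1000',
                                    'http://httpbin.org/image/png']):
            func = partial(downloader, str(indx) + '.png')
            # The TaskGroup joins each request when the block exits.
            await g.spawn(asks.get(url, callback=func))

curio.run(main())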