Parallelism Between Coroutines in Python Async Frameworks
Python has many async coroutine frameworks: tornado, gevent, asyncio, twisted, and so on. Coroutines provide cheap concurrency: while one coroutine waits on an IO event, it can hand control to another, and that is what makes their concurrency work. But concurrency alone is not enough. High concurrency does not guarantee low latency, because a single piece of business logic may involve several async IO requests; if those requests are executed one after another, the server's throughput can still be high while the latency of each individual request grows large. Each framework solves this in its own way, so let's look at how each of them manages the parallel execution of unrelated coroutines.
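To make the problem concrete, here is a minimal sketch (using asyncio.sleep to stand in for the IO requests; the function names are invented for illustration): awaiting two IO operations one after another costs the sum of their latencies, while running them in parallel costs only the maximum:

import asyncio
import time

@asyncio.coroutine
def fake_io(seconds):
    # stand-in for an async IO request
    yield from asyncio.sleep(seconds)

@asyncio.coroutine
def sequential():
    yield from fake_io(1)
    yield from fake_io(2)   # total: about 3 seconds

@asyncio.coroutine
def parallel():
    # both "requests" wait concurrently: total is about 2 seconds
    yield from asyncio.gather(fake_io(1), fake_io(2))

loop = asyncio.get_event_loop()
for handler in (sequential, parallel):
    before = time.time()
    loop.run_until_complete(handler())
    print(handler.__name__, time.time() - before)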
tornado
Python 2.7 and above
Tornado keeps the code quite short: you simply yield a list of coroutines:
#!/usr/bin/env python
# _*_coding:utf-8_*_
import time

import requests
from tornado import gen
from tornado.ioloop import IOLoop


@gen.coroutine
def get_url(url):
    # NOTE: requests.get is a blocking call, so it stalls the IOLoop;
    # the three fetches below actually run one after another.
    r = requests.get(url, timeout=3)
    print url, r.status_code
    resp = r.text
    print type(resp)
    raise gen.Return((url, r.status_code))


@gen.coroutine
def process_once_everything_ready():
    before = time.time()
    coroutines = [get_url(url) for url in ['https://www.python.org/',
                                           'https://github.com/',
                                           'https://www.yahoo.com/']]
    result = yield coroutines
    after = time.time()
    print(result)
    print('total time: {} seconds'.format(after - before))


if __name__ == '__main__':
    IOLoop.current().run_sync(process_once_everything_ready)
Output:
https://www.python.org/ 200
<type 'unicode'>
https://github.com/ 200
<type 'unicode'>
https://www.yahoo.com/ 200
<type 'unicode'>
[('https://www.python.org/', 200), ('https://github.com/', 200), ('https://www.yahoo.com/', 200)]
total time: 4.64905309677 seconds
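Because requests is synchronous, those three fetches ran back to back, and the total time (~4.65s) is roughly the sum of the individual requests rather than the maximum. Here is a sketch of a truly concurrent variant, swapping requests for tornado's built-in AsyncHTTPClient (same URLs as above; error handling omitted):

import time

from tornado import gen
from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop


@gen.coroutine
def get_url(url):
    # AsyncHTTPClient.fetch returns a Future, so yielding it suspends
    # this coroutine without blocking the IOLoop.
    response = yield AsyncHTTPClient().fetch(url, request_timeout=3)
    raise gen.Return((url, response.code))


@gen.coroutine
def process_once_everything_ready():
    before = time.time()
    result = yield [get_url(url) for url in ['https://www.python.org/',
                                             'https://github.com/',
                                             'https://www.yahoo.com/']]
    print(result)
    print('total time: {} seconds'.format(time.time() - before))


if __name__ == '__main__':
    IOLoop.current().run_sync(process_once_everything_ready)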
The next example replaces real network IO with gen.sleep, which makes the parallelism easy to see:

import random
import time

from tornado import gen
from tornado.ioloop import IOLoop


@gen.coroutine
def get_url(url):
    wait_time = random.randint(1, 4)
    yield gen.sleep(wait_time)
    print('URL {} took {}s to get!'.format(url, wait_time))
    raise gen.Return((url, wait_time))


@gen.coroutine
def process_once_everything_ready():
    before = time.time()
    coroutines = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    result = yield coroutines
    after = time.time()
    print(result)
    print('total time: {} seconds'.format(after - before))


if __name__ == '__main__':
    IOLoop.current().run_sync(process_once_everything_ready)
$ python3 tornado_test.py
URL URL2 took 1s to get!
URL URL3 took 1s to get!
URL URL1 took 4s to get!
[('URL1', 4), ('URL2', 1), ('URL3', 1)]
total time: 4.000649929046631 seconds
Here, the total running time equals the running time of the longest coroutine: the three simulated fetches overlap instead of adding up.
Since tornado now integrates the asyncio and twisted modules, you can also use their mechanisms from within tornado; we won't expand on that here.
asyncio
Python 3.4 and above
My blog has a translated article on the asyncio library whose last section describes how it manages unrelated coroutines. We reuse that example here, with timing added, to better illustrate how the coroutines run in parallel:
import asyncio
import random
import time


@asyncio.coroutine
def get_url(url):
    wait_time = random.randint(1, 4)
    yield from asyncio.sleep(wait_time)
    print('URL {} took {}s to get!'.format(url, wait_time))
    return url, wait_time


@asyncio.coroutine
def process_as_results_come_in():
    before = time.time()
    coroutines = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    # as_completed hands back the coroutines in the order they finish
    for coroutine in asyncio.as_completed(coroutines):
        url, wait_time = yield from coroutine
        print('Coroutine for {} is done'.format(url))
    after = time.time()
    print('total time: {} seconds'.format(after - before))


@asyncio.coroutine
def process_once_everything_ready():
    before = time.time()
    coroutines = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    # gather waits for all coroutines and returns their results in order
    results = yield from asyncio.gather(*coroutines)
    print(results)
    after = time.time()
    print('total time: {} seconds'.format(after - before))


def main():
    loop = asyncio.get_event_loop()
    print("First, process results as they come in:")
    loop.run_until_complete(process_as_results_come_in())
    print("\nNow, process results once they are all ready:")
    loop.run_until_complete(process_once_everything_ready())


if __name__ == '__main__':
    main()
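For reference (this is standard asyncio syntax, not from the original post): on Python 3.5+ the same gather pattern is usually written with async/await instead of the generator-based decorator:

import asyncio
import random
import time


async def get_url(url):
    wait_time = random.randint(1, 4)
    await asyncio.sleep(wait_time)
    return url, wait_time


async def process_once_everything_ready():
    before = time.time()
    # await all three coroutines concurrently
    results = await asyncio.gather(*[get_url(url) for url in ['URL1', 'URL2', 'URL3']])
    print(results)
    print('total time: {} seconds'.format(time.time() - before))


if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(process_once_everything_ready())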
Summary
- Inside a coroutine framework you can no longer sleep with the original time module's sleep: it would block the whole thread, and all the coroutines run in that single thread. As you can see, both frameworks wrap sleep (gen.sleep() and asyncio.sleep()); internally, each registers a timer with the event loop and hands CPU control over to other coroutines (a sketch of this mechanism follows below).
- This style of parallelism is also easy to understand from how coroutines are implemented: both frameworks yield a list of generator objects out to the scheduler, and the scheduler runs each of them and registers callbacks, which is what makes the parallel execution possible.
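As a rough sketch of that timer mechanism (a simplified illustration, not asyncio's actual implementation), a coroutine-friendly sleep can be built from a Future and loop.call_later:

import asyncio


@asyncio.coroutine
def toy_sleep(delay):
    # simplified illustration: create a Future, arm a timer on the
    # event loop that resolves it after `delay` seconds, and yield
    # control until the timer fires -- the thread is never blocked.
    loop = asyncio.get_event_loop()
    future = asyncio.Future(loop=loop)
    loop.call_later(delay, future.set_result, None)
    yield from future


if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(toy_sleep(1))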