Day 10 - Coroutines
1. Coroutines
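Coroutine: a lightweight thread that lives in user space. Coroutines inside one thread run serially (only one runs at any moment), so data they share needs no locks as long as a switch cannot happen in the middle of an update.
A coroutine (a stackful one such as greenlet) has its own register context and stack. When it is switched out, that context and stack are saved, and when it is switched back in they are restored, so it resumes with the state it had when it last left off. A thread's context, by contrast, is saved and restored by the operating system kernel on every thread switch.
The OS scheduler knows about threads but not about coroutines; coroutines are scheduled by the program itself.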
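The idea that a coroutine keeps its state between calls can be seen with nothing but the standard library. The sketch below uses a plain Python generator as a simple coroutine (a stand-in chosen for illustration, not greenlet); `averager` is a hypothetical name. Each `send()` resumes it exactly where the previous `yield` suspended it, with `total` and `count` still intact.

```python
def averager():
    """Generator-based coroutine that keeps a running average between calls."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Suspend here; execution resumes at this point on the next send()
        value = yield average
        total += value
        count += 1
        average = total / count


avg = averager()
next(avg)             # prime: run up to the first yield
print(avg.send(10))   # 10.0
print(avg.send(20))   # 15.0
print(avg.send(60))   # 30.0
```

No thread or lock is involved: the state simply lives in the suspended generator frame.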
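Advantages:
No thread context-switch overhead: switching between coroutines happens entirely in user space, without entering the kernel.
No overhead for atomic operations, locking, or synchronization, because a coroutine is never preempted by the thread scheduler; it gives up control only at points it chooses.
Control flow is easy to switch, which simplifies the programming model.
Concurrency, scalability, and low cost: a single CPU can run tens of thousands of coroutines, which makes them well suited to highly concurrent workloads.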
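To illustrate the low-cost and no-lock points, here is a sketch that uses the standard-library `asyncio` module instead of greenlet/gevent (an assumption for illustration; like gevent, its tasks switch only at explicit points). It runs 10,000 tasks in one thread. `increment`, `racy_increment` and `run_many` are hypothetical names. Because `increment` has no `await` between reading and writing the shared count, no update is lost. The racy variant shows the limit of the claim: once a switch point falls inside the read-modify-write, updates are lost even without threads.

```python
import asyncio


async def increment(state):
    # Read and write with no await in between: the event loop cannot switch
    # to another task here, so no lock is needed.
    current = state["count"]
    state["count"] = current + 1
    await asyncio.sleep(0)  # explicit switch point, similar to gevent.sleep(0)


async def racy_increment(state):
    current = state["count"]
    await asyncio.sleep(0)  # switch point inside the update: other tasks run here
    state["count"] = current + 1


async def run_many(task_fn, n):
    state = {"count": 0}
    # All n tasks run in this single thread, scheduled by the event loop
    await asyncio.gather(*(task_fn(state) for _ in range(n)))
    return state["count"]


print(asyncio.run(run_many(increment, 10_000)))       # 10000
print(asyncio.run(run_many(racy_increment, 10_000)))  # far fewer than 10000 (lost updates)
```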
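Disadvantages:
Cannot use multiple cores: coroutines essentially run in a single thread, so they cannot keep several cores of a CPU busy at the same time; to run on multiple CPUs, coroutines must be combined with multiple processes.
A blocking operation (for example blocking I/O) blocks the whole program, because while one coroutine is stuck waiting, no other coroutine in the same thread can run.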
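The blocking problem can be observed directly. In this sketch (again using `asyncio` as a stand-in for gevent; `ticker`, `worker` and `max_tick_gap` are hypothetical names), a ticker task records a timestamp about every 0.05 s while a worker waits 0.3 s. When the worker calls the blocking `time.sleep`, the whole thread stops, so the largest gap between ticks should grow to roughly 0.3 s; with the cooperative `asyncio.sleep`, the ticker should keep its rhythm.

```python
import asyncio
import time


async def ticker(ticks, n=6, interval=0.05):
    # Record a timestamp, then yield to the event loop until the next tick
    for _ in range(n):
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def worker(blocking):
    await asyncio.sleep(0.01)  # let the ticker start first
    if blocking:
        time.sleep(0.3)  # blocking call: the entire event loop (thread) stops
    else:
        await asyncio.sleep(0.3)  # cooperative wait: other tasks keep running


async def max_tick_gap(blocking):
    ticks = []
    await asyncio.gather(ticker(ticks), worker(blocking))
    # Largest interval between two consecutive ticks
    return max(b - a for a, b in zip(ticks, ticks[1:]))


print("blocking max gap:", round(asyncio.run(max_tick_gap(True)), 2))
print("non-blocking max gap:", round(asyncio.run(max_tick_gap(False)), 2))
```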
2. Switching between coroutines
# ------------------------------------------------------
# Manual switching with greenlet
from greenlet import greenlet

def test1():
    print(12)
    # Switch to gr2
    gr2.switch()
    print(34)
    gr2.switch()

def test2():
    print(56)
    gr1.switch()
    print(78)

# Create the coroutines, then start gr1
gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()  # prints 12, 56, 34, 78

# ------------------------------------------------------
# Automatic switching when a coroutine hits I/O (gevent)
import gevent

def foo():
    print("Running in foo")
    # Simulates an I/O operation; gevent switches to another greenlet as soon as one starts
    gevent.sleep(2)
    print("Explicit context switch to foo again")

def bar():
    print("Explicit context to bar")
    gevent.sleep(1)
    print("Implicit context switch back to bar")

def fun3():
    print("running fun3")
    # Sleeping for 0 seconds still triggers one switch
    gevent.sleep(0)
    print("running fun3 again")

gevent.joinall([
    # spawn creates and schedules a greenlet
    gevent.spawn(foo),
    gevent.spawn(bar),
    gevent.spawn(fun3),
])
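The switching order in the gevent example can be approximated with a toy scheduler over generators. This is a simplified sketch, not how gevent is implemented: each hypothetical task yields the number of seconds it wants to sleep, and `run` (also a hypothetical name) keeps a virtual clock (no real waiting) and always resumes the task with the earliest wake-up time, first-come first-served on ties. Running the three tasks reproduces the print order of the gevent version, including the extra switch caused by sleeping 0 seconds.

```python
import heapq
import itertools


def run(tasks):
    """Run generator tasks that yield sleep durations; return the final virtual time."""
    clock = 0.0
    order = itertools.count()  # tie-breaker: equal wake-up times keep FIFO order
    queue = [(0.0, next(order), task) for task in tasks]
    heapq.heapify(queue)
    while queue:
        wake_at, _, task = heapq.heappop(queue)
        clock = max(clock, wake_at)  # "sleep" by advancing the virtual clock
        try:
            delay = next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished
        heapq.heappush(queue, (clock + delay, next(order), task))
    return clock


def foo(log):
    log.append("Running in foo")
    yield 2  # like gevent.sleep(2)
    log.append("Explicit context switch to foo again")


def bar(log):
    log.append("Explicit context to bar")
    yield 1  # like gevent.sleep(1)
    log.append("Implicit context switch back to bar")


def fun3(log):
    log.append("running fun3")
    yield 0  # back of the queue for time 0: other ready tasks would run first
    log.append("running fun3 again")


log = []
end_time = run([foo(log), bar(log), fun3(log)])
print("\n".join(log))
print("virtual time:", end_time)  # virtual time: 2.0
```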
3. Crawling web pages with coroutines
# Patch blocking standard-library I/O (socket, ssl, time.sleep, ...) as early as possible,
# before other modules are imported. Without the patch, gevent cannot see the I/O done by
# urllib and socket, so spawned greenlets block one another and effectively run serially.
# The patch does not change the serial loop below, which never yields to other greenlets.
from gevent import monkey
monkey.patch_all()

import time
import urllib.request  # Python 3 replacement for the original urllib2

import gevent

def fetch(url):
    print("GET: {0}".format(url))
    resp = urllib.request.urlopen(urllib.request.Request(url))
    # Read the downloaded data
    data = resp.read()
    # Every call writes to the same file, so it ends up holding the page written last
    with open("url.html", "wb") as out:
        out.write(data)
    print('{0} bytes received from {1}'.format(len(data), url))

urls = [
    'http://www.163.com/',
    'https://www.yahoo.com/',
    'https://github.com/',
]

# ----------------------------------------------------------------------
# Serial crawl
time_start = time.time()
for url in urls:
    fetch(url)
print("cost: ", time.time() - time_start)

# ----------------------------------------------------------------------
# Crawl with coroutines: start one greenlet per URL and wait for all of them
async_time_start = time.time()
gevent.joinall([gevent.spawn(fetch, url) for url in urls])
print("async cost: ", time.time() - async_time_start)
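To try the serial-versus-concurrent comparison without touching the network, the sketch below replaces each download with a hypothetical `fake_fetch` that only waits, and uses `asyncio` in place of gevent (an assumption for illustration). With three 0.2 s "downloads", the serial version should take about 0.6 s and the concurrent one about 0.2 s, because the waits overlap.

```python
import asyncio
import time

URLS = ['http://www.163.com/', 'https://www.yahoo.com/', 'https://github.com/']


async def fake_fetch(url, delay):
    """Hypothetical stand-in for a download: waits `delay` seconds, no network."""
    print("GET: {0}".format(url))
    await asyncio.sleep(delay)
    return url


async def serial(delay):
    # Each fetch starts only after the previous one has finished
    return [await fake_fetch(url, delay) for url in URLS]


async def concurrent(delay):
    # All fetches wait at the same time; gather keeps the input order
    return await asyncio.gather(*(fake_fetch(url, delay) for url in URLS))


def timed(coro_fn, delay=0.2):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(delay))
    return result, time.perf_counter() - start


print("cost: ", timed(serial)[1])
print("async cost: ", timed(concurrent)[1])
```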
