All About Coroutines

Reference tutorial series: https://blog.itpub.net/70041327/viewspace-3037606/

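Coroutine modules currently used in Python:

1. greenlet, and gevent built on top of it (greenlet switches manually; gevent switches automatically on blocking I/O once monkey-patched)
2. The yield keyword, which can only simulate how coroutines run
3. asyncio (switches automatically on blocking I/O), added in Python 3.4
4. The async & await keywords, added in Python 3.5

The difference between 3 and 4: in Python 3.4, asyncio coroutines were generators decorated with @asyncio.coroutine that used yield from (the decorator was removed in Python 3.11), whereas from 3.5 on you simply put async in front of def and use await. The two syntaxes differ only slightly.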

The asyncio module, function by function

The event loop (loop)

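asyncio works by putting N tasks into an endless loop, so before adding anything we first need an event loop object.
Inside the loop, the N tasks form a task list. On each pass, the loop picks out the tasks that are ready to run and executes them, and removes the tasks that have finished from the list.
When the task list is empty, the loop ends.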
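The loop described above can be modelled with a toy round-robin scheduler built only on plain generators. This is a simplified sketch of the idea, not how asyncio is implemented internally: each `yield` stands in for an `await` that hands control back, the deque plays the role of the task list, and the loop stops once the deque is empty. The names `worker` and `run_round_robin` are made up for this illustration.

```python
from collections import deque

def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}-{i}")
        yield  # hand control back to the scheduler, like an await

def run_round_robin(gens):
    queue = deque(gens)  # the "task list"
    while queue:  # stop when every task has finished
        gen = queue.popleft()
        try:
            next(gen)  # run the task until its next yield
            queue.append(gen)  # not finished yet: put it back in line
        except StopIteration:
            pass  # finished: drop it from the list

log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)  # ['A-0', 'B-0', 'A-1', 'B-1', 'B-2']
```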

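import asyncio

async def job(n):
    await asyncio.sleep(0.1)

# First get an event loop object (get_event_loop() is deprecated when no loop is running)
loop = asyncio.new_event_loop()
# Put the tasks on the loop and run until all of them are done
tasks = [loop.create_task(job(i)) for i in range(3)]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
# Since Python 3.7, asyncio.run() can replace the code above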

Coroutine functions and coroutine objects

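A function defined with async def is called a coroutine function, and the value it returns when called is a coroutine object.
Calling a coroutine function does not run the code in its body; the coroutine object has to be handed to the event loop to execute.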

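import asyncio

async def func():  # coroutine function
    print("start")

ret = func()  # creates a coroutine object; the body does not run yet

# It has to be handed to an event loop
loop = asyncio.new_event_loop()
loop.run_until_complete(ret)  # a single coroutine, not asyncio.wait(ret)
loop.close()

# Python 3.7 wraps the loop handling in asyncio.run(). A coroutine object can
# only be run once, so pass a fresh one instead of reusing ret
asyncio.run(func())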

`await` and awaitable objects

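await is followed by an awaitable object (a coroutine object, a Future or a Task) and is used to wait for an I/O operation to finish.
When a coroutine reaches await, it is suspended at that point until the awaited object completes. The event loop is free to run other tasks in the meantime, but the code after the await in this coroutine does not continue, which is why awaiting two coroutines one after the other, as below, runs them sequentially.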

import asyncio

async def func(aa):
    print("%s>>start" % aa)
    await asyncio.sleep(2)
    print("%s>>end" % aa)
    return "func finished, return value is %s" % aa


async def main():
    print("main running")
    ret1 = await func("ret1")
    print("ret1 returned: %s" % ret1)

    ret2 = await func("ret2")
    print("ret2 returned: %s" % ret2)

asyncio.run(main())

"""
main running
ret1>>start
ret1>>end
ret1 returned: func finished, return value is ret1
ret2>>start
ret2>>end
ret2 returned: func finished, return value is ret2
"""

Task objects

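To add several tasks to the event loop, use Task objects, which let coroutines be scheduled concurrently: asyncio.create_task(coroutine_object) wraps the coroutine in a Task and adds it to the event loop to be scheduled and run.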

asyncio.create_task(coroutine_object) was introduced in Python 3.7; on earlier versions use asyncio.ensure_future() instead.

import asyncio

async def func(aa):
    print("%s>>start" % aa)
    await asyncio.sleep(2)
    print("%s>>end" % aa)
    return "func finished, return value is %s" % aa

async def main():
    print("main running")
    # The loop that is running main (Python 3.7+; on 3.6 use asyncio.get_event_loop())
    loop = asyncio.get_running_loop()
    task1 = loop.create_task(func("func1"))
    print("1")
    task2 = loop.create_task(func("func2"))
    print("2")
    task3 = asyncio.ensure_future(func("func3"))
    print("3")
    # Python 3.7+ only
    # task4 = asyncio.create_task(func("func4"))
    ret1 = await task1
    ret2 = await task2
    ret3 = await task3
    print(ret1)
    print(ret2)
    print(ret3)

asyncio.run(main())
# Python 3.6 and earlier:
# obj = asyncio.get_event_loop()
# obj.run_until_complete(main())

"""
main running
1
2
3
func1>>start
func2>>start
func3>>start
func1>>end
func2>>end
func3>>end
func finished, return value is func1
func finished, return value is func2
func finished, return value is func3
"""
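The output shows that create_task only schedules the function as a task and returns immediately, so main keeps running (1, 2 and 3 are printed first). The tasks start only when main reaches an await and gives control back to the loop, and await then blocks main until that task returns its value.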
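The difference between awaiting coroutines directly and creating Tasks first shows up clearly in the elapsed time. The sketch below is a minimal illustration under assumed values (three coroutines sleeping 0.2 s each; `sequential` and `with_tasks` are illustrative names): awaiting them one by one takes roughly the sum of the sleeps, while Tasks let the sleeps overlap.

```python
import asyncio
import time

async def func(name, delay=0.2):
    await asyncio.sleep(delay)
    return name

async def sequential():
    # Each await waits for the previous coroutine to finish first
    return [await func("a"), await func("b"), await func("c")]

async def with_tasks():
    # All three are scheduled before the first await, so the sleeps overlap
    tasks = [asyncio.create_task(func(n)) for n in "abc"]
    return [await t for t in tasks]

for runner in (sequential, with_tasks):
    start = time.perf_counter()
    result = asyncio.run(runner())
    print(runner.__name__, result, round(time.perf_counter() - start, 1), "s")
```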

The example above only shows the syntax. In practice we usually put the tasks in a list and iterate over it.

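The optimized code collects the tasks in a list and passes it to await asyncio.wait(tasklist). The return value is a pair, done and pending: tasks that have finished go into done and unfinished ones into pending, and both are sets.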

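await asyncio.wait(tasklist, timeout=2) also takes a timeout argument, the maximum number of seconds to wait (default None, meaning no limit). When the time is up, wait returns and any unfinished tasks are left in pending; they are not cancelled.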
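The task-list pattern with a timeout might look like the sketch below. The delays (0.1, 0.2 and 0.6 s) and the 0.35 s timeout are arbitrary values picked so that one task is still running when `asyncio.wait` returns; cancelling what is left in `pending` is an extra clean-up step, not something `wait` does for you.

```python
import asyncio

async def func(name, delay):
    await asyncio.sleep(delay)
    return f"{name} finished after {delay}s"

async def main():
    # Build the task list in a loop instead of one variable per task
    tasklist = [asyncio.create_task(func(f"func{i}", d))
                for i, d in enumerate([0.1, 0.2, 0.6], start=1)]
    # Wait at most 0.35 s; tasks still running afterwards stay in `pending`
    done, pending = await asyncio.wait(tasklist, timeout=0.35)
    for t in done:
        print(t.result())
    print(len(done), "done,", len(pending), "pending")
    for t in pending:
        t.cancel()  # asyncio.wait never cancels anything on its own
    return done, pending

done, pending = asyncio.run(main())
```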

Example 3: running without the main coroutine function, by passing the task list directly to asyncio.wait

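import asyncio

async def func(aa):
    print("%s>>start" % aa)
    await asyncio.sleep(2)
    print("%s>>end" % aa)
    return "func finished, return value is %s" % aa


tasklist = [
    func("func1"),
    func("func2"),
    func("func3"),
]

# The version below does NOT work: create_task() adds each task to the loop
# immediately, but obj, the loop instance, has not been created yet
#tasklist=[
#     obj.create_task(func("func1")),
#     obj.create_task(func("func2")),
# ]

obj = asyncio.new_event_loop()
# Bare coroutines in asyncio.wait() are deprecated since Python 3.8 and rejected
# since 3.11, so wrap each one in a Task on obj before waiting
done, pending = obj.run_until_complete(asyncio.wait([obj.create_task(c) for c in tasklist]))
print(done)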


Creating the task list after obj also works, but that breaks the structure and calling order of the code:
#obj=asyncio.get_event_loop()

#tasklist=[
#    obj.create_task(func("func1")),
#    obj.create_task(func("func2")),
#]
#done,pending = obj.run_until_complete(asyncio.wait(tasklist))

Future objects

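A Future object is also used to wait for the result of asynchronous processing.
In asyncio, Future is a fairly low-level awaitable object. We usually don't use it directly, but use Task objects to run tasks concurrently and track their state. You can think of a Future as a container holding the result of some computation, which may be available immediately or only later.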

A Future waits for the result of asynchronous processing. Task is a subclass of Future, and the result handling behind awaiting a Task is built on Future.

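Placeholder: a Future stands in for a computation in the event loop. It starts out pending until the asynchronous operation completes.
Reading the result: the result of the asynchronous computation can be read from the Future.
State management: you can query the state of the operation, e.g. whether it is done, was cancelled, or raised an exception.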
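The three roles listed above can be seen on a bare Future in a few lines. In this sketch, `loop.call_later` stands in for whatever I/O callback would normally complete the Future; the value 42 and the 0.05 s delay are arbitrary.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # Placeholder: nothing has produced a value yet
    print("done before:", fut.done())
    # Simulate an I/O callback that fills in the result a little later
    loop.call_later(0.05, fut.set_result, 42)
    value = await fut  # suspends main until set_result has run
    # State management: query the outcome
    print("done:", fut.done(), "cancelled:", fut.cancelled(), "value:", value)
    return fut

fut = asyncio.run(main())
```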

import asyncio
# Example 1: set a Future's result manually
async def main():
    # Create a pending Future object on the running loop
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    # Set the Future's result by hand
    print("Setting future result manually...")
    fut.set_result('manual result')

    # Wait for the Future to complete
    result = await fut
    print("Future received result:", result)

asyncio.run(main())

# Example 2: cancel a Future manually
async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    # Cancel the Future by hand
    print("Cancelling future manually...")
    # fut.cancel() cancels the Future; awaiting a cancelled Future raises CancelledError
    fut.cancel()

    try:
        # Try to await the cancelled Future
        await fut
    except asyncio.CancelledError:
        print("Future was cancelled")

asyncio.run(main())

# Example 3: combine several Futures

import asyncio

async def set_after(fut, delay, value):
    await asyncio.sleep(delay)
    fut.set_result(value)

async def main():
    loop = asyncio.get_running_loop()

    # Create two Future objects
    fut1 = loop.create_future()
    fut2 = loop.create_future()

    # Schedule async operations that complete the Futures
    loop.create_task(set_after(fut1, 2, 'result1'))
    loop.create_task(set_after(fut2, 4, 'result2'))

    # Wait for both Futures with asyncio.gather
    print("Waiting for both futures...")
    results = await asyncio.gather(fut1, fut2)
    print("Results from futures:", results)

asyncio.run(main())

asyncio.Future vs concurrent.futures.Future
Purpose:
asyncio.Future belongs to the asyncio programming model and is mainly used to pass the results of asynchronous operations between coroutines.
concurrent.futures.Future belongs to multithreaded or multiprocess code: it is the object that thread pools and process pools return for asynchronous work, used to manage concurrent tasks and fetch their results.
Usage:
An asyncio.Future is normally waited on with the await keyword, inside a coroutine.
A concurrent.futures.Future is normally read with its .result() method, in synchronous code.
Event loop:
An asyncio.Future has to run in an asyncio event loop.
A concurrent.futures.Future does not depend on an event loop and can be used anywhere.

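Normally these two never meet. But if a third-party component you depend on, say a MySQL driver, does not support asynchronous access, you can run its blocking calls in a thread or process pool and treat the resulting concurrent.futures.Future as an asyncio Future, so that it is handled asynchronously.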
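One way to do this bridging by hand is `asyncio.wrap_future()`, which wraps a `concurrent.futures.Future` in an asyncio Future that can be awaited. In this sketch `blocking_query` is a hypothetical stand-in for a blocking driver call (a `time.sleep` plus a computation), not a real MySQL API; `run_in_executor()`, covered in the next section, performs the same wrapping for you.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(x):
    # Hypothetical blocking call, e.g. a synchronous database driver
    time.sleep(0.1)
    return x * 2

async def main():
    with ThreadPoolExecutor(max_workers=2) as pool:
        cf_future = pool.submit(blocking_query, 21)  # concurrent.futures.Future
        aio_future = asyncio.wrap_future(cf_future)  # asyncio Future on the running loop
        # Awaiting suspends only this coroutine; the loop stays free meanwhile
        return await aio_future

print(asyncio.run(main()))
```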

Calling regular functions with `run_in_executor()`

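Below is how a coroutine function calls a regular (blocking) function. Two key points:

Wrap the regular function in a coroutine function.

Use loop.run_in_executor() to get around the blocking call by running the function in another thread. The first argument is the executor to use (a thread pool or process pool); None means the loop's default thread pool.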

import asyncio
import time

def faa(idx):
    print("foo #%s started" % idx)
    time.sleep(2)  # blocking call
    return idx

async def faa_faster(idx):
    loop = asyncio.get_running_loop()
    # run_in_executor hands faa to the loop's default thread pool (None)
    # and immediately returns an awaitable asyncio Future
    ret = loop.run_in_executor(None, faa, idx)
    a = await ret
    print(a)


async def main():
    # asyncio.wait() no longer accepts bare coroutines (3.11+); gather does
    await asyncio.gather(*(faa_faster(i) for i in range(1000)))

asyncio.run(main())

Example 2: run a regular function in a thread pool, driven by coroutines

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def faa(idx):
    print("foo #%s started" % idx)
    time.sleep(2)
    return idx


async def faa_faster(pool, idx):
    loop = asyncio.get_running_loop()
    # The first argument defaults to None; pass a thread (or process) pool
    # to run the regular function in that pool instead
    ret = loop.run_in_executor(pool, faa, idx)
    a = await ret
    print(a)

async def main():
    with ThreadPoolExecutor(max_workers=5) as pool:
        await asyncio.gather(*(faa_faster(pool, i) for i in range(1000)))

asyncio.run(main())

`asyncio.gather` and `as_completed`

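asyncio.gather runs several tasks concurrently; the example below runs seven coroutines at the same time.

asyncio.gather() takes several awaitables and returns a single awaitable. When all of them are done, it returns a list of their results in the same order as the awaitables were passed in, not the order in which they finished.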

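import asyncio
async def async_function1(wait_time):
    await asyncio.sleep(wait_time)
    return "needs %s seconds, so it is printed last" % wait_time

async def main():
    tasks = [async_function1(7-i) for i in range(7)]
    # Using as_completed() to get coroutines as they finish:
    #for task in asyncio.as_completed(tasks):
    #    result = await task  # wait for the task and get its result
    #    print(result)  # handle the result right away
    results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())

# The results come back in the same order as the tasks were passed in
['needs 7 seconds, so it is printed last', 'needs 6 seconds, so it is printed last', 'needs 5 seconds, so it is printed last', 'needs 4 seconds, so it is printed last', 'needs 3 seconds, so it is printed last', 'needs 2 seconds, so it is printed last', 'needs 1 seconds, so it is printed last']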

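Compared with asyncio.gather(), the advantage of asyncio.as_completed() is that you can start handling results without waiting for the whole group of tasks to finish.

as_completed() takes a collection of awaitables (such as coroutines) and returns an iterator. Each item of the iterator gives back the next result in the order the awaitables finish, not the order in which they were started.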

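import asyncio
async def async_function1(wait_time):
    await asyncio.sleep(wait_time)
    return "needs %s seconds, so it is printed last" % wait_time

async def main():
    tasks = [async_function1(7-i) for i in range(7)]
    # Use as_completed() to get coroutines as they finish
    for task in asyncio.as_completed(tasks):
        result = await task  # wait for the task and get its result
        print(result)  # handle the result right away

asyncio.run(main())

The results come back in order of completion, fastest first:
needs 1 seconds, so it is printed last
needs 2 seconds, so it is printed last
needs 3 seconds, so it is printed last
needs 4 seconds, so it is printed last
needs 5 seconds, so it is printed last
needs 6 seconds, so it is printed last
needs 7 seconds, so it is printed last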

The uvloop module

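uvloop is a replacement for the asyncio event loop. According to benchmarks it runs more than twice as fast as the default loop returned by get_event_loop() (the uvloop project itself reports a 2-4x speed-up).
Note: uvloop requires Python 3.7 or later (newer releases require newer Python versions) and does not run on Windows.
Usage: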

import asyncio
import uvloop

# Make asyncio create uvloop event loops instead of the default ones
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

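The rest of the asyncio code stays the same; once this is set, the event loop asyncio creates internally is a uvloop loop.
Start the coroutines with asyncio.run() as usual.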
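Because uvloop is a third-party package (installed with pip install uvloop) that is not available on every platform, a script may want to fall back to the default loop when it is missing. The sketch below assumes that fallback behaviour is acceptable; it only reports the class name of the loop actually in use.

```python
import asyncio

try:
    import uvloop  # third-party package; not available on Windows
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    uvloop = None  # keep asyncio's default event loop

async def main():
    await asyncio.sleep(0)
    # Name of the event loop class that asyncio.run() created
    return type(asyncio.get_running_loop()).__name__

loop_name = asyncio.run(main())
print("uvloop" if uvloop else "default asyncio loop", loop_name)
```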

Putting coroutine functions, Future objects, the event loop and callbacks to work


"""
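Three essentials of coroutines
1. Put async in front of the function to make it a coroutine function
2. Use await on blocking functions or code blocks, so the coroutine is suspended there
3. Hand the coroutine to an event loop to run
    3.1 Create the event loop first
    3.2 Call the loop's run_until_complete
    3.3 The loop runs Future objects; a bare coroutine passed to run_until_complete is wrapped in a Task automatically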

"""
------------------------------------Understanding how Future objects, coroutine functions and the event loop relate---------------------------------------------
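The relationship:
- obj.run_until_complete: the event loop, takes a Future object
  - asyncio.wait or asyncio.gather: wraps the coroutines into a Future object
    - async def test: the coroutine function, which can suspend itself with await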

import asyncio

async def test(i):
    print(f"from test, blocking starts, argument is {i}")
    await asyncio.sleep(1)
    print(f"from test, blocking ends, argument is {i}")
    return f"result of test {i}"

#obj = asyncio.get_event_loop()
#t1 = test("t1") # unlike a regular function, this does not run the body yet; the loop has to run it
#t2 = test("t2")
#obj.run_until_complete(asyncio.wait([t1,t2],timeout=6))
"""
To run several coroutines, use asyncio.wait or asyncio.gather: the former returns (done, pending),
the latter returns only the results.
asyncio.wait needs Future objects (bare coroutines are rejected since Python 3.11), and create_task
or asyncio.ensure_future wraps a coroutine into one, so the three lines below can be rewritten:
t1 = test("t1")
t2 = test("t2")
obj.run_until_complete(asyncio.wait([t1,t2],timeout=6))
"""
# Create the loop once and make it current, so ensure_future() below uses it
obj = asyncio.new_event_loop()
asyncio.set_event_loop(obj)

# 1. asyncio.wait
t1 = obj.create_task(test("t1"))
t2 = asyncio.ensure_future(test("t2"))
obj.run_until_complete(asyncio.wait([t1,t2],timeout=6))
# It can also be written like this
t = asyncio.wait([t1,t2],timeout=6)
# asyncio.wait returns two values, done and pending
done, pending = obj.run_until_complete(t)
print(done)

# 2. asyncio.gather
t1 = obj.create_task(test("t1"))
t2 = asyncio.ensure_future(test("t2"))
c = asyncio.gather(t1,t2)
ret = obj.run_until_complete(c)
# Or in a single line
ret = obj.run_until_complete(asyncio.gather(t1,t2))
print(ret)

----------------------------------------------Callbacks-----------------------------------------------
1. Callbacks with a regular function

import asyncio

def follow(t):
    # t is the finished Task wrapping test(); t.result() returns test's return value
    print(t.result())


async def test(i):
    print(f"from test, blocking starts, argument is {i}")
    await asyncio.sleep(1)
    print(f"from test, blocking ends, argument is {i}")
    return f"result of test {i}"


obj = asyncio.new_event_loop()
asyncio.set_event_loop(obj)
t1 = test("t1") # unlike a regular function, this does not run the body yet; the loop has to run it
t2 = test("t2")
t3 = test("t3")

f1 = obj.create_task(t1)
# A regular-function callback is registered with add_done_callback, which only
# exists once the coroutine has been wrapped into a Task (a Future)
# follow is the callback; it is called with the finished task f1 as its argument
f1.add_done_callback(follow)
# asyncio.ensure_future works for the wrapping as well
f2 = asyncio.ensure_future(t2)
f2.add_done_callback(follow)
# f3 has no callback
f3 = obj.create_task(t3)
obj.run_until_complete(asyncio.wait([f1,f2,f3],timeout=6))

# asyncio.gather works too; use fresh coroutine objects, because t1..t3 have already run
f1 = obj.create_task(test("t1"))
f1.add_done_callback(follow)
f2 = asyncio.ensure_future(test("t2"))
f2.add_done_callback(follow)
f3 = obj.create_task(test("t3"))
obj.run_until_complete(asyncio.gather(f1,f2,f3))

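2. Concurrency with a single coroutine function
The gather and wait examples above ran several functions concurrently. Below is an example that runs a single function concurrently. A practical use is a coroutine crawler, where one coroutine function keeps firing off requests for many URLs to fetch their data.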
import asyncio
import random

async def get_response(url,sleep_time):
    print("start crawling page %s"%url)
    await asyncio.sleep(sleep_time)
    print("page %s crawled"%url)

async def main(loop):
    url=1
    while 1:  # runs forever, like a crawler that never stops
        sleep_time = random.randint(1,3)
        task = get_response(url,sleep_time)
        # Two ways to create a task object (real code should keep a reference
        # to each task: the loop only holds weak references to tasks)
        loop.create_task(task)
        # asyncio.ensure_future(task)
        # Without this await nothing visible happens: control is never given back to the loop
        await asyncio.sleep(0)
        url+=1

obj=asyncio.new_event_loop()
obj.run_until_complete(main(obj))
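The loop above never ends and drops its Task references. A bounded variant might look like the following sketch: it assumes a fixed number of URLs (`n_urls`, here 5) and short random delays instead of 1 to 3 s, keeps each Task in a set (the pattern the asyncio docs recommend, since the loop holds only weak references to tasks) and waits for the remaining ones at the end. Pages are recorded in a list rather than printed so the order can be inspected.

```python
import asyncio
import random

async def get_response(url, sleep_time, log):
    log.append(f"start {url}")
    await asyncio.sleep(sleep_time)  # stands in for the network request
    log.append(f"done {url}")

async def main(n_urls=5):
    log = []
    background = set()  # strong references so tasks are not garbage-collected
    for url in range(1, n_urls + 1):
        task = asyncio.create_task(get_response(url, random.uniform(0.01, 0.05), log))
        background.add(task)
        task.add_done_callback(background.discard)
        await asyncio.sleep(0)  # let the new task start before creating the next one
    # Wait for whatever is still running instead of looping forever
    await asyncio.gather(*background)
    return log

log = asyncio.run(main())
print(log)
```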



3. Using a coroutine function as the callback


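import asyncio

async def follow2(t):
    print(f"follow2 started, argument t is {t}")
    await asyncio.sleep(2)
    # t is the finished Task wrapping test(); t.result() returns test's return value
    print(f"from callback follow2, the previous test returned {t.result()}")

async def test(i):
    print(f"from test, blocking starts, argument is {i}")
    await asyncio.sleep(1)
    print(f"from test, blocking ends, argument is {i}")
    return f"result of test {i}"

async def add_success_callback(fun1,fun2):
    # fun1 is the Task wrapping test(); awaiting it suspends until test finishes
    await fun1
    # once test has a result, pass the finished task to fun2 and await it too
    ret =await fun2(fun1)
    return ret

obj = asyncio.new_event_loop()
asyncio.set_event_loop(obj)
f1 = obj.create_task(test("t1"))
"""
add_done_callback is regarded as a "low-level" interface. When the callback is itself a coroutine,
it is better to write a small coroutine function that chains the two, such as add_success_callback here,
because the callback coroutine also has to be awaited (suspended)
"""
# follow2 is the callback coroutine; it receives the finished task f1
f1 = add_success_callback(f1,follow2)
# asyncio.ensure_future works for the wrapping as well
f2 = asyncio.ensure_future(test("t2"))
f2 = add_success_callback(f2,follow2)
# f1 and f2 are now coroutine objects, and each can run only once: pick ONE of the following
# Using wait (it needs Tasks, not bare coroutines, since Python 3.11)
obj.run_until_complete(asyncio.wait([obj.create_task(f1), obj.create_task(f2)], timeout=6))
# Using gather, which accepts coroutines directly
# obj.run_until_complete(asyncio.gather(f1,f2))
# Using wait without a timeout (asyncio.wait([...]) is a call, not asyncio.wait[...])
# obj.run_until_complete(asyncio.wait([obj.create_task(f1), obj.create_task(f2)]))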

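# Differences between gather and wait
asyncio.gather treats the Tasks it wraps as a black box and only gives you the coroutines' results.
asyncio.wait returns the Task objects themselves (split into finished and pending ones); if you care about the results, you get them from each Task yourself with the result() method.
asyncio.wait also takes a return_when argument. By default it waits until all tasks are done (return_when=asyncio.ALL_COMPLETED); it also supports asyncio.FIRST_COMPLETED (return as soon as the first coroutine finishes) and asyncio.FIRST_EXCEPTION (return as soon as one raises an exception).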
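The difference is easiest to see side by side. This sketch uses made-up delays (0.5 s and 0.1 s for `wait`, 0.05 s and 0.01 s for `gather`): `wait` with `return_when=asyncio.FIRST_COMPLETED` returns as soon as the fast task finishes and leaves the slow one in `pending`, while `gather` waits for everything and returns plain results in argument order, even though "b" finishes first.

```python
import asyncio

async def job(name, delay):
    await asyncio.sleep(delay)
    return name

async def main():
    tasks = [asyncio.create_task(job(n, d)) for n, d in [("slow", 0.5), ("fast", 0.1)]]
    # Return as soon as any task is done; the rest are left in `pending`
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    first = [t.result() for t in done]  # wait hands back Tasks, so call result()
    for t in pending:
        t.cancel()  # wait does not cancel the leftovers itself
    # gather hands back results directly, in the order the awaitables were passed
    everything = await asyncio.gather(job("a", 0.05), job("b", 0.01))
    return first, len(pending), everything

print(asyncio.run(main()))
```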

# In production you would not wrap every task into a Future by hand; the code can be streamlined like this

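import asyncio

async def some_async_work(number):
    # Hypothetical placeholder for real I/O work
    await asyncio.sleep(0.01)

async def __after_done_callback(future_result):
    # await for something...
    pass

async def __future_job(number):
    await some_async_work(number)
    return number + 1

async def main():
    # create 100 future jobs
    tasks = [asyncio.ensure_future(__future_job(x)) for x in range(100)]
    # as_completed yields them in completion order; handle each result as soon as it is ready
    for f in asyncio.as_completed(tasks):
        result = await f
        await __after_done_callback(result)

# asyncio.run() creates and closes the loop (await is only valid inside a coroutine)
asyncio.run(main())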

Other asyncio functions

asyncio.wait vs asyncio.wait_for vs asyncio.gather

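Summary
asyncio.wait:
Suited to monitoring how a group of tasks completes.
You can choose the completion condition (e.g. return as soon as any one task finishes).
asyncio.wait_for:
Suited to limiting how long a single coroutine may run.
Raises a timeout exception so you can deal with a coroutine that takes too long.
asyncio.gather:
Suited to running several coroutines at once and waiting for all of them to finish.
Returns the results in argument order and has no timeout of its own.
Choose the tool that fits the job: for example, to put time limits on several tasks you can combine asyncio.wait and asyncio.wait_for, while asyncio.gather makes it convenient to collect the results of several concurrent tasks.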
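As one example of such a combination, a per-task time limit can be added by wrapping each coroutine in `wait_for` before handing it to `gather`. The sketch below assumes `return_exceptions=True` is acceptable, so a timed-out task shows up as a TimeoutError in the result list instead of aborting the whole gather; the delays and limits are arbitrary.

```python
import asyncio

async def job(name, delay):
    await asyncio.sleep(delay)
    return name

async def main():
    return await asyncio.gather(
        asyncio.wait_for(job("quick", 0.05), timeout=1),    # finishes in time
        asyncio.wait_for(job("too slow", 1), timeout=0.1),  # cancelled after 0.1 s
        return_exceptions=True,  # return the TimeoutError instead of raising it
    )

results = asyncio.run(main())
print(results)
```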
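Examples
asyncio.wait waits for one or more awaitables to finish. You can set the completion condition: FIRST_COMPLETED, FIRST_EXCEPTION or ALL_COMPLETED.
asyncio.wait suspends the calling coroutine until the condition is met (by default, until all of them have finished).
wait accepts a timeout, but when it expires the awaitables are not cancelled; they keep running, and any still unfinished when asyncio.run() shuts the loop down are cancelled then (CancelledError is raised inside them). If you want a coroutine cancelled on timeout, use wait_for.
done and pending are sets, so iterating over done has no guaranteed order, and it usually differs from the original task order.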


import asyncio

async def task(name, duration):
    await asyncio.sleep(duration)
    return f"Task {name} finished"

async def main():
    # Create several tasks
    tasks = [
        asyncio.create_task(task('A', 2)),
        asyncio.create_task(task('B', 3)),
        asyncio.create_task(task('C', 1))
    ]
    
    # Wait until all tasks have finished
    done, pending = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
    
    for d in done:
        print(await d)

asyncio.run(main())

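asyncio.wait_for waits for a single awaitable to finish within the given timeout. If it does not finish in time, wait_for cancels it and raises asyncio.TimeoutError.
Use cases

Timeout control: when an operation must finish within a certain time, so that you never wait indefinitely.
Catching long blocking: adding a timeout check to operations that might block for a long time.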

import asyncio

async def task():
    await asyncio.sleep(3)
    return "Task finished"

async def main():
    try:
        result = await asyncio.wait_for(task(), timeout=2)
        print(result)
    except asyncio.TimeoutError:
        print("The task took too long!")

asyncio.run(main())

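asyncio.gather
Purpose: aggregates several coroutines, starting them all at once and waiting for all of them; the results are ordered as the coroutines were passed in.
Return value: a list of the coroutines' results.
Behavior: by default, if one coroutine raises, that exception propagates immediately to whoever awaits gather, but the other coroutines are not cancelled and keep running (pass return_exceptions=True to get exceptions back in the result list instead). gather does not handle timeouts.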

import asyncio

async def task(name, duration):
    await asyncio.sleep(duration)
    return f"{name} finished"

async def main():
    results = await asyncio.gather(
        task("TaskA", 2),
        task("TaskB", 1),
        task("TaskC", 3)
    )
    for result in results:
        print(result)

asyncio.run(main())
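The exception behaviour described above can be checked with a sketch in which one coroutine fails quickly while its sibling is still sleeping. Assumed setup: `boom` raises a ValueError after 0.01 s and `ok` finishes after 0.1 s; the sibling is created as a Task so it can be awaited again after gather has raised, which shows that it was not cancelled.

```python
import asyncio

async def ok(log):
    await asyncio.sleep(0.1)
    log.append("ok finished")  # still runs after its sibling failed
    return "ok"

async def boom():
    await asyncio.sleep(0.01)
    raise ValueError("boom")

async def main():
    log = []
    sibling = asyncio.create_task(ok(log))
    try:
        await asyncio.gather(sibling, boom())
    except ValueError as exc:
        log.append(f"gather raised {exc}")  # propagated as soon as boom() failed
    await sibling  # the sibling was not cancelled, so this still completes
    return log

print(asyncio.run(main()))
```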

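Limiting coroutine concurrency with Semaphore

When you want to limit how many copies of a coroutine run at the same time, create a Semaphore object before calling the coroutine, then pass it to the coroutine whose concurrency you want to limit.
Inside the coroutine, wrap the real code in an async context manager:
async with sem:
    real code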

import asyncio
import random

async def get_response(url, sleep_time, sem):
    # Acquire the semaphore inside the concurrent function to cap concurrency
    async with sem:
        print("start crawling page %s" % url)
        await asyncio.sleep(sleep_time)
        print("page %s crawled" % url)

async def main():
    # Set the concurrency limit; create the semaphore inside the running loop
    # (see the RuntimeError note below)
    sem = asyncio.Semaphore(3)
    loop = asyncio.get_running_loop()
    url = 1
    while True:
        sleep_time = random.randint(1, 3)
        task = get_response(url, sleep_time, sem)
        # Two ways to create a task object
        loop.create_task(task)
        # asyncio.ensure_future(task)
        # Without this await nothing visible happens: control is never given back to the loop
        await asyncio.sleep(0)
        url += 1

asyncio.run(main())

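This pattern is much like locking in multithreaded code, except that a lock lets only one thread run at a time, whereas a Semaphore lets you choose how many coroutines may run at once.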

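How to limit how many coroutines run per minute
It is quite simple: add an asyncio.sleep inside the concurrent coroutine, while it still holds the semaphore. For example, to allow only 3 coroutines per minute (with Semaphore(3)), the code looks like this: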

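import asyncio
import httpx

async def req(delay, sem):
    print(f'requesting an endpoint with a {delay}s delay')
    async with sem:
        async with httpx.AsyncClient(timeout=20) as client:
            resp = await client.get(f'http://127.0.0.1:8000/sleep/{delay}')
            result = resp.json()
            print(result)
        # Sleep while still holding the semaphore, so each slot stays taken for 60 s;
        # sleeping after the async with block would not limit anything
        await asyncio.sleep(60)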
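The same idea can be tried without a local HTTP server. In this scaled-down sketch (the limit of 2, the 0.2 s window standing in for 60 s, and the `limited_job` name are all assumptions), each coroutine holds its semaphore slot for the whole window, so the third and fourth jobs can only start once the first window has elapsed.

```python
import asyncio
import time

async def limited_job(i, sem, window, starts):
    async with sem:
        starts.append((i, time.perf_counter()))
        await asyncio.sleep(0.01)    # the real work, e.g. an HTTP request
        await asyncio.sleep(window)  # keep the slot occupied until the window passes

async def main(limit=2, window=0.2, n=4):
    sem = asyncio.Semaphore(limit)  # created inside the running loop
    starts = []
    await asyncio.gather(*(limited_job(i, sem, window, starts) for i in range(n)))
    return starts

starts = asyncio.run(main())
t0 = starts[0][1]
print([round(t - t0, 2) for _, t in starts])
```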

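Your program may have several parts, some limited to a concurrency of a and others to b. In that case, create several Semaphore objects and pass each one to the corresponding coroutines.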

RuntimeError when limiting concurrency on Python 3.7+

Error details:
Reference: https://www.jb51.cc/python/3859535.html

Task exception was never retrieved
future: <Task finished name='Task-12' coro=<visit() done, defined at /Users/young_shi/PycharmProjects/test/test_fd/test4.py:65> exception=RuntimeError("Task <Task pending name='Task-12' coro=<visit() running at /Users/young_shi/PycharmProjects/test/test_fd/test4.py:66> cb=[_wait.<locals>._on_completion() at /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/tasks.py:515]> got Future <Future pending> attached to a different loop")>

RuntimeError: Task <Task pending name='Task-12' coro=<visit() running at /Users/young_shi/PycharmProjects/test/test_fd/test4.py:66> cb=[_wait.<locals>._on_completion() at /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/tasks.py:515]> got Future <Future pending> attached to a different loop

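This happens because the Semaphore constructor in asyncio/locks.py sets the _loop attribute.
On Python 3.6 everything runs as coroutines in the current loop obtained with loop=asyncio.get_event_loop(), so this is not a problem; if you do set up a new loop, follow the steps below to avoid the RuntimeError.
On 3.7 and later, however, asyncio.run() creates a brand-new loop (in asyncio/runners.py).
A Semaphore created outside asyncio.run() grabs the asyncio "default" loop, so it cannot be used with the loop that asyncio.run() creates. (From Python 3.10 on, asyncio synchronization primitives bind to the running loop on first use instead, so this error no longer occurs there.)
Solution:
Create the Semaphore inside the code that asyncio.run() calls, and pass it to where it is needed.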

import asyncio

async def work(sem):
    async with sem:
        print('working')
        await asyncio.sleep(1)

async def main():
    sem = asyncio.Semaphore(2)  # the fix: create it here, inside the running loop
    await asyncio.gather(work(sem),work(sem),work(sem))

asyncio.run(main())

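With several asyncio loops, older versions let you specify which loop the Semaphore uses, e.g. asyncio.Semaphore(2, loop=xxx_loop). The loop parameter was deprecated in Python 3.8 and removed in 3.10; there, create the Semaphore inside a coroutine running on the loop that will use it.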

set_event_loop()

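Unlike get_event_loop(), this method sets the given loop as the current event loop for the current OS thread. It is usually used together with new_event_loop().
In the code below, obj is the original event loop and new_loop is a newly created loop that is set as the current one. Each loop runs its own coroutines independently; they are two separate loops in the same thread (not two threads), run one after the other.
print(obj is new_loop) prints False, which shows that they are two different loop objects.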


import asyncio
import random

async def rnd(sleep_time):
    await asyncio.sleep(sleep_time)
    ret=random.randint(1,6)
    print("result of rnd is",ret,type(ret))
    return ret


# The original loop (asyncio.get_event_loop() on older versions; it is deprecated
# when no loop is running, so it is created explicitly here)
obj=asyncio.new_event_loop()
new_loop = asyncio.new_event_loop()
asyncio.set_event_loop(new_loop)

print(obj is new_loop)
f1 = new_loop.create_task(rnd(2))
f2 = obj.create_task(rnd(3))

new_loop.run_until_complete(asyncio.wait([f1]))
obj.run_until_complete(asyncio.wait([f2]))
print("over")
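Where set_event_loop() really matters is when a loop runs in another OS thread, because the "current loop" is tracked per thread. This sketch is an illustration, not taken from the original: the worker thread (named "loop-thread" here) creates and sets its own loop, while the main thread runs a separate one through asyncio.run(); `where` is a made-up helper that reports which thread executed it.

```python
import asyncio
import threading

async def where(name):
    await asyncio.sleep(0.05)
    return f"{name} ran in {threading.current_thread().name}"

def thread_main(results):
    loop = asyncio.new_event_loop()  # a separate loop for this thread
    asyncio.set_event_loop(loop)     # current loop for THIS thread only
    try:
        results.append(loop.run_until_complete(where("worker")))
    finally:
        loop.close()

results = []
worker = threading.Thread(target=thread_main, args=(results,), name="loop-thread")
worker.start()
main_result = asyncio.run(where("main"))  # the main thread's own loop
worker.join()
print(main_result, "|", results[0])
```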

Exercise: crawling article titles with asyncio + aiomysql and storing them in a database

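What each function in the code does, roughly:
async def request(url, client): sends the HTTP request
def filter_link(link): filters URLs by a rule
def extract_urls(HTML_response): extracts the URLs in the response body
async def extract_title(pool, HTML_response): extracts the article titles from the response body and stores them in the database
async def consumer(pool): simulates a queue: visits the URLs in waiting_url and then records them in visited_url
async def init_urls(start_url): visits start_url, collects its a tags, joins them into full URLs and puts them into the waiting_url list
async def main(loop): the scheduler; runs the two coroutines init_urls and consumer
Source code: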

import asyncio
import aiomysql
import httpx
from lxml import etree
import re
from urllib.parse import urljoin


async def request(url, client):
    # Sends the request; some pages raise errors, so catch them with try
    # client is an httpx.AsyncClient() managed by an outer async with, so it is not closed here
    try:
        r = await client.get(url=url)
        if r.status_code == 200:
            return r
    except Exception as e:
        print("error visiting url >>%s" % url)
    return False


def filter_link(link):
    # Filter rule: keep links such as /test/aa.html
    ret = re.findall("^/.*/.*", link)
    if ret:
        return ret


def extract_urls(HTML_response):
    # Extract the href of every a tag, join it with start_url, and add it to the URLs waiting to be visited
    tree = etree.HTML(HTML_response.text)
    a_links = tree.xpath(".//a/@href")
    if not a_links:
        return False
    # Join the matching relative routes with the base URL
    urls = [urljoin(start_url, i) for i in filter(filter_link, a_links)]
    if urls:
        waiting_url.extend(urls)
        return urls
    return False


async def extract_title(pool, HTML_response):
    # Extract the article titles from the response body and insert them into the database with aiomysql
    tree = etree.HTML(HTML_response.text)
    a_title = tree.xpath(".//h3[@class='info-title shuang']//text()")
    if len(a_title) > 0:
        print("titles found: %s, url: %s" % (a_title, HTML_response.url))
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                # executemany expects one parameter tuple per row
                insert_content = [(v,) for v in a_title]
                sql = 'insert into aiotest (title) values (%s);'
                await cur.executemany(sql, insert_content)
        return a_title
    else:
        return False


async def consumer(pool):
    # waiting_url simulates a queue: take URLs from it and visit them; if it is empty, wait 3 s for new ones
    # Visited URLs are added to visited_url, so no URL is requested twice
    async with httpx.AsyncClient(headers=headers) as client2:
        while STARTING:
            if len(waiting_url) == 0:
                print("queue is empty, waiting 3 s")
                # Must be awaited: a bare asyncio.sleep(3) never yields, so this loop would spin forever
                await asyncio.sleep(3)
                continue
            prepare_visit_url = waiting_url.pop()
            if prepare_visit_url not in visited_url:
                print("about to visit: %s" % prepare_visit_url)
                visited_url.add(prepare_visit_url)
                url_response = await request(prepare_visit_url, client2)
                if url_response and len(url_response.text) > 10:
                    extract_urls(url_response)
                    response = await extract_title(pool, url_response)
                    if response:
                        print("%s yielded %s titles" % (url_response, len(response)))
                    else:
                        print("url: %s  no title found" % prepare_visit_url)
                # await asyncio.sleep(2)


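async def init_urls(start_url):
    # At first there is nothing to request, so start from start_url and collect the links on that page
    async with httpx.AsyncClient() as client:
        print("crawling the start url")
        response = await request(url=start_url, client=client)
        if response:  # request() returns False when the page could not be fetched
            extract_urls(response)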


async def main(loop):
    # The scheduler. Don't forget the await here (learned the hard way)
    pool = await aiomysql.create_pool(host='127.0.0.1', port=3306, user='root', password='XXXXX', db='test', loop=loop,
                                      autocommit=True)
    await asyncio.ensure_future(init_urls(start_url))
    await asyncio.ensure_future(consumer(pool))

# Flag that keeps the consumer running
STARTING = True
# Start URL
start_url = "http://m.jobbole.com/"
# URLs already visited
visited_url = set()
# URLs waiting to be visited
waiting_url = []
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
obj = asyncio.new_event_loop()
asyncio.set_event_loop(obj)
obj.run_until_complete(main(obj))


Posted @ 2021-10-23 13:16 by 零哭谷
