Python Web Scraping - Asynchronous Multitasking

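An asynchronous crawler that downloads images in batches. The download links in the example are no longer valid, so do not run the code as-is.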
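Because the links are dead, the task switching is easier to observe with a stand-in that needs no network. The sketch below is only an illustration under stated assumptions, not the download code itself: `fake_download`, `run_all`, and `timed_run` are hypothetical names, `asyncio.sleep` takes the place of a real request, and the `example.com` URLs are placeholders. Three simulated downloads of 0.2 s each should finish in roughly 0.2 s in total rather than 0.6 s, because each `await` hands control back to the event loop.

```python
import asyncio
import time

async def fake_download(url, delay=0.2):
    # Hypothetical stand-in for a real request: sleeping yields to the event loop
    await asyncio.sleep(delay)
    # Same naming rule as the download code: last path segment of the URL
    return url.split('/')[-1]

async def run_all(urls):
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_download(u) for u in urls))

def timed_run(urls):
    # Measure the wall-clock time of the whole batch
    start = time.perf_counter()
    names = asyncio.run(run_all(urls))
    return names, time.perf_counter() - start

if __name__ == "__main__":
    demo_urls = ['https://example.com/a.jpg',
                 'https://example.com/b.jpg',
                 'https://example.com/c.jpg']
    names, elapsed = timed_run(demo_urls)
    print(names, f"{elapsed:.2f}s")
```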

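# Asynchronous batch download
import asyncio
import os

import aiohttp

async def job(session, url):
    # Coroutine that downloads one URL and saves it under tmp/
    # The file name is the last path segment of the URL
    name = url.split('/')[-1]
    # Each await hands control back to the event loop until the data arrives
    async with session.get(url) as img:
        # Read the response body as bytes
        imgcode = await img.read()
    with open(os.path.join("tmp", name), 'wb') as f:
        f.write(imgcode)
    return url

async def main(urls):
    # Make sure the output directory exists before any task writes to it
    os.makedirs("tmp", exist_ok=True)
    # Create one session and share it across all requests
    async with aiohttp.ClientSession() as session:
        # Schedule one task per URL
        tasks = [asyncio.create_task(job(session, url)) for url in urls]
        # Wait until every task has completed
        finished, unfinished = await asyncio.wait(tasks)
        # Collect the results; finished is a set, so the order is not guaranteed
        all_results = [r.result() for r in finished]
        print("ALL RESULT:" + str(all_results))

urls = ['https://pythondict.com/wp-content/uploads/2019/07/2019073115192114.jpg',
        'https://pythondict.com/wp-content/uploads/2019/07/2019080216113098.jpg']
# asyncio.run creates the event loop, runs main, and closes the loop afterwards
asyncio.run(main(urls))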
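With a longer URL list, creating one task per URL sends every request at once. A common way to cap that is an `asyncio.Semaphore`. Collecting results with `asyncio.gather(..., return_exceptions=True)` also keeps them in input order and prevents one failed download from hiding the others. This is a minimal sketch under stated assumptions: `fetch` is a hypothetical stand-in for `job` (it sleeps instead of calling aiohttp, and fails for any URL containing `bad`), `bounded_fetch` and `download_all` are illustrative names, and the limit of 2 is an arbitrary value.

```python
import asyncio

async def fetch(url):
    # Hypothetical stand-in for job(): no network, just a short pause
    await asyncio.sleep(0.05)
    if 'bad' in url:
        raise ValueError("download failed: " + url)
    return url

async def bounded_fetch(sem, url, stats):
    async with sem:
        # At most `limit` coroutines are inside this block at the same time
        stats['active'] += 1
        stats['peak'] = max(stats['peak'], stats['active'])
        try:
            return await fetch(url)
        finally:
            stats['active'] -= 1

async def download_all(urls, limit=2):
    sem = asyncio.Semaphore(limit)
    stats = {'active': 0, 'peak': 0}
    # return_exceptions=True puts a raised exception in the result list
    # at the position of the URL that caused it
    results = await asyncio.gather(
        *(bounded_fetch(sem, u, stats) for u in urls),
        return_exceptions=True)
    return results, stats['peak']

if __name__ == "__main__":
    demo_urls = ['https://example.com/1.jpg',
                 'https://example.com/bad.jpg',
                 'https://example.com/3.jpg']
    results, peak = asyncio.run(download_all(demo_urls))
    for url, res in zip(demo_urls, results):
        print(url, '->', 'FAILED' if isinstance(res, Exception) else 'ok')
    print('peak concurrency:', peak)
```

In the real crawler the same pattern would wrap the call to `job(session, url)`; a suitable limit depends on the target site and is not something the original post specifies.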

  


Posted on 2020-12-08 16:01 by iUpoint
