13、爬虫之协程
1 协程
首先我们需要知道的是requests
是同步的方法。而我们若想使用协程,写的方法都尽量不是使用同步的方法。
因些我们,选择使用一个新的模块库`aiohttp
官网 https://docs.aiohttp.org/en/stable/
1.1 安装
pip install aiohttp
1.2 快速开始
import aiohttp
import asyncio
async def main():
async with aiohttp.ClientSession() as session:
async with session.get('http://python.org') as response:
print("Status:", response.status)
print("Content-type:", response.headers['content-type'])
html = await response.text()
print("Body:", html[:15], "...")
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
1.3 clientSession的使用
header设置
headers={"Authorization": "Basic bG9naW46cGFzcw=="}
async with aiohttp.ClientSession(headers=headers) as session:
async with session.get("http://httpbin.org/headers") as r:
json_body = await r.json()
assert json_body['headers']['Authorization'] == \
'Basic bG9naW46cGFzcw=='
Cookie的使用
url = 'http://httpbin.org/cookies'
cookies = {'cookies_are': 'working'}
async with ClientSession(cookies=cookies) as session:
async with session.get(url) as resp:
assert await resp.json() == {
"cookies": {"cookies_are": "working"}}
Proxy的使用
async with aiohttp.ClientSession() as session:
async with session.get("http://python.org",
proxy="http://proxy.com") as resp:
print(resp.status)
1.4 Response的使用
- 属性
- status 状态码
- url url地址
- cookies cookie内容
- headers 响应头
- 方法
- read() 读bytes类型
- text() 读文本内容