The requests module
The essence of a browser request: request headers plus a request body.
Request headers:
On the first request, tell the server which URL you want and which browser you are (the User-Agent header).
BeautifulSoup parses the fetched HTML string into an HTML (soup) object.
find returns the first successful match as an object.
find_all returns every match (descendants at any depth are matched) as a list.
find(name='', attrs={'': ''})  -- the recommended form
find(name='', id='', class_='')  -- name may be omitted; note it is class_ with a trailing underscore, because class is a Python keyword
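The find/find_all behavior above can be checked on a small inline document. A minimal sketch, assuming the third-party bs4 package is installed; the HTML snippet is made up for illustration:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

html = """
<div id="content">
    <a class="item" href="/a">first</a>
    <span><a class="item" href="/b">nested</a></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find returns the first match as a Tag object
first = soup.find(name='a', attrs={'class': 'item'})
print(first.text)   # first

# find_all matches descendants at any depth and returns a list
links = soup.find_all('a')
print(len(links))   # 2  (the nested <a> is matched too)

# class is a Python keyword, so the keyword-argument form is class_
same = soup.find('a', class_='item')
print(same.text)    # first
```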
Login flow:
1. Fetch the home page
2. Submit the username and password in a POST request
3. Read the cookies: r2.cookies.get_dict()
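Step 3 reads cookies from the response with get_dict(). The cookie-jar API can be exercised locally without a server; the login URL and form field names in the sketch below are hypothetical placeholders, and the function is defined but not called:

```python
import requests

# A cookie jar like the one attached to every Response (r2.cookies)
jar = requests.cookies.RequestsCookieJar()
jar.set('session_id', 'abc123', domain='example.com', path='/')
print(jar.get_dict())   # {'session_id': 'abc123'}

def login(base_url, user, pwd):
    """Hypothetical sketch of the 3-step login flow above (not executed here)."""
    r1 = requests.get(base_url)                      # 1. fetch the home page
    r2 = requests.post(base_url + '/login',          # 2. POST the credentials
                       data={'user': user, 'pwd': pwd},
                       cookies=r1.cookies)           # carry the home-page cookie
    return r2.cookies.get_dict()                     # 3. read the cookies
```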
Preserve log: enable "Preserve log" in the browser's network panel so requests are kept across page navigations.
Some sites require the cookie issued on the home page to be authorized during login;
other sites simply use the cookie returned by the login response.
Referer: carries the address of the previous request; servers use it for anti-hotlinking.
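Headers like User-Agent and Referer can be inspected without any network traffic by preparing a request; a minimal sketch, with placeholder URLs:

```python
import requests

req = requests.Request(
    'GET', 'http://example.com/img/logo.png',
    headers={
        'User-Agent': 'Mozilla/5.0',                 # which browser we claim to be
        'Referer': 'http://example.com/index.html',  # the previous request's address
    },
)
p = req.prepare()  # builds the request without sending it
print(p.headers['Referer'])   # http://example.com/index.html
print(p.headers['User-Agent'])  # Mozilla/5.0
```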
A GET request has only request headers, no request body.
Everything GET supports, POST supports as well.
A Session automatically carries cookies (and keeps request headers) across requests.
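That persistence can be seen on a Session object directly, without hitting a server; a minimal sketch (the cookie is set by hand here, where normally a Set-Cookie response header would populate the jar):

```python
import requests

s = requests.Session()

# Cookies stored on the session are sent with every later request through it
s.cookies.set('token', 'xyz789')
print(s.cookies.get('token'))   # xyz789

# Headers set on the session are likewise sent with every request
s.headers.update({'User-Agent': 'Mozilla/5.0'})
print(s.headers['User-Agent'])  # Mozilla/5.0
```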
import requests

# 1. Methods
#    requests.get / requests.post / requests.put / requests.delete ...
#    requests.request(method='POST')

# 2. Parameters
# 2.1 url
# 2.2 headers
# 2.3 cookies
# 2.4 params (query string)

# 2.5 data -- form-encoded request body
requests.post(
    ...,
    data={'user': 'alex', 'pwd': '123'}
)
# POST /index HTTP/1.1\r\nHost: c1.com\r\n\r\nuser=alex&pwd=123

# 2.6 json -- JSON request body
requests.post(
    ...,
    json={'user': 'alex', 'pwd': '123'}
)
# POST /index HTTP/1.1\r\nHost: c1.com\r\nContent-Type: application/json\r\n\r\n{"user": "alex", "pwd": "123"}

# 2.7 proxies
# Without authentication:
proxy_dict = {
    "http": "61.172.249.96:80",
    "https": "http://61.185.219.126:3128",
}
ret = requests.get("https://www.proxy360.cn/Proxy", proxies=proxy_dict)

# With proxy authentication:
from requests.auth import HTTPProxyAuth
proxy_dict = {
    'http': '77.75.105.165',
    'https': '77.75.106.165',
}
auth = HTTPProxyAuth('username', 'password')
r = requests.get("http://www.google.com",
                 data={'xxx': 'ffff'},
                 proxies=proxy_dict,
                 auth=auth)
print(r.text)

# 2.8 files -- file upload
file_dict = {
    'f1': open('xxxx.log', 'rb')
}
requests.request(
    method='POST',
    url='http://127.0.0.1:8000/test/',
    files=file_dict
)

# 2.9 auth -- basic authentication
# Internally: the username and password are joined and base64-encoded,
# then sent in a request header:
#   "user:password" -> base64("user:password")
#   Authorization: "Basic base64('user:password')"
from requests.auth import HTTPBasicAuth, HTTPDigestAuth
ret = requests.get('https://api.github.com/user',
                   auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
print(ret.text)

# 2.10 timeout
# ret = requests.get('http://google.com/', timeout=1)
# ret = requests.get('http://google.com/', timeout=(5, 1))  # (connect, read)

# 2.11 allow_redirects
ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
print(ret.text)

# 2.12 stream -- large file download
from contextlib import closing
with closing(requests.get('http://httpbin.org/get', stream=True)) as r1:
    # Process the response here, chunk by chunk.
    for i in r1.iter_content():
        print(i)
# stream=True downloads the body piece by piece;
# stream=False downloads the whole body to disk at once.

# 2.13 cert -- client certificates
# Sites like Baidu or Tencent: no client certificate needed (the system handles it).
# Custom certificate:
requests.get('http://127.0.0.1:8000/test/', cert="xxxx/xxx/xxx.pem")
requests.get('http://127.0.0.1:8000/test/', cert=("xxxx/xxx/xxx.pem", "xxx.xxx.xx.key"))

# 2.14 verify -- server certificate verification
requests.get('https://127.0.0.1:8000/test/', verify=False)
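The difference between params (2.4), data (2.5), and json (2.6) can be inspected offline by preparing requests instead of sending them; a minimal sketch, with a placeholder URL:

```python
import requests

# data= produces a form-encoded body; params= goes into the query string
p1 = requests.Request('POST', 'http://example.com/index',
                      params={'page': '1'},
                      data={'user': 'alex', 'pwd': '123'}).prepare()
print(p1.url)                      # http://example.com/index?page=1
print(p1.body)                     # user=alex&pwd=123
print(p1.headers['Content-Type'])  # application/x-www-form-urlencoded

# json= serializes the body as JSON and sets the Content-Type header
p2 = requests.Request('POST', 'http://example.com/index',
                      json={'user': 'alex', 'pwd': '123'}).prepare()
print(p2.headers['Content-Type'])  # application/json
print(p2.body)
```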
Supplement

1. GET requests

# 1. Without parameters
import requests

ret = requests.get('https://github.com/timeline.json')
print(ret.url)
print(ret.text)

# 2. With parameters
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
print(ret.url)
print(ret.text)

2. POST requests

# 1. Basic POST
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
print(ret.text)

# 2. Sending request headers and a body
import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

ret = requests.post(url, data=json.dumps(payload), headers=headers)
print(ret.text)
print(ret.cookies)

3. Other request methods

requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)

# All of the above are built on top of:
requests.request(method, url, **kwargs)