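The requests library

1. Introduction to requests

2. GET requests

3. POST requests

4. Other request methods

5. Advanced usage

5.1 Getting the response as JSON

5.2 Getting the raw socket response

5.3 Setting request headers

5.4 Uploading files

5.5 Status codes

5.6 Reading response headers

5.7 Getting and sending cookies

5.8 Request timeouts

5.9 Redirects and response history

5.10 Session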

 

 

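1. Introduction to requests

Python offers several libraries for making HTTP requests, such as urllib and requests.

requests vs. urllib:

  • In Python 2, urllib and urllib2 were two separate standard-library modules; Python 3 merged them into the single urllib package (urllib.request, urllib.parse, urllib.error). requests is a third-party library built on top of urllib3, which is itself a third-party package and not part of the standard library.
  • The requests slogan is "HTTP for Humans". Compared with the more verbose urllib API, requests is notably concise and easy to read.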
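To make the "verbose vs. concise" comparison concrete, here is a minimal sketch of building a form POST with the standard library alone, without sending it. The endpoint and payload are illustrative assumptions, and the requests equivalent appears only as a comment.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://httpbin.org/post"   # illustrative endpoint
payload = {"key": "value"}        # illustrative form data

# With urllib, the body has to be URL-encoded and converted to bytes by hand,
# and the Content-Type header is set explicitly.
body = urlencode(payload).encode("ascii")
req = Request(
    url,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(req.get_method())  # POST
print(req.data)          # b'key=value'

# Sending would be urllib.request.urlopen(req). With requests, the whole
# sequence collapses to: requests.post(url, data=payload)
```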

 

2. GET requests

>>> import requests

# Fetch a page with the get method
>>> resp = requests.get("http://www.baidu.com")

# The call returns a Response object
>>> resp
<Response [200]>

# Status code
>>> resp.status_code
200

# Final URL of the request
>>> resp.url
'http://www.baidu.com/'

# resp.text is resp.content decoded to a string using resp.encoding
>>> print(resp.text[:100])
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charse

# Encoding used to decode the response (guessed from the response headers)
>>> resp.encoding
'ISO-8859-1'

# Response body as raw bytes
>>> print(resp.content[:100])
b'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charse'
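The 'ISO-8859-1' value above deserves a closer look: resp.text is simply resp.content decoded with resp.encoding, so a wrong guess garbles any non-ASCII text. The sketch below reproduces the effect on plain bytes. The sample string is made up for illustration, and no request is sent.

```python
# Bytes as a server might send them: UTF-8 encoded HTML containing Chinese text.
content = "<title>百度一下</title>".encode("utf-8")

# Decoding with the wrong codec does not fail, but it produces mojibake.
wrong = content.decode("ISO-8859-1")
# Decoding with the codec the page really uses recovers the text.
right = content.decode("utf-8")

print(wrong)
print(right)  # <title>百度一下</title>
```

With a real response, assigning resp.encoding = "utf-8" (or resp.encoding = resp.apparent_encoding) before reading resp.text should have the same effect.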

GET with query parameters:

# Option 1: pass the parameters as a dict
>>> payload = {"key1": "value1", "key2": "value2"}
>>> resp = requests.get("http://httpbin.org/get", params=payload)
>>> print(resp.text)
{
  "args": {
    "key1": "value1",
    "key2": "value2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb685f6-23b6c5e864d8dc4e41e8de27"
  },
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/get?key1=value1&key2=value2"
}


# Option 2: use a list as a value to repeat a key
>>> payload = {"key1": "value1", "key2": ["value2", "value3"]}
>>> resp = requests.get("http://httpbin.org/get", params=payload)
>>> resp.url
'http://httpbin.org/get?key1=value1&key2=value2&key2=value3'
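The URL that requests would build can also be inspected without touching the network, by preparing the request instead of sending it. This is a small sketch under that assumption; the extra key3 entry is added here only to show that keys whose value is None are left out of the query string.

```python
import requests

# key3 is a hypothetical extra parameter: None values are dropped from the URL.
payload = {"key1": "value1", "key2": ["value2", "value3"], "key3": None}

# Request(...).prepare() encodes the parameters but sends nothing.
prepared = requests.Request("GET", "http://httpbin.org/get", params=payload).prepare()

print(prepared.url)
# http://httpbin.org/get?key1=value1&key2=value2&key2=value3
```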

 

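3. POST requests

There are two common ways to send data with a POST request:

  1. Form submission: pass a dict or a sequence of (key, value) tuples through the data parameter
  2. Non-form submission: send the data as JSON

Example 1: two ways to submit a form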

# Option 1: pass a dict
>>> resp = requests.post("http://httpbin.org/post", data={"key": "value"})
>>> print(resp.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key": "value"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "9",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb67ae5-6c15961202281a1d70522539"
  },
  "json": null,
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/post"
}


# Option 2: pass a tuple of (key, value) pairs, which allows repeated keys
>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> resp = requests.post("http://httpbin.org/post", data=payload)
>>> print(resp.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb67b74-716bca001516d46950d0d762"
  },
  "json": null,
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/post"
}

Example 2: non-form submission

>>> import json
>>> import requests

# Option 1: serialize the payload yourself with json.dumps
# (requests sends the string as-is and does not add a Content-Type header)
>>> url = 'http://httpbin.org/post'
>>> payload = {'some': 'data'}
>>> resp = requests.post(url, data=json.dumps(payload))
>>> print(resp.text)
{
  "args": {},
  "data": "{\"some\": \"data\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "16",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb67c87-78a1dd216e987f0226d5b97a"
  },
  "json": {
    "some": "data"
  },
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/post"
}


# Option 2: use the built-in json parameter
# (requests serializes the payload and sets Content-Type: application/json)
>>> resp = requests.post(url, json=payload)
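The practical difference between data= and json= shows up in the headers requests generates. The sketch below prepares three POST requests without sending them and compares Content-Type and Content-Length. The describe helper is a hypothetical name introduced only for this illustration.

```python
import json

import requests

url = "http://httpbin.org/post"
payload = {"some": "data"}


def describe(**kwargs):
    """Prepare a POST without sending it; return (Content-Type, Content-Length)."""
    prepared = requests.Request("POST", url, **kwargs).prepare()
    return prepared.headers.get("Content-Type"), prepared.headers.get("Content-Length")


form = describe(data=payload)              # dict: form-encoded as some=data
raw = describe(data=json.dumps(payload))   # string: sent as-is, no Content-Type
as_json = describe(json=payload)           # json=: serialized by requests

print(form)     # ('application/x-www-form-urlencoded', '9')
print(raw)      # (None, '16')
print(as_json)  # ('application/json', '16')
```

If the server insists on a JSON Content-Type, json= is the safer choice; with data=json.dumps(...) the header would have to be added by hand.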

 

4. Other request methods

# put: replaces the content of the target resource with the data sent by the client
>>> r = requests.put('http://httpbin.org/put', data={'key':'value'})
>>> print("put:", r.text)
put: {
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key": "value"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "9",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb6808c-7842d5b450d1777139efab8e"
  },
  "json": null,
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/put"
}

# delete: asks the server to delete the specified resource
>>> r = requests.delete('http://httpbin.org/delete')
>>> print("delete:", r.text)
delete: {
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "0",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb6808e-042c04c61a6257820e4ff404"
  },
  "json": null,
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/delete"
}

# head: like GET, but the response has no body; used to fetch the headers only
>>> r = requests.head('http://httpbin.org/get')
>>> print("head:", r.text)
head:
>>> print(r.headers)
{'Date': 'Thu, 19 Nov 2020 14:26:23 GMT', 'Content-Type': 'application/json', 'Content-Length': '306', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

# options: asks which communication options (for example, allowed methods) the server supports for the resource
>>> r = requests.options('http://httpbin.org/get')
>>> print("options:", r.text)
options:
>>> print(r.headers)
{'Date': 'Thu, 19 Nov 2020 14:26:24 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '0', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Allow': 'OPTIONS, HEAD, GET', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, PATCH, OPTIONS', 'Access-Control-Max-Age': '3600'}

 

5. Advanced usage

5.1 Getting the response as JSON

import requests

r = requests.get('https://api.github.com/events')
print(r.json())  # parses the JSON body into Python objects; here, a list of dicts
print(type(r.json()))  # <class 'list'>
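What r.json() returns is ordinary Python data, so the usual list and dict operations apply. The sketch below uses a hand-written body shaped like the events feed (the field values are invented for illustration) and parses it with the standard json module, which is roughly what r.json() does with the response body.

```python
import json

# Hand-written sample shaped like the events feed; the values are invented.
body = '[{"id": "1", "type": "PushEvent"}, {"id": "2", "type": "WatchEvent"}]'

events = json.loads(body)                    # a list of dicts, like r.json() above
types = [event["type"] for event in events]  # pick one field from every element

print(type(events).__name__)  # list
print(types)                  # ['PushEvent', 'WatchEvent']
```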

 

5.2 Getting the raw socket response

>>> resp = requests.get("https://api.github.com/events", stream=True)
>>> print(type(resp.raw))
<class 'urllib3.response.HTTPResponse'>
>>> print(resp.raw)
<urllib3.response.HTTPResponse object at 0x000001E3F2A0C2B0>
>>> print(resp.raw.read())  # read the raw, undecoded byte stream
b'\x1f\x8b\x08\x00\x00\x00\x00\ ......
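The leading bytes \x1f\x8b\x08 in that output are the gzip signature: resp.raw exposes the body exactly as it arrived, before any content decoding. The sketch below reproduces those bytes with the standard gzip module; the payload is an illustrative stand-in, not real API output.

```python
import gzip

body = b'[{"type": "PushEvent"}]'      # illustrative payload
compressed = gzip.compress(body)       # what a gzip-encoding server would send

print(compressed[:3])                  # b'\x1f\x8b\x08'
print(gzip.decompress(compressed))     # b'[{"type": "PushEvent"}]'
```

In practice resp.content and resp.iter_content() handle the decompression; resp.raw.read(decode_content=True) asks urllib3 to do the same on the raw stream.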

Saving the stream to a file (the numbers echoed by the interpreter are the return values of f.write, i.e. the bytes written for each chunk):

>>> resp = requests.get("https://api.github.com/events", stream=True)
>>> with open("e:\\file.txt", "wb") as f:
...     for chunk in resp.iter_content(1000):
...         f.write(chunk)
...
2748
2853
4761
4835
4691
4066
5545
7525
4489
2732
3259
2115
>>> with open("e:\\file.txt", encoding="utf-8") as f:
...     print(f.read(50))
...
[{"id":"14250730635","type":"PushEvent","actor":{"...]

 

5.3 Setting request headers

>>> url = "http://api.github.com/some/endpoint"
>>> headers = {"user-agent": "my-app/0.0.1"}  # identifies the client application and its version
>>> r = requests.get(url, headers=headers)
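To check which User-Agent would actually go out, the request can be prepared through a Session instead of being sent. This is a minimal sketch under that assumption; it relies on header names being matched case-insensitively, so the lowercase "user-agent" key replaces the library default.

```python
import requests
from requests.utils import default_user_agent

url = "http://api.github.com/some/endpoint"
headers = {"user-agent": "my-app/0.0.1"}

session = requests.Session()
# prepare_request merges the session defaults with the per-request headers
# and returns the result without sending anything.
prepared = session.prepare_request(requests.Request("GET", url, headers=headers))

print(default_user_agent())            # python-requests/<installed version>
print(prepared.headers["User-Agent"])  # my-app/0.0.1
```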

 

5.4 Uploading files

Option 1:

import requests

url = 'http://httpbin.org/post'

# Open in binary mode; the with block closes the file after the upload
with open('e:\\test.xlsx', 'rb') as f:
    r = requests.post(url, files={'file': f})
print(r.text)

Option 2: explicitly set the file name, content type, and extra headers

import requests

url = 'http://httpbin.org/post'

with open('e:\\test.xlsx', 'rb') as f:
    # (file name, file object, content type, extra per-part headers)
    files = {'file': ('report.xls', f, 'application/vnd.ms-excel', {'Expires': '0'})}
    r = requests.post(url, files=files)
print(r.text)

Open files in binary mode. requests may try to provide the Content-Length header for you, and when it does, the value is set to the number of bytes in the file. Opening the file in text mode may therefore lead to errors.
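The multipart body and its byte-based Content-Length can be inspected offline by preparing the upload instead of sending it. In this sketch an in-memory io.BytesIO object stands in for a file opened with 'rb'; the file name and contents are made up for illustration.

```python
import io

import requests

fake_file = io.BytesIO(b"hello")  # stands in for open(path, "rb")
files = {"file": ("report.txt", fake_file, "text/plain")}

# Build the multipart/form-data request without sending it.
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
# Content-Length counts the bytes of the encoded body.
print(int(prepared.headers["Content-Length"]) == len(prepared.body))  # True
```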

 

5.5 Status codes

import requests

r = requests.get('http://httpbin.org/get')
print(r.status_code)  # 200
print(r.status_code == requests.codes.ok)  # compare against a named code: True

# raise_for_status() raises requests.exceptions.HTTPError for 4xx/5xx responses
print(r.raise_for_status())  # None

r = requests.get('https://www.cnblogs.com/dinex.indd')
print(r.raise_for_status())  # raises: ...404 Client Error: Not Found...
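In real code raise_for_status() is usually wrapped in try/except rather than printed. The sketch below shows that pattern on hand-built Response objects, so it runs without network access. Constructing a Response by hand is done here purely for illustration, and check is a hypothetical helper name.

```python
import requests


def check(status_code):
    """Return None for non-error codes, or the HTTPError message for 4xx/5xx."""
    resp = requests.Response()   # built by hand purely for illustration
    resp.status_code = status_code
    try:
        resp.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return str(exc)
    return None


print(check(200))  # None
print(check(301))  # None: redirects are not treated as errors
print(check(404))  # 404 Client Error: ...
print(check(503))  # 503 Server Error: ...
```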

 

5.6 Reading response headers

import requests

r = requests.get('https://api.github.com/events')
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers.get('content-type'))  # header names are matched case-insensitively
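Both spellings work because r.headers is a case-insensitive mapping. The sketch below builds one directly from requests.structures to show the behaviour without making a request; the header value is a placeholder.

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({"Content-Type": "application/json"})

print(headers["content-type"])      # application/json
print(headers.get("CONTENT-TYPE"))  # application/json
print("content-type" in headers)    # True
print(list(headers))                # ['Content-Type'] (original casing is kept)
```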

 

5.7 Getting and sending cookies

Getting cookies:

import requests

url = 'https://www.baidu.com'
r = requests.get(url)
print(r.cookies)  # a dict-like RequestsCookieJar: <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
for k, v in r.cookies.items():
    print(k, v)  # BDORZ 27315

Sending cookies:

import requests

url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')

r = requests.get(url, cookies=cookies)
print(r.text)  # {"cookies":{"cookies_are":"working"}}

Using a cookie jar to set cookies for specific domains and paths:

import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

url = 'http://httpbin.org/cookies'
r = requests.get(url, cookies=jar)
print(r.text)  # {"cookies":{"tasty_cookie":"yum"}}  (only the cookie whose path matches is sent)
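The path matching can be checked offline by preparing requests for different paths and reading the Cookie header that would be sent. This is a sketch under that assumption; cookie_header is a hypothetical helper name.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")


def cookie_header(path):
    """Return the Cookie header requests would send for the given path, if any."""
    req = requests.Request("GET", "http://httpbin.org" + path, cookies=jar)
    return req.prepare().headers.get("Cookie")


print(cookie_header("/cookies"))    # tasty_cookie=yum
print(cookie_header("/elsewhere"))  # gross_cookie=blech
print(cookie_header("/"))           # None: neither cookie path matches
```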

 

5.8 Request timeouts

import requests

# timeout is given in seconds; a value this small raises requests.exceptions.Timeout
requests.get('http://github.com', timeout=0.001)
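A timeout is normally caught rather than left to crash the program, and it can be given as a (connect, read) tuple. The sketch below demonstrates this against a deliberately slow local server so that it does not depend on an external site. SlowHandler, the one-second delay, and the timeout values are all assumptions made for the demonstration.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that answers only after a 1-second delay."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

try:
    # timeout=(connect, read): up to 1 s to connect, 0.2 s to wait for data
    session.get(url, timeout=(1.0, 0.2))
    outcome = "ok"
except requests.exceptions.Timeout as exc:
    outcome = type(exc).__name__
finally:
    server.shutdown()
    server.server_close()

print(outcome)  # ReadTimeout
```

Catching requests.exceptions.Timeout covers both ConnectTimeout and ReadTimeout, which is usually what a caller wants.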

 

5.9 Redirects and response history

import requests

# head does not follow redirects by default, so allow_redirects=True is passed explicitly
r = requests.head('http://github.com', allow_redirects=True)
print(r.url)  # final URL: 'https://github.com/'
print(r.history[0].url)  # URL before the redirect: http://github.com/
print(r.history)  # list of the intermediate Response objects: [<Response [301]>]

Disabling redirects:

import requests

r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code)  # 301
print(r.history)  # []

 

5.10 Session

A Session object lets you persist certain parameters across requests. It also keeps cookies across all requests made from the same Session instance.

import requests

s = requests.Session()

# The first request makes the server set a cookie
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
# The cookie received above is sent automatically with the next request
r = s.get("http://httpbin.org/cookies")

print(r.text)  # {"cookies": {"sessioncookie": "123456789"}}

Setting default headers and authentication on a session:

import requests

s = requests.Session()
s.auth = ('username', 'passwd')
# Add a default header that is sent with every request from this session
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' are sent
r = s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})
print(r.text)

# both 'x-test' and 'x-test3' are sent; 'x-test2' is not, because per-request headers are not persisted
r = s.get('http://httpbin.org/headers', headers={'x-test3': 'true'})
print(r.text)
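The merging of session-level and per-request headers can be verified without the network by preparing requests through the session. This is a sketch under that assumption; sent_headers is a hypothetical helper name.

```python
import requests

s = requests.Session()
s.headers.update({"x-test": "true"})  # session-level default


def sent_headers(extra):
    """Return the headers this session would send, without sending anything."""
    req = requests.Request("GET", "http://httpbin.org/headers", headers=extra)
    return s.prepare_request(req).headers


first = sent_headers({"x-test2": "true"})
second = sent_headers({"x-test3": "true"})
dropped = sent_headers({"x-test": None})  # None removes a session default

print("x-test" in first, "x-test2" in first)     # True True
print("x-test2" in second, "x-test3" in second)  # False True
print("x-test" in dropped)                       # False
```

Passing None for a header in a single call is a convenient way to suppress a session default for just that request.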

 

Posted 2020-11-26 23:06 by Juno3550