The requests module
1. Installation: pip install requests
2. requests.get(url, headers=headers)
Purpose: sends a GET request to the site and returns a response object.
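3. Attributes of the response object res
    1. res.text: the response body as a string (str)
    2. res.content: the response body as bytes
    3. res.status_code: the HTTP status code
    4. res.encoding = 'utf-8': sets the character encoding used to decode res.text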
Example:
import requests

url = 'http://www.baidu.com'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.5221.400 QQBrowser/10.0.1125.400'}

# Send the request and get the response object
res = requests.get(url, headers=headers)
# Tell requests which encoding to use when decoding res.text
res.encoding = 'utf-8'
html = res.text

# Show the character encoding
print(res.encoding)
# res.content is of type bytes
print(type(res.content))
# Show the HTTP status code
print(res.status_code)
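The relationship between res.content, res.encoding and res.text can be sketched without any network access. The snippet below is only an illustration: the sample string and the variable names are assumptions, and plain bytes stand in for res.content. It shows why setting res.encoding matters: the same bytes decode to readable text under UTF-8 and to garbage under ISO-8859-1, which requests may fall back to for text responses that declare no charset.

```python
# Stand-in for res.content: raw bytes as a server might send them (assumed sample data).
content = "百度一下,你就知道".encode("utf-8")

# Roughly what res.text yields when res.encoding = 'utf-8'.
text_utf8 = content.decode("utf-8")

# Roughly what res.text yields when the encoding is wrongly taken to be ISO-8859-1.
text_latin1 = content.decode("iso-8859-1")

print(type(content))             # the raw body is bytes
print(text_utf8)                 # readable text
print(text_latin1 == text_utf8)  # the wrong encoding gives a different string
```

This is why the example sets res.encoding before reading res.text, while res.content is unaffected by the setting.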
Example of scraping unstructured data (downloading one specific image):
import requests

url = 'https://ss1.bdstatic.com/70cFuXSh_Q1YnxGkpoWK1HF6hhy/it/u=2640861078,2257373637&fm=26&gp=0.jpg'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.5221.400 QQBrowser/10.0.1125.400'}

# Request the data
res = requests.get(url, headers=headers)
# Setting the encoding only affects res.text; it does not change res.content
res.encoding = 'utf-8'
# Get the raw binary data (the image)
html = res.content
# Write it to a local file in binary mode
with open('xiyy.jpg', 'wb') as ff:
    ff.write(html)
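The image example writes res.content to disk without looking at res.status_code, so an error page could end up saved as a .jpg. A minimal sketch of a guard is shown below. The helper name save_binary is hypothetical and not part of requests; it only assumes an object that has the status_code and content attributes described above.

```python
from pathlib import Path


def save_binary(res, path):
    """Write res.content to path if the status code is 200.

    `res` is any object with `status_code` and `content` attributes,
    such as the response object returned by requests.get.
    Returns the number of bytes written.
    """
    if res.status_code != 200:
        # Refuse to save an error page as if it were the image.
        raise ValueError(f"unexpected HTTP status: {res.status_code}")
    data = res.content
    Path(path).write_bytes(data)
    return len(data)
```

With the response from the image example, save_binary(res, 'xiyy.jpg') would take the place of the with open(...) block.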