The requests library

Basic usage of the requests library

GET requests

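# Example code
# encoding: utf-8
import requests

url = 'https://www.baidu.com/s'  # assumed: Baidu search, which reads the keyword from "wd"
kw = {'wd': '我爱你'}
headers = {
    # Copy this straight from the request headers in your browser's developer tools
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36",
}
# params accepts a dict or a string of query parameters; a dict is URL-encoded automatically
r = requests.get(url, params=kw, headers=headers)

# r.text returns the decoded body as Unicode text (str)
print(r.text)
# r.content returns the raw bytes, i.e. the data as actually transferred over the network
print(r.content)                  # ---> bytes
print(r.content.decode('utf-8'))  # ---> str
print(r.encoding)                 # ---> the encoding requests uses to decode r.text

# Save the page
with open('baidu.html', 'w', encoding='utf-8') as fp:
    fp.write(r.content.decode('utf-8'))

# To save images, videos or other binary content, do not decode:
# write r.content directly to a file opened in binary mode ('wb')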
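To see exactly which URL `params` produces, you can build a `requests.Request` and prepare it without sending anything. A minimal offline sketch; the Baidu search URL is an assumption used only for illustration.

```python
import requests

# Build the request but do not send it, so the final URL can be inspected offline
req = requests.Request(
    'GET',
    'https://www.baidu.com/s',  # assumed search endpoint, for illustration only
    params={'wd': '我爱你'},
)
prepared = req.prepare()

# The dict has been UTF-8 percent-encoded and appended as the query string
print(prepared.url)  # https://www.baidu.com/s?wd=%E6%88%91%E7%88%B1%E4%BD%A0
```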

POST requests

import json

import requests

# Copy the fields from the "Form Data" section of the browser's developer tools
data = {
    'first': "true",
    'pn': '1',
    'kd': 'python'
}
headers = {
    'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36'
}

response = requests.post('https://www.lagou.com/jobs/positionAjax.json?city=%E6%B7%B1%E5%9C%B3&needAddtionalResult=false&isSchoolJob=0', data=data, headers=headers)

# Option 1: .text returns a string, which is not necessarily JSON;
# when it is, json.loads() converts the JSON string into a dict
json_str = response.text
print(json_str)
dict1 = json.loads(json_str)
print(dict1)

# Option 2: call .json() directly to parse the JSON body into a dict
print(type(response.json()))
print(response.json())
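`data=` sends a dict as form fields, while `json=` sends the same dict as a JSON body, and the server has to receive the format it expects. A minimal offline sketch that prepares both requests without sending them; the lagou URL is only reused as a label, nothing is posted.

```python
import json

import requests

url = 'https://www.lagou.com/jobs/positionAjax.json'  # nothing is sent; only the body is inspected
form = {'first': 'true', 'pn': '1', 'kd': 'python'}

# data=dict -> form-encoded body, like an HTML <form> submission
as_form = requests.Request('POST', url, data=form).prepare()
print(as_form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(as_form.body)                     # first=true&pn=1&kd=python

# json=dict -> JSON body; requests serializes the dict and sets the header itself
as_json = requests.Request('POST', url, json=form).prepare()
print(as_json.headers['Content-Type'])  # application/json
print(json.loads(as_json.body))         # back to the original dict
```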


Ajax requests:
In the developer tools, look at the request headers; Ajax requests carry:
X-Requested-With: XMLHttpRequest
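When a server only answers Ajax-style requests, copying that header into `headers` is usually enough. A minimal sketch; `ajax_headers` is a hypothetical helper and `https://www.example.com/api` a placeholder URL. The request is only prepared, not sent.

```python
import requests


def ajax_headers(referer: str) -> dict:
    """Hypothetical helper: headers that make a request look like a browser Ajax call."""
    return {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Referer': referer,
        # Many Ajax libraries (jQuery, for example) add this header; some servers check it
        'X-Requested-With': 'XMLHttpRequest',
    }


# Prepare (but do not send) the request to check which headers would go out
prepared = requests.Request(
    'GET',
    'https://www.example.com/api',  # placeholder URL
    headers=ajax_headers('https://www.example.com/'),
).prepare()
print(prepared.headers['X-Requested-With'])  # XMLHttpRequest
```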

Using a proxy

# Just pass the proxies argument to get() or post(), e.g. requests.get(url, proxies=proxy)
proxy = {
    "http": "proxy_IP:port"
}
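requests chooses a proxy by the scheme of the target URL, so a dict with only an "http" key (as above) is not used for https:// URLs. A minimal sketch; `build_proxies` and `get_via_proxy` are hypothetical helpers and 127.0.0.1:8888 is a placeholder proxy address.

```python
import requests


def build_proxies(host: str, port: int) -> dict:
    """Hypothetical helper: send both http:// and https:// traffic through one HTTP proxy."""
    address = f'http://{host}:{port}'
    # requests looks the proxy up by the target URL's scheme, so both keys are needed
    return {'http': address, 'https': address}


def get_via_proxy(url: str, proxies: dict, timeout: float = 5.0) -> requests.Response:
    """GET url through the given proxies; proxies= works the same way for requests.post()."""
    return requests.get(url, proxies=proxies, timeout=timeout)


proxies = build_proxies('127.0.0.1', 8888)  # placeholder: use your proxy's IP and port
print(proxies)
```

Calling `get_via_proxy('https://www.baidu.com/', proxies)` only succeeds if a proxy is actually listening at that address.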

1: Get the cookies a page sends back

import requests

r = requests.get("https://www.baidu.com/")
print(r.cookies)
# <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

# get_dict() returns the cookies as a dict
print(r.cookies.get_dict())
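The cookie jar behaves like a dict, and the reverse direction, sending cookies, uses the `cookies=` argument. A minimal offline sketch; the cookie is copied from the sample output above and filled in by hand instead of being fetched.

```python
import requests
from requests.cookies import RequestsCookieJar

# A jar like the one r.cookies returns, filled by hand so this runs offline
jar = RequestsCookieJar()
jar.set('BDORZ', '27315', domain='.baidu.com', path='/')
print(jar.get_dict())  # {'BDORZ': '27315'}

# Sending cookies: pass a dict (or a jar) as cookies= and requests builds the Cookie header
prepared = requests.Request('GET', 'https://www.baidu.com/', cookies={'BDORZ': '27315'}).prepare()
print(prepared.headers['Cookie'])  # BDORZ=27315
```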

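Session

# This is not the session of web development (server-side session storage); it is just a session object.
# Requests sent through it share cookies, like tabs in one browser rather than a fresh browser each time.
# Typical use: simulate a login, then carry on with the following requests as the logged-in user.

import requests

url = 'https://www.example.com/'  # hypothetical URL
s = requests.Session()
r = s.get(url)
print(r.text)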
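A Session merges its own headers and cookies into every request it sends, which is what keeps you logged in across requests. A minimal offline sketch: `prepare_request()` builds the request the session would send without touching the network; the `sessionid` cookie and the example.com URL are hypothetical.

```python
import requests

with requests.Session() as s:
    # Headers set on the session are merged into every request it sends
    s.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'})
    # Cookies a server sets (e.g. after logging in) land in s.cookies; set one by hand here
    s.cookies.set('sessionid', 'abc123')  # hypothetical cookie name and value

    # Show what s.get() would send, without sending it
    prepared = s.prepare_request(requests.Request('GET', 'https://www.example.com/profile'))

print(prepared.headers['User-Agent'])
print(prepared.headers['Cookie'])  # sessionid=abc123
```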

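Handling untrusted certificates and timeouts

# Pass these arguments directly in the request, e.g. requests.get(url, verify=False, timeout=1)
verify=False  # skip certificate verification (e.g. self-signed certificates); requests emits an InsecureRequestWarning
# Set a timeout so that a server which does not respond in time cannot hang the program
timeout=1     # raise requests.exceptions.Timeout if the server does not respond within 1 second
timeout=None  # wait indefinitely until a response arrives (the default)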
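A minimal sketch of what `timeout` does, run against a throwaway local server (built from the standard library's `http.server`, purely for the demo) so it needs no internet access; `fetch_or_none` is a hypothetical helper. The timeout is not a cap on the whole download: requests applies it to establishing the connection and to each wait for data from the server, and a `(connect, read)` tuple sets the two limits separately.

```python
import http.server
import threading
import time

import requests


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Demo server: /slow waits 2 s before answering, any other path answers at once."""

    def do_GET(self):
        if self.path == '/slow':
            time.sleep(2)
        body = b'ok'
        try:
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client may already have given up and closed the connection

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_or_none(url, timeout=1):
    """GET url; return None instead of raising if the server does not answer in time."""
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:  # ReadTimeout and ConnectTimeout both subclass it
        return None


server = http.server.ThreadingHTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_address[1]}'

fast = fetch_or_none(base + '/fast', timeout=1)    # answered well within the limit
slow = fetch_or_none(base + '/slow', timeout=0.5)  # server needs 2 s, so this times out
print(fast.status_code, fast.text)  # 200 ok
print(slow)                         # None

server.shutdown()
server.server_close()
```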

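Difference between response.text and response.content:

1. response.text: a str, produced by requests decoding response.content. Decoding needs an encoding, and requests works it out on its own (mainly from the response headers), so the guess is sometimes wrong and the decoded text comes out garbled.
   In that case, decode manually with `response.content.decode('utf-8')`.
2. response.content: the data exactly as fetched from the network, without any decoding, so it is of type bytes. Strings stored on disk or sent over the network are in fact always bytes.
3. If the page is not UTF-8, decode it with its real encoding, e.g. `response.content.decode('gbk')`. To find that encoding, right-click the page, choose View Page Source and look for the charset declaration.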
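Item 3 can be automated: read the charset the page declares in its `<meta>` tag and decode with that. A minimal sketch using only the standard library; `decode_html` is a hypothetical helper, and a GBK string encoded by hand stands in for `response.content`. With requests itself you can instead assign the right codec to `r.encoding` before reading `r.text`, since `r.text` decodes with `r.encoding`.

```python
import re

# Matches the charset in <meta charset="..."> or <meta ... content="text/html; charset=...">
_CHARSET_RE = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def decode_html(raw: bytes, default: str = 'utf-8') -> str:
    """Decode page bytes with the charset the page declares, falling back to default."""
    match = _CHARSET_RE.search(raw)
    encoding = match.group(1).decode('ascii') if match else default
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or bytes that do not fit it: fall back, replacing bad bytes
        return raw.decode(default, errors='replace')


page = '<meta charset="gbk"><p>我爱你</p>'
raw = page.encode('gbk')  # stands in for response.content of a GBK page

garbled = raw.decode('iso-8859-1')  # wrong codec: the Chinese text comes out garbled
fixed = decode_html(raw)            # declared charset (gbk): the original text is restored
print(garbled)
print(fixed)
```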

posted @ 2020-02-21 16:45 Noob52037