18. urllib POST requests

Test site for requests: httpbin.org

Sending requests to some sites directly from Python gets you identified as a crawler, and the server responds with a 418 error.

# __author__: zoe
# date: 2020/5/15

import urllib.parse
import urllib.request

# POST data must be urlencoded and then converted to bytes
data = bytes(urllib.parse.urlencode({'hello': 'world'}), encoding='utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read().decode('utf-8'))

Returned result:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "hello": "world"     ### the data the user sent
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Content-Length": "11",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.5",
    "X-Amzn-Trace-Id": "Root=1-5ebe5151-44d851b4a5bd1fb8639d1284"
  },
  "json": null,
  "origin": "111.165.35.237",
  "url": "http://httpbin.org/post"
}
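Since httpbin.org echoes the request back as JSON, the response body can also be parsed with the json module instead of being printed as raw text. A minimal sketch, using a trimmed hard-coded copy of the response above so no network access is needed:

```python
import json

# A trimmed copy of the JSON body httpbin.org returned above
body = '''{
  "args": {},
  "form": {"hello": "world"},
  "json": null,
  "url": "http://httpbin.org/post"
}'''

parsed = json.loads(body)          # parse the JSON text into a dict
print(parsed['form']['hello'])     # the value we POSTed: world
```

In real use you would call `json.loads(response.read().decode('utf-8'))` on the live response.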


import urllib.error  # needed for the URLError handler below

try:
    response = urllib.request.urlopen('http://httpbin.org/post', data=data, timeout=0.01)
    print(response.read().decode('utf-8'))
except urllib.error.URLError as e:
    print('Time out!')

Exception handling: the request exceeds the timeout, and the handler reports the error.
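The except clause above treats every URLError as a timeout, but URLError also covers DNS failures and refused connections. To tell a real timeout apart, inspect `e.reason`, which wraps the underlying cause. A sketch (the helper name `describe_error` is my own, and the two errors are constructed by hand so no network access is needed):

```python
import socket
import urllib.error

def describe_error(e):
    # URLError stores the underlying exception in e.reason
    if isinstance(e.reason, socket.timeout):
        return 'Time out!'
    return 'URLError: {}'.format(e.reason)

# Simulate the two cases without hitting the network
timeout_err = urllib.error.URLError(socket.timeout('timed out'))
dns_err = urllib.error.URLError(OSError('Name or service not known'))
print(describe_error(timeout_err))   # Time out!
print(describe_error(dns_err))
```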


Modify the request headers to impersonate a normal browser, then fetch the page and read its data.

url = 'https://movie.douban.com/top250?start=0&'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
}  # copy the headers a real browser sends from the Network tab of the browser's developer tools; the key names must match exactly
req = urllib.request.Request(url=url, headers=headers)  # build the request object
html = ''
try:  # handle request errors
    response = urllib.request.urlopen(req)  # send the request and keep the response object
    html = response.read().decode('utf-8')  # decode the response body as UTF-8
    print(html)
except urllib.error.URLError as e:
    if hasattr(e, 'code'):  # print the HTTP status code, if any
        print(e.code)
    if hasattr(e, 'reason'):  # print the reason for the failure
        print(e.reason)
posted @ 2020-05-15 16:52  十名知花香