Learning Python Web Scraping (1): Saving a Page + Character Decoding

from urllib.request import urlopen
# Open the URL with Python's built-in urlopen and get a response
url = "http://www.baidu.com"
resp = urlopen(url)
result = resp.read()  # raw bytes, not yet decoded

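from urllib.request import urlopen
# Open the URL and get a response
url = "http://www.baidu.com"
resp = urlopen(url)
# read() returns bytes; decode() converts them to a str so Chinese text displays correctly
result = resp.read().decode("utf-8")

# Save the decoded page to a local HTML file
with open("mybaidu.html", mode="w", encoding="utf-8") as f:
    f.write(result)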

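Without the decode step, read() returns raw bytes (shown with a b prefix when printed), so the result has to be decoded with decode("utf-8") to get a readable string.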
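The difference between bytes and str can be seen without any network access. The sketch below is only an illustration: `decode_body` is a hypothetical helper name, and the sample bytes stand in for what `resp.read()` would return.

```python
def decode_body(raw: bytes, charset: str = "utf-8") -> str:
    """Decode raw response bytes into a str.

    errors="replace" swaps undecodable bytes for U+FFFD instead of
    raising UnicodeDecodeError (an assumption made for this sketch).
    """
    return raw.decode(charset, errors="replace")


# Stand-in for resp.read(): UTF-8 encoded bytes containing Chinese text
raw = "百度一下".encode("utf-8")

print(type(raw).__name__)  # the undecoded data is a bytes object
print(raw)                 # printed with a b prefix and escaped byte values

text = decode_body(raw)
print(type(text).__name__)  # after decoding it is a str
print(text)
```

If the page is not actually UTF-8, the charset argument would need to match the encoding the server declares.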

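Packet capture tools

In the request headers

User-Agent: information about the client machine and the browser

Referer: used for hotlink protection (it tells the server which page the request came from)

Cookie: string data stored locally on the client (often checked by anti-scraping measures)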

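In the response headers

Set-Cookie: string data the server asks the client to store locally and send back as Cookie (also used for anti-scraping)

Plus some other fields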
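The request headers listed above can be set from Python with `urllib.request.Request`. This is a minimal sketch rather than a recommended setup: `build_request` is a hypothetical helper, and the User-Agent string and cookie value are made-up placeholders.

```python
from typing import Optional
from urllib.request import Request


def build_request(url: str, cookie: Optional[str] = None) -> Request:
    """Build a Request carrying browser-like headers (placeholder values)."""
    headers = {
        # Placeholder browser identity; copy a real one from a packet capture tool
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        # Assumption for this sketch: claim the request originated from the site itself
        "Referer": url,
    }
    if cookie is not None:
        headers["Cookie"] = cookie
    return Request(url, headers=headers)


req = build_request("http://www.baidu.com", cookie="name=value")

# Request stores header names capitalized, e.g. "User-agent"
for name, value in req.header_items():
    print(name, ":", value)
```

Passing `req` to `urlopen` instead of the bare URL sends these headers with the request.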

posted @ 2021-06-22 09:33 YuyuFishSmile
