Fixing garbled Chinese text in a Python crawler

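I have been learning to write Python crawlers over the past few days. Yesterday, while scraping an article, the returned data came back as garbled Chinese text, and none of the many fixes I found online solved it.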

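I tried both r=requests.get(url,headers=headers).text and r=requests.get(url,headers=headers).content, and neither worked. On top of that, when I used r.apparent_encoding to get the page's detected encoding, it returned None. At that point I was completely baffled.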

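    import requests

    # url: the address of the article being crawled (defined elsewhere)
    headers = {
        # my headers
    }
    r = requests.get(url, headers=headers)
    # r.encoding = r.apparent_encoding

    if r.status_code != 200:
        raise Exception(f"unexpected status code: {r.status_code}")

    print(r.encoding)           # encoding taken from the response headers
    print(r.apparent_encoding)  # encoding guessed from the body; None in my case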

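Later, another blogger's post solved my problem: one entry in my headers was the culprit, and removing it fixed everything.

Following that blogger's description, I removed Accept-Encoding from my headers, and after that the response decoded normally. Many thanks to the blogger for solving my problem.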
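A likely explanation, offered as an assumption since the post does not show the header's value: if the hand-written Accept-Encoding advertised a compression such as br, the server may have replied with a Brotli-compressed body. requests decompresses gzip and deflate on its own, but Brotli only when the optional brotli (or brotlicffi) package is installed, so r.text and r.content would still hold compressed bytes, and no charset guess can turn those into readable text. Dropping the header lets requests advertise only what it can actually decode. The sketch below shows one way to strip the header before sending; strip_accept_encoding and fetch_text are hypothetical helper names, and gzip stands in for Brotli in the demonstration only because it ships with the standard library.

```python
import gzip


def strip_accept_encoding(headers):
    """Return a copy of headers without any Accept-Encoding entry (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


def fetch_text(url, headers):
    """Hypothetical helper: GET url with the cleaned headers and return the decoded text."""
    import requests  # third-party; imported here so the rest runs without it

    r = requests.get(url, headers=strip_accept_encoding(headers), timeout=10)
    r.raise_for_status()
    # Use the charset guessed from the body only if detection succeeded.
    if r.apparent_encoding:
        r.encoding = r.apparent_encoding
    return r.text


# Why a still-compressed body looks garbled: the bytes are not text at all.
original = "中文内容".encode("utf-8")
compressed = gzip.compress(original)                     # stand-in for an undecoded body
garbled = compressed.decode("utf-8", errors="replace")   # what a naive decode produces
restored = gzip.decompress(compressed).decode("utf-8")   # readable once decompressed
```

With headers copied from a browser, calling fetch_text(url, headers) sends everything except Accept-Encoding, which leaves the choice of compression to requests itself.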

Reference blog: https://www.cnblogs.com/hushaojun/p/16174071.html

posted @ 2023-03-24 10:53 CherryJry