Encoding limitations in Python crawlers

Source code
# -*- coding: utf-8 -*-

import requests

keyword = {'wd': '中国'}
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3704.400 QQBrowser/10.4.3587.400'}
response = requests.get('https://www.baidu.com/s', params=keyword, headers=header)

# Option 1: print to the console
# print(response.content.decode('utf-8'))

# Option 2: write to an .html file
# with open('百度.html', 'w', encoding='utf-8') as f:
#     f.write(response.content.decode('utf-8'))

# Option 3: write to a .txt file
# with open('百度.txt', 'w', encoding='utf-8') as f:
#     f.write(response.content.decode('utf-8'))


Console output

Traceback (most recent call last):
File "C:/Users/20281/Desktop/代码文件/爬虫/requests库的使用/requests库简单的使用.py", line 7, in <module>
print(response.content.decode('utf-8'))
UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 158783: illegal multibyte sequence

Process finished with exit code 1

 

In plain terms: the GBK codec has no encoding for the character '\xbb', which appears at position 158783 of the page, so writing it out would produce an illegal multibyte sequence for GBK.
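The failure can be reproduced without the crawler at all. The character '\xbb' ('»') has a UTF-8 representation but no mapping in GBK, so encoding it with the GBK codec raises exactly the error in the traceback above (a minimal sketch, independent of the original script):

```python
ch = '\xbb'  # '»', the character flagged at position 158783

# UTF-8 can represent it as a two-byte sequence...
print(ch.encode('utf-8'))   # b'\xc2\xbb'

# ...but the GBK codec has no mapping for it, so encoding fails.
# This is what print() does implicitly when the console encoding is GBK.
try:
    ch.encode('gbk')
except UnicodeEncodeError as e:
    print(e)
```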

Why does only option 2 run successfully here? Option 1 displays on the console and option 3 displays in a plain-text file; each destination applies its own encoding rules, with no unified rule between them. Option 2's manual decoding matches the encoding the HTML format itself declares, so only option 2 succeeds.
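If console output is still wanted despite a GBK console, one common workaround (a sketch, not part of the original post) is to encode with errors='replace', so characters GBK cannot represent become '?' instead of raising:

```python
# Stand-in for response.content.decode('utf-8'); the '»' is not in GBK
text = '中文 \xbb more'

# Round-trip through GBK: unmappable characters are replaced with '?'
safe = text.encode('gbk', errors='replace').decode('gbk')
print(safe)   # 中文 ? more
```

errors='ignore' would instead drop the offending characters silently; 'replace' at least leaves a visible marker of what was lost.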


posted @ 2019-07-22 20:23  wss9806