Encoding pitfalls when writing a Python web crawler
Source code
# -*- coding: utf-8 -*-
import requests
keyword={'wd':'中国'}
header={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3704.400 QQBrowser/10.4.3587.400'}
response=requests.get('https://www.baidu.com/s',params=keyword,headers=header)
# Option 1: print the decoded page to the console
#print(response.content.decode('utf-8'))
# Option 2: write the decoded page to an .html file
#with open('百度.html','w',encoding='utf-8') as f:
#    f.write(response.content.decode('utf-8'))
# Option 3: write the decoded page to a .txt file
#with open('百度.txt','w',encoding='utf-8') as f:
#    f.write(response.content.decode('utf-8'))
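The error shown in the console output below can be reproduced without any network access. This is a minimal sketch of the codec mismatch: GBK can encode common Chinese characters, but not every character that appears in a UTF-8 page, e.g. '»' (code point U+00BB, matching the '\xbb' in the traceback):

```python
# Minimal, offline reproduction of the codec mismatch.
text = "中国 »"  # '»' (U+00BB) is the kind of character GBK cannot represent

# GBK handles the Chinese characters fine...
assert "中国".encode("gbk") == b"\xd6\xd0\xb9\xfa"

# ...but it has no mapping for '»', which raises the same
# UnicodeEncodeError the traceback below complains about.
try:
    text.encode("gbk")
except UnicodeEncodeError as e:
    print(e)  # 'gbk' codec can't encode character '\xbb' ...
```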
Console output
Traceback (most recent call last):
File "C:/Users/20281/Desktop/代码文件/爬虫/requests库的使用/requests库简单的使用.py", line 7, in <module>
print(response.content.decode('utf-8'))
UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 158783: illegal multibyte sequence
Process finished with exit code 1
The error, translated: the 'gbk' codec cannot encode the character '\xbb' at position 158783: illegal multibyte sequence.
Why does Option 1 fail here? Note that this is a Unicode *encode* error, not a decode error: `response.content.decode('utf-8')` succeeds in all three cases, but `print` must then re-encode the text for the console, and on Windows the console codec defaults to GBK, which has no mapping for characters such as '\xbb' ('»'). Option 2 succeeds because the file is opened with an explicit `encoding='utf-8'`, so the system default codec never gets involved. As written, Option 3 is identical to Option 2 apart from the file extension, and the extension has no effect on the codec, so it should succeed for the same reason.
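Two common workarounds are sketched below. The `reconfigure` call requires Python 3.7+, and the file name `baidu.html` is just an illustrative placeholder:

```python
import sys

# Workaround for Option 1: force stdout to UTF-8 before printing
# (Python 3.7+). Whether the characters then render correctly still
# depends on the terminal itself.
sys.stdout.reconfigure(encoding="utf-8")
print("中国 »")

# For Options 2 and 3 the codec is fully under our control: as long
# as encoding='utf-8' is passed to open(), the file extension is
# irrelevant and the write cannot hit the GBK error.
with open("baidu.html", "w", encoding="utf-8") as f:
    f.write("<html>中国 »</html>")
```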
