Web scraping: decoding a GB2312 page to UTF-8
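While doing a small web-scraping exercise today, I ran into a page whose encoding is GB2312. After fetching it in Python IDLE, part of the output was garbled, and decoding failed with: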
Traceback (most recent call last):
  File "D:/我的Python文件/007.py", line 13, in <module>
    print(res.content.decode('gb2312'))
UnicodeDecodeError: 'gb2312' codec can't decode byte 0xd9 in position 179: illegal multibyte sequence
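I searched Baidu and tried a lot of suggestions. In the end, all it took was changing print(res.content.decode('gb2312')) to print(res.content.decode('gbk')). How embarrassing!!!
I still only know that it works, not why it works, but at least the code runs now.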
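A likely explanation, as far as I can tell: GBK is a superset of GB2312, and pages that declare GB2312 often contain characters that exist only in GBK. Python's 'gb2312' codec is strict, so it raises an error on those bytes, while 'gbk' decodes them. Below is a minimal sketch of that behaviour. The sample string, the variable raw (standing in for res.content) and the helper decode_chinese are my own illustrative names, not part of the original script, and the GB18030 fallback is just an assumption about what might be useful for other pages.

```python
# Hypothetical sample: "馬" (a traditional-form character) is in GBK
# but not in GB2312, so it triggers the same kind of error as the page did.
sample = "马和馬"
raw = sample.encode("gbk")  # stands in for res.content from a real page


def decode_chinese(data: bytes) -> str:
    """Decode with GBK first, then GB18030; replace bad bytes as a last resort."""
    for codec in ("gbk", "gb18030"):
        try:
            return data.decode(codec)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep what we can and mark the rest.
    return data.decode("gb18030", errors="replace")


# The strict GB2312 codec rejects bytes that only GBK defines.
try:
    raw.decode("gb2312")
    gb2312_ok = True
except UnicodeDecodeError:
    gb2312_ok = False

print(gb2312_ok)
print(decode_chinese(raw))
```

In the original script this would amount to calling decode_chinese(res.content) instead of res.content.decode('gb2312').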