python3: UnicodeDecodeError: 'gb2312' codec can't decode byte 0xa8 in position 193488: illegal multibyte sequence
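When I used python3 to read a txt file containing all the article records downloaded from the PubMed API, it raised this error:
UnicodeDecodeError: 'gb2312' codec can't decode byte 0xa8 in position 193488: illegal multibyte sequence
The error means the file contains bytes that are not valid gb2312, which suggests it was not saved in that encoding. After searching for an answer, I solved the problem as follows: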
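1. Open the txt file in Windows Notepad and save a copy with the suffix "_UTF8.txt", choosing UTF-8 as the encoding in the Save As dialog.
2. Then open the new file in Python with the following code: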
fname = '2020-01-14_endometriosis_2020-01-01_UTF8.txt'
with open(fname, "r", encoding='utf-8') as f:
    abstracts = f.read()
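
The Notepad step can probably be skipped by letting Python try a few encodings itself. Below is a minimal sketch, not the method used above: the helper name read_text_with_fallback and the candidate list ("utf-8-sig", then "gb18030") are my own assumptions. "utf-8-sig" reads plain UTF-8 and also strips the byte order mark that some Notepad versions write when saving as UTF-8; "gb18030" is a superset of gb2312 and gbk, used here as a fallback for files that really are in a Chinese legacy encoding.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8-sig", "gb18030")):
    """Hypothetical helper: decode a file with the first encoding that works.

    Returns a (text, encoding) tuple so the caller can see which codec succeeded.
    """
    raw = Path(path).read_bytes()  # read bytes once, decode in memory
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
    raise ValueError(f"none of {encodings} could decode {path}")
```

Usage would look like `abstracts, used = read_text_with_fallback(fname)`, with `fname` pointing at the original downloaded file. A successful decode does not prove the guess is right, so it is worth checking a few non-ASCII characters in the result by eye.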
