Extracting text from txt files in multiple encodings!

Enough talk, straight to the code:

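import chardet


# Extract the contents of a txt file, detecting its encoding first
def parseTxt(filename):
    # Read the raw bytes and close the file before detection
    with open(filename, "rb") as f:
        raw = f.read()
    # chardet can report None as the encoding (e.g. for an empty file),
    # and dict.get() only applies its default when the key is missing,
    # so fall back to utf-8 explicitly
    encoding = chardet.detect(raw).get("encoding") or "utf-8"
    with open(filename, "r", encoding=encoding) as f:
        texts = f.readlines()
    return {
        # Title: first 100 characters of the first line ("" for an empty file)
        "title": texts[0][:100] if texts else "",
        "content": texts
    }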
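
For cases where chardet is not installed, or its guess turns out wrong, a rough alternative is to try a short list of candidate encodings in order. The sketch below is not from the original post: `read_with_fallback` and its `candidates` default are hypothetical names chosen for illustration, and the order (utf-8 first, then gb18030) is only an assumption that suits mostly-Chinese text.

```python
# Hypothetical helper (not part of the original post): tries candidate
# encodings in order instead of asking chardet to guess.
def read_with_fallback(filename, candidates=("utf-8", "gb18030")):
    with open(filename, "rb") as f:
        raw = f.read()
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate cannot decode the bytes; try the next one
            continue
        # Keep line endings, similar to what readlines() returns
        return encoding, text.splitlines(keepends=True)
    # Last resort: latin-1 maps every byte, so decoding never fails,
    # although the resulting text may be wrong
    return "latin-1", raw.decode("latin-1").splitlines(keepends=True)
```

utf-8 goes first because it is strict: bytes written in GBK usually fail to decode as utf-8, so the loop moves on to the next candidate. The reverse order would be less reliable, since gb18030 accepts many byte sequences that are really utf-8.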

 

posted @ 2021-01-28 19:46  数据驱动