Detecting the Encoding of a Text File in Python

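from chardet.universaldetector import UniversalDetector

def GetEncoding(file):
    """
    Detect the encoding of a text file.
    :param file: path to the file
    :return: a dict such as {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
    """
    detector = UniversalDetector()
    # Open in binary mode: the detector works on raw bytes.
    with open(file, "rb") as txt:
        # Feed the file line by line and stop once the detector is confident.
        for line in txt:
            detector.feed(line)
            if detector.done:
                break
    # close() finalizes the detection and populates detector.result.
    detector.close()
    return detector.result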

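When I used chardet.detect(), the encoding of some files could not be detected, so I switched to the method above. For comparison, the chardet.detect() version: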

import chardet

# Read the file in binary mode
with open('test.txt', 'rb') as f:
    str1 = f.read()
char_encoding = chardet.detect(str1)
print(f'Raw bytes: {str1}')
print(f'Detection result: {char_encoding}')
print(f'Detected encoding: {char_encoding["encoding"]}')
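Since detection can fail (chardet reports 'encoding' as None in that case) or come back with low confidence, it can help to guard the decode step. Below is a minimal sketch of that idea, not part of the original post: decode_with_result is a hypothetical helper, and the min_confidence threshold of 0.5 and the 'utf-8' fallback are assumptions chosen for illustration. It only needs the result dict, so it works with either method above.

```python
from typing import Optional


def decode_with_result(raw: bytes, result: dict,
                       min_confidence: float = 0.5,
                       fallback: str = "utf-8") -> str:
    """Decode raw bytes using a chardet-style result dict (hypothetical helper)."""
    encoding: Optional[str] = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    # Detection failed or is too uncertain: use the assumed fallback encoding.
    if not encoding or confidence < min_confidence:
        encoding = fallback
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode with the fallback,
        # replacing undecodable bytes instead of raising.
        return raw.decode(fallback, errors="replace")
```

With the snippet above this could be called as decode_with_result(str1, char_encoding), or with the dict returned by GetEncoding('test.txt') together with the file's bytes.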

posted @ 2021-08-30 23:52 WIN&迷失
