识别文本语言
使用langdetect包判断文本语言
Language detection library ported from Google's language-detection.
支持55种语言
af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,
hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,
pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw
判断类型
str1 = 'Language detection algorithm is non-deterministic'
str2 = 'Ẩm thực, làm đẹp, deal sốc, hoàn tiền'
# 判断语言种类
print(detect(str1))
print(detect(str2))
输出
en
vi
概率
print(detect_langs(str2))
输出
[vi:0.9999997953867797]
增强
为了让识别结果唯一,加上以下两行保持结果准确
from langdetect import DetectorFactory
DetectorFactory.seed = 0

浙公网安备 33010602011771号