Post archive for May 13, 2018 - change_world

May 13, 2018

How to keep tokens shorter than 2 characters in sklearn's CountVectorizer/TfidfVectorizer

Summary: When splitting text into tokens, sklearn.feature_extraction.text.CountVectorizer() and sklearn.feature_extraction.text.TfidfVectorizer() by default discard any token shorter than 2 characters. The full post illustrates this with an example.
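The default behavior described in the summary can be sketched as follows. This is a minimal illustration, not necessarily the approach the full post takes. It relies on the documented default `token_pattern` of both vectorizers, `r"(?u)\b\w\w+\b"`, which only matches runs of two or more word characters. The relaxed pattern, the sample sentence, and the names `DEFAULT_PATTERN` and `RELAXED_PATTERN` are assumptions made for the example. The regex part runs with the standard library alone; the scikit-learn part only runs if the package is installed.

```python
import re

# Default token_pattern of CountVectorizer and TfidfVectorizer:
# a token must contain at least two word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"

# Illustrative alternative (an assumption for this sketch):
# accept tokens of one or more word characters.
RELAXED_PATTERN = r"(?u)\b\w+\b"

# Hypothetical sample sentence, already lowercased.
text = "i have a cat"

# Apply each pattern directly to see which tokens survive.
default_tokens = re.findall(DEFAULT_PATTERN, text)   # drops "i" and "a"
relaxed_tokens = re.findall(RELAXED_PATTERN, text)   # keeps every word

print(default_tokens)
print(relaxed_tokens)

# Optional: pass the relaxed pattern to the vectorizer itself.
try:
    from sklearn.feature_extraction.text import CountVectorizer
except ImportError:
    CountVectorizer = None  # scikit-learn not installed; skip this part

if CountVectorizer is not None:
    vectorizer = CountVectorizer(token_pattern=RELAXED_PATTERN)
    vectorizer.fit([text])
    # The learned vocabulary now includes the one-character tokens.
    print(sorted(vectorizer.vocabulary_))
```

The same `token_pattern` argument is accepted by `TfidfVectorizer`. Note that `token_pattern` is only used when `analyzer="word"` and no custom `tokenizer` is supplied.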

posted @ 2018-05-13 20:41 change_world Views (889) Comments (0) Recommendations (0)
