NLTK vs SKLearn vs Gensim vs TextBlob vs spaCy

Generally,

NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.).
Sklearn (scikit-learn) is used primarily for machine learning (classification, clustering, etc.).
Gensim is used primarily for topic modeling and document similarity.

Having said that, NLTK provides a nice wrapper for Sklearn's classifiers in its nltk.classify package. Further reading:
nltk.classify package
Combining Scikit-Learn and NLTK
Python NLP - NLTK and scikit-learn
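
As a rough illustration of that wrapper, the sketch below trains a scikit-learn Naive Bayes model through NLTK's SklearnClassifier (from nltk.classify.scikitlearn), using NLTK's usual (feature dict, label) training format. The tiny review dataset and the word_counts helper are hypothetical, invented for this example.

```python
from collections import Counter

from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB


def word_counts(text):
    """Hypothetical feature extractor: map each lowercased word to its count."""
    return dict(Counter(text.lower().split()))


# NLTK-style training data: a list of (featureset, label) pairs.
train_set = [
    (word_counts("great fun movie"), "pos"),
    (word_counts("loved the great acting"), "pos"),
    (word_counts("boring slow movie"), "neg"),
    (word_counts("hated the boring plot"), "neg"),
]

# Wrap the scikit-learn estimator so it behaves like an NLTK classifier.
classifier = SklearnClassifier(MultinomialNB()).train(train_set)

label = classifier.classify(word_counts("great fun"))
print(label)
```

Any scikit-learn estimator with the usual fit/predict interface can be passed in place of MultinomialNB, so the feature extraction can stay in NLTK while the model comes from Sklearn.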

And, to confuse you further, there also exist TextBlob (Simplified Text Processing)

and spaCy (spaCy.io | Build Tomorrow's Language Technologies),
which aims to provide industry-ready NLP modules as an alternative to NLTK,
including a single fast algorithm for each of tokenization, POS tagging, and parsing, plus word vectors for similarity calculation.
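
For a first taste of spaCy, here is a minimal sketch of its tokenizer. It assumes a blank English pipeline created with spacy.blank, which needs no model download. POS tagging, parsing and word vectors need a trained pipeline loaded with spacy.load, which is installed separately and is not shown here. The sample sentence is made up.

```python
import spacy

# A blank English pipeline contains only the rule-based tokenizer.
nlp = spacy.blank("en")

doc = nlp("spaCy splits text into tokens.")
tokens = [token.text for token in doc]
print(tokens)
```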

I suggest that you mix and match, according to your needs.

posted @ 2017-05-24 15:13 Donal

Donal's Blog

Life in IT, recorded bit by bit
http://dzang.posterous.com/
