Python - Post Category (Page 2) - fxjwind

Extracting Information from Text With NLTK

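Summary: Most real-world data is "unstructured" (for example, an ordinary .txt document) or "semi-structured" (for example, HTML), and getting useful information out of such data requires dedicated techniques. If all data were "structured", like XML or a relational database, no special extraction would be needed: you could retrieve any information you wanted through the metadata. This post discusses how to extract information from text with NLTK. First, the raw text of the document is split into sentences using a sentence segmenter, and each sentence is further subdivided into word…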

posted @ 2011-07-04 20:52 fxjwind  Views (461)  Comments (0)  Recommendations (0)
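A minimal sketch of the first two pipeline steps named in the summary above (sentence segmentation, then word tokenization), written in plain Python with regular expressions so it runs without downloading any NLTK models. The splitting rules are simplifying assumptions: a sentence is assumed to end at ".", "!" or "?" followed by whitespace, so abbreviations such as "Dr." will be split wrongly. In NLTK itself, nltk.sent_tokenize and nltk.word_tokenize fill these two roles and handle such cases much better.

```python
import re

# Assumed sentence boundary: ".", "!" or "?" followed by whitespace.
# Abbreviations like "Dr." are NOT handled by this simplified rule.
_SENT_END = re.compile(r"(?<=[.!?])\s+")

# Assumed token shape: a run of word characters, or a single punctuation mark.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def segment_sentences(raw):
    """Step 1: split raw text into a list of sentence strings."""
    return [s for s in _SENT_END.split(raw.strip()) if s]


def tokenize(sentence):
    """Step 2: split one sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


def preprocess(raw):
    """Run both steps: one token list per sentence."""
    return [tokenize(s) for s in segment_sentences(raw)]


doc = "NLTK is a Python library. It can extract information from text!"
for tokens in preprocess(doc):
    print(tokens)
# ['NLTK', 'is', 'a', 'Python', 'library', '.']
# ['It', 'can', 'extract', 'information', 'from', 'text', '!']
```

The result is a list of token lists, one per sentence, which is the shape later stages such as POS tagging operate on.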

Classify Text With NLTK

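Summary: Classification is the task of choosing the correct class label for a given input. A classifier is called supervised if it is built based on training corpora containing the correct label for each input. This post walks through an example of using NLTK to train a classifier and apply it to a simple classification task: given a name, decide its gender, that is, assign it to one of two classes, male or female. Training comes first, and training requires a corpus, namely names that have already been labeled by class…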

posted @ 2011-07-04 20:48 fxjwind  Views (713)  Comments (0)  Recommendations (0)
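A minimal, dependency-free sketch of the supervised name/gender task described above. It is an assumption-laden stand-in, not NLTK's implementation: the feature extractor uses only the last letter of the name, the classifier is a small hand-written Naive Bayes with add-one smoothing, and the eight training names are hypothetical toy data rather than a real labeled corpus. With NLTK, the same list of (featureset, label) pairs can be passed to nltk.NaiveBayesClassifier.train, and nltk.corpus.names supplies labeled male and female names.

```python
import math
from collections import Counter, defaultdict


def gender_features(name):
    # Assumed feature set: just the (lowercased) last letter of the name.
    return {"last_letter": name[-1].lower()}


class TinyNaiveBayes:
    """Naive Bayes over categorical features with add-one smoothing.
    A hand-written stand-in for illustration, not nltk.NaiveBayesClassifier."""

    def __init__(self, labeled_featuresets):
        self.label_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values_seen = defaultdict(set)       # feature -> values seen in training
        for feats, label in labeled_featuresets:
            self.label_counts[label] += 1
            for fname, fval in feats.items():
                self.value_counts[(label, fname)][fval] += 1
                self.values_seen[fname].add(fval)
        self.total = sum(self.label_counts.values())

    def _log_score(self, feats, label):
        # log P(label) + sum over features of log P(value | label)
        score = math.log(self.label_counts[label] / self.total)
        for fname, fval in feats.items():
            counts = self.value_counts[(label, fname)]
            vocab = len(self.values_seen[fname]) + 1  # +1 slot for unseen values
            score += math.log((counts[fval] + 1) / (self.label_counts[label] + vocab))
        return score

    def classify(self, feats):
        # Pick the label with the highest posterior score.
        return max(self.label_counts, key=lambda lbl: self._log_score(feats, lbl))


# Hypothetical toy training data (a real setup would use a labeled names corpus).
labeled_names = [("John", "male"), ("Mark", "male"), ("Frank", "male"),
                 ("Paul", "male"), ("Mary", "female"), ("Anna", "female"),
                 ("Julia", "female"), ("Emma", "female")]
train_set = [(gender_features(n), g) for n, g in labeled_names]
classifier = TinyNaiveBayes(train_set)

print(classifier.classify(gender_features("Sophia")))  # female
print(classifier.classify(gender_features("Jack")))    # male
```

The design mirrors the usual supervised workflow: a feature extractor turns each input into a featureset, training consumes (featureset, label) pairs, and classification picks the most probable label for a new featureset.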

POS Tagging with NLTK

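Summary: POS tagging is part-of-speech tagging; parts of speech also go by the names word classes or lexical categories. Whatever the name, it simply means labeling each word with its part of speech. The off-the-shelf tools in the NLTK toolkit make it easy to POS-tag text:

    >>> import nltk
    >>> text = nltk.word_tokenize("And now for something completely different")
    >>> nltk.pos_tag(text)
    [('And', '…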

posted @ 2011-07-04 20:46 fxjwind  Views (1362)  Comments (0)  Recommendations (0)
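To show what a tagger does under the hood, here is a toy lookup tagger with regular-expression fallbacks, in the spirit of chaining NLTK's UnigramTagger with a RegexpTagger backoff. It is a sketch under stated assumptions, not how nltk.pos_tag works internally (that uses a trained statistical model): the small lexicon and suffix rules below are hypothetical and hand-written, and only the Penn Treebank tag names are borrowed from the tag set nltk.pos_tag returns by default.

```python
import re

# Hypothetical hand-written lexicon using Penn Treebank tags.
# A real tagger learns word/tag statistics from a tagged corpus instead.
LEXICON = {
    "and": "CC", "now": "RB", "for": "IN", "something": "NN",
    "different": "JJ", "the": "DT", "a": "DT", "is": "VBZ",
}

# Assumed fallback rules for unknown words, tried in order.
PATTERNS = [
    (re.compile(r".*ly$"), "RB"),           # adverbs, e.g. "completely"
    (re.compile(r".*ing$"), "VBG"),         # gerunds / present participles
    (re.compile(r".*ed$"), "VBD"),          # simple past tense
    (re.compile(r"^\d+(\.\d+)?$"), "CD"),   # numbers
]


def tag(tokens, default="NN"):
    """Return (word, tag) pairs, the same output shape as nltk.pos_tag."""
    tagged = []
    for word in tokens:
        pos = LEXICON.get(word.lower())
        if pos is None:
            # First matching suffix rule wins; otherwise use the default tag.
            pos = next((t for rx, t in PATTERNS if rx.match(word)), default)
        tagged.append((word, pos))
    return tagged


tokens = "And now for something completely different".split()
print(tag(tokens))
# [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
#  ('completely', 'RB'), ('different', 'JJ')]
```

Unknown words fall through the suffix rules and finally to the default tag NN, which is a common (if crude) choice because nouns are the most frequent open word class.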
