Essay category - Python


Extracting Information from Text With NLTK
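Summary: Most real-world data is either 'unstructured', such as ordinary txt documents, or 'semi-structured', such as HTML, and extracting useful information from such data takes some technique. If all data were 'structured', such as XML or a relational database, no special extraction would be needed: you could retrieve whatever you wanted through the metadata. So let's discuss how to extract information from text with NLTK. First, the raw text of the document is split into sentences using a sentence segmenter, and each sentence is further subdivided into word ... (read more)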

posted @ 2011-07-04 20:52 fxjwind Views(461) Comments(0) Recommended(0)
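
The first two steps named in the summary above, sentence segmentation followed by word tokenization, can be sketched in plain Python. This is only an illustration of the idea: the regex-based `split_sentences` and `split_words` helpers are hypothetical stand-ins written for this example, not NLTK's own segmenter or tokenizer, and they will mishandle abbreviations such as "Dr." that a trained segmenter copes with.

```python
import re


def split_sentences(raw_text):
    """Naive sentence segmenter (hypothetical helper, not NLTK's).

    Splits wherever '.', '!' or '?' is followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    return [part for part in parts if part]


def split_words(sentence):
    """Naive tokenizer (hypothetical helper, not NLTK's).

    Returns runs of word characters and single punctuation marks.
    """
    return re.findall(r"\w+|[^\w\s]", sentence)


def preprocess(raw_text):
    """Segment raw text into sentences, then each sentence into words."""
    return [split_words(sentence) for sentence in split_sentences(raw_text)]


tokens = preprocess("NLTK is a toolkit. It handles text!")
for sentence_tokens in tokens:
    print(sentence_tokens)
```

The result is a list of sentences, each itself a list of tokens, which is the shape the later stages of an extraction pipeline typically consume.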

Classify Text With NLTK
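Summary: Classification is the task of choosing the correct class label for a given input. A classifier is called supervised if it is built based on training corpora containing the correct label for each input. Here is an example showing how to train a classifier with nltk and apply it to a simple classification task: given a name, determine its gender, that is, classify it as either male or female. First comes training, and training requires a corpus, namely already-classified names ... (read more)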

posted @ 2011-07-04 20:48 fxjwind Views(713) Comments(0) Recommended(0)
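
The name-gender task described in the summary above might be sketched as follows. The summary is cut off before it shows any code, so everything here is an assumption made for illustration: the tiny inline `labeled_names` list stands in for a real labeled corpus, and the last-letter feature in `gender_features` is one plausible choice rather than the post's confirmed method. Only `nltk.NaiveBayesClassifier.train` and `classify` are actual NLTK calls.

```python
import nltk


def gender_features(name):
    """Map a name to a feature dict (assumed feature: its last letter)."""
    return {"last_letter": name[-1].lower()}


# Toy training data invented for this sketch; a real run would use a
# much larger corpus of labeled names.
labeled_names = [
    ("Anna", "female"), ("Maria", "female"),
    ("Julia", "female"), ("Emma", "female"),
    ("John", "male"), ("Mark", "male"),
    ("Peter", "male"), ("David", "male"),
]

# Each training item is a (feature dict, label) pair.
train_set = [(gender_features(name), gender) for name, gender in labeled_names]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify a name that does not appear in the training data.
print(classifier.classify(gender_features("Sophia")))
```

With so few examples the classifier only learns that names ending in "a" were labeled female in this toy set, so its predictions say nothing about real-world accuracy.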

POS Tagging with NLTK
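Summary: POS tagging is short for part-of-speech tagging; the categories are also called word classes or lexical categories. The names vary, but it simply means labeling each word with its part of speech. The off-the-shelf tools in the nltk toolkit make it easy to POS-tag a text:
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', ' ... (read more)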

posted @ 2011-07-04 20:46 fxjwind Views(1362) Comments(0) Recommended(0)
