Excerpt: 5.10 Exercises ☼ Search the web for "spoof newspaper headlines", to find such gems as: British Left Waffles on Falkland Islands, and Juvenile Court to Try Shooting Defendant. Manually tag these headlines to see if knowledge of the part-of-speech tags removes the ambiguity. ☼...
posted @ 2011-08-30 22:51 牛皮糖NewPtone Views (1683) Comments (0) Recommendations (0)
Excerpt: 5.9 Further Reading Extra materials for this chapter are posted at http://www.nltk.org/, including links to freely available resources on the web. For more examples of tagging with NLTK, please see the Tagging HOWTO at http://www.nltk.org/howto. Chapters 4 and 5 of (Jurafsky & Martin, 2008) ...
posted @ 2011-08-30 22:49 牛皮糖NewPtone Views (550) Comments (0) Recommendations (0)
Excerpt: 5.8 Summary • Words can be grouped into classes, such as nouns, verbs, adjectives, and adverbs. These classes are known as lexical categories or parts-of-speech. Parts-of-speech are assigned short labels, or tags, such as NN and VB. ...
posted @ 2011-08-30 22:46 牛皮糖NewPtone Views (581) Comments (0) Recommendations (0)
Excerpt: 5.7 How to Determine the Category of a Word Now that we have examined word classes in detail, we turn to a more basic question: how do we decide what category a word belongs to in the first place? In general, linguists use morphological, syntactic, and semantic clues to determine...
posted @ 2011-08-30 22:45 牛皮糖NewPtone Views (1958) Comments (0) Recommendations (0)
Excerpt: 5.6 Transformation-Based Tagging A potential issue with n-gram taggers is the size of their n-gram table (or language model). If tagging is to be employed in a variety of language technologies deployed on mobile computing devices, it is important to strike a balance between model ...
posted @ 2011-08-30 22:40 牛皮糖NewPtone Views (925) Comments (0) Recommendations (0)
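Excerpt: HTMLParser is the Python module for parsing HTML and XHTML files. It picks out the tags, data, and other parts of an HTML document, which makes it a convenient way to process HTML. HTMLParser is event-driven: when it finds a particular piece of markup, it calls a user-defined function to notify the program. Its main callbacks all have names beginning with handle_ and are all methods of HTMLParser. To use it, derive a new class from HTMLParser and override those handle_ methods. Unlike the parser in htmllib, this parser is not based on the SGML parser in the sgmllib module. The htmllib module and sgm...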
posted @ 2011-08-30 13:32 牛皮糖NewPtone Views (5751) Comments (0) Recommendations (0)
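
The event-driven pattern described in the HTMLParser excerpt can be sketched roughly as follows. This is a minimal illustration, not code from the post: it assumes Python 3, where the class lives in `html.parser` (the 2011 post would have used the Python 2 `HTMLParser` module), and the `LinkCollector` class and the sample HTML string are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass: records link targets and visible text."""

    def __init__(self):
        super().__init__()
        self.links = []  # href values found on <a> tags
        self.text = []   # non-blank text chunks, stripped

    def handle_starttag(self, tag, attrs):
        # Called for each opening tag; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only chunks.
        if data.strip():
            self.text.append(data.strip())


parser = LinkCollector()
parser.feed('<p>See the <a href="http://www.nltk.org/">NLTK site</a>.</p>')
parser.close()

print(parser.links)
print(parser.text)
```

Only the callbacks that matter to the task need overriding; the base class versions of the other `handle_` methods do nothing, so unhandled markup is silently skipped.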