Python Natural Language Processing Study Notes (15): 2.7 Further Reading

When reposting, please credit the source, “一块努力的牛皮糖”: http://www.cnblogs.com/yuxc/

I am new to this, so please point out anything in the translation that is off. Many thanks.

2.7 Further Reading

Extra materials for this chapter are posted at http://www.nltk.org/ , including links to freely available resources on the Web. The corpus methods are summarized in the Corpus HOWTO, at http://www.nltk.org/howto , and documented extensively in the online API documentation.

Significant sources of published corpora are the Linguistic Data Consortium (LDC) and the European Language Resources Association (ELRA). Hundreds of annotated text and speech corpora are available in dozens of languages. Non-commercial licenses permit the data to be used in teaching and research. For some corpora, commercial licenses are also available (but for a higher fee).

These and many other language resources have been documented using OLAC Metadata, and can be searched via the OLAC home page at http://www.language-archives.org/. The Corpora List (see http://gandalf.aksis.uib.no/corpora/sub.html ) is a mailing list for discussions about corpora; you can find resources by searching the list archives or by posting to the list. The most complete inventory of the world’s languages is Ethnologue, http://www.ethnologue.com/ . Of the world’s roughly 7,000 languages, only a few dozen have substantial digital resources suitable for use in NLP.

This chapter has touched on the field of Corpus Linguistics. Other useful books in this area include (Biber, Conrad, & Reppen, 1998), (McEnery, 2006), (Meyer, 2002), (Sampson & McCarthy, 2005), and (Scott & Tribble, 2006). For further reading on quantitative data analysis in linguistics, see (Baayen, 2008), (Gries, 2009), and (Woods, Fletcher, & Hughes, 1986).

The original description of WordNet is (Fellbaum, 1998). Although WordNet was originally developed for research in psycholinguistics, it is now widely used in NLP and Information Retrieval. WordNets are being developed for many other languages, as documented at http://www.globalwordnet.org/ . For a study of WordNet similarity measures, see (Budanitsky & Hirst, 2006).
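To give a feel for what a WordNet similarity measure computes, here is a minimal sketch of one of the simplest, path-based similarity: two concepts score 1 / (d + 1), where d is the length of the shortest path between them in the is-a (hypernym) hierarchy. NLTK's synsets offer a `path_similarity` method built on this idea; the sketch below does not use NLTK or real WordNet data. The taxonomy and the helper names (`HYPERNYM`, `neighbours`, `shortest_path_length`) are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept maps to its parent (hypernym).
# These labels are illustrative only, not real WordNet synset names.
HYPERNYM = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "whale": "animal",
    "right_whale": "whale",
    "minke_whale": "whale",
    "tortoise": "animal",
    "plant": "organism",
    "novel": "entity",
}


def neighbours(concept):
    """Concepts one is-a edge away: any children plus the parent."""
    result = [child for child, parent in HYPERNYM.items() if parent == concept]
    parent = HYPERNYM[concept]
    if parent is not None:
        result.append(parent)
    return result


def shortest_path_length(a, b):
    """Breadth-first search over the is-a graph, ignoring edge direction."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {a!r} and {b!r}")


def path_similarity(a, b):
    """1 / (shortest path length + 1), so identical concepts score 1.0."""
    return 1.0 / (shortest_path_length(a, b) + 1)


for other in ["minke_whale", "tortoise", "novel"]:
    print(other, round(path_similarity("right_whale", other), 3))
```

Run as a script, this prints 0.333 for minke_whale, 0.25 for tortoise and 0.167 for novel: the fewer is-a links separating two concepts, the higher the score. Real WordNet nouns can have more than one hypernym, which is why the search treats the taxonomy as a general graph rather than a tree. The measures compared by Budanitsky & Hirst refine this basic idea in various ways.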

Other topics touched on in this chapter were phonetics and lexical semantics; for these, we refer readers to Chapters 7 and 20 of (Jurafsky & Martin, 2008).

posted @ 2011-08-05 21:26 牛皮糖NewPtone