2.6 Summary


A text corpus is a large, structured collection of texts. NLTK comes with many corpora, e.g., the Brown Corpus, which is accessed as nltk.corpus.brown.


Some text corpora are categorized, e.g., by genre or topic; sometimes the categories of a corpus overlap each other.
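
As a minimal sketch of what overlapping categories look like, the snippet below models a categorized corpus as a plain dictionary rather than an NLTK corpus reader. The file names, category labels, and the helper names fileids and overlap are all made up for illustration.

```python
# Toy stand-in for a categorized corpus: each (hypothetical) file id maps
# to the set of categories it belongs to. A file may carry several categories.
file_categories = {
    "doc1.txt": {"news"},
    "doc2.txt": {"news", "sports"},
    "doc3.txt": {"sports"},
}


def fileids(category):
    """Return the sorted file ids labeled with the given category."""
    return sorted(f for f, cats in file_categories.items() if category in cats)


def overlap(cat_a, cat_b):
    """Return the sorted file ids that carry both categories."""
    return sorted(set(fileids(cat_a)) & set(fileids(cat_b)))


print(fileids("news"))            # ['doc1.txt', 'doc2.txt']
print(overlap("news", "sports"))  # ['doc2.txt']
```

NLTK's categorized corpus readers offer similar lookups through their categories() and fileids() methods, so the same questions can be asked of a real corpus once its data is installed.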



A conditional frequency distribution is a collection of frequency distributions, each one for a different condition. It can be used for counting word frequencies, given a context or a genre.
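
The idea can be sketched with the standard library alone: one counter per condition. This is only an illustration of the data structure, not NLTK's implementation, and the (genre, word) pairs are hand-made rather than taken from a corpus.

```python
from collections import Counter, defaultdict

# Hand-made (condition, word) pairs; here the condition is a genre label.
pairs = [
    ("news", "the"), ("news", "could"), ("news", "the"),
    ("romance", "could"), ("romance", "could"), ("romance", "the"),
]

# One frequency distribution (a Counter) per condition.
cfd = defaultdict(Counter)
for condition, word in pairs:
    cfd[condition][word] += 1

print(sorted(cfd))              # ['news', 'romance']
print(cfd["news"]["the"])       # 2
print(cfd["romance"]["could"])  # 2
```

nltk.ConditionalFreqDist is built from the same kind of (condition, event) pairs and is indexed the same way, first by condition and then by word.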



Python programs more than a few lines long should be entered using a text editor, saved to a file with a .py extension, and accessed using an import statement.
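
As a rough sketch, a small module might look like the following. The file name text_utils.py and the function plural are illustrative assumptions, and the pluralization rules are deliberately simplistic.

```python
# Contents of a hypothetical file named text_utils.py


def plural(word):
    """Return a rough English plural of a noun (a few simple rules only)."""
    if word.endswith("y"):
        return word[:-1] + "ies"
    if word[-1] in "sx" or word[-2:] in ("sh", "ch"):
        return word + "es"
    return word + "s"


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(plural("fairy"))  # fairies
    print(plural("box"))    # boxes
```

With the file saved in the current directory, the statement from text_utils import plural makes the function available in the interpreter or in another program.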



Python functions permit you to associate a name with a particular block of code, and reuse that code as often as necessary.
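
For instance, a short calculation can be given a name once and then reused on any list of tokens. The function name lexical_diversity and the sample sentence are chosen here purely for illustration.

```python
def lexical_diversity(tokens):
    """Return the ratio of distinct tokens to total tokens."""
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty list
    return len(set(tokens)) / len(tokens)


sentence = "the cat sat on the mat".split()

# The same block of code is reused with different inputs.
print(lexical_diversity(sentence))         # 5 distinct tokens out of 6
print(lexical_diversity(["a", "a", "a"]))  # 1 distinct token out of 3
```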



Some functions, known as “methods,” are associated with an object, and we give the object name followed by a period followed by the method name, like this: x.funct(y), e.g., word.isalpha().
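
A brief illustration of the object.method() pattern, using two built-in string methods on a made-up token list:

```python
tokens = ["Hello", ",", "world", "!", "42"]

# word.isalpha() is a string method: it is True when the string is
# non-empty and every character is a letter.
alphabetic = [word for word in tokens if word.isalpha()]

# lower() is another string method, called the same way.
normalized = [word.lower() for word in alphabetic]

print(alphabetic)  # ['Hello', 'world']
print(normalized)  # ['hello', 'world']
```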



To find out about some variable v, type help(v) in the Python interactive interpreter to read the help entry for this kind of object.
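
In an interactive session, help(v) prints the entry directly. The sketch below instead captures comparable help text as a string with the standard pydoc module, so that it can be inspected in a script. The variable v is just an example list.

```python
import pydoc

v = ["a", "list", "of", "words"]

# In the interactive interpreter you would simply type: help(v)
# Here the help text for v's type is rendered to a plain string instead.
text = pydoc.render_doc(v, renderer=pydoc.plaintext)

print(type(v).__name__)      # list
print(text.splitlines()[0])  # title line of the help entry
print("append" in text)      # True: the entry documents list methods
```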



WordNet is a semantically oriented dictionary of English, consisting of synonym sets—or synsets—and organized into a network.
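
To make the notion of a synset network concrete, here is a hand-written miniature in plain Python. It is not WordNet data and does not use NLTK. The dictionary layout and the helper names hypernym_path and are_synonyms are assumptions made for this sketch.

```python
# Hand-written miniature in the spirit of WordNet; NOT real WordNet data.
# Each synset groups words treated as synonyms and links to a more general synset.
synsets = {
    "car": {"words": {"car", "automobile", "motorcar"}, "hypernym": "motor_vehicle"},
    "motor_vehicle": {"words": {"motor vehicle", "automotive vehicle"}, "hypernym": "vehicle"},
    "vehicle": {"words": {"vehicle"}, "hypernym": None},
}


def hypernym_path(name):
    """Follow hypernym links from a synset up to the most general one."""
    path = [name]
    while synsets[path[-1]]["hypernym"] is not None:
        path.append(synsets[path[-1]]["hypernym"])
    return path


def are_synonyms(word_a, word_b):
    """Return True if both words appear together in some synset."""
    return any({word_a, word_b} <= s["words"] for s in synsets.values())


print(hypernym_path("car"))               # ['car', 'motor_vehicle', 'vehicle']
print(are_synonyms("car", "automobile"))  # True
print(are_synonyms("car", "vehicle"))     # False
```

In NLTK the real resource is reached with from nltk.corpus import wordnet, where wordnet.synsets() looks up the synsets for a word and each synset's hypernyms() method returns its more general synsets.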



Some functions are not available by default, but must be accessed using Python’s import statement.
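
A small example using two standard-library modules; the particular modules and inputs are chosen only for illustration.

```python
# sqrt and Counter are not built in; they become usable only after importing.
import math
from collections import Counter

print(math.sqrt(16))                 # 4.0

letter_counts = Counter("banana")
print(letter_counts.most_common(1))  # [('a', 3)]
```

The first form, import math, requires the module prefix (math.sqrt), while from collections import Counter brings the name in directly.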


