牛皮糖NewPtone - 博客园

August 30, 2011

Summary: 5.7 How to Determine the Category of a Word. Now that we have examined word classes in detail, we turn to a more basic question: how do we decide what category a word belongs to in the first place? In general, linguists use morphological, syntactic, and semantic clues to determine ...

posted @ 2011-08-30 22:45 牛皮糖NewPtone Views(1958) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (45): 5.6 Transformation-Based Tagging

Summary: 5.6 Transformation-Based Tagging. A potential issue with n-gram taggers is the size of their n-gram table (or language model). If tagging is to be employed in a variety of language technologies deployed on mobile computing devices, it is important to strike a balance between model ...

posted @ 2011-08-30 22:40 牛皮糖NewPtone Views(925) Comments(0) Recommended(0)

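Parsing HTML Pages with the HTMLParser Module

Summary: HTMLParser is Python's module for parsing HTML and XHTML files. It can pick out the tags, data, and other parts of an HTML document, which makes it a convenient way to process HTML. HTMLParser is event-driven: when it finds a particular piece of markup, it calls a user-defined function to notify the program. Its main callbacks all have names beginning with handle_ and are methods of HTMLParser. To use it, derive a new class from HTMLParser and override the handle_ methods you need. Unlike the parser in htmllib, this parser is not based on the SGML parser in the sgmllib module. The htmllib module and sgm...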

posted @ 2011-08-30 13:32 牛皮糖NewPtone Views(5755) Comments(0) Recommended(0)
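
The event-driven pattern described in the HTMLParser summary can be sketched roughly as follows. This is a minimal illustration rather than code from the post: it assumes Python 3, where the module lives at `html.parser` (the post targets Python 2, where it was named `HTMLParser`), and the `LinkCollector` class and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass: collects href values and the text inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []          # href values, in document order
        self.texts = []          # text found inside <a> elements
        self._in_anchor = False  # True while the parser is inside an <a> element

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag; attrs is a list of (name, value) pairs.
        if tag == "a":
            self._in_anchor = True
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_endtag(self, tag):
        # Called for every closing tag.
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Called for text between tags; keep only text inside <a> elements.
        if self._in_anchor and data.strip():
            self.texts.append(data.strip())


# Sample markup invented for this sketch.
parser = LinkCollector()
parser.feed('<p>See <a href="https://www.python.org/">Python</a> and '
            '<a href="https://www.nltk.org/">NLTK</a>.</p>')
parser.close()
print(parser.links)
print(parser.texts)
```

Only the callbacks that are overridden do any work; the base class ignores every other event, so a subclass stays as small as the task requires.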

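August 29, 2011

Index of Study Notes for Natural Language Processing with Python

Summary: About Natural Language Processing with Python. About the book: Natural Language Processing with Python offers a highly accessible introduction to natural language processing, a field that covers a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. In Natural Language Processing with Python (reprint edition), you will learn how to write Python programs that work with large collections of unstructured text. You will also access richly annotated datasets using a comprehensive range of linguistic data structures, and understand the main algorithms for analyzing the content and structure of written communication. Packed with examples and exercises, Natural Language Processing with Python will help you: extract information from unstructured text, either to guess the topic or to identify "named entities"; analyze linguistic structure in text, including parsing and semantic analysis; access popular linguistic databases, including Word...

posted @ 2011-08-29 10:44 牛皮糖NewPtone Views(20557) Comments(12) Recommended(5)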

August 28, 2011

Natural Language Processing with Python Study Notes (44): 5.5 N-Gram Tagging

Summary: 5.5 N-Gram Tagging. Unigram Tagging. Unigram taggers are based on a simple statistical algorithm: for each token, assign the tag that is most likely for that particular token. For example, it will assign the tag JJ to any occurrence of the word frequent, since frequent is used as an adjective (...

posted @ 2011-08-28 21:54 牛皮糖NewPtone Views(5656) Comments(0) Recommended(0)
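
The unigram rule quoted in the summary above (give each token the tag it carries most often) fits in a few lines of plain Python. The sketch below is a stand-in written for illustration, not NLTK's tagger: the function names `train_unigram_tagger` and `tag` are hypothetical, and the three training sentences are toy data invented for the example.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Return a dict mapping each word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_ in sentence:
            counts[word][tag_] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, tokens, default=None):
    """Tag each token with its most likely tag; unseen words get `default`."""
    return [(token, model.get(token, default)) for token in tokens]


# Toy training data (invented): "frequent" appears twice as JJ and once as VB.
training = [
    [("frequent", "JJ"), ("rain", "NN")],
    [("they", "PPSS"), ("frequent", "VB"), ("bars", "NNS")],
    [("a", "AT"), ("frequent", "JJ"), ("visitor", "NN")],
]
model = train_unigram_tagger(training)
result = tag(model, ["a", "frequent", "guest"])
print(result)
```

Because the model looks at each word in isolation, "frequent" is tagged JJ even where it is used as a verb, and a word missing from the training data falls back to the default.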

August 26, 2011

Natural Language Processing with Python Study Notes (43): 5.4 Automatic Tagging

Summary: 5.4 Automatic Tagging. In the rest of this chapter we will explore various ways to automatically add part-of-speech tags to text. We will see that the tag of a word depends on the word and its context within a sentence. For this reason, we will be working with data at the level of (tagged) sentenc...

posted @ 2011-08-26 22:05 牛皮糖NewPtone Views(1366) Comments(2) Recommended(1)

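The random Module, Starting from a Monte Carlo Estimate of Pi

Summary: Computer simulations often need randomly chosen numbers. Starting from a simple application of random numbers, this post briefly introduces Python's random module. Computing pi with the Monte Carlo method. Links: the problem comes from Problem Set 2 of a Python course at Purdue University. Monte Carlo methods are used to simulate complex physical and mathematical systems by repeated random sampling. In simple terms, given a probability, p, that an event will occu...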

posted @ 2011-08-26 11:14 牛皮糖NewPtone Views(7615) Comments(1) Recommended(1)
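
A Monte Carlo estimate of pi with the random module might look like the sketch below. The summary is cut off before it describes the exercise, so the quarter-circle construction used here is an assumption (it is the common textbook version), and the function name `estimate_pi`, the sample count, and the seed are illustrative choices.

```python
import math
import random


def estimate_pi(samples, seed=None):
    """Estimate pi by sampling points uniformly in the unit square.

    The fraction of points that fall inside the quarter circle of radius 1
    approximates pi / 4, so the estimate is four times that fraction.
    """
    rng = random.Random(seed)  # private generator, so the seed stays local
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


# Sample count and seed are arbitrary choices for the sketch.
estimate = estimate_pi(200_000, seed=42)
print(estimate, abs(estimate - math.pi))
```

The error shrinks roughly with the square root of the sample count, so each additional correct digit costs about a hundred times more samples.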

August 25, 2011

Natural Language Processing with Python Study Notes (42): 5.3 Mapping Words to Properties Using Python Dictionaries


posted @ 2011-08-25 22:13 牛皮糖NewPtone Views(3425) Comments(0) Recommended(0)

What's New in Python 2.7

Summary: What's New in Python 2.7. Author: A.M. Kuchling (amk at amk.ca). Release: 2.7.2. Date: August 25, 2011. This article explains the new features in Python 2.7. Python 2.7 was released on July 3, 2010. Numeric handling has been improved in many ways, for both floating-point n...

posted @ 2011-08-25 21:26 牛皮糖NewPtone Views(2309) Comments(0) Recommended(0)

August 24, 2011

Natural Language Processing with Python Study Notes (41): 5.2 Tagged Corpora

Summary: 5.2 Tagged Corpora. Representing Tagged Tokens. By convention in NLTK, a tagged token is represented using a tuple consisting of the token and the tag. We can create one of these special tuples from the standard string representation of a tagged token, using the function str2tuple(): ...

posted @ 2011-08-24 23:22 牛皮糖NewPtone Views(3530) Comments(0) Recommended(0)
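
To make the (token, tag) convention from the summary above concrete without requiring NLTK to be installed, here is a plain-Python stand-in for the conversion that `str2tuple()` performs. The helper `split_tagged_token` is hypothetical and only approximates the idea; it is not NLTK's implementation, and the sample sentence is illustrative.

```python
def split_tagged_token(text, sep="/"):
    """Split a 'word/TAG' string at the last separator into a (word, tag) tuple.

    Strings without a separator come back as (text, None).
    """
    word, found, tag = text.rpartition(sep)
    if not found:
        return (text, None)
    return (word, tag)


# Illustrative sentence in the word/TAG string format.
sentence = "The/AT grand/JJ jury/NN commented/VBD"
tagged = [split_tagged_token(item) for item in sentence.split()]
print(tagged)
```

Splitting at the last separator rather than the first keeps tokens that themselves contain a slash, such as "1/2/CD", intact as ("1/2", "CD").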
