html5lib - Library for working with HTML documents - Google Project Hosting

Python and PHP implementations of an HTML parser based on the WHATWG HTML5 specification, for maximum compatibility with major desktop web browsers. Note that the separate ports are not kept in sync; they are effectively different projects offering similar functionality for their respective languages.

Notes

- Users of the sanitizer must ensure that they serialize with quoted attribute values, to avoid some known script injection holes in older browsers, including IE < 8.
- The Ruby port is currently unmaintained.

Python 0.95 Release

Features

- Parses valid and invalid HTML documents to a tree
- Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup (deprecated) and custom simpletree output formats
- DOM to SAX converter
- Reports parse errors
- Character encoding detection
- Filtering and serializing of trees
- HTML+CSS sanitizer
- Many unit tests

Documentation

- Using html5Lib

Getting help/getting involved

- IRC: the #whatwg channel on the Freenode IRC server
- html5lib-discuss mailing list
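To make the features above concrete, here is a minimal sketch of parsing invalid HTML to an ElementTree, reading the reported parse errors, and serializing with quoted attribute values. It assumes a current html5lib release installed from PyPI, whose API differs in places from the 0.95 release described here (for example, `quote_attr_values="always"` is the string form used by recent versions). The helper names `parse_with_errors` and `serialize_quoted` are hypothetical and exist only for this illustration.

```python
import html5lib


def parse_with_errors(markup):
    """Parse markup into an ElementTree element and return it with the parse errors.

    Hypothetical helper for illustration; not part of html5lib itself.
    """
    parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("etree"))
    root = parser.parse(markup)  # the <html> element, namespaced by default
    # parser.errors holds one entry per parse error found in the input
    return root, parser.errors


def serialize_quoted(root):
    """Serialize a tree back to HTML, always quoting attribute values.

    Assumes a recent html5lib, where quote_attr_values accepts "always".
    """
    return html5lib.serialize(root, tree="etree", quote_attr_values="always")


if __name__ == "__main__":
    # Invalid input: no doctype, unquoted attribute, unclosed tags
    root, errors = parse_with_errors("<p class=note>Hello <b>world")
    print(len(errors), "parse errors reported")
    print(serialize_quoted(root))
```

The parser recovers from the missing doctype and unclosed tags the way a browser would, while still recording each problem in `parser.errors`. Quoting every attribute value on output follows the note above about script injection holes in older browsers.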
posted on 2012-04-16 22:47 by lexus