
Scrape JavaScript pages with PhantomJS

PhantomJS

http://www.phantomjs.org/



PhantomJS is a command-line tool based on WebKit. It can execute JavaScript and can be used for testing web-based applications, web scraping, page capture, PDF conversion, SVG rendering, and many other use cases.



A PhantomJS script file looks like this:


console.log('Hello, world!');
phantom.exit();






It's a good tool for scraping dynamic pages that rely on JavaScript/AJAX. To extract data from a site, people familiar with JavaScript can write a script file using PhantomJS's API and scrape the pages directly. Others can use PhantomJS with a simple JavaScript file that opens the pages and writes their contents to a pipe or to files, then use other tools or programming languages to parse and scrape the result.
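The second approach (open a page and write its contents to a pipe) might look roughly like the sketch below. It is not taken from the PhantomJS examples. The file name dump.js and the helpers usage and targetUrl are hypothetical names chosen for illustration, and the script assumes a PhantomJS version that provides the webpage and system modules.

```javascript
// dump.js: a minimal sketch, assuming it is run as
//   phantomjs dump.js <url> > page.html

// Hypothetical helper: builds the message shown when no URL is given.
function usage(scriptName) {
    return 'Usage: phantomjs ' + scriptName + ' <url>';
}

// Hypothetical helper: picks the target URL out of the argument list.
// args[0] is the script name, args[1] is the first user argument.
function targetUrl(args) {
    return args.length > 1 ? args[1] : null;
}

// Run the scraping part only when the global `phantom` object exists,
// i.e. when the file is executed by PhantomJS.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var page = require('webpage').create();
    var url = targetUrl(system.args);

    if (!url) {
        console.log(usage(system.args[0]));
        phantom.exit(1);
    } else {
        page.open(url, function (status) {
            if (status !== 'success') {
                // Note: this message also goes to stdout; check the exit code.
                console.log('Failed to load ' + url);
                phantom.exit(1);
            } else {
                // page.content holds the page HTML at this point,
                // including changes made by scripts that have already run.
                console.log(page.content);
                phantom.exit(0);
            }
        });
    }
}
```

The output can then be piped into any other parser. Content that a page loads by AJAX after the load event may not be present yet when the callback fires, so a real script may need to wait (for example with window.setTimeout) before reading page.content.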



Some examples are available here: http://code.google.com/p/phantomjs/wiki/QuickStart

posted on 2012-03-11 17:44 by lexus