
Summary

Whalebot is an open-source web crawler. It is intended to be simple, fast, and memory-efficient. It was created as a targeted spider, but you can also use it as a general-purpose crawler.

Current release: 0.02

Current state. Bold - done, normal - TODO

If something is broken or you have an idea, please visit http://groups.google.com/group/whalebot

Usages

  • It was used to collect papers on a target topic from http://citeseerx.ist.psu.edu for my master's degree work
  • Candidates for the logo were collected using whalebot
  • Eating our own dog food (collecting links for the URL parsing benchmark)

Features

  • Simple configuration from command line and text files
  • Start/Stop/Resume fetching sessions
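
To illustrate what start/stop/resume of a fetching session can involve, here is a minimal sketch in Python. It is not Whalebot's code or file format: the `CrawlSession` class, its method names, and the JSON state file are all assumptions made for illustration. The idea is simply that the crawl frontier and the set of already-seen URLs are written to a text file on stop and read back on the next start. Actual page fetching is left out.

```python
import json
import os
import tempfile
from collections import deque


class CrawlSession:
    """Hypothetical resumable crawl session (illustration only, not Whalebot's design)."""

    def __init__(self, state_path, seeds=()):
        self.state_path = state_path
        self.frontier = deque()   # URLs waiting to be fetched, in FIFO order
        self.seen = set()         # every URL ever queued, to avoid duplicates
        if os.path.exists(state_path):
            self._load()          # resume: saved state wins, seeds are ignored
        else:
            for url in seeds:     # fresh start: queue the seed URLs
                self.add(url)

    def add(self, url):
        """Queue a URL unless it has been queued before."""
        if url not in self.seen:
            self.seen.add(url)
            self.frontier.append(url)

    def next_url(self):
        """Return the next URL to fetch, or None when the frontier is empty."""
        return self.frontier.popleft() if self.frontier else None

    def stop(self):
        """Persist the session so that a later run can resume it."""
        state = {"frontier": list(self.frontier), "seen": sorted(self.seen)}
        tmp = self.state_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp, self.state_path)  # swap in the new file in one step

    def _load(self):
        with open(self.state_path, encoding="utf-8") as f:
            state = json.load(f)
        self.frontier = deque(state["frontier"])
        self.seen = set(state["seen"])


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "session.json")
    session = CrawlSession(path, seeds=["http://example.com/a", "http://example.com/b"])
    session.next_url()                    # "fetch" the first seed
    session.add("http://example.com/c")   # a link discovered on that page
    session.stop()

    resumed = CrawlSession(path)          # picks up where the first run stopped
    print(list(resumed.frontier))         # ['http://example.com/b', 'http://example.com/c']
```

Writing to a temporary file and then renaming it means an interrupted save does not leave a half-written state file behind, which matters if the crawler is stopped by killing the process.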
posted on 2012-06-24 08:53 by lexus