朗志工作室(Langzhi Studio)

Career Management


crawler 0.1.0: Python Package Index

crawler 0.1.0

A Python web crawler.

Latest version: 0.1.2

## Example

```python
from crawler.crawler import Crawler

mycrawler = Crawler()
seeds = ['http://www.example.com/']  # list of seed URLs
mycrawler.add_seeds(seeds)
# list of regular expressions for the URLs the crawler will work on
url_patterns = [r'^(.+example\.com)(.+)']
mycrawler.start(url_patterns)  # start crawling
```

## Data files

Crawling generates three Berkeley DB files:

- `queue.db`
- `webpage.db`
- `duplcheck.db`
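The package page does not say how `start()` applies `url_patterns` or what each database file holds. The following is a minimal, self-contained sketch of one plausible design, not the package's actual code. It assumes that a link is followed only if it matches at least one entry in `url_patterns` (tested with `re.match`), and it guesses from the file names that `queue.db` holds URLs waiting to be fetched, `webpage.db` the fetched pages and `duplcheck.db` the URLs already seen. In-memory Python containers stand in for the Berkeley DB files; `FAKE_WEB`, `matches_any()` and `crawl()` are hypothetical names, and `fetch` is stubbed so the example runs without network access.

```python
import re
from collections import deque

# Hypothetical stand-in for the web: URL -> list of outgoing links.
# A real crawler would download each page over HTTP instead.
FAKE_WEB = {
    'http://www.example.com/': ['http://www.example.com/a', 'http://other.org/x'],
    'http://www.example.com/a': ['http://www.example.com/', 'http://www.example.com/b'],
    'http://www.example.com/b': [],
}


def matches_any(url, patterns):
    """Return True if the URL matches at least one pattern (re.match anchors at the start)."""
    return any(re.match(p, url) for p in patterns)


def crawl(seeds, url_patterns, fetch):
    queue = deque(seeds)  # assumed role of queue.db: URLs waiting to be fetched
    webpages = {}         # assumed role of webpage.db: URL -> fetched result (here, its links)
    seen = set(seeds)     # assumed role of duplcheck.db: URLs already queued once
    while queue:
        url = queue.popleft()
        links = fetch(url)
        webpages[url] = links
        for link in links:
            # Follow a link only if it is new and matches one of the URL patterns.
            if link not in seen and matches_any(link, url_patterns):
                seen.add(link)
                queue.append(link)
    return webpages


if __name__ == '__main__':
    patterns = [r'^(.+example\.com)(.+)']
    pages = crawl(['http://www.example.com/'], patterns, lambda u: FAKE_WEB.get(u, []))
    print(sorted(pages))
```

Run directly, the script prints the three example.com URLs; `http://other.org/x` is never fetched because it matches no pattern, and the link back to the seed is skipped by the duplicate check.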

 

posted on 2012-05-03 17:27  lexus  阅读(254)  评论(0编辑  收藏