Nutch crawl efficiency: setting the crawl delay

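fetcher.max.crawl.delay defaults to 30 seconds; here it is lowered to 5 seconds. The property caps the Crawl-Delay from robots.txt that the fetcher will honor: pages on hosts that ask for a longer delay are skipped instead of waited on.
Edit nutch-default.xml (the comments in that file recommend copying overridden entries into nutch-site.xml instead of modifying nutch-default.xml directly):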
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>5</value>
  <description>
  If the Crawl-Delay in robots.txt is set to greater than this value (in
  seconds) then the fetcher will skip this page, generating an error report.
  If set to -1 the fetcher will never skip such pages and will wait the
  amount of time retrieved from robots.txt Crawl-Delay, however long that
  might be.
  </description>
</property>
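To make the trade-off concrete, here is a minimal Python sketch of the rule stated in the property description. The function name crawl_delay_decision and its ("fetch"/"skip", delay) return values are hypothetical; the sketch models the documented behavior only and is not Nutch's actual Java implementation.

```python
from typing import Optional, Tuple


def crawl_delay_decision(robots_delay: Optional[float],
                         max_crawl_delay: float) -> Tuple[str, Optional[float]]:
    """Model the documented fetcher.max.crawl.delay rule (all values in seconds).

    robots_delay:    Crawl-Delay parsed from robots.txt, or None if absent.
    max_crawl_delay: value of fetcher.max.crawl.delay; -1 means never skip.
    Returns ("fetch", delay_to_wait) or ("skip", None).
    """
    # No Crawl-Delay in robots.txt: the cap does not apply. The fetcher's own
    # default politeness delay is used instead (not modeled here).
    if robots_delay is None or robots_delay <= 0:
        return ("fetch", None)
    # -1 disables the cap: always honor robots.txt, however long the delay is.
    if max_crawl_delay < 0:
        return ("fetch", robots_delay)
    # Crawl-Delay greater than the cap: skip the page (Nutch reports an error).
    if robots_delay > max_crawl_delay:
        return ("skip", None)
    # Otherwise wait the requested Crawl-Delay between requests to that host.
    return ("fetch", robots_delay)


if __name__ == "__main__":
    for delay in (None, 3, 10, 60):
        print(f"Crawl-Delay={delay}: "
              f"cap 30 -> {crawl_delay_decision(delay, 30)}, "
              f"cap 5 -> {crawl_delay_decision(delay, 5)}")
```

With the default cap of 30, a host whose robots.txt declares Crawl-Delay: 10 is still fetched, with a 10-second wait between requests; with the cap lowered to 5, that host's pages are skipped. Lowering the value speeds up the crawl by dropping slow hosts, at the cost of coverage for those hosts.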


posted on 2014-09-05 11:20 by 雨渐渐

导航