Distributed crawling with scrapy-redis
1. Distributed crawler: installation
pip install scrapy
pip install scrapy-redis
2. Example
# Change the spider's base class
CrawlSpider  ->  RedisCrawlSpider
# Comment out start_urls; the spider reads its start URLs from a Redis list instead (see the sketch after this block)
redis_key = 'myspider:start_urls'   # name of the Redis list that seeds the spider
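Putting the three changes together, a spider might look like the sketch below; the spider name, allowed domain, link pattern, and parsing logic are placeholder assumptions.

# Minimal RedisCrawlSpider sketch (placeholders marked in comments)
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule
from scrapy_redis.spiders import RedisCrawlSpider

class MySpider(RedisCrawlSpider):            # was: class MySpider(CrawlSpider)
    name = 'myspider'
    allowed_domains = ['example.com']        # placeholder domain
    # start_urls is removed; URLs are popped from this Redis list instead
    redis_key = 'myspider:start_urls'

    rules = (
        Rule(LinkExtractor(allow=r'/detail/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # placeholder extraction logic
        yield {'url': response.url, 'title': response.css('title::text').get()}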
# start of scrapy_redis settings (add these to the project's settings.py)
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
ITEM_PIPELINES = {
'scrapy_redis.pipelines.RedisPipeline': 100,
}
SCHEDULER_PERSIST = True
# Redis connection
REDIS_HOST = '127.0.0.1'
REDIS_PORT = 6379
REDIS_ENCODING = 'utf-8'
REDIS_PARAMS = {'password': '123456'}
# end scrapy_redis settings
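With SCHEDULER_PERSIST = True the request queue and dupefilter stay in Redis between runs, and RedisPipeline serializes each scraped item to JSON and pushes it onto a Redis list (by default '<spider name>:items'), so results can be drained by a separate process. A minimal consumer sketch, assuming the connection settings above and a spider named 'myspider':

import json
import redis

# Connect with the same credentials configured above
r = redis.Redis(host='127.0.0.1', port=6379, password='123456', decode_responses=True)

while True:
    # blpop blocks until an item is available, then returns (key, value)
    _, raw = r.blpop('myspider:items')
    item = json.loads(raw)
    print(item)   # replace with real storage logic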
3. Start the project
scrapy crawl xxxx
# then seed the queue from redis-cli (use the spider's redis_key and a real start URL)
lpush redis_key start_url
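Equivalently, the queue can be seeded from Python with redis-py; the key and URL below are placeholders and must match the spider's redis_key.

import redis

r = redis.Redis(host='127.0.0.1', port=6379, password='123456')
# Push a start URL onto the list the spider is listening on
r.lpush('myspider:start_urls', 'https://example.com/')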