Distributed crawling with scrapy-redis

1, Distributed crawler: install the dependencies

pip install scrapy
pip install scrapy-redis

2, Example

# Change the spider's base class
CrawlSpider ----------> RedisCrawlSpider
# Comment out start_urls and instead define the redis list the spider reads start URLs from
redis_key = 'myspider:start_urls'   # example key name; any name works
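A minimal spider sketch of the change described above, assuming a placeholder spider name, domain, and link rule (none of these are from the original post); only the base class and redis_key are the essential parts:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule
from scrapy_redis.spiders import RedisCrawlSpider


class MySpider(RedisCrawlSpider):
    name = 'myspider'
    allowed_domains = ['example.com']
    # start_urls is removed; URLs are read from this redis list instead
    redis_key = 'myspider:start_urls'

    rules = (
        Rule(LinkExtractor(allow=r'/page/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # yield plain dicts; RedisPipeline (enabled in settings below) stores them in redis
        yield {'url': response.url, 'title': response.css('title::text').get()}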
# start scrapy_redis settings (add these to the project's settings.py)

DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
ITEM_PIPELINES = {
    'scrapy_redis.pipelines.RedisPipeline': 100,
}
SCHEDULER_PERSIST = True
# redis connection settings
REDIS_HOST = '127.0.0.1'
REDIS_PORT = 6379
REDIS_ENCODING = 'utf-8'
REDIS_PARAMS = {'password': '123456'}
# end scrapy_redis settings
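With SCHEDULER_PERSIST = True, the request queue and dupefilter set are kept in redis after the spider closes, so a crawl can be paused and resumed, and several machines pointed at the same redis instance share the same queue.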

3, Start the project

scrapy crawl xxxx
# in redis-cli: push a start URL onto the list named by the spider's redis_key
lpush redis_key start_url
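The push can also be done from Python with redis-py; a sketch assuming the placeholder key and URL used above and the connection details from the settings:

import redis

# connection details mirror the REDIS_* settings above
r = redis.Redis(host='127.0.0.1', port=6379, password='123456')
# the key must match the spider's redis_key; the URL is a placeholder
r.lpush('myspider:start_urls', 'https://example.com/')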