Scrapy: common issues
How to add a proxy in Scrapy
- Add a new downloader middleware in middlewares.py, with the following code:
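    import random


    class MyProxySpiderMiddleware(object):
        def process_request(self, request, spider):
            """Set a proxy before the request is sent."""
            # IPOOL is the proxy list defined in settings.py
            proxy = random.choice(spider.settings.getlist('IPOOL'))
            request.meta['proxy'] = 'http://' + proxy
            # Returning None lets Scrapy continue processing the request
            return None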
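The effect of the middleware on a single request can be checked without running a crawl. The following is only a minimal sketch: `FakeRequest` and `assign_proxy` are hypothetical names made up for illustration, and `FakeRequest` merely imitates the `meta` dict of a real Scrapy request.

```python
import random


class FakeRequest:
    """Hypothetical stand-in for a Scrapy request; only the meta dict matters here."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


def assign_proxy(request, pool, rng=random):
    """Mimic process_request: pick one proxy from the pool and store it in meta."""
    request.meta['proxy'] = 'http://' + rng.choice(pool)
    return None  # None means "continue handling this request as usual"


req = FakeRequest('https://example.com')
assign_proxy(req, ['223.223.23.216:8085'])
```

With a pool of one entry the choice is deterministic, so `req.meta['proxy']` ends up as the `http://` URL of that entry.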
- Configure the proxy pool in settings.py:
    IPOOL = [
        '223.223.23.216:8085',
        '111.3.118.247:30001',
        '112.14.47.6:52024',
        '118.163.120.181:58837',
        '223.82.60.202:8060',
        '61.216.156.222:60808',
        '122.9.101.6:8888',
        '47.106.105.236:80',
        '121.13.252.62:41564',
    ]
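A proxy pool assembled by hand or from several sources easily picks up repeated or malformed entries, and a repeated entry is chosen more often by `random.choice`. The sketch below shows one possible way to clean a list before assigning it to `IPOOL`; `clean_proxy_pool` is a hypothetical helper, not part of Scrapy, and it only checks the `host:port` shape, not whether the proxy is reachable.

```python
def clean_proxy_pool(entries):
    """Return unique, well-formed 'host:port' entries, preserving their order."""
    seen = set()
    cleaned = []
    for entry in entries:
        host, sep, port = entry.strip().rpartition(':')
        # Skip entries with no host, no separator, or a non-numeric port.
        if not sep or not host or not (port.isascii() and port.isdigit()):
            continue
        # Skip ports outside the valid TCP range.
        if not 0 < int(port) < 65536:
            continue
        key = host + ':' + port
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


raw = [
    '223.82.60.202:8060',
    '223.82.60.202:8060',   # duplicate
    'bad-entry',            # no port
    '1.2.3.4:99999',        # port out of range
    ' 47.106.105.236:80 ',  # surrounding whitespace
]
pool = clean_proxy_pool(raw)
```

Here `pool` keeps only the two usable entries, in their original order.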
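- Enable the middleware in settings.py and give it priority over the other middlewares, that is, a lower number than theirs (for process_request, the lower the number, the earlier the middleware runs):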
    DOWNLOADER_MIDDLEWARES = {
        'ggzy_deal.middlewares.GgzyDealDownloaderMiddleware': 543,
        'ggzy_deal.middlewares.MyProxySpiderMiddleware': 125,
    }
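To make the ordering rule concrete, the sketch below sorts the configured entries by their number, which is the order in which their `process_request` methods would be called. `request_order` is a hypothetical helper written for this note. It is a simplification: Scrapy also merges in its built-in downloader middlewares (`DOWNLOADER_MIDDLEWARES_BASE`), which are ignored here.

```python
DOWNLOADER_MIDDLEWARES = {
    'ggzy_deal.middlewares.GgzyDealDownloaderMiddleware': 543,
    'ggzy_deal.middlewares.MyProxySpiderMiddleware': 125,
}


def request_order(middlewares):
    """Return middleware paths in the order their process_request would run."""
    # An entry set to None is disabled in Scrapy, so it is skipped here.
    enabled = {path: prio for path, prio in middlewares.items() if prio is not None}
    # Lower numbers come first for process_request.
    return sorted(enabled, key=enabled.get)


order = request_order(DOWNLOADER_MIDDLEWARES)
for path in order:
    print(path)
```

The proxy middleware (125) is listed before `GgzyDealDownloaderMiddleware` (543), so the proxy is already set in `request.meta` by the time later middlewares see the request. For `process_response` the order is reversed.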