Scrapy: Setting a Proxy IP and Verifying It
1. Reference article for proxy setup
https://blog.csdn.net/qq_42712552/article/details/88906955
2. Configure the proxy in middlewares.py
Find the xxx_DownloaderMiddleware downloader middleware class. Since my project is named scrapy_sample, the class is ScrapySampleDownloaderMiddleware. Set the proxy in its process_request method, as shown below:
# middlewares.py -- note that base64 must be imported at the top of the file
import base64


class ScrapySampleDownloaderMiddleware:

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader middleware.
        abuyun_proxy = "http://xxxxx.com:9020"
        proxy_user = b"Hxxxxxxxxx"
        proxy_pass = b"48xxxxxxxx"
        # Build the Basic auth credential for the proxy
        proxyAuth = "Basic " + base64.b64encode(proxy_user + b":" + proxy_pass).decode()

        # request.meta is a dict; point the request at the proxy endpoint
        request.meta['proxy'] = abuyun_proxy
        request.headers['Proxy-Authorization'] = proxyAuth
        request.headers["Connection"] = "close"
        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None
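As a design note, the proxy endpoint and credentials do not have to be hardcoded in the middleware. Below is a minimal sketch of reading them from settings.py through Scrapy's from_crawler hook; the setting names PROXY_URL, PROXY_USER and PROXY_PASS are my own placeholders, not something Scrapy defines.

# Sketch only: PROXY_URL / PROXY_USER / PROXY_PASS are hypothetical custom
# settings you would add to settings.py yourself.
import base64


class ScrapySampleDownloaderMiddleware:

    def __init__(self, proxy_url, proxy_user, proxy_pass):
        self.proxy_url = proxy_url
        # Pre-compute the Basic auth value once instead of on every request
        self.proxy_auth = "Basic " + base64.b64encode(
            proxy_user.encode() + b":" + proxy_pass.encode()
        ).decode()

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the middleware; values come from settings.py
        return cls(
            proxy_url=crawler.settings.get("PROXY_URL"),
            proxy_user=crawler.settings.get("PROXY_USER"),
            proxy_pass=crawler.settings.get("PROXY_PASS"),
        )

    def process_request(self, request, spider):
        request.meta["proxy"] = self.proxy_url
        request.headers["Proxy-Authorization"] = self.proxy_auth
        request.headers["Connection"] = "close"
        return None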
3. Activate the downloader middleware
In settings.py, find DOWNLOADER_MIDDLEWARES and enable the middleware as shown below:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_sample.middlewares.ScrapySampleDownloaderMiddleware': 543,
}
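The number 543 is the middleware's priority: in Scrapy, lower values sit closer to the engine and higher values closer to the downloader. If only certain spiders should go through the proxy, the same setting can also be applied per spider via custom_settings. A small sketch, assuming a hypothetical spider named sample in this project:

# Sketch: enable the proxy middleware for one spider instead of globally.
# The spider name "sample" and parse logic are placeholders.
import scrapy


class SampleSpider(scrapy.Spider):
    name = "sample"
    start_urls = ["https://www.cnblogs.com/MrHSR/p/16386803.html"]

    # custom_settings overrides settings.py for this spider only
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_sample.middlewares.ScrapySampleDownloaderMiddleware": 543,
        },
    }

    def parse(self, response):
        self.logger.info("Fetched %s via proxy %s",
                         response.url, response.request.meta.get("proxy"))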
4. Verification
Make a request with the scrapy shell command:
F:\python_work\scrapy_sample> scrapy shell https://www.cnblogs.com/MrHSR/p/16386803.html

# Inspect the request headers
In [1]: request.headers
Out[1]:
{b'Proxy-Authorization': b'Basic xxxxx',
 b'Connection': b'close',
 b'Accept': b'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 b'Accept-Language': b'en',
 b'User-Agent': b'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
 b'Accept-Encoding': b'gzip, deflate'}

# Inspect the meta info
In [2]: request.meta['proxy']
Out[2]: 'http://xxxxx.com:9020'
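Besides the shell, you can also confirm the proxy is actually used by requesting a service that echoes the caller's IP from inside a spider. A minimal sketch, assuming the middleware above is enabled and httpbin.org/ip is reachable (any similar echo service works):

# Sketch: ask an echo service which IP it sees; if the proxy works,
# the reported IP should be the proxy's, not your own.
import json

import scrapy


class ProxyCheckSpider(scrapy.Spider):
    name = "proxy_check"
    start_urls = ["https://httpbin.org/ip"]

    def parse(self, response):
        origin = json.loads(response.text).get("origin")
        self.logger.info("Outgoing IP seen by the server: %s", origin)
        self.logger.info("Proxy set on the request: %s",
                         response.request.meta.get("proxy"))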