Fixing `yield scrapy.Request(next_url, callback=...)` failing to paginate in Python's Scrapy framework

The incorrect code:


class XXSpider(scrapy.Spider):
    name = 'xxspider'
    allowed_domains = ['https://www.xx.com']   # wrong: this entry is a URL, not a domain
    start_urls = ['https://www.xx.com/ask/highlight/']

The correct code:

class XXSpider(scrapy.Spider):
    name = 'xxspider'
    allowed_domains = ['www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']

The problem here is the `allowed_domains` setting: it must contain bare domain names, not URLs. When an entry includes the scheme (`https://`), the hostname of the follow-up requests never matches it, so Scrapy's offsite filtering silently drops every paginated request.
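A rough stdlib-only sketch of the kind of domain check Scrapy performs (simplified; the real logic lives in Scrapy's `OffsiteMiddleware` and `scrapy.utils.url`) makes the failure mode visible — with a URL in `allowed_domains`, the hostname can never match:

```python
from urllib.parse import urlparse

def is_offsite(url, allowed_domains):
    # Simplified stand-in for Scrapy's offsite check: the request's hostname
    # must equal an allowed domain or be a subdomain of one.
    host = urlparse(url).hostname or ""
    return not any(host == d or host.endswith("." + d) for d in allowed_domains)

# A full URL in allowed_domains makes every request look offsite:
print(is_offsite("https://www.xx.com/ask/highlight/?page=2",
                 ["https://www.xx.com"]))   # True  -> request dropped
# A bare domain lets the same request through:
print(is_offsite("https://www.xx.com/ask/highlight/?page=2",
                 ["www.xx.com"]))           # False -> request crawled
```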

Another situation can also make `yield scrapy.Request()` appear to do nothing:

    Scrapy's duplicate filter has already seen the URL and filters it out, so the request is dropped before the callback ever fires.

Solution:

yield scrapy.Request(next_url, callback=self.parse, dont_filter=True)
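The effect of `dont_filter=True` can be sketched with a toy scheduler (not Scrapy's real dupefilter, which fingerprints whole requests, but the drop-or-enqueue behavior is the same):

```python
# Minimal sketch of duplicate filtering: a repeated URL is silently dropped
# unless dont_filter=True bypasses the seen-set, mirroring scrapy.Request.
class ToyScheduler:
    def __init__(self):
        self.seen = set()
        self.queue = []

    def enqueue(self, url, dont_filter=False):
        if not dont_filter and url in self.seen:
            return False          # duplicate -> dropped, callback never runs
        self.seen.add(url)
        self.queue.append(url)
        return True

s = ToyScheduler()
s.enqueue("https://www.xx.com/ask/highlight/")                  # first visit: queued
print(s.enqueue("https://www.xx.com/ask/highlight/"))           # False -> filtered out
print(s.enqueue("https://www.xx.com/ask/highlight/",
                dont_filter=True))                              # True  -> queued anyway
```

Note that `dont_filter=True` also disables deduplication for that request, so only use it where revisiting the same URL is intended (e.g. a pagination entry point).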

posted @ 2018-08-11 18:42  数据民工