Scrapy: closing the spider once a count is reached

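# At the top of the spider module:
from bs4 import BeautifulSoup

# Excerpt from the MmzzSpider class. `ebot` is a helper object defined elsewhere.
    def __init__(self, *args, **kwargs):
        super(MmzzSpider, self).__init__(*args, **kwargs)  # this call is the key step
        self.count = 0  # rows added since the last save
        self.total = 0  # rows added overall; never reset

    def parse_item(self, response):
        soup = BeautifulSoup(response.text, "lxml")
        for itema in soup.select(".job-list-item"):
            link = itema.select_one("a")
            href = link.get("href") if link is not None else None
            if href is None:
                continue  # skip entries without a link
            uu = href.split("?")[0]  # drop the query string
            self.count += 1
            self.total += 1
            ebot.add_row([uu])
            if self.count > 1100:
                ebot.save('wangzhi221')  # save periodically, then start a new batch
                self.count = 0
            # The original compared self.count with 100000, but self.count is reset
            # at 1100 and could never get there, so a separate total is checked.
            if self.total > 100000:
                self.crawler.engine.close_spider(self)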
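
The counting logic above can be tried out without running a crawl. The following is a minimal sketch in plain Python, not the author's code: `UrlBatcher`, its fields, and the example.com URLs are hypothetical names made up for illustration, and `flush` only stands in for `ebot.save`. It keeps one running total for the stop condition and a separate buffer for the periodic save, so resetting the batch never hides the overall count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UrlBatcher:
    """Hypothetical helper: buffers URLs, flushes in batches, reports when to stop."""
    batch_size: int = 1100    # flush once this many URLs are buffered
    max_total: int = 100000   # stop once this many URLs were seen overall
    total: int = 0            # never reset; drives the stop condition
    pending: List[str] = field(default_factory=list)
    flushed_batches: List[List[str]] = field(default_factory=list)

    def add(self, url: str) -> bool:
        """Record one URL and return True when the crawl should be closed."""
        self.pending.append(url)
        self.total += 1
        if len(self.pending) >= self.batch_size:
            self.flush()
        return self.total >= self.max_total

    def flush(self) -> None:
        """Move buffered URLs into a finished batch (stand-in for ebot.save)."""
        if self.pending:
            self.flushed_batches.append(self.pending)
            self.pending = []


# Small thresholds so the behaviour is easy to follow.
batcher = UrlBatcher(batch_size=3, max_total=7)
stop_flags = [batcher.add(f"https://example.com/job/{i}") for i in range(7)]
batcher.flush()  # write out the final partial batch before stopping
print(stop_flags)
print([len(b) for b in batcher.flushed_batches])
```

Inside a spider, the point where `add` returns True is where the spider would be closed. Besides calling `self.crawler.engine.close_spider(self)` as in the post, raising `CloseSpider` from `scrapy.exceptions` inside the callback is another commonly used way to stop.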

posted @ 2021-10-19 22:06 myrj
