
March 2021 archive

Downloading files with Scrapy: when a list of URLs is passed to the pipeline, how to keep the files in order despite concurrent downloads

Summary: Similar to downloading images. 1. The item needs fixed fields: file_urls = scrapy.Field() and files = scrapy.Field(). 2. Once the file URLs are scraped, pass them to the pipelines via item["file_urls"]. def parse_item(self, response) …

posted @ 2021-03-31 10:53 .Tang  Views (276)  Comments (0)  Recommendations (0)
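A minimal, library-free sketch of the ordering idea in the post above, assuming the goal is for saved file names to follow the order of file_urls. Downloads are simulated with asyncio and random delays so they finish out of order; fake_download and indexed_name are hypothetical helpers, not Scrapy APIs. Tagging each URL with its position before dispatch and sorting by that tag afterwards restores the original order. In a real FilesPipeline, the same index could be carried in Request.meta from get_media_requests and read back in file_path to build names such as 000.pdf.

```python
import asyncio
import random
from pathlib import PurePosixPath
from urllib.parse import urlparse


def indexed_name(index: int, url: str, width: int = 3) -> str:
    """Zero-padded position + the URL's extension, e.g. (2, '.../b.pdf') -> '002.pdf'."""
    suffix = PurePosixPath(urlparse(url).path).suffix
    return f"{index:0{width}d}{suffix}"


async def fake_download(index: int, url: str) -> tuple[int, str]:
    """Hypothetical stand-in for a real fetch; the random delay scrambles completion order."""
    await asyncio.sleep(random.uniform(0, 0.05))
    return index, url


async def download_in_order(urls: list[str]) -> list[str]:
    """Download concurrently, then return file names in the original URL order."""
    tasks = [asyncio.create_task(fake_download(i, u)) for i, u in enumerate(urls)]
    finished = [await done for done in asyncio.as_completed(tasks)]  # completion order
    finished.sort(key=lambda pair: pair[0])  # restore input order via the tagged index
    return [indexed_name(i, u) for i, u in finished]


if __name__ == "__main__":
    urls = [
        "https://example.com/docs/intro.pdf",
        "https://example.com/docs/data.xlsx",
        "https://example.com/docs/notes.txt",
    ]
    print(asyncio.run(download_in_order(urls)))  # prints ['000.pdf', '001.xlsx', '002.txt']
```

Sorting by a tag attached at dispatch time works regardless of how many downloads run at once, because the order no longer depends on which download finishes first.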

Pitfalls when downloading images with Scrapy

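Summary: ptt = r"http[s]*://[a-zA-Z0-9-./]+(?:jpg|jpeg|png)". First scrape the image URLs, then yield them to the pipelines. Define a dedicated image-download pipeline; the three method names in that class are fixed because they override the base class's methods. Pay attention to how the images are named. class DownloadImages …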

posted @ 2021-03-29 17:31 .Tang  Views (220)  Comments (0)  Recommendations (0)
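A small sketch of two steps the post above summarizes: pulling image URLs out of page text with its regex, and choosing file names. The pattern is the post's, except that http[s]* is narrowed to https? (http[s]* would also accept "httpss"). extract_image_urls and image_name are hypothetical helpers, and the naming rule assumes the pitfall is name collisions: two different URLs that both end in logo.png would overwrite each other if only the base name were used. Scrapy's ImagesPipeline avoids this by default by naming files after a SHA1 hash of the URL. If you override its methods (most likely get_media_requests, file_path and item_completed) to get readable names, keep something unique in the name, as below.

```python
import hashlib
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The post's pattern, with http[s]* narrowed to https?.
# It only matches lowercase extensions and URLs built from letters, digits, '-', '.', '/'.
IMG_URL = re.compile(r"https?://[a-zA-Z0-9-./]+(?:jpg|jpeg|png)")


def extract_image_urls(html: str) -> list[str]:
    """Image URLs in the order they appear in the page, duplicates removed."""
    return list(dict.fromkeys(IMG_URL.findall(html)))


def image_name(url: str) -> str:
    """Hypothetical naming rule: 8-hex-digit SHA1 prefix of the URL + its base name.

    The hash keeps same-named files from different URLs apart; the base name
    keeps the result human-readable.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{digest}_{PurePosixPath(urlparse(url).path).name}"


if __name__ == "__main__":
    page = (
        '<img src="https://a.example.com/img/logo.png">'
        '<img src="http://b.example.com/logo.png">'
        '<img src="https://a.example.com/img/logo.png">'
        '<a href="https://a.example.com/about.html">about</a>'
    )
    for url in extract_image_urls(page):
        print(url, "->", image_name(url))
```

Both logo.png URLs keep their base name but get different hash prefixes, so neither file overwrites the other; the repeated img tag and the .html link are dropped.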

A Scrapy project, start to finish

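Summary: 1. Create a Scrapy project: scrapy startproject SpiderAnything 2. Generate a spider with scrapy genspider <spider name> <crawl scope> (e.g. itcash as the name and itcash.cn as the scope); here: scrapy genspider tb 'taobao.com' # Start the spider, or create a .py launcher file. PS: on Windows you can …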

posted @ 2021-03-29 15:17 .Tang  Views (131)  Comments (0)  Recommendations (0)
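For the launcher-file option mentioned in the post above, one common approach is a small run.py inside the project that calls scrapy.cmdline.execute, the same entry point the scrapy command itself uses. The sketch assumes the spider named tb generated above; crawl_argv and launch are hypothetical helper names, and the optional -o flag of scrapy crawl writes scraped items to a feed file.

```python
from typing import Optional


def crawl_argv(spider: str, output: Optional[str] = None) -> list[str]:
    """Build the same argument list you would type: scrapy crawl <spider> [-o <file>]."""
    argv = ["scrapy", "crawl", spider]
    if output is not None:
        argv += ["-o", output]  # export scraped items to a feed file, e.g. items.json
    return argv


def launch(spider: str = "tb", output: Optional[str] = None) -> None:
    """Run the spider as if from the command line.

    Start it from the project directory (the one containing scrapy.cfg) so
    Scrapy picks up the project's settings. cmdline.execute() calls sys.exit()
    when the crawl finishes, so nothing after this call will run.
    """
    from scrapy import cmdline  # imported here so crawl_argv works without Scrapy installed

    cmdline.execute(crawl_argv(spider, output))
```

Saved as run.py with a final line such as launch() or launch("tb", "items.json"), the crawl can then be started with python run.py or from an IDE's run button.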