Post archive for June 24, 2023 - jiang_jiayun

June 24, 2023

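Summary: Requests and responses are the most common operations in a crawler. A Request object is created in the spider and passed to the downloader, which executes the request and returns a Response object. A Request object represents an HTTP request; it is usually generated in the spider and executed by the downloader, which produces a Response. Parameters: url (string) - the URL of this request; callback (c Read more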

posted @ 2023-06-24 22:44 jiang_jiayun Views(331) Comments(0) Recommendations(0)
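
The request/response flow summarized in the entry above can be sketched with the standard library alone. This is a toy model, not Scrapy's actual classes: `Request`, `Response`, `download`, `run`, and the `FAKE_PAGES` table are hypothetical names chosen for illustration, and the "downloader" reads from memory instead of the network. In real Scrapy code the spider would yield `scrapy.Request(url, callback=...)` and the engine would call the callback with the response.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Request:
    """Toy stand-in for a request: a URL plus the callback that handles the reply."""
    url: str
    callback: Callable[["Response"], Iterable[dict]]
    meta: Dict[str, object] = field(default_factory=dict)


@dataclass
class Response:
    """Toy stand-in for what the downloader hands back to the spider."""
    url: str
    body: str
    request: Request


# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_PAGES = {"https://example.com/": "<h1>Hello</h1>"}


def download(request: Request) -> Response:
    """Plays the downloader's role: execute the request and build a Response."""
    return Response(url=request.url, body=FAKE_PAGES[request.url], request=request)


def parse(response: Response) -> Iterable[dict]:
    """Spider callback: receives the Response and yields scraped data."""
    yield {"url": response.url, "length": len(response.body)}


def run(start_requests: List[Request]) -> List[dict]:
    """Minimal engine loop: download each request, then invoke its callback."""
    items: List[dict] = []
    for request in start_requests:
        response = download(request)
        items.extend(request.callback(response))
    return items


items = run([Request(url="https://example.com/", callback=parse)])
print(items)
```

The point of the sketch is the hand-off: the spider only describes what to fetch and who handles the result, while a separate component performs the fetch.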

Using CrawlSpider in Scrapy (Part 2)

Summary: Extracting links with LinkExtractor. Create the spider: scrapy genspider <spider name> <domain> -t crawl. spider: from scrapy.linkextractors import LinkExtractor from scrapy.spiders import CrawlSpider, Read more

posted @ 2023-06-24 19:52 jiang_jiayun Views(30) Comments(0) Recommendations(0)
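
To give a rough idea of what link extraction with an `allow` pattern does, here is a standard-library sketch. It only loosely mirrors the behaviour of Scrapy's `LinkExtractor` and is not its implementation: `extract_links`, `_AnchorCollector`, the sample HTML, and the `/book/<n>.html` URL layout are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url, allow):
    """Return absolute, de-duplicated links whose URL matches the `allow` regex."""
    collector = _AnchorCollector()
    collector.feed(html)
    pattern = re.compile(allow)
    links = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if pattern.search(url) and url not in links:
            links.append(url)
    return links


# Hypothetical page: two chapter links and one link we do not want.
html = (
    '<a href="/book/1.html">1</a>'
    '<a href="/about">About</a>'
    '<a href="/book/2.html">2</a>'
)
links = extract_links(html, "https://example.com/", allow=r"/book/\d+\.html")
print(links)
```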

Using CrawlSpider in Scrapy (Part 1)

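Summary: Create a CrawlSpider: scrapy genspider -t crawl <spider name> (allowed_url). The Rule object: the Rule class and the CrawlSpider class both live in the scrapy.contrib.spiders module (scrapy.spiders in Scrapy 1.0 and later). class scrapy.contrib.spiders.Rule( lin Read more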

posted @ 2023-06-24 19:17 jiang_jiayun Views(60) Comments(0) Recommendations(0)
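
The way rules drive a crawl can be illustrated with a small standard-library simulation. It is a sketch under stated assumptions, not Scrapy's `Rule` or `CrawlSpider`: the `Rule` dataclass here, the `SITE` link graph, and `parse_item` are hypothetical. It does borrow two behaviours documented for Scrapy: when `follow` is not given it defaults to true only if the rule has no callback, and when several rules match a link the first one wins.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """Toy rule: a URL pattern, an optional callback, and a follow flag."""
    allow: str
    callback: Optional[Callable[[str], dict]] = None
    follow: Optional[bool] = None

    def should_follow(self) -> bool:
        # With no explicit flag, follow links only when there is no callback.
        if self.follow is None:
            return self.callback is None
        return self.follow


# Hypothetical site: each path maps to the links found on that page.
SITE = {
    "/": ["/list/1", "/about"],
    "/list/1": ["/item/1", "/item/2", "/list/2"],
    "/list/2": ["/item/3"],
    "/item/1": [], "/item/2": [], "/item/3": [], "/about": [],
}


def parse_item(url: str) -> dict:
    """Callback for item pages; a real spider would parse the response here."""
    return {"item": url}


RULES = [
    Rule(allow=r"^/list/\d+$"),                        # listing pages: follow only
    Rule(allow=r"^/item/\d+$", callback=parse_item),   # item pages: scrape only
]


def crawl(start: str) -> list:
    seen, items = {start}, []
    queue = deque([(start, True)])
    while queue:
        url, follow = queue.popleft()
        if not follow:
            continue
        for link in SITE[url]:
            if link in seen:
                continue
            for rule in RULES:
                if re.search(rule.allow, link):  # first matching rule wins
                    seen.add(link)
                    if rule.callback:
                        items.append(rule.callback(link))
                    queue.append((link, rule.should_follow()))
                    break
    return items


items = crawl("/")
print(items)
```

Note that `/about` is never visited because no rule matches it, which is how rules keep a crawl confined to the pages of interest.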

Scrapy data-saving example: saving a novel

Summary: spider: import scrapy class XiaoshuoSpider(scrapy.Spider): name = "<spider name>" allowed_domains = ["<domain>"] start_urls = ["<URL of the first chapter>"] def parse(self, response): # Read more

posted @ 2023-06-24 19:02 jiang_jiayun Views(179) Comments(0) Recommendations(0)
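
The summary above shows only the spider side. One plausible way to save the chapters is an item pipeline, sketched below. Scrapy pipelines are plain classes exposing `open_spider`, `process_item`, and `close_spider`, so the sketch needs no Scrapy import to run. The class name `NovelTextPipeline`, the output file name, and the `title`/`content` item fields are assumptions, not taken from the original post.

```python
import os
import tempfile


class NovelTextPipeline:
    """Item pipeline sketch: write every chapter item into one UTF-8 text file."""

    def __init__(self, path="novel.txt"):
        self.path = path  # hypothetical output location
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        self.file.write(f"{item['title']}\n{item['content']}\n\n")
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close.
        self.file.close()


# Drive the pipeline by hand, the way Scrapy would during a crawl.
out_path = os.path.join(tempfile.mkdtemp(), "novel.txt")
pipeline = NovelTextPipeline(out_path)
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "Chapter 1", "content": "Once upon a time."}, spider=None)
pipeline.close_spider(spider=None)

with open(out_path, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```

In a real project the class would be enabled through the `ITEM_PIPELINES` setting; opening the file once per crawl rather than once per item avoids repeated open/close overhead.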

Scrapy_ImagePipeline: saving images

Summary: Create a project: scrapy startproject myfrist (project_name). Create a spider: scrapy genspider <spider name> <spider URL>. Pillow must be installed: pip install pillow. Error: twisted.python.failure.Failure Open Read more

posted @ 2023-06-24 18:51 jiang_jiayun Views(118) Comments(0) Recommendations(0)
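
For orientation, enabling Scrapy's built-in images pipeline usually comes down to a couple of settings plus an item that carries the image URLs. The fragment below is a minimal sketch of that configuration: the `images` directory and the sample URL are assumptions, and `example_item` is a plain dict used only to show the expected field names.

```python
# settings.py (fragment)

# Enable the built-in images pipeline; the number sets its order among pipelines.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are stored (hypothetical path).
IMAGES_STORE = "images"

# Shape of an item the pipeline understands by default:
# it reads URLs from "image_urls" and fills "images" with the download results.
example_item = {
    "image_urls": ["https://example.com/cover.jpg"],  # hypothetical URL
    "images": [],
}
```

The pipeline depends on Pillow, which is why the post lists `pip install pillow` as a prerequisite.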
