Post category - scrapy

Summary: 1 … Read more
posted @ 2023-08-25 22:05 严永富 Views(2) Comments(0) Recommendations(0)
Summary: import scrapy from scrapy.linkextractors import LinkExtractor # import the link extractor class ErshoucheSpider(scrapy.Spider): name = 'ershouche' allowed_domains = ['che… Read more
posted @ 2023-08-25 22:04 严永富 Views(6) Comments(0) Recommendations(0)
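The preview above cuts off mid-line, so below is a minimal sketch of how LinkExtractor is typically used inside a plain scrapy.Spider to follow links from a listing page. The domain, the XPath restriction, and the parse_detail callback are hypothetical illustrations, not recovered from the full post.

import scrapy
from scrapy.linkextractors import LinkExtractor  # the link extractor imported in the preview

class ErshoucheSpider(scrapy.Spider):
    name = 'ershouche'
    allowed_domains = ['example.com']           # hypothetical; the real domain is truncated in the preview
    start_urls = ['https://www.example.com/']   # hypothetical listing page

    def parse(self, response):
        # pull detail-page links out of the listing page and follow each one
        le = LinkExtractor(restrict_xpaths='//ul[@class="list"]')  # hypothetical XPath
        for link in le.extract_links(response):
            yield scrapy.Request(link.url, callback=self.parse_detail)

    def parse_detail(self, response):
        # placeholder field extraction
        yield {'title': response.xpath('//h1/text()').get()}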
Summary: import scrapy from scrapy.linkextractors import LinkExtractor # import the link extractor class ErshoucheSpider(scrapy.Spider): # if a subclass is unhappy with a method the parent class provides, just override it , cookies= No… Read more
posted @ 2023-08-25 22:04 严永富 Views(80) Comments(0) Recommendations(0)
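The preview's comment ("if a subclass is unhappy with a method the parent class provides, just override it") together with the truncated cookies= argument suggests the post overrides start_requests to attach cookies to the seed requests. A minimal sketch under that assumption; the URL and cookie values are placeholders:

import scrapy

class ErshoucheSpider(scrapy.Spider):
    name = 'ershouche'
    start_urls = ['https://www.example.com/']  # hypothetical

    def start_requests(self):
        # override the parent's start_requests so every seed request carries cookies
        cookies = {'sessionid': 'xxxx'}  # placeholder values
        for url in self.start_urls:
            yield scrapy.Request(url, cookies=cookies, callback=self.parse)

    def parse(self, response):
        pass  # the real parsing logic is behind the Read more link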
Summary: import scrapy from scrapy.linkextractors import LinkExtractor # import the link extractor class ErshoucheSpider(scrapy.Spider): # if a subclass is unhappy with a method the parent class provides, just override it , cookies= No… Read more
posted @ 2023-08-25 22:04 严永富 Views(119) Comments(0) Recommendations(0)
Summary: # Define here the models for your spider middleware ## See documentation in: # https://docs.scrapy.org/en/latest/topics/spider-middleware.html from scrap… Read more
posted @ 2023-08-25 22:04 严永富 Views(4) Comments(0) Recommendations(0)
Summary: # enable the downloader middlewares DOWNLOADER_MIDDLEWARES = { # the smaller the number, the earlier it runs 'qiche.middlewares.QicheDownloaderMiddleware': 544, 'qiche.middlewares.QicheDownloaderMiddleware2': 545… Read more
posted @ 2023-08-25 22:04 严永富 Views(1) Comments(0) Recommendations(0)
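For context, that fragment belongs in settings.py; per the Scrapy docs, a smaller number places the middleware closer to the engine, so its process_request runs earlier and its process_response runs later:

# settings.py: enable the downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    # smaller number = closer to the engine: process_request fires first
    'qiche.middlewares.QicheDownloaderMiddleware': 544,
    'qiche.middlewares.QicheDownloaderMiddleware2': 545,
}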
Summary: Ctrl+C or Ctrl+D, pressed repeatedly and firmly Read more
posted @ 2023-08-25 22:04 严永富 Views(3) Comments(0) Recommendations(0)
Summary: middlewares.py # Define here the models for your spider middleware ## See documentation in: # https://docs.scrapy.org/en/latest/topics/spider-middleware.… Read more
posted @ 2023-08-25 22:04 严永富 Views(24) Comments(0) Recommendations(0)
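Since the body is truncated, here is a minimal middlewares.py sketch of a downloader middleware that swaps in a random User-Agent. The class name matches the DOWNLOADER_MIDDLEWARES setting shown two posts above; the User-Agent pool and the header-swapping logic are assumptions, not the post's actual code:

import random

USER_AGENTS = [  # hypothetical UA pool
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
]

class QicheDownloaderMiddleware:
    def process_request(self, request, spider):
        # runs for every outgoing request; returning None passes it along the chain
        request.headers['User-Agent'] = random.choice(USER_AGENTS)
        return None

    def process_response(self, request, response, spider):
        # runs for every response on its way back to the spider
        return response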
Summary: 1. Manually create a directory (folder), hover over it and right-click Open in Terminal 2. Create the project: scrapy startproject <project-name> 3. Enter the project: cd <project-name> 4. Create the spider: scrapy genspider <name> <domain> 5. You may need to edit start_urls… Read more
posted @ 2023-08-25 22:04 严永富 Views(7) Comments(0) Recommendations(0)
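Spelled out as commands (angle brackets are placeholders matching the steps above; the final crawl command is an assumed follow-up, not part of the preview):

scrapy startproject <project-name>    # step 2: create the project
cd <project-name>                     # step 3: enter the project
scrapy genspider <name> <domain>      # step 4: create the spider
# step 5: open the generated spider file and adjust start_urls if needed
scrapy crawl <name>                   # run it (assumed follow-up step)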
Summary: # Define your item pipelines here ## Don't forget to add your pipeline to the ITEM_PIPELINES setting # See: https://docs.scrapy.org/en/latest/topics/ite… Read more
posted @ 2023-08-16 15:56 严永富 Views(10) Comments(0) Recommendations(0)
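The pipeline body is cut off, so below is a minimal pipelines.py sketch under common assumptions: the class name is hypothetical, and writing items to a JSON-lines file is one typical choice, not necessarily the post's.

import json

class QichePipeline:  # hypothetical name; register it in ITEM_PIPELINES as the preview's header says
    def open_spider(self, spider):
        # runs once when the spider starts
        self.f = open('items.jsonl', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # runs for every yielded item; must return the item for later pipelines
        self.f.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        # runs once when the spider closes
        self.f.close()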
Summary: # Windows install, method 1: # pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scrapy # Issue 1: timeout, rerun the command above a few times # Issue 2: build error, you may need to install VC++ 14.0; a download link follows in the post, install it and rerun… Read more
posted @ 2023-08-16 15:56 严永富 Views(32) Comments(0) Recommendations(0)
Summary: import scrapy # clear clears the terminal class XiaoSpider(scrapy.Spider): name = 'xiao' # the spider's name allowed_domains = ['4399.com'] # allowed domains start_urls = ['https://www.4399.c… Read more
posted @ 2023-08-16 15:55 严永富 Views(17) Comments(0) Recommendations(0)
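A runnable sketch of where that preview appears to be headed: a minimal spider over 4399.com. The start URL is completed hypothetically (the preview truncates it) and the XPath selectors are assumptions:

import scrapy

class XiaoSpider(scrapy.Spider):
    name = 'xiao'                            # the spider's name
    allowed_domains = ['4399.com']           # allowed domain
    start_urls = ['https://www.4399.com/']   # hypothetical completion of the truncated URL

    def parse(self, response):
        # hypothetical selectors; the post's real ones are behind the Read more link
        for a in response.xpath('//ul/li/a'):
            yield {
                'name': a.xpath('./text()').get(),
                'url': response.urljoin(a.xpath('./@href').get()),
            }

Run it with scrapy crawl xiao -o games.json to dump the items to a file.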