Post category - scrapy

Summary: 1 … Read more
posted @ 2023-08-25 22:05 严永富 Views(2) Comments(0) Recommendations(0)
Summary: import scrapy from scrapy.linkextractors import LinkExtractor # import the link extractor class ErshoucheSpider(scrapy.Spider): name = 'ershouche' allowed_domains = ['che… Read more
posted @ 2023-08-25 22:04 严永富 Views(6) Comments(0) Recommendations(0)
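The preview above cuts off mid-line, so below is a minimal sketch of how LinkExtractor is typically used inside a plain scrapy.Spider to follow links from a listing page. The domain, the XPath restriction, and the parse_detail callback are hypothetical illustrations, not recovered from the full post.

import scrapy
from scrapy.linkextractors import LinkExtractor  # the link extractor imported in the preview

class ErshoucheSpider(scrapy.Spider):
    name = 'ershouche'
    allowed_domains = ['example.com']           # hypothetical; the real domain is truncated in the preview
    start_urls = ['https://www.example.com/']   # hypothetical listing page

    def parse(self, response):
        # pull detail-page links out of the listing page and follow each one
        le = LinkExtractor(restrict_xpaths='//ul[@class="list"]')  # hypothetical XPath
        for link in le.extract_links(response):
            yield scrapy.Request(link.url, callback=self.parse_detail)

    def parse_detail(self, response):
        # placeholder field extraction
        yield {'title': response.xpath('//h1/text()').get()}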
Summary: import scrapy from scrapy.linkextractors import LinkExtractor # import the link extractor class ErshoucheSpider(scrapy.Spider): # if a subclass is unhappy with a method the parent class provides, just override it , cookies= No… Read more
posted @ 2023-08-25 22:04 严永富 Views(80) Comments(0) Recommendations(0)
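The preview's comment ("if a subclass is unhappy with a method the parent class provides, just override it") together with the truncated cookies= argument suggests the post overrides start_requests to attach cookies to the seed requests. A minimal sketch under that assumption; the URL and cookie values are placeholders:

import scrapy

class ErshoucheSpider(scrapy.Spider):
    name = 'ershouche'
    start_urls = ['https://www.example.com/']  # hypothetical

    def start_requests(self):
        # override the parent's start_requests so every seed request carries cookies
        cookies = {'sessionid': 'xxxx'}  # placeholder values
        for url in self.start_urls:
            yield scrapy.Request(url, cookies=cookies, callback=self.parse)

    def parse(self, response):
        pass  # the real parsing logic is behind the Read more link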
Summary: import scrapy from scrapy.linkextractors import LinkExtractor # import the link extractor class ErshoucheSpider(scrapy.Spider): # if a subclass is unhappy with a method the parent class provides, just override it , cookies= No… Read more
posted @ 2023-08-25 22:04 严永富 Views(119) Comments(0) Recommendations(0)
Summary: # Define here the models for your spider middleware ## See documentation in: # https://docs.scrapy.org/en/latest/topics/spider-middleware.html from scrap… Read more
posted @ 2023-08-25 22:04 严永富 Views(4) Comments(0) Recommendations(0)
Summary: # enable the downloader middlewares DOWNLOADER_MIDDLEWARES = { # the smaller the number, the earlier it runs 'qiche.middlewares.QicheDownloaderMiddleware': 544, 'qiche.middlewares.QicheDownloaderMiddleware2': 545… Read more
posted @ 2023-08-25 22:04 严永富 Views(1) Comments(0) Recommendations(0)
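For context, that fragment belongs in settings.py; per the Scrapy docs, a smaller number places the middleware closer to the engine, so its process_request runs earlier and its process_response runs later:

# settings.py: enable the downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    # smaller number = closer to the engine: process_request fires first
    'qiche.middlewares.QicheDownloaderMiddleware': 544,
    'qiche.middlewares.QicheDownloaderMiddleware2': 545,
}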
Summary: Ctrl+C or Ctrl+D, pressed repeatedly and firmly Read more
posted @ 2023-08-25 22:04 严永富 Views(3) Comments(0) Recommendations(0)
Summary: middlewares.py # Define here the models for your spider middleware ## See documentation in: # https://docs.scrapy.org/en/latest/topics/spider-middleware.… Read more
posted @ 2023-08-25 22:04 严永富 Views(24) Comments(0) Recommendations(0)
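Since the body is truncated, here is a minimal middlewares.py sketch of a downloader middleware that swaps in a random User-Agent. The class name matches the DOWNLOADER_MIDDLEWARES setting shown two posts above; the User-Agent pool and the header-swapping logic are assumptions, not the post's actual code:

import random

USER_AGENTS = [  # hypothetical UA pool
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
]

class QicheDownloaderMiddleware:
    def process_request(self, request, spider):
        # runs for every outgoing request; returning None passes it along the chain
        request.headers['User-Agent'] = random.choice(USER_AGENTS)
        return None

    def process_response(self, request, response, spider):
        # runs for every response on its way back to the spider
        return response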
Summary: 1. Manually create a directory (folder), hover over it and right-click Open in Terminal 2. Create the project: scrapy startproject <project-name> 3. Enter the project: cd <project-name> 4. Create the spider: scrapy genspider <name> <domain> 5. You may need to edit start_urls… Read more
posted @ 2023-08-25 22:04 严永富 Views(7) Comments(0) Recommendations(0)
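Spelled out as commands (angle brackets are placeholders matching the steps above; the final crawl command is an assumed follow-up, not part of the preview):

scrapy startproject <project-name>    # step 2: create the project
cd <project-name>                     # step 3: enter the project
scrapy genspider <name> <domain>      # step 4: create the spider
# step 5: open the generated spider file and adjust start_urls if needed
scrapy crawl <name>                   # run it (assumed follow-up step)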
Summary: # Define your item pipelines here ## Don't forget to add your pipeline to the ITEM_PIPELINES setting # See: https://docs.scrapy.org/en/latest/topics/ite… Read more
posted @ 2023-08-16 15:56 严永富 Views(10) Comments(0) Recommendations(0)
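The pipeline body is cut off, so below is a minimal pipelines.py sketch under common assumptions: the class name is hypothetical, and writing items to a JSON-lines file is one typical choice, not necessarily the post's.

import json

class QichePipeline:  # hypothetical name; register it in ITEM_PIPELINES as the preview's header says
    def open_spider(self, spider):
        # runs once when the spider starts
        self.f = open('items.jsonl', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # runs for every yielded item; must return the item for later pipelines
        self.f.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        # runs once when the spider closes
        self.f.close()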
Summary: # Windows install, method 1: # pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scrapy # Issue 1: timeout, rerun the command above a few times # Issue 2: build error, you may need to install VC++ 14.0; a download link follows in the post, install it and rerun… Read more
posted @ 2023-08-16 15:56 严永富 Views(32) Comments(0) Recommendations(0)
Summary: import scrapy # clear clears the terminal class XiaoSpider(scrapy.Spider): name = 'xiao' # the spider's name allowed_domains = ['4399.com'] # allowed domains start_urls = ['https://www.4399.c… Read more
posted @ 2023-08-16 15:55 严永富 Views(17) Comments(0) Recommendations(0)
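A runnable sketch of where that preview appears to be headed: a minimal spider over 4399.com. The start URL is completed hypothetically (the preview truncates it) and the XPath selectors are assumptions:

import scrapy

class XiaoSpider(scrapy.Spider):
    name = 'xiao'                            # the spider's name
    allowed_domains = ['4399.com']           # allowed domain
    start_urls = ['https://www.4399.com/']   # hypothetical completion of the truncated URL

    def parse(self, response):
        # hypothetical selectors; the post's real ones are behind the Read more link
        for a in response.xpath('//ul/li/a'):
            yield {
                'name': a.xpath('./text()').get(),
                'url': response.urljoin(a.xpath('./@href').get()),
            }

Run it with scrapy crawl xiao -o games.json to dump the items to a file.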