运维爱背锅 - 博客园

July 17, 2023

Summary: [https://pymongo.readthedocs.io/en/stable/examples/high_availability.html#](https://pymongo.readthedocs.io/en/stable/examples/high_availability.html#) Read more

posted @ 2023-07-17 11:55 运维爱背锅 Views(408) Comments(0) Recommended(0)

Basic usage of MongoDB from Python

Summary: Use pymongo; see the official documentation for details. The syntax is largely the same as native MongoDB, so it is easy to pick up... [https://pymongo.readthedocs.io/en/stable/tutorial.html](https://pymongo.readthedocs.io/en/stabl Read more

posted @ 2023-07-17 11:54 运维爱背锅 Views(36) Comments(0) Recommended(0)

Deploying spider projects with Scrapyd and scrapyd-client

Summary: Command reference: [https://github.com/scrapy/scrapyd-client](https://github.com/scrapy/scrapyd-client) [https://scrapyd.readthedocs.io](https://scrapyd.readthedocs Read more

posted @ 2023-07-17 11:49 运维爱背锅 Views(174) Comments(0) Recommended(0)

Crawling HTTP/2 sites with the Scrapy framework

Summary: Scrapy itself ships with support for crawling over HTTP/2: [https://docs.scrapy.org/en/latest/topics/settings.html?highlight=H2DownloadHandler#download-handlers-base](https://docs.scrapy Read more
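
As a minimal sketch of what the linked settings page covers, HTTP/2 downloads can be enabled by pointing the `https` scheme at Scrapy's `H2DownloadHandler` in settings.py. This follows the Scrapy documentation rather than the full post, and it assumes Twisted is installed with its HTTP/2 extras.

```python
# settings.py (sketch; not taken from the full post)
# Assumes the HTTP/2 extras are installed: pip install "Twisted[http2]"

# Route https:// requests through Scrapy's HTTP/2 download handler.
DOWNLOAD_HANDLERS = {
    "https": "scrapy.core.downloader.handlers.http2.H2DownloadHandler",
}
```

Only the `https` key is overridden here; the handlers for other schemes keep their defaults.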

posted @ 2023-07-17 11:47 运维爱背锅 Views(196) Comments(0) Recommended(0)

How to pass arguments to a Scrapy spider at startup

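Summary: **Advanced method:** **General method:** Pass arguments with -a when running the spider ```Bash scrapy crawl <spider name> -a key=values ``` then read them from kwargs in the spider class's `__init__` magic method ```Python class Bang123Spider(RedisCrawlSpid Read more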
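
The `-a` mechanism in this summary can be illustrated with a stand-in that runs without Scrapy installed. Scrapy splits each `-a key=value` pair on the first `=` and hands the result to the spider's constructor as keyword arguments. In the sketch below, `parse_spider_args` and `DemoSpider` are hypothetical names for illustration only; a real spider would subclass `scrapy.Spider` (or, as in the post, a scrapy-redis spider class) and call `super().__init__(*args, **kwargs)`.

```python
# Stand-in sketch: mimics how -a pairs reach a spider's __init__.
# parse_spider_args and DemoSpider are hypothetical, not Scrapy APIs.


def parse_spider_args(pairs):
    """Turn ["key=value", ...] (the values given to -a) into a dict."""
    # Split on the first "=" only, so values may themselves contain "=".
    return dict(pair.split("=", 1) for pair in pairs)


class DemoSpider:
    """Plays the role of a spider class in this sketch."""

    def __init__(self, *args, **kwargs):
        # Arguments passed with -a arrive here as keyword arguments.
        # They are always strings, so convert explicitly when needed.
        self.key = kwargs.get("key")
        self.max_pages = int(kwargs.get("max_pages", "1"))


# Rough equivalent of:
#   scrapy crawl <spider name> -a key=values -a max_pages=5
spider = DemoSpider(**parse_spider_args(["key=values", "max_pages=5"]))
print(spider.key, spider.max_pages)
```

Because every value arrives as a string, numeric or boolean options need an explicit conversion, as `max_pages` shows.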

posted @ 2023-07-17 11:44 运维爱背锅 Views(40) Comments(0) Recommended(0)

Integrating MongoDB in a Scrapy pipeline

Summary: Set the configuration items in settings.py ```Python MONGODB_HOST = "127.0.0.1" MONGODB_PORT = 27017 MONGODB_DB_NAME = "bang123" ``` pipelines.py: ```Python from scrapy.pipeli Read more
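
The pipelines.py listing is cut off in this summary, so the following is only a sketch of one common way to wire the three settings above into a pipeline, not the post's own code. The class name `MongoPipeline` and the choice to store items in a collection named after the spider are assumptions.

```python
# pipelines.py (sketch; MongoPipeline and the per-spider collection
# naming are assumptions, not the post's code)


class MongoPipeline:
    def __init__(self, host, port, db_name):
        self.host = host
        self.port = port
        self.db_name = db_name
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook and exposes settings.py via crawler.settings.
        s = crawler.settings
        return cls(
            host=s.get("MONGODB_HOST", "127.0.0.1"),
            port=int(s.get("MONGODB_PORT", 27017)),
            db_name=s.get("MONGODB_DB_NAME", "bang123"),
        )

    def open_spider(self, spider):
        # Imported here so the module still loads where pymongo is absent.
        from pymongo import MongoClient

        self.client = MongoClient(self.host, self.port)
        self.db = self.client[self.db_name]

    def process_item(self, item, spider):
        # Store each item in a collection named after the spider.
        self.db[spider.name].insert_one(dict(item))
        return item

    def close_spider(self, spider):
        # Release the connection if one was opened.
        if self.client is not None:
            self.client.close()
```

To take effect, the class would also need an entry in `ITEM_PIPELINES` in settings.py, keyed by its import path in the project.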

posted @ 2023-07-17 11:44 运维爱背锅 Views(27) Comments(0) Recommended(0)

Integrating selenium with Scrapy: example of fetching recommended products from the Taobao home page

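Summary: Scrapy's strengths are high efficiency and asynchronous execution, so integrating selenium does not gain much in practice, because selenium is slow. Example: fetching the titles of recommended products on the Taobao home page. Spider class toabao.py ```Python import scrapy from scrapy.http import HtmlRespon Read more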

posted @ 2023-07-17 11:42 运维爱背锅 Views(70) Comments(0) Recommended(0)

The Scrapy-redis component: building a distributed crawler

Summary: Install the package ```Bash pip install -U scrapy-redis ``` settings.py ```Python ##### Scrapy-Redis ##### ### Redis configuration for Scrapy ### # Other default settings are in scrapy_redis.default.py Read more
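
The settings.py listing is truncated in this summary. As a hedged sketch, these are settings commonly used to enable scrapy-redis, following the project's README; the host, port, and pipeline priority are placeholder values, and the full post may use different ones.

```python
# settings.py (sketch; host, port and priority values are placeholders)

# Schedule requests through a queue stored in Redis, shared by all workers.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests with a fingerprint set kept in Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprints when a spider closes, so a crawl can resume.
SCHEDULER_PERSIST = True

# Optionally push scraped items into Redis as well.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}

# Where the Redis server lives (assumed to be a local instance).
REDIS_HOST = "127.0.0.1"
REDIS_PORT = 6379
```

With the scheduler and dupefilter both backed by Redis, several spider processes pointed at the same server share one request queue, which is what makes the crawl distributed.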

posted @ 2023-07-17 11:40 运维爱背锅 Views(83) Comments(0) Recommended(0)

Scrapy's built-in JOBDIR setting for pausing and resuming crawls

Summary: See the official documentation: [https://docs.scrapy.org/en/latest/topics/jobs.html?highlight=JOBDIR#jobs-pausing-and-resuming-crawls](https://docs.scrapy.org/en/latest/topics Read more

posted @ 2023-07-17 11:39 运维爱背锅 Views(619) Comments(0) Recommended(0)

Scrapy CrawlSpider class: usage example

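Summary: A CrawlSpider-type spider automatically finds URLs according to the specified rules and crawls them. Pros: well suited to whole-site crawling and automatic pagination. Cons: passing arguments via meta is relatively difficult, so it only suits cases where all the data can be collected from a single page. ```Python import scrapy from scrapy.http import HtmlRespon Read more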

posted @ 2023-07-17 11:38 运维爱背锅 Views(33) Comments(0) Recommended(0)
