Post archive for January 19, 2019 - 哈喽哈喽111111

January 19, 2019

Scrapy in practice: scraping data from http://quotes.toscrape.com

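Summary: Things to learn: 1. The overall workflow of the Scrapy framework and the purpose of each project file. 2. Storing scraped data in MongoDB from within a Scrapy project. 3. Extracting the next-page link and calling back into the same function to fetch more data. Key point: take the next-page link from the current page and pass it back to the same function so that it keeps issuing requests. next = response.css('.pager .next a:: Read more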

posted @ 2019-01-19 18:18 哈喽哈喽111111 Views (2905) Comments (0) Recommendations (1)

1. Scrapy introductory tutorial

Summary: Reposted from https://scrapy-chs.readthedocs.io/zh_CN/latest/intro/tutorial.html. This tutorial assumes that Scrapy is already installed. It uses the Open Directory Project (dmoz) as the example site to crawl. Read more

posted @ 2019-01-19 17:55 哈喽哈喽111111 Views (445) Comments (0) Recommendations (0)

2. Scrapy command-line tool

Summary: Reposted from https://scrapy-chs.readthedocs.io/zh_CN/latest/topics/commands.html. Scrapy is controlled through the scrapy command-line tool, referred to here as the "Scrapy tool" to distinguish it from its sub-commands, which are called "commands" or "Scrapy commands". Scrapy tool ... Read more

posted @ 2019-01-19 17:51 哈喽哈喽111111 Views (293) Comments (0) Recommendations (0)

3. Using selectors in Scrapy

Summary: Official sample source: <html> <head> <base href='http://example.com/' /> <title>Example website</title> </head> <body> <div id='images'> <a href='image1.html'>Name: My Read more

posted @ 2019-01-19 17:48 哈喽哈喽111111 Views (416) Comments (0) Recommendations (0)