Web Crawlers - Post Category - 孙昌恒

The Scrapy Framework

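Summary: Introduction. Scrapy is an asynchronous processing framework built on Twisted and written in pure Python. Its architecture is clear, its modules are loosely coupled, and it is highly extensible, so it can be adapted flexibly to a wide range of needs. You only need to write a few custom modules to build a crawler. Scrapy depends on Twisted. Installation (on Linux); directory structure; Scrapy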

posted @ 2019-04-24 19:06 孙昌恒 Views (169) Comments (0) Recommendations (0)

XPath Syntax

Summary: Basic expressions; looping

posted @ 2019-04-24 18:25 孙昌恒 Views (675) Comments (0) Recommendations (0)
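The XPath post summary names only its topics (basic expressions and looping). The sketch below illustrates both with `lxml`, a common XPath engine for crawlers (assumed to be installed); the HTML fragment and book data are invented for illustration. The loop pattern selects each `li` node first and then evaluates relative paths starting with `./` against that node.

```python
from lxml import etree

# Hypothetical page fragment used only for illustration.
HTML = """
<html><head><title>Demo</title></head>
<body>
  <ul id="books">
    <li class="book"><a href="/b/1">Python</a><span>39</span></li>
    <li class="book"><a href="/b/2">Scrapy</a><span>45</span></li>
  </ul>
</body></html>
"""

tree = etree.HTML(HTML)

# Basic expressions
title = tree.xpath("//title/text()")[0]               # text() selects text nodes
hrefs = tree.xpath("//ul[@id='books']/li/a/@href")   # @attr selects attribute values
first = tree.xpath("//li[1]/a/text()")[0]            # positions are 1-based
count = int(tree.xpath("count(//li[@class='book'])"))  # count() returns a float

# Looping: select the nodes, then query each one with a relative path ("./...")
books = []
for li in tree.xpath("//li[@class='book']"):
    books.append({
        "name": li.xpath("./a/text()")[0],
        "price": int(li.xpath("./span/text()")[0]),
    })
```

Note that `//li[1]` means "every `li` that is the first `li` child of its parent", not "the first `li` in the document"; the latter is written `(//li)[1]`.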

The BeautifulSoup Module

Summary: from bs4 import BeautifulSoup html_doc = """ The Dormouse's story asdf The Dormouse's story总共 f Once upon a time there were three little sisters; and their names were ...

posted @ 2019-04-24 18:23 孙昌恒 Views (222) Comments (0) Recommendations (0)
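The BeautifulSoup excerpt starts from the "three sisters" sample document used in the Beautiful Soup documentation. Below is a self-contained version of that kind of example, assuming `beautifulsoup4` is installed and using Python's built-in `html.parser` backend. The HTML is abbreviated, and the queries are common ones rather than necessarily the ones used in the full post.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

title = soup.title.string                    # text of the first <title> tag
links = soup.find_all("a", class_="sister")  # class_ because "class" is a Python keyword
names = [a.get_text() for a in links]
urls = [a["href"] for a in links]            # attributes are read like dict keys
link3 = soup.find(id="link3")                # search by attribute value
story = soup.select_one("p.story")           # CSS selector search
```

`find_all` returns every match and `find` only the first; `select`/`select_one` accept CSS selectors, which is often the shortest way to express nested conditions.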

The requests Module

Summary: Methods; parameters; responses

posted @ 2019-04-24 18:22 孙昌恒 Views (451) Comments (0) Recommendations (0)
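The requests summary lists methods, parameters, and the response object. The following is a minimal sketch of those pieces, assuming the `requests` package is installed; the `httpbin.org` URLs are a public echo service chosen for illustration, and `fetch` is a hypothetical helper. Building a `requests.Request` and calling `.prepare()` shows how `params`, `data`, and `json` end up in the outgoing request without sending anything.

```python
import requests

URL = "https://httpbin.org/get"  # assumed test endpoint


def fetch(url, keyword):
    """Hypothetical helper: GET with query params, headers and a timeout."""
    resp = requests.get(
        url,
        params={"q": keyword},                   # appended to the URL as ?q=...
        headers={"User-Agent": "Mozilla/5.0"},  # some sites block the default UA
        timeout=10,                              # seconds; requests waits forever by default
    )
    resp.raise_for_status()  # raise an HTTPError on 4xx/5xx status codes
    # Other response attributes: resp.status_code, resp.text (str),
    # resp.content (bytes), resp.encoding, resp.headers, resp.cookies
    return resp.json()


# Inspect requests offline: prepare() builds the final URL, headers and body.
get_req = requests.Request("GET", URL, params={"q": "python requests", "page": 2}).prepare()
post_req = requests.Request("POST", "https://httpbin.org/post", data={"user": "a", "pwd": "b"}).prepare()
json_req = requests.Request("POST", "https://httpbin.org/post", json={"id": 1}).prepare()
```

`data=` sends a form-encoded body and `json=` sends a JSON body; each sets the matching `Content-Type` header automatically.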

Web Crawling Essentials

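Summary: The three steps of a crawler: 1. download the page source; 2. parse the source; 3. save the data. Request headers: user-agent, referer, host, cookie, and special request headers (inspect the previous request to find their values). Request body: raw data; raw data + token; ciphertext (either work out the encryption algorithm, or reuse the ciphertext directly).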

posted @ 2019-04-24 18:16 孙昌恒 Views (131) Comments (0) Recommendations (0)
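The three crawler steps and the request headers from the summary above can be sketched with the standard library alone (`urllib.request`, `html.parser`, `json`). This is an assumed illustration rather than the post's own code: the URL, header values, and output file name are placeholders, and the parser only collects links as a stand-in for whatever data a real crawler would extract.

```python
import json
import urllib.request
from html.parser import HTMLParser

# Placeholder header values; copy real ones from the browser's developer tools.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Referer": "https://example.com/",
    "Cookie": "sessionid=PLACEHOLDER",
}


def download(url):
    """Step 1: download the page source, sending the custom request headers."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


class LinkParser(HTMLParser):
    """Step 2: parse the source, collecting text and href of every <a href=...>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({"text": "".join(self._text).strip(), "href": self._href})
            self._href = None


def parse(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def save(items, path):
    """Step 3: save the data, here as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)


# Usage (needs network access): save(parse(download("https://example.com/")), "links.json")
```

Keeping the three steps in separate functions means the parser can be tested on a saved page without hitting the site again.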

Jimmy's Blog

