Post category - Python crawlers

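Session techniques explained: cookie, session, and token

Summary: Why user session techniques exist: as we all know, HTTP is a stateless protocol, so the server does not remember earlier requests on its own. In other words, the second time you visit a web application from the same browser, the application does not know you have been there before. User session techniques emerged to solve this. User session techniques, cookie part. How it works: the browser sends its first request to the server, the server creates a Cookie, which ... Read more

posted @ 2021-01-08 14:08 peng_li Views (249) Comments (0) Recommended (0)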
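
The cookie round trip described in the summary above can be sketched with the standard library alone. This is a minimal illustration, not the post's own code: the cookie name `sessionid` and the value `abc123` are made up, and no real network traffic is involved.

```python
from http.cookies import SimpleCookie

# Server side: on the first request, create a cookie and emit it
# as the value of a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["sessionid"] = "abc123"  # hypothetical name and value
server_cookie["sessionid"]["path"] = "/"
set_cookie_header = server_cookie["sessionid"].OutputString()

# Browser side: store the cookie, then send it back in the
# Cookie request header on later visits.
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_header)
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_jar.items()
)

# Server side again: parse the Cookie header to recognise the returning visitor.
returned = SimpleCookie()
returned.load(cookie_header)
visitor_id = returned["sessionid"].value

print(set_cookie_header)
print(cookie_header)
print(visitor_id)
```

Because the cookie travels with every later request, the server can link those requests to the first visit even though HTTP itself keeps no state.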

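The Requests crawling package and parsing tools: xpath, regular expressions, Beautiful Soup

Summary: "Python crawler series" contents: Python Crawlers (1) - Essential Basics; Python Crawlers (2) - The Requests crawling package and the xpath parsing tool; Python Crawlers (3) - The Scrapy crawling framework series: scrapy (1) - Basic usage; scrapy (2) - GET requests; scrapy (3) - POST requests; s ... Read more

posted @ 2021-01-05 17:41 peng_li Views (889) Comments (0) Recommended (0)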

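Python Crawlers (1) - Essential Basics

Summary: "Python crawler series" contents: Python Crawlers (1) - Essential Basics; Python Crawlers (2) - The Requests crawling package and the xpath parsing tool; Python Crawlers (3) - The Scrapy crawling framework series: scrapy (1) - Basic usage; scrapy (2) - GET requests; scrapy (3) - POST requests; s ... Read more

posted @ 2021-01-05 17:33 peng_li Views (355) Comments (0) Recommended (0)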

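scrapy (6) - Using CrawlSpider

Summary: "Python crawler series" contents: Python Crawlers (1) - Essential Basics; Python Crawlers (2) - The Requests crawling package and the xpath parsing tool; Python Crawlers (3) - The Scrapy crawling framework series: scrapy (1) - Basic usage; scrapy (2) - GET requests; scrapy (3) - POST requests; s ... Read more

posted @ 2020-05-26 14:51 peng_li Views (539) Comments (0) Recommended (0)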

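scrapy (5) - Crawling the content of second-level pages

Summary: "Python crawler series" contents: Python Crawlers (1) - Essential Basics; Python Crawlers (2) - The Requests crawling package and the xpath parsing tool; Python Crawlers (3) - The Scrapy crawling framework series: scrapy (1) - Basic usage; scrapy (2) - GET requests; scrapy (3) - POST requests; s ... Read more

posted @ 2020-05-26 13:18 peng_li Views (3753) Comments (0) Recommended (0)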

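scrapy (4) - Passing parameters between requests

Summary: "Python crawler series" contents: Python Crawlers (1) - Essential Basics; Python Crawlers (2) - The Requests crawling package and the xpath parsing tool; Python Crawlers (3) - The Scrapy crawling framework series: scrapy (1) - Basic usage; scrapy (2) - GET requests; scrapy (3) - POST requests; s ... Read more

posted @ 2020-05-26 13:17 peng_li Views (380) Comments (0) Recommended (0)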

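scrapy (3) - POST requests

Summary: "Python crawler series" contents: Python Crawlers (1) - Essential Basics; Python Crawlers (2) - The Requests crawling package and the xpath parsing tool; Python Crawlers (3) - The Scrapy crawling framework series: scrapy (1) - Basic usage; scrapy (2) - GET requests; scrapy (3) - POST requests; s ... Read more

posted @ 2020-05-26 13:15 peng_li Views (319) Comments (2) Recommended (0)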

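scrapy (2) - GET requests

Summary: "Python crawler series" contents: Python Crawlers (1) - Essential Basics; Python Crawlers (2) - The Requests crawling package and the xpath parsing tool; Python Crawlers (3) - The Scrapy crawling framework series: scrapy (1) - Basic usage; scrapy (2) - GET requests; scrapy (3) - POST requests; s ... Read more

posted @ 2020-05-26 13:14 peng_li Views (1522) Comments (0) Recommended (0)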

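Session persistence with requests.session()

Summary: First, why keep a session at all? The session object in the requests library persists certain parameters across requests. Put simply, once you have logged in to a site through a session, later requests to other pages of that site through the same session object reuse, by default, the cookies and other parameters the session used before. Especially when maintaining a logged-in state ... Read more

posted @ 2020-05-19 22:43 peng_li Views (2614) Comments (0) Recommended (0)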
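
The behaviour described in the summary above can be observed without touching the network. The sketch below assumes the `requests` package is installed. The cookie name, its value, the `User-Agent` string, and the `https://example.com/profile` URL are all made up for illustration. In a real crawl the cookie would normally arrive from a login response rather than being set by hand.

```python
import requests

session = requests.Session()

# Parameters stored on the session apply to every request made through it.
session.headers.update({"User-Agent": "demo-crawler/0.1"})  # hypothetical value

# Stand-in for a successful login: in practice the server's Set-Cookie
# response would populate session.cookies automatically.
session.cookies.set("sessionid", "abc123")  # hypothetical cookie

# Build (but do not send) a later request to another page of the same site.
request = requests.Request("GET", "https://example.com/profile")
prepared = session.prepare_request(request)

# The prepared request already carries the session's cookie and header.
print(prepared.headers["Cookie"])
print(prepared.headers["User-Agent"])
```

Calling `session.get(...)` or `session.post(...)` performs the same merge before sending, which is why a logged-in session stays logged in across pages.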

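scrapy (1) - Basic usage

Summary: "Python crawler series" contents: Python Crawlers (1) - Essential Basics; Python Crawlers (2) - The Requests crawling package and the xpath parsing tool; Python Crawlers (3) - The Scrapy crawling framework series: scrapy (1) - Basic usage; scrapy (2) - GET requests; scrapy (3) - POST requests; s ... Read more

posted @ 2020-05-19 18:29 peng_li Views (469) Comments (0) Recommended (0)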

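Handling Python crawler failures caused by network problems or unusable proxies

Summary: When crawling some pages, a request may fail because the network is poor or a proxy in the proxy pool is unusable. In that case we need to repeat the request several times, and Python has ready-made packages for this: 1. We can use the retry module to make multiple attempts and raise an error only if all of them fail. The retry library has to be installed before use, e.g.: Read more

posted @ 2020-05-18 18:25 peng_li Views (1746) Comments (0) Recommended (0)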
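
The retry-until-all-attempts-fail idea in the summary above can be sketched with the standard library. This is not the retry package itself: `simple_retry` and `flaky_fetch` are hypothetical names, and the failing request is simulated so the example runs offline.

```python
import functools
import time


def simple_retry(tries=3, delay=0.0, exceptions=(Exception,)):
    """Hypothetical stand-in decorator: call the function up to `tries` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # every attempt failed, so surface the last error
                    time.sleep(delay)  # wait before the next attempt
        return wrapper
    return decorator


attempts = []


@simple_retry(tries=3, delay=0.0, exceptions=(ConnectionError,))
def flaky_fetch(url):
    """Simulated request that fails twice and then succeeds."""
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = flaky_fetch("http://example.com")
print(result, len(attempts))
```

The third-party retry package exposes a similar `@retry(tries=..., delay=...)` decorator, so the hand-written version is mainly useful for seeing what such a decorator does internally.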

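Using scrapy shell

Summary: What is it? A debugging tool that runs in the terminal and is used to debug scrapy. Install ipython: pip install ipython. Start it: scrapy shell followed by the URL to request. Once inside, response is the response object and can be used directly: response.text response.body response ... Read more

posted @ 2020-05-06 16:58 peng_li Views (237) Comments (0) Recommended (0)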
