Web Scraping - Post Category (Page 2) - 干it的小张

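Air Quality Data Scraping-checkpoint.ipynb

Summary:
- Analysis:
  - 1. Change the query conditions on the page, click the query button, and capture the resulting packets with a packet-capture tool until the packet carrying the air quality data is located.
  - 2. In that packet, the POST request carries a dynamically changing, encrypted request parameter d, and the data returned is also ciphertext.
  - 3. Clicking the query button triggers an ajax request; that request helps us ... Read more

posted @ 2020-02-22 19:02 干it的小张 Views(438) Comments(0) Recommended(0)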

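Scraping Encrypted Data from the China Air Quality Online Monitoring Platform

Summary: Scraping encrypted data from the China Air Quality Online Monitoring Platform - The China Air Quality Online Monitoring and Analysis Platform is a website that collects weather data for major cities across China, including temperature, humidity, PM 2.5, and AQI. The link is https://www.aqistudy.cn/html/city_detail.html and the site displays as follows: all air quality data on this site is based on ... Read more

posted @ 2020-02-22 18:44 干it的小张 Views(501) Comments(0) Recommended(0)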

8. Scraping Pear Video (梨视频) Data 1.py

Summary:
import re
import requests
from lxml import etree
headers = { "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Geck ... Read more

posted @ 2020-02-22 16:26 干it的小张 Views(362) Comments(0) Recommended(0)

#How to avoid selenium detection

Summary:
from selenium import webdriver
from selenium.webdriver import ChromeOptions
from time import sleep
option = ChromeOptions()
option.add_experimental_option ... Read more

posted @ 2020-02-22 16:24 干it的小张 Views(1433) Comments(0) Recommended(0)

Using Headless Google Chrome

Summary:
from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_ ... Read more

posted @ 2020-02-22 16:23 干it的小张 Views(357) Comments(0) Recommended(0)

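Cookie handling, using the selenium module in scraping, action chains, scraping mobile data

Summary:
- Cookie handling
  - Manual handling
    - Capture the cookie with a packet-capture tool and put it into headers
  - Automatic handling
    - session object.
- Proxies
  - Proxy server
    - Forwards requests
  - Apply the proxy ip:port to the get and post methods through proxies = {'http':'ip:port'}
  - Proxy pool (list ... Read more

posted @ 2020-02-22 16:06 干it的小张 Views(286) Comments(0) Recommended(0)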
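
The cookie and proxy notes in the summary above can be sketched roughly as follows. This is a minimal sketch, not the post's own code: the helper names (`manual_cookie_headers`, `pick_proxy`, `fetch_with_session`), the cookie string, and the proxy addresses are placeholders invented for illustration. Only the `proxies = {'http': 'ip:port'}` shape and the use of a session object come from the summary.

```python
import random


def manual_cookie_headers(cookie_string, user_agent="Mozilla/5.0"):
    """Manual handling: put a cookie copied from a packet-capture tool into headers."""
    return {"User-Agent": user_agent, "Cookie": cookie_string}


def pick_proxy(proxy_pool):
    """Pick one 'ip:port' entry at random and wrap it in the proxies dict shape."""
    return {"http": random.choice(proxy_pool)}


def fetch_with_session(url, proxy_pool, headers=None):
    """Automatic handling: a Session keeps cookies set by earlier responses."""
    import requests  # third-party; imported lazily so the helpers above run without it

    session = requests.Session()
    return session.get(url, headers=headers, proxies=pick_proxy(proxy_pool), timeout=10)


# Hypothetical pool entries, for illustration only.
PROXY_POOL = ["10.0.0.1:8080", "10.0.0.2:3128"]
print(pick_proxy(PROXY_POOL))
```

Choosing a proxy at random per request is only one possible rotation policy; a real pool would probably also drop entries that stop responding.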

6. 12306 Simulated Login.py

Summary:
from selenium import webdriver
from time import sleep
from PIL import Image
from selenium.webdriver import ActionChains
from Cjy import Chaojiying_Client
f ... Read more

posted @ 2020-02-20 17:48 干it的小张 Views(189) Comments(0) Recommended(0)

Chaojiying (超级鹰) CAPTCHA recognition source code:

Summary:
import requests
from hashlib import md5
class Chaojiying_Client(object):
    def __init__(self, username, password, soft_id):
        self.username = username
        passw ... Read more

posted @ 2020-02-20 17:21 干it的小张 Views(626) Comments(0) Recommended(0)

5. Action Chains.py: code from the Runoob (菜鸟) online editor

Summary:
from selenium import webdriver
from time import sleep
from selenium.webdriver import ActionChains
bro = webdriver.Chrome(executable_path='chromedriver.ex ... Read more

posted @ 2020-02-20 15:46 干it的小张 Views(267) Comments(0) Recommended(0)

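4. Scraping Dynamically Loaded Data with selenium.py: getting company names and company numbers from the drug administration site

Summary:
from selenium import webdriver
from time import sleep
from lxml import etree
bro = webdriver.Chrome(executable_path='chromedriver.exe')
# Open the URL
bro.get('htt ... Read more

posted @ 2020-02-20 15:29 干it的小张 Views(379) Comments(0) Recommended(0)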

3. Basic selenium Operations.py: scrolling to the bottom of the page

Summary:
from selenium import webdriver
from time import sleep
bro = webdriver.Chrome(executable_path='chromedriver.exe')
bro.get('https://www.jd.com/')
sleep(1)
# ... Read more

posted @ 2020-02-20 15:07 干it的小张 Views(656) Comments(0) Recommended(1)

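Advanced requests module.ipynb, getting cookies, proxy operations, proxy pools, scraping Xici (西刺) free proxy IPs, scraping Xueqiu (雪球网), simulated login to Gushiwen (古诗文网), CAPTCHA recognition, threads (dummy) in multiprocessing, coroutines, multitasking, flask_server, single thread + multitask asynchronous coroutines in scraping

Summary:
- HTTPConnectionPool:
  - Causes:
    - 1. A burst of high-frequency requests in a short time got the IP banned
    - 2. The connections in the http connection pool were exhausted
  - Fixes:
    - 1. Use proxies
    - 2. Add Connection: "close" to headers
- Proxy: a proxy server accepts a request and then forwards it.
  - Anonymity level
    - High anonymity ... Read more

posted @ 2020-02-17 21:59 干it的小张 Views(380) Comments(0) Recommended(0)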
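
The second fix in the summary above, sending `Connection: close` so a connection is released after each response instead of being held in the pool, might look like the sketch below. The helper names `build_headers` and `fetch_once` are hypothetical and the User-Agent value is a placeholder; the post's own code is not visible in the summary.

```python
def build_headers(user_agent="Mozilla/5.0", close_connection=True):
    """Build request headers, optionally asking the server to close the connection."""
    headers = {"User-Agent": user_agent}
    if close_connection:
        # Standard HTTP/1.1 header: do not keep the connection alive after this response.
        headers["Connection"] = "close"
    return headers


def fetch_once(url, proxies=None):
    """Send one GET request with the headers above (requires the requests package)."""
    import requests  # third-party; imported lazily so build_headers works without it

    response = requests.get(url, headers=build_headers(), proxies=proxies, timeout=10)
    return response.text


print(build_headers())
```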

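Data parsing, regex parsing, bs4 parsing, locating tags, xpath parsing, scraping historical PM2.5 data, scraping Three Kingdoms (三国) content, scraping Qiushibaike (糗事百科), scraping pharma data, scraping videos, scraping free resume templates

Summary:
- Data parsing
  - What data parsing is for:
    - It helps us implement focused crawlers
  - Ways to implement data parsing:
    - Regex
    - bs4
    - xpath
    - pyquery
  - The general principle of data parsing
    - Question 1: where is the data that a focused crawler scrapes stored?
      - It is all stored inside the relevant tags and in the attributes of those tags
    - 1. Locate the tag
    - 2 ... Read more

posted @ 2020-02-15 17:53 干it的小张 Views(340) Comments(0) Recommended(0)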
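
As an illustration of the principle in the summary above (the data lives in tags and tag attributes, so first locate the tag), here is a small regex-based sketch using only the standard library. The HTML snippet, the `thumb` class, and the `extract_images` helper are made up for the example. The summary is cut off after step 1, so treating "pull out the attribute values" as the second step is an assumption.

```python
import re

# Made-up page fragment, standing in for a downloaded page.
HTML = """
<div class="thumb"><a href="/article/1"><img src="//pic.example.com/a.jpg" alt="first"></a></div>
<div class="thumb"><a href="/article/2"><img src="//pic.example.com/b.jpg" alt="second"></a></div>
"""

# Locate each <div class="thumb"> block, then capture the img tag's src and alt attributes.
# re.S lets "." match newlines in case a block spans several lines.
IMG_PATTERN = re.compile(
    r'<div class="thumb">.*?<img src="(.*?)" alt="(.*?)">.*?</div>', re.S
)


def extract_images(html):
    """Return one dict per located tag, holding the extracted attribute values."""
    return [{"src": src, "alt": alt} for src, alt in IMG_PATTERN.findall(html)]


for item in extract_images(HTML):
    print(item)
```

bs4 or xpath would express the same two steps with a selector instead of a pattern, which tends to be less brittle when the markup changes.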

Scraping 4K Beauty Pictures Again

Summary:
import requests
import os
from lxml import etree
dirName = "./4kmeimv"
if not os.path.exists(dirName):
    os.mkdir(dirName)
url = "http://pic.netbian.com/4kme ... Read more

posted @ 2020-02-15 11:27 干it的小张 Views(314) Comments(0) Recommended(0)

Scraping 4K beauty pictures is simple too:

Summary:
import os
import re
import requests
from urllib import request
from bs4 import BeautifulSoup
dirName = './美女图片'
if not os.path.exists(dirName):
    os.mkdir(dir ... Read more

posted @ 2020-02-14 15:24 干it的小张 Views(354) Comments(0) Recommended(0)

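What requests does, dynamic parameters, what a UA is, anti-anti-scraping strategies

Summary:
- What requests does: simulates a browser sending requests
- urllib: the predecessor of requests
- Coding workflow for the requests module:
  - Specify the url
  - Send the request:
    - get(url, params, headers)
    - post(url, data, headers)
  - Get the response data
  - Persist the data
- Dynamic parameters: ... Read more

posted @ 2020-02-14 00:53 干it的小张 Views(612) Comments(0) Recommended(0)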
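
A hedged sketch of the dynamic-parameter and UA ideas named in the post above: the query values go into a `params` dict rather than being hard-coded into the URL, and a browser-like `User-Agent` header is sent with the request. The URL, the `query` parameter name, and the helpers `build_query_url` and `search` are assumptions for illustration. `build_query_url` only approximates the URL that `requests.get(url, params=...)` would build.

```python
from urllib.parse import urlencode

# Placeholder UA string; a real one would be copied from a browser.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_query_url(base_url, params):
    """Show roughly what passing params does: encode the dict into the query string."""
    return f"{base_url}?{urlencode(params)}"


def search(base_url, keyword):
    """Send the request with dynamic params and a UA header (requires requests)."""
    import requests  # third-party; imported lazily so build_query_url runs without it

    params = {"query": keyword}  # hypothetical parameter name
    response = requests.get(base_url, params=params, headers=HEADERS, timeout=10)
    return response.text


print(build_query_url("https://www.example.com/search", {"query": "python", "page": 2}))
```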

Setting Proxy IPs for a Scraper

Summary: Proxy site: http://www.goubanjia.com/ Try opening it in a browser: ... Read more

posted @ 2020-02-13 22:32 干it的小张 Views(190) Comments(0) Recommended(0)

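1. Basic Usage of the requests Module.ipynb

Summary:
- What is the requests module?
  - A ready-made Python module for making network requests.
- What is the requests module for?
  - Simulating a browser sending requests
- Installing the requests module:
  - pip install requests
- Coding workflow for the requests module:
  - 1. Specify the url
  - 2. Send the request ... Read more

posted @ 2020-02-13 21:12 干it的小张 Views(321) Comments(0) Recommended(0)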
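
The coding workflow listed above is cut off after step 2. A neighboring post summary on this page gives the full sequence as: specify the url, send the request, get the response data, persist it. A minimal sketch of those four steps follows. The names `default_fetch` and `scrape_page` and the injectable `fetch` argument are illustrative choices, not the notebook's code.

```python
from pathlib import Path


def default_fetch(url):
    """Steps 2 and 3: send a GET request and return the response text."""
    import requests  # third-party; install with: pip install requests

    response = requests.get(url, timeout=10)
    return response.text


def scrape_page(url, out_path, fetch=default_fetch):
    """Run the four-step workflow for one page and return the number of characters saved."""
    # Step 1: the url is specified by the caller.
    # Steps 2 and 3: send the request and get the response data.
    page_text = fetch(url)
    # Step 4: persist the data to a local file.
    Path(out_path).write_text(page_text, encoding="utf-8")
    return len(page_text)
```

Passing `fetch` as an argument keeps the network call replaceable, for instance `scrape_page(url, "page.html", fetch=lambda u: "<html>ok</html>")` exercises the persistence step without any network access.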
