Post archive for August 14, 2022 - 机械猿

August 14, 2022

Summary: Solving the blocking problem: import asyncio async def request(url): print("requesting url", url) print('request succeeded') return url # calling a function defined with async returns a coroutine object c = request("www.baidu.com") # ...
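The excerpt above stops right after the coroutine object is created. As a minimal sketch of what typically comes next, the coroutine can be handed to an event loop to be run. The `asyncio.sleep` call is an assumption standing in for real network I/O, and `kind` and `result` are illustrative names that are not part of the original post.

```python
import asyncio


async def request(url):
    print("requesting url:", url)
    # Assumption: asyncio.sleep stands in for a real, non-blocking network call.
    await asyncio.sleep(0.1)
    print("request succeeded")
    return url


# Calling an async function does not execute its body;
# it only returns a coroutine object.
c = request("www.baidu.com")
kind = type(c).__name__
print(kind)

# asyncio.run starts an event loop, runs the coroutine to completion,
# and returns the coroutine's return value.
result = asyncio.run(c)
print(result)
```

Nothing inside `request` is printed until `asyncio.run(c)` is called, which is the point the original comment makes about coroutine objects.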

posted @ 2022-08-14 11:42 机械猿 Views (37) Comments (0) Recommended (0)

6. High-performance asynchronous crawling with aiohttp

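Summary: import asyncio # import the coroutine module import time import aiohttp # asynchronous module used in place of the synchronous requests module start = time.time() urls = ['http://XXX', 'http://xxx1'] # list of URLs to request asynchronously async de ...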
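The excerpt is cut off before the request function is defined, so the following is only a sketch of one common way to fetch several URLs concurrently with aiohttp, not the post's own code. The function names `fetch` and `crawl` are hypothetical, and the caller is expected to supply real URLs in place of the placeholders shown in the summary.

```python
import asyncio

import aiohttp


async def fetch(session, url):
    # session.get suspends this coroutine while waiting on the network,
    # so other requests can make progress in the meantime.
    async with session.get(url) as response:
        text = await response.text()
        return url, len(text)


async def crawl(urls):
    # One ClientSession is shared by all requests so connections can be reused.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        # gather runs the coroutines concurrently and keeps results in input order;
        # return_exceptions=True puts a failed request's exception in its slot
        # instead of cancelling the whole batch.
        return await asyncio.gather(*tasks, return_exceptions=True)
```

Running `asyncio.run(crawl(urls))` with a list of reachable URLs returns one entry per URL, either a `(url, length)` tuple or the exception raised for that request. Timing the call with `time.time()` before and after, as the excerpt begins to do, is a simple way to compare it against a sequential version.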

posted @ 2022-08-14 11:05 机械猿 Views (107) Comments (0) Recommended (0)

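4. CAPTCHAs

Summary: 1. CAPTCHAs are an anti-scraping mechanism used by portal sites. (1) Anti-scraping mechanism: CAPTCHA. The crawler recognizes the data in the CAPTCHA image and uses it to simulate a login. (2) Ways to recognize a CAPTCHA: - manual recognition by eye - automatic recognition by a third-party service (recommended) ...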

posted @ 2022-08-14 10:17 机械猿 Views (87) Comments (0) Recommended (0)

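3. Crawler data parsing: XPath

Summary: 1. Environment setup: - pip install lxml 2. How to instantiate an etree object: from lxml import etree (1) Load the source of a local HTML document into an etree object: etree.parse(filePath) (2) Load source data fetched from the internet into an etree object: ...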
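The summary is truncated before it names the call used for source fetched from the internet. As a hedged illustration, lxml's `etree.HTML` is one function that builds a tree from an HTML string. The HTML snippet, class name, and XPath expressions below are made up for the example.

```python
from lxml import etree

# Assumption: this string stands in for page source fetched over the network.
html = """
<html><body>
  <div class="post"><a href="/p/1">First post</a></div>
  <div class="post"><a href="/p/2">Second post</a></div>
</body></html>
"""

# etree.HTML parses an HTML string and returns the root element of the tree.
tree = etree.HTML(html)

# xpath() returns a list; text() selects text nodes and @href selects attribute values.
titles = tree.xpath('//div[@class="post"]/a/text()')
links = tree.xpath('//div[@class="post"]/a/@href')

print(titles)
print(links)
```

For a local file, `etree.parse(filePath)` from the summary plays the same role. Passing `etree.HTMLParser()` as the second argument is a common choice when the file is HTML that is not well-formed XML.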

posted @ 2022-08-14 10:16 机械猿 Views (58) Comments (0) Recommended (0)
