Web Scraping - Post Category - iUpoint

Demo: scraping the images from a URL and saving them locally

Summary: import requests from lxml import etree from furl import furl url = 'https://dsd.com' html = requests.get(url).text #re.findall('"objURL":"(.*?)",',htm …

posted @ 2021-11-30 14:46 iUpoint
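A minimal sketch of the demo's idea, assuming the goal is to collect every img src on a page and save the files locally. The post uses requests, lxml and furl; this version swaps in the standard library (html.parser, urllib) so it runs without third-party packages. The helper names and the "images" folder are hypothetical. With lxml, etree.HTML(html).xpath('//img/@src') returns the raw src values that extract_image_urls collects here.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Turn relative paths such as "/a.jpg" into absolute URLs
                self.srcs.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the absolute URLs of all images referenced in html."""
    parser = ImgSrcParser(base_url)
    parser.feed(html)
    return parser.srcs


def save_images(urls, out_dir="images", timeout=10):
    """Download each URL into out_dir, naming files after the last path segment."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for url in urls:
        name = Path(urlparse(url).path).name or "unnamed"
        with urlopen(url, timeout=timeout) as resp:
            (target / name).write_bytes(resp.read())
```

Usage: save_images(extract_image_urls(html, url)), where html is the page source fetched from url. Files are named after the last URL path segment, so two images with the same file name overwrite each other.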

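Using XPath in lxml to efficiently extract text and tag attribute values

Summary: Reposted from: Using XPath in lxml to efficiently extract text and tag attribute values # The goal of scraping a web page is simply to locate a node in the DOM tree and then take its text or attribute values myPage = '''<html> <title>TITLE</title> <body> <h1>My Blog</h1> <div>My Articles</div> <d …

posted @ 2021-11-30 12:54 iUpoint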
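The post's point is that scraping boils down to locating a DOM node and then reading its text or attributes. The sketch below illustrates that with xml.etree.ElementTree from the standard library, which supports only a subset of XPath and needs well-formed markup; the page content is made up. The post itself uses lxml, where the equivalents are tree.xpath('//h1/text()') and tree.xpath('//a/@href') on tree = etree.HTML(page), and etree.HTML also tolerates messy real-world HTML.

```python
import xml.etree.ElementTree as ET

# A small, well-formed page in the spirit of the post's myPage (content is hypothetical)
MY_PAGE = """<html>
  <title>TITLE</title>
  <body>
    <h1>My Blog</h1>
    <div>My Articles</div>
    <div id="photos"><img src="pic1.jpeg"/><span>PIC1 is beautiful!</span></div>
    <div id="links"><a href="https://example.com/a">Link A</a><a href="https://example.com/b" class="hot">Link B</a></div>
  </body>
</html>"""

root = ET.fromstring(MY_PAGE)

# Text of the first node matching a path
title = root.findtext("title")
heading = root.findtext(".//h1")

# Attribute values: locate the nodes, then read the attribute with .get()
hrefs = [a.get("href") for a in root.findall(".//div[@id='links']/a")]

# Filter nodes with an attribute predicate
hot = root.find(".//a[@class='hot']")
hot_text = hot.text if hot is not None else None

# All text under a node, including its descendants
photo_text = "".join(root.find(".//div[@id='photos']").itertext())
```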

Scraper example

Summary: import requests class MyRequests: # initializer def __init__(self): # request headers self.headers = {"X-Lemonban-Media-Type": "lemonban.v2"} # attributes # methods post/put.. json= …

posted @ 2021-06-03 09:06 iUpoint
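The excerpt wraps HTTP calls in a MyRequests class that stores shared headers and handles several methods (post/put and so on) with a json= body. A sketch of that pattern using urllib.request instead of requests, so it can be checked without a network; the build/send split and the json_body parameter are assumptions, not the post's code. With requests, the single call requests.request(method, url, json=body, headers=self.headers) covers the same ground.

```python
import json
from urllib.request import Request, urlopen


class MyRequests:
    """Thin wrapper that attaches shared headers to every request (stdlib sketch)."""

    def __init__(self, headers=None):
        # Shared request headers; the post's example sends a custom media-type header
        self.headers = {"X-Lemonban-Media-Type": "lemonban.v2"}
        if headers:
            self.headers.update(headers)

    def build(self, method, url, json_body=None):
        """Create a Request for any HTTP method, serializing json_body if given."""
        headers = dict(self.headers)
        data = None
        if json_body is not None:
            data = json.dumps(json_body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        return Request(url, data=data, headers=headers, method=method.upper())

    def send(self, method, url, json_body=None, timeout=10):
        """Send the request and decode the JSON response body."""
        with urlopen(self.build(method, url, json_body), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

Note that urllib normalizes header names with str.capitalize(), so the stored keys read "X-lemonban-media-type" and "Content-type"; HTTP header names are case-insensitive, so servers see the same header.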

Selenium notes

Summary: Selenium element location methods. Locate an element by id: find_element_by_id("id_value"). Locate an element by name: find_element_by_name("name_value"). Locate an element by tag_name: find_element_by_tag_name("tag_nam …

posted @ 2021-02-23 17:02 iUpoint
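The find_element_by_* helpers listed in the post were deprecated in Selenium 4 and removed in Selenium 4.3; current code calls driver.find_element(By.ID, "id_value") with By imported from selenium.webdriver.common.by. Below is a hypothetical helper that maps the old method names to the new (method, strategy, value) form, handy when porting old notes. The strategy strings are the values of the By constants (By.ID == "id", By.CSS_SELECTOR == "css selector"), so they can be passed to find_element directly.

```python
import re

# Legacy suffix -> locator strategy string (equal to selenium's By.* values)
STRATEGIES = {
    "id": "id",
    "name": "name",
    "tag_name": "tag name",
    "class_name": "class name",
    "xpath": "xpath",
    "css_selector": "css selector",
    "link_text": "link text",
    "partial_link_text": "partial link text",
}


def to_selenium4(legacy_call, value):
    """Translate ("find_element_by_id", "kw") into ("find_element", "id", "kw")."""
    m = re.fullmatch(r"(find_elements?)_by_(\w+)", legacy_call)
    if m is None or m.group(2) not in STRATEGIES:
        raise ValueError(f"not a legacy locator method: {legacy_call}")
    return m.group(1), STRATEGIES[m.group(2)], value
```

Usage: method, by, value = to_selenium4("find_element_by_id", "kw"), then getattr(driver, method)(by, value), which is equivalent to driver.find_element(By.ID, "kw").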

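Python scraping - asynchronous multitasking

Summary: An asynchronous scraper that downloads images in batches. The file download links are dead, so do not run it as-is. # asynchronous batch download import aiohttp import asyncio import time async def job(session, url): # declared as a coroutine name = url.split('/')[-1] # get the file name …

posted @ 2020-12-08 16:01 iUpoint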
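The excerpt downloads images concurrently with aiohttp and asyncio, deriving each file name from url.split('/')[-1]. A minimal sketch of that structure, assuming a semaphore is wanted to cap concurrent downloads (the limit is not in the excerpt). fetch is a hypothetical async callable so the sketch runs without aiohttp or network access; with aiohttp it could be a closure over one shared aiohttp.ClientSession that does async with session.get(url) as resp: return await resp.read().

```python
import asyncio


async def job(fetch, url, sem):
    """Download one URL under the semaphore and return (file name, bytes)."""
    name = url.split("/")[-1]  # file name = last path segment, as in the post
    async with sem:  # cap the number of downloads in flight
        data = await fetch(url)
    return name, data


async def download_all(urls, fetch, limit=5):
    """Run all jobs concurrently and collect their results."""
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(job(fetch, u, sem) for u in urls))
```

Results come back in input order because asyncio.gather preserves the order of its arguments; writing each (name, data) pair to disk is left to the caller.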

Python scraping - using proxy IPs

Summary: Python scraping - using proxy IPs import sys import time import hashlib import requests import urllib3 from lxml import etree urllib3.disable_warnings(urllib3.except …

posted @ 2020-12-03 13:48 iUpoint
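The post routes requests through a proxy IP. One way to do that with only the standard library is an opener built from urllib.request.ProxyHandler; the address below is a placeholder, not a working proxy. With requests, the equivalent is passing proxies={"http": proxy_url, "https": proxy_url} to requests.get. Both forms assume an HTTP proxy that also tunnels HTTPS via CONNECT.

```python
from urllib.request import ProxyHandler, build_opener


def make_proxy_opener(proxy):
    """Build an opener that routes http and https traffic through one proxy.

    proxy is "host:port", e.g. "127.0.0.1:8888" (a placeholder address).
    """
    proxy_url = f"http://{proxy}"
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    return build_opener(handler)
```

Usage: opener = make_proxy_opener("127.0.0.1:8888"), then opener.open("http://example.com/", timeout=10) sends the request through the proxy.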

Scraping proxy IPs and checking that they work

Summary: Scraping proxy IPs # -*- coding: utf-8 -*- """ Created on Thu Aug 13 17:30:36 2020 @author: Administrator """ # generate usable proxy IPs # Python version 2.7 import sys import time import r …

posted @ 2020-10-27 15:40 iUpoint
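Scraped free proxies are often dead, so the post checks each one before use. A sketch of one common approach, written for Python 3 rather than the post's Python 2.7: a cheap format check, then a real request through the proxy with a timeout, run in parallel threads. The test URL, the timeout and the "status 200 means usable" criterion are assumptions; filter_usable accepts a check function so it can be exercised offline.

```python
import ipaddress
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from urllib.request import ProxyHandler, build_opener


def is_well_formed(proxy):
    """Cheap pre-filter: "a.b.c.d:port" with a valid IPv4 address and port."""
    host, sep, port = proxy.rpartition(":")
    if not sep or not port.isdigit() or not 0 < int(port) < 65536:
        return False
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        return False
    return True


def proxy_works(proxy, test_url="http://example.com/", timeout=5):
    """Network check: the proxy is usable if it relays a request within timeout."""
    proxy_url = f"http://{proxy}"
    opener = build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, HTTPException):  # URLError, timeouts, resets, bad responses
        return False


def filter_usable(proxies, check=proxy_works, workers=20):
    """Drop malformed entries, then check the rest in parallel, keeping order."""
    candidates = [p for p in proxies if is_well_formed(p)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, candidates))
    return [p for p, ok in zip(candidates, results) if ok]
```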

Multithreaded scraper

Summary: Scraping Qiushibaike # uses the threading library import threading # queue from Queue import Queue # parsing library from lxml import etree # request handling import requests # JSON handling import json import time cla …

posted @ 2019-12-09 15:02 iUpoint
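The excerpt combines threading, a queue, lxml and requests to scrape Qiushibaike pages. Note that from Queue import Queue is the Python 2 module name; in Python 3 the module is queue. A minimal producer/consumer sketch of the same pattern, with hypothetical fetch and parse callables standing in for the HTTP request and the lxml parsing, and one None sentinel per worker to stop the threads.

```python
import threading
from queue import Queue  # Python 3 name; Python 2 code uses "from Queue import Queue"


def crawl(pages, fetch, parse, num_workers=4):
    """Fetch and parse pages with worker threads fed from a shared queue."""
    tasks = Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            page = tasks.get()
            if page is None:  # sentinel: no more work for this thread
                tasks.task_done()
                break
            try:
                items = parse(fetch(page))
                with lock:  # list.extend from several threads, guarded explicitly
                    results.extend(items)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for page in pages:
        tasks.put(page)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after all real pages
    for t in threads:
        t.join()
    return results
```

Because workers finish in any order, the returned list is not in page order; sort it or collect (page, item) pairs if order matters.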

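Scraping Shanghai and Shenzhen A-share data

Summary: First get the stock codes from East Money (东方财富网), then download each stock's historical data from NetEase Finance (网易财经). References: "Scraping: fetching historical stock trading data"; "Scraping the East Money stock information site"; "Python scraping (5): scraping East Money financial statements 100x faster than Selenium" …

posted @ 2019-08-19 11:05 iUpoint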
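The post first collects stock codes and then downloads per-stock history. A sketch of a small step in between, assuming the codes arrive as six-digit strings and need to be split by exchange, since quote sources commonly key on it. The prefix rules are simplified: Shanghai A shares start with 6, Shenzhen A shares with 0 or 3; B shares, other exchanges and funds are ignored. The function names are hypothetical.

```python
import re


def exchange_of(code):
    """Return "SH", "SZ" or None for a six-digit A-share code (simplified rules)."""
    if not re.fullmatch(r"[0-9]{6}", code):
        return None
    if code.startswith("6"):  # Shanghai: 600/601/603/605 main board, 688 STAR Market
        return "SH"
    if code.startswith(("0", "3")):  # Shenzhen: 000/001/002/003, 300/301 ChiNext
        return "SZ"
    return None  # e.g. B shares (900xxx, 200xxx) are not handled


def group_by_exchange(codes):
    """De-duplicate codes (keeping first-seen order) and bucket them by exchange."""
    groups = {"SH": [], "SZ": []}
    for code in dict.fromkeys(codes):
        ex = exchange_of(code)
        if ex:
            groups[ex].append(code)
    return groups
```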

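Scraping resources

Summary: Python scraping from beginner to giving up (series); Python scraping basics: using the Selenium library; Python learning guide; selenium.webdriver; chromedriver INDEX; url encode/decode; FontEditor, JSON validation, Unicode encoding conversion, regExr; web scraping tutor…

posted @ 2019-04-25 10:36 iUpoint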
