Web Crawlers - Post Category - 刘宏缔的架构森林

posted @ 2025-12-06 09:32 刘宏缔的架构森林 Views (6) Comments (0) Recommended (0)

Summary: 1. Code to close browser tabs automatically: def close_browser_tab(): handles = driver.window_handles size = len(handles) if size>5: dest = size-5 for i in range(0,dest): driver.swi…

posted @ 2025-12-06 09:10 刘宏缔的架构森林 Views (29) Comments (0) Recommended (0)
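The summary above is cut off mid-line. A minimal sketch of the same idea follows, assuming a Selenium WebDriver instance is passed in as `driver`. `close_extra_tabs` and its `keep` parameter are hypothetical names, and the limit of 5 mirrors the `size>5` check in the summary. It also assumes `window_handles` lists tabs oldest first, which the WebDriver specification does not guarantee.

```python
def close_extra_tabs(driver, keep=5):
    """Close the oldest tabs until at most `keep` remain; return how many were closed."""
    handles = list(driver.window_handles)  # copy, because closing changes the live list
    excess = len(handles) - keep
    if excess <= 0:
        return 0
    for handle in handles[:excess]:
        driver.switch_to.window(handle)  # close() acts on the currently focused tab
        driver.close()
    # Focus is now on a closed tab; move it to a surviving one (assumed newest = last).
    remaining = driver.window_handles
    if remaining:
        driver.switch_to.window(remaining[-1])
    return excess
```

Calling it after each newly opened tab in a crawl loop keeps the browser from accumulating dozens of open pages.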

chrome driver download locations

Summary: 1. Download page for the latest versions: https://googlechromelabs.github.io/chrome-for-testing/ 2. Drivers for older versions (below 115): https://chromedriver.storage.googleapis.com/index.html

posted @ 2025-12-02 21:17 刘宏缔的架构森林 Views (708) Comments (0) Recommended (0)
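For Chrome 115 and later, the Chrome for Testing site also publishes machine-readable JSON listings. The sketch below assumes the `last-known-good-versions-with-downloads.json` endpoint and its `channels` / `downloads` / `chromedriver` layout; check both against the live site before relying on them. The Stable driver only fits if the installed Chrome has the same major version. Function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint on the Chrome for Testing site; it only covers Chrome 115+.
LAST_GOOD_URL = ("https://googlechromelabs.github.io/chrome-for-testing/"
                 "last-known-good-versions-with-downloads.json")


def pick_chromedriver_url(data: dict, channel: str = "Stable",
                          platform: str = "linux64") -> str:
    """Return the chromedriver zip URL for one channel/platform from the parsed JSON."""
    # Assumed layout: channels -> <channel> -> downloads -> chromedriver -> [{platform, url}]
    for entry in data["channels"][channel]["downloads"]["chromedriver"]:
        if entry["platform"] == platform:
            return entry["url"]
    raise LookupError(f"no chromedriver build for {platform} on {channel}")


def latest_chromedriver_url(channel: str = "Stable", platform: str = "linux64") -> str:
    """Fetch the listing over the network and pick the matching driver URL."""
    with urlopen(LAST_GOOD_URL, timeout=10) as resp:
        return pick_chromedriver_url(json.load(resp), channel, platform)
```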

selenium: scroll to the bottom of the page

Summary: 1. Code: from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.chrome.service import Service from…

posted @ 2025-11-24 19:00 刘宏缔的架构森林 Views (14) Comments (0) Recommended (0)
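The post's code is truncated after its imports. One common way to reach the bottom of a page that loads more content as you scroll is to run `window.scrollTo` through `execute_script` until `document.body.scrollHeight` stops growing. The sketch below does that; `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical names, and `driver` is assumed to be an already created Selenium WebDriver.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until the document height stops changing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return new_height  # nothing new was loaded: we are at the bottom
        last_height = new_height
    return last_height  # gave up after max_rounds (e.g. an endless feed)
```

The `max_rounds` cap keeps infinite-scroll pages from looping forever.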

ddddocr: an example of solving a slider CAPTCHA

Summary: 1. Code: from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.chrome.service import Service from…

posted @ 2025-11-24 18:22 刘宏缔的架构森林 Views (42) Comments (0) Recommended (0)
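The summary only shows the imports, so the post's actual drag logic is not visible. A technique often used with slider CAPTCHAs is to split the drag distance into many small moves that speed up and then slow down, instead of one jump. The sketch below (hypothetical `build_track`, pure Python) only generates the horizontal offsets; it is not the author's method.

```python
import math
import random


def build_track(distance: int, steps: int = 20, seed=None) -> list[int]:
    """Split `distance` pixels into `steps` non-negative integer moves that
    start slow, speed up in the middle and slow down near the end."""
    rng = random.Random(seed)
    # Sine-shaped weights peak in the middle; the random factor adds jitter.
    weights = [math.sin(math.pi * (i + 0.5) / steps) * rng.uniform(0.8, 1.2)
               for i in range(steps)]
    total = sum(weights)
    positions, running = [], 0.0
    for w in weights:
        running += w
        positions.append(round(distance * running / total))
    positions[-1] = distance  # make sure the track ends exactly at the target
    # Convert cumulative positions into per-step offsets.
    return [b - a for a, b in zip([0] + positions[:-1], positions)]
```

Each offset can then be replayed with Selenium's `ActionChains`: `click_and_hold(slider)`, one `move_by_offset(dx, 0)` per entry, then `release()` and `perform()`. If the CAPTCHA image is displayed at a different size than its natural size, scale the distance reported by ddddocr accordingly.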

ddddocr: get the target position of the slider

Summary: 1. Code: import base64 from ddddocr import DdddOcr import numpy as np from PIL import Image import io from PIL import Image, ImageFilter from io import B…

posted @ 2025-11-24 15:07 刘宏缔的架构森林 Views (42) Comments (0) Recommended (0)
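ddddocr's README documents a slide-matching mode: construct `DdddOcr(det=False, ocr=False)` and call `slide_match(target_bytes, background_bytes)`, whose result holds the gap's bounding box under the `target` key. The sketch below wraps that call and adds a Pillow helper that finds where the opaque part of a transparent slider piece begins. Function names and the drag-distance rule are hypothetical, and the result format should be checked against the installed ddddocr version.

```python
import io
from PIL import Image


def piece_left_offset(piece_png: bytes) -> int:
    """x where the opaque part of a transparent slider piece begins (0 if fully transparent)."""
    img = Image.open(io.BytesIO(piece_png)).convert("RGBA")
    bbox = img.getchannel("A").getbbox()  # (left, top, right, bottom) of non-zero alpha
    return bbox[0] if bbox else 0


def gap_box(piece_png: bytes, background_png: bytes) -> list:
    """Gap position (x1, y1, x2, y2) as reported by ddddocr's slide_match."""
    from ddddocr import DdddOcr  # imported lazily; loading ddddocr is slow
    slide = DdddOcr(det=False, ocr=False)
    return slide.slide_match(piece_png, background_png)["target"]


def drag_distance(piece_png: bytes, background_png: bytes) -> int:
    # Hypothetical rule: move the piece's opaque left edge onto the gap's left edge.
    return gap_box(piece_png, background_png)[0] - piece_left_offset(piece_png)
```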

selenium+pyautogui: save image files from a page, avoiding the access restrictions that requests runs into

Summary: 1. Install the required libraries: Linux: # apt install python3-tk python3-dev # apt-get install xclip # apt-get install xsel # apt-get install wl-clipboard pip: $ pip instal…

posted @ 2025-11-24 13:45 刘宏缔的架构森林 Views (18) Comments (0) Recommended (0)
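The summary stops at the install step, so the pyautogui workflow itself is not shown. A different technique with the same goal, offered as an alternative rather than the post's method: run `fetch()` inside the page with `execute_async_script`, so the request carries the browser's own cookies and session, and hand the bytes back as a data URL. It assumes the image is same-origin or served with CORS headers; otherwise the in-page fetch fails. `save_image_via_browser` is a hypothetical name.

```python
import base64

# Runs inside the page. Selenium passes its completion callback as the last argument.
FETCH_AS_DATAURL_JS = """
const url = arguments[0];
const done = arguments[arguments.length - 1];
fetch(url)
  .then(r => r.blob())
  .then(blob => {
    const reader = new FileReader();
    reader.onloadend = () => done(reader.result);  // "data:<mime>;base64,...."
    reader.readAsDataURL(blob);
  })
  .catch(err => done("ERROR:" + err));
"""


def save_image_via_browser(driver, img_url, path):
    """Download img_url through the browser session and write it to path; return byte count."""
    result = driver.execute_async_script(FETCH_AS_DATAURL_JS, img_url)
    if not result or result.startswith("ERROR:"):
        raise RuntimeError(f"in-page fetch failed: {result}")
    data = base64.b64decode(result.split(",", 1)[1])  # drop the "data:...;base64" header
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

For large images, raise the async script timeout first with `driver.set_script_timeout(...)`.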

python: install pyautogui

Summary: 1. Install the required Linux packages: # apt install python3-tk python3-dev 2. Install pyautogui: $ pip install pyautogui

posted @ 2025-11-24 13:04 刘宏缔的架构森林 Views (4) Comments (0) Recommended (0)
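After the two install steps above, a quick check can confirm the modules are present. Importing `pyautogui` itself on Linux typically fails without an X display, so this sketch (hypothetical `missing_modules`) only asks `importlib.util.find_spec` whether each module can be found, without executing it. `tkinter` is the module that the `python3-tk` package provides.

```python
import importlib.util


def missing_modules(names=("tkinter", "pyautogui")):
    """Return the top-level modules from `names` that Python cannot find."""
    # find_spec locates a module without importing (running) it.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means both installs succeeded.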

selenium: move the mouse to a given element and hover over it

Summary: 1. Code: from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.chrome.service import Service from…

posted @ 2025-11-24 11:55 刘宏缔的架构森林 Views (23) Comments (0) Recommended (0)
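The post's code is truncated after its imports. Selenium's `ActionChains(driver).move_to_element(element).perform()` is the usual way to hover; the sketch below waits for the element to become visible first. `hover_over` and the CSS-selector lookup are assumptions, and the selenium imports sit inside the function only so the sketch can be loaded where selenium is not installed.

```python
def hover_over(driver, css_selector, timeout=10):
    """Wait until the element is visible, then move the mouse onto it and leave it there."""
    # Local imports: lets this sketch load even without selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector)))
    ActionChains(driver).move_to_element(element).perform()
    return element
```

Menus that open on hover can then be used while the pointer stays put, e.g. by locating the submenu item right after `hover_over` returns.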

ddddocr: preprocess images to improve recognition accuracy

Summary: 1. A misrecognized image, as a dataurl: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAPAAAABQCAMAAAAQlwhOAAAA81BMVEUAAAB/TiCdbD6GVSdoNwnPnnB7ShyDUiTDkmR5SBrVpHbMm22FVCa5iFrU…

posted @ 2025-11-22 22:27 刘宏缔的架构森林 Views (41) Comments (0) Recommended (0)
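The summary shows only the misrecognized image, not the processing steps. A generic preprocessing sketch with Pillow (which the related posts also import): convert to grayscale, upscale, apply a median filter to remove speckles, then binarize. The threshold of 140 and the scale factor of 2 are hypothetical starting values to tune per CAPTCHA; the returned PNG bytes can be passed to `DdddOcr().classification(...)`.

```python
import io
from PIL import Image, ImageFilter


def preprocess_for_ocr(png_bytes: bytes, threshold: int = 140, scale: int = 2) -> bytes:
    """Grayscale, upscale, denoise and binarize an image; return it as PNG bytes."""
    img = Image.open(io.BytesIO(png_bytes)).convert("L")  # grayscale
    if scale != 1:
        img = img.resize((img.width * scale, img.height * scale), Image.LANCZOS)
    img = img.filter(ImageFilter.MedianFilter(3))  # remove isolated noise pixels
    img = img.point(lambda p: 255 if p > threshold else 0)  # pure black/white
    out = io.BytesIO()
    img.save(out, format="PNG")
    return out.getvalue()
```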

ddddocr: install ddddocr

Summary: 1. Official project page: https://github.com/sml2h3/ddddocr 2. Installation: $ pip install ddddocr 3. Code: import base64 from ddddocr import DdddOcr ocr = DdddOcr() img = "data:imag…

posted @ 2025-11-22 18:44 刘宏缔的架构森林 Views (172) Comments (0) Recommended (0)
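The example code stops at the data URL. ddddocr's `classification` takes raw image bytes, so a data URL has to be split at the comma and base64-decoded first. A minimal sketch with hypothetical helper names:

```python
import base64


def dataurl_to_bytes(dataurl: str) -> bytes:
    """'data:image/png;base64,XXXX' -> raw image bytes."""
    header, sep, payload = dataurl.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL like data:image/png;base64,...")
    return base64.b64decode(payload)


def recognize(dataurl: str) -> str:
    """Run ddddocr's default OCR model on an image given as a data URL."""
    from ddddocr import DdddOcr  # imported lazily; constructing it loads the model
    ocr = DdddOcr()
    return ocr.classification(dataurl_to_bytes(dataurl))
```

In a crawler, create `DdddOcr()` once and reuse it, since loading the model is the slow part.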

selenium: find a given element on the page and click it

Summary: 1. Code: from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.chrome.service import Service from…

posted @ 2025-11-22 15:11 刘宏缔的架构森林 Views (11) Comments (0) Recommended (0)
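The post's code is truncated after its imports. Clicking right after page load often fails because the element is not yet rendered or enabled; the sketch below waits with `WebDriverWait` and `expected_conditions.element_to_be_clickable` before calling `click()`. `click_when_ready` and the CSS selector are hypothetical, and the selenium imports sit inside the function only so the sketch loads without selenium installed.

```python
def click_when_ready(driver, css_selector, timeout=10):
    """Wait until the element is visible and enabled, click it, and return it."""
    # Local imports: lets this sketch load even without selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector)))
    element.click()
    return element
```

If the element never becomes clickable, `WebDriverWait` raises `TimeoutException` after `timeout` seconds.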

selenium: connect to an already open browser

Summary: 1. Code: first start the browser in debugging mode. from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.chrome.service import…

posted @ 2025-11-22 14:32 刘宏缔的架构森林 Views (83) Comments (0) Recommended (0)
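The summary shows the first step, starting the browser in debugging mode, then breaks off. A sketch of both halves: launch Chrome with `--remote-debugging-port`, then attach chromedriver to it through the `debuggerAddress` experimental option instead of starting a new browser. Port 9222 matches other posts in this category; the profile path `/tmp/chrome-debug-profile` and the function names are hypothetical.

```python
import subprocess


def chrome_debug_command(port=9222, user_data_dir="/tmp/chrome-debug-profile"):
    """Command line that starts Chrome with a DevTools debugging port."""
    # user_data_dir is a hypothetical, separate profile directory.
    return ["google-chrome", f"--remote-debugging-port={port}",
            f"--user-data-dir={user_data_dir}"]


def start_debug_chrome(port=9222, user_data_dir="/tmp/chrome-debug-profile"):
    """Launch Chrome in the background and return the process handle."""
    return subprocess.Popen(chrome_debug_command(port, user_data_dir))


def attach_to_chrome(port=9222):
    """Return a WebDriver that controls the already running Chrome."""
    from selenium import webdriver  # local import: sketch loads without selenium
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Attach to the existing browser instead of launching a new one.
    options.add_experimental_option("debuggerAddress", f"127.0.0.1:{port}")
    return webdriver.Chrome(options=options)
```

Because the browser keeps its profile, logins done by hand in that window stay available to the script.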

selenium: install selenium

Summary: 1. Official site: https://www.selenium.dev/ Source code: https://github.com/SeleniumHQ/selenium 2. Installation: $ pip install selenium 3. Install the driver. Check the Chrome version: $ google-chrome -…

posted @ 2025-11-22 13:43 刘宏缔的架构森林 Views (8) Comments (0) Recommended (0)
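The summary breaks off at checking Chrome's version. A small sketch that reads it from `google-chrome --version` and parses the numbers, so the major version can be compared with the driver's. Function names are hypothetical. Since Selenium 4.6, `webdriver.Chrome()` created without an explicit driver path uses Selenium Manager to locate or download a matching driver, so a manual download is often unnecessary.

```python
import re
import subprocess


def parse_chrome_version(output: str) -> tuple[int, ...]:
    """'Google Chrome 142.0.7444.175' -> (142, 0, 7444, 175)"""
    m = re.search(r"(\d+(?:\.\d+)+)", output)  # first dotted number in the text
    if not m:
        raise ValueError(f"no version number in {output!r}")
    return tuple(int(part) for part in m.group(1).split("."))


def installed_chrome_version(binary: str = "google-chrome") -> tuple[int, ...]:
    """Run `<binary> --version` and parse its output."""
    out = subprocess.run([binary, "--version"], capture_output=True,
                         text=True, check=True).stdout
    return parse_chrome_version(out)
```

The same parser works on `chromedriver --version` output, so comparing the first element of both tuples checks that the major versions match.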

pyppeteer: get the currently running browser

Summary: 1. Code: import requests from requests.exceptions import HTTPError from pyppeteer.launcher import connect def get_debugger_url(): url = "http://localhost…

posted @ 2025-11-21 20:40 刘宏缔的架构森林 Views (10) Comments (0) Recommended (0)
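The summary cuts off at the debugger URL. Chrome's DevTools HTTP interface serves browser metadata at `/json/version`, including `webSocketDebuggerUrl`, and pyppeteer's `connect()` accepts that URL as `browserWSEndpoint`. The sketch below uses the standard library's `urllib` instead of `requests`; the `/json/version` path and port 9222 are assumptions, since the post's URL is truncated, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_ws_url(body) -> str:
    """Extract webSocketDebuggerUrl from the JSON returned by /json/version."""
    return json.loads(body)["webSocketDebuggerUrl"]


def get_ws_debugger_url(host="127.0.0.1", port=9222, timeout=5) -> str:
    # Assumed endpoint: Chrome's DevTools HTTP interface (--remote-debugging-port).
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return parse_ws_url(resp.read())


async def attach_running_browser(host="127.0.0.1", port=9222):
    """Return a pyppeteer Browser connected to the already running Chrome."""
    from pyppeteer.launcher import connect  # same import path as the post
    return await connect(browserWSEndpoint=get_ws_debugger_url(host, port))
```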

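chrome: allow remote debugging

Summary: 1. By default, Chrome's debugging port cannot be reached remotely. Example: $ google-chrome --remote-debugging-port=9222 --user-data-dir=/data/python/xianyu/userdata Accessing it through the LAN IP does not work, while local access does. 2. Make it reachable remotely through port forwarding…

posted @ 2025-11-21 19:43 刘宏缔的架构森林 Views (84) Comments (0) Recommended (0)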
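The forwarding method itself is cut off in the summary. As a stand-in, here is a small asyncio TCP forwarder that listens on all interfaces and relays to Chrome's local port; the listen port 9223 and the function names are hypothetical, and tools such as `socat` or SSH port forwarding do the same job. Anyone who can reach the forwarded port gets full control of the browser, so keep it on a trusted network.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the other side went away; stop relaying
    finally:
        writer.close()


async def handle(client_reader, client_writer, target_host, target_port):
    """Relay one client connection to the target in both directions."""
    try:
        remote_reader, remote_writer = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_writer.close()  # Chrome not reachable; drop this client
        return
    await asyncio.gather(pipe(client_reader, remote_writer),
                         pipe(remote_reader, client_writer))


async def serve(listen_host="0.0.0.0", listen_port=9223,
                target_host="127.0.0.1", target_port=9222):
    """Accept connections on listen_host:listen_port and forward them to Chrome."""
    server = await asyncio.start_server(
        lambda r, w: handle(r, w, target_host, target_port), listen_host, listen_port)
    async with server:
        await server.serve_forever()
```

Run it with `asyncio.run(serve())` on the machine running Chrome, then point the remote client at `http://<LAN IP>:9223`.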

python: install crawl4ai

Summary: 1. Project page: https://github.com/unclecode/crawl4ai 2. Install with pip: $ mkdir crawl4ai $ cd crawl4ai/ $ python3 -m venv venv $ source venv/bin/activate (venv) liu…

posted @ 2025-11-20 22:10 刘宏缔的架构森林 Views (75) Comments (0) Recommended (0)
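A quick smoke test once the install finishes, following the usage pattern in the crawl4ai README: `AsyncWebCrawler` is used as an async context manager, and `arun(url=...)` returns a result whose `markdown` holds the page content. Recent README versions also run `crawl4ai-setup` after `pip install` to set up the browser. Attribute names have changed between crawl4ai releases, so treat this as a sketch; `fetch_markdown` is a hypothetical name.

```python
async def fetch_markdown(url: str) -> str:
    """Crawl one URL with crawl4ai and return the page as Markdown text."""
    from crawl4ai import AsyncWebCrawler  # local import: sketch loads without crawl4ai

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        # str() normalizes the markdown field, whose type differs across versions.
        return str(result.markdown)
```

For example: `print(asyncio.run(fetch_markdown("https://example.com"))[:300])`.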

chrome: failing to open debugging port 9222 on Linux

Summary: 1. When Chrome is started headless, the debugging port does open: $ google-chrome --headless --remote-debugging-port=9222 DevTools listening on ws://127.0.0.1:9222/devtools/browser/d445e793-89bf-42…

posted @ 2025-11-20 10:37 刘宏缔的架构森林 Views (140) Comments (0) Recommended (0)
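The summary ends before the cause is given. One known cause in recent Chrome releases (136 and later) is that `--remote-debugging-port` is ignored when Chrome runs with its default profile; passing a separate `--user-data-dir` makes it take effect again. Whether this is what the post found is not visible here. Either way, a plain socket check shows whether anything is listening on the port; `port_is_listening` is a hypothetical helper.

```python
import socket


def port_is_listening(host: str = "127.0.0.1", port: int = 9222, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```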

python: scrape pages headlessly with pyppeteer

Summary: 1. Install the third-party libraries: $ pip install pyppeteer $ pip install beautifulsoup4 2. Code: import asyncio from pyppeteer import launch async def check(): browser = await lau…

posted @ 2025-11-16 17:38 刘宏缔的架构森林 Views (13) Comments (0) Recommended (0)
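The post's code stops right after `launch`. A sketch of the usual flow: launch headless Chromium, open a page, wait for network activity to settle, take `page.content()`, and parse it with BeautifulSoup (both are the libraries the post installs). `fetch_title`, the `networkidle2` wait, and the `--no-sandbox` flag (often needed when running as root) are assumptions.

```python
async def fetch_title(url: str) -> str:
    """Render `url` in headless Chromium and return the page's <title> text."""
    # Local imports: lets this sketch load without pyppeteer/bs4 installed.
    from bs4 import BeautifulSoup
    from pyppeteer import launch

    browser = await launch(headless=True, args=["--no-sandbox"])
    try:
        page = await browser.newPage()
        await page.goto(url, waitUntil="networkidle2")  # wait for requests to settle
        html = await page.content()  # HTML after JavaScript has run
    finally:
        await browser.close()  # always release the browser process
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else ""
```

For example: `asyncio.run(fetch_title("https://example.com"))`.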
