pyppeteer: connecting to an already-running Chrome

1. Start Chrome with a remote debugging port

$ google-chrome --remote-debugging-port=9222 --user-data-dir=/data/python/xianyu/userdata

DevTools listening on ws://127.0.0.1:9222/devtools/browser/faddaa6e-98ec-444e-9710-9b71985b602c
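If Chrome is started from a script rather than by hand, the endpoint can be read from the startup line shown above. The following is only a minimal sketch: the helper name extract_endpoint is hypothetical, and it assumes the line keeps the "DevTools listening on ws://..." format.

```python
import re
from typing import Optional

# Matches the startup line Chrome prints when --remote-debugging-port is set.
_DEVTOOLS_RE = re.compile(r"DevTools listening on (ws://\S+)")


def extract_endpoint(line: str) -> Optional[str]:
    """Return the ws:// endpoint found in a line of Chrome output, or None."""
    match = _DEVTOOLS_RE.search(line)
    return match.group(1) if match else None
```

Lines that do not contain the marker return None, so the helper can be applied to every line of Chrome's output until the endpoint shows up.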

 

2. Get the WebSocket URL from the debugging port

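Open the address below. Note: the address must be exact and must end in /json/version, because the WebSocket address of an individual page cannot be used to connect to Chrome; only the browser-level address works.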

http://localhost:9222/json/version

The response is a JSON object; its webSocketDebuggerUrl field is the browser-level address to connect to.
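The same lookup can be done from Python instead of a browser tab. This is a sketch using only the standard library; the function names parse_ws_endpoint and fetch_ws_endpoint are hypothetical, and the host and port defaults assume the Chrome instance started in step 1.

```python
import json
from urllib.request import urlopen


def parse_ws_endpoint(version_json: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version body."""
    url = json.loads(version_json)["webSocketDebuggerUrl"]
    # A browser-level endpoint has /devtools/browser/ in its path;
    # page-level endpoints use /devtools/page/ and cannot be used here.
    if "/devtools/browser/" not in url:
        raise ValueError(f"not a browser-level endpoint: {url}")
    return url


def fetch_ws_endpoint(host: str = "localhost", port: int = 9222) -> str:
    """Ask the running Chrome for its current endpoint (needs Chrome running)."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=5) as resp:
        return parse_ws_endpoint(resp.read().decode("utf-8"))
```

With Chrome running, the value returned by fetch_ws_endpoint() can be passed as browserWSEndpoint, which avoids copying an ID that changes on every Chrome start.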

3. Python code:

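import asyncio

from faker import Faker
from pyppeteer.launcher import connect

fake = Faker()
URL = 'https://movie.douban.com/explore#!type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=20&page_start=0'


async def main():
    # Browser-level WebSocket endpoint printed by Chrome in step 1 (also the
    # webSocketDebuggerUrl field of /json/version). The ID changes every time
    # Chrome starts, so replace it with the current value.
    debugUrl = 'ws://localhost:9222/devtools/browser/faddaa6e-98ec-444e-9710-9b71985b602c'
    # ignoreDefaultArgs is a launch() option, so it is not passed to connect().
    browser = await connect(
        browserWSEndpoint=debugUrl,
        defaultViewport=None,  # do not force the default 800x600 viewport
        ignoreHTTPSErrors=True,
        logLevel=3,
    )

    print('end connect')
    page = await browser.newPage()

    # await page.setUserAgent(fake.user_agent())
    # Register the navigator.webdriver override before navigating, so it runs
    # in the new document ahead of the page's own scripts.
    await page.evaluateOnNewDocument('''()=>{Object.defineProperties(navigator,{webdriver:{get:()=>false}})}''')
    await page.goto(URL, options={'timeout': 30000})
    await asyncio.sleep(3)  # give client-side rendering time to finish
    doc1 = await page.content()  # rendered HTML; parse with lxml or BeautifulSoup as needed

    # Detach from Chrome without closing the browser.
    await browser.disconnect()


asyncio.run(main())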
 

posted @ 2025-11-20 12:08  刘宏缔的架构森林