pyppeteer: connecting to an already-running Chrome
1. Start Chrome with a remote debugging port
$ google-chrome --remote-debugging-port=9222 --user-data-dir=/data/python/xianyu/userdata
DevTools listening on ws://127.0.0.1:9222/devtools/browser/faddaa6e-98ec-444e-9710-9b71985b602c
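If Chrome is started from a script rather than by hand, the endpoint could be pulled out of the "DevTools listening on ..." line instead of being copied manually. The following is a minimal sketch using only the standard library. The helper name parse_devtools_endpoint is hypothetical, and the sketch assumes the startup line has the same shape as the output shown above.

```python
import re
from typing import Optional

# Matches the browser-level endpoint in Chrome's startup message.
_DEVTOOLS_RE = re.compile(r"DevTools listening on (ws://\S+)")


def parse_devtools_endpoint(line: str) -> Optional[str]:
    """Return the ws:// endpoint from a Chrome startup line, or None if absent."""
    match = _DEVTOOLS_RE.search(line)
    return match.group(1) if match else None


# The startup line from the output above.
line = "DevTools listening on ws://127.0.0.1:9222/devtools/browser/faddaa6e-98ec-444e-9710-9b71985b602c"
print(parse_devtools_endpoint(line))
```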
2. Get the WebSocket URL from the debugging port
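Open the address below. Note: the address must be exact and must include the trailing /version. That endpoint returns the browser-level WebSocket address, whereas the addresses listed for individual pages cannot be used to connect to Chrome.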
http://localhost:9222/json/version
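The response is a JSON object; its webSocketDebuggerUrl field holds the browser-level WebSocket address that pyppeteer connects to.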
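Instead of copying the address by hand, the same lookup can be done from Python. This is a rough sketch using only the standard library. The function names extract_ws_endpoint and fetch_ws_endpoint are hypothetical, and the sample payload is illustrative and reduced to the one field that matters here (a real response carries more fields).

```python
import json
from urllib.request import urlopen


def extract_ws_endpoint(payload: str) -> str:
    """Pull webSocketDebuggerUrl out of a /json/version response body."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def fetch_ws_endpoint(host: str = "localhost", port: int = 9222) -> str:
    """Query a running Chrome's /json/version endpoint (Chrome must be started as in step 1)."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=5) as resp:
        return extract_ws_endpoint(resp.read().decode("utf-8"))


# Offline example: an illustrative payload containing only the relevant field.
sample = '{"webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/faddaa6e-98ec-444e-9710-9b71985b602c"}'
print(extract_ws_endpoint(sample))
```

With Chrome running, the result of fetch_ws_endpoint() could be passed as browserWSEndpoint in place of a hard-coded address, which is convenient because the identifier at the end of the address changes each time Chrome is launched.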

3. Python code:
import asyncio
from lxml import etree
from faker import Faker
from pyppeteer import launch
from pyppeteer.launcher import connect
from bs4 import BeautifulSoup
import pyppeteer

fake = Faker()
URL = 'https://movie.douban.com/explore#!type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=20&page_start=0'


async def main():
    # Browser-level WebSocket endpoint, taken from Chrome's startup output
    # or from the webSocketDebuggerUrl field of /json/version.
    debugUrl = 'ws://localhost:9222/devtools/browser/faddaa6e-98ec-444e-9710-9b71985b602c'
    browser = await connect(
        browserWSEndpoint=debugUrl,
        defaultViewport=None,
        ignoreHTTPSErrors=True,
        # Launch-time option; it likely has no effect when attaching with connect().
        ignoreDefaultArgs=['--enable-automation'],
        logLevel=3
    )
    print('end connect')
    page = await browser.newPage()
    # await page.setUserAgent(fake.user_agent())
    await page.goto(URL, options={'timeout': 30000})
    # Make navigator.webdriver report false in the loaded page.
    await page.evaluate('''()=>{Object.defineProperties(navigator,{webdriver:{get:()=>false}})}''')
    await asyncio.sleep(3)
    # Full HTML of the rendered page.
    doc1 = await page.content()


# Run main() on the event loop.
asyncio.get_event_loop().run_until_complete(main())