CDP Automation in Practice: Controlling Electron Desktop Apps with the Chrome DevTools Protocol

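Pain Points of Electron Automation

An Electron app is essentially "a customized Chromium browser plus a Node.js runtime." Most UI automation frameworks on the market handle this hybrid poorly: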


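Problems with common approaches:
┌───────────────────┬─────────────────────────────────────────────────────────────┐
│ Approach          │ Problem                                                     │
├───────────────────┼─────────────────────────────────────────────────────────────┤
│ Selenium          │ Needs a matching ChromeDriver and extra setup for Electron  │
│ PyAutoGUI         │ Coordinate-based; breaks when resolution or scaling changes │
│ WinAppDriver      │ Cannot reach the DOM behind the Chromium rendering layer    │
│ Spectron          │ Deprecated by the Electron team; no longer maintained       │
│ Playwright        │ Electron support is experimental; some APIs unavailable     │
└───────────────────┴─────────────────────────────────────────────────────────────┘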

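The core tension: Electron's UI is built with web technologies (HTML/CSS/JS), but it does not run in a standard browser. What is needed is a channel that connects directly to the Chromium engine inside Electron, and that is exactly what CDP provides.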

How CDP Works

What Is CDP

The Chrome DevTools Protocol (CDP) is a debugging protocol exposed by Chromium. The Elements, Console, and Network panels you see in Chrome DevTools (F12) all talk to the browser through CDP under the hood.
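
To make the wire format concrete, here is a minimal sketch of the three message shapes CDP uses over its WebSocket: a command carries `id`, `method`, and `params`; a response echoes the `id` with either `result` or `error`; an event has `method` and `params` but no `id`. The helper names `build_command` and `classify_message` are illustrative assumptions, not part of any library.

```python
import json
from typing import Optional


def build_command(msg_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a CDP command; `method` is written as "Domain.method"."""
    return json.dumps({"id": msg_id, "method": method, "params": params or {}})


def classify_message(raw: str) -> str:
    """Return "response", "error", or "event" for an incoming CDP message."""
    data = json.loads(raw)
    if "id" in data:
        # Replies to commands carry the id of the command they answer.
        return "error" if "error" in data else "response"
    # Messages without an id are unsolicited events.
    return "event"


command = build_command(1, "Runtime.evaluate", {"expression": "document.title"})
print(command)
print(classify_message('{"id": 1, "result": {"result": {"type": "string", "value": "Home"}}}'))
print(classify_message('{"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}'))
```

The `id` is what lets a client match replies to commands when several are in flight at once.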


CDP architecture:
┌──────────────────────────────────────────────────────────┐
│                       Electron app                       │
│  ┌────────────────────────────────────────────────────┐  │
│  │                  Chromium engine                   │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐         │  │
│  │  │ Renderer │  │ Renderer │  │   GPU    │         │  │
│  │  │ Process  │  │ Process  │  │ Process  │         │  │
│  │  │ (window) │  │ (window) │  │          │         │  │
│  │  └──────────┘  └──────────┘  └──────────┘         │  │
│  │                                                      │  │
│  │  ┌──────────────────────────────────────────┐       │  │
│  │  │            Browser Process                │       │  │
│  │  │  ┌──────────────────────────────────┐    │       │  │
│  │  │  │   DevTools Server (CDP)           │    │       │  │
│  │  │  │   ws://127.0.0.1:9222            │◄───┼──┐    │  │
│  │  │  └──────────────────────────────────┘    │  │    │  │
│  │  └──────────────────────────────────────────┘  │    │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │                Node.js Runtime                      │  │
│  │  · Filesystem · Network · OS APIs · Native modules │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
                                                    │
                              WebSocket connection  │
                              ┌─────────────────────┘
                              ▼
                    ┌──────────────────┐
                    │ Automation script│
                    │  (Python/Node)   │
                    └──────────────────┘

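CDP's Domain Model

CDP is organized into domains by function; each domain provides a set of methods and events:

| Domain | Purpose | Methods commonly used in automation |
|--------|---------|-------------------------------------|
| Page | Page management | navigate, reload, captureScreenshot |
| DOM | DOM operations | getDocument, querySelector, getOuterHTML |
| Runtime | JavaScript execution | evaluate, callFunctionOn |
| Input | Input simulation | dispatchMouseEvent, dispatchKeyEvent |
| Network | Network monitoring | enable, getResponseBody, setExtraHTTPHeaders |
| Fetch | Request interception | enable, continueRequest, fulfillRequest |
| Emulation | Device emulation | setDeviceMetricsOverride |
| Target | Target management | getTargets, attachToTarget |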

Launch Configuration: Opening the Remote Debugging Port

Basic Setup

An Electron app must be launched with specific flags to open the CDP port:


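# Option 1: launch from the command line
./your-electron-app --remote-debugging-port=9222

# Option 2: with extra debugging flags
# Warning: 0.0.0.0 exposes the debug port to the whole network, and
# --no-sandbox disables Chromium's sandbox. Use both only on isolated test machines.
./your-electron-app \
  --remote-debugging-port=9222 \
  --remote-debugging-address=0.0.0.0 \
  --no-sandbox

# Option 3: a packaged app (Windows)
"C:\Program Files\YourApp\YourApp.exe" --remote-debugging-port=9222

# Option 4: macOS
open -a "YourApp" --args --remote-debugging-port=9222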
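
When a test runner launches the app itself, the same flag can be built programmatically. The sketch below is one way to do it, assuming you want a port chosen by the OS instead of a hard-coded 9222. `find_free_port` and `build_launch_args` are hypothetical helper names, and the app path is a placeholder.

```python
import socket
from typing import List


def find_free_port() -> int:
    """Ask the OS for an unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 means "pick any free port"
        return sock.getsockname()[1]


def build_launch_args(app_path: str, port: int) -> List[str]:
    """Build the argument list to hand to subprocess.Popen."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return [app_path, f"--remote-debugging-port={port}"]


port = find_free_port()
print(build_launch_args("./your-electron-app", port))
```

There is a small window between picking the port and the app binding it, so another process could grab the port in between. For parallel CI jobs this is usually still more robust than sharing one fixed port.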

Verifying That the CDP Port Is Open


# Check that the port is listening
curl http://127.0.0.1:9222/json/version

# Expected response (values vary with the Electron version):
{
  "Browser": "Chrome/120.0.6099.0",
  "Protocol-Version": "1.3",
  "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/xxxx-xxxx"
}

# List all debuggable pages/windows
curl http://127.0.0.1:9222/json/list
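
The same check can be done from a script using only the standard library. This is a minimal sketch: `fetch_json` and `browser_ws_url` are illustrative names, and the sample payload reuses the placeholder values from the expected response shown above.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 2.0):
    """GET a CDP HTTP endpoint such as /json/version and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def browser_ws_url(version_info: dict) -> str:
    """Extract the browser-level WebSocket URL from a /json/version payload."""
    url = version_info.get("webSocketDebuggerUrl", "")
    if not url.startswith("ws://"):
        raise ValueError("no webSocketDebuggerUrl in /json/version response")
    return url


# Sample payload with the placeholder values from the expected response.
sample = {
    "Browser": "Chrome/120.0.6099.0",
    "Protocol-Version": "1.3",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/xxxx-xxxx",
}
print(browser_ws_url(sample))
# Against a running app:
#   browser_ws_url(fetch_json("http://127.0.0.1:9222/json/version"))
```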

Configuring It in Electron Code (If You Have Access to the Source)


// main.js - Electron main process
const { app, BrowserWindow } = require('electron');

// Switches must be appended before the app's 'ready' event fires.
app.commandLine.appendSwitch('remote-debugging-port', '9222');
// 0.0.0.0 exposes the port to the network; drop this line to stay on loopback.
app.commandLine.appendSwitch('remote-debugging-address', '0.0.0.0');

app.on('ready', () => {
  const mainWindow = new BrowserWindow({
    width: 1200,
    height: 800,
    webPreferences: {
      nodeIntegration: false,
      contextIsolation: true
    }
  });

  mainWindow.loadFile('index.html');

  // The page's WebSocket URL contains a target ID, which is not webContents.id,
  // so look it up from the HTTP endpoint instead of building it by hand.
  console.log('CDP targets listed at: http://127.0.0.1:9222/json/list');
});

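WebSocket Connection: Establishing CDP Communication

Python Implementation


import asyncio
import json
from typing import Optional

import websockets

class CDPClient:
    """Lightweight CDP client."""

    def __init__(self, ws_url: str):
        self.ws_url = ws_url
        self.ws = None
        self.message_id = 0
        self._pending = {}     # command id -> Future awaiting the response
        self._receiver = None  # keep a reference so the task is not garbage-collected

    async def connect(self):
        """Open the WebSocket connection."""
        # max_size=None lifts the default 1 MiB frame limit (screenshots exceed it)
        self.ws = await websockets.connect(self.ws_url, max_size=None)
        # Start the receive loop
        self._receiver = asyncio.create_task(self._receive_loop())

    async def _receive_loop(self):
        """Receive CDP messages and resolve the matching pending command."""
        try:
            async for message in self.ws:
                data = json.loads(message)
                # Messages with an "id" are responses; events have none and are ignored here
                future = self._pending.pop(data.get('id'), None)
                if future is not None and not future.done():
                    future.set_result(data)
        finally:
            # Fail any command still waiting when the connection goes away
            for future in self._pending.values():
                if not future.done():
                    future.set_exception(ConnectionError("CDP connection closed"))
            self._pending.clear()

    async def send_command(self, method: str, params: Optional[dict] = None) -> dict:
        """Send a CDP command and wait for its response."""
        self.message_id += 1
        msg_id = self.message_id  # capture locally so concurrent calls do not mix up ids
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        await self.ws.send(json.dumps({
            "id": msg_id,
            "method": method,
            "params": params or {}
        }))
        response = await future
        if 'error' in response:
            raise RuntimeError(f"{method} failed: {response['error']}")
        return response

    async def close(self):
        await self.ws.close()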

# Usage example
async def main():
    # 1. Fetch the WebSocket URL
    import aiohttp
    async with aiohttp.ClientSession() as session:
        async with session.get('http://127.0.0.1:9222/json/list') as resp:
            targets = await resp.json()
    
    # Pick the first target of type "page"
    page_target = next(t for t in targets if t['type'] == 'page')
    ws_url = page_target['webSocketDebuggerUrl']
    
    # 2. Connect over CDP
    cdp = CDPClient(ws_url)
    await cdp.connect()
    
    # 3. Enable the required domains
    await cdp.send_command("Page.enable")
    await cdp.send_command("DOM.enable")
    await cdp.send_command("Runtime.enable")
    
    # 4. Read the page title
    result = await cdp.send_command("Runtime.evaluate", {
        "expression": "document.title"
    })
    print(f"Page title: {result['result']['result']['value']}")
    
    await cdp.close()

asyncio.run(main())
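
The client above only keeps messages that carry an `id`, so events such as `Page.loadEventFired` are silently dropped. One way to handle them is a small router that stores responses by id and fans events out to registered handlers. The `MessageRouter` class below is a hypothetical sketch of that idea, kept free of network code so the routing logic can be tested on its own.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List


class MessageRouter:
    """Route decoded CDP messages to response slots or event handlers."""

    def __init__(self) -> None:
        self.responses: Dict[int, dict] = {}
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a handler for an event name such as "Page.loadEventFired"."""
        self._handlers[event].append(handler)

    def dispatch(self, raw: str) -> None:
        """Store responses by id; pass event params to every registered handler."""
        data = json.loads(raw)
        if "id" in data:
            self.responses[data["id"]] = data
            return
        for handler in self._handlers.get(data.get("method", ""), []):
            handler(data.get("params", {}))


router = MessageRouter()
loaded = []
router.on("Page.loadEventFired", loaded.append)
router.dispatch('{"method": "Page.loadEventFired", "params": {"timestamp": 12.5}}')
router.dispatch('{"id": 1, "result": {}}')
print(loaded, router.responses)
```

Calling `dispatch` from the receive loop would let a navigation helper wait for a real load event instead of sleeping for a fixed time.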

Node.js Implementation (Using chrome-remote-interface)


const CDP = require('chrome-remote-interface');

async function main() {
  // Connect to the Electron app
  const client = await CDP({
    host: '127.0.0.1',
    port: 9222,
    target: (targets) => {
      // Pick the main window
      return targets.find(t => t.type === 'page' && t.url !== 'about:blank');
    }
  });

  const { Page, DOM, Runtime, Input } = client;

  // Enable the required domains
  await Promise.all([
    Page.enable(),
    DOM.enable(),
    Runtime.enable()
  ]);

  // Read the page title
  const { result } = await Runtime.evaluate({
    expression: 'document.title'
  });
  console.log('Page title:', result.value);

  await client.close();
}

main().catch(console.error);

DOM Operations: Locating and Interacting with Elements

Fetching the DOM Tree


class ElementNotFoundError(Exception):
    """Raised when a selector matches no element."""


async def get_element(cdp: CDPClient, selector: str) -> dict:
    """Locate an element by CSS selector."""
    # Fetch the document root node
    doc = await cdp.send_command("DOM.getDocument")
    root_node_id = doc['result']['root']['nodeId']
    
    # Query by selector
    result = await cdp.send_command("DOM.querySelector", {
        "nodeId": root_node_id,
        "selector": selector
    })
    
    # CDP reports "no match" as nodeId 0
    node_id = result['result']['nodeId']
    if node_id == 0:
        raise ElementNotFoundError(f"Element not found: {selector}")
    
    # Fetch details about the element
    element = await cdp.send_command("DOM.describeNode", {
        "nodeId": node_id,
        "depth": 0
    })
    
    return {
        "nodeId": node_id,
        "nodeName": element['result']['node']['nodeName'],
        "attributes": element['result']['node'].get('attributes', [])
    }


async def get_element_box(cdp: CDPClient, node_id: int) -> dict:
    """Get an element's position and size."""
    result = await cdp.send_command("DOM.getBoxModel", {
        "nodeId": node_id
    })
    
    model = result['result']['model']
    # Four corners of the content area: [x1,y1, x2,y2, x3,y3, x4,y4]
    content = model['content']
    
    # Compute the center point
    center_x = (content[0] + content[2] + content[4] + content[6]) / 4
    center_y = (content[1] + content[3] + content[5] + content[7]) / 4
    
    return {
        "center": (center_x, center_y),
        "width": model['width'],
        "height": model['height'],
        "content": content
    }
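
The quad arithmetic is easy to check in isolation. The sketch below works on the eight-number `content` array that `DOM.getBoxModel` returns; `quad_center` and `quad_bounds` are illustrative helper names, and the sample quad is made up.

```python
from typing import List, Tuple


def quad_center(quad: List[float]) -> Tuple[float, float]:
    """Average the four corners of a quad given as [x1,y1, x2,y2, x3,y3, x4,y4]."""
    if len(quad) != 8:
        raise ValueError("a quad needs exactly 8 numbers")
    xs, ys = quad[0::2], quad[1::2]
    return sum(xs) / 4, sum(ys) / 4


def quad_bounds(quad: List[float]) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box of the quad as (x, y, width, height)."""
    if len(quad) != 8:
        raise ValueError("a quad needs exactly 8 numbers")
    xs, ys = quad[0::2], quad[1::2]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


quad = [10, 20, 110, 20, 110, 60, 10, 60]
print(quad_center(quad))  # (60.0, 40.0)
print(quad_bounds(quad))  # (10, 20, 100, 40)
```

Averaging all four corners, as opposed to only two, keeps the center correct for rotated or skewed elements, where the quad is not axis-aligned.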

Simulating User Input


async def click_element(cdp: CDPClient, selector: str):
    """Click the given element."""
    element = await get_element(cdp, selector)
    box = await get_element_box(cdp, element['nodeId'])
    x, y = box['center']
    
    # Simulate a mouse click (mousedown, then mouseup)
    await cdp.send_command("Input.dispatchMouseEvent", {
        "type": "mousePressed",
        "x": x,
        "y": y,
        "button": "left",
        "clickCount": 1
    })
    await cdp.send_command("Input.dispatchMouseEvent", {
        "type": "mouseReleased",
        "x": x,
        "y": y,
        "button": "left",
        "clickCount": 1
    })


async def type_text(cdp: CDPClient, selector: str, text: str):
    """Type text into an input field."""
    # Click the field first to give it focus
    await click_element(cdp, selector)
    await asyncio.sleep(0.1)
    
    # Clear the existing content.
    # json.dumps quotes the selector safely, even if it contains quote characters.
    await cdp.send_command("Runtime.evaluate", {
        "expression": f"""
            (function() {{
                const el = document.querySelector({json.dumps(selector)});
                el.value = '';
                el.dispatchEvent(new Event('input', {{bubbles: true}}));
            }})()
        """
    })
    
    # Type one character at a time
    for char in text:
        await cdp.send_command("Input.dispatchKeyEvent", {
            "type": "keyDown",
            "text": char,
            "key": char,
            "unmodifiedText": char
        })
        await cdp.send_command("Input.dispatchKeyEvent", {
            "type": "keyUp",
            "key": char
        })
        await asyncio.sleep(0.02)  # Mimic human typing speed


async def scroll_to_element(cdp: CDPClient, selector: str):
    """Scroll until the given element is visible."""
    await cdp.send_command("Runtime.evaluate", {
        "expression": f"""
            document.querySelector({json.dumps(selector)})
                .scrollIntoView({{behavior: 'smooth', block: 'center'}})
        """
    })
    await asyncio.sleep(0.5)  # Wait for the smooth scroll to finish
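
Printable characters can be sent with `text`, but keys such as Enter or Tab also need a `code` and a virtual key code before most pages react to them. The sketch below builds the parameter dicts for `Input.dispatchKeyEvent`. The `SPECIAL_KEYS` table and `key_events` helper are assumptions for illustration; sending `rawKeyDown` when a key produces no text follows a common convention in CDP-based tools, not something the protocol mandates.

```python
from typing import Dict, List

# Hypothetical lookup table; the numbers are the standard Windows virtual key codes.
SPECIAL_KEYS: Dict[str, dict] = {
    "Enter": {"code": "Enter", "windowsVirtualKeyCode": 13, "text": "\r"},
    "Tab": {"code": "Tab", "windowsVirtualKeyCode": 9},
    "Backspace": {"code": "Backspace", "windowsVirtualKeyCode": 8},
    "Escape": {"code": "Escape", "windowsVirtualKeyCode": 27},
}


def key_events(key: str) -> List[dict]:
    """Return the [down, up] parameter dicts for Input.dispatchKeyEvent."""
    if key not in SPECIAL_KEYS:
        raise KeyError(f"unsupported key: {key}")
    info = SPECIAL_KEYS[key]
    down = {
        # Keys that produce text use keyDown; the others use rawKeyDown.
        "type": "keyDown" if "text" in info else "rawKeyDown",
        "key": key,
        **info,
    }
    up = {
        "type": "keyUp",
        "key": key,
        "code": info["code"],
        "windowsVirtualKeyCode": info["windowsVirtualKeyCode"],
    }
    return [down, up]


for params in key_events("Enter"):
    print(params)
```

Each dict can be passed as the `params` of an `Input.dispatchKeyEvent` command, for example to submit a form after typing into it.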

Screenshots and Visual Comparison

Capturing the Page


import base64
from typing import Optional


async def take_screenshot(cdp: CDPClient, save_path: str,
                          clip: Optional[dict] = None) -> str:
    """
    Capture a screenshot of the page.
    clip: optional capture region {x, y, width, height, scale}
    """
    # "quality" applies only to JPEG, so it is omitted for PNG
    params = {"format": "png"}
    
    if clip:
        params["clip"] = clip
    
    result = await cdp.send_command("Page.captureScreenshot", params)
    
    # Decode the Base64 image data
    image_data = base64.b64decode(result['result']['data'])
    
    # Save to disk
    with open(save_path, 'wb') as f:
        f.write(image_data)
    
    return save_path


async def take_element_screenshot(cdp: CDPClient, selector: str, 
                                  save_path: str) -> str:
    """Capture a screenshot of a single element."""
    element = await get_element(cdp, selector)
    box = await get_element_box(cdp, element['nodeId'])
    
    # Use the top-left corner of the content quad as the clip origin
    content = box['content']
    x_min = min(content[0], content[2], content[4], content[6])
    y_min = min(content[1], content[3], content[5], content[7])
    
    clip = {
        "x": x_min,
        "y": y_min,
        "width": box['width'],
        "height": box['height'],
        "scale": 1
    }
    
    return await take_screenshot(cdp, save_path, clip)
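
The `format`, `quality`, and `clip` options interact: `quality` is documented for JPEG only, and a clip region needs a `scale`. A small builder can enforce those rules before anything goes over the wire. `screenshot_params` is a hypothetical helper sketched under those assumptions.

```python
from typing import Optional


def screenshot_params(fmt: str = "png", quality: Optional[int] = None,
                      clip: Optional[dict] = None) -> dict:
    """Build a parameter dict for Page.captureScreenshot."""
    if fmt not in ("png", "jpeg", "webp"):
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if quality is not None:
        if fmt != "jpeg":
            raise ValueError("quality is only meaningful for JPEG")
        if not 0 <= quality <= 100:
            raise ValueError("quality must be between 0 and 100")
        params["quality"] = quality
    if clip is not None:
        # The clip viewport needs a scale; default to 1 if the caller omits it.
        params["clip"] = {"scale": 1, **clip}
    return params


print(screenshot_params("jpeg", quality=80))
print(screenshot_params(clip={"x": 0, "y": 0, "width": 200, "height": 100}))
```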

Screenshot Comparison (Regression Testing)


import numpy as np
from PIL import Image, ImageChops

class ScreenshotComparator:
    """Screenshot comparator for UI regression tests."""
    
    def __init__(self, threshold: float = 0.01):
        """
        threshold: allowed fraction of differing pixels (0.01 = 1%)
        """
        self.threshold = threshold
    
    def compare(self, baseline_path: str, current_path: str) -> dict:
        """Compare two screenshots."""
        baseline = Image.open(baseline_path).convert('RGB')
        current = Image.open(current_path).convert('RGB')
        
        # Size check
        if baseline.size != current.size:
            return {
                "match": False,
                "reason": "size_mismatch",
                "baseline_size": baseline.size,
                "current_size": current.size
            }
        
        # Per-pixel difference
        diff = ImageChops.difference(baseline, current)
        diff_array = np.array(diff)
        
        # Count pixels where any channel differs
        non_zero_pixels = int(np.count_nonzero(diff_array.sum(axis=2)))
        total_pixels = diff_array.shape[0] * diff_array.shape[1]
        diff_ratio = non_zero_pixels / total_pixels
        
        # Structural similarity (simplified)
        ssim = self._calculate_ssim(
            np.array(baseline).astype(float),
            np.array(current).astype(float)
        )
        
        return {
            "match": diff_ratio <= self.threshold,
            "diff_ratio": round(diff_ratio, 6),
            "ssim": round(ssim, 4),
            "non_zero_pixels": non_zero_pixels,
            "total_pixels": total_pixels
        }
    
    def _calculate_ssim(self, img1: np.ndarray, img2: np.ndarray) -> float:
        """Simplified SSIM from whole-image statistics (no sliding window)."""
        C1 = (0.01 * 255) ** 2
        C2 = (0.03 * 255) ** 2
        
        mu1 = img1.mean()
        mu2 = img2.mean()
        sigma1 = img1.std()
        sigma2 = img2.std()
        sigma12 = np.mean((img1 - mu1) * (img2 - mu2))
        
        ssim = ((2 * mu1 * mu2 + C1) * (2 * sigma12 + C2)) / \
               ((mu1**2 + mu2**2 + C1) * (sigma1**2 + sigma2**2 + C2))
        
        return float(ssim)
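
Counting every non-zero difference treats a one-level change in a single channel the same as a completely different pixel, so anti-aliasing and font rendering noise can trip the threshold. A per-channel tolerance is one common remedy. The sketch below shows the idea on raw RGB byte buffers using only the standard library; `diff_ratio` and its `tolerance` parameter are assumptions for illustration, not part of the comparator above.

```python
def diff_ratio(a: bytes, b: bytes, tolerance: int = 0) -> float:
    """Fraction of RGB pixels whose largest channel difference exceeds `tolerance`.

    `a` and `b` are raw RGB buffers of equal length, 3 bytes per pixel.
    """
    if len(a) != len(b) or len(a) % 3 != 0:
        raise ValueError("buffers must have equal length, a multiple of 3")
    total = len(a) // 3
    changed = 0
    for i in range(0, len(a), 3):
        if max(abs(a[i + c] - b[i + c]) for c in range(3)) > tolerance:
            changed += 1
    return changed / total if total else 0.0


baseline = bytes([0, 0, 0, 255, 255, 255, 10, 10, 10, 100, 100, 100])
current = bytes([0, 0, 2, 255, 255, 255, 10, 10, 10, 100, 150, 100])
print(diff_ratio(baseline, current))               # 0.5
print(diff_ratio(baseline, current, tolerance=2))  # 0.25
```

With a tolerance of 2, the first pixel's two-level change in the blue channel no longer counts, and only the pixel with a real change remains.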

Complete Automation Test Framework

Wrapping It in a Framework


import subprocess
from typing import Any

import aiohttp


class ElectronAutomator:
    """Automation test harness for Electron apps."""
    
    def __init__(self, app_path: str, debug_port: int = 9222):
        self.app_path = app_path
        self.debug_port = debug_port
        self.cdp = None
        self.process = None
    
    async def _list_targets(self) -> list:
        """Fetch the target list from the CDP HTTP endpoint."""
        url = f'http://127.0.0.1:{self.debug_port}/json/list'
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                resp.raise_for_status()
                return await resp.json()
    
    async def start(self):
        """Launch the Electron app and connect over CDP."""
        # Launch the app
        self.process = subprocess.Popen([
            self.app_path,
            f'--remote-debugging-port={self.debug_port}',
            '--no-sandbox'
        ])
        
        # Wait until the CDP port is up and a page target exists
        page = None
        for _ in range(30):
            try:
                targets = await self._list_targets()
                page = next((t for t in targets if t['type'] == 'page'), None)
                if page:
                    break
            except aiohttp.ClientError:
                pass  # Port is not listening yet
            await asyncio.sleep(1)
        else:
            raise TimeoutError("CDP port not available after 30s")
        
        # Connect to the first page
        self.cdp = CDPClient(page['webSocketDebuggerUrl'])
        await self.cdp.connect()
        
        # Enable the required domains
        await self.cdp.send_command("Page.enable")
        await self.cdp.send_command("DOM.enable")
        await self.cdp.send_command("Runtime.enable")
    
    async def navigate(self, url: str):
        """Navigate to the given URL."""
        await self.cdp.send_command("Page.navigate", {"url": url})
        await asyncio.sleep(1)  # Wait for the page to load
    
    async def wait_for_selector(self, selector: str, timeout: float = 10):
        """Wait for an element to appear."""
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while loop.time() < deadline:
            try:
                element = await get_element(self.cdp, selector)
                if element['nodeId'] != 0:
                    return element
            except Exception:
                pass  # Not found yet, or a transient CDP error
            await asyncio.sleep(0.5)
        raise TimeoutError(f"Element not found: {selector}")
    
    async def execute_js(self, expression: str) -> Any:
        """Evaluate a JavaScript expression."""
        result = await self.cdp.send_command("Runtime.evaluate", {
            "expression": expression,
            "returnByValue": True
        })
        return result['result']['result'].get('value')
    
    async def stop(self):
        """Shut down the app."""
        if self.cdp:
            await self.cdp.close()
        if self.process:
            self.process.terminate()

# Usage example
async def test_login_flow():
    """Test the login flow."""
    bot = ElectronAutomator("/path/to/your-app")
    await bot.start()
    
    try:
        # Wait for the login page to load
        await bot.wait_for_selector('#login-form')
        
        # Enter the username and password
        await type_text(bot.cdp, '#username', 'testuser@example.com')
        await type_text(bot.cdp, '#password', 'securePassword123')
        
        # Click the login button
        await click_element(bot.cdp, '#login-button')
        
        # Wait for login to finish (the dashboard appears)
        await bot.wait_for_selector('.dashboard-container', timeout=15)
        
        # Verify the logged-in state
        title = await bot.execute_js('document.title')
        assert 'Dashboard' in title, f"Expected Dashboard, got: {title}"
        
        # Save a screenshot
        await take_screenshot(bot.cdp, '/tmp/login_success.png')
        
        print("✓ Login flow test passed")
    
    finally:
        await bot.stop()

asyncio.run(test_login_flow())
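
Waiting for a selector is one case of a more general pattern: poll a condition until it holds or a deadline passes. The sketch below factors that out so the same loop can wait for a title change, a network-idle flag, or anything else. `wait_until` is a hypothetical helper, and the demo probe stands in for a real CDP query.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def wait_until(probe: Callable[[], Awaitable[T]],
                     timeout: float = 10.0, interval: float = 0.1) -> T:
    """Poll `probe` until it returns a truthy value or `timeout` seconds elapse."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout  # monotonic clock, immune to wall-clock jumps
    while True:
        value = await probe()
        if value:
            return value
        if loop.time() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        await asyncio.sleep(interval)


async def demo() -> int:
    attempts = 0

    async def ready() -> bool:
        # Stand-in for a real check such as "does this selector exist yet?"
        nonlocal attempts
        attempts += 1
        return attempts >= 3

    await wait_until(ready, timeout=1.0, interval=0.01)
    return attempts


print(asyncio.run(demo()))  # 3
```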

Common Pitfalls and Fixes

Pitfall 1: Handling Multiple Windows (BrowserWindow)


Problem: an Electron app can have several BrowserWindows, so which one is the CDP connection attached to?

Fix:
┌──────────────────────────────────────────────────────────┐
│  1. Fetch all targets from /json/list                    │
│  2. Pick the right window by URL or title                │
│  3. To drive a newly opened window, attach to its target │
│                                                          │
│  # List all windows                                      │
│  curl http://127.0.0.1:9222/json/list                   │
│                                                          │
│  # Example response:                                     │
│  [                                                      │
│    {"type":"page", "url":"file:///index.html",          │
│     "title":"Main Window", "id":"abc123"},              │
│    {"type":"page", "url":"file:///settings.html",       │
│     "title":"Settings", "id":"def456"},                 │
│    {"type":"background_page", "url":"...",              │
│     "title":"Service Worker", "id":"ghi789"}            │
│  ]                                                      │
└──────────────────────────────────────────────────────────┘
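
Step 2 can be sketched as a small filter over the `/json/list` payload. `select_target` and its parameters are illustrative names, and the sample list mirrors the example response in the box above.

```python
from typing import List, Optional


def select_target(targets: List[dict], title: Optional[str] = None,
                  url_contains: Optional[str] = None) -> dict:
    """Return the first page target that matches the given title and/or URL fragment."""
    for target in targets:
        if target.get("type") != "page":
            continue  # skip background pages, workers, and so on
        if title is not None and target.get("title") != title:
            continue
        if url_contains is not None and url_contains not in target.get("url", ""):
            continue
        return target
    raise LookupError(f"no page target matches title={title!r}, url_contains={url_contains!r}")


targets = [
    {"type": "page", "url": "file:///index.html", "title": "Main Window", "id": "abc123"},
    {"type": "page", "url": "file:///settings.html", "title": "Settings", "id": "def456"},
    {"type": "background_page", "url": "...", "title": "Service Worker", "id": "ghi789"},
]
print(select_target(targets, title="Settings")["id"])            # def456
print(select_target(targets, url_contains="index.html")["id"])   # abc123
```

Matching on a URL fragment tends to be more stable than matching on the title, since titles often change with application state or locale.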

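Pitfall 2: Security Restrictions Around Node Integration


# Problem: with nodeIntegration disabled or contextIsolation enabled,
# code run through Runtime.evaluate cannot reach Node.js APIs.

# Workaround: forward Node work to the main process over an IPC channel.
# Assumptions: the app itself registers an 'automation-execute' handler in the
# main process that replies on 'automation-result', and `require` is reachable
# from the page. With contextIsolation enabled, call a bridge exposed by the
# app's preload script instead.
async def execute_node_code(cdp: CDPClient, code: str):
    """Run Node code in the main process via IPC and return the result."""
    # json.dumps turns the code into a safely quoted JavaScript string literal
    expression = f"""
        (function() {{
            const {{ ipcRenderer }} = require('electron');
            return new Promise((resolve) => {{
                // Register the listener before sending so the reply is not missed
                ipcRenderer.once('automation-result', (event, result) => {{
                    resolve(JSON.stringify(result));
                }});
                ipcRenderer.send('automation-execute', {json.dumps(code)});
            }});
        }})()
    """
    response = await cdp.send_command("Runtime.evaluate", {
        "expression": expression,
        "awaitPromise": True,
        "returnByValue": True
    })
    return response['result']['result'].get('value')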

Pitfall 3: Shadow DOM and Web Components


# Problem: querySelector/querySelectorAll do not cross shadow boundaries.
# Workaround: walk the chain manually through each host's shadowRoot
# (this only works for open shadow roots).

async def query_shadow_dom(cdp: CDPClient, selector_chain: list) -> dict:
    """
    Query an element through one or more shadow roots.
    selector_chain: e.g. ['shadow::#host-element', '#inner-element'];
    a 'shadow::' prefix means "find this host, then enter its shadow root".
    Returns the RemoteObject; it has an objectId when the element was found.
    """
    js_code = """
    (function() {
        const selectors = SELECTORS_PLACEHOLDER;
        let current = document;
        
        for (const sel of selectors) {
            if (!current) return null;
            if (sel.startsWith('shadow::')) {
                // Find the host element, then step into its shadow root
                const host = current.querySelector(sel.slice('shadow::'.length));
                current = host ? host.shadowRoot : null;
            } else {
                current = current.querySelector(sel);
            }
        }
        return current;
    })()
    """.replace("SELECTORS_PLACEHOLDER", json.dumps(selector_chain))
    
    result = await cdp.send_command("Runtime.evaluate", {
        "expression": js_code
    })
    return result['result']['result']

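Pitfall 4: File Upload Dialogs


# Problem: clicking an upload button opens the system file picker,
# and CDP cannot drive native operating-system dialogs.

# Fix: set the files directly with DOM.setFileInputFiles
async def upload_file(cdp: CDPClient, input_selector: str, file_paths: list):
    """
    Bypass the system dialog and set the upload files directly.
    """
    # Locate the input[type=file] element
    element = await get_element(cdp, input_selector)
    
    # Set the file paths through CDP (no system dialog is shown)
    await cdp.send_command("DOM.setFileInputFiles", {
        "nodeId": element['nodeId'],
        "files": file_paths  # List of absolute paths
    })

# Usage example (await must run inside a coroutine)
async def upload_example(cdp: CDPClient):
    await upload_file(cdp, 'input[type="file"]', [
        '/home/user/documents/report.pdf',
        '/home/user/images/screenshot.png'
    ])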

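Pitfall 5: WebSocket Reconnects and Timeouts


class CDPConnectionError(Exception):
    """Raised when a command still fails after all retries."""


class RobustCDPClient(CDPClient):
    """CDP client with automatic reconnection."""
    
    def __init__(self, ws_url: str, max_retries: int = 3):
        super().__init__(ws_url)
        self.max_retries = max_retries
        self._connected = False
    
    async def connect(self):
        await super().connect()
        self._connected = True  # Without this flag every command would reconnect
    
    async def send_command(self, method: str, params: dict = None) -> dict:
        """Send a command, reconnecting and retrying on failure."""
        # Caution: retrying a non-idempotent command (a click, say) can repeat its effect
        for attempt in range(self.max_retries):
            try:
                if not self._connected:
                    # Enabled domains and cached nodeIds do not survive a reconnect
                    await self.connect()
                
                return await asyncio.wait_for(
                    super().send_command(method, params),
                    timeout=30.0
                )
            except (websockets.ConnectionClosed, asyncio.TimeoutError, OSError) as e:
                self._connected = False
                if attempt < self.max_retries - 1:
                    await asyncio.sleep(2 ** attempt)  # Exponential backoff
                else:
                    raise CDPConnectionError(
                        f"Failed after {self.max_retries} attempts: {e}"
                    ) from e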
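
The retry loop sleeps `2 ** attempt` seconds between attempts, which grows quickly if `max_retries` is raised. The sketch below computes the same schedule with an upper bound so the delays can be inspected or tested without sleeping. `backoff_delays`, `base`, and `cap` are hypothetical names introduced for illustration.

```python
from typing import List


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays between attempts: base * 2**n, capped at `cap`.

    There is one delay fewer than there are attempts, because nothing is
    slept after the final failure.
    """
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries - 1)]


print(backoff_delays(3))           # [1.0, 2.0]
print(backoff_delays(6, cap=10))   # [1.0, 2.0, 4.0, 8.0, 10]
```

Adding a small random jitter to each delay is a common refinement when many test workers reconnect at the same time.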

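Performance Tips


CDP automation performance checklist:
┌──────────────────────────────────────────────────────────────┐
│  1. Reduce round trips                                       │
│     · Batch multi-step work into one Runtime.evaluate call   │
│     · Return all needed properties from a single JS call     │
│                                                              │
│  2. Avoid repeated DOM queries                               │
│     · Cache nodeId values, but refresh after DOM changes     │
│     · Measure before swapping CSS selectors for XPath        │
│                                                              │
│  3. Optimize screenshots                                     │
│     · JPEG is usually faster to encode than PNG              │
│     · Capture only the region you need, not the full page    │
│     · Lower resolution (Emulation.setDeviceMetricsOverride)  │
│                                                              │
│  4. Parallelize                                              │
│     · Multiple BrowserWindows can be driven in parallel      │
│     · Independent test cases can run in parallel             │
│                                                              │
│  5. Avoid unnecessary waits                                  │
│     · Poll for a selector instead of using a fixed sleep     │
│     · Wait for Page.loadEventFired instead of a fixed delay  │
└──────────────────────────────────────────────────────────────┘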
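
As an illustration of the first item, several property reads can be folded into one `Runtime.evaluate` expression instead of one command per element. `batch_text_expression` is a hypothetical builder; the resulting string would be sent with `returnByValue` set to true so the whole object comes back in a single response.

```python
import json
from typing import Dict


def batch_text_expression(selectors: Dict[str, str]) -> str:
    """Build one JS expression that reads textContent for several selectors.

    `selectors` maps a result name to a CSS selector. The expression evaluates
    to an object with the same names; missing elements map to null.
    """
    return (
        "(() => {"
        # json.dumps embeds the mapping as a safely quoted JS object literal
        f"const selectors = {json.dumps(selectors)};"
        "const out = {};"
        "for (const [name, sel] of Object.entries(selectors)) {"
        "const el = document.querySelector(sel);"
        "out[name] = el ? el.textContent : null;"
        "}"
        "return out;"
        "})()"
    )


expression = batch_text_expression({"title": "h1", "user": "#username"})
print(expression)
```

Three lookups done this way cost one round trip instead of three or more, which matters most when the script and the app communicate over a slow link.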

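CDP gives Electron automation an "official channel": it does not depend on screen coordinates or third-party frameworks, and it talks to the Chromium engine directly. The only real cost is that the API is low level, so you have to write your own convenience wrappers. For scenarios that need high stability and deep control, such as automated testing in CI/CD or RPA workflows, this approach is currently the most reliable choice.

In a real project, consider wrapping the code above in an internal SDK. Upper-level business code then only has to decide which button to click, what to type, and what to verify, while the CDP communication details stay hidden from its users. This keeps the automation stable and lowers the barrier for the rest of the team.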


Original link: https://wenyiblog.top/2026/06/cdp-electron-automation/

First published on 文艺技术笔记 (wenyiblog.top). Please credit the source when reposting.

Posted 2026-06-22 19:29 by 软件工程师文艺