Python: Modifying GET/POST Request Parameters with Playwright (Intercepting and Rewriting Requests)

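First, note that after Playwright modifies the parameters of a GET or POST request, the Network panel of the browser (Chromium) still shows the original parameters, even though the parameters actually sent to the server have changed. To demonstrate this, a Flask service that echoes back the parameters it receives was set up at "http://127.0.0.1:8083".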
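The Flask service itself is not shown in the original post. As a stand-in, the following is a minimal sketch of an echo service built only on the Python standard library (not Flask). The names `EchoHandler` and `start_echo_server` and the JSON response shape are assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class EchoHandler(BaseHTTPRequestHandler):
    """Replies with the parameters the server actually received, as JSON."""

    def _reply(self, params):
        body = json.dumps({"method": self.command, "params": params}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Query-string parameters, e.g. ?key1=value1&key2=value2
        self._reply(parse_qs(urlparse(self.path).query))

    def do_POST(self):
        # Form-encoded body, e.g. key1=value1&key2=value2
        length = int(self.headers.get("Content-Length", 0))
        self._reply(parse_qs(self.rfile.read(length).decode("utf-8")))

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_echo_server(host="127.0.0.1", port=8083):
    """Start the echo server in a background thread and return the server object."""
    server = ThreadingHTTPServer((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_echo_server()` serves on port 8083 until the process exits or `server.shutdown()` is called. Because the response contains what the server parsed, it shows the rewritten values even when the Network panel does not.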

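Next, Playwright sends one GET request and one POST request, each with the parameters {"key1": "value1", "key2": "value2"}, and the following is implemented:

1. Change the value of key1 in the GET request to "GET";

2. Change the value of key1 in the POST request to "POST".

First, we need a function that handles the GET/POST parameters and rewrites the value of key1: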

from urllib import parse


async def handle_route(route):
    request = route.request
    if request.method == "GET":
        print(f"GET request URL: {request.url}")
        bits = list(parse.urlparse(request.url))
        qs = parse.parse_qs(bits[4], keep_blank_values=True)
        qs["key1"] = ["GET"]  # replace the value of key1 here
        bits[4] = parse.urlencode(qs, doseq=True)
        url = parse.urlunparse(bits)
        print(f"Modified URL: {url}\n")
        await route.continue_(url=url)
    elif request.method == "POST":
        print(f"POST request body: {request.post_data}")
        # Parse the form-encoded body instead of substring matching, so that a
        # field such as "xkey1" is not mistaken for "key1".
        pairs = parse.parse_qsl(request.post_data or "", keep_blank_values=True)
        pairs = [(k, "POST" if k == "key1" else v) for k, v in pairs]  # replace the value of key1 here
        post_data = parse.urlencode(pairs)
        print(f"Modified body: {post_data}\n")
        await route.continue_(post_data=post_data)
    else:
        # Let every other method through unchanged so the request does not hang.
        await route.continue_()
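The parameter rewriting can also be pulled out into pure functions, which makes it testable without a browser. The following is a sketch under that assumption; `set_query_param` and `set_form_field` are hypothetical helper names that do not appear in the original post, and the body is assumed to be form-encoded (application/x-www-form-urlencoded).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _replace_pair(pairs, key, value):
    """Set `key` to `value` in a list of (key, value) pairs, keeping the order.

    The pair is appended when `key` is not present.
    """
    found = False
    result = []
    for k, v in pairs:
        if k == key:
            result.append((k, value))
            found = True
        else:
            result.append((k, v))
    if not found:
        result.append((key, value))
    return result


def set_query_param(url, key, value):
    """Return `url` with the query parameter `key` set to `value`."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return urlunsplit(parts._replace(query=urlencode(_replace_pair(pairs, key, value))))


def set_form_field(post_data, key, value):
    """Return a form-encoded body with the field `key` set to `value`."""
    pairs = parse_qsl(post_data or "", keep_blank_values=True)
    return urlencode(_replace_pair(pairs, key, value))
```

Inside a route handler these would be used as `await route.continue_(url=set_query_param(route.request.url, "key1", "GET"))` and `await route.continue_(post_data=set_form_field(route.request.post_data, "key1", "POST"))`.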

The requests can then be intercepted with Playwright's context.route or page.route. The code is as follows:

import asyncio

from playwright.async_api import async_playwright


async def main():
    url = "http://127.0.0.1:8083"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()
        # "**/*" matches every URL, so handle_route sees all requests in this context.
        await context.route("**/*", handle_route)
        await page.goto(f"{url}?key1=value1&key2=value2")
        await page.wait_for_load_state("networkidle")

        print("-- Open a new page and send the POST request with JavaScript --")
        page = await context.new_page()
        await page.evaluate(
        """
            // Send a POST request by submitting a hidden form, which navigates to the target page
            function httpPost(URL, PARAMS) {
                var temp = document.createElement("form");
                temp.action = URL;
                temp.method = "post";
                temp.style.display = "none";
                for (var x in PARAMS) {
                    var opt = document.createElement("textarea");
                    opt.name = x;
                    opt.value = PARAMS[x];
                    temp.appendChild(opt);
                }
                document.body.appendChild(temp);
                temp.submit();
                return temp;
            }
            httpPost('""" + url + """', {"key1": "value1", "key2": "value2"})
        """)
        await page.wait_for_timeout(1000)
        input("Press Enter to close the browser")
        await browser.close()
        # No explicit p.stop() is needed: leaving the "async with" block stops Playwright.


if __name__ == "__main__":
    asyncio.run(main())

Intercepting and modifying network requests

You can listen for request and response events with page.on("request") and page.on("response").

from playwright.sync_api import sync_playwright as playwright
 
def run(pw):
    browser = pw.webkit.launch()
    page = browser.new_page()
    # Subscribe to "request" and "response" events.
    page.on("request", lambda request: print(">>", request.method, request.url))
    page.on("response", lambda response: print("<<", response.status, response.url))
    page.goto("https://example.com")
    browser.close()
 
with playwright() as pw:
    run(pw)

For the properties and methods of request and response, see the documentation: https://playwright.dev/python/docs/api/class-request
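Printing is fine for a quick look, but the same two events can also feed a list that is inspected afterwards. This is a small sketch of that idea; `attach_traffic_log` is a hypothetical helper name, and it relies only on `page.on`, `request.method`, `request.url`, `response.status` and `response.url`.

```python
def attach_traffic_log(page):
    """Record every request and response seen by `page`; return the log list."""
    log = []
    # Each entry is (direction, method-or-status, url).
    page.on("request", lambda request: log.append((">>", request.method, request.url)))
    page.on("response", lambda response: log.append(("<<", response.status, response.url)))
    return log
```

With the sync API this would be called as `log = attach_traffic_log(page)` before `page.goto(...)`; the list then fills up as the page loads.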

With context.route you can also intercept, modify, or mock requests. For example, to block all image requests and reduce bandwidth usage:

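import re

# Assumes `browser` is an already launched browser, as in the previous example.
context = browser.new_context()
page = context.new_page()
# The first argument of route is a glob pattern by default; a compiled regular expression object also works
context.route("**/*.{png,jpg,jpeg}", lambda route: route.abort())
context.route(re.compile(r"(\.png$)|(\.jpg$)"), lambda route: route.abort())
page.goto("https://example.com")
browser.close()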

For the properties and methods of the route object, see the documentation: https://playwright.dev/python/docs/api/class-route
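URL patterns miss images that are served without a file extension or with a query string after it. One possible alternative, sketched here for the sync API, is to decide inside the handler using `route.request.resource_type`. The handler name `block_images` and the suffix list are assumptions for illustration.

```python
from urllib.parse import urlsplit

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg")


def is_image_request(request):
    """True if the browser classifies the request as an image, or the URL path looks like one."""
    if request.resource_type == "image":
        return True
    # Fall back to the path only, so "?v=1" after the extension does not matter.
    return urlsplit(request.url).path.lower().endswith(IMAGE_SUFFIXES)


def block_images(route):
    """Route handler: abort image requests, let everything else continue."""
    if is_image_request(route.request):
        route.abort()
    else:
        route.continue_()
```

It would be registered with `context.route("**/*", block_images)`. Every request then goes through the handler, so the `else` branch that calls `route.continue_()` is required; without it, non-image requests would never complete.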

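Flexible proxy configuration

Playwright also makes it easy to set a proxy. In Puppeteer the proxy cannot be changed once the browser has been launched, which is very unfriendly for crawler-type applications, whereas Playwright can set the proxy per context. This is lightweight, and there is no need to restart the browser just to switch proxies.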

context = browser.new_context(
    proxy={"server": "http://example.com:3128", "bypass": ".example.com", "username": "", "password": ""}
)
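Since the proxy is a plain dictionary passed to `browser.new_context`, switching proxies amounts to opening a new context with different settings. The sketch below shows one way to build that dictionary and open a context per proxy; `make_proxy_settings` and `new_context_with_proxy` are hypothetical helper names, and the proxy addresses in the usage note are placeholders.

```python
def make_proxy_settings(server, username=None, password=None, bypass=None):
    """Build the `proxy` dict for browser.new_context, omitting unset fields."""
    settings = {"server": server}
    if bypass:
        settings["bypass"] = bypass
    if username:
        settings["username"] = username
        settings["password"] = password or ""
    return settings


def new_context_with_proxy(browser, server, **kwargs):
    """Open a new browser context that sends its traffic through `server`."""
    return browser.new_context(proxy=make_proxy_settings(server, **kwargs))
```

For example, looping over `["http://proxy-a.example:3128", "http://proxy-b.example:3128"]` and calling `new_context_with_proxy(browser, server)` for each gives one context per proxy on the same browser instance; each context should be closed with `context.close()` when it is no longer needed.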

 

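The GET request as finally received by the server: [screenshot not included]

The POST request as received by the server: [screenshot not included]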

 

posted @ 2023-10-25 11:22  mingruqi