Python: Installing crawl4ai

1. Project URL:

https://github.com/unclecode/crawl4ai

 

2. Install with pip:

$ mkdir crawl4ai
$ cd crawl4ai/
$ python3 -m venv venv
$ source venv/bin/activate
(venv) liuhongdi@liuhongdi-pc:/data/python/crawl4ai$ pip install -U crawl4ai
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple

Then run the post-install setup command:

(venv) liuhongdi@liuhongdi-pc:/data/python/crawl4ai$ crawl4ai-setup
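
Before moving on, it can help to confirm that the package really landed in the active virtual environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this illustration and is not part of crawl4ai.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this inside the activated venv so it inspects the right environment
    found = installed_version("crawl4ai")
    if found is None:
        print("crawl4ai is not installed in this environment")
    else:
        print(f"crawl4ai {found} is installed")
```

If this reports that the package is missing, the usual cause is that the venv was not activated before running pip.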

3. Test it:

import asyncio
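from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    # Per-crawl options go in CrawlerRunConfig; page_timeout is in milliseconds
    config = CrawlerRunConfig(
        page_timeout=6000,  # 6-second timeout
        # js_code="window.scrollTo(0, document.body.scrollHeight);",
        # wait_for="document.querySelector('.drc-subject-card')",
        # wait_for="css:.drc-subject-card",
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            # url="https://movie.douban.com/explore?support_type=movie&is_all=false&category=%E7%83%AD%E9%97%A8&type=%E5%85%A8%E9%83%A8",
            url="https://baidu.com",
            config=config,
        )
        # The page content converted to Markdown
        print(result.markdown)
        # Serialize the whole crawl result (not only the HTML) to JSON
        result_json = result.model_dump_json()
        print(result_json)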

if __name__ == "__main__":
    asyncio.run(main())
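
The test script sets a 6-second timeout on the crawl. If you also want a hard cap on the whole call regardless of how the library handles its own timeout, one option is to wrap the awaitable in `asyncio.wait_for`. The sketch below is only an illustration using the standard library: `run_with_deadline` and `slow_task` are hypothetical names, and `slow_task` merely stands in for a real crawl call.

```python
import asyncio
from typing import Any, Awaitable, Optional


async def run_with_deadline(awaitable: Awaitable[Any], seconds: float) -> Optional[Any]:
    """Await `awaitable`, returning None if it takes longer than `seconds`."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising
        return None


async def slow_task(delay: float) -> str:
    # Hypothetical stand-in for a crawl call such as crawler.arun(...)
    await asyncio.sleep(delay)
    return "done"


if __name__ == "__main__":
    print(asyncio.run(run_with_deadline(slow_task(0.01), 1.0)))  # prints "done"
    print(asyncio.run(run_with_deadline(slow_task(1.0), 0.01)))  # prints "None"
```

Inside `main()` the same wrapper could be applied as `await run_with_deadline(crawler.arun(url="https://baidu.com"), 6.0)`, with a `None` result treated as a timed-out crawl.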

 

Posted 2025-11-20 22:10 by 刘宏缔的架构森林