python: installing crawl4ai
1. Project repository:
https://github.com/unclecode/crawl4ai
2. Install via pip:
$ mkdir crawl4ai
$ cd crawl4ai/
$ python3 -m venv venv
$ source venv/bin/activate
(venv) liuhongdi@liuhongdi-pc:/data/python/crawl4ai$ pip install -U crawl4ai
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Then run the post-install setup command:
(venv) liuhongdi@liuhongdi-pc:/data/python/crawl4ai$ crawl4ai-setup
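After pip install and crawl4ai-setup, you can quickly confirm that the package is visible to the active virtualenv by querying its installed metadata. The sketch below uses only the standard library (importlib.metadata). The helper name installed_version is hypothetical and is not part of crawl4ai.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "crawl4ai") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing.

    Hypothetical helper: it only reads pip's installed-package metadata and
    does not check whether crawl4ai-setup finished installing browsers.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("crawl4ai is not installed in this environment")
    else:
        print(f"crawl4ai {v} is installed")
```

Run it with the venv activated. If it reports that crawl4ai is missing, the pip install step most likely ran outside the venv.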
3. Test it:
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Start the crawler's browser; the context manager closes it on exit
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            # url="https://movie.douban.com/explore?support_type=movie&is_all=false&category=%E7%83%AD%E9%97%A8&type=%E5%85%A8%E9%83%A8",
            url="https://baidu.com",
            # js_code="window.scrollTo(0, document.body.scrollHeight);",
            timeout=6000,  # 6-second timeout
            # wait_for="document.querySelector('.drc-subject-card')",
            # wait_for="css:.drc-subject-card"
        )
        # Page content converted to Markdown
        print(result.markdown)
        # The whole crawl result serialized as a JSON string
        result_json = result.model_dump_json()
        print(result_json)

if __name__ == "__main__":
    asyncio.run(main())
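model_dump_json() prints the entire result as a single JSON string, which is usually more than you need. The minimal sketch below pulls out a few fields. It assumes the JSON has top-level keys url, success and status_code. These key names come from crawl4ai's CrawlResult model and may differ between versions, so verify them against your own output. The helper summarize_result_json and the sample string are hypothetical, and the sample is not real crawler output.

```python
import json
from typing import Any, Dict

# Assumed top-level keys of the CrawlResult JSON; verify against your version
FIELDS = ("url", "success", "status_code")


def summarize_result_json(result_json: str) -> Dict[str, Any]:
    """Return only a few assumed top-level fields from a result's JSON string.

    Keys absent from the JSON come back as None instead of raising.
    """
    data = json.loads(result_json)
    return {key: data.get(key) for key in FIELDS}


if __name__ == "__main__":
    # Hand-written stand-in for result.model_dump_json(); NOT real crawl4ai output
    sample = '{"url": "https://baidu.com", "success": true, "status_code": 200, "html": "<html></html>"}'
    print(summarize_result_json(sample))
```

In the test script, you would call it as print(summarize_result_json(result.model_dump_json())) in place of printing the full JSON.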