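Python Crawler Environment Setup
Environment Setup
python3 / request libraries / parsing libraries / databases / storage libraries / web libraries / app crawling libraries / crawler frameworks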
- python3
  - On Windows 11, Python 3 can now be installed directly from the Microsoft Store.
  - On Linux:
    apt-get install python3
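After installing, it can be worth confirming that the interpreter you actually launch is Python 3 and not an older system Python. A minimal sketch; the helper name `is_python3` is made up for this example:

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or later."""
    return tuple(version_info)[:1] >= (3,)


if __name__ == "__main__":
    print(sys.version)
    print("Python 3 interpreter:", is_python3())
```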
- Request libraries
  - requests
    pip3 install requests
  - selenium
    pip install selenium
  - chromeDriver
    - Check the Chrome version under "About Chrome".
    - Download the matching chromeDriver version.
    - Add chromeDriver to the PATH environment variable.
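To check the last step, one option is to ask Python whether the driver can be found on PATH. This is only a sketch: the helper name `find_executable` is hypothetical, and it assumes the binary is named `chromedriver`.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    # "chromedriver" is the assumed binary name; adjust if yours differs.
    path = find_executable("chromedriver")
    if path is None:
        print("chromedriver was not found on PATH")
    else:
        print("chromedriver found at", path)
```

If this prints "not found", reopening the terminal after editing the environment variable is usually the first thing to try.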
  - PhantomJS
    Newer versions of selenium no longer support PhantomJS; headless mode is available directly through chromedriver instead.
    Verification:
        from selenium import webdriver
        from selenium.webdriver.chrome.options import Options

        # Run Chrome without opening a window
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--disable-gpu')

        driver = webdriver.Chrome(options=chrome_options)
        driver.get("https://dreaife.icu/")
        print(driver.current_url)
        driver.quit()  # close the browser when done
  - aiohttp
    pip install aiohttp
    Optionally, install aiodns as well for faster DNS resolution:
    pip install aiodns
- Parsing libraries
  - lxml
    pip install lxml
  - beautifulsoup4
    pip install beautifulsoup4
  - pyquery
    pip install pyquery
  - tesserocr
    - Install tesseract.
    - Install tesserocr.
      On Windows, install it from a wheel file:
      pip install <name>.whl
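A wheel only installs if its file name matches your interpreter and platform. The sketch below prints the two tags to look for in the `.whl` name, for example `cp311` and `win_amd64` for 64-bit CPython 3.11 on Windows. It assumes CPython, and the helper name `wheel_tags` is made up for illustration.

```python
import sys
import sysconfig


def wheel_tags():
    """Return the (python_tag, platform_tag) pair a matching wheel name should contain.

    Assumes CPython; other interpreters use a different prefix than "cp".
    """
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Wheel file names use underscores where the platform string has "-" or ".".
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


if __name__ == "__main__":
    python_tag, platform_tag = wheel_tags()
    print("Look for a wheel containing:", python_tag, "and", platform_tag)
```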
    - Verification
          import tesserocr
          from PIL import Image

          image = Image.open('G:/codeS/backOnGithub/Jupyter/spider/image.png')
          print(tesserocr.image_to_text(image))
      Note: if the following error appears:
          File "tesserocr.pyx", line 2580, in tesserocr._tesserocr.image_to_text
          RuntimeError: Failed to init API, possibly an invalid tessdata path
      first copy tesseract's tessdata directory into the folder named in the error message.
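Before copying files around, it may help to confirm that a candidate folder really holds language data. A rough sketch, assuming the data files use tesseract's usual `.traineddata` extension; the helper name `find_traineddata` and the `./tessdata` fallback path are placeholders, not anything tesserocr provides:

```python
import os
from pathlib import Path


def find_traineddata(tessdata_dir):
    """Return the sorted language names that have a .traineddata file in the directory."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))


if __name__ == "__main__":
    # TESSDATA_PREFIX is the environment variable tesseract consults for its
    # data directory; "./tessdata" is only a placeholder fallback.
    candidate = os.environ.get("TESSDATA_PREFIX", "./tessdata")
    languages = find_traineddata(candidate)
    print(candidate, "->", languages or "no .traineddata files found")
```

An empty result for the folder named in the error would be consistent with the "invalid tessdata path" message.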
- Databases
  - MySQL
  - MongoDB
  - Redis
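Once the three servers are installed, a quick way to see whether they are running is to try a TCP connection to each one. The sketch below assumes they run on the local machine with their default ports (3306, 27017, 6379); `port_is_open` is a hypothetical helper, and an open port only shows that something is listening, not that it is the expected server.

```python
import socket

# Default ports of the three servers; adjust if your installation differs.
DEFAULT_PORTS = {"MySQL": 3306, "MongoDB": 27017, "Redis": 6379}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```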
- Storage libraries
  - PyMySQL
    pip install pymysql
  - PyMongo
    pip install pymongo
  - redis-py
    pip install redis
  - RedisDump
    Install Ruby first, then run:
    gem install redis-dump
- Web libraries
  - Flask
    pip install flask
  - Tornado
    pip install tornado
- App crawling libraries
  - charles
  - mitmproxy
    pip install mitmproxy
  - appium
- Crawler frameworks
  - pyspider
    pip install pyspider
    If it fails to run on Windows 11, see my other post on that problem.
  - scrapy
  - scrapy-splash
  - scrapy-redis
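With this many packages it is easy to miss one. The sketch below reports which of the Python libraries from this post cannot be imported yet and builds a pip command for them. The helper names are made up for illustration, and the command goes through `sys.executable -m pip` so that it targets the interpreter running the script, which sidesteps the `pip` versus `pip3` question.

```python
import importlib.util
import sys

# pip package name -> importable module name (they differ for a few packages).
PACKAGES = {
    "requests": "requests",
    "selenium": "selenium",
    "aiohttp": "aiohttp",
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "pyquery": "pyquery",
    "tesserocr": "tesserocr",
    "pymysql": "pymysql",
    "pymongo": "pymongo",
    "redis": "redis",
    "flask": "flask",
    "tornado": "tornado",
    "mitmproxy": "mitmproxy",
    "pyspider": "pyspider",
    "scrapy": "scrapy",
    "scrapy-splash": "scrapy_splash",
    "scrapy-redis": "scrapy_redis",
}


def missing_packages(packages):
    """Return the pip names whose module cannot be found by the import system."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


def pip_install_command(pip_names):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *pip_names]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with:", " ".join(pip_install_command(missing)))
    else:
        print("All listed packages are importable.")
```

This only checks that a module can be located; it does not verify external tools such as chromeDriver, tesseract, or the database servers.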
