I. Problems using pytesseract to recognize text in images - 11
1. Install pytesseract
- Install directory: d:\python\lib\site-packages
With pip 9.0.3 the install fails, because pytesseract 0.3.9 requires Python >= 3.7 and the running Python is 3.6.5. After pip is upgraded to 21.3.1, it falls back to pytesseract 0.3.8 instead:
C:\Users\jieqiong>pip install pytesseract
Collecting pytesseract
  Downloading https://files.pythonhosted.org/packages/8b/0d/6efe2a9bddf1b1efe82a86fdd057f4affaeebd14347f32d03bbbbc45821c/pytesseract-0.3.9-py2.py3-none-any.whl
pytesseract requires Python '>=3.7' but the running Python is 3.6.5
You are using pip version 9.0.3, however version 22.2.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
C:\Users\jieqiong>python -m pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/a4/6d/6463d49a933f547439d6b5b98b46af8742cc03ae83543e4d7688c2420f8b/pip-21.3.1-py3-none-any.whl (1.7MB)
    100% |████████████████████████████████| 1.7MB 64kB/s
Installing collected packages: pip
  Found existing installation: pip 9.0.3
    Uninstalling pip-9.0.3:
      Successfully uninstalled pip-9.0.3
Successfully installed pip-21.3.1
You are using pip version 21.3.1, however version 22.2.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
C:\Users\jieqiong>pip install pytesseract
Collecting pytesseract
  Using cached pytesseract-0.3.9-py2.py3-none-any.whl (14 kB)
Requirement already satisfied: Pillow>=8.0.0 in d:\python\lib\site-packages (from pytesseract) (8.4.0)
Collecting packaging>=21.3
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB 653 kB/s
Collecting pytesseract
  Downloading pytesseract-0.3.8.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Using legacy 'setup.py install' for pytesseract, since package 'wheel' is not installed.
Installing collected packages: pytesseract
    Running setup.py install for pytesseract ... done
Successfully installed pytesseract-0.3.8
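The first failure in the log is pip comparing the running interpreter against the ">=3.7" Python requirement of pytesseract 0.3.9. A minimal sketch of that comparison using plain version tuples (satisfies_min_python is a hypothetical helper; pip's real check also handles pre-releases and other specifier forms that this ignores):

```python
import sys
from typing import Tuple


def satisfies_min_python(
    minimum: Tuple[int, ...],
    running: Tuple[int, ...] = tuple(sys.version_info[:3]),
) -> bool:
    """Return True if the running interpreter meets a '>=minimum' requirement.

    Hypothetical helper: compares version tuples element by element, so
    (3, 6, 5) >= (3, 7) is False and (3, 10, 0) >= (3, 7) is True.
    """
    return tuple(running) >= tuple(minimum)


if __name__ == "__main__":
    # The interpreter from the log above does not meet pytesseract 0.3.9's floor.
    print(satisfies_min_python((3, 7), (3, 6, 5)))
```

On the 3.6.5 interpreter from the log this returns False. Pinning the release that the log shows pip eventually installed (pip install pytesseract==0.3.8) is one way to reach the same result without relying on pip's fallback.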
2. Error when running
D:\imooc\selenium\read_image.py
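# coding=utf-8
# Package for recognizing text in images
import pytesseract
# Package for loading images
from PIL import Image

# Create an image object by opening the image file
image = Image.open("D:/imooc/imooc2.jpg")
# Use pytesseract to convert the image object into a string
text = pytesseract.image_to_string(image)
print(text)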
PS D:\imooc\selenium> python .\read_image.py
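TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
pytesseract is only a wrapper that calls the tesseract command-line program, so the Tesseract-OCR engine itself must be installed separately and either be on PATH or have its location set through pytesseract.pytesseract.tesseract_cmd.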
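By default pytesseract's tesseract_cmd is the bare name "tesseract", which is resolved through PATH when pytesseract starts the program. A minimal sketch for checking ahead of time whether that lookup will succeed, using the standard library's shutil.which (find_tesseract is a hypothetical helper name):

```python
import shutil
from typing import Optional


def find_tesseract(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the tesseract executable, or None if not found.

    Hypothetical helper. search_path=None means "use the PATH environment
    variable", which is where the bare name "tesseract" is looked up.
    """
    return shutil.which("tesseract", path=search_path)


if __name__ == "__main__":
    location = find_tesseract()
    if location is None:
        print("tesseract not found on PATH; install Tesseract-OCR or set tesseract_cmd")
    else:
        print("tesseract found at", location)
```

If this prints "not found", image_to_string would fail with the same TesseractNotFoundError shown above.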
3. Install Tesseract-OCR
Reference: Tesseract-OCR installation, Chinese recognition, and training language data - Jianshu (jianshu.com)
C:\Users\jieqiong>tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.1
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
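The same check can be run from Python instead of the shell. The sketch below runs tesseract -v and pulls the version out of the first line of output. The helper names are hypothetical. stderr is merged into stdout because some Tesseract releases may print the version banner to stderr. pytesseract also provides its own get_tesseract_version() for this purpose.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version from the first line of `tesseract -v` output.

    Hypothetical helper: expects a line like "tesseract 4.00.00alpha".
    """
    match = re.match(r"\s*tesseract\s+(\S+)", output)
    return match.group(1) if match else None


def installed_tesseract_version() -> Optional[str]:
    """Run `tesseract -v` if the binary is on PATH and return its version."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr in case the banner goes there
        universal_newlines=True,   # decode output as text (works on Python 3.6)
    )
    return parse_tesseract_version(result.stdout)


if __name__ == "__main__":
    print(installed_tesseract_version())
```

Parsing the banner from the log above, parse_tesseract_version("tesseract 4.00.00alpha\n leptonica-1.74.1") returns "4.00.00alpha".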
4. Modified code
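# coding=utf-8
# Package for recognizing text in images
import pytesseract
# Package for loading images
from PIL import Image

# Point pytesseract at the tesseract executable. A plain local variable
# (tesseract_cmd = ...) has no effect; the setting lives on pytesseract.pytesseract.
# Use a raw string for the Windows path and include the executable name
# (assumed to be tesseract.exe inside the install directory).
# Only needed if tesseract is not already on PATH.
pytesseract.pytesseract.tesseract_cmd = r'D:\Python\Tesseract-OCR\tesseract.exe'

# Create an image object by opening the image file
image = Image.open("D:/imooc/imooc2.jpg")
# Use pytesseract to convert the image object into a string
text = pytesseract.image_to_string(image)
print(text)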
PS D:\imooc\selenium> python .\read_image.py
0 6 6: 4 7.bmp 94-9 7 1 22.bmp l
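pytesseract reads the executable location from the module attribute pytesseract.pytesseract.tesseract_cmd, and the value must name the executable itself rather than its folder. Below is a sketch of a helper that chooses that value, assuming a standard install layout. resolve_tesseract_cmd is a hypothetical name, not part of pytesseract.

```python
import os
import shutil
from typing import Optional


def resolve_tesseract_cmd(install_dir: Optional[str] = None) -> Optional[str]:
    """Pick a value for pytesseract.pytesseract.tesseract_cmd.

    Hypothetical helper: prefers a tesseract already on PATH, otherwise
    looks for the executable inside install_dir. Returns None if neither
    location has it.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if install_dir:
        # Assumption: the executable sits directly in the install directory,
        # named tesseract.exe on Windows and tesseract elsewhere.
        exe_name = "tesseract.exe" if os.name == "nt" else "tesseract"
        candidate = os.path.join(install_dir, exe_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

On the machine in this post, one would call resolve_tesseract_cmd(r"D:\Python\Tesseract-OCR") and, if the result is not None, assign it to pytesseract.pytesseract.tesseract_cmd before calling image_to_string.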