Python image text recognition with Tesseract-OCR and pytesseract

1. Install pytesseract

pip install pytesseract

2. Install Tesseract-OCR
Download page: https://tesseract-ocr.github.io/tessdoc/Home.html#binaries

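2.1 Choose the build that matches your system (64-bit Windows is used as the example here).

Clicking the link takes you to the download page.

2.2 Install: keep clicking "Next" until the installation finishes (note the installation directory; you will need it when configuring the environment variable).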

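3. Configure the environment variable
Add the Tesseract-OCR installation directory from step 2.2 to the system PATH.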
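
As a quick check of this step, the sketch below looks for the tesseract executable on PATH and, if it is not there, at a fallback location. The path in DEFAULT_WINDOWS_PATH and the helper names find_tesseract and configure_pytesseract are assumptions made for illustration; replace the path with your own installation directory. pytesseract.pytesseract.tesseract_cmd is the attribute pytesseract reads to locate the binary, so setting it is an alternative to editing PATH.

```python
import os
import shutil

# Assumed install location on 64-bit Windows; change it to your own directory.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallbacks=(DEFAULT_WINDOWS_PATH,)):
    """Return the path of the tesseract executable, or None if it is not found."""
    # shutil.which searches the directories listed in the PATH environment variable.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise try the explicitly listed fallback locations.
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the executable; return the path used, or None."""
    # Imported here so find_tesseract works even before pytesseract is installed.
    import pytesseract

    exe = find_tesseract()
    if exe is not None:
        pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

If find_tesseract() returns None, the environment variable is most likely not set correctly, or the terminal was opened before the change was saved.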

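4. At this point, text recognition already works.

4.1 Recognizing Chinese

However, only English is recognized out of the box; Chinese is not supported by default. Download the language data (download page: https://tesseract-ocr.github.io/tessdoc/Data-Files ), for example chi_sim.traineddata for Simplified Chinese, and place it in the tessdata folder.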

Then configure the tessdata environment variable (TESSDATA_PREFIX, pointing at the tessdata folder).
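
To confirm that the language data ended up where Tesseract will look for it, a small check like the following may help. It is only a sketch: the helper name missing_languages is made up for illustration, and it assumes that TESSDATA_PREFIX points directly at the tessdata folder and that each language is stored as a file named after its code plus ".traineddata".

```python
import os


def missing_languages(tessdata_dir, wanted=("chi_sim", "eng")):
    """Return the codes in `wanted` that have no .traineddata file in tessdata_dir."""
    return [
        code
        for code in wanted
        if not os.path.isfile(os.path.join(tessdata_dir, code + ".traineddata"))
    ]


if __name__ == "__main__":
    # Assumption: the environment variable holds the path of the tessdata folder.
    tessdata_dir = os.environ.get("TESSDATA_PREFIX")
    if tessdata_dir is None:
        print("TESSDATA_PREFIX is not set")
    else:
        print("missing language files:", missing_languages(tessdata_dir))
```

An empty list means both language files are present in the folder.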

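Finally, set the recognition language to Chinese in the code: lang='chi_sim'

With this setting, both Chinese and English text can be recognized.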
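
Tesseract also accepts several language codes joined with "+", which can be worth trying for images that mix Chinese and English. The sketch below shows one way to build that argument; the helper names lang_argument and ocr_image are hypothetical, and it assumes both chi_sim and eng language data are installed.

```python
def lang_argument(codes):
    """Join Tesseract language codes with '+', e.g. ['chi_sim', 'eng'] -> 'chi_sim+eng'."""
    return "+".join(codes)


def ocr_image(path, codes=("chi_sim", "eng")):
    """Run OCR on the image at `path` using every language code in `codes`."""
    # Imported here so lang_argument can be used without the OCR packages installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang_argument(codes))
```

For example, ocr_image('js.png') would pass lang='chi_sim+eng' to pytesseract.image_to_string.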

Hmm, it looks like a few characters were recognized incorrectly...

Full code:

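from PIL import Image
import pytesseract
import os

# Raw string so the backslashes in the Windows path are not treated as escape sequences
os.chdir(r"G:\py\img")
img = Image.open('js.png')
# lang='chi_sim' sets the recognition language to Simplified Chinese
text = pytesseract.image_to_string(img, lang='chi_sim')
print(text)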

References:
https://blog.csdn.net/m0_62501574/article/details/122388612
https://blog.csdn.net/ungoing/article/details/123579251

posted @ 2022-06-20 16:56 Evengod
