Recognizing text in local images with Python

1. First, install tesseract-ocr

https://digi.bib.uni-mannheim.de/tesseract/

 

2. Install the required Python module

pip install pytesseract

 

3. Download the language packs

https://github.com/tesseract-ocr/tessdoc/blob/main/Data-Files.md

 

 

 

 

After downloading, copy the files into the Tesseract-OCR\tessdata directory.
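
A quick way to confirm the copy worked is to check that each language's .traineddata file is actually present in tessdata. The sketch below assumes the packs keep their download names (eng.traineddata, chi_sim.traineddata, and so on); missing_languages is a made-up helper name, not part of pytesseract.

```python
import os


def missing_languages(tessdata_dir, langs):
    """Return the language codes whose .traineddata file is absent.

    tessdata_dir: path of the Tesseract-OCR\\tessdata directory
    langs: iterable of language codes such as "eng" or "chi_sim"
    """
    return [
        lang
        for lang in langs
        if not os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
    ]
```

For example, missing_languages('D:/software/Tesseract-OCR/tessdata', ['eng', 'chi_sim', 'chi_tra']) should return an empty list once all three packs used in the test code are in place. The directory shown is only an assumption about where Tesseract-OCR is installed; substitute your own path.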

 

Test code:

#coding=utf-8

from PIL import Image
import pytesseract
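# The lines above are just imports; the single call below is all it takes to recognize text in an image (English here, Chinese further down)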
text=pytesseract.image_to_string(Image.open('D:/workspace/pys/img/yingwen.jpg'),lang='eng')
print(text)

print("\n")
print('='*100)
print("\n")

# chi_sim: Simplified Chinese
text=pytesseract.image_to_string(Image.open('D:/workspace/pys/img/hanzi.jpg'),lang='chi_sim')
print(text)

print("\n")
print('='*100)
print("\n")

# chi_tra: Traditional Chinese
text=pytesseract.image_to_string(Image.open('D:/workspace/pys/img/hanzi.jpg'),lang='chi_tra')
print(text)
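
The three calls above differ only in the language code, so they can be folded into one loop. This is a minimal sketch rather than a drop-in replacement: recognize_in_languages is a hypothetical helper, and it relies on pytesseract.image_to_string accepting a file path as well as a PIL image. The ocr parameter is an addition for illustration, so the helper can be exercised without Tesseract installed.

```python
def recognize_in_languages(image_path, langs, ocr=None):
    """Run OCR on one image once per language code and return {lang: text}.

    ocr: callable taking (image, lang=...) and returning a string.
         Defaults to pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily so the helper can be loaded without pytesseract.
        import pytesseract
        ocr = pytesseract.image_to_string
    return {lang: ocr(image_path, lang=lang) for lang in langs}
```

Calling recognize_in_languages('D:/workspace/pys/img/hanzi.jpg', ['chi_sim', 'chi_tra']) would then give one recognized string per language, which can be printed in a loop instead of repeating the call by hand.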

 

Result:

 

 

Errors encountered:

1. Tesseract-OCR cannot be found

\python38\Lib\site-packages\pytesseract\pytesseract.py

Edit this line in the file above so that it points to the tesseract.exe of your actual Tesseract-OCR installation:

tesseract_cmd = 'D:/software/Tesseract-OCR/tesseract.exe'
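
A change made inside site-packages is lost whenever pytesseract is reinstalled or upgraded. An alternative is to set pytesseract.pytesseract.tesseract_cmd from your own script. The sketch below shows one way to do that; find_tesseract and configure_pytesseract are hypothetical helper names, and the candidate paths are whatever install locations you want to try.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return the first existing candidate path, else whatever is on PATH.

    Returns None when no candidate exists and PATH has no tesseract.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("tesseract")


def configure_pytesseract(candidates=()):
    """Point pytesseract at the located executable and return its path."""
    # Imported lazily so find_tesseract can be used without pytesseract.
    import pytesseract

    path = find_tesseract(candidates)
    if path is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

With the install location used in this post, calling configure_pytesseract(['D:/software/Tesseract-OCR/tesseract.exe']) once before the first image_to_string call should have the same effect as editing pytesseract.py.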

  

 

 

2. The error says the requested language pack cannot be found: see step 3 above (downloading the language packs).

 

posted @ 2023-02-15 16:22  河北大学-徐小波