Recognizing text in local images with Python
1. First, install tesseract-ocr
https://digi.bib.uni-mannheim.de/tesseract/
2. Install the required Python module
pip install pytesseract
3. Download the language packs
https://github.com/tesseract-ocr/tessdoc/blob/main/Data-Files.md
After downloading, copy the files into the Tesseract-OCR\tessdata directory
Test code:
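To confirm the copy worked before running any OCR, you can check the tessdata directory directly. This is a minimal sketch, assuming the language files keep their standard `<lang>.traineddata` names. The helper `missing_language_packs` is a hypothetical name, not part of pytesseract, and the directory in the example is only the install location used later in this post.

```python
from pathlib import Path


def missing_language_packs(tessdata_dir, langs):
    """Return the codes in `langs` that have no <lang>.traineddata file in tessdata_dir."""
    tessdata = Path(tessdata_dir)
    return [lang for lang in langs if not (tessdata / f"{lang}.traineddata").is_file()]


if __name__ == "__main__":
    # Assumed install location; adjust to your own Tesseract-OCR directory.
    tessdata_dir = "D:/software/Tesseract-OCR/tessdata"
    missing = missing_language_packs(tessdata_dir, ["eng", "chi_sim", "chi_tra"])
    print("Missing language packs:", missing or "none")
```

An empty result means every requested language has a matching file in that directory.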
# coding=utf-8
from PIL import Image
import pytesseract

# The lines above are only imports. A single image_to_string call is enough
# to recognize the text in an image, including Chinese.

# eng: English
text = pytesseract.image_to_string(Image.open('D:/workspace/pys/img/yingwen.jpg'), lang='eng')
print(text)
print("\n")
print('=' * 100)
print("\n")

# chi_sim: Simplified Chinese
text = pytesseract.image_to_string(Image.open('D:/workspace/pys/img/hanzi.jpg'), lang='chi_sim')
print(text)
print("\n")
print('=' * 100)
print("\n")

# chi_tra: Traditional Chinese
text = pytesseract.image_to_string(Image.open('D:/workspace/pys/img/hanzi.jpg'), lang='chi_tra')
print(text)
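When one image mixes scripts, for example Chinese text with a few English words, Tesseract accepts several language codes joined with `+` in the `lang` argument. The following is a sketch only. `join_langs` and `ocr_image` are hypothetical helper names, and each language listed still needs its pack in tessdata.

```python
def join_langs(langs):
    """Join language codes into the 'a+b' form accepted by the lang argument."""
    return "+".join(langs)


def ocr_image(path, langs=("chi_sim", "eng")):
    """Run OCR on one image file and return the text with outer whitespace removed."""
    # Imported here so join_langs can be used even where Tesseract is not installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=join_langs(langs)).strip()
```

With the test image from above, a call would look like `print(ocr_image('D:/workspace/pys/img/hanzi.jpg'))`.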
Result: [screenshot]
Errors encountered:
1. Tesseract-OCR cannot be found
\python38\Lib\site-packages\pytesseract\pytesseract.py
Edit this line in that file so that it points to the tesseract.exe you actually installed:
tesseract_cmd = 'D:/software/Tesseract-OCR/tesseract.exe'
2. A message says the language pack cannot be found: see step 3 above (download the language packs)
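For the first error, editing the file inside site-packages works, but the change is lost whenever pytesseract is reinstalled or upgraded. An alternative is to set the same `tesseract_cmd` variable from your own script. This is a sketch under assumptions: `find_tesseract` and `configure_pytesseract` are hypothetical helpers, and the candidate path is simply the install location shown above.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return the first candidate path that exists, else whatever is on PATH, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("tesseract")


def configure_pytesseract(candidates):
    """Point pytesseract at the located executable and return its path."""
    import pytesseract  # imported lazily so find_tesseract works on its own

    exe = find_tesseract(candidates)
    if exe is None:
        raise FileNotFoundError("tesseract executable not found")
    # Same variable as the line edited in pytesseract.py, set at runtime instead.
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling `configure_pytesseract(['D:/software/Tesseract-OCR/tesseract.exe'])` once, before the first `image_to_string` call, should have the same effect as the manual edit.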
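This article is from cnblogs (博客园). Author: Hebei University, Xu Xiaobo (河北大学-徐小波). When reposting, please cite the original link: https://www.cnblogs.com/xuxiaobo/p/17123528.html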