Text Recognition (OCR)

Example 1

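  • Download tesseract-ocr

  • Double-click the installer to run it

  • Accept the license agreement

  • Choose to install for all users

  • Click Next

  • Specify the installation directory

  • Start the installation

  • Click Finish

  • Configure the environment variable

  • Add the installation directory to PATH as follows: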

C:\Program Files (x86)\Tesseract-OCR
  • Verify
# Open cmd and run:
C:\Users\ychen>tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.1
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
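  • Test
# Put an image in a folder, cd into that directory in cmd, and run the command below; the recognized text is saved as result.txt in the current directory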
tesseract XXX.png result
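The same two checks (the `-v` banner and a one-off recognition) can also be scripted from Python. This is a minimal sketch using only the standard library, assuming `tesseract` is on PATH as configured above; the helper names `parse_tesseract_version`, `build_cli_command`, `ocr_with_cli` and `tesseract_version` are hypothetical. It relies on the CLI behaviour shown above, where the output base `result` becomes `result.txt`. The optional `-l` argument selects the recognition language. Some Tesseract builds print the `-v` banner to stderr rather than stdout, so the sketch merges both streams.

```python
import shutil
import subprocess
from pathlib import Path


def parse_tesseract_version(output):
    """Return the version token from the first line of `tesseract -v` output, or None."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    name, _, version = lines[0].partition(" ")  # e.g. "tesseract 4.00.00alpha"
    return version.strip() if name.lower() == "tesseract" else None


def build_cli_command(image, outbase, lang=None):
    """Build the argv for `tesseract <image> <outbase> [-l <lang>]`."""
    cmd = ["tesseract", str(image), str(outbase)]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_with_cli(image, outbase="result", lang=None):
    """Run the tesseract CLI and return the text it writes to <outbase>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH; re-check the environment variable step")
    subprocess.run(build_cli_command(image, outbase, lang), check=True)
    return Path(str(outbase) + ".txt").read_text(encoding="utf-8")


def tesseract_version():
    """Call `tesseract -v`, reading stdout and stderr together (Python 3.6 compatible)."""
    proc = subprocess.run(["tesseract", "-v"], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_tesseract_version(proc.stdout)
```

For example, `print(ocr_with_cli("XXX.png"))` performs the same recognition as the cmd command above and prints the contents of `result.txt`.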

Example 2

  • Install dependencies
C:\Users\ychen\Downloads>pip install pytesseract
Collecting pytesseract
  Using cached https://mirrors.aliyun.com/pypi/packages/8b/0d/6efe2a9bddf1b1efe82a86fdd057f4affaeebd14347f32d03bbbbc45821c/pytesseract-0.3.9-py2.py3-none-any.whl
pytesseract requires Python '>=3.7' but the running Python is 3.6.3
You are using pip version 9.0.1, however version 24.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Users\ychen\Downloads>python -m pip install --upgrade pip
Cache entry deserialization failed, entry ignored
Collecting pip
  Downloading https://mirrors.aliyun.com/pypi/packages/a4/6d/6463d49a933f547439d6b5b98b46af8742cc03ae83543e4d7688c2420f8b/pip-21.3.1-py3-none-any.whl (1.7MB)
    100% |████████████████████████████████| 1.7MB 690kB/s
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
Successfully installed pip-21.3.1
You are using pip version 21.3.1, however version 24.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Users\ychen\Downloads>pip install pytesseract
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting pytesseract
  Using cached https://mirrors.aliyun.com/pypi/packages/8b/0d/6efe2a9bddf1b1efe82a86fdd057f4affaeebd14347f32d03bbbbc45821c/pytesseract-0.3.9-py2.py3-none-any.whl (14 kB)
Collecting packaging>=21.3
  Downloading https://mirrors.aliyun.com/pypi/packages/05/8e/8de486cbd03baba4deef4142bd643a3e7bbe954a784dc1bb17142572d127/packaging-21.3-py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB 523 kB/s
Collecting Pillow>=8.0.0
  Downloading https://mirrors.aliyun.com/pypi/packages/8f/10/c8dc9fff37b69b5962b7783ab4835611e83dada453cd9913d82ca2a1321b/Pillow-8.4.0-cp36-cp36m-win_amd64.whl (3.2 MB)
     |████████████████████████████████| 3.2 MB 731 kB/s
Collecting pytesseract
  Downloading https://mirrors.aliyun.com/pypi/packages/a3/c9/d6e8903482bd6fb994c32722831d15842dd8b614f94ad9ca735807252671/pytesseract-0.3.8.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: Pillow in c:\programdata\anaconda3\lib\site-packages (from pytesseract) (4.2.1)
Requirement already satisfied: olefile in c:\programdata\anaconda3\lib\site-packages (from Pillow->pytesseract) (0.44)
Building wheels for collected packages: pytesseract
  Building wheel for pytesseract (setup.py) ... done
  Created wheel for pytesseract: filename=pytesseract-0.3.8-py2.py3-none-any.whl size=18780 sha256=b49587077ddccb20cbf67c10130b4c15f04fc585cbc36dcf53563d169d9df4de
  Stored in directory: c:\users\ychen\appdata\local\pip\cache\wheels\ab\76\70\c080b97e409de2fe41cf2d9ecb97f0629a66c7126eb7c9eb44
Successfully built pytesseract
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.8
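The log shows two things worth scripting. First, pip must install into the same interpreter that will run the OCR code (here the Anaconda install under `C:\ProgramData\Anaconda3`). Second, pytesseract 0.3.9 requires Python >= 3.7, so on Python 3.6.3 pip fell back to 0.3.8. Below is a minimal sketch with hypothetical helper names. It builds a `python -m pip install` command bound to the running interpreter and flags that version constraint.

```python
import subprocess
import sys

# From the pip log above: pytesseract 0.3.9 requires Python >= 3.7,
# so on Python 3.6 pip resolves to an older release (0.3.8 in that log).
MIN_PYTHON_FOR_0_3_9 = (3, 7)


def pip_install_cmd(package, python=sys.executable):
    """Build `<python> -m pip install <package>` so pip targets that exact interpreter."""
    return [python, "-m", "pip", "install", package]


def supports_pytesseract_0_3_9(version_info=sys.version_info):
    """True if the given interpreter version meets pytesseract 0.3.9's requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON_FOR_0_3_9


def install(package="pytesseract"):
    """Install `package` for the running interpreter; raises CalledProcessError on failure."""
    if not supports_pytesseract_0_3_9():
        print("Python %d.%d: pip will pick an older pytesseract release" % sys.version_info[:2])
    subprocess.check_call(pip_install_cmd(package))
```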
  • Configure the path
# Open the following file in an editor
C:\ProgramData\Anaconda3\Lib\site-packages\pytesseract\pytesseract.py

# Set the path as follows
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
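Editing `pytesseract.py` inside site-packages works, but the change is lost whenever pytesseract is reinstalled or upgraded. pytesseract's README instead sets `pytesseract.pytesseract.tesseract_cmd` from your own script. The following is a minimal sketch of that approach. It assumes the install path used in this post and prefers a `tesseract` found on PATH. The helpers `resolve_tesseract_cmd` and `configure_pytesseract` are hypothetical names.

```python
import shutil

# Install location used in this post; adjust if you chose a different directory.
DEFAULT_TESSERACT_CMD = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(default=DEFAULT_TESSERACT_CMD):
    """Prefer a tesseract found on PATH, else fall back to the default install path."""
    return shutil.which("tesseract") or default


def configure_pytesseract(cmd=None):
    """Point pytesseract at Tesseract at runtime instead of editing site-packages."""
    import pytesseract  # imported here so resolve_tesseract_cmd works without pytesseract
    pytesseract.pytesseract.tesseract_cmd = cmd or resolve_tesseract_cmd()
    return pytesseract.pytesseract.tesseract_cmd
```

Calling `configure_pytesseract()` once, before the first `pytesseract.image_to_string(...)`, has the same effect as the file edit above.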
  • Code
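from PIL import Image
import pytesseract
import cv2
import os

# Preprocessing mode: "blur" (median blur) or "thresh" (Otsu binarization)
preprocess = "blur"

# Load the input image and convert it to grayscale
image = cv2.imread("scan.jpg")
if image is None:
    raise FileNotFoundError("could not read scan.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

if preprocess == "thresh":
    # Otsu's method chooses the threshold automatically (the 0 argument is ignored)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
elif preprocess == "blur":
    # A 3x3 median blur removes salt-and-pepper noise
    gray = cv2.medianBlur(gray, 3)

# Write the preprocessed image to a temporary file named after the process ID
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)
try:
    # Run Tesseract on the preprocessed image
    with Image.open(filename) as img:
        text = pytesseract.image_to_string(img)
    print(text)
finally:
    # Always clean up the temporary file, even if OCR fails
    os.remove(filename)

# Show the original and preprocessed images; press any key to close
cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()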
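To make the two preprocessing options less of a black box, here is a pure-Python sketch of what they compute. It is for illustration only, not a replacement for OpenCV, and the function names are hypothetical. `otsu_threshold` picks the threshold that maximizes the between-class variance w0·w1·(μ0 − μ1)², and `binarize` applies the `THRESH_BINARY` rule (pixels above the threshold become 255). `median_blur3` is a 3×3 median filter; replicating edge pixels at the border is an assumption about how `cv2.medianBlur` handles edges. OpenCV's own tie-breaking may pick a slightly different threshold when several give the same variance.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 maximizing between-class variance for 8-bit gray values.

    Class 0 is pixels <= t and class 1 is pixels > t, matching THRESH_BINARY.
    The usual 1/total**2 normalization is dropped; it does not change the argmax.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # count of pixels <= t
    sum0 = 0  # sum of intensities <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t, maxval=255):
    """THRESH_BINARY: maxval where p > t, else 0."""
    return [maxval if p > t else 0 for p in pixels]


def median_blur3(img):
    """3x3 median filter on a 2-D list of ints, replicating edge pixels at the border."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out
```

On a bimodal input such as fifty pixels at 10 and fifty at 200, the chosen threshold separates the two groups, and a single bright noise pixel in a dark 3×3 patch is removed by the median filter. This is why `blur` helps on speckled scans and `thresh` helps on unevenly lit ones.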
  • Result
we owe oak wk ome owe ow wo Sk we %o %o %K

 

WHOLE FOODS MARKET - WESTPORT,.CT 06880
399 POST RD WEST - (203) 227-6858

64
365
365

365

BACULN LS
BACON LS
BACON LS
BACON iS
BRO TH CHIC

FLOUR ALMUNU
CHKN BRST BNLSS SK
HEAVY CREAM

BALSMC REDUCT

BEEF

GRND
JUICE COF CRSHEW

85/15

L.

DOCS PINT QORGAK IC
HNY ALMOND Bui TR

* x ## TAX

. 00

BAL

NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP

4 99
4.99
4.99
1 39
2.19
1.99
. 80
. 39
. 49

tl &

on

8.99

14.49

9.99
101.33

m

"Ti

m n m



posted @ 2024-02-28 14:13  DogLeftover