Application
Installation
1. Install pytesseract (pip install pytesseract)
2. Install tesseract-ocr
2.1 Ubuntu
sudo apt install tesseract-ocr
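2.2 Windows
1. Choose the installer that matches your system (32-bit or 64-bit)
Tesseract exe installers: https://github.com/UB-Mannheim/tesseract/wiki
2. Add the install directory to the PATH environment variable (default install location shown)
C:\Program Files\Tesseract-OCR
PS: You can select additional language packs during installation (without them, only English can be recognized by default)
3. Test [normal output means the installation succeeded]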
tesseract
tesseract -v
tesseract --list-langs    (lists the supported languages)
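The same check can be run from Python, which is handy before calling pytesseract. This is a minimal sketch using only the standard library; the function name tesseract_status is a hypothetical helper for illustration, not part of pytesseract.

```python
import shutil
import subprocess


def tesseract_status(executable="tesseract"):
    """Return (found, version_line) for the given executable name."""
    path = shutil.which(executable)  # searches the PATH environment variable
    if path is None:
        return False, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    first_line = output.splitlines()[0] if output else ""
    return True, first_line


found, version = tesseract_status()
print("tesseract found:", found, "| version:", version)
```

If found is False, the install directory is most likely missing from PATH.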
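Usage
1. Basic usage: black-and-white text images
try:
    from PIL import Image
except ImportError:
    import Image  # fallback for the legacy standalone PIL package
import pytesseract

img = Image.open("path/to/image")  # replace with your image path
img.show()  # preview the image
resp = pytesseract.image_to_string(img)
print(resp)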
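image_to_string also accepts a lang argument and a config string that is passed through to the tesseract command line. The sketch below shows one way to use them. The helper names build_config and ocr_file are hypothetical, and the default values are assumptions. --psm selects the page segmentation mode (7 treats the image as a single text line), and tessedit_char_whitelist is a Tesseract variable that restricts the recognized characters, although some Tesseract versions may ignore it. A language such as chi_sim only works if its language pack is installed.

```python
def build_config(psm=6, whitelist=None):
    """Build the string passed to pytesseract's `config` argument."""
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def ocr_file(path, lang="eng", psm=6, whitelist=None):
    """OCR one image file and return the recognized text without outer whitespace."""
    # Imported here so build_config can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        text = pytesseract.image_to_string(
            img, lang=lang, config=build_config(psm, whitelist)
        )
    return text.strip()


# Config for a single line of digits, e.g. a numeric captcha.
print(build_config(7, "0123456789"))
```

A call such as ocr_file("path/to/image", lang="eng", psm=7) would then read one line of English text.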
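2. Advanced usage: text images with noise [convert to grayscale, then binarize]
try:
    from PIL import Image
except ImportError:
    import Image  # fallback for the legacy standalone PIL package
import pytesseract

def convert_img(img, threshold):
    """Convert to grayscale, then binarize."""
    img = img.convert("L")  # convert to grayscale
    pixels = img.load()
    for x in range(img.width):
        for y in range(img.height):
            # brighter than the threshold becomes white, everything else black
            if pixels[x, y] > threshold:
                pixels[x, y] = 255
            else:
                pixels[x, y] = 0
    return img

img = Image.open("path/to/image")  # replace with your image path
img = convert_img(img, 150)
img.show()  # preview the processed image
resp = pytesseract.image_to_string(img)
print(resp)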
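The per-pixel loop above can be slow on large images. As an alternative sketch, the same threshold rule can be expressed as a 256-entry lookup table and applied with Pillow's Image.point. The names threshold_table and binarize are hypothetical helpers, and binarize assumes its argument is a PIL Image.

```python
def threshold_table(threshold):
    """Lookup table: gray values above the threshold map to 255, the rest to 0."""
    return [255 if value > threshold else 0 for value in range(256)]


def binarize(img, threshold):
    """Convert a PIL Image to grayscale and binarize it in one pass."""
    return img.convert("L").point(threshold_table(threshold))


table = threshold_table(150)
print(table[150], table[151])  # a value equal to the threshold stays black
```

binarize(img, 150) is intended as a drop-in replacement for convert_img(img, 150) in the example above, except that it returns a new image instead of modifying pixels through img.load().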
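3. Ultimate usage: text images with noise and stripe lines [grayscale, binarization, and denoising]
try:
    from PIL import Image
except ImportError:
    import Image  # fallback for the legacy standalone PIL package
import pytesseract

def convert_img(img, threshold):
    """Grayscale, binarize, then remove isolated black pixels."""
    img = img.convert("L")  # convert to grayscale
    pixels = img.load()
    for x in range(img.width):
        for y in range(img.height):
            if pixels[x, y] > threshold:
                pixels[x, y] = 255
            else:
                pixels[x, y] = 0
    data = list(img.getdata())  # snapshot, so edits do not affect later neighbor checks
    w, h = img.size
    for x in range(1, w - 1):
        for y in range(1, h - 1):
            mid_pixel = data[w * y + x]
            if mid_pixel == 0:
                # look up the four direct neighbors of this black pixel
                top_pixel = data[w * (y - 1) + x]
                left_pixel = data[w * y + (x - 1)]
                down_pixel = data[w * (y + 1) + x]
                right_pixel = data[w * y + (x + 1)]
                count = 0  # number of black neighbors, counted per pixel
                if top_pixel == 0:
                    count += 1
                if left_pixel == 0:
                    count += 1
                if down_pixel == 0:
                    count += 1
                if right_pixel == 0:
                    count += 1
                # fewer than 2 black neighbors: treat the pixel as noise and
                # turn it white (the cutoff is a tunable choice)
                if count < 2:
                    img.putpixel((x, y), 255)
    return img

img = Image.open("path/to/image")  # replace with your image path
img = convert_img(img, 150)
img.show()  # preview the processed image
resp = pytesseract.image_to_string(img)
print(resp)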
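To see what the neighbor-counting step does, it can help to run the same idea on a tiny grid of values without Pillow. This is an illustrative sketch only. The function remove_isolated and its min_neighbors parameter are hypothetical, and the cutoff of 2 black neighbors is an assumption that should be tuned for real images.

```python
W, B = 255, 0  # white and black pixel values


def remove_isolated(grid, min_neighbors=2):
    """Return a copy of grid where black pixels with too few black 4-neighbors turn white."""
    height, width = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]  # write to a copy, read from the original
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if grid[y][x] != B:
                continue
            neighbors = (grid[y - 1][x], grid[y + 1][x], grid[y][x - 1], grid[y][x + 1])
            black_count = sum(1 for value in neighbors if value == B)
            if black_count < min_neighbors:
                cleaned[y][x] = W
    return cleaned


grid = [
    [W, W, W, W, W, W],
    [W, B, W, W, W, W],  # a single isolated black pixel (noise)
    [W, W, W, B, B, W],  # a 2x2 black block (part of a stroke)
    [W, W, W, B, B, W],
    [W, W, W, W, W, W],
]
for row in remove_isolated(grid):
    print("".join("#" if value == B else "." for value in row))
```

The isolated pixel is removed while the 2x2 block survives, because each of its pixels has two black neighbors. Pixels at the end of a one-pixel-wide stroke have only one black neighbor, so a cutoff of 2 also erodes thin strokes; min_neighbors=1 removes only fully isolated pixels.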
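Problems
Error: pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
Cause: pytesseract cannot find the installed tesseract executable.
Fix: in your IDE, jump to the definition of image_to_string, search for the tesseract_cmd variable, and change its value to the install location. Setting pytesseract.pytesseract.tesseract_cmd from your own script has the same effect and leaves the library source untouched.
Example: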
tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
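Editing the installed library is lost on every upgrade, so it may be preferable to set the path from your own script. The sketch below first looks on PATH and then falls back to the default Windows location mentioned above. The helper names resolve_tesseract and configure_pytesseract are hypothetical. pytesseract.pytesseract.tesseract_cmd is the module-level variable that pytesseract reads.

```python
import os
import shutil

# Default install location from the Windows section; adjust if you installed elsewhere.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract(fallbacks=(DEFAULT_WINDOWS_PATH,)):
    """Return the path of a tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the resolved executable and return its path."""
    import pytesseract  # imported here so resolve_tesseract works without it

    cmd = resolve_tesseract()
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd


print("tesseract executable:", resolve_tesseract())
```

Calling configure_pytesseract() once at program start, before image_to_string, should be enough.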