Recognizing CAPTCHAs with Python

Tools

Python-tesseract (pytesseract)

Installation

1. Install pytesseract

pip install pytesseract

2. Install tesseract-ocr

2.1 Ubuntu

sudo apt install tesseract-ocr

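2.2 Windows

1. Download the tesseract installer (.exe) that matches your system (32-bit or 64-bit):
    https://github.com/UB-Mannheim/tesseract/wiki
    Note: select the language packs you need during installation. Without them, only English can be recognized by default.
2. Add the install directory to the PATH environment variable (default install location shown):
    C:\Program Files\Tesseract-OCR
3. Test from a command prompt [normal output means the installation succeeded]
    tesseract
    tesseract -v
    tesseract --list-langs    (lists the supported languages)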
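
The same checks can be run from Python, which is handy before calling pytesseract. This is a minimal sketch that uses only the standard library; the helper names `tesseract_version` and `installed_languages` are made up for illustration, and the parsing assumes the usual `List of available languages ...` header line in the `--list-langs` output.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some builds print the version banner to stderr instead of stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def installed_languages():
    """Return the language codes reported by `tesseract --list-langs` (empty list if unavailable)."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    lines = (result.stdout or result.stderr).splitlines()
    # Skip blank lines and the header line (assumed to start with "List of available")
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.startswith("List of available")]


if __name__ == "__main__":
    print(tesseract_version())
    print(installed_languages())
```

If `tesseract_version()` returns None, the executable is not on PATH and the environment variable from step 2 still needs to be set.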

Usage

1. Basic usage: clean black-and-white text images

from PIL import Image
import pytesseract

img = Image.open("path/to/image")  # replace with your image path
img.show()  # preview the image
resp = pytesseract.image_to_string(img)
print(resp)
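
The call above uses the defaults (English, automatic page layout). If a language pack was installed, or the CAPTCHA is a single line drawn from a known character set, `image_to_string` also accepts `lang` and `config` arguments. The sketch below is an assumption-laden example, not part of the original post: `build_config` and `read_captcha` are hypothetical helpers, `--psm 7` asks Tesseract to treat the image as a single text line, and the `tessedit_char_whitelist` setting may be ignored by some Tesseract versions.

```python
def build_config(psm=7, whitelist=None):
    """Build a config string for pytesseract (psm 7: treat the image as one text line)."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to the given characters (support varies by Tesseract version)
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_captcha(path, lang="eng", psm=7, whitelist=None):
    """Run OCR on the image at `path` and return the stripped text."""
    # Imported here so build_config can be used without these packages installed
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        text = pytesseract.image_to_string(
            img, lang=lang, config=build_config(psm, whitelist)
        )
    return text.strip()
```

For example, `read_captcha("path/to/image", whitelist="0123456789")` would be a reasonable starting point for a digits-only CAPTCHA.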

2. Intermediate usage: text images with noise [convert to grayscale, then binarize]

from PIL import Image
import pytesseract

def convert_img(img, threshold):
    """Convert to grayscale, then binarize."""
    img = img.convert("L")  # grayscale
    pixels = img.load()
    for x in range(img.width):
        for y in range(img.height):
            # Pixels brighter than the threshold become white, the rest black
            if pixels[x, y] > threshold:
                pixels[x, y] = 255
            else:
                pixels[x, y] = 0
    return img

img = Image.open("path/to/image")  # replace with your image path
img = convert_img(img, 150)
img.show()
resp = pytesseract.image_to_string(img)
print(resp)
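
The per-pixel loop is easy to follow but slow on larger images. The same grayscale-plus-threshold step can be sketched with Pillow's `Image.point`, which applies a 256-entry lookup table to every pixel in one call. The names `threshold_table` and `binarize` are illustrative, and 150 is simply the threshold used above, not a recommended value.

```python
def threshold_table(threshold):
    """256-entry lookup table: values above `threshold` map to 255 (white), the rest to 0 (black)."""
    return [255 if value > threshold else 0 for value in range(256)]


def binarize(img, threshold=150):
    """Grayscale + binarize a PIL image without an explicit Python loop over pixels."""
    # `img` is expected to be a PIL.Image.Image; point() accepts a lookup table for mode "L"
    return img.convert("L").point(threshold_table(threshold))
```

`binarize(img, 150)` should give the same black-and-white result as `convert_img(img, 150)`; try a few thresholds and use `img.show()` to see which one keeps the characters intact.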

3. Advanced usage: text images with noise and interference lines [grayscale, binarize, then denoise]

from PIL import Image
import pytesseract

def convert_img(img, threshold):
    """Convert to grayscale, binarize, then remove isolated black pixels."""
    img = img.convert("L")  # grayscale
    pixels = img.load()
    w, h = img.size
    for x in range(w):
        for y in range(h):
            if pixels[x, y] > threshold:
                pixels[x, y] = 255
            else:
                pixels[x, y] = 0
    # Read neighbors from a copy so that earlier changes do not affect later checks
    src = img.copy().load()
    for x in range(1, w - 1):
        for y in range(1, h - 1):
            if src[x, y] != 0:
                continue
            # Look at the four direct neighbors: top, left, bottom, right
            neighbors = (src[x, y - 1], src[x - 1, y], src[x, y + 1], src[x + 1, y])
            count = sum(1 for p in neighbors if p == 0)  # reset for every pixel
            # A black pixel with fewer than 2 black neighbors is treated as noise
            # and turned white (the cutoff of 2 is a guess; tune it for your images)
            if count < 2:
                pixels[x, y] = 255
    return img

img = Image.open("path/to/image")  # replace with your image path
img = convert_img(img, 150)
img.show()
resp = pytesseract.image_to_string(img)
print(resp)
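
To see what neighbor-based denoising does, it helps to run the rule on a tiny hand-made grid instead of a real image. The sketch below assumes the intended rule is "a black pixel with too few black neighbors (top, left, bottom, right) is noise and becomes white"; the function name `remove_isolated` and the cutoff of 2 are illustrative choices, not values taken from a library.

```python
BLACK, WHITE = 0, 255


def remove_isolated(grid, min_black_neighbors=2):
    """Return a copy of `grid` in which black cells with too few black 4-neighbors turn white.

    Border cells are left untouched, like an image loop over 1..w-2 and 1..h-2.
    """
    h, w = len(grid), len(grid[0])
    result = [row[:] for row in grid]  # never modify the input while reading from it
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if grid[y][x] != BLACK:
                continue
            neighbors = (grid[y - 1][x], grid[y + 1][x], grid[y][x - 1], grid[y][x + 1])
            if sum(p == BLACK for p in neighbors) < min_black_neighbors:
                result[y][x] = WHITE
    return result


W, B = WHITE, BLACK
grid = [
    [W, W, W, W, W, W],
    [W, B, W, W, W, W],  # a single speck of noise at row 1, column 1
    [W, W, W, B, B, W],  # a 2-pixel-wide stroke in columns 3 and 4
    [W, W, W, B, B, W],
    [W, W, W, B, B, W],
    [W, W, W, W, W, W],
]
cleaned = remove_isolated(grid)
for row in cleaned:
    print("".join("#" if cell == BLACK else "." for cell in row))
```

The speck disappears while the thicker stroke survives. With this cutoff the end pixels of a 1-pixel-wide stroke would also be removed, so thin fonts may need a lower cutoff.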

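Troubleshooting

Error: pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

pytesseract cannot find the installed tesseract executable.
Either add the install directory to PATH, or tell pytesseract where the executable is. One way is to click through image_to_string into the pytesseract source, search for tesseract_cmd, and change its value to the install location. Assigning the same path to pytesseract.pytesseract.tesseract_cmd in your own script has the same effect without editing the library.
Example:
    tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'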
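
Setting the path from your own script could look like the sketch below. `resolve_tesseract_cmd` and `configure_pytesseract` are hypothetical helper names, and the Windows path is the default install location mentioned earlier; adjust it if tesseract was installed elsewhere.

```python
import os
import shutil

# Default Windows install location from the installation steps (an assumption for your machine)
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(explicit_path=WINDOWS_DEFAULT):
    """Return a usable tesseract executable path, or None if none can be found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path  # already reachable through PATH
    if os.path.isfile(explicit_path):
        return explicit_path
    return None


def configure_pytesseract(explicit_path=WINDOWS_DEFAULT):
    """Point pytesseract at the executable without editing the library source."""
    import pytesseract  # imported here so resolve_tesseract_cmd works without it

    cmd = resolve_tesseract_cmd(explicit_path)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found; install it or fix the path")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Call `configure_pytesseract()` once at the top of the script, before the first `image_to_string` call.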

 

posted @ 2019-10-23 12:26  争-渡