ddddocr: preprocessing images to improve the recognition rate

1. Incorrect recognition

Data URL:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAPAAAABQCAMAAAAQlwhOAAAA81BMVEUAAAB/TiCdbD6GVSdoNwnPnnB7ShyDUiTDkmR5SBrVpHbMm22FVCa5iFrUo3WUYzW9jF5yQRPDkmS/jmCFVCZyQRNnNghvPhCVZDZfLgCbajy8i116SRundkjXpni7ilx2RRdlNAbRoHJuPQ/OnW+5iFrbqnyvflCFVCaIVynWpXfFlGaCUSOPXjB2RRdkMwV8Sx3JmGqcaz2kc0XWpXeMWy3ZqHrOnW+xgFJxQBJsOw1qOQuHViiATyG4h1m8i12ndkiUYzV2RRe7ilyHViiMWy2EUyWSYTPQn3GUYzXIl2lpOAraqXuNXC51RBbQn3F8Sx3mH0b2AAAAAXRSTlMAQObYZgAABnpJREFUeJzkm+lOIzkQx11CZEbiEEggAmHEh1E4IsQlQFGiwHAI5kMP7/84K9Lddl0++grH1q5GuGOX6+e/XbY7YD6JXX90AAu26+tvQnyRWjGZd612LIuwiwsf8WZNj2trn5zY83xzszYxLtzXdPIBFuQ9T3Ryf8+JR/Uj+kA7P08mZuXR6IsS125Zjfdrjk59+6rzob45XjAQqLe1kGiwhaLRbVbJO8zNh721pRP/qBxVWjDzcCo2m83SiaEEzvt5lTU8vD/CxFXGvIjEjn27Ct/QPoiZ11eFWLcIb+KYj8o4cCSpISTZzY0jBjSZ/QrXs0Te0QjksLdrTGFTdKD25V3YrdmIw9qAujDMqAEDHJdRdBkBTpodKOw6o97l4II5Pj6ulzYj3boC2yE6nFJivXCs+cfHoQ2L2TTep8sVKAYD3hhaNDlVFf5yTWlh/OIPptPI7U1ZpcD17jBjqILK+NS677y/BLG8vbEOkcKeRNmpwnwwRd9uJmuJS/B67+e93AebM+XDQAitmpynymDbsjdxsVXBPpl/2uv13JjleTjDCptoJy2Yb8+lFSKLi6zuWV60nOgQ0aPewWRZpgjcxZkj5JznLLZnKQOEq8xmM9AMO7eFl0wBS94M6lgCL1u1SgNKofOCBvPy8qJJSbo4bAYonYuxjOVLtYzcCFLxqWv6op0fyQAfHrZLrGesMKBI6mydA/vPcIVD3a90rLC6IsNJW0niRlEYVUC7mjZ8uLyystLlocOzgiNPiJp5cFJgrTpMooNpVjq/IymPlCl7qrcqFAMusF57MploCXJhJ46EFF0+Oj1ViW0G5wprlQuFpfPwg1YtmKLH+NmpHlQx/wKrjl6BZDVxEV2swph3PPbGUD6BwJ2iqGn4oUP35MpdvlfRlrDrbxyNCqIKg7zkKgmCluTQKc1q2bubLd45cT0s6hUxjykHyjhBhWmPMgZWEvs0ul405Ab6ah0fibP3f4bDIY6KTnJ6aQwpTHqMK4wbOy/r/GEtA9iSrzbef8qynJiGPWY1QT8wGurRX9QVtj/jiNbX1+3P9ZHpHAHoO6EyJapTFAsop4+If/VUym/Rbv6ysViff1wc62oSE5XAQL/f98Q9jwrtxVAeN1BZaYcAH008S4NL/ibwNRpA7Yltl2Lxo84rFeYJRFcYV3l8zIlDCrtYYkDAzzqpZleiOyF66wWf+BUuLUFhgFK7OAmeClVMvIMxf33+wa5deQCOK4z6Y3UM5TWhyWxt4saHPB9HW6L9Jef9qxLbJOVmnDe7kiZKb95K6d+QTiaT0iEhdvum34Bd3z28NhYA9gUf7/cIsYXOccawcUtjzW2CPaJWYV7wBkK9lf/TnIKQsU5HR0fOpeAVCpNdogoyoUhtFfIPVjcgCqNAOXTOd+RhK6lYBPgFfwT3yUvhy+q7gsmTWQpRqcJGqiG5MZnCC8Bvi+C2mIhOT0+cOHN+1Ra7u5yYncsB7U5EYVTdbWFG/HoCUDJdYTZPio5SVrDgzbJIh7uKFy4MURjbwIVLvPN9jaBo/ZFvHvBlq/LyzTTvSaYozG0wGBg3s0FsSh
OcLkNy0UxHE2BatAHHla/OgaoDgxYh8wrz13PoVB954wPYkW3QjLf0birucFGXIvMUCuMJVVHhlkKb2x1OTG05FZOe/v5PKdeyv7m7brWIu/3Oe3fXssKa8Qme2/Kylxjsbzu0GNb29pwY99Kec7NPSkBmc7k0fbz2BJAs71VSre00Z7Vsf3/fvNmSpnAERrnplPZPPLm6UoiHex7XO6F+S7tNqYRt37y9OWL63sIeiiOTysP7TyGW1YZ7ezrxzk4C8e1tZWKDFDb8POayryJkbNeVvKotWOGAuS215AZ7i+wiey617bCRWYXRWVRS9xr0sLT0uYhTFM5/mylkvwOffTLe0oKTOcr7O0T8He3/xvvt7KFZ835bcSzKHh4aEff7fuLHJo6NMRuNWr/5Pkjm1f+MKMD72Ix4Y6MJcX4YHTTwkH+b/adCi4YKnzVXeP6qqLbNef9UIW5kZ2dnzZ004c1tYbzGtMDbsR18dAARe27Z38FBbeLVdiPR7fk5nTjp1qcrfJfQcHW1KrH2lj9qFXhT7vW65S/2YpbIe2mKP8sS3+NE7KRS7WSFVUvhTbTLy0tjptNpZYVPTioTO/tZv2ltKy90VuGq1oT35+KJ41f2Lu0jFf5K9l8AAAD//4N+SSTpkEOFAAAAAElFTkSuQmCC

The image looks like this:

image

Recognition result:

$ python3 ocr1.py 
欢迎使用ddddocr,本项目专注带动行业内卷,个人博客:wenanzhe.com
训练数据支持来源于:http://146.56.204.113:19199/preview
爬虫框架feapder可快速一键接入,快速开启爬虫之旅:https://github.com/Boris-code/feapder
谷歌reCaptcha验证码 / hCaptcha验证码 / funCaptcha验证码商业级识别接口:https://yescaptcha.com/i/NSwk7i
识别结果: 00470

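2. Processing:

To improve the recognition rate, preprocess the image before passing it to ddddocr: convert it to grayscale, then denoise it.

Code: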
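
To make the two steps concrete, here is a minimal pure-Python sketch of what grayscale conversion and median denoising do to pixel values. It is an illustration only, not the code this post runs (the script in this post uses Pillow's convert('L') and ImageFilter.MedianFilter). The names to_gray and median_filter are hypothetical, the luma weights are the ones Pillow documents for mode "L" (Pillow's own rounding may differ by one level), and clamping at the image border is an assumption of this sketch.

```python
from statistics import median


def to_gray(rgb_rows):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luma values."""
    # ITU-R 601-2 luma weights: L = R*299/1000 + G*587/1000 + B*114/1000
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in rgb_rows]


def median_filter(gray, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    h, w = len(gray), len(gray[0])
    k = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clamp coordinates so border pixels reuse the nearest edge value
            # (an assumption of this sketch).
            window = [gray[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-k, k + 1)
                      for dx in range(-k, k + 1)]
            row.append(int(median(window)))
        out.append(row)
    return out


# A white 3x3 patch with a single black speck in the middle.
white, black = (255, 255, 255), (0, 0, 0)
noisy = to_gray([[white, white, white],
                 [white, black, white],
                 [white, white, white]])
clean = median_filter(noisy)
```

An isolated speck is outvoted by its neighbours, so it disappears, while strokes wider than the window mostly survive. That is why a median filter is a common first attempt at removing speckle or thin interference lines from a captcha.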

import base64
from io import BytesIO

from ddddocr import DdddOcr
from PIL import Image, ImageFilter


ocr = DdddOcr(det=False, ocr=True)

data_url = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAPAAAABQCAMAAAAQlwhOAAAA81BMVEUAAAB/TiCdbD6GVSdoNwnPnnB7ShyDUiTDkmR5SBrVpHbMm22FVCa5iFrUo3WUYzW9jF5yQRPDkmS/jmCFVCZyQRNnNghvPhCVZDZfLgCbajy8i116SRundkjXpni7ilx2RRdlNAbRoHJuPQ/OnW+5iFrbqnyvflCFVCaIVynWpXfFlGaCUSOPXjB2RRdkMwV8Sx3JmGqcaz2kc0XWpXeMWy3ZqHrOnW+xgFJxQBJsOw1qOQuHViiATyG4h1m8i12ndkiUYzV2RRe7ilyHViiMWy2EUyWSYTPQn3GUYzXIl2lpOAraqXuNXC51RBbQn3F8Sx3mH0b2AAAAAXRSTlMAQObYZgAABnpJREFUeJzkm+lOIzkQx11CZEbiEEggAmHEh1E4IsQlQFGiwHAI5kMP7/84K9Lddl0++grH1q5GuGOX6+e/XbY7YD6JXX90AAu26+tvQnyRWjGZd612LIuwiwsf8WZNj2trn5zY83xzszYxLtzXdPIBFuQ9T3Ryf8+JR/Uj+kA7P08mZuXR6IsS125Zjfdrjk59+6rzob45XjAQqLe1kGiwhaLRbVbJO8zNh721pRP/qBxVWjDzcCo2m83SiaEEzvt5lTU8vD/CxFXGvIjEjn27Ct/QPoiZ11eFWLcIb+KYj8o4cCSpISTZzY0jBjSZ/QrXs0Te0QjksLdrTGFTdKD25V3YrdmIw9qAujDMqAEDHJdRdBkBTpodKOw6o97l4II5Pj6ulzYj3boC2yE6nFJivXCs+cfHoQ2L2TTep8sVKAYD3hhaNDlVFf5yTWlh/OIPptPI7U1ZpcD17jBjqILK+NS677y/BLG8vbEOkcKeRNmpwnwwRd9uJmuJS/B67+e93AebM+XDQAitmpynymDbsjdxsVXBPpl/2uv13JjleTjDCptoJy2Yb8+lFSKLi6zuWV60nOgQ0aPewWRZpgjcxZkj5JznLLZnKQOEq8xmM9AMO7eFl0wBS94M6lgCL1u1SgNKofOCBvPy8qJJSbo4bAYonYuxjOVLtYzcCFLxqWv6op0fyQAfHrZLrGesMKBI6mydA/vPcIVD3a90rLC6IsNJW0niRlEYVUC7mjZ8uLyystLlocOzgiNPiJp5cFJgrTpMooNpVjq/IymPlCl7qrcqFAMusF57MploCXJhJ46EFF0+Oj1ViW0G5wprlQuFpfPwg1YtmKLH+NmpHlQx/wKrjl6BZDVxEV2swph3PPbGUD6BwJ2iqGn4oUP35MpdvlfRlrDrbxyNCqIKg7zkKgmCluTQKc1q2bubLd45cT0s6hUxjykHyjhBhWmPMgZWEvs0ul405Ab6ah0fibP3f4bDIY6KTnJ6aQwpTHqMK4wbOy/r/GEtA9iSrzbef8qynJiGPWY1QT8wGurRX9QVtj/jiNbX1+3P9ZHpHAHoO6EyJapTFAsop4+If/VUym/Rbv6ysViff1wc62oSE5XAQL/f98Q9jwrtxVAeN1BZaYcAH008S4NL/ibwNRpA7Yltl2Lxo84rFeYJRFcYV3l8zIlDCrtYYkDAzzqpZleiOyF66wWf+BUuLUFhgFK7OAmeClVMvIMxf33+wa5deQCOK4z6Y3UM5TWhyWxt4saHPB9HW6L9Jef9qxLbJOVmnDe7kiZKb95K6d+QTiaT0iEhdvum34Bd3z28NhYA9gUf7/cIsYXOccawcUtjzW2CPaJWYV7wBkK9lf/TnIKQsU5HR0fOpeAVCpNdogoyoUhtFfIPVjcgCqNAOXTOd+RhK6lYBPgFfwT3yUvhy+q7gsmTWQpRqcJGqiG5MZnCC8Bvi+C2mIhOT0+cOHN+1Ra7u5yYncsB7U5EYVTdbWFG/HoCUDJdYTZPio5SVrDgzbJIh7uKFy4MURjbwIVLvPN9jaBo/ZFvHvBlq/LyzTTvSaYozG
0wGBg3s0FsShOcLkNy0UxHE2BatAHHla/OgaoDgxYh8wrz13PoVB954wPYkW3QjLf0birucFGXIvMUCuMJVVHhlkKb2x1OTG05FZOe/v5PKdeyv7m7brWIu/3Oe3fXssKa8Qme2/Kylxjsbzu0GNb29pwY99Kec7NPSkBmc7k0fbz2BJAs71VSre00Z7Vsf3/fvNmSpnAERrnplPZPPLm6UoiHex7XO6F+S7tNqYRt37y9OWL63sIeiiOTysP7TyGW1YZ7ezrxzk4C8e1tZWKDFDb8POayryJkbNeVvKotWOGAuS215AZ7i+wiey617bCRWYXRWVRS9xr0sLT0uYhTFM5/mylkvwOffTLe0oKTOcr7O0T8He3/xvvt7KFZ835bcSzKHh4aEff7fuLHJo6NMRuNWr/5Pkjm1f+MKMD72Ix4Y6MJcX4YHTTwkH+b/adCi4YKnzVXeP6qqLbNef9UIW5kZ2dnzZ004c1tYbzGtMDbsR18dAARe27Z38FBbeLVdiPR7fk5nTjp1qcrfJfQcHW1KrH2lj9qFXhT7vW65S/2YpbIe2mKP8sS3+NE7KRS7WSFVUvhTbTLy0tjptNpZYVPTioTO/tZv2ltKy90VuGq1oT35+KJ41f2Lu0jFf5K9l8AAAD//4N+SSTpkEOFAAAAAElFTkSuQmCC"

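def data_url_to_image(data_url):
    # Split the data URL into its header (e.g. "data:image/png;base64") and payload
    header, data = data_url.split(',', 1)

    # Only base64 payloads are handled; fail early on anything else
    if not header.endswith(';base64'):
        raise ValueError('expected a base64-encoded data URL')
    data = base64.b64decode(data)

    # Build a PIL Image object from the decoded bytes
    image = Image.open(BytesIO(data))

    return image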

image = data_url_to_image(data_url)
image.save('x1.png')

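# Open the saved image for processing
image = Image.open('x1.png')
# Convert to grayscale
image = image.convert('L')
# Denoise with a median filter (Pillow's default window size is 3)
image = image.filter(ImageFilter.MedianFilter())
# Save the processed image
image.save('x1_opted.png')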

# Recognize the processed image (first output line)
with open('x1_opted.png', 'rb') as f:
    img_bytes = f.read()
    result = ocr.classification(img_bytes)
    print(f"识别结果:{result}")

# Recognize the original image for comparison (second output line)
with open('x1.png', 'rb') as f:
    img_bytes = f.read()
    result = ocr.classification(img_bytes)
    print(f"识别结果:{result}")

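Result (the first 识别结果 line, i.e. "recognition result", comes from the processed image and the second from the original; the processed image yields 004707, while the original still yields 00470):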

$ python3 ocr1.py 
欢迎使用ddddocr,本项目专注带动行业内卷,个人博客:wenanzhe.com
训练数据支持来源于:http://146.56.204.113:19199/preview
爬虫框架feapder可快速一键接入,快速开启爬虫之旅:https://github.com/Boris-code/feapder
谷歌reCaptcha验证码 / hCaptcha验证码 / funCaptcha验证码商业级识别接口:https://yescaptcha.com/i/NSwk7i
识别结果:004707
识别结果:00470

Original image:

image

Processed image:

image
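
A side note on the data URL step: the data_url_to_image helper in the script only needs to handle base64 payloads, which is what this captcha uses. As a hedged sketch, the decoding can also be done with the standard library alone, covering percent-encoded payloads as well. The name decode_data_url is hypothetical and is not part of ddddocr or Pillow; the default media type follows RFC 2397.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_url(data_url):
    """Return (media_type, payload_bytes) for a data: URL."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    # Everything before the first comma is the header, the rest is the payload.
    header, sep, payload = data_url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no comma separator")
    if header.endswith(";base64"):
        media_type = header[:-len(";base64")]
        data = base64.b64decode(payload)
    else:
        # Without ";base64" the payload is percent-encoded text.
        media_type = header
        data = unquote_to_bytes(payload)
    # RFC 2397 default when the media type is omitted.
    return media_type or "text/plain;charset=US-ASCII", data


# The first eight bytes of any PNG file, base64-encoded.
media_type, data = decode_data_url("data:image/png;base64,iVBORw0KGgo=")
```

The returned bytes can be handed to Image.open(BytesIO(data)) in the same way the script does, and checking media_type first gives an early error if the URL does not carry an image.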

posted @ 2025-11-22 22:27  刘宏缔的架构森林