Recognizing Alphanumeric CAPTCHAs with Python and PaddleOCR

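I. Project Overview
PaddleOCR is an open-source OCR toolkit from Baidu that supports Chinese and English text, digits, vertical text, tables, and other complex layouts. Compared with Tesseract, its deep-learning models tend to offer higher recognition accuracy and better robustness on noisy images.

In this project we use it to recognize CAPTCHA images made up of English letters and digits. This suits scenarios with high accuracy requirements, such as automated platform logins and data extraction.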
II. Environment Setup

1. Install Python
Python 3.8 or later is recommended. Miniconda can be used to manage virtual environments.

conda create -n paddle_ocr python=3.8
conda activate paddle_ocr
2. Install PaddleOCR and dependencies

pip install paddleocr
pip install paddlepaddle -f https://www.paddlepaddle.org.cn/whl/mkl/avx/stable.html
Note: Windows, macOS, and Linux are all supported. If you have a GPU, install the matching paddlepaddle-gpu package instead.
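
Before moving on, it can help to confirm that the packages are importable from the active environment. The following is a minimal sketch using only the standard library. The helper name `check_packages` is made up for this illustration, and `paddle`, `paddleocr`, and `cv2` are the import names of the paddlepaddle, paddleocr, and opencv-python packages.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report which of the packages used in this article are available
    for name, found in check_packages(["paddle", "paddleocr", "cv2"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing should be installed into the same environment before running the recognition script.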

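III. Prepare the CAPTCHA Image
Save the CAPTCHA image as captcha.png. A clean background and a moderate image size work best, and the characters should consist of English letters and digits.

IV. Core Recognition Code
Create the file captcha_recognition.py: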

from paddleocr import PaddleOCR
import re

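# Initialize OCR with the English model; the angle classifier is not needed for upright text.
# Note: this script uses the PaddleOCR 2.x-style API; newer major versions may differ.
ocr = PaddleOCR(use_angle_cls=False, lang='en')

# Run recognition on the image
result = ocr.ocr('captcha.png', cls=False)

# Collect the recognized text; an entry can be None when no text is detected
recognized_text = ''
for line in result or []:
    for word_info in line or []:
        text = word_info[1][0]  # word_info is [box, (text, confidence)]
        recognized_text += text

# Clean up: keep only English letters and digits
cleaned_text = re.sub(r'[^A-Za-z0-9]', '', recognized_text)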

print(f"Recognized CAPTCHA: {cleaned_text}")
V. Run the Script
Make sure captcha.png is in the same directory as the script, then run:

python captcha_recognition.py
Example output:

Recognized CAPTCHA: B9FZ2
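
The nested loop in the recognition script can also be factored into a helper that makes use of the confidence score returned with each text fragment. This is a minimal sketch, assuming the 2.x-style result layout used in the script (a list of pages, each a list of `[box, (text, confidence)]` entries, or `None` when nothing is detected). The name `extract_captcha` and the `min_confidence` default are hypothetical choices, not part of PaddleOCR.

```python
import re


def extract_captcha(result, min_confidence=0.5):
    """Join recognized fragments left to right, dropping low-confidence ones."""
    fragments = []
    for page in result or []:
        for box, (text, confidence) in page or []:
            if confidence >= min_confidence:
                # Use the left edge of the box so fragments are ordered left to right
                left = min(point[0] for point in box)
                fragments.append((left, text))
    joined = ''.join(text for _, text in sorted(fragments))
    return re.sub(r'[^A-Za-z0-9]', '', joined)


# Hand-written sample in the assumed layout, not real OCR output
sample = [[
    [[[60, 5], [110, 5], [110, 30], [60, 30]], ('Z2', 0.91)],
    [[[5, 5], [55, 5], [55, 30], [5, 30]], ('B9 F', 0.88)],
    [[[0, 32], [20, 32], [20, 40], [0, 40]], ('~', 0.12)],
]]
print(extract_captcha(sample))  # B9FZ2
```

Sorting by the left edge matters when the detector splits one CAPTCHA into several boxes, since they are not guaranteed to come back in reading order.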
VI. Tips for Improving Recognition
Preprocess the image (grayscale, binarization, cropping) with OpenCV:

import cv2
img = cv2.imread('captcha.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
cv2.imwrite('captcha_preprocessed.png', binary)
Then run OCR on captcha_preprocessed.png.

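Restrict the character set: PaddleOCR models do not restrict the character set by default, but you can filter the output further with a regular expression.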
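
Since the character set and length of a CAPTCHA are usually known in advance, the regex filter can be combined with a length check so that implausible results are rejected and retried. This is a minimal sketch; `validate_captcha`, the default character class, and the length of 5 are assumptions for illustration and should be adjusted to the target CAPTCHA.

```python
import re


def validate_captcha(text, allowed=r'A-Za-z0-9', length=5):
    """Strip characters outside `allowed`; return None if the length is wrong."""
    cleaned = re.sub(f'[^{allowed}]', '', text)
    return cleaned if len(cleaned) == length else None


print(validate_captcha('B9-FZ2'))  # B9FZ2
print(validate_captcha('B9F'))     # None
```

A `None` result is a signal to fetch a fresh CAPTCHA or retry with different preprocessing rather than submit a guess.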

posted @ 2025-07-04 19:07 ttocr、com
