Recognizing Alphanumeric Captchas with Python: Combining OpenCV and Tesseract OCR

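Captcha recognition is widely used in automation scripts, web crawlers, and data extraction. This article shows how to recognize captchas made of English letters and digits in Python, using OpenCV for image preprocessing and pytesseract, a Python wrapper around the Tesseract OCR engine, for the recognition step.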

1. Preparation
Install Tesseract OCR
Windows: download and install it via the official repository, https://github.com/tesseract-ocr/tesseract

After installation, note the install path (e.g. C:\Program Files\Tesseract-OCR\tesseract.exe)

macOS:
brew install tesseract
Ubuntu:

sudo apt install tesseract-ocr
Install the Python libraries
Install the required dependencies with pip:

pip install pytesseract opencv-python
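
Before writing the recognition script, it can help to confirm that the pieces installed above are actually visible to Python. The following is a minimal sketch using only the standard library; the function name check_environment is hypothetical and not part of pytesseract or OpenCV.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether the Tesseract binary and the two Python packages are visible."""
    return {
        # Full path to the tesseract executable, or None if it is not on PATH.
        "tesseract_binary": shutil.which("tesseract"),
        # True if the package can be imported from the current interpreter.
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "cv2": importlib.util.find_spec("cv2") is not None,
    }


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

If tesseract_binary comes back as None on Windows, that usually just means the install directory is not on PATH, which is why the script below sets the executable path explicitly.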
2. Writing the recognition script
Create a file named captcha_ocr.py with the following code:

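import platform

import cv2
import pytesseract

# Set the Tesseract path (only needed on Windows; adjust to your install location)
if platform.system() == "Windows":
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Load the image; cv2.imread returns None instead of raising if the file cannot be read
image = cv2.imread("captcha.png")
if image is None:
    raise FileNotFoundError("captcha.png could not be read")

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Optional: binarize to improve recognition. THRESH_BINARY_INV inverts the image;
# Tesseract generally prefers dark text on a light background, so switch to
# cv2.THRESH_BINARY if your captcha is already dark-on-light.
_, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)

# Run OCR
custom_config = r'--oem 3 --psm 8 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
text = pytesseract.image_to_string(thresh, config=custom_config)

print("Recognized captcha:", text.strip())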
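3. Notes on the script
captcha.png: the captcha image to recognize. The script loads it by relative path, so place it in the same directory as the script and run the script from there;

Grayscale + binarization: strips color and background variation so the characters stand out, which usually improves recognition;

psm 8: page segmentation mode 8 treats the image as a single word (single-character mode is psm 10);

char_whitelist: restricts output to uppercase English letters and digits, which filters out stray characters. Add lowercase letters to the list if your captchas contain them.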
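
To make the psm and whitelist options easier to experiment with, the config string can be assembled from parameters. This is a minimal sketch: build_config and PSM_MODES are hypothetical names, and the mode descriptions paraphrase the list printed by tesseract --help-psm.

```python
import string

# Page segmentation modes most relevant to short captchas.
PSM_MODES = {
    7: "treat the image as a single text line",
    8: "treat the image as a single word",
    10: "treat the image as a single character",
}


def build_config(psm=8, oem=3, whitelist=string.ascii_uppercase + string.digits):
    """Build a Tesseract config string like the one used in captcha_ocr.py."""
    if psm not in PSM_MODES:
        raise ValueError(f"unsupported psm {psm}; choose one of {sorted(PSM_MODES)}")
    return f"--oem {oem} --psm {psm} -c tessedit_char_whitelist={whitelist}"


print(build_config())
print(build_config(psm=7, whitelist=string.digits))
```

With the defaults, build_config() reproduces the custom_config string from the script, so it can be passed straight to pytesseract.image_to_string(thresh, config=build_config()).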

4. Running the program
Run in a terminal:

python captcha_ocr.py
Sample output:

Recognized captcha: 7K3DY
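
OCR output is not always as clean as the sample: it can contain trailing whitespace, stray symbols, or too few characters. The sketch below shows one way to sanity-check a result before using it. The name clean_captcha and the expected length of 5 (taken from the sample output) are assumptions; adjust them to the captchas you actually handle.

```python
import re


def clean_captcha(raw_text, expected_length=5):
    """Keep only A-Z and 0-9; return None if the length does not match."""
    # Uppercasing matches the uppercase-only whitelist used in the script.
    cleaned = re.sub(r"[^A-Z0-9]", "", raw_text.upper())
    return cleaned if len(cleaned) == expected_length else None


print(clean_captcha("7K3DY\n"))
print(clean_captcha("7K3"))
```

A None result is a reasonable signal to retry with different preprocessing or to request a fresh captcha.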

posted @ 2025-04-10 23:35 ttocr、com
