Recognizing CAPTCHA Images with Node.js (Integrating Tesseract OCR)

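I. Project Overview
In this project we use Node.js to call the Tesseract OCR engine and recognize the characters in CAPTCHA images. The approach suits command-line automation scripts, testing tools, and backend services that need to process CAPTCHA images.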

II. Environment Setup

1. Install the Tesseract engine
macOS
brew install tesseract
Ubuntu

sudo apt install tesseract-ocr
Windows
Download: https://github.com/tesseract-ocr/tesseract/releases
After installing, add the Tesseract install directory to the system PATH environment variable.
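
To confirm that the PATH change took effect, one option is a small Node.js check that tries to run the tesseract binary. This is a minimal sketch: the helper name getTesseractVersion is hypothetical, and it assumes only that the binary accepts the --version flag.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: returns the first line of `tesseract --version`,
// or null if the binary cannot be run (for example, it is not on PATH).
function getTesseractVersion(command = 'tesseract') {
  const result = spawnSync(command, ['--version'], { encoding: 'utf8' });
  if (result.error || result.status !== 0) {
    return null; // not found, or exited with an error
  }
  // Combine stdout and stderr, since the stream used may differ between releases.
  const output = (result.stdout || '') + (result.stderr || '');
  const firstLine = output.split(/\r?\n/).find(line => line.trim() !== '');
  return firstLine ? firstLine.trim() : null;
}

const version = getTesseractVersion();
console.log(version ? `Found: ${version}` : 'tesseract was not found on PATH');
```

If the script reports that tesseract was not found, reopen the terminal so the updated PATH is picked up, then run it again.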

2. Create the Node.js project

mkdir node-captcha-ocr
cd node-captcha-ocr
npm init -y
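3. Install tesseract.js
Note that tesseract.js runs its own WebAssembly build of Tesseract, so the script below does not call the native engine installed above; the native install is mainly useful for the tesseract command-line tool.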

npm install tesseract.js
III. CAPTCHA Recognition Code
Create a file named index.js with the following content:

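const { createWorker } = require('tesseract.js');
const path = require('path');

const imagePath = path.join(__dirname, 'captcha_sample.png');

// Written for tesseract.js v5 or later, where createWorker(lang, oem, options)
// returns a worker that is already loaded and initialized.
async function main() {
  const worker = await createWorker('eng', 1, {
    logger: m => console.log(m.status, m.progress) // progress log
  });
  try {
    // Optional: restrict the character set. The whitelist is a Tesseract parameter,
    // so it is set on the worker instead of being passed as a recognize() option.
    await worker.setParameters({
      tessedit_char_whitelist: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
    });
    const { data: { text } } = await worker.recognize(imagePath);
    console.log('Recognition result:', text.trim());
  } finally {
    await worker.terminate(); // release the worker
  }
}

main().catch(err => {
  console.error('Recognition failed:', err);
});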
IV. Sample Result
Image file: captcha_sample.png
Image content: K3T2R
Output when run (status lines logged before recognition starts are omitted):

recognizing text 0.5
recognizing text 0.8
recognizing text 1
Recognition result: K3T2R
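
The log above shows only the recognition stage. One way to get that kind of output is a logger that filters on the status field. The sketch below is illustrative: makeRecognitionLogger is a hypothetical helper, and the status string it matches is the one shown in the sample output.

```javascript
// Build a logger that reports only the recognition stage.
// `print` is injectable so the output can be captured; it defaults to console.log.
function makeRecognitionLogger(print = console.log) {
  return m => {
    if (m.status === 'recognizing text') {
      print(`${m.status} ${m.progress}`);
    }
  };
}

const logger = makeRecognitionLogger();
logger({ status: 'loading language traineddata', progress: 1 }); // ignored
logger({ status: 'recognizing text', progress: 0.5 }); // prints "recognizing text 0.5"
```

The returned function can be passed as the logger option in place of the inline arrow function in index.js.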
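V. Optimization Tips
Preprocess the CAPTCHA image with an image library such as Sharp or Jimp before recognition;

If you control how the CAPTCHA is generated, keep its font style consistent to make it easier to recognize;

Set a character whitelist so that characters outside the expected set are not recognized by mistake;

Clean the recognized text further with regular-expression rules.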
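
Two of these tips can be sketched without any extra dependency. The block below is a library-independent illustration, not the Sharp or Jimp API: binarize works on a plain array of 8-bit grayscale values, and cleanCaptchaText applies a regular expression to the OCR output. Both function names, the threshold of 128, the conversion to uppercase, and the expected length of 5 (taken from the K3T2R sample) are assumptions.

```javascript
// Binarize 8-bit grayscale pixels: values below the threshold become
// black (0), everything else becomes white (255).
function binarize(grayPixels, threshold = 128) {
  return Uint8Array.from(grayPixels, value => (value < threshold ? 0 : 255));
}

// Clean raw OCR text: uppercase it, drop everything outside A-Z and 0-9,
// and accept the result only if it has the expected length.
function cleanCaptchaText(rawText, expectedLength = 5) {
  const cleaned = rawText.toUpperCase().replace(/[^A-Z0-9]/g, '');
  return cleaned.length === expectedLength ? cleaned : null;
}

console.log(Array.from(binarize([12, 200, 127, 128]))); // [ 0, 255, 0, 255 ]
console.log(cleanCaptchaText(' k3t 2r\n')); // K3T2R
console.log(cleanCaptchaText('K3T')); // null
```

Returning null for a result of the wrong length lets the caller decide whether to retry with a different preprocessing step or request a fresh CAPTCHA.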

Posted on 2025-06-20 18:02 by ttocr、com
