Recognizing Alphanumeric CAPTCHAs with Rust and Tesseract

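Rust is a systems programming language focused on performance and safety, and it is increasingly used for image processing and automation. This article shows how to call the Tesseract engine from Rust to recognize simple CAPTCHAs made of English letters and digits.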
1. Development Setup

Install Rust

Run the following command in a terminal to install it:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

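Alternatively, follow the instructions on the official site, https://www.rust-lang.org

Install Tesseract OCR

Install it with your package manager (Ubuntu shown here). The leptess crate used later in this article links against the Tesseract and Leptonica C libraries, so the development headers and clang are needed as well:

sudo apt install tesseract-ocr tesseract-ocr-eng libtesseract-dev libleptonica-dev clang

Alternatively, download the source from https://github.com/tesseract-ocr/tesseract and build it yourself.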
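
Before moving on, it can help to confirm that the tesseract binary is actually reachable from your PATH. The following is a minimal sketch using only the standard library; the helper name tool_version is made up for this illustration and is not part of any crate.

```rust
use std::process::Command;

/// Runs `<program> --version` and returns the first line it prints,
/// or `None` if the program could not be started or printed nothing.
/// `tool_version` is a hypothetical helper written for this article.
fn tool_version(program: &str) -> Option<String> {
    let output = Command::new(program).arg("--version").output().ok()?;
    // Prefer stdout, but fall back to stderr in case the banner is written there.
    let text = if output.stdout.is_empty() {
        String::from_utf8_lossy(&output.stderr).into_owned()
    } else {
        String::from_utf8_lossy(&output.stdout).into_owned()
    };
    text.lines().next().map(|line| line.trim().to_string())
}
```

Calling tool_version("tesseract") from main and printing the result is enough for a quick check. A None result usually means the binary is not installed or not on PATH.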

2. Create the Rust Project

Create the project and enter its directory:

cargo new rust_captcha_ocr
cd rust_captcha_ocr

Edit Cargo.toml and add the dependencies:

[dependencies]
leptess = "0.14"
image = "0.24"

3. Prepare the CAPTCHA Image

Place an alphanumeric CAPTCHA image named captcha.png in the project root.

4. Implement Recognition

Edit src/main.rs:

use leptess::{LepTess, Variable};
use std::path::Path;

// Convert the input image to 8-bit grayscale and save it to `output`.
fn convert_to_grayscale<P: AsRef<Path>>(input: P, output: P) {
    let img = image::open(input).expect("failed to open image");
    let gray = img.to_luma8();
    gray.save(output).expect("failed to save grayscale image");
}

fn main() {
    let input_path = "captcha.png";
    let gray_path = "gray_captcha.png";

    // Grayscale preprocessing
    convert_to_grayscale(input_path, gray_path);

    // Create the OCR engine instance (default data path, English model)
    let mut lt = LepTess::new(None, "eng").expect("failed to initialize Tesseract");

    // Set a character whitelist (recognize letters and digits only)
    lt.set_variable(
        Variable::TesseditCharWhitelist,
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
    )
    .expect("failed to set character whitelist");

    // Load the image and run recognition
    lt.set_image(Path::new(gray_path)).expect("failed to load image");
    let text = lt.get_utf8_text().expect("recognition failed");

    println!("Result: {}", text.trim());
}

5. Run the Project

From the project root, run:

cargo run

Sample output:

Result: C7X8P
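
OCR output can still contain stray whitespace or symbols, and a CAPTCHA usually has a known length. As a rough sketch, the recognized text could be cleaned up and validated before use. The function name normalize_captcha and the expected_len parameter are assumptions made for this example, not part of leptess.

```rust
/// Keeps only ASCII letters and digits from the raw OCR text and
/// returns the result only if it has the expected number of characters.
/// `normalize_captcha` is a hypothetical helper, not a leptess API.
fn normalize_captcha(raw: &str, expected_len: usize) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .collect();
    // All kept characters are ASCII, so byte length equals character count.
    if cleaned.len() == expected_len {
        Some(cleaned)
    } else {
        None
    }
}
```

In main, this could replace the plain text.trim() call, for example normalize_captcha(&text, 5) for a five-character CAPTCHA. A None result is a cheap signal that the image should be preprocessed differently or retried.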

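6. Image Processing Tips (Optional)

To improve accuracy, you can process the image further after the grayscale step, for example:

Binarization (with a manually chosen threshold)

Noise removal (use imageproc to clean up small specks)

Rotation correction (for slanted characters)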
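
The first two suggestions can be sketched on a raw 8-bit grayscale buffer using only the standard library. This is a hand-rolled illustration and not the imageproc API. The names binarize and despeckle, the threshold you pass in, and the min_neighbors parameter are all assumptions for this example. With the image crate, such a buffer can be obtained from a GrayImage via as_raw() and turned back into an image with GrayImage::from_raw(width, height, buf).

```rust
/// Manual-threshold binarization: pixels at or above `threshold`
/// become white (255), everything else becomes black (0).
fn binarize(pixels: &[u8], threshold: u8) -> Vec<u8> {
    pixels
        .iter()
        .map(|&p| if p >= threshold { 255 } else { 0 })
        .collect()
}

/// Simple speck removal on a binarized, row-major buffer: a black pixel
/// with fewer than `min_neighbors` black pixels among its 8 neighbours
/// is turned white. Neighbours outside the image are ignored.
fn despeckle(pixels: &[u8], width: usize, height: usize, min_neighbors: usize) -> Vec<u8> {
    assert_eq!(pixels.len(), width * height, "buffer size must match dimensions");
    let mut out = pixels.to_vec();
    for y in 0..height {
        for x in 0..width {
            if pixels[y * width + x] != 0 {
                continue; // only black pixels are candidates for removal
            }
            let mut black = 0;
            for dy in -1i32..=1 {
                for dx in -1i32..=1 {
                    if dx == 0 && dy == 0 {
                        continue;
                    }
                    let nx = x as i32 + dx;
                    let ny = y as i32 + dy;
                    if nx >= 0
                        && ny >= 0
                        && (nx as usize) < width
                        && (ny as usize) < height
                        && pixels[(ny as usize) * width + (nx as usize)] == 0
                    {
                        black += 1;
                    }
                }
            }
            if black < min_neighbors {
                out[y * width + x] = 255;
            }
        }
    }
    out
}
```

A threshold around the middle of the 0 to 255 range is a common starting point, but the right value depends on the CAPTCHA's contrast and is worth tuning by eye. Raising min_neighbors removes more noise at the risk of eroding thin strokes.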

Posted 2025-11-19 23:01 by ttocr、com
