Recognizing English Alphanumeric CAPTCHAs with Rust and Tesseract OCR

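CAPTCHA recognition is a typical application of image processing and character recognition, commonly seen in automated testing, data collection, and similar scenarios. This article shows how to write a simple CLI tool in Rust that uses the Tesseract OCR engine to recognize CAPTCHAs made of English letters and digits.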

1. Environment Setup

Install Rust

Visit https://rust-lang.org for installation options, or install the Rust toolchain with rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

After installation, verify that the toolchain is available:

rustc --version
cargo --version

Install Tesseract OCR

Make sure Tesseract is installed on your system:

Linux (Ubuntu/Debian):

sudo apt install tesseract-ocr

macOS:

brew install tesseract

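Windows:

Download an installer via the official repository, and make sure the tesseract executable is on your PATH, because the program in this article invokes it by name: https://github.com/tesseract-ocr/tesseract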
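Just as rustc --version confirms the Rust installation, a program can check at startup that the tesseract executable is reachable. The following is a minimal sketch, not part of the original tool: the helper name is_command_available is hypothetical, and it assumes that running the program with --version exits successfully when it is installed.

```rust
use std::process::Command;

/// Returns true if `program --version` can be spawned and exits successfully.
/// (Hypothetical helper for illustration; `program` is normally "tesseract".)
fn is_command_available(program: &str) -> bool {
    Command::new(program)
        .arg("--version")
        .output() // Err(..) if the executable cannot be found or started
        .map(|out| out.status.success())
        .unwrap_or(false)
}
```

Calling is_command_available("tesseract") before any OCR work lets the tool print a clear installation hint instead of failing later with a less obvious spawn error.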

2. Create the Project and Add Dependencies

Create a new project:

cargo new captcha_ocr
cd captcha_ocr

Edit Cargo.toml to add the dependencies:

[dependencies]
image = "0.25.1"
tempfile = "3.9.0"

3. Implement the CAPTCHA Recognition Program

Put the following code in src/main.rs:

use std::error::Error;
use std::path::Path;
use std::process::Command;

use tempfile::tempdir;

// Load the input image, convert it to 8-bit grayscale, and save it to `output`.
fn convert_to_grayscale(input: &Path, output: &Path) -> Result<(), Box<dyn Error>> {
    let img = image::open(input)?;
    let gray = img.to_luma8();
    gray.save(output)?;
    Ok(())
}

// Run the external `tesseract` CLI on the image and return the recognized text.
fn run_tesseract(image_path: &Path) -> Result<String, Box<dyn Error>> {
    let output = Command::new("tesseract")
        .arg(image_path)
        .arg("stdout") // print the result to stdout instead of writing a file
        .arg("-l")
        .arg("eng")
        .arg("--psm")
        .arg("7") // page segmentation mode 7: treat the image as a single text line
        .output()?;

    if !output.status.success() {
        // Include stderr so the cause of the failure is visible.
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(format!("Tesseract failed: {}", stderr.trim()).into());
    }

    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}

fn main() -> Result<(), Box<dyn Error>> {
    let input_path = Path::new("captcha.png");
    // The temporary directory is removed automatically when `temp_dir` is dropped.
    let temp_dir = tempdir()?;
    let gray_path = temp_dir.path().join("gray_captcha.png");

    println!("Converting image to grayscale...");
    convert_to_grayscale(input_path, &gray_path)?;

    println!("Recognizing CAPTCHA with Tesseract...");
    let result = run_tesseract(&gray_path)?;

    println!("Result: {}", result);

    Ok(())
}
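
The run_tesseract function passes a fixed set of options. Since the target CAPTCHAs contain only English letters and digits, one possible refinement is to restrict the characters Tesseract may output. The sketch below is an illustration rather than part of the original program: the helper name tesseract_options is hypothetical, and it relies on Tesseract's tessedit_char_whitelist configuration variable, whose effect can vary between Tesseract versions and OCR engines.

```rust
/// Builds the options that follow the image path on the `tesseract` command line.
/// (Hypothetical helper; `whitelist` is an optional set of allowed characters.)
fn tesseract_options(whitelist: Option<&str>) -> Vec<String> {
    let mut opts: Vec<String> = ["stdout", "-l", "eng", "--psm", "7"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    if let Some(chars) = whitelist {
        // `-c key=value` sets a Tesseract configuration variable.
        opts.push("-c".to_string());
        opts.push(format!("tessedit_char_whitelist={}", chars));
    }
    opts
}
```

Inside run_tesseract, the chain of .arg calls after the image path could then be replaced with .args(tesseract_options(Some("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"))).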

4. Run the Program

Name your CAPTCHA image captcha.png, place it in the project root directory, and run the program:

cargo run
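
The program hard-codes the file name captcha.png. For a CLI tool it may be more convenient to accept the image path as an argument. A minimal sketch, assuming the first command-line argument is the image path and captcha.png remains the default (the helper name input_path_from_args is hypothetical):

```rust
/// Returns the first real argument as the image path, or "captcha.png" if none is given.
/// The iterator is expected to start with the program name, as std::env::args() does.
fn input_path_from_args<I: Iterator<Item = String>>(mut args: I) -> String {
    // nth(1) skips the program name (index 0) and yields the first user argument.
    args.nth(1).unwrap_or_else(|| "captcha.png".to_string())
}
```

In main, this would be called as input_path_from_args(std::env::args()), and a different image could then be processed with cargo run -- other.png.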

Sample output:

Converting image to grayscale...
Recognizing CAPTCHA with Tesseract...
Result: H9Z7P
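
On noisy images the raw OCR text may contain spaces or stray punctuation in addition to the expected characters. Because the CAPTCHAs here consist only of English letters and digits, a simple cleanup step can drop everything else. This is a sketch under that assumption, and the helper name clean_captcha_text is hypothetical:

```rust
/// Keeps only ASCII letters and digits from the OCR output, preserving their order.
fn clean_captcha_text(raw: &str) -> String {
    raw.chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .collect()
}
```

It would be applied to the string returned by run_tesseract before printing the result.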

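5. Notes and Extensions

The image crate handles image loading and grayscale conversion;

Tesseract is invoked as an external command-line program via std::process::Command;

tempfile creates a temporary directory for the intermediate grayscale image, so the project directory is not polluted;

--psm 7 tells Tesseract to treat the image as a single line of text, which suits most CAPTCHAs;

You can further add preprocessing steps such as binarization, noise filtering, and character segmentation to improve recognition accuracy.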
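
As one illustration of the binarization step mentioned above, the sketch below applies Otsu's method, a common way to choose a global threshold. It is not the article's own method. It works on a raw 8-bit grayscale buffer, so it does not depend on any crate; with the image crate, such a buffer could presumably be taken from a GrayImage via as_raw() and turned back into an image with GrayImage::from_raw(width, height, buffer). The function names are hypothetical.

```rust
/// Chooses a threshold with Otsu's method: the value that maximizes the
/// between-class variance of the "dark" (<= t) and "bright" (> t) pixels.
fn otsu_threshold(pixels: &[u8]) -> u8 {
    let mut hist = [0u64; 256];
    for &p in pixels {
        hist[p as usize] += 1;
    }
    let total = pixels.len() as f64;
    let sum_all: f64 = hist
        .iter()
        .enumerate()
        .map(|(v, &n)| v as f64 * n as f64)
        .sum();

    let mut weight_dark = 0.0; // number of pixels with value <= t
    let mut sum_dark = 0.0; // sum of those pixel values
    let mut best_t = 0u8;
    let mut best_var = 0.0;
    for t in 0..256usize {
        weight_dark += hist[t] as f64;
        sum_dark += t as f64 * hist[t] as f64;
        if weight_dark == 0.0 {
            continue; // no dark pixels yet
        }
        let weight_bright = total - weight_dark;
        if weight_bright == 0.0 {
            break; // every pixel is already in the dark class
        }
        let mean_dark = sum_dark / weight_dark;
        let mean_bright = (sum_all - sum_dark) / weight_bright;
        let var = weight_dark * weight_bright * (mean_dark - mean_bright).powi(2);
        if var > best_var {
            best_var = var;
            best_t = t as u8;
        }
    }
    best_t
}

/// Maps pixels above the threshold to white (255) and all others to black (0).
fn binarize(pixels: &[u8], threshold: u8) -> Vec<u8> {
    pixels
        .iter()
        .map(|&p| if p > threshold { 255 } else { 0 })
        .collect()
}
```

A global threshold tends to work when the characters and the background differ clearly in brightness. CAPTCHAs with gradients or heavy interference lines may need additional noise filtering before or after this step.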

posted @ 2025-11-24 23:44 ttocr、com
