Captcha Recognition with Rust and Tesseract OCR

1. Environment setup
1.1 Install Rust
Rust can be installed with the official rustup tool:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
After installation, check the version:

rustc --version
1.2 Install Tesseract OCR
On Linux (Ubuntu):

sudo apt update
sudo apt install tesseract-ocr libtesseract-dev libleptonica-dev clang
On macOS:

brew install tesseract
On Windows:

Download an installer from the Tesseract GitHub project page and run it.

After installation, check the version:

tesseract --version
1.3 Create a Rust project
Create a new Rust project:

cargo new rust_ocr
cd rust_ocr
1.4 Add dependencies
Add the tesseract and image dependencies to Cargo.toml:

[dependencies]
tesseract = "0.16.1"
image = "0.24.6"
Then run:

cargo build
2. Implementation
Write the following code in src/main.rs:

use image::GrayImage;
use tesseract::Tesseract;

// Preprocess the image (grayscale + binarization)
fn preprocess_image(image_path: &str) -> GrayImage {
    let img = image::open(image_path).expect("failed to open image");

    // Convert to a grayscale image
    let mut gray_img = img.to_luma8();

    // Binarize to increase contrast: pixels brighter than the
    // threshold become white, all others become black
    let threshold: u8 = 128;
    for pixel in gray_img.pixels_mut() {
        pixel.0[0] = if pixel.0[0] > threshold { 255 } else { 0 };
    }

    gray_img
}

// OCR recognition
fn recognize_captcha(image_path: &str) {
    let processed_img = preprocess_image(image_path);

    // Save the preprocessed image; Tesseract reads it back from this file
    let output_path = "processed_captcha.png";
    processed_img
        .save(output_path)
        .expect("failed to save preprocessed image");

    // Run OCR: recognize() returns the handle, get_text() returns the string
    let text = Tesseract::new(None, Some("eng"))
        .expect("failed to initialize Tesseract")
        .set_image(output_path)
        .expect("failed to load image into Tesseract")
        .recognize()
        .expect("recognition failed")
        .get_text()
        .expect("failed to read recognized text");

    println!("Recognized captcha: {}", text.trim());
}

fn main() {
    let image_path = "captcha.png"; // make sure a captcha image exists at this path
    recognize_captcha(image_path);
}
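3. Code walkthrough
3.1 Preprocessing the captcha

fn preprocess_image(image_path: &str) -> GrayImage
Converts the image to grayscale.

Binarizes it, which suppresses background interference and improves the recognition rate.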
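
A fixed threshold of 128 only works when the captcha's text and background happen to fall on opposite sides of that value. As an illustration, the sketch below derives the threshold from the image histogram using Otsu's method, which is a different technique from the fixed threshold used in the program above. It operates on a plain slice of 8-bit gray values; the function names otsu_threshold and binarize are hypothetical helpers, not part of the image crate.

```rust
/// Picks a threshold that maximizes the between-class variance
/// of the two pixel groups (Otsu's method) for 8-bit gray values.
fn otsu_threshold(pixels: &[u8]) -> u8 {
    // Histogram of the 256 possible gray levels.
    let mut hist = [0u64; 256];
    for &p in pixels {
        hist[p as usize] += 1;
    }

    let total = pixels.len() as f64;
    let sum_all: f64 = hist
        .iter()
        .enumerate()
        .map(|(value, &count)| value as f64 * count as f64)
        .sum();

    let mut weight_dark = 0.0; // pixels with value <= candidate threshold
    let mut sum_dark = 0.0; // sum of their gray values
    let mut best_threshold = 0u8;
    let mut best_variance = -1.0f64;

    for t in 0..256usize {
        weight_dark += hist[t] as f64;
        if weight_dark == 0.0 {
            continue; // no pixels in the dark class yet
        }
        let weight_bright = total - weight_dark;
        if weight_bright == 0.0 {
            break; // every pixel is already in the dark class
        }
        sum_dark += t as f64 * hist[t] as f64;
        let mean_dark = sum_dark / weight_dark;
        let mean_bright = (sum_all - sum_dark) / weight_bright;
        let variance = weight_dark * weight_bright * (mean_dark - mean_bright).powi(2);
        if variance > best_variance {
            best_variance = variance;
            best_threshold = t as u8;
        }
    }
    best_threshold
}

/// Maps values above the threshold to white (255) and the rest to black (0).
fn binarize(pixels: &[u8], threshold: u8) -> Vec<u8> {
    pixels
        .iter()
        .map(|&p| if p > threshold { 255 } else { 0 })
        .collect()
}
```

With the image crate, the gray values could presumably be read from the grayscale image's raw buffer (as_raw()) and the computed threshold substituted for the constant 128; that wiring is left out here to keep the sketch dependency-free.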

3.2 OCR

let text = Tesseract::new(None, Some("eng"))
Calls Tesseract to perform OCR.

Prints the recognized text.
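
Raw Tesseract output often carries trailing newlines, spaces, or stray punctuation, so it can help to normalize the text before using it. The following is a minimal sketch, assuming the captcha consists only of digits and lowercase letters; the function name clean_captcha_text and the allowed character set are illustrative assumptions, not something the tesseract crate provides.

```rust
/// Lowercases ASCII letters and keeps only characters found in `allowed`.
fn clean_captcha_text(raw: &str, allowed: &str) -> String {
    raw.chars()
        .map(|c| c.to_ascii_lowercase())
        .filter(|c| allowed.contains(*c))
        .collect()
}
```

It could be applied to the recognized string in place of text.trim(), for example clean_captcha_text(&text, "0123456789abcdefghijklmnopqrstuvwxyz").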

4. Run the program
Make sure captcha.png exists, then run:
cargo run
The program loads the captcha image, preprocesses it, and prints the recognized text.
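5. Improving OCR accuracy
Tune the Tesseract configuration (treat the image as a single line of text and restrict the character set):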

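let mut tess = Tesseract::new(None, Some("eng"))
    .unwrap()
    .set_variable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
    .unwrap();
// set_page_seg_mode borrows the handle mutably, so it is called separately
tess.set_page_seg_mode(tesseract::PageSegMode::PsmSingleLine);
let text = tess
    .set_image(output_path)
    .unwrap()
    .recognize()
    .unwrap()
    .get_text()
    .unwrap();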
Remove noise (opencv-rust can be used for more advanced image processing)
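
One simple form of noise removal that needs no extra dependency is dropping isolated dark specks from the binarized image. The sketch below assumes a row-major buffer in which 0 is black and 255 is white; the function name remove_isolated_pixels and the min_neighbors parameter are hypothetical, and the approach is far cruder than the morphological operations OpenCV offers.

```rust
/// Turns a black pixel white when fewer than `min_neighbors`
/// of its 8 surrounding pixels are black.
fn remove_isolated_pixels(
    pixels: &[u8],
    width: usize,
    height: usize,
    min_neighbors: usize,
) -> Vec<u8> {
    assert_eq!(pixels.len(), width * height, "buffer size must match dimensions");
    let mut out = pixels.to_vec();

    for y in 0..height {
        for x in 0..width {
            if pixels[y * width + x] != 0 {
                continue; // already white
            }
            let mut black_neighbors = 0usize;
            for dy in -1i32..=1 {
                for dx in -1i32..=1 {
                    if dx == 0 && dy == 0 {
                        continue; // skip the pixel itself
                    }
                    let nx = x as i32 + dx;
                    let ny = y as i32 + dy;
                    if nx < 0 || ny < 0 || nx >= width as i32 || ny >= height as i32 {
                        continue; // positions outside the image count as white
                    }
                    if pixels[(ny as usize) * width + (nx as usize)] == 0 {
                        black_neighbors += 1;
                    }
                }
            }
            if black_neighbors < min_neighbors {
                out[y * width + x] = 255;
            }
        }
    }
    out
}
```

Neighbors are counted on the input buffer rather than on the output being built, so the result does not depend on the order in which pixels are visited. A min_neighbors of 1 or 2 is a cautious starting point; higher values begin to erode thin character strokes.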

Posted 2025-03-26 22:40 by ttocr、com
