Image CAPTCHA Recognition with Rust and Tesseract

1. Preparation
Install Rust (rustup installer for Linux/macOS):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Install Tesseract

Ubuntu

sudo apt install tesseract-ocr

macOS

brew install tesseract
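Before moving on, it can be worth confirming two things: the `tesseract` binary is on `PATH`, and English language data is installed. The recognition code passes `-l eng`, so both are needed.

This is a minimal sketch. It assumes a Tesseract version that supports `--list-langs`, which prints a header line followed by one language code per line. Depending on the version, that list may appear on stdout or stderr, so the sketch checks both. The helper `has_language` is a hypothetical name used only here.

```rust
use std::process::Command;

/// Returns true if `lang` appears as its own line in the output of
/// `tesseract --list-langs`. The header line never equals a bare
/// language code, so it is skipped naturally.
fn has_language(listing: &str, lang: &str) -> bool {
    listing.lines().any(|line| line.trim() == lang)
}

fn main() {
    match Command::new("tesseract").arg("--list-langs").output() {
        Ok(out) => {
            // Check both streams, since the list may be printed on either one
            let text = format!(
                "{}\n{}",
                String::from_utf8_lossy(&out.stdout),
                String::from_utf8_lossy(&out.stderr)
            );
            if has_language(&text, "eng") {
                println!("tesseract found, eng language data installed");
            } else {
                println!("tesseract found, but eng language data is missing");
            }
        }
        // Spawning fails if the binary is not installed or not on PATH
        Err(e) => println!("could not run tesseract (is it on PATH?): {}", e),
    }
}
```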
2. Create the project

cargo new captcha_ocr
cd captcha_ocr
Edit Cargo.toml:

[dependencies]
regex = "1"
3. Implement the recognition logic (src/main.rs)

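use std::fs;
use std::process::Command;

use regex::Regex;

/// Keep only uppercase letters and digits from Tesseract's raw output
/// (drops spaces, newlines and any stray characters).
fn clean_text(s: &str) -> String {
    let re = Regex::new(r"[A-Z0-9]").unwrap();
    re.find_iter(s).map(|m| m.as_str()).collect()
}

/// Run the tesseract CLI on one image and print the cleaned result.
fn recognize_captcha(image_path: &str) {
    // Tesseract writes its result to "<output_base>.txt"
    let output_base = "ocr_output";
    // Restrict recognition to uppercase letters and digits
    let whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    let whitelist_arg = format!("tessedit_char_whitelist={}", whitelist);

    // Equivalent to: tesseract <image> ocr_output -l eng -c tessedit_char_whitelist=...
    let status = Command::new("tesseract")
        .args([
            image_path,
            output_base,
            "-l",
            "eng",
            "-c",
            whitelist_arg.as_str(),
        ])
        .status()
        // Panics if the tesseract binary cannot be started (e.g. not on PATH)
        .expect("failed to execute tesseract");

    if !status.success() {
        println!("Tesseract failed to run");
        return;
    }

    let txt_file = format!("{}.txt", output_base);
    match fs::read_to_string(&txt_file) {
        Ok(content) => {
            let result = clean_text(&content);
            println!("Recognition result: {}", result);
            // Delete the temporary output file
            let _ = fs::remove_file(&txt_file);
        }
        Err(_) => println!("Recognition failed: no output file was produced"),
    }
}

fn main() {
    let image_path = "captcha1.png"; // replace with your CAPTCHA image
    recognize_captcha(image_path);
}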
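Below is a variant of the same step, sketched under three assumptions:

- Tesseract accepts the literal output base `stdout`, which prints the text instead of writing `ocr_output.txt`. This removes the need to read and delete a temporary file.
- `--psm 7` (treat the image as a single text line) suits a one-line CAPTCHA.
- A regex-free character filter can replace `clean_text`. It keeps the same character set (`A-Z`, `0-9`).

The function names `tesseract_args` and `recognize_captcha_stdout` are hypothetical.

```rust
use std::process::Command;

/// Same filtering as the regex version: keep only ASCII A-Z and 0-9.
fn clean_text(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
        .collect()
}

/// Build the tesseract argument list. "stdout" as the output base makes
/// tesseract print the text instead of writing a .txt file. "--psm 7"
/// (single text line) is an assumption; drop it for multi-line images.
fn tesseract_args(image_path: &str) -> Vec<String> {
    vec![
        image_path.to_string(),
        "stdout".to_string(),
        "-l".to_string(),
        "eng".to_string(),
        "--psm".to_string(),
        "7".to_string(),
        "-c".to_string(),
        "tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".to_string(),
    ]
}

/// Run tesseract and return the cleaned text, or an error message.
fn recognize_captcha_stdout(image_path: &str) -> Result<String, String> {
    let output = Command::new("tesseract")
        .args(tesseract_args(image_path))
        .output()
        .map_err(|e| format!("failed to execute tesseract: {}", e))?;
    if !output.status.success() {
        // Tesseract reports errors (e.g. unreadable image) on stderr
        return Err(String::from_utf8_lossy(&output.stderr).into_owned());
    }
    Ok(clean_text(&String::from_utf8_lossy(&output.stdout)))
}

fn main() {
    match recognize_captcha_stdout("captcha1.png") {
        Ok(text) => println!("Recognition result: {}", text),
        Err(e) => eprintln!("Recognition failed: {}", e),
    }
}
```

Returning `Result<String, String>` instead of printing makes the recognized text available to callers, for example the CSV export in section 5.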
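4. Run the program
Make sure captcha1.png exists in the directory you run cargo from (the path in main is relative), then build and run: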

cargo run
Sample output:

Recognition result: H8W4
5. Suggested extensions
Process every PNG image in a folder:

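use std::fs::read_dir;

// Run recognize_captcha on every .png file in `dir`,
// e.g. call recognize_dir("captchas") from main().
fn recognize_dir(dir: &str) {
    for entry in read_dir(dir).expect("cannot read directory") {
        let path = entry.expect("cannot read directory entry").path();
        if path.extension().map_or(false, |ext| ext == "png") {
            // Skip paths that are not valid UTF-8 instead of panicking
            if let Some(p) = path.to_str() {
                recognize_captcha(p);
            }
        }
    }
}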
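`read_dir` does not guarantee any particular order, and the extension check above only matches a lowercase `png`. The hypothetical `collect_pngs` helper sketched below gathers matching files case-insensitively. It also sorts them, so repeated runs (and any CSV they produce) follow the same order.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect files in `dir` whose extension is "png" in any letter case,
/// sorted by path for a stable processing order.
fn collect_pngs(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_png = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("png"));
        if path.is_file() && is_png {
            paths.push(path);
        }
    }
    // read_dir makes no ordering guarantee, so sort explicitly
    paths.sort();
    Ok(paths)
}

fn main() {
    match collect_pngs(Path::new("captchas")) {
        Ok(paths) => {
            for path in paths {
                // Pass each path to the recognition function here;
                // printing is only a placeholder.
                println!("{}", path.display());
            }
        }
        Err(e) => eprintln!("cannot read captchas/: {}", e),
    }
}
```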
Write the results to a CSV file:

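use std::fs::OpenOptions;
use std::io::Write;

// Append one "image,result" line to results.csv (created on first use).
// `result` is the cleaned OCR string; recognize_captcha above only prints
// it, so it would need to return it (e.g. as Option<String>) to be used here.
fn append_csv(image_path: &str, result: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("results.csv")?;
    writeln!(file, "{},{}", image_path, result)
}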
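A plain `{},{}` line breaks as soon as a path contains a comma or a double quote. This minimal sketch assumes RFC 4180-style quoting is acceptable for whatever reads the file. It applies two rules:

- Fields containing a comma, quote, CR or LF are wrapped in double quotes, with embedded quotes doubled.
- A header row is written when the file is empty.

`csv_field`, `csv_line` and `append_record` are hypothetical names. A dedicated crate such as `csv` would cover more cases.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};

/// Quote a field RFC 4180-style: wrap it in double quotes if it contains
/// a comma, double quote, CR or LF, and double any embedded quotes.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Format one record for an (image path, recognized text) pair.
fn csv_line(image_path: &str, result: &str) -> String {
    format!("{},{}", csv_field(image_path), csv_field(result))
}

/// Append one record to `csv_path`, writing a header first if the file is empty.
fn append_record(csv_path: &str, image_path: &str, result: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(csv_path)?;
    if file.metadata()?.len() == 0 {
        writeln!(file, "image,result")?;
    }
    writeln!(file, "{}", csv_line(image_path, result))
}

fn main() -> io::Result<()> {
    // Placeholder values; in practice `result` comes from the OCR step
    append_record("results.csv", "captchas/a,1.png", "H8W4")
}
```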

posted @ 2025-06-29 19:16  ttocr、com