Image CAPTCHA Recognition with Rust and Tesseract
1. Prerequisites
Install Rust
```bash
curl https://sh.rustup.rs -sSf | sh
```
Install Tesseract OCR
Ubuntu:
```bash
sudo apt install tesseract-ocr
```
macOS:
```bash
brew install tesseract
```
2. Create the Project
```bash
cargo new captcha_ocr
cd captcha_ocr
```
Edit Cargo.toml and add the dependency:
```toml
[dependencies]
regex = "1"
```
3. Write the Recognition Code
Edit src/main.rs:
```rust
use std::fs;
use std::process::Command;

use regex::Regex;

/// Run tesseract on `image_path` and return only the recognized
/// uppercase letters and digits.
fn recognize_captcha(image_path: &str) -> String {
    // tesseract writes its result to "<output_base>.txt"
    let output_base = "rust_output";
    // Restrict recognition to uppercase letters and digits
    let whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    let status = Command::new("tesseract")
        .arg(image_path)
        .arg(output_base)
        .arg("-l")
        .arg("eng")
        .arg("-c")
        .arg(format!("tessedit_char_whitelist={}", whitelist))
        .status()
        .expect("failed to execute tesseract");

    if !status.success() {
        return "recognition failed".to_string();
    }

    // Read the output file, then delete it
    let txt_file = format!("{}.txt", output_base);
    let raw_text = fs::read_to_string(&txt_file).unwrap_or_default();
    let _ = fs::remove_file(&txt_file);

    // Drop whitespace, newlines and anything outside A-Z / 0-9
    let re = Regex::new(r"[A-Z0-9]").unwrap();
    re.find_iter(&raw_text).map(|m| m.as_str()).collect()
}

fn main() {
    let image_path = "captcha1.png"; // replace with the actual path
    let result = recognize_captcha(image_path);
    println!("Result: {}", result);
}
```
4. Run the Program
```bash
cargo run
```
Sample output:
```
Result: 8H9Z
```
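The regex at the end of recognize_captcha keeps only characters matching `[A-Z0-9]`. As a minimal sketch, the same filtering can be done with the standard library alone, which would let you drop the regex dependency. The helper name clean_ocr_text is hypothetical, and it assumes the whitelist stays limited to ASCII uppercase letters and digits as above.

```rust
/// Hypothetical helper: keep only ASCII uppercase letters and digits,
/// matching what the `[A-Z0-9]` regex in recognize_captcha keeps.
fn clean_ocr_text(raw: &str) -> String {
    raw.chars()
        // Same character set as the regex class [A-Z0-9]
        .filter(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
        .collect()
}
```

To use it, replace the last two lines of recognize_captcha with `clean_ocr_text(&raw_text)`.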
5. Extension: Batch Processing
Add the following code and replace the existing main function with the new one:
```rust
use std::fs::read_dir;

/// Recognize every .png image in `folder` and print the results.
fn batch_recognize(folder: &str) {
    let paths = read_dir(folder).unwrap();
    for entry in paths {
        let entry = entry.unwrap();
        let path = entry.path();
        // Only process files with a .png extension
        if path.extension().and_then(|s| s.to_str()) == Some("png") {
            let filename = path.file_name().unwrap().to_string_lossy();
            let result = recognize_captcha(path.to_str().unwrap());
            println!("{} -> {}", filename, result);
        }
    }
}

fn main() {
    batch_recognize("captchas"); // the folder name can be changed
}
```
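Two details of batch_recognize may surprise: read_dir yields entries in no guaranteed order, and the extension check is case-sensitive, so a file named CAPTCHA.PNG is skipped. A sketch of a hypothetical helper, list_png_files, that matches the extension case-insensitively, skips directories, and sorts the paths so the output order is reproducible. It uses fully qualified std paths so it can sit next to the existing `use` lines without name clashes.

```rust
/// Hypothetical helper: collect regular files in `folder` whose extension
/// is "png" in any letter case, sorted by path.
fn list_png_files(folder: &std::path::Path) -> std::io::Result<Vec<std::path::PathBuf>> {
    let mut files = Vec::new();
    for entry in std::fs::read_dir(folder)? {
        let path = entry?.path();
        // Case-insensitive extension check: png, PNG, Png ...
        let is_png = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("png"));
        if path.is_file() && is_png {
            files.push(path);
        }
    }
    // read_dir order is platform-dependent; sort for stable output
    files.sort();
    Ok(files)
}
```

In batch_recognize, the loop could then become `for path in list_png_files(std::path::Path::new(folder)).unwrap() { ... }`.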
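6. Save the Results to a File
Add CSV output (this again replaces main and reuses the read_dir import from the previous step):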
```rust
use std::fs::File;
use std::io::Write;

/// Recognize every .png image in `folder` and write "filename,text" rows to `output_csv`.
fn save_results_to_csv(folder: &str, output_csv: &str) {
    let paths = read_dir(folder).unwrap();
    let mut file = File::create(output_csv).unwrap();
    // Header row
    writeln!(file, "filename,text").unwrap();
    for entry in paths {
        let entry = entry.unwrap();
        let path = entry.path();
        if path.extension().and_then(|s| s.to_str()) == Some("png") {
            let filename = path.file_name().unwrap().to_string_lossy();
            let result = recognize_captcha(path.to_str().unwrap());
            writeln!(file, "{},{}", filename, result).unwrap();
        }
    }
}

fn main() {
    save_results_to_csv("captchas", "results.csv");
}
```
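The row format `"{},{}"` breaks if a filename contains a comma, a double quote, or a line break. A minimal sketch of a hypothetical csv_field helper that quotes such values in the RFC 4180 style (wrap the field in double quotes and double any embedded quotes); it assumes the CSV is read by a tool that follows that convention.

```rust
/// Hypothetical helper: escape one CSV field RFC 4180-style.
/// Fields without special characters are returned unchanged.
fn csv_field(value: &str) -> String {
    let needs_quotes = value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r'));
    if needs_quotes {
        // Double embedded quotes, then wrap the whole field in quotes
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}
```

The write in save_results_to_csv would then read `writeln!(file, "{},{}", csv_field(&filename), csv_field(&result)).unwrap();`.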