Image CAPTCHA Recognition with Rust and Tesseract
1. Preparation
Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Install Tesseract
Ubuntu
sudo apt install tesseract-ocr
macOS
brew install tesseract
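Before moving on, it can help to confirm that the tesseract binary is reachable from a Rust process. The following is a minimal sketch, not part of the original program: the helper names first_line and tesseract_version are made up for illustration, and it assumes tesseract is on PATH. Some Tesseract builds print the version banner to stderr rather than stdout, so the sketch checks both streams.

```rust
use std::process::Command;

/// Returns the first non-empty line of `text`, trimmed.
fn first_line(text: &str) -> Option<&str> {
    text.lines().map(str::trim).find(|l| !l.is_empty())
}

/// Runs `tesseract --version` and returns the first line it prints,
/// or None when the binary cannot be started or prints nothing.
fn tesseract_version() -> Option<String> {
    let out = Command::new("tesseract").arg("--version").output().ok()?;
    let stdout = String::from_utf8_lossy(&out.stdout).into_owned();
    let stderr = String::from_utf8_lossy(&out.stderr).into_owned();
    // Check stdout first, then fall back to stderr.
    first_line(&stdout)
        .or_else(|| first_line(&stderr))
        .map(str::to_string)
}

fn main() {
    match tesseract_version() {
        Some(v) => println!("found: {}", v),
        None => eprintln!("tesseract was not found on PATH"),
    }
}
```

If this reports that tesseract was not found, the install commands above most likely did not complete, or the binary is in a directory that is not on PATH.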
2. Creating a new project
cargo new captcha_ocr
cd captcha_ocr
Edit Cargo.toml:
[dependencies]
regex = "1"
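3. Implementing the recognition logic (src/main.rs)
use regex::Regex;
use std::fs;
use std::process::Command;

/// Keeps only uppercase letters and digits, dropping whitespace and OCR noise.
fn clean_text(s: &str) -> String {
    let re = Regex::new(r"[A-Z0-9]").unwrap();
    re.find_iter(s).map(|m| m.as_str()).collect()
}

/// Runs Tesseract on `image_path` and prints the cleaned result.
fn recognize_captcha(image_path: &str) {
    // Tesseract appends ".txt" to this base name for its output file.
    let output_base = "ocr_output";
    let whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    let whitelist_arg = format!("tessedit_char_whitelist={}", whitelist);

    let status = Command::new("tesseract")
        .args([image_path, output_base, "-l", "eng", "-c", whitelist_arg.as_str()])
        .status()
        .expect("failed to execute tesseract");

    if !status.success() {
        eprintln!("Tesseract failed to run");
        return;
    }

    let txt_file = format!("{}.txt", output_base);
    match fs::read_to_string(&txt_file) {
        Ok(content) => {
            let result = clean_text(&content);
            println!("Recognition result: {}", result);
            // Remove the temporary output file; a failed removal is ignored.
            let _ = fs::remove_file(txt_file);
        }
        Err(_) => eprintln!("Recognition failed: no output file was generated"),
    }
}

fn main() {
    let image_path = "captcha1.png"; // replace with your CAPTCHA image
    recognize_captcha(image_path);
}
4. Running the program
Make sure captcha1.png exists, then compile and run:
cargo run
Sample output:
Recognition result: H8W4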
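The program above writes a temporary text file and prints the result. As a hedged alternative, the sketch below returns the recognized text to the caller instead. It relies on two Tesseract command-line features: passing stdout as the output base so the text is printed rather than written to a file, and --psm 7, which treats the image as a single text line. The --psm spelling assumes Tesseract 4 or later, and the single-line assumption may not hold for every CAPTCHA. The names build_args and recognize_to_string are hypothetical, and clean_text is rewritten here without the regex crate only to keep the example self-contained.

```rust
use std::process::Command;

const WHITELIST: &str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/// Argument list for Tesseract. Using "stdout" as the output base makes
/// Tesseract print the text instead of writing `<base>.txt`.
fn build_args(image_path: &str) -> Vec<String> {
    vec![
        image_path.to_string(),
        "stdout".to_string(),
        "-l".to_string(),
        "eng".to_string(),
        // Assumption: the CAPTCHA is one line of text (page segmentation mode 7).
        "--psm".to_string(),
        "7".to_string(),
        "-c".to_string(),
        format!("tessedit_char_whitelist={}", WHITELIST),
    ]
}

/// Regex-free stand-in for clean_text: keeps only A-Z and 0-9.
fn clean_text(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
        .collect()
}

/// Hypothetical variant of recognize_captcha that returns the cleaned text.
fn recognize_to_string(image_path: &str) -> Result<String, String> {
    let output = Command::new("tesseract")
        .args(build_args(image_path))
        .output()
        .map_err(|e| format!("failed to start tesseract: {}", e))?;
    if !output.status.success() {
        return Err(format!("tesseract exited with {}", output.status));
    }
    Ok(clean_text(&String::from_utf8_lossy(&output.stdout)))
}

fn main() {
    match recognize_to_string("captcha1.png") {
        Ok(text) => println!("{}", text),
        Err(e) => eprintln!("{}", e),
    }
}
```

Returning a Result leaves it to the caller to decide whether to print, retry, or store the text, which is useful for batch processing.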
5. Suggested extensions
Iterating over a folder:
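use std::fs::read_dir;

// Replaces main in src/main.rs: runs recognition on every .png file in captchas/.
fn main() {
    for entry in read_dir("captchas").expect("cannot read the captchas directory") {
        let path = entry.expect("cannot read directory entry").path();
        if path.extension().unwrap_or_default() == "png" {
            // to_str() returns None for non-UTF-8 paths; such files are skipped.
            if let Some(p) = path.to_str() {
                recognize_captcha(p);
            }
        }
    }
}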
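The loop above matches the extension exactly and visits files in whatever order the operating system returns them. A slightly more defensive sketch is shown below. It uses only the standard library, and the helper name list_png_files is made up for illustration. It matches the extension case-insensitively, skips anything that is not a regular file, and sorts the paths so repeated runs process images in the same order.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the PNG files directly inside `dir`, sorted by path.
fn list_png_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Accept "png", "PNG", "Png", and so on.
        let is_png = path
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| e.eq_ignore_ascii_case("png"));
        if is_png && path.is_file() {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

fn main() {
    match list_png_files(Path::new("captchas")) {
        Ok(files) => {
            for f in files {
                println!("{}", f.display());
            }
        }
        Err(e) => eprintln!("cannot read captchas: {}", e),
    }
}
```

Each returned path can then be passed to the recognition function in place of the println! call.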
Writing results to a CSV file:
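use std::fs::OpenOptions;
use std::io::Write;

// Appends one "image,result" row to results.csv. This assumes recognize_captcha
// has been changed to return the recognized text instead of printing it.
fn append_result(image_path: &str, result: &str) {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("results.csv")
        .unwrap();
    writeln!(file, "{},{}", image_path, result).unwrap();
}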
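Writing the two values with a plain comma works as long as neither contains a comma, a double quote, or a line break. The recognized text cannot, given the whitelist, but a file path can. The sketch below quotes fields when needed, following the convention described in RFC 4180. The names csv_field and append_result are illustrative helpers, not part of any library, and errors are returned to the caller rather than unwrapped.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Quotes a field when it contains a comma, double quote, or line break,
/// doubling any embedded double quotes.
fn csv_field(value: &str) -> String {
    let needs_quotes = value
        .chars()
        .any(|c| c == ',' || c == '"' || c == '\n' || c == '\r');
    if needs_quotes {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Appends one `image,result` row to the CSV file, creating it if missing.
fn append_result(csv_path: &Path, image_path: &str, result: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(csv_path)?;
    writeln!(file, "{},{}", csv_field(image_path), csv_field(result))
}

fn main() {
    // Sample row using the values from the example output in this article.
    if let Err(e) = append_result(Path::new("results.csv"), "captcha1.png", "H8W4") {
        eprintln!("cannot write results.csv: {}", e);
    }
}
```

For larger jobs, opening the file once and reusing the handle avoids reopening it for every row.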