Image CAPTCHA Recognition with Perl and Tesseract
1. Preparation
Install Perl
Most Linux and macOS systems ship with Perl. If you need a newer version, you can install one as follows:
sudo apt install perl # Ubuntu / Debian
brew install perl # macOS
Install Tesseract OCR
sudo apt install tesseract-ocr
Or, on macOS:
brew install tesseract
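Before moving on, it can help to confirm that the tesseract executable is actually reachable. The following is a minimal sketch that searches the directories on PATH; the helper name find_in_path is hypothetical and not part of the original script.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file called $name found on
# PATH, or undef if there is none. find_in_path is a hypothetical helper.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x $candidate;
    }
    return;
}

my $tesseract = find_in_path('tesseract');
my $message   = defined($tesseract)
    ? "tesseract found at $tesseract"
    : "tesseract is not on PATH; install it first";
print "$message\n";
```

Running tesseract --version in a terminal gives the same information; the Perl version is only useful if the script itself should fail early with a clear message.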
2. Writing the Perl script captcha_ocr.pl
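use strict;
use warnings;

# Return true if the given path exists.
sub file_exists {
    my ($path) = @_;
    return -e $path;
}

# Read a whole file into a string; return an empty string if it cannot be opened.
sub read_file {
    my ($path) = @_;
    open my $fh, '<', $path or return '';
    local $/;    # slurp mode: read the entire file at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Keep only uppercase letters and digits.
sub clean_text {
    my ($text) = @_;
    $text =~ s/[^A-Z0-9]//g;
    return $text;
}
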
# Run tesseract on one image and print the cleaned result.
sub recognize_captcha {
    my ($image_path) = @_;
    unless (file_exists($image_path)) {
        print "File not found: $image_path\n";
        return;
    }
    my $output_base = "output_temp";
    my $whitelist   = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    # Pass the arguments as a list so the shell never interprets the image path.
    my @cmd = ('tesseract', $image_path, $output_base,
               '-l', 'eng', '-c', "tessedit_char_whitelist=$whitelist");
    if (system(@cmd) != 0) {
        print "tesseract failed for $image_path\n";
        return;
    }
    my $result_file = "$output_base.txt";
    unless (file_exists($result_file)) {
        print "No recognition output was generated\n";
        return;
    }
    my $raw     = read_file($result_file);
    my $cleaned = clean_text($raw);
    print "Recognition result: $cleaned\n";
    unlink $result_file;
}
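The function above writes to a temporary file and reads it back. As an alternative sketch, tesseract can write to standard output when the output base is given as stdout, and a list-form pipe open reads that output without going through a shell. The helper names build_tesseract_args and ocr_to_string are hypothetical, and the default page segmentation mode 7 (treat the image as a single text line) is an assumption that suits short one-line CAPTCHAs; the --psm spelling assumes Tesseract 4 or later.

```perl
use strict;
use warnings;

# Build the argument list for a tesseract call that writes to standard output.
# The helper name and the default page segmentation mode are assumptions.
sub build_tesseract_args {
    my ($image_path, %opt) = @_;
    my $psm       = $opt{psm} // 7;    # 7 = treat the image as a single text line
    my $whitelist = $opt{whitelist} // join('', 'A' .. 'Z', 0 .. 9);
    return (
        'tesseract', $image_path, 'stdout',
        '-l', 'eng',
        '--psm', $psm,
        '-c', "tessedit_char_whitelist=$whitelist",
    );
}

# Run tesseract through a list-form pipe open, so no shell parses the path.
# Returns the raw text, or nothing if the command could not be started.
sub ocr_to_string {
    my ($image_path, %opt) = @_;
    my @cmd = build_tesseract_args($image_path, %opt);
    open(my $fh, '-|', @cmd) or return;
    local $/;    # slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;
}
```

With these helpers, the body of recognize_captcha could be reduced to a call such as clean_text(ocr_to_string($image_path) // ''), and concurrent runs would no longer share the output_temp.txt file.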
3. Main entry point
The main program checks its argument and passes the image path to recognize_captcha:
if (@ARGV != 1) {
    print "Usage: perl captcha_ocr.pl <image path>\n";
    exit(1);
}
my $image_path = $ARGV[0];
recognize_captcha($image_path);
4. Running the script
Suppose you have a CAPTCHA image named img1.png. Run:
perl captcha_ocr.pl ./img1.png
Output:
Recognition result: B7KX
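Raw OCR output rarely arrives as clean as the example above: it typically carries trailing whitespace, and stray lowercase letters or spaces can slip in. The sketch below uppercases the text before filtering (the clean_text function in the script would simply drop lowercase characters) and rejects results of the wrong length. The name normalize_captcha and the default length of 4, taken from the B7KX example, are assumptions for illustration.

```perl
use strict;
use warnings;

# Normalise raw OCR text: uppercase first, then keep only A-Z and 0-9.
# Returns undef when the result does not have the expected length.
# normalize_captcha and the default length of 4 are assumptions.
sub normalize_captcha {
    my ($raw, $expected_len) = @_;
    $expected_len //= 4;
    my $text = uc($raw // '');
    $text =~ s/[^A-Z0-9]//g;
    return length($text) == $expected_len ? $text : undef;
}

for my $raw ("b7 kx\n\f", "B7KX\n", "B7K\n") {
    my $code = normalize_captcha($raw);
    my $line = defined($code) ? "accepted: $code" : "rejected";
    print "$line\n";
}
```

Rejecting a result of the wrong length lets the caller retry or skip the image instead of submitting a code that is known to be incomplete.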
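5. Batch recognition (optional)
To recognize every CAPTCHA image in a folder, add the following function:
sub batch_recognize {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir\n";
    # Escape the dot so that only names ending in ".png" match.
    my @files = grep { /\.png$/ } readdir($dh);
    closedir $dh;
    foreach my $file (@files) {
        my $full_path = "$dir/$file";
        print "Recognizing: $file\n";
        recognize_captcha($full_path);
    }
}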
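The batch function is defined but nothing in the script calls it. One possible way to wire it in, sketched below, is to decide between single-image and batch mode from the kind of path given on the command line. The helper names list_png_files and plan_targets are hypothetical; the listing is sorted so that runs are reproducible, and the case-insensitive match on the extension is an added assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Return the sorted full paths of the regular files in $dir whose names end
# in .png (case-insensitive). list_png_files is a hypothetical helper.
sub list_png_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!\n";
    my @names = sort grep { /\.png\z/i } readdir($dh);
    closedir $dh;
    return grep { -f $_ } map { File::Spec->catfile($dir, $_) } @names;
}

# Choose batch or single-image mode from the kind of path given.
sub plan_targets {
    my ($path) = @_;
    return list_png_files($path) if -d $path;
    return ($path);
}
```

The main program could then loop over plan_targets($ARGV[0]) and call recognize_captcha on each path, so the same command line handles both a single image and a directory.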