Image CAPTCHA Recognition with Perl and Tesseract

1. Preparation
Install Perl
Perl comes preinstalled on most Linux distributions and on macOS. If you need a newer version, install it with:
sudo apt install perl # Ubuntu / Debian
brew install perl # macOS
Install Tesseract OCR

sudo apt install tesseract-ocr # Ubuntu / Debian

or, on macOS:

brew install tesseract
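Before writing the script, it can help to confirm that the `tesseract` binary is actually reachable, because the script below invokes it by name. The following is a minimal sketch, assuming a Unix-like system (no `.exe` suffix handling); `find_in_path` is a hypothetical helper, not part of the original script. It walks the directories listed in `PATH` via the core `File::Spec->path` method.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: search each directory in PATH for an executable
# file with the given name; return its full path, or undef if not found.
# Assumes Unix-style executable names (no ".exe" suffix).
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        # "_" reuses the stat result from the preceding -f test
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $tesseract = find_in_path('tesseract');
if (defined $tesseract) {
    print "Found tesseract at $tesseract\n";
} else {
    print "tesseract not found in PATH\n";
}
```

If the check reports that `tesseract` is missing, revisit the installation step above before running the script.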
2. Write the Perl script captcha_ocr.pl

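use strict;
use warnings;

# Return true if the given path exists.
sub file_exists {
    my ($path) = @_;
    return -e $path;
}

# Read a whole file into a string; return '' if it cannot be opened.
sub read_file {
    my ($path) = @_;
    open my $fh, '<', $path or return '';
    local $/;                  # slurp mode: read the entire file at once
    my $content = <$fh>;
    close $fh;
    return defined $content ? $content : '';
}

# Keep only uppercase letters and digits; this also drops the newline
# and any stray whitespace or symbols in Tesseract's output.
sub clean_text {
    my ($text) = @_;
    $text =~ s/[^A-Z0-9]//g;
    return $text;
}

# Run Tesseract on one image, then print and return the cleaned result.
sub recognize_captcha {
    my ($image_path) = @_;

    unless (file_exists($image_path)) {
        print "File not found: $image_path\n";
        return;
    }

    # Tesseract appends ".txt" to this base name when writing its output.
    my $output_base = "output_temp";
    my $whitelist   = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    # The list form of system() bypasses the shell, so image paths with
    # spaces or shell metacharacters are passed through unchanged.
    my @cmd = ('tesseract', $image_path, $output_base,
               '-l', 'eng',
               '-c', "tessedit_char_whitelist=$whitelist");
    if (system(@cmd) != 0) {
        print "tesseract failed (status $?)\n";
        return;
    }

    my $result_file = "$output_base.txt";
    unless (file_exists($result_file)) {
        print "No recognition output file was produced\n";
        return;
    }

    my $raw     = read_file($result_file);
    my $cleaned = clean_text($raw);
    print "Result: $cleaned\n";

    unlink $result_file;       # remove the temporary output file
    return $cleaned;
}
3. Main entry point

Append the following to the same file, after the subroutines. It checks the command-line arguments and passes the image path to recognize_captcha:

if (@ARGV != 1) {
    print STDERR "Usage: perl captcha_ocr.pl <image path>\n";
    exit(1);
}

my $image_path = $ARGV[0];
recognize_captcha($image_path);
4. Run the script
Suppose you have a CAPTCHA image named img1.png. Run:

perl captcha_ocr.pl ./img1.png
Output:

Result: B7KX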
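The script writes Tesseract's result to a fixed file, output_temp.txt, in the current directory, so two copies running at the same time would overwrite each other's output. One possible alternative, sketched below, passes `stdout` as the output base so that Tesseract prints the recognized text instead of writing a file, and reads it through a list-form pipe `open`. This is a swapped-in technique, not the original author's method; `tesseract_args` and `ocr_to_string` are hypothetical names, and the sketch assumes a Tesseract version that accepts `stdout` as the output base and a Unix-like system for the list-form pipe.

```perl
use strict;
use warnings;

# Hypothetical helper: build the tesseract argument list. Passing "stdout"
# as the output base asks Tesseract to print the text instead of writing
# a .txt file.
sub tesseract_args {
    my ($image_path, $whitelist) = @_;
    return ('tesseract', $image_path, 'stdout',
            '-l', 'eng',
            '-c', "tessedit_char_whitelist=$whitelist");
}

# Hypothetical helper: run tesseract and capture its standard output
# through a pipe; no temporary file is created.
sub ocr_to_string {
    my ($image_path, $whitelist) = @_;
    open(my $pipe, '-|', tesseract_args($image_path, $whitelist))
        or die "Cannot start tesseract: $!\n";
    local $/;                  # slurp all output at once
    my $text = <$pipe>;
    # close() on a pipe returns false when the child exits non-zero
    close($pipe) or warn "tesseract exited with status $?\n";
    return defined $text ? $text : '';
}
```

In captcha_ocr.pl, the `system` call and the read of output_temp.txt could then be replaced with something like `my $cleaned = clean_text(ocr_to_string($image_path, $whitelist));`.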
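5. Batch recognition (optional)
To recognize every CAPTCHA image in a directory, add the following function to captcha_ocr.pl:

sub batch_recognize {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!\n";
    # Keep only names ending in ".png" (case-insensitive), sorted for a stable order.
    my @files = sort grep { /\.png\z/i } readdir($dh);
    closedir $dh;

    foreach my $file (@files) {
        my $full_path = "$dir/$file";
        print "Recognizing: $file\n";
        recognize_captcha($full_path);
    }
}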
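A slightly more defensive way to collect the images is sketched below. It assumes you may want more than one extension and that subdirectories whose names happen to end in ".png" should be skipped; `list_captcha_images` is a hypothetical helper, not part of the original script. It builds paths with the core `File::Spec->catfile` and keeps only regular files.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return full paths of image files in $dir, sorted by
# name. Extensions default to "png"; matching is case-insensitive, and
# anything that is not a regular file (e.g. a directory) is skipped.
sub list_captcha_images {
    my ($dir, @extensions) = @_;
    @extensions = ('png') unless @extensions;
    my $ext_re = join '|', map { quotemeta($_) } @extensions;

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!\n";
    my @paths =
        map  { File::Spec->catfile($dir, $_) }
        sort
        grep { /\.(?:$ext_re)\z/i } readdir($dh);
    closedir $dh;

    return grep { -f $_ } @paths;
}
```

For example, `recognize_captcha($_) for list_captcha_images($dir, 'png', 'jpg');` would process both PNG and JPEG files in one pass.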

Posted @ 2025-06-28 20:16 by ttocr、com