Image CAPTCHA Recognition with Perl and Tesseract

1. Environment Setup
Install Tesseract OCR

Ubuntu / Debian

sudo apt install tesseract-ocr

macOS

brew install tesseract
Install Perl (preinstalled on most systems)
Check the version:

perl -v
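
Before moving on, it can help to confirm that both tools are actually reachable from the shell. The following is a minimal sketch, assuming a Unix-like system (as in the install steps above); `find_in_path` is a hypothetical helper name, not part of any library.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable file
# called $name found in a PATH directory, or undef if there is none.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report where each required tool lives, if it is installed.
for my $tool (qw(tesseract perl)) {
    my $path = find_in_path($tool);
    print "$tool: ", (defined $path ? $path : 'not found on PATH'), "\n";
}
```

If tesseract is reported as not found, repeat the install step for your platform before running the script in the next section.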
2. Create the Perl Script
Create a new file named recognize.pl with the following content:

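#!/usr/bin/perl

use strict;
use warnings;

# Character whitelist: uppercase letters and digits only
my $whitelist = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

# Check arguments
if (@ARGV < 1) {
    die "Usage: perl recognize.pl <image path>\n";
}

my $image_path  = $ARGV[0];
my $output_base = "output_" . time;

# Call tesseract. qq{} lets the image path be wrapped in double quotes for the
# shell, so paths containing spaces work; tesseract writes to $output_base.txt.
my $cmd = qq{tesseract "$image_path" $output_base -l eng -c tessedit_char_whitelist=$whitelist 2>/dev/null};
system($cmd) == 0 or die "Recognition failed: tesseract could not be run or returned an error\n";

# Read the output file, then delete it
my $txt_file = "$output_base.txt";
open(my $fh, '<', $txt_file) or die "Cannot read result file: $txt_file\n";
my $raw = do { local $/; <$fh> };
close($fh);
unlink $txt_file;

# Clean up: uppercase everything, then drop characters outside the whitelist
# (this also removes whitespace and the trailing newline)
$raw = uc($raw);
$raw =~ s/[^\Q$whitelist\E]//g;

print "Result: $raw\n";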
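
The cleanup step at the end of the script can be tried on its own, without running tesseract. The sketch below wraps the same two operations in a hypothetical helper, `clean_ocr_text`, and feeds it a made-up string standing in for raw OCR output.

```perl
use strict;
use warnings;

my $whitelist = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

# Hypothetical helper: uppercase the text, then delete every character
# that is not in the whitelist (whitespace and control characters included).
sub clean_ocr_text {
    my ($raw) = @_;
    my $text = uc $raw;
    $text =~ s/[^\Q$whitelist\E]//g;
    return $text;
}

# Made-up sample: lowercase letters, stray spaces, a newline and a form feed.
print clean_ocr_text(" x9 3a\n\f"), "\n";   # prints X93A
```

Uppercasing first means a lowercase letter is kept as its uppercase form rather than being discarded by the whitelist filter.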
3. Running the Script
Optionally make the script executable:

chmod +x recognize.pl
Run it:

perl recognize.pl captcha1.png
Sample output:

Result: X93A
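4. Extension: Batch Recognition of Images in a Directory
To process every PNG image in a directory in one go:

use File::Glob ':bsd_glob';

my @files = bsd_glob("captchas/*.png");
foreach my $file (@files) {
    # Backticks run the command and capture its standard output
    my $res = `perl recognize.pl "$file"`;
    print "$file => $res";
}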
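
Starting a new perl process for every image works, but the same job can also be done inside a single script. The following is a sketch under stated assumptions, not the method used above: `tesseract_args` and `recognize` are hypothetical helper names, the output file is named after the process ID, and `system` is called in its list form, which bypasses the shell so file names with spaces or quotes need no escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob ':bsd_glob';

my $whitelist = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

# Hypothetical helper: build the tesseract command as a list of arguments.
sub tesseract_args {
    my ($image, $output_base) = @_;
    return ('tesseract', $image, $output_base, '-l', 'eng',
            '-c', "tessedit_char_whitelist=$whitelist");
}

# Hypothetical helper: run tesseract on one image and return the cleaned
# text, or undef if tesseract fails or its output file cannot be read.
sub recognize {
    my ($image) = @_;
    my $base = "output_$$";    # $$ is the current process ID
    system(tesseract_args($image, $base)) == 0 or return;
    open(my $fh, '<', "$base.txt") or return;
    my $raw = do { local $/; <$fh> };
    close($fh);
    unlink "$base.txt";
    my $text = uc $raw;
    $text =~ s/[^\Q$whitelist\E]//g;
    return $text;
}

# Process every PNG in captchas/; the loop simply does nothing if none exist.
for my $file (bsd_glob('captchas/*.png')) {
    my $text = recognize($file);
    print "$file => ", (defined $text ? $text : 'FAILED'), "\n";
}
```

Because no shell is involved, the `2>/dev/null` redirection is not available here, so any diagnostics tesseract prints to standard error will appear on the terminal.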

posted @ 2025-07-20 16:17 ttocr、com
