Building a CAPTCHA Recognition Tool with Perl and Tesseract

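1. Project Overview
This project uses a Perl script to call the Tesseract OCR engine installed on the system, recognize a CAPTCHA image, and extract the English letters and digits it contains. It is suited to server-side batch processing, automated login scripts, or a command-line recognition tool.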

2. Environment Setup

Install Perl
Most systems ship with Perl. Check with the following command:
perl -v

Install Tesseract

Ubuntu / Debian

sudo apt install tesseract-ocr

macOS

brew install tesseract

Make sure the tesseract command is available:

tesseract --version
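
The manual checks above can also be done from Perl before any OCR is attempted. The following is a minimal sketch, not part of the original script: the helper name find_in_path is hypothetical, and the tool list assumes the two external commands this post relies on (tesseract, plus ImageMagick's convert, which the script calls for preprocessing but which this section does not list).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable file
# named $name found in the PATH directories, or undef if there is none.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Assumed dependency list: tesseract for OCR, convert (ImageMagick) for
# the grayscale/threshold preprocessing step.
for my $tool (qw(tesseract convert)) {
    my $path = find_in_path($tool);
    print defined $path ? "$tool: $path\n" : "$tool: not found on PATH\n";
}
```

Running this before the main script gives a clearer message than a failed system call when one of the tools is missing.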
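
3. Prepare the CAPTCHA Image
Name the image captcha.png and save it in the same directory as the script. The image should contain a clear combination of English letters and digits.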
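
A quick sanity check on the input file can catch a wrongly named or corrupted image early. The sketch below only compares the first 8 bytes with the standard PNG signature; the helper name looks_like_png is hypothetical, and the check says nothing about whether the image content is readable by the OCR engine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the first 8 bytes of $file are the
# PNG signature (89 50 4E 47 0D 0A 1A 0A), false otherwise.
sub looks_like_png {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or return 0;
    my $n = read($fh, my $header, 8);
    close($fh);
    return (defined $n && $n == 8 && $header eq "\x89PNG\x0d\x0a\x1a\x0a") ? 1 : 0;
}

my $image = "captcha.png";
if (looks_like_png($image)) {
    print "$image looks like a PNG file\n";
} else {
    print "$image is missing or is not a PNG file\n";
}
```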

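4. Write the Recognition Script
Create the file ocr.pl:

#!/usr/bin/perl

use strict;
use warnings;

my $input_image = "captcha.png";
my $processed_image = "processed.png";

# Image preprocessing: convert to grayscale and binarize
# (calls the convert tool from ImageMagick, which must also be installed)

my $convert_cmd = "convert $input_image -colorspace Gray -threshold 50% $processed_image";
system($convert_cmd) == 0 or die "Image preprocessing failed\n";

# OCR: call tesseract and restrict the recognized characters

my $tesseract_cmd = "tesseract $processed_image stdout -l eng --psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
# Backticks run the command and capture its standard output
my $output = `$tesseract_cmd`;
die "OCR failed\n" if $? != 0;

# Clean the result: keep only letters and digits

$output =~ s/[^A-Za-z0-9]//g;

print "Recognition result: $output\n";

Make the script executable:

chmod +x ocr.pl

5. Run the Script

./ocr.pl

Sample output:

Recognition result: F6XBZ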
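
The script builds its commands as single strings, so file names pass through the shell. As a possible variation (a sketch under stated assumptions, not the author's version), the same two steps can be run with list-form system and a list-form pipe open, which bypass the shell. The helper names capture, clean_result, and recognize are hypothetical; the command-line options are the ones used in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list (no shell involved)
# and return everything it wrote to standard output.
sub capture {
    my (@cmd) = @_;
    open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    local $/;                          # slurp mode: read all output at once
    my $out = <$pipe>;
    close($pipe) or die "$cmd[0] exited with status $?\n";
    return defined $out ? $out : '';
}

# Keep only ASCII letters and digits, as in the cleaning step of ocr.pl.
sub clean_result {
    my ($text) = @_;
    $text =~ s/[^A-Za-z0-9]//g;
    return $text;
}

# Preprocess with ImageMagick's convert, then recognize with tesseract.
sub recognize {
    my ($input, $processed) = @_;
    system('convert', $input, '-colorspace', 'Gray',
           '-threshold', '50%', $processed) == 0
        or die "Image preprocessing failed\n";
    my $whitelist = join('', 'A' .. 'Z', 'a' .. 'z', 0 .. 9);
    my $raw = capture('tesseract', $processed, 'stdout', '-l', 'eng',
                      '--psm', '7', '-c', "tessedit_char_whitelist=$whitelist");
    return clean_result($raw);
}

# Same file names as in ocr.pl; only runs when the input image exists.
my ($input, $processed) = ("captcha.png", "processed.png");
if (-e $input) {
    print "Recognition result: ", recognize($input, $processed), "\n";
} else {
    print "$input not found; nothing to recognize\n";
}
```

Passing arguments as a list means a file name containing spaces or shell metacharacters is handed to the program unchanged, which matters if the tool is later extended to process arbitrary uploaded files.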

posted @ 2025-07-06 21:46 ttocr、com
