CAPTCHA Recognition with Nim and Tesseract

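This project calls the system's Tesseract OCR tool from Nim to recognize the characters in CAPTCHA images. The code is short and compiles to a native binary, which makes it a good fit for command-line automation tools.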

1. Environment setup
Install the Nim compiler
It can be installed with choosenim:
curl https://nim-lang.org/choosenim/init.sh -sSf | sh
Install Tesseract OCR

Ubuntu / Debian:

sudo apt install tesseract-ocr
macOS:

brew install tesseract
Windows: after installing Tesseract, add its installation directory to the PATH environment variable.
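
Before running any OCR, it may help to confirm that the tesseract binary is actually reachable through PATH. The following is a minimal sketch using findExe from Nim's standard os module; the helper name exeAvailable is hypothetical and not part of the original project.

```nim
import std/os

# Hypothetical helper: reports whether an executable can be located.
# findExe returns an empty string when nothing is found.
proc exeAvailable(name: string): bool =
  findExe(name).len > 0

when isMainModule:
  if exeAvailable("tesseract"):
    echo "tesseract found at: ", findExe("tesseract")
  else:
    echo "tesseract not found; check the installation and PATH"
```

Running this once after installation gives a quick signal that the later steps should be able to invoke tesseract.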

2. Core code: captcha_ocr.nim

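import std/[os, strutils, strformat]

proc recognizeCaptcha(imagePath: string) =
  if not fileExists(imagePath):
    echo "File not found: " & imagePath
    quit(1)

  let outputBase = "ocr_output"
  let whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

  # quoteShell guards against spaces and shell metacharacters in the path.
  let cmd = fmt"tesseract {quoteShell(imagePath)} {outputBase} -l eng -c tessedit_char_whitelist={whitelist}"
  if execShellCmd(cmd) != 0:
    echo "Recognition failed: tesseract exited with an error"
    quit(1)

  # tesseract appends ".txt" to the output base name.
  let resultFile = outputBase & ".txt"
  if not fileExists(resultFile):
    echo "Recognition failed: no output file was generated"
    quit(1)

  let raw = readFile(resultFile)
  # Keep only ASCII letters and digits; this avoids needing the re module.
  var cleaned = ""
  for c in raw:
    if c.isAlphaNumeric:
      cleaned.add c
  echo "Recognition result: " & cleaned

  removeFile(resultFile)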
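
The temporary ocr_output.txt file can be avoided by passing stdout as the output base, which makes tesseract print the recognized text instead of writing a file. The sketch below is one possible variant under that assumption, not the original author's code: the names cleanOcrText and recognizeToString are hypothetical, and it relies on execCmdEx from Nim's standard osproc module. The options are set to {poUsePath} so that tesseract's warnings on stderr are not merged into the captured text.

```nim
import std/[os, osproc, strutils]

# Keep only ASCII letters and digits from raw OCR output.
proc cleanOcrText(raw: string): string =
  for c in raw:
    if c.isAlphaNumeric:
      result.add c

# Hypothetical variant of recognizeCaptcha that returns the text
# instead of printing it. "stdout" as the output base tells tesseract
# to print the result rather than create ocr_output.txt.
proc recognizeToString(imagePath: string): string =
  let cmd = "tesseract " & quoteShell(imagePath) &
            " stdout -l eng" &
            " -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  # Without poStdErrToStdOut, only tesseract's stdout is captured.
  let (output, exitCode) = execCmdEx(cmd, options = {poUsePath})
  if exitCode != 0:
    raise newException(IOError, "tesseract exited with code " & $exitCode)
  cleanOcrText(output)
```

Returning a string rather than calling quit makes the routine easier to reuse, for example when a caller wants to skip a failed image and continue with the next one.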
3. Adding a main block for command-line arguments

when isMainModule:
  # Expect exactly one argument: the path of the image to recognize.
  if paramCount() != 1:
    echo "Usage: nim c -r captcha_ocr.nim <image path>"
    quit(1)

  let imagePath = paramStr(1)
  recognizeCaptcha(imagePath)
4. Compile and run
Compile:

nim c -d:release captcha_ocr.nim
Run:

./captcha_ocr ./test_captcha.png
Sample output:

Recognition result: 5JKL
5. Batch recognition over an image directory
Multiple CAPTCHA images can be processed by walking a directory:

for kind, path in walkDir("captcha_dir"):
  if kind == pcFile and path.endsWith(".png"):
    recognizeCaptcha(path)
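
The loop above matches only a lowercase .png suffix and visits files in whatever order the file system returns them. If a stable order and case-insensitive matching are wanted, a sketch along the following lines may help; the helper names isPngFile and listPngFiles are hypothetical, and only splitFile, walkDir, and sort from Nim's standard library are assumed.

```nim
import std/[os, strutils, algorithm]

# True for paths whose extension is .png, in any letter case.
proc isPngFile(path: string): bool =
  splitFile(path).ext.toLowerAscii == ".png"

# Collect the PNG files directly inside `dir`, sorted by path.
# A missing directory simply yields an empty list.
proc listPngFiles(dir: string): seq[string] =
  for kind, path in walkDir(dir):
    if kind == pcFile and isPngFile(path):
      result.add path
  result.sort()
```

Each path returned by listPngFiles("captcha_dir") can then be passed to recognizeCaptcha in turn.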

posted @ 2025-06-27 13:31 ttocr、com
