Recognizing CAPTCHA Images with Nim and Tesseract
1. Environment Setup
Install Nim:
# Ubuntu/macOS
curl https://nim-lang.org/choosenim/init.sh -sSf | sh
Install Tesseract OCR:
# macOS
brew install tesseract
# Ubuntu
sudo apt install tesseract-ocr
Install a C compiler (if missing), since Nim compiles through C:
# Ubuntu
sudo apt install gcc
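Before moving on, it can help to confirm that the tools installed above are actually reachable. The following is a minimal sketch, not part of the original script: `toolAvailable` is a hypothetical helper name, and it relies on `findExe` from Nim's `os` module, which looks in the current directory and then on PATH.

```nim
import os

proc toolAvailable(name: string): bool =
  ## True when an executable called `name` is found in the current
  ## directory or in one of the directories listed in PATH.
  findExe(name).len > 0

when isMainModule:
  # Report the three tools this tutorial depends on.
  for tool in ["nim", "tesseract", "gcc"]:
    let status = if toolAvailable(tool): "found" else: "MISSING"
    echo tool, ": ", status
```

If any tool is reported as MISSING, rerun the matching install command and open a new terminal so that PATH changes take effect.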
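2. Writing the Recognition Script
Create the file captcha_ocr.nim:
import os

proc cleanText(s: string): string =
  # Keep only uppercase letters and digits
  result = ""
  for c in s:
    if c in {'A'..'Z', '0'..'9'}:
      result.add(c)

proc recognizeCaptcha(imagePath: string): string {.discardable.} =
  # Prints the recognized text and also returns it ("" on failure).
  # {.discardable.} lets callers ignore the return value.
  let outputBase = "output"
  let whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  # Build the Tesseract command line; quoteShell protects paths with spaces
  let cmd = "tesseract " & quoteShell(imagePath) & " " & outputBase &
            " -l eng -c tessedit_char_whitelist=" & whitelist
  let exitCode = execShellCmd(cmd)
  let outputFile = outputBase & ".txt"
  if exitCode == 0 and fileExists(outputFile):
    result = cleanText(readFile(outputFile))
    echo "Recognition result: ", result
    removeFile(outputFile)
  else:
    echo "Recognition failed: no output file was produced (exit code ", exitCode, ")"

# Example call
recognizeCaptcha("captcha1.png")  # Replace with the path to your image file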
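The script above goes through a temporary output file. As an alternative sketch, Tesseract accepts `stdout` as its output base, so the text can be captured directly with `execCmdEx` from `osproc`. The names `buildTesseractCmd` and `recognizeToString` are hypothetical and not part of the original script; passing `options = {poUsePath}` is assumed here to keep Tesseract's stderr banner out of the captured text.

```nim
import os, osproc

proc cleanText(s: string): string =
  ## Keep only uppercase letters and digits.
  for c in s:
    if c in {'A'..'Z', '0'..'9'}:
      result.add(c)

proc buildTesseractCmd(imagePath, whitelist: string): string =
  ## Using "stdout" as the output base makes Tesseract print the text
  ## instead of writing <base>.txt.
  "tesseract " & quoteShell(imagePath) & " stdout -l eng" &
    " -c tessedit_char_whitelist=" & whitelist

proc recognizeToString(imagePath: string): string =
  ## Hypothetical variant of recognizeCaptcha; returns "" on failure.
  const whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  # Omitting poStdErrToStdOut so only stdout is captured (assumption above).
  let (output, exitCode) = execCmdEx(buildTesseractCmd(imagePath, whitelist),
                                     options = {poUsePath})
  if exitCode == 0: cleanText(output) else: ""

when isMainModule:
  echo "Recognition result: ", recognizeToString("captcha1.png")
```

Because nothing is written to disk, this variant avoids clashes when several images are processed at once, at the cost of depending on the `stdout` output base behaving the same way in your Tesseract version.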
3. Running the Program
Compile and run:
nim c -r captcha_ocr.nim
Sample output:
Recognition result: 3KZ8
4. Possible Extensions
You can extend the script in the following ways:
Iterate over the CAPTCHA images in a folder:
for file in walkFiles("captchas/*.png"):
  recognizeCaptcha(file)
Save the results to a CSV file:
# Assumes recognizeCaptcha returns the recognized text as a string
let csv = open("results.csv", fmWrite)
csv.writeLine("filename,text")
for file in walkFiles("captchas/*.png"):
  let text = recognizeCaptcha(file)
  csv.writeLine(file & "," & text)
csv.close()
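Combine the script with Nim image libraries such as nimPNG or nimMagick to preprocess images (grayscale conversion, thresholding, and so on) before recognition.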
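To make the preprocessing idea concrete, here is a minimal sketch of grayscale conversion followed by a fixed threshold. It works on a raw buffer of packed R, G, B bytes and deliberately leaves out image decoding, since the exact loading API of nimPNG or nimMagick is not assumed here. The names `toGray` and `binarize` and the default threshold of 128 are illustrative choices.

```nim
proc toGray(r, g, b: uint8): uint8 =
  ## Integer approximation of luminance with the common
  ## 0.299 / 0.587 / 0.114 weights.
  uint8((299 * int(r) + 587 * int(g) + 114 * int(b)) div 1000)

proc binarize(rgb: openArray[uint8], threshold: uint8 = 128): seq[uint8] =
  ## `rgb` holds packed R,G,B bytes. The result has one byte per pixel:
  ## 255 (white) when the gray value is >= threshold, otherwise 0 (black).
  doAssert rgb.len mod 3 == 0, "expected packed RGB data"
  result = newSeq[uint8](rgb.len div 3)
  for i in 0 ..< result.len:
    let gray = toGray(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2])
    result[i] = if gray >= threshold: 255'u8 else: 0'u8

when isMainModule:
  # Three pixels: white, black, dark red.
  echo binarize([255'u8, 255, 255, 0, 0, 0, 200, 10, 10])
```

A fixed threshold is the simplest option; CAPTCHAs with uneven backgrounds may need a per-image threshold, which would have to be tuned against real samples.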