Recognizing CAPTCHAs with Tesseract from Elixir

1. Environment setup
Install Elixir
Install Elixir with your platform's package manager:

macOS

brew install elixir

Ubuntu

sudo apt install elixir
Install Tesseract

Ubuntu

sudo apt install tesseract-ocr

macOS

brew install tesseract
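Before moving on, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch rather than part of the original setup: the module name EnvCheck and its missing/1 function are hypothetical helpers, built on the standard System.find_executable/1. Save it under any name (for example env_check.exs) and run it with the elixir command.

```elixir
defmodule EnvCheck do
  # Hypothetical helper: returns the subset of `tools` that cannot be found on PATH.
  def missing(tools) do
    Enum.filter(tools, fn tool -> System.find_executable(tool) == nil end)
  end
end

case EnvCheck.missing(["elixir", "tesseract"]) do
  [] -> IO.puts("All required tools found")
  names -> IO.puts("Missing from PATH: #{Enum.join(names, ", ")}")
end
```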
2. Create a new Elixir project

mix new captcha_recognizer
cd captcha_recognizer
3. Write the recognition module
Edit lib/captcha_recognizer.ex:

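defmodule CaptchaRecognizer do
  # Characters Tesseract is allowed to emit: uppercase letters and digits.
  @whitelist "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  # Tesseract appends ".txt" to the output base name it is given.
  @output_base "elixir_output"
  @output_file @output_base <> ".txt"

  def recognize(file_path) do
    args = [
      file_path,
      @output_base,
      "-l", "eng",
      "-c", "tessedit_char_whitelist=#{@whitelist}"
    ]

    # System.cmd/2 returns {captured_stdout, exit_code}; only the exit code is needed here.
    {_output, exit_code} = System.cmd("tesseract", args)

    if exit_code == 0 do
      case File.read(@output_file) do
        {:ok, content} ->
          # Normalize case and strip whitespace or stray symbols.
          content
          |> String.upcase()
          |> String.replace(~r/[^A-Z0-9]/, "")

        {:error, _} ->
          "Could not read the recognition result"
      end
    else
      "Tesseract failed"
    end
  end
end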
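The module above signals failure by returning an error string, which a caller cannot tell apart from recognized text. As a hedged alternative, the sketch below returns tagged tuples and asks Tesseract to print to standard output (by passing stdout as the output base) instead of writing elixir_output.txt. The module name CaptchaSketch, the clean/1 helper, and the error atoms are hypothetical, and the --psm 7 flag assumes the CAPTCHA is a single line of text.

```elixir
defmodule CaptchaSketch do
  @whitelist "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

  # Pure helper: uppercase the raw OCR text and keep only A-Z and 0-9.
  def clean(text) do
    text
    |> String.upcase()
    |> String.replace(~r/[^A-Z0-9]/, "")
  end

  # Returns {:ok, text} or {:error, reason} instead of an error string.
  def recognize(file_path) do
    case System.find_executable("tesseract") do
      nil ->
        {:error, :tesseract_not_found}

      exe ->
        args = [
          file_path,
          # Special output base: print the text instead of writing a file.
          "stdout",
          "-l", "eng",
          # Assumption: the CAPTCHA is a single line of text.
          "--psm", "7",
          "-c", "tessedit_char_whitelist=#{@whitelist}"
        ]

        case System.cmd(exe, args) do
          {output, 0} -> {:ok, clean(output)}
          {_output, code} -> {:error, {:tesseract_exit, code}}
        end
    end
  end
end
```

A caller can then pattern-match on the result, for example `{:ok, text} = CaptchaSketch.recognize("captcha1.png")`, and handle the error cases explicitly.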
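4. Run the recognizer
Create a test script test_run.exs in the project root. Keeping it outside lib/ prevents Mix from compiling it with the project and executing the top-level call on every build:

defmodule TestRun do
  def run do
    image = "captcha1.png" # replace with the path to your CAPTCHA image
    result = CaptchaRecognizer.recognize(image)
    IO.puts("Result: #{result}")
  end
end

TestRun.run()
Then run it through Mix, which compiles and loads the CaptchaRecognizer module first (a bare elixir test_run.exs would fail because the module is not loaded):

mix run test_run.exs
Sample output:

Result: 3ABZ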
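To avoid editing the script for every image, the path can be taken from the command line. This is a small sketch under assumptions: TestRunArgs and image_path/2 are hypothetical names, and the default of captcha1.png simply mirrors the script above. System.argv/0 is the standard way to read script arguments.

```elixir
defmodule TestRunArgs do
  # Hypothetical helper: first command-line argument, or a default image path.
  def image_path(argv, default \\ "captcha1.png") do
    case argv do
      [path | _] -> path
      [] -> default
    end
  end
end

image = TestRunArgs.image_path(System.argv())
IO.puts("Image to recognize: #{image}")
# In the project script, pass `image` to CaptchaRecognizer.recognize/1.
```

When the script is run through Mix, arguments placed after the script name are passed to it, so the image path can be appended to the command.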
5. Batch-process CAPTCHA images
The same recognition logic extends to multiple image files:

defmodule BatchRecognizer do
  def run(dir) do
    files = Path.wildcard(Path.join(dir, "*.png"))

    for file <- files do
      result = CaptchaRecognizer.recognize(file)
      IO.puts("#{Path.basename(file)} -> #{result}")
    end
  end
end

BatchRecognizer.run("captchas")
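The loop above handles one image at a time. For larger directories, the work could be spread across cores with Task.async_stream/3. The sketch below is illustrative only: BatchSketch and recognize_all/3 are hypothetical names, and it assumes a recognizer function that does not write to a shared output file. The CaptchaRecognizer module writes to a fixed elixir_output.txt, so it would need adapting first (for example, a unique output name per image) before being run concurrently. A stand-in recognizer is used here so the example runs without Tesseract.

```elixir
defmodule BatchSketch do
  # `recognizer` is any one-argument function mapping a file path to a result string.
  # Results come back in the same order as `files` (async_stream is ordered by default).
  # A task that exceeds the timeout exits the caller, which is the default behavior.
  def recognize_all(files, recognizer, max_concurrency \\ System.schedulers_online()) do
    files
    |> Task.async_stream(
      fn file -> {Path.basename(file), recognizer.(file)} end,
      max_concurrency: max_concurrency,
      timeout: 30_000
    )
    |> Enum.map(fn {:ok, pair} -> pair end)
  end
end

# Stand-in recognizer for demonstration; swap in a real one that is safe to run in parallel.
fake = fn file -> file |> Path.basename(".png") |> String.upcase() end

["captchas/a1.png", "captchas/b2.png"]
|> BatchSketch.recognize_all(fake)
|> Enum.each(fn {name, result} -> IO.puts("#{name} -> #{result}") end)
```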

Posted 2025-07-01 21:13