Recognizing CAPTCHAs with Tesseract from Elixir
1. Environment Setup
Installing Elixir
Elixir can be installed with the system package manager:
macOS
brew install elixir
Ubuntu
sudo apt install elixir
Installing Tesseract
Ubuntu
sudo apt install tesseract-ocr
macOS
brew install tesseract
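Before moving on, it can help to confirm that the tesseract binary is actually reachable from Elixir. The following is a minimal sketch; the module name EnvCheck and the script name check_env.exs are hypothetical and not part of the original project. It relies only on System.find_executable/1 from the standard library.

```elixir
defmodule EnvCheck do
  # Hypothetical helper: looks a command-line tool up on PATH.
  # Returns {:ok, path} when found and {:error, :not_found} otherwise.
  def locate(tool) do
    case System.find_executable(tool) do
      nil -> {:error, :not_found}
      path -> {:ok, path}
    end
  end
end

case EnvCheck.locate("tesseract") do
  {:ok, path} -> IO.puts("tesseract found at #{path}")
  {:error, :not_found} -> IO.puts("tesseract is not on PATH")
end
```

Saved as check_env.exs, the script can be run with elixir check_env.exs.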
2. Creating a New Elixir Project
mix new captcha_recognizer
cd captcha_recognizer
3. Writing the Recognition Module
Edit lib/captcha_recognizer.ex:
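defmodule CaptchaRecognizer do
  # Characters Tesseract is allowed to emit; anything else is stripped again below.
  @whitelist "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  # Tesseract appends ".txt" to the output base name it is given.
  @output_base "elixir_output"
  @output_file "#{@output_base}.txt"

  def recognize(file_path) do
    args = [
      file_path,
      @output_base,
      "-l", "eng",
      "-c", "tessedit_char_whitelist=#{@whitelist}"
    ]

    # The captured output is discarded; merging stderr into it keeps
    # Tesseract's diagnostics off the terminal.
    {_output, exit_code} = System.cmd("tesseract", args, stderr_to_stdout: true)

    if exit_code == 0 do
      case File.read(@output_file) do
        {:ok, content} ->
          # Normalize to uppercase and drop whitespace and stray symbols.
          content
          |> String.upcase()
          |> String.replace(~r/[^A-Z0-9]/, "")

        {:error, _reason} ->
          "Could not read the recognition result"
      end
    else
      "Tesseract failed"
    end
  end
end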
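The module above has Tesseract write a text file and then reads it back. As an alternative sketch, Tesseract also accepts stdout as the output base, in which case the recognized text is printed and can be taken straight from the result of System.cmd/3. The module name StdoutRecognizer and the tagged-tuple return values are assumptions made for illustration, not part of the original code.

```elixir
defmodule StdoutRecognizer do
  @whitelist "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

  # Builds the argument list. Passing "stdout" as the output base makes
  # Tesseract print the text instead of writing a .txt file.
  def build_args(file_path) do
    [file_path, "stdout", "-l", "eng", "-c", "tessedit_char_whitelist=#{@whitelist}"]
  end

  # Same post-processing as CaptchaRecognizer: uppercase, then keep A-Z and 0-9 only.
  def clean(text) do
    text
    |> String.upcase()
    |> String.replace(~r/[^A-Z0-9]/, "")
  end

  # Returns {:ok, text} on success or {:error, {:tesseract_exit, code}} otherwise.
  # stderr is deliberately not merged here, so diagnostics cannot leak into the text.
  def recognize(file_path) do
    case System.cmd("tesseract", build_args(file_path)) do
      {output, 0} -> {:ok, clean(output)}
      {_output, code} -> {:error, {:tesseract_exit, code}}
    end
  end
end
```

Returning tagged tuples rather than an error string lets callers tell a failed run apart from a CAPTCHA whose text happens to look like a message.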
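4. Running the Recognizer
Create a test module in lib/test_run.ex:
defmodule TestRun do
  def run do
    image = "captcha1.png" # replace with the path to your CAPTCHA image
    result = CaptchaRecognizer.recognize(image)
    IO.puts("Result: #{result}")
  end
end
Then run it through Mix from the project root, so that CaptchaRecognizer is compiled and loaded first:
mix run -e "TestRun.run()"
Example output:
Result: 3ABZ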
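5. Batch-Processing CAPTCHA Images
The same logic extends to every PNG file in a directory. Add a module, for example in lib/batch_recognizer.ex:
defmodule BatchRecognizer do
  def run(dir) do
    files = Path.wildcard(Path.join(dir, "*.png"))

    for file <- files do
      result = CaptchaRecognizer.recognize(file)
      IO.puts("#{Path.basename(file)} -> #{result}")
    end
  end
end
Then run it against a directory of images, here captchas:
mix run -e 'BatchRecognizer.run("captchas")'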
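For larger directories the files could be processed concurrently. The sketch below uses Task.async_stream/3 from the standard library; the module name ParallelBatch, the recognize_fun parameter, and the default limits are assumptions for illustration. Note that CaptchaRecognizer.recognize/1 always writes to the same elixir_output.txt, so concurrent calls to it would overwrite each other's results. Pass a recognizer function that does not share an output file, for instance one that has Tesseract print to stdout.

```elixir
defmodule ParallelBatch do
  # recognize_fun is any 1-arity function that maps a file path to a result.
  # Returns a list of {basename, result} pairs in the same order as files.
  def run(files, recognize_fun, max_concurrency \\ 4) do
    files
    |> Task.async_stream(
      fn file -> {Path.basename(file), recognize_fun.(file)} end,
      max_concurrency: max_concurrency,
      timeout: 30_000
    )
    # With the default options a crash or timeout in a task exits the caller,
    # so every element that reaches this point is an {:ok, pair} tuple.
    |> Enum.map(fn {:ok, pair} -> pair end)
  end
end
```

A call might look like ParallelBatch.run(Path.wildcard("captchas/*.png"), my_recognizer), where my_recognizer stands for whichever recognition function is in use.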