Captcha Image Recognition with Elixir and Tesseract

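1. Introduction
Captcha recognition, a form of optical character recognition (OCR), is typically used in automated login or data extraction scenarios. This tutorial shows how to integrate the Tesseract engine from Elixir through a system call in order to recognize captcha images.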

2. Environment Setup

Install Elixir
macOS:
brew install elixir
Ubuntu:

sudo apt-get update
sudo apt-get install elixir
Windows:

Download the installer from https://elixir-lang.org/install.html.

Install Tesseract
Ubuntu:

sudo apt install tesseract-ocr
Or on macOS:

brew install tesseract
Check the version:

tesseract --version
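
The same check can be made from Elixir before any recognition is attempted. The following is a minimal sketch, assuming a hypothetical helper module named EnvCheck (not part of the original project); it only uses System.find_executable/1 and System.cmd/3 from the standard library.

```elixir
defmodule EnvCheck do
  # Hypothetical helper; the module and function names are illustrative only.

  # Returns true when a `tesseract` executable can be found on the PATH.
  def tesseract_available? do
    System.find_executable("tesseract") != nil
  end

  # Returns the first line printed by `tesseract --version`,
  # or nil when the executable is not installed (or prints nothing).
  def tesseract_version do
    case System.find_executable("tesseract") do
      nil ->
        nil

      exe ->
        {output, _status} = System.cmd(exe, ["--version"], stderr_to_stdout: true)
        output |> String.split("\n", trim: true) |> List.first()
    end
  end
end
```

Calling EnvCheck.tesseract_available?() at startup gives a clear failure early, instead of an error raised later by System.cmd/3 when the command cannot be found.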
3. Create the Elixir Project

mix new captcha_ocr
cd captcha_ocr
4. Implement the Recognition Logic
Edit the file lib/captcha_ocr.ex:

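defmodule CaptchaOCR do
  # Runs the tesseract CLI on image_path and returns the recognized text.
  def recognize(image_path) do
    cmd = "tesseract"
    # "stdout" prints the result instead of writing a file; "-l eng" selects
    # English; "--psm 7" treats the image as a single line of text.
    args = [image_path, "stdout", "-l", "eng", "--psm", "7"]

    case System.cmd(cmd, args, stderr_to_stdout: true) do
      {output, 0} ->
        output
        |> String.trim()
        |> String.upcase()

      {error_msg, _} ->
        "Recognition failed: #{error_msg}"
    end
  end
end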
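
Because recognize/1 returns a plain string on failure, callers cannot tell an error message from recognized text. A minimal sketch of an alternative, assuming a hypothetical module named SafeOCR (not part of the original project), returns tagged tuples, looks for the executable first, and strips everything except A-Z and 0-9 from the output. The character filter is an assumption about the captcha alphabet and may need adjusting for your images.

```elixir
defmodule SafeOCR do
  # Hypothetical variant of CaptchaOCR.recognize/1; the module and function
  # names here are illustrative only.

  # Returns {:ok, text} on success or {:error, reason} on failure.
  def recognize(image_path) do
    case System.find_executable("tesseract") do
      nil -> {:error, :tesseract_not_found}
      exe -> run_tesseract(exe, image_path)
    end
  end

  # Upcases the raw OCR output and drops anything outside A-Z and 0-9
  # (the assumed captcha alphabet), including whitespace and newlines.
  def clean(raw) do
    raw
    |> String.upcase()
    |> String.replace(~r/[^A-Z0-9]/, "")
  end

  defp run_tesseract(exe, image_path) do
    # Same arguments as the tutorial: print to stdout, English, single line.
    args = [image_path, "stdout", "-l", "eng", "--psm", "7"]

    case System.cmd(exe, args, stderr_to_stdout: true) do
      {output, 0} -> {:ok, clean(output)}
      {output, status} -> {:error, {status, String.trim(output)}}
    end
  end
end
```

For example, SafeOCR.clean(" 8k 9f\n") returns "8K9F". Note that with stderr_to_stdout: true, anything Tesseract prints on stderr is captured together with the recognized text, so dropping that option may be preferable when only the text is wanted.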
5. Run a Recognition Test
Edit lib/main.ex (or create a new module):

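defmodule Main do
  # Called with a one-element list holding the image path.
  def run([image_path]) do
    result = CaptchaOCR.recognize(image_path)
    IO.puts("Result: #{result}")
  end

  # Any other argument shape prints the usage hint.
  def run(_) do
    IO.puts("Usage: mix run -e 'Main.run([\"path/to/image.png\"])'")
  end
end
Run from the command line:

mix run -e 'Main.run(["captcha.png"])'
Sample output:

Result: 8K9F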
6. Batch Recognition (Optional)
You can build batch recognition with Elixir's File.ls!/1 and Path.join/2, keeping only the files whose names end in .png:

defmodule BatchOCR do
  # Recognizes every .png file in folder and prints one line per file.
  def run(folder) do
    folder
    |> File.ls!()
    |> Enum.filter(&String.ends_with?(&1, ".png"))
    |> Enum.each(fn filename ->
      full_path = Path.join(folder, filename)
      result = CaptchaOCR.recognize(full_path)
      IO.puts("#{filename} => #{result}")
    end)
  end
end
Run it with:

mix run -e 'BatchOCR.run("captchas")'
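
Each Tesseract call is a separate OS process, so a folder of images can also be processed concurrently. The following is a sketch only, assuming a hypothetical module named ParallelOCR (not part of the original project). It takes the recognizer as a function argument, matches the file extension case-insensitively with Path.extname/1, and uses Task.async_stream/3 from the standard library; the 30-second timeout is an arbitrary assumption.

```elixir
defmodule ParallelOCR do
  # Hypothetical concurrent variant of BatchOCR; names are illustrative only.

  # Keeps only file names whose extension is .png, ignoring case.
  def png_files(filenames) do
    Enum.filter(filenames, fn name ->
      String.downcase(Path.extname(name)) == ".png"
    end)
  end

  # Applies recognizer (a one-argument function taking a file path) to every
  # PNG in folder and returns a list of {filename, result} tuples, sorted by
  # file name.
  def run(folder, recognizer) do
    folder
    |> File.ls!()
    |> png_files()
    |> Enum.sort()
    |> Task.async_stream(
      fn name -> {name, recognizer.(Path.join(folder, name))} end,
      max_concurrency: System.schedulers_online(),
      timeout: 30_000
    )
    |> Enum.map(fn {:ok, pair} -> pair end)
  end
end
```

With the module from section 4, a call would look like ParallelOCR.run("captchas", &CaptchaOCR.recognize/1). Passing the recognizer as an argument also makes it possible to try the batch logic with a stub function when Tesseract is not installed.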

posted @ 2025-07-21 10:24 ttocr、com
