Recognizing Image CAPTCHAs with F# and Tesseract

1. Environment setup
Install the .NET SDK and F#

Download and install the .NET SDK, which includes F#:

https://dotnet.microsoft.com/download

Create the project

dotnet new console -lang "F#" -o CaptchaOCR
cd CaptchaOCR
Install Tesseract

macOS

brew install tesseract

Ubuntu

sudo apt install tesseract-ocr
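
Before writing any OCR code, it can help to confirm that the tesseract executable is reachable from .NET. The following is a minimal sketch, assuming tesseract is on the PATH. The helper name tryGetTesseractVersion is made up for illustration. It runs tesseract --version and returns the first non-empty line that the process prints, or None if the process cannot be started.

```fsharp
open System
open System.Diagnostics

/// Hypothetical helper: returns the first line printed by `tesseract --version`,
/// or None if the executable cannot be started.
let tryGetTesseractVersion () : string option =
    try
        let startInfo = ProcessStartInfo("tesseract", "--version")
        startInfo.RedirectStandardOutput <- true
        startInfo.RedirectStandardError <- true
        startInfo.UseShellExecute <- false
        use proc = Process.Start(startInfo)
        // Read both streams so that neither pipe can fill up and block the process
        let errorTask = proc.StandardError.ReadToEndAsync()
        let outText = proc.StandardOutput.ReadToEnd()
        proc.WaitForExit()
        let text = outText + errorTask.Result
        text.Split([| '\r'; '\n' |], StringSplitOptions.RemoveEmptyEntries)
        |> Array.tryHead
    with _ ->
        // Typically reached when the executable is missing from the PATH
        None

match tryGetTesseractVersion () with
| Some version -> printfn "Found: %s" version
| None -> printfn "tesseract was not found on the PATH"
```

If the second message appears, re-check the installation step for your platform before continuing.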
2. Write the code
Open Program.fs and replace its contents with the following:

open System
open System.IO
open System.Diagnostics
open System.Text.RegularExpressions

/// Keep only uppercase letters and digits
let cleanText (text: string) =
    Regex("[A-Z0-9]").Matches(text)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> String.concat ""

/// Run Tesseract on a CAPTCHA image and return the cleaned text
let recognizeCaptcha (imagePath: string) =
    let outputBase = "fsharp_ocr_output"
    let whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

    let startInfo = ProcessStartInfo()
    startInfo.FileName <- "tesseract"
    // Quote the image path so that paths containing spaces still work
    startInfo.Arguments <-
        sprintf "\"%s\" %s -l eng -c tessedit_char_whitelist=%s" imagePath outputBase whitelist
    startInfo.RedirectStandardOutput <- true
    startInfo.RedirectStandardError <- true
    startInfo.UseShellExecute <- false

    use proc = Process.Start(startInfo)
    // Drain the redirected streams so a full pipe buffer cannot block Tesseract
    proc.BeginOutputReadLine()
    proc.BeginErrorReadLine()
    proc.WaitForExit()

    // Tesseract appends ".txt" to the output base name
    let outputFile = outputBase + ".txt"
    if File.Exists(outputFile) then
        let rawText = File.ReadAllText(outputFile)
        File.Delete(outputFile)
        cleanText rawText
    else
        "Recognition failed"

[<EntryPoint>]
let main _ =
    let imagePath = "captcha1.png" // replace with the path to your image
    let result = recognizeCaptcha imagePath
    printfn "Result: %s" result
    0
3. Run the program

dotnet run
Example output:

Result: 7GDQ
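
The program above writes to a fixed fsharp_ocr_output.txt file, which two concurrent runs would overwrite. As a sketch of one alternative, the variant below passes stdout as the output base so that no temporary file is needed. It assumes a Tesseract version that accepts the --psm page segmentation option (mode 7 treats the image as a single text line, which may or may not help on a given CAPTCHA). The names buildArguments and recognizeViaStdout are hypothetical.

```fsharp
open System.Diagnostics
open System.Text.RegularExpressions

/// Hypothetical helper: builds the command line, using "stdout" as the output base
let buildArguments (imagePath: string) (whitelist: string) =
    sprintf "\"%s\" stdout -l eng --psm 7 -c tessedit_char_whitelist=%s" imagePath whitelist

/// Hypothetical variant of recognizeCaptcha that reads the text from standard output.
/// Returns None when Tesseract exits with a non-zero code.
let recognizeViaStdout (imagePath: string) : string option =
    let whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    let startInfo = ProcessStartInfo("tesseract", buildArguments imagePath whitelist)
    startInfo.RedirectStandardOutput <- true
    startInfo.RedirectStandardError <- true
    startInfo.UseShellExecute <- false
    use proc = Process.Start(startInfo)
    // Read stderr asynchronously so diagnostics cannot fill the pipe and stall the process
    let errorTask = proc.StandardError.ReadToEndAsync()
    let rawText = proc.StandardOutput.ReadToEnd()
    proc.WaitForExit()
    errorTask.Wait()
    if proc.ExitCode = 0 then
        // Same filtering as cleanText: keep only uppercase letters and digits
        Regex.Matches(rawText, "[A-Z0-9]")
        |> Seq.cast<Match>
        |> Seq.map (fun m -> m.Value)
        |> String.concat ""
        |> Some
    else
        None
```

Returning an option rather than a sentinel string lets callers tell a failed run apart from a genuine recognition result.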
4. Further suggestions
Recognize a batch of CAPTCHA images

Directory.GetFiles("captchas", "*.png")
|> Array.iter (fun path ->
    let res = recognizeCaptcha path
    printfn "%s -> %s" (Path.GetFileName path) res)
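
The loop above prints each result as it goes. If the results are needed later (for example, to write them to a file), a variation that collects them may be more convenient. The sketch below assumes a hypothetical helper named recognizeAll that takes the recognizer as a parameter. It sorts the paths first, since Directory.GetFiles does not guarantee any particular order.

```fsharp
open System.IO

/// Hypothetical helper: applies `recognize` to every PNG in `folder` and
/// returns (file name, text) pairs sorted by path
let recognizeAll (recognize: string -> string) (folder: string) =
    Directory.GetFiles(folder, "*.png")
    |> Array.sort
    |> Array.map (fun path -> Path.GetFileName path, recognize path)
```

With the recognizeCaptcha function from Program.fs, recognizeAll recognizeCaptcha "captchas" would return the whole batch as an array of pairs.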
Write the results to a CSV file

let output = new StreamWriter("result.csv")
output.WriteLine("filename,text")

Directory.GetFiles("captchas", "*.png")
|> Array.iter (fun path ->
    let res = recognizeCaptcha path
    output.WriteLine(sprintf "%s,%s" (Path.GetFileName path) res))

output.Close()
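
The CSV snippet above joins the fields with a bare comma, which produces a malformed row if a file name happens to contain a comma or a double quote. A minimal sketch of field quoting along the lines of RFC 4180 follows. The names csvField and csvLine are hypothetical helpers, not part of any library.

```fsharp
/// Hypothetical helper: quotes a field when it contains a comma, quote, or line break,
/// doubling any embedded double quotes
let csvField (value: string) =
    if value.IndexOfAny([| ','; '"'; '\r'; '\n' |]) >= 0 then
        "\"" + value.Replace("\"", "\"\"") + "\""
    else
        value

/// Hypothetical helper: escapes each field and joins them into one CSV line
let csvLine (fields: string list) =
    fields |> List.map csvField |> String.concat ","

printfn "%s" (csvLine [ "a,b.png"; "7GDQ" ]) // prints "a,b.png",7GDQ
```

In the snippet above, output.WriteLine(csvLine [ Path.GetFileName path; res ]) could take the place of the sprintf call.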

posted @ 2025-06-30 13:50  ttocr、com