Image CAPTCHA Recognition with F# and Tesseract
1. Environment Setup
Install the .NET SDK and F#
Download and install the .NET SDK, which includes F#:
https://dotnet.microsoft.com/download
Create the project
dotnet new console -lang "F#" -o CaptchaOCR
cd CaptchaOCR
Install Tesseract
macOS
brew install tesseract
Ubuntu
sudo apt install tesseract-ocr
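Before writing the OCR code, it can help to confirm that the tesseract executable is reachable from PATH. The following is a minimal sketch; the helper name tesseractAvailable is hypothetical and not part of the original program, and it assumes that a missing executable surfaces as a Win32Exception from Process.Start.

```fsharp
open System.ComponentModel
open System.Diagnostics

/// Hypothetical helper: returns true when `tesseract --version` can be
/// started from PATH and exits with code 0.
let tesseractAvailable () =
    try
        let info = ProcessStartInfo("tesseract", "--version")
        info.RedirectStandardOutput <- true
        info.RedirectStandardError <- true
        info.UseShellExecute <- false
        use proc = Process.Start(info)
        // The version banner is short, so reading the streams one after
        // the other is enough here.
        proc.StandardOutput.ReadToEnd() |> ignore
        proc.StandardError.ReadToEnd() |> ignore
        proc.WaitForExit()
        proc.ExitCode = 0
    with :? Win32Exception ->
        // Assumption: this is what Process.Start raises when the
        // executable cannot be found.
        false

printfn "tesseract available: %b" (tesseractAvailable ())
```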
2. Write the Code
Open Program.fs and replace its contents with the following:
open System
open System.IO
open System.Diagnostics
open System.Text.RegularExpressions
/// Keep only upper-case letters and digits
let cleanText (text: string) =
    Regex("[A-Z0-9]").Matches(text)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> String.concat ""
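/// Run the Tesseract CLI on one image and return the cleaned text
let recognizeCaptcha (imagePath: string) =
    let outputBase = "fsharp_ocr_output"
    let whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    let startInfo = ProcessStartInfo()
    startInfo.FileName <- "tesseract"
    // Quote both paths so that spaces in file names do not split the arguments
    startInfo.Arguments <-
        sprintf "\"%s\" \"%s\" -l eng -c tessedit_char_whitelist=%s"
            imagePath outputBase whitelist
    startInfo.RedirectStandardOutput <- true
    startInfo.RedirectStandardError <- true
    startInfo.UseShellExecute <- false
    use proc = Process.Start(startInfo)
    // Drain the redirected streams so that a full pipe buffer cannot block Tesseract
    proc.StandardError.ReadToEndAsync() |> ignore
    proc.StandardOutput.ReadToEnd() |> ignore
    proc.WaitForExit()
    // Tesseract appends ".txt" to the output base name
    let outputFile = outputBase + ".txt"
    if File.Exists(outputFile) then
        let rawText = File.ReadAllText(outputFile)
        File.Delete(outputFile)
        cleanText rawText
    else
        "recognition failed"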
[<EntryPoint>]
let main _ =
    let imagePath = "captcha1.png" // replace with the path to your image
    let result = recognizeCaptcha imagePath
    printfn "Recognition result: %s" result
    0
3. Run the Program
dotnet run
Sample output:
Recognition result: 7GDQ
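The raw text that Tesseract writes can contain whitespace or line breaks around the characters, which is why the program filters it through cleanText. The sketch below repeats that filter and adds a length check. The validate helper and the expected length of 4 are assumptions made for illustration, not part of the original program.

```fsharp
open System.Text.RegularExpressions

/// Same filter as cleanText in Program.fs: keep upper-case letters and digits.
let cleanText (text: string) =
    Regex.Matches(text, "[A-Z0-9]")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> String.concat ""

/// Hypothetical helper: accept a result only if the cleaned text has the
/// expected number of characters.
let validate (expectedLength: int) (raw: string) =
    let cleaned = cleanText raw
    if cleaned.Length = expectedLength then Some cleaned else None

printfn "%A" (validate 4 "7G DQ\n") // Some "7GDQ"
printfn "%A" (validate 4 "7G\n")    // None
```

Returning an option instead of a sentinel string makes a failed or partial recognition distinguishable from a real result.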
4. Extension Ideas
Recognize a batch of CAPTCHA images
Directory.GetFiles("captchas", "*.png")
|> Array.iter (fun path ->
    let res = recognizeCaptcha path
    printfn "%s -> %s" (Path.GetFileName path) res)
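If the batch results are needed more than once, for example for printing and for export, it may be convenient to collect them first instead of printing inside the loop. The sketch below is one possible shape. The name recognizeAll is hypothetical, and the recognizer is passed in as a parameter so that the block does not depend on Tesseract; in Program.fs the argument would be recognizeCaptcha.

```fsharp
open System.IO

/// Hypothetical helper: apply `recognize` to every PNG file in `dir`,
/// in file-name order, and return (file name, recognized text) pairs.
let recognizeAll (recognize: string -> string) (dir: string) =
    Directory.GetFiles(dir, "*.png")
    |> Array.sort
    |> Array.map (fun path -> Path.GetFileName path, recognize path)

/// Print one "name -> text" line per result.
let printResults (results: (string * string)[]) =
    results |> Array.iter (fun (name, text) -> printfn "%s -> %s" name text)
```

With these definitions, the batch loop would read recognizeAll recognizeCaptcha "captchas" |> printResults. Because recognizeCaptcha writes to a fixed output file name, the images should still be processed one at a time.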
Write the results to a CSV file
let output = new StreamWriter("result.csv")
output.WriteLine("filename,text")
Directory.GetFiles("captchas", "*.png")
|> Array.iter (fun path ->
    let res = recognizeCaptcha path
    output.WriteLine(sprintf "%s,%s" (Path.GetFileName path) res))
output.Close()
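Joining fields with a plain comma works as long as no file name contains a comma or a double quote. A minimal sketch of RFC 4180 style quoting follows. The helper names csvField and csvLine are hypothetical and not part of the original program.

```fsharp
/// Hypothetical helper: quote a field when it contains a comma, a double
/// quote, or a line break, doubling any embedded double quotes.
let csvField (value: string) =
    if value.IndexOfAny([| ','; '"'; '\n'; '\r' |]) >= 0 then
        "\"" + value.Replace("\"", "\"\"") + "\""
    else
        value

/// Build one CSV line from a list of raw field values.
let csvLine (fields: string list) =
    fields |> List.map csvField |> String.concat ","

printfn "%s" (csvLine [ "captcha,1.png"; "7GDQ" ]) // "captcha,1.png",7GDQ
```

In the export loop, the output.WriteLine call could then take csvLine [ Path.GetFileName path; res ] instead of the sprintf expression.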