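Segmenting Touching CAPTCHA Characters and Recognizing Them One by One

In CAPTCHAs, touching or overlapping characters are a common way to harden the image against automated recognition. Conventional OCR systems typically assume that each character stands alone, so recognition tends to fail once characters touch. To handle this case, this post uses Julia to implement a complete pipeline: character segmentation → single-character recognition → result concatenation.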

1. Set Up the Environment

using Pkg
Pkg.add(["Images", "ImageIO", "ImageMorphology", "ImageSegmentation", "Tesseract"])

2. Load and Binarize the Image

using Images, ImageIO
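img = load("captcha_stuck.png")
gray = Gray.(img)
# The input is dark text on a light background; after thresholding,
# ink pixels become 1.0 (foreground) and the background becomes 0.0.
binary = map(x -> x > 0.6 ? 0.0 : 1.0, gray)
# Wrap in Gray so the Float64 matrix is saved as a grayscale image.
save("binary.png", Gray.(binary))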
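
To make the polarity of the thresholding step concrete, here is a minimal sketch on a hand-written 3×3 matrix instead of a real image. The helper name `binarize` and the toy values are illustrative assumptions; only the `x > 0.6 ? 0.0 : 1.0` rule comes from the code above.

```julia
# Toy grayscale image: 1.0 is white background, low values are dark ink.
gray_demo = [1.0  0.9 0.2;
             0.95 0.1 0.3;
             1.0  1.0 0.85]

# Same rule as above: bright pixels -> 0.0, dark pixels -> 1.0.
# `binarize` is a hypothetical helper name used only for this sketch.
binarize(g; threshold = 0.6) = map(x -> x > threshold ? 0.0 : 1.0, g)

binary_demo = binarize(gray_demo)
foreground_count = count(==(1.0), binary_demo)
println("foreground pixels: ", foreground_count)
```

Only the three dark entries (0.2, 0.1, 0.3) end up as foreground, which is the convention the connected-component step relies on.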

3. Segment Character Regions (Connected-Component Analysis)

using ImageMorphology, ImageSegmentation

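# Label connected foreground regions; label 0 is the background.
# Characters that touch share a single label and are not separated by this step alone.
labels = label_components(binary .> 0.5)
num_labels = maximum(labels)

println("Number of character regions detected: ", num_labels)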
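
Connected-component labeling treats two touching characters as one region, so an unusually wide region may need a further cut. One common heuristic, which is not part of the original code, is to cut at the column with the fewest foreground pixels. The sketch below uses only Base Julia; `split_if_wide` and the `max_char_width` parameter are hypothetical names and values chosen for illustration.

```julia
# Column projection: number of foreground pixels in each column.
column_projection(mask::AbstractMatrix{Bool}) = vec(sum(mask, dims = 1))

# Split a region that is wider than one character is expected to be,
# cutting at its thinnest interior column.
# `max_char_width` is an assumed tuning parameter, not taken from the article.
function split_if_wide(mask::AbstractMatrix{Bool}; max_char_width::Int = 12)
    width = size(mask, 2)
    width <= max_char_width && return [mask]
    proj = column_projection(mask)
    # Search only the middle half so the cut does not shave off an edge.
    lo = max(2, width ÷ 4)
    hi = min(width - 1, 3 * width ÷ 4)
    cut = lo - 1 + argmin(proj[lo:hi])
    return [mask[:, 1:cut], mask[:, cut+1:end]]
end

# Two solid blobs joined by a one-pixel-high bridge in columns 7 and 8.
demo = falses(5, 14)
demo[:, 1:6] .= true
demo[3, 7:8] .= true
demo[:, 9:14] .= true

parts = split_if_wide(demo)
println(length(parts), " pieces, widths: ", size.(parts, 2))
```

A single minimum-projection cut only handles two characters joined by a thin stroke; heavily overlapping characters would need a more careful method.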

4. Extract and Recognize Each Character Region

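using Tesseract

ocr = TesseractOcr("eng")
recognized = ""

for i in 1:num_labels
    region = labels .== i

    # Crop the character region to its bounding box
    inds = findall(region)
    ys = [idx[1] for idx in inds]  # row indices
    xs = [idx[2] for idx in inds]  # column indices

    minx, maxx = minimum(xs), maximum(xs)
    miny, maxy = minimum(ys), maximum(ys)

    char_img = binary[miny:maxy, minx:maxx]
    fname = "char_$i.png"
    save(fname, Gray.(char_img))

    # Run OCR on the cropped character and append the result
    set_image(ocr, fname)
    text = strip(get_text(ocr))
    # `global` is required when this loop runs at the top level of a script
    global recognized *= text
end

println("Recognized CAPTCHA: ", recognized)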
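
The loop above concatenates results in label order. If that order cannot be relied on to match reading order, the per-region results can be sorted by their left edge before joining. The following is a minimal sketch in Base Julia; `bounding_box` and `join_left_to_right` are hypothetical helper names, and the boxes and texts in the demo are made-up values, not OCR output.

```julia
# Bounding box of a non-empty Bool mask as (min_row, max_row, min_col, max_col).
function bounding_box(mask::AbstractMatrix{Bool})
    inds = findall(mask)
    rows = [I[1] for I in inds]
    cols = [I[2] for I in inds]
    return (minimum(rows), maximum(rows), minimum(cols), maximum(cols))
end

# Order per-region texts by the left edge (min_col) of their boxes, then join.
function join_left_to_right(boxes, texts)
    order = sortperm([b[3] for b in boxes])
    return join(texts[order])
end

# Bounding box of a small rectangular blob.
m = falses(4, 5)
m[2:3, 2:4] .= true
box = bounding_box(m)

# Made-up regions listed out of reading order.
boxes = [(1, 5, 10, 14), (1, 5, 1, 4), (2, 6, 6, 9)]
texts = ["C", "A", "B"]
ordered = join_left_to_right(boxes, texts)
println(ordered)
```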

posted @ 2025-07-31 23:59  ttocr、com