A Multi-Character Captcha Recognition System Based on an End-to-End Convolutional Neural Network

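Traditional captcha recognition systems depend on character segmentation, which makes the pipeline complex and sensitive to image quality. This article presents an end-to-end convolutional neural network (CNN) that needs no character segmentation and recognizes the whole captcha image directly. The system is built with PyTorch and combines convolutional blocks with fully connected layers that output one label per character. It is designed for high accuracy and real-time inference on common captchas made of English letters and digits.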

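  1. Introduction
    Captchas are widely used to block automated access. They are hard to recognize because of interference lines, distortion and warping, and variable length. Compared with methods that cut out one character at a time, end-to-end recognition needs no segmentation preprocessing and adapts better to such variation. This article designs a lightweight CNN for captcha images made of 4 to 6 letters and digits; the code in this article fixes the length at 5.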

  2. Data generation and preprocessing
    Training samples are generated on the fly with the captcha library:
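from captcha.image import ImageCaptcha
import random
import string
from PIL import Image
import numpy as np

def random_captcha_text(length=5):
    # Draw `length` characters uniformly from the 62 letters and digits.
    chars = string.ascii_letters + string.digits
    return ''.join(random.choices(chars, k=length))

def generate_captcha_image(text):
    # Render the text as a 160x60 PIL image.
    image_gen = ImageCaptcha(width=160, height=60)
    img = image_gen.generate_image(text)
    return img, text

def image_to_tensor(img):
    # Convert to grayscale, resize to 160x60 (width, height), scale to [0, 1].
    img = img.convert("L").resize((160, 60))
    img = np.array(img) / 255.0
    # The result has shape (60, 160); add a channel axis before feeding the model.
    return img.astype(np.float32)
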
Every captcha is fixed at 5 characters (for variable-length captchas, CTC loss is an option).
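
For readers curious about the CTC route mentioned above, here is a minimal sketch of greedy CTC decoding, the step that turns per-frame predictions into a variable-length string. It is not part of the fixed-length model in this article. The function name `ctc_greedy_decode`, the blank index 0, and the toy alphabet are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_indices, index2char, blank=BLANK):
    """Collapse runs of repeated frame predictions, then drop blanks."""
    collapsed = [idx for idx, _ in groupby(frame_indices)]
    return "".join(index2char[idx] for idx in collapsed if idx != blank)


# Hypothetical toy alphabet: index 0 is the blank, 1..3 are real characters.
toy_index2char = {1: "a", 2: "b", 3: "7"}

# One predicted class index per time step (frame).
frames = [0, 1, 1, 0, 1, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(frames, toy_index2char))  # aab7
```

The blank between the two runs of `1` is what lets the decoder keep a doubled character such as "aa"; without it, the repeats would collapse into a single "a".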

  3. Model design: an end-to-end multi-label CNN
    The model outputs 5 characters. Each position is an independent classification task over 62 classes:

import torch.nn as nn

class CaptchaCNN(nn.Module):
    def __init__(self, num_classes=62, code_len=5):
        super().__init__()
        # Three conv blocks; each MaxPool2d(2) halves the height and width.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            # A 1x60x160 input leaves a 128x7x20 feature map after three poolings.
            nn.Linear(128 * 7 * 20, 1024), nn.ReLU(),
            nn.Linear(1024, code_len * num_classes)
        )
        self.code_len = code_len
        self.num_classes = num_classes

    def forward(self, x):
        # x: (batch, 1, 60, 160) -> logits: (batch, code_len, num_classes)
        out = self.conv(x)
        out = self.fc(out)
        return out.view(-1, self.code_len, self.num_classes)
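
The input size of the first fully connected layer follows from the image size, so it is worth checking by hand. The sketch below assumes a 60x160 (height x width) grayscale input and the three conv + pool stages described above; `conv_output_hw` is a helper name introduced here for illustration. It also compares the size of the 5 x 62 output head with what a single classifier over every possible 5-character string would need.

```python
def conv_output_hw(height, width, num_blocks=3, pool=2):
    """Spatial size after num_blocks stages of (3x3 conv, padding=1) + MaxPool2d(pool)."""
    for _ in range(num_blocks):
        # A 3x3 convolution with padding=1 and stride 1 keeps height and width.
        # MaxPool2d(2) rounds down by default (ceil_mode=False).
        height, width = height // pool, width // pool
    return height, width


h, w = conv_output_hw(60, 160)   # 60 -> 30 -> 15 -> 7, 160 -> 80 -> 40 -> 20
flat_features = 128 * h * w      # channels * height * width
head_outputs = 5 * 62            # code_len * num_classes
joint_classes = 62 ** 5          # one class per possible 5-character string

print(h, w, flat_features)       # 7 20 17920
print(head_outputs, joint_classes)  # 310 916132832
```

Treating each position as its own 62-way classifier keeps the output layer at 310 units, whereas a single softmax over whole strings would need over 900 million classes.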

  4. Label encoding and decoding
    Character set: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ, 62 characters in total.

import string

all_chars = string.digits + string.ascii_lowercase + string.ascii_uppercase
char2index = {c: i for i, c in enumerate(all_chars)}
index2char = {i: c for i, c in enumerate(all_chars)}

def encode_label(text):
    # Map each character to its class index (0..61).
    return [char2index[c] for c in text]

def decode_output(output_tensor):
    # output_tensor: (batch, code_len, num_classes); take the top class per position.
    pred = output_tensor.argmax(dim=2)
    return [''.join(index2char[i.item()] for i in row) for row in pred]
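
A quick round trip makes the index layout concrete: digits occupy 0 to 9, lowercase letters 10 to 35, and uppercase letters 36 to 61. The sketch below repeats the mapping so it runs on its own; `decode_indices` is a hypothetical list-based counterpart of `decode_output`, added only so the example needs no tensors.

```python
import string

all_chars = string.digits + string.ascii_lowercase + string.ascii_uppercase
char2index = {c: i for i, c in enumerate(all_chars)}
index2char = {i: c for i, c in enumerate(all_chars)}


def encode_label(text):
    """Map each character to its class index (0..61)."""
    return [char2index[c] for c in text]


def decode_indices(indices):
    """Hypothetical helper: map a plain list of class indices back to text."""
    return "".join(index2char[i] for i in indices)


encoded = encode_label("a7Zq0")
print(encoded)                  # [10, 7, 61, 26, 0]
print(decode_indices(encoded))  # a7Zq0
```

Because upper and lower case get separate indices, the labels are case-sensitive: "z" and "Z" count as different classes.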

  5. Model training and evaluation

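import torch.optim as optim
import torch.nn.functional as F

model = CaptchaCNN()
optimizer = optim.Adam(model.parameters(), lr=0.001)

def loss_fn(output, target):
    # output: (batch, code_len, num_classes); target: (batch, code_len) class indices.
    # Sum the cross-entropy over the character positions.
    return sum(F.cross_entropy(output[:, i], target[:, i]) for i in range(output.size(1)))

# `dataloader` is assumed to yield (images, labels) batches:
# images of shape (batch, 1, 60, 160), labels of shape (batch, 5) with dtype int64.
model.train()
for epoch in range(20):
    for images, labels in dataloader:
        preds = model(images)
        loss = loss_fn(preds, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()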
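
Since the loss is a sum of five 62-way cross-entropies, an untrained model that predicts roughly uniform probabilities should start near 5 · ln 62 ≈ 20.64. The pure-Python sketch below reproduces that figure for a single sample; `cross_entropy` and `captcha_loss` are illustrative re-implementations written for this check, not the PyTorch functions.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one position, computed via log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def captcha_loss(logits_per_position, targets):
    """Sum of per-position cross-entropies for one captcha."""
    return sum(cross_entropy(l, t) for l, t in zip(logits_per_position, targets))


# All-zero logits give a uniform distribution over the 62 classes.
uniform_logits = [[0.0] * 62 for _ in range(5)]
targets = [10, 7, 61, 26, 0]  # class indices for "a7Zq0"

initial_loss = captcha_loss(uniform_logits, targets)
print(round(initial_loss, 2))  # 20.64
```

If the first logged training loss is far from this value, the label encoding or tensor shapes are worth checking before tuning anything else.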

Accuracy is measured per captcha; a prediction counts as correct only if all 5 characters match:

import torch

correct = 0
total = 0
model.eval()  # switch to inference mode
with torch.no_grad():  # no gradients are needed for evaluation
    for images, labels in test_loader:
        preds = model(images)
        pred_texts = decode_output(preds)
        true_texts = [''.join(index2char[c.item()] for c in row) for row in labels]
        correct += sum(p == t for p, t in zip(pred_texts, true_texts))
        total += len(true_texts)
print(f'Whole-captcha accuracy: {correct / total * 100:.2f}%')
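
Whole-captcha accuracy is a strict metric, because a single wrong character fails the entire sample. Reporting per-character accuracy alongside it can make training progress easier to read. The sketch below works on plain strings; `captcha_accuracy` and the sample predictions are made up for illustration and are not results from the model.

```python
def captcha_accuracy(pred_texts, true_texts):
    """Return (whole-captcha accuracy, per-character accuracy)."""
    exact = sum(p == t for p, t in zip(pred_texts, true_texts))
    chars_total = sum(len(t) for t in true_texts)
    chars_right = sum(
        pc == tc
        for p, t in zip(pred_texts, true_texts)
        for pc, tc in zip(p, t)
    )
    return exact / len(true_texts), chars_right / chars_total


# Illustrative strings only, not model output.
sample_preds = ["a7Zq0", "b2Kd9", "xY3mM"]
sample_truth = ["a7Zq0", "b2Kd8", "xY3mN"]

whole, per_char = captcha_accuracy(sample_preds, sample_truth)
print(f"{whole:.2%} {per_char:.2%}")  # 33.33% 86.67%
```

Here 13 of 15 characters are right, yet only 1 of 3 captchas is fully correct, which shows how quickly per-character errors compound over 5 positions.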

posted @ 2025-07-30 20:43  ttocr、com