Building a CAPTCHA Recognition System with PyTorch

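In this tutorial, we use PyTorch to build a CAPTCHA recognition system. Like TensorFlow, PyTorch is a widely used deep learning framework. Its define-by-run execution and plain-Python control flow make it easy to inspect and modify a model step by step, which suits research and rapid prototyping.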

We will process CAPTCHA images with a convolutional neural network (CNN), and use PyTorch to build the model, train it, and run predictions.

Environment Setup
First, make sure the following dependencies are installed:

pip install torch torchvision matplotlib opencv-python pillow
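torch: the core PyTorch library.

torchvision: PyTorch's companion package for image processing.

matplotlib: used for visualization.

opencv-python: used for image processing (imported as cv2).

pillow: used for loading and handling images (imported as PIL).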
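
Before moving on, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library; the helper name find_missing is hypothetical, and the list reflects the import names (cv2 for opencv-python, PIL for pillow) rather than the pip names.

```python
import importlib.util

# Import names differ from pip names for two packages:
# opencv-python is imported as cv2, pillow as PIL.
REQUIRED_MODULES = ["torch", "torchvision", "matplotlib", "cv2", "PIL"]

def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are available.")
```

The check only looks up each module without importing it, so it runs quickly even when PyTorch is installed.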

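Dataset Preparation and Image Preprocessing
First, we need a dataset of CAPTCHA images, which we then preprocess. Each image is converted to grayscale and binarized, which helps isolate the characters from the background.

(1) Image preprocessing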

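import cv2
import numpy as np

def preprocess_image(img_path, img_size=(64, 64)):
    # Read the image; cv2.imread returns None if the file cannot be read
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f'Could not read image: {img_path}')

    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Binarize with Otsu's method (inverted, so pixels at or below the threshold become 255)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Resize to the input size expected by the model
    resized_img = cv2.resize(binary, img_size)

    # Scale pixel values to [0, 1]
    normalized_img = resized_img.astype(np.float32) / 255.0

    return normalized_img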

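# Example image path
img_path = 'captcha_images/test1.png'
processed_img = preprocess_image(img_path)

# Display the processed image
import matplotlib.pyplot as plt
plt.imshow(processed_img, cmap='gray')
plt.show()

This code converts the image to grayscale and binarizes it with Otsu's method, which picks the threshold automatically from the image histogram. This helps separate the text from the background more cleanly.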
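
To make the thresholding step less of a black box, here is a minimal pure-Python sketch of the idea behind Otsu's method: try every threshold and keep the one that maximizes the between-class variance of the two pixel groups. It is an illustration on a made-up list of gray values, not OpenCV's implementation, and the function name otsu_threshold is hypothetical.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for an iterable of 8-bit gray values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is nothing to separate
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Synthetic example: dark text (values near 30) on a light background (near 220)
pixels = [30] * 20 + [35] * 10 + [215] * 40 + [220] * 30
t = otsu_threshold(pixels)

# Inverted binarization, as with THRESH_BINARY_INV: dark pixels become 255
binary_inv = [255 if p <= t else 0 for p in pixels]
print(t, binary_inv.count(255), binary_inv.count(0))
```

The chosen threshold falls between the dark and light groups, so the 30 text pixels map to 255 and the 70 background pixels map to 0.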

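Building the CNN Model
Next, we build a convolutional neural network in PyTorch. The model consists of several convolutional layers, pooling layers, and fully connected layers. The output layer produces one score per character class (36 here), so the model as written classifies a single character per image.

(1) Building the model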

import torch
import torch.nn as nn
import torch.optim as optim

class CaptchaCNN(nn.Module):
    def __init__(self, num_classes=36):
        super().__init__()

        # Three 3x3 convolutions; padding=1 keeps the spatial size unchanged
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)

        # A 64x64 input is halved three times by pooling, giving 8x8 feature maps
        self.fc1 = nn.Linear(128 * 8 * 8, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = self.pool(torch.relu(self.conv3(x)))

        x = x.view(x.size(0), -1)  # flatten to (batch, 128 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)  # raw class scores (logits)
        return x

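# Create the model
model = CaptchaCNN(num_classes=36)  # assumes 36 classes (digits + letters)

# Inspect the model structure
print(model)

In the code above, we defined a simple CNN with three convolutional layers, each followed by a pooling layer. The fully connected layers at the end produce the final classification scores.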
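
The input size of the first fully connected layer, 128 * 8 * 8, follows from the layer settings and a 64x64 input. The short sketch below works through that arithmetic with the standard output-size formula for convolution and pooling; the helper names conv_out and pool_out are hypothetical.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output width/height of a convolution for a square input."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width/height of a max-pooling layer for a square input."""
    return (size - kernel) // stride + 1

size = 64  # input images are resized to 64x64
for _ in range(3):  # three conv + pool stages
    size = pool_out(conv_out(size))

flat_features = 128 * size * size  # 128 channels after the last convolution
print(size, flat_features)
```

If the input resolution or the number of pooling stages changes, the first Linear layer has to be resized to match the new value of flat_features.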

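Data Loading and Training
Before training, we need to load the image data into PyTorch and encode each label as an integer class index, which is the target format nn.CrossEntropyLoss expects (one-hot encoding is not required). This assumes you already have a dataset of CAPTCHA images.

(1) Data loading and transforms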

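import os
import string
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image

# Assumption: 36 classes ordered as digits 0-9, then uppercase letters A-Z
CHARSET = string.digits + string.ascii_uppercase

class CaptchaDataset(Dataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        # Sort so the file order is reproducible
        self.image_files = sorted(f for f in os.listdir(image_dir) if f.endswith('.png'))

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        img_name = os.path.join(self.image_dir, self.image_files[idx])
        img = Image.open(img_name).convert('L')  # convert to grayscale

        # Assumption: each image shows one character and the file name starts
        # with it; the loss function needs an integer class index, not a string
        label = CHARSET.index(self.image_files[idx][0].upper())

        # Apply the transform, if one was given
        if self.transform:
            img = self.transform(img)

        return img, label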

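# Preprocessing transforms (resize and tensor conversion)
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

# Load the data
train_dataset = CaptchaDataset(image_dir='captcha_images', transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Check data loading
for images, labels in train_loader:
    print(images.size(), labels)
    break

In the code above, we defined a CaptchaDataset class that loads images from a directory and converts them to tensors. The transform pipeline resizes each image to 64x64 and converts it to a tensor; no random augmentation is applied here.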
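
To clarify what the ToTensor step does to a grayscale image, the following pure-Python sketch mimics it on a tiny made-up image: pixel values are scaled from 0-255 to the range [0, 1], and a leading channel dimension is added. The function to_tensor_like is a hypothetical stand-in that returns nested lists rather than a real tensor.

```python
def to_tensor_like(gray_rows):
    """Mimic ToTensor for a grayscale image given as rows of 0-255 integers.

    Returns nested lists shaped (1, H, W) with values scaled to [0.0, 1.0].
    """
    return [[[value / 255.0 for value in row] for row in gray_rows]]

# A made-up 2x3 grayscale image
image = [
    [0, 128, 255],
    [255, 64, 0],
]
tensor = to_tensor_like(image)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
print(shape, tensor[0][0])
```

With 64x64 images and batch_size=32, a full batch from the DataLoader therefore has shape (32, 1, 64, 64), which matches the single input channel of the first convolution.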

Training the Model
For training, we use the cross-entropy loss (CrossEntropyLoss) and the Adam optimizer, and print the loss and accuracy for each epoch.

(1) Training loop

def train(model, train_loader, num_epochs=10):
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss on raw logits
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    for epoch in range(num_epochs):
        model.train()  # switch to training mode
        running_loss = 0.0
        correct = 0
        total = 0

        for images, labels in train_loader:
            # Move the batch to the GPU (if available); device is defined
            # at module level before train() is called
            images = images.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward pass and optimization step
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

            # Track accuracy
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(train_loader)
        epoch_acc = 100 * correct / total

        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.2f}%')

# Make sure the model runs on the GPU (if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Start training
train(model, train_loader, num_epochs=10)
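
For reference, the quantities reported by the training loop can be computed by hand. The sketch below is a pure-Python illustration of cross-entropy on raw scores (the negative log of the softmax probability of the correct class, averaged over the batch) and of accuracy via argmax. The logits and targets are made-up values and the function names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def batch_cross_entropy(batch_logits, targets):
    """Mean cross-entropy over a batch of samples."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Made-up scores for a batch of two samples and three classes
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 2]  # integer class indices, not one-hot vectors

loss = batch_cross_entropy(logits, targets)

# Accuracy: the predicted class is the index of the largest score
predicted = [max(range(len(l)), key=l.__getitem__) for l in logits]
accuracy = sum(p == t for p, t in zip(predicted, targets)) / len(targets)
print(f"loss={loss:.4f}, accuracy={accuracy:.2f}")
```

Because the loss applies the softmax internally, the model's last layer outputs raw scores and no softmax layer is needed in forward.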
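Making Predictions
Once training is complete, we can use the model to predict new CAPTCHA images. Note that predict uses preprocess_image, which binarizes and inverts the image, whereas the training transform only resizes it; for reliable predictions, the same preprocessing should be applied at training and inference time.

(1) Prediction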

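def predict(model, img_path):
    model.eval()  # switch to evaluation mode
    img = preprocess_image(img_path)
    # Add batch and channel dimensions: (H, W) -> (1, 1, H, W)
    img = torch.tensor(img).unsqueeze(0).unsqueeze(0).float().to(device)

    # Predict without tracking gradients
    with torch.no_grad():
        outputs = model(img)
    _, predicted = torch.max(outputs, 1)

    return predicted.item()  # integer class index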

# Predict a new CAPTCHA image
predicted_label = predict(model, 'captcha_images/test1.png')
print(f'Predicted label: {predicted_label}')
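
The value returned by predict is a class index, not a character. A minimal sketch of the final decoding step follows, assuming the 36 classes are ordered as digits 0-9 followed by uppercase A-Z. That ordering is an assumption for illustration; use whatever order the labels were encoded with during training. The names CHARSET, argmax, and decode_prediction are hypothetical.

```python
import string

# Assumed class order: digits 0-9 followed by uppercase A-Z (36 classes)
CHARSET = string.digits + string.ascii_uppercase

def argmax(scores):
    """Index of the largest score, as torch.max(outputs, 1) gives per row."""
    return max(range(len(scores)), key=scores.__getitem__)

def decode_prediction(class_index):
    """Map a predicted class index back to its character."""
    if not 0 <= class_index < len(CHARSET):
        raise ValueError(f"class index out of range: {class_index}")
    return CHARSET[class_index]

# Made-up scores for one image: class 11 has the highest score
scores = [0.0] * len(CHARSET)
scores[11] = 4.2
print(decode_prediction(argmax(scores)))
```

Under this ordering, index 11 corresponds to the letter B, since indices 0-9 are the digits and index 10 is A.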

posted @ 2025-04-07 13:09 ttocr、com