Deep Learning (Mixed Precision Training)

Mixed precision training typically combines single-precision (float32) and half-precision (float16) floating-point numbers to speed up training and reduce memory usage.
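
To see what this means in practice, here is a minimal sketch (not from the original post; requires a CUDA device) showing that under autocast, a matmul runs in float16 even though its inputs are stored as float32:

import torch
from torch.cuda.amp import autocast

a = torch.randn(4, 4, device='cuda')  # stored as float32
b = torch.randn(4, 4, device='cuda')
with autocast():
    c = a @ b                         # matmul is on autocast's float16 op list
print(c.dtype)  # torch.float16
print(a.dtype)  # torch.float32 -- the storage dtype is unchanged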

There are two key parts in the code:

1. The model's forward pass and loss computation are placed inside with autocast():, which runs them in an automatic mixed precision context.

2. A GradScaler object (scaler) manages the mixed precision training: float16 gradients can underflow to zero, so the scaler multiplies the loss by a scale factor before backward() and unscales the gradients before the optimizer step.

In this way, mixed precision training reduces memory usage and speeds up computation, improving training efficiency without sacrificing model accuracy.
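
One way to check the memory side of that claim is to compare peak GPU memory for a forward pass with autocast on and off; the helper below is a hypothetical sketch (peak_mem_forward is not part of the original post):

import torch
from torch.cuda.amp import autocast

def peak_mem_forward(model, images, use_amp):
    # Return the peak GPU memory (in MiB) used by one forward pass.
    torch.cuda.reset_peak_memory_stats()
    with autocast(enabled=use_amp):
        model(images)
    torch.cuda.synchronize()  # wait for kernels to finish before reading stats
    return torch.cuda.max_memory_allocated() / 1024**2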

The code is as follows:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.cuda.amp import GradScaler, autocast  # newer PyTorch versions prefer torch.amp
from torchvision import transforms, datasets
import time

# Define a LeNet model
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)   # 1x28x28 -> 6x24x24
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # 6x12x12 -> 16x8x8
        self.fc1 = nn.Linear(16*4*4, 120)             # flattened 16x4x4 feature map
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x)) 
        x = torch.max_pool2d(x, 2) 
        x = torch.relu(self.conv2(x)) 
        x = torch.max_pool2d(x, 2) 
        x = x.view(x.size(0), -1)  # flatten to (batch, 16*4*4)
        x = torch.relu(self.fc1(x)) 
        x = torch.relu(self.fc2(x))  
        x = self.fc3(x)  
        return x

# Load the MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)


# Create the LeNet model and optimizer
model = LeNet()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

scaler = GradScaler()  # gradient scaler for mixed precision training

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

start = time.perf_counter()
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
    
        with autocast():  # forward pass and loss computed in mixed precision
            output = model(images)
            loss = criterion(output, labels)

        scaler.scale(loss).backward()  # scale the loss so fp16 gradients don't underflow
        scaler.step(optimizer)         # unscales gradients, then runs the optimizer step
        scaler.update()                # adjusts the scale factor for the next iteration
       
        running_loss += loss.item()
        _, predicted = torch.max(output.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}, Accuracy: {(100 * correct / total):.2f}%")
end = time.perf_counter()
print(f"Training time: {end - start:.2f} s")
print('Training finished.')

# Save the model
torch.save(model.state_dict(), 'lenet_mnist.pth')
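
As a hypothetical follow-up (not in the original post), the saved weights can be reloaded for inference like this:

model = LeNet()
model.load_state_dict(torch.load('lenet_mnist.pth', map_location='cpu'))  # map to CPU so it loads without a GPU
model.eval()  # switch to inference mode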

The MNIST dataset itself is fairly small, so the difference is hard to see here.

With a somewhat larger training set, the difference in training time becomes clearly visible.
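
As a rough way to quantify that difference, one could time a single epoch with AMP on and off. The helper below is a hypothetical sketch (time_one_epoch is not from the original post) that reuses the model, loader, optimizer, and criterion defined above:

import time
import torch
from torch.cuda.amp import GradScaler, autocast

def time_one_epoch(model, loader, optimizer, criterion, device, use_amp):
    # Run one training epoch and return its wall-clock time in seconds.
    scaler = GradScaler(enabled=use_amp)  # no-op scaler when use_amp is False
    model.train()
    start = time.perf_counter()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        with autocast(enabled=use_amp):   # mixed precision only when enabled
            loss = criterion(model(images), labels)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    if device.type == 'cuda':
        torch.cuda.synchronize()          # wait for GPU kernels before stopping the clock
    return time.perf_counter() - start

# Example comparison, assuming the objects defined earlier in this post:
# print('fp32:', time_one_epoch(model, train_loader, optimizer, criterion, device, use_amp=False))
# print('amp :', time_one_epoch(model, train_loader, optimizer, criterion, device, use_amp=True))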
