17 - Complete Model Training Workflow

1. The CIFAR-10 Model Network

① Below, the CIFAR-10 model network is used to solve the classification problem; the network architecture is shown in the figure below.
[image: CIFAR-10 model network architecture]

2. Loading the Dataset with DataLoader

import torchvision
from torch import nn
from torch.utils.data import DataLoader

# Prepare the datasets
train_data = torchvision.datasets.CIFAR10("./dataset",train=True,transform=torchvision.transforms.ToTensor(),download=True)
test_data = torchvision.datasets.CIFAR10("./dataset",train=False,transform=torchvision.transforms.ToTensor(),download=True)

# length of each dataset
train_data_size = len(train_data)
test_data_size = len(test_data)
# If train_data_size = 10, this prints: Length of the training dataset: 10
print("Length of the training dataset: {}".format(train_data_size))
print("Length of the test dataset: {}".format(test_data_size))

# Use DataLoader to load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)
Files already downloaded and verified
Files already downloaded and verified
Length of the training dataset: 50000
Length of the test dataset: 10000
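
As a quick sanity check (illustrative, continuing from the block above), one batch drawn from the loader should have the expected shapes:

imgs, targets = next(iter(train_dataloader))
print(imgs.shape)    # torch.Size([64, 3, 32, 32])
print(targets.shape) # torch.Size([64])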

3. Testing That the Network Is Correct

import torch
from torch import nn

# Build the neural network
class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()        
        self.model1 = nn.Sequential(
            nn.Conv2d(3,32,5,1,2),  # in_channels 3, out_channels 32, kernel 5x5, stride 1, padding 2
            nn.MaxPool2d(2),
            nn.Conv2d(32,32,5,1,2),
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,5,1,2),
            nn.MaxPool2d(2),
            nn.Flatten(),  # after flattening, each sample is 64*4*4 long
            nn.Linear(64*4*4,64),
            nn.Linear(64,10)
        )
        
    def forward(self, x):
        x = self.model1(x)
        return x
    
if __name__ == '__main__':
    tudui = Tudui()
    input = torch.ones((64,3,32,32))
    output = tudui(input)
    print(output.shape)  # check that the output shape is what we expect
torch.Size([64, 10])

4. Training the Network on Data

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader

# "from model import *" would be equivalent to pasting everything from model.py here; the model is simply defined inline instead
class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()        
        self.model1 = nn.Sequential(
            nn.Conv2d(3,32,5,1,2),  # in_channels 3, out_channels 32, kernel 5x5, stride 1, padding 2
            nn.MaxPool2d(2),
            nn.Conv2d(32,32,5,1,2),
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,5,1,2),
            nn.MaxPool2d(2),
            nn.Flatten(),  # after flattening, each sample is 64*4*4 long
            nn.Linear(64*4*4,64),
            nn.Linear(64,10)
        )
        
    def forward(self, x):
        x = self.model1(x)
        return x

# Prepare the datasets
train_data = torchvision.datasets.CIFAR10("./dataset",train=True,transform=torchvision.transforms.ToTensor(),download=True)
test_data = torchvision.datasets.CIFAR10("./dataset",train=False,transform=torchvision.transforms.ToTensor(),download=True)

# length of each dataset
train_data_size = len(train_data)
test_data_size = len(test_data)
# If train_data_size = 10, this prints: Length of the training dataset: 10
print("Length of the training dataset: {}".format(train_data_size))
print("Length of the test dataset: {}".format(test_data_size))

# Use DataLoader to load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Create the network model
tudui = Tudui()

# Loss function
loss_fn = nn.CrossEntropyLoss() # cross-entropy; "fn" is short for "function"

# Optimizer
learning = 0.01  # 1e-2 means 0.01
optimizer = torch.optim.SGD(tudui.parameters(),learning)   # stochastic gradient descent

# A few bookkeeping variables
# count of training steps
total_train_step = 0

# number of training epochs
epoch = 10

for i in range(epoch):
    print("----- Epoch {} training begins -----".format(i+1))
    
    # Training steps begin
    for data in train_dataloader:
        imgs, targets = data
        outputs = tudui(imgs)
        loss = loss_fn(outputs, targets) # gap between the actual output and the target

        # Optimizer tunes the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward() # backpropagate to compute the gradients of the loss
        optimizer.step()   # update the network parameters using the gradients

        total_train_step = total_train_step + 1
        #print("Training step: {}, Loss: {}".format(total_train_step,loss))  # option 1: prints the loss as a tensor
        print("Training step: {}, Loss: {}".format(total_train_step,loss.item()))  # option 2: .item() gives a plain Python number
Files already downloaded and verified
Files already downloaded and verified
Length of the training dataset: 50000
Length of the test dataset: 10000
----- Epoch 1 training begins -----
Training step: 1, Loss: 2.3299059867858887
Training step: 2, Loss: 2.3018362522125244
Training step: 3, Loss: 2.2983856201171875
...
Training step: 778, Loss: 2.0410759449005127
Training step: 779, Loss: 2.1226587295532227
Training step: 780, Loss: 2.0455377101898193
Training step: 781, Loss: 2.1226439476013184
Training step: 782, Loss: 2.1010429859161377
----- Epoch 2 training begins -----
Training step: 783, Loss: 2.0878143310546875
Training step: 784, Loss: 1.8932205438613892
Training step: 785, Loss: 2.0417253971099854
...
Training step: 1560, Loss: 1.8508845567703247
Training step: 1561, Loss: 1.8362983465194702
Training step: 1562, Loss: 1.6391022205352783
Training step: 1563, Loss: 1.8635512590408325
Training step: 1564, Loss: 2.218357801437378
----- Epoch 3 training begins -----
Training step: 1565, Loss: 1.9309566020965576
Training step: 1566, Loss: 1.8227077722549438
Training step: 1567, Loss: 1.9470105171203613
...
Training step: 2342, Loss: 1.6162735223770142
Training step: 2343, Loss: 1.6201684474945068
Training step: 2344, Loss: 1.4277459383010864
Training step: 2345, Loss: 1.6406025886535645
Training step: 2346, Loss: 2.2170968055725098
----- Epoch 4 training begins -----
Training step: 2347, Loss: 1.714172601699829
Training step: 2348, Loss: 1.5495185852050781
Training step: 2349, Loss: 1.7623050212860107
...
Training step: 3124, Loss: 1.482405185699463
Training step: 3125, Loss: 1.517656683921814
Training step: 3126, Loss: 1.3997808694839478
Training step: 3127, Loss: 1.5784509181976318
Training step: 3128, Loss: 2.270616054534912
----- Epoch 5 training begins -----
Training step: 3129, Loss: 1.680168628692627
Training step: 3130, Loss: 1.4717119932174683
Training step: 3131, Loss: 1.7041345834732056
...
Training step: 3906, Loss: 1.397294044494629
Training step: 3907, Loss: 1.4086668491363525
Training step: 3908, Loss: 1.3827290534973145
Training step: 3909, Loss: 1.5424541234970093
Training step: 3910, Loss: 2.2353408336639404
----- Epoch 6 training begins -----
Training step: 3911, Loss: 1.6357128620147705
Training step: 3912, Loss: 1.4443740844726562
Training step: 3913, Loss: 1.6506778001785278
...
Training step: 4687, Loss: 1.4720959663391113
Training step: 4688, Loss: 1.3255469799041748
Training step: 4689, Loss: 1.3054076433181763
Training step: 4690, Loss: 1.3236397504806519
Training step: 4691, Loss: 1.4870140552520752
Training step: 4692, Loss: 2.085285186767578
----- Epoch 7 training begins -----
Training step: 4693, Loss: 1.519224762916565
Training step: 4694, Loss: 1.3556820154190063
Training step: 4695, Loss: 1.5660985708236694
...
Training step: 5470, Loss: 1.2511683702468872
Training step: 5471, Loss: 1.2069371938705444
Training step: 5472, Loss: 1.2312521934509277
Training step: 5473, Loss: 1.415238380432129
Training step: 5474, Loss: 1.9278712272644043
----- Epoch 8 training begins -----
Training step: 5475, Loss: 1.3937904834747314
Training step: 5476, Loss: 1.2605940103530884
Training step: 5477, Loss: 1.49344801902771
...
Training step: 6252, Loss: 1.1681272983551025
Training step: 6253, Loss: 1.1195703744888306
Training step: 6254, Loss: 1.1405022144317627
Training step: 6255, Loss: 1.3474702835083008
Training step: 6256, Loss: 1.7841163873672485
----- Epoch 9 training begins -----
Training step: 6257, Loss: 1.2861464023590088
Training step: 6258, Loss: 1.1618098020553589
Training step: 6259, Loss: 1.4472986459732056
...
Training step: 7034, Loss: 1.084890604019165
Training step: 7035, Loss: 1.0351139307022095
Training step: 7036, Loss: 1.0526344776153564
Training step: 7037, Loss: 1.2817728519439697
Training step: 7038, Loss: 1.6561686992645264
----- Epoch 10 training begins -----
Training step: 7039, Loss: 1.2101346254348755
Training step: 7040, Loss: 1.0879660844802856
Training step: 7041, Loss: 1.4072219133377075
...

Training step: 7816, Loss: 1.016163945198059
Training step: 7817, Loss: 0.964885413646698
Training step: 7818, Loss: 0.9678632020950317
Training step: 7819, Loss: 1.222188949584961
Training step: 7820, Loss: 1.5298950672149658

5. What item() Does

import torch
a = torch.tensor(5)
print(a)
print(a.item())
tensor(5)
5

6. Monitoring the Training Loss

① In PyTorch, a tensor has a requires_grad attribute; when it is True, the tensor is tracked by autograd and gradients are computed for it during backpropagation.

② requires_grad defaults to False. If a leaf node (a tensor you create yourself) has requires_grad set to True, then every node that depends on it also has requires_grad=True (even if other tensors it depends on have requires_grad=False).

③ When requires_grad is False, no gradients are computed for the tensor during backpropagation, which saves a large amount of GPU memory (or RAM).

④ Inside a with torch.no_grad() block, every tensor produced by a computation has requires_grad automatically set to False.

⑤ Even if a tensor x has requires_grad=True, a new tensor w (say, a scalar) computed from x inside a with torch.no_grad() block has requires_grad=False and grad_fn=None, i.e. no gradient is computed for w.

⑥ torch.no_grad(): stops gradient computation, so backpropagation is not possible inside it.
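
A minimal sketch of this behavior (the tensors x, y, and w here are illustrative only):

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 2               # depends on x, so y.requires_grad is True
print(y.requires_grad)  # True

with torch.no_grad():
    w = (x * 2).sum()   # computed inside no_grad
print(w.requires_grad)  # False
print(w.grad_fn)        # None, so no gradient can flow back through w

The full training script, which now also evaluates the loss on the test set after every epoch and logs both curves to TensorBoard, follows: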

import torchvision
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# "from model import *" would be equivalent to pasting everything from model.py here; the model is simply defined inline instead
class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()        
        self.model1 = nn.Sequential(
            nn.Conv2d(3,32,5,1,2),  # in_channels 3, out_channels 32, kernel 5x5, stride 1, padding 2
            nn.MaxPool2d(2),
            nn.Conv2d(32,32,5,1,2),
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,5,1,2),
            nn.MaxPool2d(2),
            nn.Flatten(),  # after flattening, each sample is 64*4*4 long
            nn.Linear(64*4*4,64),
            nn.Linear(64,10)
        )
        
    def forward(self, x):
        x = self.model1(x)
        return x

# Prepare the datasets
train_data = torchvision.datasets.CIFAR10("./dataset",train=True,transform=torchvision.transforms.ToTensor(),download=True)
test_data = torchvision.datasets.CIFAR10("./dataset",train=False,transform=torchvision.transforms.ToTensor(),download=True)

# length of each dataset
train_data_size = len(train_data)
test_data_size = len(test_data)
# If train_data_size = 10, this prints: Length of the training dataset: 10
print("Length of the training dataset: {}".format(train_data_size))
print("Length of the test dataset: {}".format(test_data_size))

# Use DataLoader to load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Create the network model
tudui = Tudui()

# Loss function
loss_fn = nn.CrossEntropyLoss() # cross-entropy; "fn" is short for "function"

# Optimizer
learning = 0.01  # 1e-2 means 0.01
optimizer = torch.optim.SGD(tudui.parameters(),learning)   # stochastic gradient descent

# A few bookkeeping variables
# count of training steps
total_train_step = 0
# count of test evaluations
total_test_step = 0

# number of training epochs
epoch = 10

# Add TensorBoard
writer = SummaryWriter("logs")

for i in range(epoch):
    print("----- Epoch {} training begins -----".format(i+1))
    
    # Training steps begin
    for data in train_dataloader:
        imgs, targets = data
        outputs = tudui(imgs)
        loss = loss_fn(outputs, targets) # gap between the actual output and the target

        # Optimizer tunes the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward() # backpropagate to compute the gradients of the loss
        optimizer.step()   # update the network parameters using the gradients

        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("Training step: {}, Loss: {}".format(total_train_step,loss.item()))  # .item() gives a plain Python number
            writer.add_scalar("train_loss",loss.item(),total_train_step)
    
    # Test steps begin (after each epoch, check the loss on the test set)
    total_test_loss = 0
    with torch.no_grad():  # no gradient computation, saves memory
        for data in test_dataloader: # fetch data from the test set
            imgs, targets = data
            outputs = tudui(imgs)
            loss = loss_fn(outputs, targets) # loss for just this batch
            total_test_loss = total_test_loss + loss.item() # accumulate the losses
    print("Loss on the whole test set: {}".format(total_test_loss))
    writer.add_scalar("test_loss",total_test_loss,total_test_step)
    total_test_step = total_test_step + 1
        
writer.close()
Files already downloaded and verified
Files already downloaded and verified
Length of the training dataset: 50000
Length of the test dataset: 10000
----- Epoch 1 training begins -----
Training step: 100, Loss: 2.2818071842193604
Training step: 200, Loss: 2.267061471939087
Training step: 300, Loss: 2.2060177326202393
Training step: 400, Loss: 2.1160497665405273
Training step: 500, Loss: 2.03908371925354
Training step: 600, Loss: 2.0013811588287354
Training step: 700, Loss: 1.971280574798584
Loss on the whole test set: 311.444508433342
----- Epoch 2 training begins -----
Training step: 800, Loss: 1.8406707048416138
Training step: 900, Loss: 1.835253357887268
Training step: 1000, Loss: 1.9193772077560425
Training step: 1100, Loss: 1.9817758798599243
Training step: 1200, Loss: 1.6866414546966553
Training step: 1300, Loss: 1.6833062171936035
Training step: 1400, Loss: 1.7423250675201416
Training step: 1500, Loss: 1.7910836935043335
Loss on the whole test set: 295.83529579639435
----- Epoch 3 training begins -----
Training step: 1600, Loss: 1.7340000867843628
Training step: 1700, Loss: 1.6623749732971191
Training step: 1800, Loss: 1.9103188514709473
Training step: 1900, Loss: 1.722930908203125
Training step: 2000, Loss: 1.8943604230880737
Training step: 2100, Loss: 1.4975690841674805
Training step: 2200, Loss: 1.464676856994629
Training step: 2300, Loss: 1.7708508968353271
Loss on the whole test set: 273.4990575313568
----- Epoch 4 training begins -----
Training step: 2400, Loss: 1.7362182140350342
Training step: 2500, Loss: 1.3517616987228394
Training step: 2600, Loss: 1.5586233139038086
Training step: 2700, Loss: 1.6879914999008179
Training step: 2800, Loss: 1.469564437866211
Training step: 2900, Loss: 1.5893890857696533
Training step: 3000, Loss: 1.352890968322754
Training step: 3100, Loss: 1.4961837530136108
Loss on the whole test set: 270.01156997680664
----- Epoch 5 training begins -----
Training step: 3200, Loss: 1.3372247219085693
Training step: 3300, Loss: 1.4689146280288696
Training step: 3400, Loss: 1.4240412712097168
Training step: 3500, Loss: 1.5419731140136719
Training step: 3600, Loss: 1.5850610733032227
Training step: 3700, Loss: 1.343977451324463
Training step: 3800, Loss: 1.3023576736450195
Training step: 3900, Loss: 1.4324713945388794
Loss on the whole test set: 257.1781986951828
----- Epoch 6 training begins -----
Training step: 4000, Loss: 1.3752213716506958
Training step: 4100, Loss: 1.4291632175445557
Training step: 4200, Loss: 1.5042070150375366
Training step: 4300, Loss: 1.1800527572631836
Training step: 4400, Loss: 1.1353368759155273
Training step: 4500, Loss: 1.3278626203536987
Training step: 4600, Loss: 1.385879397392273
Loss on the whole test set: 243.80352401733398
----- Epoch 7 training begins -----
Training step: 4700, Loss: 1.3193678855895996
Training step: 4800, Loss: 1.5091830492019653
Training step: 4900, Loss: 1.390406608581543
Training step: 5000, Loss: 1.377677083015442
Training step: 5100, Loss: 0.9832243919372559
Training step: 5200, Loss: 1.306634545326233
Training step: 5300, Loss: 1.2060096263885498
Training step: 5400, Loss: 1.3645224571228027
Loss on the whole test set: 227.03500604629517
----- Epoch 8 training begins -----
Training step: 5500, Loss: 1.2007256746292114
Training step: 5600, Loss: 1.2000162601470947
Training step: 5700, Loss: 1.217725157737732
Training step: 5800, Loss: 1.2193546295166016
Training step: 5900, Loss: 1.344832420349121
Training step: 6000, Loss: 1.5032548904418945
Training step: 6100, Loss: 0.9945251941680908
Training step: 6200, Loss: 1.0842390060424805
Loss on the whole test set: 210.75880527496338
----- Epoch 9 training begins -----
Training step: 6300, Loss: 1.3924059867858887
Training step: 6400, Loss: 1.08247971534729
Training step: 6500, Loss: 1.6116385459899902
Training step: 6600, Loss: 1.0441133975982666
Training step: 6700, Loss: 1.0808278322219849
Training step: 6800, Loss: 1.1203839778900146
Training step: 6900, Loss: 1.065340518951416
Training step: 7000, Loss: 0.8646073341369629
Loss on the whole test set: 200.43587028980255
----- Epoch 10 training begins -----
Training step: 7100, Loss: 1.2311145067214966
Training step: 7200, Loss: 0.9793491363525391
Training step: 7300, Loss: 1.1264833211898804
Training step: 7400, Loss: 0.8558132648468018
Training step: 7500, Loss: 1.1851539611816406
Training step: 7600, Loss: 1.2427409887313843
Training step: 7700, Loss: 0.8233367204666138
Training step: 7800, Loss: 1.2412829399108887
Loss on the whole test set: 194.5557427406311

① In the Anaconda Prompt, activate the py3.6.3 environment, then enter the command

tensorboard --logdir=C:\Users\wangy\Desktop\03CV\logs

and copy the URL it prints into the browser's address bar and press Enter to view the training logs in TensorBoard.

[image: TensorBoard log view]

[image: TensorBoard log view]

7. Saving the Parameters After Each Epoch

import torchvision
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# "from model import *" would be equivalent to pasting everything from model.py here; the model is simply defined inline instead
class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()        
        self.model1 = nn.Sequential(
            nn.Conv2d(3,32,5,1,2),  # in_channels 3, out_channels 32, kernel 5x5, stride 1, padding 2
            nn.MaxPool2d(2),
            nn.Conv2d(32,32,5,1,2),
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,5,1,2),
            nn.MaxPool2d(2),
            nn.Flatten(),  # after flattening, each sample is 64*4*4 long
            nn.Linear(64*4*4,64),
            nn.Linear(64,10)
        )
        
    def forward(self, x):
        x = self.model1(x)
        return x

# Prepare the datasets
train_data = torchvision.datasets.CIFAR10("./dataset",train=True,transform=torchvision.transforms.ToTensor(),download=True)
test_data = torchvision.datasets.CIFAR10("./dataset",train=False,transform=torchvision.transforms.ToTensor(),download=True)

# length of each dataset
train_data_size = len(train_data)
test_data_size = len(test_data)
# If train_data_size = 10, this prints: Length of the training dataset: 10
print("Length of the training dataset: {}".format(train_data_size))
print("Length of the test dataset: {}".format(test_data_size))

# Use DataLoader to load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Create the network model
tudui = Tudui()

# Loss function
loss_fn = nn.CrossEntropyLoss() # cross-entropy; "fn" is short for "function"

# Optimizer
learning = 0.01  # 1e-2 means 0.01
optimizer = torch.optim.SGD(tudui.parameters(),learning)   # stochastic gradient descent

# A few bookkeeping variables
# count of training steps
total_train_step = 0
# count of test evaluations
total_test_step = 0

# number of training epochs
epoch = 10

# Add TensorBoard
writer = SummaryWriter("logs")

for i in range(epoch):
    print("----- Epoch {} training begins -----".format(i+1))
    
    # Training steps begin
    for data in train_dataloader:
        imgs, targets = data
        outputs = tudui(imgs)
        loss = loss_fn(outputs, targets) # gap between the actual output and the target

        # Optimizer tunes the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward() # backpropagate to compute the gradients of the loss
        optimizer.step()   # update the network parameters using the gradients

        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("Training step: {}, Loss: {}".format(total_train_step,loss.item()))  # .item() gives a plain Python number
            writer.add_scalar("train_loss",loss.item(),total_train_step)
    
    # Test steps begin (after each epoch, check the loss on the test set)
    total_test_loss = 0
    with torch.no_grad():  # no gradients needed
        for data in test_dataloader: # fetch data from the test set
            imgs, targets = data
            outputs = tudui(imgs)
            loss = loss_fn(outputs, targets) # loss for just this batch
            total_test_loss = total_test_loss + loss.item() # accumulate the losses
    print("Loss on the whole test set: {}".format(total_test_loss))
    writer.add_scalar("test_loss",total_test_loss,total_test_step)
    total_test_step = total_test_step + 1

    torch.save(tudui, "./model/tudui_{}.pth".format(i)) # save the model after each epoch
    print("Model saved")
    
writer.close()
Files already downloaded and verified
Files already downloaded and verified
Length of the training dataset: 50000
Length of the test dataset: 10000
----- Epoch 1 training begins -----
Training step: 100, Loss: 2.296692132949829
Training step: 200, Loss: 2.285885810852051
Training step: 300, Loss: 2.279501438140869
Training step: 400, Loss: 2.2302145957946777
Training step: 500, Loss: 2.1076254844665527
Training step: 600, Loss: 2.0241076946258545
Training step: 700, Loss: 2.0326571464538574
Loss on the whole test set: 313.3945701122284
Model saved
----- Epoch 2 training begins -----
Training step: 800, Loss: 1.8856056928634644
Training step: 900, Loss: 1.8258416652679443
Training step: 1000, Loss: 1.8736964464187622
Training step: 1100, Loss: 2.009686231613159
Training step: 1200, Loss: 1.7110859155654907
Training step: 1300, Loss: 1.639999508857727
Training step: 1400, Loss: 1.7460256814956665
Training step: 1500, Loss: 1.804326057434082
Loss on the whole test set: 306.9472336769104
Model saved
----- Epoch 3 training begins -----
Training step: 1600, Loss: 1.7464873790740967
Training step: 1700, Loss: 1.6793572902679443
Training step: 1800, Loss: 1.9503461122512817
Training step: 1900, Loss: 1.7317644357681274
Training step: 2000, Loss: 1.9306591749191284
Training step: 2100, Loss: 1.5165047645568848
Training step: 2200, Loss: 1.459275722503662
Training step: 2300, Loss: 1.79405677318573
Loss on the whole test set: 263.37182998657227
Model saved
----- Epoch 4 training begins -----
Training step: 2400, Loss: 1.7481664419174194
Training step: 2500, Loss: 1.3587579727172852
Training step: 2600, Loss: 1.5589655637741089
Training step: 2700, Loss: 1.6773592233657837
Training step: 2800, Loss: 1.5090978145599365
Training step: 2900, Loss: 1.539999008178711
Training step: 3000, Loss: 1.354047417640686
Training step: 3100, Loss: 1.4937833547592163
Loss on the whole test set: 252.46941196918488
Model saved
----- Epoch 5 training begins -----
Training step: 3200, Loss: 1.3801052570343018
Training step: 3300, Loss: 1.4397848844528198
Training step: 3400, Loss: 1.46108078956604
Training step: 3500, Loss: 1.5322155952453613
Training step: 3600, Loss: 1.566237211227417
Training step: 3700, Loss: 1.3101667165756226
Training step: 3800, Loss: 1.2599278688430786
Training step: 3900, Loss: 1.4321829080581665
Loss on the whole test set: 243.0005919933319
Model saved
----- Epoch 6 training begins -----
Training step: 4000, Loss: 1.3768717050552368
Training step: 4100, Loss: 1.4406071901321411
Training step: 4200, Loss: 1.5087004899978638
Training step: 4300, Loss: 1.1848419904708862
Training step: 4400, Loss: 1.1364362239837646
Training step: 4500, Loss: 1.3455544710159302
Training step: 4600, Loss: 1.40190851688385
Loss on the whole test set: 229.64346647262573
Model saved
----- Epoch 7 training begins -----
Training step: 4700, Loss: 1.2932283878326416
Training step: 4800, Loss: 1.4792245626449585
Training step: 4900, Loss: 1.3620022535324097
Training step: 5000, Loss: 1.3700700998306274
Training step: 5100, Loss: 0.9695762991905212
Training step: 5200, Loss: 1.312595009803772
Training step: 5300, Loss: 1.2064651250839233
Training step: 5400, Loss: 1.3512318134307861
Loss on the whole test set: 218.9336529970169
Model saved
----- Epoch 8 training begins -----
Training step: 5500, Loss: 1.1977111101150513
Training step: 5600, Loss: 1.2471140623092651
Training step: 5700, Loss: 1.156531810760498
Training step: 5800, Loss: 1.2149838209152222
Training step: 5900, Loss: 1.2761603593826294
Training step: 6000, Loss: 1.495023250579834
Training step: 6100, Loss: 1.0265220403671265
Training step: 6200, Loss: 1.0587254762649536
Loss on the whole test set: 209.12245571613312
Model saved
----- Epoch 9 training begins -----
Training step: 6300, Loss: 1.44582200050354
Training step: 6400, Loss: 1.0848979949951172
Training step: 6500, Loss: 1.5730582475662231
Training step: 6600, Loss: 1.0684460401535034
Training step: 6700, Loss: 1.0620619058609009
Training step: 6800, Loss: 1.1571838855743408
Training step: 6900, Loss: 1.0781376361846924
Training step: 7000, Loss: 0.8753705620765686
Loss on the whole test set: 200.97392404079437
Model saved
----- Epoch 10 training begins -----
Training step: 7100, Loss: 1.237581729888916
Training step: 7200, Loss: 0.9725397229194641
Training step: 7300, Loss: 1.0951743125915527
Training step: 7400, Loss: 0.8216850161552429
Training step: 7500, Loss: 1.2100721597671509
Training step: 7600, Loss: 1.2381412982940674
Training step: 7700, Loss: 0.8831480145454407
Training step: 7800, Loss: 1.2118467092514038
Loss on the whole test set: 194.03061652183533
Model saved
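
As a quick check, any saved checkpoint can be loaded back for inference. A minimal sketch, assuming the ./model directory used above; loading a whole pickled model like this requires the Tudui class to be defined or importable in the loading script (and recent PyTorch versions may additionally need torch.load(..., weights_only=False)):

import torch

model = torch.load("./model/tudui_9.pth")  # checkpoint saved after the 10th epoch
model.eval()  # switch to evaluation mode before inference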

8. What argmax Does

import torch
outputs = torch.tensor([[0.1,0.2],
                        [0.05,0.4]])
print(outputs.argmax(0))  # down each column: index of the maximum
print(outputs.argmax(1))  # across each row: index of the maximum
preds = outputs.argmax(0)
targets = torch.tensor([0,1])
print((preds == targets).sum()) # number of positions where they match
tensor([0, 1])
tensor([1, 1])
tensor(2)

9. Printing the Accuracy

import torchvision
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# "from model import *" would be equivalent to pasting everything from model.py here; the model is simply defined inline instead
class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()        
        self.model1 = nn.Sequential(
            nn.Conv2d(3,32,5,1,2),  # in_channels 3, out_channels 32, kernel 5x5, stride 1, padding 2
            nn.MaxPool2d(2),
            nn.Conv2d(32,32,5,1,2),
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,5,1,2),
            nn.MaxPool2d(2),
            nn.Flatten(),  # after flattening, each sample is 64*4*4 long
            nn.Linear(64*4*4,64),
            nn.Linear(64,10)
        )
        
    def forward(self, x):
        x = self.model1(x)
        return x

# Prepare the datasets
train_data = torchvision.datasets.CIFAR10("./dataset",train=True,transform=torchvision.transforms.ToTensor(),download=True)
test_data = torchvision.datasets.CIFAR10("./dataset",train=False,transform=torchvision.transforms.ToTensor(),download=True)

# length of each dataset
train_data_size = len(train_data)
test_data_size = len(test_data)
# If train_data_size = 10, this prints: Length of the training dataset: 10
print("Length of the training dataset: {}".format(train_data_size))
print("Length of the test dataset: {}".format(test_data_size))

# Use DataLoader to load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Create the network model
tudui = Tudui()

# Loss function
loss_fn = nn.CrossEntropyLoss() # cross-entropy; "fn" is short for "function"

# Optimizer
learning = 0.01  # 1e-2 means 0.01
optimizer = torch.optim.SGD(tudui.parameters(),learning)   # stochastic gradient descent

# A few bookkeeping variables
# count of training steps
total_train_step = 0
# count of test evaluations
total_test_step = 0

# number of training epochs
epoch = 10

# Add TensorBoard
writer = SummaryWriter("logs")

for i in range(epoch):
    print("----- Epoch {} training begins -----".format(i+1))
    
    # Training steps begin
    for data in train_dataloader:
        imgs, targets = data
        outputs = tudui(imgs)
        loss = loss_fn(outputs, targets) # gap between the actual output and the target

        # Optimizer tunes the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward() # backpropagate to compute the gradients of the loss
        optimizer.step()   # update the network parameters using the gradients

        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("Training step: {}, Loss: {}".format(total_train_step,loss.item()))  # .item() gives a plain Python number
            writer.add_scalar("train_loss",loss.item(),total_train_step)
    
    # Test steps begin (after each epoch, check the loss on the test set)
    total_test_loss = 0
    total_accuracy = 0
    with torch.no_grad():  # no gradients needed
        for data in test_dataloader: # fetch data from the test set
            imgs, targets = data
            outputs = tudui(imgs)
            loss = loss_fn(outputs, targets) # loss for just this batch
            total_test_loss = total_test_loss + loss.item() # accumulate the losses
            accuracy = (outputs.argmax(1) == targets).sum()
            total_accuracy = total_accuracy + accuracy

    print("Loss on the whole test set: {}".format(total_test_loss))
    print("Accuracy on the whole test set: {}".format(total_accuracy/test_data_size))
    writer.add_scalar("test_loss",total_test_loss,total_test_step)
    writer.add_scalar("test_accuracy",total_accuracy/test_data_size,total_test_step)
    total_test_step = total_test_step + 1

    torch.save(tudui, "./model/tudui_{}.pth".format(i)) # save the model after each epoch
    print("Model saved")
    
writer.close()
Files already downloaded and verified
Files already downloaded and verified
Length of the training dataset: 50000
Length of the test dataset: 10000
----- Epoch 1 training begins -----
Training step: 100, Loss: 2.2990777492523193
Training step: 200, Loss: 2.279019594192505
Training step: 300, Loss: 2.274381637573242
Training step: 400, Loss: 2.212763547897339
Training step: 500, Loss: 2.0860657691955566
Training step: 600, Loss: 2.026383399963379
Training step: 700, Loss: 2.0226848125457764
Loss on the whole test set: 318.3006658554077
Accuracy on the whole test set: 0.26919999718666077
Model saved
----- Epoch 2 training begins -----
Training step: 800, Loss: 1.932318091392517
Training step: 900, Loss: 1.8830816745758057
Training step: 1000, Loss: 1.9455211162567139
Training step: 1100, Loss: 1.973578929901123
Training step: 1200, Loss: 1.7392346858978271
Training step: 1300, Loss: 1.6807948350906372
Training step: 1400, Loss: 1.7497491836547852
Training step: 1500, Loss: 1.7937464714050293
Loss on the whole test set: 304.5683034658432
Accuracy on the whole test set: 0.30399999022483826
Model saved
----- Epoch 3 training begins -----
Training step: 1600, Loss: 1.787430763244629
Training step: 1700, Loss: 1.6468796730041504
Training step: 1800, Loss: 1.9685696363449097
Training step: 1900, Loss: 1.7380374670028687
Training step: 2000, Loss: 1.948999285697937
Training step: 2100, Loss: 1.5249638557434082
Training step: 2200, Loss: 1.4795399904251099
Training step: 2300, Loss: 1.758912205696106
Loss on the whole test set: 263.2710431814194
Accuracy on the whole test set: 0.3921000063419342
Model saved
----- Epoch 4 training begins -----
Training step: 2400, Loss: 1.741290807723999
Training step: 2500, Loss: 1.3643462657928467
Training step: 2600, Loss: 1.574839472770691
Training step: 2700, Loss: 1.720109462738037
Training step: 2800, Loss: 1.4971864223480225
Training step: 2900, Loss: 1.598922848701477
Training step: 3000, Loss: 1.3547699451446533
Training step: 3100, Loss: 1.5192795991897583
Loss on the whole test set: 258.58551633358
Accuracy on the whole test set: 0.4034999907016754
Model saved
----- Epoch 5 training begins -----
Training step: 3200, Loss: 1.3681577444076538
Training step: 3300, Loss: 1.5126806497573853
Training step: 3400, Loss: 1.496009111404419
Training step: 3500, Loss: 1.5612006187438965
Training step: 3600, Loss: 1.5879095792770386
Training step: 3700, Loss: 1.3388112783432007
Training step: 3800, Loss: 1.2843410968780518
Training step: 3900, Loss: 1.4247167110443115
Loss on the whole test set: 254.64410090446472
Accuracy on the whole test set: 0.4162999987602234
Model saved
----- Epoch 6 training begins -----
Training step: 4000, Loss: 1.4355524778366089
Training step: 4100, Loss: 1.467890739440918
Training step: 4200, Loss: 1.5169270038604736
Training step: 4300, Loss: 1.199944257736206
Training step: 4400, Loss: 1.172995686531067
Training step: 4500, Loss: 1.3407166004180908
Training step: 4600, Loss: 1.417804479598999
Loss on the whole test set: 245.55124926567078
Accuracy on the whole test set: 0.4399000108242035
Model saved
----- Epoch 7 training begins -----
Training step: 4700, Loss: 1.3220632076263428
Training step: 4800, Loss: 1.5399339199066162
Training step: 4900, Loss: 1.4223642349243164
Training step: 5000, Loss: 1.432321548461914
Training step: 5100, Loss: 1.017971396446228
Training step: 5200, Loss: 1.3004764318466187
Training step: 5300, Loss: 1.2478861808776855
Training step: 5400, Loss: 1.401055097579956
Loss on the whole test set: 237.160910487175
Accuracy on the whole test set: 0.46000000834465027
Model saved
----- Epoch 8 training begins -----
Training step: 5500, Loss: 1.2419700622558594
Training step: 5600, Loss: 1.2860450744628906
Training step: 5700, Loss: 1.2340316772460938
Training step: 5800, Loss: 1.2950438261032104
Training step: 5900, Loss: 1.3835818767547607
Training step: 6000, Loss: 1.5290265083312988
Training step: 6100, Loss: 1.0339772701263428
Training step: 6200, Loss: 1.1568495035171509
Loss on the whole test set: 229.54707419872284
Accuracy on the whole test set: 0.47859999537467957
Model saved
----- Epoch 9 training begins -----
Training step: 6300, Loss: 1.4557569026947021
Training step: 6400, Loss: 1.1302629709243774
Training step: 6500, Loss: 1.5682624578475952
Training step: 6600, Loss: 1.1243321895599365
Training step: 6700, Loss: 1.1435623168945312
Training step: 6800, Loss: 1.125715970993042
Training step: 6900, Loss: 1.1060220003128052
Training step: 7000, Loss: 0.971026599407196
Loss on the whole test set: 217.55635011196136
Accuracy on the whole test set: 0.5054000020027161
Model saved
----- Epoch 10 training begins -----
Training step: 7100, Loss: 1.3340775966644287
Training step: 7200, Loss: 0.9736936688423157
Training step: 7300, Loss: 1.099505066871643
Training step: 7400, Loss: 0.8387444019317627
Training step: 7500, Loss: 1.2155531644821167
Training step: 7600, Loss: 1.2118051052093506
Training step: 7700, Loss: 0.8927739262580872
Training step: 7800, Loss: 1.2374874353408813
Loss on the whole test set: 206.82526969909668
Accuracy on the whole test set: 0.5320000052452087
Model saved

10. The Effect of Special Layers

① The difference between model.train() and model.eval() mainly concerns two kinds of layers: Batch Normalization and Dropout.

② If the model contains BN (Batch Normalization) or Dropout layers, call model.train() during training. For BN, model.train() makes the layer use the mean and variance of each batch of data; for Dropout, model.train() randomly picks a subset of the network connections to train and update.

③ Disabling Batch Normalization and Dropout:
if the model contains BN or Dropout layers, call model.eval() at test time. model.eval() makes BN use the mean and variance accumulated over all the training data, i.e. the BN statistics must stay fixed during testing; for Dropout, model.eval() uses all network connections, i.e. no neurons are randomly dropped.

④ After training on the train samples, the resulting model is used on the test samples. Call model.eval() before running model(test); otherwise, simply feeding data through the network would still change its state (the BN running statistics), even though no training takes place. This behavior comes from the BN and Dropout layers contained in the model.

⑤ This is especially important in one-class classification, where the sample distributions of the training set and the test set are different.
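
A minimal illustration of what the two modes change, using a standalone Dropout layer (the layer and tensor here are for illustration only, not part of the tutorial's model):

import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 4)

drop.train()    # training mode: randomly zeroes elements and scales survivors by 1/(1-p)
print(drop(x))  # random, e.g. tensor([[2., 0., 2., 0.]])

drop.eval()     # evaluation mode: Dropout is a no-op
print(drop(x))  # always tensor([[1., 1., 1., 1.]])

The full training script below adds tudui.train() and tudui.eval() at the appropriate points: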

import torchvision
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# "from model import *" would be equivalent to pasting everything from model.py here; the model is simply defined inline instead
class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()        
        self.model1 = nn.Sequential(
            nn.Conv2d(3,32,5,1,2),  # in_channels 3, out_channels 32, kernel 5x5, stride 1, padding 2
            nn.MaxPool2d(2),
            nn.Conv2d(32,32,5,1,2),
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,5,1,2),
            nn.MaxPool2d(2),
            nn.Flatten(),  # after flattening, each sample is 64*4*4 long
            nn.Linear(64*4*4,64),
            nn.Linear(64,10)
        )
        
    def forward(self, x):
        x = self.model1(x)
        return x

# Prepare the datasets
train_data = torchvision.datasets.CIFAR10("./dataset",train=True,transform=torchvision.transforms.ToTensor(),download=True)
test_data = torchvision.datasets.CIFAR10("./dataset",train=False,transform=torchvision.transforms.ToTensor(),download=True)

# length of each dataset
train_data_size = len(train_data)
test_data_size = len(test_data)
# If train_data_size = 10, this prints: Length of the training dataset: 10
print("Length of the training dataset: {}".format(train_data_size))
print("Length of the test dataset: {}".format(test_data_size))

# Use DataLoader to load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Create the network model
tudui = Tudui()

# Loss function
loss_fn = nn.CrossEntropyLoss() # cross-entropy; "fn" is short for "function"

# Optimizer
learning = 0.01  # 1e-2 means 0.01
optimizer = torch.optim.SGD(tudui.parameters(),learning)   # stochastic gradient descent

# A few bookkeeping variables
# count of training steps
total_train_step = 0
# count of test evaluations
total_test_step = 0

# number of training epochs
epoch = 10

# Add TensorBoard
writer = SummaryWriter("logs")

for i in range(epoch):
    print("----- Epoch {} training begins -----".format(i+1))
    
    # Training steps begin
    tudui.train() # when the network contains dropout or batchnorm layers, this puts them in training mode
    for data in train_dataloader:
        imgs, targets = data
        outputs = tudui(imgs)
        loss = loss_fn(outputs, targets) # gap between the actual output and the target

        # Optimizer tunes the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward() # backpropagate to compute the gradients of the loss
        optimizer.step()   # update the network parameters using the gradients

        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("Training step: {}, Loss: {}".format(total_train_step,loss.item()))  # .item() gives a plain Python number
            writer.add_scalar("train_loss",loss.item(),total_train_step)
    
    # Test steps begin (after each epoch, check the loss on the test set)
    tudui.eval()  # when the network contains dropout or batchnorm layers, this puts them in evaluation mode
    total_test_loss = 0
    total_accuracy = 0
    with torch.no_grad():  # no gradients needed
        for data in test_dataloader: # fetch data from the test set
            imgs, targets = data
            outputs = tudui(imgs)
            loss = loss_fn(outputs, targets) # loss for just this batch
            total_test_loss = total_test_loss + loss.item() # accumulate the losses
            accuracy = (outputs.argmax(1) == targets).sum()
            total_accuracy = total_accuracy + accuracy

    print("Loss on the whole test set: {}".format(total_test_loss))
    print("Accuracy on the whole test set: {}".format(total_accuracy/test_data_size))
    writer.add_scalar("test_loss",total_test_loss,total_test_step)
    writer.add_scalar("test_accuracy",total_accuracy/test_data_size,total_test_step)
    total_test_step = total_test_step + 1

    torch.save(tudui, "./model/tudui_{}.pth".format(i)) # save the model after each epoch
    #torch.save(tudui.state_dict(),"tudui_{}.pth".format(i)) # saving method 2: state_dict only
    print("Model saved")
    
writer.close()
Files already downloaded and verified
Files already downloaded and verified
Length of the training dataset: 50000
Length of the test dataset: 10000
----- Epoch 1 training begins -----
Training step: 100, Loss: 2.292330265045166
Training step: 200, Loss: 2.2909886837005615
Training step: 300, Loss: 2.2775135040283203
Training step: 400, Loss: 2.2197389602661133
Training step: 500, Loss: 2.1354541778564453
Training step: 600, Loss: 2.034959077835083
Training step: 700, Loss: 2.0130105018615723
Loss on the whole test set: 319.69296860694885
Accuracy on the whole test set: 0.2678999900817871
Model saved
----- Epoch 2 training begins -----
Training step: 800, Loss: 1.8924949169158936
Training step: 900, Loss: 1.8564952611923218
Training step: 1000, Loss: 1.9163199663162231
Training step: 1100, Loss: 1.972761631011963
Training step: 1200, Loss: 1.698002815246582
Training step: 1300, Loss: 1.6668578386306763
Training step: 1400, Loss: 1.7467551231384277
Training step: 1500, Loss: 1.8171281814575195
Loss on the whole test set: 294.6422094106674
Accuracy on the whole test set: 0.3321000039577484
Model saved
----- Epoch 3 training begins -----
Training step: 1600, Loss: 1.7753604650497437
Training step: 1700, Loss: 1.637514591217041
Training step: 1800, Loss: 1.936806559562683
Training step: 1900, Loss: 1.710182785987854
Training step: 2000, Loss: 1.9697281122207642
Training step: 2100, Loss: 1.507324457168579
Training step: 2200, Loss: 1.4598215818405151
Training step: 2300, Loss: 1.8211809396743774
Loss on the whole test set: 268.6419733762741
Accuracy on the whole test set: 0.37880000472068787
Model saved
----- Epoch 4 training begins -----
Training step: 2400, Loss: 1.696730136871338
Training step: 2500, Loss: 1.3451323509216309
Training step: 2600, Loss: 1.614168643951416
Training step: 2700, Loss: 1.5963644981384277
Training step: 2800, Loss: 1.4918489456176758
Training step: 2900, Loss: 1.6028531789779663
Training step: 3000, Loss: 1.3561456203460693
Training step: 3100, Loss: 1.5363717079162598
Loss on the whole test set: 260.29946398735046
Accuracy on the whole test set: 0.39590001106262207
Model saved
----- Epoch 5 training begins -----
Training step: 3200, Loss: 1.3781168460845947
Training step: 3300, Loss: 1.4570066928863525
Training step: 3400, Loss: 1.4464694261550903
Training step: 3500, Loss: 1.5474085807800293
Training step: 3600, Loss: 1.5136005878448486
Training step: 3700, Loss: 1.3479602336883545
Training step: 3800, Loss: 1.2738752365112305
Training step: 3900, Loss: 1.483515977859497
Loss on the whole test set: 243.80596554279327
Accuracy on the whole test set: 0.4325999915599823
Model saved
----- Epoch 6 training begins -----
Training step: 4000, Loss: 1.376009464263916
Training step: 4100, Loss: 1.4102662801742554
Training step: 4200, Loss: 1.5586539506912231
Training step: 4300, Loss: 1.202476978302002
Training step: 4400, Loss: 1.0953962802886963
Training step: 4500, Loss: 1.3712406158447266
Training step: 4600, Loss: 1.3603018522262573
Loss on the whole test set: 229.96637046337128
Accuracy on the whole test set: 0.4652999937534332
Model saved
----- Epoch 7 training begins -----
Training step: 4700, Loss: 1.3486863374710083
Training step: 4800, Loss: 1.4762073755264282
Training step: 4900, Loss: 1.3585703372955322
Training step: 5000, Loss: 1.3923097848892212
Training step: 5100, Loss: 0.9942217469215393
Training step: 5200, Loss: 1.3098843097686768
Training step: 5300, Loss: 1.1401594877243042
Training step: 5400, Loss: 1.3566023111343384
Loss on the whole test set: 217.51882588863373
Accuracy on the whole test set: 0.499099999666214
Model saved
----- Epoch 8 training begins -----
Training step: 5500, Loss: 1.25923752784729
Training step: 5600, Loss: 1.188690185546875
Training step: 5700, Loss: 1.223688006401062
Training step: 5800, Loss: 1.2504695653915405
Training step: 5900, Loss: 1.3733277320861816
Training step: 6000, Loss: 1.5391217470169067
Training step: 6100, Loss: 1.0835392475128174
Training step: 6200, Loss: 1.0833714008331299
Loss on the whole test set: 206.92583346366882
Accuracy on the whole test set: 0.5250999927520752
Model saved
----- Epoch 9 training begins -----
Training step: 6300, Loss: 1.4169930219650269
Training step: 6400, Loss: 1.0912859439849854
Training step: 6500, Loss: 1.5215225219726562
Training step: 6600, Loss: 1.044875979423523
Training step: 6700, Loss: 1.091143012046814
Training step: 6800, Loss: 1.1253798007965088
Training step: 6900, Loss: 1.106575608253479
Training step: 7000, Loss: 0.8582056164741516
Loss on the whole test set: 198.72406196594238
Accuracy on the whole test set: 0.5472999811172485
Model saved
----- Epoch 10 training begins -----
Training step: 7100, Loss: 1.2614285945892334
Training step: 7200, Loss: 1.0269255638122559
Training step: 7300, Loss: 1.104204773902893
Training step: 7400, Loss: 0.8442661166191101
Training step: 7500, Loss: 1.2488468885421753
Training step: 7600, Loss: 1.2494431734085083
Training step: 7700, Loss: 0.8363684415817261
Training step: 7800, Loss: 1.2570160627365112
Loss on the whole test set: 193.20221555233002
Accuracy on the whole test set: 0.5633000135421753
Model saved
