[NLP Prerequisites] 5. Convolutional Neural Networks
Learning roadmap references:
https://blog.51cto.com/u_15298598/3121189
https://github.com/Ailln/nlp-roadmap
https://juejin.cn/post/7113066539053482021
Tools used & articles read for this section:
https://easyai.tech/ai-definition/cnn/
https://blog.csdn.net/weixin_44912159/article/details/105345760
1. Overview
A convolutional neural network (CNN) is a branch of neural networks whose defining feature is the convolution operation. CNNs can be trained with either supervised or unsupervised learning, have strong feature-extraction ability, deliver stable results without relying on hand-crafted feature engineering, and are most commonly used for image processing.
Besides fully connected layers, a CNN contains two layer types of its own: convolutional layers and pooling layers.

2. Convolutional Layer
Used to extract features from the input matrix.
A convolutional layer computes its output by sliding a kernel over the input matrix; at each position, the elements covered by the kernel are multiplied element-wise with the kernel and summed, giving the corresponding element of the output matrix.
The kernel is a learnable parameter and plays the role of the weights.

Suppose the input matrix has size \(m \times n\), the stride is \(t\), and the kernel has size \(k \times k\) (no padding). The output matrix of the convolutional layer then has size \((\lfloor(m-k)/t\rfloor+1) \times (\lfloor(n-k)/t\rfloor+1)\). The stride is usually 1, in which case this reduces to \((m-k+1) \times (n-k+1)\).
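To sanity-check this formula, here is a minimal added sketch (assuming PyTorch's nn.Conv2d with no padding and an arbitrary random input) that compares the output shape reported by PyTorch with the formula:

import torch
from torch import nn

m, n, k, t = 28, 28, 5, 2                  # input height/width, kernel size, stride
x = torch.randn(1, 1, m, n)                # [batch, channels, height, width]

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=k, stride=t)
print(conv(x).shape)                       # torch.Size([1, 1, 12, 12])

# Formula: floor((m - k) / t) + 1 in each dimension
print((m - k) // t + 1, (n - k) // t + 1)  # 12 12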
- Forward propagation
Suppose the set of input channels is \(X=\{X_1,X_2,\dots,X_n\}\), the corresponding set of kernel matrices is \(W=\{W_1,W_2,\dots,W_n\}\), the convolution output matrix is \(S\) with size \(i \times j\), and the bias is \(b\). The convolution can then be written as \(S=(X*W)+b=\sum_{k=1}^{n}(X_k*W_k)+b\).
If the convolutional layer has an activation function \(\theta(x)\), its output is \(\theta(S)=\theta\left(\sum_{k=1}^{n}(X_k*W_k)+b\right)\).
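The channel-wise sum above can be verified numerically. The sketch below (an added illustration, not part of the original post) convolves each input channel with its own kernel via F.conv2d, sums the results, adds the bias, and compares against PyTorch's built-in multi-channel convolution:

import torch
import torch.nn.functional as F

n_channels, H, W, k = 3, 6, 6, 3
X = torch.randn(1, n_channels, H, W)       # input X with n channels
Wk = torch.randn(1, n_channels, k, k)      # one output filter: a kernel W_k per input channel
b = torch.randn(1)                         # bias

# Built-in convolution: S = sum_k (X_k * W_k) + b
S = F.conv2d(X, Wk, bias=b)

# Manual version: convolve each channel separately, then sum and add the bias
S_manual = sum(F.conv2d(X[:, c:c + 1], Wk[:, c:c + 1]) for c in range(n_channels)) + b

print(torch.allclose(S, S_manual, atol=1e-6))  # True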
3. Pooling Layer
Used to subsample the feature maps, simplifying the data while keeping the features invariant.
A pooling layer computes its output by sliding a pooling window over the input matrix and aggregating each covered region according to the pooling rule; the two most common rules are max pooling and average pooling, illustrated in the sketch below.
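To make the two rules concrete, here is a small added example that applies nn.MaxPool2d and nn.AvgPool2d with a 2x2 window to a 4x4 matrix:

import torch
from torch import nn

x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])  # shape [1, 1, 4, 4]

max_pool = nn.MaxPool2d(2)                    # keeps the largest value in each 2x2 window
avg_pool = nn.AvgPool2d(2)                    # averages each 2x2 window

print(max_pool(x))  # [[ 6.,  8.], [14., 16.]]
print(avg_pool(x))  # [[ 3.5,  5.5], [11.5, 13.5]]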

4. PyTorch Implementation
- Data preparation (using the MNIST dataset)
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

# Load the training set
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)
# Load the test set
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

print(training_data.train_data.size())           # [60000, 28, 28]
plt.imshow(training_data.train_data[0].numpy())  # show the first image
plt.show()
batch_size = 128

# Create the data pipelines
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# Check the data shapes
for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape, X.dtype)
    print("Shape of y: ", y.shape, y.dtype)
    break
# N: number of samples in a batch
# C: number of channels
# [H, W]: image height and width
- Network construction
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # input: 1 * 28 * 28, output: 16 * 28 * 28
            nn.ReLU(),
            nn.MaxPool2d(2)                               # input: 16 * 28 * 28, output: 16 * 14 * 14
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # input: 16 * 14 * 14, output: 32 * 14 * 14
            nn.ReLU(),
            nn.MaxPool2d(2)                               # input: 32 * 14 * 14, output: 32 * 7 * 7
        )
        self.classifier = nn.Sequential(
            nn.Linear(32 * 7 * 7, 10)                     # map the features to 10 classes
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)                         # [batch, 32, 7, 7] -> [batch, 32*7*7]
        out = self.classifier(x)
        return out

model = CNN()
print(model)

CNN(
  (conv1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Linear(in_features=1568, out_features=10, bias=True)
  )
)
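As a quick sanity check (an addition, not in the original post), passing a dummy batch through the model just defined confirms the shapes noted in the comments above:

dummy = torch.randn(2, 1, 28, 28)           # a fake batch of 2 single-channel 28x28 images
features = model.conv2(model.conv1(dummy))
print(features.shape)                       # torch.Size([2, 32, 7, 7])
print(model(dummy).shape)                   # torch.Size([2, 10])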
- Loss function and optimizer definition
loss_fn = nn.CrossEntropyLoss()                     # cross-entropy loss
optimizer = torch.optim.Adam(model.parameters())    # Adam optimizer
- Model training
# epochs: number of passes over the training data
epochs = 10
for i in range(epochs):                            # loop over epochs
    model.train()                                  # training mode
    train_loss = 0
    for j, (X, y) in enumerate(train_dataloader):  # loop over batches
        # forward pass
        pred = model(X)
        # compute the loss
        loss = loss_fn(pred, y)
        train_loss += loss.item()
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # report the loss every 100 batches (the batch-mean loss divided again by batch_size)
        if j % 100 == 0:
            loss = loss.item()
            print(f"epoch {i} batch {j} loss: {loss/batch_size:>7f}")
    # evaluate on the test set at the end of each epoch
    with torch.no_grad():
        model.eval()                               # evaluation mode
        test_loss = 0
        hit = 0
        for (X, y) in test_dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            hit += (pred.argmax(1) == y).sum().item()
        print(f"epoch {i}, train loss: {train_loss/len(train_dataloader.dataset):>7f} test loss: {test_loss/len(test_dataloader.dataset):>7f} accuracy: {hit/len(test_dataloader.dataset):>7f}")
- optimizer.zero_grad()
Clears the historical gradients. Because of how backward() works in PyTorch, gradients are accumulated into the parameters during backpropagation rather than overwritten. Gradients should not be carried over from one batch to the next, so zero_grad() is called for every batch (see the sketch after this list).
- loss.backward()
Runs backpropagation and computes the gradients.
- optimizer.step()
Has the optimizer update the weights using the computed gradients.
- with torch.no_grad()
Turns off the autograd engine, which speeds up computation and saves GPU memory. No gradients are tracked or updated for the operations wrapped by the with statement.
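The following standalone sketch (an added illustration built around a single linear layer, independent of the training script above) demonstrates both points: backward() accumulates gradients unless they are cleared, and operations under torch.no_grad() build no computation graph:

import torch
from torch import nn

layer = nn.Linear(2, 1)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.ones(1, 2)

# backward() accumulates into .grad: calling it twice without zero_grad doubles the gradient
layer.weight.grad = None
layer(x).sum().backward()
g1 = layer.weight.grad.clone()
layer(x).sum().backward()                          # gradients add up
print(torch.allclose(layer.weight.grad, 2 * g1))   # True

# The usual pattern: clear, backpropagate, then let the optimizer update the weights
opt.zero_grad()
layer(x).sum().backward()
opt.step()                                         # weights move by -lr * grad

# Inside torch.no_grad(), results are detached from the graph
with torch.no_grad():
    y = layer(x)
print(y.requires_grad)                             # False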
- Results
epoch 0 batch 0 loss: 0.017991
epoch 0 batch 100 loss: 0.018041
epoch 0 batch 200 loss: 0.018068
epoch 0 batch 300 loss: 0.018066
epoch 0 batch 400 loss: 0.018037
epoch 0, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 1 batch 0 loss: 0.018085
epoch 1 batch 100 loss: 0.017887
epoch 1 batch 200 loss: 0.018049
epoch 1 batch 300 loss: 0.018109
epoch 1 batch 400 loss: 0.018097
epoch 1, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 2 batch 0 loss: 0.017934
epoch 2 batch 100 loss: 0.017987
epoch 2 batch 200 loss: 0.018088
epoch 2 batch 300 loss: 0.017946
epoch 2 batch 400 loss: 0.018093
epoch 2, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 3 batch 0 loss: 0.017929
epoch 3 batch 100 loss: 0.017884
epoch 3 batch 200 loss: 0.018053
epoch 3 batch 300 loss: 0.017972
epoch 3 batch 400 loss: 0.017919
epoch 3, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 4 batch 0 loss: 0.018011
epoch 4 batch 100 loss: 0.017994
epoch 4 batch 200 loss: 0.017952
epoch 4 batch 300 loss: 0.018021
epoch 4 batch 400 loss: 0.018020
epoch 4, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 5 batch 0 loss: 0.018102
epoch 5 batch 100 loss: 0.018018
epoch 5 batch 200 loss: 0.018042
epoch 5 batch 300 loss: 0.018090
epoch 5 batch 400 loss: 0.017981
epoch 5, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 6 batch 0 loss: 0.017944
epoch 6 batch 100 loss: 0.017992
epoch 6 batch 200 loss: 0.018033
epoch 6 batch 300 loss: 0.018052
epoch 6 batch 400 loss: 0.018094
epoch 6, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 7 batch 0 loss: 0.018016
epoch 7 batch 100 loss: 0.018022
epoch 7 batch 200 loss: 0.018112
epoch 7 batch 300 loss: 0.018066
epoch 7 batch 400 loss: 0.018044
epoch 7, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 8 batch 0 loss: 0.018011
epoch 8 batch 100 loss: 0.018112
epoch 8 batch 200 loss: 0.018002
epoch 8 batch 300 loss: 0.018023
epoch 8 batch 400 loss: 0.018140
epoch 8, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
epoch 9 batch 0 loss: 0.018014
epoch 9 batch 100 loss: 0.017930
epoch 9 batch 200 loss: 0.018003
epoch 9 batch 300 loss: 0.017927
epoch 9 batch 400 loss: 0.017920
epoch 9, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
