PyTorch and CNN Image Classification
PyTorch is an open-source Python machine learning library based on Torch, used for applications such as natural language processing. It is developed primarily by Facebook's artificial intelligence group, and besides Facebook it has been adopted by organizations such as Twitter, CMU, and Salesforce. It offers strong GPU acceleration and supports dynamic neural networks, something many mainstream frameworks such as TensorFlow did not support at the time. PyTorch provides two high-level features:
1. Tensor computation (like NumPy) with strong GPU acceleration
2. Deep neural networks built on an automatic differentiation (autograd) system
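To make these two features concrete, here is a minimal sketch (added here, not part of the original article) that builds a tensor, moves it to the GPU when one is available, and lets autograd compute a gradient:

import torch

# tensor computation, on the GPU when available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.ones(2, 2, device=device, requires_grad=True)
y = (x * x).sum()  # y = sum of x_ij^2

# automatic differentiation: dy/dx = 2x
y.backward()
print(x.grad)  # a 2x2 tensor filled with 2s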
This article uses the CIFAR-10 dataset for image classification. The dataset consists of small color images divided into ten classes. Some sample images are shown below:

Check whether a GPU is available
The images in the dataset are 32x32x3. It is best to use a GPU to accelerate training.
import torch
import numpy as np

# check whether a GPU can be used
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.')
else:
    print('CUDA is available!')
Result:
CUDA is available!
Load the data
Downloading the data may be slow; please be patient. Load the training and test data, split the training data into a training set and a validation set, and then create a DataLoader for each dataset.
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

# number of subprocesses to use for data loading
num_workers = 0
# load 16 images per batch
batch_size = 16
# percentage of training set to use as validation
valid_size = 0.2

# convert data to torch.FloatTensor and normalize it
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# choose the training and test datasets
train_data = datasets.CIFAR10('data', train=True,
                              download=True, transform=transform)
test_data = datasets.CIFAR10('data', train=False,
                             download=True, transform=transform)

# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers)

# the 10 image classes
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
View a batch of training samples
import matplotlib.pyplot as plt
%matplotlib inline

# helper function to un-normalize and display an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()  # convert images to numpy for display

# plot the images in the batch, with class names as titles
fig = plt.figure(figsize=(25, 4))
# display 16 images
for idx in np.arange(16):
    ax = fig.add_subplot(2, 16 // 2, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[labels[idx]])
Result:

View one image in more detail
Here, the images have been normalized. The red, green, and blue (RGB) color channels can be viewed as three separate grayscale images.
rgb_img = np.squeeze(images[3])
channels = ['red channel', 'green channel', 'blue channel']

fig = plt.figure(figsize=(36, 36))
for idx in np.arange(rgb_img.shape[0]):
    ax = fig.add_subplot(1, 3, idx + 1)
    img = rgb_img[idx]
    ax.imshow(img, cmap='gray')
    ax.set_title(channels[idx])
    # annotate every pixel with its value
    width, height = img.shape
    thresh = img.max() / 2.5
    for x in range(width):
        for y in range(height):
            val = round(img[x][y], 2) if img[x][y] != 0 else 0
            ax.annotate(str(val), xy=(y, x),
                        horizontalalignment='center',
                        verticalalignment='center', size=8,
                        color='white' if img[x][y] < thresh else 'black')
Result:

Define the architecture of the convolutional neural network
Here, we will define the structure of a CNN. It will include the following:
- Convolutional layers: these can be thought of as filtering the image with several filters (the filtering operation is the convolution) to extract image features. In PyTorch, a convolutional layer is usually defined with nn.Conv2d, specifying the following arguments:

nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

(Figure: convolution with a 3x3 window and stride 1.)

  - in_channels is the input depth. For a grayscale image, the depth is 1.
  - out_channels is the output depth, i.e. the number of filtered images you want to obtain.
  - kernel_size is the size of the convolution kernel (typically 3, for a 3x3 kernel).
  - stride and padding have default values, but you should set them according to the spatial size (in x, y) you want the output to have.
- Pooling layers: max pooling is used here, which takes the maximum pixel value within a window of a given size.
  - In a 2x2 window, the maximum of the four values is kept.
  - Because max pooling is good at picking up important features such as image edges, it suits image classification tasks.
  - A max pooling layer usually follows a convolutional layer and shrinks the x-y dimensions of its input.
- The usual linear + dropout layers to avoid overfitting and produce the 10-class output.

The figure below shows an example network with two convolutional layers.
Output size of a convolutional layer
To compute the output size of a given convolutional layer, we can perform the following calculation.
Assume the input size is (H, W), the filter size is (FH, FW), the output size is (OH, OW), the padding is P, and the stride is S. The output size is then given by:

OH = (H + 2P - FH) / S + 1
OW = (W + 2P - FW) / S + 1

Example: with an input of size (H=7, W=7), a filter of size (FH=3, FW=3), padding P=0, and stride S=1, the output size is (OH=5, OW=5). With S=2, the output size is (OH=3, OW=3).
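As a quick sanity check (a sketch added here, not from the original code), the formula can be verified directly with nn.Conv2d. The same arithmetic explains the network defined below: each 3x3 convolution with padding=1 preserves the spatial size, and each 2x2 max pool halves it, so a 32x32 input shrinks to 16x16, then 8x8, then 4x4:

import torch
import torch.nn as nn

# H=7, FH=3, P=0, S=1  ->  OH = (7 + 2*0 - 3)/1 + 1 = 5
x = torch.randn(1, 1, 7, 7)
print(nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=0)(x).shape)  # torch.Size([1, 1, 5, 5])
# with S=2  ->  OH = (7 + 2*0 - 3)/2 + 1 = 3
print(nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=0)(x).shape)  # torch.Size([1, 1, 3, 3])

# padding=1 keeps 32x32; a 2x2 max pool then halves it: 32 -> 16 -> 8 -> 4
x = torch.randn(1, 3, 32, 32)
conv = nn.Conv2d(3, 16, 3, padding=1)  # (32 + 2*1 - 3)/1 + 1 = 32
pool = nn.MaxPool2d(2, 2)
print(pool(conv(x)).shape)  # torch.Size([1, 16, 16, 16])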
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees a 32x32x3 image)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (sees a 16x16x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (sees an 8x8x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 4 * 4 -> 500)
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        # linear layer (500 -> 10)
        self.fc2 = nn.Linear(500, 10)
        # dropout layer (p=0.3)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # flatten image input
        x = x.view(-1, 64 * 4 * 4)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # output layer (raw class scores; no activation here)
        x = self.fc2(x)
        return x

# create a complete CNN
model = Net()
print(model)

# move the model to the GPU if available
if train_on_gpu:
    model.cuda()
Result:
Net(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=1024, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=10, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
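Note that fc1's in_features=1024 in the printout is exactly 64 * 4 * 4: after three rounds of conv + 2x2 max pooling, the 32x32 input has shrunk to 4x4 with 64 channels from conv3, which matches the x.view(-1, 64 * 4 * 4) flattening in forward.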
Choose a loss function and an optimizer
import torch.optim as optim

# use cross-entropy loss
criterion = nn.CrossEntropyLoss()
# use stochastic gradient descent with learning rate lr=0.01
optimizer = optim.SGD(model.parameters(), lr=0.01)
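One detail worth noting: nn.CrossEntropyLoss combines LogSoftmax and NLLLoss in a single criterion, which is why the network's forward returns the raw class scores (logits) from fc2 without applying a softmax.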
Train the convolutional neural network
Note how the training and validation losses decrease over time; if the validation loss keeps increasing, that suggests overfitting. (In fact, in the example below, setting n_epochs to 40 reveals overfitting!)
# number of epochs to train the model
n_epochs = 30

valid_loss_min = np.inf  # track change in validation loss

for epoch in range(1, n_epochs + 1):

    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0

    ###################
    # train the model #
    ###################
    model.train()
    for data, target in train_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item() * data.size(0)

    ######################
    # validate the model #
    ######################
    model.eval()
    with torch.no_grad():  # no gradients needed during validation
        for data, target in valid_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # update average validation loss
            valid_loss += loss.item() * data.size(0)

    # calculate average losses
    train_loss = train_loss / len(train_loader.sampler)
    valid_loss = valid_loss / len(valid_loader.sampler)

    # print training/validation statistics
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))

    # save the model if the validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
            valid_loss_min,
            valid_loss))
        torch.save(model.state_dict(), 'model_cifar.pt')
        valid_loss_min = valid_loss
Result:
Epoch: 1 	Training Loss: 2.065666 	Validation Loss: 1.706993
Validation loss decreased (inf --> 1.706993). Saving model ...
Epoch: 2 	Training Loss: 1.609919 	Validation Loss: 1.451288
Validation loss decreased (1.706993 --> 1.451288). Saving model ...
Epoch: 3 	Training Loss: 1.426175 	Validation Loss: 1.294594
Validation loss decreased (1.451288 --> 1.294594). Saving model ...
Epoch: 4 	Training Loss: 1.307891 	Validation Loss: 1.182497
Validation loss decreased (1.294594 --> 1.182497). Saving model ...
Epoch: 5 	Training Loss: 1.200655 	Validation Loss: 1.118825
Validation loss decreased (1.182497 --> 1.118825). Saving model ...
Epoch: 6 	Training Loss: 1.115498 	Validation Loss: 1.041203
Validation loss decreased (1.118825 --> 1.041203). Saving model ...
Epoch: 7 	Training Loss: 1.047874 	Validation Loss: 1.020686
Validation loss decreased (1.041203 --> 1.020686). Saving model ...
Epoch: 8 	Training Loss: 0.991542 	Validation Loss: 0.936289
Validation loss decreased (1.020686 --> 0.936289). Saving model ...
Epoch: 9 	Training Loss: 0.942437 	Validation Loss: 0.892730
Validation loss decreased (0.936289 --> 0.892730). Saving model ...
Epoch: 10 	Training Loss: 0.894279 	Validation Loss: 0.875833
Validation loss decreased (0.892730 --> 0.875833). Saving model ...
Epoch: 11 	Training Loss: 0.859178 	Validation Loss: 0.838847
Validation loss decreased (0.875833 --> 0.838847). Saving model ...
Epoch: 12 	Training Loss: 0.822664 	Validation Loss: 0.823634
Validation loss decreased (0.838847 --> 0.823634). Saving model ...
Epoch: 13 	Training Loss: 0.787049 	Validation Loss: 0.802566
Validation loss decreased (0.823634 --> 0.802566). Saving model ...
Epoch: 14 	Training Loss: 0.749585 	Validation Loss: 0.785852
Validation loss decreased (0.802566 --> 0.785852). Saving model ...
Epoch: 15 	Training Loss: 0.721540 	Validation Loss: 0.772729
Validation loss decreased (0.785852 --> 0.772729). Saving model ...
Epoch: 16 	Training Loss: 0.689508 	Validation Loss: 0.768470
Validation loss decreased (0.772729 --> 0.768470). Saving model ...
Epoch: 17 	Training Loss: 0.662432 	Validation Loss: 0.758518
Validation loss decreased (0.768470 --> 0.758518). Saving model ...
Epoch: 18 	Training Loss: 0.632324 	Validation Loss: 0.750859
Validation loss decreased (0.758518 --> 0.750859). Saving model ...
Epoch: 19 	Training Loss: 0.616094 	Validation Loss: 0.729692
Validation loss decreased (0.750859 --> 0.729692). Saving model ...
Epoch: 20 	Training Loss: 0.588593 	Validation Loss: 0.729085
Validation loss decreased (0.729692 --> 0.729085). Saving model ...
Epoch: 21 	Training Loss: 0.571516 	Validation Loss: 0.734009
Epoch: 22 	Training Loss: 0.545541 	Validation Loss: 0.721433
Validation loss decreased (0.729085 --> 0.721433). Saving model ...
Epoch: 23 	Training Loss: 0.523696 	Validation Loss: 0.720512
Validation loss decreased (0.721433 --> 0.720512). Saving model ...
Epoch: 24 	Training Loss: 0.508577 	Validation Loss: 0.728457
Epoch: 25 	Training Loss: 0.483033 	Validation Loss: 0.722556
Epoch: 26 	Training Loss: 0.469563 	Validation Loss: 0.742352
Epoch: 27 	Training Loss: 0.449316 	Validation Loss: 0.726019
Epoch: 28 	Training Loss: 0.442354 	Validation Loss: 0.713364
Validation loss decreased (0.720512 --> 0.713364). Saving model ...
Epoch: 29 	Training Loss: 0.421807 	Validation Loss: 0.718615
Epoch: 30 	Training Loss: 0.404595 	Validation Loss: 0.729914
Load the saved model
model.load_state_dict(torch.load('model_cifar.pt'))
Result:
<All keys matched successfully>
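A small aside not covered in the original text: if you later load this checkpoint on a machine without a GPU, pass a map_location argument to torch.load, e.g. model.load_state_dict(torch.load('model_cifar.pt', map_location='cpu')).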
Test the trained network
Test your trained model on the test data! A "good" result is a CNN that reaches roughly 70% accuracy on these test images.
# track test loss
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval()
# iterate over test data
for data, target in test_loader:
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        data, target = data.cuda(), target.cuda()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the batch loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item() * data.size(0)
    # convert output probabilities to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct_tensor = pred.eq(target.data.view_as(pred))
    correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
    # calculate test accuracy for each object class
    # (use the actual batch length in case the last batch is smaller than batch_size)
    for i in range(len(target)):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# average test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
Result:
Test Loss: 0.708721

Test Accuracy of airplane: 82% (826/1000)
Test Accuracy of automobile: 81% (818/1000)
Test Accuracy of bird: 65% (659/1000)
Test Accuracy of cat: 59% (590/1000)
Test Accuracy of deer: 75% (757/1000)
Test Accuracy of dog: 56% (565/1000)
Test Accuracy of frog: 81% (812/1000)
Test Accuracy of horse: 82% (823/1000)
Test Accuracy of ship: 86% (866/1000)
Test Accuracy of truck: 84% (848/1000)

Test Accuracy (Overall): 75% (7564/10000)
Visualize sample test results
# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# move model inputs to cuda, if GPU available
if train_on_gpu:
    images = images.cuda()

# get sample outputs
output = model(images)
# convert output probabilities to predicted class
_, preds_tensor = torch.max(output, 1)
preds = np.squeeze(preds_tensor.numpy()) if not train_on_gpu else np.squeeze(preds_tensor.cpu().numpy())

# plot the images in the batch, along with predicted and true labels
# (titles show "predicted (true)"; green if correct, red if not)
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(16):
    ax = fig.add_subplot(2, 16 // 2, idx + 1, xticks=[], yticks=[])
    imshow(images.cpu()[idx])
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx] == labels[idx].item() else "red"))
Result:

References:
Andrew Ng's Deep Learning notes (《吴恩达深度学习笔记》)
Deep Learning from Scratch (《深度学习入门：基于Python的理论与实现》)
https://pytorch.org/docs/stable/nn.html#
https://github.com/udacity/deep-learning-v2-pytorch
