《PyTorch Tutorials》

1. Getting Started with PyTorch

1.1. What Is PyTorch

PyTorch is a Python-based scientific computing package aimed at two audiences:

• A replacement for NumPy to use the power of GPUs.
• A deep learning research platform that provides maximum flexibility and speed.

1.1.1. Basic Concepts

• Tensors: similar to NumPy's ndarrays, except that Tensors can also run on a GPU to accelerate computing.
import torch

x = torch.zeros(3, 2, dtype=torch.long) # torch.int64
print(x)

y = torch.randn_like(x, dtype=torch.double) # same size as x, but the dtype is overridden
print(y)

z = x.new_ones(3, 3) # result has the same dtype
print(z)

print(z.size()) # torch.Size is in fact a tuple, so it supports all tuple operations.

tensor([[0, 0],
[0, 0],
[0, 0]])
tensor([[0.1171, 2.2741],
[0.8569, 0.7953],
[1.4362, 0.4094]], dtype=torch.float64)
tensor([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
torch.Size([3, 3])

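• Addition: x + y, torch.add(x, y), torch.add(x, y, out=result), and the in-place y.add_(x) are equivalent ways to add two tensors.
x = torch.randn(3, 2, dtype=torch.double) # float64
y = torch.randn(3, 2, dtype=torch.double)

print(x + y) # also a tensor
print(torch.add(x, y))

result = torch.randn_like(x)
torch.add(x, y, out=result) # write the sum into an existing tensor
print(result)

y.add_(x) # in-place: methods ending in _ modify the tensor itself
print(y)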

tensor([[ 0.2623,  0.3829],
[ 2.5567,  1.3920],
[ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623,  0.3829],
[ 2.5567,  1.3920],
[ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623,  0.3829],
[ 2.5567,  1.3920],
[ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623,  0.3829],
[ 2.5567,  1.3920],
[ 1.3003, -0.5964]], dtype=torch.float64)

• Indexing: standard NumPy-style indexing works.
print(y[:,1])

tensor([ 0.3829,  1.3920, -0.5964], dtype=torch.float64)

• Resizing: torch.Tensor.view
x = torch.randn(2, 3)
print(x)
print(x.view(-1, 6)) # the size -1 is inferred from other dimensions

tensor([[-1.2632, -0.2648, -1.0473],
[ 1.8173,  0.0445, -1.4210]])
tensor([[-1.2632, -0.2648, -1.0473,  1.8173,  0.0445, -1.4210]])

• Get item: if a tensor has exactly one element, .item() returns its value as a Python number.
x = torch.randn(1)
print(x)
print(x.item())

tensor([0.8341])
0.834109365940094


1.1.2. NumPy Bridge

Convert a Torch Tensor to a NumPy array and vice versa.

Note:

1. The Torch Tensor and NumPy array will share their underlying memory locations.
2. All the Tensors on a CPU except a CharTensor support converting to NumPy and back.
• Torch Tensor -> NumPy Array
a = torch.randn(1)
print(a)

b = a.numpy()
print(b)

a.add_(1) # in-place add; b changes too because the memory is shared
print(a)
print(b)

tensor([1.5351])
[1.5350896]
tensor([2.5351])
[2.5350895]

• NumPy Array -> Torch Tensor
import numpy as np
a = np.random.randn(1)
print(a)

b = torch.from_numpy(a)

a += 1
print(a)
print(b)

[-0.51711662]
[0.48288338]
tensor([0.4829], dtype=torch.float64)
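The memory sharing noted above applies to torch.from_numpy, not to every conversion. The following is a minimal sketch (the variable names shared and copied are illustrative) contrasting it with torch.tensor, which copies the data:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)  # shares memory with the NumPy array
copied = torch.tensor(a)      # copies the data into a new tensor

a += 1  # in-place update of the NumPy array

print(shared)  # follows the update
print(copied)  # keeps the old values
```

If an independent tensor is needed, copying explicitly avoids surprises from later in-place updates.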

• CUDA Tensors
x = torch.randn(1)
print(x)

device = torch.device("cuda:0") # a CUDA device object
x = x.to(device) # move it to GPU
print(x)

y = torch.randn_like(x, device=device) # directly create a tensor on GPU
print(y)

z = x + y

print(z)
print(z.to('cpu', torch.int32)) # move to CPU, and change its dtype together.

tensor([0.8053])
tensor([0.8053], device='cuda:0')
tensor([-1.4201], device='cuda:0')
tensor([-0.6148], device='cuda:0')
tensor([0], dtype=torch.int32)
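The CUDA example above assumes a GPU is present. A hedged, device-agnostic variant of the same steps might look like this; the fallback to "cpu" is an addition for machines without CUDA:

```python
import torch

# Pick the GPU when available, otherwise stay on the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(1).to(device)           # move an existing tensor
y = torch.randn_like(x, device=device)  # create a tensor directly on the device
z = x + y                               # both operands live on the same device

z_cpu = z.to("cpu", torch.float64)      # move back and change dtype in one call
print(z_cpu.device, z_cpu.dtype)
```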


1.2. Autograd

The autograd package provides automatic differentiation for all operations on Tensors.

1.2.1. Tensor

If you set a torch.Tensor's .requires_grad attribute to True (the default is False), autograd starts tracking all operations on it.
When you finish the computation, call .backward() to have all the gradients computed automatically.
The gradient for this tensor is accumulated into its .grad attribute.

To stop a tensor from tracking history, call .detach() to detach it from the computation history.
You can also wrap a code block in a with torch.no_grad(): statement.
This is particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True whose gradients are not needed; skipping the tracking saves memory.

Each tensor has a .grad_fn attribute that references a Function that has created the Tensor (except for Tensors created by the user - their grad_fn is None).

To compute the derivatives, call .backward() on a Tensor. If the Tensor is not a scalar, you must pass backward() a gradient argument, which is a tensor of matching shape.

For a scalar loss, the default gradient argument of 1.0 is in effect $\frac{\partial{Loss}}{\partial{Loss}}=1.0$.

import torch

x = torch.ones(2, 2, requires_grad=True) # track operations on x
y = x + 2
print(y)

z = (y * y * 3).mean()
print(z)

tensor([[3., 3.],


torch.autograd is an engine for computing vector-Jacobian products. For $\vec{y} = f(\vec{x})$ with Jacobian $J$, where $J_{ij} = \frac{\partial y_i}{\partial x_j}$, and any vector $v = (v_1, v_2, \cdots, v_m)^T$, it computes:

$J^T \cdot v = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_m} & \cdots & \frac{\partial y_m}{\partial x_m} \\ \end{matrix} \right] \cdot \left[ \begin{matrix} v_1 \\ \vdots \\ v_m \\ \end{matrix} \right]$

Why is this useful?
Because we usually compute a scalar loss value $l$ at the end. Suppose $v$ is the gradient of the scalar function $l = g(\vec{y})$:

$v = (\frac{\partial{l}}{\partial{y_1}} \cdots \frac{\partial{l}}{\partial{y_m}})^T$

Then, by the chain rule:

$J^T \cdot v = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_m} & \cdots & \frac{\partial y_m}{\partial x_m} \\ \end{matrix} \right] \cdot \left[ \begin{matrix} \frac{\partial{l}}{\partial{y_1}} \\ \vdots \\ \frac{\partial{l}}{\partial{y_m}} \\ \end{matrix} \right] = \left[ \begin{matrix} \frac{\partial{l}}{\partial{x_1}} \\ \vdots \\ \frac{\partial{l}}{\partial{x_m}} \\ \end{matrix} \right]$

This is what makes it possible to feed external gradients into a model with non-scalar output: the gradient argument passed to backward() plays the role of $v$.

z.backward() # Because z contains a single scalar, it's equivalent to z.backward(torch.tensor(1.))

print(x.grad) # \partial{z}/\partial{x_i} = 1.5(x+2) = 4.5

tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])


If output is not a scalar, a vector $v$ is needed:

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
print(z)

v = torch.tensor([[0.1,1],[10,100]],dtype=torch.float32) # shape matching!
z.backward(v)
print(x.grad) # v * \partial{z}/\partial{x} = v * 6(x+2) = 18v


tensor([[27., 27.],
tensor([[   1.8000,   18.0000],
[ 180.0000, 1800.0000]])
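The vector-Jacobian product can be checked by hand for this element-wise function. The sketch below assumes the same x, z, and v as the example; because each output depends only on its own input, the Jacobian is diagonal and $J^T v$ reduces to $6(x+2) \cdot v$ element by element. The name expected is illustrative.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
z = 3 * (x + 2) ** 2   # non-scalar output; dz_i/dx_i = 6 * (x_i + 2)

v = torch.tensor([[0.1, 1.0], [10.0, 100.0]])
z.backward(v)          # autograd computes J^T v and stores it in x.grad

# Closed-form result for a diagonal Jacobian.
expected = 6 * (x.detach() + 2) * v
print(torch.allclose(x.grad, expected))
```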


We can stop tracking history:

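x = torch.ones(2, 2, requires_grad=True)

y = x + 2
print(y.requires_grad) # y is part of the tracked graph

with torch.no_grad():
    z = y * y * 3
print(z.requires_grad) # z was computed with tracking disabled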

True
False
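The other way to stop tracking mentioned earlier is .detach(). A minimal sketch, with d as an illustrative name:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2

d = y.detach()  # same values as y, but cut off from the computation history

print(d.requires_grad)    # no longer tracked
print(d.grad_fn)          # no creating Function is recorded
print(torch.equal(d, y))  # values are unchanged
```

Unlike torch.no_grad(), which affects everything computed inside the block, detach() acts on a single tensor.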


1.3. Neural Networks

Neural networks can be constructed using the torch.nn package.

1.3.1. Define the network

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module): # nn.Module contains layers

    def __init__(self):
        super(Net, self).__init__() # allows you to call methods of the superclass nn.Module in your subclass Net.

        self.conv1 = nn.Conv2d(1, 6, 5) # 1 input channel, 6 output channel, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)

        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x): # method forward(input) that returns the output.
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)

        x = F.relu(self.fc1(x.view(-1, self.num_flat_features(x))))
        x = F.relu(self.fc2(x))

        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

Net(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)


You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined using autograd.

The learnable parameters of a model are returned by net.parameters():

params = list(net.parameters())
print(len(params))

10
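The count of 10 comes from five layers with one weight and one bias each. As a rough sketch, the same layers can be listed with named_parameters(); the ModuleDict container here is only a stand-in for Net so that the snippet runs on its own:

```python
import torch.nn as nn

# Stand-in for Net: the same five layers, without the forward pass.
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 5),
    "conv2": nn.Conv2d(6, 16, 5),
    "fc1": nn.Linear(16 * 5 * 5, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

shapes = {name: tuple(p.shape) for name, p in layers.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

total = sum(p.numel() for p in layers.parameters())
print(len(shapes), total)
```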


1.3.2. Process inputs and call backward

Let's try a random $32 \times 32$ input image:

input = torch.randn(1,1,32,32)
out = net(input)
print(out)

tensor([[ 0.1177,  0.0199, -0.0774,  0.0580,  0.0407,  0.0384,  0.0380, -0.1090,


We can zero the gradient buffers of all parameters and backpropagate with random gradients:

net.zero_grad()
out.backward(torch.randn(1, 10))


Note: torch.nn only supports mini-batches, not a single sample. For example, nn.Conv2d takes a 4D Tensor of nSamples x nChannels x Height x Width.
You can use input.unsqueeze(0) to add a fake batch dimension for a single sample.
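A short sketch of the fake batch dimension, using a standalone Conv2d layer with the same settings as conv1 (the names single and batch are illustrative):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

single = torch.randn(1, 32, 32)  # nChannels x Height x Width, no batch dimension
batch = single.unsqueeze(0)      # 1 x nChannels x Height x Width

out = conv(batch)
print(batch.shape, out.shape)
```

The 5x5 kernel without padding shrinks each spatial side from 32 to 28.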

1.3.3. Loss function

There are several different loss functions under the nn package, e.g. nn.MSELoss:

output = net(input)
target = torch.randn(10)
target = target.view(1, -1)
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)

tensor(0.7902, grad_fn=<MseLossBackward>)


Now, if we follow loss in the backward direction using its .grad_fn attribute, we can see a graph of computations:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
-> view -> linear -> relu -> linear -> relu -> linear
-> MSELoss
-> loss

print(loss.grad_fn)

<MseLossBackward object at 0x000001C6AB532780>
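The graph can also be walked backward programmatically through next_functions. This is a sketch on a tiny stand-in model (one linear layer, ReLU, MSE loss) rather than the Net above; the printed node names depend on the PyTorch version, so none are shown here.

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 2)
output = torch.relu(lin(torch.randn(1, 4)))
loss = nn.MSELoss()(output, torch.randn(1, 2))

names = []
fn = loss.grad_fn
while fn is not None:
    names.append(type(fn).__name__)
    # Follow the first input edge; nodes without further edges end the walk.
    fn = fn.next_functions[0][0] if fn.next_functions else None

print(" <- ".join(names))
```

Only the first edge is followed at each step, so this traces one path through the graph, not the whole graph.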


1.3.4. Backprop

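To backpropagate the error, all we have to do is call loss.backward().

net.zero_grad() # zeroes the gradient buffers of all parameters

print(net.conv1.bias.grad) # conv1.bias.grad before backward
loss.backward()
print(net.conv1.bias.grad) # conv1.bias.grad after backward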


tensor([0., 0., 0., 0., 0., 0.])
tensor([-0.0074, -0.0043,  0.0082,  0.0022, -0.0055, -0.0047])


1.3.5. Update the weights

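The simplest update rule is stochastic gradient descent (SGD): weight = weight - learning_rate * gradient. A simple implementation is:

learning_rate = 0.1
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate) # in-place: f = f - learning_rate * f.grad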


However, there are various different update rules such as SGD, Adam, RMSProp, etc.
To enable this, we can use torch.optim package:

import torch.optim as optim

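optimizer = optim.SGD(net.parameters(), lr=0.01)

# in the training loop:
optimizer.zero_grad() # zero the gradient buffers, since gradients accumulate
output = net(input)
loss = criterion(output, target)
loss.backward()

optimizer.step() # update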


1.4. Example: Training a Classifier

Specifically for vision, we can use torchvision, which has data loaders for common datasets such as ImageNet, CIFAR10, and MNIST, and data transformers for images, viz., torchvision.datasets and torch.utils.data.DataLoader.

For this tutorial, we will use the CIFAR10 dataset. It has the classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR10 are of size 3x32x32.

1.4.2. Training an image classifier

Load CIFAR10 and normalize its range from [0,1] to [-1,1]:

import torch
import torchvision
import torchvision.transforms as transforms

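# Compose several transforms: convert to tensor, then normalize each of the 3 channels with mean 0.5 and std 0.5.
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

# shuffle: set to True to have the data reshuffled at every epoch (default: False).
# num_workers: how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=0)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')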

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to .\data\cifar-10-python.tar.gz

100.0%



Show some training images:

import matplotlib.pyplot as plt
import numpy as np

def imshow(img):
    img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy() # Tensor -> numpy array

    plt.imshow(np.transpose(npimg, (1, 2, 0))) # channel x height x width -> height x width x channel
    plt.show()

dataiter = iter(trainloader)
images, labels = next(dataiter) # one mini-batch of training images

imshow(images[0])
print(labels[0],classes[labels[0]])


tensor(9) truck


Let's define a CNN:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()


We can move it to GPU:

device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
net.to(device)

Net(
(conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)


Define a loss function and optimizer:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


Training:

for epoch in range(3):

    sum_loss = 0.0
    max_show = 3000
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # send to GPU
        inputs, labels = inputs.to(device), labels.to(device)

        # zero the parameter gradients, since they accumulate across backward calls
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        sum_loss += loss.item()
        if (i+1) % max_show == 0:    # print every 3000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  ((epoch+1), (i+1), (sum_loss/max_show)))
            sum_loss = 0.0

print('Finished Training')

[1,  3000] loss: 0.774
[1,  6000] loss: 0.807
[1,  9000] loss: 0.832
[1, 12000] loss: 0.844
[2,  3000] loss: 0.722
[2,  6000] loss: 0.791
[2,  9000] loss: 0.804
[2, 12000] loss: 0.820
[3,  3000] loss: 0.711
[3,  6000] loss: 0.761
[3,  9000] loss: 0.776
[3, 12000] loss: 0.786
Finished Training


Test our trained model on test data:

sum_correct = 0
sum_test = 0

with torch.no_grad(): # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)

        outputs = net(images) # 4x10
        _, predicted = torch.max(outputs, 1) # (max_value, index)

        sum_correct += (predicted==labels).sum().item()
        sum_test += labels.size(0)

print("Accuracy on 10000 test images: %.3f %%" % (100*sum_correct/sum_test))

Accuracy on 10000 test images: 63.140 %


1.5. Data Parallelism

This section shows how to use multiple GPUs with DataParallel.
DataParallel splits your data automatically and sends job orders to model replicas on several GPUs. After each replica finishes its job, DataParallel collects and merges the results before returning them to you.

Please note that: just calling my_tensor.to(device) returns a new copy of my_tensor on GPU instead of rewriting my_tensor. You need to assign it to a new tensor and use that tensor on the GPU.
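A small sketch of that point; the CPU fallback is an addition so the snippet also runs without a GPU, and moved is an illustrative name:

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

my_tensor = torch.ones(2)
moved = my_tensor.to(device)  # the result must be assigned; my_tensor is not rewritten

print(my_tensor.device)  # still on the CPU
print(moved.device)      # on the target device
```

When the tensor is already on the target device with the right dtype, .to() returns the tensor itself, so on a CPU-only machine moved and my_tensor are the same object.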

It is easy to make your model run in parallel using DataParallel:

model = nn.DataParallel(model)


Let's see an example.

### Imports and parameters
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device("cuda:0")

### Dummy dataset
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

### Simple model
class Model(nn.Module):

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tInside the model: input size",input.size(),"output size",output.size())

        return output

### Create model and dataparallel
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print(torch.cuda.device_count(),"GPUs are found!")
    model = nn.DataParallel(model)
model.to(device)

### Run the model
rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Total: Input size",input.size(),"output size",output.size())

2 GPUs are found!
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
Inside the model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Inside the model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Total: Input size torch.Size([10, 5]) output size torch.Size([10, 2])
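The sizes in this log follow from splitting each batch along dimension 0. As a rough illustration that runs on the CPU, torch.chunk is used below to imitate that split for two devices; it is a stand-in for what DataParallel does internally, not a call DataParallel exposes.

```python
import torch

n_gpus = 2  # assumed, matching the "2 GPUs are found!" line in the log

full_batch = torch.randn(30, 5)
last_batch = torch.randn(10, 5)  # 100 samples leave a final batch of 10

full_parts = [tuple(c.shape) for c in full_batch.chunk(n_gpus, dim=0)]
last_parts = [tuple(c.shape) for c in last_batch.chunk(n_gpus, dim=0)]

print(full_parts)
print(last_parts)
```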


2. Data Loading and Processing

For this tutorial, we should install two packages:

• scikit-image: Image io and transforms
• pandas: Easier csv parsing

We have prepared a pose estimation database in ./data/faces. There are some human faces and their landmark points stored in .csv.
Let's read the CSV and get the annotations in an (N,2) array:

import pandas as pd
from skimage import io
import matplotlib.pyplot as plt

'''
image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x, ... ,part_67_x,part_67_y
0805personali01.jpg,27,83,27,98, ... 84,134
1084239450_e76e00b7e7.jpg,70,236,71,257, ... ,128,312
'''

landmarks_list = pd.read_csv("./data/faces/face_landmarks.csv")

n = 50
img_name = landmarks_list.iloc[n, 0]
landmarks = landmarks_list.iloc[n, 1:].values.astype('float').reshape(-1,2) # pandas row -> (N,2) ndarray

def show_landmarks(image, landmarks):
    'Show image with landmarks.'
    plt.imshow(image)
    plt.scatter(landmarks[:,0],landmarks[:,1], s=10, marker=".", c="r")
    plt.pause(0.001) # pause a bit so that plots are updated

plt.figure()
img_path = "./data/faces/"+img_name
show_landmarks(io.imread(img_path), landmarks)
plt.show()


2.1. Dataset Class

torch.utils.data.Dataset is an abstract class representing a dataset. Our custom dataset should inherit Dataset and override the following methods:

• __len__: so that len(dataset) returns the size of the dataset.
• __getitem__: so that dataset[i] can be used to get the i-th sample.

Demo:

from torch.utils.data import Dataset
import os

class FaceLandmarksDataset(Dataset):
    'Face landmarks dataset.'
    def __init__(self, CsvFile_path, dir_img, transform=None):
        self.landmarks_list = pd.read_csv(CsvFile_path)
        self.dir_img = dir_img
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_list)

    def __getitem__(self, idx):
        img_path = os.path.join(self.dir_img,
                                self.landmarks_list.iloc[idx, 0])
        image = io.imread(img_path)
        landmarks = self.landmarks_list.iloc[idx, 1:].values.astype("float").reshape(-1,2)
        sample = {'image':image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample

### Instantiate this class and show four images.
face_landmarks = FaceLandmarksDataset(CsvFile_path='./data/faces/face_landmarks.csv', dir_img='./data/faces/')

fig = plt.figure()
for i in range(len(face_landmarks)):
    sample = face_landmarks[i]
    print(i, sample['image'].shape, sample['landmarks'].shape)

    ax = plt.subplot(1,4,i+1)
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')

    ax.imshow(sample['image'])
    ax.scatter(sample['landmarks'][:,0],sample['landmarks'][:,1], s=10, marker=".", c="r")
    #show_landmarks(**sample)

    if i == 3:
        plt.tight_layout()
        plt.show()
        break

0 (324, 215, 3) (68, 2)
1 (500, 333, 3) (68, 2)
2 (250, 258, 3) (68, 2)
3 (434, 290, 3) (68, 2)
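The Dataset contract itself is small and does not depend on image files. A toy sketch follows; SquaresDataset is a hypothetical class invented for illustration:

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # lets len(dataset) report the number of samples
        return self.n

    def __getitem__(self, idx):
        # lets dataset[idx] return one sample
        return torch.tensor(idx), torch.tensor(idx ** 2)

ds = SquaresDataset(5)
print(len(ds), ds[3])
```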


2.2. Transforms

We want to:

• randomly crop samples.
• rescale images.
• convert the numpy images to torch images (notice: swap axes).

We also want to write them as callable classes instead of simple functions:

from skimage import transform
import numpy as np

class Rescale():
    '''
    Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If int, the smaller image edge is matched to it
            and the aspect ratio remains the same.
    '''
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple)) # ensure that output_size is an int or a tuple.
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int): # int: the length of the smaller edge
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size
        new_h, new_w = int(new_h), int(new_w)

        image = transform.resize(image, (new_h, new_w))
        landmarks = landmarks * [new_w/w, new_h/h] # x scales with width, y with height

        return {'image':image, 'landmarks':landmarks}

class RandomCrop():
    '''
    Crop the image in a sample randomly.

    Args:
        output_size (tuple or int). If int, square crop is made.
    '''
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple)) # ensure that output_size is an int or a tuple.
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            new_h, new_w = self.output_size, self.output_size
        else:
            new_h, new_w = self.output_size

        # +1 because the upper bound of randint is exclusive; this also handles h == new_h
        start_h_idx = np.random.randint(0, h - new_h + 1)
        start_w_idx = np.random.randint(0, w - new_w + 1)

        image = image[start_h_idx: (start_h_idx+new_h),
                      start_w_idx: (start_w_idx+new_w)]
        landmarks = landmarks - [start_w_idx, start_h_idx]

        return {'image':image, 'landmarks':landmarks}

class ToTensor():
    '''
    Convert the ndarray image in a sample to a Tensor.
    Notice: swap color axis because:
        numpy image: H x W x C
        torch image: C X H X W
    '''
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}


We now apply our transforms to a sample:

from torchvision import transforms

scale = Rescale(256) # the length of the smaller side is 256
crop = RandomCrop(210) # crop a 210x210 img
composed_trans = transforms.Compose([scale, crop])

fig = plt.figure()
plt.tight_layout()

sample = face_landmarks[65]
transformed_sample = composed_trans(sample)

show_landmarks(**sample)
show_landmarks(**transformed_sample)

plt.show()
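Writing transforms as callable classes means the parameters are bound once in __init__ and the object can then be chained like a function. The sketch below shows the pattern with NumPy only; AddOffset and compose are hypothetical helpers written for illustration, with compose imitating what transforms.Compose does.

```python
import numpy as np

class AddOffset:
    """Hypothetical transform: shifts all landmarks by a fixed (x, y) offset."""

    def __init__(self, offset):
        self.offset = np.asarray(offset, dtype=float)  # parameter stored once

    def __call__(self, sample):
        return {"image": sample["image"],
                "landmarks": sample["landmarks"] + self.offset}

def compose(transform_list):
    """Apply the transforms in order, passing each result to the next."""
    def run(sample):
        for t in transform_list:
            sample = t(sample)
        return sample
    return run

pipeline = compose([AddOffset([1, 0]), AddOffset([0, 2])])
sample = {"image": np.zeros((4, 4, 3)), "landmarks": np.array([[1.0, 1.0]])}
out = pipeline(sample)
print(out["landmarks"])
```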


2.3. Iterating through the Dataset

import torch

transformed_dataset = FaceLandmarksDataset(CsvFile_path='./data/faces/face_landmarks.csv',
dir_img='./data/faces/',
transform=transforms.Compose([
Rescale(256),
RandomCrop(210),
ToTensor()
]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['landmarks'].size())

    if i == 4:
        break

0 torch.Size([3, 210, 210]) torch.Size([68, 2])
1 torch.Size([3, 210, 210]) torch.Size([68, 2])
2 torch.Size([3, 210, 210]) torch.Size([68, 2])
3 torch.Size([3, 210, 210]) torch.Size([68, 2])
4 torch.Size([3, 210, 210]) torch.Size([68, 2])


However, we also want to:

• batch the data.
• shuffle the data.
• Load the data in parallel.

torch.utils.data.DataLoader is an iterable that provides all these features.

from torch.utils.data import DataLoader
from torchvision import utils

dataloader = DataLoader(transformed_dataset, batch_size=4,
                        shuffle=True, num_workers=0) # Windows may error when num_workers > 0

def show_landmarks_batch(sample_batch):
    'Show images with landmarks for a batch of samples.'
    image_batch, landmarks_batch = sample_batch['image'], sample_batch['landmarks']

    batch_size = len(image_batch)
    im_size = image_batch.size(2)

    grid = utils.make_grid(image_batch)
    plt.imshow(grid.numpy().transpose((1,2,0))) # Tensors -> ndarrays -> CxHxW to HxWxC

    for i in range(batch_size):
        plt.scatter(landmarks_batch[i,:,0].numpy()+im_size*i,
                    landmarks_batch[i,:,1].numpy(),
                    s=10, marker='.', c='r')

ite_batch = 3
for ite, sample_batch in enumerate(dataloader):
    print(sample_batch['image'].size(),
          sample_batch['landmarks'].size())
    if ite == ite_batch:
        plt.figure()
        show_landmarks_batch(sample_batch)
        plt.axis('off')
        plt.show()
        break

torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
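Batching and shuffling can be tried without the face images. This sketch swaps in a TensorDataset of ten one-feature samples (an assumption made so the snippet is self-contained) and records the batch sizes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = torch.arange(10).float().unsqueeze(1)  # 10 samples, 1 feature each
loader = DataLoader(TensorDataset(data), batch_size=4, shuffle=True)

sizes = []
seen = []
for (batch,) in loader:  # TensorDataset yields one tensor per sample field
    sizes.append(batch.shape[0])
    seen.extend(batch.squeeze(1).tolist())

print(sizes)         # the last batch is smaller because 10 is not a multiple of 4
print(sorted(seen))  # every sample appears exactly once per epoch
```

Passing drop_last=True to DataLoader would discard the incomplete final batch instead.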


2.4. Torchvision

The torchvision package provides some common datasets and transforms.

We might not even have to write custom classes. One of the more generic datasets available in torchvision is ImageFolder. It assumes that images are organized in the following way:

root/ants/xxx.png
root/ants/xxy.jpeg
root/ants/xxz.png
.
.
.
root/bees/123.jpg
root/bees/nsdf3.png
root/bees/asd932_.png


where ants and bees are class labels.

Besides, generic transforms that operate on PIL.Image, such as RandomHorizontalFlip and Resize, are also available.

import torch
from torchvision import transforms, datasets

data_transform = transforms.Compose([
    transforms.RandomResizedCrop(224), # replaces the deprecated RandomSizedCrop
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std = [0.229, 0.224, 0.225])
])

hymenoptera_dataset = datasets.ImageFolder(root='hymenoptera_data/train',
                                           transform=data_transform)

dataset_loader = torch.utils.data.DataLoader(hymenoptera_dataset,
                                             batch_size=4,shuffle=True,num_workers=0)


3. Review and Extension: What Makes PyTorch Good

• A replacement for NumPy to use the power of GPUs.
• A deep learning research platform that provides maximum flexibility and speed.

1. It runs tensor operations on GPUs.
2. It provides the other features that deep learning needs.

• An n-dimensional Tensor, similar to numpy but can run on GPUs.
• Automatic differentiation for building and training neural networks.

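1. GPUs can speed up computation by 50x or more.
2. Today's deep learning methods still rely on backpropagation, so computing gradients by differentiation is indispensable; automatic differentiation is the widely used technique for it.

A Tensor is conceptually the same as a NumPy array, but it offers more:

1. A Tensor carries the computational graph and gradient information and can keep tracking its history; the nodes of the graph are Tensors and the edges are functions.
2. A Tensor can use a GPU for numerical computation.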

import torch

### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-6
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Initialize weight Tensors randomly
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

### Iterations
for ite in range(1,total_ite+1):

    y_pred = x.mm(w1).clamp(min=0).mm(w2) # clamp acts as relu function
    loss = (y_pred - y).pow(2).sum()
    if ite % 100 == 0:
        print(ite, loss.item())

    loss.backward()

    # Manually update weights
    # Weights have requires_grad=True, but we don't need tracking.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

100 616.6424560546875
200 5.097920894622803
300 0.06861867755651474
400 0.0014389019925147295
500 0.0001344898482784629


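The biggest difference between TensorFlow and PyTorch is:

• TensorFlow's computational graphs are static: once a graph is defined, the same graph is executed many times, and only the input data can change.
• PyTorch's computational graphs are dynamic: each forward pass can build a brand-new graph.

3.2. Simplifying Things: the nn Module

PyTorch provides modules that address these problems.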

import torch

### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Define network model by nn package
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out)
)
model = model.to(device)

### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

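for ite in range(1, total_ite+1):

    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())

    model.zero_grad() # clear gradients left over from the previous iteration
    loss.backward()

    # Manually update weights
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad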

100 2.5298514366149902
200 0.04136687144637108
300 0.0011623052414506674
400 4.448959225555882e-05
500 2.1180185285629705e-06


import torch

### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Define network model by nn package
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out)
)
model = model.to(device)

### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

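### Define optimizer
# The optimizer line is missing from the source; Adam is an assumption here.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for ite in range(1, total_ite+1):

    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())

    optimizer.zero_grad() # clear gradients left over from the previous iteration
    loss.backward()

    # Update parameters
    optimizer.step()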

100 65.00260925292969
200 1.0924508571624756
300 0.006899723317474127
400 5.2772647904930636e-05
500 1.6419755866081687e-07


The components provided by the nn package are fairly basic. For a more complex network, we can define our own module:

import torch

class TwoLayerNet(torch.nn.Module):

    def __init__(self, D_in, H, D_out):

        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):

        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Define network model by nn package
model = TwoLayerNet(D_in, H, D_out)
model = model.to(device)

### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

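### Define optimizer
# The optimizer line is missing from the source; Adam is an assumption here.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for ite in range(1, total_ite+1):

    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())

    optimizer.zero_grad() # clear gradients left over from the previous iteration
    loss.backward()
    optimizer.step()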

100 71.93133544921875
200 1.759734869003296
300 0.012220818549394608
400 0.0002741872740443796
500 1.9429817257332616e-05


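3.3. The Dynamic Advantage: Control Flow + Weight Sharing in PyTorch

• On each forward pass the number of hidden layers is random: it may be 1, 2, 3, or 4;
• The hidden layers share their parameters.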
import torch
import random

class DynamicNet(torch.nn.Module):

    def __init__(self, D_in, H, D_out):

        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):

        h_relu = self.input_linear(x).clamp(min=0)
        rand_num = random.randint(0, 3)
        for _ in range(rand_num): # 1 layer, 2 layers, 3 layers or 4 layers
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred, rand_num

### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4
momentum = 0.9
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Define network model by nn package
model = DynamicNet(D_in, H, D_out)
model = model.to(device)

### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

### Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)

for ite in range(1, total_ite+1):

    y_pred, rand_num = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item(), rand_num)

    optimizer.zero_grad() # clear gradients left over from the previous iteration
    loss.backward()
    optimizer.step()

100 13.738285064697266 0
200 4.243963718414307 2
300 0.7942993640899658 1
400 0.43234169483184814 3
500 0.42137715220451355 2
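Weight sharing works because autograd sums the gradient contributions from every use of the same module. A one-weight sketch makes this checkable by hand; shared is an illustrative stand-in for middle_linear:

```python
import torch

shared = torch.nn.Linear(1, 1, bias=False)
with torch.no_grad():
    shared.weight.fill_(3.0)  # w = 3

x = torch.tensor([[2.0]])
y = shared(shared(x))         # the same layer applied twice: y = w * (w * x)
y.sum().backward()

grad = shared.weight.grad.item()
print(grad)                   # dy/dw = 2 * w * x
```

With w = 3 and x = 2 the derivative 2wx is 12, the sum of one contribution of 6 from each application of the layer.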


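4. Transfer Learning

1. Fine-tune the whole network: none of the transferred parameters are frozen.
2. Freeze the transferred network: only the newly added layers or the last few layers are trainable.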
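The second strategy can be sketched on a toy model. Here backbone stands in for a pretrained network and head for the newly added layer; both names and all sizes are made up for illustration.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(8, 4), nn.ReLU())  # stand-in for the transferred network
head = nn.Linear(4, 2)                                # newly added, trainable layer

# Freeze the transferred parameters so autograd skips them.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)  # optimize only the head

loss = model(torch.randn(3, 8)).pow(2).sum()
loss.backward()
optimizer.step()

frozen_grads = [p.grad for p in backbone.parameters()]
head_grads = [p.grad for p in head.parameters()]
print(frozen_grads)
```

For the first strategy, the freezing loop is simply omitted and all parameters are handed to the optimizer.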

4.1. Loading the Data

4.1.1. Building the Data Loaders

import torch
from torchvision import datasets, transforms
import os

device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

data_dir = "./hymenoptera_data"
data_transform = {
'train': transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
'val': transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
}

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transform[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x],
                                              batch_size=4,
                                              shuffle=True,
                                              num_workers=4)
               for x in ['train', 'val']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes


4.1.2. Showing Some Training Samples

import numpy as np
import matplotlib.pyplot as plt
import torchvision

plt.ion() # interactive mode

def imshow(input, title=None):
    'Imshow for Tensors.'
    input = input.numpy().transpose((1,2,0)) # CxHxW -> HxWxC
    means = np.array([0.485, 0.456, 0.406])
    stds = np.array([0.229, 0.224, 0.225])
    input = input * stds + means # undo the normalization
    input = np.clip(input, 0, 1)
    plt.imshow(input)
    if title is not None:
        plt.title(title)
    plt.pause(0.001) # pause a bit so that plots can be updated

# Get one batch of training data
inputs, classes = next(iter(dataloaders['train']))

input = torchvision.utils.make_grid(inputs)
imshow(input, title=[class_names[x] for x in classes])


4.2. Training

1. Adjust the learning rate: use torch.optim.lr_scheduler to decay the learning rate as a function of the number of epochs.
2. Save the best model.
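As a minimal sketch of the first point, the snippet below records the learning rate that StepLR produces over 15 epochs. The single dummy parameter and the 15-epoch horizon are assumptions made for illustration; step_size=7 and gamma=0.1 are the values used for fine-tuning in this section.

```python
import torch
from torch import optim
from torch.optim import lr_scheduler

# A single dummy parameter stands in for a real model (illustrative assumption).
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.001, momentum=0.9)
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

lrs = []
for epoch in range(15):
    lrs.append(optimizer.param_groups[0]['lr'])  # learning rate used in this epoch
    optimizer.step()   # the training batches would run here
    scheduler.step()   # advance the schedule once per epoch

print(lrs[0], lrs[7], lrs[14])
```

The rate stays at its initial value for epochs 0 to 6, drops by a factor of 10 at epoch 7, and drops again at epoch 14.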
import torch.optim as optim
from torch.optim import lr_scheduler
import time
import copy
from torchvision import models
import torch.nn as nn

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    'Finetune model and return the best model.'
    time_start = time.time()

    # Init
    best_model = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    # Train and eval epoch by epoch
    for epoch in range(num_epochs):

        for phase in ['train', 'val']:
            if phase == 'train':
                model.train() # Set model to training mode
            else:
                model.eval() # Set model to evaluate mode

            running_loss = 0.0 # Init for accumulation
            running_corrects = 0

            # Run model batch by batch
            for inputs, labels in dataloaders[phase]:
                # Move to GPU
                inputs = inputs.to(device)
                labels = labels.to(device)

                optimizer.zero_grad() # Clear gradients from the previous batch

                # Forward; track gradients only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)

                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # Accumulation
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'train':
                scheduler.step() # Since PyTorch 1.1, step the scheduler after the optimizer

            # Evaluation
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model = copy.deepcopy(model.state_dict())

            # Report every 5 epochs and after the last epoch
            if ((epoch + 1) % 5 == 0) or (epoch == num_epochs - 1):
                if phase == 'train':
                    print('Epoch {}/{}'.format(epoch+1, num_epochs))
                print('{:5s} loss: {:.3f} acc: {:.3f}'.format(phase, epoch_loss, epoch_acc))
                if phase == 'val':
                    print('')

    time_elapsed = time.time() - time_start
    print('Training complete in {:.0f}m {:.0f}s'.format((time_elapsed // 60),
                                                        (time_elapsed % 60)))
    print('Best val acc: {:.3f}'.format(best_acc))

    model.load_state_dict(best_model) # Restore the best weights before returning
    return model

### Finetune the convnet
model_ft = models.resnet18(pretrained=True) # Returns a model pre-trained on ImageNet
num_fc_in = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_fc_in, 2)

model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1) # Decay LR by a factor of 0.1 every 7 epochs
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=25)

Epoch 5/25
train loss: 0.348 acc: 0.857
val   loss: 0.261 acc: 0.902

Epoch 10/25
train loss: 0.339 acc: 0.861
val   loss: 0.193 acc: 0.928

Epoch 15/25
train loss: 0.285 acc: 0.885
val   loss: 0.198 acc: 0.915

Epoch 20/25
train loss: 0.255 acc: 0.893
val   loss: 0.177 acc: 0.922

Epoch 25/25
train loss: 0.250 acc: 0.893
val   loss: 0.268 acc: 0.908

Training complete in 2m 41s
Best val acc: 0.928


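1. torchvision.models offers many network architectures, including VGG, DenseNet and ResNet.

2. We can also obtain models pre-trained on ImageNet simply by setting pretrained=True. The weights are downloaded and cached in the directory specified by torch.utils.model_zoo.

3. Some networks behave differently during training and evaluation, for example networks with batch normalization (BN). Call model.train() or model.eval() to switch between the two modes.

4. All pre-trained models expect RGB input images (shape: 3xHxW) whose height and width are at least 224, normalized with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. This is usually done with a transform: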

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
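As a rough sketch of what this transform computes, the snippet below applies the same per-channel (x - mean) / std with plain tensor broadcasting and then inverts it, which is what imshow does before plotting. The random 3x224x224 tensor is an assumed stand-in for a real ToTensor() output.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)  # one value per RGB channel
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

img = torch.rand(3, 224, 224)        # stand-in image with values in [0, 1)
normalized = (img - mean) / std      # per-channel normalization
restored = normalized * std + mean   # the inverse, as used for display

print(torch.allclose(restored, img, atol=1e-6))
```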

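1. Tensor.data is a legacy attribute (not a method). It returns a tensor that shares storage with the original but is cut off from its computation history, and in-place changes made through it are not tracked by autograd.
Using .detach() instead is now recommended, as it is safer.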
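The difference can be seen in a small experiment. This is a sketch on a toy tensor, not part of the training code: the same in-place change is applied once through .detach() and once through .data, and sigmoid is chosen because its backward pass reuses its own output.

```python
import torch

# Case 1: .detach(). Autograd notices the in-place change and refuses to continue.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
out.detach().zero_()          # modifies out's storage in place
try:
    out.sum().backward()
    detach_raised = False
except RuntimeError:
    detach_raised = True

# Case 2: .data. The same change goes unnoticed and the gradient is silently wrong.
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = b.sigmoid()
out.data.zero_()
out.sum().backward()          # sigmoid's backward reuses the (now zeroed) output

print(detach_raised)  # True
print(b.grad)         # all zeros instead of the true sigmoid derivative
```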

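4.3. Freezing the Earlier Layers

1. Turn off gradient tracking for all parameters of the pre-trained network, since they do not take part in the updates;
2. Create a new FC layer, since the output has only two classes (ants and bees);
3. Let the optimizer update the FC layer's parameters only.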
model_conv = torchvision.models.resnet18(pretrained=True)

for param in model_conv.parameters():
    param.requires_grad = False # freeze the pre-trained parameters

num_fc_in = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_fc_in, 2) # a newly created layer is trainable by default

model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9) # Notice params!!!
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
model_conv = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler,
                         num_epochs=25)

Epoch 5/25
train loss: 0.448 acc: 0.803
val   loss: 0.240 acc: 0.902

Epoch 10/25
train loss: 0.304 acc: 0.881
val   loss: 0.184 acc: 0.941

Epoch 15/25
train loss: 0.341 acc: 0.852
val   loss: 0.185 acc: 0.941

Epoch 20/25
train loss: 0.412 acc: 0.820
val   loss: 0.180 acc: 0.941

Epoch 25/25
train loss: 0.294 acc: 0.902
val   loss: 0.187 acc: 0.941

Training complete in 2m 11s
Best val acc: 0.948
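
One way to confirm that freezing took effect is to list the trainable parameters. The sketch below follows the same three steps on a small hypothetical stand-in network (TinyNet is not ResNet-18; it is used here only to avoid downloading weights).

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained backbone with a final `fc` layer.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 10)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))

net = TinyNet()

# Step 1: freeze everything.
for param in net.parameters():
    param.requires_grad = False

# Step 2: replace the head; a freshly created layer is trainable by default.
net.fc = nn.Linear(net.fc.in_features, 2)

# Step 3: hand only the head's parameters to the optimizer.
optimizer = torch.optim.SGD(net.fc.parameters(), lr=0.001, momentum=0.9)

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']

# After one backward pass, only the head has gradients.
loss = net(torch.randn(3, 8)).sum()
loss.backward()
print(net.features.weight.grad is None, net.fc.weight.grad is not None)
```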


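5. Saving and Loading Models

1. torch.save: saves a serialized object to disk. Serialization relies on Python's pickle module. Models, tensors and dictionaries of all kinds can be saved this way.
2. torch.load: uses the pickle module to deserialize the object and load it into memory.
3. torch.nn.Module.load_state_dict: loads a model's parameters from a deserialized state_dict.

5.1. What Is a state_dict?

A state_dict is simply a Python dictionary that maps each layer of the model to its parameter tensor.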

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Define model
class model_class(nn.Module):
    def __init__(self):
        super(model_class, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize model
model = model_class()

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Print model's state_dict
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# Print optimizer's state_dict
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

Model's state_dict:
conv1.weight 	 torch.Size([6, 3, 5, 5])
conv1.bias 	 torch.Size([6])
conv2.weight 	 torch.Size([16, 6, 5, 5])
conv2.bias 	 torch.Size([16])
fc1.weight 	 torch.Size([120, 400])
fc1.bias 	 torch.Size([120])
fc2.weight 	 torch.Size([84, 120])
fc2.bias 	 torch.Size([84])
fc3.weight 	 torch.Size([10, 84])
fc3.bias 	 torch.Size([10])
Optimizer's state_dict:
state 	 {}
param_groups 	 [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [1831247242080, 1831247241936, 1831247242296, 1831247241864, 1831247242800, 1831247243232, 1831247242872, 1831247242584, 1831247243160, 1831247242656]}]


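5.2. Saving and Loading a Model

5.2.1. Saving and Loading the state_dict (Recommended)

torch.save(model.state_dict(), PATH) # Save

model = model_class() # Load: rebuild the architecture first
model.load_state_dict(torch.load(PATH))
model.eval()


1. After loading, be sure to switch to eval mode before testing, because the network may contain dropout or BN layers. If you forget, the results may be inconsistent.
2. Calling model.load_state_dict(PATH) directly does not work; the file must first be deserialized with torch.load, because load_state_dict expects a dictionary object.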
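
The two notes can be checked with a short round trip. This is a sketch under assumptions: the small dropout model is hypothetical, and an in-memory buffer stands in for a file at PATH.

```python
import io
import torch
import torch.nn as nn

# Hypothetical small model with dropout, so that train and eval modes differ.
def make_model():
    return nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))

model = make_model()

buffer = io.BytesIO()                      # stands in for a file at PATH
torch.save(model.state_dict(), buffer)     # Save
buffer.seek(0)

restored = make_model()                    # Load: rebuild the architecture first
restored.load_state_dict(torch.load(buffer))
restored.eval()                            # disable dropout for inference

x = torch.randn(5, 4)
same_output = torch.equal(restored(x), restored(x))   # deterministic in eval mode
same_weights = all(torch.equal(p, q) for p, q in
                   zip(model.state_dict().values(), restored.state_dict().values()))
print(same_output, same_weights)
```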

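5.2.2. Saving and Loading the Entire Model

torch.save(model, PATH) # Save

model = torch.load(PATH) # Load; the model class must be importable (recent PyTorch releases may also need weights_only=False)
model.eval()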

5.3. Saving and Loading a Model at a Specific Checkpoint

### Save
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss
}, PATH)

### Load
model = model_class()
optimizer = optimizer_class()

checkpoint = torch.load(PATH) # just a dict
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# - or -
model.train()


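5.4. Saving Multiple Models in One File

### Save
torch.save({
    'modelA_state_dict': modelA.state_dict(),
    'modelB_state_dict': modelB.state_dict(),
    'optimizerA_state_dict': optimizerA.state_dict(),
    'optimizerB_state_dict': optimizerB.state_dict(),
    ...
}, PATH)

### Load
modelA = ModelAClass(*args, **kwargs)
modelB = ModelBClass(*args, **kwargs)
optimizerA = TheOptimizerAClass(*args, **kwargs)
optimizerB = TheOptimizerBClass(*args, **kwargs)

checkpoint = torch.load(PATH)
modelA.load_state_dict(checkpoint['modelA_state_dict'])
modelB.load_state_dict(checkpoint['modelB_state_dict'])
optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

modelA.eval()
modelB.eval()
# - or -
modelA.train()
modelB.train()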


5.5. Transferring a Subset of the Parameters

model.load_state_dict(torch.load(PATH), strict=False)
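
With strict=False, keys that do not match are skipped instead of raising an error, and the call reports what was skipped. The sketch below uses two hypothetical models (ModelA and ModelB) that share only the fc1 layer. Note that the shapes of matching keys must still agree.

```python
import torch
import torch.nn as nn

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)   # same name and shape as in ModelA
        self.fc3 = nn.Linear(8, 3)   # exists only in ModelB

source, target = ModelA(), ModelB()
result = target.load_state_dict(source.state_dict(), strict=False)

print(result.missing_keys)      # keys the target expects but the state_dict lacks
print(result.unexpected_keys)   # keys in the state_dict that the target does not have
print(torch.equal(target.fc1.weight, source.fc1.weight))
```

Only fc1 is transferred; fc3 keeps its random initialization and fc2 is ignored.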


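5.6. Saving and Loading Across Devices

5.6.1. Between GPU and CPU

torch.save(model.state_dict(), PATH) # Save

model = TheModelClass(*args, **kwargs)

# Save on GPU, load on CPU
device = torch.device('cpu')
model.load_state_dict(torch.load(PATH, map_location=device))

# Save on GPU, load on GPU
device = torch.device("cuda")
model.load_state_dict(torch.load(PATH))
model.to(device)

# Save on CPU, load on GPU
device = torch.device("cuda")
model.load_state_dict(torch.load(PATH, map_location="cuda:0")) # choose whichever GPU you want
model.to(device)


1. Whenever the saving and loading devices differ, set the map_location argument of torch.load to remap the tensors.
2. If the model is to run on a GPU, its parameters must be converted to CUDA tensors with model.to(device), which modifies the model in place.
3. Input data must be moved to the GPU as well. For tensors, however, tensor.to(device) is not in-place; it creates a new tensor, so the result must be assigned: new_t = tensor.to(torch.device('cuda'))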
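
The in-place difference between modules and tensors can be seen without a GPU. In the sketch below a dtype conversion is used as an assumed stand-in for a device move, since .to() behaves the same way in both cases.

```python
import torch
import torch.nn as nn

target = torch.float64   # stand-in for torch.device('cuda') so the sketch runs on CPU

t = torch.zeros(2, 3)            # float32 tensor
t.to(target)                     # result discarded: t itself is unchanged
new_t = t.to(target)             # correct: keep the returned tensor

model = nn.Linear(3, 1)
returned = model.to(target)      # converts the module's parameters in place

print(t.dtype, new_t.dtype)              # torch.float32 torch.float64
print(returned is model, model.weight.dtype)
```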

5.6.2. Saving a torch.nn.DataParallel Model

torch.save(model.module.state_dict(), PATH) # Save


torch.nn.DataParallel is a model wrapper that enables parallel computation across GPUs.
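
The reason for saving model.module.state_dict() rather than model.state_dict() is that the wrapper prefixes every key with "module.". A small sketch, assuming a toy nn.Linear model:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
parallel = nn.DataParallel(model)   # without GPUs this simply wraps the module

wrapped_keys = list(parallel.state_dict().keys())
plain_keys = list(parallel.module.state_dict().keys())
print(wrapped_keys)   # ['module.weight', 'module.bias']
print(plain_keys)     # ['weight', 'bias']

# A plain (unwrapped) model can load the state_dict saved from .module directly.
fresh = nn.Linear(4, 2)
fresh.load_state_dict(parallel.module.state_dict())
```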

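6. What Is torch.nn, Really?

PyTorch provides elegantly designed modules and classes to help us build neural networks, such as torch.nn, torch.optim, Dataset and DataLoader.

6.1. Preparing the MNIST Data

if not os.path.exists(directory):
    os.makedirs(directory)


Python 3.5+ provides a safer alternative: mkdir from pathlib.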

from pathlib import Path
path_data = Path("data")
path_mnist = path_data / "mnist"
path_mnist.mkdir(parents=True, exist_ok=True) # create missing parent directories; if the directory already exists, do nothing and raise no error

import requests
url = "http://deeplearning.net/data/mnist/"
filename = "mnist.pkl.gz"
if not (path_mnist / filename).exists():
    content = requests.get(url + filename).content
    (path_mnist / filename).open("wb").write(content)


1. Read the archive with gzip.open.
2. Deserialize it with pickle.load().
import pickle
import gzip

with gzip.open((path_mnist / filename).as_posix(), "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")


from matplotlib import pyplot
import numpy as np

pyplot.imshow(x_train[0].reshape((28, 28)), cmap='gray')
print(x_train.shape)

(50000, 784)


import torch
x_train, y_train, x_valid, y_valid = map(
torch.tensor, (x_train, y_train, x_valid, y_valid)
)
print(x_train.shape, y_train.shape)
print(x_valid.shape, y_valid.shape)

torch.Size([50000, 784]) torch.Size([50000])
torch.Size([10000, 784]) torch.Size([10000])


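6.2. Implementing a Neural Network in PyTorch (without torch.nn)

import math

weights = torch.randn(784, 10)/ math.sqrt(784)
weights.requires_grad_() # enable gradient tracking after initialization, so the scaling step is not recorded
bias = torch.zeros(10, requires_grad=True)


def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias) # @ stands for matrix multiplication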


bs = 64
xb = x_train[0:bs]
preds = model(xb)
print(preds[0], preds.shape)

tensor([-2.9072, -2.0663, -3.1907, -2.5540, -1.8701, -1.6411, -2.5566, -2.3751,


def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

loss_func = nll

# e.g.
yb = y_train[0:bs]
print(loss_func(preds, yb))

tensor(2.3465, grad_fn=<NegBackward>)


def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

print(accuracy(preds, yb))

tensor(0.1406)


from IPython.core.debugger import set_trace

lr = 0.5
epochs = 2

n = x_train.shape[0]

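for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i: end_i]
        yb = y_train[start_i: end_i]

        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        with torch.no_grad(): # the updates themselves must not be tracked by autograd
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_() # reset, otherwise gradients accumulate across batches
            bias.grad.zero_()

# Test
pred = model(xb)
print(loss_func(pred, yb), accuracy(pred, yb))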

tensor(0.0852, grad_fn=<NegBackward>) tensor(1.)


6.3. Using the Functions in torch.nn.functional

import torch.nn.functional as F

loss_func = F.cross_entropy

def model(xb):
    return xb @ weights + bias

pred = model(xb)
print(loss_func(pred, yb), accuracy(pred, yb))

tensor(0.0852, grad_fn=<NllLossBackward>) tensor(1.)
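
F.cross_entropy combines log-softmax with the negative log-likelihood loss, which is why the model above no longer applies log_softmax itself. A quick numerical check, using random tensors as assumed stand-ins for the logits and labels:

```python
import torch
import torch.nn.functional as F

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

torch.manual_seed(0)
logits = torch.randn(64, 10)             # stand-in for xb @ weights + bias
targets = torch.randint(0, 10, (64,))    # stand-in for yb

manual = nll(log_softmax(logits), targets)
builtin = F.cross_entropy(logits, targets)
print(torch.allclose(manual, builtin, atol=1e-6))
```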


6.4. Using the nn.Module Class

PyTorch provides the nn.Module class for building neural networks. We only need to subclass it to define our own network class.

from torch import nn

class Mnist_Logistic(nn.Module):

    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10)/ math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias

model = Mnist_Logistic()


pred = model(xb)
print(loss_func(pred, yb))

tensor(2.2320, grad_fn=<NllLossBackward>)


# Update pattern (applied after loss.backward()):
with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
    model.zero_grad()


def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs
            end_i = start_i + bs
            xb = x_train[start_i: end_i]
            yb = y_train[start_i: end_i]

            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

fit()

pred = model(xb)
print(loss_func(pred, yb))

tensor(0.0805, grad_fn=<NllLossBackward>)
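
Gradients must be reset between steps because backward() adds to the existing .grad instead of overwriting it. A minimal sketch on a toy tensor (not the MNIST model):

```python
import torch

w = torch.ones(3, requires_grad=True)

(2 * w).sum().backward()
first = w.grad.clone()        # gradient of sum(2w) with respect to w is 2 everywhere

(2 * w).sum().backward()      # second backward without resetting
accumulated = w.grad.clone()  # gradients were added: now 4 everywhere

w.grad.zero_()                # reset, which is the job of zero_grad()
(2 * w).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```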


6.5. Simplifying the Model Definition with nn.Linear

class Mnist_Logistic(nn.Module):

    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

model = Mnist_Logistic()

fit()

pred = model(xb)
print(loss_func(pred, yb))

tensor(0.0811, grad_fn=<NllLossBackward>)


6.6. Simplifying the Optimization Step with optim

from torch import optim

model = Mnist_Logistic()
opt = optim.SGD(model.parameters(), lr=lr)

pred = model(xb)
print(loss_func(pred, yb))

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]

        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()

        opt.step() # replace the loop for updating parameters
        opt.zero_grad() # reset gradients for the next batch

pred = model(xb)
print(loss_func(pred, yb))

tensor(2.3233, grad_fn=<NllLossBackward>)


6.7. Simplifying Data Access with Dataset

PyTorch provides the TensorDataset class. It wraps our tensors into a single dataset that can be iterated over and sliced along its first dimension.
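
A short sketch of this behavior, using small random tensors as assumed stand-ins for x_train and y_train:

```python
import torch
from torch.utils.data import TensorDataset

# Small stand-ins for x_train and y_train (10 samples of 784 features).
x = torch.randn(10, 784)
y = torch.arange(10)

ds = TensorDataset(x, y)
print(len(ds))                  # 10

xi, yi = ds[0]                  # a single (input, label) pair
xb, yb = ds[2:6]                # slicing returns a tuple of sliced tensors
print(xi.shape, yi.item())
print(xb.shape, yb.tolist())
```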

from torch.utils.data import TensorDataset

train_ds = TensorDataset(x_train, y_train)


model = Mnist_Logistic()
opt = optim.SGD(model.parameters(), lr=lr)

pred = model(xb)
print(loss_func(pred, yb))

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb, yb = train_ds[start_i:end_i]

        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()

        opt.step() # replace the loop for updating parameters
        opt.zero_grad() # reset gradients for the next batch

pred = model(xb)
print(loss_func(pred, yb))

tensor(2.3649, grad_fn=<NllLossBackward>)


6.8. Managing Batches with DataLoader

from torch.utils.data import DataLoader

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)

model = Mnist_Logistic()
opt = optim.SGD(model.parameters(), lr=lr)

pred = model(xb)
print(loss_func(pred, yb))

for epoch in range(epochs):
    for xb, yb in train_dl:

        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()

        opt.step() # replace the loop for updating parameters
        opt.zero_grad() # reset gradients for the next batch

pred = model(xb)
print(loss_func(pred, yb))

tensor(2.2478, grad_fn=<NllLossBackward>)
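
The DataLoader takes over the index arithmetic that the earlier loops did by hand. The sketch below, on an assumed toy dataset of 150 samples, shows that it yields the same number of batches as (n - 1) // bs + 1 and that the last batch may be smaller.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

n, bs = 150, 64
ds = TensorDataset(torch.randn(n, 784), torch.randint(0, 10, (n,)))
dl = DataLoader(ds, batch_size=bs)

batch_sizes = [xb.shape[0] for xb, yb in dl]
print(batch_sizes)                               # [64, 64, 22]
print(len(dl) == math.ceil(n / bs))              # same count as (n - 1) // bs + 1
```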


6.9. Adding a Validation Set and Shuffling the Training Set

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

valid_ds = TensorDataset(x_valid, y_valid)


model = Mnist_Logistic()
opt = optim.SGD(model.parameters(), lr=lr)

pred = model(xb)
print(loss_func(pred, yb))

for epoch in range(epochs):

    model.train()
    for xb, yb in train_dl:

        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()

        opt.step() # replace the loop for updating parameters
        opt.zero_grad() # reset gradients for the next batch

    model.eval()
    with torch.no_grad(): # no gradients are needed for validation
        xb, yb = valid_ds[:]
        valid_pred = model(xb)
        valid_loss = loss_func(valid_pred, yb)

    print(epoch, valid_loss)

tensor(2.3226, grad_fn=<NllLossBackward>)
0 tensor(0.5259)
1 tensor(0.2907)


6.10. Simplifying to Three Statements

1. Get the Dataset and DataLoader.
2. Build the model and the optimizer opt.
3. Train the model.

import numpy as np

def get_data(x_train, y_train, bs):
    # Returns training dataloader and a validation dataset.
    train_ds = TensorDataset(x_train, y_train)
    train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

    valid_ds = TensorDataset(x_valid, y_valid) # x_valid and y_valid are the globals loaded in Section 6.1

    return train_dl, valid_ds

class Mnist_Logistic(nn.Module):
    # Defines our model.
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

def get_model():
    # Returns a model and optimizer.
    model = Mnist_Logistic()
    opt = optim.SGD(model.parameters(), lr=lr)
    return model, opt

loss_func = F.cross_entropy

def fit(epochs, model, loss_func, opt, train_dl, valid_ds):
    # Defines the training process
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            pred = model(xb)
            loss = loss_func(pred, yb)
            loss.backward()
            opt.step() # replace the loop for updating parameters
            opt.zero_grad() # reset gradients for the next batch

        model.eval()
        with torch.no_grad(): # no gradients are needed for validation
            xb, yb = valid_ds[:]
            valid_pred = model(xb)
            valid_loss = loss_func(valid_pred, yb)

        print(epoch, valid_loss)


train_dl, valid_ds = get_data(x_train, y_train, bs)

model, opt = get_model()

fit(epochs, model, loss_func, opt, train_dl, valid_ds)

0 tensor(0.3271)
1 tensor(0.2750)


6.11. Upgrading to a CNN

class Mnist_CNN(nn.Module):
    # Defines a 3-layer CNN.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28) # Batch_size * 1 * 28 * 28
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        xb = xb.view(-1, xb.size(1))
        return xb

def get_model():
    # Returns a model and optimizer.
    model = Mnist_CNN()
    opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    return model, opt


model, opt = get_model()

fit(epochs, model, loss_func, opt, train_dl, valid_ds)

0 tensor(0.3839)
1 tensor(0.2570)


6.12. Building Networks More Simply: nn.Sequential

class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)

def preprocess(x):
    return x.view(-1, 1, 28, 28)


lambda x: x.view(-1, 1, 28, 28)

<function __main__.<lambda>(x)>


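model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)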

opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

fit(epochs, model, loss_func, opt, train_dl, valid_ds)

0 tensor(0.3162)
1 tensor(0.2278)


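6.13. Data Preprocessing

1. The input must be a $28 \times 28$ vector or image, because the preprocess step hard-codes this size.
2. The feature map before the final pooling must be $4 \times 4$, because the pooling kernel size is fixed at 4.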
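
Both assumptions follow from the layer arithmetic. The sketch below traces tensor shapes through the three convolutions and the pooling layer on a random stand-in batch; the ReLU layers are omitted since they do not change shapes.

```python
import torch
from torch import nn

layers = [
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.AvgPool2d(4),
]

xb = torch.randn(64, 784).view(-1, 1, 28, 28)   # a batch of flattened 28x28 images
shapes = [tuple(xb.shape)]
with torch.no_grad():
    for layer in layers:
        xb = layer(xb)
        shapes.append(tuple(xb.shape))

for s in shapes:
    print(s)
# (64, 1, 28, 28) -> (64, 16, 14, 14) -> (64, 16, 7, 7) -> (64, 10, 4, 4) -> (64, 10, 1, 1)
```

Each stride-2 convolution roughly halves the spatial size (28, 14, 7, 4), so a 4x4 average pool reduces the final map to a single value per class.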

6.14. Using the GPU!

print(torch.cuda.is_available())

True


dev = torch.device("cuda:1") if torch.cuda.is_available() else torch.device("cpu")


def get_data(x_train, y_train, bs):
    # Returns training dataloader and a validation dataset.
    train_ds = TensorDataset(x_train, y_train)
    train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

    valid_ds = TensorDataset(x_valid, y_valid) # x_valid and y_valid are the globals loaded in Section 6.1

    return train_dl, valid_ds

class Mnist_Logistic(nn.Module):
    # Defines our model.
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

def get_model():
    # Returns a model and optimizer.
    model = Mnist_Logistic()
    model.to(dev) # move the parameters to the device before creating the optimizer
    opt = optim.SGD(model.parameters(), lr=lr)
    return model, opt

loss_func = F.cross_entropy

def fit(epochs, model, loss_func, opt, train_dl, valid_ds):
    # Defines the training process
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            xb, yb = xb.to(dev), yb.to(dev)
            pred = model(xb)
            loss = loss_func(pred, yb)
            loss.backward()
            opt.step() # replace the loop for updating parameters
            opt.zero_grad() # reset gradients for the next batch

        model.eval()
        with torch.no_grad(): # no gradients are needed for validation
            xb, yb = valid_ds[:]
            xb, yb = xb.to(dev), yb.to(dev)
            valid_pred = model(xb)
            valid_loss = loss_func(valid_pred, yb)

        print(epoch, valid_loss)

train_dl, valid_ds = get_data(x_train, y_train, bs)

model, opt = get_model()

fit(epochs, model, loss_func, opt, train_dl, valid_ds)

0 tensor(0.4048, device='cuda:1')
1 tensor(0.3062, device='cuda:1')
posted @ 2019-05-05 17:15  RyanXing