Learning PyTorch Tensors

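Tensors are a data structure similar to arrays and matrices. They resemble NumPy's ndarray, except that tensors can also run on GPUs. In fact, tensors and NumPy arrays can often share the same underlying memory, which eliminates the need to copy data. Tensors are also optimized for automatic differentiation.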

import torch
import numpy as np

Initializing a Tensor

  • Directly from data
data=[[1,2],[3,4]]
x_data=torch.tensor(data)
x_data

tensor([[1, 2],
[3, 4]])

  • From a NumPy array
np_array=np.array(data)
x_np=torch.tensor(np_array)
x_np

tensor([[1, 2],
[3, 4]], dtype=torch.int32)

x_np=torch.from_numpy(np_array)
x_np

tensor([[1, 2],
[3, 4]], dtype=torch.int32)

  • From another tensor

The new tensor retains the properties (shape, datatype) of the argument tensor unless they are explicitly overridden:

x_ones=torch.ones_like(x_data);x_ones

tensor([[1, 1],
[1, 1]])

x_rand=torch.rand_like(x_data,dtype=torch.float);x_rand

tensor([[0.1462, 0.1567],
[0.6331, 0.8472]])
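
As a small sketch of the rule above, the following checks which properties the `*_like` constructors inherit. The variable names mirror the cells above; `x_zeros64` is a hypothetical name added only for illustration.

```python
import torch

x_data = torch.tensor([[1, 2], [3, 4]])                 # integer tensor

x_ones = torch.ones_like(x_data)                        # inherits shape and dtype
x_rand = torch.rand_like(x_data, dtype=torch.float)     # dtype explicitly overridden
x_zeros64 = torch.zeros_like(x_data, dtype=torch.float64)  # hypothetical extra case

print(x_ones.shape == x_data.shape, x_ones.dtype == x_data.dtype)
print(x_rand.dtype, x_zeros64.dtype)
```

The override is needed for rand_like here because uniform samples in [0, 1) are floating-point values, while x_data holds integers.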

  • With random or constant values

shape is a tuple of the tensor's dimensions.

shape=(2,3)
rand_tensor=torch.rand(shape)
ones_tensor=torch.ones(shape)
zeros_tensor=torch.zeros(shape)
print(rand_tensor)
print(ones_tensor)
print(zeros_tensor)

tensor([[0.4811, 0.5744, 0.8909],
[0.6602, 0.9882, 0.1145]])
tensor([[1., 1., 1.],
[1., 1., 1.]])
tensor([[0., 0., 0.],
[0., 0., 0.]])

Tensor Attributes

A tensor's attributes describe its shape, its datatype, and the device on which it is stored.

tensor=torch.rand(3,4)
tensor.shape

torch.Size([3, 4])

tensor.dtype

torch.float32

tensor.device

device(type='cpu')

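Tensor Operations

There are over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more. Each of them can be run on the GPU (often faster than on the CPU).

By default, tensors are created on the CPU. We need to move them to the GPU explicitly with the .to method. Keep in mind that copying large tensors across devices can be expensive in both time and memory.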

if torch.cuda.is_available():
    tensor=tensor.to('cuda')
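
A common way to write the same idea so that it runs with or without a GPU is sketched below. The names `device`, `t_dev` and `t_direct` are illustrative and not part of the original notebook.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.rand(3, 4)                          # created on the CPU by default
t_dev = t.to(device)                          # moved (copied) only if the device differs
t_direct = torch.rand(3, 4, device=device)    # created on the target device directly

print(t_dev.device, t_direct.device)
```

Creating a tensor directly on the target device avoids the extra host-to-device copy mentioned above.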

NumPy-like indexing and slicing:

tensor=torch.ones((4,4));tensor

tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])

tensor[0]

tensor([1., 1., 1., 1.])

tensor[:,0]

tensor([1., 1., 1., 1.])

tensor[...,-1]=100;tensor

tensor([[ 1., 1., 1., 100.],
[ 1., 1., 1., 100.],
[ 1., 1., 1., 100.],
[ 1., 1., 1., 100.]])

tensor[:,1]=10;tensor

tensor([[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.]])

Besides ordinary indexing, PyTorch also provides some more advanced selection functions:

help(torch.index_select)

Help on built-in function index_select:

index_select(...)
index_select(input, dim, index, *, out=None) -> Tensor

Returns a new tensor which indexes the :attr:input tensor along dimension
:attr:dim using the entries in :attr:index which is a LongTensor.

The returned tensor has the same number of dimensions as the original tensor
(:attr:input). The :attr:dim\ th dimension has the same size as the length
of :attr:index; other dimensions have the same size as in the original tensor.

.. note:: The returned tensor does not use the same storage as the original
tensor. If :attr:out has a different shape than expected, we
silently change it to the correct shape, reallocating the underlying
storage if necessary.

Args:
input (Tensor): the input tensor.
dim (int): the dimension in which we index
index (IntTensor or LongTensor): the 1-D tensor containing the indices to index

Keyword args:
out (Tensor, optional): the output tensor.

Example::

>>> x = torch.randn(3, 4)
>>> x
tensor([[ 0.1427, 0.0231, -0.5414, -1.0009],
[-0.4664, 0.2647, -0.1228, -1.1068],
[-1.1734, -0.6571, 0.7230, -0.6004]])
>>> indices = torch.tensor([0, 2])
>>> torch.index_select(x, 0, indices)
tensor([[ 0.1427, 0.0231, -0.5414, -1.0009],
[-1.1734, -0.6571, 0.7230, -0.6004]])
>>> torch.index_select(x, 1, indices)
tensor([[ 0.1427, -0.5414],
[-0.4664, -0.1228],
[-1.1734, 0.7230]])

help(torch.masked_select)

Help on built-in function masked_select:

masked_select(...)
masked_select(input, mask, *, out=None) -> Tensor

Returns a new 1-D tensor which indexes the :attr:input tensor according to
the boolean mask :attr:mask which is a BoolTensor.

The shapes of the :attr:mask tensor and the :attr:input tensor don't need
to match, but they must be :ref:broadcastable <broadcasting-semantics>.

.. note:: The returned tensor does not use the same storage
as the original tensor

Args:
input (Tensor): the input tensor.
mask (BoolTensor): the tensor containing the binary mask to index with

Keyword args:
out (Tensor, optional): the output tensor.

Example::

>>> x = torch.randn(3, 4)
>>> x
tensor([[ 0.3552, -2.3825, -0.8297, 0.3477],
[-1.2035, 1.2252, 0.5002, 0.6248],
[ 0.1307, -2.0608, 0.1244, 2.0139]])
>>> mask = x.ge(0.5)
>>> mask
tensor([[False, False, False, False],
[False, True, True, True],
[False, False, False, True]])
>>> torch.masked_select(x, mask)
tensor([ 1.2252, 0.5002, 0.6248, 2.0139])

help(torch.gather)

Help on built-in function gather:

gather(...)
gather(input, dim, index, *, sparse_grad=False, out=None) -> Tensor

Gathers values along an axis specified by dim.

For a 3-D tensor the output is specified by::

out[i][j][k] = input[index[i][j][k]][j][k] # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k] # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]] # if dim == 2

:attr:input and :attr:index must have the same number of dimensions.
It is also required that index.size(d) <= input.size(d) for all
dimensions d != dim. :attr:out will have the same shape as :attr:index.
Note that input and index do not broadcast against each other.

Args:
input (Tensor): the source tensor
dim (int): the axis along which to index
index (LongTensor): the indices of elements to gather

Keyword arguments:
sparse_grad (bool, optional): If True, gradient w.r.t. :attr:input will be a sparse tensor.
out (Tensor, optional): the destination tensor

Example::

>>> t = torch.tensor([[1, 2], [3, 4]])
>>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
tensor([[ 1, 1],
[ 4, 3]])
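
To complement the help text, here is a minimal side-by-side sketch of the three selection functions on one small tensor. The tensor and index values are made up for illustration.

```python
import torch

x = torch.arange(12).reshape(3, 4)
# x = [[0, 1, 2, 3],
#      [4, 5, 6, 7],
#      [8, 9, 10, 11]]

# index_select: pick whole rows 0 and 2 along dim 0.
rows = torch.index_select(x, 0, torch.tensor([0, 2]))

# masked_select: keep the elements where the mask is True; the result is always 1-D.
big = torch.masked_select(x, x > 6)

# gather: one column index per row, out[i][0] = x[i][idx[i][0]].
idx = torch.tensor([[0], [1], [3]])
picked = torch.gather(x, 1, idx)

print(rows)
print(big)
print(picked)
```

In short, index_select takes the same indices for every row or column, masked_select filters by a condition, and gather lets each output position choose its own index.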

Use torch.cat to concatenate tensors along a given dimension. There is also torch.stack, which behaves slightly differently from torch.cat.

t1=torch.cat([tensor,tensor,tensor],dim=1);t1

tensor([[ 1., 10., 1., 100., 1., 10., 1., 100., 1., 10., 1., 100.],
[ 1., 10., 1., 100., 1., 10., 1., 100., 1., 10., 1., 100.],
[ 1., 10., 1., 100., 1., 10., 1., 100., 1., 10., 1., 100.],
[ 1., 10., 1., 100., 1., 10., 1., 100., 1., 10., 1., 100.]])

torch.cat([tensor,tensor,tensor],dim=0)

tensor([[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.],
[ 1., 10., 1., 100.]])

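The difference between cat and stack is that the former increases the size of an existing dimension (think of it as appending), while the latter adds a new dimension (think of it as stacking).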

a=torch.arange(0,12).reshape(3,4)
a

tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])

torch.cat([a,a]).shape

torch.Size([6, 4])

torch.stack([a,a]).shape

torch.Size([2, 3, 4])

torch.cat([a,a])

tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])

torch.stack([a,a])

tensor([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],

[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
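
The effect of the dim argument on both functions can be checked from the resulting shapes. This is a small sketch; the last lines illustrate that stack behaves like unsqueeze followed by cat.

```python
import torch

a = torch.arange(0, 12).reshape(3, 4)

cat1 = torch.cat([a, a], dim=1)        # existing dim 1 grows: (3, 8)
stack1 = torch.stack([a, a], dim=1)    # new dim inserted at position 1: (3, 2, 4)
stack2 = torch.stack([a, a], dim=2)    # new dim inserted at position 2: (3, 4, 2)

# stack along dim 0 gives the same result as adding a leading axis and then concatenating.
via_cat = torch.cat([a.unsqueeze(0), a.unsqueeze(0)], dim=0)
same = torch.equal(torch.stack([a, a], dim=0), via_cat)

print(cat1.shape, stack1.shape, stack2.shape, same)
```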

  • Arithmetic operations
tensor=torch.arange(0,9).reshape(3,3);tensor

tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])

The following computes the matrix multiplication between two tensors; y1 and y2 have the same value.

y1=tensor@tensor.T
y1

tensor([[ 5, 14, 23],
[ 14, 50, 86],
[ 23, 86, 149]])

y2=tensor.matmul(tensor.T)
y2

tensor([[ 5, 14, 23],
[ 14, 50, 86],
[ 23, 86, 149]])

y3=torch.empty(3,3)
torch.add(tensor,tensor.T,out=y3)
print(y3)

tensor([[ 0., 4., 8.],
[ 4., 8., 12.],
[ 8., 12., 16.]])
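
Matrix multiplication (@, matmul) is easy to confuse with the element-wise product (*, mul). The sketch below contrasts the two on the same 3x3 tensor used above; the names `t`, `mat` and `elem` are illustrative.

```python
import torch

t = torch.arange(0, 9).reshape(3, 3)

mat = t @ t.T            # matrix product: rows of t dotted with rows of t
elem = t * t.T           # element-wise product: elem[i][j] = t[i][j] * t[j][i]

print(mat)
print(elem)
print(torch.equal(elem, torch.mul(t, t.T)))   # * and torch.mul agree
```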

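Single-element tensors: if you have a one-element tensor, for example from aggregating all values of a tensor into one value, you can convert it to a Python number with item().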

agg=tensor.sum();agg

tensor(36)

agg_item=agg.item();agg_item

36

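In-place operations: operations that store the result in the operand are called in-place operations and are marked with a _ suffix. For example, x.copy_(y) and x.t_() will change x.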

tensor

tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])

tensor.add_(5)

tensor([[ 5, 6, 7],
[ 8, 9, 10],
[11, 12, 13]])

tensor

tensor([[ 5, 6, 7],
[ 8, 9, 10],
[11, 12, 13]])

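In-place operations can save some memory, but they may cause errors when computing derivatives, because they can overwrite values that the backward pass still needs. Their use is therefore discouraged.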
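
A minimal sketch of how an in-place operation can break differentiation is shown below. It assumes that exp keeps its output for the backward pass, which is how autograd computes the derivative of exp; the variable names are illustrative.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()        # exp saves its output, since d/dx exp(x) = exp(x)
y.add_(1)          # in-place update overwrites that saved output

try:
    y.sum().backward()
    failed = False
except RuntimeError as err:
    failed = True
    print("backward failed:", str(err)[:70])

# The out-of-place version leaves the saved output untouched and works.
x2 = torch.ones(3, requires_grad=True)
y2 = x2.exp() + 1
y2.sum().backward()
print(x2.grad)     # exp(1) for every element
```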

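Converting to and from NumPy Arrays

Use numpy() and from_numpy() to convert between tensors and NumPy arrays. Note that for CPU tensors, the tensor and the NumPy array produced by these two functions share the same memory (which is why the conversion is fast), so changing one also changes the other!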

Tensor to Numpy array

t=torch.ones(5)
t

tensor([1., 1., 1., 1., 1.])

n=t.numpy();n

array([ 1., 1., 1., 1., 1.], dtype=float32)

t.add_(1)

tensor([2., 2., 2., 2., 2.])

Numpy array to Tensor

n=np.ones(5)
t=torch.from_numpy(n)
t

tensor([1., 1., 1., 1., 1.], dtype=torch.float64)

np.add(n,1,out=n)

array([ 2., 2., 2., 2., 2.])

t

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

n

array([ 2., 2., 2., 2., 2.])

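Besides the methods above, another common approach is to call torch.tensor() directly to convert a NumPy array to a tensor. Note that this method always copies the data, so the returned tensor no longer shares memory with the original array.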

a=np.arange(9).reshape(3,3)
c=torch.tensor(a)
a+=1
print(c)
print(a)

tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]], dtype=torch.int32)
[[1 2 3]
[4 5 6]
[7 8 9]]
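
The two conversion paths can be compared directly. In this sketch, `shared` and `copied` are hypothetical names for the results of from_numpy and torch.tensor.

```python
import numpy as np
import torch

a = np.arange(4)

shared = torch.from_numpy(a)   # shares a's buffer
copied = torch.tensor(a)       # independent copy of the data

a += 10                        # in-place change to the NumPy array

print(shared)                  # follows the change
print(copied)                  # keeps the original values
```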

View()

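Use view() to change the shape of a tensor. The new tensor returned by this method shares memory with the source tensor (it is a different view of the same data), so changing one also changes the other. reshape serves the same purpose, and it does not guarantee a copy either: it returns a view when possible and copies only when it has to.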

x=torch.randn(5,3);x

tensor([[-0.5722, -0.4844, 1.5515],
[-0.2504, 0.2010, 0.0182],
[ 0.0400, 0.0397, 2.0167],
[ 1.8868, -0.4670, 0.5968],
[ 0.9070, 0.5825, -1.0549]])

y=x.view(15);y

tensor([-0.5722, -0.4844, 1.5515, -0.2504, 0.2010, 0.0182, 0.0400, 0.0397,
2.0167, 1.8868, -0.4670, 0.5968, 0.9070, 0.5825, -1.0549])

y[0]=100
x

tensor([[ 1.0000e+02, -4.8445e-01, 1.5515e+00],
[-2.5042e-01, 2.0102e-01, 1.8231e-02],
[ 3.9969e-02, 3.9711e-02, 2.0167e+00],
[ 1.8868e+00, -4.6697e-01, 5.9683e-01],
[ 9.0702e-01, 5.8254e-01, -1.0549e+00]])

z=x.view(-1,5);z

tensor([[ 1.0000e+02, -4.8445e-01, 1.5515e+00, -2.5042e-01, 2.0102e-01],
[ 1.8231e-02, 3.9969e-02, 3.9711e-02, 2.0167e+00, 1.8868e+00],
[-4.6697e-01, 5.9683e-01, 9.0702e-01, 5.8254e-01, -1.0549e+00]])

q=x.reshape(15);q

tensor([ 1.0000e+02, -4.8445e-01, 1.5515e+00, -2.5042e-01, 2.0102e-01,
1.8231e-02, 3.9969e-02, 3.9711e-02, 2.0167e+00, 1.8868e+00,
-4.6697e-01, 5.9683e-01, 9.0702e-01, 5.8254e-01, -1.0549e+00])

q[0]=250;x

tensor([[ 2.5000e+02, -4.8445e-01, 1.5515e+00],
[-2.5042e-01, 2.0102e-01, 1.8231e-02],
[ 3.9969e-02, 3.9711e-02, 2.0167e+00],
[ 1.8868e+00, -4.6697e-01, 5.9683e-01],
[ 9.0702e-01, 5.8254e-01, -1.0549e+00]])

If we want a truly new copy (one that does not share memory), we can first create a copy with clone and then call view:

x_cp=x.clone().view(15)
x-=1
print(x)
print(x_cp)

tensor([[ 2.4900e+02, -1.4844e+00, 5.5149e-01],
[-1.2504e+00, -7.9898e-01, -9.8177e-01],
[-9.6003e-01, -9.6029e-01, 1.0167e+00],
[ 8.8677e-01, -1.4670e+00, -4.0317e-01],
[-9.2979e-02, -4.1746e-01, -2.0549e+00]])
tensor([ 2.5000e+02, -4.8445e-01, 1.5515e+00, -2.5042e-01, 2.0102e-01,
1.8231e-02, 3.9969e-02, 3.9711e-02, 2.0167e+00, 1.8868e+00,
-4.6697e-01, 5.9683e-01, 9.0702e-01, 5.8254e-01, -1.0549e+00])
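
Whether two tensors share memory can be checked with data_ptr(), which returns the address of a tensor's first element. The sketch below uses a transposed (non-contiguous) tensor to show a case where view refuses to work and reshape falls back to a copy; the names are illustrative.

```python
import torch

x = torch.arange(6).reshape(2, 3)

v = x.view(6)
print(v.data_ptr() == x.data_ptr())      # True: view shares x's memory

xt = x.t()                               # transpose: a non-contiguous view of x
try:
    xt.view(6)                           # view needs a compatible memory layout
    view_ok = True
except RuntimeError:
    view_ok = False

r = xt.reshape(6)                        # reshape copies when a view is impossible
print(view_ok, r.data_ptr() == x.data_ptr())

c = x.clone().view(6)                    # clone always gives independent memory
print(c.data_ptr() == x.data_ptr())
```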

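Another benefit of clone is that it is recorded in the computation graph, so gradients that flow back to the copy also propagate to the source tensor.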
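
The gradient flow through clone can be verified with a short sketch. The detach().clone() line is added here only as a contrast and is not part of the original text.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

y = x.clone()                 # recorded in the graph as an operation on x
(y * y).sum().backward()
print(x.grad)                 # gradient reaches x through the clone: 2 * x

d = x.detach().clone()        # a copy that is cut off from the graph
print(d.requires_grad)        # False
```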
Another commonly used function is item(), which converts a single-element tensor to a Python number:

x=torch.randn(1);x

tensor([-0.9871])

x.item()

-0.9870905876159668

Linear Algebra

  • Trace: torch.trace
help(torch.trace)

Help on built-in function trace:

trace(...)
trace(input) -> Tensor

Returns the sum of the elements of the diagonal of the input 2-D matrix.

Example::

>>> x = torch.arange(1., 10.).view(3, 3)
>>> x
tensor([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.]])
>>> torch.trace(x)
tensor(15.)

  • Diagonal elements: torch.diag
help(torch.diag)

Help on built-in function diag:

diag(...)
diag(input, diagonal=0, *, out=None) -> Tensor

- If :attr:input is a vector (1-D tensor), then returns a 2-D square tensor
with the elements of :attr:input as the diagonal.
- If :attr:input is a matrix (2-D tensor), then returns a 1-D tensor with
the diagonal elements of :attr:input.

The argument :attr:diagonal controls which diagonal to consider:

- If :attr:diagonal = 0, it is the main diagonal.
- If :attr:diagonal > 0, it is above the main diagonal.
- If :attr:diagonal < 0, it is below the main diagonal.

Args:
input (Tensor): the input tensor.
diagonal (int, optional): the diagonal to consider

Keyword args:
out (Tensor, optional): the output tensor.

.. seealso::

:func:torch.diagonal always returns the diagonal of its input.

:func:torch.diagflat always constructs a tensor with diagonal elements
specified by the input.

Examples:

Get the square matrix where the input vector is the diagonal::

>>> a = torch.randn(3)
>>> a
tensor([ 0.5950,-0.0872, 2.3298])
>>> torch.diag(a)
tensor([[ 0.5950, 0.0000, 0.0000],
[ 0.0000,-0.0872, 0.0000],
[ 0.0000, 0.0000, 2.3298]])
>>> torch.diag(a, 1)
tensor([[ 0.0000, 0.5950, 0.0000, 0.0000],
[ 0.0000, 0.0000,-0.0872, 0.0000],
[ 0.0000, 0.0000, 0.0000, 2.3298],
[ 0.0000, 0.0000, 0.0000, 0.0000]])

Get the k-th diagonal of a given matrix::

>>> a = torch.randn(3, 3)
>>> a
tensor([[-0.4264, 0.0255,-0.1064],
[ 0.8795,-0.2429, 0.1374],
[ 0.1029,-0.6482,-1.6300]])
>>> torch.diag(a, 0)
tensor([-0.4264,-0.2429,-1.6300])
>>> torch.diag(a, 1)
tensor([ 0.0255, 0.1374])

  • triu: upper triangular part
help(torch.triu)

Help on built-in function triu:

triu(...)
triu(input, diagonal=0, *, out=None) -> Tensor

Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices
:attr:input, the other elements of the result tensor :attr:out are set to 0.

The upper triangular part of the matrix is defined as the elements on and
above the diagonal.

The argument :attr:diagonal controls which diagonal to consider. If
:attr:diagonal = 0, all elements on and above the main diagonal are
retained. A positive value excludes just as many diagonals above the main
diagonal, and similarly a negative value includes just as many diagonals below
the main diagonal. The main diagonal are the set of indices
:math:\lbrace (i, i) \rbrace for :math:i \in [0, \min\{d_{1}, d_{2}\} - 1] where
:math:d_{1}, d_{2} are the dimensions of the matrix.

Args:
input (Tensor): the input tensor.
diagonal (int, optional): the diagonal to consider

Keyword args:
out (Tensor, optional): the output tensor.

Example::

>>> a = torch.randn(3, 3)
>>> a
tensor([[ 0.2309, 0.5207, 2.0049],
[ 0.2072, -1.0680, 0.6602],
[ 0.3480, -0.5211, -0.4573]])
>>> torch.triu(a)
tensor([[ 0.2309, 0.5207, 2.0049],
[ 0.0000, -1.0680, 0.6602],
[ 0.0000, 0.0000, -0.4573]])
>>> torch.triu(a, diagonal=1)
tensor([[ 0.0000, 0.5207, 2.0049],
[ 0.0000, 0.0000, 0.6602],
[ 0.0000, 0.0000, 0.0000]])
>>> torch.triu(a, diagonal=-1)
tensor([[ 0.2309, 0.5207, 2.0049],
[ 0.2072, -1.0680, 0.6602],
[ 0.0000, -0.5211, -0.4573]])

>>> b = torch.randn(4, 6)
>>> b
tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235],
[-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857],
[ 0.4333, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410],
[-0.9888, 1.0679, -1.3337, -1.6556, 0.4798, 0.2830]])
>>> torch.triu(b, diagonal=1)
tensor([[ 0.0000, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235],
[ 0.0000, 0.0000, -1.2919, 1.3378, -0.1768, -1.0857],
[ 0.0000, 0.0000, 0.0000, -1.0432, 0.9348, -0.4410],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.4798, 0.2830]])
>>> torch.triu(b, diagonal=-1)
tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235],
[-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857],
[ 0.0000, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410],
[ 0.0000, 0.0000, -1.3337, -1.6556, 0.4798, 0.2830]])

  • tril: lower triangular part
help(torch.tril)

Help on built-in function tril:

tril(...)
tril(input, diagonal=0, *, out=None) -> Tensor

Returns the lower triangular part of the matrix (2-D tensor) or batch of matrices
:attr:input, the other elements of the result tensor :attr:out are set to 0.

The lower triangular part of the matrix is defined as the elements on and
below the diagonal.

The argument :attr:diagonal controls which diagonal to consider. If
:attr:diagonal = 0, all elements on and below the main diagonal are
retained. A positive value includes just as many diagonals above the main
diagonal, and similarly a negative value excludes just as many diagonals below
the main diagonal. The main diagonal are the set of indices
:math:\lbrace (i, i) \rbrace for :math:i \in [0, \min\{d_{1}, d_{2}\} - 1] where
:math:d_{1}, d_{2} are the dimensions of the matrix.

Args:
input (Tensor): the input tensor.
diagonal (int, optional): the diagonal to consider

Keyword args:
out (Tensor, optional): the output tensor.

Example::

>>> a = torch.randn(3, 3)
>>> a
tensor([[-1.0813, -0.8619, 0.7105],
[ 0.0935, 0.1380, 2.2112],
[-0.3409, -0.9828, 0.0289]])
>>> torch.tril(a)
tensor([[-1.0813, 0.0000, 0.0000],
[ 0.0935, 0.1380, 0.0000],
[-0.3409, -0.9828, 0.0289]])

>>> b = torch.randn(4, 6)
>>> b
tensor([[ 1.2219, 0.5653, -0.2521, -0.2345, 1.2544, 0.3461],
[ 0.4785, -0.4477, 0.6049, 0.6368, 0.8775, 0.7145],
[ 1.1502, 3.2716, -1.1243, -0.5413, 0.3615, 0.6864],
[-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0978]])
>>> torch.tril(b, diagonal=1)
tensor([[ 1.2219, 0.5653, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.4785, -0.4477, 0.6049, 0.0000, 0.0000, 0.0000],
[ 1.1502, 3.2716, -1.1243, -0.5413, 0.0000, 0.0000],
[-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0000]])
>>> torch.tril(b, diagonal=-1)
tensor([[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.4785, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 1.1502, 3.2716, 0.0000, 0.0000, 0.0000, 0.0000],
[-0.0614, -0.7344, -1.3164, 0.0000, 0.0000, 0.0000]])
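
The four functions above are related in simple ways, which the following sketch checks on a small matrix. The matrix is the one from the trace help text.

```python
import torch

a = torch.arange(1., 10.).view(3, 3)

# The trace is the sum of the main diagonal.
print(torch.trace(a), torch.diag(a).sum())

# The lower triangle (with the diagonal) plus the strictly upper triangle rebuilds the matrix.
rebuilt = torch.tril(a) + torch.triu(a, diagonal=1)
print(torch.equal(rebuilt, a))

# diag of a vector builds a matrix; diag of that matrix returns the vector.
vec = torch.tensor([1., 2., 3.])
print(torch.equal(torch.diag(torch.diag(vec)), vec))
```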

Broadcasting

x=torch.arange(1,3).view(1,2);x

tensor([[1, 2]])

y=torch.arange(1,4).view(3,1);y

tensor([[1],
[2],
[3]])

x+y

tensor([[2, 3],
[3, 4],
[4, 5]])
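
Broadcasting compares shapes from the trailing dimension backwards; two sizes are compatible when they are equal or one of them is 1. The sketch below spells out what the addition above does implicitly and shows one incompatible pair. The names `s`, `s_explicit` and `compatible` are illustrative.

```python
import torch

x = torch.arange(1, 3).view(1, 2)     # shape (1, 2)
y = torch.arange(1, 4).view(3, 1)     # shape (3, 1)

s = x + y                                       # both are broadcast to (3, 2)
s_explicit = x.expand(3, 2) + y.expand(3, 2)    # the same computation written out
print(s.shape, torch.equal(s, s_explicit))

# Shapes (3, 2) and (3,) are not compatible: the trailing sizes are 2 and 3.
try:
    torch.ones(3, 2) + torch.ones(3)
    compatible = True
except RuntimeError:
    compatible = False
print(compatible)
```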

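Memory Overhead of Operations

Basic indexing (slicing) and view do not allocate new memory, whereas an operation such as y=x+y allocates new memory for the result and then points y at it.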

x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
y=y+x
id(y)==id_before

False

If we want the result written into y's original memory, we can assign through an index:

x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
y[:]=y+x
id_before==id(y)

True

We can also use the out argument of the operator's named function, the augmented assignment operator +=, or the in-place method add_:

x=torch.tensor([1,2])
y=torch.tensor([3,4])
id_before=id(y)
torch.add(x,y,out=y)
id(y)==id_before

True

y+=x
id(y)==id_before

True

y.add_(x)
id(y)==id_before

True

y.requires_grad

False
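
The id() checks above compare Python objects. As a complementary sketch, data_ptr() compares the address of the underlying data, which is a more direct way to see whether new memory was allocated. The variable `ptr` is an illustrative name.

```python
import torch

x = torch.tensor([1, 2])
y = torch.tensor([3, 4])

ptr = y.data_ptr()
y = y + x                        # result is written to newly allocated memory
print(y.data_ptr() == ptr)       # False
new_alloc = y.data_ptr() != ptr

ptr = y.data_ptr()
y += x                           # in-place: same memory
same_after_iadd = y.data_ptr() == ptr
torch.add(x, y, out=y)           # out=y: same memory
same_after_out = y.data_ptr() == ptr
print(same_after_iadd, same_after_out, y)
```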

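Automatic Differentiation

PyTorch's autograd package automatically builds a computation graph from the inputs and the forward pass, and then performs backpropagation.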

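If a Tensor's .requires_grad attribute is set to True, autograd tracks all operations on it (so that gradients can be propagated with the chain rule). Once the computation is finished, call .backward() to compute all gradients. The gradient for this tensor is accumulated into its .grad attribute.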

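Note that when calling y.backward(), no argument is needed if y is a scalar. Otherwise, a tensor w with the same shape as y must be passed in. In that case y.backward(w) means: first compute L=torch.sum(y*w), which is a scalar, and then differentiate L with respect to the variable x.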
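
The equivalence between y.backward(w) and differentiating torch.sum(y*w) can be checked numerically. This is a small sketch with made-up values; `g_vector` and `g_scalar` are illustrative names.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([0.1, 0.2, 0.3])

# Route 1: non-scalar output, weights passed to backward.
y = x ** 2
y.backward(w)
g_vector = x.grad.clone()

# Route 2: build the scalar L = sum(y * w) explicitly.
x.grad.zero_()                 # clear the accumulated gradient first
L = torch.sum((x ** 2) * w)
L.backward()
g_scalar = x.grad.clone()

print(g_vector, g_scalar)      # both equal 2 * x * w
```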

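If you do not want a tensor to be tracked any further, call .detach() to separate it from the tracking history. This prevents future computation from being tracked, so gradients cannot flow back through it. You can also wrap code that should not be tracked in a with torch.no_grad() block. This is common when evaluating a model, because during evaluation we do not need gradients for the trainable parameters (requires_grad=True).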
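
A minimal sketch of both mechanisms follows. The tensors are made up for illustration; the point is that a detached value is treated as a constant during backpropagation.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

y = x ** 2
y_det = y.detach()        # same value (4.0), but no history
z = y_det * x             # y_det acts as the constant 4 here
z.backward()
print(x.grad)             # 4.0, not the 3 * x**2 = 12 of the fully tracked x**3

with torch.no_grad():
    w = x * 3             # computed without being recorded
print(y_det.requires_grad, w.requires_grad)
```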

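Function is another important class. Tensor and Function are interconnected and together build a directed acyclic graph (DAG) that records the entire computation. Every tensor has a .grad_fn attribute, which references the Function that created the Tensor. In other words, if the tensor was produced by some operation, grad_fn returns an object associated with that operation; otherwise it is None.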

x=torch.ones(2,2,requires_grad=True)
print(x)
print(x.grad_fn)
print(x.grad) # None until gradients have been computed
print(x.dtype)

tensor([[1., 1.],
[1., 1.]], requires_grad=True)
None
None
torch.float32

y=x+2
print(y)
print(y.grad_fn)

tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x000001BA1B94F860>

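Note that x was created directly, so it has no grad_fn, whereas y was created by an addition, so it does have one. Tensors created directly, like x, are called leaf nodes, and the grad_fn of a leaf node is None.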

z=y*y*3
out=z.mean()
print(z,out)

tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

Use .requires_grad_() to change the requires_grad attribute in place:

a=torch.randn(2,2)
a=((a*3)/(a-1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b=(a*a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x000001BA1B92FBA8>

Gradients

Because out is a scalar, no gradient argument needs to be passed when calling backward():

out

tensor(27., grad_fn=<MeanBackward0>)

out.backward()
print(x.grad)

tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])

x

tensor([[1., 1.],
[1., 1.]], requires_grad=True)

Let o denote out. Since

\[o=1/4 \sum_{i=1}^{4}3(x_i+2)^2 \]

we have:

\[\frac {\partial o } {\partial x_i }|_{x_i=1}=9/2=4.5 \]

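Mathematically, the gradient of a vector-valued function $$ \vec{y}=f(\vec{x}) $$ with respect to the vector $$ \vec{x} $$ is a Jacobian matrix J, the m×n matrix with entries $$ J_{ij}=\partial y_i / \partial x_j $$. The torch.autograd package is an engine for computing products with such Jacobian matrices. For example, if v is the gradient of a given scalar function $$ l=g(\vec{y}) $$:

\[v=\left( \frac {\partial l} {\partial y_1} \ \cdots \ \frac {\partial l} {\partial y_m} \right) \]

then by the chain rule, the product vJ is the gradient of l with respect to $$ \vec{x} $$:

\[vJ=\left( \frac {\partial l} {\partial x_1} \ \cdots \ \frac {\partial l} {\partial x_n} \right) \]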
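
The vector-Jacobian product can be checked numerically. The sketch below uses torch.autograd.functional.jacobian to build J explicitly and compares v @ J with the result of backward(v). The function `f` and the values of `x` and `v` are made up for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # A toy map from R^2 to R^3.
    return torch.stack([x[0] * x[1], x[0] + x[1] ** 2, 3 * x[0]])

x = torch.tensor([1.0, 2.0], requires_grad=True)
v = torch.tensor([1.0, 0.1, 0.01])

y = f(x)
y.backward(v)                      # autograd computes v @ J without forming J

J = jacobian(f, x.detach())        # explicit Jacobian, shape (3, 2)
vJ = v @ J

print(J)
print(x.grad, vJ)
```

Forming J explicitly is only practical for small problems; backward(v) gives the same product without ever materializing the full matrix.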

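Note: grad is accumulated during backpropagation. Every backward pass adds to the gradients that are already stored, so gradients usually need to be zeroed before each backward pass.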

out2=x.sum();out2

tensor(4., grad_fn=<SumBackward0>)

out2.backward()
print(x.grad)

tensor([[5.5000, 5.5000],
[5.5000, 5.5000]])

out3=x.sum()
x.grad.data.zero_()
out3.backward()
print(x.grad)

tensor([[1., 1.],
[1., 1.]])
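
The difference between accumulating and zeroing is easiest to see in a loop. This is a small sketch with made-up tensors `acc` and `w`.

```python
import torch

# Without zeroing, three backward passes add up.
acc = torch.tensor([1.0, 2.0], requires_grad=True)
for _ in range(3):
    (acc ** 2).sum().backward()
print(acc.grad)          # 3 * (2 * acc)

# With zeroing before each pass, the gradient reflects only the latest pass.
w = torch.tensor([1.0, 2.0], requires_grad=True)
for _ in range(3):
    if w.grad is not None:
        w.grad.zero_()
    (w ** 2).sum().backward()
print(w.grad)            # 2 * w
```

In a training loop the same job is normally done by the optimizer's zero_grad() method.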

A small exercise:

a=torch.tensor([1,2,3],requires_grad=True,dtype=torch.float32)
print(a.grad)

None

b=a**2;b

tensor([1., 4., 9.], grad_fn=<PowBackward0>)

b.requires_grad

True

w=torch.tensor([0.1,0.2,0.3])
b.backward(w)
print(a.grad)

tensor([0.2000, 0.8000, 1.8000])

d=b.sum();d

tensor(14., grad_fn=<SumBackward0>)

d.requires_grad

True

d.backward()


RuntimeError Traceback (most recent call last)
in
----> 1 d.backward()

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
150
151

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

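# Note: the outputs below are scalars, so x here appears to be the scalar tensor
# x=torch.tensor(1.0,requires_grad=True) defined further down, with x.grad already tensor(2.)
d=2*x
for i in range(11):
    d.backward(retain_graph=True)
    print(x.grad)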

tensor(4.)
tensor(6.)
tensor(8.)
tensor(10.)
tensor(12.)
tensor(14.)
tensor(16.)
tensor(18.)
tensor(20.)
tensor(22.)
tensor(24.)

d=2*x
for i in range(11):
    d.backward()
    print(x.grad)

tensor(26.)


RuntimeError Traceback (most recent call last)
in
1 d=2*x
2 for i in range(11):
----> 3 d.backward()
4 print(x.grad)

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
150
151

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
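
The behaviour behind this error can be reproduced in a few lines. The sketch below assumes an operation (here a power) that saves its input for the backward pass; the names are illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = a ** 2                               # the power op saves a for its backward pass

b.sum().backward(retain_graph=True)      # saved values are kept
b.sum().backward()                       # works; saved values are freed afterwards
grad_after_two = a.grad.clone()
print(grad_after_two)                    # two passes accumulated: 2 * (2 * a)

try:
    b.sum().backward()                   # the graph of b no longer has its saved values
    third_failed = False
except RuntimeError:
    third_failed = True
print(third_failed)
```

Recomputing b before each backward pass is the usual alternative to retain_graph=True, since it builds a fresh graph every time.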

c=a.sum();c

tensor(6., grad_fn=<SumBackward0>)

c.backward()
a.grad

tensor([1.2000, 1.8000, 2.8000])

a.grad.data.zero_()

tensor([0., 0., 0.])

c=a.sum()
c.backward()
print(a.grad)

tensor([1., 1., 1.])

torch.arange(0,9).view(3,3)

tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])

torch.arange(0,9).view(3,3).sum()

tensor(36)

  • A more practical example
x=torch.tensor([1.0,2.0,3.0,4.0],requires_grad=True) # use float literals such as 1.0 rather than 1, 2, 3: an integer tensor is not torch.float and cannot require gradients
x.dtype

torch.float32

y=2*x
z=y.view(2,2)
print(z)
v=torch.tensor([[1.0,0.1],[0.01,0.001]],dtype=torch.float)
z.backward(v)
print(x.grad)

tensor([[2., 4.],
[6., 8.]], grad_fn=)
tensor([2.0000, 0.2000, 0.0200, 0.0020])

  • An example of interrupting gradient tracking
x=torch.tensor(1.0,requires_grad=True)
y1=x**2
with torch.no_grad():
    y2=x**3
y3=y1+y2

print(x.requires_grad)
print(y1,y1.requires_grad)
print(y2,y2.requires_grad)
print(y3,y3.requires_grad)

True
tensor(1., grad_fn=<PowBackward0>) True
tensor(1.) False
tensor(2., grad_fn=<AddBackward0>) True

y3.backward()
print(x.grad)

tensor(2.)

\[ y_3=y_1+y_2=x^2+x^3 \]

When x=1, shouldn't $$ \frac {d y_3} {dx} $$ be 5? In fact, because the definition of y2 is wrapped in torch.no_grad(), gradients related to y2 are not propagated back; only gradients related to y1 are.

As mentioned above, y2.requires_grad=False, so y2.backward() cannot be called and raises an error:

y2.backward()

---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-131-8061dc2a05a4> in <module>
----> 1 y2.backward()

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
150
151

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

In addition, if we want to modify a tensor's values without autograd recording the change (that is, without affecting backpropagation), we can operate on tensor.data.

x=torch.ones(1,requires_grad=True)

print(x.data) # still a tensor
print(x.data.requires_grad) # but already independent of the computation graph

y=2*x
x.data*=100 # only changes the value; it is not recorded in the graph, so gradient propagation is unaffected

y.backward()
print(x)
print(x.grad)

tensor([1.])
False
tensor([100.], requires_grad=True)
tensor([2.])

A Note on Using reshape

Consider

\[ y=\sum_{i=1}^{n} {x_i} \]

Example 1:

x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True)
y=x.sum()
print(y)
y.backward()
print(x.grad)

tensor(15., grad_fn=<SumBackward0>)
tensor([[1., 1., 1., 1., 1.]])

Example 2: deliberately add an extra step that reshapes the input

x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True).reshape(-1,1);x

tensor([[1.],
[2.],
[3.],
[4.],
[5.]], grad_fn=)

y=x.sum()
y.backward()
print(x.grad)

None

E:\software\Anaconda\envs\pytorch_env\lib\site-packages\ipykernel\__main__.py:3: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.
app.launch_new_instance()

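If reshape is applied right at creation, the variable that actually receives the gradient is the tensor before the reshape (the leaf), not x. That leaf tensor has no variable name, so its .grad cannot be accessed. The correct approach is: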

x=torch.tensor([[1,2,3,4,5]],dtype=torch.float,requires_grad=True)
print(x)
z=x.reshape(-1,1)
print(z)
y=z.sum()
y.backward()
x.grad

tensor([[1., 2., 3., 4., 5.]], requires_grad=True)
tensor([[1.],
[2.],
[3.],
[4.],
[5.]], grad_fn=)

tensor([[1., 1., 1., 1., 1.]])
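
Two other ways to deal with the non-leaf situation are sketched below: calling retain_grad() on the reshaped tensor, as the warning above suggests, or reshaping first and only then enabling gradients so that the reshaped tensor itself is the leaf. The name `leaf` is illustrative.

```python
import torch

x = torch.tensor([[1., 2., 3.]], requires_grad=True)
z = x.reshape(-1, 1)
print(x.is_leaf, z.is_leaf)        # True False

z.retain_grad()                    # ask autograd to keep the gradient of the non-leaf z
z.sum().backward()
print(x.grad)                      # shape (1, 3)
print(z.grad)                      # shape (3, 1)

# Alternative: reshape first, then mark the result as requiring gradients.
leaf = torch.tensor([[1., 2., 3.]]).reshape(-1, 1).requires_grad_(True)
leaf.sum().backward()
print(leaf.is_leaf, leaf.grad)
```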


posted @ 2021-06-30 13:33  JohnYang819