A few PyTorch pitfalls
1.
tensor.detach() creates a tensor that shares storage with tensor but does not require grad; the result is cut off from the computation graph.
tensor.clone() creates a copy of tensor (with its own storage) that keeps the original tensor's requires_grad setting; the copy is still part of the computation graph it came from, so gradients flow back to the original.
tensor.data returns a new tensor that shares storage with tensor but always has requires_grad=False; unlike detach(), in-place changes made through it are not reported to autograd, which can silently produce wrong gradients.
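A minimal sketch of these three behaviors, assuming a recent PyTorch (the tensor names x, d, c, a are just for illustration):

```python
import torch

x = torch.ones(3, requires_grad=True)

d = x.detach()   # shares storage with x, requires_grad=False, cut from the graph
c = x.clone()    # own storage, requires_grad=True here, still connected to x in the graph
a = x.data       # shares storage with x, always requires_grad=False

print(d.requires_grad, c.requires_grad, a.requires_grad)  # False True False

# clone() keeps the graph: gradients flow back to x
c.sum().backward()
print(x.grad)    # tensor([1., 1., 1.])

# In-place edits through the detached view also change x (shared storage),
# which is the classic pitfall:
d[0] = 100.0
print(x[0])      # 100., even though we never touched x directly
```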
2.
The gradient can be understood through a first-order approximation: for a small change in a variable, the output changes by roughly the gradient times that change, so the gradient with respect to a variable measures how sensitive the output is to it.
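A minimal sketch of this first-order view; the scalar function f(x) = x^3 + 2x is just an illustrative choice:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
f = x ** 3 + 2 * x            # f(x) = x^3 + 2x
f.backward()                  # df/dx = 3x^2 + 2 = 14 at x = 2

eps = 1e-3
with torch.no_grad():
    f_true = (x + eps) ** 3 + 2 * (x + eps)
    f_lin = f + x.grad * eps  # first-order (linear) approximation of f(x + eps)

print(x.grad)                 # tensor(14.)
print(f_true - f_lin)         # small residual, ~O(eps^2)
```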
