Basics of Vector (Matrix) Differentiation in Deep Learning
Basic Operations on Function Matrices
1. Function matrix \[ A(x)=\left[
\begin{matrix}
a_{11}(x) & a_{12}(x) & \cdots & a_{1n}(x) \\
a_{21}(x) & a_{22}(x) & \cdots & a_{2n}(x) \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}(x) & a_{m2}(x) & \cdots & a_{mn}(x)
\end{matrix}
\right]
\tag{1}
\]
- Derivative of a function matrix
\[ A^{'}(x)=\frac{dA(x)}{dx}=
\left[
\begin{matrix}
a^{'}_{11}(x) & a^{'}_{12}(x) & \cdots & a^{'}_{1n}(x) \\
a^{'}_{21}(x) & a^{'}_{22}(x) & \cdots & a^{'}_{2n}(x) \\
\vdots & \vdots & \ddots & \vdots \\
a^{'}_{m1}(x) & a^{'}_{m2}(x) & \cdots & a^{'}_{mn}(x)
\end{matrix}
\right]
\tag{2}
\]
- Operational properties
- Addition, scalar multiplication, matrix-matrix multiplication, and transposition of function matrices are defined exactly as the corresponding operations on constant matrices.
- Sum rule:
\[\frac{d}{dx}[A(x)+B(x)]=\frac{dA(x)}{dx}+\frac{dB(x)}{dx}\tag{3} \]
- Let \(k(x)\) be a scalar function of \(x\) and \(A(x)\) a function matrix; then
\[\frac{d}{dx}[k(x)A(x)]=\frac{dk(x)}{dx}A(x)+k(x)\frac{dA(x)}{dx}\tag{4}
\]
- Let \(A(x)\) and \(B(x)\) both be differentiable and conformable for multiplication; then (note that the order of the factors must be preserved, since matrix multiplication is not commutative)
\[\frac{d}{dx}[A(x)B(x)]=\frac{dA(x)}{dx}B(x)+A(x)\frac{dB(x)}{dx} \tag{5}
\]
- Let \(A(x)\) be a function matrix and \(x=f(t)\) a scalar function of \(t\), with \(A(x)\) and \(f(t)\) both differentiable; then
\[\frac{d}{dt}A(x)=f^{'}(t)\frac{dA(x)}{dx} \tag{6}
\]
- Higher-order derivatives of a function matrix are defined recursively:
\[\frac{d^2 A(x)}{dx^2}=\frac{d}{dx}\left(\frac{dA(x)}{dx}\right) \tag{7}
\]
\[\frac{d^3 A(x)}{dx^3}=\frac{d}{dx}\left(\frac{d^2A(x)}{dx^2}\right) \tag{8}
\]
\[\vdots
\]
\[\frac{d^k A(x)}{dx^k}=\frac{d}{dx}\left(\frac{d^{k-1}A(x)}{dx^{k-1}}\right) \tag{9}
\]
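As a sanity check on the product rule (5) and the chain rule (6), the following NumPy sketch compares each rule against central finite differences. The matrices \(A(x)\), \(B(x)\) here are assumed examples, not taken from the text:

```python
import numpy as np

# Assumed example function matrices and their hand-coded derivatives
def A(x):  return np.array([[x, x**2], [np.sin(x), 1.0]])
def dA(x): return np.array([[1.0, 2*x], [np.cos(x), 0.0]])
def B(x):  return np.array([[np.exp(x), 0.0], [x, x**3]])
def dB(x): return np.array([[np.exp(x), 0.0], [1.0, 3*x**2]])

x0, h = 0.7, 1e-6

# Product rule (5): central finite difference of A(x)B(x) vs A'B + AB'
fd = (A(x0 + h) @ B(x0 + h) - A(x0 - h) @ B(x0 - h)) / (2 * h)
rule = dA(x0) @ B(x0) + A(x0) @ dB(x0)
print(np.max(np.abs(fd - rule)))      # agreement up to finite-difference error

# Chain rule (6): with x = f(t) = t**2, d/dt A(f(t)) = f'(t) * dA/dx
t0 = 0.9
fd_t = (A((t0 + h)**2) - A((t0 - h)**2)) / (2 * h)
rule_t = 2 * t0 * dA(t0**2)
print(np.max(np.abs(fd_t - rule_t)))  # agreement up to finite-difference error
```

Both printed values are on the order of the finite-difference error, confirming that the order of the factors in (5) matters: swapping \(A\) and \(B\) in either term would break the agreement.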
- PyTorch differentiation example
```python
import torch

# Element-wise y = x**2, so dy/dx = 2*x
x = torch.tensor([[1., 2., 3.], [2., 3., 4.]], requires_grad=True)
y = torch.pow(x, 2)
# Backpropagate through the sum to populate x.grad
y.sum().backward()
print(x.grad)
# tensor([[2., 4., 6.],
#         [4., 6., 8.]])
```
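The higher-order derivatives of equations (7)–(9) can also be computed in PyTorch by differentiating the first gradient again, passing `create_graph=True` so the gradient itself stays differentiable. A minimal sketch with the element-wise function \(y = x^3\) (an assumed example):

```python
import torch

# Second derivative per equation (7): differentiate dA/dx once more
x = torch.tensor([[1., 2., 3.], [2., 3., 4.]], requires_grad=True)
y = torch.pow(x, 3)
# First derivative 3*x**2, kept in the autograd graph
g1, = torch.autograd.grad(y.sum(), x, create_graph=True)
# Second derivative 6*x
g2, = torch.autograd.grad(g1.sum(), x)
print(g2)
# tensor([[ 6., 12., 18.],
#         [12., 18., 24.]])
```

Repeating the `torch.autograd.grad(..., create_graph=True)` step \(k-1\) times yields the \(k\)-th derivative of equation (9).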
- Derivative of a function matrix with respect to a matrix
- Definition
  Let \(A=(a_{ij})_{m \times n}\) and \(B=(b_{kl})_{p \times q}\). If every element \(a_{ij}\) of \(A\) is a function of \(B\), i.e. \(a_{ij}=a_{ij}(B)\), then \(A\) is called a function of \(B\), written \(A(B)\).
- The derivative \(\frac{DA}{DB}\) of the function matrix \(A\) with respect to the matrix \(B\):
\[ \frac{DA}{DB}=
\left[
\begin{matrix}
\frac{\partial A}{\partial b_{11}} & \frac{\partial A}{\partial b_{12}} & \cdots & \frac{\partial A}{\partial b_{1q}} \\
\frac{\partial A}{\partial b_{21}} & \frac{\partial A}{\partial b_{22}} & \cdots & \frac{\partial A}{\partial b_{2q}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial A}{\partial b_{p1}} & \frac{\partial A}{\partial b_{p2}} & \cdots & \frac{\partial A}{\partial b_{pq}}
\end{matrix}
\right]=\left(\frac{\partial A}{\partial b_{kl}} \right) \tag{10}
\]
where
\[ \frac{\partial A}{\partial b_{kl}}=
\left[
\begin{matrix}
\frac{\partial a_{11}}{\partial b_{kl}} & \frac{\partial a_{12}}{\partial b_{kl}} & \cdots & \frac{\partial a_{1n}}{\partial b_{kl}} \\
\frac{\partial a_{21}}{\partial b_{kl}} & \frac{\partial a_{22}}{\partial b_{kl}} & \cdots & \frac{\partial a_{2n}}{\partial b_{kl}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial a_{m1}}{\partial b_{kl}} & \frac{\partial a_{m2}}{\partial b_{kl}} & \cdots & \frac{\partial a_{mn}}{\partial b_{kl}}
\end{matrix}
\right] \tag{11}
\]
- Properties
\[\frac{DA}{DA^T}=\frac{DA^T}{DA}=E \tag{12} \]
\[\frac{D(A+B)}{DC}=\frac{DA}{DC}+\frac{DB}{DC} \tag{13} \]
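The block layout of definition (10)–(11) can be assembled mechanically with autograd: compute the full Jacobian \(J[i,j,k,l] = \partial a_{ij}/\partial b_{kl}\), then lay the \(m \times n\) blocks out in a \(p \times q\) grid. The map \(A(B) = B B\) below is an assumed example, not from the text:

```python
import torch

# Assumed example: A(B) = B @ B, with B a 2x2 matrix
B = torch.tensor([[1., 2.], [3., 4.]])

def A(B):
    return B @ B

# J has shape (m, n, p, q) with J[i, j, k, l] = ∂a_ij / ∂b_kl
J = torch.autograd.functional.jacobian(A, B)
m, n, p, q = J.shape

# Per (10)-(11): block (k, l) of DA/DB is the m x n matrix J[:, :, k, l]
DADB = torch.cat([torch.cat([J[:, :, k, l] for l in range(q)], dim=1)
                  for k in range(p)], dim=0)
print(DADB.shape)  # torch.Size([4, 4]): an mp x nq block matrix
```

For instance, \(a_{11} = b_{11}^2 + b_{12}b_{21}\), so the top-left entry \(\partial a_{11}/\partial b_{11} = 2b_{11} = 2\) appears at `DADB[0, 0]`.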
