Extension: Vector Derivatives

Acknowledgments

[矩阵求导的本质与分子布局、分母布局的本质(矩阵求导——本质篇) - 知乎 (zhihu.com)](https://zhuanlan.zhihu.com/p/263777564) (on the essence of matrix derivatives and of numerator layout and denominator layout)

07 自动求导【动手学深度学习v2】_哔哩哔哩_bilibili (lecture 07, Automatic Differentiation, from Dive into Deep Learning v2 on bilibili)

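The gradient points in the direction in which the function value increases fastest. All derivatives in this post use numerator layout.

I. Function Evaluation, Differentiation, and Vectors and Matrices

Consider a function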

\[\text{function(input)} \]

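We can classify this function by the types of \(\text{function}\) and \(\text{input}\).

1 \(\text{function}\) is a scalar

We call \(\text{function}\) a real-valued scalar function, written with a lightface lowercase letter \(f\).

1.1 \(\text{input}\) is a scalar

The argument of \(\text{function}\) is a scalar, written with a lightface lowercase letter \(x\).

Evaluation: the input is a scalar (\((1,)\)), the function is a real-valued scalar function, and the result is a scalar (\((1,)\)).

Differentiation: the numerator (the function value) is a scalar (\((1,)\)), the denominator (the argument) is a scalar (\((1,)\)), and the result is a scalar (\((1,)\)).

Example 1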

\[f(x) = 2x+2 \\ f'(x)=2 \]

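1.2 \(\text{input}\) is a vector

The argument of \(\text{function}\) is a vector, written with a bold lowercase letter \(\mathbfcal{x}\).

Evaluation: the input is a column vector (\((n,1)\), i.e. \(n\times 1\)), the function is a real-valued scalar function, and the result is a scalar (\((1,)\)).

Differentiation: the numerator (the function value) is a scalar (\((1,)\)), the denominator (the argument) is a column vector (\((n,1)\), i.e. \(n\times 1\)), and the result is a row vector (\((1,n)\), i.e. \(1\times n\)).

\[\mathbfcal{x}= \left[ \begin {array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right ]_{(n,1)} \\ y=f(\mathbfcal{x})_{(1,)} \\ f'(\mathbfcal{x})= \frac{\partial y}{ \partial \mathbfcal{x}} = \left[ \frac{\partial y}{\partial x_1} , \frac{\partial y}{\partial x_2}, \cdots , \frac{\partial y}{\partial x_n} \right ]_{(1,n)} \]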

Example 2

\[\mathbfcal{x}= \left[ \begin {array}{c} x_1 \\ x_2 \end{array} \right ] \\ y=f(\mathbfcal{x}) = a_1x_1^2+a_2x_2^2+a_3x_1x_2+a_4x_1+a_5x_2+a_6 \\ f'(\mathbfcal{x})= \frac{\partial y}{ \partial \mathbfcal{x}} = \left[ \frac{\partial y}{\partial x_1} , \frac{\partial y}{\partial x_2} \right ] = \left[ 2a_1x_1+a_3x_2+a_4, 2a_2x_2+a_3x_1+a_5\right ] \]
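A quick way to sanity-check a hand-derived gradient such as the one in Example 2 is to compare it with a finite-difference estimate. The sketch below does that in plain Python. The coefficient values, the test point, and the helper names (`f`, `grad_analytic`, `grad_numeric`) are arbitrary choices made for illustration and do not come from the original post.

```python
# Coefficients a_1..a_6 are arbitrary illustrative values (an assumption).
A = (1.0, 2.0, 3.0, 4.0, 5.0, 6.0)


def f(x1, x2, a=A):
    """y = a1*x1^2 + a2*x2^2 + a3*x1*x2 + a4*x1 + a5*x2 + a6."""
    a1, a2, a3, a4, a5, a6 = a
    return a1 * x1**2 + a2 * x2**2 + a3 * x1 * x2 + a4 * x1 + a5 * x2 + a6


def grad_analytic(x1, x2, a=A):
    """Row vector [dy/dx1, dy/dx2], differentiating f term by term."""
    a1, a2, a3, a4, a5, _ = a
    return [2 * a1 * x1 + a3 * x2 + a4, 2 * a2 * x2 + a3 * x1 + a5]


def grad_numeric(x1, x2, h=1e-6):
    """Central-difference estimate of the same row vector."""
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return [d1, d2]


print(grad_analytic(0.5, -1.5))
print(grad_numeric(0.5, -1.5))
```

The two printed rows should agree to several decimal places; a mismatch usually means a term was dropped while differentiating by hand.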

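1.3 \(\text{input}\) is a matrix

The argument of \(\text{function}\) is a matrix, written with a bold uppercase letter \(\symbf{X}\).

Evaluation: the input is a matrix (\((n,k)\), i.e. \(n\times k\)), the function is a real-valued scalar function, and the result is a scalar (\((1,)\)).

Differentiation: the numerator (the function value) is a scalar (\((1,)\)), the denominator (the argument) is a matrix (\((n,k)\), i.e. \(n\times k\)), and the result is a matrix (\((k,n)\), i.e. \(k\times n\)).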

\[\symbf{X} = \left ( \begin{matrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \\ \end{matrix} \right )_{(n,k)n\times k} \\ y=f(\symbf{X})_{(1,)} \\ f'(\symbf{X}) =\frac{\partial y}{\partial \symbf{X}} = \left ( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{n1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{n2}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y}{\partial x_{1k}} & \frac{\partial y}{\partial x_{2k}} & \cdots & \frac{\partial y}{\partial x_{nk}} \\ \end{matrix} \right )_{(k,n)k\times n} \]

Example 3

\[\symbf{X} = \left ( \begin{matrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \\ \end{matrix} \right )_{(3,2) 3\times 2} \\ y=f(\symbf{X})_{(1,)}=a_1x_{11}^2+a_2x_{12}^2+a_3x_{21}^2+a_4x_{22}^2+a_5x_{31}^2+a_6x_{32}^2 \\ f'(\symbf{X}) =\frac{\partial y}{\partial \symbf{X}} = \left ( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{31}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \frac{\partial y}{\partial x_{32}} \\ \end{matrix} \right) _{(2,3),2\times 3} = \left ( \begin{matrix} 2a_1x_{11} & 2a_3x_{21} & 2a_5x_{31} \\ 2a_2x_{12} & 2a_4x_{22} & 2a_6x_{32} \\ \end{matrix} \right) _{(2,3),2\times 3} \]
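The transposed shape in Example 3 is easy to get wrong, so here is a small numerical sketch of it. It estimates each partial derivative by central differences and stores \(\partial y / \partial x_{ij}\) at position \((j, i)\), which is what produces the \(2\times 3\) result for a \(3\times 2\) input under numerator layout. The coefficient values, the entries of `X`, and the function names are illustrative assumptions.

```python
# Arbitrary illustrative coefficients a_1..a_6 and input X (assumptions).
A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]  # shape (3, 2)


def f(X, a=A):
    """y = a1*x11^2 + a2*x12^2 + a3*x21^2 + a4*x22^2 + a5*x31^2 + a6*x32^2."""
    flat = [x for row in X for x in row]  # row-major order: x11, x12, x21, ...
    return sum(ai * xi**2 for ai, xi in zip(a, flat))


def dy_dX_numerator_layout(X, h=1e-6):
    """Central differences, stored so that entry [j][i] holds dy/dx_ij."""
    n, k = len(X), len(X[0])
    D = [[0.0] * n for _ in range(k)]  # result has shape (k, n)
    for i in range(n):
        for j in range(k):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            D[j][i] = (f(Xp) - f(Xm)) / (2 * h)
    return D


D = dy_dX_numerator_layout(X)
for row in D:
    print([round(v, 4) for v in row])
```

With these inputs the first row should be close to \(2a_1x_{11}, 2a_3x_{21}, 2a_5x_{31}\) and the second row close to \(2a_2x_{12}, 2a_4x_{22}, 2a_6x_{32}\).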

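2 \(\text{function}\) is a vector

We call \(\text{function}\) a real vector function, written with a bold lowercase letter \(\mathbfcal{f}\).

Meaning: \(\mathbfcal{f}\) is a vector whose components are scalar functions \(f\).

2.1 \(\text{input}\) is a scalar

Evaluation: the input (argument) is a scalar (\((1,)\)), the function is a column-vector function (\((m,1)\), i.e. \(m \times 1\)), and the output is a column vector (\((m,1)\), i.e. \(m \times 1\)).

Differentiation: the numerator (the function value) is a column vector (\((m,1)\), i.e. \(m \times 1\)), the denominator (the argument) is a scalar (\((1,)\)), and the result is a column vector (\((m,1)\), i.e. \(m \times 1\)).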

\[x \\ \mathbfcal{y} = \mathbfcal{f}(x)= \left[ \begin {array}{c} y_1 \\ y_2 \\ \vdots \\ y_m \end{array} \right ] \\ \frac{\partial \mathbfcal{y}}{ \partial x} = \left[ \begin {array}{c} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x}\\ \vdots \\ \frac{\partial y_m}{\partial x} \end{array} \right ] \]

Example 4

\[x \\ \mathbfcal{y}=\mathbfcal{f}(x)= \left[ \begin {array}{c} x+1 \\ 2x^2+1 \\ 3x^3+1 \end{array} \right ]_{(3,1)3 \times 1} \\ \mathbfcal{f}'(x)=\frac{\partial \mathbfcal{y}}{ \partial x} = \left[ \begin {array}{c} 1 \\ 4x \\ 9x^2 \end{array} \right ]_{(3,1)3\times 1} \]

2.2 \(\text{input}\) is a vector

Evaluation: the input (argument) is a column vector (\((n,1)\), i.e. \(n\times 1\)), the function is a column-vector function (\((m,1)\), i.e. \(m \times 1\)), and the output is a column vector (\((m,1)\), i.e. \(m \times 1\)).

Differentiation: the numerator (the function value) is a column vector (\((m,1)\), i.e. \(m \times 1\)), the denominator (the argument) is a column vector (\((n,1)\), i.e. \(n \times 1\)), and the result is a matrix (\((m,n)\), i.e. \(m \times n\)).

Jacobian matrix

  • The Jacobian matrix can be viewed as a way of organizing gradient vectors.
  • A gradient vector can be viewed as a way of organizing partial derivatives.
  • Hence the Jacobian matrix can be viewed as a matrix that organizes partial derivatives.

\[\mathbfcal{x}= \left[ \begin {array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right ]_{(n,1)n\times 1} \\ \mathbfcal{y} = \mathbfcal{f}(\mathbfcal{x})= \left[ \begin {array}{c} y_1 \\ y_2 \\ \vdots \\ y_m \end{array} \right ] = \left[ \begin {array}{c} f_1(x_1,x_2,\cdots,x_n) \\ f_2(x_1,x_2,\cdots,x_n) \\ \vdots \\ f_m(x_1,x_2,\cdots,x_n) \end{array} \right ]_{(m,1)m\times 1} \\ \frac{\partial \mathbfcal{y}}{ \partial \mathbfcal{x}} = \left ( \begin{matrix} \frac{\partial y_1}{\partial x_{1}} & \frac{\partial y_1}{\partial x_{2}} & \cdots & \frac{\partial y_1}{\partial x_{n}} \\ \frac{\partial y_2}{\partial x_{1}} & \frac{\partial y_2}{\partial x_{2}} & \cdots & \frac{\partial y_2}{\partial x_{n}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m}{\partial x_{1}} & \frac{\partial y_m}{\partial x_{2}} & \cdots & \frac{\partial y_m}{\partial x_{n}} \\ \end{matrix} \right )_{(m,n)m\times n} \]

Example 5

\[\mathbfcal{x}= \left[ \begin {array}{c} x_1 \\ x_2 \end{array} \right ]_{(2,1)2 \times 1} \\ \mathbfcal{y}=\mathbfcal{f}(\mathbfcal{x})= \left[ \begin {array}{c} x_1+x_2 \\ 2x_1^2+2x_2^2 \\ 3x_1^3+3x_2^3 \end{array} \right ]_{(3,1)3 \times 1} \\ \mathbfcal{f}'(\mathbfcal{x})=\frac{\partial \mathbfcal{y}}{ \partial \mathbfcal{x}} = \left[ \begin {array}{cc} 1 &1 \\ 4x_1 &4x_2 \\ 9x_1^2 &9x_2^2 \end{array} \right ]_{(3,2)3\times 2} \]
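The Jacobian of Example 5 can also be checked numerically. The following sketch, in plain Python, builds a central-difference Jacobian with rows indexed by outputs and columns indexed by inputs, then prints it next to the hand-derived one. The helper names and the evaluation point are assumptions made for this illustration.

```python
def f(x):
    """Vector function of Example 5, mapping R^2 to R^3."""
    x1, x2 = x
    return [x1 + x2, 2 * x1**2 + 2 * x2**2, 3 * x1**3 + 3 * x2**3]


def jacobian_analytic(x):
    """Rows follow the outputs y_i, columns follow the inputs x_j: shape (3, 2)."""
    x1, x2 = x
    return [[1.0, 1.0],
            [4 * x1, 4 * x2],
            [9 * x1**2, 9 * x2**2]]


def jacobian_numeric(func, x, h=1e-6):
    """Central-difference Jacobian with J[i][j] approximating dy_i/dx_j."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


point = [1.0, 2.0]  # arbitrary evaluation point
print(jacobian_analytic(point))
print(jacobian_numeric(f, point))
```

Note that the second column depends only on \(x_2\), which is a handy check that the indices in the last row are right.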

2.3 \(\text{input}\) is a matrix

Evaluation: the input (argument) is a matrix (\((n,k)\), i.e. \(n\times k\)), the function is a column-vector function (\((m,1)\), i.e. \(m \times 1\)), and the output is a column vector (\((m,1)\), i.e. \(m \times 1\)).

Differentiation: the numerator (the function value) is a column vector (\((m,1)\), i.e. \(m \times 1\)), the denominator (the argument) is a matrix (\((n,k)\), i.e. \(n \times k\)), and the result is a tensor (\((m,k,n)\), i.e. \(m \times k \times n\)).

\[\symbf{X} = \left ( \begin{matrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \\ \end{matrix} \right )_{(n,k)n\times k} \\ \\ \mathbfcal{y} = \mathbfcal{f}(\symbf{X})= \left[ \begin {array}{c} y_1 \\ y_2 \\ \vdots \\ y_m \end{array} \right ] = \left[ \begin {array}{c} f_1(x_{11},x_{12},\cdots,x_{21},\cdots,x_{nk}) \\ f_2(x_{11},x_{12},\cdots,x_{21},\cdots,x_{nk}) \\ \vdots \\ f_m(x_{11},x_{12},\cdots,x_{21},\cdots,x_{nk}) \end{array} \right ]_{(m,1)m\times 1} \\ \frac{\partial \mathbfcal{y}}{ \partial \symbf{X}} = \left ( \begin{matrix} \frac{\partial y_1}{\partial x_{11}} & \frac{\partial y_1}{\partial x_{12}} & \cdots & \frac{\partial y_1}{\partial x_{1k}} \\ \frac{\partial y_2}{\partial x_{11}} & \frac{\partial y_2}{\partial x_{12}} & \cdots & \frac{\partial y_2}{\partial x_{1k}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m}{\partial x_{11}} & \frac{\partial y_m}{\partial x_{12}} & \cdots & \frac{\partial y_m}{\partial x_{1k}} \\ \end{matrix} \right ) \left ( \begin{matrix} \frac{\partial y_1}{\partial x_{21}} & \frac{\partial y_1}{\partial x_{22}} & \cdots & \frac{\partial y_1}{\partial x_{2k}} \\ \frac{\partial y_2}{\partial x_{21}} & \frac{\partial y_2}{\partial x_{22}} & \cdots & \frac{\partial y_2}{\partial x_{2k}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m}{\partial x_{21}} & \frac{\partial y_m}{\partial x_{22}} & \cdots & \frac{\partial y_m}{\partial x_{2k}} \\ \end{matrix} \right ) \cdots \left ( \begin{matrix} \frac{\partial y_1}{\partial x_{n1}} & \frac{\partial y_1}{\partial x_{n2}} & \cdots & \frac{\partial y_1}{\partial x_{nk}} \\ \frac{\partial y_2}{\partial x_{n1}} & \frac{\partial y_2}{\partial x_{n2}} & \cdots & \frac{\partial y_2}{\partial x_{nk}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m}{\partial x_{n1}} & \frac{\partial y_m}{\partial x_{n2}} & \cdots & \frac{\partial y_m}{\partial x_{nk}} \\ \end{matrix} \right )_{(m,k,n)m\times k\times n} \]
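To make this block layout concrete, here is a small worked instance with \(n=k=2\) and \(m=2\). The particular function is an illustrative assumption and is not taken from the original post. Each \(2\times 2\) block collects the partial derivatives with respect to one row of \(\symbf{X}\): the first block uses \(x_{11}, x_{12}\) and the second uses \(x_{21}, x_{22}\).

\[\symbf{X} = \left ( \begin{matrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \end{matrix} \right )_{(2,2)} \\ \mathbfcal{y} = \mathbfcal{f}(\symbf{X}) = \left[ \begin {array}{c} x_{11}+x_{22} \\ x_{12}x_{21} \end{array} \right ]_{(2,1)} \\ \frac{\partial \mathbfcal{y}}{ \partial \symbf{X}} = \left ( \begin{matrix} 1 & 0 \\ 0 & x_{21} \\ \end{matrix} \right ) \left ( \begin{matrix} 0 & 1 \\ x_{12} & 0 \\ \end{matrix} \right )_{(2,2,2)} \]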

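3 \(\text{function}\) is a matrix

We call \(\text{function}\) a real matrix function, written with a bold uppercase letter \(\mathbf{F}\).

Meaning: \(\mathbf{F}\) is a matrix whose entries are scalar functions \(f\).

3.1 \(\text{input}\) is a scalar

Evaluation: the input is a scalar (\((1,)\)), the function is a real matrix function, and the result is a matrix (\((m,l)\)).

Differentiation: the numerator (the function value) is a matrix (\((m,l)\)), the denominator (the argument) is a scalar (\((1,)\)), and the result is a matrix (\((m,l)\)).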

\[x \\ \symbf{Y} = \mathbf{F}(x)= \left ( \begin{matrix} f_{11}(x) & f_{12}(x) & \cdots & f_{1l}(x) \\ f_{21}(x) & f_{22}(x) & \cdots & f_{2l}(x) \\ \vdots & \vdots & & \vdots \\ f_{m1}(x) & f_{m2}(x) & \cdots & f_{ml}(x) \\ \end{matrix} \right ) = \left ( \begin{matrix} y_{11} & y_{12} & \cdots & y_{1l} \\ y_{21} & y_{22} & \cdots & y_{2l} \\ \vdots & \vdots & & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{ml} \\ \end{matrix} \right )_{(m,l)m\times l} \\ \frac{ \partial\symbf{Y}}{\partial x} =\mathbf{F}'(x) = \left ( \begin{matrix} f'_{11}(x) & f'_{12}(x) & \cdots & f'_{1l}(x) \\ f'_{21}(x) & f'_{22}(x) & \cdots & f'_{2l}(x) \\ \vdots & \vdots & & \vdots \\ f'_{m1}(x) & f'_{m2}(x) & \cdots & f'_{ml}(x) \\ \end{matrix} \right ) = \left ( \begin{matrix} \frac{ \partial y_{11}}{ \partial x} & \frac{ \partial y_{12}}{ \partial x} & \cdots & \frac{ \partial y_{1l}}{ \partial x} \\ \frac{ \partial y_{21}}{ \partial x} & \frac{ \partial y_{22}}{ \partial x} & \cdots & \frac{ \partial y_{2l}}{ \partial x} \\ \vdots & \vdots & & \vdots \\ \frac{ \partial y_{m1}}{ \partial x} & \frac{ \partial y_{m2}}{ \partial x} & \cdots & \frac{ \partial y_{ml}}{ \partial x} \\ \end{matrix} \right )_{(m,l)m\times l} \]

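3.2 \(\text{input}\) is a vector

Evaluation: the input is a column vector (\((n,1)\)), the function is a real matrix function, and the result is a matrix (\((m,l)\)).

Differentiation: the numerator (the function value) is a matrix (\((m,l)\)), the denominator (the argument) is a column vector (\((n,1)\)), and the result is a tensor (\((m,l,n)\), i.e. \(m\times l \times n\)).

3.3 \(\text{input}\) is a matrix

Evaluation: the input is a matrix (\((n,k)\)), the function is a real matrix function, and the result is a matrix (\((m,l)\)).

Differentiation: the numerator (the function value) is a matrix (\((m,l)\)), the denominator (the argument) is a matrix (\((n,k)\)), and the result is a tensor (\((m,l,k,n)\), i.e. \(m\times l \times k \times n\)).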
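The nine shape rules stated in sections 1.1 through 3.3 follow one pattern, which the sketch below tries to capture in a few lines of Python. It is only a bookkeeping aid under the conventions of this post (scalars written as `(1,)`, column vectors as `(n, 1)`, matrices as `(n, k)`); the function name `derivative_shape` is hypothetical, and the singleton-axis test assumes every named dimension is larger than 1.

```python
SCALAR = (1,)


def derivative_shape(func_shape, input_shape):
    """Shape of d(function)/d(input) under the numerator-layout rules above.

    Scalars are (1,), column vectors are (n, 1), matrices are (n, k).
    """
    if input_shape == SCALAR:
        # Cases 1.1, 2.1, 3.1: the result has the shape of the function value.
        return func_shape
    # The input contributes its transposed shape: (n, 1) -> (1, n), (n, k) -> (k, n).
    input_t = tuple(reversed(input_shape))
    if func_shape == SCALAR:
        # Cases 1.2, 1.3: row vector or transposed matrix.
        return input_t
    # Remaining cases: drop the singleton axis of any column vector, then concatenate.
    func_axes = func_shape[:1] if func_shape[1] == 1 else func_shape
    input_axes = input_t[1:] if input_t[0] == 1 else input_t
    return func_axes + input_axes


m, l, n, k = 3, 4, 5, 6  # arbitrary sizes for the demonstration
for fs in (SCALAR, (m, 1), (m, l)):
    for xs in (SCALAR, (n, 1), (n, k)):
        print(fs, xs, "->", derivative_shape(fs, xs))
```

Running the loop prints one line per case, which can be compared directly against the shapes listed in each subsection.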

Summary

Examples

Derivative of a scalar with respect to a vector (section 1.2)

Derivative of a vector with respect to a vector (section 2.2)

II. Vector Chain Rule

Scalar chain rule

\(x,u,y\) are all scalars.

\[y=f(u) , u=g(x) \\ \frac{\partial y}{\partial x} = \frac{\partial y}{\partial u}\frac{\partial u}{\partial x} \]
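As a quick worked instance of the scalar chain rule (the two functions are chosen here purely for illustration), take \(y=u^2\) and \(u=3x+1\). Expanding \(y=9x^2+6x+1\) and differentiating directly gives the same answer, \(18x+6\).

\[y=u^2 , u=3x+1 \\ \frac{\partial y}{\partial x} = \frac{\partial y}{\partial u}\frac{\partial u}{\partial x} = 2u \cdot 3 = 6(3x+1) = 18x+6 \]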

Vector chain rule

Derivative of a scalar with respect to a vector

  • The intermediate variable is a scalar (uses cases 1.1 and 1.2)

    \[y=f(u) , u=g(\mathbfcal{x}) \\ \mathbfcal{x}_{(n,1)},\ u_{(1,)},\ y_{(1,)} \\ \frac{\partial y}{\partial \mathbfcal{x}}_{(1,n)} = \frac{\partial y}{\partial u}_{(1,)} \frac{\partial u}{\partial \mathbfcal{x}}_{(1,n)} \]

  • The intermediate variable is a vector (uses cases 1.2 and 2.2)

    \[y=f(\mathbfcal{u}) , \mathbfcal{u}_{(k,1)}=\mathbfcal{g}(\mathbfcal{x}) \\ \mathbfcal{x}_{(n,1)},\ \mathbfcal{u}_{(k,1)},\ y_{(1,)} \\ \frac{\partial y}{\partial \mathbfcal{x}}_{(1,n)} = \frac{\partial y}{\partial \mathbfcal{u}}_{(1,k)} \frac{\partial \mathbfcal{u}}{\partial \mathbfcal{x}}_{(k,n)} \]

Derivative of a vector with respect to a vector

  • The intermediate variable is a vector (uses case 2.2 twice)

    \[\mathbfcal{y}_{(m,1)}=\mathbfcal{f}_{(m,1)}(\mathbfcal{u}_{(k,1)}) , \mathbfcal{u}_{(k,1)}=\mathbfcal{g}_{(k,1)}(\mathbfcal{x}_{(n,1)}) \\ \mathbfcal{x}_{(n,1)},\ \mathbfcal{u}_{(k,1)},\ \mathbfcal{y}_{(m,1)} \\ \frac{\partial \mathbfcal{y}}{\partial \mathbfcal{x}}_{(m,n)} = \frac{\partial \mathbfcal{y}}{\partial \mathbfcal{u}}_{(m,k)} \frac{\partial \mathbfcal{u}}{\partial \mathbfcal{x}}_{(k,n)} \]
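The vector-by-vector chain rule above says that the \((m,n)\) Jacobian of the composition is the matrix product of an \((m,k)\) Jacobian and a \((k,n)\) Jacobian. The sketch below checks this in plain Python for one concrete pair of functions with \(n=2\), \(k=2\), \(m=3\). The functions `g` and `f`, their hand-written Jacobians, and the evaluation point are all assumptions picked for illustration.

```python
def g(x):
    """Inner function u = g(x), mapping R^2 to R^2 (illustrative choice)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2**2]


def f(u):
    """Outer function y = f(u), mapping R^2 to R^3 (illustrative choice)."""
    u1, u2 = u
    return [u1 + u2, u1 * u2, u1**2]


def jac_g(x):
    """du/dx with shape (k, n) = (2, 2)."""
    x1, x2 = x
    return [[x2, x1], [1.0, 2 * x2]]


def jac_f(u):
    """dy/du with shape (m, k) = (3, 2)."""
    u1, u2 = u
    return [[1.0, 1.0], [u2, u1], [2 * u1, 0.0]]


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def chain_rule_jacobian(x):
    """dy/dx = (dy/du evaluated at u = g(x)) times (du/dx)."""
    return matmul(jac_f(g(x)), jac_g(x))


def numeric_jacobian(x, h=1e-6):
    """Central-difference Jacobian of the composition f(g(x))."""
    m = len(f(g(x)))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = f(g(xp)), f(g(xm))
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)
    return J


point = [1.0, 2.0]
print(chain_rule_jacobian(point))
print(numeric_jacobian(point))
```

The order of the factors matters: with numerator layout the outer Jacobian goes on the left, so the inner dimensions \(k\) line up for the product.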

Examples

III. Automatic Differentiation

Automatic differentiation computes the derivative of a function at a specified value.
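To give a feel for what "computing the derivative at a specified value" involves, here is a minimal sketch of reverse accumulation over a computation graph, written in plain Python. It supports only scalar addition and multiplication. The class `Var` and the function `backward` are hypothetical helpers invented for this illustration and are not the API of any framework; the example function \(y = 2\,\mathbfcal{x}^\top\mathbfcal{x}\) is likewise an illustrative choice.

```python
class Var:
    """A scalar node in a computation graph (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Reverse accumulation: propagate gradients from the output to the inputs."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order gives a topological order of the graph.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule, summed over paths


# Forward pass: y = 2 * (x . x) evaluated at x = [0, 1, 2, 3].
xs = [Var(v) for v in (0.0, 1.0, 2.0, 3.0)]
dot = xs[0] * xs[0]
for x in xs[1:]:
    dot = dot + x * x
y = Var(2.0) * dot

backward(y)
grads = [x.grad for x in xs]
print(y.value, grads)
```

The forward pass records each intermediate value together with its local derivatives, and the backward pass multiplies and sums them from the output toward the inputs. The resulting gradient equals \(4\mathbfcal{x}\) at the chosen point, which matches differentiating \(2\,\mathbfcal{x}^\top\mathbfcal{x}\) by hand.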

Computation graph

Two modes

Reverse accumulation

Complexity

Code implementation

Posted 2022-04-03 12:40 by 英飞