Common Matrix Derivative Formulas

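    1 Derivative of a matrix \(Y=f(x)\) with respect to a scalar \(x\)

         \(Y\) is an \(m\times n\) matrix. Differentiating it with respect to the scalar \(x\) means differentiating each of its elements with respect to \(x\):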

    \[\frac{dY}{dx}=\begin{bmatrix}\dfrac{df_{11}(x)}{dx} & \ldots & \dfrac{df_{1n}(x)}{dx} \\ \vdots & \ddots &\vdots \\ \dfrac{df_{m1}(x)}{dx} & \ldots & \dfrac{df_{mn}(x)}{dx} \end{bmatrix}\]
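    A quick numerical check makes the definition concrete. The sketch below assumes NumPy; the matrix function `Y` and its hand-computed derivative `dY_analytic` are made-up examples, not part of the original text. A central difference applied to the whole matrix differentiates every element at once:

```python
import numpy as np

def Y(x):
    # Illustrative 2x3 matrix function of the scalar x
    return np.array([[x**2, np.sin(x), 1.0],
                     [np.exp(x), 3.0 * x, x**3]])

def dY_analytic(x):
    # Element-wise derivative of Y with respect to x
    return np.array([[2.0 * x, np.cos(x), 0.0],
                     [np.exp(x), 3.0, 3.0 * x**2]])

def dY_numeric(x, h=1e-6):
    # A central difference on the whole matrix differentiates every element at once
    return (Y(x + h) - Y(x - h)) / (2.0 * h)

x0 = 0.7
print(np.allclose(dY_numeric(x0), dY_analytic(x0), atol=1e-6))  # True
print(dY_analytic(x0).shape)                                     # (2, 3), same shape as Y
```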

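    2 Derivative of a scalar \(y=f(X)\) with respect to a matrix \(X\)

         Unlike the previous case, the entries here are partial derivatives. \(X\) is an \(m\times n\) matrix, and \(y=f(X)\) is partially differentiated with respect to each element of \(X\), so the derivative of a scalar with respect to an \(m\times n\) matrix is again an \(m\times n\) matrix: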

    \[\frac{dy}{dX} = \begin{bmatrix}\dfrac{\partial f}{\partial x_{11}} & \ldots & \dfrac{\partial f}{\partial x_{1n}}\\ \vdots & \ddots & \vdots \\\dfrac{\partial f}{\partial x_{m1}} & \ldots & \dfrac{\partial f}{\partial x_{mn}}\end{bmatrix}\]
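    The same definition can be checked numerically. This is a minimal sketch assuming NumPy; `num_grad` is a hypothetical helper that perturbs one entry \(x_{ij}\) at a time, and \(f(X)=\sum_{i,j}x_{ij}^2\) is an illustrative choice whose gradient is \(2X\):

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference estimate of df/dX: one partial derivative per entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

def f(X):
    # Illustrative scalar function: sum of squared entries
    return np.sum(X**2)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))            # m x n = 3 x 4
G = num_grad(f, X)
print(G.shape)                             # (3, 4): same shape as X
print(np.allclose(G, 2 * X, atol=1e-6))    # True, since d(x_ij^2)/dx_ij = 2 x_ij
```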


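    3 Derivative of a matrix function \(Y\) with respect to a matrix \(X\)

    Differentiating the \(m\times n\) matrix \(Y=F(X)\) with respect to every element of the \(r\times s\) matrix \(X\) produces a block ("super") matrix of size \(mr\times ns\). For

    \[F(X)=\begin{bmatrix}f_{11}(X) & \ldots &  f_{1n}(X)\\ \vdots & \ddots &\vdots \\ f_{m1}(X) & \ldots & f_{mn}(X) \end{bmatrix}\]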

    \[X=\begin{bmatrix}x_{11} & \ldots &  x_{1s}\\ \vdots & \ddots &\vdots \\ x_{r1} & \ldots & x_{rs}\end{bmatrix}\]

    \[\frac{dF}{dX} = \begin{bmatrix}\dfrac{\partial F}{\partial x_{11}} & \ldots & \dfrac{\partial F}{\partial x_{1s}}\\ \vdots & \ddots & \vdots \\\dfrac{\partial F}{\partial x_{r1}} & \ldots & \dfrac{\partial F}{\partial x_{rs}}\end{bmatrix}\]

    where

    \[\frac{\partial F}{\partial x_{ij}} = \begin{bmatrix}\dfrac{\partial f_{11}}{\partial x_{ij}} & \ldots & \dfrac{\partial f_{1n}}{\partial x_{ij}}\\ \vdots & \ddots & \vdots \\\dfrac{\partial f_{m1}}{\partial x_{ij}} & \ldots & \dfrac{\partial f_{mn}}{\partial x_{ij}}\end{bmatrix}\]
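    To make the block layout concrete, the sketch below (assuming NumPy; `matrix_by_matrix`, `A` and the choice \(F(X)=AX\) are illustrative) fills block \((i,j)\) of an \(mr\times ns\) array with a central-difference estimate of \(\partial F/\partial x_{ij}\). For \(F(X)=AX\), each block has column \(j\) equal to column \(i\) of \(A\) and zeros elsewhere:

```python
import numpy as np

def matrix_by_matrix(F, X, h=1e-6):
    """Assemble dF/dX as a block matrix: block (i, j) is dF/dx_ij (central differences)."""
    X = np.asarray(X, dtype=float)
    r, s = X.shape
    m, n = F(X).shape
    D = np.zeros((r * m, s * n))
    for i in range(r):
        for j in range(s):
            E = np.zeros_like(X)
            E[i, j] = h
            block = (F(X + E) - F(X - E)) / (2 * h)    # m x n matrix dF/dx_ij
            D[i * m:(i + 1) * m, j * n:(j + 1) * n] = block
    return D

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))     # illustrative constant matrix
X = rng.standard_normal((3, 4))     # r x s = 3 x 4
F = lambda Z: A @ Z                 # F(X) = AX is m x n = 2 x 4

D = matrix_by_matrix(F, X)
print(D.shape)                      # (6, 16) = (r*m, s*n)

# For F(X) = AX, dF/dx_ij has column j equal to A[:, i] and zeros elsewhere
i, j = 1, 2
expected = np.zeros((2, 4))
expected[:, j] = A[:, i]
print(np.allclose(D[i*2:(i+1)*2, j*4:(j+1)*4], expected, atol=1e-6))  # True
```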

    4 Vector derivatives

    Let \(y=[y_1,y_2,\ldots,y_m]^T\) be an \(m\times 1\) vector function whose components \(y_1,y_2,\ldots,y_m\) are scalar functions of the \(n\times 1\) vector \(x\). Then

    \[\frac{\partial y}{\partial x^T} = \begin{bmatrix}\dfrac{\partial y_1}{\partial x_1} & \ldots & \dfrac{\partial y_1}{\partial x_n}\\ \vdots & \ddots & \vdots \\\dfrac{\partial y_m}{\partial x_1} & \ldots & \dfrac{\partial y_m}{\partial x_n}\end{bmatrix}\]

    This \(m\times n\) matrix is called the Jacobian matrix of the vector function \(y\).
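    A finite-difference Jacobian is a handy way to check such matrices. In this sketch (assuming NumPy; `jacobian_fd` and the example function `y` are hypothetical), column \(k\) is estimated by perturbing \(x_k\), so row \(i\) collects the partial derivatives of \(y_i\):

```python
import numpy as np

def jacobian_fd(y, x, h=1e-6):
    """Central-difference Jacobian: row i holds dy_i/dx_1, ..., dy_i/dx_n."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(y(x)).size
    J = np.zeros((m, x.size))
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        J[:, k] = (y(x + e) - y(x - e)) / (2 * h)   # column k: derivative w.r.t. x_k
    return J

def y(x):
    # Illustrative vector function from R^3 to R^2
    return np.array([x[0] * x[1], np.sin(x[0]) + x[2]**2])

x0 = np.array([0.5, -1.0, 2.0])
J_exact = np.array([[x0[1], x0[0], 0.0],
                    [np.cos(x0[0]), 0.0, 2 * x0[2]]])
J = jacobian_fd(y, x0)
print(J.shape)                              # (2, 3): m x n
print(np.allclose(J, J_exact, atol=1e-6))   # True
```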

    In particular, taking \(y=x=[x_1,x_2,\ldots,x_n]^T\) gives

    \[\frac{\partial{x^T}}{\partial{x}}=I\]

    where \(I\) is the \(n\times n\) identity matrix.

    If \(A\) and \(y\) are both independent of the vector \(x\), then

    \[\frac{\partial{x^TAy}}{\partial{x}}=\frac{\partial{x^T}}{\partial{x}}Ay=Ay \tag{$1$}\]

    Since the vector inner product is symmetric, \(y^TAx=\langle A^Ty,x\rangle=\langle x,A^Ty\rangle=x^TA^Ty\), so

    \[\frac{\partial{y^TAx}}{\partial{x}}=\frac{\partial{x^TA^Ty}}{\partial{x}}=A^Ty \tag{$2$}\]
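    Formulas (1) and (2) are easy to verify numerically. The sketch assumes NumPy and uses a hypothetical central-difference helper `num_grad`; `A`, `y` and `x` are random test data, with \(A\) left non-symmetric so that \(Ay\) and \(A^Ty\) differ:

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))    # generally not symmetric
y = rng.standard_normal(n)         # constant vector, independent of x
x = rng.standard_normal(n)

g1 = num_grad(lambda v: v @ A @ y, x)   # gradient of x^T A y
g2 = num_grad(lambda v: y @ A @ v, x)   # gradient of y^T A x
print(np.allclose(g1, A @ y, atol=1e-6))     # True: formula (1)
print(np.allclose(g2, A.T @ y, atol=1e-6))   # True: formula (2)
```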

    Since \(x^TAx=\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}A_{ij}x_ix_j\) for an \(n\times n\) matrix \(A\),

    the \(k\)-th component of the gradient \(\frac{\partial{x^TAx}}{\partial{x}}\) is

    \[\bigg[\frac{\partial{x^TAx}}{\partial{x}}\bigg]_k=\frac{\partial}{\partial{x_k}}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}A_{ij}x_ix_j=\sum\limits_{i=1}^nA_{ik}x_i+\sum\limits_{j=1}^{n}A_{kj}x_j\]

    Because \(\sum_{i}A_{ik}x_i=(A^Tx)_k\) and \(\sum_{j}A_{kj}x_j=(Ax)_k\), this gives the formula

    \[\frac{\partial{x^TAx}}{\partial{x}}=Ax+A^Tx \tag{$3$}\]

    In particular, if \(A\) is symmetric, then \(\frac{\partial{x^TAx}}{\partial{x}}=2Ax\).
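    A numerical check of formula (3) and its symmetric special case, again a sketch assuming NumPy with a hypothetical helper `num_grad` and random test data; the symmetric case uses \(S=A+A^T\):

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))      # generally not symmetric
x = rng.standard_normal(n)

g = num_grad(lambda v: v @ A @ v, x)
print(np.allclose(g, A @ x + A.T @ x, atol=1e-6))   # True: formula (3)

S = A + A.T                                         # a symmetric matrix
g_sym = num_grad(lambda v: v @ S @ v, x)
print(np.allclose(g_sym, 2 * S @ x, atol=1e-6))     # True: symmetric special case
```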

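    Together with the three formulas above, the following are some commonly used gradient formulas for a real-valued function \(f(x)\) with respect to a column vector \(x\):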

    If \(f(x)=c\) is a constant, its gradient is \(\frac{\partial{c}}{\partial{x}}=0\), the zero vector.

    Linearity rule: if \(f(x)\) and \(g(x)\) are real-valued functions of the vector \(x\) and \(c_1\) and \(c_2\) are real constants, then

    \[\frac{\partial{[c_1f(x)+c_2g(x)]}}{\partial{x}}=c_1\frac{\partial{f(x)}}{\partial{x}}+c_2\frac{\partial{g(x)}}{\partial{x}}\]

    Product rule: if \(f(x)\) and \(g(x)\) are both real-valued functions of the vector \(x\), then

    \[\frac{\partial{f(x)g(x)}}{\partial{x}}=g(x)\frac{\partial{f(x)}}{\partial{x}}+f(x)\frac{\partial{g(x)}}{\partial{x}}\]

    If \(f(x)\), \(g(x)\) and \(h(x)\) are all real-valued functions of the vector \(x\), then

    \[\frac{\partial{f(x)g(x)h(x)}}{\partial{x}}=g(x)h(x)\frac{\partial{f(x)}}{\partial{x}}+f(x)h(x)\frac{\partial{g(x)}}{\partial{x}}+f(x)g(x)\frac{\partial{h(x)}}{\partial{x}}\]

    Quotient rule: if \(g(x)\neq0\), then

    \[\frac{\partial{f(x)/g(x)}}{\partial{x}}=\frac{1}{g^2(x)}\big[g(x)\frac{\partial{f(x)}}{\partial{x}}-f(x)\frac{\partial{g(x)}}{\partial{x}}\big]\]
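    The product and quotient rules can be checked the same way. In this sketch (assuming NumPy; `num_grad`, `f`, `g` and `a` are illustrative), \(f(x)=x^Tx\) has gradient \(2x\), and \(g(x)=1+(a^Tx)^2\) is always positive with gradient \(2(a^Tx)a\):

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(4)
n = 3
a = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda v: v @ v                 # f(x) = x^T x
g = lambda v: 1.0 + (a @ v) ** 2    # g(x) = 1 + (a^T x)^2 > 0
df = 2 * x                          # gradient of f at x
dg = 2 * (a @ x) * a                # gradient of g at x

# Product rule
lhs_prod = num_grad(lambda v: f(v) * g(v), x)
rhs_prod = g(x) * df + f(x) * dg
# Quotient rule
lhs_quot = num_grad(lambda v: f(v) / g(v), x)
rhs_quot = (g(x) * df - f(x) * dg) / g(x) ** 2
print(np.allclose(lhs_prod, rhs_prod, atol=1e-6))   # True
print(np.allclose(lhs_quot, rhs_quot, atol=1e-6))   # True
```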

    Chain rule: if \(y(x)\) is a vector-valued function of \(x\), then

    \[\frac{\partial{f(y(x))}}{\partial{x}}=\frac{\partial{y^T(x)}}{\partial{x}}\frac{\partial{f(y)}}{\partial{y}}\]

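    where \(\frac{\partial{y^T(x)}}{\partial{x}}\) is an \(n\times n\) matrix when \(y(x)\), like \(x\), is \(n\times 1\) (in general it is \(n\times m\) for an \(m\times 1\) vector \(y(x)\)).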
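    The chain rule can be verified numerically as well. This sketch assumes NumPy; `num_grad`, the matrix `B` and the choices \(y(x)=\sin(Bx)\) (element-wise) and \(f(y)=y^Ty\) are hypothetical. Here \(y\) is \(2\times 1\) and \(x\) is \(3\times 1\), so \(\partial y^T/\partial x\) is the \(3\times 2\) transpose of the Jacobian:

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(5)
m, n = 2, 3
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)

y = lambda v: np.sin(B @ v)      # illustrative y(x): R^n -> R^m
fy = lambda u: u @ u             # f(y) = y^T y, so df/dy = 2y

# Jacobian dy/dx^T has entries cos((Bx)_i) * B_ik; dy^T/dx is its n x m transpose
dyT_dx = (np.cos(B @ x)[:, None] * B).T
df_dy = 2 * y(x)
chain = dyT_dx @ df_dy

print(dyT_dx.shape)                                                    # (3, 2)
print(np.allclose(num_grad(lambda v: fy(y(v)), x), chain, atol=1e-6))  # True
```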

    If the \(n\times 1\) vector \(\alpha\) is a constant independent of \(x\), then

    \[\frac{\partial{\alpha^Ty(x)}}{\partial{x}}=\frac{\partial{y^T(x)}}{\partial{x}}\alpha\]

    \[\frac{\partial{y^T(x)\alpha}}{\partial{x}}=\frac{\partial{y^T(x)}}{\partial{x}}\alpha\]

    Let \(x\) be an \(n\times 1\) vector, \(\alpha\) an \(m\times 1\) constant vector, and \(A\) and \(B\) constant \(m\times n\) and \(m\times m\) matrices, respectively, with \(B\) symmetric. Then

    \[\frac{\partial{(\alpha-Ax)^TB(\alpha-Ax)}}{\partial{x}}=-2A^TB(\alpha-Ax)\]
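    This gradient is the one that appears in weighted least squares. The sketch below (assuming NumPy; `num_grad`, `J` and the random data are illustrative) checks it numerically and then, taking \(B\) positive definite, solves the normal equations \(A^TBAx=A^TB\alpha\) obtained by setting the gradient to zero:

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(6)
m, n = 4, 3
A = rng.standard_normal((m, n))
M = rng.standard_normal((m, m))
B = M @ M.T + np.eye(m)            # symmetric (and positive definite)
alpha = rng.standard_normal(m)
x = rng.standard_normal(n)

def J(v):
    # Weighted squared residual (alpha - Av)^T B (alpha - Av)
    r = alpha - A @ v
    return r @ B @ r

grad_formula = -2 * A.T @ B @ (alpha - A @ x)
print(np.allclose(num_grad(J, x), grad_formula, atol=1e-6))   # True

# Setting the gradient to zero gives the normal equations A^T B A x = A^T B alpha
x_star = np.linalg.solve(A.T @ B @ A, A.T @ B @ alpha)
print(np.allclose(A.T @ B @ (alpha - A @ x_star), 0.0, atol=1e-8))  # True
```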


    5 Gradient matrices of trace functions

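    A quadratic objective function can be rewritten with the matrix trace: a scalar can be regarded as a \(1\times 1\) matrix, so the trace of a quadratic objective equals the function itself, i.e.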

    \[f(x)=x^TAx=tr(x^TAx)=tr(Axx^T)\]

    \[\frac{\partial{tr(A)}}{\partial{A}}=I \tag{$1$}\]

    \[\frac{\partial{tr(AB)}}{\partial{A}}=B^T \tag{$2$}\]

    Since \(tr(xy^T)=tr(yx^T)=x^Ty\), it follows that

    \[\frac{\partial{tr(xy^T)}}{\partial{x}}=\frac{\partial{tr(yx^T)}}{\partial{x}}=y \tag{$3$} \]
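    The trace rewrite and the three trace gradients can be checked numerically. This sketch assumes NumPy; `num_grad` is a hypothetical helper that works for both vectors and matrices, and all inputs are random test data:

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
y = rng.standard_normal(3)

# The quadratic form equals its trace: x^T A x = tr(A x x^T)
print(np.isclose(x @ A @ x, np.trace(A @ np.outer(x, x))))   # True

g_tr = num_grad(np.trace, A)                                # d tr(A)/dA, expected I
g_trAB = num_grad(lambda Z: np.trace(Z @ B), A)             # d tr(AB)/dA, expected B^T
g_xy = num_grad(lambda v: np.trace(np.outer(v, y)), x)      # d tr(x y^T)/dx, expected y
print(np.allclose(g_tr, np.eye(3), atol=1e-6))   # True
print(np.allclose(g_trAB, B.T, atol=1e-6))       # True
print(np.allclose(g_xy, y, atol=1e-6))           # True
```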

    When the \(m\times m\) matrix \(W\) is invertible,

    \[\frac{\partial{tr(W^{-1})}}{\partial{W}}=-(W^{-2})^T \tag{$4$}\]
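    The gradient of \(tr(W^{-1})\) can be derived from \(d(W^{-1})=-W^{-1}(dW)W^{-1}\): taking traces and using the cyclic property, \(d\,tr(W^{-1})=-tr(W^{-2}\,dW)\), so the gradient equals \(-(W^{-2})^T\). The sketch below (assuming NumPy; `num_grad` is a hypothetical helper and \(W\) is a random matrix shifted by \(5I\) to keep it safely invertible) checks this numerically:

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(8)
W = rng.standard_normal((3, 3)) + 5 * np.eye(3)   # shifted to stay well away from singular
W_inv = np.linalg.inv(W)

G = num_grad(lambda Z: np.trace(np.linalg.inv(Z)), W)
print(np.allclose(G, -(W_inv @ W_inv).T, atol=1e-6))   # True
```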

    A few more formulas:

    \[\frac{\partial{f(A)}}{\partial{A^T}}=(\frac{\partial{f(A)}}{\partial{A}})^T\]

    \[\frac{\partial{tr(ABA^TC)}}{\partial{A}}= CAB + C^TAB^T \]

    \[\frac{\partial{|A|}}{\partial{A}}=|A|(A^{-1})^T\]
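    The last two formulas can be checked the same way. The sketch assumes NumPy; `num_grad` is a hypothetical helper, and `A`, `B`, `C`, `W` are random test matrices with compatible shapes (\(W\) shifted by \(5I\) so that it is invertible):

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 2))
C = rng.standard_normal((3, 3))

# d tr(A B A^T C)/dA = C A B + C^T A B^T
G_tr = num_grad(lambda Z: np.trace(Z @ B @ Z.T @ C), A)
print(np.allclose(G_tr, C @ A @ B + C.T @ A @ B.T, atol=1e-6))   # True

# d|W|/dW = |W| (W^{-1})^T
W = rng.standard_normal((3, 3)) + 5 * np.eye(3)                 # invertible test matrix
G_det = num_grad(np.linalg.det, W)
print(np.allclose(G_det, np.linalg.det(W) * np.linalg.inv(W).T, atol=1e-6))  # True
```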

posted on 2017-09-24 18:57  迈克老狼2012