Linear Regression Derivation

Reference material on matrix differentiation

Since everything we need to compute is a gradient, this post uses the denominator layout for all matrix derivatives.

First, the function we want to fit is

\[\begin{align*} y = XW + b \end{align*} \]

where

\[\begin{align*} \mathbf{X}=\begin{bmatrix}x_{1}\\ x_{2}\\ \vdots\\ x_{m} \end{bmatrix}, \mathbf{W}=\begin{bmatrix}w_{1} \end{bmatrix}, \mathbf{b}=\begin{bmatrix}b_{1} \end{bmatrix} \end{align*} \]

For convenience, X can be augmented with an extra column of ones, absorbing the bias into W:

\[\begin{align*} \mathbf{X}=\begin{bmatrix}x_{1} & 1\\ x_{2} & 1\\ \vdots & \vdots\\ x_{m} & 1\end{bmatrix}, \mathbf{W}=\begin{bmatrix}w_{1} \\ b_{1} \end{bmatrix} \end{align*} \]

The fitting function then becomes

\[\begin{align*} y = XW \end{align*} \]
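As a quick illustration of the augmentation trick, here is a minimal NumPy sketch (the sample values and variable names are made up for illustration):

```python
import numpy as np

# m samples with a single feature each: X has shape (m, 1)
X = np.array([[1.0], [2.0], [3.0]])
w1, b1 = 2.0, 0.5

# Append a column of ones so the bias b is absorbed into W
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # shape (m, 2)
W = np.array([[w1], [b1]])                          # shape (2, 1)

y = X_aug @ W      # same as x * w1 + b1, row by row
print(y.ravel())   # [2.5 4.5 6.5]
```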

Common matrix derivative formulas

First, take an \(m \times n\) matrix A and an \(n \times 1\) column vector x:

\[\begin{align*} A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots& a_{mn}\\ \end{bmatrix} \end{align*} \]

\[\begin{align*} x=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \end{align*} \]

\[\begin{align*} Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n\\ \end{bmatrix} \end{align*} \]

Taking the partial derivative of Ax with respect to x gives

\[\begin{align*} \frac{\partial Ax}{\partial x} = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \\ \end{bmatrix} = A^T \end{align*} \]
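This convention is easy to check numerically: in denominator layout, entry \((i, j)\) of the derivative is \(\partial (Ax)_j / \partial x_i\), so a finite-difference Jacobian should reproduce \(A^T\). A small NumPy sketch with randomly generated data (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.normal(size=(m, n))
x = rng.normal(size=n)

# Denominator layout: entry (i, j) is d(Ax)_j / d(x_i), an n x m matrix
eps = 1e-6
J = np.zeros((n, m))
for i in range(n):
    dx = np.zeros(n)
    dx[i] = eps
    J[i, :] = (A @ (x + dx) - A @ (x - dx)) / (2 * eps)

print(np.allclose(J, A.T))   # True
```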

Taking the partial derivative of Ax with respect to \(x^T\) gives

\[\begin{align*} \frac{\partial Ax}{\partial x^T} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots& a_{mn}\\ \end{bmatrix} = A \end{align*} \]

\[\begin{align*} \frac{\partial x^T A}{\partial x} &= \left[ \left( \frac{\partial x^T A}{\partial x} \right)^T\right]^T \\ &= \left[ \frac{\partial (x^T A)^T}{\partial x^T}\right]^T \\ &= \left[ \frac{\partial A^T x}{\partial x^T}\right]^T \\ &= (A^T)^T \\ &= A \end{align*} \]

\[\begin{align*} x^Tx = \begin{bmatrix}x_{1}^2 + x_{2}^2 + \cdots + x_{n}^2\end{bmatrix} \\ \frac{\partial x^Tx}{\partial x} = 2x \end{align*} \]

For a scalar formed as the product of two vector-valued functions u(x) and v(x), the denominator-layout product rule is

\[\begin{align*} u=u(x),\; v=v(x) \\ \frac{\partial u^Tv}{\partial x} = \frac{\partial u}{\partial x}v + \frac{\partial v}{\partial x}u \\ \frac{\partial x^Tx}{\partial x} = \frac{\partial x}{\partial x}x + \frac{\partial x}{\partial x}x = x + x = 2x \end{align*} \]
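A quick finite-difference sanity check of \(\frac{\partial x^Tx}{\partial x} = 2x\), again with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)
eps = 1e-6

# Central-difference gradient of f(x) = x^T x, one coordinate at a time
grad = np.array([
    ((x + eps * e) @ (x + eps * e) - (x - eps * e) @ (x - eps * e)) / (2 * eps)
    for e in np.eye(5)
])

print(np.allclose(grad, 2 * x))   # True
```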

Differentiating the loss function

The loss function is

\[\begin{align*} L(W) = \frac{1}{2m} \|XW - y\|^2 = \frac{1}{2m} (XW - y)^T(XW - y) \end{align*} \]
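Written out in NumPy, the loss is essentially one line (a minimal sketch; here X is assumed to already contain the bias column and y to be an \(m \times 1\) column vector):

```python
import numpy as np

def loss(W, X, y):
    """L(W) = 1/(2m) * ||XW - y||^2, with X of shape (m, n) and y of shape (m, 1)."""
    m = X.shape[0]
    r = X @ W - y                     # residual vector
    return (r.T @ r)[0, 0] / (2 * m)
```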

Method 1

\[\begin{align*} Z = f(W) = XW - y \end{align*} \]

Its derivative is then (note that, in denominator layout, the order of the factors matters):

\[\begin{align*} \frac{\partial L(W)}{\partial W} = \frac{\partial f(W)}{ \partial W} \frac{\partial L(f(W))}{ \partial f(W)} \end{align*} \]

where

\[\begin{align*} \frac{\partial f(W) }{\partial W} &= \frac{\partial (XW - y)}{\partial W} \\ &= \frac{\partial XW}{\partial W} - \frac{\partial y}{\partial W} \\ &= X^T - 0 \\ &= X^T \end{align*} \]

\[\begin{align*} \frac{\partial L(f(W))}{\partial f(W)} &= \frac{1}{2m}\frac{\partial Z^TZ}{\partial Z} \\ &= \frac{1}{2m}2Z \\ &= \frac{1}{m} (XW - y) \end{align*} \]

Therefore

\[\begin{align*} \frac{\partial L(W)}{\partial W} &= \frac{\partial f(W)}{ \partial W} \frac{\partial L(f(W))}{ \partial f(W)} \\ &= \frac{1}{m}X^T(XW - y) \end{align*} \]
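With this gradient, plain gradient descent follows directly. A minimal NumPy sketch (the learning rate, iteration count, and toy data are arbitrary choices for illustration):

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.05, n_iters=5000):
    """Gradient descent on L(W) = 1/(2m)||XW - y||^2, using grad = X^T (XW - y) / m."""
    m, n = X.shape
    W = np.zeros((n, 1))
    for _ in range(n_iters):
        grad = X.T @ (X @ W - y) / m
        W -= lr * grad
    return W

# Recover w1 = 2 and b1 = 0.5 from noiseless data (X already has the bias column)
X = np.hstack([np.arange(1.0, 6.0).reshape(-1, 1), np.ones((5, 1))])
y = X @ np.array([[2.0], [0.5]])
print(fit_linear_regression(X, y).ravel())   # close to [2.  0.5]
```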

Method 2

\[\begin{align*} \frac{\partial L(W) }{\partial W} &= \frac{1}{2m}\frac{\partial (XW - y)^T(XW - y)}{\partial W} \\ &= \frac{1}{2m}\left[\frac{\partial W^TX^TXW}{\partial W} - \frac{\partial y^TXW}{\partial W} - \frac{\partial W^TX^Ty}{\partial W} + \frac{\partial y^Ty}{\partial W} \right]\\ &= \frac{1}{2m}\left[\left(\frac{\partial W}{\partial W}X^TXW + \frac{\partial X^TXW}{\partial W}W\right) - (y^TX)^T - X^Ty + 0 \right]\\ &= \frac{1}{2m}\left[X^TXW + X^TXW - 2X^Ty \right]\\ &= \frac{1}{2m}\left[2X^T(XW - y)\right] \\ &= \frac{1}{m}X^T(XW - y) \end{align*} \]
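Both methods give the same gradient. As a final sanity check, the analytic expression can be compared against finite differences of the loss on random data (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=(m, 1))
W = rng.normal(size=(n, 1))

def loss(W):
    r = X @ W - y
    return (r.T @ r)[0, 0] / (2 * m)

analytic = X.T @ (X @ W - y) / m     # the gradient derived above

eps = 1e-6
numeric = np.zeros((n, 1))
for i in range(n):
    dW = np.zeros((n, 1))
    dW[i] = eps
    numeric[i] = (loss(W + dW) - loss(W - dW)) / (2 * eps)

print(np.allclose(analytic, numeric))   # True
```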

Do not reproduce without permission: https://spxcds.com/2018/10/07/linear_regression
