## PRML Reading Notes: Linear Regression Models (Part 1)

Given a training data set comprising $N$ observations $\{x_n\}$, where $n = 1, ... , N$, together with corresponding target values $\{t_n\}$, the goal is to predict the value of $t$ for a new value of $x$.

## Linear Basis Function Models

$y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + ... + w_D x_D$

$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1}{w_j \phi_j(\mathbf{x})}$

$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1}{w_j \phi_j(\mathbf{x})} = \mathbf{w}^T \mathbf{\phi(x)}$

1. Power basis: $\phi_j(x) = x^j$
2. Gaussian basis: $\phi_j(x) = \exp\{-\frac{(x-\mu_j)^2}{2s^2}\}$
3. Sigmoidal basis: $\phi_j(x) = \sigma(\frac{x-\mu_j}{s})$, where $\sigma(a) = \frac{1}{1+\exp(-a)}$
4. Fourier basis: analogous to the wavelet transform in signal processing
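To make the basis functions concrete, here is a minimal NumPy sketch (not from the original notes; the helper name `gaussian_basis` is my own) that builds the $N \times M$ design matrix $\mathbf{\Phi}$ for the Gaussian basis, with $\phi_0(x) = 1$ as the bias column:

```python
import numpy as np

def gaussian_basis(x, mus, s):
    """Design matrix for Gaussian basis functions.

    phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)) for j = 1..M-1,
    plus a constant column phi_0(x) = 1.

    x: (N,) input points; mus: (M-1,) centres mu_j; s: shared width.
    Returns the N x M matrix Phi with Phi[n, j] = phi_j(x_n).
    """
    phi = np.exp(-((x[:, None] - mus[None, :]) ** 2) / (2.0 * s ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])  # prepend bias column

x = np.linspace(0.0, 1.0, 5)
Phi = gaussian_basis(x, mus=np.linspace(0.0, 1.0, 3), s=0.2)
print(Phi.shape)  # (5, 4)
```

Swapping in `x ** j` columns or the logistic sigmoid in place of the exponential gives the power and sigmoidal bases, respectively.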

### Maximum Likelihood and Least Squares

$t \sim p(t|\mathbf{x}, \mathbf{w}, \beta)=\mathcal{N}(t|y(\mathbf{x}, \mathbf{w}), \beta^{-1})$

$E[L]=\int\int{\{y(x)-t\}^2p(x, t)dxdt}$

$y(\mathbf{x})=E[t|\mathbf{x}]=\int{tp(t|\mathbf{x})}dt=y(\mathbf{x}, \mathbf{w})$

$p(\mathbf{t}|\mathbf{w}, \beta) = \prod_{n=1}^N{\mathcal{N}(t_n|\mathbf{w}^T\mathbf{\phi(x_n)}, \beta^{-1})}$

\begin{align*} \ln{p(\mathbf{t}|\mathbf{w}, \beta)} & = \sum_{n=1}^N{\ln\mathcal{N}(t_n|\mathbf{w}^T\mathbf{\phi(x_n)}, \beta^{-1})} \\ & = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w}) \end{align*}

\begin{align*} \mathbf{w}_{ML} & = (\mathbf{\Phi}^T\mathbf{\Phi})^{-1}\mathbf{\Phi}^T\mathbf{t} \\ \beta_{ML}^{-1} & = \frac{1}{N}\sum_{n=1}^N{\{t_n-\mathbf{w}_{ML}^T\mathbf{\phi(x_n)}\}^2} \end{align*}
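The two maximum-likelihood formulas above can be checked numerically. A minimal sketch on synthetic data (my own toy setup, not from the notes): $\mathbf{w}_{ML}$ solves the normal equations, computed stably via least squares rather than an explicit inverse, and $\beta_{ML}^{-1}$ is the mean squared residual under $\mathbf{w}_{ML}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: t = sin(2*pi*x) + Gaussian noise, with a power basis phi_j(x) = x^j.
N, M = 50, 4
x = rng.uniform(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, N)
Phi = np.vander(x, M, increasing=True)  # Phi[n, j] = x_n ** j

# w_ML = (Phi^T Phi)^{-1} Phi^T t, computed via least squares for stability.
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# beta_ML^{-1} = (1/N) * sum_n (t_n - w_ML^T phi(x_n))^2.
beta_inv_ml = np.mean((t - Phi @ w_ml) ** 2)
```

`np.linalg.lstsq` returns the same minimizer as the pseudo-inverse formula but avoids forming $\mathbf{\Phi}^T\mathbf{\Phi}$, which can be ill-conditioned.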

### Sequential Learning Algorithms

When SGD is applied to the linear regression model, the update rule is:
$\mathbf{w}^{(\tau+1)}=\mathbf{w}^{(\tau)}+\eta(t_n-\mathbf{w}^{(\tau)T}\mathbf{\phi_n})\mathbf{\phi_n}$
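This update (known as least-mean-squares, LMS) processes one data point at a time. A minimal sketch, with my own toy data and the hypothetical helper name `sgd_step`:

```python
import numpy as np

def sgd_step(w, phi_n, t_n, eta):
    """One LMS update: w <- w + eta * (t_n - w^T phi_n) * phi_n."""
    return w + eta * (t_n - w @ phi_n) * phi_n

# Toy data generated from known weights plus small noise.
rng = np.random.default_rng(1)
Phi = np.vander(rng.uniform(0.0, 1.0, 100), 4, increasing=True)
t = Phi @ np.array([0.5, -1.0, 2.0, 0.3]) + rng.normal(0.0, 0.01, 100)

# One pass over the data, updating on each (phi_n, t_n) pair in turn.
w = np.zeros(4)
for n in range(100):
    w = sgd_step(w, Phi[n], t[n], eta=0.1)
```

The learning rate $\eta$ must be chosen with care: too large and the updates diverge, too small and convergence is needlessly slow.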

### Regularized Least Squares

$E_W(\mathbf{w}) = \frac{1}{2}\mathbf{w^Tw}$

$E(\mathbf{w})=\frac{1}{2}\sum_{n=1}^N\{t_n-\mathbf{w}^T\mathbf{\phi(x_n)}\}^2+\frac{\lambda}{2}\mathbf{w^Tw}$

L2 regularization is also known as the weight-decay technique: differentiating the expression above with respect to $\mathbf{w}$ yields a decay term proportional to $\mathbf{w}$. A more general regularizer takes the form:
$E_W(\mathbf{w}) = \frac{1}{2}\sum_{j=1}^M|w_j|^q$
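For the quadratic case $q = 2$, the regularized error remains quadratic in $\mathbf{w}$ and has the closed-form minimizer $\mathbf{w} = (\lambda\mathbf{I} + \mathbf{\Phi}^T\mathbf{\Phi})^{-1}\mathbf{\Phi}^T\mathbf{t}$. A minimal sketch (the helper name `ridge_fit` is my own):

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Closed-form minimizer of the L2-regularized sum-of-squares error:
    w = (lam * I + Phi^T Phi)^{-1} Phi^T t, via a linear solve."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# Usage: fit a power-basis model with a modest penalty.
x = np.linspace(0.0, 1.0, 20)
Phi = np.vander(x, 4, increasing=True)
t = np.sin(2 * np.pi * x)
w = ridge_fit(Phi, t, lam=0.1)
```

Increasing $\lambda$ shrinks $\|\mathbf{w}\|$, which is the "decay" in weight decay; for $q \ne 2$ (e.g. the lasso, $q = 1$) no such closed form exists and iterative optimization is needed.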

## Bias-Variance Decomposition

$E[L]=\int\int{\{y(\mathbf{x})-t\}^2p(\mathbf{x}, t)d\mathbf{x}dt}$

$E[L] = \int{\{y(\mathbf{x})-h(\mathbf{x})\}^2p(\mathbf{x})}d\mathbf{x} + \int\int{\{h(\mathbf{x})-t\}^2p(\mathbf{x}, t)}d\mathbf{x}dt$

$h(\mathbf{x})=E[t|\mathbf{x}]=\int{tp(t|\mathbf{x})}dt$

$E_{\mathcal{D}}[\{y(\mathbf{x}, \mathcal{D})-h(\mathbf{x})\}^2] = \{E_{\mathcal{D}}[y(\mathbf{x}, \mathcal{D})]-h(\mathbf{x})\}^2 + E_{\mathcal{D}}[\{y(\mathbf{x}, \mathcal{D})-E_{\mathcal{D}}[y(\mathbf{x}, \mathcal{D})]\}^2]$
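The two terms of this decomposition, $(\text{bias})^2$ and variance, can be estimated by fitting the same model to many independently drawn datasets $\mathcal{D}$ and averaging, in the spirit of PRML's sinusoidal example. A minimal sketch under my own assumed setup ($h(x) = \sin(2\pi x)$, degree-3 power basis, noise std 0.3):

```python
import numpy as np

rng = np.random.default_rng(2)
h = lambda x: np.sin(2 * np.pi * x)   # assumed "true" regression function h(x)
x_test = np.linspace(0.0, 1.0, 25)    # evaluation grid for the expectations over x

# Fit the same degree-3 polynomial model on L independent datasets D.
L, N = 200, 25
preds = np.empty((L, x_test.size))
for l in range(L):
    x = rng.uniform(0.0, 1.0, N)
    t = h(x) + rng.normal(0.0, 0.3, N)
    w, *_ = np.linalg.lstsq(np.vander(x, 4, increasing=True), t, rcond=None)
    preds[l] = np.vander(x_test, 4, increasing=True) @ w

y_bar = preds.mean(axis=0)                   # estimate of E_D[y(x, D)]
bias_sq = np.mean((y_bar - h(x_test)) ** 2)  # (bias)^2 term, averaged over x
variance = np.mean(preds.var(axis=0))        # variance term, averaged over x
```

Repeating this with a more flexible model (higher polynomial degree or smaller regularization) should shift the balance toward lower bias and higher variance.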

posted on 2017-05-08 11:56 by 公子天
