[Machine Learning] Softmax Regression
Softmax Regression
Multinomial Distribution
The target \(y\) takes exactly one value in \(\{1,2,\cdots,k\}\). Let \(\phi_1,\phi_2,\cdots,\phi_k\) be the probabilities of the respective values, so the probability mass function is \(p(y=i;\phi)=\phi_i\), with parameter vector:
\[\phi=
\left[\begin{matrix}
\phi_1 \\
\phi_2 \\
\vdots \\
\phi_k
\end{matrix}\right]
\]
Since \(\sum_{i=1}^k\phi_i=1\), we have \(\phi_k=1-\sum_{i=1}^{k-1}\phi_i\).
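As a concrete illustration, here is a minimal numpy sketch of this parameterization and of drawing samples from the distribution (the vector `phi` is an arbitrary example):

```python
import numpy as np

# Example for k = 4 classes: only phi_1 .. phi_{k-1} are free parameters;
# phi_k is determined by the constraint sum(phi) = 1.
phi_free = np.array([0.2, 0.3, 0.4])               # phi_1 .. phi_{k-1}
phi = np.append(phi_free, 1.0 - phi_free.sum())    # phi_k = 1 - sum

# Draw samples y in {1, ..., k} with P(y = i) = phi_i.
samples = np.random.choice(np.arange(1, 5), size=10, p=phi)
```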
Indicator Function
We write \(l\{\cdot\}\) for the indicator function: the expression inside the braces is a logical predicate, and the value of the function is the truth value of that predicate.
For example: \(l\{1=1\}=\text{True}=1\) and \(l\{2=3\}=\text{False}=0\).
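In code the indicator function is just a cast from a boolean predicate to 0/1; a one-line Python sketch:

```python
# Indicator function l{...}: 1 if the predicate holds, 0 otherwise.
def indicator(predicate: bool) -> int:
    return int(predicate)

indicator(1 == 1)  # -> 1
indicator(2 == 3)  # -> 0
```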
Deriving the Hypothesis Function via a GLM
\[\begin{aligned}
p(y;\phi)&=\prod_{i=1}^{k}\phi_i^{l\{y=i\}} \\
&=\prod_{i=1}^{k-1}\phi_i^{l\{y=i\}}\cdot\phi_k^{1-\sum_{j=1}^{k-1}l\{y=j\}} \\
&=\exp(l\{y=1\}\log(\phi_1)+l\{y=2\}\log(\phi_2)+\cdots+\left(1-\sum_{j=1}^{k-1}l\{y=j\}\right)\log(\phi_k)) \\
&=\exp(l\{y=1\}\log(\phi_1/\phi_k)+l\{y=2\}\log(\phi_2/\phi_k)+\cdots+\log(\phi_k)) \\
&=1\cdot\exp\left(
    \left[\begin{matrix}
    \log(\phi_1/\phi_k) \\
    \log(\phi_2/\phi_k) \\
    \vdots \\
    \log(\phi_{k-1}/\phi_k)
    \end{matrix}\right]^T
    \left[\begin{matrix}
    l\{y=1\} \\
    l\{y=2\} \\
    \vdots \\
    l\{y=k-1\}
    \end{matrix}\right]
    +\log(\phi_k)
\right) \\
&=b(y)\exp\left(\eta^TT(y)-a(\eta)\right) \\
\\
\eta&=
\left[\begin{matrix}
    \log(\phi_1/\phi_k) \\
    \log(\phi_2/\phi_k) \\
    \vdots \\
    \log(\phi_{k-1}/\phi_k)
    \end{matrix}\right] \\
T(y)&=
\left[\begin{matrix}
    l\{y=1\} \\
    l\{y=2\} \\
    \vdots \\
    l\{y=k-1\}
    \end{matrix}\right] \\
a(\eta)&=-\log(\phi_k) \\
b(y)&=1 \\
\end{aligned}\]
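The algebra above can be checked numerically: the exponential-family form \(b(y)\exp\left(\eta^TT(y)-a(\eta)\right)\) must reproduce the original \(\phi_i\). A minimal sketch, assuming an arbitrary example `phi` and using the definitions of \(\eta\), \(T(y)\), \(a(\eta)\), \(b(y)\) just derived:

```python
import numpy as np

phi = np.array([0.2, 0.3, 0.4, 0.1])     # arbitrary example, sums to 1
k = len(phi)

eta = np.log(phi[:-1] / phi[-1])          # eta_i = log(phi_i / phi_k)
a_eta = -np.log(phi[-1])                  # a(eta) = -log(phi_k)

def T(y):
    """T(y): (k-1)-vector of indicators l{y = i}, classes 1-indexed."""
    t = np.zeros(k - 1)
    if y < k:
        t[y - 1] = 1.0
    return t

for y in range(1, k + 1):
    p = 1.0 * np.exp(eta @ T(y) - a_eta)  # b(y) * exp(eta^T T(y) - a(eta))
    assert np.isclose(p, phi[y - 1])      # matches p(y; phi) = phi_y
```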
From this we obtain (defining \(\eta_k=\log(\phi_k/\phi_k)=0\) so that the sum below can run over all \(k\) values):
\[\begin{aligned}
\eta_i&=\log\frac{\phi_i}{\phi_k} \\
\phi_ke^{\eta_i}&=\phi_i \\
\phi_k\sum_{i=1}^ke^{\eta_i}&=\sum_{i=1}^k\phi_i=1 \\
\phi_i&=\frac{e^{\eta_i}}{\sum_{j=1}^{k}e^{\eta_j}} \\
p(y=i\mid x;\theta)&=\phi_i \\
&=\frac{e^{\eta_i}}{\sum_{j=1}^{k}e^{\eta_j}} \\
&=\frac{e^{\theta_i^Tx}}{\sum_{j=1}^{k}e^{\theta_j^Tx}}
\end{aligned}\]
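The map from \(\eta\) to \(\phi\) derived here is the softmax function. In practice it is computed with the maximum subtracted first; the shift cancels between numerator and denominator, so the result is identical but overflow is avoided. A minimal numpy sketch:

```python
import numpy as np

def softmax(eta):
    """phi_i = exp(eta_i) / sum_j exp(eta_j), computed stably."""
    z = eta - np.max(eta)   # shifting cancels out; prevents overflow
    e = np.exp(z)
    return e / e.sum()

softmax(np.array([1.0, 2.0, 3.0]))  # probabilities summing to 1
```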
Note: in a GLM the prediction target is \(T(y)\), so the output here is not the class label \(y\) itself; moreover, a separate parameter vector \(\theta_i\) must be trained for each class.
Therefore, the hypothesis function is:
\[\begin{aligned}
h_\theta(x)&=E[T(y)\mid x;\theta] \\
&=E\left[\,\left[\begin{matrix}
    l\{y=1\} \\
    l\{y=2\} \\
    \vdots \\
    l\{y=k-1\}
    \end{matrix}\right]\,\middle|\;x;\theta\right] \\
&= \left[
    \begin{array}{c}
    \phi_1 \\
    \phi_2 \\
    \vdots \\
    \phi_{k-1}
    \end{array}
\right] \\
&= \left[
    \begin{array}{c}
    \frac{\exp(\theta_1^Tx)}{\sum_{j=1}^{k}\exp(\theta_j^Tx)} \\
    \frac{\exp(\theta_2^Tx)}{\sum_{j=1}^{k}\exp(\theta_j^Tx)} \\
    \vdots \\
    \frac{\exp(\theta_{k-1}^Tx)}{\sum_{j=1}^{k}\exp(\theta_j^Tx)} \\
    \end{array}
\right]
\end{aligned}\]
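A sketch of this hypothesis function, assuming the per-class parameters are stacked into a matrix `Theta` of shape `(k, n)` with one row \(\theta_i^T\) per class (the layout and names are illustrative):

```python
import numpy as np

def h(Theta, x):
    """Hypothesis h_theta(x): the vector (phi_1, ..., phi_{k-1}) for input x.

    Theta: (k, n) matrix whose i-th row is theta_i^T (illustrative layout).
    x:     (n,)   feature vector.
    """
    logits = Theta @ x                  # eta_i = theta_i^T x for every class
    z = np.exp(logits - logits.max())   # stable exponentiation
    phi = z / z.sum()                   # phi_1 .. phi_k
    return phi[:-1]                     # the GLM predicts T(y): first k-1 entries
```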
Deriving the Likelihood Function
Suppose there are \(m\) samples in total:
\[\begin{aligned}
L(\theta)&=\prod_{i=1}^{m}p(y^{(i)}\mid x^{(i)};\theta) \\
\ell(\theta)&=\sum_{i=1}^{m}\log p(y^{(i)}\mid x^{(i)};\theta) \\
&=\sum_{i=1}^{m}\log\prod_{c=1}^{k}\left(\frac{e^{\theta_c^Tx^{(i)}}}{\sum_{j=1}^{k}e^{\theta_j^Tx^{(i)}}}\right)^{l\{y^{(i)}=c\}}
\end{aligned}\]
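This log-likelihood is the negative of the cross-entropy loss used in practice: for each sample, the product over classes keeps only the log-probability of the true class. A minimal sketch computing \(\ell(\theta)\), under the same illustrative `Theta` layout as above, with `X` of shape `(m, n)` and 1-indexed labels `y`:

```python
import numpy as np

def log_likelihood(Theta, X, y):
    """ell(theta) = sum_i log p(y^(i) | x^(i); theta).

    Theta: (k, n) parameters, X: (m, n) samples, y: (m,) labels in {1, ..., k}.
    """
    logits = X @ Theta.T                               # (m, k): theta_c^T x^(i)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_phi = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return log_phi[np.arange(len(y)), y - 1].sum()     # log phi of true class
```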
Hence the optimization problem of Softmax regression is:
\[\begin{aligned}
\max\limits_{\theta}\ell(\theta)
\end{aligned}\]