# Logistic Regression and Maximum Entropy

#### 0 - The logistic distribution

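The logistic distribution with location parameter $\mu$ and scale parameter $\gamma > 0$ has the distribution function

$F(x) = P(X \leq x)=\frac{1}{1+e^{-(x-\mu)/\gamma}}$

and the density

$f(x) = F'(x) = \frac{e^{-(x-\mu)/\gamma}}{\gamma\left(1+e^{-(x-\mu)/\gamma}\right)^2}$

The curve of $F$ is point-symmetric about $(\mu, \frac{1}{2})$:

$F(-x+\mu)-\frac{1}{2} = -F(x+\mu)+\frac{1}{2}$

The smaller $\gamma$ is, the more the curve is compressed toward the center, and the faster $F$ rises near $x=\mu$.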
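The effect of $\gamma$ and the symmetry about $(\mu, \frac{1}{2})$ can be checked numerically. The sketch below is only an illustration: the function names `logistic_cdf` and `logistic_pdf` and the sample values are assumptions made here, and the density used is $f(x) = \frac{e^{-z}}{\gamma(1+e^{-z})^2}$ with $z = (x-\mu)/\gamma$, which gives a slope of $\frac{1}{4\gamma}$ at the center.

```python
import math

def logistic_cdf(x, mu=0.0, gamma=1.0):
    """F(x) = 1 / (1 + exp(-(x - mu) / gamma))."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / gamma))

def logistic_pdf(x, mu=0.0, gamma=1.0):
    """f(x) = F'(x) = e^{-z} / (gamma * (1 + e^{-z})^2), with z = (x - mu) / gamma."""
    e = math.exp(-(x - mu) / gamma)
    return e / (gamma * (1.0 + e) ** 2)

mu = 2.0  # illustrative location parameter

# The slope of F at the center equals f(mu) = 1 / (4 * gamma):
# a smaller gamma gives a steeper curve around x = mu.
for gamma in (2.0, 1.0, 0.5):
    print(gamma, logistic_pdf(mu, mu, gamma))

# Point symmetry about (mu, 1/2): F(mu - x) - 1/2 == -F(mu + x) + 1/2.
x = 1.3
left = logistic_cdf(mu - x, mu, 1.0) - 0.5
right = -logistic_cdf(mu + x, mu, 1.0) + 0.5
print(abs(left - right) < 1e-12)
```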

#### 1 - Binomial logistic regression

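With the sigmoid function $g(z)=\frac{1}{1+e^{-z}}$, the model is

$h_\theta({\bf x})=g(\theta^T{\bf x}) = \frac{1}{1+e^{-\theta^T{\bf x} }}$

$P(y=1|{\bf x};\theta) = h_\theta({\bf x})$

$P(y=0|{\bf x};\theta) =1- h_\theta({\bf x})$

so the log-odds are linear in ${\bf x}$:

$\log\frac{P(y=1|{\bf x};\theta) }{1-P(y=1|{\bf x};\theta) }=\theta^T{\bf x}$

Both cases can be written as a single expression:

$p(y|{\bf x}; \theta) = (h_\theta({\bf x}))^y(1-h_\theta({\bf x}))^{1-y}$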
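A minimal sketch of these definitions follows. It evaluates $h_\theta({\bf x})$, confirms that the log-odds equal $\theta^T{\bf x}$, and computes the single-expression form of $p(y|{\bf x};\theta)$. The parameter and input values are made up for illustration, and treating `x[0] = 1` as an intercept feature is an assumption, not something the formulas require.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x), the model's P(y = 1 | x; theta)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

def bernoulli_likelihood(theta, x, y):
    """p(y | x; theta) = h^y * (1 - h)^(1 - y), for y in {0, 1}."""
    h = predict_proba(theta, x)
    return h ** y * (1.0 - h) ** (1 - y)

theta = [0.5, -1.0, 2.0]  # hypothetical parameters
x = [1.0, 0.3, 0.8]       # hypothetical input, x[0] = 1 as intercept feature

p1 = predict_proba(theta, x)
log_odds = math.log(p1 / (1.0 - p1))
theta_dot_x = sum(t * xi for t, xi in zip(theta, x))

# The log-odds recover the linear score theta^T x.
print(log_odds, theta_dot_x)
# The two class probabilities sum to one.
print(bernoulli_likelihood(theta, x, 1) + bernoulli_likelihood(theta, x, 0))
```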

Given $m$ independent training samples, the likelihood and the log-likelihood are

$\begin{eqnarray}L(\theta) &=&\prod_{i=1}^mp(y^{(i)}|{\bf x}^{(i)};\theta)\\ &=&\prod_{i=1}^m(h_\theta({\bf x}^{(i)}))^{y^{(i)}}(1-h_\theta({\bf x}^{(i)}))^{1-y^{(i)}} \end{eqnarray}$

$\begin{eqnarray}\ell(\theta) &=&\log L(\theta)\\ &=&\sum_{i=1}^m\left[y^{(i)}\log h_\theta({\bf x}^{(i)})+(1-y^{(i)})\log (1-h_\theta({\bf x}^{(i)}))\right] \end{eqnarray}$

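Differentiating the log-likelihood of a single sample $({\bf x}, y)$ with respect to the $j$-th parameter gives

$\frac{\partial}{\partial\theta_j}\ell(\theta) = (y-h_\theta({\bf x}))x_j$

which leads to the stochastic gradient ascent update, where $\alpha$ is the learning rate:

$\theta_j := \theta_j+\alpha\left(y^{(i)}-h_\theta({\bf x}^{(i)})\right)x_j^{(i)}$

ps: The update above is for a single sample, and the derivative is taken with respect to the $j$-th parameter (a scalar), so it involves only the $j$-th element $x_j$ of the input ${\bf x}$.

ps: The update uses a plus sign rather than a minus sign because the log-likelihood is being maximized, not minimized.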
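The single-sample update can be sketched as a stochastic gradient ascent loop, using the rule $\theta_j := \theta_j + \alpha\,(y - h_\theta({\bf x}))\,x_j$ applied to every component $j$ at once. The toy data set, the learning rate `alpha = 0.1`, and the number of passes are arbitrary assumptions chosen only to show the log-likelihood increasing.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(theta, X, Y):
    """l(theta) = sum_i y_i log h(x_i) + (1 - y_i) log(1 - h(x_i))."""
    total = 0.0
    for x, y in zip(X, Y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return total

def sga_epoch(theta, X, Y, alpha):
    """One pass of stochastic gradient ascent over the data."""
    for x, y in zip(X, Y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        # Plus sign: we move uphill on the log-likelihood.
        theta = [t + alpha * (y - h) * xj for t, xj in zip(theta, x)]
    return theta

# Toy data (assumed): first feature is a constant 1 acting as intercept.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
Y = [0, 0, 1, 0, 1, 1]

theta = [0.0, 0.0]
before = log_likelihood(theta, X, Y)
for _ in range(200):
    theta = sga_epoch(theta, X, Y, alpha=0.1)
after = log_likelihood(theta, X, Y)
print(before, after, theta)
```

Switching the sign of the update and minimizing $-\ell(\theta)$ instead would give the same iterates; the plus sign is only a consequence of phrasing the problem as maximization.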

#### 2 - Multinomial logistic regression

$P(Y=k|{\bf x}) = \frac{e^{\theta_k\cdot {\bf x}}}{1+\sum_{l=1}^{K-1}e^{\theta_l\cdot {\bf x}}},\quad k=1,2,\dots,K-1$

$P(Y=K|{\bf x}) = \frac{1}{1+\sum_{l=1}^{K-1}e^{\theta_l\cdot {\bf x}}}$
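In this parameterization class $K$ serves as the reference class with no weight vector of its own, so only $K-1$ vectors are needed. A small sketch with made-up weights for $K=3$ shows that the probabilities sum to one and that the log-ratio of class $k$ to class $K$ is exactly $\theta_k\cdot{\bf x}$. The function name `multinomial_logistic` is an illustrative choice.

```python
import math

def multinomial_logistic(thetas, x):
    """Return [P(Y=1|x), ..., P(Y=K|x)] given K-1 weight vectors.

    The last entry is the reference class K, whose score is fixed at 0.
    """
    scores = [math.exp(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]
    denom = 1.0 + sum(scores)
    return [s / denom for s in scores] + [1.0 / denom]

thetas = [[0.2, -0.5], [1.0, 0.3]]  # hypothetical weights, K = 3
x = [1.0, 2.0]                      # hypothetical input

probs = multinomial_logistic(thetas, x)
print(probs, sum(probs))

# log(P(Y=k|x) / P(Y=K|x)) equals theta_k . x for k = 1, ..., K-1.
log_ratios = [math.log(p / probs[-1]) for p in probs[:-1]]
print(log_ratios)
```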

#### 3 - softmax

The cost function of the logistic regression model is:

$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta({\bf x}^{(i)})+(1-y^{(i)})\log (1-h_\theta({\bf x}^{(i)})) \right]$

Softmax regression generalizes it to $K$ classes, with one parameter vector $\theta_j$ per class:

$\begin{eqnarray}J(\theta) &=& -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=1}^K1\{y^{(i)}=j\}\log \frac{e^{\theta_j^T{\bf x}^{(i)}}}{\sum_{l=1}^Ke^{\theta_l^T{\bf x}^{(i)}}}\right]\\ &=& -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=1}^K1\{y^{(i)}=j\}\left[\log {e^{\theta_j^T{\bf x}^{(i)}}}-\log{\sum_{l=1}^Ke^{\theta_l^T{\bf x}^{(i)}}}\right]\right] \end{eqnarray}$

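Because $\sum_{j=1}^K1\{y^{(i)}=j\}=1$, the normalizer $\log\sum_{l=1}^Ke^{\theta_l^T{\bf x}^{(i)}}$ appears exactly once per sample regardless of its label, so the gradient with respect to the parameter vector $\theta_j$ of class $j$ is

$\begin{eqnarray}\nabla_{\theta_j}J(\theta) &=&-\frac{1}{m}\sum_{i=1}^{m}\left[1\{y^{(i)}=j\}\frac{e^{\theta_j^T{\bf x}^{(i)}}{\bf x}^{(i)}}{e^{\theta_j^T{\bf x}^{(i)}}}-\frac{e^{\theta_j^T{\bf x}^{(i)}}{\bf x}^{(i)}}{\sum_{l=1}^Ke^{\theta_l^T{\bf x}^{(i)}}}\right]\\ &=&-\frac{1}{m}\sum_{i=1}^{m}\left[1\{y^{(i)}=j\}{\bf x}^{(i)}-\frac{e^{\theta_j^T{\bf x}^{(i)}}{\bf x}^{(i)}}{\sum_{l=1}^Ke^{\theta_l^T{\bf x}^{(i)}}}\right]\\ &=&-\frac{1}{m}\sum_{i=1}^{m}{\bf x}^{(i)}\left(1\{y^{(i)}=j\}-\frac{e^{\theta_j^T{\bf x}^{(i)}}}{\sum_{l=1}^Ke^{\theta_l^T{\bf x}^{(i)}}}\right)\\ &=&-\frac{1}{m}\sum_{i=1}^{m}{\bf x}^{(i)}\left[1\{y^{(i)}=j\} - p(y^{(i)}=j|{\bf x}^{(i)};\theta)\right] \end{eqnarray}$

ps: When differentiating with respect to $\theta_j$, the linear terms $\theta_c^T{\bf x}^{(i)}$ of all other classes $c\neq j$ have zero derivative, so only the $j$-th term of $\sum_{j=1}^K$ survives there. The normalizer depends on $\theta_j$ for every sample, which is why its derivative carries no indicator.

ps: Here $\theta_j$ is a vector (the parameters of class $j$), unlike in the logistic regression section, where $\theta_j$ is a scalar component.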
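One way to gain confidence in the final gradient expression, $\nabla_{\theta_j}J = -\frac{1}{m}\sum_i {\bf x}^{(i)}\left[1\{y^{(i)}=j\} - p(y^{(i)}=j|{\bf x}^{(i)};\theta)\right]$, is to compare it against a finite-difference approximation of the cost. The sketch below does this on made-up data. Note two assumptions: labels are indexed $0,\dots,K-1$ rather than $1,\dots,K$ to match Python indexing, and the maximum score is subtracted inside `softmax` for numerical stability, which does not change the probabilities.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def softmax(scores):
    """Softmax with the max subtracted for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cost(thetas, X, Y):
    """J(theta) = -(1/m) * sum_i log p(y_i | x_i; theta)."""
    m = len(X)
    return -sum(math.log(softmax([dot(t, x) for t in thetas])[y])
                for x, y in zip(X, Y)) / m

def softmax_grad(thetas, X, Y):
    """grad_j = -(1/m) * sum_i x_i * (1{y_i = j} - p(y_i = j | x_i))."""
    m, K, n = len(X), len(thetas), len(X[0])
    g = [[0.0] * n for _ in range(K)]
    for x, y in zip(X, Y):
        p = softmax([dot(t, x) for t in thetas])
        for j in range(K):
            coeff = (1.0 if y == j else 0.0) - p[j]
            for d in range(n):
                g[j][d] -= coeff * x[d] / m
    return g

def numeric_grad(thetas, X, Y, eps=1e-6):
    """Central finite differences of the cost, one parameter at a time."""
    g = [[0.0] * len(t) for t in thetas]
    for j in range(len(thetas)):
        for d in range(len(thetas[j])):
            plus = [list(t) for t in thetas]
            minus = [list(t) for t in thetas]
            plus[j][d] += eps
            minus[j][d] -= eps
            g[j][d] = (cost(plus, X, Y) - cost(minus, X, Y)) / (2 * eps)
    return g

# Made-up data: 4 samples, 2 features, K = 3 classes labeled 0, 1, 2.
X = [[1.0, 0.2], [1.0, -1.3], [1.0, 2.1], [1.0, 0.7]]
Y = [0, 2, 1, 2]
thetas = [[0.1, -0.4], [0.3, 0.8], [-0.2, 0.5]]

analytic = softmax_grad(thetas, X, Y)
numeric = numeric_grad(thetas, X, Y)
max_diff = max(abs(a - b) for ra, rb in zip(analytic, numeric)
               for a, b in zip(ra, rb))
print(max_diff)
```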

#### 4 - Relationship between softmax and logistic regression

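With two classes labeled 0 and 1, where $p(y=1|{\bf x};\theta)=h_\theta({\bf x})$ and $p(y=0|{\bf x};\theta)=1-h_\theta({\bf x})$, the logistic cost has the same form as the softmax cost:

$\begin{eqnarray}J(\theta) &=& -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta({\bf x}^{(i)})+(1-y^{(i)})\log (1-h_\theta({\bf x}^{(i)})) \right]\\ &=& -\frac{1}{m}\left[\sum_{i=1}^{m} \sum_{j=0}^11\{y^{(i)}=j\}\log p(y^{(i)}=j|{\bf x}^{(i)};\theta)\right] \end{eqnarray}$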
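The same correspondence holds at the level of probabilities: a two-class softmax with parameter vectors $\theta_0$ and $\theta_1$ gives $p(y=1|{\bf x}) = \frac{e^{\theta_1^T{\bf x}}}{e^{\theta_0^T{\bf x}}+e^{\theta_1^T{\bf x}}} = g\left((\theta_1-\theta_0)^T{\bf x}\right)$, that is, a logistic model with $\theta=\theta_1-\theta_0$. The following sketch checks this numerically; the weights and input are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def softmax_two_class(theta0, theta1, x):
    """Return (p(y=0|x), p(y=1|x)) for a softmax model with K = 2."""
    e0 = math.exp(dot(theta0, x))
    e1 = math.exp(dot(theta1, x))
    return e0 / (e0 + e1), e1 / (e0 + e1)

theta0 = [0.4, -1.0]  # hypothetical parameters for class 0
theta1 = [1.1, 0.5]   # hypothetical parameters for class 1
x = [1.0, 0.6]        # hypothetical input

p0, p1 = softmax_two_class(theta0, theta1, x)

# Equivalent logistic model: theta = theta1 - theta0.
theta = [b - a for a, b in zip(theta0, theta1)]
h = sigmoid(dot(theta, x))

print(p1, h)        # the two values agree up to rounding
print(p0, 1.0 - h)  # and so do the complements
```

Only the difference $\theta_1-\theta_0$ matters, so the two-class softmax has one redundant parameter vector compared with logistic regression.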


posted @ 2018-10-11 16:37  仙守