# Maximum Entropy Model (MaxEnt)

### The Maximum Entropy Model

$$max \; H(Y|X)=-\sum_i{\sum_j{p(x_i,y_j)logp(y_j|x_i)}}$$

Note: throughout this article, $x$ and $y$ are discrete random variables.

$$f(x,y)=\left\{\begin{matrix} 1 & \text{if } x,y \text{ satisfy some condition} \\ 0 & \text{otherwise} \end{matrix}\right.$$

$$f_1(x,y)=\left\{\begin{matrix} 1 & \text{if } x>3 \text{ and } y=\text{"张三"} \\ 0 & \text{otherwise} \end{matrix}\right.$$

$$f_2(x,y)=\left\{\begin{matrix} 1 & \text{if } x=0 \text{ and } y=\text{"李四"} \\ 0 & \text{otherwise} \end{matrix}\right.$$
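The two example feature functions translate directly into indicator functions in code. The following is a minimal sketch; the names `f1`, `f2`, and `features` simply mirror the formulas and are otherwise arbitrary.

```python
def f1(x, y):
    """Indicator feature: 1 if x > 3 and y is "张三", else 0."""
    return 1 if x > 3 and y == "张三" else 0


def f2(x, y):
    """Indicator feature: 1 if x == 0 and y is "李四", else 0."""
    return 1 if x == 0 and y == "李四" else 0


features = [f1, f2]

# Evaluate every feature on a couple of (x, y) pairs.
print([f(5, "张三") for f in features])  # [1, 0]
print([f(0, "李四") for f in features])  # [0, 1]
```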

The empirical joint distribution of $x$ and $y$:

$$\tilde{p}(x=x_i,y=y_j)=\frac{count(x=x_i,y=y_j)}{N}$$

Here $x,y$ are random variables, $x_i,y_j$ are specific values they take, and $N$ is the total number of samples.

The empirical distribution of $x$:

$$\tilde{p}(x=x_i)=\frac{count(x=x_i)}{N}$$
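Both empirical distributions are just normalized counts. The sketch below computes them with `collections.Counter`; the `samples` list is a made-up toy data set used only for illustration, not data from the original post.

```python
from collections import Counter

# Hypothetical toy sample of (x, y) pairs, invented for illustration.
samples = [(5, "张三"), (5, "张三"), (0, "李四"), (0, "张三"), (4, "李四")]
N = len(samples)

# Count each (x, y) pair and each x value.
joint_counts = Counter(samples)
x_counts = Counter(x for x, _ in samples)

# Empirical joint distribution and empirical marginal of x.
p_tilde_xy = {xy: c / N for xy, c in joint_counts.items()}
p_tilde_x = {x: c / N for x, c in x_counts.items()}

print(p_tilde_xy[(5, "张三")])  # 0.4
print(p_tilde_x[4])             # 0.2
```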

$$E_{\tilde{p}}(f)=\sum_i{\sum_j{\tilde{p}(x_i,y_j)f(x_i,y_j)}}=\frac{1}{N}\sum_i{\sum_j{count(x=x_i,y=y_j)f(x_i,y_j)}}$$
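The empirical expectation of a feature can be computed in two equivalent ways: as a weighted sum over the distinct $(x_i,y_j)$ values, or as a plain average of $f$ over the $N$ samples. The sketch below checks that both agree on a hypothetical toy data set (the `samples` list is an assumption made for illustration).

```python
from collections import Counter

# Hypothetical toy sample of (x, y) pairs, invented for illustration.
samples = [(5, "张三"), (5, "张三"), (0, "李四"), (0, "张三"), (4, "李四")]
N = len(samples)


def f1(x, y):
    """Indicator feature: 1 if x > 3 and y is "张三", else 0."""
    return 1 if x > 3 and y == "张三" else 0


# Way 1: weighted sum over distinct (x, y) values.
p_tilde_xy = {xy: c / N for xy, c in Counter(samples).items()}
e_by_values = sum(p * f1(x, y) for (x, y), p in p_tilde_xy.items())

# Way 2: average of the feature over all N samples.
e_by_samples = sum(f1(x, y) for x, y in samples) / N

print(e_by_values, e_by_samples)  # both are 0.4
```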

$$E_{p}(f)=\sum_i{\sum_j{p(x_i,y_j)f(x_i,y_j)}} \approx \sum_i{\sum_j{\tilde{p}(x_i)p(y_j|x_i)f(x_i,y_j)}}$$
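The model expectation replaces the unknown marginal $p(x_i)$ with the empirical $\tilde{p}(x_i)$, so only the conditional model $p(y_j|x_i)$ has to be supplied. The sketch below evaluates it for a uniform conditional model; the toy `samples`, the label set `labels`, and the helper names `p_uniform` and `model_expectation` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy sample and label set, invented for illustration.
samples = [(5, "张三"), (5, "张三"), (0, "李四"), (0, "张三"), (4, "李四")]
labels = ["张三", "李四"]
N = len(samples)
p_tilde_x = {x: c / N for x, c in Counter(x for x, _ in samples).items()}


def f1(x, y):
    """Indicator feature: 1 if x > 3 and y is "张三", else 0."""
    return 1 if x > 3 and y == "张三" else 0


def p_uniform(y, x):
    """Placeholder conditional model p(y|x): uniform over the labels."""
    return 1.0 / len(labels)


def model_expectation(f, p_cond):
    """Sum of p~(x) * p(y|x) * f(x, y) over all observed x and all labels y."""
    return sum(px * p_cond(y, x) * f(x, y)
               for x, px in p_tilde_x.items()
               for y in labels)


e_model = model_expectation(f1, p_uniform)
print(e_model)  # approximately 0.3, versus an empirical expectation of 0.4
```

The mismatch between the two expectations is exactly what the constraints $E_{p}(f_k)=E_{\tilde{p}}(f_k)$ rule out.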

$$min \; -H(y|x)=\sum_i{\sum_j{\tilde{p}(x_i)p(y_j|x_i)logp(y_j|x_i)}}$$

$$s.t.\left\{\begin{matrix}\sum_j{p(y_j|x_i)}=1 & \forall{i} \\ E_{p}(f_k)=E_{\tilde{p}}(f_k) & \forall{k} \end{matrix}\right.$$
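To make the objective concrete, the sketch below evaluates the conditional entropy $H(y|x)$ for two candidate conditional models. The toy `samples`, the label set, and the two models `p_uniform` and `p_peaked` are hypothetical; neither model is claimed to satisfy the feature constraints, the point is only that the uniform model has the larger entropy.

```python
import math
from collections import Counter

# Hypothetical toy sample and label set, invented for illustration.
samples = [(5, "张三"), (5, "张三"), (0, "李四"), (0, "张三"), (4, "李四")]
labels = ["张三", "李四"]
N = len(samples)
p_tilde_x = {x: c / N for x, c in Counter(x for x, _ in samples).items()}


def conditional_entropy(p_cond):
    """H(y|x) = -sum_i sum_j p~(x_i) p(y_j|x_i) log p(y_j|x_i)."""
    h = 0.0
    for x, px in p_tilde_x.items():
        for y in labels:
            p = p_cond(y, x)
            if p > 0:  # treat 0 * log 0 as 0
                h -= px * p * math.log(p)
    return h


def p_uniform(y, x):
    """Uniform conditional model."""
    return 1.0 / len(labels)


def p_peaked(y, x):
    """A conditional model that strongly prefers one label."""
    return 0.9 if y == "张三" else 0.1


h_uniform = conditional_entropy(p_uniform)  # equals log(2)
h_peaked = conditional_entropy(p_peaked)    # smaller than log(2)
print(h_uniform, h_peaked)
```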

### Solving the Model

$$\underset{p}{arg \; min}L(p;w,\lambda)=\sum_i{\sum_j{\tilde{p}(x_i)p(y_j|x_i)logp(y_j|x_i)}}+\sum_i{w_i\left(1-\sum_j{p(y_j|x_i)}\right)}+\sum_k{\lambda_k\left[\sum_i{\sum_j{\tilde{p}(x_i,y_j)f_k(x_i,y_j)}}-\sum_i{\sum_j{\tilde{p}(x_i)p(y_j|x_i)f_k(x_i,y_j)}}\right]} \label{L}$$

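By Lagrangian duality, the problem above is equivalent to $\underset{w,\lambda}{max} \; \underset{p}{min} \; L(p;w,\lambda)$: the objective is convex in $p$ and the constraints are linear, so the primal and dual optima coincide and the KKT conditions characterize the solution. We therefore proceed in two steps: Step 1 minimizes over $p$, and Step 2 maximizes over $w,\lambda$.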

Step 1

$$\frac{\partial L(p;w,\lambda)}{\partial p(y_j|x_i)}=\tilde{p}(x_i)[logp(y_j|x_i)+1]-w_i-\sum_k{\lambda_k\tilde{p}(x_i)f_k(x_i,y_j)}$$

$$=\tilde{p}(x_i)\left[logp(y_j|x_i)+1-\frac{w_i}{\tilde{p}(x_i)}-\sum_k{\lambda_kf_k(x_i,y_j)}\right]=0$$

$$\therefore p(y_j|x_i)=exp\left\{-1+\frac{w_i}{\tilde{p}(x_i)}+\sum_k{\lambda_kf_k(x_i,y_j)}\right\}=\frac{exp\left\{\sum_k{\lambda_kf_k(x_i,y_j)}\right\}}{exp\left\{1-\frac{w_i}{\tilde{p}(x_i)}\right\}}$$

$$\because \sum_jp(y_j|x_i)=1$$

$$\therefore \sum_jp(y_j|x_i)=\frac{\sum_jexp\left\{\sum_k{\lambda_kf_k(x_i,y_j)}\right\}}{exp\left\{1-\frac{w_i}{\tilde{p}(x_i)}\right\}}=1$$

$$\therefore exp\left\{1-\frac{w_i}{\tilde{p}(x_i)}\right\}=\sum_jexp\left\{\sum_k{\lambda_kf_k(x_i,y_j)}\right\}$$

$$\therefore p(y_j|x_i)=\frac{exp\left\{\sum_k{\lambda_kf_k(x_i,y_j)}\right\}}{\sum_{j'}exp\left\{\sum_k{\lambda_kf_k(x_i,y_{j'})}\right\}} \label{p}$$
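The result of Step 1 is a softmax over the weighted feature sums. The sketch below computes $p(y_j|x_i)$ for a given weight vector; the label set, the function name `cond_prob`, and the example weights are assumptions made for illustration. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math

# Hypothetical label set; features mirror the two examples in the text.
labels = ["张三", "李四"]


def f1(x, y):
    return 1 if x > 3 and y == "张三" else 0


def f2(x, y):
    return 1 if x == 0 and y == "李四" else 0


features = [f1, f2]


def cond_prob(x, lambdas):
    """Return {y: p(y|x)} with p(y|x) proportional to exp(sum_k lambda_k f_k(x, y))."""
    scores = [sum(l * f(x, y) for l, f in zip(lambdas, features))
              for y in labels]
    m = max(scores)  # shift scores for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)    # normalizer (Z_i up to the shift)
    return {y: e / z for y, e in zip(labels, exps)}


print(cond_prob(5, [0.0, 0.0]))  # uniform: all weights are zero
print(cond_prob(5, [1.0, 0.0]))  # "张三" gets probability e / (e + 1)
```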

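Step 2

Substitute the solution for $p(y_j|x_i)$ back into $L$. Because this solution is already normalized over $j$, the $w_i$ term vanishes and only $\lambda$ remains to be optimized.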

$$Z_i=\sum_j{exp\left\{\sum_k{\lambda_kf_k(x_i,y_j)}\right\}}$$

$$p(y_j|x_i)=\frac{exp\left\{\sum_k{\lambda_kf_k(x_i,y_j)}\right\}}{Z_i}$$

$$logp(y_j|x_i)=\sum_k{\lambda_kf_k(x_i,y_j)}-logZ_i \label{log}$$

$$\sum_jp(y_j|x_i)=1 \label{s1}$$

$$\underset{w,\lambda}{arg \; max} \; L(w,\lambda;p)=\sum_i{\sum_j{\tilde{p}(x_i)p(y_j|x_i)\left[\sum_k{\lambda_kf_k(x_i,y_j)}-logZ_i\right]}}+\sum_k{\lambda_k\left[\sum_i{\sum_j{\tilde{p}(x_i,y_j)f_k(x_i,y_j)}}-\sum_i{\sum_j{\tilde{p}(x_i)p(y_j|x_i)f_k(x_i,y_j)}}\right]}$$

$$=\sum_i\sum_j\sum_k\tilde{p}(x_i,y_j)\lambda_kf_k(x_i,y_j)-\sum_i\sum_j\tilde{p}(x_i)p(y_j|x_i)logZ_i$$

$$=\sum_i\sum_j\sum_k\tilde{p}(x_i,y_j)\lambda_kf_k(x_i,y_j)-\sum_i\tilde{p}(x_i)logZ_i\sum_jp(y_j|x_i)$$

$$=\sum_i\sum_j\sum_k\tilde{p}(x_i,y_j)\lambda_kf_k(x_i,y_j)-\sum_i\tilde{p}(x_i)logZ_i$$
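The derivation ends at this function of $\lambda$ and does not say how to maximize it. Differentiating the final expression with respect to $\lambda_k$ gives the difference between the empirical and the model expectation of $f_k$:

$$\frac{\partial}{\partial \lambda_k}\left[\sum_i\sum_j\sum_k\tilde{p}(x_i,y_j)\lambda_kf_k(x_i,y_j)-\sum_i\tilde{p}(x_i)logZ_i\right]=E_{\tilde{p}}(f_k)-E_{p}(f_k)$$

One simple option, chosen here as an assumption rather than taken from the original, is plain gradient ascent on $\lambda$. The sketch below uses a made-up toy data set; the step size, the iteration count, and all helper names are illustrative choices.

```python
import math
from collections import Counter

# Hypothetical toy sample and label set, invented for illustration.
samples = [(5, "张三"), (5, "张三"), (0, "李四"), (0, "张三"), (4, "李四")]
labels = ["张三", "李四"]
N = len(samples)
p_tilde_xy = {xy: c / N for xy, c in Counter(samples).items()}
p_tilde_x = {x: c / N for x, c in Counter(x for x, _ in samples).items()}


def f1(x, y):
    return 1 if x > 3 and y == "张三" else 0


def f2(x, y):
    return 1 if x == 0 and y == "李四" else 0


features = [f1, f2]


def scores(x, lambdas):
    """sum_k lambda_k f_k(x, y) for every label y."""
    return [sum(l * f(x, y) for l, f in zip(lambdas, features)) for y in labels]


def log_z(x, lambdas):
    """log Z_i, computed with the log-sum-exp trick."""
    s = scores(x, lambdas)
    m = max(s)
    return m + math.log(sum(math.exp(v - m) for v in s))


def dual(lambdas):
    """The final expression of Step 2 as a function of lambda."""
    fit = sum(p * l * f(x, y)
              for (x, y), p in p_tilde_xy.items()
              for l, f in zip(lambdas, features))
    return fit - sum(px * log_z(x, lambdas) for x, px in p_tilde_x.items())


def gradient(lambdas):
    """For each k: empirical expectation of f_k minus model expectation of f_k."""
    grad = []
    for f in features:
        emp = sum(p * f(x, y) for (x, y), p in p_tilde_xy.items())
        mod = 0.0
        for x, px in p_tilde_x.items():
            lz = log_z(x, lambdas)
            for y, s in zip(labels, scores(x, lambdas)):
                mod += px * math.exp(s - lz) * f(x, y)
        grad.append(emp - mod)
    return grad


# Plain gradient ascent; step size and iteration count are arbitrary choices.
lambdas = [0.0, 0.0]
for _ in range(500):
    lambdas = [l + 1.0 * g for l, g in zip(lambdas, gradient(lambdas))]

print(lambdas)  # close to [log 2, 0] for this toy data
```

At the maximum the gradient is zero, which is the constraint $E_{p}(f_k)=E_{\tilde{p}}(f_k)$ from the original problem. Classical MaxEnt implementations often use iterative scaling or quasi-Newton methods instead of plain gradient ascent.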

posted @ 2017-07-01 15:43  张朝阳