1.
\[g(z)=
\begin{cases}
0, & z<0;\\
0.5, & z=0;\\
1, & z>0
\end{cases}
\]
2.
The black curve in the left figure has the function form:
\[y=\frac{1}{1+e^{-z}}
\]
Substituting \(z=w^Tx+b\) into the above gives:
\[y=\frac{1}{1+e^{-(w^Tx+b)}}
\]
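The logistic function above can be sketched directly in NumPy (the toy weights `w`, `b`, `x` below are illustrative values, not from the text):

```python
import numpy as np

def sigmoid(z):
    # logistic function: y = 1 / (1 + e^{-z})
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# model output y = sigmoid(w^T x + b) for a single sample
w = np.array([0.5, -0.3])
b = 0.1
x = np.array([1.0, 2.0])
y = sigmoid(w @ x + b)   # probability that the label is 1
```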
3.
\[P(y=1|x)=\frac{e^{\theta^T x}}{1+e^{\theta^T x}}
\]
\[P(y=0|x)=\frac{1}{1+e^{\theta^T x}}
\]
\[P(y_i|x_i;\theta)=P(y_i=1|x_i)^{y_i}\,P(y_i=0|x_i)^{1-y_i}
\]
4.
Maximum likelihood estimation:
\[l(\theta)=\sum_{i=1}^m \ln P(y_i|x_i;\theta)=\sum_{i=1}^m\left(y_i\theta^Tx_i-\ln(1+e^{\theta^T x_i})\right)
\]
\[\frac{\partial l(\theta)}{\partial \theta}=\sum_{i=1}^m y_i x_i-\sum_{i=1}^m\frac{x_i e^{\theta^T x_i}}{1+e^{\theta^T x_i}}=\sum_{i=1}^m\left(y_i-\mathrm{sigmoid}(\theta^T x_i)\right)x_i
\]
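The log-likelihood and its gradient translate to a few lines of NumPy (a minimal sketch; rows of `X` are the samples \(x_i\)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # l(theta) = sum_i [ y_i * (theta^T x_i) - ln(1 + e^{theta^T x_i}) ]
    z = X @ theta
    return float(np.sum(y * z - np.log1p(np.exp(z))))

def gradient(theta, X, y):
    # dl/dtheta = sum_i (y_i - sigmoid(theta^T x_i)) * x_i
    return X.T @ (y - sigmoid(X @ theta))
```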
5.
Since \(l(\theta)\) is to be maximized, the parameters are updated by gradient ascent with learning rate \(\alpha\):
\[\theta^{t+1}=\theta^t+\alpha\frac{\partial l(\theta)}{\partial \theta}=\theta^t+\alpha\sum_{i=1}^m\left(y_i-\mathrm{sigmoid}(\theta^Tx_i)\right)x_i
\]
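The update rule gives a complete training loop; below is a minimal sketch on a hypothetical toy dataset (the learning rate and iteration count are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, iters=1000):
    # Gradient ascent on the log-likelihood l(theta):
    #   theta <- theta + alpha * sum_i (y_i - sigmoid(theta^T x_i)) * x_i
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta = theta + alpha * (X.T @ (y - sigmoid(X @ theta)))
    return theta

# Toy data: the first column is a constant 1, so theta[0] plays the role of b.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) > 0.5).astype(float)
```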
6.
The probability distribution of a random variable X:
\[P(X=x_i)=p_i,\quad i=1,2,\dots,n
\]
The entropy of X:
\[H(p)=-\sum_{i=1}^n p_i\log p_i
\]
The information gain ratio:
\[g_R(D,A)=\frac{g(D,A)}{H_A(D)},\quad H_A(D)=-\sum_{i=1}^n\frac{|D_i|}{|D|}\log_2\frac{|D_i|}{|D|}
\]
where n is the number of distinct values feature A can take.
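The entropy \(H(p)\) and the gain ratio \(g_R(D,A)\) can be sketched as follows; note \(H_A(D)\) is just the entropy of the distribution of A's values over D, and the information gain \(g(D,A)\) is assumed to be precomputed here:

```python
import numpy as np
from collections import Counter

def entropy(values):
    # H(p) = -sum_i p_i * log2(p_i) over the empirical distribution of `values`
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain_ratio(info_gain, feature_values):
    # g_R(D, A) = g(D, A) / H_A(D), where H_A(D) is the entropy of the
    # partition that feature A induces on D
    return info_gain / entropy(feature_values)
```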
\[C_\alpha(T)=C(T)+\alpha|T|
\]
7.
Suppose the input space is partitioned into M regions \(R_1,R_2,\dots,R_M\), with a fixed value \(c_m\) on each region \(R_m\). Using the squared error \(\sum_{x_i\in R_m}(y_i-f(x_i))^2\) to measure the regression tree's prediction error on the training data, the optimal value is
\[c_m=\mathrm{average}(y_i\mid x_i\in R_m)
\]
Choose the j-th variable \(x^{(j)}\) and a value s as the splitting variable and splitting point, partitioning the input into two regions \(R_1(j,s)=\{x\mid x^{(j)}\le s\}\) and \(R_2(j,s)=\{x\mid x^{(j)}>s\}\); the optimal splitting variable j and splitting point s are found by solving:
\[\min_{j,s}\left(\min_{c_1}\sum_{x_i\in R_1(j,s)}(y_i-c_1)^2+\min_{c_2}\sum_{x_i\in R_2(j,s)}(y_i-c_2)^2\right)
\]
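For a single variable, the inner minimization over \(c_1, c_2\) is solved by the region means, so the search reduces to trying each candidate split point s. A minimal sketch:

```python
import numpy as np

def best_split(x, y):
    # Try each candidate split point s of one variable x and keep the one
    # minimizing the two regions' squared errors around their means c1, c2.
    best_s, best_err = None, np.inf
    for s in np.unique(x)[:-1]:          # splitting above the max value is vacuous
        left, right = y[x <= s], y[x > s]
        err = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err
```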
8.
Given a sample set D, its Gini index is:
\[Gini(D)=1-\sum_{k=1}^K\left(\frac{|C_k|}{|D|}\right)^2
\]
where \(C_k\) is the subset of samples in D belonging to class k, and K is the number of classes.
Under the condition of feature A, the Gini index of set D is defined as:
\[Gini(D,A)=\frac{|D_1|}{|D|}Gini(D_1)+\frac{|D_2|}{|D|}Gini(D_2)
\]
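Both quantities can be sketched in a few lines; here the boolean `mask` stands in for the binary partition \(D_1, D_2\) that feature A induces:

```python
import numpy as np
from collections import Counter

def gini(labels):
    # Gini(D) = 1 - sum_k (|C_k| / |D|)^2
    counts = np.array(list(Counter(list(labels)).values()), dtype=float)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def gini_index(labels, mask):
    # Gini(D, A) = |D1|/|D| * Gini(D1) + |D2|/|D| * Gini(D2),
    # where mask selects the subset D1
    labels = np.asarray(labels)
    d1, d2 = labels[mask], labels[~mask]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
```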
9.
- Initialize \(f_0(x)=0\)
- For \(m=1,2,\dots,M\):
  - Compute the residuals: \(r_{mi}=y_i-f_{m-1}(x_i),\ i=1,2,\dots,N\)
  - Fit a regression tree to the residuals \(r_{mi}\), obtaining \(T(x;\Theta_m)\)
  - Update \(f_m(x)=f_{m-1}(x)+T(x;\Theta_m)\)
- Obtain the boosted regression tree \(f_M(x)=\sum_{m=1}^M T(x;\Theta_m)\)
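The steps above can be sketched with depth-1 regression trees (stumps) as the base learner, chosen here purely for brevity:

```python
import numpy as np

def fit_stump(x, r):
    # Fit a depth-1 regression tree (one split, constant c1/c2 per region)
    # to the current residuals r by minimizing squared error.
    best = None
    for s in np.unique(x)[:-1]:
        c1, c2 = r[x <= s].mean(), r[x > s].mean()
        err = np.sum((r[x <= s] - c1) ** 2) + np.sum((r[x > s] - c2) ** 2)
        if best is None or err < best[0]:
            best = (err, s, c1, c2)
    _, s, c1, c2 = best
    return lambda t: np.where(t <= s, c1, c2)

def boosting_tree(x, y, M=5):
    # f_0(x) = 0; each round fits a tree to the residuals and adds it in.
    trees, residual = [], y.astype(float).copy()
    for _ in range(M):
        tree = fit_stump(x, residual)
        trees.append(tree)
        residual = residual - tree(x)     # r_mi = y_i - f_{m-1}(x_i)
    return lambda t: sum(tree(t) for tree in trees)
```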