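Statistical Learning Methods (《统计学习方法》), Chapter 1
Hoeffding's inequality
Let \(X_1,X_2,\dots,X_N\) be independent random variables with \(X_i\in[a_i,b_i]\), \(i=1,2,\dots,N\), and let \(\bar X\) be their empirical mean, \(\bar X=\frac{1}{N}\sum\limits_{i=1}^NX_i\). Then for any \(t>0\) the following inequality holds: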
\[P[\bar X-E(\bar X)\geq t]\leq \exp\left(-\frac{2N^2t^2}{\sum\limits_{i=1}^{N}(b_i-a_i)^2}\right)
\]
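The bound can be compared against a simulation. The following is a minimal sketch, assuming each \(X_i\) is Uniform[0, 1] (so \(b_i-a_i=1\)); the helper names, the seed, and the values of `n`, `t`, and `trials` are illustrative choices, not part of the original statement.

```python
import math
import random

def hoeffding_bound(t, widths):
    """Right-hand side exp(-2 N^2 t^2 / sum (b_i - a_i)^2) of the inequality."""
    n = len(widths)
    return math.exp(-2.0 * n**2 * t**2 / sum(w**2 for w in widths))

def empirical_tail(t, n, trials, rng):
    """Fraction of trials in which the mean of n Uniform[0,1] draws is >= 0.5 + t."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if mean - 0.5 >= t:  # E(mean) = 0.5 for Uniform[0,1]
            hits += 1
    return hits / trials

rng = random.Random(0)  # fixed seed (arbitrary) so the run is reproducible
n, t = 50, 0.1
bound = hoeffding_bound(t, [1.0] * n)  # every X_i lies in [0, 1]
freq = empirical_tail(t, n, trials=20000, rng=rng)
print(f"empirical tail = {freq:.4f}, Hoeffding bound = {bound:.4f}")
```

The observed frequency should sit well below the bound: Hoeffding's inequality uses only the range of each variable, so it is usually loose for a specific distribution.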
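Proof
First we prove Hoeffding's lemma: if \(X\) is a random variable with \(E(X)=0\) and \(X\in[a,b]\), then for any \(s>0\)
\[E(e^{sX})\leq e^{s^2(b-a)^2/8}
\]
Proof:
Since \(e^{sX}\) is a convex function of \(X\), for \(X\in[a,b]\) convexity gives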
\[e^{sX}\leq \frac{b-X}{b-a}e^{sa}+\frac{X-a}{b-a}e^{sb}\\
\Downarrow \quad (\text{take expectations, using } E(X)=0)\\
E(e^{sX})\leq\frac{b}{b-a}e^{sa}-\frac{a}{b-a}e^{sb} =e^{sa}\left(\frac{b}{b-a}-\frac{a}{b-a}e^{s(b-a)} \right)
\]
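Let \(\theta=-\frac{a}{b-a}\) and \(u=s(b-a)\), so that \(1-\theta=\frac{b}{b-a}\) and \(e^{sa}=e^{-\theta u}\). Then
\[E(e^{sX})\leq (1-\theta+\theta e^u)e^{-\theta u}= e^{\ln(1-\theta+\theta e^u)-\theta u}
\]
Let \(\phi(u)=\ln(1-\theta+\theta e^u)-\theta u\). It suffices to show that \(\phi(u)\leq\frac{1}{8}u^2=\frac{1}{8}s^2(b-a)^2\).
By Taylor's formula, for some \(\xi\) between \(0\) and \(u\),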
\[\phi(u)=\phi(0)+u\phi'(0)+\frac{1}{2}u^2\phi''(\xi)\\
\phi(0)=0\\
\phi'(0)=\left.\left(\frac{\theta e^u}{1-\theta+\theta e^u}-\theta\right)\right|_{u=0} =0
\\
\phi''(\xi)=\frac{\theta e^{\xi}(1-\theta)}{(1-\theta +\theta e^\xi)^2}=\frac{\theta e^{\xi}}{1-\theta +\theta e^\xi}\left(1-\frac{\theta e^{\xi}}{1-\theta +\theta e^\xi}\right)=m(1-m)\leq \frac{1}{4},\quad m=\frac{\theta e^{\xi}}{1-\theta +\theta e^\xi} \\
\Downarrow \\
\phi(u)\leq\frac{1}{8}u^2=\frac{1}{8}s^2(b-a)^2
\]
This proves the lemma.
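The lemma can be checked numerically on the two-point distribution supported on \(\{a,b\}\) with mean zero, for which the convexity step holds with equality. This is a small sketch under that assumption; the endpoints `a`, `b` and the list of `s` values are arbitrary illustrative choices.

```python
import math

def mgf_two_point(s, a, b):
    """E[e^{sX}] for X in {a, b} with P(X = b) = -a/(b - a), which makes E[X] = 0."""
    theta = -a / (b - a)
    return (1 - theta) * math.exp(s * a) + theta * math.exp(s * b)

def lemma_bound(s, a, b):
    """Right-hand side exp(s^2 (b - a)^2 / 8) of Hoeffding's lemma."""
    return math.exp(s**2 * (b - a) ** 2 / 8)

a, b = -1.0, 3.0  # requires a < 0 < b so that a zero-mean distribution exists
s_values = [0.1, 0.5, 1.0, 2.0]
for s in s_values:
    print(f"s={s}: E[e^(sX)]={mgf_two_point(s, a, b):.6f} <= {lemma_bound(s, a, b):.6f}")
```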
Proof of Hoeffding's inequality
Markov's inequality: for a nonnegative random variable \(X\) and any \(a>0\),
\[P(X\geq a)\leq \frac{E(X)}{a}
\]
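As a quick sanity check, Markov's inequality can be verified exactly on a small discrete distribution. The sketch below assumes a fair six-sided die as the nonnegative random variable; that choice is purely illustrative.

```python
from fractions import Fraction

values = range(1, 7)   # faces of a fair die
prob = Fraction(1, 6)  # probability of each face
mean = sum(v * prob for v in values)  # E(X)

def tail(a):
    """Exact P(X >= a) for the die."""
    return sum(prob for v in values if v >= a)

for a in range(1, 7):
    print(f"a={a}: P(X>=a)={tail(a)} <= E(X)/a={mean / a}")
```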
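Let \(S_n=\sum_{i=1}^nX_i\), writing \(n\) for the sample size \(N\). For any \(s>0\), applying Markov's inequality to \(e^{s(S_n-E(S_n))}\), then independence, then Hoeffding's lemma to each \(X_i-E(X_i)\) (which has mean zero and lies in an interval of width \(b_i-a_i\)) gives
\[P(S_n-E(S_n)\geq t)=P\left(e^{s(S_n-E(S_n))}\geq e^{st}\right)
\leq e^{-st}E(e^{s(S_n-E(S_n))}) \\
=e^{-st}\prod\limits_{i=1}^nE(e^{s(X_i-E(X_i))}) \\
\leq e^{-st}\prod\limits_{i=1}^ne^{\frac{s^2(b_i-a_i)^2}{8}}=\exp\left(-st+\frac{1}{8}s^2\sum\limits^n_{i=1}(b_i-a_i)^2\right)
\]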
Note that \(\Phi(s)=-st+\frac{1}{8}s^2\sum\limits^n_{i=1}(b_i-a_i)^2\) is a quadratic function of \(s\), which attains its minimum at \(s=\frac{4t}{\sum^n_{i=1}(b_i-a_i)^2}\):
\[\min_{s>0}\Phi(s)=-\frac{2t^2}{\sum^n_{i=1}(b_i-a_i)^2}
\]
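The minimizer \(s=4t/\sum_i(b_i-a_i)^2\) and the minimum value \(-2t^2/\sum_i(b_i-a_i)^2\) of \(\Phi\) can be confirmed with a coarse grid search. This is only a sketch; the value of `t` and the interval widths in `widths` are hypothetical.

```python
def phi(s, t, widths):
    """Phi(s) = -s t + (s^2 / 8) * sum (b_i - a_i)^2."""
    return -s * t + s**2 * sum(w**2 for w in widths) / 8

t = 0.5
widths = [1.0, 2.0, 0.5]  # hypothetical values of b_i - a_i
total = sum(w**2 for w in widths)

s_star = 4 * t / total          # claimed minimizer
closed_form = -2 * t**2 / total  # claimed minimum value

# Evaluate Phi on the grid s = 0.001, 0.002, ..., 5.000 and keep the smallest value.
grid_min = min(phi(k * 0.001, t, widths) for k in range(1, 5001))

print(f"s* = {s_star:.4f}")
print(f"Phi(s*) = {phi(s_star, t, widths):.6f}, closed form = {closed_form:.6f}")
print(f"grid minimum = {grid_min:.6f}")
```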
Therefore
\[P(S_n-E(S_n)\geq t)\leq \exp\left(-\frac{2t^2}{\sum^n_{i=1}(b_i-a_i)^2}\right)
\]
Since \(S_n=n\bar X\), replacing \(t\) by \(nt\) gives
\[ P(\bar X-E(\bar X)\geq t)=P(S_n-E(S_n)\geq nt)\leq \exp\left(-\frac{2n^2t^2}{\sum^n_{i=1}(b_i-a_i)^2}\right)
\]
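Generalization error bound:
For a binary classification problem, when the hypothesis space is a finite set of functions \(\mathcal{F}=\{f_1,f_2,\dots,f_d\}\), then for any function \(f\in\mathcal{F}\) and any \(0<\delta<1\), the generalization error \(R(f)\) and the training error \(\hat R(f)\) on \(N\) samples satisfy
\[P(R(f)<\hat R(f)+\epsilon)\geq 1-\delta
\]
where
\[\epsilon= \sqrt{\frac{1}{2N}\left(\log d+\log\frac{1}{\delta}\right)}
\]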
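To get a feel for the size of \(\epsilon\), the formula can be evaluated directly. The function name and the values of `d`, `delta`, and the sample sizes below are illustrative assumptions.

```python
import math

def generalization_gap(d, n_samples, delta):
    """epsilon = sqrt((log d + log(1/delta)) / (2 N)) for a hypothesis space of size d."""
    return math.sqrt((math.log(d) + math.log(1.0 / delta)) / (2.0 * n_samples))

d, delta = 1000, 0.05
gaps = {n: generalization_gap(d, n, delta) for n in (100, 1000, 10000)}
for n, eps in gaps.items():
    print(f"N={n}: epsilon={eps:.4f}")
```

The gap shrinks like \(1/\sqrt{N}\) and grows only logarithmically in \(d\) and \(1/\delta\).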
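Proof
With the 0-1 loss, \(\hat R(f)\) is the mean of \(N\) independent random variables taking values in \([0,1]\), and \(R(f)\) is its expectation. By Hoeffding's inequality with \(b_i-a_i=1\) (applied to the variables \(-X_i\), which gives the same bound for the lower tail), for any \(\epsilon>0\)
\[P(R(f)-\hat R(f)\geq\epsilon)\leq \exp(-2N\epsilon^2) \\
\Downarrow
\\ P(\exists f\in \mathcal{F}:R(f)-\hat R(f) \geq \epsilon)\leq \sum\limits_{f\in \mathcal{F}}P(R(f)-\hat R(f)\geq\epsilon)\leq d\exp(-2N\epsilon^2)
\]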
Setting \(\delta=d\exp(-2N\epsilon^2)\) and solving for \(\epsilon\) gives
\[P(R(f)<\hat R(f)+\epsilon)\geq 1-\delta \\
\epsilon= \sqrt{\frac{1}{2N}\left(\log d+\log\frac{1}{\delta}\right)}
\]
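The relation \(\delta=d\exp(-2N\epsilon^2)\) can also be solved for \(N\), giving the sample size that guarantees a target gap \(\epsilon\) with confidence \(1-\delta\): \(N\geq(\log d+\log\frac{1}{\delta})/(2\epsilon^2)\). A minimal sketch, with a hypothetical function name and example values:

```python
import math

def samples_needed(d, epsilon, delta):
    """Smallest integer N with d * exp(-2 N epsilon^2) <= delta."""
    return math.ceil((math.log(d) + math.log(1.0 / delta)) / (2.0 * epsilon**2))

d, epsilon, delta = 1000, 0.05, 0.05
n_required = samples_needed(d, epsilon, delta)
print(f"N >= {n_required}")
print(f"failure probability at N: {d * math.exp(-2 * n_required * epsilon**2):.5f}")
```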
