Statistical Learning Methods (《统计学习方法》), Chapter 1

Hoeffding's inequality

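Let \(X_1,X_2,\dots,X_N\) be independent random variables with \(X_i\in[a_i,b_i]\), \(i=1,2,\dots,N\), and let \(\bar X\) be the empirical mean of \(X_1,X_2,\dots,X_N\), i.e. \(\bar X=\frac{1}{N}\sum\limits_{i=1}^NX_i\). Then for any \(t>0\) the following inequality holds: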

\[P[\bar X-E(\bar X)\geq t]\leq \exp\left (-\frac{2N^2t^2}{\sum\limits_{i=1}^{N}(b_i-a_i)^2}\right ) \]
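As a quick sanity check, the bound \(\exp\left(-2N^2t^2/\sum_{i=1}^N(b_i-a_i)^2\right)\) can be compared with a simulated tail probability. The sketch below is only an illustration under assumptions that are not part of the original notes: the variables are taken to be uniform on \([a_i,b_i]\), and the function names `hoeffding_bound` and `empirical_tail`, the seed, and the parameter values are all hypothetical.

```python
import math
import random


def hoeffding_bound(t, bounds):
    """Return exp(-2 N^2 t^2 / sum (b_i - a_i)^2) for intervals [(a_i, b_i), ...]."""
    n = len(bounds)
    width_sq = sum((b - a) ** 2 for a, b in bounds)
    return math.exp(-2.0 * n ** 2 * t ** 2 / width_sq)


def empirical_tail(t, bounds, trials=5000, seed=0):
    """Monte Carlo estimate of P[mean - E(mean) >= t].

    Assumption for illustration only: X_i ~ Uniform[a_i, b_i], independent.
    """
    rng = random.Random(seed)
    n = len(bounds)
    expected_mean = sum((a + b) / 2.0 for a, b in bounds) / n
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.uniform(a, b) for a, b in bounds) / n
        if sample_mean - expected_mean >= t:
            hits += 1
    return hits / trials


bounds = [(0.0, 1.0)] * 20   # N = 20 variables, each in [0, 1]
t = 0.1
bound = hoeffding_bound(t, bounds)      # equals exp(-0.4) for these values
estimate = empirical_tail(t, bounds)
print(f"Hoeffding bound: {bound:.4f}, simulated tail: {estimate:.4f}")
```

The simulated frequency should come out well below the bound, which is expected: Hoeffding's inequality uses only the ranges \([a_i,b_i]\), so it is usually loose for any particular distribution.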

Proof

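We first prove Hoeffding's lemma: if \(X\) is a random variable with \(E(X)=0\) and \(X\in[a,b]\), then for any real \(s\),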

\[E(e^{sX})\leq e^{s^2(b-a)^2/8} \]

Proof:

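Since \(e^{sX}\) is a convex function of \(X\), convexity gives the first inequality below for \(X\in[a,b]\); taking expectations and using \(E(X)=0\) gives the second: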

\[e^{sX}\leq \frac{b-X}{b-a}e^{sa}+\frac{X-a}{b-a}e^{sb}\\ \Downarrow \\ E(e^{sX})\leq\frac{b}{b-a}e^{sa}-\frac{a}{b-a}e^{sb} =e^{sa}\left(\frac{b}{b-a}-\frac{a}{b-a}e^{sb-sa} \right) \]

Let \(\theta=-\frac{a}{b-a}\) and \(u=s(b-a)\), so that \(1-\theta=\frac{b}{b-a}\) and \(sa=-\theta u\). Then

\[E(e^{sX})\leq (1-\theta+\theta e^u)e^{-\theta u}= e^{\ln(1-\theta+\theta e^u)-\theta u} \]

Let \(\phi(u)=\ln(1-\theta+\theta e^u)-\theta u\). It suffices to show that \(\phi(u)\leq\frac{1}{8}u^2=\frac{1}{8}s^2(b-a)^2\).

By Taylor's formula with the Lagrange remainder, for some \(\xi\) between \(0\) and \(u\):

\[\phi(u)=\phi(0)+u\phi'(0)+\frac{1}{2}u^2\phi''(\xi)\\ \phi(0)=0\\ \phi'(u)=\frac{\theta e^u}{1-\theta+\theta e^u}-\theta,\quad \phi'(0)=0 \\ \phi''(\xi)=\frac{\theta e^{\xi}(1-\theta)}{(1-\theta +\theta e^\xi)^2}=\frac{\theta e^{\xi}}{1-\theta +\theta e^\xi}\left(1-\frac{\theta e^{\xi}}{1-\theta +\theta e^\xi}\right)=m(1-m)\leq \frac{1}{4} \\ \Downarrow \\ \phi(u)\leq\frac{1}{8}u^2=\frac{1}{8}s^2(b-a)^2 \]

This proves the lemma.
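The lemma can be checked numerically on the two-point distribution on \(\{a,b\}\) with mean zero, for which the convexity step holds with equality, so the only slack left is the Taylor step. This is a minimal sketch; the function names `two_point_mgf` and `lemma_bound` and the values of \(a\), \(b\), \(s\) are hypothetical choices for illustration.

```python
import math


def two_point_mgf(s, a, b):
    """E[e^{sX}] when P(X=a) = b/(b-a) and P(X=b) = -a/(b-a), so that E[X] = 0.

    Requires a < 0 < b so that both probabilities are positive.
    """
    return (b * math.exp(s * a) - a * math.exp(s * b)) / (b - a)


def lemma_bound(s, a, b):
    """Right-hand side of Hoeffding's lemma: exp(s^2 (b - a)^2 / 8)."""
    return math.exp(s ** 2 * (b - a) ** 2 / 8.0)


a, b = -1.0, 3.0
s_values = (-2.0, -0.5, 0.1, 0.5, 1.0, 2.0)
for s in s_values:
    mgf = two_point_mgf(s, a, b)
    bound = lemma_bound(s, a, b)
    print(f"s={s:+.1f}  E[e^(sX)]={mgf:.4f}  bound={bound:.4f}  holds={mgf <= bound}")
```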

Proof of the inequality

Markov's inequality: for a nonnegative random variable \(X\) and any \(a>0\),

\[P(X\geq a)\leq \frac{E(X)}{a} \]
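Markov's inequality is easy to verify exactly on a small discrete distribution. The probability mass function below is a made-up example (not from the original notes), and `tail` is a hypothetical helper; exact rational arithmetic avoids any rounding issues.

```python
from fractions import Fraction

# Hypothetical nonnegative distribution, given as value -> probability.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}

# E(X) = 0 * 1/2 + 1 * 1/4 + 4 * 1/4 = 5/4
mean = sum(x * p for x, p in pmf.items())


def tail(a):
    """Exact value of P(X >= a) under pmf."""
    return sum(p for x, p in pmf.items() if x >= a)


for a in (1, 2, 4, 5):
    print(f"a={a}  P(X>=a)={tail(a)}  E(X)/a={mean / a}  holds={tail(a) <= mean / a}")
```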

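Write \(n=N\) and \(S_n=\sum_{i=1}^nX_i\). For any \(s>0\), apply Markov's inequality to \(e^{s(S_n-E(S_n))}\), then use the independence of the \(X_i\), and finally apply Hoeffding's lemma to each \(X_i-E(X_i)\), which has mean zero and lies in an interval of width \(b_i-a_i\):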

\[P(S_n-E(S_n)\geq t)=P\left(e^{s(S_n-E(S_n))}\geq e^{st}\right) \leq e^{-st}E\left(e^{s(S_n-E(S_n))}\right) \\ =e^{-st}\prod\limits_{i=1}^nE\left(e^{s(X_i-E(X_i))}\right) \\ \leq e^{-st}\prod\limits_{i=1}^ne^{\frac{s^2(b_i-a_i)^2}{8}}=\exp\left(-st+\frac{1}{8}s^2\sum\limits^n_{i=1}(b_i-a_i)^2\right) \]

Note that \(\Phi(s)=-st+\frac{1}{8}s^2\sum\limits^n_{i=1}(b_i-a_i)^2\) is a quadratic function of \(s\), which attains its minimum at \(s=\frac{4t}{\sum^n_{i=1}(b_i-a_i)^2}\):

\[\min_{s>0}\Phi(s)=-\frac{2t^2}{\sum^n_{i=1}(b_i-a_i)^2} \]

Therefore

\[P(S_n-E(S_n)\geq t)\leq \exp\left(-\frac{2t^2}{\sum^n_{i=1}(b_i-a_i)^2}\right) \]

Since \(S_n=n\bar X\), replacing \(t\) by \(nt\) gives

\[ P(\bar X-E(\bar X)\geq t)\leq \exp\left(-\frac{2n^2t^2}{\sum^n_{i=1}(b_i-a_i)^2}\right) \]
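The optimization over \(s\) used in the proof can be double-checked numerically: the quadratic \(\Phi(s)=-st+\frac{1}{8}s^2c\), with \(c=\sum_{i=1}^n(b_i-a_i)^2\), should be minimized at \(s=4t/c\) with minimum value \(-2t^2/c\). The sketch below compares this closed form against a brute-force grid search; the symbol `c`, the function name `phi`, and the numeric values are assumptions chosen for illustration.

```python
def phi(s, t, c):
    """Phi(s) = -s t + s^2 c / 8, where c stands for sum (b_i - a_i)^2."""
    return -s * t + s ** 2 * c / 8.0


t, c = 0.5, 10.0
s_star = 4.0 * t / c                   # claimed minimizer
closed_form_min = -2.0 * t ** 2 / c    # claimed minimum value

# Brute-force search over s in (0, 2] with step 0.001.
grid = [k / 1000.0 for k in range(1, 2001)]
grid_min = min(phi(s, t, c) for s in grid)

print(f"s* = {s_star}, Phi(s*) = {phi(s_star, t, c)}")
print(f"closed-form minimum = {closed_form_min}, grid minimum = {grid_min}")
```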

Generalization error bound:

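For a binary classification problem, when the hypothesis space is a finite set of functions \(\mathcal{F}=\{f_1,f_2,\dots,f_d\}\), then for any function \(f\in\mathcal{F}\), with probability at least \(1-\delta\), \(0<\delta<1\), the following inequality holds: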

\[P(R(f)<\hat R(f)+\epsilon)\geq 1-\delta \]

where

\[\epsilon= \sqrt{\frac{1}{2N}\left(\log d+\log\frac{1}{\delta}\right)} \]
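To get a feel for the size of \(\epsilon\), the formula \(\epsilon=\sqrt{\frac{1}{2N}\left(\log d+\log\frac{1}{\delta}\right)}\) can be evaluated directly. This is a small sketch; the function name `generalization_gap` and the example values of \(d\), \(N\), \(\delta\) are hypothetical, and natural logarithms are assumed.

```python
import math


def generalization_gap(d, n, delta):
    """epsilon = sqrt((log d + log(1/delta)) / (2 n)), using natural logs.

    d: number of functions in the finite hypothesis space
    n: sample size N
    delta: allowed failure probability, 0 < delta < 1
    """
    if d < 1 or n < 1 or not 0.0 < delta < 1.0:
        raise ValueError("need d >= 1, n >= 1 and 0 < delta < 1")
    return math.sqrt((math.log(d) + math.log(1.0 / delta)) / (2.0 * n))


eps = generalization_gap(d=100, n=10000, delta=0.05)
print(f"epsilon = {eps:.4f}")   # roughly 0.0195 for these values

# epsilon shrinks as N grows and grows only logarithmically with d.
for n in (100, 1000, 10000):
    print(n, round(generalization_gap(100, n, 0.05), 4))
```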

Proof

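For a fixed \(f\), the empirical risk \(\hat R(f)\) is the mean of \(N\) independent losses and \(R(f)\) is its expectation. With the 0-1 loss each term lies in \([0,1]\), so \(b_i-a_i=1\), and Hoeffding's inequality (applied to the negated losses) gives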

\[P(R(f)-\hat R(f)\geq\epsilon)\leq \exp(-2N\epsilon^2) \\ \Downarrow \\ P(\exists f\in \mathcal{F}:R(f)-\hat R(f) \geq \epsilon)\leq \sum\limits_{f\in \mathcal{F}}P(R(f)-\hat R(f)\geq\epsilon)\leq d\exp(-2N\epsilon^2) \]

Let \(\delta=d\exp(-2N\epsilon^2)\). Then

\[P(R(f)<\hat R(f)+\epsilon)\geq 1-\delta \\ \epsilon= \sqrt{\frac{1}{2N}\left(\log d+\log\frac{1}{\delta}\right)} \]
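The relation \(\delta=d\exp(-2N\epsilon^2)\) can also be solved for \(N\), which gives a rough sample-size estimate: \(N\geq\frac{1}{2\epsilon^2}\left(\log d+\log\frac{1}{\delta}\right)\). The sketch below is one way to compute it; the function names `failure_probability` and `required_sample_size` and the example values are hypothetical, not part of the original notes.

```python
import math


def failure_probability(d, n, eps):
    """Union-bound failure probability delta = d * exp(-2 n eps^2)."""
    return d * math.exp(-2.0 * n * eps ** 2)


def required_sample_size(d, eps, delta):
    """Smallest integer n with d * exp(-2 n eps^2) <= delta."""
    return math.ceil((math.log(d) + math.log(1.0 / delta)) / (2.0 * eps ** 2))


d, eps, delta = 100, 0.05, 0.05
n = required_sample_size(d, eps, delta)
print(f"required N = {n}")
print(f"delta at N:     {failure_probability(d, n, eps):.5f}")
print(f"delta at N - 1: {failure_probability(d, n - 1, eps):.5f}")
```

Because \(d\) enters only through \(\log d\), enlarging the hypothesis space tenfold adds just \(\log 10/(2\epsilon^2)\) to the required sample size under this bound.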

posted @ 2020-10-16 09:47  无证_骑士