Statistical Learning Methods Study Notes - 04 - Naive Bayes
Learning and classification with naive Bayes, and the parameter estimation algorithms for naive Bayes.
Learning and Classification with Naive Bayes
Let the input space \(\mathcal{X} \subseteq R^n\) be a set of \(n\)-dimensional vectors and the output space be the set of class labels \(\mathcal{Y} = \{c_1,c_2,\cdots,c_K\}\). The input is a feature vector \(x \in \mathcal{X}\) and the output is a class label \(y \in \mathcal{Y}\). \(X\) is a random vector defined on the input space \(\mathcal{X}\), \(Y\) is a random variable defined on the output space \(\mathcal{Y}\), and \(P(X,Y)\) is the joint probability distribution of \(X\) and \(Y\). The training set \(T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}\) is drawn i.i.d. from \(P(X,Y)\).
- Prior probability distribution:
\[P(Y = c_k),k = 1,2,\cdots,K
\]
- Conditional probability distribution:
\[P(X = x|Y = c_k) = P(X^{(1)} = x^{(1)},\cdots,X^{(n)} = x^{(n)}|Y = c_k),k = 1,2,\cdots,K
\]
- Conditional distribution under the conditional independence assumption:
\[\begin{aligned}
P(X = x|Y = c_k) &= P(X^{(1)} = x^{(1)},\cdots,X^{(n)} = x^{(n)}|Y = c_k) \\
&= \prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)
\end{aligned}
\]
- Posterior probability distribution:
\[\begin{aligned}
P(Y = c_k|X = x)
&= \frac{P(X = x|Y = c_k)P(Y = c_k)}{\sum_kP(X = x|Y = c_k)P(Y = c_k)} \\
&= \frac{P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)}{\sum_kP(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)}
\end{aligned}
\]
- Naive Bayes classifier: assign the instance to the class with the largest posterior probability; under 0-1 loss this is equivalent to minimizing the expected risk.
\[y = f(x) = \mathop{\arg\max}\limits_{c_k}\frac{P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)}{\sum_kP(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)}
\]
Since the denominator is the same for every class, this simplifies to:
\[y = f(x) = \mathop{\arg\max}\limits_{c_k}P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)
\]
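To make the decision rule concrete, here is a minimal Python sketch that picks the class maximizing the product of the prior and the per-feature conditional probabilities. The probability tables `prior` and `cond_prob` are assumed to have been estimated already (e.g. as in the next section); all names here are illustrative, not from the original text.
```python
# Minimal sketch of the naive Bayes decision rule.
# prior[c] stands for P(Y = c); cond_prob[c][j][v] stands for P(X^(j) = v | Y = c).
def naive_bayes_predict(x, prior, cond_prob, classes):
    best_class, best_score = None, -1.0
    for c in classes:
        score = prior[c]
        for j, v in enumerate(x):
            # A feature value never seen with class c gets probability 0 here.
            score *= cond_prob[c][j].get(v, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```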
Parameter Estimation for Naive Bayes
Maximum Likelihood Estimation
In naive Bayes, learning amounts to estimating the prior probability \(P(Y = c_k)\) and the conditional probabilities \(P(X^{(j)} = x^{(j)}|Y = c_k)\).
- Maximum likelihood estimate of the prior:
\[P(Y = c_k) = \frac{\sum_{i = 1}^NI(y_i = c_k)}{N},k = 1,2,\cdots,K
\]
- Maximum likelihood estimate of the conditional probabilities:
\[P(X^{(j)} = a_{jl}|Y = c_k) = \frac{\sum_{i = 1}^NI(x_i^{(j)} = a_{jl},y_i = c_k)}{\sum_{i = 1}^NI(y_i = c_k)} \\
j = 1,2,\cdots,n;\ l = 1,2,\cdots,S_j;\ k = 1,2,\cdots,K
\]
Here \(\{a_{j1},a_{j2},\cdots,a_{jS_j}\}\) is the set of possible values of the \(j\)-th feature \(x^{(j)}\), \(x_i^{(j)}\) is the \(j\)-th feature of the \(i\)-th sample, \(a_{jl}\) is the \(l\)-th possible value of the \(j\)-th feature, and \(I\) is the indicator function.
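A minimal sketch of these maximum likelihood estimates for discrete features is given below; the data layout (a list of feature tuples `X` and a list of labels `y`) is an assumption made for illustration.
```python
from collections import Counter, defaultdict

def fit_mle(X, y):
    # Prior: relative frequency of each class label.
    N = len(y)
    prior = {c: cnt / N for c, cnt in Counter(y).items()}
    # Conditional probabilities: count each feature value within each class...
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for x_i, y_i in zip(X, y):
        for j, v in enumerate(x_i):
            value_counts[y_i][j][v] += 1
    # ...then normalize by the number of samples in that class.
    cond_prob = {c: {j: {v: cnt / sum(counter.values())
                         for v, cnt in counter.items()}
                     for j, counter in feats.items()}
                 for c, feats in value_counts.items()}
    return prior, cond_prob
```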
Learning and Classification Algorithm
- Compute the prior and conditional probabilities.
- For a given instance \(x = (x^{(1)},x^{(2)},\cdots,x^{(n)})^T\), compute:
\[P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)
\]
- Determine the class of the instance:
\[y = \mathop{\arg\max}\limits_{c_k}P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)
\]
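Tying the two sketches above together, a toy run of the whole procedure could look like this; the tiny dataset is hypothetical and only meant to exercise the estimate-then-classify steps.
```python
# Assumes fit_mle and naive_bayes_predict from the sketches above.
X_train = [(1, 'S'), (1, 'M'), (2, 'M'), (2, 'S'), (3, 'L'), (3, 'L')]
y_train = [-1, -1, 1, -1, 1, 1]

prior, cond_prob = fit_mle(X_train, y_train)
print(naive_bayes_predict((2, 'S'), prior, cond_prob, classes=list(prior)))
```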
Bayesian Estimation
Motivation: maximum likelihood estimation may produce probability estimates equal to 0, which then distort the computation of the posterior probabilities; the remedy is to use Bayesian estimation.
- Bayesian estimate of the conditional probabilities:
\[P_\lambda(X^{(j)} = a_{jl}|Y = c_k) = \frac{\sum_{i = 1}^NI(x_i^{(j)} = a_{jl},y_i = c_k) + \lambda}{\sum_{i = 1}^NI(y_i = c_k) + S_j\lambda}
\]
Here \(\lambda \geq 0\); when \(\lambda = 0\) this reduces to maximum likelihood estimation. A common choice is \(\lambda = 1\), which is called Laplace smoothing.
- Bayesian estimate of the prior:
\[P_\lambda(Y = c_k) = \frac{\sum_{i = 1}^NI(y_i = c_k) + \lambda}{N + K\lambda}
\]
The coefficient \(K\) in front of \(\lambda\) in the denominator ensures that the estimated probabilities sum to 1.
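A corresponding sketch of the smoothed estimates is shown below, assuming the set of classes and the possible values of each feature are known in advance; as with the earlier sketches, the data layout and names are illustrative assumptions.
```python
from collections import Counter

def fit_bayesian(X, y, classes, feature_values, lam=1.0):
    # lam = 1.0 gives Laplace smoothing; lam = 0.0 recovers the MLE.
    N, K = len(y), len(classes)
    class_count = Counter(y)
    # Smoothed prior: (count + lam) / (N + K * lam).
    prior = {c: (class_count[c] + lam) / (N + K * lam) for c in classes}
    # Smoothed conditionals: (count + lam) / (class count + S_j * lam).
    cond_prob = {}
    for c in classes:
        cond_prob[c] = {}
        for j, values in enumerate(feature_values):
            S_j = len(values)
            counts = Counter(x[j] for x, y_i in zip(X, y) if y_i == c)
            cond_prob[c][j] = {v: (counts[v] + lam) / (class_count[c] + S_j * lam)
                               for v in values}
    return prior, cond_prob
```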