Statistical Learning Methods Study Notes - 04 - Naive Bayes

Learning and classification with naive Bayes, and the parameter estimation algorithms of naive Bayes.

Learning and Classification with the Naive Bayes Method

Suppose the input space \(\mathcal{X} \subseteq R^n\) is the set of \(n\)-dimensional vectors and the output space is the set of class labels \(\mathcal{Y} = \{c_1,c_2,\cdots,c_K\}\). The input is a feature vector \(x \in \mathcal{X}\) and the output is a class label \(y \in \mathcal{Y}\). \(X\) is a random vector defined on the input space \(\mathcal{X}\), \(Y\) is a random variable defined on the output space \(\mathcal{Y}\), and \(P(X,Y)\) is the joint probability distribution of \(X\) and \(Y\). The training data set \(T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}\) is drawn independently and identically distributed from \(P(X,Y)\).

  • Prior probability distribution:

\[P(Y = c_k),k = 1,2,\cdots,K \]

  • Conditional probability distribution:

\[P(X = x|Y = c_k) = P(X^{(1)} = x^{(1)},\cdots,X^{(n)} = x^{(n)}|Y = c_k),k = 1,2,\cdots,K \]

  • The conditional probability distribution under the conditional independence assumption:

\[\begin{aligned} P(X = x|Y = c_k) &= P(X^{(1)} = x^{(1)},\cdots,X^{(n)} = x^{(n)}|Y = c_k) \\ &= \prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k) \end{aligned} \]

  • Posterior probability distribution (by Bayes' theorem):

\[\begin{aligned} P(Y = c_k|X = x) &= \frac{P(X = x|Y = c_k)P(Y = c_k)}{\sum_kP(X = x|Y = c_k)P(Y = c_k)} \\ &= \frac{P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)}{\sum_kP(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)} \end{aligned} \]

  • Naive Bayes classifier: assign an instance to the class with the largest posterior probability, which is equivalent to minimizing the expected risk (under the 0-1 loss)

\[y = f(x) = \mathop{\arg\max}\limits_{c_k}\frac{P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)}{\sum_kP(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k)} \]

Since the denominator is the same for every class \(c_k\), the rule simplifies to (a small numeric sketch follows after the formula):

\[y = f(x) = \mathop{\arg\max}\limits_{c_k}P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k) \]
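To make the rule concrete, here is a minimal sketch in Python that scores each class with \(P(Y = c_k)\prod_{j} P(X^{(j)} = x^{(j)}|Y = c_k)\) and picks the argmax. The probability tables are made-up numbers for illustration, not estimates from any real data.

```python
# Hypothetical tables: prior P(Y = c_k) and conditionals P(X^(j) = x^(j) | Y = c_k).
prior = {"c1": 0.6, "c2": 0.4}
cond = {
    "c1": [{"a": 0.5, "b": 0.5}, {"S": 0.2, "M": 0.8}],   # feature 1, feature 2 given Y = c1
    "c2": [{"a": 0.9, "b": 0.1}, {"S": 0.7, "M": 0.3}],   # feature 1, feature 2 given Y = c2
}

def predict(x):
    """Return the class maximizing P(Y = c_k) * prod_j P(X^(j) = x^(j) | Y = c_k)."""
    scores = {}
    for ck in prior:
        score = prior[ck]
        for j, xj in enumerate(x):
            score *= cond[ck][j][xj]
        scores[ck] = score
    return max(scores, key=scores.get), scores

print(predict(("a", "S")))   # ('c2', {'c1': 0.06, 'c2': 0.252})
```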

Parameter Estimation in the Naive Bayes Method

Maximum Likelihood Estimation

In the naive Bayes method, learning means estimating the prior probability \(P(Y = c_k)\) and the conditional probability distribution \(P(X^{(j)} = x^{(j)}|Y = c_k)\).

  • Maximum likelihood estimate of the prior probability:

\[P(Y = c_k) = \frac{\sum_{i = 1}^NI(y_i = c_k)}{N},k = 1,2,\cdots,K \]

  • Maximum likelihood estimate of the conditional probability:

\[P(X^{(j)} = a_{jl}|Y = c_k) = \frac{\sum_{i = 1}^NI(x_i^{(j)} = a_{jl},y_i = c_k)}{\sum_{i = 1}^NI(y_i = c_k)} \\ j = 1,2,\cdots,n;\ l = 1,2,\cdots,S_j;\ k = 1,2,\cdots,K \]

Here the set of possible values of the \(j\)-th feature \(x^{(j)}\) is \(\{a_{j1},a_{j2},\cdots,a_{jS_j}\}\), \(x_i^{(j)}\) is the \(j\)-th feature of the \(i\)-th sample, \(a_{jl}\) is the \(l\)-th possible value of the \(j\)-th feature, and \(I\) is the indicator function.
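Both estimates are just normalized counts. A minimal counting sketch in Python, using a made-up toy sample for illustration:

```python
from collections import Counter, defaultdict

# Toy training data (made up): each sample is ((x^(1), x^(2)), y).
samples = [((1, "S"), -1), ((1, "M"), -1), ((2, "M"), 1), ((2, "S"), 1), ((2, "S"), -1)]

N = len(samples)
class_count = Counter(y for _, y in samples)                 # sum_i I(y_i = c_k)
prior = {ck: n / N for ck, n in class_count.items()}         # MLE of P(Y = c_k)

feature_count = defaultdict(Counter)                         # (c_k, j) -> counts of values a_jl
for x, y in samples:
    for j, xj in enumerate(x):
        feature_count[(y, j)][xj] += 1

def cond_prob(j, a, ck):
    """MLE of P(X^(j) = a | Y = c_k): count(x_i^(j) = a and y_i = c_k) / count(y_i = c_k)."""
    return feature_count[(ck, j)][a] / class_count[ck]

print(prior)                  # {-1: 0.6, 1: 0.4}
print(cond_prob(1, "S", -1))  # 2/3, since "S" appears twice among the three samples with y = -1
```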

Learning and Classification Algorithm

  • Compute the prior probabilities and the conditional probabilities
  • For a given instance \(x = (x^{(1)},x^{(2)},\cdots,x^{(n)})^T\), compute:

\[P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k) \]

  • Determine the class of the instance (an end-to-end sketch in code follows the formula below):

\[y = \mathop{\arg\max}\limits_{c_k}P(Y = c_k)\prod_{j = 1}^n P(X^{(j)} = x^{(j)}|Y = c_k) \]
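Putting the three steps together, a minimal end-to-end sketch (maximum likelihood version; the class name NaiveBayes and the toy data are placeholders for illustration):

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal sketch of the learning-and-classification algorithm above (MLE version)."""

    def fit(self, X, y):
        # Step 1: estimate P(Y = c_k) and the counts behind P(X^(j) = a_jl | Y = c_k).
        N = len(y)
        self.class_count = Counter(y)
        self.prior = {ck: n / N for ck, n in self.class_count.items()}
        self.cond = defaultdict(Counter)          # (c_k, j) -> Counter over feature values
        for xi, yi in zip(X, y):
            for j, xij in enumerate(xi):
                self.cond[(yi, j)][xij] += 1
        return self

    def predict(self, x):
        # Steps 2-3: score every class and return the argmax.
        best_class, best_score = None, -1.0
        for ck in self.prior:
            score = self.prior[ck]
            for j, xj in enumerate(x):
                score *= self.cond[(ck, j)][xj] / self.class_count[ck]
            if score > best_score:
                best_class, best_score = ck, score
        return best_class

# Toy usage with made-up data:
X = [(1, "S"), (1, "M"), (2, "M"), (2, "S"), (3, "L")]
y = [-1, -1, 1, 1, 1]
print(NaiveBayes().fit(X, y).predict((2, "S")))   # -> 1
```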

Bayesian Estimation

Motivation: maximum likelihood estimation may produce probability estimates that are exactly 0, which would zero out the whole product in the classification rule; the remedy is to use Bayesian estimation.

  • The Bayesian estimate of the conditional probability is:

\[P_\lambda(X^{(j)} = a_{jl}|Y = c_k) = \frac{\sum_{i = 1}^NI(x_i^{(j)} = a_{jl},y_i = c_k) + \lambda}{\sum_{i = 1}^NI(y_i = c_k) + S_j\lambda} \]

Here \(\lambda \geq 0\); when \(\lambda = 0\) this reduces to maximum likelihood estimation. A common choice is \(\lambda = 1\), which is known as Laplace smoothing.

  • The Bayesian estimate of the prior probability:

\[P_\lambda(Y = c_k) = \frac{\sum_{i = 1}^NI(y_i = c_k) + \lambda}{N + K\lambda} \]

The coefficient multiplying \(\lambda\) in the denominator (\(S_j\) or \(K\)) ensures that the estimated probabilities sum to 1.
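A minimal sketch of both smoothed estimates, with hypothetical counts plugged in to show that a value never observed together with class \(c_k\) no longer receives probability 0:

```python
def smoothed_cond_prob(count_ajl_ck, count_ck, S_j, lam=1.0):
    """Bayesian estimate of P(X^(j) = a_jl | Y = c_k); S_j is the number of values feature j can take."""
    return (count_ajl_ck + lam) / (count_ck + S_j * lam)

def smoothed_prior(count_ck, N, K, lam=1.0):
    """Bayesian estimate of P(Y = c_k); K is the number of classes, N the number of samples."""
    return (count_ck + lam) / (N + K * lam)

print(smoothed_cond_prob(0, 3, S_j=2))  # (0 + 1) / (3 + 2) = 0.2 instead of 0 under MLE
print(smoothed_prior(3, 5, K=2))        # (3 + 1) / (5 + 2) = 4/7
```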

posted @ 2022-09-14 19:40 eryo