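Principles of the Bayesian Classification Algorithm
1. Conditional independence (naive Bayes) assumption:
\[P(X|c) = \prod\limits_{k = 1}^n {P({X_k}|c)}
\]
Here \(P(X|c)\) is the probability of observing sample X given that hypothesis (class) c holds. The product form assumes that the n attributes \(X_1, \dots, X_n\) of X are conditionally independent given c.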
2. Bayes' formula:
\[P(A|B) = \frac{{P(B|A)P(A)}}{{P(B)}}
\]
3. Maximum a posteriori (MAP) hypothesis
\[{c_{map}} = \mathop {\arg \max }\limits_{c \in C} P(c|X) = \mathop {\arg \max }\limits_{c \in C} \frac{{P(X|c)P(c)}}{{P(X)}} = \mathop {\arg \max }\limits_{c \in C} P(X|c)P(c)
\]
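The most probable hypothesis given X (i.e. the value of c that maximizes \(P(c|X)\)) is the maximum a posteriori hypothesis. The denominator \(P(X)\) is dropped in the last step because it does not depend on c.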
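As a small check that dropping \(P(X)\) leaves the arg max unchanged, the sketch below compares normalized and unnormalized scores. The two scores are the values derived in the PlayTennis example later in this post; treating \(P(X)\) as the sum of the scores over both classes is an assumption of this sketch (total probability under the same model), and the variable names are illustrative.

```python
from fractions import Fraction

# Unnormalized scores P(X|c) * P(c), taken from the PlayTennis example in this post.
scores = {
    "Yes": Fraction(9, 14) * Fraction(8, 729),
    "No": Fraction(5, 14) * Fraction(48, 625),
}

# P(X) is the same for every class; here it is taken as the sum of the scores.
evidence = sum(scores.values())

# Dividing every score by the same positive constant gives the posteriors P(c|X).
posteriors = {c: s / evidence for c, s in scores.items()}

# The arg max is identical with or without the division.
c_map_unnormalized = max(scores, key=scores.get)
c_map_normalized = max(posteriors, key=posteriors.get)
print(c_map_unnormalized, c_map_normalized)
```

Normalizing is only needed when the actual posterior probabilities are wanted; for picking the class, the unnormalized product is enough.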
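4. Principle of the Bayesian classification algorithm:
Combine the MAP hypothesis with the conditional independence assumption: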
\[V(X) = \mathop {\arg \max }\limits_i P({c_i})P(X|{c_i})
\]
together with
\[P(X|{c_i}) = \prod\limits_{k = 1}^n {P({X_k}|{c_i})}
\]
gives
\[V(X) = \mathop {\arg \max }\limits_i P({c_i})\prod\limits_{k = 1}^n {P({X_k}|{c_i})}
\]
(the Bayesian classification formula)
Summary:
Steps of Bayesian classification:
(1) Compute the prior probabilities \(P({c_i})\)
(2) Compute the conditional probabilities \({P({X_k}|{c_i})}\)
(3) Substitute into the Bayesian classification formula and find the class \({c_{map}}\) that maximizes \(P({c_i})\prod\limits_{k = 1}^n {P({X_k}|{c_i})}\); then \(V(X) = {c_{map}}\)
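The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `train` and `classify` and the tiny two-attribute dataset are hypothetical, probabilities are estimated by plain relative frequency, and no smoothing is applied, so an attribute value never seen with a class yields a score of zero for that class.

```python
from collections import Counter
from fractions import Fraction

def train(rows, labels):
    """Steps (1) and (2): estimate P(c) and P(X_k = v | c) by counting."""
    class_counts = Counter(labels)
    priors = {c: Fraction(n, len(labels)) for c, n in class_counts.items()}
    # cond_counts[(k, v, c)] = number of class-c samples whose k-th attribute is v
    cond_counts = Counter()
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            cond_counts[(k, v, c)] += 1

    def cond(k, v, c):
        return Fraction(cond_counts[(k, v, c)], class_counts[c])

    return priors, cond

def classify(x, priors, cond):
    """Step (3): return the class maximizing P(c) * prod_k P(X_k | c)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for k, v in enumerate(x):
            score *= cond(k, v, c)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: two attributes per sample, two classes.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("a", "x")]
labels = ["pos", "pos", "neg", "neg", "neg"]

priors, cond = train(rows, labels)
label, scores = classify(("a", "x"), priors, cond)
print(label, scores)
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare against hand calculations; a practical implementation would more likely use floating-point log-probabilities.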
Example:
| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |
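Suppose the current weather is: X = {Sunny, Hot, High, Weak}
Question: is the current weather suitable for playing tennis?
Solution:
(1) Compute the prior probabilities: P(Yes) = 9/14, P(No) = 5/14
(2) Compute the conditional probabilities P(X|Yes) and P(X|No)
P(X|Yes) = P(Sunny|Yes)*P(Hot|Yes)*P(High|Yes)*P(Weak|Yes) = 2/9 * 2/9 * 3/9 * 6/9 = 8/729
P(X|No) = P(Sunny|No)*P(Hot|No)*P(High|No)*P(Weak|No) = 3/5 * 2/5 * 4/5 * 2/5 = 48/625
(3) Substitute into the Bayesian classification formula
P(Yes)*P(X|Yes) = 9/14 * 8/729 ≈ 0.0071
P(No)*P(X|No) = 5/14 * 48/625 ≈ 0.027
Since 0.0071 < 0.027, PlayTennis = No is the maximum a posteriori hypothesis, so the current weather is not suitable for playing tennis.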
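The hand calculation can be reproduced by counting directly over the 14 rows of the table. The sketch below is only a check of the arithmetic; the helper name `score` and the tuple layout are choices made for this illustration.

```python
from fractions import Fraction

# (Outlook, Temperature, Humidity, Wind, PlayTennis) for D1..D14, copied from the table.
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

x = ("Sunny", "Hot", "High", "Weak")  # the weather to classify

def score(label):
    """Return P(label) * prod_k P(x_k | label), estimated by counting rows."""
    subset = [row for row in data if row[-1] == label]
    result = Fraction(len(subset), len(data))  # prior P(label)
    for k, value in enumerate(x):
        matches = sum(1 for row in subset if row[k] == value)
        result *= Fraction(matches, len(subset))  # P(x_k | label)
    return result

score_yes = score("Yes")
score_no = score("No")
prediction = "Yes" if score_yes > score_no else "No"
print(float(score_yes), float(score_no), prediction)
```

The exact fractions reduce to 4/567 for Yes and 24/875 for No, which match the rounded values 0.0071 and 0.027 above.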