[Machine Learning] Markov chain and Hidden Markov Models (HMMs) - Zhu Qing


A quick introduction to Hidden Markov Models (HMMs):

http://homepage3.nifty.com/myinfo/HMM.pdf

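It would be better if the example were more concrete, for instance if the mathematical definition of the HMM for the seaweed-and-weather example were written out.

That is, HMM = (initial state vector π, state transition matrix A, confusion matrix B), where B gives the probability of each observation for each hidden state.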
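To make the triple concrete, here is a minimal Python sketch of an HMM for a seaweed-and-weather setting. The state names, the observation names, and every probability are illustrative assumptions, not values taken from the linked PDF. `is_stochastic` and `forward_probability` are hypothetical helper names; the latter follows the standard forward algorithm.

```python
# Hidden states (weather) and observable symbols (seaweed condition).
# All names and numbers below are illustrative assumptions.
STATES = ["sunny", "cloudy", "rainy"]
OBSERVATIONS = ["dry", "dryish", "damp", "soggy"]

# Pai: initial state vector, PI[i] = P(first state = i).
PI = [0.6, 0.2, 0.2]

# A: state transition matrix, A[i][j] = P(next state = j | current state = i).
A = [
    [0.5, 0.3, 0.2],
    [0.3, 0.3, 0.4],
    [0.2, 0.4, 0.4],
]

# B: confusion matrix, B[i][k] = P(observation = k | hidden state = i).
B = [
    [0.60, 0.20, 0.15, 0.05],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.10, 0.35, 0.50],
]


def is_stochastic(rows, tol=1e-9):
    """Return True if every row is a probability distribution."""
    return all(abs(sum(r) - 1.0) < tol and min(r) >= 0 for r in rows)


def forward_probability(obs_seq):
    """P(observation sequence | model), computed with the forward algorithm."""
    idx = [OBSERVATIONS.index(o) for o in obs_seq]
    n = len(STATES)
    # Initialisation: alpha_1(i) = PI[i] * B[i][o_1]
    alpha = [PI[i] * B[i][idx[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * A[i][j]) * B[j][o_{t+1}]
    for k in idx[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][k]
                 for j in range(n)]
    # Termination: sum over the final hidden state.
    return sum(alpha)


if __name__ == "__main__":
    assert is_stochastic([PI]) and is_stochastic(A) and is_stochastic(B)
    print(forward_probability(["dry", "damp", "soggy"]))
```

Each row of A and B sums to 1 because, from any hidden state, the model must move to some next state and emit some observation.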

The following description of the confusion matrix is from: http://en.wikipedia.org/wiki/Confusion_matrix

In the field of artificial intelligence, a confusion matrix is a visualization tool typically used in supervised learning (in unsupervised learning it is typically called a matching matrix). Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. One benefit of a confusion matrix is that it is easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
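The row and column convention described here can be sketched in a few lines of Python. This is only an illustration: `confusion_matrix` is a hypothetical helper, and the six labels are made-up data.

```python
def confusion_matrix(actual, predicted, classes):
    """Count matrix where matrix[p][a] is the number of samples whose
    actual class is classes[a] and whose predicted class is classes[p]
    (rows = predicted, columns = actual, as in the quoted description)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


# Made-up labels: three actual cats, three actual dogs.
actual = ["cat", "cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "cat", "dog", "dog", "dog", "dog"]

m = confusion_matrix(actual, predicted, ["cat", "dog"])
print(m)  # [[2, 0], [1, 3]]
```

The off-diagonal entry `m[1][0] == 1` is the one actual cat that was labeled as a dog; a perfect classifier would leave every off-diagonal entry at zero.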

When a data set is unbalanced (when the number of samples in different classes varies greatly) the error rate of a classifier is not representative of the true performance of the classifier. This can easily be understood by an example: If there are 990 samples from class A and only 10 samples from class B, the classifier can easily be biased towards class A. If the classifier classifies all the samples as class A, the accuracy will be 99%. This is not a good indication of the classifier's true performance. The classifier has a 100% recognition rate for class A but a 0% recognition rate for class B.
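The arithmetic in this example can be reproduced directly. The sketch below uses the 990/10 split from the text; the function names `accuracy` and `recognition_rate` are assumptions chosen for illustration.

```python
# 990 samples of class A and 10 of class B, as in the example.
actual = ["A"] * 990 + ["B"] * 10
# A degenerate classifier that answers "A" for every sample.
predicted = ["A"] * len(actual)


def accuracy(actual, predicted):
    """Fraction of all samples that were labeled correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def recognition_rate(actual, predicted, cls):
    """Fraction of the samples of class `cls` that were labeled correctly."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a == cls]
    return sum(a == p for a, p in pairs) / len(pairs)


print(accuracy(actual, predicted))               # 0.99
print(recognition_rate(actual, predicted, "A"))  # 1.0
print(recognition_rate(actual, predicted, "B"))  # 0.0
```

The overall accuracy hides the failure on class B, which is why the per-class rates are the more informative figures here.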

In the example confusion matrix below, of the 8 actual cats, the system predicted that three were dogs, and of the eight dogs, it predicted that two were rabbits and three were cats. We can see from the matrix that the system in question has trouble distinguishing between cats and dogs, but can make the distinction between rabbits and other types of animals pretty well.

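In short:

A confusion matrix is a visualization tool commonly used in supervised learning; in unsupervised learning it is called a matching matrix. Each row holds the predicted results and each column the actual results, so the matrix shows whether two classes are being confused, that is, whether one class is being mislabeled as the other.

When the data set is unbalanced, the error rate does not reflect the classifier's performance. With 990 samples in class A and only 10 in class B, a classifier that assigns every sample to A is 99% accurate, yet it recognizes 100% of class A and 0% of class B.

In the example matrix below, of the 8 cats the system labels three as dogs, and of the 8 dogs it labels two as rabbits and three as cats. The system has trouble telling cats from dogs but easily separates rabbits from the other animals. (This example seems to have a problem: the Dog column of the table sums to 6, not 8, and "two rabbits and three cats" matches the Dog row rather than the Dog column. I did not fully understand it and will look for another example.)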

Example confusion matrix (rows: predicted class, columns: actual class)

                     Actual
                     Cat    Dog    Rabbit
Predicted   Cat      5      2      0
            Dog      3      3      2
            Rabbit   0      1      11
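The table can be checked mechanically. The following sketch copies the nine counts from the table above, with rows as predicted classes and columns as actual classes; the variable names are assumptions. It computes the row and column totals and the per-class recognition rate, so the counts quoted in the text can be compared against the table.

```python
CLASSES = ["Cat", "Dog", "Rabbit"]

# MATRIX[p][a]: samples of actual class a that were predicted as class p.
MATRIX = [
    [5, 2, 0],   # predicted Cat
    [3, 3, 2],   # predicted Dog
    [0, 1, 11],  # predicted Rabbit
]

n = len(CLASSES)
# Column sums: how many animals of each class actually exist.
actual_totals = [sum(MATRIX[p][a] for p in range(n)) for a in range(n)]
# Row sums: how many animals the system assigned to each class.
predicted_totals = [sum(row) for row in MATRIX]
# Recognition rate per class: diagonal entry divided by the column sum.
rates = [MATRIX[i][i] / actual_totals[i] for i in range(n)]

for name, total, guessed, rate in zip(CLASSES, actual_totals,
                                      predicted_totals, rates):
    print(f"{name}: actual={total}, predicted={guessed}, rate={rate:.3f}")
```

Read this way, the table contains 8 actual cats, 6 actual dogs, and 13 actual rabbits, while the system predicted 7 cats, 8 dogs, and 12 rabbits.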

posted on 2010-05-28 02:33 Zhu Qing
