Chewing Through PRML Together - 1 Introduction


@copyright Please credit the source when reposting: http://www.cnblogs.com/chxer/

 

This part mainly introduces Pattern Recognition and Machine Learning.

It opens straight away with recognising handwritten digits, MNIST... Seeing MNIST in a book's Introduction is honestly a bit depressing.

 

Each image is 28 × 28 pixels; in the usual way we flatten it into a vector of 784 values.

What pattern recognition asks of you, then, is: given such a vector, output a one-hot vector, or the digit directly.
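As a sketch in plain Python (the all-zero image and the helper name are mine, not from the book), the flattening and one-hot encoding might look like:

```python
# A hypothetical 28x28 grayscale image stored as a list of rows
# (all zeros here; a real MNIST image would hold pixel intensities).
image = [[0.0] * 28 for _ in range(28)]

# Flatten row by row into a 784-dimensional input vector x.
x = [pixel for row in image for pixel in row]

# One-hot encode the target digit: a length-10 vector with a single 1.
def one_hot(digit, num_classes=10):
    t = [0.0] * num_classes
    t[digit] = 1.0
    return t

print(len(x))       # 784
print(one_hot(7))   # 1.0 at index 7, 0.0 elsewhere
```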

 

The machine learning process can be understood as f(x): @para x is a new digit image input, @return a vector representing the answer. The author also notes that the answer vector is encoded in the same way as the target vectors.

 

@define Generalisation (fine, I'm more used to the spelling with an s)

“The ability to categorize correctly new examples that differ from those used for training is known as generalization.”

 

PS. "training phase" means the same as "learning phase".

The precise form of the function f(x) is defined during the training phase.

 

Our data is divided into training data & test data. In practice, though, the final application should use a three-way split: training data, validation data & test data, with the validation data used to improve the training process.

Of course, "set" is the better word here: the training set, validation set, and test set.

On these three sets: http://stackoverflow.com/questions/2976452/whats-is-the-difference-between-train-validation-and-test-set-in-neural-networ

 

@define

Training Set: this data set is used to adjust the weights on the neural network.

 

Validation Set: this data set is used to minimise overfitting. You're not adjusting the weights of the network with this data set, you're just verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set that has not been shown to the network before, or at least the network hasn't trained on it (i.e. validation data set). If the accuracy over the training data set increases, but the accuracy over the validation data set stays the same or decreases, then you're overfitting your neural network and you should stop training.

 

Testing Set: this data set is used only for testing the final solution in order to confirm the actual predictive power of the network.

 

As for the ratio, it seems fairly arbitrary. Some people online say 70% should be training cases, 10% test cases, and the remaining 20% validation cases. Fair enough.
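That rule of thumb can be sketched in plain Python (the function name and seed are my own choices):

```python
import random

def three_way_split(data, train=0.7, valid=0.2, seed=0):
    """Shuffle and split into training / validation / test sets.
    Defaults follow the 70% / 20% / 10% rule of thumb quoted above."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (data[:n_train],
            data[n_train:n_train + n_valid],
            data[n_train + n_valid:])

train_set, valid_set, test_set = three_way_split(range(100))
print(len(train_set), len(valid_set), len(test_set))  # 70 20 10
```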

 

For most practical applications, the original input variables are typically pre-processed to transform them into some new space of variables where, it is hoped, the pattern recognition problem will be easier to solve. This is what we call scaling: it helps the later pattern recognition stage distinguish the inputs.
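A minimal sketch of one common pre-processing step, standardisation to zero mean and unit variance (a single input variable; the guard for constant features is my addition):

```python
def standardise(values):
    """Scale one input variable to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0  # guard against a constant feature
    return [(v - mean) / std for v in values]

scaled = standardise([2.0, 4.0, 6.0, 8.0])
# mean 5, std sqrt(5); the scaled values are symmetric around 0
print(scaled)
```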

 

@define feature extraction <=> pre-processing stage.

 

Whether it's the training set or any other set, the pre-processing (feature extraction) steps must be identical.

 

The author gives an example: in real-time face recognition, a high resolution can make processing extremely slow (like video rendering), so a good feature extraction (pre-processing) stage is absolutely necessary... it lets us compute fast.

 

The author also mentions that the average value of the image intensity over a rectangular subregion can be evaluated extremely efficiently. All of this is still stressing the importance of feature extraction (the pre-processing stage); you can tell how important it is from how many times I've typed the term. This rectangular-subregion method comes up again later. @author Viola and Jones, 2004

 

The benefit of rectangular subregions, then, is controlling the dimensionality. But when pre-processing we must be careful not to discard useful information, or the accuracy of the whole model will suffer badly.
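The "evaluated extremely efficiently" part comes from the summed-area table (integral image) behind the Viola-Jones rectangle features. A minimal sketch (function names are mine, not the book's): after one pass over the image, the sum, and hence the average, over any rectangle costs just four lookups.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) summed-area table in one pass."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum over img[top:bottom][left:right] in O(1) via four corners."""
    return (ii[bottom][right] - ii[top][right]
            - ii[bottom][left] + ii[top][left])

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))  # 1+2+4+5 = 12
```

Dividing `rect_sum` by the rectangle's area gives exactly the average intensity the book mentions, at the same O(1) cost.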

 

Below is a classification of pattern recognition problems.

 

@define supervised learning problem : the training data comprises examples of the input vectors along with their corresponding target vectors. That is a supervised learning task: you're given both the questions and the answers. Its counterpart is the unsupervised learning task.

 

@define classification : the digit recognition example, in which the aim is to assign each input vector to one of a finite number of discrete categories. In other words, a classification problem sorts input vectors into a finite number of discrete classes.

 

@define regression : the desired output consists of one or more continuous variables. A regression problem seeks a well-determined answer under several interacting variables; for example, given a student's playfulness, the days left in the winter break, and the amount of homework remaining, estimate whether the holiday homework will get finished.
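The simplest instance of regression is fitting a straight line by least squares. A sketch (the homework-flavoured numbers below are made up to echo the example above):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

# days of break remaining vs. homework pages still unwritten (made up)
days_left = [1.0, 2.0, 3.0, 4.0]
pages_left = [10.0, 8.0, 6.0, 4.0]
a, b = fit_line(days_left, pages_left)
print(a, b)  # -2.0 12.0: two fewer pages per day that passes
```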

 

@define unsupervised learning problem : the training data consists of a set of input vectors x without any corresponding target value, and you should discover groups of similar examples within the data, where it is called clustering, or determine the distribution of data within the input space, known as density estimation, or project the data from a high-dimensional space down to two or three dimensions for the purpose of visualisation.

The definition of the unsupervised learning problem, counterpart to the supervised one, really is as long-winded as that run-on sentence. A few more concepts to pin down:

@define cluster : things bunching together; or, put elegantly, a "cluster".

@define density estimation : let's call it density estimation. It's quite interesting; for the precise meaning, see Wikipedia: https://en.wikipedia.org/wiki/Density_estimation

 

Too lazy to click? Here: density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. The unobservable density function is thought of as the density according to which a large population is distributed; the data are usually thought of as a random sample from that population.

 

The figure there (not reproduced here) is a density estimate of plasma glucose concentration in relation to diabetes; it illustrates the probability picture nicely and makes for a good visualisation.
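The crudest density estimator is a normalised histogram: count samples per bin, then divide by n times the bin width so the bars integrate to 1. A sketch (function name and toy samples are mine):

```python
def histogram_density(samples, bins, lo, hi):
    """Histogram estimate of a density on [lo, hi): per-bin count
    divided by n * bin_width, so the estimate integrates to 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        i = min(int((s - lo) / width), bins - 1)  # clamp s == hi edge
        counts[i] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]

density = histogram_density([0.1, 0.2, 0.25, 0.7], bins=2, lo=0.0, hi=1.0)
print(density)  # [1.5, 0.5]; total area = (1.5 + 0.5) * 0.5 = 1
```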

 

Back to the point: an unsupervised learning problem, plainly put, is the task of classifying and statistically organising data without being given explicit categories. (The more I describe it, the more it sounds like data mining.)
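Clustering itself can be sketched with one-dimensional k-means for k = 2 (a toy version with a crude initialisation; it assumes neither group ever empties out, which holds for this data):

```python
def kmeans_1d(points, iters=20):
    """Toy 1-D k-means, k = 2: alternate assigning each point to its
    nearest centre and moving each centre to its group's mean."""
    centres = [min(points), max(points)]  # crude but effective init here
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            i = 0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1
            groups[i].append(p)
        centres = [sum(g) / len(g) for g in groups]  # assumes no empty group
    return centres

centres = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0])
print([round(c, 6) for c in centres])  # [1.0, 9.5]
```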

 

Now that we have all these problems, what do we use to solve them?

At long last: Sutton and Barto's reinforcement learning arrived in 1998.

@define reinforcement learning : finding suitable actions to take in a given situation in order to maximise a reward.

If the term doesn't ring a bell, reinforcement learning goes roughly like this: DeepMind's AI that learns to play simple games is the most basic example. That Breakout-style paddle game we've all seen is one instance... The AI knows nothing at first; all it can do is press keys and try to make sense of the game. Gradually, the first time the paddle catches the ball, the score goes up, and our NN (Neural Network) receives a reward: "Oh! Press the keys like this and I actually score!" So it reinforces its NN parameters; of course, if the score drops, it punishes itself and adjusts the parameters the other way. Over long stretches of play, the AI gradually learns to catch the ball, learns the rebounds, learns how to chase a high score, and ends up a formidable player...

 

 

However, reinforcement learning doesn't seem to work nearly as well on Pac-Man: in that maze-like game almost anything earns points, so the only thing the AI really figures out is to run from ghosts, and otherwise it wanders around mindlessly...

Do go visit Google DeepMind! A superb tech company! http://deepmind.com/

Looking forward to their AlphaGo beating humans soon...

 

Things to note about reinforcement learning:

A general feature of reinforcement learning is the trade-off between exploration, in which the system tries out new kinds of actions to see how effective they are, and exploitation, in which the system makes use of actions that are known to yield a high reward. Too strong a focus on either exploration or exploitation will yield poor results. This resembles many heuristic algorithms (hill climbing? simulated annealing? genetic algorithms?): it all comes down to striking the right balance and how you take each step forward.
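The exploration/exploitation trade-off can be sketched with an epsilon-greedy agent on a two-armed bandit (a standard toy problem, not from the book; the constants are my own choices). With probability epsilon it explores a random arm; otherwise it exploits the arm whose estimated reward is currently highest.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy play on a Gaussian-reward multi-armed bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # running mean reward per arm
    counts = [0] * k        # pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: try any arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = true_means[arm] + rng.gauss(0.0, 0.1)  # noisy payoff
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy([0.2, 0.8])
# The agent ends up pulling arm 1 far more often, and its estimate
# for that arm approaches the true mean 0.8.
```

Setting epsilon to 0 (pure exploitation) or 1 (pure exploration) is exactly the "too strong a focus on either" failure mode the quote warns about.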

 

Another good example (actually the book's running curve-fitting setup, rather than reinforcement learning) is predicting a trajectory: given data generated from a sine function, the AI must predict new values (without any hints, it may never discover the sine function's periodicity).

 

 

Explanation of the figure: Plot of a training data set of N = 10 points, shown as blue circles, each comprising an observation of the input variable x along with the corresponding target variable t. The green curve shows the function sin(2πx) used to generate the data. Our goal is to predict the value of t for some new value of x, without knowledge of the green curve.
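That setup can be reproduced in a few lines (the noise level and seed are my assumptions; the book only says the targets are the sine values plus random noise):

```python
import math
import random

def make_dataset(n=10, noise_std=0.3, seed=0):
    """n inputs evenly spaced in [0, 1]; targets are sin(2*pi*x)
    plus Gaussian noise, mimicking the figure's training set."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ts = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_std)
          for x in xs]
    return xs, ts

xs, ts = make_dataset()
print(len(xs), len(ts))  # 10 10
```

A learner then sees only the (x, t) pairs, never the green curve itself, which is exactly what makes the prediction problem interesting.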

 

Of course, these seemingly unrelated problems may indeed call for different methods, but in practice the methods are interconnected. This chapter also provides a self-contained introduction to three important tools that will be used throughout the book, namely probability theory, decision theory, and information theory. Although these might sound like daunting topics, they are in fact straightforward, and a clear understanding of them is essential if machine learning techniques are to be used to best effect in practical applications. Machine learning is at its best when put to practical use.

posted @ 2016-02-16 21:36 AI_Believer