什么是交叉检验(K-fold cross-validation)

K层交叉检验就是把原始的数据随机分成K个部分。在这K个部分中,选择一个作为测试数据,剩下的K-1个作为训练数据。

交叉检验的过程实际上是把实验重复做K次,每次实验都从K个部分中选择一个不同的部分作为测试数据(保证K个部分的数据都分别做过测试数据),剩下的K-1个当作训练数据进行实验,最后把得到的K个实验结果平均。


In K-fold cross-validation, the original sample is randomly partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K − 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds then can be averaged (or otherwise combined) to produce a single estimation. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used.

posted @ 2010-09-19 12:05  Gavin.Liu  阅读(23400)  评论(3编辑  收藏  举报