# ML Primer Series (1): Training, Test, and Validation Sets

In his classic monograph Pattern Recognition and Neural Networks, B. D. Ripley (1996) defines these three terms as follows.

• Training set: A set of examples used for learning, which is to fit the parameters [i.e., weights] of the classifier.
• Validation set: A set of examples used to tune the parameters [i.e., architecture, not weights] of a classifier, for example to choose the number of hidden units in a neural network.
• Test set: A set of examples used only to assess the performance [generalization] of a fully specified classifier.
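
As a minimal sketch of how the three sets can be carved out of one dataset: the function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are all assumptions chosen for illustration, not something Ripley prescribes.

```python
import random


def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and cut them into train / validation / test lists."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

The three lists are disjoint by construction, so no example used for fitting or tuning can leak into the final assessment.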

## Why split the data

Ripley also addresses this question: Why separate test and validation sets?

1. The error rate estimate of the final model on validation data will be biased (smaller than the true error rate) since the validation set is used to select the final model.
2. After assessing the final model with the test set, YOU MUST NOT tune the model any further.
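
The first point can be illustrated with a small simulation. This is a sketch under stated assumptions: the labels are coin flips independent of the features, so every candidate has a true accuracy of 50%, and the "candidates" are hypothetical sine-threshold rules with random parameters. Picking the candidate that scores best on a small validation set still yields a flattering validation score, while a fresh test set gives an unbiased estimate. The bias is shown here in terms of accuracy, which is the mirror image of the error rate in the quote.

```python
import math
import random


def make_noise_data(n, rng):
    """Features are uniform in [0, 1); labels are coin flips independent of x."""
    return [(rng.random(), rng.randrange(2)) for _ in range(n)]


def make_candidate(rng):
    """Hypothetical candidate model: a fixed sine-threshold rule with random parameters."""
    a = rng.uniform(1.0, 50.0)
    b = rng.uniform(0.0, 2.0 * math.pi)
    return lambda x: 1 if math.sin(a * x + b) > 0 else 0


def accuracy(model, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in data) / len(data)


def selection_bias_demo(n_candidates=200, n_val=100, n_test=5000, seed=0):
    rng = random.Random(seed)
    val_data = make_noise_data(n_val, rng)
    test_data = make_noise_data(n_test, rng)
    candidates = [make_candidate(rng) for _ in range(n_candidates)]
    # Model selection: keep the candidate with the best validation accuracy.
    best = max(candidates, key=lambda m: accuracy(m, val_data))
    return accuracy(best, val_data), accuracy(best, test_data)


val_acc, test_acc = selection_bias_demo()
print(f"validation accuracy of selected model: {val_acc:.3f}")
print(f"test accuracy of selected model:       {test_acc:.3f}")
```

Because the validation set was used to choose the winner, its score overstates how good the winner is. The test set played no part in the choice, so its estimate stays close to chance.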

## Summary

    for each epoch
        for each training data instance
            propagate error through the network
        continue training
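
The pseudocode above does not show where the validation and test sets enter the loop. One common arrangement, sketched here as an assumption rather than as the source's method, checks validation accuracy after every epoch, stops once it reaches a threshold, and touches the test set only once at the very end. The toy 1-D task, the logistic model, the 0.95 threshold, and the learning rate are all illustrative choices.

```python
import math
import random


def make_data(n, rng):
    """Toy 1-D task (an assumption for illustration): the label is 1 when x > 0."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 1 if x > 0 else 0) for x in xs]


def predict_proba(w, b, x):
    """Logistic model: probability that the label is 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def accuracy(w, b, data):
    """Fraction of examples classified correctly at a 0.5 probability cutoff."""
    return sum((predict_proba(w, b, x) > 0.5) == (y == 1) for x, y in data) / len(data)


def train(train_data, val_data, threshold=0.95, max_epochs=100, lr=0.5):
    w, b = 0.0, 0.0
    history = []
    for epoch in range(max_epochs):
        # Training set: fit the weights, one instance at a time.
        for x, y in train_data:
            error = predict_proba(w, b, x) - y
            w -= lr * error * x
            b -= lr * error
        # Validation set: decide whether to keep training.
        val_acc = accuracy(w, b, val_data)
        history.append(val_acc)
        if val_acc >= threshold:
            break
    return w, b, history


rng = random.Random(0)
train_data = make_data(200, rng)
val_data = make_data(100, rng)
test_data = make_data(100, rng)

w, b, history = train(train_data, val_data)
# Test set: used exactly once, after the model is fully specified.
test_acc = accuracy(w, b, test_data)
print(f"epochs run: {len(history)}, "
      f"validation accuracy: {history[-1]:.2f}, test accuracy: {test_acc:.2f}")
```

Each set keeps the role from Ripley's definitions: the training data moves the weights, the validation data controls the stopping decision, and the test data only reports the final result.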