Cross-validation in sklearn

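When splitting a dataset, keeping the class proportions the same in the training set and the test set requires the stratify parameter of train_test_split. It accepts array-like data, usually the label array y, and the split is stratified by the values in that array.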
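As a quick check of what stratify does, here is a minimal sketch on made-up data. The toy arrays X and y (80 samples of class 0, 20 of class 1) are assumptions for illustration only and are not from the original post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 80 samples of class 0, 20 of class 1
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# stratify=y asks for the same class proportions in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fraction of class 1 in each part (labels are 0/1, so the mean is the fraction)
train_ratio = y_train.mean()
test_ratio = y_test.mean()
print(train_ratio, test_ratio)
```

With stratification the two printed fractions should match the class-1 share of the full data. Without stratify, the split is purely random and the fractions can drift apart, which matters most for small or imbalanced datasets.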

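For K-fold cross-validation, keeping the class proportions the same in the training fold and the validation fold on every split requires StratifiedKFold.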

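from sklearn.model_selection import train_test_split, StratifiedKFold

# X, y (features and class labels) and param_space (candidate hyperparameters) are defined elsewhere
# Hold out 20% as the test set, preserving the class proportions of y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

K = 2
kf = StratifiedKFold(n_splits=K)
for param in param_space:
    # split() yields positional index arrays for the training fold and the validation fold
    for train_ind, val_ind in kf.split(X_train, y_train):
        ...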
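The body of the inner loop is left out in the snippet above. One way it might be filled in is sketched below, purely as an illustration: the synthetic data from make_classification, the choice of LogisticRegression, and the candidate values of C in param_space are all assumptions, not what the original post used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical data: 200 samples with a roughly 80% / 20% class imbalance
X, y = make_classification(
    n_samples=200, n_features=5, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

param_space = [0.01, 0.1, 1.0, 10.0]  # assumed candidate values for C
K = 2
kf = StratifiedKFold(n_splits=K)

mean_scores = {}
for param in param_space:
    fold_scores = []
    for train_ind, val_ind in kf.split(X_train, y_train):
        # Fit on the training fold, score accuracy on the validation fold
        model = LogisticRegression(C=param, max_iter=1000)
        model.fit(X_train[train_ind], y_train[train_ind])
        fold_scores.append(model.score(X_train[val_ind], y_train[val_ind]))
    mean_scores[param] = np.mean(fold_scores)

# Keep the parameter with the highest mean validation accuracy
best_param = max(mean_scores, key=mean_scores.get)
print(best_param, mean_scores[best_param])
```

The test set is never touched inside the loop; it is kept for a final evaluation after best_param has been chosen. Note that split() returns positional indices, so with a pandas DataFrame the rows would be selected with .iloc rather than plain indexing.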

 

posted on 2017-11-08 21:47  vyouman