random_state in sklearn
sklearn.model_selection provides the train_test_split function, which splits sample data into a training set and a test set.
Its random_state parameter is documented as follows:
random_state:int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
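Put simply, with the input data held fixed:
If random_state is set to a fixed value, such as 42, you get the same split every time;
If random_state is set to a different value, you get a different split;
If random_state = None (the default), the split draws on NumPy's global random state, so each call generally produces a different split.
If you set random_state to a fixed value, anyone who reruns your code gets exactly the same split and can reproduce your process.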
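The documentation quoted above also allows a RandomState instance in place of an int. The sketch below is only an illustration of that option, not part of the original example: the 10-row array and the variable names are assumptions. It checks that an int seed and a freshly created np.random.RandomState with the same seed produce the same first split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data (an assumption, not the article's 5-row example).
X, y = np.arange(20).reshape(10, 2), list(range(10))

# An int seed builds a fresh generator on every call, so the split repeats.
a_train, _, _, _ = train_test_split(X, y, random_state=0)
b_train, _, _, _ = train_test_split(X, y, random_state=0)
same_with_int = np.array_equal(a_train, b_train)

# A RandomState instance seeded with the same int starts from the same state,
# so its first split matches the int-seeded split.
rng = np.random.RandomState(0)
c_train, _, _, _ = train_test_split(X, y, random_state=rng)
matches_int_seed = np.array_equal(a_train, c_train)

print(same_with_int, matches_int_seed)
```

Note that reusing the same rng object for a second call continues from its advanced internal state, so the second split will generally differ from the first. Pass an int when you want every call to repeat.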
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape(5, 2), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
With random_state = 42, every call returns the same split.
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)
>>> X_train
array([[2, 3],
       [8, 9],
       [4, 5]])
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> X_train
array([[2, 3],
       [6, 7],
       [8, 9]])
Setting random_state to different values gives different splits.
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
With random_state = 42 again, the result matches the earlier one.
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> X_train
array([[2, 3],
       [8, 9],
       [0, 1]])
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> X_train
array([[4, 5],
       [2, 3],
       [0, 1]])
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> X_train
array([[2, 3],
       [0, 1],
       [4, 5]])
When random_state is omitted, it takes its default value None, and each call generally returns a different result.
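According to the documentation quoted earlier, random_state=None means the generator is the RandomState instance used by np.random. One consequence, sketched below, is that reseeding NumPy's global generator before the call also makes the split repeatable. The helper split_with_global_seed and the 10-row array are hypothetical names and data introduced only for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data (an assumption, not the article's 5-row example).
X, y = np.arange(20).reshape(10, 2), list(range(10))

def split_with_global_seed(seed):
    # Hypothetical helper: reseed NumPy's global generator, then split
    # without passing random_state (that is, random_state=None).
    np.random.seed(seed)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    return X_train

first = split_with_global_seed(123)
second = split_with_global_seed(123)

# Both calls start from the same global state, so the splits agree.
print(np.array_equal(first, second))
```

Passing random_state explicitly is usually the cleaner choice, since the global generator can be advanced by any other code that draws random numbers before the split.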
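The point of random_state is to make every run split the data into the same training and test sets. Otherwise the same model is trained and evaluated on different data each time, and its measured performance changes from run to run.
After you split the data with sklearn and settle on a model and its initial parameters, you may find that each run reports a different accuracy. That makes tuning impractical, because you cannot tell whether a change in accuracy comes from the parameters or from the split. A common cause is a missing random_state in the split. Once it is set, runs become comparable and you can tune the parameters.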
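The effect on tuning can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's setup: the iris dataset, KNeighborsClassifier (chosen because it has no randomness of its own) and the helper accuracy are all picked only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset (an assumption for this sketch).
X, y = load_iris(return_X_y=True)

def accuracy(n_neighbors, random_state=None):
    # Hypothetical helper: split, fit, and score on the held-out test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state
    )
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

# Without random_state the split changes, so the scores may vary between calls.
unseeded = [accuracy(5) for _ in range(5)]

# With a fixed random_state the split repeats, so the score is stable.
seeded = [accuracy(5, random_state=42) for _ in range(5)]

print(unseeded)
print(seeded)
```

With the split fixed, a difference between accuracy(3, random_state=42) and accuracy(5, random_state=42) reflects the change in n_neighbors and not a change in the data.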