Splitting a dataset with sklearn's train_test_split

sklearn.model_selection.train_test_split randomly splits a dataset into a training set and a test set of a chosen proportion.

1. Usage:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=0.2, random_state=0)

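2. Parameters:

train_data: the feature set of the samples

train_target: the label set of the samples

test_size: the size of the test set. A float is the fraction of the dataset assigned to the test set; an integer is the absolute number of test samples

random_state: the seed for the random number generator. On the same dataset, the same seed always produces the same split, while different seeds generally produce different splits

X_train, y_train: together form the training set

X_test, y_test: together form the test set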
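
The two forms of test_size and the effect of random_state can be checked with a minimal sketch like the one below. The toy data (10 samples with one feature each) and the variable names X_test_frac, X_test_int and X_test_again are assumptions made only for this illustration.

```python
from sklearn.model_selection import train_test_split

# Toy data, assumed for illustration: 10 samples with one feature each
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# A float test_size is a fraction of the dataset: 0.2 * 10 = 2 test samples
_, X_test_frac, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)

# An int test_size is an absolute count: exactly 3 test samples
_, X_test_int, _, _ = train_test_split(X, y, test_size=3, random_state=0)

# Repeating the call with the same seed on the same data gives the same split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)

print(len(X_test_frac), len(X_test_int))  # 2 3
print(X_test_frac == X_test_again)        # True
```

Fixing random_state is mainly useful when an experiment has to be reproducible; leaving it unset gives a different split on each run.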

3. Example:

Generate a dataset of 100 samples and randomly split off 20% of it as the test set

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 3.6

# Old import path, removed in scikit-learn 0.20:
# from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

# Build 100 samples: 100 two-dimensional feature vectors with 100 matching labels
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# Randomly hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print("train:", len(X_train), "test:", len(X_test))

# Inspect the samples that ended up in the test set
for features, label in zip(X_test, y_test):
    print("".join(features), label)

'''
train: 80 test: 20
feature two  2
feature two  2
feature one  1
feature two  2
feature two  2
feature one  1
feature one  1
feature two  2
feature two  2
feature two  2
feature two  2
feature one  1
feature two  2
feature two  2
feature two  2
feature one  1
feature one  1
feature one  1
feature two  2
feature one  1
'''
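
In the output above the test set holds 12 samples with label 2 and 8 with label 1, even though the full dataset is balanced 50/50. The split is drawn at random, so class proportions are not guaranteed to carry over. A quick way to check this is to count the labels on each side, as in the sketch below. The use of collections.Counter and the names train_counts and test_counts are additions for illustration, not part of the original example.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Same data as in the example: 50 samples per label
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Count how many samples of each label landed in each subset
train_counts = Counter(y_train)
test_counts = Counter(y_test)

print("train label counts:", dict(train_counts))
print("test label counts:", dict(test_counts))
```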

 

posted @ 2018-01-24 16:38  cn_XuYang