Kaggle tutorial (8): Pipelines

Pipelines

A pipeline bundles data preprocessing and modeling into a single object, which keeps the code shorter and tidier.

Almost every scikit-learn object is either a transformer (it exposes a transform method) or a model (it exposes a predict method).
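To make the distinction concrete, here is a minimal sketch on made-up toy data. It assumes a recent scikit-learn where the imputer is called SimpleImputer; the array values and the n_estimators and random_state settings are chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer

# Toy data with one missing value (made up for illustration).
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

# A transformer learns from the data in fit() and rewrites it in transform().
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)  # the NaN becomes the mean of 1, 2 and 4

# A model learns from (X, y) in fit() and produces outputs with predict().
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_filled, y)
preds = model.predict(X_filled)
```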

Your pipeline must start with transformers and end with a model.
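The ordering rule can be seen by inspecting a pipeline's steps. The sketch below is only an illustration: StandardScaler is an extra transformer added here to show that several transformers may precede the model, and it is not part of the original example.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Transformers come first, in the order they should be applied;
# the model is always the last step.
my_pipeline = make_pipeline(
    SimpleImputer(),          # transformer: fills missing values
    StandardScaler(),         # transformer: added here for illustration only
    RandomForestRegressor(),  # model: final step
)

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in my_pipeline.steps]
print(step_names)
```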

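(At this point in the tutorial, columns of object dtype still have to be encoded as numbers before the dataset is passed into the pipeline.)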
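One possible way to do that encoding up front is one-hot encoding with pandas.get_dummies, sketched below. The DataFrame, its column names ("area", "city") and the target values are hypothetical, and one-hot encoding is an assumption about which encoding is meant.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: a numeric column with a gap and an object (string) column.
train_X = pd.DataFrame({
    "area": [50.0, None, 80.0, 120.0],
    "city": ["A", "B", "A", "C"],
})
train_y = [100.0, 150.0, 160.0, 300.0]

# One-hot encode the object column before the pipeline sees the data.
# dtype=float keeps every column numeric with a single dtype.
encoded_X = pd.get_dummies(train_X, columns=["city"], dtype=float)

my_pipeline = make_pipeline(
    SimpleImputer(),
    RandomForestRegressor(n_estimators=10, random_state=0),
)
my_pipeline.fit(encoded_X, train_y)
predictions = my_pipeline.predict(encoded_X)
```

When a separate test set is encoded the same way, its columns need to match the training columns, for example by reindexing the test frame to encoded_X.columns.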

Example:

With a pipeline:

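from sklearn.ensemble import RandomForestRegressor
# SimpleImputer replaces sklearn.preprocessing.Imputer, which was removed in scikit-learn 0.22
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Chain imputation and the model into a single estimator
my_pipeline = make_pipeline(SimpleImputer(), RandomForestRegressor())

# fit() imputes train_X, then trains the model on the imputed data
my_pipeline.fit(train_X, train_y)
# predict() applies the already-fitted imputer to test_X before predicting
predictions = my_pipeline.predict(test_X)

Without a pipeline:

my_imputer = SimpleImputer()
my_model = RandomForestRegressor()

# Fit the imputer on the training data only, then reuse it on the test data
imputed_train_X = my_imputer.fit_transform(train_X)
imputed_test_X = my_imputer.transform(test_X)
my_model.fit(imputed_train_X, train_y)
predictions = my_model.predict(imputed_test_X)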
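As a sanity check that the two versions do the same work, the sketch below runs both on synthetic data and compares the predictions. The data, the 10% missing rate, and the fixed random_state are assumptions made so the comparison is reproducible; they are not part of the original example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Synthetic data with some values knocked out (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=60)
X[rng.random(X.shape) < 0.1] = np.nan
train_X, test_X, train_y = X[:40], X[40:], y[:40]

# With a pipeline.
my_pipeline = make_pipeline(
    SimpleImputer(),
    RandomForestRegressor(n_estimators=10, random_state=0),
)
my_pipeline.fit(train_X, train_y)
pipeline_preds = my_pipeline.predict(test_X)

# Without a pipeline: the same steps written out by hand.
my_imputer = SimpleImputer()
my_model = RandomForestRegressor(n_estimators=10, random_state=0)
imputed_train_X = my_imputer.fit_transform(train_X)
imputed_test_X = my_imputer.transform(test_X)
my_model.fit(imputed_train_X, train_y)
manual_preds = my_model.predict(imputed_test_X)

# With the same random_state, both routes should agree.
same = np.allclose(pipeline_preds, manual_preds)
```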

posted on 2019-03-11 15:40 by wangzhonghan
