Kaggle Tutorial -- 1 -- Modeling

1 The step of capturing patterns from data is called fitting (or training) the model. The data used to fit the model is called the training data.

2 import pandas as pd

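pandas is conventionally abbreviated as pd. Its most important data structure is the DataFrame, which represents a table, much like an Excel sheet or a table in a database.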
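To make the table analogy concrete, here is a minimal sketch that builds a DataFrame by hand. The name toy_df and its values are made up for illustration; only the column names echo the housing data used later in these notes.

```python
import pandas as pd

# Hypothetical toy data: each dict key becomes a column, each list a column's values.
toy_df = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Price": [850000.0, 1035000.0, 1600000.0],
})

print(toy_df.shape)    # (number of rows, number of columns)
print(toy_df.columns)  # the column labels
print(toy_df.head())   # the first rows of the table
```

In practice the table usually comes from a file rather than a literal, for example through pd.read_csv.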

3 df.dropna(axis=0)

If any column in a row holds a missing value (NaN), the entire row is dropped.
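A small sketch of what dropna(axis=0) does, using a made-up table (toy_df and its values are assumptions for illustration). Note that dropna returns a new DataFrame and leaves the original untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Landsize value in the middle row.
toy_df = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Landsize": [120.0, np.nan, 400.0],
})

# axis=0 drops rows (not columns) that contain at least one NaN.
cleaned = toy_df.dropna(axis=0)

print(len(toy_df), len(cleaned))  # row counts before and after
print(cleaned)
```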

y = melbourne_data.Price
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]

By convention, the prediction target is called y. It can be pulled out of the original DataFrame with dot notation.

The feature data is called X.

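5 The columns fed into the model (and later used to make predictions) are called features. Sometimes every column is used as a feature, sometimes only a subset.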
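The target/feature split described above can be sketched on a made-up table. Here toy_df, feature_names, and the values are hypothetical; the point is only that dot notation yields a single column (a Series) while indexing with a list of names yields a sub-table (a DataFrame).

```python
import pandas as pd

# Hypothetical toy data standing in for the housing table.
toy_df = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Bathroom": [1, 2, 2],
    "Price": [850000.0, 1035000.0, 1600000.0],
})

# Prediction target: one column, selected with dot notation.
y = toy_df.Price  # equivalent to toy_df["Price"]

# Features: a chosen subset of columns, selected with a list of names.
feature_names = ["Rooms", "Bathroom"]
X = toy_df[feature_names]

print(type(y).__name__, type(X).__name__)
```

Dot notation only works when the column name is a valid Python identifier and does not clash with a DataFrame attribute; bracket indexing works for any column name.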

6 Modeling steps

The steps to building and using a model are:

Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.
Fit: Capture patterns from provided data. This is the heart of modeling.
Predict: Just what it sounds like
Evaluate: Determine how accurate the model's predictions are.

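1 Define which type of model you will use

2 Capture patterns from the training data (also called training or fitting); this is the heart of modeling

3 Make predictions on the test data

4 Evaluate how accurate the model's predictions are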

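Takeaway: a model is an object. First train (fit) it on the features and labels of the training data. Once it is trained, use the same object to predict labels for test data. Then evaluate those predicted labels, which requires knowing the true labels for the data being scored.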
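The four steps can be sketched end to end on a tiny made-up dataset. Everything about the data here is hypothetical. Splitting with train_test_split and scoring with mean_absolute_error are assumptions of this sketch rather than something these notes prescribe; any held-out data and any regression metric would illustrate the same flow.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: price is an exact multiple of the room count.
toy_df = pd.DataFrame({"Rooms": [1, 2, 3, 4, 5, 6, 7, 8]})
toy_df["Price"] = toy_df["Rooms"] * 100000.0

X = toy_df[["Rooms"]]
y = toy_df["Price"]

# Hold out part of the data so that evaluation uses rows the model never saw.
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=0)

model = DecisionTreeRegressor(random_state=1)   # 1. Define
model.fit(train_X, train_y)                     # 2. Fit
predictions = model.predict(test_X)             # 3. Predict
mae = mean_absolute_error(test_y, predictions)  # 4. Evaluate

print(mae)
```

The held-out rows do have true prices; they are simply hidden from the model during fitting so that they can be compared with the predictions afterwards.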

from sklearn.tree import DecisionTreeRegressor

# Define model. Specify a number for random_state to ensure same results each run
melbourne_model = DecisionTreeRegressor(random_state=1)

# Fit model
melbourne_model.fit(X, y)

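The random_state parameter fixes the randomness used while building the model, so that every run gives the same result. Any number will do.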
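A quick way to check the reproducibility claim is to fit two trees with the same random_state on the same data and compare their predictions on new points. The synthetic data below is made up for illustration, and this sketch only shows that equal seeds give equal results; it does not try to show how much results vary without a seed.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: 50 rows, 3 features, noisy target.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=50)
X_new = rng.normal(size=(10, 3))  # unseen points to predict on

# Two separate models, same seed, same training data.
model_a = DecisionTreeRegressor(random_state=1).fit(X, y)
model_b = DecisionTreeRegressor(random_state=1).fit(X, y)

same = np.array_equal(model_a.predict(X_new), model_b.predict(X_new))
print(same)
```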

posted on 2019-02-25 14:08 wangzhonghan
