Kaggle Tutorial -- 1 -- Modeling
1 Capturing patterns from data is called fitting or training the model. The data used to fit the model is called the training data.
2 import pandas as pd
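pandas is conventionally abbreviated as pd. Its most important data structure is the DataFrame, which represents a table, much like an Excel sheet or a table in a database.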
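As a small illustration of the DataFrame idea, the sketch below builds a table by hand and inspects it. The name `toy_data` and all values are made up for this example; they are not taken from the Melbourne dataset.

```python
import pandas as pd

# Hypothetical toy table: each dict key becomes a column, each list a column's values.
toy_data = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Price": [500000.0, 750000.0, 1200000.0],
})

print(toy_data.shape)       # (number of rows, number of columns)
print(toy_data.columns)     # the column labels
print(toy_data.describe())  # summary statistics for each numeric column
```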
3 df.dropna(axis=0)
If any column in a row holds a missing value (NaN), the entire row is dropped.
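A minimal sketch of this behavior on made-up data (the frame `toy_data` is hypothetical). Note that `dropna` returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 1 and 2 each contain one missing value.
toy_data = pd.DataFrame({
    "Rooms": [2, 3, np.nan, 4],
    "Price": [500000.0, np.nan, 900000.0, 1200000.0],
})

# axis=0 means "drop rows"; a row goes if any of its columns is NaN.
cleaned = toy_data.dropna(axis=0)

print(len(toy_data), len(cleaned))
print(cleaned)
```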
4
y = melbourne_data.Price
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]
By convention, the prediction target is called y; it can be pulled out of the original DataFrame with dot notation.
The feature data is called X.
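To make the snippet above runnable without the real CSV file, here is a sketch that uses a tiny stand-in for `melbourne_data`. The values are invented; only the column names come from the notes. It also shows that dot notation and bracket notation select the same column.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; the values are made up.
melbourne_data = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Bathroom": [1.0, 2.0, 2.0],
    "Landsize": [150.0, 300.0, 600.0],
    "Lattitude": [-37.80, -37.81, -37.79],
    "Longtitude": [144.99, 145.00, 144.95],
    "Price": [500000.0, 750000.0, 1200000.0],
})

# Target: a single column, returned as a pandas Series.
y = melbourne_data.Price
y_alt = melbourne_data["Price"]  # bracket notation selects the same column

# Features: a list of column names selects a sub-table (a DataFrame).
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]

print(type(y).__name__, type(X).__name__)
print(X.shape)
```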
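5 The columns fed into the model (and later used to make predictions) are called features. Sometimes all columns are used as features, sometimes only a subset.
6 Modeling steps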
The steps to building and using a model are:
- Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.
- Fit: Capture patterns from provided data. This is the heart of modeling.
- Predict: Just what it sounds like
- Evaluate: Determine how accurate the model's predictions are.
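1 Define which kind of model to use
2 Capture patterns from the training data (also called training or fitting); this is the heart of modeling
3 Predict on the test data
4 Evaluate how accurate the model's predictions are
Takeaway: a model is an object. First train (fit) it on the features and labels of the training data. Once trained, use the same object to predict on test data that has no labels, then evaluate the predicted labels.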
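The four steps can be sketched end to end as follows. Everything here is an assumption made for illustration: the data is invented, the train/test split is a simple slice, and mean absolute error is just one possible way to evaluate, since the notes do not name a metric.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data; the values are made up.
data = pd.DataFrame({
    "Rooms": [1, 2, 3, 4, 2, 3],
    "Landsize": [100, 200, 300, 400, 250, 350],
    "Price": [300.0, 500.0, 700.0, 900.0, 550.0, 800.0],
})
features = ["Rooms", "Landsize"]

# First four rows for training, last two held back as test data.
train = data.iloc[:4]
test = data.iloc[4:]

# Define: choose the model type and its parameters.
model = DecisionTreeRegressor(random_state=1)

# Fit: learn patterns from the training features and labels.
model.fit(train[features], train["Price"])

# Predict: the same object produces labels for rows it has not seen.
predictions = model.predict(test[features])

# Evaluate: compare predictions with the true labels (MAE chosen as an example).
mae = mean_absolute_error(test["Price"], predictions)
print(predictions, mae)
```

In practice the test labels are only used in the last step; the model never sees them during fitting.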
7
from sklearn.tree import DecisionTreeRegressor
# Define model. Specify a number for random_state to ensure same results each run
melbourne_model = DecisionTreeRegressor(random_state=1)
# Fit model
melbourne_model.fit(X, y)
The random_state parameter fixes the random seed so that the model gives the same result on every run; any number will do.
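A quick way to see the effect is to fit two models with the same seed and compare their predictions. The sketch below uses randomly generated data (an assumption for illustration, not the Melbourne data) and only demonstrates that equal seeds give equal results.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical random data: 50 training rows with 3 features, plus 10 new rows.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = rng.random(50)
X_new = rng.random((10, 3))

# Two separate models, same data, same random_state.
model_a = DecisionTreeRegressor(random_state=1).fit(X, y)
model_b = DecisionTreeRegressor(random_state=1).fit(X, y)

# With the seed fixed, both models make identical predictions.
same = np.array_equal(model_a.predict(X_new), model_b.predict(X_new))
print(same)
```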
posted on 2019-02-25 14:08 wangzhonghan