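Case study: predicting Titanic passenger survival

Workflow analysis:

Feature values and target values

1) Get the data

2) Process the data

Handle missing values

Feature values -> dictionary type

3) Prepare the feature values and target values

4) Split the dataset

5) Feature engineering: dictionary feature extraction

6) Decision tree estimator workflow

7) Model evaluation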

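import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def decisioncls():
    """
    Predict Titanic passenger survival with a decision tree.
    :return: None
    """
    # 1. Get the data
    titan = pd.read_csv("titanic.csv")

    # 2. Process the data
    # x: feature values, y: target values
    # .copy() so the fill below changes x itself rather than a slice of titan
    x = titan[['pclass', 'age', 'sex']].copy()

    y = titan['survived']

    # print(x, y)
    # Missing values must be handled: fill missing ages with the mean age
    x['age'] = x['age'].fillna(x['age'].mean())

    # Convert x to dictionary data, one dict per row
    # [{"pclass": "1st", "age": 29.00, "sex": "female"}, {}]
    x = x.to_dict(orient="records")

    # Split into a training set and a test set
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

    # Dictionary feature extraction for the categorical features.
    # Named "transfer" because "dict" would shadow the built-in type.
    transfer = DictVectorizer(sparse=False)

    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # print(transfer.get_feature_names_out())
    # print(x_train)

    # Build the decision tree and predict
    dc = DecisionTreeClassifier(max_depth=5)

    dc.fit(x_train, y_train)

    print("Prediction accuracy:", dc.score(x_test, y_test))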
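Because `train_test_split` shuffles the rows at random, the accuracy printed by a single run changes from one run to the next. One way to get a steadier figure for step 7, sketched here under stated assumptions, is to fix `random_state` and also report cross-validated accuracy. The data below is synthetic (generated with NumPy, not the Titanic file), and the feature names `f0`, `f1`, `f2` are placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (an assumption for this sketch, not the Titanic set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Fixing random_state makes the split, and so the score, repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=22
)

clf = DecisionTreeClassifier(max_depth=5, random_state=22)
clf.fit(X_train, y_train)
holdout_accuracy = clf.score(X_test, y_test)

# 5-fold cross-validation averages over several splits instead of one.
cv_scores = cross_val_score(
    DecisionTreeClassifier(max_depth=5, random_state=22), X, y, cv=5
)

# A text dump of the learned rules helps when checking what the tree uses.
rules = export_text(clf, feature_names=["f0", "f1", "f2"])

print("Hold-out accuracy:", holdout_accuracy)
print("Mean CV accuracy:", cv_scores.mean())
print(rules)
```

With the real data, `X` and `y` would be the vectorized features and the `survived` column from the function above.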

posted @ 2022-06-20 22:05 安全地带IV
