Basic XGBoost usage

https://www.jianshu.com/p/e119f00bd93f

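The code below uses the xgboost and sklearn packages. This article does not cover how to download or install them; search for the instructions yourself. It does not explain the parameters either, but it points to the documentation that describes each one. The code also performs no hyperparameter search.
Parameter explanation reference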

# coding: utf-8


import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn import preprocessing


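def train(mall_id, X, shop_ids, TEST, row_ids):
    """
    mall_id: mall ID (e.g. m_23232); not used in this snippet
    X: training feature vectors
    shop_ids: shop labels (e.g. s_223234)
    TEST: test feature vectors; not used in this snippet
    row_ids: test set row IDs; not used in this snippet
    """

    # Encode the raw labels as integer class indices for training; shop_ids is a list
    lbl = preprocessing.LabelEncoder()
    lbl.fit(shop_ids)
    y = lbl.transform(shop_ids)
    class_num = y.max() + 1  # number of classes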
    
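    # Split into training and validation sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    # 'missing' is a DMatrix argument, not a booster parameter: treat -999 as missing here
    xg_train = xgb.DMatrix(X_train, label=y_train, missing=-999)
    xg_test = xgb.DMatrix(X_test, label=y_test, missing=-999)
    watchlist = [(xg_train, 'train'), (xg_test, 'test')]

    # Define the parameters
    params = {
        'objective': 'multi:softmax',
        'eta': 0.1,
        'max_depth': 9,
        'eval_metric': 'merror',
        'seed': 0,
        'num_class': class_num,  # XGBoost's parameter name for the number of classes
        'verbosity': 0,  # replaces the deprecated 'silent': 1
    }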
    
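    # Train for up to 60 rounds; stop early if the error on the last
    # watchlist entry ('test') has not improved for 15 rounds
    bst = xgb.train(params, xg_train, num_boost_round=60, evals=watchlist, early_stopping_rounds=15)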
    
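    # Predict per-class probabilities (requires 'objective': 'multi:softprob')
    # pred_prob = bst.predict(xg_test).reshape(X_test.shape[0], class_num)

    # Predict labels
    pred = bst.predict(xg_test)

    # Print the accuracy
    acc = (y_test == pred).mean()
    print('accuracy', acc)

    # Convert the encoded labels back to the original labels
    # (inverse_transform expects an array of integers, not a single scalar)
    pred = lbl.inverse_transform(pred.astype(int))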
 


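Author: 衣介书生
Link: https://www.jianshu.com/p/e119f00bd93f
Source: Jianshu
Copyright on Jianshu belongs to the author. For any form of reproduction, contact the author for permission and cite the source.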
posted @ 2019-05-29 11:54  Django's blog