XGBoost Series (Part 1)
Installation
Training with multiple GPUs is supported only on Linux. Only the Python package is covered here.
- Installing the binary package
pip install xgboost -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
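The Douban PyPI mirror is used here because it is faster and more stable for installing and updating. The table below shows GPU support by platform.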
| Platform | GPU | Multi-Node-Multi-GPU |
|---|---|---|
| Linux x86_64 | ✔ | ✔ |
| Linux aarch64 | ✘ | ✘ |
| MacOS | ✘ | ✘ |
| Windows | ✔ | ✘ |
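After installing, it can help to confirm that the package is visible in the current Python environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of XGBoost.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("xgboost")
    if version is None:
        print("xgboost is not installed in this environment")
    else:
        print(f"xgboost {version} is installed")
```

This only checks package metadata; it does not tell you whether the installed build has GPU support.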
- Building from source
Step 1: get the source code
git clone --recursive https://github.com/dmlc/xgboost
Step 2: build the shared library
- On Linux and other UNIX-like systems, the target library is libxgboost.so
- On MacOS, the target library is libxgboost.dylib
- On Windows the target library is xgboost.dll
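The per-platform file names above can be expressed as a small lookup, which is handy when a script needs to check that the build produced the expected file. This is only a sketch; `target_library_name` is a hypothetical helper, not an XGBoost function.

```python
import platform

# File names taken from the list above, keyed by platform.system() values.
TARGET_LIBRARY = {
    "Linux": "libxgboost.so",
    "Darwin": "libxgboost.dylib",  # platform.system() reports MacOS as "Darwin"
    "Windows": "xgboost.dll",
}


def target_library_name(system: str = "") -> str:
    """Return the shared-library file name expected on the given system."""
    system = system or platform.system()
    # Other UNIX-like systems fall back to the .so name, as stated above.
    return TARGET_LIBRARY.get(system, "libxgboost.so")


if __name__ == "__main__":
    print(target_library_name())
```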
Build requirements
- A recent C++ compiler supporting C++11 (g++-5.0 or higher)
- CMake 3.13 or higher.
cd xgboost
mkdir build
cd build
cmake .. -DUSE_CUDA=ON # requires the CUDA toolkit; if GPU support is not needed, run plain: cmake ..
make -j4
How to install CMake: first install make, gcc, and g++.
# Then remove the locally installed cmake
apt-get autoremove cmake
# Download the installer
wget https://github.com/Kitware/CMake/releases/download/v3.21.0-rc2/cmake-3.21.0-rc2-linux-x86_64.sh
# Make it executable
chmod +x cmake-3.21.0-rc2-linux-x86_64.sh
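# Run the installer; it extracts a folder containing the prebuilt cmake tools
./cmake-3.21.0-rc2-linux-x86_64.sh
# Move the folder to /opt and create symlinks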
sudo mv cmake-3.21.0-rc2-linux-x86_64 /opt/cmake-3.21.0
ln -sf /opt/cmake-3.21.0/bin/* /usr/bin/
# Check the installed version
cmake --version
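The version check can also be scripted against the CMake 3.13 minimum listed in the build requirements. A sketch, assuming the first line of `cmake --version` looks like `cmake version 3.21.0-rc2`; both function names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 13)  # minimum stated in the build requirements


def parse_cmake_version(output: str):
    """Extract (major, minor) from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cmake_is_recent_enough():
    """Return True/False, or None when cmake is missing or unparsable."""
    if shutil.which("cmake") is None:
        return None
    result = subprocess.run(
        ["cmake", "--version"], capture_output=True, text=True
    )
    version = parse_cmake_version(result.stdout)
    return None if version is None else version >= MIN_CMAKE


if __name__ == "__main__":
    print(cmake_is_recent_enough())
```

Comparing `(major, minor)` tuples avoids the pitfall of comparing version strings, where "3.9" would sort after "3.13".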
Step 3: build the XGBoost Python package
The Python package lives in python-package/
Method 1: use the default toolchain
cd xgboost/python-package
python setup.py install # Install the XGBoost to your current Python environment.
python setup.py build # Build the Python package.
python setup.py build_ext # Build only the C++ core.
python setup.py sdist # Create a source distribution
python setup.py bdist # Create a binary distribution
python setup.py bdist_wheel # Create a binary distribution with wheel format
# --use-cuda enables GPU acceleration; --use-nccl enables distributed GPU training
python setup.py install --use-cuda --use-nccl
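See setup.py for the complete list of available options.
The other methods are not covered here, since I have not tried them myself.
How to think about the installation: first the shared library (the .so file) is compiled, then the Python package is installed. Python acts as the glue language and provides the interface for calling into the shared library.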
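The "glue" idea can be illustrated with the standard `ctypes` module, which is how Python code typically calls into a compiled shared library. The sketch below does not touch XGBoost at all: it uses the C math library as a stand-in, and `c_sqrt` is a hypothetical helper written for this illustration.

```python
import ctypes
import ctypes.util


def c_sqrt(value: float):
    """Call sqrt() from the C math library through ctypes.

    Returns None when the library cannot be located or loaded here.
    """
    path = ctypes.util.find_library("m")  # e.g. "libm.so.6" on Linux
    if path is None:
        return None
    try:
        libm = ctypes.CDLL(path)
    except OSError:
        return None
    # Declare the C signature so ctypes converts arguments correctly.
    libm.sqrt.argtypes = [ctypes.c_double]
    libm.sqrt.restype = ctypes.c_double
    return libm.sqrt(value)


if __name__ == "__main__":
    print(c_sqrt(9.0))
```

The pattern is the same in spirit for any compiled library: load the file, declare argument and return types, then call the function from Python.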
Usage example
Other reference links:
- https://xgboost.readthedocs.io/en/latest/tutorials/input_format.html
- https://xgboost.readthedocs.io/en/latest/tutorials/index.html
- https://github.com/dmlc/xgboost/tree/master/demo
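The agaricus demo files used in the example that follows are plain text in LibSVM format: each line is a label followed by sparse `index:value` pairs. XGBoost parses these files internally, so the sketch below is for understanding the format only; `parse_libsvm_line` is a hypothetical helper and the sample line is made up.

```python
def parse_libsvm_line(line: str):
    """Parse 'label idx:val idx:val ...' into (label, {idx: val})."""
    fields = line.split()
    label = float(fields[0])
    features = {}
    for item in fields[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features


# Made-up line in the same shape as the demo data.
label, features = parse_libsvm_line("1 3:1 10:1 11:1 21:1")
print(label, features)
```

Features that are absent from a line are simply not stored, which is what makes the format compact for sparse data.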
import xgboost as xgb
# read in data; recent XGBoost releases may require appending '?format=libsvm' to these paths
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
# specify parameters via map
param = {'max_depth':2, 'eta':1, 'objective':'binary:logistic' }
num_round = 2
evallist = [(dtest, 'eval'), (dtrain, 'train')]
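# Study the next line carefully: it trains for num_round rounds and reports metrics on each set in evallist
bst = xgb.train(param, dtrain, num_round, evals=evallist)
# Save the model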
bst.save_model('0001.model')
# Load the model
bst = xgb.Booster({'nthread': 4}) # init model
bst.load_model('0001.model') # load the saved model
# make prediction
preds = bst.predict(dtest)
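With the `binary:logistic` objective, `predict` returns one probability per row rather than a class label. Below is a sketch of turning such probabilities into 0/1 labels and an error rate, assuming a 0.5 threshold. The sample numbers are made up and are not real model output; both helper names are hypothetical.

```python
def to_labels(probabilities, threshold=0.5):
    """Map each predicted probability to a 0/1 class label."""
    return [1 if p > threshold else 0 for p in probabilities]


def error_rate(probabilities, true_labels, threshold=0.5):
    """Fraction of rows whose thresholded prediction differs from the label."""
    predicted = to_labels(probabilities, threshold)
    wrong = sum(int(p != int(t)) for p, t in zip(predicted, true_labels))
    return wrong / len(true_labels)


# Made-up values standing in for `preds` and `dtest.get_label()`.
sample_preds = [0.92, 0.08, 0.65, 0.30]
sample_labels = [1, 0, 0, 0]
print(error_rate(sample_preds, sample_labels))
```

In a real run, the same functions could be applied to `preds` and the labels stored in `dtest`.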
Python usage tutorial
Ways to set parameters
- As a dictionary
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
param['nthread'] = 4
param['eval_metric'] = 'auc'
- Multiple evaluation metrics
param['eval_metric'] = ['auc', 'ams@0']
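# alternatively, as a list of pairs (in Python 3, dict.items() must be converted to a list first):
# plst = list(param.items())
# e.g. [('max_depth', 2), ('eta', 1), ('objective', 'binary:logistic'), ...]
# plst += [('eval_metric', 'ams@0')]
- Set the validation sets as a list of pairs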
evallist = [(dtest, 'eval'), (dtrain, 'train')]
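The two ways of carrying several evaluation metrics can be compared in plain Python, without XGBoost. A dict holds one value per key, so a second metric needs either a list value or a list of (key, value) pairs with a repeated key. The variable names below are illustrative; whether a given XGBoost version accepts the list-of-pairs form in `xgb.train` is worth checking against its documentation.

```python
params = {"max_depth": 2, "eta": 1, "objective": "binary:logistic"}
params["eval_metric"] = "auc"

# Option 1: keep a dict and store both metrics as a list value.
as_list_value = dict(params, eval_metric=["auc", "ams@0"])

# Option 2: a list of pairs, where a repeated key is allowed.
as_pairs = list(params.items())         # dict.items() is a view in Python 3
as_pairs += [("eval_metric", "ams@0")]  # appending works on a list, not a view

metrics = [value for key, value in as_pairs if key == "eval_metric"]
print(metrics)
```

Calling `+=` directly on `params.items()` raises a `TypeError` in Python 3, which is why the conversion to `list` matters.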
Most complete introduction: https://xgboost.readthedocs.io/en/latest/python/python_intro.html#setting-parameters
API doc: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.core
A little progress every day!