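GPUStack Deployment
I. Environment
Docker must be installed in advance on both the server node and the worker node. This deployment uses docker-compose, so docker-compose must be installed as well. The official documentation also provides commands for deploying directly, which can be used instead if preferred. The server node runs only the GPUStack service; the Huawei Ascend driver, runtime, and firmware are all installed and configured on the worker node.
GPUStack installation requirements: https://docs.gpustack.ai/latest/installation/requirements/
gpustack-server
Operating system: ctyunos 23.01 x86_64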

gpustack-worker (GNU)
Operating system: ctyunos 23.01 aarch64

| Software and version | Download link |
| --- | --- |
| GPUStack-v2.0 | docker pull docker.1ms.run/gpustack/gpustack:v2.0 |
| Docker-26.1.3 | https://download.docker.com/linux/static/stable/ |
| Ascend-docker-runtime-7.3.0 | https://gitcode.com/Ascend/mind-cluster/releases |
| Ascend-hdk-910b-npu-driver-25.5.0 | https://www.hiascend.com/hardware/firmware-drivers/community?product=1&model=30&cann=8.5.0&driver=Ascend+HDK+25.5.0 |
| Ascend-hdk-910b-npu-firmware-7.8.0.5 | https://www.hiascend.com/hardware/firmware-drivers/community?product=1&model=30&cann=8.5.0&driver=Ascend+HDK+25.5.0 |
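II. Local yum Repository (Intranet Environment)
Perform the same steps on both the server node and the worker node. The commands below use the aarch64 ISO; on the x86_64 server node, use the ISO that matches that architecture.

    # Mount the ISO image
    mount -o loop /data/software/ctyunos-23.01-230117-aarch64-dvd.iso /mnt

    # Create the repo file with the following content
    vim /etc/yum.repos.d/lpb.repo
    [lpb]
    name=lpb
    baseurl=file:///mnt
    enabled=1
    gpgcheck=0

    # Mount automatically at boot: add the following line to /etc/fstab
    vi /etc/fstab
    /data/software/ctyunos-23.01-230117-aarch64-dvd.iso /mnt iso9660 defaults,loop 0 0

    # Rebuild the yum cache
    yum makecache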
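The repo file above is created interactively in an editor. For scripted setups, a minimal sketch like the following could write the same content non-interactively. The function name `write_local_repo` is hypothetical and not part of the original procedure.

```shell
#!/bin/bash
# Write a local yum repo definition without opening an editor.
# Arguments: $1 = repo file to create, $2 = directory where the ISO is mounted.
# (write_local_repo is a hypothetical helper name used for illustration.)
write_local_repo() {
    local repo_file="$1"
    local mount_point="$2"
    cat > "$repo_file" <<EOF
[lpb]
name=lpb
baseurl=file://${mount_point}
enabled=1
gpgcheck=0
EOF
}
```

On a node it would be called as `write_local_repo /etc/yum.repos.d/lpb.repo /mnt`, followed by `yum makecache`.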
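III. NFS Configuration
Installation and configuration on the server node

    # Install nfs-utils and rpcbind
    yum install -y nfs-utils rpcbind

    # Enable and start the service (on some distributions the unit is named nfs-server)
    systemctl enable nfs
    systemctl start nfs

    # Create the shared directory
    mkdir -pv /data/nfs

    # Configure the export (note: ">" overwrites any existing /etc/exports)
    echo "/data/nfs 192.9.0.0/16(rw,sync,no_root_squash,no_all_squash)" > /etc/exports

    # Restart the NFS service
    systemctl restart nfs

    # Verify
    showmount -e 127.0.0.1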
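Because the `echo ... > /etc/exports` step replaces the whole file, rerunning it on a host that already has other exports would drop them. A minimal sketch of an idempotent alternative, assuming a hypothetical helper named `ensure_export`, might look like this:

```shell
#!/bin/bash
# Append an export line to an exports file only if it is not already present.
# Arguments: $1 = exports file, $2 = full export line.
# (ensure_export is a hypothetical helper name used for illustration.)
ensure_export() {
    local exports_file="$1"
    local line="$2"
    touch "$exports_file"
    # -x: match the whole line, -F: treat the line as a fixed string
    if ! grep -qxF -- "$line" "$exports_file"; then
        printf '%s\n' "$line" >> "$exports_file"
    fi
}
```

For example: `ensure_export /etc/exports "/data/nfs 192.9.0.0/16(rw,sync,no_root_squash,no_all_squash)"`, then `exportfs -ra` to reload the export table.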
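Installation and configuration on the worker node

    # Install nfs-utils
    yum install -y nfs-utils

    # Enable and start the service
    systemctl enable nfs
    systemctl start nfs

    # Create the mount point
    mkdir -pv /data/nfs

    # Mount the share exported by the server node
    mount -t nfs 192.9.xxx.xxx:/data/nfs /data/nfs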
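The `mount` command above does not survive a reboot. One possible way to make it persistent is an /etc/fstab entry; the sketch below only prints such a line so it can be reviewed before being added. The helper name `nfs_fstab_entry` and the `defaults,_netdev` option set are assumptions, not taken from the original setup.

```shell
#!/bin/bash
# Print an /etc/fstab line for an NFS mount.
# Arguments: $1 = NFS server address, $2 = exported path, $3 = local mount point.
# (nfs_fstab_entry is a hypothetical helper; _netdev delays mounting until the network is up.)
nfs_fstab_entry() {
    printf '%s:%s %s nfs defaults,_netdev 0 0\n' "$1" "$2" "$3"
}
```

Calling `nfs_fstab_entry <server-ip> /data/nfs /data/nfs` with the server node's real address produces the line to append to /etc/fstab on the worker.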
IV. Installing GPUStack on the Server Node
docker-compose.yaml

    services:
      gpustack:
        image: gpustack/gpustack:v2.0
        pull_policy: if_not_present
        container_name: gpustack-server
        restart: unless-stopped
        ports:
          - "9090:80"
          - "10161:10161"
        volumes:
          - /data/nfs/gpustack/models:/usr/local/models
          - /data/gpustack/data:/var/lib/gpustack
        environment:
          - TZ=Asia/Shanghai
          - GPUSTACK_LOG_LEVEL=info

Start: docker-compose up -d
View the startup log: docker-compose logs -f
Once the service has fully started, open http://${ip}:9090 in a browser. The initial password for the admin account is stored in the file initial_admin_password in the data directory.
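Since the compose file maps the container's /var/lib/gpustack to /data/gpustack/data on the host, the password file should be readable from the host side. A small sketch, assuming that mapping and using a hypothetical helper name `show_initial_password`:

```shell
#!/bin/bash
# Print the initial admin password from a GPUStack data directory.
# Argument: $1 = host path of the data directory
#           (defaults to the volume path used in the compose file; adjust if yours differs).
# (show_initial_password is a hypothetical helper name used for illustration.)
show_initial_password() {
    local data_dir="${1:-/data/gpustack/data}"
    local pw_file="${data_dir}/initial_admin_password"
    if [ -r "$pw_file" ]; then
        cat "$pw_file"
    else
        echo "password file not found: $pw_file" >&2
        return 1
    fi
}
```

If the file is not there yet, the server has probably not finished starting; checking `docker-compose logs -f` first is usually enough.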

V. Installing GPUStack on the Worker Node
Create the user and install the required dependencies

    groupadd HwHiAiUser
    useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
    yum install -y make dkms gcc kernel-headers-$(uname -r) kernel-devel-$(uname -r)
1. Install the Huawei Ascend docker-runtime
Official installation guide: https://www.hiascend.com/document/detail/zh/mindcluster/72rc1/clustersched/dlug/dlug_installation_017.html

    ./Ascend-docker-runtime_7.3.0_linux-aarch64.run --install
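After the installer finishes, it can be worth confirming that Docker actually knows a runtime called `ascend`, since the worker compose file later in this post relies on `runtime: ascend`. The sketch below assumes the installer registers that runtime in /etc/docker/daemon.json; both that assumption and the helper name `has_ascend_runtime` are illustrative and should be checked against the official guide.

```shell
#!/bin/bash
# Return success if a Docker daemon.json mentions a runtime named "ascend".
# Argument: $1 = path to daemon.json (default: /etc/docker/daemon.json).
# Assumption: the Ascend docker-runtime installer registers the runtime in this file.
# (has_ascend_runtime is a hypothetical helper name used for illustration.)
has_ascend_runtime() {
    local daemon_json="${1:-/etc/docker/daemon.json}"
    [ -r "$daemon_json" ] && grep -q '"ascend"' "$daemon_json"
}
```

Usage could be as simple as `has_ascend_runtime || echo "ascend runtime not registered"`. After restarting Docker, `docker info | grep -i runtimes` gives a second view of the same thing.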

2. Install the Huawei Ascend driver and firmware
Official installation guide: https://www.hiascend.com/document/detail/zh/canncommercial/850/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=CTyunOS

    ./Ascend-hdk-910b-npu-driver_25.5.0_linux-aarch64.run --full --install-for-all

    ./Ascend-hdk-910b-npu-firmware_7.8.0.5.216.run --full

Reboot when prompted so that the driver and firmware take effect

    reboot

View the NPU card information: npu-smi info

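3. Install GPUStack
Open the server address in a browser, add a node to obtain the node registration command, and adapt the gpustack-worker docker-compose.yaml from that command.

An error was reported during deployment.

After troubleshooting, the cause turned out to be a container runtime problem; the official guidance is to deploy docker-runtime.

In my case both were reported at the same time, so I simply installed both and also started containerd, after which the deployment succeeded.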

containerd.service configuration

    [Unit]
    Description=containerd container runtime
    Documentation=https://containerd.io
    After=network.target local-fs.target

    [Service]
    # Load the overlay filesystem module, required for containers
    ExecStartPre=-/sbin/modprobe overlay
    ExecStart=/usr/bin/containerd
    Type=notify
    Delegate=yes
    KillMode=process
    Restart=always
    RestartSec=5
    # Remove system resource limits
    LimitNPROC=infinity
    LimitCORE=infinity
    LimitNOFILE=infinity
    TasksMax=infinity
    OOMScoreAdjust=-999

    [Install]
    WantedBy=multi-user.target

Start containerd: systemctl start containerd
The gpustack-worker docker-compose.yaml is as follows

    services:
      gpustack-worker:
        image: gpustack/gpustack:v2.0
        pull_policy: if_not_present
        container_name: gpustack-worker
        restart: unless-stopped
        privileged: true
        network_mode: host
        runtime: ascend
        environment:
          - GPUSTACK_RUNTIME_DEPLOY_MIRRORED_NAME=gpustack-worker
          - GPUSTACK_TOKEN=gpustack_b82173ef13b50b55_e0e563a28bb83906bca120dae68fcaf7
          - ASCEND_VISIBLE_DEVICES=${ASCEND_VISIBLE_DEVICES:-0}
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
          - /data/nfs/gpustack/models:/usr/local/models
          - /data/gpustack/data:/var/lib/gpustack
        command: >
          --server-url http://192.9.xxx.xxx:9090
          --worker-ip 192.9.xxx.xxx
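Write a startup script, start.sh, that detects the Ascend device ID automatically and injects it as an environment variable

    #!/bin/bash
    # Detect the Ascend device ID: take the number from the first /dev/davinci* node, fall back to 0
    export ASCEND_VISIBLE_DEVICES=$(sudo ls /dev/davinci* | head -1 | grep -o '[0-9]\+' || echo "0")
    # Start the service
    docker-compose up -d

Run: ./start.sh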
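The script above passes only the first device ID to the container. On a worker with several NPUs, a variant that collects every `davinci<N>` node into a comma-separated list may be more useful, since `ASCEND_VISIBLE_DEVICES` accepts values such as `0,1`. This is a sketch under that assumption; `list_davinci_ids` is a hypothetical helper name, and the fallback to `0` mirrors the original script.

```shell
#!/bin/bash
# Print the IDs of all davinci<N> device nodes in a directory as a comma-separated list.
# Argument: $1 = device directory (default: /dev).
# Nodes such as davinci_manager are skipped because the glob requires a digit after "davinci".
# (list_davinci_ids is a hypothetical helper name used for illustration.)
list_davinci_ids() {
    local dev_dir="${1:-/dev}"
    local ids="" node id
    for node in "$dev_dir"/davinci[0-9]*; do
        [ -e "$node" ] || continue          # glob did not match anything
        id="${node##*davinci}"              # strip everything up to and including "davinci"
        ids="${ids:+${ids},}${id}"          # join with commas
    done
    echo "${ids:-0}"                        # same fallback as the original script
}
```

In start.sh this would replace the detection line with `export ASCEND_VISIBLE_DEVICES=$(list_davinci_ids)` before `docker-compose up -d`.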

Open the server node in a browser and check the node status.

