GPUStack Deployment

I. Environment

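Docker must be installed in advance on both the server node and the worker node. This deployment uses docker-compose, so docker-compose must be installed as well. The official documentation also provides commands for deploying directly; use them as needed. The server node runs only the GPUStack service; the Huawei Ascend driver, runtime, and firmware are all installed and configured on the worker node.

GPUStack installation requirements: https://docs.gpustack.ai/latest/installation/requirements/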

gpustack-server

  Operating system: ctyunos 23.01 x86_64

gpustack-worker (NPU)

  Operating system: ctyunos 23.01 aarch64

Software versions and download links

GPUStack-v2.0
  docker pull docker.1ms.run/gpustack/gpustack:v2.0

Docker-26.1.3
  https://download.docker.com/linux/static/stable/

Ascend-docker-runtime-7.3.0
  https://gitcode.com/Ascend/mind-cluster/releases

Ascend-hdk-910b-npu-driver-25.5.0
  https://www.hiascend.com/hardware/firmware-drivers/community?product=1&model=30&cann=8.5.0&driver=Ascend+HDK+25.5.0

Ascend-hdk-910b-npu-firmware-7.8.0.5
  https://www.hiascend.com/hardware/firmware-drivers/community?product=1&model=30&cann=8.5.0&driver=Ascend+HDK+25.5.0
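
The compose files in this post reference the image as gpustack/gpustack:v2.0, while the pull command goes through the docker.1ms.run mirror. One possible way to bridge the two, so that Compose with pull_policy: if_not_present finds the image locally, is to re-tag it after pulling. The helper names canonical_image and pull_and_retag below are made up for illustration; only docker pull and docker tag are real commands. Treat this as a sketch, not as the author's procedure.

```shell
#!/bin/bash
# Derive the canonical image name by dropping the mirror host,
# i.e. everything up to and including the first "/".
canonical_image() {
  printf '%s\n' "${1#*/}"
}

# Pull through the mirror, then tag the image under its canonical name.
# (Hypothetical helper; requires a working Docker daemon.)
pull_and_retag() {
  local mirrored="$1"
  local canonical
  canonical="$(canonical_image "$mirrored")"
  docker pull "$mirrored" && docker tag "$mirrored" "$canonical"
}

# Example: pull_and_retag docker.1ms.run/gpustack/gpustack:v2.0
```

The same two steps apply on both nodes, since the server and the worker use the same image tag.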

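II. Local yum repository configuration (intranet environment)

Perform the same steps on the server node and the worker node, using the ISO that matches each node's architecture (the example below uses the aarch64 ISO).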

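# Mount the ISO image
mount -o loop /data/software/ctyunos-23.01-230117-aarch64-dvd.iso /mnt

# Create the repo file /etc/yum.repos.d/lpb.repo
cat > /etc/yum.repos.d/lpb.repo <<'EOF'
[lpb]
name=lpb
baseurl=file:///mnt
enabled=1
gpgcheck=0
EOF

# Mount automatically at boot: append this entry to /etc/fstab
echo "/data/software/ctyunos-23.01-230117-aarch64-dvd.iso /mnt iso9660 defaults,loop 0 0" >> /etc/fstab
# Rebuild the yum metadata cache
yum makecache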
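
Because the server node is x86_64 and the worker node is aarch64, the ISO path differs between the two. A small sketch of choosing it from uname -m follows. It assumes the x86_64 ISO follows the same file-naming pattern as the aarch64 one shown above, which the original post does not confirm; iso_for_arch is a hypothetical helper name.

```shell
#!/bin/bash
# Build the ISO path for a given architecture (defaults to this machine's).
# Assumption: the x86_64 ISO uses the same naming pattern as the aarch64 one.
iso_for_arch() {
  local arch="${1:-$(uname -m)}"
  printf '/data/software/ctyunos-23.01-230117-%s-dvd.iso\n' "$arch"
}

# Example (run as root): mount -o loop "$(iso_for_arch)" /mnt
```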

III. NFS configuration

Installation and configuration on the server node

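# Install nfs-utils and rpcbind
yum install -y nfs-utils rpcbind
# Enable and start the NFS server (the systemd unit is nfs-server; some older releases also alias it as nfs)
systemctl enable nfs-server
systemctl start nfs-server
# Create the shared directory
mkdir -pv /data/nfs
# Export the directory to the 192.9.0.0/16 network
echo "/data/nfs 192.9.0.0/16(rw,sync,no_root_squash,no_all_squash)" > /etc/exports
# Restart the NFS service to apply the export
systemctl restart nfs-server
# Verify the export
showmount -e 127.0.0.1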

Installation and configuration on the worker node

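# Install nfs-utils
yum install -y nfs-utils
# Enable and start the NFS service (not strictly required on a node that only mounts the share)
systemctl enable nfs-server
systemctl start nfs-server
# Create the mount point
mkdir -pv /data/nfs
# Mount the share exported by the server node
mount -t nfs 192.9.xxx.xxx:/data/nfs /data/nfs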
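
The mount command above does not survive a reboot. If a persistent mount is wanted, one option is an /etc/fstab entry. The sketch below only builds the line. nfs_fstab_entry is a hypothetical helper, and the server address in the example is a placeholder, not the address used in this deployment.

```shell
#!/bin/bash
# Build an /etc/fstab entry for an NFS share.
# _netdev delays the mount until the network is available.
nfs_fstab_entry() {
  local server="$1" export_path="$2" mount_point="$3"
  printf '%s:%s %s nfs defaults,_netdev 0 0\n' "$server" "$export_path" "$mount_point"
}

# Example (placeholder address; run as root on the worker node):
#   nfs_fstab_entry 192.9.0.10 /data/nfs /data/nfs >> /etc/fstab
#   mount -a
```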

IV. Install GPUStack on the server node

docker-compose.yaml

services:
  gpustack:
    image: gpustack/gpustack:v2.0
    pull_policy: if_not_present
    container_name: gpustack-server
    restart: unless-stopped
    ports:
      - "9090:80"
      - "10161:10161"
    volumes:
      - /data/nfs/gpustack/models:/usr/local/models
      - /data/gpustack/data:/var/lib/gpustack
    environment:
      - TZ=Asia/Shanghai
      - GPUSTACK_LOG_LEVEL=info

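Start: docker-compose up -d

Follow the startup logs: docker-compose logs -f

Once startup has fully completed, open http://${ip}:9090 in a browser. The initial password for the admin account is stored in the file initial_admin_password in the data directory (/var/lib/gpustack in the container, mapped to /data/gpustack/data on the host by the compose file above).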
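
Since the password file only appears after the first startup has finished, it can help to wait for it rather than guess when the server is ready. The following is a minimal sketch; read_initial_password is a hypothetical helper, and the host path in the example assumes the volume mapping from the compose file above.

```shell
#!/bin/bash
# Print the initial admin password once GPUStack has written it.
# $1: host data directory; $2: seconds to wait (default 60).
read_initial_password() {
  local data_dir="$1" timeout="${2:-60}" waited=0
  local file="$data_dir/initial_admin_password"
  until [ -s "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $file" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  cat "$file"
}

# Example: read_initial_password /data/gpustack/data
```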


V. Install GPUStack on the worker node

Create the user and install the required dependencies

groupadd HwHiAiUser
useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
yum install -y make dkms gcc kernel-headers-$(uname -r) kernel-devel-$(uname -r)

1. Install the Huawei Ascend docker-runtime

https://www.hiascend.com/document/detail/zh/mindcluster/72rc1/clustersched/dlug/dlug_installation_017.html

./Ascend-docker-runtime_7.3.0_linux-aarch64.run --install
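
The worker compose file later in this post sets runtime: ascend, so Docker needs to know a runtime by that name. A quick way to check is to inspect the runtimes that docker info reports. The sketch below assumes the installer registers the runtime under the name ascend and that Docker has been restarted afterwards; has_runtime is a hypothetical helper that only searches the JSON text.

```shell
#!/bin/bash
# Succeeds if the runtime named $1 appears as a key in the JSON read from stdin.
# Expected input: the output of  docker info --format '{{json .Runtimes}}'
has_runtime() {
  grep -q "\"$1\":"
}

# Example (on the worker node, after restarting Docker):
#   systemctl restart docker
#   docker info --format '{{json .Runtimes}}' | has_runtime ascend && echo "ascend runtime registered"
```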


2. Install the Huawei Ascend driver and firmware

https://www.hiascend.com/document/detail/zh/canncommercial/850/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=CTyunOS

./Ascend-hdk-910b-npu-driver_25.5.0_linux-aarch64.run --full --install-for-all


./Ascend-hdk-910b-npu-firmware_7.8.0.5.216.run --full


Reboot as prompted by the installer so that the driver and firmware take effect

reboot

Check the NPU card information: npu-smi info

3. Install GPUStack

Open the server address in a browser and add a node to obtain the worker registration command. The gpustack-worker docker-compose.yaml is adapted from that command.

The following error occurred during deployment:

[Screenshot: deployment error message]

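Troubleshooting showed that this was a container runtime problem; the official hint is to deploy docker-runtime.

[Screenshot: official hint about deploying docker-runtime]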

In my case both were reported at the same time, so I simply installed both and also started containerd, after which the deployment succeeded.

containerd.service configuration

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
# Load the overlay filesystem module, which containers require
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd

Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Remove system resource limits
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target

Reload systemd so that it picks up the unit file, then start containerd: systemctl daemon-reload && systemctl start containerd

The gpustack-worker docker-compose.yaml is as follows

services:
  gpustack-worker:
    image: gpustack/gpustack:v2.0
    pull_policy: if_not_present
    container_name: gpustack-worker
    restart: unless-stopped
    privileged: true
    network_mode: host
    runtime: ascend
    environment:
      - GPUSTACK_RUNTIME_DEPLOY_MIRRORED_NAME=gpustack-worker
      - GPUSTACK_TOKEN=gpustack_b82173ef13b50b55_e0e563a28bb83906bca120dae68fcaf7
      - ASCEND_VISIBLE_DEVICES=${ASCEND_VISIBLE_DEVICES:-0}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /data/nfs/gpustack/models:/usr/local/models
      - /data/gpustack/data:/var/lib/gpustack
    command: >
      --server-url http://192.9.xxx.xxx:9090
      --worker-ip 192.9.xxx.xxx

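Write a startup script, start.sh, that detects the Ascend device ID automatically and injects it as the ASCEND_VISIBLE_DEVICES variable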

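#!/bin/bash
# Detect the ID of the first Ascend device (e.g. /dev/davinci0 -> 0); fall back to 0 if none is found.
# The [0-9] in the glob skips control nodes such as /dev/davinci_manager.
export ASCEND_VISIBLE_DEVICES=$(ls /dev/davinci[0-9]* 2>/dev/null | head -1 | grep -o '[0-9]\+' || echo "0")
# Start the service
docker-compose up -d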
Make the script executable and run it: chmod +x start.sh && ./start.sh
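
The script above passes only the first device ID to the container. On a worker with several NPUs, a variant that collects every davinciN node might look like the sketch below. It assumes ASCEND_VISIBLE_DEVICES accepts a comma-separated list of IDs, as the Ascend Docker Runtime documentation describes; ascend_device_ids is a hypothetical helper and was not part of the original deployment.

```shell
#!/bin/bash
# Read device paths on stdin and print the numeric IDs of davinciN nodes
# as a comma-separated list; control nodes such as davinci_manager are skipped.
ascend_device_ids() {
  grep -oE 'davinci[0-9]+$' | grep -oE '[0-9]+' | sort -n | paste -sd, -
}

# Example for start.sh:
#   ids="$(ls /dev/davinci* 2>/dev/null | ascend_device_ids)"
#   export ASCEND_VISIBLE_DEVICES="${ids:-0}"
#   docker-compose up -d
```

Sorting numerically keeps the list stable when there are ten or more devices, where plain ls ordering would place davinci10 before davinci2.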


Open the server node in a browser to check the status of the worker node

 

posted @ 2026-02-04 16:50  sxFu9528