Enabling vGPU Time-Slicing on k3s

Add the Helm repository

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

Create the RuntimeClass

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia

Create the configuration file

cat dp-example-config.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy: "envvar"
    deviceIDStrategy: "uuid"
  gfd:
    oneshot: false
    noTimestamp: false
    outputFile: /etc/kubernetes/node-feature-discovery/features.d/gfd
    sleepInterval: 60s
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 10
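The replicas value sets how many schedulable nvidia.com/gpu units each physical GPU is advertised as. NVIDIA's k8s-device-plugin documentation uses the same replicas: 10 setting as its example: a node with 8 GPUs would then advertise 80 nvidia.com/gpu resources instead of 8. As a worked calculation, where N_GPU (a symbol introduced here) is the number of physical GPUs on the node:

```latex
\text{allocatable}(\texttt{nvidia.com/gpu}) = N_{\text{GPU}} \times \texttt{replicas},
\qquad 8 \times 10 = 80, \qquad 1 \times 10 = 10
```

Time-slicing only multiplexes the GPU in time. Unlike MIG, the replicas get no memory or fault isolation from one another.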

Install

helm template nvidia-device-plugin . -f values.yaml \
  --set gfd.enabled=true \
  --set-file config.map.config=/root/nvidia/dp-example-config.yaml \
  --set runtimeClassName=nvidia \
  --include-crds --dry-run \
  --namespace nvidia-device-plugin \
  > nvidia-device-plugin-with-time-slicing.yml
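You may prefer to keep these settings in a file instead of on the command line. In that case, the --set and --set-file flags in the helm template command can be written as a values file. This is a sketch. The file name values-time-slicing.yaml is hypothetical. The config.map.config entry inlines the contents of dp-example-config.yaml under the same key that --set-file config.map.config writes to.

```yaml
# values-time-slicing.yaml (hypothetical name): same settings as
#   --set gfd.enabled=true
#   --set-file config.map.config=/root/nvidia/dp-example-config.yaml
#   --set runtimeClassName=nvidia
gfd:
  enabled: true
runtimeClassName: nvidia
config:
  map:
    # Key "config" matches the one used by --set-file above;
    # the value is the raw text of dp-example-config.yaml.
    config: |
      version: v1
      flags:
        migStrategy: "none"
        failOnInitError: true
        nvidiaDriverRoot: "/"
        plugin:
          passDeviceSpecs: false
          deviceListStrategy: "envvar"
          deviceIDStrategy: "uuid"
        gfd:
          oneshot: false
          noTimestamp: false
          outputFile: /etc/kubernetes/node-feature-discovery/features.d/gfd
          sleepInterval: 60s
      sharing:
        timeSlicing:
          resources:
          - name: nvidia.com/gpu
            replicas: 10
```

To use it, pass it as an additional -f values-time-slicing.yaml after -f values.yaml in place of those three flags. With Helm, values files given later on the command line override earlier ones.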

The fix for the second problem raised in the GitHub issue below is to add --include-crds to the helm template command.

https://github.com/NVIDIA/gpu-operator/issues/546

--set runtimeClassName=nvidia is required because K3s automatically detects nvidia-container-runtime but does not configure it as the default runtime.
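The same applies to GPU workloads. Because nvidia is not the default runtime, a pod that needs the GPU must also set runtimeClassName: nvidia. It requests one time-sliced replica through the nvidia.com/gpu resource. Below is a minimal test pod. The pod name is hypothetical, and the nvidia/cuda image tag is an assumption; any CUDA base image into which the NVIDIA runtime can inject nvidia-smi should work.

```yaml
# Hypothetical test pod: name and image tag are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-timeslice-test
spec:
  runtimeClassName: nvidia        # k3s does not use the nvidia runtime by default
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]       # prints the GPU visible inside the container
    resources:
      limits:
        nvidia.com/gpu: 1         # one time-sliced replica, not a whole physical GPU
```

With replicas: 10, up to ten pods like this can be scheduled on each physical GPU at the same time.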

https://fissssssh.aiursoft.cn/posts/configure-nvidia-gpus-in-k3s/

posted @ 2025-04-03 18:47  有何m不可