koordinator to hami

Production environment upgrade steps

Scheduler upgrade

The production environment already runs hami but not koordinator. To keep the change simple, deploy koordinator via its helm chart into the namespace where hami lives:

kubectl annotate namespace hami meta.helm.sh/release-name=koordinator
kubectl annotate namespace hami meta.helm.sh/release-namespace=hami
kubectl label namespace hami app.kubernetes.io/managed-by=Helm

cd ./koordinator
helm install koordinator -n hami .
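
After the install, confirm the koordinator components are running in the hami namespace:

kubectl -n hami get pods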

Modify the koord-scheduler-config ConfigMap: save the ConfigMap below to a file and apply it to replace the whole object.

apiVersion: v1
data:
  koord-scheduler-config: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: true
      resourceLock: leases
      resourceName: koord-scheduler
      resourceNamespace: hami
    extenders:
    - urlPrefix: "https://127.0.0.1:443"
      filterVerb: filter
      bindVerb: bind
      nodeCacheCapable: true
      weight: 1
      httpTimeout: 30s
      enableHTTPS: true
      tlsConfig:
        insecure: true
      managedResources:
      - name: nvidia.com/gpu
        ignoredByScheduler: true
      - name: nvidia.com/gpumem
        ignoredByScheduler: true
      - name: nvidia.com/gpucores
        ignoredByScheduler: true
      - name: nvidia.com/gpumem-percentage
        ignoredByScheduler: true
      - name: nvidia.com/priority
        ignoredByScheduler: true
      - name: cambricon.com/vmlu
        ignoredByScheduler: true
      - name: hygon.com/dcunum
        ignoredByScheduler: true
      - name: hygon.com/dcumem
        ignoredByScheduler: true
      - name: hygon.com/dcucores
        ignoredByScheduler: true
      - name: iluvatar.ai/vgpu
        ignoredByScheduler: true
      - name: "metax-tech.com/gpu"
        ignoredByScheduler: true
      - name: metax-tech.com/sgpu
        ignoredByScheduler: true
      - name: metax-tech.com/vcore
        ignoredByScheduler: true
      - name: metax-tech.com/vmemory
        ignoredByScheduler: true
      - name: huawei.com/Ascend910A
        ignoredByScheduler: true
      - name: huawei.com/Ascend910A-memory
        ignoredByScheduler: true
      - name: huawei.com/Ascend910B2
        ignoredByScheduler: true
      - name: huawei.com/Ascend910B2-memory
        ignoredByScheduler: true
      - name: huawei.com/Ascend910B
        ignoredByScheduler: true
      - name: huawei.com/Ascend910B-memory
        ignoredByScheduler: true
      - name: huawei.com/Ascend910B4
        ignoredByScheduler: true
      - name: huawei.com/Ascend910B4-memory
        ignoredByScheduler: true
      - name: huawei.com/Ascend310P
        ignoredByScheduler: true
      - name: huawei.com/Ascend310P-memory
        ignoredByScheduler: true
    profiles:
      - pluginConfig:
        - name: NodeResourcesFit
          args:
            apiVersion: kubescheduler.config.k8s.io/v1
            kind: NodeResourcesFitArgs
            scoringStrategy:
              type: LeastAllocated
              resources:
                - name: cpu
                  weight: 1
                - name: memory
                  weight: 1
                - name: "kubernetes.io/batch-cpu"
                  weight: 1
                - name: "kubernetes.io/batch-memory"
                  weight: 1
        - name: LoadAwareScheduling
          args:
            apiVersion: kubescheduler.config.k8s.io/v1
            kind: LoadAwareSchedulingArgs
            filterExpiredNodeMetrics: false
            nodeMetricExpirationSeconds: 300
            resourceWeights:
              cpu: 1
              memory: 1
            usageThresholds:
              cpu: 65
              memory: 95
            # disable by default
            # prodUsageThresholds indicates the resource utilization threshold of Prod Pods compared to the whole machine.
            # prodUsageThresholds:
            #   cpu: 55
            #   memory: 75
            # scoreAccordingProdUsage controls whether to score according to the utilization of Prod Pod
            # scoreAccordingProdUsage: true
            # aggregated supports resource utilization filtering and scoring based on percentile statistics
            aggregated:
              usageThresholds:
                cpu: 65
                memory: 95
              usageAggregationType: "p95"
              scoreAggregationType: "p95"
            estimatedScalingFactors:
              cpu: 85
              memory: 70
        - name: ElasticQuota
          args:
            apiVersion: kubescheduler.config.k8s.io/v1
            kind: ElasticQuotaArgs
            quotaGroupNamespace: hami
        plugins:
          queueSort:
            disabled:
              - name: "*"
            enabled:
              - name: Coscheduling
          preFilter:
            enabled:
              - name: Coscheduling
              - name: Reservation
              - name: NodeNUMAResource
              - name: DeviceShare
              - name: ElasticQuota
          filter:
            enabled:
              - name: Reservation
              - name: LoadAwareScheduling
              - name: NodeNUMAResource
              - name: DeviceShare
          postFilter:
            disabled:
              - name: "*"
            enabled:
              - name: Reservation
              - name: Coscheduling
              - name: ElasticQuota
              - name: DefaultPreemption
          preScore:
            enabled:
              - name: Reservation # The Reservation plugin must come first
          score:
            enabled:
              - name: LoadAwareScheduling
                weight: 1
              - name: NodeNUMAResource
                weight: 1
              - name: DeviceShare
                weight: 1
              - name: Reservation
                weight: 5000
          reserve:
            enabled:
              - name: Reservation # The Reservation plugin must come first
              - name: LoadAwareScheduling
              - name: NodeNUMAResource
              - name: DeviceShare
              - name: Coscheduling
              - name: ElasticQuota
          permit:
            enabled:
              - name: Coscheduling
          preBind:
            enabled:
              - name: NodeNUMAResource
              - name: DeviceShare
              - name: Reservation
              - name: DefaultPreBind
          bind:
            disabled:
              - name: "*"
            enabled:
              - name: Reservation
              - name: DefaultBinder
          postBind:
            enabled:
              - name: Coscheduling
        schedulerName: hami-scheduler
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: koordinator
    meta.helm.sh/release-namespace: hami
  labels:
    app.kubernetes.io/managed-by: Helm
  name: koord-scheduler-config
  namespace: hami
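
Save the ConfigMap above to a file and apply it; the scheduler reads its config from a mounted file, so the koord-scheduler pods typically need a restart to pick up the change (the file name below is assumed):

kubectl apply -f koord-scheduler-config.yaml
kubectl -n hami rollout restart deployment koord-scheduler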

After installing via helm, edit the koordinator deployment to add the hami container and volume definitions.
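
A minimal way to make the change in place (namespace taken from the install above):

kubectl -n hami edit deployment koord-scheduler

The full resulting manifest, for reference: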

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: koordinator
    meta.helm.sh/release-namespace: hami
  labels:
    app.kubernetes.io/managed-by: Helm
    koord-app: koord-scheduler
  name: koord-scheduler
  namespace: hami
  resourceVersion: "350146"
  uid: 89510e69-de4c-46c5-b9ac-aada7abbc123
spec:
  minReadySeconds: 3
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      koord-app: koord-scheduler
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        koord-app: koord-scheduler
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: koord-app
                  operator: In
                  values:
                  - koord-scheduler
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - --port=10251
        - --authentication-skip-lookup=true
        - --v=4
        - --feature-gates=
        - --config=/config/koord-scheduler.config
        command:
        - /koord-scheduler
        image: registry.cn-beijing.aliyuncs.com/koordinator-sh/koord-scheduler:v1.6.0
        imagePullPolicy: Always
        name: scheduler
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10251
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /config
          name: koord-scheduler-config-volume
      # hami container added on top of the original koordinator deployment
      - command:  
        - scheduler
        - --http_bind=0.0.0.0:443
        - --cert_file=/tls/tls.crt
        - --key_file=/tls/tls.key
        - --scheduler-name=hami-scheduler
        - --metrics-bind-address=:9395
        - --node-scheduler-policy=binpack
        - --gpu-scheduler-policy=spread
        - --device-config-file=/device-config.yaml
        - --enable-ascend=true
        - --debug
        - -v=4
        env:
        - name: HAMI_NODELOCK_EXPIRE
          value: 5m
        image: 10.62.48.94:30085/hami/hami:v2.6.1 # use the image from the final release
        imagePullPolicy: IfNotPresent
        name: vgpu-scheduler-extender
        ports:
        - containerPort: 443
          name: http
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tls
          name: tls-config
        - mountPath: /device-config.yaml
          name: device-config
          subPath: device-config.yaml
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: koord-scheduler
      serviceAccountName: koord-scheduler
      terminationGracePeriodSeconds: 10
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: koord-scheduler-config
            path: koord-scheduler.config
          name: koord-scheduler-config
        name: koord-scheduler-config-volume
      # volumes added for hami
      - name: tls-config
        secret:
          defaultMode: 420
          secretName: hami-scheduler-tls
      - configMap:
          defaultMode: 420
          name: hami-scheduler-newversion
        name: scheduler-config
      - configMap:
          defaultMode: 420
          name: hami-scheduler-device
        name: device-config
  1. Modify the hami-scheduler service selector so that it points at the koord-scheduler pods.
    Pods created in the product all have their schedulerName set through hami's mutating webhook, so it is enough to
    point the hami-scheduler service at the combined koordinator+hami Pod deployed above, as sketched below.
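    A minimal sketch of the selector change (service name, namespace, and the koord-app label are taken from the manifests above; adjust if your release differs):

    kubectl -n hami patch service hami-scheduler --type='json' \
      -p='[{"op":"replace","path":"/spec/selector","value":{"koord-app":"koord-scheduler"}}]'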
  2. Modify the hami-scheduler clusterrolebinding and bind the koord-scheduler serviceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    meta.helm.sh/release-name: hami
    meta.helm.sh/release-namespace: koordinator-system
  labels:
    app.kubernetes.io/component: hami-scheduler
    app.kubernetes.io/instance: hami
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: hami
    app.kubernetes.io/version: 2.6.1
    helm.sh/chart: hami-2.6.1
  name: hami-scheduler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hami-scheduler
subjects:
- kind: ServiceAccount
  name: hami-scheduler
  namespace: koordinator-system
 # added koordinator serviceAccount
- kind: ServiceAccount
  name: koord-scheduler
  namespace: koordinator-system
  3. Modify the koord-scheduler clusterRole and add the patch verb on nodes
kubectl edit clusterrole koord-scheduler-role

- apiGroups:
  - ""
  resources:
  - pods
  - nodes   # added field
  verbs:
  - patch
  - update
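
To verify the permission after the edit (the koord-scheduler serviceAccount namespace is assumed to be hami, where it was installed above):

kubectl auth can-i patch nodes --as=system:serviceaccount:hami:koord-scheduler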


Topology monitoring

  1. Deploy the sriov-metrics-exporter daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app.kubernetes.io/name: sriov-metrics-exporter
    app.kubernetes.io/version: v0.0.1
  name: sriov-metrics-exporter
  namespace: nvidia-network-operator
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: sriov-metrics-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: sriov-metrics-exporter
        app.kubernetes.io/version: v0.0.1
    spec:
      hostNetwork: true
      containers:
      - args:
        - --path.kubecgroup=/host/kubecgroup
        - --path.sysbuspci=/host/sys/bus/pci/devices/
        - --path.sysclassnet=/host/sys/class/net/
        - --path.cpucheckpoint=/host/cpu_manager_state
        - --path.kubeletsocket=/host/kubelet.sock
        - --collector.kubepoddevice=true
        - --collector.vfstatspriority=sysfs,netlink
        image: ghcr.io/k8snetworkplumbingwg/sriov-network-metrics-exporter:latest
        imagePullPolicy: Always
        name: sriov-metrics-exporter
        resources:
          requests:
            memory: 100Mi
            cpu: 100m
          limits:
            memory: 100Mi
            cpu: 100m
        securityContext:
          capabilities:
            drop:
              - ALL
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
        volumeMounts:
        - mountPath: /host/kubelet.sock
          name: kubeletsocket
        - mountPath: /host/sys/bus/pci/devices
          name: sysbuspcidevices
          readOnly: true
        - mountPath: /host/sys/devices
          name: sysdevices
          readOnly: true
        - mountPath: /host/sys/class/net
          name: sysclassnet
          readOnly: true
        - mountPath: /host/kubecgroup
          name: kubecgroup
          readOnly: true
        - mountPath: /host/cpu_manager_state
          name: cpucheckpoint
          readOnly: true
      nodeSelector:
        kubernetes.io/os: linux
        feature.node.kubernetes.io/network-sriov.capable: "true"
      restartPolicy: Always
      tolerations:
      - operator: Exists
      volumes:
      - hostPath:
          path: /var/lib/kubelet/pod-resources/kubelet.sock
          type: "Socket"
        name: kubeletsocket
      - hostPath:
          path: /sys/fs/cgroup/kubepods.slice/
          type: "Directory"
        name: kubecgroup
      - hostPath:
          path: /var/lib/kubelet/cpu_manager_state
          type: "File"
        name: cpucheckpoint
      - hostPath:
          path: /sys/class/net
          type: "Directory"
        name: sysclassnet
      - hostPath:
          path: /sys/bus/pci/devices
          type: "Directory"
        name: sysbuspcidevices
      - hostPath:
          path: /sys/devices
          type: "Directory"
        name: sysdevices
---
apiVersion: v1
kind: Service
metadata:
  name: sriov-metrics-exporter
  namespace: nvidia-network-operator
  annotations:
    prometheus.io/target: "true"
spec:
  selector:
    app.kubernetes.io/name: sriov-metrics-exporter
  ports:
    - protocol: TCP
      port: 9808
      targetPort: 9808
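
A quick way to confirm the exporter is serving metrics (namespace and port taken from the manifests above; the node IP is a placeholder):

kubectl -n nvidia-network-operator get pods -l app.kubernetes.io/name=sriov-metrics-exporter
curl -s http://<node-ip>:9808/metrics | grep sriov_vf | head
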
  2. Add a VMRule for the sriov exporter
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  annotations:
    meta.helm.sh/release-name: sriov-exporter
    meta.helm.sh/release-namespace: monitoring-system
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: sriov
    group: kubegien
    kubegien.org/rule-level: builtin
  name: sriov-exporter
  namespace: monitoring-system
spec:
  groups:
  - name: sriov.rules
    rules:
    - expr: |
        sum(rate(sriov_vf_tx_bytes{}[5m])) by (pf,node)
      record: sriov_vf_tx_bytes:sum
    - expr: |
        sum(rate(sriov_vf_rx_bytes{}[5m])) by (pf,node)
      record: sriov_vf_rx_bytes:sum
  3. Add a VMPodScrape for the sriov exporter
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMPodScrape
metadata:
  annotations:
    meta.helm.sh/release-name: seriov-exporter
    meta.helm.sh/release-namespace: monitoring-system
  labels:
    app.kubernetes.io/name: seriov-exporter
  name: seriov-exporter
  namespace: monitoring-system
spec:
  namespaceSelector:
    any: true
  podMetricsEndpoints:
  - interval: 5m
    path: /metrics
    port: prometheus
    relabelConfigs:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: node
  selector:
    matchLabels:
      app.kubernetes.io/name: sriov-metrics-exporter
  4. Node NUMA memory metrics: modify the node-exporter daemonset and add the flag
      - --collector.meminfo_numa
# https://rigitlab.gientech.com/P6085080/components-helmchart/-/commit/3772a900e65f370fd34e2f1322b33a988518c65e

    - expr: sum by (node, numa) (label_replace(label_replace(node_memory_numa_MemUsed,"numa",
        "$1", "node", "(.*)"),"node", "$1", "instance", "(.*)"))
      record: node:node_numa_memory_bytes_used:sum
    - expr: sum by (node, numa) (label_replace(label_replace(node_memory_numa_MemTotal,"numa",
        "$1", "node", "(.*)"),"node", "$1", "instance", "(.*)"))
      record: node:node_numa_memory_bytes_total:sum
    - expr: node:node_numa_memory_bytes_used:sum / node:node_numa_memory_bytes_total:sum
      record: node:node_numa_memory_utilization

  5. Add a dcgm-exporter recording rule
#https://rigitlab.gientech.com/P6085080/components-helmchart/-/commit/f5c627eed5a806804d46f8a60025e6a4a3910b3b

        - expr: |
            sum by (uuid, driver, deviceType, node) ( label_replace (label_replace(label_replace(floor (DCGM_FI_DEV_FB_USED * 1024) * on (namespace, pod) group_left(node, host_ip, role)  node_namespace_pod:kube_pod_info:, "driver", "$1", "DCGM_FI_DRIVER_VERSION",  "(.+)" ), "uuid",  "$1", "UUID", "(.+)" ), "deviceType", "$1", "modelName", "(.+)"))
          record: gpu:nv_memory_usage_in_byte:sum

Kueue update

1. Configuration:
# update integrations
    integrations:
      frameworks:
      - batch/job
      - kubeflow.org/mpijob
      - ray.io/rayjob
      - ray.io/raycluster
      - jobset.x-k8s.io/jobset
      - trainer.kubeflow.org/trainjob
      - kubeflow.org/paddlejob
      - kubeflow.org/pytorchjob
      - kubeflow.org/tfjob
      - kubeflow.org/xgboostjob
      - kubeflow.org/jaxjob
      - workload.codeflare.dev/appwrapper
      - pod
      - deployment
      - statefulset

# adjust the kueue arguments; change feature-gates to
 --feature-gates=TopologyAwareScheduling=true,VisibilityOnDemand=false,ElasticJobsViaWorkloadSlices=true
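
These settings are edited in the kueue manager ConfigMap and Deployment; with a default upstream-style install the object names are as below (adjust namespace and names to your environment):

kubectl -n kueue-system edit configmap kueue-manager-config        # integrations block
kubectl -n kueue-system edit deployment kueue-controller-manager   # --feature-gates argument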
2. CRD
	1. Delete the old kueue topology CRD: kubectl delete crd topologies.kueue.x-k8s.io
	2. kubectl apply -f ./topologies.yaml
	
3. Latest Kueue image:
   10.62.48.94:30085/cap-system/kueue:v0.14.0-devel-329-g19ebce029-1
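
   A sketch of rolling the new image out (deployment and container name assume the default kueue install; adjust to your environment):

   kubectl -n kueue-system set image deployment/kueue-controller-manager \
     manager=10.62.48.94:30085/cap-system/kueue:v0.14.0-devel-329-g19ebce029-1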

Create the default topology

apiVersion: kueue.x-k8s.io/v1beta1
kind: Topology
metadata:
  name: "default"
spec:
  levels:
  - nodeLabel: "topology.kubegien.org/switch"
  - nodeLabel: "topology.kubegien.org/rack"
  - nodeLabel: "kubernetes.io/hostname"
---

# Save the YAML above to a file and create it with kubectl create -f
# Note: in an environment that already has a resourceFlavor, `topologyName` can be added if it is not set yet; once set it cannot be modified.
# When upgrading an environment, modify the existing resourceFlavor and add topologyName: default
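
A sketch of the resulting resourceFlavor (the flavor name is hypothetical; topologyName: default is the only required addition):

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor        # hypothetical, use the existing flavor's name
spec:
  topologyName: default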

Create the infraNode resource

kubectl create -f https://rigitlab.gientech.com/cprd/ctl-icm/components-helmchart/-/commit/5564572016e6f4253c8fd24ef4d10863c4a8e872

Limitations

  1. When topology-aware scheduling is enabled for a partition, i.e. the topologyName field has been added to its resourceFlavor, any workload that uses the resource pool of that partition must declare a node topology affinity. This affects existing services: running business workloads need to be updated, e.g. by adding the following pod-template annotation by default
kueue.x-k8s.io/podset-preferred-topology: topology.kubegien.org/switch
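
A minimal sketch of where the annotation goes on a Kueue-managed workload (Job name, queue name, and container are hypothetical):

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job                                  # hypothetical
  labels:
    kueue.x-k8s.io/queue-name: demo-local-queue   # hypothetical local queue
spec:
  template:
    metadata:
      annotations:
        kueue.x-k8s.io/podset-preferred-topology: topology.kubegien.org/switch
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["sleep", "60"]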

At deployment time, label the nodes:

node.koordinator.sh/numa-topology-policy: Restricted
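
For example (the node name is a placeholder):

kubectl label node <node-name> node.koordinator.sh/numa-topology-policy=Restricted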

Below is a complete example: create a Deployment -> Service -> Ingress and expose the service through the nginx IngressClass.

Complete YAML configuration

1. Create the Deployment

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-doc-deployment
  namespace: default
  labels:
    app: api-doc
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-doc
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: api-doc
        version: v1
    spec:
      containers:
      - name: api-doc
        image: swaggerapi/swagger-ui:v5.9.0  # Swagger UI used as an example
        # replace this if you have your own API documentation image
        # image: your-api-doc-image:tag
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: SWAGGER_JSON
          value: "/api-docs/openapi.json"
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10

2. Create the Service

service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: api-doc-service
  namespace: default
  labels:
    app: api-doc
spec:
  type: ClusterIP
  selector:
    app: api-doc
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP

3. Create the Ingress (using the nginx IngressClass)

ingress.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-doc-ingress
  namespace: default
  annotations:
    # nginx ingress controller annotations
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"  # for plain HTTP
    # for HTTPS, use the following instead
    # nginx.ingress.kubernetes.io/ssl-redirect: "true"
    # nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    
    # other useful annotations
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
spec:
  ingressClassName: nginx  # use the nginx IngressClass
  rules:
  - host: api-doc.example.com  # replace with your own domain
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-doc-service
            port:
              number: 80

4. Create all the resources

# create all resources
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml

# or put all the manifests in a single file
cat <<EOF > api-doc-all.yaml
# paste the three YAML manifests above here, separated by ---
EOF
kubectl apply -f api-doc-all.yaml

Verify the configuration

# check the status of all resources
kubectl get deployment,service,ingress -l app=api-doc

# detailed status
kubectl describe deployment api-doc-deployment
kubectl describe service api-doc-service
kubectl describe ingress api-doc-ingress

# Pod status
kubectl get pods -l app=api-doc

# logs
kubectl logs -l app=api-doc --tail=50

Test access

# for local testing, use port forwarding
kubectl port-forward service/api-doc-service 8080:80

# then open http://localhost:8080

# or go through the Ingress controller's LoadBalancer IP
kubectl get svc -n ingress-nginx  # check the external IP of the nginx ingress controller

# test with curl (edit the hosts file or pass a Host header)
curl -H "Host: api-doc.example.com" http://<INGRESS-EXTERNAL-IP>/

Important notes

  1. IngressClass prerequisites

    • The nginx ingress controller must be installed first
    • Make sure the nginx IngressClass exists:
      kubectl get ingressclass
      
  2. Install the nginx ingress controller (if it is not installed)

    # with Helm (recommended)
    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm install ingress-nginx ingress-nginx/ingress-nginx \
      --namespace ingress-nginx \
      --create-namespace
    
    # or with kubectl
    kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.2/deploy/static/provider/cloud/deploy.yaml
    
  3. DNS / domain configuration

    • In DNS, point api-doc.example.com to the Ingress controller's external IP