Kubernetes Operator Development: A Beginner's Guide

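This guide uses Python and the Kopf framework to write a simple Kubernetes Operator. We will create a custom resource named WebSite. When a user creates this resource, the Operator automatically creates a Deployment and a Service for it, and cleans them up when the resource is deleted.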

Prerequisites

  • A working Kubernetes cluster

  • Python 3.8+

  • The kopf and kubernetes Python packages:

    pip install kopf kubernetes
    

  • Familiarity with basic Kubernetes concepts (Deployment, Service, CRD)

Step 1: Define the custom resource (CRD)

First, we need to register the custom resource in the cluster. Create a file named crd-website.yaml:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: websites.example.com
spec:
  group: example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                image:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
              required: ["image"]
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
  scope: Namespaced
  names:
    plural: websites
    singular: website
    kind: WebSite
    shortNames:
      - ws

Deploy the CRD to the cluster:

kubectl apply -f crd-website.yaml
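
To make the constraints in this schema concrete, here is a minimal sketch that mirrors them in plain Python. The function name validate_website_spec is hypothetical and is not part of Kopf or the Kubernetes client. In the cluster the API server enforces the schema itself, so a helper like this is only useful for local checks or unit tests.

```python
def validate_website_spec(spec):
    """Return a list of problems found in a WebSite spec (empty if valid).

    Hypothetical helper mirroring the CRD schema: `image` is a required
    string, `replicas` is an optional integer between 1 and 10.
    """
    errors = []

    image = spec.get("image")
    if image is None:
        errors.append("spec.image is required")
    elif not isinstance(image, str):
        errors.append("spec.image must be a string")

    if "replicas" in spec:
        replicas = spec["replicas"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(replicas, bool) or not isinstance(replicas, int):
            errors.append("spec.replicas must be an integer")
        elif not 1 <= replicas <= 10:
            errors.append("spec.replicas must be between 1 and 10")

    return errors


print(validate_website_spec({"image": "nginx:latest", "replicas": 2}))
print(validate_website_spec({"replicas": 11}))
```

The first call prints an empty list. The second reports both the missing image and the out-of-range replica count.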

Step 2: Write the Operator code

Create a file named operator.py with the following content:

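import kopf
import kubernetes

# Load credentials for the Kubernetes client: use the in-cluster
# ServiceAccount when available, otherwise fall back to the local kubeconfig
try:
    kubernetes.config.load_incluster_config()
except kubernetes.config.ConfigException:
    kubernetes.config.load_kube_config()

# Triggered when a WebSite resource is created or updated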
@kopf.on.create('example.com', 'v1', 'websites')
@kopf.on.update('example.com', 'v1', 'websites')
def create_or_update_website(spec, name, namespace, logger, **kwargs):
    # Read the parameters from the spec
    image = spec.get('image')
    replicas = spec.get('replicas', 1)

    # Build the Deployment and Service manifests as Python dicts
    deployment_manifest = {
        'apiVersion': 'apps/v1',
        'kind': 'Deployment',
        'metadata': {
            'name': f'{name}-deployment',
            'namespace': namespace,
            'labels': {'app': name}
        },
        'spec': {
            'replicas': replicas,
            'selector': {'matchLabels': {'app': name}},
            'template': {
                'metadata': {'labels': {'app': name}},
                'spec': {
                    'containers': [{
                        'name': 'web',
                        'image': image,
                        'ports': [{'containerPort': 80}]
                    }]
                }
            }
        }
    }

    service_manifest = {
        'apiVersion': 'v1',
        'kind': 'Service',
        'metadata': {
            'name': f'{name}-service',
            'namespace': namespace
        },
        'spec': {
            'selector': {'app': name},
            'ports': [{'protocol': 'TCP', 'port': 80, 'targetPort': 80}],
            'type': 'ClusterIP'
        }
    }

    # Create API clients for the apps/v1 and core/v1 groups
    api = kubernetes.client.AppsV1Api()
    v1 = kubernetes.client.CoreV1Api()

    # Create or update the Deployment
    try:
        # Check whether the Deployment already exists
        api.read_namespaced_deployment(name=f'{name}-deployment', namespace=namespace)
        # If it exists, update it
        api.patch_namespaced_deployment(name=f'{name}-deployment', namespace=namespace, body=deployment_manifest)
        logger.info(f"Deployment {name}-deployment updated")
    except kubernetes.client.rest.ApiException as e:
        if e.status == 404:
            # If it does not exist, create it
            api.create_namespaced_deployment(namespace=namespace, body=deployment_manifest)
            logger.info(f"Deployment {name}-deployment created")
        else:
            raise

    # Create or update the Service
    try:
        v1.read_namespaced_service(name=f'{name}-service', namespace=namespace)
        v1.patch_namespaced_service(name=f'{name}-service', namespace=namespace, body=service_manifest)
        logger.info(f"Service {name}-service updated")
    except kubernetes.client.rest.ApiException as e:
        if e.status == 404:
            v1.create_namespaced_service(namespace=namespace, body=service_manifest)
            logger.info(f"Service {name}-service created")
        else:
            raise

    # The returned dict is stored in the custom resource's status (optional)
    return {'status': 'deployed', 'url': f'http://{name}-service.{namespace}.svc.cluster.local'}


# Triggered when a WebSite resource is deleted
@kopf.on.delete('example.com', 'v1', 'websites')
def delete_website(name, namespace, logger, **kwargs):
    api = kubernetes.client.AppsV1Api()
    v1 = kubernetes.client.CoreV1Api()

    # Delete the Deployment
    try:
        api.delete_namespaced_deployment(name=f'{name}-deployment', namespace=namespace)
        logger.info(f"Deployment {name}-deployment deleted")
    except kubernetes.client.rest.ApiException as e:
        if e.status != 404:
            raise

    # Delete the Service
    try:
        v1.delete_namespaced_service(name=f'{name}-service', namespace=namespace)
        logger.info(f"Service {name}-service deleted")
    except kubernetes.client.rest.ApiException as e:
        if e.status != 404:
            raise
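
One way to keep the handler short, and to test the manifests without a cluster, is to move their construction into pure functions. The sketch below assumes the same naming scheme as operator.py (a -deployment and a -service suffix). The function names build_deployment and build_service are illustrative and are not part of Kopf or the Kubernetes client.

```python
def build_deployment(name, namespace, image, replicas=1):
    """Build the Deployment manifest for a WebSite (illustrative helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": f"{name}-deployment",
            "namespace": namespace,
            "labels": dict(labels),
        },
        "spec": {
            "replicas": replicas,
            # The selector must match the Pod template labels,
            # otherwise the API server rejects the Deployment.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{
                        "name": "web",
                        "image": image,
                        "ports": [{"containerPort": 80}],
                    }]
                },
            },
        },
    }


def build_service(name, namespace):
    """Build the ClusterIP Service that routes to the Deployment's Pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{name}-service", "namespace": namespace},
        "spec": {
            # Same label as the Pod template, so the Service finds the Pods
            "selector": {"app": name},
            "ports": [{"protocol": "TCP", "port": 80, "targetPort": 80}],
            "type": "ClusterIP",
        },
    }


deployment = build_deployment("my-nginx", "default", "nginx:latest", replicas=2)
service = build_service("my-nginx", "default")
print(deployment["metadata"]["name"], service["metadata"]["name"])
```

With helpers like these, the handler only has to pass their results as the body argument of the create and patch calls.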

Step 3: Containerize the Operator

To run the Operator inside the Kubernetes cluster, we need to package it as a Docker image.

Create the Dockerfile:

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY operator.py .

CMD ["kopf", "run", "/app/operator.py", "--verbose"]

Create requirements.txt:

kopf
kubernetes
pyyaml

Build the image (assuming it is named my-website-operator:latest):

docker build -t my-website-operator:latest .

Step 4: Deploy the Operator to Kubernetes

The Operator needs RBAC permissions to create and delete Deployments and Services. Create a file named rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: website-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: website-operator
rules:
- apiGroups: ["example.com"]
  resources: ["websites"]
  verbs: ["get", "list", "watch", "patch", "update"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["*"]
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: website-operator
subjects:
- kind: ServiceAccount
  name: website-operator
  namespace: default
roleRef:
  kind: ClusterRole
  name: website-operator
  apiGroup: rbac.authorization.k8s.io
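
To see which operations the ClusterRole above permits, the rules can be modeled in a few lines of Python. This is a deliberately simplified sketch of RBAC matching: it only compares apiGroups, resources, and verbs, treats "*" as a wildcard, and ignores resourceNames, subresources, and non-resource URLs. The function names rule_allows and role_allows are hypothetical.

```python
# The rules of the website-operator ClusterRole, copied from rbac.yaml
RULES = [
    {"apiGroups": ["example.com"], "resources": ["websites"],
     "verbs": ["get", "list", "watch", "patch", "update"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["*"]},
    {"apiGroups": [""], "resources": ["services"], "verbs": ["*"]},
    {"apiGroups": ["apiextensions.k8s.io"],
     "resources": ["customresourcedefinitions"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["events"],
     "verbs": ["create", "patch", "update"]},
]


def rule_allows(rule, api_group, resource, verb):
    """Check a single rule; "*" acts as a wildcard in each list."""
    def matches(values, wanted):
        return "*" in values or wanted in values

    return (matches(rule["apiGroups"], api_group)
            and matches(rule["resources"], resource)
            and matches(rule["verbs"], verb))


def role_allows(rules, api_group, resource, verb):
    """RBAC is purely additive: a request passes if any rule allows it."""
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


print(role_allows(RULES, "apps", "deployments", "delete"))      # True
print(role_allows(RULES, "example.com", "websites", "delete"))  # False
print(role_allows(RULES, "", "pods", "get"))                    # False
```

The second result shows that the Operator can patch WebSite objects but cannot delete them, and the third that it has no access to Pods.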

Create the Deployment manifest, deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: website-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: website-operator
  template:
    metadata:
      labels:
        app: website-operator
    spec:
      serviceAccountName: website-operator
      containers:
      - name: operator
        image: my-website-operator:latest
        imagePullPolicy: IfNotPresent

Apply the RBAC rules and the Deployment:

kubectl apply -f rbac.yaml
kubectl apply -f deployment.yaml

Check the Operator logs:

kubectl logs -f deployment/website-operator

Step 5: Test the Operator

Now you can create an instance of the custom resource. Create a file named example-website.yaml:

apiVersion: example.com/v1
kind: WebSite
metadata:
  name: my-nginx
spec:
  image: nginx:latest
  replicas: 2

Apply it:

kubectl apply -f example-website.yaml

Check the Operator logs. They should show messages about the Deployment and the Service being created. Then inspect the Kubernetes resources:

kubectl get deployments
kubectl get services
kubectl get websites

You should see resources named my-nginx-deployment and my-nginx-service. Delete the custom resource:

kubectl delete website my-nginx

The corresponding Deployment and Service are deleted automatically as well.

Further exploration

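  • Status updates: Kopf stores the dictionary returned by create_or_update_website in the resource's status field, under the handler's name (status.create_or_update_website), as long as the CRD schema does not prune it. You can inspect it with kubectl get website my-nginx -o yaml.
  • Error handling: Kopf provides a retry mechanism, which can be controlled through the retries parameter of @kopf.on.create.
  • More complex logic: you can add further custom controller behavior to fit your business requirements.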
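
As a small illustration of the status point above, the following sketch reads the URL back out of a WebSite object. It assumes Kopf's default behavior of storing a handler's return value under status, keyed by the handler's name. The sample object is a hypothetical, abridged version of what kubectl get website my-nginx -o json might return, and the helper name website_url is illustrative.

```python
import json


def website_url(resource, handler="create_or_update_website"):
    """Return the URL recorded in the status, or None if it is not there yet."""
    return resource.get("status", {}).get(handler, {}).get("url")


# Hypothetical, abridged output of `kubectl get website my-nginx -o json`
sample = json.loads("""
{
  "apiVersion": "example.com/v1",
  "kind": "WebSite",
  "metadata": {"name": "my-nginx", "namespace": "default"},
  "spec": {"image": "nginx:latest", "replicas": 2},
  "status": {
    "create_or_update_website": {
      "status": "deployed",
      "url": "http://my-nginx-service.default.svc.cluster.local"
    }
  }
}
""")

print(website_url(sample))
# prints http://my-nginx-service.default.svc.cluster.local
```

For an object whose handler has not finished yet, the function returns None instead of raising an error.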

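Summary

This introductory guide showed how to build a Kubernetes Operator quickly with Python and Kopf. Kopf encapsulates low-level details such as event watching and resource change handling, so developers can focus on the business logic.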

posted @ 2026-02-15 00:39  wanghongwei-dev