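CRD + Controller Operator

To run a custom controller in Kubernetes (for example, an Operator written in Python), deploy it as a Pod inside the cluster. The detailed steps and a complete example follow.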

---

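### **Step 1: Containerize the controller**
Package the controller code as a Docker image so that it can run in Kubernetes.

#### **1.1 Write the Dockerfile**
```dockerfile
# Dockerfile
FROM python:3.9-slim

# Flush print() output immediately so it shows up in `kubectl logs`
ENV PYTHONUNBUFFERED=1

# Install dependencies
RUN pip install --no-cache-dir kubernetes pyyaml

# Copy the controller code
COPY controller.py /app/controller.py

# Set the startup command
CMD ["python", "/app/controller.py"]
```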

#### **1.2 Build and push the image**
```bash
docker build -t your-registry/my-controller:v1 .
docker push your-registry/my-controller:v1
```

---

### **Step 2: Create the Kubernetes deployment manifests**
Write the Deployment and the RBAC configuration files.

#### **2.1 Deployment configuration**
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-controller
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-controller
  template:
    metadata:
      labels:
        app: my-controller
    spec:
      serviceAccountName: my-controller-sa  # ServiceAccount bound by RBAC
      containers:
        - name: controller
          image: your-registry/my-controller:v1
          imagePullPolicy: Always
```

#### **2.2 RBAC configuration**
```yaml
# rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-controller-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-controller-role
rules:
  - apiGroups: ["example.com"]        # API group of the CRD
    resources: ["myapps"]             # plural resource name of the CRD
    verbs: ["get", "list", "watch"]
  - apiGroups: ["example.com"]
    resources: ["myapps/status"]      # status subresource written by the controller
    verbs: ["get", "patch", "update"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]                   # core API group (Pods, Events, ...)
    resources: ["pods", "events"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-controller-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-controller-role
subjects:
  - kind: ServiceAccount
    name: my-controller-sa
    namespace: default
```

---

### **Step 3: Deploy the controller**
#### **3.1 Apply all manifests**
```bash
kubectl apply -f rbac.yaml        # RBAC configuration
kubectl apply -f deployment.yaml  # controller Deployment
```

#### **3.2 Verify the Pod status**
```bash
kubectl get pods -l app=my-controller
# Expected output:
# NAME                             READY   STATUS    RESTARTS   AGE
# my-controller-5d8f7b6d4b-2xqzv   1/1     Running   0          10s
```

---

### **Step 4: Test the controller**
#### **4.1 Create a MyApp instance**
```bash
kubectl apply -f myapp-instance.yaml  # the MyApp custom resource (CR) manifest
```

#### **4.2 Check the created resources**
```bash
kubectl get myapp                 # view the CR
kubectl get deployments           # check the generated Deployment
kubectl describe myapp demo-app   # view events and status updates
```

---

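### **Key configuration explained**
#### **1. RBAC permissions**
- **ServiceAccount**: gives the controller an identity.
- **ClusterRole**: defines the permissions the controller needs (watching the custom resource, managing Deployments, and so on).
- **ClusterRoleBinding**: binds those permissions to the ServiceAccount.

#### **2. Controller high availability**
- **More replicas**: raise the `replicas` field of the Deployment so that a standby instance can take over when one fails.
- **Leader Election**: add a distributed lock (implemented in the controller code) so that only one replica reconciles at a time and instances do not conflict.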
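
The leader-election idea above can be sketched as a pure decision function over a lease record. This is a minimal sketch, not the algorithm of any particular library: `Lease` and `try_acquire` are hypothetical names, and the fields only loosely mirror a `coordination.k8s.io/v1` Lease (holder identity, renew time, lease duration). A real controller would read and write the Lease through the API server and rely on its optimistic concurrency to settle races.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical in-memory stand-in for a Lease object."""
    holder: Optional[str] = None   # identity of the current leader, if any
    renew_time: float = 0.0        # when the holder last renewed (seconds)
    duration: float = 15.0         # how long a renewal stays valid (seconds)


def try_acquire(lease: Lease, me: str, now: float) -> bool:
    """Return True if `me` is the leader after this attempt, updating the lease."""
    expired = now - lease.renew_time > lease.duration
    if lease.holder in (None, me) or expired:
        lease.holder = me
        lease.renew_time = now
        return True
    return False


lease = Lease()
print(try_acquire(lease, "replica-a", now=0.0))    # replica-a takes the free lease
print(try_acquire(lease, "replica-b", now=5.0))    # lease still valid, replica-b waits
print(try_acquire(lease, "replica-b", now=30.0))   # lease expired, replica-b takes over
```

Only the replica for which `try_acquire` returns `True` should run the reconcile logic; the others keep retrying until the lease expires.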

#### **3. Logging and monitoring**
- **View logs**:
```bash
kubectl logs -l app=my-controller --tail=100
```
- **Prometheus monitoring**: expose a metrics endpoint and configure a ServiceMonitor.

---

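### **Deployment workflow diagram**
```mermaid
graph TD
  A[Write the controller code] --> B[Build the Docker image]
  B --> C[Push the image to a registry]
  C --> D[Define RBAC permissions]
  D --> E[Create the Deployment]
  E --> F[Controller watches the custom resource]
  F --> G[Handle resource changes]
```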

---

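### **Troubleshooting common problems**
#### **1. The controller does not start**
- **Checks**:
```bash
kubectl describe pod/my-controller-xxx  # look for errors under Events
kubectl logs pod/my-controller-xxx      # inspect the startup log
```
- **Possible causes**:
  - Image pull failure (check the image path and registry credentials)
  - Insufficient RBAC permissions (check the ClusterRoleBinding)

#### **2. The controller does not react to custom resource changes**
- **Checks**:
```bash
kubectl get crd/myapps.example.com  # confirm the CRD is registered
kubectl get myapp                   # confirm the CR instance exists
```
- **Possible causes**:
  - The controller code does not handle the events correctly
  - The RBAC rules lack the `watch` verb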
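
To test the missing-`watch` hypothesis offline, the rule list of a ClusterRole can be evaluated in a few lines of Python. This is a simplified sketch: `allows` is a hypothetical helper that handles only the `*` wildcard and ignores `resourceNames` and non-resource URLs, so it approximates the API server's authorizer rather than reproducing it. The sample rules deliberately omit `watch` to show the failure.

```python
def allows(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    def match(values, wanted):
        return "*" in values or wanted in values

    return any(
        match(rule.get("apiGroups", []), api_group)
        and match(rule.get("resources", []), resource)
        and match(rule.get("verbs", []), verb)
        for rule in rules
    )


# Sample rules in the shape used by rbac.yaml; `watch` is left out on purpose
rules = [
    {"apiGroups": ["example.com"], "resources": ["myapps"], "verbs": ["get", "list"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["*"]},
]

for verb in ("list", "watch"):
    print(verb, allows(rules, "example.com", "myapps", verb))
```

On a live cluster, `kubectl auth can-i watch myapps.example.com --as=system:serviceaccount:default:my-controller-sa` asks the API server the same question directly.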

---

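### **Summary**
Starting a controller in Kubernetes comes down to three steps: **containerize the code → configure RBAC → deploy it as a Pod**. A Deployment manages the controller's lifecycle, and RBAC scopes its permissions according to the principle of least privilege. In production, also add a readiness probe, resource limits, and monitoring with alerting.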






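The following is a complete implementation example of a **Kubernetes custom controller**, with detailed explanations covering the CRD definition, the controller logic, the deployment workflow, and the core principles: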

---

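### **I. Example architecture**
```mermaid
graph TD
  A[Define the CRD] --> B[Write the controller]
  B --> C[Containerize the controller]
  C --> D[Deploy to the cluster]
  D --> E[User creates a CR instance]
  E --> F[Controller creates resources automatically]
```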

---

### **II. Complete code and configuration**

#### **1. Define the CRD (Custom Resource Definition)**
```yaml
# myapp-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com
spec:
  group: example.com
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
                image:
                  type: string
            status:              # written by the controller
              type: object
              properties:
                phase:
                  type: string
      subresources:
        status: {}               # enables the /status endpoint the controller patches
  scope: Namespaced
  names:
    plural: myapps
    singular: myapp
    kind: MyApp
    shortNames:
      - ma
```
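
The schema above makes the API server reject a `MyApp` whose `spec.replicas` is not an integer of at least 1, or whose `spec.image` is not a string. The sketch below mirrors those two constraints in plain Python to make the rule concrete; `validate_spec` is a hypothetical helper for illustration, not the API server's own validation.

```python
def validate_spec(spec):
    """Return a list of problems found in a MyApp spec (empty if valid)."""
    errors = []
    replicas = spec.get("replicas")
    if replicas is not None:
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(replicas, int) or isinstance(replicas, bool):
            errors.append("spec.replicas must be an integer")
        elif replicas < 1:
            errors.append("spec.replicas must be >= 1")
    image = spec.get("image")
    if image is not None and not isinstance(image, str):
        errors.append("spec.image must be a string")
    return errors


print(validate_spec({"replicas": 3, "image": "nginx:1.23"}))  # valid spec
print(validate_spec({"replicas": 0, "image": 42}))            # two violations
```

Both fields are optional in the schema, so an empty spec passes; the controller then falls back to its own defaults.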

#### **2. Controller code (Python)**
```python
# controller.py
import time

from kubernetes import client, config, watch
from kubernetes.client.rest import ApiException

# Load the Kubernetes configuration
config.load_incluster_config()  # for use inside the cluster
api = client.CustomObjectsApi()
apps_api = client.AppsV1Api()

# CRD parameters
GROUP = "example.com"
VERSION = "v1alpha1"
PLURAL = "myapps"

def create_deployment(name, namespace, replicas, image):
    """Create a Deployment from a MyApp resource."""
    deployment = client.V1Deployment(
        # Label the Deployment itself so `kubectl get deployments -l app=<name>` finds it
        metadata=client.V1ObjectMeta(name=f"{name}-deploy", labels={"app": name}),
        spec=client.V1DeploymentSpec(
            replicas=replicas,
            selector=client.V1LabelSelector(match_labels={"app": name}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": name}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="main",
                            image=image,
                            ports=[client.V1ContainerPort(container_port=80)],
                        )
                    ]
                ),
            ),
        ),
    )
    apps_api.create_namespaced_deployment(namespace, deployment)

def update_status(name, namespace, status):
    """Update the MyApp status (requires the status subresource on the CRD)."""
    patch = {"status": {"phase": status}}
    api.patch_namespaced_custom_object_status(
        GROUP, VERSION, namespace, PLURAL, name, patch)

def reconcile(obj):
    """Core logic of the reconciliation loop."""
    spec = obj.get("spec", {})
    meta = obj.get("metadata", {})
    name = meta.get("name")
    namespace = meta.get("namespace")
    replicas = spec.get("replicas", 1)
    image = spec.get("image", "nginx:alpine")
    deploy_name = f"{name}-deploy"

    try:
        try:
            # Check whether the Deployment exists
            apps_api.read_namespaced_deployment(deploy_name, namespace)
            phase = "Ready"
        except ApiException as e:
            if e.status != 404:
                raise
            # Only a missing Deployment triggers creation
            create_deployment(name, namespace, replicas, image)
            phase = "Created"
        update_status(name, namespace, phase)
    except ApiException as e:
        print(f"Error: {e}")

def main():
    """Main watch loop."""
    w = watch.Watch()
    while True:
        try:
            for event in w.stream(
                api.list_cluster_custom_object,
                group=GROUP,
                version=VERSION,
                plural=PLURAL
            ):
                print(f"Event: {event['type']} {event['object']['metadata']['name']}")
                # A deleted MyApp has no status to patch and needs no Deployment
                if event['type'] == "DELETED":
                    continue
                reconcile(event['object'])
        except Exception as e:
            print(f"Watch error: {e}")
            time.sleep(5)

if __name__ == "__main__":
    main()
```

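#### **3. Dockerfile for the controller**
```dockerfile
# Dockerfile
FROM python:3.9-slim
# Flush print() output immediately so it shows up in `kubectl logs`
ENV PYTHONUNBUFFERED=1
RUN pip install --no-cache-dir kubernetes
COPY controller.py /app/
CMD ["python", "/app/controller.py"]
```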

#### **4. RBAC configuration**
```yaml
# rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-controller-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: myapp-controller-role
rules:
- apiGroups: ["example.com"]
  resources: ["myapps"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: ["example.com"]
  resources: ["myapps/status"]   # the controller patches the status subresource
  verbs: ["get", "patch", "update"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods", "events"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: myapp-controller-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: myapp-controller-role
subjects:
- kind: ServiceAccount
  name: myapp-controller-sa
  namespace: default
```

#### **5. Controller Deployment**
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp-controller
  template:
    metadata:
      labels:
        app: myapp-controller
    spec:
      serviceAccountName: myapp-controller-sa
      containers:
      - name: controller
        image: your-registry/my-controller:v1
        imagePullPolicy: Always
```

---

### **III. Deployment and testing steps**

#### **1. Deploy all components**
```bash
# Apply the CRD
kubectl apply -f myapp-crd.yaml

# Apply the RBAC configuration
kubectl apply -f rbac.yaml

# Build and push the controller image
docker build -t your-registry/my-controller:v1 .
docker push your-registry/my-controller:v1

# Deploy the controller
kubectl apply -f deployment.yaml

# Verify that the controller is running
kubectl get pods -l app=myapp-controller
```

#### **2. Create a custom resource instance**
```yaml
# myapp-instance.yaml
apiVersion: example.com/v1alpha1
kind: MyApp
metadata:
  name: demo-app
spec:
  replicas: 3
  image: nginx:1.23
```

```bash
kubectl apply -f myapp-instance.yaml
```

#### **3. Verify the result**
```bash
# View the MyApp resource
kubectl get myapp -o wide

# Check the generated Deployment
kubectl get deployments -l app=demo-app

# Inspect the status and any events
kubectl describe myapp demo-app
```

---

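### **IV. Core principles in detail**

#### **1. Reconciliation loop**
- **Watching**: the controller uses the Kubernetes Watch API to receive changes to MyApp resources as they happen
- **State comparison**: it compares `spec` (the desired state) with the actual state of the cluster
- **Action**: it creates, updates, or deletes the associated resources (such as the Deployment)
- **Status update**: it writes the outcome to the `status` field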
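
The state-comparison step can be sketched as a pure function that takes the desired spec and the observed Deployment and returns the action to perform. This is an illustration under stated assumptions: `plan` is a hypothetical helper, the observed state is reduced to a dict with `replicas` and `image`, and the defaults match the ones used in `controller.py`. Unlike a create-only check, it also detects drift on an existing Deployment.

```python
def plan(desired, actual):
    """Compare a MyApp spec with the observed Deployment and pick an action.

    `desired` is the MyApp spec; `actual` is None when no Deployment exists,
    otherwise a dict with the observed `replicas` and `image`.
    """
    want = {
        "replicas": desired.get("replicas", 1),
        "image": desired.get("image", "nginx:alpine"),
    }
    if actual is None:
        return "create", want
    # Keep only the fields whose observed value differs from the desired one
    drift = {k: v for k, v in want.items() if actual.get(k) != v}
    if drift:
        return "update", drift
    return "noop", {}


spec = {"replicas": 3, "image": "nginx:1.23"}
print(plan(spec, None))
print(plan(spec, {"replicas": 1, "image": "nginx:1.23"}))
print(plan(spec, {"replicas": 3, "image": "nginx:1.23"}))
```

Keeping the decision separate from the API calls makes the comparison easy to unit test without a cluster.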

#### **2. Event-driven flow**
```mermaid
sequenceDiagram
  participant User
  participant APIServer
  participant Controller
  participant Deployment

  User->>APIServer: Create a MyApp instance
  APIServer->>Controller: Send ADDED event
  Controller->>APIServer: Query the current state
  Controller->>Deployment: Create a new Deployment
  Controller->>APIServer: Update the MyApp status
```

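#### **3. Key design patterns**
- **Declarative API**: users only declare the target state
- **Horizontal scaling**: multiple controller replicas provide high availability, with leader election keeping a single active reconciler
- **Idempotent operations**: running the reconcile logic repeatedly leaves the cluster in the same state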
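
Idempotency is easiest to see with a stand-in for the cluster. In the sketch below a plain dict plays the role of the API server (an assumption made purely for illustration), and the hypothetical `reconcile` reports whether it had to write anything. The second call with the same spec finds the state already converged and writes nothing.

```python
def reconcile(store, name, spec):
    """Bring store[name + '-deploy'] in line with spec; return True if a write happened."""
    desired = {
        "replicas": spec.get("replicas", 1),
        "image": spec.get("image", "nginx:alpine"),
    }
    key = f"{name}-deploy"
    if store.get(key) == desired:
        return False          # already converged, nothing to do
    store[key] = desired      # create or overwrite in one step
    return True


cluster = {}                  # hypothetical in-memory stand-in for the cluster
spec = {"replicas": 3, "image": "nginx:1.23"}
first = reconcile(cluster, "demo-app", spec)
second = reconcile(cluster, "demo-app", spec)
print(first, second, cluster)
```

Because a repeated call is harmless, the watch loop can safely replay events after a reconnect.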

---

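### **V. Suggestions for going further**

#### **1. Additional features**
- **Finalizers**: run cleanup when a resource is deleted
  ```yaml
  metadata:
    finalizers: ["example.com/finalizer"]
  ```
- **Status subresource**: separate the permission to update the `status` field
  ```yaml
  subresources:
    status: {}
  ```
- **Version migration**: implement conversion between multiple CRD versions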
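
The finalizer flow can be sketched as a decision on two metadata fields. When an object with finalizers is deleted, the API server sets `metadata.deletionTimestamp` and keeps the object until the `finalizers` list is empty. The helper `finalizer_action` and its return values are hypothetical names chosen for illustration; a real controller would follow each decision with the corresponding patch call.

```python
FINALIZER = "example.com/finalizer"


def finalizer_action(obj):
    """Decide what to do about the finalizer on a custom resource dict."""
    meta = obj.get("metadata", {})
    finalizers = meta.get("finalizers") or []
    being_deleted = meta.get("deletionTimestamp") is not None

    if being_deleted:
        # Clean up, then remove the finalizer so the API server can delete the object
        return "cleanup_and_remove" if FINALIZER in finalizers else "nothing"
    # Live object: make sure the finalizer is registered before doing any work
    return "nothing" if FINALIZER in finalizers else "add"


live = {"metadata": {"name": "demo-app"}}
deleting = {
    "metadata": {
        "name": "demo-app",
        "finalizers": [FINALIZER],
        "deletionTimestamp": "2025-05-10T14:50:00Z",
    }
}
print(finalizer_action(live))
print(finalizer_action(deleting))
```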

#### **2. Production hardening**
- **Leader Election**: elect a leader among multiple controller replicas
  ```python
  from kubernetes.client import CoordinationV1Api
  lease = CoordinationV1Api().create_namespaced_lease(...)
  ```
- **Metrics**: integrate Prometheus monitoring
  ```python
  # Requires the prometheus_client package in the image
  from prometheus_client import start_http_server
  start_http_server(8000)
  ```
- **Resource limits**: set resource requests and limits on the Pod
  ```yaml
  resources:
    limits:
      cpu: 500m
      memory: 256Mi
  ```

---

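### **VI. Troubleshooting guide**

| **Symptom**                          | **Checks**                                                                                                          |
|--------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| Controller fails to start            | 1. Check that the image address is correct<br>2. View the Pod logs with `kubectl logs <pod-name>`                   |
| MyApp status is not updated          | 1. Confirm the RBAC configuration<br>2. Check whether the controller handles events<br>3. Verify that the CRD is registered |
| Deployment is not created            | 1. View the controller logs<br>2. Confirm the ServiceAccount binding is correct<br>3. Check API version compatibility |
| Resources not cleaned up on deletion | 1. Implement finalizer logic<br>2. Handle DELETED events                                                            |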
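
For the last row, an alternative to finalizers is worth naming: when the child lives in the same namespace as its owner, an `ownerReferences` entry on the Deployment lets the Kubernetes garbage collector delete it together with the MyApp. This is a different technique from the finalizer approach in the table. The sketch builds the entry as a plain dict; `owner_reference` is a hypothetical helper and the `uid` value is a placeholder, since the real one comes from the object returned by the API server.

```python
def owner_reference(owner):
    """Build an ownerReferences entry pointing at a custom resource dict."""
    meta = owner["metadata"]
    return {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": meta["name"],
        "uid": meta["uid"],
        "controller": True,   # marks this owner as the managing controller
    }


myapp = {
    "apiVersion": "example.com/v1alpha1",
    "kind": "MyApp",
    "metadata": {"name": "demo-app", "namespace": "default", "uid": "hypothetical-uid"},
}
deployment_metadata = {
    "name": "demo-app-deploy",
    "namespace": "default",
    "ownerReferences": [owner_reference(myapp)],
}
print(deployment_metadata)
```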

---

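With this complete example, you can close the loop in Kubernetes from custom resource definition to automated operations. The same pattern applies to complex scenarios such as database management, CI/CD pipelines, and multi-cloud orchestration, and puts Infrastructure as Code (IaC) into practice.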
 

  

posted on 2025-05-10 14:50 by 吃草的青蛙
