To run a custom controller in Kubernetes (for example, an Operator written in Python), deploy it as a Pod inside the cluster. The detailed steps and a complete example follow:
---
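### **Step 1: Containerize the controller**
Package the controller code as a Docker image so that it can run in Kubernetes.
#### **1.1 Write the Dockerfile**
```dockerfile
# Dockerfile
FROM python:3.9-slim
# Install dependencies
RUN pip install --no-cache-dir kubernetes pyyaml
# Copy the controller code
COPY controller.py /app/controller.py
# Set the startup command (-u keeps stdout unbuffered so logs show up immediately)
CMD ["python", "-u", "/app/controller.py"]
```
#### **1.2 Build and push the image**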
```bash
docker build -t your-registry/my-controller:v1 .
docker push your-registry/my-controller:v1
```
---
### **Step 2: Create the Kubernetes manifests**
Write the Deployment and RBAC manifests.
#### **2.1 Deployment**
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-controller
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-controller
  template:
    metadata:
      labels:
        app: my-controller
    spec:
      serviceAccountName: my-controller-sa  # ServiceAccount bound in the RBAC manifest
      containers:
        - name: controller
          image: your-registry/my-controller:v1
          imagePullPolicy: Always
```
#### **2.2 RBAC**
```yaml
# rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-controller-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-controller-role
rules:
  - apiGroups: ["example.com"]    # API group of the CRD
    resources: ["myapps"]         # plural resource name of the CRD
    verbs: ["get", "list", "watch"]
  - apiGroups: ["example.com"]
    resources: ["myapps/status"]  # needed only if the controller writes status
    verbs: ["get", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["*"]
  - apiGroups: [""]               # core API group (Pods, Events, and so on)
    resources: ["pods", "events"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-controller-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-controller-role
subjects:
  - kind: ServiceAccount
    name: my-controller-sa
    namespace: default
```
---
### **Step 3: Deploy the controller**
#### **3.1 Apply the manifests**
```bash
kubectl apply -f rbac.yaml        # ServiceAccount and RBAC
kubectl apply -f deployment.yaml  # controller Deployment
```
#### **3.2 Verify the Pod status**
```bash
kubectl get pods -l app=my-controller
# Expected output (the Pod name suffix and age will differ):
# NAME                             READY   STATUS    RESTARTS   AGE
# my-controller-5d8f7b6d4b-2xqzv   1/1     Running   0          10s
```
---
### **Step 4: Test the controller**
#### **4.1 Create a MyApp instance**
```bash
kubectl apply -f myapp-instance.yaml  # a MyApp custom resource (CR) manifest
```
#### **4.2 Check the created resources**
```bash
kubectl get myapp                # list the custom resources
kubectl get deployments          # check the generated Deployment
kubectl describe myapp demo-app  # inspect events and status
```
---
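### **Key configuration explained**
#### **1. RBAC**
- **ServiceAccount**: gives the controller an identity in the cluster.
- **ClusterRole**: defines the permissions the controller needs (watching the custom resource, managing Deployments, and so on).
- **ClusterRoleBinding**: grants the ClusterRole to the ServiceAccount.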
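To make the rule semantics concrete, here is a minimal sketch that checks whether a list of RBAC rules grants a verb on a resource. It is a simplified model rather than the API server's authorizer: it ignores `resourceNames` and non-resource URLs, the helper names `rule_allows` and `is_allowed` are hypothetical, and the rule set is deliberately small so that the subresource case stands out.

```python
# Simplified model of RBAC rule matching (illustrative only; the real
# authorizer also handles resourceNames, nonResourceURLs, and aggregation).

def rule_allows(rule, api_group, resource, verb):
    """Return True if a single rule grants `verb` on `resource`."""
    def matches(allowed, value):
        return "*" in allowed or value in allowed

    return (
        matches(rule.get("apiGroups", []), api_group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
    )


def is_allowed(rules, api_group, resource, verb):
    """RBAC is additive: access is granted if any rule matches."""
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


# A deliberately small example rule set.
RULES = [
    {"apiGroups": ["example.com"], "resources": ["myapps"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["*"]},
    {"apiGroups": [""], "resources": ["pods", "events"],
     "verbs": ["get", "list"]},
]

print(is_allowed(RULES, "example.com", "myapps", "watch"))         # True
print(is_allowed(RULES, "apps", "deployments", "create"))          # True
# Subresources are separate RBAC resources and must be listed explicitly.
print(is_allowed(RULES, "example.com", "myapps/status", "patch"))  # False
```

Against a live cluster, `kubectl auth can-i watch myapps.example.com --as=system:serviceaccount:default:my-controller-sa` answers the same question authoritatively.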
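#### **2. High availability**
- **More replicas**: raise the Deployment's `replicas` field so that a standby instance can take over if one fails.
- **Leader election**: add a distributed lock (implemented in the controller code) so that only one replica reconciles at a time; without it, several replicas act on the same events and conflict.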
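The leader-election decision can be sketched without a cluster. The example below models a lease as a plain dict and decides whether a replica may take or renew leadership. The field names follow the `coordination.k8s.io/v1` Lease spec (`holderIdentity`, `renewTime`, `leaseDurationSeconds`), but timestamps are simplified to float seconds and the function `try_acquire` is hypothetical. A real controller has to write the lease through the API server so that optimistic concurrency (`resourceVersion`) resolves races between replicas.

```python
import time


def try_acquire(lease, identity, now=None):
    """Return an updated lease if `identity` may lead, else None.

    A replica may lead when the lease is unheld, already held by itself
    (a renewal), or expired (the previous holder stopped renewing).
    """
    now = time.time() if now is None else now
    duration = lease.get("leaseDurationSeconds", 15)
    holder = lease.get("holderIdentity")
    expires_at = lease.get("renewTime", 0.0) + duration

    if holder in (None, identity) or now >= expires_at:
        return {
            "holderIdentity": identity,
            "renewTime": now,
            "leaseDurationSeconds": duration,
        }
    return None  # another replica holds a live lease; stay passive


lease = {"leaseDurationSeconds": 15}
lease = try_acquire(lease, "pod-a", now=100.0)  # pod-a becomes leader
print(try_acquire(lease, "pod-b", now=105.0))   # None: lease is still live
print(try_acquire(lease, "pod-b", now=116.0)["holderIdentity"])  # pod-b
```

Only the replica that currently holds the lease should run the reconcile loop; the others keep retrying until the lease expires.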
#### **3. Logs and monitoring**
- **View logs**:
```bash
kubectl logs -l app=my-controller --tail=100
```
- **Prometheus monitoring**: expose a metrics endpoint and configure a ServiceMonitor (a Prometheus Operator resource).
---
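### **Deployment flow diagram**
```mermaid
graph TD
A[Write controller code] --> B[Build Docker image]
B --> C[Push image to registry]
C --> D[Define RBAC permissions]
D --> E[Create Deployment]
E --> F[Controller watches custom resources]
F --> G[Handle resource changes]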
```
---
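### **Troubleshooting**
#### **1. The controller does not start**
- **What to check**:
```bash
kubectl describe pod/my-controller-xxx  # look for errors under Events
kubectl logs pod/my-controller-xxx      # check the startup logs
```
- **Likely causes**:
  - Image pull failure (check the image path and registry credentials)
  - Insufficient RBAC permissions: API calls return 403 (check the ClusterRoleBinding)
#### **2. The controller does not react to custom resource changes**
- **What to check**:
```bash
kubectl get crd/myapps.example.com  # confirm the CRD is registered
kubectl get myapp                   # confirm the CR instance exists
```
- **Likely causes**:
  - The controller code does not handle the event correctly
  - The ClusterRole lacks the `watch` verb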
---
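### **Summary**
Running a controller in Kubernetes comes down to three steps: **containerize the code → configure RBAC → deploy it as a Pod**. A Deployment manages the controller's lifecycle, and RBAC should grant only the permissions the controller needs. In production, also add a readiness probe, resource limits, and monitoring with alerting.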
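One way to back the readiness probe mentioned above is a tiny HTTP endpoint served from a background thread. This is a sketch using only the standard library; the `/readyz` path, port 8080, and the `STATE` flag are assumptions, and the Deployment would still need a matching `readinessProbe` entry.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Shared flag that the watch loop would flip once its first watch succeeds.
STATE = {"ready": False}


def readiness_response(path, state):
    """Map a request path and the controller state to (status_code, body)."""
    if path != "/readyz":
        return 404, b"not found"
    return (200, b"ok") if state["ready"] else (503, b"not ready")


class ReadinessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = readiness_response(self.path, STATE)
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep probe traffic out of the controller log


def start_readiness_server(port=8080):
    """Serve the probe endpoint on a daemon thread and return the server."""
    server = HTTPServer(("0.0.0.0", port), ReadinessHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The controller would call `start_readiness_server()` once at startup and set `STATE["ready"] = True` after the watch is established.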
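The following is a complete implementation example of a **Kubernetes custom controller**, with explanations covering the CRD definition, the controller logic, the deployment flow, and the core principles: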
---
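### **I. Example architecture**
```mermaid
graph TD
A[Define the CRD] --> B[Write the controller]
B --> C[Containerize the controller]
C --> D[Deploy to the cluster]
D --> E[User creates a CR instance]
E --> F[Controller creates resources automatically]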
```
---
### **II. Complete code and configuration**
#### **1. Define the CRD (CustomResourceDefinition)**
```yaml
# myapp-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com
spec:
  group: example.com
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
                image:
                  type: string
            status:              # written by the controller
              type: object
              properties:
                phase:
                  type: string
      subresources:
        status: {}               # enables the /status endpoint that the controller patches
  scope: Namespaced
  names:
    plural: myapps
    singular: myapp
    kind: MyApp
    shortNames:
      - ma
```
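The API server enforces this schema, but mirroring it in the controller makes defaulting explicit and easy to test. The following sketch checks a `spec` dict against the same constraints (`replicas` is an integer of at least 1, `image` is a string). The function name `normalize_spec` is hypothetical, and the defaults are borrowed from the fallback values used in the controller's `reconcile` function.

```python
DEFAULT_REPLICAS = 1
DEFAULT_IMAGE = "nginx:alpine"


def normalize_spec(spec):
    """Apply defaults and re-check the CRD constraints.

    Raises ValueError when the spec violates a constraint.
    """
    replicas = spec.get("replicas", DEFAULT_REPLICAS)
    image = spec.get("image", DEFAULT_IMAGE)

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(replicas, bool) or not isinstance(replicas, int):
        raise ValueError("spec.replicas must be an integer")
    if replicas < 1:
        raise ValueError("spec.replicas must be >= 1")
    # Stricter than the CRD (an assumption): also reject an empty string.
    if not isinstance(image, str) or not image:
        raise ValueError("spec.image must be a non-empty string")

    return {"replicas": replicas, "image": image}


print(normalize_spec({"replicas": 3, "image": "nginx:1.23"}))
# {'replicas': 3, 'image': 'nginx:1.23'}
print(normalize_spec({}))
# {'replicas': 1, 'image': 'nginx:alpine'}
```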
#### **2. Controller code (Python)**
```python
# controller.py
import time

from kubernetes import client, config, watch

# Load the Kubernetes configuration
config.load_incluster_config()  # for use inside the cluster
api = client.CustomObjectsApi()
apps_api = client.AppsV1Api()

# CRD parameters
GROUP = "example.com"
VERSION = "v1alpha1"
PLURAL = "myapps"


def create_deployment(name, namespace, replicas, image):
    """Create a Deployment from a MyApp resource."""
    labels = {"app": name}
    deployment = client.V1Deployment(
        # Label the Deployment itself so it can be found with -l app=<name>
        metadata=client.V1ObjectMeta(name=f"{name}-deploy", labels=labels),
        spec=client.V1DeploymentSpec(
            replicas=replicas,
            selector=client.V1LabelSelector(match_labels=labels),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="main",
                            image=image,
                            ports=[client.V1ContainerPort(container_port=80)],
                        )
                    ]
                ),
            ),
        ),
    )
    apps_api.create_namespaced_deployment(namespace, deployment)


def update_status(name, namespace, phase):
    """Patch the MyApp status (requires the status subresource on the CRD)."""
    patch = {"status": {"phase": phase}}
    api.patch_namespaced_custom_object_status(
        GROUP, VERSION, namespace, PLURAL, name, patch)


def reconcile(obj):
    """Core logic of the reconciliation loop."""
    spec = obj.get("spec", {})
    meta = obj.get("metadata", {})
    name = meta.get("name")
    namespace = meta.get("namespace")
    replicas = spec.get("replicas", 1)
    image = spec.get("image", "nginx:alpine")
    try:
        # Check whether the Deployment already exists
        apps_api.read_namespaced_deployment(f"{name}-deploy", namespace)
        phase = "Ready"
    except client.ApiException as e:
        if e.status != 404:
            print(f"Error: {e}")
            return
        create_deployment(name, namespace, replicas, image)
        phase = "Created"
    # Kept outside the try block so that a failed status patch is not
    # mistaken for a missing Deployment.
    update_status(name, namespace, phase)


def main():
    """Main watch loop."""
    w = watch.Watch()
    while True:
        try:
            for event in w.stream(
                api.list_cluster_custom_object,
                group=GROUP,
                version=VERSION,
                plural=PLURAL,
            ):
                obj = event["object"]
                print(f"Event: {event['type']} {obj['metadata']['name']}")
                if event["type"] == "DELETED":
                    continue  # the object is gone; do not recreate its Deployment
                try:
                    reconcile(obj)
                except client.ApiException as e:
                    print(f"Reconcile error: {e}")
        except Exception as e:
            print(f"Watch error: {e}")
            time.sleep(5)


if __name__ == "__main__":
    main()
```
#### **3. Dockerfile**
```dockerfile
# Dockerfile
FROM python:3.9-slim
RUN pip install --no-cache-dir kubernetes
COPY controller.py /app/
# -u keeps stdout unbuffered so print() output reaches kubectl logs immediately
CMD ["python", "-u", "/app/controller.py"]
```
#### **4. RBAC**
```yaml
# rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-controller-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: myapp-controller-role
rules:
  - apiGroups: ["example.com"]
    resources: ["myapps"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["example.com"]
    resources: ["myapps/status"]  # the controller patches the status subresource
    verbs: ["get", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: ["pods", "events"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: myapp-controller-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: myapp-controller-role
subjects:
  - kind: ServiceAccount
    name: myapp-controller-sa
    namespace: default
```
#### **5. Controller Deployment**
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp-controller
  template:
    metadata:
      labels:
        app: myapp-controller
    spec:
      serviceAccountName: myapp-controller-sa
      containers:
        - name: controller
          image: your-registry/my-controller:v1
          imagePullPolicy: Always
```
---
### **III. Deployment and testing**
#### **1. Deploy all components**
```bash
# Apply the CRD
kubectl apply -f myapp-crd.yaml
# Apply the RBAC manifests
kubectl apply -f rbac.yaml
# Build and push the controller image
docker build -t your-registry/my-controller:v1 .
docker push your-registry/my-controller:v1
# Deploy the controller
kubectl apply -f deployment.yaml
# Verify that the controller is running
kubectl get pods -l app=myapp-controller
```
#### **2. Create a custom resource instance**
```yaml
# myapp-instance.yaml
apiVersion: example.com/v1alpha1
kind: MyApp
metadata:
  name: demo-app
spec:
  replicas: 3
  image: nginx:1.23
```
```bash
kubectl apply -f myapp-instance.yaml
```
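#### **3. Verify the result**
```bash
# Show the MyApp resources
kubectl get myapp -o wide
# Check the generated Deployment (the controller names it <name>-deploy)
kubectl get deployment demo-app-deploy
# Inspect the resource, including its status and any events
kubectl describe myapp demo-app
```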
---
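### **IV. Core principles**
#### **1. Reconciliation loop**
- **Watch**: receive MyApp changes in near real time through the Kubernetes watch API
- **Compare**: diff `spec` (the desired state) against the actual state in the cluster
- **Act**: create, update, or delete the dependent resources (such as the Deployment)
- **Report**: write the outcome to the `status` field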
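The compare-then-act step can be isolated as a pure function, which makes it easy to unit test. This sketch takes the desired `spec` and a simplified view of the existing Deployment (or `None`) and returns the action to take. The dict shapes and the name `plan_action` are illustrative assumptions; unlike the example controller, which only creates a missing Deployment, it also detects drift in `replicas` and `image`.

```python
def plan_action(desired, actual):
    """Decide what a reconcile pass should do.

    desired: {"replicas": int, "image": str} taken from the MyApp spec
    actual:  the same shape read from the live Deployment, or None if absent
    """
    if actual is None:
        return ("create", desired)

    # Collect only the fields that drifted from the desired state.
    drift = {k: v for k, v in desired.items() if actual.get(k) != v}
    if drift:
        return ("patch", drift)
    return ("noop", {})


desired = {"replicas": 3, "image": "nginx:1.23"}
print(plan_action(desired, None))
# ('create', {'replicas': 3, 'image': 'nginx:1.23'})
print(plan_action(desired, {"replicas": 1, "image": "nginx:1.23"}))
# ('patch', {'replicas': 3})
print(plan_action(desired, dict(desired)))
# ('noop', {})
```

Keeping the decision separate from the API calls means the same function can drive both the watch handler and a periodic resync.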
#### **2. Event-driven flow**
```mermaid
sequenceDiagram
participant User
participant APIServer
participant Controller
participant Deployment
User->>APIServer: Create MyApp instance
APIServer->>Controller: Send ADDED event
Controller->>APIServer: Query current state
Controller->>Deployment: Create new Deployment
Controller->>APIServer: Update MyApp status
```
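#### **3. Key design patterns**
- **Declarative API**: users declare only the desired state
- **Horizontal scaling**: run several controller replicas for high availability, with leader election so that only one is active at a time
- **Idempotent operations**: running reconcile repeatedly on the same input produces no additional side effects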
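Idempotency is easiest to see against a fake cluster. In this sketch a dict stands in for the API server, and `ensure_deployment` (a hypothetical helper) converges the store to the desired state. Calling it repeatedly with the same input performs a single write.

```python
class FakeCluster:
    """In-memory stand-in for the Deployments API; counts write operations."""

    def __init__(self):
        self.deployments = {}
        self.writes = 0

    def get(self, key):
        return self.deployments.get(key)

    def put(self, key, spec):
        self.deployments[key] = dict(spec)
        self.writes += 1


def ensure_deployment(cluster, namespace, name, spec):
    """Converge one Deployment to `spec`; return True if a write was issued."""
    key = (namespace, f"{name}-deploy")
    if cluster.get(key) == spec:
        return False  # already converged, nothing to do
    cluster.put(key, spec)
    return True


cluster = FakeCluster()
spec = {"replicas": 3, "image": "nginx:1.23"}
for _ in range(3):  # replayed events must not cause extra writes
    ensure_deployment(cluster, "default", "demo-app", spec)
print(cluster.writes)  # 1
```

This property matters because a watch that reconnects replays existing objects as new events, so the same object reaches reconcile many times.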
---
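### **V. Further extensions**
#### **1. Additional features**
- **Finalizers**: run cleanup logic before a resource is deleted
```yaml
metadata:
  finalizers: ["example.com/finalizer"]
```
- **Status subresource**: enable `/status` so that `status` updates are authorized separately from `spec`; `patch_namespaced_custom_object_status` requires it
```yaml
subresources:
  status: {}
```
- **Version migration**: implement conversion between multiple CRD versions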
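The finalizer flow is a small state machine driven by `metadata.deletionTimestamp` and `metadata.finalizers`. The sketch below returns the next step for an object dict as delivered by the watch. The finalizer name matches the snippet above; the function name `finalizer_step` and the action labels are assumptions, and the actual cleanup and metadata patch calls are left out.

```python
FINALIZER = "example.com/finalizer"


def finalizer_step(obj):
    """Return the next action for a watched object.

    "add-finalizer": live object that is not yet protected
    "cleanup":       deletion requested; clean up, then remove the finalizer
    "reconcile":     live object that is already protected
    "ignore":        deletion in progress and our finalizer is already gone
    """
    meta = obj.get("metadata", {})
    finalizers = meta.get("finalizers") or []
    deleting = meta.get("deletionTimestamp") is not None

    if deleting:
        return "cleanup" if FINALIZER in finalizers else "ignore"
    return "reconcile" if FINALIZER in finalizers else "add-finalizer"


print(finalizer_step({"metadata": {"name": "demo-app"}}))
# add-finalizer
print(finalizer_step({"metadata": {"finalizers": [FINALIZER]}}))
# reconcile
print(finalizer_step({"metadata": {
    "finalizers": [FINALIZER],
    "deletionTimestamp": "2024-01-01T00:00:00Z",
}}))
# cleanup
```

With a finalizer set, deleting the object first produces a `MODIFIED` event that carries `deletionTimestamp`; the `DELETED` event arrives only after the last finalizer has been removed.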
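#### **2. Production hardening**
- **Leader election**: elect a leader among multiple controller replicas
```python
from kubernetes.client import CoordinationV1Api
# Lease objects (coordination.k8s.io/v1) are the usual lock for leader election
lease = CoordinationV1Api().create_namespaced_lease(...)
```
- **Metrics**: integrate with Prometheus
```python
from prometheus_client import start_http_server
# Expose metrics over HTTP on port 8000
start_http_server(8000)
```
- **Resource limits**: set resource requests and limits on the Pod
```yaml
resources:
  limits:
    cpu: 500m
    memory: 256Mi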
```
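The quantity strings in the limits above use Kubernetes suffixes: `m` means thousandths of a CPU core, and `Mi` means mebibytes. A minimal parser makes the arithmetic explicit. It covers only a few common suffixes, and the names `parse_cpu` and `parse_memory` are hypothetical.

```python
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9}


def parse_cpu(quantity):
    """Convert a CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as "256Mi" or "1G" to bytes."""
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


print(parse_cpu("500m"))      # 0.5
print(parse_memory("256Mi"))  # 268435456
```

So the limits above cap the controller at half a core and 256 MiB of memory.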
---
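### **VI. Troubleshooting guide**
| **Symptom** | **What to check** |
|---|---|
| Controller fails to start | 1. Check that the image reference is correct<br>2. Read the Pod logs with `kubectl logs <pod-name>` |
| MyApp status is not updated | 1. Confirm the RBAC configuration<br>2. Check whether the controller handles the event<br>3. Verify that the CRD is registered |
| Deployment is not created | 1. Read the controller logs<br>2. Confirm that the ServiceAccount is bound correctly<br>3. Check API version compatibility |
| Resources are not cleaned up after deletion | 1. Implement finalizer logic<br>2. Handle `DELETED` watch events |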
---
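With this example, you can build a complete loop in Kubernetes, from defining a custom resource to automating its operation. The same pattern applies to more complex cases such as database management, CI/CD pipelines, and multi-cloud orchestration, and it fits an infrastructure-as-code (IaC) workflow.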