Deploying a Zookeeper Cluster on Docker + CentOS with Kubernetes Operator Automation

Environment

Host IP          Hostname   Role     Data Directory         Kubernetes Node Label
192.168.10.100   zk1        Master   /opt/zookeeper/data    zk-cluster=true
192.168.10.101   zk2        Worker   /opt/zookeeper/data    zk-cluster=true
192.168.10.102   zk3        Worker   /opt/zookeeper/data    zk-cluster=true
192.168.10.103   zk4        Worker   /opt/zookeeper/data    zk-cluster=true
192.168.10.104   zk5        Worker   /opt/zookeeper/data    zk-cluster=true

I. Base Environment Setup (All Nodes)

1. System Configuration

bash
 
# Set the hostname (run on each machine with its own name)
sudo hostnamectl set-hostname zk1

# Add cluster entries to /etc/hosts
sudo tee -a /etc/hosts <<EOF
192.168.10.100 zk1
192.168.10.101 zk2
192.168.10.102 zk3
192.168.10.103 zk4
192.168.10.104 zk5
EOF

# Disable SELinux (setenforce takes effect immediately; the config edit survives reboots)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Tune kernel parameters
sudo tee -a /etc/sysctl.conf <<EOF
net.core.somaxconn=65535
net.ipv4.tcp_max_syn_backlog=65535
vm.swappiness=1
EOF
sudo sysctl -p

2. Installing Docker

bash
 
# Install prerequisites
sudo dnf install -y yum-utils device-mapper-persistent-data lvm2

# Add the Docker repository
sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo

# Install Docker
sudo dnf install -y docker-ce docker-ce-cli containerd.io

# Configure Docker: bounded JSON logs, and the systemd cgroup driver expected by kubelet
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF

# Start Docker
sudo systemctl start docker
sudo systemctl enable docker

3. Installing Kubernetes Components

bash
 
# Disable swap (kubelet refuses to start with swap enabled)
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Install kubeadm/kubelet/kubectl from the community-owned repository
# (the legacy packages.cloud.google.com repository has been shut down;
#  adjust v1.29 to the minor version you want to track)
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/repodata/repomd.xml.key
EOF

sudo dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet

# Initialize the control plane (on zk1 only)
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
  --control-plane-endpoint="zk1:6443" \
  --upload-certs \
  --apiserver-advertise-address=192.168.10.100

# Configure kubectl (on zk1)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install the Flannel network plugin
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# Join the workers (on zk2-zk5), using the join command printed by kubeadm init;
# if it was lost, regenerate it on zk1 with: kubeadm token create --print-join-command
sudo kubeadm join zk1:6443 --token <token> --discovery-token-ca-cert-hash <hash>

II. Deploying the Zookeeper Operator

1. Installing the Operator

bash
 
# Create a namespace for the operator
kubectl create ns zookeeper-operator

# Deploy the Operator
kubectl apply -f https://raw.githubusercontent.com/pravega/zookeeper-operator/master/deploy/all_ns/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/pravega/zookeeper-operator/master/deploy/all_ns/operator.yaml

# Verify the Operator is running
kubectl get pods -n zookeeper-operator

2. Creating the ZookeeperCluster Resource

zookeeper-cluster.yaml:

yaml
 
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zookeeper-cluster
  namespace: default
spec:
  replicas: 5
  image:
    repository: zookeeper
    tag: 3.8.0
  persistence:
    storageClassName: local-storage
    volumeReclaimPolicy: Retain
    size: 20Gi
  config:
    initLimit: 15
    syncLimit: 5
    tickTime: 2000
    autopurge:
      snapRetainCount: 10
      purgeInterval: 24
  pod:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - zookeeper
          topologyKey: kubernetes.io/hostname
    nodeSelector:
      zk-cluster: "true"
    securityContext:
      runAsUser: 1000
      fsGroup: 1000
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
  security:
    enable: true
    jaasConfig:
      secretRef: zk-jaas-secret
    tlsConfig:
      enable: true
      secretRef: zk-tls-secret
  metrics:
    enable: true
    port: 7000

3. Creating the Security Secrets

bash
 
# JAAS authentication configuration (passwords here are placeholders)
kubectl create secret generic zk-jaas-secret \
  --from-literal=jaas-config="Server {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_admin=\"adminpassword\"
    user_appuser=\"apppassword\";
};"

# TLS certificate configuration
# (keystore.jks must be generated beforehand)
kubectl create secret generic zk-tls-secret \
  --from-file=keystore.jks=keystore.jks \
  --from-literal=keystore-password=changeit

4. Creating the Storage Class

local-storage.yaml:

yaml
 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
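Because `kubernetes.io/no-provisioner` performs no dynamic provisioning, a local PersistentVolume must be created by hand for every node that will host a member. A sketch for one node, with path and hostname taken from the environment table (repeat per node):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: zk-data-zk2
spec:
  capacity:
    storage: 20Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /opt/zookeeper/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["zk2"]
```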

5. Deploying the Cluster

bash
 
kubectl apply -f local-storage.yaml
kubectl apply -f zookeeper-cluster.yaml

# Check the cluster status
kubectl get zookeepercluster
kubectl get pods -l app=zookeeper
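To protect the quorum during node drains and rolling maintenance, it is worth deploying a PodDisruptionBudget alongside the cluster (assuming the pods carry the `app=zookeeper` label used above):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zookeeper-pdb
spec:
  # a 5-member ensemble tolerates 2 failures; allow only 1 voluntary disruption
  maxUnavailable: 1
  selector:
    matchLabels:
      app: zookeeper
```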

III. Automated Operations

1. Scaling (Horizontal and Vertical)

bash
 
# Scale out (horizontal)
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"replicas":7}}'

# Scale up (vertical)
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"pod":{"resources":{"limits":{"memory":"8Gi"}}}}}'

2. Automated Backup and Restore

zk-backup-job.yaml:

yaml
 
apiVersion: batch/v1
kind: CronJob
metadata:
  name: zk-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: zookeeper:3.8.0
            env:
            # assumed: the operator exposes a client service named <cluster-name>-client
            - name: ZK_SERVER
              value: zookeeper-cluster-client
            command: ["/bin/sh", "-c"]
            args:
              - |
                echo "Backing up from ${ZK_SERVER}"
                # sanity-check the server with the real four-letter-word command "srvr"
                # (the original "savemn" is not a ZooKeeper command)
                echo srvr | nc ${ZK_SERVER} 2181
                tar czf /backup/$(date +%Y%m%d).tar.gz -C /data .
            volumeMounts:
            - name: backup-volume
              mountPath: /backup
            - name: data-volume
              mountPath: /data
          restartPolicy: OnFailure
          volumes:
          - name: backup-volume
            persistentVolumeClaim:
              claimName: zk-backup-pvc
          - name: data-volume
            persistentVolumeClaim:
              # the operator names data PVCs data-<cluster-name>-<ordinal>;
              # $(...) substitution is not valid in claimName
              claimName: data-zookeeper-cluster-0
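The CronJob mounts `zk-backup-pvc`, which is not defined anywhere else in this plan. A minimal claim might look like this (size and access mode are assumptions; pick whatever backs your backup target):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zk-backup-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi
```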

3. Monitoring and Alerting

prometheus-monitoring.yaml:

yaml
 
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zookeeper-monitor
spec:
  selector:
    matchLabels:
      app: zookeeper
  endpoints:
  - port: metrics
    interval: 15s
  namespaceSelector:
    any: true
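Once the ServiceMonitor is scraping, dead targets can drive alerts. A sketch of a PrometheusRule based on the generic `up` series (the `job` label value depends on how Prometheus names this scrape job in your setup):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: zookeeper-alerts
spec:
  groups:
  - name: zookeeper
    rules:
    - alert: ZookeeperMemberDown
      expr: up{job="zookeeper"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Zookeeper member {{ $labels.instance }} down for 5 minutes"
```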

4. Certificate Rotation

bash
 
# Trigger a rolling restart after the certificate secret is updated
# (tlsConfig sits under spec.security, matching the cluster spec above)
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"security":{"tlsConfig":{"certUpdated":true}}}}'

IV. Security, Compliance, and Disaster Recovery

1. Security Hardening

yaml
 
# Additional security settings in the cluster spec
spec:
  security:
    enable: true
    jaasConfig:
      secretRef: zk-jaas-secret
    tlsConfig:
      enable: true
      secretRef: zk-tls-secret
    networkPolicy:
      enabled: true
      allowedClients:
      - 192.168.10.0/24
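If the operator version in use does not expose a `networkPolicy` field, the same restriction can be expressed with a native Kubernetes NetworkPolicy (client CIDR from the environment table; traffic between members must also stay open for quorum and leader election):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: zookeeper-allow-clients
spec:
  podSelector:
    matchLabels:
      app: zookeeper
  policyTypes: ["Ingress"]
  ingress:
  - from:                      # client traffic from the LAN
    - ipBlock:
        cidr: 192.168.10.0/24
    ports:
    - protocol: TCP
      port: 2181
  - from:                      # quorum and leader-election traffic between members
    - podSelector:
        matchLabels:
          app: zookeeper
```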

2. Cross-Cluster Disaster Recovery

yaml
 
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zookeeper-dr
spec:
  replicas: 3
  config:
# Run the DR members as observers (no votes in the quorum)
    peerType: observer
# Connect to the primary cluster's ensemble
    initConfig: |
      server.1=zk1:2888:3888:participant;2181
      server.2=zk2:2888:3888:participant;2181
      server.3=zk3:2888:3888:participant;2181
      server.4=dr-zk1:2888:3888:observer;2181
      server.5=dr-zk2:2888:3888:observer;2181
      server.6=dr-zk3:2888:3888:observer;2181

V. Day-to-Day Operations

1. Cluster Health Checks

bash
 
# Check the cluster status
kubectl get zookeepercluster
kubectl describe zk zookeeper-cluster

# Check each member's role (leader/follower)
for i in 0 1 2 3 4; do kubectl exec zookeeper-cluster-$i -- zkServer.sh status; done

2. Log Management

bash
 
# Tail live logs
kubectl logs -f zookeeper-cluster-0

# Log rotation and archiving are managed by the Operator

3. Hot Configuration Updates

bash
 
# Trigger an update after changing configuration (tickTime is a numeric field)
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"config":{"tickTime":3000}}}'

VI. Upgrades and Multi-Cluster Management

1. Rolling Upgrades

bash
 
# Rolling upgrade to a new version
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"image":{"tag":"3.9.0"}}}'

# Watch the upgrade progress
kubectl get pods -w -l app=zookeeper

2. Multi-Cluster Management

bash
 
# Deploy multiple Zookeeper clusters
kubectl apply -f zookeeper-cluster-app1.yaml
kubectl apply -f zookeeper-cluster-app2.yaml

# Unified monitoring across clusters
kubectl apply -f zookeeper-global-monitor.yaml

VII. Backup and Restore

1. Full-Cluster Backup with Velero

bash
 
# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.0.0 \
  --bucket zk-backups \
  --secret-file ./credentials-velero \
  --use-restic

# Create a backup
velero backup create zk-full-backup --include-namespaces default --selector app=zookeeper

# Disaster recovery: restore from the backup
velero restore create --from-backup zk-full-backup

2. Data Migration

bash
 
# Using a zkTransfer-style helper (zkTransfer.sh is not part of the standard
# ZooKeeper distribution; it must be provided in the image)
kubectl exec zookeeper-cluster-0 -- zkTransfer.sh \
  --source zk1:2181 \
  --target new-zk1:2181 \
  --path /critical_data \
  --parallel 8

Operations Checklist

Check item              Frequency        Command / Method
Cluster health          Daily            kubectl get zk
Pod resource usage      Daily            kubectl top pods
Certificate expiry      Monthly          keytool -list -v -keystore
Backup/restore test     Quarterly        Velero restore drill
Vulnerability scan      Monthly          Trivy image scan
Failover drill          Every 6 months   Simulated node failure
Performance load test   Yearly           ZooKeeper benchmark tools
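The daily checklist items can be wrapped in a small cron-friendly script. A sketch (the `check` helper is hypothetical; missing tools are skipped and failed checks are reported rather than aborting the run):

```shell
#!/bin/sh
# Daily ZooKeeper ops checks.

check() {
  # Run a command only if its binary exists; otherwise note the skip.
  command -v "$1" >/dev/null 2>&1 || { echo "SKIP: $1 not installed"; return 0; }
  "$@"
}

# Cluster health and resource usage (the two daily checklist items)
check kubectl get zk || echo "WARN: cluster health check failed"
check kubectl top pods -l app=zookeeper || echo "WARN: resource usage check failed"
```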

With a Kubernetes Operator managing the Zookeeper cluster's full lifecycle, Velero providing disaster recovery, and Prometheus providing monitoring, operational overhead drops significantly. For production environments, additionally consider:

  1. Managing secrets with HashiCorp Vault

  2. Spreading the cluster across multiple availability zones

  3. Enforcing policy with Open Policy Agent

  4. Managing configuration through a GitOps workflow (Argo CD)

 
posted @ 2025-07-08 02:00  Johny_Zhao