Rook Ceph 部署手册(Helm 版本)

📚 Rook Ceph 完整部署指南 (1 Master + 2 Worker 架构)

概览:本文档专门针对 1 Master + 2 Worker 的 Kubernetes 架构设计,提供从零开始部署 Rook Ceph 存储集群的完整流程,涵盖前置准备、Operator 安装、集群创建到验证测试的全过程。


📋 部署流程概览

整个部署流程分为四个主要阶段,每个阶段都有明确的目标和验证步骤:

graph TD
    A[第0阶段:前置检查与准备] --> B[第1阶段:安装Rook Ceph Operator]
    B --> C[第2阶段:创建Ceph集群]
    C --> D[第3阶段:验证与访问]
    A --> A1[K8s集群检查]
    A --> A2[存储设备检查]
    A --> A3[Helm版本检查]
    A --> A4[lvm2安装]
    B --> B1[添加Helm仓库]
    B --> B2[创建命名空间]
    B --> B3[配置文件准备]
    B --> B4[安装Operator]
    B --> B5[验证运行状态]
    C --> C1[节点标签配置]
    C --> C2[配置文件准备]
    C --> C3[集群安装]
    C --> C4[启动监控]
    D --> D1[集群健康验证]
    D --> D2[存储类测试]
    D --> D3[Dashboard访问]

✅ 第 0 阶段:前置检查与准备

在开始部署前,请逐项完成以下检查,这是后续所有步骤成功的基础。

0.1 检查 Kubernetes 集群

kubectl get nodes

期望输出(示例):

NAME     STATUS   ROLES           AGE   VERSION
master   Ready    control-plane   15d   v1.28.0
worker1  Ready    worker          15d   v1.28.0
worker2  Ready    worker          15d   v1.28.0
  • 看到 3 个节点:1个 Master 控制平面节点 + 2个 Worker 节点,状态均为 Ready
  • 记下两个 Worker 节点的名字(例如 worker1, worker2),后面配置要用
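如果不确定 Worker 节点名,也可以按标签过滤出非控制平面节点(示例命令,假设控制平面节点带有默认的 node-role.kubernetes.io/control-plane 标签):

# 列出所有非控制平面节点的名称
kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o custom-columns=NAME:.metadata.name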

0.2 检查所有节点的存储设备

在 Master 节点和两个 Worker 节点上分别执行:

sudo lsblk -o NAME,FSTYPE,SIZE,TYPE,MOUNTPOINT

期望状态(示例):
worker1 和 worker2 节点上:

NAME     FSTYPE SIZE   TYPE MOUNTPOINT
sda             50G    disk
├─sda1   vfat   500M   part /boot/efi
└─sda2   ext4   49.5G  part /
sdb             100G   disk            ← FSTYPE 为空(未格式化),用于 OSD

master 节点上:

NAME     FSTYPE SIZE   TYPE MOUNTPOINT
sda             50G    disk
├─sda1   vfat   500M   part /boot/efi
└─sda2   ext4   49.5G  part /
  • 两个 Worker 节点 上都有一块类似 /dev/sdb 的磁盘,且 FSTYPE 列为空
  • Master 节点不需要存储设备,可以没有 sdb
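如需进一步确认 /dev/sdb 是"干净"磁盘,可在两个 Worker 节点上执行以下只读检查(示例,假设数据盘为 /dev/sdb):

# -n 表示只读检查,不做任何修改;干净磁盘应无任何签名输出
sudo wipefs -n /dev/sdb

# 确认该盘没有分区、LVM 子设备和挂载点
lsblk /dev/sdb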

0.3 检查 Helm 版本

helm version

要求:版本为 3.x

0.4 在所有节点安装 lvm2(建议)

Ceph OSD 依赖 LVM(ceph-volume),至少需要在两个 Worker 节点上安装;为统一环境,建议所有节点都执行:

# OpenEuler / CentOS / RHEL
sudo yum install -y lvm2

# Ubuntu/Debian
sudo apt-get install -y lvm2
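安装完成后可以简单验证一下(示例):

# lvm2 由 ceph-volume 创建 OSD 时使用,确认命令可用即可
sudo lvm version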

🚀 第 1 阶段:安装 Rook Ceph Operator

1.1 添加 Helm 仓库并更新

helm repo add rook-release https://charts.rook.io/release
helm repo update

1.2 创建命名空间

kubectl create namespace rook-ceph

1.3 准备 Ceph Operator 的 Values 配置文件

创建 rook-ceph-values.yaml 文件,内容如下:

# CSI 插件配置(使用国内镜像源)
csi:
  cephcsi:
    # -- Ceph CSI image repository
    repository: quay.io/cephcsi/cephcsi
    # -- Ceph CSI image tag
    tag: v3.14.1

  registrar:
    # -- Kubernetes CSI registrar image repository
    repository: registry.aliyuncs.com/google_containers/csi-node-driver-registrar
    # -- Registrar image tag
    tag: v2.13.0

  provisioner:
    # -- Kubernetes CSI provisioner image repository
    repository: registry.aliyuncs.com/google_containers/csi-provisioner
    # -- Provisioner image tag
    tag: v5.2.0

  snapshotter:
    # -- Kubernetes CSI snapshotter image repository
    repository: registry.aliyuncs.com/google_containers/csi-snapshotter
    # -- Snapshotter image tag
    tag: v8.2.1

  attacher:
    # -- Kubernetes CSI Attacher image repository
    repository: registry.aliyuncs.com/google_containers/csi-attacher
    # -- Attacher image tag
    tag: v4.8.1

  resizer:
    # -- Kubernetes CSI resizer image repository
    repository: registry.aliyuncs.com/google_containers/csi-resizer
    # -- Resizer image tag
    tag: v1.13.2

# 资源限制配置(生产环境建议调整)
resources:
  limits:
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 128Mi   

1.4 安装 Operator Chart

helm upgrade --install rook-ceph-operator rook-release/rook-ceph \
  --namespace rook-ceph \
  --create-namespace \
  --version v1.17.6 \
  -f rook-ceph-values.yaml

1.5 验证 Operator 运行

# 等待片刻,直到 Pod 状态变为 Running
kubectl -n rook-ceph get pods -l app=rook-ceph-operator

期望输出

NAME                                   READY   STATUS    RESTARTS   AGE
rook-ceph-operator-xxxxxxxx-yyyyy       1/1     Running   0          2m
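Operator 就绪后,还可以顺带确认 Rook 的 CRD 已经注册(示例):

# 应能看到 cephclusters.ceph.rook.io、cephblockpools.ceph.rook.io 等多个 CRD
kubectl get crds | grep ceph.rook.io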

🗂️ 第 2 阶段:创建 Ceph 集群 (核心配置)

2.1 为 Kubernetes 节点打上角色标签

根据调度策略,需要为节点添加标签:

# 1. 查看节点现有标签
kubectl get nodes --show-labels

# 2. 为两个 Worker 节点打上 worker 角色标签
# 假设你的节点名为 worker1 和 worker2,请根据实际情况修改
kubectl label nodes worker1 node-role.kubernetes.io/worker=true
kubectl label nodes worker2 node-role.kubernetes.io/worker=true

# 3. 再次确认标签
kubectl get nodes --show-labels

期望输出(示例):

NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   15d   v1.28.0
worker1  Ready    worker                 15d   v1.28.0
worker2  Ready    worker                 15d   v1.28.0
  • worker1 和 worker2 节点的 LABELS 中应包含 node-role.kubernetes.io/worker=true,ROLES 列显示 worker(也可用下方命令按标签过滤确认)
  • master 节点应显示 control-plane,master 角色
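也可以直接按标签过滤,确认两个 Worker 节点都已带上该标签(示例):

# 应恰好列出 worker1 和 worker2 两个节点
kubectl get nodes -l node-role.kubernetes.io/worker=true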

2.2 准备 Ceph 集群的 Values 配置文件

创建一个名为 rook-ceph-cluster-values.yaml 的文件,内容如下:

# ============================================
# Rook Ceph 集群 Helm Chart 定制化配置文件
# 版本:v1.17.6
# 适用环境:1 Master + 2 Worker 节点的 Kubernetes 集群
# 存储配置:每个 Worker 节点使用 /dev/sdb 作为 Ceph OSD 磁盘
# 启用功能:块存储 (RBD) + 共享文件系统 (CephFS)
# 注意:请将所有 <需要替换的项> 替换为实际值
# ============================================

# -- 主 Rook Operator 所在的命名空间(必须与安装operator时一致)
operatorNamespace: rook-ceph

# -- CephCluster 自定义资源的名称
clusterName: rook-ceph

# 启用 Ceph 工具箱,用于集群管理和故障排查
toolbox:
  enabled: true
  image: quay.io/ceph/ceph:v19.2.2  # 与集群版本保持一致

# 配置 Ingress 以通过域名访问 Ceph Dashboard
ingress:
  dashboard:
    enabled: true
    host:
      name: <hostname>  # 请确保此域名已解析到 Ingress Controller
      path: /
      pathType: Prefix
    ingressClassName: <ingressClassName>  # 必须与集群中已部署的 Ingress Controller 类型匹配,如nginx
    annotations:
      nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"  # Ceph Dashboard 使用 HTTPS

# ============================================
# Ceph 集群核心配置 (cephClusterSpec)
# 定义 Ceph 守护进程的基本部署参数
# ============================================
cephClusterSpec:
  # 指定 Ceph 容器镜像版本
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.2  # 稳定版 Squid
    allowUnsupported: false

  # Rook 在宿主机上存储配置和数据的路径
  dataDirHostPath: /var/lib/rook

  # Monitor 配置:部署 2 个(分别位于两个 Worker 节点)
  # 注意:生产环境建议部署奇数个 mon(3、5 等);本环境只有 2 个 Worker 节点,故部署 2 个。
  # 2 个 mon 形成仲裁要求两者同时在线,任一 mon 所在节点故障都会导致仲裁丢失;
  # 若 Master 节点资源允许,可将 count 调整为 3(配合下方的 control-plane 容忍度调度到 Master)。
  mon:
    count: 2
    allowMultiplePerNode: false  # 禁止同一节点运行多个mon

  # Manager 配置:部署2个实现高可用
  mgr:
    count: 2
    allowMultiplePerNode: false

  # 启用 Ceph Dashboard
  dashboard:
    enabled: true
    ssl: true  # 启用 SSL 加密访问

  # 网络配置:使用 host 模式以获得最佳性能
  network:
    provider: host
    # 重要:host模式要求节点间Ceph端口可通
    # 需要放行端口:6789 (mon), 6800-7300 (osd), 8443/7000 (dashboard)

  # 调度策略:控制组件在节点上的分布
  placement:
    # Monitor:优先调度到Worker节点,Master作为备用
    mon:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
            - key: node-role.kubernetes.io/worker
              operator: In
              values: ["true"]
      tolerations:  # 新增容忍度
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule

    # OSD:必须运行在标记为worker的节点上
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: node-role.kubernetes.io/worker
              operator: In
              values: ["true"]

  # 存储配置:精确控制OSD在哪些节点的哪些磁盘上创建
  storage:
    useAllNodes: false      # 不自动使用所有节点
    useAllDevices: false    # 不自动使用所有设备
    deviceFilter: "sdb"     # 全局设备过滤规则
    nodes:
      # 重要:必须替换为你的实际Worker节点主机名
      - name: "<worker-node-1-hostname>"  # 替换为第一个Worker节点名
        devices:
          - name: "sdb"
      - name: "<worker-node-2-hostname>"  # 替换为第二个Worker节点名
        devices:
          - name: "sdb"
    config:
      databaseSizeMB: "1024"
      journalSizeMB: "1024"

  # 资源限制(可根据节点实际配置调整)
  resources:
    osd:
      limits:
        memory: "4Gi"
      requests:
        cpu: "200m"
        memory: "1Gi"
    mgr:
      limits:
        memory: "1Gi"
      requests:
        cpu: "200m"
        memory: "512Mi"
    mon:
      limits:
        memory: "2Gi"
      requests:
        cpu: "200m"
        memory: "512Mi"

  cleanupPolicy:
    confirmation: ""  # 保持空值,不删除集群
    sanitizeDisks:
      method: complete  # 从 quick 改为 complete
      dataSource: zero
      iteration: 1
    allowUninstallWithVolumes: false

# ============================================
# 块存储 (RBD) 配置
# 提供 RWO (ReadWriteOnce) 卷,适合数据库等应用
# ============================================
# 若不启用,配置为 cephBlockPools: [] 即可
cephBlockPools:
  - name: ceph-blockpool
    spec:
      failureDomain: host       # 副本跨节点分布
      replicated:
        size: 2                 # 关键:2个副本,匹配2个OSD节点(默认3个)
        # 警告:size=2 意味着写入数据需要同时写入2个OSD
        # 一个节点故障时,数据仍可用但处于降级状态
    storageClass:
      enabled: true
      isDefault: false          # 建议不设为默认存储类
      name: rook-ceph-block
      reclaimPolicy: Retain
      allowVolumeExpansion: true
      volumeBindingMode: Immediate
      parameters:
        # (optional) mapOptions is a comma-separated list of map options.
        # For krbd options refer
        # https://docs.ceph.com/docs/latest/man/8/rbd/#kernel-rbd-krbd-options
        # For nbd options refer
        # https://docs.ceph.com/docs/latest/man/8/rbd-nbd/#options
        # mapOptions: lock_on_read,queue_depth=1024

        # (optional) unmapOptions is a comma-separated list of unmap options.
        # For krbd options refer
        # https://docs.ceph.com/docs/latest/man/8/rbd/#kernel-rbd-krbd-options
        # For nbd options refer
        # https://docs.ceph.com/docs/latest/man/8/rbd-nbd/#options
        # unmapOptions: force

        # RBD image format. Defaults to "2".
        imageFormat: "2"

        # RBD image features, equivalent to OR'd bitfield value: 63
        # Available for imageFormat: "2". Older releases of CSI RBD
        # support only the `layering` feature. The Linux kernel (KRBD) supports the
        # full feature complement as of 5.4
        imageFeatures: layering

        # These secrets contain Ceph admin credentials.
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
        csi.storage.k8s.io/node-stage-secret-namespace: "{{ .Release.Namespace }}"
        # Specify the filesystem type of the volume. If not specified, csi-provisioner
        # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
        # in hyperconverged settings where the volume is mounted on the same node as the osds.
        csi.storage.k8s.io/fstype: ext4

# ============================================
# 共享文件系统 (CephFS) 配置
# 提供 RWX (ReadWriteMany) 卷,适合共享存储场景
# ============================================
cephFileSystems:
  - name: ceph-filesystem
    spec:
      # 元数据池:存储文件系统元数据(目录结构、权限等)
      metadataPool:
        failureDomain: host
        replicated:
          size: 2               # 关键:2个副本,匹配2个OSD节点(默认3个)
      # 数据池:存储实际文件内容
      dataPools:
        - name: data-pool       # 显式命名,便于管理
          failureDomain: host
          replicated:
            size: 2             # 2个副本
      # 元数据服务器配置
      metadataServer:
        activeCount: 1          # 活动MDS数量(2节点环境建议1个)
        activeStandby: true     # 启用备用MDS,主备自动切换
        # 资源限制
        resources:
          limits:
            memory: "4Gi"
          requests:
            cpu: "200m"
            memory: "1Gi"
        priorityClassName: system-cluster-critical
    # 为CephFS创建对应的StorageClass
    storageClass:
      enabled: true
      isDefault: false
      name: rook-cephfs         # StorageClass名称
      pool: data-pool           # 使用上面定义的data-pool
      reclaimPolicy: Retain
      allowVolumeExpansion: true
      volumeBindingMode: Immediate
      parameters:
        # The secrets contain Ceph admin credentials.
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
        csi.storage.k8s.io/node-stage-secret-namespace: "{{ .Release.Namespace }}"
        # Specify the filesystem type of the volume. If not specified, csi-provisioner
        # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
        # in hyperconverged settings where the volume is mounted on the same node as the osds.
        csi.storage.k8s.io/fstype: ext4

# ============================================
# 对象存储 (RGW) 配置 - 本次暂不启用
# ============================================
cephObjectStores: []  # 空数组表示不启用

# ============================================
# 监控配置 - 按需启用
# ============================================
monitoring:
  enabled: false  # 如果已部署Prometheus,可设为true

# ============================================
# 其他高级配置
# ============================================
# 禁用Pod安全策略(除非集群要求)
pspEnable: false

# CSI驱动名前缀(通常无需修改)
csiDriverNamePrefix:

重要提示

  • 配置文件中的 name 字段(worker1, worker2)必须与你的节点 主机名 完全一致
  • 可以通过 kubectl get nodes 查看节点名称,确保名称匹配
  • 如果节点名称不是 worker1/worker2,请根据实际情况修改配置文件
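storage.nodes[].name 必须与 Kubernetes 节点名完全一致,可用下面的命令对照确认(示例,节点名以实际输出为准):

# 列出所有节点名,配置文件中的 name 字段需与之逐字一致
kubectl get nodes -o name | sed 's#^node/##'

# 如需核对某个节点的 kubernetes.io/hostname 标签,可查看其全部标签
kubectl get node worker1 --show-labels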

2.3 安装 Ceph 集群 Chart

helm upgrade --install rook-ceph-cluster rook-release/rook-ceph-cluster \
  --create-namespace \
  --namespace rook-ceph \
  --version v1.17.6 \
  -f rook-ceph-cluster-values.yaml

2.4 监控集群启动过程

watch -n 5 'kubectl -n rook-ceph get pods'

这是一个关键步骤,需要耐心等待(通常5-15分钟)

期望看到以下 Pod 全部就绪(基于 1 Master + 2 Worker 架构):

NAME                                   READY   STATUS    RESTARTS   AGE
rook-ceph-mon-a-xxxxxxxx-yyyyy         1/1     Running   0          5m
rook-ceph-mon-b-xxxxxxxx-yyyyy         1/1     Running   0          5m
rook-ceph-mgr-a-xxxxxxxx-yyyyy         1/1     Running   0          4m
rook-ceph-mgr-b-xxxxxxxx-yyyyy         1/1     Running   0          4m
rook-ceph-osd-0-xxxxxxxx-yyyyy         1/1     Running   0          6m     # 在 worker1 上
rook-ceph-osd-1-xxxxxxxx-yyyyy         1/1     Running   0          6m     # 在 worker2 上
rook-ceph-tools-xxxxxxxx-yyyyy         1/1     Running   0          3m
  • 2 个 Monitor Pod:rook-ceph-mon-*,状态为 Running(分布在两个 Worker 节点上)
  • 2 个 Manager Pod:rook-ceph-mgr-*,状态为 Running
  • 2 个 OSD Pod:rook-ceph-osd-*,状态为 Running(分别在两个 Worker 节点上,可用下方命令确认)
  • 1 个 Tools Pod:rook-ceph-tools-*,状态为 Running
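如需确认 OSD 确实分布在两个 Worker 节点上,可以加 -o wide 查看调度结果(示例):

# NODE 列应分别为 worker1 和 worker2
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide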

✅ 第 3 阶段:验证与访问

3.1 验证集群健康状态

集群 Pod 就绪后,通过工具箱检查:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status

成功标志

cluster:
  id:     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  health: HEALTH_OK

services:
  mon: 2 daemons, quorum a,b (age 5m)
  mgr: a(active, since 5m), standbys: b
  osd: 2 osds: 2 up, 2 in

data:
  pools:   1 pools, 1 pgs
  objects: 0 objects, 0 B
  usage:   2.1 GiB used, 198 GiB / 200 GiB avail
  pgs:     1 active+clean

注意

  • health: HEALTH_OK(初始可能为 HEALTH_WARN,等待几分钟后应恢复)
  • osd: 2 osds: 2 up, 2 in
  • mon: 2 daemons
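如需更细粒度的检查,可以在工具箱中查看 OSD 拓扑和健康详情(示例):

# failureDomain 为 host 时,两个 OSD 应分属两个 Worker 主机
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree

# 状态为 HEALTH_WARN 时,查看具体告警原因
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail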

3.2 测试存储类

创建一个测试 PVC 来验证存储供应是否正常:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

# 查看PVC状态
kubectl get pvc test-pvc

期望输出:STATUS 为 Bound
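CephFS 的 RWX 存储类同样可以用一个 PVC 验证(示例,使用前文定义的 rook-cephfs;验证完成后建议清理测试资源):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-cephfs-pvc
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF

# 期望 STATUS 为 Bound
kubectl get pvc test-cephfs-pvc

# 清理测试 PVC(注意:reclaimPolicy 为 Retain,对应的 PV 需手动删除)
kubectl delete pvc test-pvc test-cephfs-pvc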

3.3 访问 Ceph Dashboard

# 查看 Dashboard Service(默认为 ClusterIP,已通过 values 中的 Ingress 对外暴露)
kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard
kubectl -n rook-ceph get ingress

# 未配置 Ingress 时,可临时使用端口转发访问
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 8443:8443

# 获取登录密码
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

访问方式

  • Ingress 方式:浏览器访问 https://<values 中配置的 hostname>
  • 端口转发方式:浏览器访问 https://localhost:8443
  • host 网络模式下,也可直接访问 https://<active mgr 所在节点IP>:8443
  • 用户名:admin
  • 密码:上面命令的输出

⚠️ 关键注意事项

安全性

  • 此流程主要关注功能部署
  • 生产环境务必配置网络安全策略
  • 考虑为 dataDirHostPath 使用更安全的路径
  • 妥善保管 Dashboard 密码
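由于集群使用 host 网络模式,还需确保节点间防火墙放行 Ceph 端口(示例,firewalld 环境;其他防火墙请用等价规则):

sudo firewall-cmd --permanent --add-port=3300/tcp --add-port=6789/tcp   # mon (msgr2/msgr1)
sudo firewall-cmd --permanent --add-port=6800-7300/tcp                  # osd/mgr/mds
sudo firewall-cmd --permanent --add-port=8443/tcp                       # dashboard (HTTPS)
sudo firewall-cmd --reload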

资源要求(针对 1 Master + 2 Worker 架构)

  • Master 节点:主要用于控制平面,不需要额外资源
  • Worker 节点:每个需要承载 3-4 个 Ceph 组件
    • worker1:通常部署 1 个 Monitor + 1 个 Manager + 1 个 OSD(以及 CephFS 的 MDS)
    • worker2:通常部署 1 个 Monitor + 1 个 Manager + 1 个 OSD(以及 CephFS 的 MDS)
  • 总配置:2 个 Mon、2 个 Mgr 和 2 个 OSD
  • 建议资源:每个 Worker 节点至少 4GB 可用内存,2 CPU 核心

故障排除

如果 Pod 长时间不健康,使用以下命令排查:

# 查看 Pod 详细信息
kubectl -n rook-ceph describe pod <pod-name>

# 查看 Pod 日志
kubectl -n rook-ceph logs <pod-name>

# 查看 Ceph 集群状态
kubectl -n rook-ceph get cephcluster -o yaml
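OSD 没有如期创建时,最常见的排查入口是 Operator 日志和 osd-prepare Job 日志(示例):

# Operator 日志会记录磁盘发现与 OSD 创建过程
kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=200

# osd-prepare Pod 日志会给出磁盘被跳过的原因(如存在残留分区或文件系统)
kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --tail=200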

📋 快速检查清单

部署前检查(1 Master + 2 Worker)

  • 3 个节点均为 Ready(1 Master + 2 Worker)
  • 两个 Worker 节点各有一块未格式化的数据盘(如 /dev/sdb)
  • Helm 为 3.x 版本,lvm2 已安装

部署过程检查

  • rook-ceph-operator Pod 为 Running
  • Worker 节点已打上 node-role.kubernetes.io/worker=true 标签
  • values 文件中的节点名、磁盘名、Ingress 域名已替换为实际值
  • mon / mgr / osd / tools Pod 全部 Running

验证检查

  • ceph status 显示 HEALTH_OK
  • 测试 PVC 状态为 Bound
  • Dashboard 可正常登录


🔄 卸载流程

如需卸载 Rook Ceph,请按顺序执行:

一、重要安全警告

#!/bin/bash
# ============================================================================
# Rook Ceph 集群安全卸载脚本
# 警告: 此操作将永久删除所有 Ceph 集群数据,且不可恢复!
# 执行前请确保:
# 1. 所有重要数据已备份
# 2. 已停止所有使用 Rook 存储的业务应用
# 3. 已获授权在生产环境执行
# ============================================================================

二、卸载脚本

Rook 官方推荐的清理顺序
业务 PV/PVC → CephCluster(带 cleanupPolicy)→ Operator → CRDs → Namespace → 节点数据

2.1 定义变量(可根据实际情况修改)

# ========================== 可配置变量 ==========================
NAMESPACE="rook-ceph"
CLUSTER_NAME="rook-ceph"

# 如果你的 Helm release 名字不是下面这两个,按实际情况改:
HELM_RELEASE_CLUSTER="${NAMESPACE}-cluster"
HELM_RELEASE_OPERATOR="${NAMESPACE}-operator"

# 如果你在集群 CR 里改过 dataDirHostPath,这里要改成对应路径,默认是 /var/lib/rook
DATA_DIR_HOSTPATH_BASE="/var/lib/rook"

# ========================== 可配置变量 ==========================

2.2 清理业务资源(pv/pvc)

echo "=== 检查是否还有使用 Rook 存储的 PVC / PV ==="
PVC_COUNT=$(kubectl get pvc -A 2>/dev/null | grep -c -E "${NAMESPACE}|rook" || true)
PV_COUNT=$(kubectl get pv 2>/dev/null | grep -c -E "${NAMESPACE}|rook" || true)

echo "发现 $PVC_COUNT 个相关PVC, $PV_COUNT 个相关PV"

if [ "$PVC_COUNT" -gt 0 ] || [ "$PV_COUNT" -gt 0 ]; then
    echo "=== 详细列表 ==="
    [ "$PVC_COUNT" -gt 0 ] && kubectl get pvc -A | grep -E "${NAMESPACE}|rook"
    [ "$PV_COUNT" -gt 0 ] && kubectl get pv | grep -E "${NAMESPACE}|rook"

    echo ""
    echo "❌ 警告: 发现未清理的存储资源!正在自动清理..."

    # --- 自动删除 PVC ---
    if [ "$PVC_COUNT" -gt 0 ]; then
        echo "1. 删除所有 rook 相关 PVC..."
        kubectl get pvc -A --no-headers 2>/dev/null | \
            grep -E "${NAMESPACE}|rook" | \
            while read -r line; do
                ns=$(echo "$line" | awk '{print $1}')
                name=$(echo "$line" | awk '{print $2}')
                echo "   → kubectl delete pvc $name -n $ns"
                kubectl delete pvc "$name" -n "$ns" --ignore-not-found
            done
    fi

    # --- 自动删除 PV ---
    if [ "$PV_COUNT" -gt 0 ]; then
        echo "2. 删除所有 rook 相关 PV..."
        kubectl get pv --no-headers 2>/dev/null | \
            grep -E "${NAMESPACE}|rook" | \
            while read -r line; do
                pv_name=$(echo "$line" | awk '{print $1}')
                echo "   → kubectl delete pv $pv_name"
                kubectl delete pv "$pv_name" --ignore-not-found
            done
    fi

    sleep 2
    echo "✅ 自动清理完成"
fi

echo ""
echo "✅ 存储资源检查通过"
echo ""

2.3 删除 CephCluster

echo "=== 给 CephCluster 加 cleanupPolicy ==="
kubectl -n "${NAMESPACE}" patch cephcluster "${CLUSTER_NAME}" \
  --type=merge \
  -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'

if [ $? -eq 0 ]; then
    echo "✅ cleanupPolicy 设置成功"
    echo "注意:现在可以安全地删除 CephCluster,数据将被自动清理"
else
    echo "❌ cleanupPolicy 设置失败"
    echo "可能原因:CephCluster 不存在或名称错误"
    kubectl -n "${NAMESPACE}" get cephcluster
    exit 1
fi

echo ""
echo "=== 删除 CephCluster CR ==="
kubectl -n "${NAMESPACE}" delete cephcluster "${CLUSTER_NAME}" --wait=false

echo ""
echo "=== 等待 CephCluster 被删除 ==="
echo "监控状态(最多 300 秒,Ctrl+C 可中断监控,不影响后续操作)"
timeout 300 kubectl -n "${NAMESPACE}" get cephcluster --watch 2>/dev/null || true

# 检查是否真的删除了
echo ""
echo "=== 确认 CephCluster 删除状态 ==="
if kubectl -n "${NAMESPACE}" get cephcluster "${CLUSTER_NAME}" 2>/dev/null; then
    echo "❌ CephCluster 仍然存在,可能卡在 Deleting 状态"
    echo "可以尝试等待自动清理,或使用非常规手段清理"
else
    echo "✅ CephCluster 已删除"
fi

2.4 检查并强制清理残留的 Ceph CR

echo ""
echo "=== 检查并强制清理残留的 Ceph CR ==="

# 首先检查所有 Ceph 资源状态
echo "1. 检查当前 Ceph 资源状态..."
CEPH_RESOURCES=$(kubectl get cephcluster,cephfilesystem,cephblockpool,cephobjectstore,cephnfs,cephrbdmirror -n "${NAMESPACE}" 2>&1 || true)

# 修正判断逻辑
if echo "$CEPH_RESOURCES" | grep -q -E "(No resources found|not found)" || [ -z "$(echo "$CEPH_RESOURCES" | grep -v "^NAME")" ]; then
    echo "✅ 所有 Ceph CR 已清理干净"
else
    echo "发现残留的 Ceph 资源:"
    echo "$CEPH_RESOURCES"
    echo ""

    echo "2. 分析资源删除状态..."

    # 获取所有资源的 JSON 格式数据
    RESOURCE_JSON=$(kubectl get cephcluster,cephfilesystem,cephblockpool,cephobjectstore -n "${NAMESPACE}" -o json 2>/dev/null || echo '{"items":[]}')

    # 检查是否有 deletionTimestamp(正在删除但卡住)
    DELETING_RESOURCES=$(echo "$RESOURCE_JSON" | \
        jq -r '.items[] | select(.metadata.deletionTimestamp != null) | "\(.kind)/\(.metadata.name)"' 2>/dev/null || true)

    if [ -n "$DELETING_RESOURCES" ] && [ "$DELETING_RESOURCES" != "" ]; then
        echo "以下资源卡在删除状态:"
        echo "$DELETING_RESOURCES"
        echo ""

        echo "3. 清理卡住资源的 finalizer..."
        for resource in $DELETING_RESOURCES; do
            echo "  清理: $resource"
            kubectl patch $resource -n "${NAMESPACE}" \
                -p '{"metadata":{"finalizers":[]}}' \
                --type=merge 2>/dev/null || true
        done
    else
        echo "3. 无正在删除的资源"
    fi

    # 检查未开始删除的资源
    NOT_DELETING_RESOURCES=$(echo "$RESOURCE_JSON" | \
        jq -r '.items[] | select(.metadata.deletionTimestamp == null) | "\(.kind)/\(.metadata.name)"' 2>/dev/null || true)

    if [ -n "$NOT_DELETING_RESOURCES" ] && [ "$NOT_DELETING_RESOURCES" != "" ]; then
        echo "以下资源尚未开始删除:"
        echo "$NOT_DELETING_RESOURCES"
        echo ""

        echo "4. 先删除这些资源..."
        for resource in $NOT_DELETING_RESOURCES; do
            echo "  删除: $resource"
            kubectl delete $resource -n "${NAMESPACE}" --wait=false 2>/dev/null || true
        done

        echo "等待5秒让删除开始..."
        sleep 5

        echo "5. 检查是否卡住并清理 finalizer..."
        for resource in $NOT_DELETING_RESOURCES; do
            # 检查资源是否存在
            if kubectl get $resource -n "${NAMESPACE}" 2>/dev/null | grep -q "Terminating" || \
               kubectl get $resource -n "${NAMESPACE}" 2>/dev/null | grep -q "Deleting"; then
                echo "  $resource 卡住,清理 finalizer..."
                kubectl patch $resource -n "${NAMESPACE}" \
                    -p '{"metadata":{"finalizers":[]}}' \
                    --type=merge 2>/dev/null || true
            fi
        done
    else
        echo "4. 无未开始删除的资源"
    fi

    echo ""
    echo "6. 最终检查..."
    sleep 3

    # 修正的最终检查逻辑
    FINAL_OUTPUT=$(kubectl get cephcluster,cephfilesystem,cephblockpool,cephobjectstore -n "${NAMESPACE}" 2>&1 || true)

    # 统计有效行数(排除标题行和错误信息)
    if echo "$FINAL_OUTPUT" | grep -q -E "(No resources found|not found)"; then
        FINAL_COUNT=0
    else
        # 统计非标题行
        FINAL_COUNT=$(echo "$FINAL_OUTPUT" | grep -v "^NAME" | grep -v "^error" | grep -c -E "^[a-zA-Z]" || true)
    fi

    if [ "$FINAL_COUNT" -eq 0 ]; then
        echo "✅ Ceph 资源清理完成"
    else
        echo "⚠️  仍有 $FINAL_COUNT 个资源残留:"
        echo "$FINAL_OUTPUT"
        echo ""

        read -p "是否强制继续?(输入 'FORCE-CONTINUE' 确认): " CONFIRM
        if [[ "$CONFIRM" = "FORCE-CONTINUE" ]]; then
            echo "继续执行后续步骤..."
        else
            echo "❌ 用户中止操作"
            exit 1
        fi
    fi
fi
echo ""

2.5 Helm 卸载

echo ""
echo "=== Helm 卸载 cluster release ==="
if helm list -n "${NAMESPACE}" | grep -q "${HELM_RELEASE_CLUSTER}"; then
    helm uninstall "${HELM_RELEASE_CLUSTER}" -n "${NAMESPACE}" --wait
    echo "✅ Cluster release 卸载完成"
else
    echo "ℹ️  Cluster release 未找到(可能已卸载)"
fi

sleep 10

echo ""
echo "=== Helm 卸载 operator release ==="
if helm list -n "${NAMESPACE}" | grep -q "${HELM_RELEASE_OPERATOR}"; then
    helm uninstall "${HELM_RELEASE_OPERATOR}" -n "${NAMESPACE}" --wait
    echo "✅ Operator release 卸载完成"
else
    echo "ℹ️  Operator release 未找到(可能已卸载)"
fi

echo ""
echo "=== 等待 operator 相关资源清理 ==="
sleep 30

echo ""
echo "=== 删除 Rook Ceph 相关 CRDs ==="
CRD_COUNT=$(kubectl get crds 2>/dev/null | grep -c '\.ceph\.rook\.io' || true)

if [ "$CRD_COUNT" -gt 0 ]; then
    echo "发现 $CRD_COUNT 个相关 CRD"
    kubectl get crds | awk '/\.ceph\.rook\.io/ {print $1}' | xargs -r kubectl delete crd --wait=false
    echo "✅ CRD 删除命令已发送"
else
    echo "ℹ️  未发现 Rook Ceph 相关 CRD"
fi

echo ""
echo "=== 删除 Rook Ceph 相关 configmap secrets==="
kubectl -n "${NAMESPACE}" patch configmap rook-ceph-mon-endpoints --type merge -p '{"metadata":{"finalizers": []}}'
kubectl -n "${NAMESPACE}" patch secrets rook-ceph-mon --type merge -p '{"metadata":{"finalizers": []}}'

echo ""
echo "=== 删除 namespace ==="
if kubectl get namespace "${NAMESPACE}" 2>/dev/null; then
    kubectl delete namespace "${NAMESPACE}" --wait=false
    echo "✅ Namespace 删除命令已发送"

    # 检查删除状态
    echo "等待命名空间删除(最多120秒)..."
    for i in {1..24}; do
        if ! kubectl get namespace "${NAMESPACE}" 2>/dev/null; then
            echo "✅ 命名空间已删除"
            break
        fi
        echo "等待中... ($((i*5))秒)"
        sleep 5
    done
else
    echo "ℹ️  命名空间 ${NAMESPACE} 不存在"
fi

三、清理节点上的残留数据(安全优化版)

3.1 安全警告

echo ""
echo "========================================================"
echo "               节点数据清理(手动步骤)                 "
echo "========================================================"
echo "注意:以下步骤需要在每个运行过 Rook 的节点上手动执行"
echo "建议先在一个测试节点验证,再推广到所有节点"
echo "========================================================"
echo ""

3.2 清理 dataDirHostPath 目录(安全版)

# 保存为独立脚本,复制到每个节点执行
cat > /tmp/cleanup_rook_node.sh << 'EOF'
#!/bin/bash
set -e

# 配置变量
DATA_DIR_HOSTPATH_BASE="/var/lib/rook"
DATA_DIR="${DATA_DIR_HOSTPATH_BASE}"

echo "=== 节点清理脚本开始执行 ==="
echo "主机名: $(hostname)"
echo "当前用户: $(whoami)"
echo ""

# 检查目录是否存在
if [ -d "${DATA_DIR}" ]; then
    echo "发现 Rook 数据目录: ${DATA_DIR}"
    echo "目录内容:"
    ls -la "${DATA_DIR}" || true
    echo ""

    # 确认删除
    read -p "是否删除此目录及其所有内容?(输入 'DELETE-NOW' 确认): " NODE_CONFIRM
    if [[ "$NODE_CONFIRM" = "DELETE-NOW" ]]; then
        echo "正在删除 ${DATA_DIR} ..."
        sudo rm -rf "${DATA_DIR}"
        echo "✅ 目录已删除"
    else
        echo "❌ 跳过目录删除"
    fi
else
    echo "ℹ️  未发现目录 ${DATA_DIR}"
fi
EOF

chmod +x /tmp/cleanup_rook_node.sh
echo "节点清理脚本已生成: /tmp/cleanup_rook_node.sh"
echo "请将此脚本复制到每个节点并执行"
echo ""

3.3 清理设备映射残留(安全版)

cat >> /tmp/cleanup_rook_node.sh << 'EOF'
echo ""
echo "=== 清理设备映射残留 ==="

# 检查并安全清理 /dev/mapper/ceph-*
CEPH_MAPPER_COUNT=$(ls /dev/mapper/ceph-* 2>/dev/null | wc -l || echo "0")
if [ "$CEPH_MAPPER_COUNT" -gt 0 ]; then
    echo "发现 $CEPH_MAPPER_COUNT 个 Ceph 设备映射"
    echo "列表:"
    ls -l /dev/mapper/ceph-* 2>/dev/null || true

    read -p "是否移除这些设备映射?(输入 'REMOVE-MAPPER' 确认): " MAPPER_CONFIRM
    if [[ "$MAPPER_CONFIRM" = "REMOVE-MAPPER" ]]; then
        for device in /dev/mapper/ceph-*; do
            if [ -e "$device" ]; then
                echo "  移除: $device"
                sudo dmsetup remove "$(basename $device)" 2>/dev/null || \
                echo "    警告: 移除失败(可能已被移除)"
            fi
        done
        echo "✅ 设备映射清理完成"
    fi
else
    echo "ℹ️  未发现 Ceph 设备映射"
fi

# 清理残留的符号链接
echo ""
echo "=== 清理残留符号链接 ==="
for dir in /dev/ceph-* /dev/mapper/ceph--*; do
    if [ -e "$dir" ]; then
        echo "删除: $dir"
        sudo rm -rf "$dir" 2>/dev/null || true
    fi
done
EOF

chmod +x /tmp/cleanup_rook_node.sh
echo "节点清理脚本已生成: /tmp/cleanup_rook_node.sh"
echo "请将此脚本复制到每个节点并执行"
echo ""

3.4 Zapping Devices(可选,安全版)

echo "正在生成节点清理脚本..."

cat >> /tmp/cleanup_rook_node.sh << 'EOF'
#!/bin/bash

echo ""
echo "=== 磁盘清理(可选,用于盘复用)==="
echo "⚠️  注意:以下操作会清空磁盘所有数据!"
echo ""
echo "当前磁盘信息:"
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,FSTYPE,MODEL | grep -v "loop"

echo ""
# 添加 -t 0 检查是否在交互式终端
if [ -t 0 ]; then
    read -p "是否需要清理特定磁盘以复用?(yes/NO): " ZAP_CONFIRM
else
    echo "非交互式终端,跳过磁盘清理"
    ZAP_CONFIRM="no"
fi

if [[ "$ZAP_CONFIRM" = "yes" ]]; then
    echo "请输入要清理的磁盘路径(如 /dev/sdb): "
    read DISK

    if [ ! -b "$DISK" ]; then
        echo "❌ 错误: $DISK 不是有效的块设备"
        exit 1
    fi

    echo ""
    echo "⚠️  即将清理磁盘: $DISK"
    echo "磁盘信息:"
    sudo fdisk -l "$DISK" | head -20

    read -p "确认清理此磁盘所有数据?(输入 'WIPE-DISK' 确认): " WIPE_CONFIRM
    if [[ "$WIPE_CONFIRM" = "WIPE-DISK" ]]; then
        echo "步骤1: 擦除文件系统签名..."
        sudo wipefs -a "$DISK" 2>/dev/null || true

        echo "步骤2: 清除分区表..."
        sudo sgdisk --zap-all "$DISK" 2>/dev/null || true

        echo "步骤3: 清除LVM/Ceph可能残留的元数据(官方方法)..."
        # 清除磁盘开头(可能包含MBR/分区表残留)
        sudo dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=0 2>/dev/null || true
        # 清除可能在1GB偏移处的LVM元数据
        sudo dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((1 * 1024**2)) 2>/dev/null || true
        # 清除可能在10GB偏移处的LVM元数据  
        sudo dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((10 * 1024**2)) 2>/dev/null || true
        # 清除可能在100GB偏移处的LVM元数据
        sudo dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((100 * 1024**2)) 2>/dev/null || true
        # 清除可能在1000GB偏移处的LVM元数据
        sudo dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((1000 * 1024**2)) 2>/dev/null || true

        echo "步骤4: 尝试 SSD 清理..."
        sudo blkdiscard "$DISK" 2>/dev/null || echo "blkdiscard 不支持(HDD正常)"

        echo "步骤5: 重新探测分区..."
        sudo partprobe "$DISK" 2>/dev/null || true

        echo "✅ 磁盘清理完成"
        echo "清理后状态:"
        sudo fdisk -l "$DISK" 2>/dev/null | head -5 || true
    else
        echo "❌ 磁盘清理已取消"
    fi
else
    echo "ℹ️  跳过磁盘清理"
fi

echo ""
echo "=== 节点清理完成 ==="
echo "建议: 如需完全清理,可重启节点以确保所有内核状态被清除"
EOF

# 添加执行权限
chmod +x /tmp/cleanup_rook_node.sh
echo "磁盘清理部分已添加到脚本"
echo "✅ 已添加执行权限"
echo ""
echo "========================================================"
echo "                    卸载脚本总结                        "
echo "========================================================"
echo "✅ 第1-2部分: 已完成K8s集群内资源清理"
echo "📋 第3部分: 节点数据清理需要手动操作"
echo ""
echo "下一步操作:"
echo "1. 将 /tmp/cleanup_rook_node.sh 复制到每个节点"
echo "2. 在节点上执行: sudo bash /tmp/cleanup_rook_node.sh"
echo "3. 按提示确认每个操作"
echo "4. 建议完成后重启所有节点"
echo "========================================================"

提示:如果磁盘仍被占用,重启节点通常能释放设备映射器锁。

3.5 可选:整盘写零(非常彻底,但非常慢)

# 非常慢,仅在前面的步骤仍然无法让 ceph-volume raw list 识别为"空盘"时使用
# sudo dd if=/dev/zero of="$DISK" bs=1M status=progress
# sync

四、验证卸载完成

# 1. 检查命名空间是否已删除
kubectl get namespace "${NAMESPACE}"

# 2. 检查是否还有相关的 CRD
kubectl get crds | grep ceph

# 3. 在节点上检查残留文件
echo "=== 检查节点上是否还有残留文件 ==="
ls -la /var/lib/rook 2>/dev/null || echo "没有 /var/lib/rook 目录"
ls -la /dev/mapper/ceph-* 2>/dev/null || echo "没有 /dev/mapper/ceph-* 设备"
ls -la /dev/ceph-* 2>/dev/null || echo "没有 /dev/ceph-* 设备"
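如计划复用磁盘重新部署,还可以确认 OSD 磁盘已恢复为空盘(示例,假设数据盘为 /dev/sdb):

# 4. 在 Worker 节点上检查磁盘签名,应无任何输出
sudo wipefs -n /dev/sdb
lsblk -f /dev/sdb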

📞 技术支持

如果在部署过程中遇到任何问题,请提供以下信息以便诊断:

  1. 错误信息和命令输出
  2. 集群状态:kubectl -n rook-ceph get all
  3. Pod 详细信息:kubectl -n rook-ceph describe pod <pod-name>
  4. 配置文件内容(脱敏敏感信息)

📚 参考文献与资源

官方文档

  • Rook 官方文档:https://rook.io/docs/
  • Rook Helm Chart 仓库:https://charts.rook.io/release
  • Ceph 官方文档:https://docs.ceph.com/

📝 版本信息

文档版本:v1.0
最后更新:2025 年 7 月
适用版本

  • Kubernetes:v1.28-v1.33(推荐 v1.28+)
  • 架构:1 Master + 2 Worker
  • Rook Ceph:v1.17.6
  • Helm:3.x