1 问题现象

1.1 数据库集群是单节多副本集群

clickhouse-prd-ClickHousefvLb.mrs-feq4.com :) select * from system.clusters;

SELECT *
FROM system.clusters

Query id: b77fa863-893c-475f-b93a-71f73d790b27

┌─cluster─────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────────────────────────────────┬─host_address───┬─port─┬─is_local─┬─user─┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ default_cluster │         111 │ clickhouse-prd-ClickHouseOgKV.mrs-feq4.com │ 192.168.23.2694400 │      │                  │            00 │
│ default_cluster │         112 │ clickhouse-prd-ClickHousefvLb.mrs-feq4.com │ 192.168.23.24794400 │      │                  │            00 │
└─────────────────┴───────────┴──────────────┴─────────────┴────────────────────────────────────────────┴────────────────┴──────┴──────────┴──────┴──────────────────┴──────────────┴─────────────────────────┘

2 rows in set. Elapsed: 0.034 sec.

1.2 其中192.168.23.247  replica_num  2 大量truncate操作卡住

2 问题分析

2.1 查看数据库是否有未完成的合并(merge)或变更(mutation)任务

2.2 查看zookeeper信息

[zk: 192.168.23.241:2181(CONNECTED) 61] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx
[block_numbers, blocks, columns, leader_election, log, metadata, mutations, nonincrement_block_numbers, quorum, replicas, temp]
[zk: 192.168.23.241:2181(CONNECTED) 62]
[zk: 192.168.23.241:2181(CONNECTED) 62] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/block_numbers
[all]
[zk: 192.168.23.241:2181(CONNECTED) 63] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/blocks
[]
[zk: 192.168.23.241:2181(CONNECTED) 64] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/replicas
[1, 2]
[zk: 192.168.23.241:2181(CONNECTED) 65] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/log
[log-0000315340, log-0000315341, log-0000315342, log-0000315343, log-0000315344, log-0000315345, log-0000315346, log-0000315347, log-0000315348, log-0000315349]
[zk: 192.168.23.241:2181(CONNECTED) 66]
[zk: 192.168.23.241:2181(CONNECTED) 66] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/nonincrement_block_numbers
[]
[zk: 192.168.23.241:2181(CONNECTED) 67] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/mutations
[]

3 问题处理

3.1 zk处理

1. 清理 ZooKeeper 操作日志

步骤 1.1:删除 /log 节点

# 进入 ZooKeeper 客户端
deleteall /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/log

步骤 1.2:重置 /block_numbers

# 清理残留的块编号(避免冲突)
deleteall /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/block_numbers

2. 检查并修复副本状态

步骤 2.1:确认副本是否活跃

检查副本 2(对应分片 2)的 is_active 状态:

get /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/replicas/2/is_active
  • 如果返回 0,手动激活副本:

    set /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/replicas/2/is_active 1

步骤 2.2:重置副本队列

# 删除副本的队列数据(强制重新同步)
rmr /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/replicas/2/queue

3. 强制重置表元数据

步骤 3.1:删除表 ZooKeeper 路径

rmr /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local

步骤 3.2:在副本 2 上删除本地数据

# 停止 ClickHouse
sudo service clickhouse-server stop

# 删除数据目录(路径根据实际配置调整)
rm -rf /srv/BigData/data1/clickhouse/data/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/

# 重启服务
sudo service clickhouse-server start

4. 重建表结构

在副本 2 上执行以下 SQL:

-- 删除表(如果存在)
DROP TABLE IF EXISTS app.app_yk_dws_zt_v_settlement_mb_inv_da_new_local SYNC;

-- 重建表(使用原始表结构)
CREATE TABLE app.app_yk_dws_zt_v_settlement_mb_inv_da_new_local (
    `setl_id` String COMMENT '原类型:character varying',
    -- 其他字段省略,需按实际结构填写
) 
ENGINE = ReplicatedReplacingMergeTree(
  '/clickhouse/tables/{shard}/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local', 
  '{replica}'
) 
ORDER BY setl_id;

5. 异步执行 TRUNCATE

避免使用 SYNC 模式:

TRUNCATE TABLE app.app_yk_dws_zt_v_settlement_mb_inv_da_new_local; -- 异步模式

6. 升级 ClickHouse 版本

当前版本 21.3.4.25 存在复制表同步缺陷,升级到 22.3+

# 下载新版 RPM 包
curl -O https://packages.clickhouse.com/rpm/stable/clickhouse-server-22.3.2.1.noarch.rpm

# 执行升级
sudo rpm -Uvh clickhouse-server-22.3.2.1.noarch.rpm

# 重启服务
sudo service clickhouse-server restart

 

3.2 验证步骤

 

 posted on 2025-05-21 10:59  xibuhaohao  阅读(65)  评论(0)    收藏  举报