1 问题现象
1.1 数据库集群是单节多副本集群
clickhouse-prd-ClickHousefvLb.mrs-feq4.com :) select * from system.clusters; SELECT * FROM system.clusters Query id: b77fa863-893c-475f-b93a-71f73d790b27 ┌─cluster─────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────────────────────────────────┬─host_address───┬─port─┬─is_local─┬─user─┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐ │ default_cluster │ 1 │ 1 │ 1 │ clickhouse-prd-ClickHouseOgKV.mrs-feq4.com │ 192.168.23.26 │ 9440 │ 0 │ │ │ 0 │ 0 │ │ default_cluster │ 1 │ 1 │ 2 │ clickhouse-prd-ClickHousefvLb.mrs-feq4.com │ 192.168.23.247 │ 9440 │ 0 │ │ │ 0 │ 0 │ └─────────────────┴───────────┴──────────────┴─────────────┴────────────────────────────────────────────┴────────────────┴──────┴──────────┴──────┴──────────────────┴──────────────┴─────────────────────────┘ 2 rows in set. Elapsed: 0.034 sec.
1.2 其中192.168.23.247 replica_num 2 大量truncate操作卡住
2 问题分析
2.1 查看数据库是否有未完成的合并(merge)或变更(mutation)任务
2.2 查看zookeeper信息
[zk: 192.168.23.241:2181(CONNECTED) 61] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx [block_numbers, blocks, columns, leader_election, log, metadata, mutations, nonincrement_block_numbers, quorum, replicas, temp] [zk: 192.168.23.241:2181(CONNECTED) 62] [zk: 192.168.23.241:2181(CONNECTED) 62] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/block_numbers [all] [zk: 192.168.23.241:2181(CONNECTED) 63] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/blocks [] [zk: 192.168.23.241:2181(CONNECTED) 64] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/replicas [1, 2] [zk: 192.168.23.241:2181(CONNECTED) 65] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/log [log-0000315340, log-0000315341, log-0000315342, log-0000315343, log-0000315344, log-0000315345, log-0000315346, log-0000315347, log-0000315348, log-0000315349] [zk: 192.168.23.241:2181(CONNECTED) 66] [zk: 192.168.23.241:2181(CONNECTED) 66] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/nonincrement_block_numbers [] [zk: 192.168.23.241:2181(CONNECTED) 67] ls /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/xxx/mutations []
3 问题处理
3.1 zk处理
1. 清理 ZooKeeper 操作日志
步骤 1.1:删除 /log
节点
# 进入 ZooKeeper 客户端
deleteall /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/log
步骤 1.2:重置 /block_numbers
# 清理残留的块编号(避免冲突)
deleteall /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/block_numbers
2. 检查并修复副本状态
步骤 2.1:确认副本是否活跃
检查副本 2(对应分片 2
)的 is_active
状态:
get /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/replicas/2/is_active
-
如果返回
0
,手动激活副本:set /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/replicas/2/is_active 1
步骤 2.2:重置副本队列
# 删除副本的队列数据(强制重新同步)
rmr /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/replicas/2/queue
3. 强制重置表元数据
步骤 3.1:删除表 ZooKeeper 路径
rmr /clickhouse/tables/27/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local
步骤 3.2:在副本 2 上删除本地数据
# 停止 ClickHouse
sudo service clickhouse-server stop
# 删除数据目录(路径根据实际配置调整)
rm -rf /srv/BigData/data1/clickhouse/data/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local/
# 重启服务
sudo service clickhouse-server start
4. 重建表结构
在副本 2 上执行以下 SQL:
-- 删除表(如果存在)
DROP TABLE IF EXISTS app.app_yk_dws_zt_v_settlement_mb_inv_da_new_local SYNC;
-- 重建表(使用原始表结构)
CREATE TABLE app.app_yk_dws_zt_v_settlement_mb_inv_da_new_local (
`setl_id` String COMMENT '原类型:character varying',
-- 其他字段省略,需按实际结构填写
)
ENGINE = ReplicatedReplacingMergeTree(
'/clickhouse/tables/{shard}/d5252602-4936-4531-83c4-6e28d372b8de/app/app_yk_dws_zt_v_settlement_mb_inv_da_new_local',
'{replica}'
)
ORDER BY setl_id;
5. 异步执行 TRUNCATE
避免使用 SYNC
模式:
TRUNCATE TABLE app.app_yk_dws_zt_v_settlement_mb_inv_da_new_local; -- 异步模式
6. 升级 ClickHouse 版本
当前版本 21.3.4.25
存在复制表同步缺陷,升级到 22.3+
:
# 下载新版 RPM 包
curl -O https://packages.clickhouse.com/rpm/stable/clickhouse-server-22.3.2.1.noarch.rpm
# 执行升级
sudo rpm -Uvh clickhouse-server-22.3.2.1.noarch.rpm
# 重启服务
sudo service clickhouse-server restart
3.2 验证步骤