Performance issue -- SuperInsight fails to bring a node under management

Problem: no obvious exception appears in the logs, but the log timestamps show that calls are running far past their timeouts.

Diagnosis:
1. Check the exception logs and the thread logs

2. Check system resources

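  1. Get the uuid of the collection task
    cat webservice.log |grep "collect start: manager:3"|grep GdbClusterCollectTask

  2. Filter by that uuid (replace uuid with the value found in step 1) and keep only warnings and errors
    cat webservice.log |grep uuid |grep -E "WARN|ERROR"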

cat webservice.log|grep -i "manager: 3"|grep -E "WARN|ERROR"

cat webservice.log|grep -i "manager 3"|grep -E "WARN|ERROR"

(Alternative procedure:)

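  1. Get the uuid
    cat webservice.log |grep "cluster collect start: manager:3"
  2. Filter by that uuid (replace uuid with the value found in step 1) and save the result
    cat webservice.log |grep uuid > uuid.log
  3. Look at the first page and the last page of uuid.log to see when the collection started and when it ended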
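The timestamp comparison in step 3 can also be done programmatically. The following is a minimal sketch, not part of SuperInsight: the class name `LogGapFinder` and its methods are hypothetical, and it assumes every log entry begins with a timestamp of the form `2026-03-10 14:00:00.000`, which may not match the real `webservice.log` layout. It reports the total elapsed time and the position of the largest gap between consecutive entries, which is usually where a slow call is hiding.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

class LogGapFinder {
    // ASSUMPTION: each entry starts with "uuuu-MM-dd HH:mm:ss.SSS"; adjust to the real format.
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");
    static final int TS_LEN = 23;

    /** Parses the leading timestamp of each line; lines without one (stack traces) are skipped. */
    static List<LocalDateTime> timestamps(List<String> lines) {
        List<LocalDateTime> out = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < TS_LEN) {
                continue;
            }
            try {
                out.add(LocalDateTime.parse(line.substring(0, TS_LEN), TS));
            } catch (DateTimeParseException ignored) {
                // Not a timestamped line; skip it.
            }
        }
        return out;
    }

    /** Time between the first and the last timestamped line. */
    static Duration total(List<LocalDateTime> ts) {
        return ts.isEmpty() ? Duration.ZERO : Duration.between(ts.get(0), ts.get(ts.size() - 1));
    }

    /** Index of the entry that follows the largest gap, or -1 with fewer than two entries. */
    static int indexAfterLargestGap(List<LocalDateTime> ts) {
        int best = -1;
        Duration max = Duration.ZERO;
        for (int i = 1; i < ts.size(); i++) {
            Duration gap = Duration.between(ts.get(i - 1), ts.get(i));
            if (best == -1 || gap.compareTo(max) > 0) {
                max = gap;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LogGapFinder uuid.log
        List<LocalDateTime> ts = timestamps(Files.readAllLines(Paths.get(args[0])));
        System.out.println("total elapsed: " + total(ts));
        int i = indexAfterLargestGap(ts);
        if (i > 0) {
            System.out.println("largest gap ends at: " + ts.get(i));
        }
    }
}
```

Run against `uuid.log`, the entry printed for the largest gap is the first line logged after the stall, so the line just before it points at the call that was waiting.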

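Resolution:
1. The machine hosting the managed gdb management node was at 99% CPU. This can slow its API responses, hold up SI's threads, and push calls past the timeout limit.
2. The code calls a REST interface on every iteration of a for loop, so the per-call latency accumulates.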
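To illustrate the second finding: when a slow call is made once per loop iteration, the total wait is roughly the sum of all call latencies, whereas submitting the calls to a pool and waiting under one shared deadline bounds the total. The sketch below is a generic illustration under stated assumptions, not the SuperInsight code: `FanOut`, `sequential`, and `concurrent` are hypothetical names, and `call` stands in for whatever REST request the loop performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

class FanOut {
    /** One call per iteration: total latency is about the sum of the individual latencies. */
    static <T, R> List<R> sequential(List<T> items, Function<T, R> call) {
        List<R> results = new ArrayList<>();
        for (T item : items) {
            results.add(call.apply(item));
        }
        return results;
    }

    /** Submits every call first, then collects results under a single shared deadline. */
    static <T, R> List<R> concurrent(List<T> items, Function<T, R> call,
                                     ExecutorService pool, long timeoutMs, R fallback) {
        List<CompletableFuture<R>> futures = new ArrayList<>();
        for (T item : items) {
            futures.add(CompletableFuture.supplyAsync(() -> call.apply(item), pool));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> f : futures) {
            long remaining = Math.max(deadline - System.nanoTime(), 0L);
            try {
                results.add(f.get(remaining, TimeUnit.NANOSECONDS));
            } catch (TimeoutException | ExecutionException e) {
                // cancel() marks the future as cancelled; it does not interrupt the worker thread.
                f.cancel(true);
                results.add(fallback);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                results.add(fallback);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Simulated slow call standing in for the REST request (assumption for the demo).
        Function<Integer, String> slowCall = id -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "ip-" + id;
        };
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(concurrent(Arrays.asList(1, 2, 3, 4), slowCall, pool, 1000, ""));
        } finally {
            pool.shutdown();
        }
    }
}
```

The shared deadline is the design point: each `get` receives only the time that is left, so one slow endpoint cannot consume a full timeout per item.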

/**
     * Gets the tenant IP information (with timeout control).
     * @param timeout timeout in milliseconds
     */
    public String getTenancyIpInfo(TenancyUserInfo tenancy, GdbManager gdbManager, long timeout) {
        if (tenancy == null || tenancy.getClusterId() == null) {
            log.debug("Skip null tenancy or tenancy without clusterId");
            return "";
        }

        String clusterId = tenancy.getClusterId().toString();
        ExecutorService executor = Executors.newSingleThreadExecutor(
                new ThreadFactoryBuilder().setNameFormat("tenancy-ip-%d").build()
        );

        try {
            CompletableFuture<String> future = CompletableFuture.supplyAsync(
                    RequestIdMdcUtil.wraps(() -> doGetTenancyIpInfo(tenancy, gdbManager, clusterId)),
                    executor
            );

            // timeout is documented and logged in milliseconds, so wait in MILLISECONDS, not SECONDS
            return future.get(timeout, TimeUnit.MILLISECONDS);

        } catch (TimeoutException e) {
            log.warn("Get tenancy IP timeout for clusterId={}, timeout={}ms", clusterId, timeout);
            return "";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            log.error("Get tenancy IP interrupted for clusterId={}", clusterId, e);
            return "";
        } catch (ExecutionException e) {
            log.error("Get tenancy IP failed for clusterId={}", clusterId, e.getCause());
            return "";
        } finally {
            executor.shutdown();
            try {
                if (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
                    executor.shutdownNow();
                }
            } catch (InterruptedException e) {
                executor.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Actual lookup logic (extracted to make it easier to test).
     */
    String doGetTenancyIpInfo(TenancyUserInfo tenancy, GdbManager gdbManager, String clusterId) {
        List<DnInfo> dnInfos = getDnInfos(gdbManager, clusterId);
        List<CnInfo> cnInfos = getCnInfos(gdbManager, clusterId);

        DnInfo masterDb = getMasterIp(dnInfos, tenancy.getInstanceType());
        if (masterDb == null || Strings.isBlank(masterDb.getDbIp()) || masterDb.getDbPort() == null) {
            log.warn("Failed to get master node for clusterId={}, instanceType={}",
                    clusterId, tenancy.getInstanceType());
            return "";
        }

        List<String> tenantIpsList = buildTenantIpsList(masterDb, dnInfos, cnInfos);
        return String.join(";", tenantIpsList);
    }
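One possible refinement, offered as a sketch rather than the author's implementation: `getTenancyIpInfo` creates and tears down a single-thread executor on every call, which adds thread churn when it is invoked inside a loop, and on timeout the submitted task keeps running until the executor is shut down. A shared pool with explicit cancellation avoids both. The names `TimeoutRunner` and `callWithTimeout` are hypothetical, and the pool is assumed to be created once and owned by the caller.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class TimeoutRunner {
    private final ExecutorService pool; // shared across calls, owned by the caller

    TimeoutRunner(ExecutorService pool) {
        this.pool = pool;
    }

    /** Runs the task on the shared pool and returns fallback on timeout, failure, or interrupt. */
    <T> T callWithTimeout(Supplier<T> task, long timeoutMs, T fallback) {
        Callable<T> callable = task::get;
        Future<T> future = pool.submit(callable);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker thread so a blocking call can stop early.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            TimeoutRunner runner = new TimeoutRunner(pool);
            // A fast task completes well inside the 500 ms budget.
            System.out.println(runner.callWithTimeout(() -> "10.0.0.1:3306", 500, ""));
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike a task started with `CompletableFuture.supplyAsync`, a `Future` returned by `ExecutorService.submit` does interrupt its worker on `cancel(true)`, which is why this sketch uses `submit`.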

posted @ 2026-03-11 09:51  静水深耕,云停风驻