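Performance issue: SuperInsight fails to bring a node under management
Problem: there is no obvious exception, but the log timestamps show severe timeouts.
Diagnosis:
1. Check the exception logs and the thread logs
2. Check system resources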
- Get the uuid
cat webservice.log | grep "collect start: manager:3" | grep GdbClusterCollectTask
- Filter by uuid (replace uuid with the value found in the previous step)
cat webservice.log | grep uuid | grep -E "WARN|ERROR"
cat webservice.log | grep -i "manager: 3" | grep -E "WARN|ERROR"
cat webservice.log | grep -i "manager 3" | grep -E "WARN|ERROR"
A variant that saves the matching lines to a file:
- Get the uuid
cat webservice.log | grep "cluster collect start: manager:3"
- Filter by uuid
cat webservice.log | grep uuid > uuid.log
- Check the first page and the last page of the log
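Comparing timestamps by eye in a long uuid.log is tedious, so the check can be scripted. The following is a minimal sketch, not part of SuperInsight: the class name LogGapFinder is hypothetical, and it assumes every log line starts with a timestamp in the form yyyy-MM-dd HH:mm:ss.SSS, which may not match the real webservice.log layout. It reports each pair of consecutive timestamped lines that are further apart than a threshold.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

class LogGapFinder {
    // ASSUMPTION: each log line begins with "yyyy-MM-dd HH:mm:ss.SSS" (23 characters).
    // Adjust the pattern and length to the actual log format.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    private static final int TS_LENGTH = 23;

    /** Lists every gap larger than thresholdMillis between consecutive timestamped lines. */
    static List<String> findGaps(List<String> lines, long thresholdMillis) {
        List<String> gaps = new ArrayList<>();
        LocalDateTime prev = null;
        String prevLine = null;
        for (String line : lines) {
            LocalDateTime ts = parseTimestamp(line);
            if (ts == null) {
                continue; // stack-trace or continuation line without a timestamp
            }
            if (prev != null) {
                long delta = Duration.between(prev, ts).toMillis();
                if (delta > thresholdMillis) {
                    gaps.add(delta + " ms after: " + prevLine);
                }
            }
            prev = ts;
            prevLine = line;
        }
        return gaps;
    }

    /** Returns the leading timestamp of a line, or null if the line does not start with one. */
    static LocalDateTime parseTimestamp(String line) {
        if (line.length() < TS_LENGTH) {
            return null;
        }
        try {
            return LocalDateTime.parse(line.substring(0, TS_LENGTH), FMT);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the filtered log, for example uuid.log
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        findGaps(lines, 5_000).forEach(System.out::println);
    }
}
```

Run against uuid.log, the line printed before each large gap is the last thing the task logged before it stalled, which is usually the call worth inspecting.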
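Resolution:
1. The machine hosting the managed gdb management node was at 99% CPU. This can slow down its interface responses, hold up SI's threads, and push the calls past the timeout limit.
2. The code calls a REST interface inside a for loop, so the latency of every call adds up.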
/**
 * Get tenant IP info (with timeout control).
 * @param timeout timeout in milliseconds
 */
public String getTenancyIpInfo(TenancyUserInfo tenancy, GdbManager gdbManager, long timeout) {
    if (tenancy == null || tenancy.getClusterId() == null) {
        log.debug("Skip null tenancy or tenancy without clusterId");
        return "";
    }
    String clusterId = tenancy.getClusterId().toString();
    // A dedicated single-thread executor runs the REST call so the caller can give up after the timeout.
    ExecutorService executor = Executors.newSingleThreadExecutor(
        new ThreadFactoryBuilder().setNameFormat("tenancy-ip-%d").build()
    );
    try {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(
            RequestIdMdcUtil.wraps(() -> doGetTenancyIpInfo(tenancy, gdbManager, clusterId)),
            executor
        );
        // The timeout parameter is in milliseconds, so the unit must be MILLISECONDS, not SECONDS.
        return future.get(timeout, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
        log.warn("Get tenancy IP timeout for clusterId={}, timeout={}ms", clusterId, timeout);
        return "";
    } catch (InterruptedException e) {
        // Restore the interrupt flag for the caller.
        Thread.currentThread().interrupt();
        log.error("Get tenancy IP interrupted for clusterId={}", clusterId, e);
        return "";
    } catch (ExecutionException e) {
        log.error("Get tenancy IP failed for clusterId={}", clusterId, e.getCause());
        return "";
    } finally {
        // Stop accepting work, wait briefly, then interrupt a task that is still running.
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
/**
 * Actual execution logic (extracted to make it easier to test).
 */
String doGetTenancyIpInfo(TenancyUserInfo tenancy, GdbManager gdbManager, String clusterId) {
    List<DnInfo> dnInfos = getDnInfos(gdbManager, clusterId);
    List<CnInfo> cnInfos = getCnInfos(gdbManager, clusterId);
    DnInfo masterDb = getMasterIp(dnInfos, tenancy.getInstanceType());
    if (masterDb == null || Strings.isBlank(masterDb.getDbIp()) || masterDb.getDbPort() == null) {
        log.warn("Failed to get master node for clusterId={}, instanceType={}",
            clusterId, tenancy.getInstanceType());
        return "";
    }
    List<String> tenantIpsList = buildTenantIpsList(masterDb, dnInfos, cnInfos);
    return String.join(";", tenantIpsList);
}
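A per-call timeout bounds each REST call, but when the method is still invoked one tenant at a time inside the for loop, the worst case remains the number of tenants multiplied by the timeout. One possible way to bound the whole loop is to submit all calls to a shared pool and wait against a single deadline. The following is only a sketch of that idea, not the code used in SuperInsight: the class BoundedFanOut and the method callAll are hypothetical names, and the lambdas stand in for calls such as doGetTenancyIpInfo. It relies on the standard ExecutorService.invokeAll overload with a timeout, which cancels any task that has not finished when the deadline passes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BoundedFanOut {
    /**
     * Runs all calls on the pool and waits at most totalTimeoutMillis for the whole batch.
     * A call that fails or misses the deadline contributes an empty string,
     * mirroring the "" fallback of the method above.
     */
    static List<String> callAll(ExecutorService pool,
                                List<Callable<String>> calls,
                                long totalTimeoutMillis) throws InterruptedException {
        // invokeAll returns once every task is done or the timeout expires;
        // tasks that are still pending at that point are cancelled.
        List<Future<String>> futures =
                pool.invokeAll(calls, totalTimeoutMillis, TimeUnit.MILLISECONDS);
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            try {
                results.add(f.get()); // does not block: each future is completed or cancelled
            } catch (CancellationException | ExecutionException e) {
                results.add("");
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Shared pool, created once instead of once per call.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> calls = new ArrayList<>();
            calls.add(() -> "10.0.0.1:3306");      // placeholder for a fast REST call
            calls.add(() -> {                      // placeholder for a slow REST call
                Thread.sleep(5_000);
                return "late";
            });
            System.out.println(callAll(pool, calls, 500));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With this shape the total wait is bounded by one timeout regardless of how many tenants are in the list, at the cost of sending concurrent requests to the management node. If that node is already CPU-bound, as in finding 1, a small pool size is the safer choice.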
