[Load Balancing] Uneven load distribution: with 4 pods configured, all 100 concurrent requests of a load test hit the same pod — troubleshooting & solutions
1. Root Cause Analysis
1.1 How Kubernetes Service Load Balancing Works
Core issue: a Kubernetes Service (via kube-proxy) balances traffic per connection, not per request. The backend pod is chosen when a TCP connection is opened, and every request reusing that connection goes to the same pod, so the default behavior may not suit your load-test scenario.
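To make this concrete, here is a minimal, self-contained simulation (a sketch only, no Kubernetes involved): each new connection is assigned to a random pod, and every request on that connection stays on that pod. The pod count and request counts are illustrative.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public class ConnectionLevelBalancingSim {

    static final int PODS = 4;

    // kube-proxy style: the backend pod is picked once per connection, not per request
    static Map<Integer, Integer> simulate(int connections, int requestsPerConnection) {
        Map<Integer, Integer> perPod = new HashMap<>();
        for (int c = 0; c < connections; c++) {
            int pod = ThreadLocalRandom.current().nextInt(PODS);      // chosen at connection setup
            perPod.merge(pod, requestsPerConnection, Integer::sum);   // every request on the connection hits that pod
        }
        return perPod;
    }

    public static void main(String[] args) {
        // One keep-alive connection carrying 100 requests: everything lands on a single pod
        System.out.println("1 connection x 100 requests : " + simulate(1, 100));
        // 100 short-lived connections with 1 request each: roughly even spread
        System.out.println("100 connections x 1 request : " + simulate(100, 1));
    }
}

With a single keep-alive connection all 100 requests land on one pod, which is exactly the symptom in the title; with 100 separate connections the spread is roughly even.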
2. Step-by-Step Troubleshooting
2.1 Check the Current Service Configuration
# Inspect the Service configuration in detail
kubectl describe service your-service-name
# Check the Endpoints behind the Service
kubectl get endpoints your-service-name
# Check pod labels and status
kubectl get pods -l app=your-app-label -o wide
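If you would rather run the same checks from code, a sketch using the fabric8 KubernetesClient (the same client used later in section 4.2, assuming fabric8 6.x) might look like this; the namespace, service name, and label are placeholders.

import io.fabric8.kubernetes.api.model.Endpoints;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class EndpointInspector {

    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // Which pod IPs is the Service actually routing to?
            Endpoints eps = client.endpoints().inNamespace("default").withName("your-service-name").get();
            eps.getSubsets().forEach(subset ->
                    subset.getAddresses().forEach(addr ->
                            System.out.println("ready endpoint: " + addr.getIp() + " -> "
                                    + (addr.getTargetRef() != null ? addr.getTargetRef().getName() : "?"))));

            // Where are the pods scheduled, and are they all Running?
            client.pods().inNamespace("default").withLabel("app", "your-app-label").list().getItems()
                    .forEach(pod -> System.out.println(pod.getMetadata().getName()
                            + " on " + pod.getSpec().getNodeName()
                            + " (" + pod.getStatus().getPhase() + ")"));
        }
    }
}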
2.2 Java Code: Diagnosing the Request Distribution
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

/**
 * Load-balancing diagnostic tool.
 */
@Component
public class LoadBalancerDiagnoser {

    private static final Logger log = LoggerFactory.getLogger(LoadBalancerDiagnoser.class);

    @Autowired
    private RestTemplate restTemplate;

    @Value("${service.url}")
    private String serviceUrl;

    /**
     * Fire requestCount concurrent requests and report how they are spread across pods.
     */
    public void diagnoseLoadDistribution(int requestCount) throws InterruptedException {
        Map<String, Integer> podRequestCount = new ConcurrentHashMap<>();
        CountDownLatch latch = new CountDownLatch(requestCount);

        for (int i = 0; i < requestCount; i++) {
            CompletableFuture.runAsync(() -> {
                try {
                    // Send a request and record which pod answered it
                    ResponseEntity<String> response = restTemplate.getForEntity(
                            serviceUrl + "/debug/pod-info", String.class);
                    // Extract the pod name from the response body
                    String podName = extractPodName(response.getBody());
                    podRequestCount.merge(podName, 1, Integer::sum);
                } catch (Exception e) {
                    log.error("Request failed", e);
                } finally {
                    latch.countDown();
                }
            });
        }
        latch.await();

        // Print the distribution
        log.info("Request distribution:");
        podRequestCount.forEach((pod, count) -> {
            double percentage = (double) count / requestCount * 100;
            log.info("Pod {}: {} requests ({}%)", pod, count, String.format("%.2f", percentage));
        });
    }

    /**
     * Parse the pod name out of the /debug/pod-info JSON response.
     */
    private String extractPodName(String body) {
        // The debug endpoint returns {"podName":"...","timestamp":"..."}; a simple substring parse is enough here
        int start = body.indexOf("\"podName\":\"") + "\"podName\":\"".length();
        int end = body.indexOf('"', start);
        return start > 10 && end > start ? body.substring(start, end) : "unknown";
    }

    /**
     * Debug endpoint added to the application: returns information about the current pod.
     */
    @RestController
    public static class DebugController {

        @Value("${HOSTNAME:unknown}")
        private String podName;

        @GetMapping("/debug/pod-info")
        public Map<String, String> getPodInfo() {
            Map<String, String> info = new HashMap<>();
            info.put("podName", podName);
            info.put("timestamp", Instant.now().toString());
            return info;
        }
    }
}
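A simple way to drive the diagnoser is a CommandLineRunner that fires a batch of requests at startup; this runner, and the fixed count of 100, are only an illustrative sketch.

import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DiagnoserRunnerConfig {

    // Runs once at startup and logs how 100 requests are distributed across pods
    @Bean
    public CommandLineRunner diagnoseOnStartup(LoadBalancerDiagnoser diagnoser) {
        return args -> diagnoser.diagnoseLoadDistribution(100);
    }
}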
3. Common Causes and Solutions
3.1 Cause 1: Session Affinity Is Enabled
How to check:
kubectl get service your-service -o yaml
# Look at the spec.sessionAffinity field
Solution:
apiVersion: v1
kind: Service
metadata:
  name: your-service
spec:
  sessionAffinity: None   # Must be None, not ClientIP
  selector:
    app: your-app
  ports:
    - port: 80
      targetPort: 8080
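If you manage Services from code, a sketch using the fabric8 client (the same client as in section 4.2, assuming a version where Resource.edit(UnaryOperator) is available) can check the field and switch it back to None; the namespace and service name are placeholders.

import io.fabric8.kubernetes.api.model.Service;
import io.fabric8.kubernetes.api.model.ServiceBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class SessionAffinityFixer {

    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            Service svc = client.services().inNamespace("default").withName("your-service").get();
            System.out.println("current sessionAffinity = " + svc.getSpec().getSessionAffinity());

            if ("ClientIP".equals(svc.getSpec().getSessionAffinity())) {
                // Switch affinity back to None so requests are no longer pinned by client IP
                client.services().inNamespace("default").withName("your-service")
                        .edit(s -> new ServiceBuilder(s)
                                .editSpec().withSessionAffinity("None").endSpec()
                                .build());
                System.out.println("sessionAffinity set to None");
            }
        }
    }
}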
3.2 Cause 2: Limitations of iptables-Mode Load Balancing
By default kube-proxy balances traffic with iptables rules, which pick a backend at random for each new connection; with few or long-lived connections the resulting spread can be quite uneven.
Solution: switch to ipvs mode
# Check the current proxy mode
kubectl get configmap -n kube-system kube-proxy -o yaml | grep mode
# If the mode is still iptables (or empty), update the configuration to switch to ipvs
Update the KubeProxyConfiguration section of the kube-proxy ConfigMap:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"        # Use ipvs instead of iptables
ipvs:
  scheduler: "lc"   # Least-connection scheduler, distributes more evenly
After changing the mode, make sure the ipvs kernel modules are loaded on the nodes and restart kube-proxy (kubectl rollout restart daemonset kube-proxy -n kube-system).
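To verify the mode from code, a sketch that reads the kube-proxy ConfigMap with the fabric8 client follows; it assumes a kubeadm-style cluster where the configuration lives under the config.conf key of that ConfigMap, which is not true of every distribution.

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class KubeProxyModeCheck {

    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // kubeadm clusters keep the KubeProxyConfiguration in the kube-proxy ConfigMap, key "config.conf"
            ConfigMap cm = client.configMaps().inNamespace("kube-system").withName("kube-proxy").get();
            String conf = cm.getData().get("config.conf");
            conf.lines()
                .filter(line -> line.trim().startsWith("mode:") || line.trim().startsWith("scheduler:"))
                .forEach(System.out::println);
        }
    }
}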
3.3 Cause 3: Client-Side Connection Pool Reuse
Problem analysis: kube-proxy balances per connection, so an HTTP client that keeps reusing the same pooled keep-alive connection sends every request to the same pod.
Java client-side solution:
import java.time.Duration;
import java.util.concurrent.TimeUnit;

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.ClientHttpRequestFactory;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.reactive.function.client.ClientRequest;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

@Configuration
public class LoadBalancerConfig {

    /**
     * RestTemplate whose pooled connections are recycled regularly,
     * so requests are spread over fresh connections (and therefore over pods).
     */
    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate(createRecyclingRequestFactory());
    }

    private ClientHttpRequestFactory createRecyclingRequestFactory() {
        // Key point: cap the lifetime of pooled connections so they do not stay pinned to one pod
        CloseableHttpClient httpClient = HttpClients.custom()
                .setMaxConnTotal(100)                            // max connections in the pool
                .setMaxConnPerRoute(100)                         // max connections per route
                .setConnectionTimeToLive(30, TimeUnit.SECONDS)   // recycle connections after 30s
                .evictIdleConnections(20, TimeUnit.SECONDS)      // evict idle connections, avoids long-lived ones
                .build();
        return new HttpComponentsClientHttpRequestFactory(httpClient);
    }

    /**
     * Reactive WebClient with a connection pool tuned to avoid long-lived sticky connections.
     */
    @Bean
    public WebClient loadBalancedWebClient() {
        ConnectionProvider provider = ConnectionProvider.builder("lb-connection-pool")
                .maxConnections(100)                             // max connections
                .maxIdleTime(Duration.ofSeconds(20))             // max idle time
                .maxLifeTime(Duration.ofMinutes(5))              // max connection lifetime
                .pendingAcquireTimeout(Duration.ofSeconds(10))
                .evictInBackground(Duration.ofSeconds(30))
                .build();

        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(HttpClient.create(provider)))
                .filter((request, next) -> {
                    // Tag every request with a timestamp to avoid intermediate caching
                    return next.exchange(ClientRequest.from(request)
                            .header("X-Request-Timestamp", String.valueOf(System.currentTimeMillis()))
                            .build());
                })
                .build();
    }
}
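For a diagnostic run you can go further and rule out connection reuse entirely, so every request opens a new TCP connection and gets a fresh backend pick from kube-proxy. A sketch using Apache HttpClient's NoConnectionReuseStrategy (bean name is illustrative; too expensive for normal production traffic):

import org.apache.http.impl.NoConnectionReuseStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class NoReuseRestTemplateConfig {

    /**
     * RestTemplate that never reuses a connection: each request opens a new TCP connection,
     * so kube-proxy makes a fresh backend choice every time.
     */
    @Bean
    public RestTemplate nonStickyRestTemplate() {
        CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE) // close after each response
                .build();
        return new RestTemplate(new HttpComponentsClientHttpRequestFactory(httpClient));
    }
}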
3.4 Cause 4: DNS Caching
Solution: tune the DNS caching behavior
import java.util.concurrent.TimeUnit;

import javax.annotation.PostConstruct;

import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import reactor.netty.http.client.HttpClient;

@Configuration
public class DnsConfig {

    @Bean
    public HttpClient httpClientWithDns() {
        return HttpClient.create()
                .resolver(spec -> spec.roundRobinSelection(true)) // round-robin across resolved addresses
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000)
                .doOnConnected(conn ->
                        conn.addHandlerLast(new ReadTimeoutHandler(5000, TimeUnit.MILLISECONDS)));
    }

    // Shorten the JVM-level DNS cache so address changes are picked up quickly
    @PostConstruct
    public void setDnsCacheSettings() {
        // Cache successful lookups for 10 seconds, failed lookups for 5 seconds
        java.security.Security.setProperty("networkaddress.cache.ttl", "10");
        java.security.Security.setProperty("networkaddress.cache.negative.ttl", "5");
    }
}
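Note that DNS-level distribution only helps if the client actually sees multiple addresses: a regular ClusterIP Service resolves to a single virtual IP, while a headless Service (clusterIP: None) resolves to one A record per ready pod. A minimal sketch that resolves all records and picks one at random; the headless service name used here is an assumption.

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ThreadLocalRandom;

public class HeadlessServiceResolver {

    /**
     * Resolve all pod IPs behind a headless Service and pick one at random.
     * With clusterIP: None, the cluster DNS returns one A record per ready pod.
     */
    public static String pickPodUrl(String headlessServiceHost, int port) throws UnknownHostException {
        InetAddress[] addresses = InetAddress.getAllByName(headlessServiceHost);
        InetAddress chosen = addresses[ThreadLocalRandom.current().nextInt(addresses.length)];
        return "http://" + chosen.getHostAddress() + ":" + port;
    }

    public static void main(String[] args) throws UnknownHostException {
        // e.g. "your-service-headless.default.svc.cluster.local" (name is illustrative)
        System.out.println(pickPodUrl("your-service-headless.default.svc.cluster.local", 8080));
    }
}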
4. Advanced Solutions
4.1 Smarter Load Balancing with a Service Mesh (Istio)
Because the Envoy sidecar proxies HTTP at layer 7, it balances per request rather than per connection, which removes the stickiness problem at its root.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: your-service
spec:
  host: your-service.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN          # Least-connection algorithm
    connectionPool:
      tcp:
        maxConnections: 100
        connectTimeout: 30ms
      http:
        http1MaxPendingRequests: 1024
        maxRequestsPerConnection: 1024
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: your-service
spec:
  hosts:
    - your-service.default.svc.cluster.local
  http:
    - route:
        - destination:
            host: your-service.default.svc.cluster.local
          weight: 100
4.2 Client-Side Load-Balancing Strategy
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

import io.fabric8.kubernetes.client.KubernetesClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

/**
 * Custom client-side load balancer: round-robin over the pod IPs behind the Service.
 */
@Component
public class CustomLoadBalancer {

    private final AtomicInteger counter = new AtomicInteger(0);
    // Read-heavy list of pod IPs, refreshed periodically
    private final List<String> availablePods = new CopyOnWriteArrayList<>();

    @Autowired
    private KubernetesClient kubernetesClient;   // fabric8 Kubernetes client

    @Scheduled(fixedRate = 30000) // Refresh the pod list every 30 seconds (requires @EnableScheduling)
    public void refreshPodList() {
        // Fetch the currently available pod IPs from the Kubernetes API
        List<String> currentPods = kubernetesClient.pods()
                .inNamespace("default")
                .withLabels(Collections.singletonMap("app", "your-app"))
                .list()
                .getItems()
                .stream()
                .map(pod -> pod.getStatus().getPodIP())
                .collect(Collectors.toList());
        availablePods.clear();
        availablePods.addAll(currentPods);
    }

    /**
     * Round-robin pod selection.
     */
    public String choosePod() {
        if (availablePods.isEmpty()) {
            throw new IllegalStateException("No available pods");
        }
        // Simple round-robin; floorMod keeps the index non-negative if the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), availablePods.size());
        return availablePods.get(index);
    }

    /**
     * Execute an HTTP call against the pod chosen by this load balancer.
     */
    public <T> T executeWithLoadBalance(Function<String, T> requestFunction) {
        String targetPod = choosePod();
        String url = "http://" + targetPod + ":8080";
        return requestFunction.apply(url);
    }
}
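A short usage sketch for the balancer above, calling the chosen pod with a plain RestTemplate; the OrderClient class and the /api/orders path are illustrative.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class OrderClient {

    @Autowired
    private CustomLoadBalancer loadBalancer;

    @Autowired
    private RestTemplate restTemplate;

    /** Sends the request to whichever pod the custom load balancer picks. */
    public String fetchOrders() {
        return loadBalancer.executeWithLoadBalance(baseUrl ->
                restTemplate.getForObject(baseUrl + "/api/orders", String.class));
    }
}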
5. Load-Test Script Tuning
5.1 Make Sure the Load-Test Tool Is Configured Correctly
# wrk: use multiple connections (-c) so requests arrive over many TCP connections
wrk -t12 -c100 -d30s http://your-service/
# Apache Bench: keep-alive is off unless -k is passed; forcing Connection: close makes that explicit
ab -n 1000 -c 100 -H "Connection: close" http://your-service/
# hey (a more modern alternative): disable keep-alive
hey -n 1000 -c 100 -disable-keepalive http://your-service
5.2 Tuning the Java Load-Test Client
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * Load-test client tuned so that requests are spread evenly across pods.
 */
public class BalancedLoadTest {

    public void performLoadTest() throws InterruptedException {
        int concurrentUsers = 100;
        int requestsPerUser = 100;
        CountDownLatch latch = new CountDownLatch(concurrentUsers);
        Map<String, AtomicInteger> requestDistribution = new ConcurrentHashMap<>();

        // Give every concurrent user its own HTTP client (and therefore its own connection)
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < concurrentUsers; i++) {
            final int userIndex = i;
            CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
                // One client per user: a single connection, no connection-state tracking
                try (CloseableHttpClient httpClient = HttpClients.custom()
                        .setMaxConnTotal(1)            // one connection per client
                        .setMaxConnPerRoute(1)
                        .disableConnectionState()      // disable connection-state tracking
                        .build()) {
                    for (int j = 0; j < requestsPerUser; j++) {
                        // Hit the debug endpoint from section 2.2 so we can tell which pod answered
                        HttpGet request = new HttpGet("http://your-service/debug/pod-info");
                        // Avoid any caching along the way
                        request.setHeader("Cache-Control", "no-cache");
                        request.setHeader("User-Agent", "LoadTest-User-" + userIndex);
                        try (CloseableHttpResponse response = httpClient.execute(request)) {
                            String podInfo = EntityUtils.toString(response.getEntity());
                            String podName = extractPodName(podInfo);
                            requestDistribution
                                    .computeIfAbsent(podName, k -> new AtomicInteger())
                                    .incrementAndGet();
                        }
                        // Add a small random delay so requests are not perfectly synchronized
                        Thread.sleep(ThreadLocalRandom.current().nextInt(10));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    latch.countDown();
                }
            });
            futures.add(future);
        }
        latch.await(2, TimeUnit.MINUTES);

        // Print the distribution report
        System.out.println("=== Load Distribution Report ===");
        requestDistribution.forEach((pod, count) -> {
            double percentage = (double) count.get() / (concurrentUsers * requestsPerUser) * 100;
            System.out.printf("Pod %s: %d requests (%.2f%%)%n", pod, count.get(), percentage);
        });
    }

    /**
     * Parse the pod name out of the /debug/pod-info JSON response.
     */
    private String extractPodName(String body) {
        int start = body.indexOf("\"podName\":\"") + "\"podName\":\"".length();
        int end = body.indexOf('"', start);
        return start > 10 && end > start ? body.substring(start, end) : "unknown";
    }
}
6. Monitoring and Verification
6.1 Monitor the Request Distribution in Real Time
# Check each pod's request counter in its logs
kubectl get pods -l app=your-app -o name | xargs -I {} sh -c 'kubectl logs {} --tail=10 | grep REQUEST_COUNT'
# Prometheus: request rate per pod
sum(rate(http_requests_total[1m])) by (pod)
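For the Prometheus query above to return data, the application has to expose a request counter (the pod label itself comes from the Prometheus Kubernetes scrape configuration). With Spring Boot Actuator and Micrometer you can simply query the built-in http_server_requests metric instead; a hand-rolled counter would be a sketch like the following, assuming Spring Boot 2.x with the javax servlet API and a Prometheus MeterRegistry on the classpath.

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class RequestCountFilter implements Filter {

    private final Counter requestCounter;

    public RequestCountFilter(MeterRegistry registry) {
        // Counters get a _total suffix under the Prometheus naming convention,
        // so this is exposed as http_requests_total
        this.requestCounter = Counter.builder("http_requests")
                .description("Total HTTP requests handled by this pod")
                .register(registry);
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        requestCounter.increment();
        chain.doFilter(request, response);
    }
}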
6.2 Verify That the Fix Works
After applying any of the solutions above, rerun the load test and check the distribution:
# Example of the expected result
Pod your-app-7cbbf5d56f-abcde: 1250 requests (25.00%)
Pod your-app-7cbbf5d56f-fghij: 1248 requests (24.96%)
Pod your-app-7cbbf5d56f-klmno: 1252 requests (25.04%)
Pod your-app-7cbbf5d56f-pqrst: 1250 requests (25.00%)
Summary
This problem is usually caused by one of the following, listed in the order you should check them:
- Session Affinity configuration (most common)
- Client-side connection pool reuse
- iptables load-balancing limitations
- DNS caching
- Misconfigured load-test tooling
Recommended order of fixes:
- Check for and disable Session Affinity
- Tune the client connection pool
- Consider switching kube-proxy to ipvs mode
- Use a service mesh for advanced load balancing
With systematic troubleshooting and tuning, traffic can be spread evenly across all pods.
This post is from 博客园 (cnblogs). Author: NeoLshu. Please credit the original link when reposting: https://www.cnblogs.com/neolshu/p/19120272
