Environment

[root@worker ~]# uname -a
Linux worker 3.10.0-1160.119.1.el7.x86_64 #1 SMP Tue Jun 4 14:43:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@worker ~]# cat /etc/redhat-release 
CentOS Linux release 7.9.2009 (Core)

Symptoms

[root@minikube ~]# kubectl get all -n trafficguard
NAME                    READY   STATUS    RESTARTS   AGE
pod/myapp-pod           1/1     Running   0          11m
pod/trafficguard-db-0   1/1     Running   0          13m

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/db-service   ClusterIP   10.102.252.92   <none>        3306/TCP   13m

NAME                               READY   AGE
statefulset.apps/trafficguard-db   1/1     13m

[root@minikube ~]# kubectl get pod -n trafficguard -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
myapp-pod           1/1     Running   0          12m   10.244.1.97   worker   <none>           <none>
trafficguard-db-0   1/1     Running   0          13m   10.244.1.95   worker   <none>           <none>

[root@minikube ~]# kubectl describe service db-service -n trafficguard
Name:              db-service
Namespace:         trafficguard
Labels:            <none>
Annotations:       <none>
Selector:          app=db
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.102.252.92
IPs:               10.102.252.92
Port:              <unset>  3306/TCP
TargetPort:        3306/TCP
Endpoints:         10.244.1.95:3306
Session Affinity:  None
Events:            <none>

[root@minikube ~]# kubectl get nodes
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   10d   v1.28.15
worker     Ready    <none>          10d   v1.28.15

[root@minikube ~]# kubectl exec -it myapp-pod -n trafficguard -- /bin/sh
/ # nc -z 10.244.1.95 3306
/ # nc -z 10.102.252.92 3306
/ # echo $?
1
/ # nslookup db-service
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: db-service.trafficguard.svc.cluster.local
Address: 10.102.252.92
** server can't find db-service.cluster.local: NXDOMAIN
** server can't find db-service.svc.cluster.local: NXDOMAIN
** server can't find db-service.cluster.local: NXDOMAIN
** server can't find db-service.svc.cluster.local: NXDOMAIN
/ # nc -z db-service 3306
/ # echo $?
1
/ #

As the output shows, the db-service Service cannot be reached from inside the myapp-pod Pod, even though DNS resolution is fine and connecting directly to the trafficguard-db-0 endpoint behind db-service also works.

Analysis

Since both myapp-pod and trafficguard-db-0 run on the worker node, the conclusion is that the kube-proxy-gqnwh Pod on the worker node is not correctly forwarding traffic from the db-service ClusterIP to the trafficguard-db-0 Pod IP.
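One way to check this on the worker node itself (these commands were not part of the session above) is to see whether kube-proxy, in its default iptables mode, has actually programmed NAT rules for the ClusterIP and its endpoint. If the rules are missing, kube-proxy itself is broken; if they are present yet connections still fail, the traffic must be bypassing them, which is what the logs below suggest.

# Rules for the db-service ClusterIP in the chain kube-proxy maintains
[root@worker ~]# iptables -t nat -S KUBE-SERVICES | grep 10.102.252.92
# DNAT rules pointing at the trafficguard-db-0 endpoint
[root@worker ~]# iptables -t nat -S | grep 10.244.1.95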

Next, look at the logs of this kube-proxy-gqnwh Pod:

[root@minikube ~]# kubectl get pod -n kube-system -o wide
kube-proxy-gqnwh   1/1   Running   0   24m   192.168.31.56    worker     <none>   <none>
kube-proxy-mq6gp   1/1   Running   0   24m   192.168.31.215   minikube   <none>   <none>

[root@minikube ~]# kubectl logs -n kube-system kube-proxy-gqnwh
I0905 12:54:09.019850 1 proxier.go:260] "Missing br-netfilter module or unset sysctl br-nf-call-iptables, proxy may not work as intended"
I0905 12:54:09.020167 1 proxier.go:260] "Missing br-netfilter module or unset sysctl br-nf-call-iptables, proxy may not work as intended"

As the log shows, when the br_netfilter module is not loaded or the related sysctl parameter is not set, packets bridged between Pods on the same node do not pass through the host's iptables chains, so the rules programmed by kube-proxy cannot intercept and process the traffic, and forwarding from the Service ClusterIP to the Pod IP fails.
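The state the log complains about can be verified directly on the worker node (these commands are not part of the transcript above):

# No output here means the module is not loaded
[root@worker ~]# lsmod | grep br_netfilter
# Without the module this key does not exist under /proc/sys, so the read fails
[root@worker ~]# sysctl net.bridge.bridge-nf-call-iptables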

Solution

Load the br_netfilter module
[root@worker ~]# modprobe br_netfilter
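modprobe only takes effect for the current boot. To load the module automatically after a reboot, one common option on a systemd-based system such as CentOS 7 is a drop-in under /etc/modules-load.d/ (the file name below is just an example):
# Have systemd-modules-load load br_netfilter at every boot
[root@worker ~]# echo 'br_netfilter' | tee /etc/modules-load.d/br_netfilter.conf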
Set the sysctl parameter
# Temporary setting; it is lost after a reboot
[root@worker ~]# sysctl net.bridge.bridge-nf-call-iptables=1
# Permanent setting: append it to /etc/sysctl.conf
[root@worker ~]# echo 'net.bridge.bridge-nf-call-iptables=1' | tee -a /etc/sysctl.conf
# Reload the sysctl configuration
[root@worker ~]# sysctl -p
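As an alternative to appending to /etc/sysctl.conf, the setting can live in its own drop-in under /etc/sysctl.d/ (the file name below is just an example), and sysctl --system then applies all configuration files at once; Kubernetes setup guides typically enable the IPv6 counterpart at the same time:
# Persist the bridge sysctls in a dedicated drop-in file
[root@worker ~]# cat <<EOF | tee /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply every sysctl configuration file, including the new one
[root@worker ~]# sysctl --system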
Restart kube-proxy

Go back to the control plane node and delete the kube-proxy Pod on the worker node so that it is recreated and picks up the correct configuration.

[root@minikube ~]# kubectl delete pod kube-proxy-gqnwh -n kube-system
pod "kube-proxy-gqnwh" deleted