CoreDNS is not in the Running state in k8s, and CoreDNS Pods run only on the master node

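If CoreDNS Pods run only on the master node (rather than on worker nodes), the usual cause is Kubernetes scheduling policy, node taints, or resource limits. This can make DNS resolution unreliable when the master node is overloaded or unavailable. The troubleshooting steps and fixes follow: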

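1. Confirm where the CoreDNS Pods are scheduled
Check which node each CoreDNS Pod is on
bash
kubectl get pods -n kube-system -o wide | grep coredns
Example output (the last four columns are IP, NODE, NOMINATED NODE, and READINESS GATES):

plaintext
coredns-5d4dd4b4db-xxxxx 1/1 Running 0 5m <pod-ip> <node-name> <none> <none>
If every CoreDNS Pod shows a master node in the NODE column, none of them has been scheduled onto a worker node.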
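To make the distribution easier to read at a glance, the listing can be reduced to a Pod count per node. The following is a minimal sketch rather than part of the original steps: the function name count_pods_per_node is hypothetical, and it assumes two-column input (Pod name, node name), which kubectl can produce with -o custom-columns and --no-headers.

```shell
# Hypothetical helper: reads "POD NODE" lines on stdin and prints
# "NODE COUNT" for each node, sorted by node name.
count_pods_per_node() {
  awk 'NF >= 2 { count[$2]++ } END { for (n in count) print n, count[n] }' | sort
}

# Usage against a live cluster (assumes the CoreDNS Pods carry the
# k8s-app=kube-dns label):
# kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers \
#   -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName | count_pods_per_node
```

If the only node printed is a master node, the Pods are concentrated there and the causes below are worth checking.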

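2. Find the cause
2.1 Worker nodes carry a NoSchedule taint
Kubernetes may taint master nodes by default (node-role.kubernetes.io/master or node-role.kubernetes.io/control-plane) to keep ordinary Pods off them. If the worker nodes are tainted as well, CoreDNS may be unable to schedule onto them.

Check node taints
bash
kubectl describe nodes | grep Taints -A 10
Normal:
master nodes have a taint (e.g. node-role.kubernetes.io/master:NoSchedule).
worker nodes have no taints (or only taints that the Pods tolerate).
Abnormal:
worker nodes were mistakenly given a NoSchedule taint, so CoreDNS cannot be scheduled onto them.
Fix
Remove the taint from the worker node (with care), substituting the node name and the taint key reported by the check above:
bash
kubectl taint nodes <node-name> <taint-key>:NoSchedule-
Or let CoreDNS tolerate the taint (the recommended method is described below).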
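When a cluster has many nodes, scanning the describe output by eye is error-prone. A small filter can list only the nodes that carry a NoSchedule taint. This is a sketch under stated assumptions: the function name nodes_with_noschedule is hypothetical, and the input is assumed to be two columns (node name, comma-separated taint effects), as kubectl prints with the custom-columns expression shown in the usage comment.

```shell
# Hypothetical helper: reads "NODE EFFECT[,EFFECT...]" lines on stdin and
# prints the nodes that have at least one taint whose effect is exactly
# NoSchedule (PreferNoSchedule and NoExecute do not count).
nodes_with_noschedule() {
  awk '{
    n = split($2, effects, ",")
    for (i = 1; i <= n; i++) {
      if (effects[i] == "NoSchedule") { print $1; break }
    }
  }'
}

# Usage against a live cluster:
# kubectl get nodes --no-headers \
#   -o 'custom-columns=NAME:.metadata.name,EFFECTS:.spec.taints[*].effect' \
#   | nodes_with_noschedule
```

Any worker node that appears in the result is a candidate for the taint removal described above.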
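2.2 CoreDNS has no toleration configured
If the CoreDNS Deployment has no toleration for the master taint, it cannot be scheduled onto a tainted master node. A toleration only permits scheduling onto a tainted node; it does not force it. So if CoreDNS still runs only on the master node, the likely reasons are:

the CoreDNS Deployment was edited by hand to force scheduling onto the master node.
the worker nodes lack resources (CPU/memory), so CoreDNS cannot be scheduled onto them.
Check the CoreDNS tolerations
bash
kubectl describe deployment coredns -n kube-system | grep -A 10 Toleration
Normal:
CoreDNS tolerates the master node's taint (e.g. node-role.kubernetes.io/master:NoSchedule).
Example:
yaml
tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
Abnormal:
if tolerations is empty, CoreDNS cannot be scheduled onto a tainted master node (unless the Deployment is changed by hand).
Fix
Edit the CoreDNS Deployment and add the toleration (recommended):
bash
kubectl edit deployment coredns -n kube-system
Add the following under spec.template.spec:
yaml
tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
Or allow scheduling onto all nodes (master and worker nodes alike):
yaml
tolerations:
- operator: "Exists" # tolerates every taint (use with care)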
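The same toleration can also be added without opening an editor, by passing a JSON Patch to kubectl patch. The sketch below only builds the patch document; the function name toleration_patch is hypothetical. It assumes the Deployment already has a tolerations list, because the JSON Patch "add" operation with a trailing "/-" appends to an existing array and fails if the array is missing.

```shell
# Hypothetical helper: prints a JSON Patch that appends one toleration
# (operator Exists) for the given taint key and effect.
toleration_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/tolerations/-","value":{"key":"%s","operator":"Exists","effect":"%s"}}]\n' "$1" "$2"
}

# Usage against a live cluster:
# kubectl patch deployment coredns -n kube-system --type=json \
#   -p "$(toleration_patch node-role.kubernetes.io/master NoSchedule)"
```

Printing the patch first makes it easy to review the exact change before it is applied.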
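2.3 Worker nodes lack resources
If the worker nodes are short of CPU or memory, CoreDNS may not fit on them and can only run on the master node.

Check node resources
bash
kubectl describe nodes | grep -A 10 "Allocated resources"
If the requests already allocated on the worker nodes are close to their Allocatable capacity, CoreDNS cannot be placed there and stays Pending unless another node, such as the master, can take it.
Fix
Add resources to the worker nodes (scale up or free resources).
Adjust the CoreDNS resource requests (lower the requests):
bash
kubectl edit deployment coredns -n kube-system
Modify resources.requests of the coredns container:
yaml
resources:
  requests:
    cpu: "50m"
    memory: "64Mi"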
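Whether a lower request actually helps comes down to simple arithmetic: a Pod fits on a node only if its request plus the requests already allocated does not exceed the node's allocatable amount. The sketch below works that out for CPU only. It is a simplification of what the scheduler checks (memory and other resources must fit too), and the function names cpu_to_millicores and cpu_request_fits are hypothetical.

```shell
# Hypothetical helper: converts a CPU quantity such as "50m" or "2"
# into millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 }' ;;
  esac
}

# Hypothetical helper: succeeds (exit status 0) when
# request + already-requested <= allocatable.
# Arguments: <request> <allocatable> <already-requested>
cpu_request_fits() (
  req=$(cpu_to_millicores "$1")
  alloc=$(cpu_to_millicores "$2")
  used=$(cpu_to_millicores "$3")
  [ $((used + req)) -le "$alloc" ]
)

# Example: a node with 2 allocatable CPUs and 1950m already requested
# has room for a 50m request but not for a 100m request.
# cpu_request_fits 50m 2 1950m && echo "fits"
```

The allocatable and already-requested figures come from the "Allocatable" and "Allocated resources" sections of kubectl describe nodes.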
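2.4 CoreDNS is restricted by a nodeSelector
If the CoreDNS Deployment sets a nodeSelector, it may be forced to run only on the master node.

Check the nodeSelector
bash
kubectl get deployment coredns -n kube-system -o yaml | grep -A 3 nodeSelector
Abnormal:
if the nodeSelector specifies kubernetes.io/hostname=<master-node-name>, CoreDNS runs only on that master node.
Fix
Remove the restricting nodeSelector entry (a selector of kubernetes.io/os: linux alone is harmless and can stay):
bash
kubectl edit deployment coredns -n kube-system
Delete the restricting entry from spec.template.spec.nodeSelector, or the whole section if nothing else is needed.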
3. Best practice: let CoreDNS run on all nodes
3.1 Edit the CoreDNS Deployment
bash
kubectl edit deployment coredns -n kube-system
Make sure the configuration contains:

yaml
spec:
  template:
    spec:
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      # Optional: to restrict CoreDNS to worker nodes, add:
      # nodeSelector:
      #   node-role.kubernetes.io/worker: "true" # the worker nodes must be labeled beforehand
3.2 Use a DaemonSet instead of a Deployment (recommended)
CoreDNS can be run as a DaemonSet so that every node (master and worker nodes alike) has one instance.

Migrate CoreDNS from a Deployment to a DaemonSet
Back up the current CoreDNS Deployment:
bash
kubectl get deployment coredns -n kube-system -o yaml > coredns-deployment-backup.yaml
Delete the existing Deployment (cluster DNS is unavailable from this point until the DaemonSet Pods are ready):
bash
kubectl delete deployment coredns -n kube-system
Create the DaemonSet from a published manifest (confirm that the URL resolves before applying it):
bash
kubectl apply -f https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/coredns-daemonset.yaml
Or write the DaemonSet manifest by hand:
yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: coredns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      serviceAccountName: coredns # assumes the ServiceAccount used by the original Deployment still exists
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "node-role.kubernetes.io/control-plane" # the other master taint key mentioned in 2.1
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: coredns
        image: coredns/coredns:1.10.1
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
      volumes:
      - name: config-volume
        configMap:
          name: coredns
          items:
          - key: Corefile
            path: Corefile
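4. Verify the fix
Check how the CoreDNS Pods are distributed:
bash
kubectl get pods -n kube-system -o wide | grep coredns
With a DaemonSet, every node (master and worker nodes alike) should have one CoreDNS Pod; with a Deployment, the Pods should no longer all sit on the master node.
Test DNS resolution:
bash
kubectl run -it --rm test-dns --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
If the lookup succeeds, CoreDNS is working normally.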
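When the goal is one CoreDNS Pod per node, the distribution check can be automated by comparing the list of nodes with the nodes that actually host a CoreDNS Pod. The following is a minimal sketch: the function name nodes_missing_coredns and the two input files are hypothetical, with one node name per line in the first file and "POD NODE" pairs in the second.

```shell
# Hypothetical helper: prints every node from the node list that does not
# host a CoreDNS Pod.
# Arguments: <file with node names> <file with "POD NODE" lines>
nodes_missing_coredns() {
  awk -v pods="$2" '
    FILENAME == pods { has[$2] = 1; next }   # remember nodes that host a Pod
    NF && !($1 in has) { print $1 }          # report nodes without one
  ' "$2" "$1"
}

# Usage against a live cluster (assumes the k8s-app=kube-dns label):
# kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name > nodes.txt
# kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers \
#   -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName > pods.txt
# nodes_missing_coredns nodes.txt pods.txt
```

Empty output means every listed node has at least one CoreDNS Pod.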
Summary
Cause | Key command
Worker nodes carry a NoSchedule taint | kubectl taint nodes <node-name> <taint-key>:NoSchedule-
CoreDNS lacks a toleration | kubectl edit deployment coredns -n kube-system + add tolerations
Worker nodes lack resources | kubectl describe nodes + adjust resources.requests
CoreDNS uses a nodeSelector | kubectl edit deployment coredns -n kube-system + remove the nodeSelector
Recommended approach | Convert CoreDNS to a DaemonSet and configure tolerations

Final recommendation: deploy CoreDNS as a DaemonSet with tolerations configured so that it runs on every node, which improves the reliability of DNS resolution.

posted @ 2025-07-09 21:10 油炸小趴菜
