Troubleshooting a Calico deployment: how I tracked down the problem
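Background:
Our production environment runs Kubernetes on the AWS EKS service. A customer suddenly asked us to build a self-managed cluster for a test environment. Everything went smoothly until the very end, when I found that the calico pods were in the Running state but were not Ready.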

1. Check the pod's Events
kubectl -n kube-system describe pod calico-node-54jdp
The Events showed the following problem:
Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 172.0.0.219,172.0.0.51
2021-09-15 08:39:28.412 [INFO][149] health.go 156: Number of node(s) with BGP peering established = 0
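When several calico-node pods fail in the same way, it can help to pull just the peer addresses out of the probe message. The following is a minimal sketch, not part of the original troubleshooting: the function name bgp_unestablished_peers is hypothetical, and it assumes the message has the "BGP not established with <ip>,<ip>" shape shown above, possibly with a log timestamp glued directly onto the last address.

```shell
# Hypothetical helper: read text on stdin and print every peer IP named in a
# "BGP not established with ..." readiness message, one address per line.
bgp_unestablished_peers() {
  # 1) Cut off a log timestamp (and everything after it) that may be fused
  #    onto the last IP, e.g. "172.0.0.51" + "2021-09-15 08:39:28.412 [INFO]...".
  # 2) Keep only the comma-separated peer list.
  # 3) Split the list into one address per line.
  sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:.]+ \[.*$//' \
    | sed -n 's/.*BGP not established with \([0-9.,]*\).*/\1/p' \
    | tr ',' '\n'
}
```

One possible use is to pipe the describe output into it: kubectl -n kube-system describe pod calico-node-54jdp | bgp_unestablished_peers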
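2. Attempted fix
Most of what I found online says to edit calico.yaml and pin the network interface. Search the file for "CALICO_IPV4POOL_IPIP" and add the following entry next to it:
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth0"
Re-apply the manifest:
kubectl apply -f calico.yaml
Then I checked the pod status again. No luck: it made no difference at all.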
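3. Bringing out the big guns
Install the calicoctl tool:
curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.9.2/calicoctl
chmod +x calicoctl
mv calicoctl /usr/local/bin
Note: the calicoctl version you install must match the version of calico running in your cluster.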
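The version requirement can be checked mechanically once calicoctl is configured. The following is a rough sketch under an assumption: that calicoctl version prints a "Client Version:" line and a "Cluster Version:" line. The function name calicoctl_versions_match is made up for illustration.

```shell
# Hypothetical helper: read `calicoctl version` output on stdin and succeed
# (exit status 0) only if the client and cluster versions are identical.
# Assumes lines of the form "Client Version:    v3.9.2" and
# "Cluster Version:   v3.9.2".
calicoctl_versions_match() {
  awk -F': *' '
    /^Client Version/  { client = $2 }
    /^Cluster Version/ { cluster = $2 }
    END { exit !(client != "" && client == cluster) }
  '
}
```

A possible invocation: calicoctl version | calicoctl_versions_match && echo "versions match"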
Add calicoctl.cfg:
mkdir -p /etc/calico
cat > /etc/calico/calicoctl.cfg << EOF
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: "kubernetes"
  kubeconfig: "/root/.kube/config"
EOF
If your datastore is an external etcd cluster, look up the corresponding configuration yourself.
1. List the calico nodes
[root@master-01 bin]# calicoctl get node
NAME
master-01
master-02
master-03
Good. No problem here.
2. Check the node's BGP status
[root@master-01 bin]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+---------+
| PEER ADDRESS | PEER TYPE         | STATE |  SINCE   |  INFO   |
+--------------+-------------------+-------+----------+---------+
| 172.0.0.32   | node-to-node mesh | start | 08:36:56 | Passive |
| 172.0.0.51   | node-to-node mesh | start | 08:36:56 | Passive |
+--------------+-------------------+-------+----------+---------+
Here there is a problem. In a healthy cluster, STATE is up.
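On a cluster with many nodes, scanning this table by eye gets tedious. The sketch below filters it down to the peers whose STATE is not up. The function name bgp_peers_not_up is hypothetical, and it assumes the pipe-delimited table layout shown in the output above.

```shell
# Hypothetical helper: read `calicoctl node status` output on stdin and print
# "<peer address> <state>" for every peer whose STATE column is not "up".
bgp_peers_not_up() {
  # Data rows look like:
  # | 172.0.0.32   | node-to-node mesh | start | 08:36:56 | Passive |
  # Splitting on "|" puts the address in field 2 and the state in field 4.
  # Border lines and plain text contain no "|" and are skipped by the NF test.
  awk -F'[|]' '
    NF >= 6 && $2 !~ /PEER ADDRESS/ {
      gsub(/ /, "", $2)
      gsub(/ /, "", $4)
      if ($4 != "up") print $2, $4
    }
  '
}
```

For example: calicoctl node status | bgp_peers_not_up. Empty output would mean every listed peer is up.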
3. Inspect the node's YAML
This step mainly checks whether the network interface setting added in the earlier fix took effect, that is, whether the right IP was detected.
[root@master-01 bin]# calicoctl get node master-01 -o yaml
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
    projectcalico.org/kube-labels: '{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"master-01","kubernetes.io/os":"linux","node-role.kubernetes.io/master":""}'
  creationTimestamp: 2021-09-15T04:06:49Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: master-01
    kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
  name: master-01
  resourceVersion: "43603"
  uid: 3aeaf76f-b671-4c3e-ab86-71ef00be43bd
spec:
  bgp:
    ipv4Address: 172.0.0.219/24
The last line shows ipv4Address: 172.0.0.219/24, which is also correct, so I was stumped again.
Note: on a first deployment, calico sometimes picks up the wrong node IP, so first confirm that this address really is the node's IP.
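One way to confirm the address is to extract it from the calico Node resource and compare it with the IP that Kubernetes reports for the same node. This is only a sketch: the function name calico_node_ipv4 is hypothetical, and it assumes the ipv4Address field appears in the YAML as "ipv4Address: <ip>/<prefix>", as in the output above.

```shell
# Hypothetical helper: read `calicoctl get node <name> -o yaml` on stdin and
# print the BGP IPv4 address without its prefix length
# (e.g. "172.0.0.219/24" becomes "172.0.0.219").
calico_node_ipv4() {
  sed -n 's/^ *ipv4Address: *\([0-9.]*\).*/\1/p'
}
```

The result of calicoctl get node master-01 -o yaml | calico_node_ipv4 can then be compared with the output of kubectl get node master-01 -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'. If the two differ, calico has detected an address other than the node's internal IP.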
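4. The final fix
Each node runs a bird instance (the BGP client), which uses the BGP protocol to advertise the node's routes to the other calico nodes so that the network is connected end to end. By default it listens on TCP port 179. Opening port 179 in the security group fixed the problem.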
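Whether port 179 is actually reachable between nodes can be tested before and after changing the security group. The following is a minimal sketch, assuming bash and the coreutils timeout command are available on the node. The function name check_bgp_port is made up for illustration.

```shell
# Hypothetical helper: try to open a TCP connection to a peer's BGP port
# (179 by default) with a 3-second timeout, using bash's /dev/tcp feature.
# Returns 0 if the connection succeeds, 1 otherwise.
check_bgp_port() {
  local host="$1" port="${2:-179}"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} NOT reachable"
    return 1
  fi
}
```

Run from master-01 against the peers listed earlier, for example check_bgp_port 172.0.0.32 and check_bgp_port 172.0.0.51. A "NOT reachable" result points to a firewall or security group blocking the port, or to nothing listening on it.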