Using Calico with BGP in a k8s Cluster

1. Cluster Information

Host          Role         Notes
10.10.10.150  master1      k8s master node, Ubuntu 22.04.4, BGP AS 65002
10.10.10.151  master2      k8s master node, Ubuntu 22.04.4, BGP AS 65002
10.10.10.152  master3      k8s master node, Ubuntu 22.04.4, BGP AS 65002
10.10.10.153  worker1      k8s worker node, Ubuntu 22.04.4, BGP AS 65002
10.10.10.154  worker2      k8s worker node, Ubuntu 22.04.4, BGP AS 65002
10.10.10.155  side router  virtual router running quagga for BGP support, BGP AS 65001

All of the above hosts are installed as VMware virtual machines.

Before installing, configure passwordless SSH login from master1 (the deploy node) to 10.10.10.150-10.10.10.154.
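
A minimal sketch of distributing an SSH key from master1 (assuming password login to root is still enabled on the target hosts):

# on master1: generate a key pair once, then push it to every node
ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/id_rsa
for ip in 10.10.10.150 10.10.10.151 10.10.10.152 10.10.10.153 10.10.10.154; do
  ssh-copy-id root@${ip}
done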

2. Installing the Side Router

Since no BGP-capable hardware is available, a virtual router is used to provide BGP.

First, download the virtual router firmware.

Image list: https://fw.koolcenter.com/iStoreOS/x86_64/

Download one, e.g.: https://fw.koolcenter.com/iStoreOS/x86_64/istoreos-22.03.7-2024120615-x86-64-squashfs-combined.img.gz

The image must be converted into a format VMware can use.

Conversion tool: https://www.starwindsoftware.com/tmplink/starwindconverter.exe
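
If you prefer a command line tool, qemu-img can do the same conversion; a sketch (file names taken from the download above):

# unpack the firmware and convert the raw disk image to a VMware VMDK
gunzip istoreos-22.03.7-2024120615-x86-64-squashfs-combined.img.gz
qemu-img convert -f raw -O vmdk \
  istoreos-22.03.7-2024120615-x86-64-squashfs-combined.img istoreos.vmdk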

After conversion, create a new VMware virtual machine:

Import the image converted in the previous step; that is all it takes.

After setup, boot the VM; once it is up, log in to the web UI to configure the virtual router (default credentials root/password):

Open the network wizard and set it to side-router (bypass gateway) mode.

3. Installing the k8s Cluster

Install with the open source project kubeasz: https://github.com/easzlab/kubeasz

Operate on the master1 node.

Download the tool script:

export release=3.6.4
wget https://github.com/easzlab/kubeasz/releases/download/${release}/ezdown
chmod +x ./ezdown

Download the kubeasz code, binaries, and default container images:

# for network environments inside China
./ezdown -D

Start the kubeasz container:

./ezdown -S

Create a cluster:

docker exec -it kubeasz ezctl new k8s-01

Edit the cluster config file:

# vi /etc/kubeasz/clusters/k8s-01/config.yml
############################
# prepare
############################
# optional: install system packages offline or online (offline|online)
INSTALL_SOURCE: "offline"

# optional OS security hardening, github.com/dev-sec/ansible-collection-hardening
# (deprecated) upstream not updated and not validated against recent k8s releases; enabling is not recommended
OS_HARDEN: false


############################
# role:deploy
############################
# default: ca will expire in 100 years
# default: certs issued by the ca will expire in 50 years
CA_EXPIRY: "876000h"
CERT_EXPIRY: "438000h"

# force to recreate CA and other certs, not suggested to set 'true'
CHANGE_CA: false

# kubeconfig parameters
CLUSTER_NAME: "cluster1"
CONTEXT_NAME: "context-{{ CLUSTER_NAME }}"

# k8s version
K8S_VER: "1.30.1"

# set unique 'k8s_nodename' for each node, if not set(default:'') ip add will be used
# CAUTION: 'k8s_nodename' must consist of lower case alphanumeric characters, '-' or '.',
# and must start and end with an alphanumeric character (e.g. 'example.com'),
# regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'
K8S_NODENAME: "{%- if k8s_nodename != '' -%} \
                    {{ k8s_nodename|replace('_', '-')|lower }} \
               {%- else -%} \
                    k8s-{{ inventory_hostname|replace('.', '-') }} \
               {%- endif -%}"

# use 'K8S_NODENAME' to set hostname
ENABLE_SETTING_HOSTNAME: true


############################
# role:etcd
############################
# a separate wal directory avoids disk io contention and improves performance
ETCD_DATA_DIR: "/var/lib/etcd"
ETCD_WAL_DIR: ""


############################
# role:runtime [containerd,docker]
############################
# [.] enable mirror registries to speed up image pulls
ENABLE_MIRROR_REGISTRY: true

# [.] add trusted private registries
# must follow the format below; the 'http://' or 'https://' scheme must not be omitted
INSECURE_REG:
  - "http://easzlab.io.local:5000"
  - "https://reg.yourcompany.com"

# [.] pause (sandbox) container image
SANDBOX_IMAGE: "easzlab.io.local:5000/easzlab/pause:3.9"

# [containerd] container storage directory
CONTAINERD_STORAGE_DIR: "/var/lib/containerd"

# [docker] container storage directory
DOCKER_STORAGE_DIR: "/var/lib/docker"

# [docker] enable the RESTful remote API
DOCKER_ENABLE_REMOTE_API: false


############################
# role:kube-master
############################
# SANs for the k8s master node certificates; multiple IPs and domains can be added (e.g. a public IP and domain)
MASTER_CERT_HOSTS:
  - "10.1.1.1"
  - "k8s.easzlab.io"
  #- "www.test.com"

# pod subnet mask length on each node (determines the max pod IPs a node can allocate)
# if flannel runs with --kube-subnet-mgr, it reads this setting to assign each node a pod subnet
# https://github.com/coreos/flannel/issues/847
NODE_CIDR_LEN: 24


############################
# role:kube-node
############################
# kubelet root directory
KUBELET_ROOT_DIR: "/var/lib/kubelet"

# maximum pods per node
MAX_PODS: 110

# resources reserved for kube components (kubelet, kube-proxy, dockerd, etc.)
# see templates/kubelet-config.yaml.j2 for the values
KUBE_RESERVED_ENABLED: "no"

# upstream k8s advises against enabling system-reserved casually, unless long-term monitoring
# has shown you the system's actual resource usage; the reservation also needs to grow as the
# system stays up longer. See templates/kubelet-config.yaml.j2 for the values.
# The defaults assume a 4c/8g VM running a minimal set of system services; on powerful physical
# machines the reservation can be increased. Also, apiserver and friends briefly consume a lot
# of resources during cluster install, so reserving at least 1g of memory is recommended.
SYS_RESERVED_ENABLED: "no"


############################
# role:network [flannel,calico,cilium,kube-ovn,kube-router]
############################
# ------------------------------------------- flannel
# [flannel] backend: "host-gw", "vxlan", etc.
FLANNEL_BACKEND: "vxlan"
DIRECT_ROUTING: false

# [flannel] 
flannel_ver: "v0.22.2"

# ------------------------------------------- calico
# [calico] IPIP tunnel mode, one of [Always, CrossSubnet, Never]. Across subnets use Always or CrossSubnet (on public clouds Always is the easy choice; otherwise each cloud's own network settings must be adjusted, see the vendor docs)
# CrossSubnet is a mixed tunnel + BGP routing mode that can improve network performance; within a single subnet, Never is sufficient.
#CALICO_IPV4POOL_IPIP: "Always"
CALICO_IPV4POOL_IPIP: "CrossSubnet"	# CrossSubnet uses the IPIP tunnel only across subnets; Never turns ipip off

# calico covers both overlay and underlay networking
# overlay: IPIP and VXLAN modes; IPIP can be used with BGP, VXLAN cannot; both can cross subnets via tunnels
# underlay: BGP mode; all nodes must be in the same subnet (one L2 / class C network)


# [calico] host IP used by calico-node; BGP peering is established on this address; set it manually or let it be auto-detected
IP_AUTODETECTION_METHOD: "can-reach={{ groups['kube_master'][0] }}"

# [calico] network backend: bird, vxlan, none
CALICO_NETWORKING_BACKEND: "bird"

# [calico] whether calico uses route reflectors
# recommended once the cluster grows past 50 nodes
CALICO_RR_ENABLED: false
# enabling route reflectors here made the install fail; it looks like a kubeasz bug

# CALICO_RR_NODES sets the route reflector nodes; if unset, the cluster master nodes are used by default
# CALICO_RR_NODES: ["192.168.1.1", "192.168.1.2"]
CALICO_RR_NODES: []

# [calico] supported calico versions: ["3.19", "3.23"]
calico_ver: "v3.26.4"

# [calico] calico major.minor version
calico_ver_main: "{{ calico_ver.split('.')[0] }}.{{ calico_ver.split('.')[1] }}"

# ------------------------------------------- cilium
# [cilium] image version
cilium_ver: "1.15.5"
cilium_connectivity_check: true
cilium_hubble_enabled: false
cilium_hubble_ui_enabled: false

# ------------------------------------------- kube-ovn
# [kube-ovn] offline image tarball version
kube_ovn_ver: "v1.11.5"

# ------------------------------------------- kube-router
# [kube-router] public clouds impose restrictions, so ipinip usually needs to stay always on; on your own infrastructure this can be set to "subnet"
OVERLAY_TYPE: "full"

# [kube-router] NetworkPolicy support switch
FIREWALL_ENABLE: true

# [kube-router] image version
kube_router_ver: "v1.5.4"


############################
# role:cluster-addon
############################
# coredns auto install
dns_install: "yes"
corednsVer: "1.11.1"
ENABLE_LOCAL_DNS_CACHE: true
dnsNodeCacheVer: "1.22.28"
# local dns cache address
LOCAL_DNS_CACHE: "169.254.20.10"

# metrics-server auto install
metricsserver_install: "yes"
metricsVer: "v0.7.1"

# dashboard auto install
dashboard_install: "no"
dashboardVer: "v2.7.0"
dashboardMetricsScraperVer: "v1.0.8"

# prometheus auto install
prom_install: "no"
prom_namespace: "monitor"
prom_chart_ver: "45.23.0"

# kubeapps auto install; if enabled, local-storage (storageClass: "local-path") is installed as well
kubeapps_install: "no"
kubeapps_install_namespace: "kubeapps"
kubeapps_working_namespace: "default"
kubeapps_storage_class: "local-path"
kubeapps_chart_ver: "12.4.3"

# local-storage (local-path-provisioner) auto install
local_path_provisioner_install: "no"
local_path_provisioner_ver: "v0.0.26"
# default local storage path
local_path_provisioner_dir: "/opt/local-path-provisioner"

# nfs-provisioner auto install
nfs_provisioner_install: "no"
nfs_provisioner_namespace: "kube-system"
nfs_provisioner_ver: "v4.0.2"
nfs_storage_class: "managed-nfs-storage"
nfs_server: "192.168.1.10"
nfs_path: "/data/nfs"

# network-check auto install
network_check_enabled: false 
network_check_schedule: "*/5 * * * *"

############################
# role:harbor
############################
# harbor version (full version string)
HARBOR_VER: "v2.10.2"
HARBOR_DOMAIN: "harbor.easzlab.io.local"
HARBOR_PATH: /var/data
HARBOR_TLS_PORT: 8443
HARBOR_REGISTRY: "{{ HARBOR_DOMAIN }}:{{ HARBOR_TLS_PORT }}"

# if set 'false', you need to put certs named harbor.pem and harbor-key.pem in directory 'down'
HARBOR_SELF_SIGNED_CERT: true

# install extra component
HARBOR_WITH_TRIVY: false

Edit the cluster hosts file:

# vi /etc/kubeasz/clusters/k8s-01/hosts
# 'etcd' cluster should have odd member(s) (1,3,5,...)
[etcd]
10.10.10.150
10.10.10.151
10.10.10.152

# CAUTION: 'k8s_nodename' must consist of lower case alphanumeric characters, '-' or '.',
# and must start and end with an alphanumeric character
[kube_master]
10.10.10.150 k8s_nodename='master1'
10.10.10.151 k8s_nodename='master2'
10.10.10.152 k8s_nodename='master3'

# work node(s), set unique 'k8s_nodename' for each node
# CAUTION: 'k8s_nodename' must consist of lower case alphanumeric characters, '-' or '.',
# and must start and end with an alphanumeric character
[kube_node]
10.10.10.153 k8s_nodename='worker1'
10.10.10.154 k8s_nodename='worker2'

# [optional] harbor server, a private docker registry
# 'NEW_INSTALL': 'true' to install a harbor server; 'false' to integrate with existed one
[harbor]
#192.168.1.8 NEW_INSTALL=false

# [optional] loadbalance for accessing k8s from outside
[ex_lb]
#192.168.1.6 LB_ROLE=backup EX_APISERVER_VIP=192.168.1.250 EX_APISERVER_PORT=8443
#192.168.1.7 LB_ROLE=master EX_APISERVER_VIP=192.168.1.250 EX_APISERVER_PORT=8443

# [optional] ntp server for the cluster
[chrony]
#192.168.1.1

[all:vars]
# --------- Main Variables ---------------
# Secure port for apiservers
SECURE_PORT="6443"

# Cluster container-runtime supported: docker, containerd
# if k8s version >= 1.24, docker is not supported
CONTAINER_RUNTIME="containerd"

# Network plugins supported: calico, flannel, kube-router, cilium, kube-ovn
CLUSTER_NETWORK="calico"

# Service proxy mode of kube-proxy: 'iptables' or 'ipvs'
PROXY_MODE="ipvs"

# K8S Service CIDR, not overlap with node(host) networking
SERVICE_CIDR="10.68.0.0/16"

# Cluster CIDR (Pod CIDR), not overlap with node(host) networking
CLUSTER_CIDR="172.20.0.0/16"

# NodePort Range
NODE_PORT_RANGE="30000-32767"

# Cluster DNS Domain
CLUSTER_DNS_DOMAIN="cluster.local"

# -------- Additional Variables (don't change the default value right now) ---
# Binaries Directory
bin_dir="/opt/kube/bin"

# Deploy Directory (kubeasz workspace)
base_dir="/etc/kubeasz"

# Directory for a specific cluster
cluster_dir="{{ base_dir }}/clusters/k8s-01"

# CA and other components cert/key Directory
ca_dir="/etc/kubernetes/ssl"

# Default 'k8s_nodename' is empty
k8s_nodename=''

# Default python interpreter
ansible_python_interpreter=/usr/bin/python3

Start the installation:

docker exec -it kubeasz ezctl setup k8s-01 all
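
If you would rather run the installation step by step instead of all at once, ezctl also accepts individual step numbers (a sketch; run ezctl help setup inside the container for the authoritative step list):

docker exec -it kubeasz ezctl setup k8s-01 01   # prepare
docker exec -it kubeasz ezctl setup k8s-01 02   # etcd
docker exec -it kubeasz ezctl setup k8s-01 03   # container runtime
docker exec -it kubeasz ezctl setup k8s-01 04   # kube-master
docker exec -it kubeasz ezctl setup k8s-01 05   # kube-node
docker exec -it kubeasz ezctl setup k8s-01 06   # network plugin
docker exec -it kubeasz ezctl setup k8s-01 07   # cluster addons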

After the installation completes:

$ kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-6946cb87d6-l7st4   1/1     Running   0          124m
kube-system   calico-node-4c4x4                          1/1     Running   0          124m
kube-system   calico-node-8cg82                          1/1     Running   0          124m
kube-system   calico-node-9qn27                          1/1     Running   0          124m
kube-system   calico-node-hs7nd                          1/1     Running   0          124m
kube-system   calico-node-kr8bd                          1/1     Running   0          124m
kube-system   coredns-c5768dcc7-znxp2                    1/1     Running   0          124m
kube-system   metrics-server-65b5b555f5-rvgxp            1/1     Running   0          124m
kube-system   node-local-dns-5jsr4                       1/1     Running   0          124m
kube-system   node-local-dns-gtfqc                       1/1     Running   0          124m
kube-system   node-local-dns-k7xhl                       1/1     Running   0          124m
kube-system   node-local-dns-kvpdp                       1/1     Running   0          124m
kube-system   node-local-dns-tb4xd                       1/1     Running   0          124m

$ kubectl get nodes
NAME      STATUS                     ROLES    AGE    VERSION
master1   Ready,SchedulingDisabled   master   125m   v1.30.1
master2   Ready,SchedulingDisabled   master   125m   v1.30.1
master3   Ready,SchedulingDisabled   master   125m   v1.30.1
worker1   Ready                      node     125m   v1.30.1
worker2   Ready                      node     125m   v1.30.1

Check calico status:

$ calicoctl get nodes -o wide
NAME      ASN       IPV4              IPV6   
master1   (64512)   10.10.10.150/24          
master2   (64512)   10.10.10.151/24          
master3   (64512)   10.10.10.152/24          
worker1   (64512)   10.10.10.153/24          
worker2   (64512)   10.10.10.154/24   
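
calicoctl here is the copy installed by kubeasz (/opt/kube/bin/calicoctl). If a copy run elsewhere complains that it cannot reach the datastore, a minimal config sketch (the file path and kubeconfig location are assumptions):

# vi /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
spec:
  datastoreType: "kubernetes"
  kubeconfig: "/root/.kube/config"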

# on master1
$ calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.10.10.151 | node-to-node mesh | up    | 09:27:53 | Established |
| 10.10.10.152 | node-to-node mesh | up    | 09:27:53 | Established |
| 10.10.10.153 | node-to-node mesh | up    | 09:27:53 | Established |
| 10.10.10.154 | node-to-node mesh | up    | 09:27:53 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

# on master2
$ calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.10.10.150 | node-to-node mesh | up    | 09:27:52 | Established |
| 10.10.10.152 | node-to-node mesh | up    | 09:27:52 | Established |
| 10.10.10.153 | node-to-node mesh | up    | 09:27:52 | Established |
| 10.10.10.154 | node-to-node mesh | up    | 09:27:52 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

$ calicoctl get ipPool
NAME                  CIDR            SELECTOR   
default-ipv4-ippool   172.20.0.0/16   all()      

# the pool uses an overlay network with ipip mode CrossSubnet
$ calicoctl get ipPool -o yaml
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2024-12-18T07:21:56Z"
    name: default-ipv4-ippool
    resourceVersion: "626"
    uid: 21b43022-44d5-415b-8211-2ad837da8877
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 172.20.0.0/16
    ipipMode: CrossSubnet
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Never
kind: IPPoolList
metadata:
  resourceVersion: "169278"

4. Enabling BGP

4.1 Configuring the Side Router

SSH into the side router (default credentials root/password):

opkg update && opkg install quagga quagga-zebra quagga-bgpd quagga-vtysh

After a successful install, the daemons start automatically and listen on their ports:

$ netstat -lantp | grep -e 'zebra\|bgpd'
tcp        0      0 0.0.0.0:2601            0.0.0.0:*               LISTEN      20616/zebra
tcp        0      0 0.0.0.0:2605            0.0.0.0:*               LISTEN      20625/bgpd
tcp        0      0 :::2601                 :::*                    LISTEN      20616/zebra
tcp        0      0 :::2605                 :::*                    LISTEN      20625/bgpd
  • bgpd is not yet listening on port 179, the port used to exchange routing information; that is because this router has not been assigned an AS yet. No rush: enter the vty with the vtysh command and configure it:
$ vtysh
OpenWrt# conf t
OpenWrt(config)# router bgp 65001
OpenWrt(config-router)# neighbor 10.10.10.150 remote-as 65002
OpenWrt(config-router)# neighbor 10.10.10.150 description master1
OpenWrt(config-router)# neighbor 10.10.10.151 remote-as 65002
OpenWrt(config-router)# neighbor 10.10.10.151 description master2
OpenWrt(config-router)# neighbor 10.10.10.152 remote-as 65002
OpenWrt(config-router)# neighbor 10.10.10.152 description master3
OpenWrt(config-router)# neighbor 10.10.10.153 remote-as 65002
OpenWrt(config-router)# neighbor 10.10.10.153 description worker1
OpenWrt(config-router)# neighbor 10.10.10.154 remote-as 65002
OpenWrt(config-router)# neighbor 10.10.10.154 description worker2
OpenWrt(config-router)# exit
OpenWrt(config)# exit
  • You do not have to add every host in the cluster here; adding 2-3 is enough for high availability. If calico has route reflector mode enabled, just add the RR nodes.
  • To delete a neighbor: no neighbor 10.10.10.151
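
The vtysh changes above only modify the running configuration. To keep them across a reboot, save them and make sure the init script is enabled (a sketch; the init script name is an assumption for this build):

OpenWrt# write memory            # persist the running config to the quagga config files
OpenWrt# exit
$ /etc/init.d/quagga enable      # start zebra/bgpd at boot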

4.2 Configuring calico

Run on the master1 node:

# vi calico.yaml 
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65002
  # advertise cluster IP routes to the side router, so that external networks can reach
  # cluster IPs through it; generally not recommended
  # see https://docs.tigera.io/calico/latest/networking/configuring/advertise-service-ips
  #serviceClusterIPs:
  #  - cidr: 10.68.0.0/16

# vi BGPPeer.yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: global-router-155
spec:
  peerIP: 10.10.10.155
  asNumber: 65001
  • This is a global BGP peer: every host in the cluster will connect to it and share BGP routing information

Apply:

calicoctl apply -f calico.yaml
calicoctl apply -f BGPPeer.yaml

Check on master1:

$ calicoctl get node -o wide
NAME      ASN       IPV4              IPV6   
master1   (65002)   10.10.10.150/24          
master2   (65002)   10.10.10.151/24          
master3   (65002)   10.10.10.152/24          
worker1   (65002)   10.10.10.153/24          
worker2   (65002)   10.10.10.154/24          

$ calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.10.10.155 | global            | up    | 08:38:51 | Established |
| 10.10.10.151 | node-to-node mesh | up    | 09:27:52 | Established |
| 10.10.10.152 | node-to-node mesh | up    | 09:27:52 | Established |
| 10.10.10.153 | node-to-node mesh | up    | 09:27:52 | Established |
| 10.10.10.154 | node-to-node mesh | up    | 09:27:52 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

Check ip route:

# run on master1
$ kubectl -n kube-system get pod -o wide | grep metrics
metrics-server-65b5b555f5-rvgxp            1/1     Running   0          143m   172.20.235.129   worker1   <none>           <none>

$ ip route 
default via 10.10.10.2 dev ens32 proto dhcp src 10.10.10.150 metric 100 
10.10.10.0/24 dev ens32 proto kernel scope link src 10.10.10.150 metric 100 
10.10.10.2 dev ens32 proto dhcp scope link src 10.10.10.150 metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.20.136.0/26 via 10.10.10.152 dev ens32 proto bird 
blackhole 172.20.137.64/26 proto bird 
172.20.180.0/26 via 10.10.10.151 dev ens32 proto bird 
172.20.189.64/26 via 10.10.10.154 dev ens32 proto bird 
172.20.235.128/26 via 10.10.10.153 dev ens32 proto bird 
# the route "172.20.235.128/26 via 10.10.10.153 dev ens32 proto bird" was learned via bird (i.e. calico's BGP)

# reach the pod network
$ ping 172.20.235.129
PING 172.20.235.129 (172.20.235.129) 56(84) bytes of data.
64 bytes from 172.20.235.129: icmp_seq=1 ttl=63 time=0.254 ms
64 bytes from 172.20.235.129: icmp_seq=2 ttl=63 time=0.204 ms
64 bytes from 172.20.235.129: icmp_seq=3 ttl=63 time=0.211 ms
^C
--- 172.20.235.129 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2034ms
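
A quick way to confirm which next hop the kernel picks for a pod IP (plain iproute2):

# should show the bird-learned next hop via 10.10.10.153 on ens32
ip route get 172.20.235.129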

On the side router, the routes synced over BGP from the cluster are also visible:

$ ip route
default via 10.10.10.2 dev br-lan proto static 
10.10.10.0/24 dev br-lan proto kernel scope link src 10.10.10.155 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.20.136.0/26 via 10.10.10.152 dev br-lan proto zebra metric 20 
172.20.137.64/26 via 10.10.10.150 dev br-lan proto zebra metric 20 
172.20.180.0/26 via 10.10.10.151 dev br-lan proto zebra metric 20 
172.20.189.64/26 via 10.10.10.154 dev br-lan proto zebra metric 20 
172.20.235.128/26 via 10.10.10.153 dev br-lan proto zebra metric 20 

# reach the pod network
$ ping 172.20.235.129
PING 172.20.235.129 (172.20.235.129): 56 data bytes
64 bytes from 172.20.235.129: seq=0 ttl=63 time=0.330 ms
64 bytes from 172.20.235.129: seq=1 ttl=63 time=0.288 ms
64 bytes from 172.20.235.129: seq=2 ttl=63 time=0.858 ms
^C
--- 172.20.235.129 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.288/0.492/0.858 ms

# enter the virtual router's routing CLI
$ vtysh

Hello, this is Quagga (version 1.2.4).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

iStoreOS# 
iStoreOS# show ip bgp summary
BGP router identifier 172.17.0.1, local AS number 65001
RIB entries 9, using 1008 bytes of memory
Peers 5, using 44 KiB of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.10.10.150    4 65002      92      80        0    0    0 01:12:25        5
10.10.10.151    4 65002      93      81        0    0    0 01:12:21        5
10.10.10.152    4 65002      93      81        0    0    0 01:12:18        5
10.10.10.153    4 65002      90      81        0    0    0 01:12:14        5
10.10.10.154    4 65002      91      81        0    0    0 01:12:10        5

Total number of neighbors 5

Total num. Established sessions 5
Total num. of routes received     25
iStoreOS# 
iStoreOS# show ip route 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, A - Babel, N - NHRP,
       > - selected route, * - FIB route

K>* 0.0.0.0/0 via 10.10.10.2, br-lan
C>* 10.10.10.0/24 is directly connected, br-lan
C>* 127.0.0.0/8 is directly connected, lo
B>* 172.20.136.0/26 [20/0] via 10.10.10.152, br-lan, 01:06:39
B>* 172.20.137.64/26 [20/0] via 10.10.10.150, br-lan, 01:12:42
B>* 172.20.180.0/26 [20/0] via 10.10.10.151, br-lan, 00:23:41
B>* 172.20.189.64/26 [20/0] via 10.10.10.154, br-lan, 01:06:39
B>* 172.20.235.128/26 [20/0] via 10.10.10.153, br-lan, 01:06:39
iStoreOS# 
iStoreOS# ping 172.20.235.129
PING 172.20.235.129 (172.20.235.129): 56 data bytes
64 bytes from 172.20.235.129: seq=0 ttl=63 time=0.294 ms
64 bytes from 172.20.235.129: seq=1 ttl=63 time=0.248 ms
64 bytes from 172.20.235.129: seq=2 ttl=63 time=0.245 ms
^C
--- 172.20.235.129 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.245/0.262/0.294 ms
iStoreOS# 

4.3 Disabling ipipMode and vxlanMode

Check the current configuration:

$ calicoctl get ipPool
NAME                  CIDR            SELECTOR   
default-ipv4-ippool   172.20.0.0/16   all()      

# the pool uses an overlay network with ipip mode CrossSubnet
$ calicoctl get ipPool -o yaml
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2024-12-18T07:21:56Z"
    name: default-ipv4-ippool
    resourceVersion: "626"
    uid: 21b43022-44d5-415b-8211-2ad837da8877
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 172.20.0.0/16
    ipipMode: CrossSubnet
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Never
kind: IPPoolList
metadata:
  resourceVersion: "169278"

Disable them:

$ calicoctl apply -f - <<EOF
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    name: default-ipv4-ippool
  spec:
    allowedUses:
    - Workload
    blockSize: 26
    cidr: 172.20.0.0/16
    ipipMode: Never
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Never
kind: IPPoolList
EOF
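
With ipip disabled, calico stops programming routes over the tunl0 device; a quick spot check on any node (interface names taken from the earlier route listing):

# expect all pod-network routes via ens32, with no tunl0 entries
ip route | grep 172.20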


$ calicoctl get ipPool -o yaml
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2024-12-18T07:21:56Z"
    name: default-ipv4-ippool
    resourceVersion: "171163"
    uid: 21b43022-44d5-415b-8211-2ad837da8877
  spec:
    allowedUses:
    - Workload
    blockSize: 26
    cidr: 172.20.0.0/16
    ipipMode: Never
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Never
kind: IPPoolList
metadata:
  resourceVersion: "171215"

4.4 node-to-node mesh mode

In BGP mode, calico maintains the cluster's routes with node-to-node mesh by default. Each host's BGP client has to talk to the BGP client on every other node to exchange routing information, so as the node count N grows, the number of connections grows on the order of N² (a full mesh of N nodes needs N(N-1)/2 BGP sessions), which puts heavy pressure on the cluster network itself. Node-to-node mesh is therefore generally recommended only for clusters of fewer than 100 nodes. Larger clusters should use the Route Reflector (RR) mode instead: calico designates one or a few dedicated nodes that establish BGP connections with every node and thereby learn the global routing table, while all other nodes exchange routes only with these dedicated nodes to obtain the routing information for the whole cluster. These dedicated nodes are the route reflectors.
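
A back-of-the-envelope comparison of BGP session counts, in plain shell arithmetic (100 nodes assumed):

# full mesh needs N*(N-1)/2 sessions; a single RR needs roughly N
echo "full mesh: $((100 * 99 / 2)) sessions, one RR: 100 sessions"

For 100 nodes that is 4950 mesh sessions versus 100 with one reflector.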

calico's NodeToNodeMesh can be disabled:

Operate on master1:

# vi calico.yaml, set nodeToNodeMeshEnabled to false
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 65002

$ calicoctl apply -f calico.yaml

$ calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-----------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE |  SINCE   |    INFO     |
+--------------+-----------+-------+----------+-------------+
| 10.10.10.155 | global    | up    | 08:38:52 | Established |
+--------------+-----------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

# the pod routes are gone, and pods can no longer be reached
$ ip route
default via 10.10.10.2 dev ens32 proto dhcp src 10.10.10.150 metric 100 
10.10.10.0/24 dev ens32 proto kernel scope link src 10.10.10.150 metric 100 
10.10.10.2 dev ens32 proto dhcp scope link src 10.10.10.150 metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
blackhole 172.20.137.64/26 proto bird 

$ ping 172.20.235.129
PING 172.20.235.129 (172.20.235.129) 56(84) bytes of data.
^C
--- 172.20.235.129 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2031ms

Workaround: set master1's default route to the side router:

$ vi /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    ens32:
      dhcp4: false
      addresses: [10.10.10.150/24]
      optional: true
      routes:
        - to: default
          via: 10.10.10.155
      nameservers:
        addresses: [114.114.114.114, 8.8.8.8]
  version: 2

$ netplan apply

$ ip route
default via 10.10.10.155 dev ens32 proto static 
10.10.10.0/24 dev ens32 proto kernel scope link src 10.10.10.150 metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
blackhole 172.20.137.64/26 proto bird 

$ ping 172.20.235.129
PING 172.20.235.129 (172.20.235.129) 56(84) bytes of data.
64 bytes from 172.20.235.129: icmp_seq=1 ttl=63 time=0.333 ms
64 bytes from 172.20.235.129: icmp_seq=2 ttl=63 time=0.277 ms
64 bytes from 172.20.235.129: icmp_seq=3 ttl=63 time=0.363 ms
64 bytes from 172.20.235.129: icmp_seq=4 ttl=63 time=0.235 ms
^C
--- 172.20.235.129 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3073ms
rtt min/avg/max/mdev = 0.235/0.302/0.363/0.049 ms
  • Note, however, that master1's traffic to the pod is now forwarded by the side router to host 10.10.10.153, rather than routed directly to 10.10.10.153 as with NodeToNodeMesh earlier (the quick check below makes this visible)
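
To make the extra hop visible from master1, a quick check (assuming traceroute is installed):

# the first hop should now be the side router 10.10.10.155
traceroute -n 172.20.235.129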

4.4.1 Enabling Route Reflector Mode

The full-mesh topology does not work well in large clusters; once a cluster has more than 50 hosts, it is recommended to turn off BGP full mesh and use BGP route reflectors instead.

Turn off BGP full mesh:

$ calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.10.10.155 | global            | up    | 08:38:52 | Established |
| 10.10.10.151 | node-to-node mesh | up    | 03:16:28 | Established |
| 10.10.10.152 | node-to-node mesh | up    | 03:16:28 | Established |
| 10.10.10.153 | node-to-node mesh | up    | 03:16:28 | Established |
| 10.10.10.154 | node-to-node mesh | up    | 03:16:29 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

# vi calico.yaml, set nodeToNodeMeshEnabled to false
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 65002

$ calicoctl apply -f calico.yaml

$ calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-----------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE |  SINCE   |    INFO     |
+--------------+-----------+-------+----------+-------------+
| 10.10.10.155 | global    | up    | 08:38:52 | Established |
+--------------+-----------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

# with full mesh off, cluster IPs and pod IPs are unreachable

$ curl -k https://10.68.108.110
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to 10.68.108.110:443 

$ ping 172.20.235.129
PING 172.20.235.129 (172.20.235.129) 56(84) bytes of data.
^C
--- 172.20.235.129 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1027ms

First, look at the nodes currently in the cluster:

$ calicoctl get node -o wide
NAME      ASN       IPV4              IPV6   
master1   (65002)   10.10.10.150/24          
master2   (65002)   10.10.10.151/24          
master3   (65002)   10.10.10.152/24          
worker1   (65002)   10.10.10.153/24          
worker2   (65002)   10.10.10.154/24          

Pick one or more nodes in the cluster to act as RR nodes; here master1 is chosen first:

# set routeReflectorClusterID; any IP not used in the cluster will do
$ calicoctl patch node master1 -p '{"spec": {"bgp": {"routeReflectorClusterID": "244.0.0.1"}}}'

# add the node label
$ calicoctl patch node master1 -p '{"metadata": {"labels": {"route-reflector": "true"}}}'


$ calicoctl get node master1 -o yaml
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
    projectcalico.org/kube-labels: '{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"master1","kubernetes.io/os":"linux","kubernetes.io/role":"master"}'
  creationTimestamp: "2024-12-18T07:22:00Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: master1
    kubernetes.io/os: linux
    kubernetes.io/role: master
    route-reflector: "true"
  name: master1
  resourceVersion: "189487"
  uid: 618eb1ca-eef9-45c8-a899-fd6025380168
spec:
  bgp:
    ipv4Address: 10.10.10.150/24
    routeReflectorClusterID: 244.0.0.1
  orchRefs:
  - nodeName: master1
    orchestrator: k8s
status: {}

Configure the rule by which BGP nodes establish connections with the Route Reflector:

$ cat << EOF | calicoctl create -f -
kind: BGPPeer
apiVersion: projectcalico.org/v3
metadata:
  name: peer-with-route-reflectors-1
spec:
  nodeSelector: all()
  peerSelector: route-reflector == 'true'
EOF

Verify the BGP sessions after adding the RR:

$ docker exec -it kubeasz ansible -i /etc/kubeasz/clusters/k8s-01/hosts all -m shell -a '/opt/kube/bin/calicoctl node status'

10.10.10.152 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.150 | node specific | up    | 05:12:53 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.10.10.151 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.150 | node specific | up    | 05:12:52 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.10.10.154 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.150 | node specific | up    | 05:12:52 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.10.10.153 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.150 | node specific | up    | 05:12:51 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.10.10.150 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.151 | node specific | up    | 05:12:51 | Established |
| 10.10.10.152 | node specific | up    | 05:12:53 | Established |
| 10.10.10.153 | node specific | up    | 05:12:51 | Established |
| 10.10.10.154 | node specific | up    | 05:12:51 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
  • All other nodes have established a BGP session with the chosen RR node.

Adding another RR node is omitted here; the steps are the same as above (a sketch follows below). Once it is added, every other node establishes BGP sessions with both RR nodes, and the two RR nodes also peer with each other. For k8s clusters with many nodes, 2-3 RR nodes are recommended.
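
A sketch of those steps for a hypothetical second RR node, master2, mirroring the master1 commands above (the existing peer-with-route-reflectors-1 BGPPeer then matches it automatically via the route-reflector label):

# make master2 a route reflector in the same RR cluster
calicoctl patch node master2 -p '{"spec": {"bgp": {"routeReflectorClusterID": "244.0.0.1"}}}'
calicoctl patch node master2 -p '{"metadata": {"labels": {"route-reflector": "true"}}}'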

After the RR is added, the cluster hosts can reach everything again:

$ curl -k https://10.68.108.110
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

$ ping 172.20.235.129
PING 172.20.235.129 (172.20.235.129) 56(84) bytes of data.
64 bytes from 172.20.235.129: icmp_seq=1 ttl=63 time=0.273 ms
64 bytes from 172.20.235.129: icmp_seq=2 ttl=63 time=0.210 ms
^C
--- 172.20.235.129 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1012ms
rtt min/avg/max/mdev = 0.210/0.241/0.273/0.031 ms

$ ip route
default via 10.10.10.2 dev ens32 proto dhcp src 10.10.10.150 metric 100 
10.10.10.0/24 dev ens32 proto kernel scope link src 10.10.10.150 metric 100 
10.10.10.2 dev ens32 proto dhcp scope link src 10.10.10.150 metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.20.136.0/26 via 10.10.10.152 dev ens32 proto bird 
blackhole 172.20.137.64/26 proto bird 
172.20.180.0/26 via 10.10.10.151 dev ens32 proto bird 
172.20.189.64/26 via 10.10.10.154 dev ens32 proto bird 
172.20.235.128/26 via 10.10.10.153 dev ens32 proto bird 

With Route Reflector mode enabled, the global BGP peer is no longer recommended; there is no need for every cluster host to connect to the border router:

# vi BGPPeer.yaml 
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: global-router-155
spec:
  peerIP: 10.10.10.155
  asNumber: 65001

calicoctl delete -f BGPPeer.yaml


Use node-specific BGP peers (also called per-node BGP peers) instead:

# vi nodeBGPPeer.yaml 
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: node-specific-router-155-for-master1
spec:
  peerIP: 10.10.10.155
  asNumber: 65001
  node: master1
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: node-specific-router-155-for-master2
spec:
  peerIP: 10.10.10.155
  asNumber: 65001
  node: master2
  • node here selects the RR nodes, so only they peer with the side router

Apply:

calicoctl apply -f nodeBGPPeer.yaml

Check:

$ calicoctl get bgpPeer
NAME                                   PEERIP         NODE      ASN     
node-specific-router-155-for-master1   10.10.10.155   master1   65001   
node-specific-router-155-for-master2   10.10.10.155   master2   65001   
peer-with-route-reflectors-1                          all()     0       

$ docker exec -it kubeasz ansible -i /etc/kubeasz/clusters/k8s-01/hosts all -m shell -a '/opt/kube/bin/calicoctl node status'
10.10.10.151 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.150 | node specific | up    | 05:34:31 | Established |
| 10.10.10.152 | node specific | up    | 05:34:39 | Established |
| 10.10.10.153 | node specific | up    | 05:34:39 | Established |
| 10.10.10.154 | node specific | up    | 05:34:39 | Established |
| 10.10.10.155 | node specific | up    | 05:56:41 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.10.10.150 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.152 | node specific | up    | 05:12:53 | Established |
| 10.10.10.153 | node specific | up    | 05:12:51 | Established |
| 10.10.10.154 | node specific | up    | 05:12:51 | Established |
| 10.10.10.151 | node specific | up    | 05:34:31 | Established |
| 10.10.10.155 | node specific | up    | 05:56:41 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.10.10.153 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.150 | node specific | up    | 05:12:51 | Established |
| 10.10.10.151 | node specific | up    | 05:34:40 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.10.10.154 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.150 | node specific | up    | 05:12:51 | Established |
| 10.10.10.151 | node specific | up    | 05:34:39 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.10.10.152 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.10.10.150 | node specific | up    | 05:12:52 | Established |
| 10.10.10.151 | node specific | up    | 05:34:39 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.