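Docker Swarm ingress network

DOCKER INGRESS: INTRODUCTION

# Official Docker documentation on ingress: https://docs.docker.com/engine/swarm/ingress/

As the Docker documentation describes, swarm mode uses the ingress routing mesh so that once a service publishes a port, a request to that port on any node in the swarm reaches the service, even if that node is not running a replica of the service.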

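Test environment

The test environment is a three-node Docker Swarm cluster with one manager (the leader) and two workers.
Note: use the same Docker version on all nodes, and keep the operating system and kernel versions identical across the three hosts.

# Node information:
# worker-1: 192.168.100.228
# worker-2: 192.168.100.234
# leader:   192.168.100.253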

[root@253 ~]# docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
wll4l9u5sj9xyon5u1bvq8wth     228        Ready     Active                          20.10.12
t1ll9hzipxjms5mt14kxx7u3o     234        Ready     Active                          20.10.12
pkv93oel5hesk9bn22uyit9rz *   253        Ready     Active         Leader           20.10.12

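Deploy a single-replica service from the whoami image; a request to this service returns the hostname and IP address of the container that served it.
# Image: docker pull containous/whoami:1.5.0

Deploying the whoami service

# Run on the leader node 253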
[root@253 ~]# docker service create --name whoami --replicas 1 -p 8080:80 hub.dehuinet.com:58443/middleware/whoami:v1.5.0
mnauoiowxg541iw0fenhqwemq
overall progress: 1 out of 1 tasks
1/1: running   [==================================================>]
verify: Service converged

Check the state of the whoami service and the node it runs on

# Run on the leader node 253
[root@253 ~]# docker service ps whoami
ID             NAME       IMAGE                                             NODE      DESIRED STATE   CURRENT STATE           ERROR     PORTS
qmpt38h7rk36   whoami.1   hub.dehuinet.com:58443/middleware/whoami:v1.5.0   253       Running         Running 2 minutes ago
# The output shows that the service task was scheduled on node 253, the leader

# Run on the leader node 253
[root@253 ~]# docker ps
CONTAINER ID   IMAGE                                             COMMAND        CREATED         STATUS         PORTS                                                           NAMES
07e821c9674b   hub.dehuinet.com:58443/middleware/whoami:v1.5.0   "/whoami"      3 minutes ago   Up 3 minutes   80/tcp                                                          whoami.1.qmpt38h7rk36evcywsm5pvaft

Requests from a browser to port 8080 on node 253, node 234 and node 228 all return the whoami response.

As the Docker documentation says, port 8080 on any node in the cluster reaches the service. So how does this work, roughly?
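The same check can be scripted instead of done in a browser. The sketch below is only an illustration: `check_nodes` is a hypothetical helper name, and it assumes that curl is installed and that the whoami response starts with a `Hostname:` line.

```shell
# Query the published port on every node and print the first line of each reply.
# Usage: check_nodes PORT NODE...
check_nodes() {
  port="$1"
  shift
  for node in "$@"; do
    # Assumption: whoami answers with several lines, the first being "Hostname: <name>"
    reply=$(curl -s --max-time 3 "http://${node}:${port}/" | head -n 1)
    echo "${node}:${port} -> ${reply:-no response}"
  done
}

# Example with the three nodes of this cluster:
# check_nodes 8080 192.168.100.253 192.168.100.234 192.168.100.228
```

With a single replica, every node should report the same container hostname, since all of them end up at the same task.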

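DOCKER INGRESS: HOW IT WORKS

The request arrives at the local network interface

Start with the iptables rules on the node's host.
Traffic first enters through the local interface ens192: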

[root@253 docker]# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 11740 packets, 708K bytes)
 pkts bytes target     prot opt in     out     source               destination
1542K   93M DOCKER-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
  20M 1219M DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 11740 packets, 708K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 708 packets, 42669 bytes)
 pkts bytes target     prot opt in     out     source               destination
    3   180 DOCKER-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
    1    60 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 708 packets, 42669 bytes)
 pkts bytes target     prot opt in     out     source               destination
    3   180 MASQUERADE  all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match src-type LOCAL
  399 28723 MASQUERADE  all  --  *      !docker_gwbridge  192.168.0.0/20       0.0.0.0/0
 114K 6913K MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0
    0     0 MASQUERADE  tcp  --  *      *       172.17.0.7           172.17.0.7           tcp dpt:9000

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
    8   480 RETURN     all  --  docker_gwbridge *       0.0.0.0/0            0.0.0.0/0
    2   120 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0
  167 10020 DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:9000 to:172.17.0.7:9000

Chain DOCKER-INGRESS (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:192.168.0.2:8080
1542K   93M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

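In the nat table, the PREROUTING chain hands the packet to the DOCKER-INGRESS chain, which matches destination port 8080 and applies DNAT, so the destination becomes 192.168.0.2:8080. Neither ifconfig nor ip a shows this address on any interface of the host, so where does 192.168.0.2 come from? Read on first.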
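To pull the DNAT target for a published port out of this kind of listing, a small filter is enough. This is a sketch only: `dnat_target` is a hypothetical helper, and it assumes the column layout of `iptables -nvL` shown above (target in the third column, `to:<ip>:<port>` as the last field).

```shell
# Print the DNAT destination (ip:port) for a published port.
# Reads `iptables -t nat -nvL DOCKER-INGRESS` output on stdin.
dnat_target() {
  port="$1"
  awk -v p="dpt:${port}" '
    $3 == "DNAT" {
      found = 0
      # Look for the "dpt:<port>" match among the fields of this rule
      for (i = 1; i <= NF; i++) if ($i == p) found = 1
      if (found) { sub(/^to:/, "", $NF); print $NF; exit }
    }'
}

# Example (run as root on a swarm node):
# iptables -t nat -nvL DOCKER-INGRESS | dnat_target 8080
```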
Once PREROUTING has determined that the packet must be forwarded, it is matched against the FORWARD chain:

[root@253 docker]# iptables -nvL
Chain INPUT (policy ACCEPT 19540 packets, 1497K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
 946K 1192M DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
 946K 1192M DOCKER-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3301K 4111M DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
27686   95M ACCEPT     all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 DOCKER     all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0
26943 1583K ACCEPT     all  --  docker_gwbridge !docker_gwbridge  0.0.0.0/0            0.0.0.0/0
  12M   18G ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
  230 13848 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
2026K 3654M ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
   19  1188 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0
    0     0 DROP       all  --  docker_gwbridge docker_gwbridge  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 18203 packets, 1201K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
  167 10020 ACCEPT     tcp  --  !docker0 docker0  0.0.0.0/0            172.17.0.7           tcp dpt:9000

Chain DOCKER-INGRESS (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED tcp spt:8080
2134K 2679M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
 pkts bytes target     prot opt in     out     source               destination
26943 1583K DOCKER-ISOLATION-STAGE-2  all  --  docker_gwbridge !docker_gwbridge  0.0.0.0/0            0.0.0.0/0
2026K 3654M DOCKER-ISOLATION-STAGE-2  all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
  14M   21G RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
 pkts bytes target     prot opt in     out     source               destination
   14   728 DROP       all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0
    0     0 DROP       all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
2053K 3656M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER-USER (1 references)
 pkts bytes target     prot opt in     out     source               destination
  14M   21G RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

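In the FORWARD chain the packet matches the DOCKER-INGRESS rule for tcp dpt:8080, whose only action is ACCEPT, so it moves on to the POSTROUTING chain: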

[root@253 docker]# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 11740 packets, 708K bytes)
 pkts bytes target     prot opt in     out     source               destination
1542K   93M DOCKER-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
  20M 1219M DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 11740 packets, 708K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 708 packets, 42669 bytes)
 pkts bytes target     prot opt in     out     source               destination
    3   180 DOCKER-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
    1    60 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 708 packets, 42669 bytes)
 pkts bytes target     prot opt in     out     source               destination
    3   180 MASQUERADE  all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match src-type LOCAL
  399 28723 MASQUERADE  all  --  *      !docker_gwbridge  192.168.0.0/20       0.0.0.0/0
 114K 6913K MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0
    0     0 MASQUERADE  tcp  --  *      *       172.17.0.7           172.17.0.7           tcp dpt:9000

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
    8   480 RETURN     all  --  docker_gwbridge *       0.0.0.0/0            0.0.0.0/0
    2   120 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0
  167 10020 DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:9000 to:172.17.0.7:9000

Chain DOCKER-INGRESS (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:192.168.0.2:8080
1542K   93M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

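In the POSTROUTING chain, the only rule that applies to packets leaving through docker_gwbridge is the first MASQUERADE rule, and its ADDRTYPE match src-type LOCAL condition restricts it to packets whose source is one of this host's own addresses, such as a request issued on node 253 itself. MASQUERADE rewrites the source to the address of the outgoing interface, so those packets leave with the docker_gwbridge address as their source; requests from other hosts do not match the rule and keep the client address. After the DNAT step the packet therefore carries:
src: the client address (the docker_gwbridge address for requests issued on the node itself)
dst: 192.168.0.2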
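Now back to the address 192.168.0.2: it belongs to the subnet of docker_gwbridge, a bridge network that Docker creates automatically when the swarm is initialized. List the networks to see it: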

[root@253 docker]# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
99806da37661   bridge            bridge    local
e8fdcfd2b3f0   docker_gwbridge   bridge    local
b68f7bfff223   host              host      local
n3dex97iv1gs   ingress           overlay   swarm
7e38e2b5d547   none              null      local

# Show the details of docker_gwbridge
[root@253 docker]# docker network inspect docker_gwbridge
[
    {
        "Name": "docker_gwbridge",
        "Id": "e8fdcfd2b3f03ae9ecc7f4548df5f2629ed1d0a52ef050e510d2c928221bc78a",
        "Created": "2025-01-14T16:09:16.75781064+08:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "192.168.0.0/20",
                    "Gateway": "192.168.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "c482097de23a1296adfb2f4bde725309ad87137b53d32f2362375a50faf8cf8c": {
                "Name": "gateway_b19d36310a0c",
                "EndpointID": "dcb1d4c82ef30d7d0580feec09af643bcae39b07dae6b290d43910fdf575156e",
                "MacAddress": "02:42:c0:a8:00:03",
                "IPv4Address": "192.168.0.3/20",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "gateway_ingress-sbox",
                "EndpointID": "8fee04df9694740d19a7582e66c026b867887f5d2070c3f49c163a5a6fa604bc",
                "MacAddress": "02:42:c0:a8:00:02",
                "IPv4Address": "192.168.0.2/20",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_icc": "false",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.name": "docker_gwbridge"
        },
        "Labels": {}
    }
]

The output can be summarized as follows:
The docker_gwbridge subnet is 192.168.0.0/20
The docker_gwbridge gateway is 192.168.0.1
Two endpoints are attached to docker_gwbridge:
the whoami container (c482097de23a1296adfb2f4bde725309ad87137b53d32f2362375a50faf8cf8c)
ingress-sbox
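The subnet membership can be checked with plain shell arithmetic. The helpers below (`ip_to_int`, `ip_in_cidr`) are hypothetical names for a minimal sketch that assumes IPv4 only and a shell with 64-bit arithmetic; it shows that the DNAT target 192.168.0.2 falls inside 192.168.0.0/20 while the node address 192.168.100.253 does not.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into four positional parameters
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if address $1 lies inside CIDR block $2.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

ip_in_cidr 192.168.0.2 192.168.0.0/20 && echo "192.168.0.2 is on docker_gwbridge"
ip_in_cidr 192.168.100.253 192.168.0.0/20 || echo "192.168.100.253 is not"
```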

ingress-sbox is not a real container but a network namespace created by Docker, and 192.168.0.2 is the address of an interface inside it. Confirm with:

[root@253 netns]# nsenter --net="/run/docker/netns/ingress_sbox" ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2504: eth0@if2505: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.2/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.0.97/32 scope global eth0
       valid_lft forever preferred_lft forever
2506: eth1@if2507: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:c0:a8:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 192.168.0.2/20 brd 192.168.15.255 scope global eth1
       valid_lft forever preferred_lft forever

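Besides the lo loopback interface, the ingress_sbox namespace has two interfaces, eth0 and eth1. The address of eth1 is exactly the DNAT destination of the traffic that entered the host through ens192. Next, look at the rules inside the ingress_sbox namespace: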

[root@253 netns]# nsenter --net=/run/docker/netns/ingress_sbox iptables -t mangle -nvL
Chain PREROUTING (policy ACCEPT 67 packets, 7047 bytes)
 pkts bytes target     prot opt in     out     source               destination
   36  3380 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 MARK set 0x205a

Chain INPUT (policy ACCEPT 36 packets, 3380 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.0.0.97            MARK set 0x205a

Chain FORWARD (policy ACCEPT 31 packets, 3667 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 36 packets, 3380 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 67 packets, 7047 bytes)
 pkts bytes target     prot opt in     out     source               destination

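The PREROUTING chain uses the MARK target to tag matching traffic (tcp dpt:8080) with 0x205a, which is 8282 in decimal. Later routing or firewall rules can use this mark to treat those packets specially. Here IPVS in the kernel picks up the mark and forwards the packets, as ipvsadm confirms: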

[root@253 netns]# nsenter --net=/run/docker/netns/ingress_sbox ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  8282 rr
  -> 10.0.0.98:0                  Masq    1      0          0
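The mark in the mangle rule and the FWM value listed by ipvsadm are the same number in different bases: iptables prints it in hexadecimal, ipvsadm in decimal. A one-line conversion makes the link explicit; `mark_to_fwm` is a hypothetical helper name used only for this illustration.

```shell
# Convert an iptables firewall mark (hex, e.g. 0x205a) to the decimal value ipvsadm prints.
mark_to_fwm() {
  printf '%d\n' "$1"
}

mark_to_fwm 0x205a   # prints 8282

# Example (run as root on a swarm node): show the IPVS service for that mark
# nsenter --net=/run/docker/netns/ingress_sbox ipvsadm -Ln | grep -A1 "FWM  $(mark_to_fwm 0x205a)"
```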

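For traffic carrying mark 8282, IPVS applies NAT forwarding once more (Masq, with the rr round-robin scheduler) to 10.0.0.98, which is the container's address on the ingress network.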

[root@253 netns]# docker network inspect ingress
[
    {
        "Name": "ingress",
        "Id": "n3dex97iv1gs5ugdluoc34dxi",
        "Created": "2025-01-15T17:01:36.92041533+08:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": true,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "c482097de23a1296adfb2f4bde725309ad87137b53d32f2362375a50faf8cf8c": {
                "Name": "whoami.1.oqfzi4mpkxgneq76zd5stkitx",
                "EndpointID": "fe5a7fc1cff508a3ec30cc486b6f2f956e9ad5cb51c11ae6a89501bc37714e3d",
                "MacAddress": "02:42:0a:00:00:62",
                "IPv4Address": "10.0.0.98/24",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "ingress-endpoint",
                "EndpointID": "bbe2023c0e3b4069a6e4bc06c51647fb4db962734a4584b94fd52c16ea6935a2",
                "MacAddress": "02:42:0a:00:00:02",
                "IPv4Address": "10.0.0.2/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "4cc45fe7cbbe",
                "IP": "192.168.100.253"
            },
            {
                "Name": "05c5717c0238",
                "IP": "192.168.100.234"
            },
            {
                "Name": "2182e20600db",
                "IP": "192.168.100.228"
            }
        ]
    }
]

The ARP table of ingress_sbox contains the MAC address of 10.0.0.98:

[root@253 netns]# nsenter --net=/run/docker/netns/ingress_sbox arp -a
? (10.0.0.37) at 02:42:0a:00:00:25 [ether] on eth0
? (10.0.0.47) at 02:42:0a:00:00:2f [ether] on eth0
? (10.0.0.8) at 02:42:0a:00:00:08 [ether] on eth0
? (10.0.0.155) at 02:42:0a:00:00:9b [ether] on eth0
? (10.0.0.154) at 02:42:0a:00:00:9a [ether] on eth0
? (10.0.0.139) at 02:42:0a:00:00:8b [ether] on eth0
? (10.0.0.96) at 02:42:0a:00:00:60 [ether] on eth0
? (10.0.0.98) at 02:42:0a:00:00:62 [ether] on eth0
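Every entry in this table, like the interface addresses shown earlier, follows the same pattern: the MAC is 02:42 followed by the four bytes of the IPv4 address in hexadecimal. The sketch below reproduces that mapping so an expected MAC can be compared against the ARP output; `ip_to_docker_mac` is a hypothetical helper, and the pattern is taken from the outputs in this article rather than from a documented guarantee.

```shell
# Derive the MAC that matches the "02:42 + IPv4 bytes" pattern seen above.
ip_to_docker_mac() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into four positional parameters
  IFS=$old_ifs
  printf '02:42:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
}

ip_to_docker_mac 10.0.0.98     # prints 02:42:0a:00:00:62
ip_to_docker_mac 192.168.0.2   # prints 02:42:c0:a8:00:02
```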

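Summary

A request to port 8080 on any node is DNATed by the host's DOCKER-INGRESS chain to 192.168.0.2, the ingress_sbox end of docker_gwbridge. Inside ingress_sbox the mangle table marks the packet with 0x205a (8282), and IPVS forwards marked packets over the ingress overlay network to the service container at 10.0.0.98.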

posted @ 2025-01-17 17:03  Linux小飞象