Docker Swarm Cluster Setup

I. Environment Preparation

Servers: 4         (docker-ce installed on all of them)

OS:      CentOS 7  (make sure you are on CentOS 7 or later)

Memory:  2 GB      (minimum)

CPU:     1 core

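II. Installation Steps

Tip:

In Xshell, right-click in the terminal and enable [Send keyboard input to all sessions]; every command you type then runs on all machines at the same time.

1. Install the gcc toolchain (make sure the virtual machine has Internet access)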

 yum -y install gcc gcc-c++

2. Remove old versions

yum remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-engine

3. Install the required packages

yum install -y yum-utils

4. Add the Docker CE package repository

yum-config-manager \
    --add-repo \
    https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

5. Refresh the yum package index

yum makecache fast

6. Install Docker CE

yum install -y docker-ce docker-ce-cli containerd.io

7. Start Docker

systemctl start docker

8. Test

docker version

docker run hello-world

docker images

docker ps -a

9. Uninstall (only if you need to remove Docker; skip this step when building the cluster)

systemctl stop docker

yum -y remove docker-ce docker-ce-cli containerd.io

rm -rf /var/lib/docker

10. Configure a registry mirror

sudo mkdir -p /etc/docker

sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://registry.docker-cn.com"]
}
EOF

sudo systemctl daemon-reload

sudo systemctl restart docker

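III. Building the Swarm Cluster

Official documentation: https://docs.docker.com/engine/swarm/how-swarm-mode-works/nodes/

(1) Requirements:

♦ At least 3 Manager nodes (with only 2, the cluster becomes unusable as soon as one of them goes down; see the experiment in Section IV)

(2) How the cluster works:

1. A cluster has two kinds of nodes: Manager (management nodes) and worker (work nodes)

2. Manager nodes keep the cluster state in sync with each other through Raft; worker nodes take no part in that consensus

3. Manager nodes manage worker nodes; a worker cannot manage a Manager

4. Cluster management commands can only be run on a Manager; a worker cannot run them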

 

(3) Build the cluster

On docker-1:

1. List the networks

docker network ls

2. View the swarm commands

[root@localhost ~]# docker swarm --help

Usage:  docker swarm COMMAND

Manage Swarm

Commands:
  ca          Display and rotate the root CA
  init        Initialize a swarm
  join        Join a swarm as a node and/or manager
  join-token  Manage join tokens
  leave       Leave the swarm
  unlock      Unlock swarm
  unlock-key  Manage the unlock key
  update      Update the swarm

Run 'docker swarm COMMAND --help' for more information on a command.     

3. Initialize the cluster

[root@localhost ~]# docker swarm init --help

Usage:  docker swarm init [OPTIONS]

Initialize a swarm

Options:
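      --advertise-addr string                  Advertised address (format: <ip|interface>[:port])   # the address other nodes use to reach this node (this is the key option)

Networks fall into two kinds:

Private network (traffic stays off the Internet, so access is faster and latency is lower)

Public network (traffic goes over the Internet)

4. Check the server IP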

[root@localhost ~]# ip addr
    inet 192.168.1.230/24 brd 192.168.1.255 scope global ens32
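The address passed to --advertise-addr can also be read from the `ip` output by a script instead of by eye. The following is a minimal sketch; the helper name `extract_ipv4` is hypothetical, and the interface name ens32 is simply the one from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read `ip addr`-style lines on stdin and print the
# first IPv4 address found, without the /prefix length.
extract_ipv4() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "inet") { split($(i + 1), parts, "/"); print parts[1]; exit }
  }'
}

# Demo on the line shown above (no network access needed):
echo '    inet 192.168.1.230/24 brd 192.168.1.255 scope global ens32' | extract_ipv4

# On a real host, assuming the interface is called ens32:
#   ip -4 -o addr show ens32 | extract_ipv4
```

The printed value is what you would pass to `docker swarm init --advertise-addr`.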

5. Configure the Manager node

[root@localhost ~]# docker swarm init --advertise-addr 192.168.1.230
Swarm initialized: current node (bbksnbwq0aq96ap3z6s1znk09) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-0dnpje2x2vv04k51oc76styuwoe9y2ddeonh3ie0zpi4naviwn-f1o9oknc8uqazyay6tb8d8oqf 192.168.1.230:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

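6. Get the join tokens:

These commands only work on a Manager; a worker cannot run them

docker swarm join-token manager    # print the join command (with token) for a manager
docker swarm join-token worker     # print the join command (with token) for a worker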
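When the join step is scripted, `docker swarm join-token -q worker` prints only the token, and the full command can be assembled from it. A minimal sketch follows; the function name `build_join_command` is hypothetical, the token in the demo is a placeholder, and 2377 is the port that appears in the init output above.

```shell
#!/bin/sh
# Hypothetical helper: assemble the join command from a token, the manager's
# address and an optional port (defaults to 2377).
build_join_command() {
  token="$1"
  manager_ip="$2"
  port="${3:-2377}"
  printf 'docker swarm join --token %s %s:%s\n' "$token" "$manager_ip" "$port"
}

# Demo with a placeholder token (not a real one):
build_join_command "SWMTKN-1-example" "192.168.1.230"

# On a manager, the real token could be fetched like this:
#   build_join_command "$(docker swarm join-token -q worker)" 192.168.1.230
```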

 

On docker-2:

1. Join docker-2 to the swarm cluster

[root@localhost ~]# docker swarm join --token SWMTKN-1-0dnpje2x2vv04k51oc76styuwoe9y2ddeonh3ie0zpi4naviwn-f1o9oknc8uqazyay6tb8d8oqf 192.168.1.230:2377

Errors:

If the command succeeds, skip this part.

Error message:

Error response from daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.1.230:2377: connect: no route to host"

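Cause:

The error appears when the node tries to join the swarm: the firewall on the manager machine is still running and blocks port 2377.

Fix:

♦ Turn off the firewall on the Manager and worker servers

(1) Check the firewall status on the manager machine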

systemctl status firewalld.service

(2) Stop the firewall

systemctl stop firewalld.service

(3) Disable the firewall permanently (it will no longer start at boot)

systemctl disable firewalld.service
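Turning the firewall off is fine in a lab. As an alternative sketch, assuming firewalld stays enabled, you can open only the ports swarm mode uses: 2377/tcp for cluster management, 7946/tcp and 7946/udp for node-to-node communication, and 4789/udp for overlay network traffic. The function name below is hypothetical, and it only prints the commands so you can review them before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the firewall-cmd calls that open the
# swarm mode ports, followed by a reload to apply the permanent rules.
swarm_firewall_commands() {
  for port in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    echo "firewall-cmd --permanent --add-port=${port}"
  done
  echo "firewall-cmd --reload"
}

# Print the commands for review:
swarm_firewall_commands

# To apply them on a node, as root:
#   swarm_firewall_commands | sh
```

Run this on every node, managers and workers alike, because overlay and gossip traffic flows between all of them.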

2. Run the join command again

[root@localhost ~]# docker swarm join --token SWMTKN-1-0dnpje2x2vv04k51oc76styuwoe9y2ddeonh3ie0zpi4naviwn-f1o9oknc8uqazyay6tb8d8oqf 192.168.1.230:2377
This node joined a swarm as a worker.

On docker-1:

1. Check the node status

[root@localhost ~]# docker node ls
ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
65ohc7a6u28qlanpb2tygpd8k     localhost.localdomain   Ready     Active                          20.10.5    
bbksnbwq0aq96ap3z6s1znk09 *   localhost.localdomain   Ready     Active         Leader           20.10.5

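In the MANAGER STATUS column:

Leader marks the Manager node that currently leads the cluster; an empty value marks a worker node

2. Print the worker join token on docker-1 (it is the same token shown at initialization)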

[root@localhost ~]# docker swarm join-token worker
To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-0dnpje2x2vv04k51oc76styuwoe9y2ddeonh3ie0zpi4naviwn-f1o9oknc8uqazyay6tb8d8oqf 192.168.1.230:2377

On docker-3:

Join docker-3 to the swarm cluster

[root@localhost ~]# docker swarm join --token SWMTKN-1-0dnpje2x2vv04k51oc76styuwoe9y2ddeonh3ie0zpi4naviwn-f1o9oknc8uqazyay6tb8d8oqf 192.168.1.230:2377
This node joined a swarm as a worker.

On docker-1:

1. Check the node status

[root@localhost ~]# docker node ls
ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
2giiey95hal5fv2rgfmx3souu     localhost.localdomain   Ready     Active                          20.10.5    # docker-3
65ohc7a6u28qlanpb2tygpd8k     localhost.localdomain   Ready     Active                          20.10.5    # docker-2
bbksnbwq0aq96ap3z6s1znk09 *   localhost.localdomain   Ready     Active         Leader           20.10.5    # docker-1

2. Print the Manager join token

[root@localhost ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-0dnpje2x2vv04k51oc76styuwoe9y2ddeonh3ie0zpi4naviwn-coihmyn50k2y7d156xq0l87m3 192.168.1.230:2377

On docker-4:

Join docker-4 to the swarm cluster as a Manager node

[root@localhost ~]# docker swarm join --token SWMTKN-1-0dnpje2x2vv04k51oc76styuwoe9y2ddeonh3ie0zpi4naviwn-coihmyn50k2y7d156xq0l87m3 192.168.1.230:2377
This node joined a swarm as a manager.

On docker-1:

1. Check the node status

[root@localhost ~]# docker node ls
ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
2giiey95hal5fv2rgfmx3souu     localhost.localdomain   Ready     Active                          20.10.5  # docker-3
65ohc7a6u28qlanpb2tygpd8k     localhost.localdomain   Ready     Active                          20.10.5  # docker-2  
bbksnbwq0aq96ap3z6s1znk09 *   localhost.localdomain   Ready     Active         Leader           20.10.5  # docker-1
ww14ifp4ki9gyw1gdhvzgbzff     localhost.localdomain   Ready     Active         Reachable        20.10.5  # docker-4

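In the MANAGER STATUS column:

Leader is the Manager that currently leads the cluster. Reachable is another Manager that takes part in the Raft quorum. If the Leader goes down, a Reachable manager is elected as the new Leader, provided a majority of managers is still online

An empty value marks a worker node

 

The swarm cluster is now set up! However, two managers plus two workers is pointless: if either Manager goes down, the cluster stops working, so three managers are needed!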
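The MANAGER STATUS column can also be counted by a script rather than read by eye. A minimal sketch, assuming the output of `docker node ls --format '{{.ManagerStatus}}'` is piped in (one line per node, empty for workers); the function name `count_managers` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count manager nodes from a list of MANAGER STATUS
# values read on stdin. Workers have an empty value and are not counted.
count_managers() {
  awk '$1 == "Leader" || $1 == "Reachable" || $1 == "Unreachable" { n++ }
       END { print n + 0 }'
}

# Demo with the four-node listing above: two workers, one Leader, one Reachable.
printf '\n\nLeader\nReachable\n' | count_managers

# On a manager node:
#   docker node ls --format '{{.ManagerStatus}}' | count_managers
```

Unreachable managers are counted on purpose: they still belong to the manager set against which the majority is measured.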

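IV. The Raft Consensus Protocol

Raft protocol:

The cluster stays usable only while a majority of manager nodes is alive. At least 2 managers must remain online, so the cluster needs at least 3 managers to survive the loss of one.

1. Question:

There are currently 2 manager nodes. If one of them goes down, is the cluster still usable?

2. Experiment:

(1) Stop Docker on docker-1 to simulate a crash. With the current two-manager, two-worker layout, the remaining manager can no longer be used either!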
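The majority rule can be written down as two small formulas: with N managers, the quorum is N/2 + 1 (integer division) and the number of manager failures tolerated is (N - 1)/2. A minimal sketch with hypothetical function names:

```shell
#!/bin/sh
# Hypothetical helpers for the Raft majority rule (integer arithmetic).
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

raft_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Print the numbers for 1 to 5 managers.
for n in 1 2 3 4 5; do
  printf 'managers=%s quorum=%s tolerated_failures=%s\n' \
    "$n" "$(raft_quorum "$n")" "$(raft_fault_tolerance "$n")"
done
```

Two managers tolerate zero failures, the same as one, which is why an odd number of managers (3 or 5) is the usual choice.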

On docker-1:

Stop Docker

[root@localhost ~]# systemctl stop docker

On docker-4:

Check the node status

[root@localhost ~]# docker node ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
# Too few managers are online. A single remaining manager is not enough!

On docker-1:

(1) Start Docker on docker-1

[root@localhost ~]# systemctl start docker

(2) Check the node status

[root@localhost ~]# docker node ls
ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
2giiey95hal5fv2rgfmx3souu     localhost.localdomain   Unknown   Active                          20.10.5
65ohc7a6u28qlanpb2tygpd8k     localhost.localdomain   Ready     Active                          20.10.5
bbksnbwq0aq96ap3z6s1znk09 *   localhost.localdomain   Ready     Active         Reachable        20.10.5
ww14ifp4ki9gyw1gdhvzgbzff     localhost.localdomain   Ready     Active         Leader           20.10.5

As shown above, docker-1 is now Reachable and docker-4 has become the Leader

On docker-3:

Make docker-3 leave the cluster so that it can rejoin later

[root@localhost ~]# docker swarm leave
Node left the swarm.

On docker-1:

Check the node status

[root@localhost ~]# docker node ls
ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
2giiey95hal5fv2rgfmx3souu     localhost.localdomain   Down      Active                          20.10.5
65ohc7a6u28qlanpb2tygpd8k     localhost.localdomain   Ready     Active                          20.10.5
bbksnbwq0aq96ap3z6s1znk09 *   localhost.localdomain   Ready     Active         Reachable        20.10.5
ww14ifp4ki9gyw1gdhvzgbzff     localhost.localdomain   Ready     Active         Leader           20.10.5

As shown above, docker-3 now shows as Down

3. Rejoin the removed docker-3 as a manager node

On docker-1:

(1) Print the manager join token on docker-1
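A node that has left stays in the list as Down until it is removed with `docker node rm` on a manager. The following sketch picks out such stale worker entries; the function name `stale_worker_ids` is hypothetical, and it assumes input produced by `docker node ls --format '{{.ID}} {{.Status}} {{.ManagerStatus}}'`. Down managers are skipped on purpose, because a manager has to be demoted before it can be removed.

```shell
#!/bin/sh
# Hypothetical helper: read "ID STATUS [MANAGER_STATUS]" lines on stdin and
# print the IDs of worker nodes (no manager status) whose status is Down.
stale_worker_ids() {
  awk '$2 == "Down" && NF == 2 { print $1 }'
}

# Demo with the node IDs from the listing above:
printf '%s\n' \
  '2giiey95hal5fv2rgfmx3souu Down ' \
  '65ohc7a6u28qlanpb2tygpd8k Ready ' \
  'bbksnbwq0aq96ap3z6s1znk09 Ready Reachable' \
  'ww14ifp4ki9gyw1gdhvzgbzff Ready Leader' | stale_worker_ids

# On a manager node (xargs -r is the GNU option that skips empty input):
#   docker node ls --format '{{.ID}} {{.Status}} {{.ManagerStatus}}' \
#     | stale_worker_ids | xargs -r docker node rm
```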

[root@localhost ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-0dnpje2x2vv04k51oc76styuwoe9y2ddeonh3ie0zpi4naviwn-coihmyn50k2y7d156xq0l87m3 192.168.1.230:2377

On docker-3:

(2) Run the join command printed above on docker-3, then check the node status (the listing below was taken on docker-1)

[root@localhost ~]# docker node ls
ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
2giiey95hal5fv2rgfmx3souu     localhost.localdomain   Down      Active                          20.10.5
65ohc7a6u28qlanpb2tygpd8k     localhost.localdomain   Ready     Active                          20.10.5
bbksnbwq0aq96ap3z6s1znk09 *   localhost.localdomain   Ready     Active         Reachable        20.10.5
ww14ifp4ki9gyw1gdhvzgbzff     localhost.localdomain   Ready     Active         Leader           20.10.5
xttcuf3iu5cvirxwwiydi3808     localhost.localdomain   Ready     Active         Reachable        20.10.5  # docker-3

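As shown above, docker-3 has rejoined under a new node ID and is now Reachable; its old worker entry still shows as Down

4. Is the cluster still usable after docker-1 is stopped?

On docker-1:

Stop Docker on docker-1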

[root@localhost ~]# systemctl stop docker

On docker-3:

[root@localhost ~]# docker node ls
ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
2giiey95hal5fv2rgfmx3souu     localhost.localdomain   Down      Active                          20.10.5
65ohc7a6u28qlanpb2tygpd8k     localhost.localdomain   Ready     Active                          20.10.5
bbksnbwq0aq96ap3z6s1znk09     localhost.localdomain   Down      Active         Unreachable      20.10.5
ww14ifp4ki9gyw1gdhvzgbzff     localhost.localdomain   Ready     Active         Leader           20.10.5
xttcuf3iu5cvirxwwiydi3808 *   localhost.localdomain   Ready     Active         Reachable        20.10.5

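As shown above, docker-1 now shows as Unreachable, but 2 of the 3 managers are still online, so the cluster keeps working. This is high availability!

If docker-3 is stopped as well, the cluster becomes unusable!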
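The three situations seen in this experiment can be checked against the majority rule with a few lines of shell. This is only a sketch of the arithmetic, not something Docker provides; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when strictly more than half of the managers
# are online. Arguments: total managers, managers online.
swarm_has_quorum() {
  [ $(( $2 * 2 )) -gt "$1" ]
}

report() {
  if swarm_has_quorum "$1" "$2"; then
    echo "$1 managers, $2 online: cluster usable"
  else
    echo "$1 managers, $2 online: no quorum"
  fi
}

report 2 1   # two managers, docker-1 stopped
report 3 2   # three managers, docker-1 stopped
report 3 1   # three managers, docker-1 and docker-3 stopped
```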

 

posted @ 2021-03-03 16:06  西瓜君~