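Ansible roles case 2: deploying a Redis cluster from binaries

 Cluster planning

This role simulates three machines, each running two Redis instances, for six Redis nodes in total. In the inventory this is modeled as six hosts (node1 to node6) that share three IP addresses.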

Repository

github: 

Directory structure

Create the redis_cluster role

ansible-galaxy init redis_cluster
roles/redis_cluster/
├── defaults
│   └── main.yml
├── files
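│   ├── redis-6.2.11     # compiled source tree; this role expects Redis to be compiled in advance
│   ├── redis-6.2.11.tar.gz
│   ├── redis-benchmark     # the entries below are the executables taken from the compiled src directory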
│   ├── redis-check-aof
│   ├── redis-check-rdb
│   ├── redis-cli
│   ├── redis-server
│   └── redis-trib.rb
├── handlers
│   └── main.yml
├── hosts
│   └── inventory.yaml
├── meta
│   └── main.yml
├── README.md
├── tasks
│   ├── add_user.yml
│   ├── check_cluster.yml
│   ├── check_reids.yml
│   ├── chown_dir.yml
│   ├── copy_bin.yml
│   ├── create_cluster.yml
│   ├── create_redis_dir.yml
│   ├── main.yml
│   ├── rsync_dir.yml
│   ├── set_sysctl.yml
│   ├── start_redis.yml
│   ├── template_redis_conf.yml
│   └── template_redis_service.yml
├── templates
│   ├── redis.conf.j2      # Redis configuration file
│   └── redis.service.j2   # Redis systemd unit file
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml 

Static files

Compiling Redis: omitted.

Inventory

[root@master-1 roles]# cat roles/redis_cluster/hosts/inventory.yaml
all:
  hosts: {}
  children:
    redis_cluster:
      vars:
        base_port: 6379
        redis_data_base_dir: /opt/redis_cluster
        cluster_node: "192.168.43.129:6379 192.168.43.129:6380 192.168.43.130:6381 192.168.43.130:6382 192.168.43.131:6383 192.168.43.131:6384"
      hosts:
        node1:
          ansible_host: 192.168.43.129
          index: 0    # index; combined with base_port to generate per-instance settings
          create_node: true   # flag used as the condition for creating and checking the cluster
        node2:
          ansible_host: 192.168.43.129
          index: 1
        node3:
          ansible_host: 192.168.43.130
          index: 2
        node4:
          ansible_host: 192.168.43.130
          index: 3
        node5:
          ansible_host: 192.168.43.131
          index: 4
        node6:
          ansible_host: 192.168.43.131
          index: 5 

Variables

[root@master-1 roles]# cat roles/redis_cluster/vars/main.yml
---
# vars file for redis-_cluster
password: 123456
rewrite_percentage: 25   # AOF growth percentage that triggers an automatic rewrite
cluster_enabled: 'yes'
listen_port: "{{ base_port + index }}"    # Redis listen port, generated per instance
data_dir: "{{ redis_data_base_dir }}/{{ base_port + index }}"    # Redis home directory, generated per instance
maxmemory_policy: volatile-lru   # eviction policy
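
The cluster_node string in the inventory repeats what ansible_host, base_port and index already encode. As a rough sketch, assuming every host in the redis_cluster group defines ansible_host and index as shown in the inventory, the same list could be derived with a template instead of being typed by hand. The loop variable name host and the sort filter (used so the order follows node1 to node6) are my own choices, not part of the original role:

```yaml
# Hypothetical replacement for the hard-coded cluster_node value in the
# redis_cluster group vars; builds "ip:port ip:port ..." from the inventory.
cluster_node: >-
  {% for host in groups['redis_cluster'] | sort -%}
  {{ hostvars[host]['ansible_host'] }}:{{ base_port + hostvars[host]['index'] }}{{ '' if loop.last else ' ' }}
  {%- endfor %}
```

With this form, adding or renumbering an instance only requires touching its inventory entry.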

Templates

 Redis configuration file

[root@master-1 roles]# cat roles/redis_cluster/templates/redis.conf.j2
bind 0.0.0.0
protected-mode yes
# Generated per instance: every inventory host has its own index, so listen_port resolves to a different port on each one
port {{ listen_port }}
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
pidfile {{ data_dir }}/redis_{{ listen_port }}.pid
loglevel notice
# Log file path, generated the same way
logfile {{ data_dir }}/logs
databases 16
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
save 3600 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir {{ data_dir }}/data
masterauth {{ password }}
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 30
repl-diskless-load disabled
repl-ping-replica-period 10
repl-timeout 60
repl-disable-tcp-nodelay no
repl-backlog-size 512mb
repl-backlog-ttl 3600
replica-priority 100
acllog-max-len 128
requirepass  {{ password }}
maxclients 10000
# maxmemory is set from the gathered host facts (half of the total memory)
maxmemory {{ ansible_memtotal_mb // 2 }}mb
maxmemory-policy {{ maxmemory_policy }}
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage {{ rewrite_percentage }}
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
cluster-enabled {{ cluster_enabled }}
# Cluster state file, one per instance
cluster-config-file nodes-{{ listen_port }}.conf
cluster-node-timeout 15000
cluster-replica-validity-factor 10
cluster-migration-barrier 1
cluster-require-full-coverage no
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes

Redis service unit file

[root@master-1 roles]# cat roles/redis_cluster/templates/redis.service.j2
[Unit]
Description=Redis In-Memory Data Store
After=network.target

[Service]
Type=forking
# Start the instance with the redis.conf from its own home directory
ExecStart=/usr/local/sbin/redis-server {{ data_dir }}/redis.conf --supervised systemd
# Shut down the instance listening on this port
ExecStop=/usr/local/sbin/redis-cli -p {{ listen_port }} shutdown
Restart=always
# User the Redis service runs as
User=redis
Group=redis
LimitNOFILE=65535
LimitNPROC=65535

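# RuntimeDirectory names a runtime directory for the service's temporary files; it is cleaned up when the service stops.
# RuntimeDirectoryMode sets its permissions. 0755 lets the owner read, write and execute, while group and others can only read and execute.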
# RuntimeDirectory=redis
# RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target

Tasks

 main file

[root@master-1 tasks]# cat main.yml
---
# tasks file for redis-_cluster
- include: set_sysctl.yml
- include: add_user.yml
- include: create_redis_dir.yml
- include: rsync_dir.yml
- include: template_redis_conf.yml
- include: template_redis_service.yml
- include: copy_bin.yml
- include: chown_dir.yml
- include: start_redis.yml
- include: check_reids.yml
- include: create_cluster.yml
- include: check_cluster.yml

Create the Redis home directory

[root@master-1 tasks]# cat create_redis_dir.yml
- name: create redis dir
  ansible.builtin.file:
    path: "{{ data_dir }}"
    state: directory

Sync the Redis files into each instance's home directory

[root@master-1 tasks]# cat rsync_dir.yml
- name: copy redis dir
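  #ansible.builtin.copy:
    # If src is a directory it is copied recursively. When src ends with "/", only the contents of that directory are copied to the destination; without the trailing "/", the directory itself is copied along with its contents.
    # src: redis-6.2.11/
    # If dest is a non-existent path and either dest ends with "/" or src is a directory, dest is created.
    # If src and dest are both files, the parent directory of dest is not created, and the task fails if it does not already exist.
    # dest: "{{ data_dir }}/"
  ansible.builtin.synchronize:
    # A trailing "/" means the contents of the directory; without it, the directory itself
    src: redis-6.2.11/
    dest: "{{ data_dir }}/"
    # Preserve times, permissions and symlinks
    archive: yes
    # Compress during transfer to speed it up
    compress: yes
    owner: yes
    group: yes
    # Preserve permissions
    perms: yes
    # Whether to delete files in dest that do not exist in src
    delete: no
  # delegate_to would make the sync run between two remote hosts instead of from the Ansible control node
  #delegate_to: "{{ inventory_hostname }}"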

 Copy the Redis binaries

[root@master-1 tasks]# cat copy_bin.yml
- name: copy redis bin
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/usr/local/sbin/{{ item }}"
    mode: 0551
    force: true
  loop:
    - redis-cli
    - redis-check-aof
    - redis-check-rdb
    - redis-server
    - redis-trib.rb

 Start Redis

[root@master-1 tasks]# cat start_redis.yml
- name: start redis
  ansible.builtin.systemd:
    # When the task runs on a node, it picks up that node's port and starts the matching unit
    name: redis-{{listen_port}}
    daemon_reload: yes
    state: started

 Check the Redis process and port

[root@master-1 tasks]# cat check_reids.yml
- name: check redis port & process
  ansible.builtin.shell: |
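    ps -C redis-server --no-headers |wc -l   # count the redis-server processes
  register: redis_process_count
  failed_when: redis_process_count.stdout | int != 2  # convert to an integer; fail unless exactly 2 processes are running
  changed_when: false   # a read-only check is never reported as a change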
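
The task above is named "port & process" but only counts processes. As a minimal sketch, assuming the instance accepts connections on ansible_host (the template binds 0.0.0.0), the port could be checked with the wait_for module. The task name and the 30-second timeout are assumptions of mine, not part of the original role:

```yaml
# Hypothetical extra task for check_reids.yml; timeout is an assumed value.
- name: wait for the redis port
  ansible.builtin.wait_for:
    host: "{{ ansible_host }}"
    port: "{{ listen_port | int }}"
    state: started
    timeout: 30
```

Because listen_port resolves per inventory host, each of the six hosts checks only its own instance.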

Create the cluster

[root@master-1 tasks]# cat create_cluster.yml
- name: create redis cluster
  ansible.builtin.shell: |
    echo "yes" |redis-cli -a {{ password }} --cluster create {{ cluster_node }} --cluster-replicas 1
    sleep 5   # give the cluster time to initialize; a check issued right after creation would fail
  when: create_node is defined and create_node   # run only on the host that carries this flag

Check the cluster status

[root@master-1 tasks]# cat check_cluster.yml
- name: check cluster status
  ansible.builtin.shell: |
   redis-cli  -a {{ password }}  -h {{ ansible_host }} -p {{ listen_port }} cluster info   # print the cluster info
  register: cluster_status
  when: create_node is defined and create_node   # run only on the host that carries this flag
  changed_when: false
  failed_when: cluster_status.rc != 0 or 'cluster_state:ok' not in cluster_status.stdout   # fail if cluster info returns non-zero or the field is missing from the output

- name: output resister
  ansible.builtin.debug:
    var: cluster_status   # prints the whole registered result; fields such as cluster_status.stdout or cluster_status.stderr also work

- name: output msg
  ansible.builtin.debug:
    msg: "Cluster status check succeeded..."
  when: create_node is defined and create_node and cluster_status.rc == 0

Run the role

[root@master-1 roles]# ansible-playbook -i roles/redis_cluster/hosts/inventory.yaml main_redis_all.yml  --list-hosts

playbook: main_redis_all.yml

  play #1 (redis_cluster): redis_cluster        TAGS: []
    pattern: [u'redis_cluster']
    hosts (6):
      node1
      node3
      node2
      node5
      node4
      node6

[root@master-1 roles]# ansible-playbook -i roles/redis_cluster/hosts/inventory.yaml main_redis_all.yml 

PLAY [redis_cluster] **********************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************************
ok: [node1]
ok: [node2]
ok: [node5]
ok: [node4]
ok: [node3]
ok: [node6]

TASK [redis_cluster : set vm.overcommit_memory] *******************************************************************************************************************************
ok: [node1]
ok: [node4]
ok: [node3]
ok: [node2]
ok: [node5]
ok: [node6]

TASK [redis_cluster : add redis user] *****************************************************************************************************************************************
ok: [node3]
ok: [node4]
ok: [node5]
ok: [node1]
ok: [node2]
ok: [node6]

TASK [redis_cluster : create redis dir] ***************************************************************************************************************************************
ok: [node5]
ok: [node3]
ok: [node4]
ok: [node1]
ok: [node2]
ok: [node6]

TASK [redis_cluster : copy redis dir] *****************************************************************************************************************************************
changed: [node5]
changed: [node2]
changed: [node1]
changed: [node4]
changed: [node3]
changed: [node6]

TASK [redis_cluster : configure Redis] ****************************************************************************************************************************************
changed: [node5]
changed: [node4]
changed: [node2]
changed: [node1]
changed: [node3]
changed: [node6]

TASK [redis_cluster : configure the Redis service unit] ***********************************************************************************************************************
ok: [node1]
ok: [node5]
ok: [node2]
ok: [node4]
ok: [node3]
ok: [node6]

TASK [redis_cluster : chown redis_dir] ****************************************************************************************************************************************
changed: [node3]
changed: [node1]
ok: [node2]
ok: [node4]
changed: [node5]
ok: [node6]

TASK [redis_cluster : start redis] ********************************************************************************************************************************************
changed: [node5]
changed: [node3]
changed: [node4]
changed: [node2]
changed: [node1]
changed: [node6]

TASK [redis_cluster : check redis port & process] *****************************************************************************************************************************
ok: [node5]
ok: [node4]
ok: [node3]
ok: [node1]
ok: [node2]
ok: [node6]

TASK [redis_cluster : create redis cluster] ***********************************************************************************************************************************
skipping: [node3]
skipping: [node2]
skipping: [node5]
skipping: [node4]
skipping: [node6]
changed: [node1]

TASK [redis_cluster : output cluster info] ************************************************************************************************************************************
skipping: [node3]
skipping: [node2]
skipping: [node5]
skipping: [node4]
skipping: [node6]
ok: [node1]     # the flag condition took effect: the other nodes were skipped

TASK [redis_cluster : output resister] ****************************************************************************************************************************************
ok: [node1] => {
    "cluster_status": {
        "changed": false,
        "cmd": "redis-cli  -a 123456  -h 192.168.43.129 -p 6379 cluster info\n",
        "delta": "0:00:00.063657",
        "end": "2025-02-25 14:30:36.592194",
        "failed": false,
        "failed_when_result": false,
        "rc": 0,
        "start": "2025-02-25 14:30:36.528537",
        "stderr": "Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.",
        "stderr_lines": [
            "Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe."
        ],
        "stdout": "cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_slots_ok:16384\r\ncluster_slots_pfail:0\r\ncluster_slots_fail:0\r\ncluster_known_nodes:6\r\ncluster_size:3\r\ncluster_current_epoch:6\r\ncluster_my_epoch:1\r\ncluster_stats_messages_ping_sent:14\r\ncluster_stats_messages_pong_sent:15\r\ncluster_stats_messages_sent:29\r\ncluster_stats_messages_ping_received:10\r\ncluster_stats_messages_pong_received:14\r\ncluster_stats_messages_meet_received:5\r\ncluster_stats_messages_received:29",
        "stdout_lines": [
            "cluster_state:ok",
            "cluster_slots_assigned:16384",
            "cluster_slots_ok:16384",
            "cluster_slots_pfail:0",
            "cluster_slots_fail:0",
            "cluster_known_nodes:6",
            "cluster_size:3",
            "cluster_current_epoch:6",
            "cluster_my_epoch:1",
            "cluster_stats_messages_ping_sent:14",
            "cluster_stats_messages_pong_sent:15",
            "cluster_stats_messages_sent:29",
            "cluster_stats_messages_ping_received:10",
            "cluster_stats_messages_pong_received:14",
            "cluster_stats_messages_meet_received:5",
            "cluster_stats_messages_received:29"
        ]
    }
}
ok: [node3] => {
    "cluster_status": {
        "changed": false,
        "skip_reason": "Conditional result was False",
        "skipped": true
    }
}
ok: [node2] => {
    "cluster_status": {
        "changed": false,
        "skip_reason": "Conditional result was False",
        "skipped": true
    }
}
ok: [node5] => {
    "cluster_status": {
        "changed": false,
        "skip_reason": "Conditional result was False",
        "skipped": true
    }
}
ok: [node4] => {
    "cluster_status": {
        "changed": false,
        "skip_reason": "Conditional result was False",
        "skipped": true
    }
}
ok: [node6] => {
    "cluster_status": {
        "changed": false,
        "skip_reason": "Conditional result was False",
        "skipped": true
    }
}

TASK [redis_cluster : output msg] *********************************************************************************************************************************************
ok: [node1] => {
    "msg": "Cluster status check succeeded..."
}
skipping: [node3]
skipping: [node2]
skipping: [node5]
skipping: [node4]
skipping: [node6]

PLAY RECAP ********************************************************************************************************************************************************************
node1                      : ok=14   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node2                      : ok=11   changed=3    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
node3                      : ok=11   changed=4    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
node4                      : ok=11   changed=3    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
node5                      : ok=11   changed=4    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
node6                      : ok=11   changed=3    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0

Problems encountered

1. Permissions changed on copy, and the resulting mode was not what had been set

Incorrect example

- name: copy redis bin
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/usr/local/sbin/{{ item }}"
    mode: 551
    force: true
  loop:
    - redis-cli
    - redis-check-aof
    - redis-check-rdb
    - redis-server
    - redis-trib.rb

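Result

----r--rwt 1 root root  9548592 Feb 24 14:23 redis-check-aof
----r--rwt 1 root root  9548592 Feb 24 14:23 redis-check-rdb
----r--rwt 1 root root  5004456 Feb 24 14:23 redis-cli
----r--rwt 1 root root  9548592 Feb 24 14:23 redis-server
----r--rwt 1 root root     3600 Feb 24 14:23 redis-trib.rb

Analysis:

mode must be written in octal. An unquoted 551 is read as the decimal integer 551, which is 1047 in octal: the sticky bit, no permissions for the owner, r-- for the group, and rwx for others. That is exactly the ----r--rwt shown above. Writing 0551 gives the intended r-xr-x--x.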

- name: copy redis bin
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/usr/local/sbin/{{ item }}"
    mode: 0551
    force: true
  loop:
    - redis-cli
    - redis-check-aof
    - redis-check-rdb
    - redis-server
    - redis-trib.rb
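
Another way to sidestep the octal pitfall, offered as a sketch rather than the author's method, is to pass mode as a string. The copy module accepts both a quoted octal string and a symbolic mode; u=rx,g=rx,o=x below is my symbolic spelling of 0551, and the loop is shortened for brevity:

```yaml
- name: copy redis bin
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/usr/local/sbin/{{ item }}"
    # Quoted, so the YAML parser hands the string to the module unchanged
    mode: "0551"
    # Equivalent symbolic form: mode: "u=rx,g=rx,o=x"
    force: true
  loop:
    - redis-cli
    - redis-server
```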

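2. Creating the cluster on only one host rather than on every host in the inventory

Set a flag variable (create_node: true) on a single host in the inventory, then guard the cluster tasks with a when condition that tests it.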

      hosts:
        node1:
          ansible_host: 192.168.43.129
          index: 0
          create_node: true

- name: check cluster status
  ansible.builtin.shell: |
   redis-cli  -a {{ password }}  -h {{ ansible_host }} -p {{ listen_port }} cluster info |grep 'cluster_state:ok'
  register: cluster_status
  when: create_node is defined and create_node   # added condition
  changed_when: false

- name: output msg
  ansible.builtin.debug:
    msg: "Cluster status check succeeded..."
  when: create_node is defined and create_node and cluster_status.rc == 0   # added condition

3. Mistakenly using loop when starting Redis

[root@master-1 tasks]# cat start_redis.yml
- name: start redis
  ansible.builtin.systemd:
    name: redis-{{listen_port}}
    daemon_reload: yes
    state: started
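  loop: "{{ groups['redis_cluster'] }}"   # not needed

With loop, each host repeats the same start six times, once per member of the group, so the loop is unnecessary. Ansible already runs the task once on every host in the play, and listen_port resolves to that host's own port.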
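
To see the per-host resolution without starting anything, a throwaway debug task can print the unit name each host would use. This is a hypothetical debugging aid, not part of the original role:

```yaml
# Hypothetical debugging task; it only prints values and changes nothing.
- name: show which unit this host will start
  ansible.builtin.debug:
    msg: "{{ inventory_hostname }} ({{ ansible_host }}) -> redis-{{ listen_port }}"
```

Run against the inventory above, each of node1 to node6 reports a single unit name built from its own index.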

4. The failed_when condition on the cluster status check did not take effect

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- name: check cluster status
  ansible.builtin.shell: |
   redis-cli  -a {{ password }}  -h {{ ansible_host }} -p {{ listen_port }} cluster info
  register: cluster_status
  when: create_node is defined and create_node
  changed_when: false
  failed_when: cluster_status.rc != 0 and 'cluster_state:ok' not in cluster_status.stdout

- name: output resister
  ansible.builtin.debug:
     var: cluster_status

- name: output msg
  ansible.builtin.debug:
    msg: "Cluster status check succeeded..."
  when: create_node is defined and create_node and cluster_status.rc == 0


TASK [redis_cluster : output resister] ****************************************************************************************************************************************
ok: [node1] => {
    "cluster_status": {
        "changed": false,
        "cmd": "redis-cli  -a 123456  -h 192.168.43.129 -p 6379 cluster info\n",
        "delta": "0:00:00.035331",
        "end": "2025-02-25 13:29:46.126410",
        "failed": false,
        "failed_when_result": false,
        "rc": 0,
        "start": "2025-02-25 13:29:46.091079",
        "stderr": "Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.",
        "stderr_lines": [
            "Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe."
        ],
        "stdout": "cluster_state:fail\r\ncluster_slots_assigned:16384\r\ncluster_slots_ok:16384\r\ncluster_slots_pfail:0\r\ncluster_slots_fail:0\r\ncluster_known_nodes:6\r\ncluster_size:3\r\ncluster_current_epoch:6\r\ncluster_my_epoch:1\r\ncluster_stats_messages_ping_sent:8\r\ncluster_stats_messages_pong_sent:12\r\ncluster_stats_messages_sent:20\r\ncluster_stats_messages_ping_received:7\r\ncluster_stats_messages_pong_received:8\r\ncluster_stats_messages_meet_received:5\r\ncluster_stats_messages_received:20",
        "stdout_lines": [
            "cluster_state:fail",
            "cluster_slots_assigned:16384",
            "cluster_slots_ok:16384",
            "cluster_slots_pfail:0",
            "cluster_slots_fail:0",
            "cluster_known_nodes:6",
            "cluster_size:3",
            "cluster_current_epoch:6",
            "cluster_my_epoch:1",
            "cluster_stats_messages_ping_sent:8",
            "cluster_stats_messages_pong_sent:12",
            "cluster_stats_messages_sent:20",
            "cluster_stats_messages_ping_received:7",
            "cluster_stats_messages_pong_received:8",
            "cluster_stats_messages_meet_received:5",
            "cluster_stats_messages_received:20"
        ]
    }
}

Yet the output shows cluster_state:fail and the task still passed, so I wondered whether the failed_when condition had been evaluated at all.

Analysis:

  failed_when: cluster_status.rc != 0 and 'cluster_state:ok' not in cluster_status.stdout

In this run, cluster_status.rc != 0 is false and 'cluster_state:ok' not in cluster_status.stdout is true. An and expression is true only when both operands are true, so the whole condition is false, failed_when evaluates to false, and the task is not marked as failed.

Fix

Change and to or. As soon as either condition is true, the whole expression is true, failed_when is true, and the task fails.

failed_when: cluster_status.rc != 0 or 'cluster_state:ok' not in cluster_status.stdout
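
The difference can be reproduced in isolation. The playbook below is a small sketch that runs on localhost; rc and stdout are stand-in variables I set to the values from the failing run (return code 0, cluster_state:fail), not names used by the role:

```yaml
# Stand-alone playbook; rc and stdout are stand-in values copied from the
# failing run (rc 0, cluster_state:fail).
- hosts: localhost
  gather_facts: false
  vars:
    rc: 0
    stdout: "cluster_state:fail"
  tasks:
    - name: compare the two conditions
      ansible.builtin.debug:
        msg:
          with_and: "{{ rc != 0 and 'cluster_state:ok' not in stdout }}"
          with_or: "{{ rc != 0 or 'cluster_state:ok' not in stdout }}"
```

With these inputs the and form evaluates to false and the or form to true, which is why only the or version marks the task as failed.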

5. The cluster status check fails

Running the cluster status check:

TASK [redis_cluster : check cluster status] ***********************************************************************************************************************************
skipping: [node3]
skipping: [node2]
skipping: [node5]
skipping: [node4]
skipping: [node6]
fatal: [node1]: FAILED! => {"changed": false, "cmd": "redis-cli  -a 123456  -h 192.168.43.129 -p 6379 cluster info\n", "delta": "0:00:00.037506", "end": "2025-02-25 13:53:21.595669", "failed_when_result": true, "rc": 0, "start": "2025-02-25 13:53:21.558163", "stderr": "Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.", "stderr_lines": ["Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe."], "stdout": "cluster_state:fail\r\ncluster_slots_assigned:16384\r\ncluster_slots_ok:16384\r\ncluster_slots_pfail:0\r\ncluster_slots_fail:0\r\ncluster_known_nodes:6\r\ncluster_size:3\r\ncluster_current_epoch:6\r\ncluster_my_epoch:1\r\ncluster_stats_messages_ping_sent:8\r\ncluster_stats_messages_pong_sent:13\r\ncluster_stats_messages_sent:21\r\ncluster_stats_messages_ping_received:8\r\ncluster_stats_messages_pong_received:8\r\ncluster_stats_messages_meet_received:5\r\ncluster_stats_messages_received:21", "stdout_lines": ["cluster_state:fail", "cluster_slots_assigned:16384", "cluster_slots_ok:16384", "cluster_slots_pfail:0", "cluster_slots_fail:0", "cluster_known_nodes:6", "cluster_size:3", "cluster_current_epoch:6", "cluster_my_epoch:1", "cluster_stats_messages_ping_sent:8", "cluster_stats_messages_pong_sent:13", "cluster_stats_messages_sent:21", "cluster_stats_messages_ping_received:8", "cluster_stats_messages_pong_received:8", "cluster_stats_messages_meet_received:5", "cluster_stats_messages_received:21"]}

- name: check cluster status
  ansible.builtin.shell: |
   redis-cli  -a {{ password }}  -h {{ ansible_host }} -p {{ listen_port }} cluster info
  register: cluster_status
  when: create_node is defined and create_node
  changed_when: false
  failed_when: cluster_status.rc != 0 or 'cluster_state:ok' not in cluster_status.stdout

- name: output resister
  ansible.builtin.debug:
     var: cluster_status

- name: output msg
  ansible.builtin.debug:
    msg: "Cluster status check succeeded..."
  when: create_node is defined and create_node and cluster_status.rc == 0

Question: when I ran the command by hand, the state was cluster_state:ok, so I could not see why the same command returned cluster_state:fail when Ansible ran it.

[root@master-1 roles]# redis-cli  -a 123456  -h 192.168.43.129 -p 6379 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:33
cluster_stats_messages_pong_sent:37
cluster_stats_messages_sent:70
cluster_stats_messages_ping_received:32
cluster_stats_messages_pong_received:33
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:70

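Analysis:

Judging from the Redis logs, the nodes still have to connect to one another and finish initializing after the create command returns. A check issued immediately after the cluster is created can therefore fail.

Solution

Add a 5-second sleep so the cluster finishes initializing before the check runs.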

- name: create redis cluster
  ansible.builtin.shell: |
    echo "yes" |redis-cli -a {{ password }} --cluster create {{ cluster_node }} --cluster-replicas 1
    sleep 5
  when: create_node is defined and create_node
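
A fixed sleep is a guess about how long initialization takes. One alternative, sketched here and not part of the original role, is to let the check task poll with until, retries and delay. The values 10 and 3 are arbitrary assumptions:

```yaml
- name: check cluster status
  ansible.builtin.shell: |
    redis-cli -a {{ password }} -h {{ ansible_host }} -p {{ listen_port }} cluster info
  register: cluster_status
  when: create_node is defined and create_node
  changed_when: false
  # Re-run the command until the cluster reports ok; the task fails once
  # the retries are used up.
  until: "'cluster_state:ok' in cluster_status.stdout"
  retries: 10   # assumed value
  delay: 3      # seconds between attempts, assumed value
```

The task then returns as soon as the cluster is ready, and still fails if the state never becomes ok.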

 

posted @ 2025-02-25 10:32  不会跳舞的胖子