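2. keepalived + nginx + NFS
1、 Principle / Background
Failover between a pair of Keepalived high-availability nodes is implemented through VRRP. In normal operation the Master node continuously sends heartbeat messages (multicast to 224.0.0.18) to the Backup nodes to tell them it is still alive. When a fault occurs and the Backup no longer receives these messages, it runs its own takeover procedure.
VRRP: the Master is chosen by election. A virtual router consists of a VRID and a set of IP addresses, and appears to the outside world under a well-known MAC address, 00-00-5E-00-01-{VRID}.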
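The mapping from VRID to virtual MAC address can be checked by hand. A minimal sketch, assuming the IPv4 prefix 00-00-5E-00-01 from the VRRP specification; the function name `vrrp_vmac` is made up for illustration:

```shell
#!/bin/sh
# vrrp_vmac VRID: print the IPv4 VRRP virtual MAC for a VRID (1-255).
# The last octet is simply the VRID written as two hex digits.
vrrp_vmac() {
    vrid=$1
    if [ "$vrid" -lt 1 ] || [ "$vrid" -gt 255 ]; then
        echo "VRID must be between 1 and 255" >&2
        return 1
    fi
    printf '00-00-5E-00-01-%02X\n' "$vrid"
}

vrrp_vmac 55    # 55 is the virtual_router_id used in these notes
```

For VRID 55 the last octet is 37, because 55 decimal is 0x37.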
2、 Installation
2.1. Environment
lb01 (node01): 10.146.3.131 keepalived master server (nginx primary load balancer)
lb02 (node02): 10.146.3.132 keepalived backup server (nginx backup load balancer)
web01 (node03): 10.146.3.133 web01 server
web02 (node04): 10.146.3.134 web02 server
2.2. Installation
# tar zxvf /home/oldboy/keepalived-1.2.20.tar.gz -C /home/oldboy/tools/
# mv /home/oldboy/tools/keepalived-1.2.20 /application/
# cd /application/keepalived-1.2.20
# ./configure
# make && make install
# cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/
# mkdir -p /etc/keepalived
# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf
# cp /usr/local/sbin/keepalived /usr/sbin/
# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
Installation complete; start the service and test:
# /etc/init.d/keepalived start
# ps -ef | grep keepalived
# ip addr | grep 192.168 ## the default configuration brings up three 192.168.x.x addresses on the configured interface.
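2.3. Configuration file contents
2.3.1. keepalived.conf explained
Global definitions (global_defs):
! Configuration File for keepalived
global_defs {
    notification_email { ## email addresses that receive failure alerts
        acassen@firewall.loc ## recipient address
        failover@firewall.loc
        sysadmin@firewall.loc
    }
    notification_email_from Alexandre.Cassen@firewall.loc ## sender address, optional
    smtp_server 192.168.200.1 ## SMTP server used to send the mail
    smtp_connect_timeout 30 ## SMTP connection timeout, optional
    router_id LVS_DEVEL ## router identifier
    vrrp_skip_check_adv_addr ## checking every address in a received VRRP advert is time-consuming; with this flag set, the check is skipped when the advert comes from the same router as the previous one. Default: do not skip
    vrrp_strict ## enforce strict VRRP protocol compliance; unicast peers are not supported in this mode
}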
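2.3.2. VRRP instance definition block (vrrp_instance):
vrrp_instance VI_1 {
    state MASTER ## MASTER or BACKUP
    interface eth2
    virtual_router_id 55 ## instance ID
    priority 150 ## the higher value wins
    advert_int 1 ## advertisement (heartbeat) interval: 1 second
    authentication {
        auth_type PASS ## the type recommended by the official docs; sent in plain text
        auth_pass 1111
    }
    virtual_ipaddress {
        10.146.3.135 dev eth2 label eth2:1 ## recommended: bind the VIP to a specific NIC this way
    }
}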
3、 Keepalived HA: single instance, single master mode
3.1. Configure both the master and the backup server
# vim /etc/keepalived/keepalived.conf
global_defs {
    notification_email {
        acassen@firewall.loc
        failover@firewall.loc
        sysadmin@firewall.loc
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 192.168.200.1
    smtp_connect_timeout 30
    router_id lb01 ## change to lb02 on the backup
    vrrp_skip_check_adv_addr
    vrrp_strict
}
vrrp_instance VI_1 {
    state MASTER ## change to BACKUP on the backup
    interface eth2
    virtual_router_id 55 ## must be identical on master and backup
    priority 150
    advert_int 1
    track_interface { ## extra monitoring: if any NIC listed here fails, the instance enters the FAULT state
        eth1
        eth2
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.146.3.135 dev eth2 label eth2:1
    }
}
3.2. Start and stop the keepalived service on each node in turn to verify that the VIP fails over
# /etc/init.d/keepalived start
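To verify the failover, check on each node whether the VIP is currently bound. A minimal sketch; the helper name `has_vip` is hypothetical, and it reads `ip addr` output from standard input so it can be tried on sample text first:

```shell
#!/bin/sh
# has_vip VIP: succeed if the `ip addr` output on stdin contains VIP.
# Matching "inet <VIP>/" avoids partial matches such as 10.146.3.13 vs 10.146.3.135.
has_vip() {
    grep -qF "inet $1/"
}

# Sample line in the format printed by `ip addr` on the node holding the VIP.
sample='    inet 10.146.3.135/32 scope global eth2:1'
if printf '%s\n' "$sample" | has_vip 10.146.3.135; then
    echo "VIP present"
else
    echo "VIP absent"
fi
```

On a real node the call would be `ip addr | has_vip 10.146.3.135`. After stopping keepalived on lb01 the VIP should show up on lb02; after starting it again the VIP should return to lb01, provided lb01 has the higher priority.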
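4、 Keepalived HA: dual instance, dual master mode (VIP failover)
4.1. Configure both the master and the backup server, adding the following
# vim /etc/keepalived/keepalived.conf
vrrp_instance VI_2 { ## the name must differ from the first instance
    state BACKUP ## in instance 2, lb01 is the BACKUP node
    interface eth2
    virtual_router_id 56 ## must not repeat within the same .conf file
    priority 100 ## lower priority, since this node is the backup
    advert_int 1
    nopreempt ## do not preempt; only valid on a node whose state is BACKUP
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.146.3.138/24 dev eth2 label eth2:2
    }
}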
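Because each virtual_router_id must be unique within one configuration file, a quick duplicate check can catch copy-and-paste mistakes when a second instance is added. A minimal sketch; the function name `dup_vrids` is hypothetical, and it only looks at lines whose first word is `virtual_router_id`:

```shell
#!/bin/sh
# dup_vrids FILE: print every virtual_router_id value that occurs more than once.
dup_vrids() {
    awk '$1 == "virtual_router_id" { print $2 }' "$1" | sort -n | uniq -d
}

# Demonstration on a throwaway file with a deliberate duplicate.
conf=$(mktemp)
cat > "$conf" <<'EOF'
vrrp_instance VI_1 {
    virtual_router_id 55
}
vrrp_instance VI_2 {
    virtual_router_id 55
}
EOF
dup_vrids "$conf"    # prints 55
rm -f "$conf"
```

Against the real file the call would be `dup_vrids /etc/keepalived/keepalived.conf`; empty output means no ID is repeated.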
5、 Nginx+keepalived
5.1. First configure Nginx load balancing on lb01 and lb02
lb01#/application/nginx/conf/nginx.conf
worker_processes 1;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    sendfile on;
    keepalive_timeout 65;
    upstream www_server_pools {
        server 10.146.3.133:80 weight=1;
        server 10.146.3.134:80 weight=1;
        check interval=300 rise=2 fall=5 timeout=1000 type=http;
    }
    server {
        listen 80;
        server_name www.etiantian.org;
        access_log logs/access_www.log main;
        location / {
            proxy_pass http://www_server_pools;
            include proxy.conf;
        }
        location /status {
            check_status;
            access_log off;
        }
    }
}
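5.2. For keepalived, reuse the single-instance, single-master configuration set up earlier
6、 Sending keepalived logs to a separate file
By default the logs are written to /var/log/messages.
# vim /etc/sysconfig/keepalived
KEEPALIVED_OPTIONS="-D -d -S 0" ## -S 0 selects the local0 syslog facility
# vim /etc/rsyslog.conf
## add local0.none to the existing /var/log/messages rule:
*.info;mail.none;authpriv.none;cron.none;local0.none /var/log/messages
local0.* /var/log/keepalived.log
7、 Split-brain detection script
Run the script on the backup node: if the master's real IP answers ping and the backup node also holds the VIP, the state is judged to be split brain.
# vim script/check_split_brain.sh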
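#!/bin/sh
lb01_vip=10.146.3.135   # VIP normally held by lb01 (instance VI_1)
lb01_ip=10.146.3.131    # real IP of lb01
while true
do
    # the master answers ping AND this backup node also holds the VIP: split brain
    if ping -c 2 -W 3 "$lb01_ip" >/dev/null 2>&1 &&
       [ "$(ip addr | grep -cF "inet $lb01_vip/")" -ge 1 ]
    then
        echo "ha is split brain.warning."
    else
        echo "ha is ok"
    fi
    sleep 5
done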
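8、 Other configuration parameters
8.1. notify_master, notify_backup, notify_fault, notify, smtp_alert
notify_master /path/to/to_master.sh: script to run when the node transitions to the MASTER state
notify_backup /path_to/to_backup.sh: script to run when the node transitions to the BACKUP state
notify_fault "/path/fault.sh VG_1": script to run when the node enters the FAULT state; quote the command when it takes arguments
notify /path/to/notify.sh: script to run on any state transition
smtp_alert: send an email notification to the addresses defined in global_defs on every state transition
Parameter placement and usage example: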
vrrp_sync_group VG_1 { ## a separate block of its own
    group {
        http ## must match a vrrp_instance name
        mysql
    }
    notify_master /path/to/to_master.sh
    notify_backup /path_to/to_backup.sh
    notify_fault "/path/fault.sh VG_1"
    notify /path/to/notify.sh
    smtp_alert
}
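What a generic notify script might look like, as a minimal sketch. It assumes keepalived passes the arguments described in the keepalived.conf man page ($1 = GROUP or INSTANCE, $2 = the group or instance name, $3 = the new state); the function name `describe_transition` and the log path are hypothetical:

```shell
#!/bin/sh
# Sketch of /path/to/notify.sh.
# Assumed arguments (per the keepalived.conf man page):
#   $1 = "GROUP" or "INSTANCE", $2 = group/instance name, $3 = new state.
describe_transition() {
    case "$3" in
        MASTER|BACKUP|FAULT)
            echo "$1 $2 changed state to $3" ;;
        *)
            echo "$1 $2 reported unexpected state: $3" ;;
    esac
}

# Hypothetical log location; override with NOTIFY_LOG if needed.
LOG=${NOTIFY_LOG:-/tmp/keepalived-notify.log}

# Only write a log entry when called with the expected arguments.
if [ "$#" -ge 3 ]; then
    echo "$(date '+%F %T') $(describe_transition "$1" "$2" "$3")" >> "$LOG"
fi
```

Called by hand as `sh notify.sh GROUP VG_1 MASTER`, it appends one timestamped line to the log file.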
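8.2. nopreempt, preempt_delay
Configured inside a vrrp_instance block.
nopreempt ## do not preempt; can only be set on a node whose state is BACKUP
preempt_delay 300 ## delay preemption by 300 seconds
8.3. track_interface
Configured inside a vrrp_instance block.
track_interface: sets up extra monitoring; if any NIC listed in it fails, the instance enters the FAULT state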
9、 FAQ
9.1. A service listens on an IP address that no local NIC holds
If nginx.conf binds to a specific address with "listen 10.146.3.138:80" and no local NIC currently holds 10.146.3.138, nginx fails with "Cannot assign requested address".
Answer: # vim /etc/sysctl.conf
net.ipv4.ip_nonlocal_bind = 1
Or: # echo 'net.ipv4.ip_nonlocal_bind = 1' >> /etc/sysctl.conf
# sysctl -p
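Appending with `>>` adds the line again every time the command is repeated. A minimal sketch of an append that skips the write when the exact line already exists; the helper name `ensure_line` is hypothetical, and it only recognises a line that matches character for character:

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line already exists.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a temporary file instead of the real /etc/sysctl.conf.
tmp=$(mktemp)
ensure_line "$tmp" 'net.ipv4.ip_nonlocal_bind = 1'
ensure_line "$tmp" 'net.ipv4.ip_nonlocal_bind = 1'   # second call changes nothing
cat "$tmp"    # the setting appears once
rm -f "$tmp"
```

On a real node, as root, the call would be `ensure_line /etc/sysctl.conf 'net.ipv4.ip_nonlocal_bind = 1'` followed by `sysctl -p`.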
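9.2. Keepalived only takes over when the peer host is down or its keepalived is stopped
Use a script to check the health of the actual service.
Answer: Method 1: write a daemon-style script that stops keepalived as soon as nginx has a problem.
# vim check_nginx.sh
#!/bin/bash
while true
do
    # stop keepalived when nginx no longer has any listening socket
    if [ "$(netstat -lntup | grep -c nginx)" -eq 0 ]; then
        /etc/init.d/keepalived stop
    fi
    sleep 5
done
Method 2: let keepalived itself run a monitoring script through its configuration.
# vim chk_nginx_proxy.sh
#!/bin/bash
# stop keepalived when nginx no longer has any listening socket
if [ "$(netstat -lntup | grep -c nginx)" -eq 0 ]; then
    /etc/init.d/keepalived stop
fi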
Add a separate block to keepalived.conf:
# vim /etc/keepalived/keepalived.conf
vrrp_script chk_nginx_proxy { ## define the VRRP script; checks the HTTP port
    script "/scripts/chk_nginx_proxy.sh" ## script to run
    interval 2 ## run the check every 2 seconds
    weight 2
}
vrrp_instance VI_1 { ## configured inside the VRRP instance
    track_script {
        chk_nginx_proxy ## run the check
    }
}
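An alternative to stopping keepalived from inside the script is to let the script report health only through its exit status, which is what vrrp_script evaluates (0 means healthy, non-zero means failed). A minimal sketch, not the script used above; the helper name `nginx_listening` is hypothetical, and it reads `netstat -lntp` style output from stdin so the logic can be tried on sample text:

```shell
#!/bin/sh
# nginx_listening: succeed if stdin (netstat -lntp output) contains
# a LISTEN socket whose PID/Program column mentions nginx.
nginx_listening() {
    awk '$6 == "LISTEN" && $7 ~ /\/nginx/ { found = 1 } END { exit !found }'
}

# When used as the vrrp_script, feed it live data instead:
#   netstat -lntp | nginx_listening
# The demonstration below uses a captured sample line.
sample='tcp        0      0 0.0.0.0:80      0.0.0.0:*       LISTEN      1234/nginx'
if printf '%s\n' "$sample" | nginx_listening; then
    echo "nginx is listening"
else
    echo "nginx is down"
fi
```

With this style, a failed check makes keepalived adjust the instance priority according to `weight` (or, with a weight of 0, put the instance into the FAULT state) instead of the whole service being stopped.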
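9.3. Conflicts between several keepalived groups on the same LAN
These are caused by a multicast address conflict: every group uses 224.0.0.18 by default. Specify a different multicast address by hand:
global_defs {
    router_id lb01
    vrrp_mcast_group4 224.0.0.19 ## use a different multicast address
}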
*****************************************************************
The NFS keepalived configuration below was copied from the Internet and is kept here for the record.
*****************************************************************
$ cat << EOF | tee /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id NFS-Master
}
vrrp_script chk_nfs {
    script "/etc/keepalived/nfs_check.sh" # monitoring script
    interval 2 # run every 2 seconds
    weight -20 # set to 20 because two keepalived nodes are deployed; use 30 for three nodes
}
vrrp_instance VI_1 {
    state BACKUP # both hosts are set to BACKUP for non-preemptive mode
    interface ens32 # NIC name; adjust to your environment
    virtual_router_id 127 # the id must be unique within the same subnet
    priority 100
    advert_int 1
    nopreempt # required for non-preemptive mode
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    track_script {
        chk_nfs
    }
    notify_stop /etc/keepalived/notify_stop.sh # called when keepalived stops
    virtual_ipaddress {
        192.168.106.127/24 # the shared virtual IP
    }
}
EOF
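# Configure the monitoring script (the quoted 'EOF' keeps the shell from expanding $variables and command substitutions while the file is written)
$ cat << 'EOF' | tee /etc/keepalived/nfs_check.sh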
#!/bin/bash
# If the log file is larger than 5 MB, keep only its last 50 lines
[ -f /tmp/nfs-chk.log ] && [ "$(du -m /tmp/nfs-chk.log | awk '{print $1}')" -gt 5 ] && tail -50 /tmp/nfs-chk.log >/tmp/nfs-tmp && mv -f /tmp/nfs-tmp /tmp/nfs-chk.log
vip=$(ip a | grep -c 106.127)
if [ "$vip" -eq 1 ]; then # checks on the node that holds the VIP (master)
    service nfs status &>/dev/null # check NFS availability
    if [ $? -ne 0 ]; then # service unhealthy: try restarting it first
        time=$(date "+%F %H:%M:%S")
        echo -e "$time ------NFS service on the master failed, restarting it------\n" >>/tmp/nfs-chk.log
        service nfs start &>/dev/null
    fi
    nfsStatus=$(ps -C nfsd --no-headers | wc -l)
    if [ "$nfsStatus" -eq 0 ]; then # still unhealthy after the restart attempt
        time=$(date "+%F %H:%M:%S")
        echo -e "$time ------NFS failed and the restart did not help, switching to the backup server------\n" >>/tmp/nfs-chk.log
        service nfs stop &>>/tmp/nfs-chk.log # stop the NFS service
        umount /dev/drbd1 &>>/tmp/nfs-chk.log # unmount the DRBD device
        drbdadm secondary r1 &>>/tmp/nfs-chk.log # demote DRBD from primary to secondary
        service keepalived stop &>>/tmp/nfs-chk.log # stop keepalived
        time=$(date "+%F %H:%M:%S")
        echo -e "$time ------switchover finished------\n" >>/tmp/nfs-chk.log
        sleep 2
        service keepalived start &>>/tmp/nfs-chk.log # start keepalived again
    else
        # is DRBD primary on this node, and is the device mounted?
        drbdadm role r1 | grep Primary &>/dev/null
        if [ $? -ne 0 ]; then # DRBD is not Primary yet
            time=$(date "+%F %H:%M:%S")
            echo -e "$time ------promoting this node to DRBD primary and mounting /nfs------\n" >>/tmp/nfs-chk.log
            drbdadm primary --force r1 &>>/tmp/nfs-chk.log # promote DRBD to primary
            mount /dev/drbd1 /nfs &>>/tmp/nfs-chk.log # mount the DRBD device
        fi
    fi
else # checks on the keepalived backup node
    service nfs status &>/dev/null
    if [ $? -eq 0 ]; then # the NFS service must be stopped on the backup
        time=$(date "+%F %H:%M:%S")
        echo -e "$time ------stopping the NFS service on the backup------\n" >>/tmp/nfs-chk.log
        service nfs stop &>>/tmp/nfs-chk.log
    fi
    drbdadm role r1 | grep Primary &>/dev/null
    if [ $? -eq 0 ]; then # DRBD must be secondary and the device unmounted on the backup
        time=$(date "+%F %H:%M:%S")
        echo -e "$time ------demoting the backup to secondary and unmounting its DRBD device------\n" >>/tmp/nfs-chk.log
        drbdadm secondary r1 &>>/tmp/nfs-chk.log
        umount /dev/drbd1 &>>/tmp/nfs-chk.log
    fi
fi
EOF
# Configure the keepalived stop script (again with a quoted 'EOF' so nothing is expanded while the file is written)
$ cat << 'EOF' | tee /etc/keepalived/notify_stop.sh
#!/bin/bash
time=$(date "+%F %H:%M:%S")
echo -e "$time ------starting the switch to the backup server------\n" >>/tmp/nfs-chk.log
service nfs stop &>>/tmp/nfs-chk.log # stop the NFS service
umount /dev/drbd1 &>>/tmp/nfs-chk.log # unmount the DRBD device
drbdadm secondary r1 &>>/tmp/nfs-chk.log # demote DRBD from primary to secondary
time=$(date "+%F %H:%M:%S")
echo -e "$time ------switchover finished------\n" >>/tmp/nfs-chk.log
sleep 2
service keepalived start &>>/tmp/nfs-chk.log # start keepalived again
EOF
$ chmod +x /etc/keepalived/*.sh
$ systemctl start keepalived.service && systemctl enable keepalived.service
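# On the backup machine, change the following in /etc/keepalived/keepalived.conf
router_id NFS-Slave
priority 80 # the backup's priority must be lower than the master's
# then restart the keepalived service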
$ service keepalived restart
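The effect of `weight -20` on the priorities used above (100 on the master, 80 on the backup) can be worked out by hand. A minimal sketch, assuming the semantics described in the keepalived.conf man page: a positive weight is added while the tracked script succeeds, a negative weight is subtracted while it fails. The function name `effective_priority` is hypothetical:

```shell
#!/bin/sh
# effective_priority BASE WEIGHT SCRIPT_OK
#   SCRIPT_OK is 1 when the tracked script succeeds, 0 when it fails.
# Assumed semantics (keepalived.conf man page):
#   weight > 0: added to the priority while the script succeeds
#   weight < 0: subtracted (as |weight|) from the priority while the script fails
effective_priority() {
    base=$1 weight=$2 ok=$3
    if [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
        echo $((base + weight))
    elif [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

effective_priority 100 -20 1   # master, check passing
effective_priority 100 -20 0   # master, check failing
effective_priority 80 -20 1    # backup, check passing
```

With these values a failing master drops to 80, which only ties with the healthy backup rather than falling below it. As far as the scripts above show, the switch is driven by nfs_check.sh stopping keepalived on the failing node, not by the weight.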
