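MySQL high-availability cluster: heartbeat + DRBD

heartbeat + DRBD + MySQL is one of the earlier MySQL high-availability techniques.

Source: http://www.drbd.org

How DRBD works: DRBD replicates operations on disk blocks and can be thought of as network RAID 1. It does not copy the disk contents as such, only the operations applied to them. The principle is shown in the figure below.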

Architecture

Server list

VIP            Host IP        MySQL port   DRBD device
192.168.1.82   192.168.1.1    3306         /dev/drbd0
192.168.1.82   192.168.1.2    3306         /dev/drbd0

 

Architecture diagram

 

Installation and configuration:

Configuring DRBD

1. Check host name resolution:

sudo vi /etc/hosts

192.168.1.1   mysql-1
192.168.1.2   mysql-2

2. Check the kernel

$ uname -a
Linux mysql-2-2 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

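The Linux kernel version determines which DRBD versions can be used.

You can install from the tarball yourself (download: http://oss.linbit.com/drbd/), and some yum repositories also carry DRBD. This post uses CentOS 5.8, where DRBD can be installed directly with yum.

3. Install DRBD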

$ sudo yum list | grep drbd
drbd.x86_64                           8.0.16-5.el5.centos        extras         
drbd82.x86_64                         8.2.6-1.el5.centos         extras         
drbd83.x86_64                         8.3.15-2.el5.centos        extras         
drbdlinks.noarch                      1.26-1.el5                 epel           
kmod-drbd.x86_64                      8.0.16-5.el5_3             extras         
kmod-drbd-xen.x86_64                  8.0.16-5.el5_3             extras         
kmod-drbd82.x86_64                    8.2.6-2                    extras         
kmod-drbd82-xen.x86_64                8.2.6-2                    extras         
kmod-drbd83.x86_64                    8.3.15-3.el5.centos        extras         
kmod-drbd83-xen.x86_64                8.3.15-3.el5.centos        extras

Here we choose drbd83.x86_64:

sudo yum install -y drbd83.x86_64

4. Configure DRBD

With the RPM install, the configuration file is /etc/drbd.conf.

With the tarball install, it is ./etc/drbd.conf under the installation directory.

global {
    # minor-count 64;
    # dialog-refresh 5; # 5 seconds
    # disable-ip-verification;
    usage-count no;
}

common {
  protocol C;

  disk {
    on-io-error   detach;
    #size 3982G;
    no-disk-flushes;
    no-md-flushes;
  }

  net {
    sndbuf-size 512k;
    # timeout       60;    #  6 seconds  (unit = 0.1 seconds)
    # connect-int   10;    # 10 seconds  (unit = 1 second)
    # ping-int      10;    # 10 seconds  (unit = 1 second)
    # ping-timeout   5;    # 500 ms (unit = 0.1 seconds)
    max-buffers     8000;
    unplug-watermark   1024;
    max-epoch-size  8000;
    # ko-count 4;
    # allow-two-primaries;
    cram-hmac-alg "sha1";
    shared-secret "hdhwXes23sYEhart8t";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
    # data-integrity-alg "md5";
    # no-tcp-cork;
  }

  syncer {
    rate 120M;
    al-extents 517;
  }
}

resource data {
  on mysql-2-1 {
    device     /dev/drbd0;
    disk       /dev/sdb1;
    address    192.168.1.1:7788;
    meta-disk internal;
  }

  on mysql-2-2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.2:7788;
    meta-disk internal;
  }
}
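DRBD picks the `on` section whose name equals the local host name as reported by `uname -n`, so the names in the resource block have to match on each node. The following is a minimal sketch of a sanity check, not part of the original procedure; the function names `drbd_on_hosts` and `host_is_declared` are made up for illustration.

```shell
#!/bin/sh
# List the host names declared in "on <host> {" sections of a drbd.conf.
# Reads the configuration text from standard input.
# Lines such as "on-io-error detach;" are skipped because "on" must be
# followed by whitespace.
drbd_on_hosts() {
    sed -n 's/^[[:space:]]*on[[:space:]]\{1,\}\([^[:space:]{]\{1,\}\).*/\1/p'
}

# Succeed if the given host name has an "on" section.
# $1 = host name to look for, $2 = path to drbd.conf
host_is_declared() {
    drbd_on_hosts < "$2" | grep -qxF -- "$1"
}
```

On each node it could be used as `host_is_declared "$(uname -n)" /etc/drbd.conf || echo "no 'on' section for $(uname -n)"`.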

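5. Prepare the backing disk for replication

sudo /sbin/fdisk -l
sudo /sbin/mkfs.ext3 /dev/sdb1

sudo dd if=/dev/zero of=/dev/sdb1 bs=1M count=1; sync

The dd command zeroes the first 1 MB of the partition, which wipes the file system signature just created; drbdadm create-md can otherwise refuse to write its metadata onto a device that already holds a file system.

6. Start DRBD

Likewise, with the RPM install the init script is in /etc/init.d/; with a source install it is in etc/init.d/ under the installation directory.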

sudo /etc/init.d/drbd start

Error 1:

Starting DRBD resources: Can not load the drbd module.

The cause is a missing kernel module. Install it with:

sudo yum install -y kmod-drbd83

Error 2:

Starting again fails with:
Starting DRBD resources: [ 
data
no suitable meta data found :(
Command '/sbin/drbdmeta 0 v08 /dev/sdb1 internal check-resize' terminated with exit code 255
drbdadm check-resize data: exited with code 255
d(data) 0: Failure: (119) No valid meta-data signature found.

    ==> Use 'drbdadm create-md res' to initialize meta-data area. <==


[data] cmd /sbin/drbdsetup 0 disk /dev/sdb1 /dev/sdb1 internal --set-defaults --create-device --no-md-flushes --no-disk-flushes --on-io-error=detach  failed - continuing!
 
s(data) n(data) ]..........

Fix:

sudo /sbin/drbdadm create-md data

Here data is the name of the resource defined in drbd.conf.

7. Check the service

Check the ports:

netstat -ant
...
tcp        0      0 192.168.1.1:7788          192.168.1.2:36040         ESTABLISHED 
tcp        0      0 192.168.1.1:38371         192.168.1.2:7788          ESTABLISHED
...
tcp        0      0 192.168.1.2:36040         192.168.1.1:7788          ESTABLISHED 
tcp        0      0 192.168.1.2:7788          192.168.1.1:38371         ESTABLISHED 

Check the status:

sudo cat /proc/drbd 
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder10.centos.org, 2013-03-27 16:01:26
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:3888655588

version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder10.centos.org, 2013-03-27 16:01:26
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:3888655588 

/proc/drbd sometimes also shows an operation in progress, for example after the disk has been formatted.
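The three fields worth watching in that output are cs (connection state), ro (local/peer role) and ds (local/peer disk state). As a rough sketch, assuming the DRBD 8.3 /proc/drbd layout shown above, they can be pulled out with awk; `drbd_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "cs ro ds" for one DRBD minor.
# Reads /proc/drbd-style text on standard input.
# $1 = minor number (0 for /dev/drbd0)
drbd_state() {
    awk -v minor="$1" '
        $1 == (minor ":") {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ro:/) ro = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            print cs, ro, ds
        }'
}
```

Run as `drbd_state 0 < /proc/drbd`. For the status above it would report Connected, Secondary/Secondary and Inconsistent/Inconsistent: the nodes see each other, but no primary has been chosen and no data has been synchronized yet.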

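8. Designate the primary (run on one node only)

sudo /sbin/drbdadm -- --overwrite-data-of-peer primary all

or, using drbdsetup directly on the device:

sudo /sbin/drbdsetup /dev/drbd0 primary -o

After designating the primary, look at /proc/drbd again.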

sudo cat /proc/drbd 
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder10.centos.org, 2013-03-27 16:01:26
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:69312336 nr:0 dw:61154476 dr:8190020 al:53128 bm:501 lo:1 pe:21 ua:254 ap:0 ep:1 wo:b oos:3819537276
    [>....................] sync'ed:  1.8% (3730016/3797512)M
    finish: 9:21:51 speed: 113,284 (99,164) K/sec
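The finish estimate can be cross-checked by hand. Assuming oos is reported in KiB and the speed in KiB per second, the remaining time is roughly oos divided by the current speed. A small sketch, with `sync_eta` as an illustrative name:

```shell
#!/bin/sh
# Estimate the remaining resync time as oos / speed.
# $1 = oos in KiB, $2 = current speed in KiB/s; prints h:mm:ss
sync_eta() {
    secs=$(( $1 / $2 ))
    printf '%d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# Values taken from the /proc/drbd output above.
sync_eta 3819537276 113284   # prints 9:21:56
```

The result is within a few seconds of the 9:21:51 that DRBD prints, which suggests the estimate is derived in much the same way.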

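9. Create the file system and mount it (on the primary)

sudo /sbin/mkfs.ext3 /dev/drbd0
sudo mount /dev/drbd0 /data

10. Switchover test

Before the switchover, mysql-2-1 is the primary and mysql-2-2 is the secondary.

(1) On mysql-2-1, create the file 1.txt in /data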

-rw-r--r-- 1 root root     0 Jun 24 09:38 1.txt

(2) Unmount /data: sudo umount /data

(3) Demote mysql-2-1:

$ sudo /sbin/drbdadm secondary data
$ sudo cat /proc/drbd 
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder10.centos.org, 2013-03-27 16:01:26
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1741363424 nr:0 dw:61154560 dr:1680209989 al:53130 bm:237344 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

(4) Promote mysql-2-2:

$ sudo /sbin/drbdadm primary data
$ sudo cat /proc/drbd 
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder10.centos.org, 2013-03-27 16:01:26
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1741363424 dw:1741363424 dr:0 al:0 bm:237341 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

(5) Mount the disk on mysql-2-2: sudo mount /dev/drbd0 /data

(6) Confirm that 1.txt is there

-rw-r--r-- 1 root root     0 Jun 24 09:38 1.txt

(7) Edit 1.txt, enter 123, and save

(8) Following the same steps, switch the primary back to mysql-2-1 and view 1.txt

[leiche@mysql-2-2 data]$ sudo cat 1.txt 
123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123

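(9) Switch the primary to mysql-2-2 once more; the content is the same.

Question: I typed 123, so why were so many copies of 123 stored?

Either way, the switchover succeeded. This completes the DRBD installation.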
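The manual switchover boils down to two ordered pairs of commands: unmount before demoting on the old primary, and promote before mounting on the new one. The sketch below wraps them in two helper functions. The names `demote` and `promote` and the `RUN` variable are illustrative only; with the default `RUN=echo` the commands are printed rather than executed.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN=sudo to execute them.
RUN=${RUN:-echo}

# On the current primary: release the device, then give up the primary role.
# $1 = DRBD resource, $2 = mount point
demote() {
    $RUN umount "$2" && $RUN /sbin/drbdadm secondary "$1"
}

# On the node taking over: become primary, then mount the device.
# $1 = DRBD resource, $2 = mount point, $3 = DRBD device
promote() {
    $RUN /sbin/drbdadm primary "$1" && $RUN mount "$3" "$2"
}
```

For the setup in this post that would be `demote data /data` on the old primary followed by `promote data /data /dev/drbd0` on the other node. The order matters because the device cannot be demoted while it is still mounted, and it can only be mounted on the node that holds the primary role.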

 

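Configuring heartbeat

1. Download

heartbeat can also be installed from source; the download addresses are below. It can be installed from a yum repository as well. heartbeat has little dependence on the Linux kernel version.

Download libnet
http://sourceforge.jp/projects/sfnet_libnet-dev/releases/

Download heartbeat
http://www.ultramonkey.org/download/heartbeat/2.1.3/heartbeat-2.1.3.tar.gz

2. Install heartbeat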

$ sudo yum list | grep heartbeat
heartbeat.i386                        2.1.4-11.el5               epel           
heartbeat.x86_64                      2.1.4-11.el5               epel           
heartbeat-devel.i386                  2.1.4-11.el5               epel           
heartbeat-devel.x86_64                2.1.4-11.el5               epel           
heartbeat-gui.x86_64                  2.1.4-11.el5               epel           
heartbeat-ldirectord.x86_64           2.1.4-11.el5               epel           
heartbeat-pils.i386                   2.1.4-11.el5               epel           
heartbeat-pils.x86_64                 2.1.4-11.el5               epel           
heartbeat-stonith.i386                2.1.4-11.el5               epel           
heartbeat-stonith.x86_64              2.1.4-11.el5               epel

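Install:

sudo yum install -y heartbeat heartbeat-ldirectord heartbeat-pils heartbeat-stonith

3. Configure heartbeat

(1) Configure authkeys

$ sudo cat /etc/ha.d/authkeys
# This file configures how heartbeat packets are authenticated; three methods are supported: crc, md5 and sha1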
auth 2
#1 crc
2 sha1 47e9336850f1db6fa58bc470bc9b7810eb397f04
#3 md5 Hellomysql

sudo chmod 600 /etc/ha.d/authkeys
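The key after sha1 is just a shared secret, so any sufficiently random string will do. One common way to produce one is to hash some random bytes. A minimal sketch, assuming sha1sum is available; `make_authkeys` is a hypothetical helper, and it uses key index 1 where the file above uses 2.

```shell
#!/bin/sh
# Print an authkeys file body that uses sha1 with a freshly generated key.
make_authkeys() {
    # Hash 512 random bytes and keep only the 40-character hex digest.
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}')
    printf 'auth 1\n1 sha1 %s\n' "$key"
}
```

The output could be written with `make_authkeys | sudo tee /etc/ha.d/authkeys >/dev/null`. Both nodes need the same file, so generate it once and copy it to the other node.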

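(2) Configure ha.cf

[leiche@mysql-2-1 3306]$ sudo cat /etc/ha.d/ha.cf
# Logging
debugfile /var/log/ha-debug
logfile /var/log/ha-log
# logfacility local0
# Heartbeat settings
# Send a heartbeat every 2 seconds
keepalive 2
# Consider the peer dead after 60 seconds without a heartbeat
deadtime 60
# Log a warning after 10 seconds without a heartbeat
warntime 10
# Extra time allowed for a node that is rebooting
initdead 180
# Three transports are available for the heartbeat link: ucast, mcast and bcast. 694 is the default port and 1 is the TTL
mcast eth1 225.0.0.37 694 1 0
# Do not fail back automatically
auto_failback off
node mysql-2-1
node mysql-2-2

# Disable the CRM
crm no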

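(3) Configure haresources

  Format: [node-name] IPaddr drbddisk Filesystem service-script

  [node-name] must match a node value in ha.cf;

  IPaddr: handled by /etc/ha.d/resource.d/IPaddr, with arguments separated by ::. Here the IP is 192.168.1.82, the netmask is 255.255.255.0 and the base interface is eth0

  drbddisk: handled by /etc/ha.d/resource.d/drbddisk, with arguments separated by ::. It promotes the DRBD resource to primary on this node so that the shared disk can be mounted; data is the resource data defined in the DRBD configuration file

  Filesystem: handled by /etc/ha.d/resource.d/Filesystem, with arguments separated by ::. It mounts the shared disk, equivalent to mount -t ext3 /dev/drbd0 /data

  Service script: looked up in /etc/init.d/ by default; the script must accept the start|stop arguments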

 

 

# primary, node mysql-2-1
mysql-2-1 IPaddr::192.168.1.82/24/eth0 drbddisk::data Filesystem::/dev/drbd0::/data::ext3 mysql3306
# secondary, node mysql-2-2
mysql-2-2 IPaddr::192.168.1.82/24/eth0 drbddisk::data Filesystem::/dev/drbd0::/data::ext3 mysql3306
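To make the line easier to read: in haresources mode heartbeat calls each resource script with the ::-separated fields as arguments plus start or stop. Resources are started left to right on takeover and stopped right to left on release. The sketch below only prints that expansion. The function names are invented for illustration, and eth0 is assumed to be the interface carrying the virtual IP.

```shell
#!/bin/sh
# Turn one haresources entry such as "Filesystem::/dev/drbd0::/data::ext3"
# into "<script> <args...> <action>".
# $1 = resource entry, $2 = action (start or stop)
resource_cmd() {
    printf '%s %s\n' "$(printf '%s' "$1" | sed 's/::/ /g')" "$2"
}

# Takeover: resources are started in the order they are listed.
start_sequence() {
    for r in "$@"; do resource_cmd "$r" start; done
}

# Release: resources are stopped in reverse order.
stop_sequence() {
    rev=""
    for r in "$@"; do rev="$r $rev"; done
    for r in $rev; do resource_cmd "$r" stop; done
}

start_sequence IPaddr::192.168.1.82/24/eth0 drbddisk::data \
    Filesystem::/dev/drbd0::/data::ext3 mysql3306
```

The start sequence shows why the order in the line matters: the DRBD resource has to be primary before the file system is mounted, and the file system has to be mounted before MySQL starts.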

 

4. Configure start at boot (heartbeat starts MySQL, so MySQL itself is disabled at boot)

chkconfig mysql off
chkconfig --add heartbeat 
chkconfig heartbeat on

5. Start heartbeat

[leiche@mysql-2-1 ~]$ sudo /etc/init.d/heartbeat start
Starting High-Availability services: 
2014/06/25_11:18:10 INFO:  Resource is stopped
                                                           [  OK  ]

Check the disk:

/dev/drbd0            3.6T  1.8G  3.4T   1% /data

Check the IP:

eth0:0    Link encap:Ethernet  HWaddr F8:BC:12:48:65:B4  
          inet addr:192.168.1.82  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:194 Memory:d91a0000-d91b0000

Check the logs on both the primary and the secondary for obvious errors.

Failover test:

State before stopping:

  master1: 192.168.1.1

Disk:
Filesystem            Size  Used Avail Use% Mounted on
/dev/drbd0            3.6T  1.1T  2.3T  33% /data
Processes:
ps aux | grep mysql shows the MySQL processes running normally
Network interface:
eth0:0    Link encap:Ethernet  HWaddr F8:BC:12:48:65:B4
          inet addr:192.168.1.82  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:194 Memory:d91a0000-d91b0000
DRBD status:
sudo cat /proc/drbd
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder10.centos.org, 2013-03-27 16:01:26
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1289487736 nr:104088 dw:1289592052 dr:1265759530 al:1359461 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

 

  master2: 192.168.1.2

The mounted disk, the virtual IP and the MySQL service are all absent on this node

DRBD status:
[leiche@mysql-2-2 ~]$ sudo cat /proc/drbd 
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder10.centos.org, 2013-03-27 16:01:26
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:103512 nr:1291455972 dw:1291559484 dr:108510 al:662 bm:6 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

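(1) Stop the heartbeat service on the primary: sudo /etc/init.d/heartbeat stop

    Once it is stopped, failover takes place and the roles swap: master2 now takes the writes and master1 is down.

    When we start the heartbeat service on master1 again, the write role does not move back. The reason is this parameter in ha.cf:

auto_failback off

  To move the write role back to master1, simulate a heartbeat failure on master2;

  The relevant log entries: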

heartbeat[30459]: 2014/07/01_16:20:42 info: Heartbeat restart on node mysql-2-2
heartbeat[30459]: 2014/07/01_16:20:42 info: Link mysql-2-2:eth1 up.
heartbeat[30459]: 2014/07/01_16:20:42 info: Status update for node mysql-2-2: status init
heartbeat[30459]: 2014/07/01_16:20:42 info: Status update for node mysql-2-2: status up
heartbeat[30459]: 2014/07/01_16:20:42 debug: StartNextRemoteRscReq(): child count 1
heartbeat[25083]: 2014/07/01_16:20:42 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[25083]:    2014/07/01_16:20:42 info: Running /etc/ha.d/rc.d/status status
heartbeat[25099]: 2014/07/01_16:20:42 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[25099]:    2014/07/01_16:20:42 info: Running /etc/ha.d/rc.d/status status
heartbeat[30459]: 2014/07/01_16:20:43 debug: get_delnodelist: delnodelist= 
heartbeat[30459]: 2014/07/01_16:20:43 info: all clients are now paused
heartbeat[30459]: 2014/07/01_16:20:43 info: Status update for node mysql-2-2: status active
heartbeat[25115]: 2014/07/01_16:20:43 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[25115]:    2014/07/01_16:20:43 info: Running /etc/ha.d/rc.d/status status
heartbeat[30459]: 2014/07/01_16:20:44 info: remote resource transition completed.
heartbeat[30459]: 2014/07/01_16:20:44 info: all clients are now resumed

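(2) Stop only the MySQL service

  heartbeat does not react at all; nothing is written even to ha-debug or ha-log. With the CRM disabled, heartbeat watches whether the peer node is alive, not whether the services it started are still running.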
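Since heartbeat does not notice a dead MySQL on its own, a common workaround is a small watchdog on the active node that stops heartbeat when MySQL stops answering, which triggers the same failover as test (1). The loop below is only a sketch of that idea. `watch_service` is a hypothetical name, and the check and action commands are passed in as strings.

```shell
#!/bin/sh
# Run a health check repeatedly; after N consecutive failures run an action
# once and return.
# $1 = check command, $2 = action command,
# $3 = consecutive failures allowed, $4 = seconds between checks
watch_service() {
    fails=0
    while [ "$fails" -lt "$3" ]; do
        if sh -c "$1" >/dev/null 2>&1; then
            fails=0                      # healthy again: reset the counter
        else
            fails=$((fails + 1))
        fi
        [ "$fails" -lt "$3" ] && sleep "$4"
    done
    sh -c "$2"
}
```

A possible invocation, assuming mysqladmin can connect without prompting for credentials: `watch_service 'mysqladmin -h127.0.0.1 -P3306 ping' '/etc/init.d/heartbeat stop' 3 5`. Requiring several consecutive failures avoids failing over on a single slow response.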

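Split brain

   What is split brain? It is like the two halves of a brain each going their own way without coordinating: with no agreement between them, the whole person ends up confused.

   In a heartbeat + DRBD architecture, split brain means that a network or service fault temporarily cuts the heartbeat link, so that both the primary and the secondary become active and neither gives way.

   In this example it means master1 and master2 are both active: both hold the IP 192.168.1.82 and both run the MySQL service.

  How to deal with it?

   1. Add an arbiter

   2. Monitor the logs, raise an alert when the resources move, then check by hand and force one node to stop;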
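For the monitoring option, one signal that can be checked is the DRBD connection state. With the after-sb-* policies set to disconnect, as in the drbd.conf above, the nodes are expected to drop the replication link once split brain is detected, leaving cs at StandAlone or WFConnection. The check below is a sketch under that assumption. `drbd_disconnected` is a made-up name, and the recovery commands in the comments follow the manual procedure from the DRBD 8.3 documentation; they should be verified before use, because they throw away one node's changes.

```shell
#!/bin/sh
# Succeed if the connection state of a DRBD minor shows that the peers are
# not replicating. Reads /proc/drbd-style text on standard input.
# $1 = minor number
drbd_disconnected() {
    cs=$(awk -v m="$1" '$1 == (m ":") { sub(/^cs:/, "", $2); print $2 }')
    case "$cs" in
        StandAlone|WFConnection) return 0 ;;
        *) return 1 ;;
    esac
}

# Manual recovery, once it is decided which node's changes to discard:
#   on the node whose changes are discarded:
#     drbdadm secondary data
#     drbdadm -- --discard-my-data connect data
#   on the node that keeps its data (if it is StandAlone):
#     drbdadm connect data
```

It could run from cron as `drbd_disconnected 0 < /proc/drbd && echo "DRBD link down on $(uname -n)"`, with the echo replaced by whatever alerting is in place.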

posted @ 2014-07-01 16:53  wyett