Implementing a High-Availability Cluster with RHCS + GFS2
1. Environment Overview
This lab is built in VMware 11 with four virtual machines: one Openfiler host providing iSCSI storage, two cluster nodes, and one luci host. Since there is no fence device, we use a Qdisk (quorum disk) instead. The environment is as follows:
| Hostname | IP address | Role |
| --- | --- | --- |
| luci | 172.16.80.200 | luci server |
| www.userzr.com | 172.16.80.10 | Storage server |
| node1 | 172.16.80.5 | Cluster node |
| node2 | 172.16.80.7 | Cluster node |
2. Setting Up RHCS
2.1 yum Configuration
[root@luci ~]# cat /etc/yum.repos.d/rhcs.repo
[base]
name=CentOS-$releasever - Base
baseurl=file:///mnt
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
This lab uses CentOS 6.5. If you use Red Hat instead, configure yum as shown below, otherwise the cluster packages will not install properly:
[rhel_6_iso]
name=local iso
baseurl=file:///mnt/
gpgcheck=0
gpgkey=file:////mnt/cdrom/RPM-GPG-KEY-redhat-release

[rhel_6-HA_iso]
name=local iso
baseurl=file:///mnt/HighAvailability
gpgcheck=0
gpgkey=file:////mnt/RPM-GPG-KEY-redhat-release

[rhel_6-LB_iso]
name=local iso
baseurl=file:///mnt/LoadBalancer
gpgcheck=0
gpgkey=file:////mnt/RPM-GPG-KEY-redhat-release

[rhel_6-RS_iso]
name=local iso
baseurl=file:///mnt/ResilientStorage
gpgcheck=0
gpgkey=file:////mnt/RPM-GPG-KEY-redhat-release
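Before going further it is worth a quick check that yum can actually see these repos. A minimal sketch, assuming the installation ISO is mounted at /mnt (the ISO path below is only a placeholder for wherever your media lives); repeat on every machine that will install cluster packages:

[root@luci ~]# mount -o loop /path/to/CentOS-6.5-x86_64-DVD.iso /mnt
[root@luci ~]# yum clean all
[root@luci ~]# yum repolist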
2.2 Installing RHCS
2.2.1 Installing luci
luci can be installed on a dedicated host or on any cluster node. Because the nodes reboot when the cluster is created, we install it on a separate host.
[root@luci ~]# yum install luci -y
[root@luci ~]# service luci start
Start luci...                                              [OK]
Point your web browser to https://luci:8084 (or equivalent) to access luci
Once luci is running, open https://172.16.80.200:8084 in a browser to access it.
Note that the protocol is https, not http.

The default credentials are the same as the system root account.

2.2.2 Installing ricci, rgmanager, and cman on the Nodes
We have two nodes, node1 and node2. Installing on each machine by hand gets tedious as the node count grows, so we use a for loop over ssh to install the software on node1 and node2 in turn. It is best to set up passwordless login from luci to node1 and node2 first, otherwise you will be typing the password over and over.
The hosts file on the luci host (node1 and node2 have the same entries):
[root@luci ~]# cat /etc/hosts
127.0.0.1     localhost localhost.localdomain localhost4 localhost4.localdomain4
::1           localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.80.200 luci
172.16.80.5   node1
172.16.80.7   node2
Set up passwordless login: generate a key pair on the luci host.
[root@luci ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
c0:85:c9:ce:47:25:d2:e7:06:2f:2c:5c:b0:7b:95:20 root@luci
The key's randomart image is:
+--[ RSA 2048]----+
|  .E=+..         |
|  .+==o..        |
|   +++ =o        |
|    =o+.+        |
|     .oSo        |
|       .         |
|                 |
|                 |
|                 |
+-----------------+
Copy the public key to node1 and node2:
[root@luci ~]# ssh-copy-id 172.16.80.5
root@172.16.80.5's password:
Now try logging into the machine, with "ssh '172.16.80.5'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@luci ~]# ssh-copy-id 172.16.80.7
root@172.16.80.7's password:
Now try logging into the machine, with "ssh '172.16.80.7'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
Next, wrap the for loop in an alias named cop:
[root@luci ~]# alias cop='for i in {1,2};do ssh root@node$i'
[root@luci ~]# cop hostname;done
node1
node2
Remember to append the loop terminator ;done.
Now install the cluster software (make sure node1 and node2 have a working yum repository first, local or remote):
[root@luci ~]# cop yum install -y ricci rgmanager cman;done
By default the ricci user has no password, so we have to set one first.
Set the ricci password on node1 and node2:
[root@node1 ~]# passwd ricci
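If you would rather not type the password interactively on each node, passwd --stdin (available on CentOS/RHEL) can be driven through the cop alias defined earlier. A sketch; the password string is only a placeholder:

[root@luci ~]# cop 'echo "YourRicciPassword" | passwd --stdin ricci';done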
Start ricci and enable the services at boot. (Starting ricci is enough for now; cman cannot start yet because no cluster nodes have been configured in luci.)

[root@luci ~]# cop service ricci start;done
[root@luci ~]# cop chkconfig ricci on;done
[root@luci ~]# cop chkconfig cman on;done
[root@luci ~]# cop chkconfig rgmanager on;done
Next, log in to the luci web interface and add the cluster nodes.
Click Manage Clusters -> Create to create a cluster.

Click Create Cluster; the nodes will reboot.


Log in to node1 and run clustat to check the cluster status:
[root@node1 ~]# clustat
Cluster Status for userzr-ha @ Thu Mar 10 20:11:32 2016
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 node1                                  1 Online, Local
 node2                                  2 Online
2.2.3 Adding Services
Add the floating IP: on the luci Resources page, choose IP Address. Note: Netmask Bits must be given as a prefix length (e.g. 8, 16, 24), not as 255.255.255.0, otherwise the service will not start.

Add an httpd resource (type Script): give it a name and the path of the script.

Add a Failover Domain named fa and put both nodes in it. When node1 fails, the service switches to node2. By default the service starts on node1, because the lower the number, the higher the priority.

Add the resources to a service group: click Service Groups, enter a service name (userzr-ser), tick Automatically Start This Service, set the Failover Domain to fa, then click Add Resource and add the resources created above.



Click Submit.
The service group then runs on the node with the higher priority: node1 has priority 1 and node2 priority 10, and a smaller number means a higher priority. A rough sketch of the resulting configuration follows.
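For reference, what luci writes into /etc/cluster/cluster.conf on the nodes ends up roughly like the sketch below. The values are illustrative: the floating IP 172.16.80.6 is the one used later in this article, and /etc/init.d/httpd is assumed to be the script path entered for the httpd resource.

<rm>
  <failoverdomains>
    <failoverdomain name="fa" ordered="1" restricted="1">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="10"/>
    </failoverdomain>
  </failoverdomains>
  <resources>
    <ip address="172.16.80.6" monitor_link="on"/>
    <script file="/etc/init.d/httpd" name="httpd"/>
  </resources>
  <service autostart="1" domain="fa" name="userzr-ser">
    <ip ref="172.16.80.6"/>
    <script ref="httpd"/>
  </service>
</rm>

You normally never edit this file by hand while luci manages the cluster; the sketch is only meant to show what the web form fields map to.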
Check the cluster status on node1:
[root@node1 ~]# clustat
Cluster Status for userzr-ha @ Thu Mar 10 20:43:03 2016
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 node1                                  1 Online, Local, rgmanager
 node2                                  2 Online, rgmanager

 Service Name                Owner (Last)                State
 ------- ----                ----- ------                -----
 service:userzr-ser          node1                       started
On node1, run ip addr to see the floating IP, and service httpd status to confirm that httpd is running:
[root@node1 ~]# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:a1:76:80 brd ff:ff:ff:ff:ff:ff
    inet 172.16.80.5/24 brd 172.16.80.255 scope global eth0
    inet 172.16.80.6/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fea1:7680/64 scope link
       valid_lft forever preferred_lft forever
[root@node1 ~]# service httpd status
httpd (pid 7185) is running...
3. Mounting the Filesystem
Our Openfiler storage server shares a 40 GB iSCSI disk; first we attach that disk on the nodes.
To attach an iSCSI disk you must first install iscsi-initiator-utils:
[root@luci ~]# cop yum install -y iscsi-initiator-utils;done
[root@luci ~]# cop service iscsi start;done
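It is probably also worth making the initiator come back after a reboot; a sketch using the same cop alias (standard CentOS 6 service names):

[root@luci ~]# cop chkconfig iscsid on;done
[root@luci ~]# cop chkconfig iscsi on;done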
Discover and log in to the iSCSI target:
[root@node1 ~]# iscsiadm -m discovery -t sendtargets -p 172.16.80.10
172.16.80.10:3260,1 iqn.2006-01.com.user:rd5
The scan shows one target on 172.16.80.10; log in to it:
[root@node1 ~]# iscsiadm -m node -T iqn.2006-01.com.user:rd5 -p 172.16.80.10 -l
Logging in to [iface: default, target: iqn.2006-01.com.user:rd5, portal: 172.16.80.10,3260] (multiple)
Login to [iface: default, target: iqn.2006-01.com.user:rd5, portal: 172.16.80.10,3260] successful.
[root@node1 ~]# fdisk -l

Disk /dev/sda: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000666b5

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          64      512000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              64         587     4194304   82  Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3             587        2611    16264192   83  Linux

Disk /dev/sdb: 40.9 GB, 40936407040 bytes
64 heads, 32 sectors/track, 39040 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
The newly attached iSCSI disk shows up as sdb.
Run the same steps on node2; they are not repeated here.
Once both node1 and node2 have attached the disk, partition it on node1:
[root@node1 ~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x6edee83e.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): p

Disk /dev/sdb: 40.9 GB, 40936407040 bytes
64 heads, 32 sectors/track, 39040 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6edee83e

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-39040, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-39040, default 39040): +1G

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (1026-39040, default 1026):
Using default value 1026
Last cylinder, +cylinders or +size{K,M,G} (1026-39040, default 39040): +10G

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (11267-39040, default 11267):
Using default value 11267
Last cylinder, +cylinders or +size{K,M,G} (11267-39040, default 39040): +1G

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

[root@node2 ~]# fdisk -l

Disk /dev/sda: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006744d

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          64      512000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              64         587     4194304   82  Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3             587        2611    16264192   83  Linux

Disk /dev/sdb: 40.9 GB, 40936407040 bytes
64 heads, 32 sectors/track, 39040 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6edee83e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        1025     1049584   83  Linux
/dev/sdb2            1026       11266    10486784   83  Linux
/dev/sdb3           11267       12291     1049600   83  Linux
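If the new partitions do not show up on node2 straight away, the kernel there may still hold the old partition table. A sketch of one way to refresh it (partprobe comes with the parted package; alternatively log the iSCSI session out and back in):

[root@node2 ~]# partprobe /dev/sdb
[root@node2 ~]# fdisk -l /dev/sdb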
Format sdb1 as ext4:
[root@node1 ~]# mkfs.ext4 /dev/sdb1
mke2fs 1.41.12 (17-May-2010)
Warning: 252 blocks unused.

Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
65664 inodes, 262144 blocks
13119 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8208 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 20 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
Mount the freshly formatted sdb1 on the httpd document root and write some content into it; we simply write "node1":
[root@node1 ~]# mount /dev/sdb1 /var/www/html/
[root@node1 ~]# echo "<h1>node1</h1>" > /var/www/html/index.html
When done, unmount sdb1:
[root@node1 ~]# umount /dev/sdb1
In luci, add the disk as a resource to the RHCS cluster.


Add the new disk resource to the userzr-ser service group.

After clicking Submit, the disk is automatically mounted on node1.
Browse to 172.16.80.6:
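The same check can be done from the command line of any host that can reach the floating IP; the expected output is the page written to the shared disk earlier (a sketch):

[root@luci ~]# curl http://172.16.80.6
<h1>node1</h1>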

4. Switching the Service with Commands
| Command | Description |
| --- | --- |
| clustat | Show the cluster status |
| clusvcadm -e userzr-ser | Start the service group (add -m <hostname> to start it on a specific node) |
| clusvcadm -s userzr-ser | Stop the service group |
| clusvcadm -r userzr-ser -m node2 | Relocate the service from node1 to node2 |
Let's test it by relocating the userzr-ser service group from node1 to node2:

[root@node1 ~]# clusvcadm -r userzr-ser -m node2
Trying to relocate service:userzr-ser to node2...Success
service:userzr-ser is now running on node2
[root@node1 ~]# clustat
Cluster Status for userzr-ha @ Wed Mar 16 22:22:59 2016
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 node1                                  1 Online, rgmanager
 node2                                  2 Online, Local, rgmanager

 Service Name                Owner (Last)                State
 ------- ----                ----- ------                -----
 service:userzr-ser          node2                       started
On node2, check whether the floating IP, the httpd service, and the disk mount have moved over:
[root@node2 ~]# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:33:75:62 brd ff:ff:ff:ff:ff:ff
    inet 172.16.80.7/24 brd 172.16.80.255 scope global eth0
    inet 172.16.80.6/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fe33:7562/64 scope link
       valid_lft forever preferred_lft forever
[root@node2 ~]# service httpd status
httpd (pid 10758) is running...
[root@node2 ~]# df -hT
Filesystem            Type   Size  Used Avail Use% Mounted on
/dev/sda3             ext4    16G  3.1G   12G  22% /
tmpfs                 tmpfs  946M   26M  921M   3% /dev/shm
/dev/sda1             ext4   485M   37M  424M   8% /boot
/dev/sdb1             ext4  1008M   34M  924M   4% /var/www/html
After the switchover, userzr-ser has moved from node1 to node2.
5. Mounting a GFS2 Filesystem
Change the partition type of sdb2 to 8e (Linux LVM):
[root@node1 ~]# fdisk /dev/sdb

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 8e
Changed system type of partition 2 to 8e (Linux LVM)

Command (m for help): p

Disk /dev/sdb: 40.9 GB, 40936407040 bytes
64 heads, 32 sectors/track, 39040 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6edee83e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        1025     1049584   83  Linux
/dev/sdb2            1026       11266    10486784   8e  Linux LVM
/dev/sdb3           11267       12291     1049600   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
Turn sdb2 into a clustered LVM (CLVM) volume:
[root@node1 ~]# pvcreate /dev/sdb2
  Physical volume "/dev/sdb2" successfully created
[root@node1 ~]# lvmconf --enable-cluster    # enable clustered locking
[root@node1 ~]# vgcreate cluvg /dev/sdb2
  Clustered volume group "cluvg" successfully created
[root@node1 ~]# lvcreate -L 10G -n clulv cluvg
  Logical volume "clulv" created
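Note that lvmconf and clustered volume groups come from the lvm2-cluster package, and the clvmd daemon has to be running on every node before the cluvg metadata is visible cluster-wide. If those pieces are not already in place, something along these lines should set them up (package and service names as shipped with CentOS 6):

[root@luci ~]# cop yum install -y lvm2-cluster gfs2-utils;done
[root@luci ~]# cop lvmconf --enable-cluster;done
[root@luci ~]# cop service clvmd start;done
[root@luci ~]# cop chkconfig clvmd on;done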
Check the LVM volume on node2:
[root@node2 ~]# lvs
  LV    VG    Attr       LSize  Pool Origin Data%  Move Log Cpy%Sync Convert
  clulv cluvg -wi-a----- 10.00g
Format the volume as GFS2:
[root@node1 ~]# mkfs.gfs2 -p lock_dlm -t userzr-ha:mygfs2 -j 2 /dev/cluvg/clulv
This will destroy any data on /dev/cluvg/clulv.
It appears to contain: symbolic link to `../dm-0'

Are you sure you want to proceed? [y/n] y

Device:                    /dev/cluvg/clulv
Blocksize:                 4096
Device Size                10.00 GB (2621440 blocks)
Filesystem Size:           10.00 GB (2621438 blocks)
Journals:                  2
Resource Groups:           40
Locking Protocol:          "lock_dlm"
Lock Table:                "userzr-ha:mygfs2"
UUID:                      ba529bf7-cf82-1087-b02d-7e2a21cfc1a5
-p: specifies the GFS2 locking protocol, normally lock_dlm. Without cluster locking, mounting the partition on two systems at the same time behaves like ext3: the two systems do not see each other's changes.
-j: specifies the number of journals (the number of nodes that can mount the filesystem). Leave some headroom, otherwise you will have to add journals later.
    Check journals:  # gfs2_tool journals /gfs2
    Add a journal:   # gfs2_jadd -j1 /gfs2    # adds one journal
-t: specifies the DLM lock table name, in the form ClusterName:FS_Name. Here userzr-ha is the RHCS cluster name and must match the name attribute of the cluster tag in cluster.conf.
The last argument is the full path of the logical volume.
Mount the GFS2 filesystem:
[root@node1 ~]# mkdir /gfs2
[root@node1 ~]# mount -t gfs2 /dev/cluvg/clulv /gfs2/
[root@node1 ~]# df -hT
Filesystem              Type   Size  Used Avail Use% Mounted on
/dev/sda3               ext4    16G  3.2G   12G  22% /
tmpfs                   tmpfs  946M   32M  915M   4% /dev/shm
/dev/sda1               ext4   485M   37M  424M   8% /boot
/dev/mapper/cluvg-clulv gfs2    10G  259M  9.8G   3% /gfs2
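In this article the mount is handed over to rgmanager as a GFS2 resource in the next step, so this is optional, but if you wanted the volume mounted at boot on both nodes outside of rgmanager, the usual approach is an /etc/fstab entry plus the gfs2 init script; a sketch:

/dev/cluvg/clulv   /gfs2   gfs2   defaults,_netdev   0 0

followed by chkconfig gfs2 on, so the filesystem is only mounted once the network and cluster stack are up.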
Add a GFS2 disk resource in luci.

You can use blkid to look up each device's label and UUID:
[root@node1 ~]# blkid
/dev/sda3: UUID="8f6ebb90-384f-4be1-b18e-f1d1565528b8" TYPE="ext4"
/dev/sdb1: UUID="12bf8c19-a7e6-4a7c-9601-b08b83d4429c" TYPE="ext4"
/dev/sda1: UUID="25e40e8f-5d93-462d-b88e-ee500a66c216" TYPE="ext4"
/dev/sda2: UUID="57628bb8-a5c1-40bd-b01f-3282e59cecbc" TYPE="swap"
/dev/sdb2: UUID="WWTyBr-z6OO-3tIA-yLhu-IlAg-qiKR-NeU8ie" TYPE="LVM2_member"
/dev/mapper/cluvg-clulv: LABEL="userzr-ha:mygfs2" UUID="ba529bf7-cf82-1087-b02d-7e2a21cfc1a5" TYPE="gfs2"

Add the mygfs2 resource to the userzr-ser service group.

After Submit, the GFS2 volume is mounted on node2.
Unmount the GFS2 volume on node2; after a moment userzr-ser automatically fails over to node1:
[root@node2 ~]# umount /gfs2
[root@node2 ~]# clustat
Cluster Status for userzr-ha @ Wed Mar 16 22:55:51 2016
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 node1                                  1 Online, rgmanager
 node2                                  2 Online, Local, rgmanager

 Service Name                Owner (Last)                State
 ------- ----                ----- ------                -----
 service:userzr-ser          node1                       started
6. Setting Up the Qdisk (Quorum Disk, also called a Voting Disk)
Create a quorum disk. It must live on shared storage; we split the iSCSI disk into three partitions earlier, one formatted ext4 and one used for GFS2, so the remaining partition becomes the quorum disk:
[root@node1 ~]# mkqdisk -c /dev/sdb3 -l myqdisk
mkqdisk v3.0.12.1

Writing new quorum disk label 'myqdisk' to /dev/sdb3.
WARNING: About to destroy all data on /dev/sdb3; proceed [N/y] ? y
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...
[root@node1 ~]# mkqdisk -L    # list quorum disks
mkqdisk v3.0.12.1

/dev/block/8:19:
/dev/disk/by-id/scsi-14f504e46494c455274764f7053722d425459632d64744f31-part3:
/dev/disk/by-path/ip-172.16.80.10:3260-iscsi-iqn.2006-01.com.user:rd5-lun-0-part3:
/dev/sdb3:
        Magic:                eb7a62c2
        Label:                myqdisk
        Created:              Wed Mar 16 23:01:49 2016
        Host:                 node1
        Kernel Sector Size:   512
        Recorded Sector Size: 512
Configure the Qdisk in the luci interface.


Interval: how many seconds between each evaluation cycle.
TKO: how many failed checks are tolerated. If a node cannot reach the qdisk partition within TKO * Interval seconds, it is considered failed and is evicted from the cluster.
Minimum Total Score: the minimum score a node must accumulate to be counted.
Label: the label of the Qdisk partition, i.e. the "myqdisk" set when the qdisk was created. Using the label is recommended because device names can change after a reboot while the label does not.
Device: the device name of the shared storage as seen on the nodes.
Path to Program: a third-party program used as a heuristic to refine node health detection; here we use ping: ping -c3 -t2 172.16.80.2.
Score: the score awarded when the ping heuristic succeeds.
Interval: how often the ping command is run.
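For reference, these fields land in cluster.conf as a quorumd stanza roughly like the sketch below (attribute names follow the RHCS 6 schema; the numeric values are illustrative, not taken from the screenshots):

<quorumd interval="2" tko="10" votes="1" label="myqdisk">
  <heuristic program="ping -c3 -t2 172.16.80.2" score="1" interval="2"/>
</quorumd>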
After saving, check the Qdisk:
[root@node1 ~]# clustat -l
Cluster Status for userzr-ha @ Wed Mar 16 23:11:41 2016
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 node1                                  1 Online, Local, rgmanager
 node2                                  2 Online, rgmanager
 /dev/block/8:19                        0 Online, Quorum Disk

Service Information
------- -----------

Service Name      : service:userzr-ser
  Current State   : started (112)
  Flags           : none (0)
  Owner           : node1
  Last Owner      : node2
  Last Transition : Wed Mar 16 22:56:50 2016
Finally, simulate a network failure on node2 and test how the cluster behaves.
Disable node2's NIC and watch /var/log/messages on node1 to observe what happens.
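A minimal sketch of the test, using the device and log names already seen in this article:

# On node2: simulate the network failure
[root@node2 ~]# ifdown eth0

# On node1: watch the cluster react
[root@node1 ~]# tail -f /var/log/messages
[root@node1 ~]# clustat

With the qdisk heuristic in place, node2 should fail its ping checks, lose its vote, and be evicted, while userzr-ser stays on (or relocates to) node1.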
