Knowledge review:
RA classes:
OCF (providers such as pacemaker, linbit)
LSB
legacy Heartbeat v1
STONITH
RA: Resource Agent -- manages resources on behalf of the cluster
LRM: Local Resource Manager
DC: Designated Coordinator
TE: Transition Engine
PE: Policy Engine
CRM: Cluster Resource Manager
haresources (heartbeat v1)
crm, haresources (heartbeat v2)
pacemaker (heartbeat v3)
rgmanager (RHCS)
Provides a high-availability foundation for applications that are not themselves HA-aware:
crmd: management API, runs as a daemon
GUI
CLI
Web:
vip, httpd, filesystem
HA services:
Resource Type:
primitive (native): the primary resource
group
clone
STONITH
Cluster Filesystem
dlm: Distributed Lock Manager
master/slave
drbd (distributed block-device mirroring)
Resource stickiness:
whether a resource prefers to stay on its current node;
positive score: prefers to stay
negative score: prefers to leave
node1.magedu.com: 100, 200
node2.magedu.com: 100, inf (positive infinity)
IPaddr::172.16.100.1/16/eth0    (RA name and its arguments -- address/prefix/interface -- separated by ::)
Resource constraints:
location (where a resource prefers to run)
colocation (which resources must run together)
order (the order in which resources start and stop)
heartbeat:
authkeys
ha.cf
node
bcast, mcast, ucast
haresources
HA prerequisites:
1. Time synchronized across the nodes;
2. Passwordless SSH mutual trust between the two nodes;
3. Each hostname must match the output of uname -n and be resolvable via /etc/hosts;
CIB: Cluster Information Base. When crm manages a high-availability cluster, the CIB is where the resource definitions live; it is less a single configuration file than a configuration mechanism, since it can hold the entire cluster configuration in one place.
The CIB is stored in XML format.
crm --> pacemaker
How multicast works
Multicast packets use class D destination IP addresses, which range from 224.0.0.0 to 239.255.255.255; a class D address may never appear in the source-address field of an IP packet. In unicast transmission a packet travels a single path, routed hop by hop from the source address to one destination address. In IP multicast the destination is not a single host but a group of hosts identified by a group address: every receiver joins the group, and once joined, traffic sent to the group address flows to all of them, so every member of the group receives the packets. Membership is dynamic; a host can join or leave a multicast group at any time.
Multicast group categories
A multicast group can be permanent or temporary. Some multicast group addresses are officially assigned and are called permanent groups. What is permanent about such a group is its IP address; its membership can still change, and a permanent group may have any number of members, including zero. The multicast addresses not reserved for permanent groups are available for temporary groups.
224.0.0.0~224.0.0.255: reserved (permanent group) addresses; 224.0.0.0 itself is never assigned, and the rest are used by routing protocols;
224.0.1.0~224.0.1.255: public multicast addresses, usable on the Internet;
224.0.2.0~238.255.255.255: user-available (temporary group) addresses, valid network-wide;
239.0.0.0~239.255.255.255: administratively scoped addresses, valid only within a given local scope.
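The ranges above can be folded into a small helper. A bash sketch (simplified: it lumps the 224.0.1.x public block in with the user-available range):

```shell
# Classify an IPv4 multicast group address by the ranges listed above.
classify_mcast() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  if [ "$a" -lt 224 ] || [ "$a" -gt 239 ]; then
    echo "not multicast"
  elif [ "$a" -eq 224 ] && [ "$b" -eq 0 ] && [ "$c" -eq 0 ]; then
    echo "reserved (permanent)"
  elif [ "$a" -eq 239 ]; then
    echo "administratively scoped"
  else
    echo "user-available (temporary)"
  fi
}
classify_mcast 224.0.0.5      # reserved (permanent)
classify_mcast 225.0.100.19   # user-available (temporary)
```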
Common reserved multicast addresses:
224.0.0.0 base address (reserved)
224.0.0.1 all hosts (including all routers)
224.0.0.2 all multicast routers
224.0.0.3 unassigned
224.0.0.4 DVMRP routers
224.0.0.5 OSPF routers
224.0.0.6 OSPF designated routers
224.0.0.7 ST routers
224.0.0.8 ST hosts
224.0.0.9 RIP-2 routers
224.0.0.10 EIGRP routers
224.0.0.11 mobile agents
224.0.0.12 DHCP server / relay agent
224.0.0.13 all PIM routers
224.0.0.14 RSVP encapsulation
224.0.0.15 all CBT routers
224.0.0.16 designated SBM
224.0.0.17 all SBMs
224.0.0.18 VRRP
When Ethernet carries a unicast IP packet, the destination MAC address is the receiver's MAC address. A multicast packet, however, is not addressed to one specific receiver but to a group whose membership is not fixed, so a multicast MAC address is used instead, and that MAC address is derived from the multicast IP address: IANA (Internet Assigned Numbers Authority) specifies that the high 24 bits of a multicast MAC address are 0x01005e, and the low 23 bits are the low 23 bits of the multicast IP address.
Because only 23 of the low 28 bits of an IP multicast address are mapped into the MAC address, 32 different IP multicast addresses map to the same MAC address.
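The mapping can be computed directly. A bash sketch, applied to the multicast group configured for heartbeat later in this session:

```shell
# Multicast MAC = fixed prefix 01:00:5e + the low 23 bits of the group IP.
mcast_mac() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  # Only the low 7 bits of the second octet survive (bit 24 of the IP is dropped).
  printf '01:00:5e:%02x:%02x:%02x\n' $((b & 0x7f)) "$c" "$d"
}
mcast_mac 225.0.100.19   # -> 01:00:5e:00:64:13
mcast_mac 224.0.100.19   # same MAC: the 32-to-1 aliasing described above
```

The two example groups differ only in bits that are not mapped, which is exactly why 32 IP multicast addresses share one MAC address.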
HA2:
[root@node2 ~]# date    (show the current system time)
2016年 04月 06日 星期三 18:00:21 CST
[root@node2 ~]# ssh node1 'date'    (show node1's system time)
2016年 04月 06日 星期三 18:00:50 CST
Note: with a time difference this large, the high-availability cluster is bound to run into problems.
[root@node2 ~]# ntpdate 172.16.100.254    (sync time against the NTP server)
6 Apr 18:14:12 ntpdate[30456]: adjust time server 172.16.100.6 offset 0.000003 sec
HA1:
[root@node2 ~]# ntpdate 172.16.100.254    (sync time against the NTP server)
6 Apr 18:14:12 ntpdate[30456]: adjust time server 172.16.100.6 offset 0.000003 sec
[root@node1 ~]# tail /var/log/messages    (show the last 10 lines of the messages log)
Apr 6 18:20:40 localhost heartbeat: [1136]: info: acquire local HA resources (standby).
Apr 6 18:20:40 localhost ResourceManager[1149]: info: Acquiring resource group: node1.magedu.com IPaddr::172.16.100.1/16/eth0 Filesyste
m::172.16.100.5:/web/htdocs::/var/www/html::nfs httpd
Apr 6 18:20:40 localhost IPaddr[1176]: INFO: Resource is stopped
Apr 6 18:20:40 localhost ResourceManager[1149]: info: Running /etc/ha.d/resource.d/IPaddr 172.16.100.1/16/eth0 start
Apr 6 18:20:40 localhost IPaddr[1274]: INFO: Using calculated netmask for 172.16.100.1: 255.255.0.0
Apr 6 18:20:40 localhost IPaddr[1274]: INFO: eval ifconfig eth0:0 172.16.100.1 netmask 255.255.0.0 broadcast 172.16.255.255
Apr 6 18:20:40 localhost IPaddr[1245]: INFO: Success
Apr 6 18:20:40 localhost Filesystem[1381]: INFO: Resource is stopped
Apr 6 18:20:40 localhost ResourceManager[1149]: info: Running /etc/ha.d/resource.d/Filesystem 172.16.100.5:/web/htdocs /var/www/html nfs
start
Apr 6 18:20:41 localhost Filesystem[1462]: INFO: Running start for 172.16.100.5:/web/htdocs on /var/www/html
[root@node1 ~]# tail -f /var/log/messages    (follow the messages log; -f keeps printing new lines)
Apr 6 18:21:17 localhost hb_standby[1775]: Going standby [foreign].
Apr 6 18:21:17 localhost heartbeat: [353]: info: node1.magedu.com wants to go standby [foreign]
Apr 6 18:21:17 localhost heartbeat: [353]: info: standby: node2.magedu.com can take our foreign resources
Apr 6 18:21:17 localhost heartbeat: [1789]: info: give up foreign HA resources (standby).
Apr 6 18:21:17 localhost heartbeat: [1789]: info: foreign HA resource release completed (standby).
Apr 6 18:21:17 localhost heartbeat: [353]: info: Local standby process completed [foreign].
Apr 6 18:21:18 localhost heartbeat: [353]: WARN: 1 lost packet(s) for [node2.magedu.com] [2465:2467]
Apr 6 18:21:18 localhost heartbeat: [353]: info: remote resource transition completed.
Apr 6 18:21:18 localhost heartbeat: [353]: info: No pkts missing from node2.magedu.com!
Apr 6 18:21:18 localhost heartbeat: [353]: info: Other node completed standby takeover of foreign resources.
Note: with the broadcast (bcast) transport, every host on the broadcast segment receives a copy of the heartbeat messages our two cluster nodes exchange. The packets do arrive at those other hosts, but they fail authentication there (WARN: failed authentication), so the messages are received yet not accepted.
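What separates cluster members from those bystanders is the shared secret in /etc/ha.d/authkeys, which must be identical on both nodes. A minimal sketch (the method and key shown are placeholders, not taken from this session):

```shell
# /etc/ha.d/authkeys -- chmod 600, identical on every node
auth 1
1 sha1 SomeSharedSecretPlaceholder
```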
[root@node1 ~]# cd /etc/ha.d/    (change to the /etc/ha.d directory)
[root@node1 ha.d]# ls    (list the files in the current directory)
authkeys ha.cf harc haresources rc.d README.config resource.d shellfuncs
[root@node1 ha.d]# ssh node2 '/etc/init.d/heartbeat stop'    (stop the heartbeat service on node2)
Stopping High-Availability services:
[ OK ]
[root@node1 ha.d]# /etc/init.d/heartbeat stop    (stop the heartbeat service on the local node)
Stopping High-Availability services:
[ OK ]
[root@node1 ha.d]# vim ha.cf    (edit the ha.cf configuration file)
#bcast eth0 # Linux
#mcast eth0 225.0.0.1 694 1 0
mcast eth0 225.0.100.19 694 1 0
crm respawn    (manage the cluster with the crm layer; crm is not compatible with haresources and will not read that file)
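The resulting transport section of ha.cf, as a commented fragment (field meanings per the heartbeat mcast syntax):

```shell
# /etc/ha.d/ha.cf fragment: multicast heartbeat instead of broadcast
mcast eth0 225.0.100.19 694 1 0   # interface, group address, UDP port, TTL, loop (0 = off)
crm respawn                       # hand resource management over to the crm layer
```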
[root@node1 ha.d]# vim haresources    (edit the haresources configuration file)
node1.magedu.com IPaddr::172.16.100.1/16/eth0 Filesystem::172.16.100.5:/web/htdocs::/var/www/html::nfs httpd
Note: once crm manages the cluster resources, everything configured through haresources (the first resource, the IP; the second, the filesystem; the third, the web service) stops working, because crm does not read haresources; to keep using the previously configured resources, they have to be converted into the CIB.
[root@node1 ha.d]# vim ha.cf    (edit the ha.cf file)
crm respawn
Note: after changing ha.cf over to crm, do not start the service right away; first run the haresources2cib.py script under /usr/lib/heartbeat;
[root@node1 ha.d]# cd /usr/lib/heartbeat    (change to the /usr/lib/heartbeat directory)
[root@node1 heartbeat]# ls    (list the files in the current directory)
api_test clmtest crm_utils.pyo haresources2cib.pyo ipfail pengine stonithdtest
apphbd crm_commands.py cts hb_addnode logtest pingd tengine
apphbtest crm_commands.pyc dopd hb_delnode lrmadmin plugins TestHeartbeatComm
atest crm_commands.pyo drbd-peer-outdater hb_setsite lrmd (local resource manager daemon) quorumd (quorum daemon)
transient-test.sh attrd crmd (the crm daemon) findif hb_setweight lrmtest quorumdtest ttest
base64_md5_test crm.dtd ha_config hb_standby mach_down ra-api-1.dtd utillib.sh
BasicSanityCheck crm_primitive.py ha_logd hb_takeover mgmtd recoverymgrd
ccm (cluster membership layer) crm_primitive.pyc ha_logger heartbeat mgmtdtest req_resource
ccm_testclient crm_primitive.pyo ha_propagate ipctest mlock ResourceManager
cib crm_utils.py haresources2cib.py ipctransientclient ocf-returncodes send_arp (gratuitous ARP announcements)
cibmon crm_utils.pyc haresources2cib.pyc ipctransientserver ocf-shellfuncs stonithd
Note: haresources2cib.py converts legacy resource definitions into the cluster information base. Run it against /etc/ha.d/haresources and it turns that resource configuration file into CIB XML saved under /var/lib/heartbeat/crm; when the high-availability cluster is started afterwards it reads the CIB from that location, and the converted resources take effect directly. Without the conversion the cluster starts empty: the two nodes exchange heartbeats, but no resources are configured on top. Resources can also be defined by editing the CIB by hand, but editing raw XML is tedious, so the crm manager provides far more convenient interfaces, such as the GUI client, which generate the cib.xml configuration for you. And just as a haresources file had to be copied to every node, the ha.cf and authkeys configuration must be identical cluster-wide; the ha_propagate script (propagate: to spread, to announce) automates this, pushing the configuration to the other nodes over the existing SSH mutual trust, so nothing has to be copied with scp by hand. Once crm is managing the cluster, the information base is edited on one node at a time -- through the GUI, the CLI, or even vim directly on cib.xml -- and the cluster synchronizes the change to the other nodes automatically; that is the point of ha_propagate and of keeping the configuration in the CIB.
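Put together, the conversion and distribution steps look like this (a sketch: the transcript implies, but does not show, that haresources2cib.py takes the haresources path as its argument):

```shell
# Convert the legacy resource definitions into CIB XML:
/usr/lib/heartbeat/haresources2cib.py /etc/ha.d/haresources
ls /var/lib/heartbeat/crm/            # cib.xml should now be present here
# Push ha.cf and authkeys to the other nodes over the SSH trust:
/usr/lib/heartbeat/ha_propagate
```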
[root@node1 heartbeat]# ls /var/lib/heartbeat/    (list the /var/lib/heartbeat directory)
cores crm delhostcache fifo hb_generation hb_uuid pengine
[root@node1 heartbeat]# ls /var/lib/heartbeat/crm/    (list the /var/lib/heartbeat/crm/ directory; it is still empty)
[root@node1 heartbeat]# cd /etc/ha.d/    (change to the /etc/ha.d directory)
[root@node1 ha.d]# ls    (list the files in the current directory)
authkeys ha.cf harc haresources rc.d README.config resource.d shellfuncs
[root@node1 ha.d]# vim ha.cf    (edit the ha.cf configuration file)
[root@node1 ha.d]# /usr/lib/heartbeat/ha_propagate    (run the ha_propagate script)
Propagating HA configuration files to node node2.magedu.com.
The authenticity of host 'node2.magedu.com (172.16.100.7)' can't be established.
RSA key fingerprint is ae:84:06:36:5b:88:0e:22:9f:16:04:cc:b3:ee:f3:ae.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2.magedu.com' (RSA) to the list of known hosts.
ha.cf 100% 10KB 10.4KB/s 00:00
authkeys 100% 691 0.7KB/s 00:00
Setting HA startup configuration on node node2.magedu.com.
chkconfig version 1.3.30.2 - Copyright (C) 1997-2000 Red Hat, Inc.
This may be freely redistributed under the terms of the GNU Public License.
usage: chkconfig --list [name]
chkconfig --add <name>
chkconfig --del <name>
chkconfig [--level <levels>] <name> <on|off|reset|resetpriorities>
Note: /usr/lib/heartbeat/ha_propagate copied both ha.cf and authkeys to the other node, so scp is no longer needed for this. Whenever haresources is not involved and only ha.cf and authkeys need distributing -- for example after updating the key file or the main configuration -- just run ha_propagate and the files are synced over automatically.
[root@node1 ha.d]# service heartbeat start    (start the heartbeat service)
Starting High-Availability services:
2016/04/06_20:58:03 INFO: Resource is stopped
[ OK ]
[root@node1 ha.d]# ssh node2 'service heartbeat start'    (start the heartbeat service on node2)
Starting High-Availability services:
2016/04/06_20:58:33 INFO: Resource is stopped
[ OK ]
[root@node1 ha.d]# tail -f /var/log/messages    (follow the messages log; -f keeps printing new lines)
Apr 6 20:58:43 localhost crmd: [8633]: info: ccm_event_detail: NEW: node2.magedu.com [nodeid=1, born=2]
Apr 6 20:58:43 localhost cib: [8629]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Apr 6 20:58:43 localhost cib: [8629]: info: mem_handle_event: no mbr_track info
Apr 6 20:58:43 localhost cib: [8629]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Apr 6 20:58:43 localhost cib: [8629]: info: mem_handle_event: instance=2, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Apr 6 20:58:43 localhost cib: [8629]: info: cib_ccm_msg_callback: PEER: node1.magedu.com
Apr 6 20:58:43 localhost cib: [8629]: info: cib_ccm_msg_callback: PEER: node2.magedu.com
Apr 6 20:58:44 localhost attrd: [8632]: info: main: Starting mainloop...
Apr 6 20:58:44 localhost heartbeat: [8616]: WARN: 1 lost packet(s) for [node2.magedu.com] [27:29]
Apr 6 20:58:44 localhost heartbeat: [8616]: info: No pkts missing from node2.magedu.com!
Apr 6 20:59:12 localhost crmd: [8633]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
Apr 6 20:59:12 localhost crmd: [8633]: WARN: do_log: [[FSA]] Input I_DC_TIMEOUT from crm_timer_popped() received in state (S_PENDING)
Apr 6 20:59:12 localhost crmd: [8633]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause
=C_TIMER_POPPED origin=crm_timer_popped ]
Apr 6 20:59:12 localhost crmd: [8633]: info: do_election_count_vote: Updated voted hash for node1.magedu.com to vote
Apr 6 20:59:12 localhost crmd: [8633]: info: do_election_count_vote: Election ignore: our vote (node1.magedu.com)
Apr 6 20:59:12 localhost crmd: [8633]: info: do_election_check: Still waiting on 1 non-votes (2 total)
Apr 6 20:59:12 localhost crmd: [8633]: info: do_election_count_vote: Updated voted hash for node2.magedu.com to no-vote
Apr 6 20:59:12 localhost crmd: [8633]: info: do_election_count_vote: Election ignore: no-vote from node2.magedu.com
Apr 6 20:59:12 localhost crmd: [8633]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Apr 6 20:59:12 localhost crmd: [8633]: info: start_subsystem: Starting sub-system "tengine"
Apr 6 20:59:12 localhost crmd: [8633]: info: start_subsystem: Starting sub-system "pengine"
Apr 6 20:59:12 localhost pengine: [8640]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Apr 6 20:59:12 localhost pengine: [8640]: info: pe_init: Starting pengine
Apr 6 20:59:12 localhost crmd: [8633]: info: do_dc_takeover: Taking over DC status for this partition
Apr 6 20:59:12 localhost tengine: [8639]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Apr 6 20:59:12 localhost tengine: [8639]: info: G_main_add_TriggerHandler: Added signal manual handler
Apr 6 20:59:12 localhost tengine: [8639]: info: G_main_add_TriggerHandler: Added signal manual handler
Apr 6 20:59:12 localhost cib: [8629]: info: cib_process_readwrite: We are now in R/W mode
Apr 6 20:59:12 localhost cib: [8629]: info: revision_check: Updating CIB revision to 2.0
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: - <cib epoch="0" num_updates="0"/>
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + <cib epoch="1" num_updates="1" crm_feature_set="2.0"/>
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: - <cib epoch="1"/>
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + <cib epoch="2">
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + <configuration>
Apr 6 20:59:12 localhost crmd: [8633]: info: erase_node_from_join: Removed dead node node1.magedu.com from join calculations: welcom
ed=0 itegrated=0 finalized=0 confirmed=0
Apr 6 20:59:12 localhost cib: [8641]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + <crm_config>
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + <cluster_property_set id="cib-bootstrap-options">
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + <attributes>
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + <nvpair id="cib-bootstrap-options-dc-version" na
me="dc-version" value="2.1.4-node: aa909246edb386137b986c5773344b98c6969999"/>
Apr 6 20:59:12 localhost crmd: [8633]: info: join_make_offer: Making join offers based on membership 2
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + </attributes>
Apr 6 20:59:12 localhost cib: [8641]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:12 localhost crmd: [8633]: info: erase_node_from_join: Removed dead node node2.magedu.com from join calculations: welcom
ed=0 itegrated=0 finalized=0 confirmed=0
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + </cluster_property_set>
Apr 6 20:59:12 localhost cib: [8641]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (di
gest: /var/lib/heartbeat/crm/cib.xml.sig.last)
Apr 6 20:59:12 localhost crmd: [8633]: info: do_dc_join_offer_all: join-1: Waiting on 2 outstanding join acks
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + </crm_config>
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + </configuration>
Apr 6 20:59:12 localhost cib: [8641]: info: write_cib_contents: Wrote version 0.1.1 of the CIB to disk (digest: c5c60c7572d257bf2ebd
9199848282ac)
Apr 6 20:59:12 localhost cib: [8629]: info: log_data_element: cib:diff: + </cib>
Apr 6 20:59:12 localhost cib: [8629]: info: cib_null_callback: Setting cib_diff_notify callbacks for tengine: on
Apr 6 20:59:12 localhost tengine: [8639]: info: te_init: Registering TE UUID: 06214598-7abe-4cda-9a35-8eeda6ab6ec0
Apr 6 20:59:12 localhost cib: [8641]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:12 localhost tengine: [8639]: info: set_graph_functions: Setting custom graph functions
Apr 6 20:59:12 localhost cib: [8641]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (di
gest: /var/lib/heartbeat/crm/cib.xml.sig.last)
Apr 6 20:59:12 localhost cib: [8642]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:12 localhost tengine: [8639]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Apr 6 20:59:12 localhost tengine: [8639]: info: te_init: Starting tengine
Apr 6 20:59:12 localhost cib: [8642]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:12 localhost tengine: [8639]: info: te_connect_stonith: Attempting connection to fencing daemon...
Apr 6 20:59:12 localhost cib: [8642]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (dig
est: /var/lib/heartbeat/crm/cib.xml.sig.last)
Apr 6 20:59:12 localhost cib: [8642]: info: write_cib_contents: Wrote version 0.2.1 of the CIB to disk (digest: 1ae249af06b25401ab237
4b539bae1fd)
Apr 6 20:59:12 localhost cib: [8642]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:12 localhost cib: [8642]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (dig
est: /var/lib/heartbeat/crm/cib.xml.sig.last)
Apr 6 20:59:13 localhost crmd: [8633]: info: update_dc: Set DC to node1.magedu.com (2.0)
Apr 6 20:59:13 localhost cib: [8629]: info: sync_our_cib: Syncing CIB to node2.magedu.com
Apr 6 20:59:13 localhost crmd: [8633]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGR
ATED cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 6 20:59:13 localhost crmd: [8633]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
Apr 6 20:59:13 localhost crmd: [8633]: info: update_attrd: Connecting to attrd...
Apr 6 20:59:13 localhost cib: [8629]: info: sync_our_cib: Syncing CIB to all peers
Apr 6 20:59:13 localhost cib: [8629]: info: log_data_element: cib:diff: - <cib epoch="2"/>
Apr 6 20:59:13 localhost cib: [8629]: info: log_data_element: cib:diff: + <cib epoch="3" dc_uuid="26ae7ed4-39de-4627-8ba9-d1de388e25
b2"/>
Apr 6 20:59:13 localhost attrd: [8632]: info: attrd_local_callback: Sending full refresh
Apr 6 20:59:13 localhost cib: [8643]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:13 localhost cib: [8643]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:13 localhost cib: [8643]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (dige
st: /var/lib/heartbeat/crm/cib.xml.sig.last)
Apr 6 20:59:13 localhost cib: [8643]: info: write_cib_contents: Wrote version 0.3.3 of the CIB to disk (digest: e85768f011bebb0708419b
dfaece3abe)
Apr 6 20:59:13 localhost cib: [8643]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /
var/lib/heartbeat/crm/cib.xml.sig)
Apr 6 20:59:13 localhost cib: [8643]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (dige
st: /var/lib/heartbeat/crm/cib.xml.sig.last)
Apr 6 20:59:13 localhost tengine: [8639]: info: te_connect_stonith: Connected
Apr 6 20:59:13 localhost tengine: [8639]: info: update_abort_priority: Abort priority upgraded to 1000000
Apr 6 20:59:13 localhost tengine: [8639]: info: update_abort_priority: 'DC Takeover' abort superceeded
Apr 6 20:59:14 localhost crmd: [8633]: info: update_dc: Set DC to node1.magedu.com (2.0)
Apr 6 20:59:14 localhost crmd: [8633]: info: do_dc_join_ack: join-1: Updating node state to member for node1.magedu.com
Apr 6 20:59:16 localhost crmd: [8633]: info: do_dc_join_ack: join-1: Updating node state to member for node2.magedu.com
Apr 6 20:59:16 localhost crmd: [8633]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALI
ZED cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 6 20:59:16 localhost crmd: [8633]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Apr 6 20:59:16 localhost pengine: [8640]: info: determine_online_status: Node node1.magedu.com is online
Apr 6 20:59:16 localhost pengine: [8640]: info: determine_online_status: Node node2.magedu.com is online
Apr 6 20:59:16 localhost crmd: [8633]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE
_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Apr 6 20:59:16 localhost tengine: [8639]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-input-
0.bz2
Apr 6 20:59:16 localhost tengine: [8639]: info: unpack_graph: Unpacked transition 0: 2 actions in 2 synapses
Apr 6 20:59:16 localhost tengine: [8639]: info: send_rsc_command: Initiating action 2: probe_complete probe_complete on node2.magedu.com
Apr 6 20:59:16 localhost tengine: [8639]: info: send_rsc_command: Initiating action 3: probe_complete probe_complete on node1.magedu.com
Apr 6 20:59:16 localhost tengine: [8639]: info: run_graph: Transition 0: (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Apr 6 20:59:16 localhost tengine: [8639]: info: notify_crmd: Transition 0 status: te_complete - <null>
Apr 6 20:59:16 localhost crmd: [8633]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_IPC_MESSAGE origin=route_message ]
Apr 6 20:59:16 localhost tengine: [8639]: info: extract_event: Aborting on transient_attributes changes for 26ae7ed4-39de-4627-8ba9-d
1de388e25b2
Apr 6 20:59:16 localhost tengine: [8639]: info: update_abort_priority: Abort priority upgraded to 1000000
Apr 6 20:59:16 localhost pengine: [8640]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine
/pe-input-0.bz2
Apr 6 20:59:16 localhost crmd: [8633]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=
C_IPC_MESSAGE origin=route_message ]
Apr 6 20:59:16 localhost crmd: [8633]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Apr 6 20:59:16 localhost pengine: [8640]: info: determine_online_status: Node node1.magedu.com is online
Apr 6 20:59:16 localhost pengine: [8640]: info: determine_online_status: Node node2.magedu.com is online
Apr 6 20:59:16 localhost crmd: [8633]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_P
E_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Apr 6 20:59:16 localhost pengine: [8640]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/heartbeat/pengine
/pe-input-1.bz2
Apr 6 20:59:16 localhost tengine: [8639]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-input
-1.bz2
Apr 6 20:59:16 localhost tengine: [8639]: info: unpack_graph: Unpacked transition 1: 1 actions in 1 synapses
Apr 6 20:59:16 localhost tengine: [8639]: info: send_rsc_command: Initiating action 2: probe_complete probe_complete on node2.magedu.com
Apr 6 20:59:16 localhost tengine: [8639]: info: run_graph: Transition 1: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Apr 6 20:59:16 localhost tengine: [8639]: info: notify_crmd: Transition 1 status: te_complete - <null>
Apr 6 20:59:16 localhost crmd: [8633]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_IPC_MESSAGE origin=route_message ]
Apr 6 20:59:17 localhost tengine: [8639]: info: extract_event: Aborting on transient_attributes changes for 513f0f88-2996-48bb-b157-f
2d5eebf30e6
Apr 6 20:59:17 localhost tengine: [8639]: info: update_abort_priority: Abort priority upgraded to 1000000
Apr 6 20:59:17 localhost crmd: [8633]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=
C_IPC_MESSAGE origin=route_message ]
Apr 6 20:59:17 localhost crmd: [8633]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Apr 6 20:59:17 localhost pengine: [8640]: info: determine_online_status: Node node1.magedu.com is online
Apr 6 20:59:17 localhost pengine: [8640]: info: determine_online_status: Node node2.magedu.com is online
Apr 6 20:59:17 localhost crmd: [8633]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE
_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Apr 6 20:59:17 localhost tengine: [8639]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-input-2.
bz2
Apr 6 20:59:17 localhost tengine: [8639]: info: unpack_graph: Unpacked transition 2: 0 actions in 0 synapses
Apr 6 20:59:17 localhost tengine: [8639]: info: run_graph: Transition 2: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Apr 6 20:59:17 localhost tengine: [8639]: info: notify_crmd: Transition 2 status: te_complete - <null>
Apr 6 20:59:17 localhost crmd: [8633]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_IPC_MESSAGE origin=route_message ]
Apr 6 20:59:17 localhost pengine: [8640]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/heartbeat/pengine/
pe-input-2.bz2
[root@node1 ha.d]# netstat -tnlp    (list listening sockets: -t TCP, -n numeric, -l listening, -p owning program)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:2208 0.0.0.0:* LISTEN 3494/./hpiod
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 3175/portmap
tcp 0 0 0.0.0.0:852 0.0.0.0:* LISTEN 3214/rpc.statd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 3515/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 3527/cupsd
tcp 0 0 0.0.0.0:5560 0.0.0.0:* LISTEN 8634/mgmtd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 3564/sendmail
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN 30858/sshd
tcp 0 0 127.0.0.1:2207 0.0.0.0:* LISTEN 3499/python
tcp 0 0 :::22 :::* LISTEN 3515/sshd
tcp 0 0 ::1:6010 :::* LISTEN 30858/sshd
Note: mgmtd is one of the crm-related processes; the port it listens on is TCP 5560.
HA2:
[root@node2 ~]# netstat -tnlp    (list listening sockets: -t TCP, -n numeric, -l listening, -p owning program)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:2208 0.0.0.0:* LISTEN 3525/./hpiod
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 3177/portmap
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 3548/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 3562/cupsd
tcp 0 0 0.0.0.0:5560 0.0.0.0:* LISTEN 5396/mgmtd
tcp 0 0 0.0.0.0:856 0.0.0.0:* LISTEN 3218/rpc.statd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 3603/sendmail
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN 30357/sshd
tcp 0 0 127.0.0.1:2207 0.0.0.0:* LISTEN 3530/python
tcp 0 0 :::22 :::* LISTEN 3548/sshd
tcp 0 0 ::1:6010 :::* LISTEN 30357/sshd
Note: both nodes listen on TCP port 5560, so the crm daemons must be running on every node; after that, the various crm command-line tools can be used to manage the cluster.
[root@node2 ~]# cib    (tab-complete the commands starting with cib; CIB = cluster information base)
cibadmin ciblint
Note: cibadmin is the tool for managing the CIB, and ciblint checks it.
[root@node2 ~]# cibadmin --help    (show cibadmin's help)
usage: cibadmin [V?o:QDUCEX:t:Srwlsh:MmBfbdRx:pP5] command
where necessary, XML data will be obtained using -X, -x, or -p options
Options
--obj_type (-o) <type> object type being operated on
Valid values are: nodes, resources, constraints, crm_config, status
--verbose (-V) turn on debug info. additional instance increase verbosity
--help (-?) this help message
Commands
--cib_erase (-E) Erase the contents of the whole CIB (empty the information base)
--cib_query (-Q) (query the information base)
--cib_create (-C)
--md5-sum (-5) Calculate an XML file's digest. Requires either -X, -x or -p
--cib_replace (-R) Recursivly replace an object in the CIB
--cib_update (-U) Recursivly update an object in the CIB
--cib_modify (-M) Find the object somewhere in the CIB's XML tree and update is as --cib_update would
--cib_delete (-D)
Delete the first object matching the supplied criteria
Eg. <op id="rsc1_op1" name="monitor"/>
The tagname and all attributes must match in order for the element to be deleted
--cib_delete_alt (-d)
Delete the object at specified fully qualified location
Eg. <resource id="rsc1"><operations><op id="rsc1_op1"/>...
Requires -o
--cib_bump (-B)
--cib_ismaster (-m)
--cib_sync (-S)
XML data
--crm_xml (-X) <string> Retrieve XML from the supplied string
--xml-file (-x) <filename> Retrieve XML from the named file
--xml-pipe (-p) Retrieve XML from STDIN
Advanced Options
--host (-h) send command to specified host. Applies to cib_query and cib_sync commands only
--local (-l) command takes effect locally on the specified host
--no-bcast (-b) command will not be broadcast even if it altered the CIB
--sync-call (-s) wait for call to complete before returning
[root@node2 ~]# crm    (tab-complete the commands starting with crm)
crmadmin crm_diff crm_master crm_resource crm_standby crm_verify
crm_attribute crm_failcount crm_mon crm_sh crm_uuid
Note: crmadmin manages crm itself; crm_verify checks the configured cib.xml for syntax errors; crm_mon monitors the cluster; crm_resource configures resources.
[root@node2 ~]# crm_mon    (monitor the cluster)
Refresh in 13s...    (the display refreshes every 15 seconds; the counter shows the time left until the next refresh)
============
Last updated: Wed Apr 6 21:11:44 2016
Current DC: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2)    (the current DC)
2 Nodes configured.    (number of nodes)
0 Resources configured.    (number of resources)
============
Node: node2.magedu.com (513f0f88-2996-48bb-b157-f2d5eebf30e6): online    (status of each node; online means up)
Node: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2): online
[root@node2 ~]# crm_resource --help    (show crm_resource's help)
usage: crm_resource [-?VS] -(L|Q|W|D|C|P|p) [options]
--help (-?) : this help message
--verbose (-V) : turn on debug info. additional instances increase verbosity
--quiet (-Q) : Print only the value on stdout (for use with -W)
Commands
--list (-L) : List all resources
--query-xml (-x) : Query a resource
Requires: -r
--locate (-W) : Locate a resource (find out which node a given resource is currently running on)
Requires: -r
--migrate (-M) : Migrate a resource from it current location. Use -H to specify a destination
If -H is not specified, we will force the resource to move by creating a rule for the current location and a score of -INFINITY (manually move a resource from one node to another)
NOTE: This will prevent the resource from running on this node until the constraint is removed with -U
Requires: -r, Optional: -H, -f, --lifetime
--un-migrate (-U) : Remove all constraints created by -M
Requires: -r
--delete (-D) : Delete a resource from the CIB (delete a resource)
Requires: -r, -t
--cleanup (-C) : Delete a resource from the LRM (clear a resource's recorded state)
Requires: -r. Optional: -H
--reprobe (-P) : Recheck for resources started outside of the CRM (re-probe resources)
Optional: -H
--refresh (-R) : Refresh the CIB from the LRM (refresh)
Optional: -H
--set-parameter (-p) <string> : Set the named parameter for a resource (set a parameter on a resource)
Requires: -r, -v. Optional: -i, -s, --meta
--get-parameter (-g) <string> : Get the named parameter for a resource (read a named parameter of a resource)
Requires: -r. Optional: -i, -s, --meta
--delete-parameter (-d) <string>: Delete the named parameter for a resource (delete a named parameter of a resource)
Requires: -r. Optional: -i, --meta
Options
--resource (-r) <string> : Resource ID
--resource-type (-t) <string> : Resource type (primitive, clone, group, ...)
--property-value (-v) <string> : Property value
--host-uname (-H) <string> : Host name
--meta : Modify a resource's configuration option rather than one which is passed to the resource agent script.
For use with -p, -g, -d
--lifetime (-u) <string> : Lifespan of migration constraints
--force (-f) : Force the resource to move by creating a rule for the current location and a score of -INFINITY
This should be used if the resource's stickiness and constraint scores total more than INFINITY (Currently 100,000)
NOTE: This will prevent the resource from running on this node until the constraint is removed with -U or the --lifetime
duration expires
-s <string> : (Advanced Use Only) ID of the instance_attributes object to change
-i <string> : (Advanced Use Only) ID of the nvpair object to change/delete
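As a usage sketch against this cluster (the resource name webip is hypothetical; no resources are defined yet at this point in the session):

```shell
crm_resource -L                                # list all resources
crm_resource -W -r webip                       # which node is webip running on?
crm_resource -M -r webip -H node2.magedu.com   # migrate webip to node2
crm_resource -U -r webip                       # drop the constraint -M created
```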
[root@node2 ~]# crm    (tab-complete the commands starting with crm)
crmadmin crm_diff crm_master crm_resource crm_standby crm_verify
crm_attribute crm_failcount crm_mon crm_sh crm_uuid
Note: crm_standby switches a node into standby;
[root@node2 ~]# crm_standby --help    (show crm_standby's help)
usage: crm_standby [-?V] -(u|U) -(D|G|v) [-l]
Options
--help (-?) : this help message
--verbose (-V) : turn on debug info. additional instances increase verbosity
--quiet (-Q) : Print only the value on stdout (use with -G)
--get-value (-G) : Retrieve rather than set the preference to be promoted
--delete-attr (-D) : Delete rather than set the attribute
--attr-value (-v) <string> : Value to use (ignored with -G)
--attr-id (-i) <string> : The 'id' of the attribute. Advanced use only.
--node-uuid (-u) <node_uuid> : UUID of the node to change (select by UUID the node to put into standby)
--node-uname (-U) <node_uname> : uname of the node to change (select by node name the node to put into standby)
--lifetime (-l) <string> : How long the preference lasts (reboot|forever)
If a forever value exists, it is ALWAYS used by the CRM
instead of any reboot value
[root@node2 ~]# crm (Tab-complete to list the commands starting with crm)
crmadmin crm_diff crm_master crm_resource crm_standby crm_verify
crm_attribute crm_failcount crm_mon crm_sh crm_uuid
Note: crm_failcount counts resource failures, crm_attribute defines attributes for each resource, and crm_sh is a command-line shell;
[root@node2 ~]# crm_attribute --help (show help for the crm_attribute command)
usage: crm_attribute [-?V] -(D|G|v) [options]
Options
--help (-?) : this help message
--verbose (-V) : turn on debug info. additional instances increase verbosity
--quiet (-Q) : Print only the value on stdout (use with -G)
--get-value (-G) : Retrieve rather than set the preference to be promoted
--delete-attr (-D) : Delete rather than set the attribute
--attr-value (-v) <string> : Value to use (ignored with -G)
--attr-id (-i) <string> : The 'id' of the attribute. Advanced use only.
--node-uuid (-u) <node_uuid> : UUID of the node to change
--node-uname (-U) <node_uname> : uname of the node to change
--set-name (-s) <string> : Set of attributes in which to read/write the attribute
--attr-name (-n) <string> : Attribute to set
--type (-t) <string> : Which section of the CIB to set the attribute: (nodes|status|crm_config)
-t=nodes options: -(U|u) -n [-s]
-t=status options: -(U|u) -n [-s]
-t=crm_config options: -n [-s]
--inhibit-policy-engine (-!) : Make a change and prevent the TE/PE from seeing it straight away.
You may think you want this option but you don't. Advanced use only - you have been warned!
[root@node2 ~]# crm_sh (enter the crm shell)
crm #
crm # help (show help)
Usage: crm (nodes|config|resources) (manage nodes, configuration, resources)
crm # nodes (enter nodes mode)
crm nodes # help (show help)
Usage: nodes (status|list) (show status, list nodes)
crm nodes # list (list the current nodes)
<node id="513f0f88-2996-48bb-b157-f2d5eebf30e6" uname="node2.magedu.com" type="normal"/>
<node id="26ae7ed4-39de-4627-8ba9-d1de388e25b2" uname="node1.magedu.com" type="normal"/>
crm nodes # status (show the state of each node)
<node_state id="26ae7ed4-39de-4627-8ba9-d1de388e25b2" uname="node1.magedu.com" crmd="online" crm-debug-origin="do_lrm_query" shutdown=
"0" in_ccm="true" ha="active" join="member" expected="member">
<node_state id="513f0f88-2996-48bb-b157-f2d5eebf30e6" uname="node2.magedu.com" crmd="online" crm-debug-origin="do_lrm_query" shutdown=
"0" in_ccm="true" ha="active" join="member" expected="member">
crm nodes # exit (quit)
ERROR: ** unknown exception encountered, details follow
Traceback (most recent call last):
File "/usr/sbin/crm_sh", line 338, in ?
rc = main_loop(args)
File "/usr/sbin/crm_sh", line 258, in main_loop
return d()
File "/usr/sbin/crm_sh", line 257, in <lambda>
d = lambda: func(*cmd_args, **cmd_options)
File "/usr/lib/heartbeat/crm_commands.py", line 73, in exit
sys.exit(0)
NameError: global name 'sys' is not defined
[root@node2 ~]# crm_sh (enter the crm shell)
crm # resources (enter resources mode)
crm resources # help (show help)
Usage: resources (status|list) (show status, list resources)
crm resources # list (list all configured resources)
NO resources configured
crm resources # exit (quit)
ERROR: ** unknown exception encountered, details follow
Traceback (most recent call last):
File "/usr/sbin/crm_sh", line 338, in ?
rc = main_loop(args)
File "/usr/sbin/crm_sh", line 258, in main_loop
return d()
File "/usr/sbin/crm_sh", line 257, in <lambda>
d = lambda: func(*cmd_args, **cmd_options)
File "/usr/lib/heartbeat/crm_commands.py", line 73, in exit
sys.exit(0)
NameError: global name 'sys' is not defined
[root@node2 ~]# ls (list the files in the current directory)
anaconda-ks.cfg heartbeat-pils-2.1.4-10.el5.i386.rpm install.log.syslog
heartbeat-2.1.4-9.el5.i386.rpm heartbeat-stonith-2.1.4-10.el5.i386.rpm libnet-1.1.4-3.el5.i386.rpm
heartbeat-gui-2.1.4-9.el5.i386.rpm install.log perl-MailTools-1.77-1.el5.rf.noarch.rpm
Note: heartbeat-gui provides a graphical management window, so it can only be used from a graphical session. It asks for an account and password before it will connect; the account used to log in is created automatically when heartbeat is installed;
[root@node2 ~]# tail /etc/passwd (show the last 10 lines of the passwd file)
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
xfs:x:43:43:X Font Server:/etc/X11/fs:/sbin/nologin
haldaemon:x:68:68:HAL daemon:/:/sbin/nologin
avahi-autoipd:x:100:101:avahi-autoipd:/var/lib/avahi-autoipd:/sbin/nologin
gdm:x:42:42::/var/gdm:/sbin/nologin
sabayon:x:86:86:Sabayon user:/home/sabayon:/sbin/nologin
Smoke:x:500:500:Smoke:/home/Smoke:/bin/bash
hacluster:x:101:102:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin
apache:x:48:48:Apache:/var/www:/sbin/nologin
Note: the hacluster user has no password by default. Give it one, then connect to the cluster service with that user name and password; you authenticate against the hacluster account of whichever host you connect to;
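Setting that password can also be scripted; a minimal sketch that pre-computes an MD5-crypt hash with openssl (the fixed salt is only for reproducibility here, and applying the hash with usermod -p needs root, so that step is left commented out):

```shell
# Generate an MD5-crypt hash for the password "hacluster".
# The fixed salt "ab" makes the output reproducible for this demo.
hash=$(openssl passwd -1 -salt ab hacluster)
echo "$hash"
# To apply it (requires root):
# usermod -p "$hash" hacluster
```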
HA1:
[root@node1 ha.d]# passwd hacluster (set a password for the hacluster user)
New UNIX password:
BAD PASSWORD: it is based on a dictionary word
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
[root@node1 ha.d]# hb_gui & (run the hb_gui program in the background)
Click "Login to cluster".

Server(:port): the cluster server address; this connects to mgmtd, which listens on port 5560. To connect to a node, that node's hacluster user must have a password, which also means you can manage other nodes remotely; connecting to node2 works too, as long as node2's hacluster has a password. We just set the password on node1, and node1 is the local machine, so 127.0.0.1 is fine here. User Name: the cluster account, hacluster; Password: the password set above. Click OK;

Three entries appear: the two nodes plus one ping node, deliberately added purely so the cluster can judge whether the current node is alive. The DC is node1, both nodes are in the running state, and "with quorum" means the quorum requirement is met;

Click the node2.magedu.com running node. Node Name: the node's name; Online: True, it is online; Is it DC: whether it is the DC; Type: Member; Standby: False, it is not a standby node; Expected up: True; Shutdown: False; Unclean: False, it is not in an unclean state;

Click linux-ha--Configurations for the cluster-wide settings. No Quorum Policy: what to do when quorum is lost, one of stop, freeze, or ignore, with stop as the default. Symmetric Cluster: symmetric versus asymmetric; in a symmetric ("left-symmetric") cluster, when a node fails its resources may move to any other node, with constraints deciding the preferred destination; in an asymmetric cluster no node accepts transfers by default, and you open up destinations manually. Stonith Enabled: whether to use STONITH, which requires a configured STONITH device. Stonith Action: whether the STONITH device reboots or powers off the target. Default Resource Stickiness: the default stickiness; 0 means a resource is equally happy on any node, and more fundamentally stickiness expresses how much a resource prefers to stay on its current node. Default Resource Failure Stickiness: the default stickiness applied after failures. Is Managed Default: whether resources are managed by default;

Click Advanced, then Yes, for advanced mode. DC Deadtime: how long before the DC is declared dead; Cluster Recheck Interval: how often the cluster state is rechecked; Election Timeout: the timeout for electing a DC;

Click a node such as node2.magedu.com running (dc), then "make the node standby" to put it into standby. A cluster that is up but runs no resources is pointless: the goal is to run a service on the two nodes and make that service highly available. To add resources to the cluster, click Resources and then the + button;

"The type of new item": Item Type can be native (a primitive resource), group, or one of the three constraints below them: location, order, colocation. Clone and master/slave resources also exist, but either must begin as a primitive: you first create the primitive (or base) resource and then change its attributes to clone it, or to clone it into one master copy and one slave copy; you cannot create a clone directly. That is why only native and group appear as resource types here. A highly available web service needs three resources: the VIP, the service, and a Filesystem; leave the Filesystem aside for now, select native, and click OK;

A resource configuration dialog opens. Resource ID: the resource's name; every resource needs an identifier, and since the cluster may run several HA services it is better to call this one webip rather than just vip. "Belong to group (type for new one)": which group the resource belongs to; resources that must move together should go in the same group, otherwise two resources may each run on a different node. "Type (double click for detail)": the resource agent. For configuring an IP address that is IPaddr or IPaddr2, both provided by heartbeat; IPaddr uses ifconfig while IPaddr2 uses ip addr, so an alias configured by IPaddr2 is invisible to ifconfig and only shows up with ip addr show. To use IPaddr as before, double-click IPaddr and set the Value to 172.16.100.1. Every heartbeat v1 resource agent and every OCF agent accepts parameters, some of them mandatory: if IPaddr is to manage an IP address, it must be told which one, just like yesterday's IPaddr::address/mask/interface-alias syntax. Click Add Parameter to pass parameters, which are marked Required or Optional: ip [required,unique], the IP address, required and with a unique value; nic, the interface to bind the alias to; cidr_netmask, the netmask; broadcast, the broadcast address; iflabel, the interface alias label; local_stop_script and local_start_script, local stop and start scripts; lvs_support, whether to support LVS, which has a special role when LVS is in use;
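The cidr_netmask parameter takes a prefix length such as 16; for reference, the dotted-quad mask it stands for can be derived in plain shell (a standalone helper for illustration, not part of heartbeat):

```shell
# Convert a CIDR prefix length to a dotted-quad netmask (pure POSIX shell).
cidr_to_netmask() {
  p=$1; mask=
  for i in 1 2 3 4; do
    if [ "$p" -ge 8 ]; then
      o=255; p=$((p - 8))          # a full octet of ones
    else
      o=0; b=128; n=$p
      # Add the top $p bits of the final partial octet.
      while [ "$n" -gt 0 ]; do o=$((o + b)); b=$((b / 2)); n=$((n - 1)); done
      p=0
    fi
    mask="$mask.$o"
  done
  echo "${mask#.}"
}
cidr_to_netmask 16   # prints 255.255.0.0
```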

Name: select nic, choose eth0 as its value, click OK;

Continue adding: Name: select cidr_netmask, enter 16 as the mask length, click OK;

That completes the first resource; click Add to create it. Selecting Clone would clone the resource: clone_max is the maximum number of copies, "Clone or Master/Slave ID" is the clone's ID, and clone_node_max is the maximum copies per node; selecting Master/Slave makes it a master/slave resource. This one is neither a clone nor master/slave, so just click Add to add the resource;

Click Resources--webip--right-click--Start, and it changes to running;

HA2:
[root@node2 ~]# ifconfig (show the host's network interfaces)
eth0 Link encap:Ethernet HWaddr 00:0C:29:8A:44:AB
inet addr:172.16.100.7 Bcast:172.16.100.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe8a:44ab/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:179194 errors:0 dropped:0 overruns:0 frame:0
TX packets:159547 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:34204413 (32.6 MiB) TX bytes:31328396 (29.8 MiB)
Interrupt:67 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:8A:44:AB
inet addr:172.16.100.1 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:686 errors:0 dropped:0 overruns:0 frame:0
TX packets:686 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:79789 (77.9 KiB) TX bytes:79789 (77.9 KiB)
To start a web server resource, add a second resource: click Resources--Add new item, choose native, click OK, and set Resource ID to httpd. Two resource agents can drive httpd: apache, which is OCF format, and the LSB-format httpd init script of our own web server, which is simpler to manage. Either works; the heartbeat-provided apache agent accepts parameters and is more flexible, while configuring the LSB httpd is simpler because it takes no parameters (LSB agents generally accept none). Select httpd and click Add directly;
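An lsb-class resource agent is simply an init script that heartbeat drives through start, stop and status; a toy sketch of that contract (a dummy service using a flag file, not the real httpd script):

```shell
# Toy LSB-style service: "running" means the flag file exists.
# heartbeat's lsb class relies on start/stop/status and their exit codes
# (status: exit 0 = running, non-zero = stopped).
demo_service() {
  state=/tmp/demo-lsb.state
  case "$1" in
    start)  touch "$state" ;;
    stop)   rm -f "$state" ;;
    status) [ -e "$state" ] ;;
  esac
}
demo_service start
demo_service status && echo "service is running"
demo_service stop
demo_service status || echo "service is stopped"
```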

Click Resources--httpd--right-click--Start. By default resources are balanced: when there are several resources, the cluster spreads them across the nodes as evenly as possible, so the two resources ended up on different nodes. That is not what we need; either define a colocation constraint to force them together, or put them into one group;

Click Resources--right-click--Add new item, choose group, click OK, ID: webserver; Ordered and Collocated can be left alone; click OK;

Resource ID: the resource's name. This shows that the group must be defined first: in crm, a resource that belongs to a group can only be defined after its group exists;

Stop the webip and httpd resources, click Resources--webip/httpd--right-click--Cleanup Resources to clear their state, then delete them and re-add the resources. Since a group is needed, add it first: click Resources--Add New item, Item Type: group, click OK, ID: webserver, click OK. The first resource is webip: select IPaddr, give it the address 172.16.100.1, click Add Parameter to add nic with eth0 as the aliased interface, click Add Parameter again to add cidr_netmask with 16 as the mask length, then click Add;

Select Resources--webserver--right-click--Add New item, choose native, click OK, Resource ID: httpd, locate httpd, click Add;

Click Resources--webserver--right-click--Start; the resources run on node2, together on the same node;

Test: browse to 172.16.100.1 from a Windows IE browser; the page is reachable;

Select node2.magedu.com, click "make the node standby" (or right-click and choose Standby), click Yes; the resources now run on node1;

Test: browse to 172.16.100.1 from a Windows IE browser; the page is reachable;

Click node2.magedu.com--right-click--Active;

HA2:
[root@node2 ~]# cd /etc/ha.d/ (change to the /etc/ha.d directory)
[root@node2 ha.d]# ls (list the files in the current directory)
authkeys ha.cf harc haresources rc.d README.config resource.d shellfuncs
[root@node2 ha.d]# vim ha.cf (edit the ha.cf file)
auto_failback on (when the failed original node comes back online, resources fail back to it)
/auto_failback (search for auto_failback in vim)
Note: with auto_failback the resources move back automatically. If you never want them to fail back, regardless of auto_failback, define a default resource stickiness: as long as a resource's stickiness toward the current node is greater than 0, and no other node scores higher, it stays put. For the web service we also want a third resource, a Filesystem, so add it as well;
Click webservice--right-click--Stop, then webservice--right-click--Add New item, choose native, click OK. Groups have two notable properties: Ordered: true is an order constraint, resources start in the order they were added, so the Filesystem should go first; Collocated: true is a colocation constraint, meaning the group's resources must all run on the same node. You can right-click a resource and use Move Up/Move Down to adjust the order. Name the resource webstore, select Filesystem, and supply its required parameters: device 172.16.100.5:/web/htdocs, directory /var/www/html (the mount point), fstype nfs (the filesystem type), then click Add;

Click webstore--right-click--Move Up, then Resources--Start; the webstore Filesystem resource fails to mount its filesystem;

HA2:
[root@node2 ha.d]# cd (change to the home directory)
[root@node2 ~]# mount -t nfs 172.16.100.5:/web/htdocs /var/www/html/ (mount the NFS filesystem 172.16.100.5:/web/htdocs on /var/www/html; -t gives the filesystem type)
[root@node2 ~]# ls /var/www/html/ (list the files under /var/www/html)
index.html
[root@node2 ~]# cat /var/www/html/index.html (show the contents of index.html)
nfs server
[root@node2 ~]# umount /var/www/html/ (unmount the filesystem mounted on /var/www/html)
[root@node2 ~]# tail /var/log/messages (show the last 10 lines of the messages log)
Apr 7 00:48:08 localhost lrmd: [5992]: info: RA output: (webip:stop:stderr) SIOCDELRT: No such process
Apr 7 00:48:08 localhost IPaddr[8404]: INFO: ifconfig eth0:0 down
Apr 7 00:48:08 localhost avahi-daemon[3754]: Withdrawing address record for 172.16.100.1 on eth0.
Apr 7 00:48:08 localhost crmd: [5995]: info: process_lrm_event: LRM operation webip_stop_0 (call=58, rc=0) complete
Apr 7 00:48:08 localhost tengine: [6002]: info: match_graph_event: Action webip_stop_0 (5) confirmed on node2.magedu.com (rc=0)
Apr 7 00:48:08 localhost tengine: [6002]: info: te_pseudo_action: Pseudo action 9 fired and confirmed
Apr 7 00:48:08 localhost tengine: [6002]: info: te_pseudo_action: Pseudo action 1 fired and confirmed
Apr 7 00:48:08 localhost tengine: [6002]: info: run_graph: Transition 69: (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Apr 7 00:48:08 localhost tengine: [6002]: info: notify_crmd: Transition 69 status: te_complete - <null>
Apr 7 00:48:08 localhost crmd: [5995]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause
=C_IPC_MESSAGE origin=route_message
[root@node2 ~]# crm_verify -L (check the configuration of the running cluster)
crm_verify[8524]: 2016/04/07_00:51:26 ERROR: unpack_rsc_op: Hard error: webstore_start_0 failed with rc=2.
crm_verify[8524]: 2016/04/07_00:51:26 ERROR: unpack_rsc_op: Preventing webstore from re-starting on node2.magedu.com
crm_verify[8524]: 2016/04/07_00:51:26 ERROR: unpack_rsc_op: Hard error: webstore_start_0 failed with rc=2.
crm_verify[8524]: 2016/04/07_00:51:26 ERROR: unpack_rsc_op: Preventing webstore from re-starting on node1.magedu.com
Warnings found during check: config may not be valid
Use -V for more details
[root@node2 ~]# crm_verify -L -V (check the running cluster, with verbose output)
crm_verify[8528]: 2016/04/07_00:52:10 ERROR: unpack_rsc_op: Hard error: webstore_start_0 failed with rc=2.
crm_verify[8528]: 2016/04/07_00:52:10 ERROR: unpack_rsc_op: Preventing webstore from re-starting on node2.magedu.com
crm_verify[8528]: 2016/04/07_00:52:10 WARN: unpack_rsc_op: Processing failed op webstore_start_0 on node2.magedu.com: Error
crm_verify[8528]: 2016/04/07_00:52:10 WARN: unpack_rsc_op: Compatability handling for failed op webstore_start_0 on node2.magedu.com
crm_verify[8528]: 2016/04/07_00:52:10 ERROR: unpack_rsc_op: Hard error: webstore_start_0 failed with rc=2.
crm_verify[8528]: 2016/04/07_00:52:10 ERROR: unpack_rsc_op: Preventing webstore from re-starting on node1.magedu.com
crm_verify[8528]: 2016/04/07_00:52:10 WARN: unpack_rsc_op: Processing failed op webstore_start_0 on node1.magedu.com: Error
crm_verify[8528]: 2016/04/07_00:52:10 WARN: unpack_rsc_op: Compatability handling for failed op webstore_start_0 on node1.magedu.com
crm_verify[8528]: 2016/04/07_00:52:10 WARN: native_color: Resource webstore cannot run anywhere
crm_verify[8528]: 2016/04/07_00:52:10 WARN: native_color: Resource httpd cannot run anywhere
Warnings found during check: config may not be valid
[root@node2 ~]# cd /var/lib/heartbeat/crm/ (change to the /var/lib/heartbeat/crm directory)
[root@node2 crm]# ls (list the files in the current directory)
cib.xml cib.xml.last cib.xml.sig cib.xml.sig.last
Note: the configuration is saved as the cib.xml file under /var/lib/heartbeat/crm/;
[root@node2 crm]# cat cib.xml (show the contents of cib.xml)
<cib generated="true" admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="2.0" crm_feature_set="2.0"
epoch="67" num_updates="1" cib-last-written="Thu Apr 7 00:51:04 2016" ccm_transition="2" dc_uuid="513f0f88-2996-48bb-b157-f2d5eebf30e6">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.4-node: aa909246edb386137b986c5773344b98c6969999"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1459960925"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes>
<node id="513f0f88-2996-48bb-b157-f2d5eebf30e6" uname="node2.magedu.com" type="normal">
<instance_attributes id="nodes-513f0f88-2996-48bb-b157-f2d5eebf30e6">
<attributes>
<nvpair id="standby-513f0f88-2996-48bb-b157-f2d5eebf30e6" name="standby" value="off"/>
</attributes>
</instance_attributes>
</node>
<node id="26ae7ed4-39de-4627-8ba9-d1de388e25b2" uname="node1.magedu.com" type="normal"/>
</nodes>
<resources>
<group id="webserver">
<primitive id="webip" class="ocf" type="IPaddr" provider="heartbeat">
<instance_attributes id="webip_instance_attrs">
<attributes>
<nvpair id="6813f3b0-3f1c-463a-996b-2a6eab9c41da" name="ip" value="172.16.100.1"/>
<nvpair id="6358809d-28b1-462a-95bc-d8d23dedaa2c" name="nic" value="eth0"/>
<nvpair id="1386fbb7-efbb-4d5b-b4fc-5499c40b163f" name="cidr_netmask" value="16"/>
</attributes>
</instance_attributes>
<meta_attributes id="webip_meta_attrs">
<attributes/>
</meta_attributes>
</primitive>
<meta_attributes id="webserver_meta_attrs">
<attributes>
<nvpair name="target_role" id="webserver_metaattr_target_role" value="started"/>
<nvpair id="webserver_metaattr_ordered" name="ordered" value="true"/>
<nvpair id="webserver_metaattr_collocated" name="collocated" value="true"/>
</attributes>
</meta_attributes>
<primitive id="webstore" class="ocf" type="Filesystem" provider="heartbeat">
<instance_attributes id="webstore_instance_attrs">
<attributes>
<nvpair id="25a1bc67-abb9-42d9-af4f-db5947984979" name="device" value="172.16.100.5/web/htdocs"/>
<nvpair id="e4558991-74e8-489e-9bc6-02490c131bea" name="directory" value="/var/www/html"/>
<nvpair id="6504ff27-0394-405f-b4c2-2f286dd26a08" name="fstype" value="nfs"/>
</attributes>
</instance_attributes>
<meta_attributes id="webstore_meta_attrs">
<attributes>
<nvpair id="webstore_metaattr_target_role" name="target_role" value="started"/>
</attributes>
</meta_attributes>
</primitive>
<primitive id="httpd" class="lsb" type="httpd" provider="heartbeat"/>
</group>
</resources>
<constraints/>
</configuration>
</cib>
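Since the CIB is plain XML, quick inspections can be scripted; a sketch that lists each primitive's id and class from a CIB-style fragment (the heredoc mimics the resources section above):

```shell
# Write a CIB-style fragment, then print "id class" for every <primitive>.
cat <<'EOF' > /tmp/cib-demo.xml
<resources>
  <group id="webserver">
    <primitive id="webip" class="ocf" type="IPaddr" provider="heartbeat"/>
    <primitive id="webstore" class="ocf" type="Filesystem" provider="heartbeat"/>
    <primitive id="httpd" class="lsb" type="httpd" provider="heartbeat"/>
  </group>
</resources>
EOF
sed -n 's/.*<primitive id="\([^"]*\)" class="\([^"]*\)".*/\1 \2/p' /tmp/cib-demo.xml
```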
Note the webstore device value above, "172.16.100.5/web/htdocs": it is missing the ":" between the server address and the export path, which is why the Filesystem resource could not mount. Re-add the webstore and httpd resources, click webstore--right-click--Cleanup Resources to clean up their state, then click webservice--right-click--Start to start the resources;

Test: browse to 172.16.100.1 from a Windows IE browser; the page loads normally;

HA1:
[root@node1 ~]# ssh node2 '/etc/init.d/heartbeat stop' (stop the heartbeat service on node2)
Stopping High-Availability services:
[ OK ]
[root@node1 ~]# tail -f /var/log/messages (show the last lines of the messages log; -f keeps following new output)
Apr 7 01:36:46 localhost tengine: [17009]: info: notify_crmd: Transition 0 status: te_complete - <null>
Apr 7 01:36:46 localhost crmd: [10141]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause
=C_IPC_MESSAGE origin=route_message ]
Apr 7 01:36:47 localhost haclient: on_event:evt:cib_changed
Apr 7 01:36:47 localhost last message repeated 2 times
Apr 7 01:36:47 localhost haclient: on_event: from message queue: evt:cib_changed
Apr 7 01:36:47 localhost haclient: on_event: from message queue: evt:cib_changed
Apr 7 01:37:10 localhost heartbeat: [10123]: WARN: node node2.magedu.com: is dead
Apr 7 01:37:10 localhost heartbeat: [10123]: info: Link node2.magedu.com:eth0 dead.
Apr 7 01:37:10 localhost crmd: [10141]: notice: crmd_ha_status_callback: Status update: Node node2.magedu.com now has status [dead]
Apr 7 01:37:10 localhost haclient: on_event:evt:cib_changed
[root@node1 ~]# crm_mon (monitor the cluster)
Refresh in 13s...
============
Last updated: Thu Apr 7 01:39:50 2016
Current DC: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2)
2 Nodes configured.
1 Resources configured.
============
Node: node2.magedu.com (513f0f88-2996-48bb-b157-f2d5eebf30e6): OFFLINE
Node: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2): online
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
webstore (ocf::heartbeat:Filesystem): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
Test: browse to 172.16.100.1 from a Windows IE browser; the page loads normally;

[root@node1 ~]# ssh node2 '/etc/init.d/heartbeat start' (start the heartbeat service on node2)
Starting High-Availability services:
2016/04/07_01:41:19 INFO: Resource is stopped
[ OK ]
[root@node1 ~]# tail -f /var/log/messages (show the last lines of the messages log; -f keeps following new output)
Apr 7 01:41:36 localhost haclient: on_event: from message queue: evt:cib_changed
Apr 7 01:41:36 localhost tengine: [17009]: info: match_graph_event: Action webstore_start_0 (7) confirmed on node2.magedu.com (rc=0)
Apr 7 01:41:36 localhost tengine: [17009]: info: send_rsc_command: Initiating action 8: start httpd_start_0 on node2.magedu.com
Apr 7 01:41:37 localhost haclient: on_event:evt:cib_changed
Apr 7 01:41:37 localhost tengine: [17009]: info: match_graph_event: Action httpd_start_0 (8) confirmed on node2.magedu.com (rc=0)
Apr 7 01:41:37 localhost tengine: [17009]: info: te_pseudo_action: Pseudo action 10 fired and confirmed
Apr 7 01:41:37 localhost tengine: [17009]: info: run_graph: Transition 3: (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Apr 7 01:41:37 localhost tengine: [17009]: info: notify_crmd: Transition 3 status: te_complete - <null>
Apr 7 01:41:38 localhost crmd: [10141]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause
=C_IPC_MESSAGE origin=route_message ]
Apr 7 01:41:38 localhost haclient: on_event:evt:cib_changed
[root@node1 ~]# crm_mon (monitor the cluster)
Refresh in 2s...
============
Last updated: Thu Apr 7 01:42:46 2016
Current DC: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2)
2 Nodes configured.
1 Resources configured.
============
Node: node2.magedu.com (513f0f88-2996-48bb-b157-f2d5eebf30e6): online
Node: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2): online
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
webstore (ocf::heartbeat:Filesystem): Started node2.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
[root@node1 ~]# hb_gui & (run the hb_gui program in the background)
[1] 4453
The DC runs on node2. Stop the web service and, instead of using a group, tie the three resources together with constraints, including a start order: webip first, then webstore, then httpd. They must all run together; which node they run on does not matter, though we may prefer node1;

Click webservice--right-click--Stop to stop it (resources stop from the bottom of the list upward), then webservice--right-click--Delete to remove the resources. Click Resources--right-click--Add New Item, choose native, click OK, Resource ID: webip, Type (double click for detail): IPaddr, double-click it, set Value to 172.16.100.1, click Add Parameter, Name: nic, Value: eth0, click OK, click Add Parameter again, Name: cidr_netmask, Value: 16, click OK, then click Add;

Click Resources--right-click--Add New Item, choose native, click OK, Resource ID: webstore, Type (double click for detail): Filesystem, double-click it, set device to 172.16.100.5:/web/htdocs, directory to /var/www/html, fstype to nfs, click Add;

Click Resources--right-click--Add New Item, choose native, click OK, Resource ID: httpd, Type (double click for detail): httpd, click Add;

Starting webip, webstore and httpd now would likely scatter them, since the cluster takes turns placing resources on different nodes. How do we make them run on the same node?

Define a colocation constraint: click Colocations--right-click--Add New Item, choose colocation, click OK, ID: httpd_with_filesystem, From: httpd, To: webstore, Score: INFINITY (always together; -INFINITY would mean never together). This makes httpd depend on webstore: if webstore can run nowhere, httpd cannot start; if httpd cannot start, webstore is unaffected, because webstore does not depend on httpd. Click OK; webstore immediately moves to the node running httpd, but webip is still not with them;

webip must also be with httpd, so define another colocation: click Colocations--right-click--Add New Item, choose colocation, click OK. The IP should only run where httpd runs. Which starts first is a separate question; generally you configure the IP first and then start the web service (or the reverse), but there is a catch: if the service is configured to listen on that IP and the IP is not up, the service cannot start, which is why the IP often has to come first. So httpd should follow the IP, but then httpd would follow two masters, webip and webstore, and if those two were not on the same node, which one would it follow? To avoid that, chain them instead: ID: webstore_with_webip, so webstore follows webip (wherever webip starts, webstore starts), and httpd in turn must follow webstore. A single chain, with no ambiguity. From: webstore, To: webip, Score: INFINITY (always together), click OK;

webip is on node2, so everything moves over to node2, and all three are now together. The remaining question is which starts first and which starts last;

It is best to define order constraints: click Orders--right-click--Add New Item, Item Type: order, click OK, ID: webstore_before_httpd, From: httpd, To: webstore. This means start webstore before httpd, and do not start httpd if webstore has not started; on shutdown, stop httpd first and then webstore, and if httpd cannot be stopped, webstore is not stopped either. Score: INFINITY, click OK;

webip and webstore have no ordering between them, but webip and httpd do, so define another order constraint: click Orders--right-click--Add New Item, Item Type: order, click OK, ID: webip_before_httpd, From: httpd, To: webip, meaning start webip before httpd and do not start httpd if webip is down; stop httpd before webip, and if httpd cannot be stopped, do not stop webip. Score: INFINITY, click OK;
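The two order constraints form a tiny dependency graph; a quick way to sanity-check a legal start sequence is tsort, where each input pair means "left starts before right" (an illustration only, not something heartbeat itself does):

```shell
# webstore before httpd, webip before httpd: httpd must come out last.
printf 'webstore httpd\nwebip httpd\n' | tsort
```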

Click node2.magedu.com--right-click--Standby, click Yes, then bring it back and watch the start and stop order as the resources switch over. As things stand, the resources run on node2 whenever node2 is online. What if we would rather run them on node1?

Click linux-ha--Configurations, set Default Resource Stickiness to 100, and click Apply;

Click node2.magedu.com--right-click--Standby to make node2 a standby node, then bring node2 back online and see whether the resources move back. They do not: they prefer to stay on the current node, because the current node's total score is not lower than node2's, and as long as the stickiness toward the current node is greater than 0 they stop bouncing back and forth. That is what stickiness means. Beyond the default stickiness you can also give an individual resource a preference, for example make webip prefer node2 by giving it a larger constraint score there;

Click Locations--right-click--Add New Item, choose location, click OK. A location constraint can be defined for any of the resources; give one of them a very large score for some node and, since the three resources move together, their combined score there will exceed their stickiness total. ID: webip_on_node2, Resource: webip, click OK;

Select Locations--webip_on_node2, Score: INFINITY, click Add Expression, Attribute: #uname, Operation: eq, Value: node2.magedu.com, click OK; this means that when the node is node2 the score is INFINITY, so webip prefers node2. Click Apply. Which node a resource ends up on depends on the sum, over all the resources that move together, of their stickiness and constraint scores, compared across all nodes. No location score was defined for node1, so only stickiness counts there: 100 per resource, 300 in total. For node2, webip scores infinity while webstore and httpd score 100 each, and the sum is infinity. Infinity for node2 versus 300 for node1, so everything prefers node2 and switches straight over to it;
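The bookkeeping above can be sketched as saturating arithmetic, with INFINITY capped at 100,000 as the crm_resource help quoted earlier states (scores mirror the example: stickiness 100 per resource, plus webip's INFINITY location score on node2):

```shell
# Sum per-node scores; INFINITY (100000) absorbs any further addition.
INFINITY=100000
add_score() {
  sum=$(( $1 + $2 ))
  [ "$sum" -gt "$INFINITY" ] && sum=$INFINITY
  echo "$sum"
}
node1=0; node2=0
for s in 100 100 100; do node1=$(add_score "$node1" "$s"); done          # stickiness only
for s in "$INFINITY" 100 100; do node2=$(add_score "$node2" "$s"); done  # location + stickiness
echo "node1=$node1 node2=$node2"   # node1=300 node2=100000
[ "$node2" -gt "$node1" ] && echo "the group prefers node2"
```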

All the resources now run on node2. Shut node2 down and see whether they move to node1: power off the node2 machine;
HA1:
[root@node1 ~]# crm_mon (monitor the cluster)
Refresh in 9s...
============
Last updated: Thu Apr 7 04:12:36 2016
Current DC: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2)
2 Nodes configured.
3 Resources configured.
============
Node: node2.magedu.com (513f0f88-2996-48bb-b157-f2d5eebf30e6): OFFLINE
Node: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2): online
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
webstore (ocf::heartbeat:Filesystem): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
Note: node2 vanished without warning, so the cluster keeps waiting for node2's heartbeats, by default for about 30 seconds. Forcing a node offline like this makes failover noticeably slower than stopping the service cleanly, because the cluster must wait out several missed heartbeats, whereas a clean stop announces itself to the others and is much faster. When node2 comes back it rejoins the cluster, provided the heartbeat service starts automatically;
Test: browse to 172.16.100.1 from a Windows IE browser; the page is reachable;

Power the node2 machine back on; since we defined the resources to prefer node2, they all move back to node2 as soon as it is online;

Building a highly available MySQL cluster with heartbeat v2 and crm
We want a MySQL HA cluster. MySQL must support both reads and writes, which means new data may be created through whichever node is contacted; at any moment the standby must be able to take over from a failed primary, and data created on the primary must remain accessible from the standby. That is the crux: does a MySQL HA cluster need shared storage?
nfs, samba, iscsi,
NFS: MySQL, app, data
/etc/my.cnf --> /etc/mysql/mysql.cnf
$MYSQL_BASE
--defaults-extra-file = (read an additional MySQL configuration file)
node1: mysql, mysql
nfs:
vip, mysqld, filesystem
vip
filesystem
mysqld
We now build the HA MySQL cluster. The sharing technologies familiar so far are NFS and Samba; sticking with NFS as the example, what should live on the NFS server: the MySQL binaries and their configuration file, or MySQL's data? The data, certainly. Should the configuration go there too? If it does not sit on the shared storage, every later change to MySQL's configuration must be synchronized by hand, so the simple approach is to put both the MySQL configuration file and the data on NFS. The question then is how to make MySQL read its configuration from NFS, since we cannot mount NFS over /etc;
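One workable layout (an illustration, not taken from the session; the file name and socket path are assumptions) is to keep an extra config file next to the data on the export and point mysqld at it with --defaults-extra-file=/mydata/my.cnf:

```ini
# /mydata/my.cnf -- lives on the NFS export so both nodes read the same settings
[mysqld]
datadir = /mydata/data
socket  = /tmp/mysql.sock
```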
drbd: Distributed Replicated Block Device; it mirrors two disks' data over the network, like RAID-1 at the host level, and by default only one node may mount it;
nfs:
[root@nfs ~]# hostname nfs.magedu.com (set the host name to nfs.magedu.com)
[root@nfs ~]# fdisk /dev/sda (manage the disk's partitions interactively)
The number of cylinders for this disk is set to 6527.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p (print the partition table)
Disk /dev/sda: 53.6 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 2624 20972857+ 83 Linux
/dev/sda3 2625 2755 1052257+ 82 Linux swap / Solaris
Command (m for help): n (create a partition)
Command action
e extended
p primary partition (1-4)
e (an extended partition)
Selected partition 4 (partition number)
First cylinder (2756-6527, default 2756):
Using default value 2756
Last cylinder or +size or +sizeM or +sizeK (2756-6527, default 6527):
Command (m for help): n (create another partition)
First cylinder (2756-6527, default 2756):
Using default value 2756
Last cylinder or +size or +sizeM or +sizeK (2756-6527, default 6527): +20G (a 20G partition)
Command (m for help): t (change a partition's type)
Partition number (1-5): 5 (partition number)
Hex code (type L to list codes): 8e (change it to the Linux LVM type)
Changed system type of partition 5 to 8e (Linux LVM)
Command (m for help): w (save and exit)
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
[root@nfs ~]# partprobe /dev/sda (make the kernel re-read sda's partition table)
[root@nfs ~]# pvcreate /dev/sda5 (turn sda5 into a physical volume)
Writing physical volume data to disk "/dev/sda5"
Physical volume "/dev/sda5" successfully created
[root@nfs ~]# vgcreate myvg /dev/sda5 (create volume group myvg and add the sda5 physical volume to it)
Volume group "myvg" successfully created
[root@nfs ~]# lvcreate -L 10G -n mydata myvg (create a 10G logical volume mydata in myvg)
Logical volume "mydata" created
[root@nfs ~]# lvs (list the system's logical volumes)
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
mydata myvg -wi-a- 10.00G
[root@nfs ~]# mke2fs -j /dev/myvg/mydata (create a filesystem on mydata; -j adds a journal)
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1310720 inodes, 2621440 blocks
131072 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2684354560
80 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 34 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@nfs ~]# groupadd -g 3306 mysql (create the mysql group with GID 3306)
[root@nfs ~]# useradd -u 3306 -g mysql -s /sbin/nologin -M mysql (create the mysql user; -u UID, -g primary group, -s login shell, -M no home directory)
[root@nfs ~]# id mysql (show the mysql user's identity)
uid=3306(mysql) gid=3306(mysql) groups=3306(mysql) context=root:system_r:unconfined_t:SystemLow-SystemHigh
[root@nfs ~]# mkdir /mydata (create the /mydata directory)
[root@nfs ~]# vim /etc/fstab (edit the fstab configuration file)
LABEL=/ / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
LABEL=SWAP-sda3 swap swap defaults 0 0
/dev/myvg/mydata /mydata ext3 defaults 0 0
[root@nfs ~]# mount -a (mount every filesystem listed in /etc/fstab)
[root@nfs ~]# mount (show the currently mounted filesystems)
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/mapper/myvg-mydata on /mydata type ext3 (rw)
[root@nfs ~]# mkdir /mydata/data (create the data directory)
[root@nfs ~]# chown -R mysql.mysql /mydata/data/ (change /mydata/data's owner and group to mysql)
[root@nfs ~]# ll -d /mydata/data/ (show details of the /mydata/data directory itself)
drwxr-xr-x 2 mysql mysql 4096 04-15 19:33 /mydata/data/
[root@nfs ~]# vim /etc/exports (edit the NFS exports configuration file)
/web/htdocs 172.16.0.0/16(ro)
/mydata 172.16.0.0/16(rw)
[root@nfs ~]# exportfs -arv (re-export all filesystems; -r re-export, -a all, -v verbose)
exporting 172.16.0.0/16:/web/htdocs
exporting 172.16.0.0/16:/mydata
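A quick way to double-check which paths are exported read-write is to parse the exports file; a sketch using a heredoc copy of the two lines above (the real file is /etc/exports):

```shell
# Print "path mode" for each export line.
cat <<'EOF' > /tmp/exports-demo
/web/htdocs 172.16.0.0/16(ro)
/mydata 172.16.0.0/16(rw)
EOF
awk '{ m=$2; sub(/^[^(]*\(/, "", m); sub(/\).*$/, "", m); print $1, m }' /tmp/exports-demo
```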
HA2:
[root@node2 ~]# service heartbeat stop (stop the heartbeat service)
Stopping High-Availability services:
[ OK ]
[root@node2 ~]# ssh node1 'service heartbeat stop' (stop the heartbeat service on node1)
Stopping High-Availability services:
[ OK ]
HA1:
[root@node1 ~]# groupadd -g 3306 mysql (create the mysql group; -g gives the GID)
[root@node1 ~]# useradd -g 3306 -u 3306 -s /sbin/nologin -M mysql (create the mysql user; -g primary group, -u UID, -s login shell, -M no home directory)
[root@node1 ~]# mkdir /mydata (create the /mydata directory)
[root@node1 ~]# mount 172.16.100.5:/mydata/ /mydata/ (mount the NFS export 172.16.100.5:/mydata on /mydata)
[root@node1 ~]# ls /mydata/ (list /mydata)
data lost+found
[root@node1 ~]# ls -l /mydata/ (list /mydata with details)
total 24
drwxr-xr-x 2 mysql mysql 4096 Apr 15 2016 data
drwx------ 2 root root 16384 Apr 15 2016 lost+found
Note: the ownership maps to the local mysql user and group as well;
[root@node1 ~]# su - mysql (switch to the mysql user)
su: warning: cannot change directory to /home/mysql: No such file or directory
This account is currently not available.
Note: login is refused;
[root@node1 ~]# usermod -s /bin/bash mysql (change the mysql user's shell to /bin/bash)
[root@node1 ~]# su - mysql (switch to the mysql user)
su: warning: cannot change directory to /home/mysql: No such file or directory
-bash-3.2$ cd /mydata/data/ (change to /mydata/data)
-bash-3.2$ touch a (create file a)
-bash-3.2$ ls (list the current directory)
a
-bash-3.2$ rm a (remove file a)
-bash-3.2$ exit (log out)
logout
[root@node1 ~]# usermod -s /sbin/nologin mysql (set the mysql user's shell back to nologin)
[root@node1 ~]# umount /mydata/ (unmount the filesystem mounted on /mydata)
HA2:
[root@node2 ~]# groupadd -g 3306 mysql (create the mysql group; -g gives the GID)
[root@node2 ~]# useradd -g 3306 -u 3306 -M mysql (create the mysql user; -g primary group, -u UID, -M no home directory)
[root@node2 ~]# mkdir /mydata (create the /mydata directory)
[root@node2 ~]# mount -t nfs 172.16.100.5:/mydata/ /mydata/ (mount 172.16.100.5:/mydata on /mydata; -t gives the filesystem type)
[root@node2 ~]# su - mysql (switch to the mysql user)
su: warning: cannot change directory to /home/mysql: No such file or directory
-bash-3.2$ cd /mydata/data/ (change to /mydata/data)
-bash-3.2$ touch he (create file he)
-bash-3.2$ rm he (remove file he)
-bash-3.2$ exit (log out)
logout
[root@node2 ~]# usermod -s /sbin/nologin mysql (change the mysql user's shell to nologin)
[root@node2 ~]# umount /mydata (unmount the filesystem mounted on /mydata)
HA1:
[root@node1 ~]# lftp 172.16.0.1(连接ftp服务器) lftp 172.16.0.1:~> cd pub/Sources/mysql-5.5 lftp 172.16.0.1:/pub/Sources/mysql-5.5> get mysql-5.5.28-linux2.6-i686.tar.gz(下载mysql-5.5.28) 179907710 bytes transferred in 6 seconds (27.29M/s) lftp 172.16.0.1:/pub/Sources/mysql-5.5> bye (退出) [root@node1 ~]# tar xf mysql-5.5.28-linux2.6-i686.tar.gz -C /usr/local/(解压mysql-5.5.28,x解压,f后面跟文件名,-C指定解压目录) [root@node1 ~]# mount -t nfs 172.16.100.5:/mydata /mydata(挂载nfs文件系统172.16.100.5:/mydata到/mydata目录,-t指定文件系统类型) [root@node1 ~]# cd /usr/local/(切换到/usr/local目录) [root@node1 local]# ln -sv mysql-5.5.28-linux2.6-i686 mysql(为mysql-5.5.28创建软连接mysql,-s软连接,-v显示创建过程) create symbolic link `mysql' to `mysql-5.5.28-linux2.6-i686' [root@node1 local]# cd mysql(切换到mysql目录) [root@node1 mysql]# chown -R root:mysql ./*(更改当前目录下所有文件属主为root,属组为mysql,-R递归更改) [root@node1 mysql]# ll(查看当前目录下文件及子目录详细信息) total 132 drwxr-xr-x 2 root mysql 4096 Apr 7 05:49 bin -rw-r--r-- 1 root mysql 17987 Aug 29 2012 COPYING drwxr-xr-x 4 root mysql 4096 Apr 7 05:48 data drwxr-xr-x 2 root mysql 4096 Apr 7 05:48 docs drwxr-xr-x 3 root mysql 4096 Apr 7 05:48 include -rw-r--r-- 1 root mysql 7604 Aug 29 2012 INSTALL-BINARY drwxr-xr-x 3 root mysql 4096 Apr 7 05:48 lib drwxr-xr-x 4 root mysql 4096 Apr 7 05:48 man drwxr-xr-x 10 root mysql 4096 Apr 7 05:48 mysql-test -rw-r--r-- 1 root mysql 2552 Aug 29 2012 README drwxr-xr-x 2 root mysql 4096 Apr 7 05:48 scripts drwxr-xr-x 27 root mysql 4096 Apr 7 05:48 share drwxr-xr-x 4 root mysql 4096 Apr 7 05:48 sql-bench drwxr-xr-x 2 root mysql 4096 Apr 7 05:48 support-files [root@node1 mysql]# scripts/mysql_install_db --user=mysql --datadir=/mydata/data/(初始化mysql,--user指定运行mysql用户,--datadir指定数据目录) chown: changing ownership of `/mydata/data/': Operation not permitted Cannot change ownership of the database directories to the 'mysql' user. Check that you have the necessary permissions and try again. 
Note: the initialization fails. Even though, after initialization, all further writes would be performed as the mysql user, at this point the install script runs as root, and root's access to the NFS share is subject to root_squash (root is mapped to an unprivileged guest account), so it has no permission to chown the data directory. This is an inherent drawback of NFS; a SAN is simpler to manage here. iSCSI, for instance, is block-based: once the exported device is mounted locally (recognized as a local device), root naturally has full access. With NFS we instead have to re-export the share with no_root_squash;
nfs:
[root@nfs ~]# vim /etc/exports(编辑nfs文件系统导出配置文件) /web/htdocs 172.16.0.0/16(ro) /mydata 172.16.0.0/16(no_root_squash,rw) [root@nfs ~]# exportfs -arv(重新导出所有nfs文件系统,-r重新导出,-a所有,-v显示过程) exporting 172.16.0.0/16:/web/htdocs exporting 172.16.0.0/16:/mydata
HA1:
[root@node1 mysql]# scripts/mysql_install_db --user=mysql --datadir=/mydata/data/(初始化mysql,--user指定运行mysql用户,--datadir指定数据目录) Installing MySQL system tables... OK Filling help tables... OK To start mysqld at boot time you have to copy support-files/mysql.server to the right place for your system PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER ! To do so, start the server, then issue the following commands: ./bin/mysqladmin -u root password 'new-password' ./bin/mysqladmin -u root -h node1.magedu.com password 'new-password' Alternatively you can run: ./bin/mysql_secure_installation which will also give you the option of removing the test databases and anonymous user created by default. This is strongly recommended for production servers. See the manual for more instructions. You can start the MySQL daemon with: cd . ; ./bin/mysqld_safe & You can test the MySQL daemon with mysql-test-run.pl cd ./mysql-test ; perl mysql-test-run.pl Please report any problems with the ./bin/mysqlbug script! [root@node1 mysql]# ls -l /mydata/data/(查看/mydata/data目录文件及子目录详细信息) total 24 drwx------ 2 mysql root 4096 Apr 15 2016 mysql drwx------ 2 mysql mysql 4096 Apr 15 2016 performance_schema drwx------ 2 mysql root 4096 Apr 15 2016 test [root@node1 mysql]# ls(查看当前目录文件及子目录) bin COPYING data docs include INSTALL-BINARY lib man mysql-test README scripts share sql-bench support-files [root@node1 mysql]# cp support-files/my-large.cnf /etc/my.cnf(复制my-large.cnf到/etc叫my.cnf,提供mysql配置文件) [root@node1 mysql]# vim /etc/my.cnf(编辑my.cnf配置文件) datadir = /mydata/data innodb_file_per_table = 1(让innodb每表一个表空间文件) [root@node1 mysql]# cp support-files/mysql.server /etc/init.d/mysqld(复制mysql.server文件到/etc目录叫mysqld,mysql服务启动脚本) [root@node1 mysql]# chkconfig --add mysqld(将mysql添加到系统服务) [root@node1 mysql]# chkconfig mysqld off(关闭mysqld服务在相应系统级别开机自动启动) [root@node1 mysql]# service mysqld start(启动mysqld服务) Starting MySQL.... 
[ OK ] [root@node1 mysql]# /usr/local/mysql/bin/mysql(连接mysql) Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.5.28-log MySQL Community Server (GPL) Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show databases;(显示数据库) +--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | | test | +--------------------+ 4 rows in set (0.01 sec) mysql> create database mydb;(创建mydb库) Query OK, 1 row affected (0.02 sec) mysql> show databases;(显示数据库) +--------------------+ | Database | +--------------------+ | information_schema | | mydb | | mysql | | performance_schema | | test | +--------------------+ 5 rows in set (0.00 sec) mysql> \q(退出) Bye [root@node1 mysql]# service mysqld stop(停止mysqld服务) Shutting down MySQL. [ OK ]
nfs:
[root@nfs ~]# vim /etc/exports(编辑nfs导出文件系统配置文件) /web/htdocs 172.16.0.0/16(ro) /mydata 172.16.0.0/16(rw) [root@nfs ~]# exportfs -arv(重新导出所有nfs文件系统,-r重新导出,-a所有,-v显示过程) exporting 172.16.0.0/16:/web/htdocs exporting 172.16.0.0/16:/mydata
HA1:
[root@node1 mysql]# service mysqld start(start the mysqld service) Starting MySQL.The server quit without updating PID file (/[FAILED]ata/node1.magedu.com.pid). Note: it cannot start — root still needs to update files such as the logs in the data directory, so without no_root_squash this will not work. But no_root_squash carries real risk, which is why /mydata must be exported only to 172.16.100.6/7 rather than opened to the whole subnet. Ideally also add iptables rules on the NFS server so that NFS is reachable only from those few hosts; restricting portmap's port 111 is enough;
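The hardening the note above recommends — exporting /mydata only to the two cluster nodes (172.16.100.6/7 in the text) instead of the whole 172.16.0.0/16 network — can be sketched as below. The temp file stands in for /etc/exports, and the iptables rule is illustrative, not part of this setup:

```shell
# Sketch: host-restricted no_root_squash export, using a temp file as a
# stand-in for the real /etc/exports.
EXPORTS=$(mktemp)
cat > "$EXPORTS" <<'EOF'
/web/htdocs 172.16.0.0/16(ro)
/mydata 172.16.100.6(rw,no_root_squash) 172.16.100.7(rw,no_root_squash)
EOF
grep -c 'no_root_squash' "$EXPORTS"   # prints 1: only the /mydata line carries it
# On the real server you would then re-export with: exportfs -arv
# and limit portmapper reachability, e.g. (illustrative, run as root):
#   iptables -A INPUT -p tcp --dport 111 -s 172.16.100.6 -j ACCEPT
rm -f "$EXPORTS"
```

This way no_root_squash is granted only to the two hosts that actually need root access to the data directory.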
nfs:
[root@nfs ~]# vim /etc/exports(编辑nfs的文件系统导出配置文件) /web/htdocs 172.16.0.0/16(ro) /mydata 172.16.0.0/16(no_root_squash,rw) [root@nfs ~]# exportfs -arv(重新导出所有nfs文件系统,-r重新导出,-a所有,-v显示过程) exporting 172.16.0.0/16:/web/htdocs exporting 172.16.0.0/16:/mydata
HA1:
[root@node1 mysql]# service mysqld start(启动mysqld服务) Starting MySQL.. [ OK ] [root@node1 mysql]# service mysqld stop(停止mysqld服务) Shutting down MySQL. [ OK ] [root@node1 mysql]# chkconfig --list mysqld(查看mysqld服务在相应系统级别开机自动启动情况) mysqld 0:off 1:off 2:off 3:off 4:off 5:off 6:off [root@node1 mysql]# cd(切换到用户家目录) [root@node1 ~]# ls(查看当前目录文件及子目录) anaconda-ks.cfg health_check.sh.bak install.log mysql-5.5.28-linux2.6-i686.tar.gz test.txt health_check.sh i386 install.log.syslog perl-MailTools-1.77-1.el5.rf.noarch.rpm [root@node1 ~]# scp mysql-5.5.28-linux2.6-i686.tar.gz node2:/root/(复制mysql-5.5.28到node2主机的/root目录) mysql-5.5.28-linux2.6-i686.tar.gz 100% 172MB 15.6MB/s 00:11 [root@node1 ~]# umount /mydata(卸载/mydata目录挂载的文件系统)
HA2:
[root@node2 ~]# tar xf mysql-5.5.28-linux2.6-i686.tar.gz -C /usr/local/(extract mysql-5.5.28 into /usr/local; x extract, f archive file, -C target directory)
HA1:
[root@node1 ~]# scp /etc/my.cnf node2:/etc/(复制my.cnf到node2主机的/etc目录) my.cnf 100% 4716 4.6KB/s 00:00 [root@node1 ~]# scp /etc/init.d/mysqld node2:/etc/init.d/(复制mysqld到node2主机的/etc/init.d目录) mysqld 100% 10KB 10.4KB/s 00:00
HA2:
[root@node2 ~]# cd /usr/local/(change to the /usr/local directory)
[root@node2 local]# ls(list the current directory)
bin etc games include lib libexec mysql-5.5.28-linux2.6-i686 sbin share src
[root@node2 local]# ls -l(long listing of the current directory)
总计 80
drwxr-xr-x 2 root root 4096 2009-10-01 bin
drwxr-xr-x 2 root root 4096 2009-10-01 etc
drwxr-xr-x 2 root root 4096 2009-10-01 games
drwxr-xr-x 2 root root 4096 2009-10-01 include
drwxr-xr-x 2 root root 4096 2009-10-01 lib
drwxr-xr-x 2 root root 4096 2009-10-01 libexec
drwxr-xr-x 13 root root 4096 04-07 06:38 mysql-5.5.28-linux2.6-i686
drwxr-xr-x 2 root root 4096 2009-10-01 sbin
drwxr-xr-x 4 root root 4096 04-06 05:55 share
drwxr-xr-x 2 root root 4096 2009-10-01 src
[root@node2 local]# ln -sv mysql-5.5.28-linux2.6-i686 mysql(create a symlink named mysql for mysql-5.5.28; -s symbolic, -v verbose)
创建指向“mysql-5.5.28-linux2.6-i686”的符号链接“mysql”
[root@node2 mysql]# chown -R root:mysql ./*(recursively change the owner of everything here to root and the group to mysql)
[root@node2 mysql]# ll(long listing of the current directory)
总计 132
drwxr-xr-x 2 root mysql 4096 04-07 06:38 bin
-rw-r--r-- 1 root mysql 17987 2012-08-29 COPYING
drwxr-xr-x 4 root mysql 4096 04-07 06:38 data
drwxr-xr-x 2 root mysql 4096 04-07 06:38 docs
drwxr-xr-x 3 root mysql 4096 04-07 06:38 include
-rw-r--r-- 1 root mysql 7604 2012-08-29 INSTALL-BINARY
drwxr-xr-x 3 root mysql 4096 04-07 06:38 lib
drwxr-xr-x 4 root mysql 4096 04-07 06:38 man
drwxr-xr-x 10 root mysql 4096 04-07 06:38 mysql-test
-rw-r--r-- 1 root mysql 2552 2012-08-29 README
drwxr-xr-x 2 root mysql 4096 04-07 06:38 scripts
drwxr-xr-x 27 root mysql 4096 04-07 06:38 share
drwxr-xr-x 4 root mysql 4096 04-07 06:38 sql-bench
drwxr-xr-x 2 root mysql 4096 04-07 06:38 support-files
[root@node2 mysql]# cd(change to the home directory)
[root@node2 ~]# mount -t nfs 172.16.100.5:/mydata /mydata/(mount the NFS filesystem 172.16.100.5:/mydata on /mydata; -t filesystem type)
[root@node2 ~]# ls /mydata/(list /mydata)
data lost+found
[root@node2 ~]# ls /mydata/data/(list /mydata/data)
ibdata1 ib_logfile1 mysql mysql-bin.000002 mysql-bin.000004 node1.magedu.com.err test
ib_logfile0 mydb mysql-bin.000001 mysql-bin.000003 mysql-bin.index performance_schema
[root@node2 ~]# ls -l !$(long listing of /mydata/data; !$ expands to the last argument of the previous command)
ls -l /mydata/data/
总计 28816
-rw-rw---- 1 mysql mysql 18874368 2016-04-15 ibdata1
-rw-rw---- 1 mysql mysql 5242880 2016-04-15 ib_logfile0
-rw-rw---- 1 mysql mysql 5242880 2016-04-15 ib_logfile1
drwx------ 2 mysql mysql 4096 2016-04-15 mydb
drwx------ 2 mysql root 4096 2016-04-15 mysql
-rw-rw---- 1 mysql mysql 209 2016-04-15 mysql-bin.000001
-rw-rw---- 1 mysql mysql 126 2016-04-15 mysql-bin.000002
-rw-rw---- 1 mysql mysql 126 2016-04-15 mysql-bin.000003
-rw-rw---- 1 mysql mysql 126 2016-04-15 mysql-bin.000004
-rw-rw---- 1 mysql mysql 76 2016-04-15 mysql-bin.index
-rw-rw---- 1 mysql root 6958 2016-04-15 node1.magedu.com.err
drwx------ 2 mysql mysql 4096 2016-04-15 performance_schema
drwx------ 2 mysql root 4096 2016-04-15 test
[root@node2 ~]# chkconfig --add mysqld(register mysqld as a system service)
[root@node2 ~]# chkconfig mysqld off(disable mysqld autostart in all runlevels)
[root@node2 ~]# service mysqld start(start the mysqld service)
Starting MySQL.. [确定]
[root@node2 ~]# /usr/local/mysql/bin/mysql(connect to the MySQL server)
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.28-log MySQL Community Server (GPL)
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;(list databases)
+--------------------+
| Database |
+--------------------+
| information_schema |
| mydb |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.02 sec)
mysql> \q(quit)
Bye
[root@node2 ~]# service mysqld stop(stop the mysqld service)
Shutting down MySQL. [确定]
[root@node2 ~]# umount /mydata(unmount the filesystem from /mydata)
[root@node2 ~]# service heartbeat start(start the heartbeat service)
Starting High-Availability services:
2016/04/07_07:13:45 INFO: Resource is stopped
[确定]
[root@node2 ~]# ssh node1 'service heartbeat start'(start the heartbeat service on node1)
Starting High-Availability services:
2016/04/07_07:14:18 INFO: Resource is stopped
[确定]
[root@node2 ~]# crm_mon(monitor the cluster)
Refresh in 12S...
Last updated: Thu Apr 7 07:15:53 2016
Current DC: node2.magedu.com (513f0f88-2996-48bb-b157-f2d5eebf30e6)
2 Nodes configured.
3 Resources configured.
============
Node: node2.magedu.com (513f0f88-2996-48bb-b157-f2d5eebf30e6): online
Node: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2): online
[root@node2 ~]# tail /var/log/messages(show the last 10 lines of the messages log)
Apr 7 07:15:02 node2 pengine: [12074]: notice: NoRoleChange: Leave resource webstore (Started node2.magedu.com)
Apr 7 07:15:02 node2 pengine: [12074]: notice: NoRoleChange: Leave resource httpd (Started node2.magedu.com)
Apr 7 07:15:02 node2 crmd: [12067]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCE
SS cause=C_IPC_MESSAGE origin=route_message ]
Apr 7 07:15:02 node2 tengine: [12073]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-input-84.bz2
Apr 7 07:15:02 node2 pengine: [12074]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-inp
ut-84.bz2
Apr 7 07:15:02 node2 tengine: [12073]: info: unpack_graph: Unpacked transition 2: 0 actions in 0 synapses
Apr 7 07:15:02 node2 tengine: [12073]: info: run_graph: Transition 2: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Apr 7 07:15:02 node2 tengine: [12073]: info: notify_crmd: Transition 2 status: te_complete - <null>
Apr 7 07:15:02 node2 crmd: [12067]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=
C_IPC_MESSAGE origin=route_message ]
Apr 7 07:15:02 node2 setroubleshoot: SELinux prevented the http daemon from reading files stored on a NFS filesytem. For complete SELinux
messages. run sealert -l 881b0f9a-8b48-4bc8-b555-6662db010a37
[root@node2 ~]# crm_mon(monitor the cluster)
Refresh in 12S...
Last updated: Thu Apr 7 07:15:53 2016
Current DC: node2.magedu.com (513f0f88-2996-48bb-b157-f2d5eebf30e6)
2 Nodes configured.
3 Resources configured.
============
Node: node2.magedu.com (513f0f88-2996-48bb-b157-f2d5eebf30e6): online
Node: node1.magedu.com (26ae7ed4-39de-4627-8ba9-d1de388e25b2): online
[root@node2 ~]# hb_gui &(run hb_gui in the background)
Add a resource: select Resources--right-click--Add New Item; for the item type choose group, click OK; for the ID enter mysql_service, click OK. Resource ID: mysqlip, type IPaddr, value 172.16.100.1; click Add Parameter, choose name nic with value eth0, confirm; click Add Parameter again, choose name cidr_netmask with value 16, click OK; click Add;

Add the second resource: click mysql_service--right-click--Add New Item; for the item type choose native (primitive) resource inside the group, click OK; resource ID: mysqlstore; find Filesystem and double-click it; set device to 172.16.100.5:/mydata, directory to /mydata, fstype to nfs; click Add;

Add the third resource: click mysql_service--right-click--Add New Item; item type: native resource; resource ID: mysqld; select mysqld, class lsb; click Add;

Click mysql_service--right-click--Start;

HA2:
[root@node2 ~]# /usr/local/mysql/bin/mysql(连接mysql数据库) Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.5.28-log MySQL Community Server (GPL) Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> GRANT ALL ON *.* to 'root'@'%' IDENTIFIED BY 'redhat';(授权root@%用户对所有库所有表有所有权限) Query OK, 0 rows affected (0.00 sec) mysql> FLUSH PRIVILEGES;(刷新授权表) Query OK, 0 rows affected (0.01 sec) mysql> \q(退出) Bye
nfs:
[root@nfs ~]# yum -y install mysql(通过yum源安装mysql,-y所有询问回答yes) [root@nfs ~]# mysql -uroot -p -h172.16.100.1(连接mysql数据库,-u指定用户,-p密码,-h指定主机) Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.5.28-log MySQL Community Server (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> SHOW DATABASES;(显示数据库) +--------------------+ | Database | +--------------------+ | information_schema | | mydb | | mysql | | performance_schema | | test | +--------------------+ 5 rows in set (0.01 sec) mysql> USE mydb;(修改默认数据库为mydb;) Database changed mysql> CREATE TABLE testtb1 (id int unsigned not null auto_increment primary key,name char(20));(创建表testtb1,字段id,int整形,unsigned非负数, not null不允许为空,auto_increment自动增长,primary key主键,name字段,char字符型,20字符,) Query OK, 0 rows affected (0.03 sec) mysql> SHOW TABLES;(显示表) +----------------+ | Tables_in_mydb | +----------------+ | testtb1 | +----------------+ 1 row in set (0.01 sec) mysql> \q(退出) Bye
Put node2 on standby: click node2.magedu.com--right-click--Standby, then click Yes;

nfs:
[root@nfs ~]# mysql -uroot -p -h172.16.100.1(连接mysql数据库,-u用户,-p密码,-h指定主机) Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.5.28-log MySQL Community Server (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> SHOW DATABASES;(显示数据库) +--------------------+ | Database | +--------------------+ | information_schema | | mydb | | mysql | | performance_schema | | test | +--------------------+ 5 rows in set (0.02 sec) mysql> USE mydb;(修改默认数据库为mydb) Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> SHOW TABLES;(显示表) +----------------+ | Tables_in_mydb | +----------------+ | testtb1 | +----------------+ 1 row in set (0.01 sec) mysql> DESC testtb1;(查看testtb1表结构) +-------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+------------------+------+-----+---------+----------------+ | id | int(10) unsigned | NO | PRI | NULL | auto_increment | | name | char(20) | YES | | NULL | | +-------+------------------+------+-----+---------+----------------+ 2 rows in set (0.01 sec) 提示:mysql高可用了,但是nfs server挂了怎么办,如果此时nfs server宕机了,整个集群还是宕机,nfs server现在称为单点故障了;
What is High Availability?
Simple Equation:
A=MTBF/(MTBF+MTTR)
MTBF = mean time between failures
MTTR = mean time to repair
A = probability system will provide service at a random time (ranging from 0 to 1)
Two ways to improve availability:
Increase MTBF to very large values
Reduce MTTR to very low values
High Availability is achieved through the manipulation of MTBF and MTTR parameters of system design to meet availability requirements.
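The availability equation above can be evaluated directly; the MTBF/MTTR figures below are illustrative only:

```shell
# A = MTBF / (MTBF + MTTR); sample numbers chosen for illustration.
avail() { awk -v mtbf="$1" -v mttr="$2" 'BEGIN { printf "%.5f\n", mtbf / (mtbf + mttr) }'; }
avail 10000 1     # fast repair -> 0.99990 (four nines)
avail 10000 100   # slow repair -> 0.99010 (two nines)
```

Holding MTBF fixed and cutting MTTR a hundredfold is what moves the result from two nines to four — exactly the second lever the text describes.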
Hardware Failure Causes:
Design failure (rare)
Random failure (rare)
Infant mortality (high rate of failure)
Wear out (high rate of failure)
Increasing hardware MTBF:
Use better components
Preemptively replace hardware prior to wear out
Software Failure Causes:
Implementation Defects(very common):
Typically measured in defects per KLOC
Increasing software MTBF:
Experienced engineering team
Peer review of all code
Simple design
Compact code footprint
Static and runtime analysis tools such as valgrind, lint, high compiler warning levels, Coverity, lcov
Test coverage of the software
heartbeat:
RHEL 6.x RHCS: corosync
RHEL 5.x RHCS: openais, cman, rgmanager
corosync: Messaging Layer; official site: http://corosync.github.io/corosync/
openais: AIS
Corosync Project History
Started life as "openais.org" in 2002
Announced Corosync in July 2008
First 1.0.0 release in July 2009
"flatiron" branch feature frozen in June 2010
"weaver's needle" branch announced in June 2010
Features Overview
Four C programming APIs to create HA-aware applications
Ethernet and InfiniBand IPv4/IPv6 native network support
Diagnostics and failure analysis
32/64 bit BE/LE support
High focus on correctness and performance
Network Security Services for authentication and encryption
Project Philosophy: Allow developers to create HA apps however they desire.
ha-aware
CRM (cluster resource manager, e.g. pacemaker)
corosync/heartbeat v3
corosync --> pacemaker:
SUSE Linux Enterprise Server: Hawk, WebGUI
LCMC: Linux Cluster Management Console
RHCS: Conga (luci/ricci)
webGUI
keepalived: VRRP, 2 nodes
rpm, sources
pacemaker, corosync
heartbeat
corosync:
1. time synchronization
2. hostnames
3. SSH mutual trust
Prerequisites:
1) This setup uses two test nodes, node1.magedu.com and node2.magedu.com, whose IP addresses are 172.16.100.11 and 172.16.100.12 respectively;
2) the cluster service is apache's httpd;
3) the address providing the web service is 172.16.100.1;
4) the OS is RHEL 5.8
1. Preparation
To configure a Linux host as an HA node, the following preparation is usually needed:
1) Hostname and IP address resolution must work normally for all nodes, and each node's hostname must match the output of "uname -n"; therefore make sure /etc/hosts on both nodes contains:
172.16.100.11 node1.magedu.com node1
172.16.100.12 node2.magedu.com node2
To keep these hostnames across reboots, also run commands like the following on each node:
Node1:
# sed -i 's@\(HOSTNAME=\).*@\1node1.magedu.com@g' /etc/sysconfig/network
# hostname node1.magedu.com
Node2:
# sed -i 's@\(HOSTNAME=\).*@\1node2.magedu.com@g' /etc/sysconfig/network
# hostname node2.magedu.com
2) Set up key-based ssh communication between the two nodes, for example:
Node1:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
Node2:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
2. Install the following rpm packages:
libibverbs, librdmacm, lm_sensors, libtool-ltdl, openhpi-libs, openhpi, perl-TimeDate
3. Install corosync and pacemaker. First download the required packages into a dedicated local directory (here /root/cluster):
cluster-glue
cluster-glue-libs
heartbeat
resource-agents
corosync
heartbeat-libs
pacemaker
corosynclib
libesmtp
pacemaker-libs
Download site: http://clusterlabs.org/rpm/. Choose the packages matching your hardware platform and OS; using the latest version of each package is recommended.
32-bit rpm packages: http://clusterlabs.org/rpm/epel-5/i386/
64-bit rpm packages: http://clusterlabs.org/rpm/epel-5/x86_64/
Install them with:
# cd /root/cluster
# yum -y --nogpgcheck localinstall *.rpm
4. Configure corosync (the following commands are run on node1.magedu.com):
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
Then edit corosync.conf and add the following:
service {
ver: 0
name: pacemaker
# use_mgmtd: yes
}
aisexec {
user: root
group: root
}
Also set the IP address after bindnetaddr in this file to the network address of the network your NIC is on; our two nodes are on the 172.16.0.0 network, so set it as follows:
bindnetaddr: 172.16.0.0
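bindnetaddr must be the network address, not the interface address; it can be derived by ANDing each octet of the interface IP with the netmask. A small sketch using only shell arithmetic (the helper name netaddr is made up for illustration):

```shell
# Derive a bindnetaddr value: bitwise-AND each octet of the interface IP
# with the corresponding octet of the netmask.
netaddr() {
    local ip=$1 mask=$2
    local IFS=.
    set -- $ip;  local a=$1 b=$2 c=$3 d=$4   # split IP into octets
    set -- $mask                             # split mask into octets
    echo "$(( a & $1 )).$(( b & $2 )).$(( c & $3 )).$(( d & $4 ))"
}
netaddr 192.168.5.92  255.255.255.192   # -> 192.168.5.64
netaddr 172.16.100.11 255.255.0.0       # -> 172.16.0.0 (this setup)
```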
Generate the authentication key used for inter-node communication:
# corosync-keygen
Copy corosync.conf and authkey to node2:
# scp -p corosync.conf authkey node2:/etc/corosync/
Create the directory for corosync's logs on both nodes:
# mkdir /var/log/cluster
# ssh node2 'mkdir /var/log/cluster'
5. Try starting it (the following commands are run on node1):
# /etc/init.d/corosync start
Check whether the corosync engine started normally:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check whether the initial membership notification was sent normally:
# grep TOTEM /var/log/messages
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [172.16.100.11] is now up.
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup:
# grep ERROR: /var/log/messages | grep -v unpack_resources
Check whether pacemaker started normally:
# grep pcmk_startup /var/log/messages
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.magedu.com
If all the commands above ran without problems, start corosync on node2 with:
# ssh node2 -- /etc/init.d/corosync start
Note: node2 must be started from node1 with the command above; do not start it directly on node2;
Check the cluster nodes' startup status with:
# crm status
============
Last updated: Tue Jun 14 19:07:06 2011
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
The output above shows that both nodes have started normally and the cluster is in a normal working state.
Run ps auxf to see the processes corosync started.
root 4665 0.4 0.8 86736 4244 ? Ssl 17:00 0:04 corosync
root 4673 0.0 0.4 11720 2260 ? S 17:00 0:00 \_ /usr/lib/heartbeat/stonithd
101 4674 0.0 0.7 12628 4100 ? S 17:00 0:00 \_ /usr/lib/heartbeat/cib
root 4675 0.0 0.3 6392 1852 ? S 17:00 0:00 \_ /usr/lib/heartbeat/lrmd
101 4676 0.0 0.4 12056 2528 ? S 17:00 0:00 \_ /usr/lib/heartbeat/attrd
101 4677 0.0 0.5 8692 2784 ? S 17:00 0:00 \_ /usr/lib/heartbeat/pengine
101 4678 0.0 0.5 12136 3012 ? S 17:00 0:00 \_ /usr/lib/heartbeat/crmd
HA2:
[root@node2 ~]# date(check the system time) 2016年 04月 06日 星期三 00:05:17 CST
HA1:
[root@node1 ~]# date(check the system time) Wed Apr 6 00:04:53 CST 2016 [root@node1 ~]# ntpdate 172.16.100.254(sync time from the time server) 6 Apr 00:16:32 ntpdate[10932]: adjust time server 172.16.100.6 offset 0.000002 sec
HA2:
[root@node2 ~]# ntpdate 172.16.100.254(sync time from the time server) 6 Apr 00:16:32 ntpdate[10932]: adjust time server 172.16.100.6 offset 0.000002 sec [root@node2 ~]# ssh node1 'date'(check node1's system time) 2016年 04月 06日 星期三 00:18:48 CST [root@node2 ~]# date(check the system time) 2016年 04月 06日 星期三 00:18:49 CST
Environment: 1) time synchronization, 2) hostnames, and 3) SSH mutual trust are already configured;
[root@node2 ~]# lftp 172.16.0.1/pub(连接ftp服务器) cd ok, cwd=/pub lftp 172.16.0.1/pub> cd Sources/corosync/ lftp 172.16.0.1/pub/Sources/corosync> mget cluster-glue-* corosync-1.2.7-1.1.el5-i386.rpm corosynclib-1.2.7-1.1.el5.i386.rpm heartbeat-3.0.3-2.3-el5.i386.rpm heartbeat-libs-3.0.3-2.3.el5.i386.rpm libesmtp-1.0.4-5.el5.i386.rpm openaislib-1.1.3-1.6.el5.i386.rpm pacemaker-1.1.5-1.1.el5.i386.rpm pacemaker-libs-1.1.5-1.1.el5.i386.rpm pacemaker-cts-1.1.5-1.1.el5.i386.rpm resource-agents-1.0.4-1.1.el5.i386.rpm 3068827 bytes transferred Total 12 files transferred lftp 172.16.0.1/pub/Sources/corosync> bye(退出) [root@node2 ~]# ls(查看当前目录文件及子目录) anaconda-ks.cfg corosynclib-1.2.7-1.1.el5.i386.rpm install.log.syslog pacemaker-cts-1.1.5-1.1.el5.i386.rpm cluster-glue-1.0.6-1.6.el5.i386.rpm heartbeat-3.0.3-2.3.el5.i386.rpm libesmtp-1.0.4-5.el5.i386.rpm pacemaker-libs-1.1.5-1.1.el5.i386.rpm cluster-glue-libs-1.0.6-1.6.el5.i386.rpm heartbeat-libs-3.0.3-2.3.el5.i386.rpm openaislib-1.1.3-1.6.el5.i386.rpm resource-agents-1.0.4-1.1.el5.i386.rpm corosync-1.2.7-1.1.el5.i386.rpm install.log pacemaker-1.1.5-1.1.el5.i386.rpm [root@node2 ~]# mv openaislib-1.1.3-1.6.el5.i386.rpm /tmp/(复制openaislib到/tmp目录) [root@node2 ~]# ls(查看当前目录文件及子目录) anaconda-ks.cfg corosynclib-1.2.7-1.1.el5.i386.rpm install.log.syslog pacemaker-libs-1.1.5-1.1.el5.i386.rpm cluster-glue-1.0.6-1.6.el5.i386.rpm heartbeat-3.0.3-2.3.el5.i386.rpm libesmtp-1.0.4-5.el5.i386.rpm resource-agents-1.0.4-1.1.el5.i386.rpm cluster-glue-libs-1.0.6-1.6.el5.i386.rpm heartbeat-libs-3.0.3-2.3.el5.i386.rpm pacemaker-1.1.5-1.1.el5.i386.rpm corosync-1.2.7-1.1.el5.i386.rpm install.log pacemaker-cts-1.1.5-1.1.el5.i386.rpm [root@node2 ~]# scp *.rpm node1:/root/(复制当前目录所有.rpm结尾的到node1主机的/root目录) cluster-glue-1.0.6-1.6.el5.i386.rpm 100% 265KB 265.0KB/s 00:00 cluster-glue-libs-1.0.6-1.6.el5.i386.rpm 100% 130KB 130.1KB/s 00:00 corosync-1.2.7-1.1.el5.i386.rpm 100% 166KB 166.1KB/s 00:00 corosynclib-1.2.7-1.1.el5.i386.rpm 100% 155KB 154.8KB/s 00:00 
heartbeat-3.0.3-2.3.el5.i386.rpm 100% 162KB 161.7KB/s 00:00 heartbeat-libs-3.0.3-2.3.el5.i386.rpm 100% 283KB 282.8KB/s 00:00 libesmtp-1.0.4-5.el5.i386.rpm 100% 59KB 59.0KB/s 00:00 pacemaker-1.1.5-1.1.el5.i386.rpm 100% 778KB 778.1KB/s 00:01 pacemaker-cts-1.1.5-1.1.el5.i386.rpm 100% 203KB 203.1KB/s 00:00 pacemaker-libs-1.1.5-1.1.el5.i386.rpm 100% 324KB 324.2KB/s 00:00 resource-agents-1.0.4-1.1.el5.i386.rpm 100% 380KB 379.5KB/s 00:00
HA1:
[root@node1 ~]# ls /etc/yum.repos.d/(查看/etc/yum.repos.d目录文件及子目录) redhat.repo rhel-debuginfo.repo [root@node1 ~]# wget ftp://172.16.0.1/pub/gls/server.repo -O /etc/yum.repos.d/server.repo(通过互联网下载server.repo保存到/etc/yum.repos.d目录叫 server.repo,-O更改保存目录) [root@node1 ~]# scp /etc/yum.repos.d/server.repo node2:/etc/yum.repos.d/(复制server.repo到node2主机的/etc/yum.repos.d目录) server.repo 100% 300 0.3KB/s 00:00 [root@node1 ~]# yum --nogpgcheck localinstall *.rpm(安装本地rpm软件包,--nogpgcheck不做gpg校验)
HA2:
[root@node2 ~]# yum -y --nogpgcheck localinstall *.rpm(安装本地rpm软件包,--nogpgcheck不做gpg校验)
HA1:
[root@node1 ~]# rpm -ql corosync(list the files installed by the corosync package)
/etc/corosync
/etc/corosync/corosync.conf.example(example configuration file)
/etc/corosync/service.d
/etc/corosync/uidgid.d
/etc/init.d/corosync(service script)
/usr/libexec/lcrso
/usr/libexec/lcrso/coroparse.lcrso
/usr/libexec/lcrso/objdb.lcrso
/usr/libexec/lcrso/quorum_testquorum.lcrso
/usr/libexec/lcrso/quorum_votequorum.lcrso
/usr/libexec/lcrso/service_cfg.lcrso
/usr/libexec/lcrso/service_confdb.lcrso
/usr/libexec/lcrso/service_cpg.lcrso
/usr/libexec/lcrso/service_evs.lcrso
/usr/libexec/lcrso/service_pload.lcrso
/usr/libexec/lcrso/vsf_quorum.lcrso
/usr/libexec/lcrso/vsf_ykd.lcrso
/usr/sbin/corosync(main program)
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-fplay
/usr/sbin/corosync-keygen(key generator; it reads from /dev/random and needs a large amount of entropy — if the entropy pool runs dry it blocks until more is produced, e.g. by typing on the keyboard)
/usr/sbin/corosync-objctl
/usr/sbin/corosync-pload
/usr/sbin/corosync-quorumtool
/usr/share/doc/corosync-1.2.7
/usr/share/doc/corosync-1.2.7/LICENSE
/usr/share/doc/corosync-1.2.7/SECURITY
/usr/share/man/man5/corosync.conf.5.gz
/usr/share/man/man8/confdb_overview.8.gz
/usr/share/man/man8/coroipc_overview.8.gz
/usr/share/man/man8/corosync-blackbox.8.gz
/usr/share/man/man8/corosync-cfgtool.8.gz
/usr/share/man/man8/corosync-cpgtool.8.gz
/usr/share/man/man8/corosync-fplay.8.gz
/usr/share/man/man8/corosync-keygen.8.gz
/usr/share/man/man8/corosync-objctl.8.gz
/usr/share/man/man8/corosync-pload.8.gz
/usr/share/man/man8/corosync-quorumtool.8.gz
/usr/share/man/man8/corosync.8.gz
/usr/share/man/man8/corosync_overview.8.gz
/usr/share/man/man8/cpg_overview.8.gz
/usr/share/man/man8/evs_overview.8.gz
/usr/share/man/man8/logsys_overview.8.gz
/usr/share/man/man8/sam_overview.8.gz
/usr/share/man/man8/votequorum_overview.8.gz
/var/lib/corosync
[root@node1 ~]# cd /etc/corosync/(change to the /etc/corosync directory)
[root@node1 corosync]# ls(list the current directory)
corosync.conf.example service.d uidgid.d
[root@node1 corosync]# cp corosync.conf.example corosync.conf(copy the example as the config file)
[root@node1 corosync]# vim corosync.conf(edit corosync.conf)
[root@node1 corosync]# man corosync.conf(read the corosync.conf man page)
totem { }(totem protocol)
This top level directive contains configuration options for the totem protocol.
logging { }(logging)
This top level directive contains configuration options for logging.
event { }(event service)
This top level directive contains configuration options for the event service.
ringnumber(identifies each redundant ring; prevents loops when multiple NICs are used)
This specifies the ring number for the interface. When using the redundant ring protocol, each
interface should specify separate ring numbers to uniquely identify to the membership protocol
which interface to use for which redundant ring. The ringnumber must start at 0.
bindnetaddr(the network address to bind to)
This specifies the network address the corosync executive should bind to. For example, if the
local interface is 192.168.5.92 with netmask 255.255.255.0, set bindnetaddr to 192.168.5.0. If
the local interface is 192.168.5.92 with netmask 255.255.255.192, set bindnetaddr to
192.168.5.64, and so forth.
This may also be an IPV6 address, in which case IPV6 networking will be used. In this case,
the full address must be specified and there is no automatic selection of the network interface
within a specific subnet as with IPv4.
If IPv6 networking is used, the nodeid field must be specified.
broadcast(broadcast)
This is optional and can be set to yes. If it is set to yes, the broadcast address will be
used for communication. If this option is set, mcastaddr should not be set.
mcastaddr(multicast address)
This is the multicast address used by corosync executive. The default should work for most
networks, but the network administrator should be queried about a multicast address to use.
Avoid 224.x.x.x because this is a "config" multicast address.
This may also be an IPV6 multicast address, in which case IPV6 networking will be used. If
IPv6 networking is used, the nodeid field must be specified.
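A candidate mcastaddr can be sanity-checked against the man page's advice — it must be a class D address (224.0.0.0–239.255.255.255), and 224.x.x.x should be avoided. The helper name check_mcast is made up for illustration:

```shell
# Sketch: classify a candidate multicast address by its first octet.
check_mcast() {
    first=${1%%.*}                      # first dotted octet
    if [ "$first" -lt 224 ] || [ "$first" -gt 239 ]; then
        echo "$1: not a multicast address"
    elif [ "$first" -eq 224 ]; then
        echo "$1: avoid (224.x.x.x is reserved/config space)"
    else
        echo "$1: ok"
    fi
}
check_mcast 226.99.6.17   # the address used in this setup -> ok
check_mcast 224.0.0.5     # reserved OSPF routers address  -> avoid
check_mcast 192.168.1.1   # unicast -> not a multicast address
```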
mcastport(multicast port)
This specifies the UDP port number. It is possible to use the same multicast address on a net-
work with the corosync services configured for different UDP ports. Please note corosync uses
two UDP ports mcastport (for mcast receives) and mcastport - 1 (for mcast sends). If you have
multiple clusters on the same network using the same mcastaddr please configure the mcastports
with a gap.
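Since corosync receives on mcastport and sends on mcastport - 1, two clusters sharing the same multicast address need their mcastports spaced at least two apart. A sketch (port numbers are illustrative):

```
# cluster A: receives on 5405, sends on 5404
mcastport: 5405
# cluster B: receives on 5407, sends on 5406 -- no overlap with cluster A
mcastport: 5407
```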
[root@node1 corosync]# vim corosync.conf (edit the corosync.conf configuration file)
compatibility: whitetank
totem {
version: 2
secauth: on (message authentication and encryption)
threads: 2 (worker threads)
interface {
ringnumber: 0
bindnetaddr: 172.16.100.0
mcastaddr: 226.99.6.17 (multicast address)
mcastport: 5405 (multicast port)
}
}
logging {
fileline: off (whether to print the file and line of each log message)
to_stderr: no (whether to send log messages to standard error)
to_logfile: yes (use a dedicated log file)
to_syslog: yes (also send log messages to syslog)
logfile: /var/log/cluster/corosync.log (log file path; the directory must be created first)
debug: off (debug output)
timestamp: on (timestamp every log message; each record then requires a system call to fetch the current time, which adds considerable I/O)
logger_subsys {
subsys: AMF (whether to log AMF subsystem messages)
debug: off
}
}
amf {
mode: disabled
}
service {
ver: 0
name: pacemaker
}
aisexec {
user: root
group: root
}
[root@node1 ~]# man corosync.conf (view the corosync.conf man page)
version (configuration file version)
This specifies the version of the configuration file. Currently the only valid version for
this directive is 2.
secauth (security authentication)
This specifies that HMAC/SHA1 authentication should be used to authenticate all messages. It
further specifies that all data should be encrypted with the sober128 encryption algorithm to
protect data from eavesdropping.
Enabling this option adds a 36 byte header to every message sent by totem which reduces total
throughput. Encryption and authentication consume 75% of CPU cycles in aisexec as measured
with gprof when enabled.
For 100mbit networks with 1500 MTU frame transmissions: A throughput of 9mb/sec is possible
with 100% cpu utilization when this option is enabled on 3ghz cpus. A throughput of 10mb/sec
is possible with 20% cpu utilization when this option is disabled on 3ghz cpus.
For gig-e networks with large frame transmissions: A throughput of 20mb/sec is possible when
this option is enabled on 3ghz cpus. A throughput of 60mb/sec is possible when this option is
disabled on 3ghz cpus.
The default is on.
rrp_mode (corosync can run in redundant ring protocol mode)
This specifies the mode of redundant ring, which may be none, active, or passive. Active
replication offers slightly lower latency from transmit to delivery in faulty network environ-
ments but with less performance. Passive replication may nearly double the speed of the totem
protocol if the protocol doesn't become cpu bound. The final option is none, in which case
only one network interface will be used to operate the totem protocol.
If only one interface directive is specified, none is automatically chosen. If multiple inter-
face directives are specified, only active or passive may be chosen.
threads (how many threads are used to encrypt and send multicast messages; meaningless when secauth is off, but with secauth on, multiple threads can improve performance)
This directive controls how many threads are used to encrypt and send multicast messages. If
secauth is off, the protocol will never use threaded sending. If secauth is on, this directive
allows systems to be configured to use multiple threads to encrypt and send multicast messages.
A thread directive of 0 indicates that no threaded send should be used. This mode offers best
performance for non-SMP systems.
The default is 0.
fileline (whether to print the file and line of each log message)
This specifies that file and line should be printed.
The default is off.
[root@node1 corosync]# corosync-keygen (generate the authentication key)
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
[root@node1 corosync]# ll (list the files in the current directory in detail)
total 40
-r-------- 1 root root 128 Apr 22 01:02 authkey
-rw-r--r-- 1 root root 510 Apr 22 01:02 corosync.conf
-rw-r--r-- 1 root root 436 Jul 28 2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28 2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28 2010 uidgid.d
Note: the key is saved as authkey with default mode 400; it is binary data, not a text file, so it cannot be viewed with cat.
[root@node1 corosync]# file authkey (check the file type of authkey)
authkey: data
[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/ (copy authkey and corosync.conf to /etc/corosync on node2; -p preserves the original file attributes)
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 510 0.5KB/s 00:00
[root@node1 corosync]# mkdir /var/log/cluster (create the log directory)
[root@node1 corosync]# ssh node2 'mkdir /var/log/cluster' (create the cluster directory on node2 remotely)
[root@node1 corosync]# service corosync start (start the corosync service)
Starting Corosync Cluster Engine (corosync): [ OK ]
[root@node1 corosync]# ssh node2 '/etc/init.d/corosync start' (start the corosync service on node2)
Starting Corosync Cluster Engine (corosync): [ OK ]
[root@node1 corosync]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log (check whether the corosync engine started normally; -e lets grep take multiple patterns)
Apr 22 01:07:16 corosync [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Apr 22 01:07:16 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
[root@node1 corosync]# grep TOTEM /var/log/cluster/corosync.log (check that the initial membership notifications went out normally)
Apr 22 01:31:18 corosync [TOTEM ] Initializing transport (UDP/IP).
Apr 22 01:31:18 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 22 01:31:18 corosync [TOTEM ] The network interface [172.16.100.6] is now up.
Apr 22 01:31:19 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
[root@node1 corosync]# grep ERROR: /var/log/cluster/corosync.log (check for errors during startup)
Apr 22 01:08:19 node1.magedu.com pengine: [32266]: ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined (no STONITH device has been configured for the cluster)
Apr 22 01:08:19 node1.magedu.com pengine: [32266]: ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Apr 22 01:08:19 node1.magedu.com pengine: [32266]: ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
[root@node1 corosync]# grep pcmk_startup /var/log/cluster/corosync.log (check that pacemaker started normally)
Apr 22 01:07:16 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Apr 22 01:07:16 corosync [pcmk ] Logging: Initialized pcmk_startup
Apr 22 01:07:16 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Apr 22 01:07:16 corosync [pcmk ] info: pcmk_startup: Service: 9
Apr 22 01:07:16 corosync [pcmk ] info: pcmk_startup: Local hostname: node1.magedu.com
[root@node1 corosync]# crm_mon (monitor the cluster)
============
Last updated: Fri Apr 22 01:32:59 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum (the current DC is node1, and the partition has quorum, i.e. the required majority of votes; no ping node is configured, so when one node goes offline a split is quite likely)
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes (two nodes configured, two votes expected)
0 Resources configured. (no resources yet)
============
Online: [ node1.magedu.com node2.magedu.com ] (online nodes)
crm has two modes:
Interactive:
configuration takes effect only after the commit command is executed
Batch:
takes effect immediately
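The difference can be sketched as follows, using the stonith-enabled property that is configured later in this walkthrough as the example:

```
# batch mode: the change takes effect immediately
crm configure property stonith-enabled=false

# interactive mode: the change takes effect only after commit
crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# commit
```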
6. Configure cluster properties: disable stonith
corosync enables stonith by default, but the current cluster has no STONITH device, so this default configuration is not yet usable. This can be verified with the following command:
# crm_verify -L
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-V may provide more details
We can first disable stonith with the following command:
# crm configure property stonith-enabled=false
View the current configuration with the following command:
# crm configure show
node node1.magedu.com
node node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
This shows that stonith has been disabled.
The crm and crm_verify commands above are command-line cluster management tools provided by pacemaker 1.0 and later; they can be executed on any node in the cluster.
7. Add resources to the cluster
corosync supports resource agent classes such as heartbeat, LSB and OCF; LSB and OCF are currently the most commonly used, while the stonith class is dedicated to configuring STONITH devices.
The classes supported by the current cluster can be viewed with the following command:
# crm ra classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
To list all resource agents under a given class, use commands like the following:
# crm ra list lsb
# crm ra list ocf heartbeat
# crm ra list ocf pacemaker
# crm ra list stonith
# crm ra info [class:[provider:]]resource_agent
For example:
# crm ra info ocf:heartbeat:IPaddr
8. Next, create an IP address resource for the web cluster to use when providing the web service. This can be done as follows:
Syntax:
primitive <rsc> [<class>:[<provider>:]]<type>
[params attr_list]
[operations id_spec]
[op op_type [<attribute>=<value>...] ...]
op_type :: start | stop | monitor
Example:
primitive apcfence stonith:apcsmart \
params ttydev=/dev/ttyS0 hostlist="node1 node2" \
op start timeout=60s \
op monitor interval=30m timeout=60s
Applied here:
# crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=172.16.100.1
The output of the following command shows that this resource has started on node1.magedu.com:
# crm status
============
Last updated: Tue Jun 14 19:31:05 2011
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
WebIP (ocf::heartbeat:IPaddr): Started node1.magedu.com
You can also run ifconfig on node1 to see that the address is active as an alias on eth0:
# ifconfig
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:AA:DD:CF
inet addr:172.16.100.1 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000
Then, from node2, stop the corosync service on node1 with the following command:
# ssh node1 -- /etc/init.d/corosync stop
Check the cluster status:
# crm status
============
Last updated: Tue Jun 14 19:37:23 2011
Stack: openais
Current DC: node2.magedu.com - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.magedu.com ]
OFFLINE: [ node1.magedu.com ]
The output shows that node1.magedu.com is offline, yet the WebIP resource failed to start on node2.magedu.com. This is because the cluster partition is now "WITHOUT quorum": having lost quorum, the cluster itself no longer meets the conditions for normal operation, which is unreasonable for a cluster with only two nodes. We can therefore tell the cluster to ignore the quorum check with the following command:
# crm configure property no-quorum-policy=ignore (the policy to apply when quorum is lost; ignore means carry on regardless)
Moments later, the cluster starts the resource on node2, the node still running, as shown below:
# crm status
============
Last updated: Tue Jun 14 19:43:42 2011
Stack: openais
Current DC: node2.magedu.com - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.magedu.com ]
OFFLINE: [ node1.magedu.com ]
WebIP (ocf::heartbeat:IPaddr): Started node2.magedu.com
With verification complete, start node1.magedu.com again normally:
# ssh node1 -- /etc/init.d/corosync start
After node1.magedu.com comes back up, the WebIP resource will most likely migrate from node2.magedu.com back to node1.magedu.com. Every such move between nodes leaves the resource unreachable for a period of time, so after a resource has failed over to another node we sometimes want to prevent it from flowing back even when the original node recovers. This can be achieved by defining resource stickiness, which can be set either when the resource is created or afterwards.
Stickiness value ranges and their effects:
0: the default. The resource is placed in the most suitable position in the system, meaning it is moved when a better- or worse-loaded node becomes available. This option is essentially automatic failback, except that the resource may move to a node other than the previously active one;
greater than 0: the resource prefers to remain where it is, but will move if a more suitable node becomes available. Higher values indicate a stronger preference to stay;
less than 0: the resource prefers to move away from its current location. Higher absolute values indicate a stronger preference to leave;
INFINITY: unless forced off because the node is no longer eligible to run the resource (node shutdown, node standby, migration-threshold reached, or configuration change), the resource always stays where it is. This option is almost equivalent to completely disabling automatic failback;
-INFINITY: the resource always moves away from its current location;
Here we can set a default stickiness value for resources as follows:
# crm configure rsc_defaults resource-stickiness=100 (rsc_defaults sets default resource attributes)
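Stickiness can also be attached to a single resource as a meta attribute instead of a cluster-wide default. A hedged sketch, reusing the WebIP resource defined earlier (the meta syntax follows the primitive usage shown later in this walkthrough):

```
crm configure primitive WebIP ocf:heartbeat:IPaddr \
    params ip=172.16.100.1 \
    meta resource-stickiness=100
```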
HA1:
[root@node1 corosync]# crm_mon (monitor the cluster)
============
Last updated: Fri Apr 22 01:32:59 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
Note: a healthy corosync or heartbeat cluster should ideally have three nodes; otherwise unexpected situations may occur, for example a service or resource failing to start automatically on the other node after one node goes down.
[root@node1 corosync]# crm (tab completion lists the crm-related commands)
crm crm_attribute crm_failcount crm_mon crm_report crm_shadow crm_standby crm_verify
crmadmin crm_diff crm_master crm_node crm_resource crm_simulate crm_uuid
Note: the earlier crm_sh has been replaced by crm here, and crm is far more powerful than crm_sh.
[root@node1 corosync]# crm (enter the crm shell)
crm(live)# help (show help)
This is the CRM command line interface program.
Available commands:
cib manage shadow CIBs
resource resources management
configure CRM cluster configuration
node nodes management
options user preferences
ra resource agents information center
status show cluster status
quit,bye,exit exit the program
help show help
end,cd,up go back one level
crm(live)# resource (enter resource mode)
crm(live)resource# help (show help)
At this level resources may be managed.
All (or almost all) commands are implemented with the CRM tools
such as `crm_resource(8)`.
Available commands:
status show status of resources
start start a resource
stop stop a resource
restart restart a resource
promote promote a master-slave resource
demote demote a master-slave resource
manage put a resource into managed mode
unmanage put a resource into unmanaged mode
migrate migrate a resource to another node
unmigrate unmigrate a resource to another node
param manage a parameter of a resource
meta manage a meta attribute
utilization manage a utilization attribute
failcount manage failcounts
cleanup cleanup resource status
refresh refresh CIB from the LRM status
reprobe probe for resources not started by the CRM
quit exit the program
help show help
end go back one level
crm(live)resource# cd .. (go back one level)
crm(live)# resource (enter resource mode)
crm(live)resource# help status (show help for the status command)
Print resource status. If the resource parameter is left out
status of all resources is printed.
Usage:
...............
status [<rsc>]
...............
crm(live)resource# cd (go back one level)
crm(live)# help (show help)
This is the CRM command line interface program.
Available commands:
cib manage shadow CIBs
resource resources management
configure CRM cluster configuration
node nodes management
options user preferences
ra resource agents information center
status show cluster status
quit,bye,exit exit the program
help show help
end,cd,up go back one level
crm(live)# node (enter node mode)
crm(live)node# help (show help)
Node management and status commands.
Available commands:
status show nodes' status
show show node
standby put node into standby
online set node online
fence fence node
clearstate Clear node state
delete delete node
attribute manage attributes
utilization manage utilization attributes
status-attr manage status attributes
quit exit the program
help show help
end go back one level
crm(live)node# cd (go back one level)
crm(live)# help (show help)
This is the CRM command line interface program.
Available commands:
cib manage shadow CIBs
resource resources management
configure CRM cluster configuration
node nodes management
options user preferences
ra resource agents information center
status show cluster status
quit,bye,exit exit the program
help show help
end,cd,up go back one level
crm(live)# configure (enter configure mode)
crm(live)configure# help (show help)
This level enables all CIB object definition commands.
The configuration may be logically divided into four parts:
nodes, resources, constraints, and (cluster) properties and
attributes. Each of these commands support one or more basic CIB
objects.
Nodes and attributes describing nodes are managed using the
`node` command.
Commands for resources are:
- `primitive`
- `monitor`
- `group`
- `clone`
- `ms`/`master` (master-slave)
There are three types of constraints:
- `location`
- `colocation`
- `order`
Finally, there are the cluster properties, resource meta
attributes defaults, and operations defaults. All are just a set
of attributes. These attributes are managed by the following
commands:
- `property`
- `rsc_defaults`
- `op_defaults`
In addition to the cluster configuration, the Access Control
Lists (ACL) can be setup to allow access to parts of the CIB for
users other than `root` and `hacluster`. The following commands
manage ACL:
- `user`
- `role`
The changes are applied to the current CIB only on ending the
configuration session or using the `commit` command.
Comments start with `#` in the first line. The comments are tied
to the element which follows. If the element moves, its comments
will follow.
Available commands:
node define a cluster node
primitive define a resource (a primitive, i.e. native, resource)
monitor add monitor operation to a primitive
group define a group
clone define a clone
ms define a master-slave resource
location a location preference (location constraint)
colocation colocate resources (colocation constraint)
order order resources (order constraint)
property set a cluster property
rsc_defaults set resource defaults
role define role access rights
user define user access rights
op_defaults set resource operations defaults
show display CIB objects (the cluster information base)
edit edit CIB objects
filter filter CIB objects
delete delete CIB objects
default-timeouts set timeouts for operations to minimums from the meta-data
rename rename a CIB object
refresh refresh from CIB
erase erase the CIB
ptest show cluster actions if changes were committed
cib CIB shadow management
cibstatus CIB status management and editing
template edit and import a configuration from a template
commit commit the changes to the CIB
verify verify the CIB with crm_verify
upgrade upgrade the CIB to version 1.0
save save the CIB to a file
load import the CIB from a file
xml raw xml
quit exit the program
help show help
end go back one level
crm(live)configure# cd (go back one level)
crm(live)# help (show help)
This is the CRM command line interface program.
Available commands:
cib manage shadow CIBs
resource resources management
configure CRM cluster configuration
node nodes management
options user preferences
ra resource agents information center
status show cluster status
quit,bye,exit exit the program
help show help
end,cd,up go back one level
crm(live)# ra (enter ra mode)
crm(live)ra# help (show help)
This level contains commands which show various information about
the installed resource agents. It is available both at the top
level and at the `configure` level.
Available commands:
classes list classes and providers
list list RA for a class (and provider) (list all RAs a given provider supplies under a given class)
meta show meta data for a RA
providers show providers for a RA and a class
quit exit the program
help show help
end go back one level
crm(live)ra# classes (show the available classes)
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
crm(live)ra# help list (show help for the list command)
List available resource agents for the given class. If the class
is `ocf`, supply a provider to get agents which are available
only from that provider.
Usage:
...............
list <class> [<provider>]
...............
Example:
...............
list ocf pacemaker
...............
crm(live)ra# list lsb (list all RAs in the lsb class)
NetworkManager acpid anacron apmd atd auditd
autofs avahi-daemon avahi-dnsconfd bluetooth capi conman
corosync cpuspeed crond cups cups-config-daemon dnsmasq
dund firstboot functions gpm haldaemon halt
heartbeat hidd hplip httpd ip6tables ipmi
iptables ipvsadm irda irqbalance iscsi iscsid
isdn kdump killall krb524 kudzu lm_sensors
logd lvm2-monitor mcstrans mdmonitor mdmpd messagebus
microcode_ctl multipathd netconsole netfs netplugd network
nfs nfslock nscd ntpd openibd pacemaker
pand pcscd portmap psacct rawdevices rdisc
readahead_early readahead_later restorecond rhnsd rhsmcertd rpcgssd
rpcidmapd rpcsvcgssd saslauthd sendmail setroubleshoot single
smartd sshd syslog vncserver wdaemon winbind
wpa_supplicant xfs xinetd ypbind yum-updatesd
crm(live)ra# list ocf (list all RAs in the ocf class)
AoEtarget AudibleAlarm CTDB ClusterMon Delay Dummy
EvmsSCC Evmsd Filesystem HealthCPU HealthSMART ICP
IPaddr IPaddr2 IPsrcaddr IPv6addr LVM LinuxSCSI
MailTo ManageRAID ManageVE Pure-FTPd Raid1 Route
SAPDatabase SAPInstance SendArp ServeRAID SphinxSearchDaemon Squid
Stateful SysInfo SystemHealth VIPArip VirtualDomain WAS
WAS6 WinPopup Xen Xinetd anything apache
conntrackd controld db2 drbd eDir88 exportfs
fio iSCSILogicalUnit iSCSITarget ids iscsi jboss
ldirectord mysql mysql-proxy nfsserver nginx o2cb
oracle oralsnr pgsql ping pingd portblock
postfix proftpd rsyncd scsi2reservation sfex syslog-ng
tomcat vmware
crm(live)ra# list ocf pacemaker (list the RAs supplied by the pacemaker provider under the ocf class)
ClusterMon Dummy HealthCPU HealthSMART Stateful SysInfo SystemHealth controld o2cb
ping pingd
crm(live)ra# list ocf heartbeat (list the RAs supplied by the heartbeat provider under the ocf class)
AoEtarget AudibleAlarm CTDB ClusterMon Delay Dummy
EvmsSCC Evmsd Filesystem ICP IPaddr IPaddr2
IPsrcaddr IPv6addr LVM LinuxSCSI MailTo ManageRAID
ManageVE Pure-FTPd Raid1 Route SAPDatabase SAPInstance
SendArp ServeRAID SphinxSearchDaemon Squid Stateful SysInfo
VIPArip VirtualDomain WAS WAS6 WinPopup Xen
Xinetd anything apache conntrackd db2 drbd
eDir88 exportfs fio iSCSILogicalUnit iSCSITarget ids
iscsi jboss ldirectord mysql mysql-proxy nfsserver
nginx oracle oralsnr pgsql pingd portblock
postfix proftpd rsyncd scsi2reservation sfex syslog-ng
tomcat vmware
crm(live)ra# help (show help)
This level contains commands which show various information about
the installed resource agents. It is available both at the top
level and at the `configure` level.
Available commands:
classes list classes and providers
list list RA for a class (and provider)
meta show meta data for a RA
providers show providers for a RA and a class
quit exit the program
help show help
end go back one level
crm(live)ra# help meta (show help for the meta command)
Show the meta-data of a resource agent type. This is where users
can find information on how to use a resource agent.
Usage:
...............
meta [<class>:[<provider>:]]<type> (class, provider, and which resource agent type)
meta <type> <class> [<provider>] (obsolete)
...............
Example:
...............
meta apache
meta ocf:pacemaker:Dummy
meta stonith:ipmilan
...............
crm(live)ra# meta ocf:heartbeat:IPaddr (show the metadata of the IPaddr resource agent from the heartbeat provider in the ocf class)
Manages virtual IPv4 addresses (portable version) (ocf:heartbeat:IPaddr)
This script manages IP alias IP addresses
It can add an IP alias, or remove one.
Parameters (* denotes required, [] the default):
ip* (string): IPv4 address
The IPv4 address to be configured in dotted quad notation, for example
"192.168.1.1".
nic (string, [eth0]): Network interface
The base network interface on which the IP address will be brought
online.
If left empty, the script will try and determine this from the
routing table.
Do NOT specify an alias interface in the form eth0:1 or anything here;
rather, specify the base interface only.
Prerequisite:
There must be at least one static IP address, which is not managed by
the cluster, assigned to the network interface.
If you can not assign any static IP address on the interface,
modify this kernel parameter:
sysctl -w net.ipv4.conf.all.promote_secondaries=1
(or per device)
cidr_netmask (string): Netmask
The netmask for the interface in CIDR format. (ie, 24), or in
dotted quad notation 255.255.255.0).
If unspecified, the script will also try to determine this from the
routing table.
broadcast (string): Broadcast address
Broadcast address associated with the IP. If left empty, the script will
determine this from the netmask.
iflabel (string): Interface label
You can specify an additional label for your IP address here.
lvs_support (boolean, [false]): Enable support for LVS DR
Enable support for LVS Direct Routing configurations. In case a IP
address is stopped, only move it to the loopback device to allow the
local node to continue to service requests, but no longer advertise it
on the network.
local_stop_script (string):
Script called when the IP is released
local_start_script (string):
Script called when the IP is added
ARP_INTERVAL_MS (integer, [500]): milliseconds between gratuitous ARPs
milliseconds between ARPs
ARP_REPEAT (integer, [10]): repeat count
How many gratuitous ARPs to send out when bringing up a new address
ARP_BACKGROUND (boolean, [yes]): run in background
run in background (no longer any reason to do this)
ARP_NETMASK (string, [ffffffffffff]): netmask for ARP
netmask for ARP - in nonstandard hexadecimal format.
Operations' defaults (advisory minimum):
start timeout=20s (how long to wait when starting the resource)
stop timeout=20s (how long to wait when stopping the resource)
monitor interval=5s timeout=20s (how often to run the check, and how long to wait for it)
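As a side note, these advisory minimums can be made explicit when defining the resource. A sketch, not part of this transcript (the op values simply mirror the minimums listed above):

```
crm configure primitive WebIP ocf:heartbeat:IPaddr \
    params ip=172.16.100.1 \
    op start timeout=20s \
    op stop timeout=20s \
    op monitor interval=5s timeout=20s
```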
crm(live)ra# help providers (show help for the providers command)
List providers for a resource agent type. The class parameter
defaults to `ocf`.
Usage:
...............
providers <type> [<class>]
...............
Example:
...............
providers apache
...............
crm(live)ra# providers IPaddr (show the providers of the IPaddr resource agent)
heartbeat
crm(live)ra# cd (go back one level)
crm(live)# configure (enter configure mode)
crm(live)configure# help (show help)
This level enables all CIB object definition commands.
The configuration may be logically divided into four parts:
nodes, resources, constraints, and (cluster) properties and
attributes. Each of these commands support one or more basic CIB
objects.
Nodes and attributes describing nodes are managed using the
`node` command.
Commands for resources are:
- `primitive`
- `monitor`
- `group`
- `clone`
- `ms`/`master` (master-slave)
There are three types of constraints:
- `location`
- `colocation`
- `order`
Finally, there are the cluster properties, resource meta
attributes defaults, and operations defaults. All are just a set
of attributes. These attributes are managed by the following
commands:
- `property`
- `rsc_defaults`
- `op_defaults`
In addition to the cluster configuration, the Access Control
Lists (ACL) can be setup to allow access to parts of the CIB for
users other than `root` and `hacluster`. The following commands
manage ACL:
- `user`
- `role`
The changes are applied to the current CIB only on ending the
configuration session or using the `commit` command.
Comments start with `#` in the first line. The comments are tied
to the element which follows. If the element moves, its comments
will follow.
Available commands:
node define a cluster node
primitive define a resource
monitor add monitor operation to a primitive
group define a group
clone define a clone
ms define a master-slave resource
location a location preference
colocation colocate resources
order order resources
property set a cluster property
rsc_defaults set resource defaults
role define role access rights
user define user access rights
op_defaults set resource operations defaults
show display CIB objects
edit edit CIB objects
filter filter CIB objects
delete delete CIB objects
default-timeouts set timeouts for operations to minimums from the meta-data
rename rename a CIB object
refresh refresh from CIB
erase erase the CIB
ptest show cluster actions if changes were committed
cib CIB shadow management
cibstatus CIB status management and editing
template edit and import a configuration from a template
commit commit the changes to the CIB
verify verify the CIB with crm_verify (check the configuration for syntax errors)
upgrade upgrade the CIB to version 1.0
save save the CIB to a file
load import the CIB from a file
xml raw xml
quit exit the program
help show help
end go back one level
crm(live)configure# verify (check the cluster configuration for errors)
crm_verify[504]: 2016/04/22_02:41:47 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[504]: 2016/04/22_02:41:47 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[504]: 2016/04/22_02:41:47 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
Note: these errors mean no STONITH is configured. We have no STONITH device, so we must either configure one or disable STONITH entirely, which requires setting a cluster property.
crm(live)configure# property (show usage of the property command)
usage: property [$id=<set_id>] <option>=<value> ($id names the property set)
crm(live)configure# show (show the current configuration)
node node1.magedu.com
node node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \ (DC version)
cluster-infrastructure="openais" \ (cluster infrastructure)
expected-quorum-votes="2" (expected quorum votes)
crm(live)configure# property stonith-enabled=false (disable stonith)
crm(live)configure# verify (check for errors)
crm(live)configure# commit (commit to the CIB)
crm(live)configure# show (show the current configuration)
node node1.magedu.com
node node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# help (show help)
This level enables all CIB object definition commands.
The configuration may be logically divided into four parts:
nodes, resources, constraints, and (cluster) properties and
attributes. Each of these commands support one or more basic CIB
objects.
Nodes and attributes describing nodes are managed using the
`node` command.
Commands for resources are:
- `primitive`
- `monitor`
- `group`
- `clone`
- `ms`/`master` (master-slave)
There are three types of constraints:
- `location`
- `colocation`
- `order`
Finally, there are the cluster properties, resource meta
attributes defaults, and operations defaults. All are just a set
of attributes. These attributes are managed by the following
commands:
- `property`
- `rsc_defaults`
- `op_defaults`
In addition to the cluster configuration, the Access Control
Lists (ACL) can be setup to allow access to parts of the CIB for
users other than `root` and `hacluster`. The following commands
manage ACL:
- `user`
- `role`
The changes are applied to the current CIB only on ending the
configuration session or using the `commit` command.
Comments start with `#` in the first line. The comments are tied
to the element which follows. If the element moves, its comments
will follow.
Available commands:
node define a cluster node
primitive define a resource
monitor add monitor operation to a primitive
group define a group
clone define a clone
ms define a master-slave resource
location a location preference
colocation colocate resources
order order resources
property set a cluster property
rsc_defaults set resource defaults
role define role access rights
user define user access rights
op_defaults set resource operations defaults
show display CIB objects
edit edit CIB objects
filter filter CIB objects
delete delete CIB objects
default-timeouts set timeouts for operations to minimums from the meta-data
rename rename a CIB object
refresh refresh from CIB
erase erase the CIB
ptest show cluster actions if changes were committed
cib CIB shadow management
cibstatus CIB status management and editing
template edit and import a configuration from a template
commit commit the changes to the CIB
verify verify the CIB with crm_verify (check the configuration for syntax errors)
upgrade upgrade the CIB to version 1.0
save save the CIB to a file
load import the CIB from a file
xml raw xml
quit exit the program
help show help
end go back one level
crm(live)configure# help primitive (show help for the primitive command)
The primitive command describes a resource. It may be referenced
only once in group, clone, or master-slave objects. If it's not
referenced, then it is placed as a single resource in the CIB.
Operations may be specified in three ways. "Anonymous" as a
simple list of "op" specifications. Use that if you don't want to
reference the set of operations elsewhere. That's by far the most
common way to define operations. If reusing operation sets is
desired, use the "operations" keyword along with the id to give
the operations set a name and the id-ref to reference another set
of operations.
Operation's attributes which are not recognized are saved as
instance attributes of that operation. A typical example is
`OCF_CHECK_LEVEL`.
For multistate resources, roles are specified as `role=<role>`.
Usage:
...............
primitive <rsc> [<class>:[<provider>:]]<type> (rsc is the resource name, class the RA class, then provider and type)
[params attr_list]
[meta attr_list]
[utilization attr_list]
[operations id_spec] (a named, reusable set of operations)
[op op_type [<attribute>=<value>...] ...]
attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
id_spec :: $id=<id> | $id-ref=<id>
op_type :: start | stop | monitor (operation type)
...............
Example:
...............
primitive apcfence stonith:apcsmart \ (apcfence is the name, stonith the class, apcsmart the resource agent)
params ttydev=/dev/ttyS0 hostlist="node1 node2" \ (params introduces attribute=value pairs; values containing spaces are quoted; \ continues the line)
op start timeout=60s \ (how long to wait for start before timing out)
op monitor interval=30m timeout=60s (how often to check, and how long to wait)
primitive www8 apache \
params configfile=/etc/apache/www8.conf \
operations $id-ref=apache_ops
primitive db0 mysql \
params config=/etc/mysql/db0.conf \
op monitor interval=60s \
op monitor interval=300s OCF_CHECK_LEVEL=10
primitive r0 ocf:linbit:drbd \
params drbd_resource=r0 \
op monitor role=Master interval=60s \
op monitor role=Slave interval=300s
...............
[root@node1 ~]# crm(enter the crm shell)
crm(live)# ra(enter the ra level)
crm(live)ra# meta ocf:heartbeat:IPaddr(show metadata for the IPaddr resource agent: class ocf, provider heartbeat)
Manages virtual IPv4 addresses (portable version) (ocf:heartbeat:IPaddr)
This script manages IP alias IP addresses
It can add an IP alias, or remove one.
Parameters (* denotes required, [] the default):
ip* (string): IPv4 address
The IPv4 address to be configured in dotted quad notation, for example
"192.168.1.1".
nic (string, [eth0]): Network interface
The base network interface on which the IP address will be brought
online.
If left empty, the script will try and determine this from the
routing table.
Do NOT specify an alias interface in the form eth0:1 or anything here;
rather, specify the base interface only.
Prerequisite:
There must be at least one static IP address, which is not managed by
the cluster, assigned to the network interface.
If you can not assign any static IP address on the interface,
modify this kernel parameter:
sysctl -w net.ipv4.conf.all.promote_secondaries=1
(or per device)
cidr_netmask (string): Netmask
The netmask for the interface in CIDR format. (ie, 24), or in
dotted quad notation 255.255.255.0).
If unspecified, the script will also try to determine this from the
routing table.
broadcast (string): Broadcast address
Broadcast address associated with the IP. If left empty, the script will
determine this from the netmask.
iflabel (string): Interface label
You can specify an additional label for your IP address here.
lvs_support (boolean, [false]): Enable support for LVS DR
Enable support for LVS Direct Routing configurations. In case a IP
address is stopped, only move it to the loopback device to allow the
local node to continue to service requests, but no longer advertise it
on the network.
local_stop_script (string):
Script called when the IP is released
local_start_script (string):
Script called when the IP is added
ARP_INTERVAL_MS (integer, [500]): milliseconds between gratuitous ARPs
milliseconds between ARPs
ARP_REPEAT (integer, [10]): repeat count
How many gratuitous ARPs to send out when bringing up a new address
ARP_BACKGROUND (boolean, [yes]): run in background
run in background (no longer any reason to do this)
ARP_NETMASK (string, [ffffffffffff]): netmask for ARP
netmask for ARP - in nonstandard hexadecimal format.
Operations' defaults (advisory minimum):
start timeout=20s
stop timeout=20s
monitor interval=5s timeout=20s
(END)
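The advisory minimums printed above suggest also defining an explicit monitor operation when creating an IPaddr resource. A minimal sketch, using a hypothetical resource name and address (not the ones configured later in this walkthrough):

```shell
# crm configure fragment (sketch): an IPaddr primitive with an explicit
# monitor op, following the advisory minimums from "meta ocf:heartbeat:IPaddr".
crm configure primitive testip ocf:heartbeat:IPaddr \
  params ip=172.16.100.9 cidr_netmask=16 nic=eth0 \
  op monitor interval=5s timeout=20s
crm configure verify
crm configure commit
```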
crm(live)configure# primitive(define a primitive resource) webip(resource name) ocf(class):heartbeat(provider):IPaddr(resource agent) params(parameters) ip=172.16.100.1 nic=eth0 cidr_netmask=16
crm(live)configure# verify(check the cluster configuration)
crm(live)configure# commit(commit)
crm(live)configure# show(show the CIB objects)
node node1.magedu.com
node node2.magedu.com
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# xml(run the xml command)
usage: xml <xml>
crm(live)configure# show xml(show the configuration in XML format)
<?xml version="1.0" ?>
<cib admin_epoch="0" cib-last-written="Fri Apr 22 01:08:19 2016" crm_feature_set="3.0.5" dc-uuid="node1.magedu.com" epoch="7" have-quorum="1"
num_updates="4" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="node2.magedu.com" type="normal" uname="node2.magedu.com"/>
<node id="node1.magedu.com" type="normal" uname="node1.magedu.com"/>
</nodes>
<resources>
<primitive class="ocf" id="webip" provider="heartbeat" type="IPaddr">
<instance_attributes id="webip-instance_attributes">
<nvpair id="webip-instance_attributes-ip" name="ip" value="172.16.100.1"/>
<nvpair id="webip-instance_attributes-nic" name="nic" value="eth0"/>
<nvpair id="webip-instance_attributes-cidr_netmask" name="cidr_netmask" value="16"/>
</instance_attributes>
</primitive>
</resources>
<constraints/>
</configuration>
</cib>
(END)
crm(live)configure# exit(exit)
bye
[root@node1 ~]# crm_mon(monitor the cluster)
============
Last updated: Fri Apr 22 04:16:44 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
[root@node1 ~]# crm_mon --one-shot(display the cluster status once and exit)
============
Last updated: Fri Apr 22 04:17:38 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
[root@node1 ~]# crm(enter the crm shell)
crm(live)# status(show cluster status)
============
Last updated: Fri Apr 22 04:18:23 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
crm(live)# quit(exit)
bye
[root@node1 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 04:19:17 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
[root@node1 ~]# ifconfig(show network interface information)
eth0 Link encap:Ethernet HWaddr 00:0C:29:CC:FA:AE
inet addr:172.16.100.6 Bcast:172.16.100.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fecc:faae/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:308086 errors:0 dropped:0 overruns:0 frame:0
TX packets:344360 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:33099139 (31.5 MiB) TX bytes:45164348 (43.0 MiB)
Interrupt:67 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:CC:FA:AE
inet addr:172.16.100.1 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:64135 errors:0 dropped:0 overruns:0 frame:0
TX packets:64135 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6294473 (6.0 MiB) TX bytes:6294473 (6.0 MiB)
[root@node1 ~]# crm(enter the crm shell)
crm(live)# resource(enter the resource level)
crm(live)resource# stop webip(stop the webip resource)
crm(live)resource# cd(go back one level)
crm(live)# status(show cluster status)
============
Last updated: Fri Apr 22 04:58:07 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
crm(live)# resource
crm(live)resource# list(list resources)
webip (ocf::heartbeat:IPaddr) Stopped
crm(live)resource# start webip(start the webip resource)
crm(live)resource# list(list resources)
webip (ocf::heartbeat:IPaddr) Started
crm(live)resource# cd(go back one level)
crm(live)# status(show cluster status)
============
Last updated: Fri Apr 22 05:00:59 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
crm(live)# resource(enter the resource level)
crm(live)resource# help(show help)
At this level resources may be managed.
All (or almost all) commands are implemented with the CRM tools
such as `crm_resource(8)`.
Available commands:
status show status of resources
start start a resource
stop stop a resource
restart restart a resource
promote promote a master-slave resource
demote demote a master-slave resource
manage put a resource into managed mode
unmanage put a resource into unmanaged mode
migrate migrate a resource to another node(manually migrate a resource to another node)
unmigrate unmigrate a resource to another node
param manage a parameter of a resource
meta manage a meta attribute
utilization manage a utilization attribute
failcount manage failcounts
cleanup cleanup resource status
refresh refresh CIB from the LRM status
reprobe probe for resources not started by the CRM
quit exit the program
help show help
end go back one level
crm(live)resource# migrate webip(manually migrate the resource to the other node)
WARNING: Creating rsc_location constraint 'cli-standby-webip' with a score of -INFINITY for resource webip on node2.magedu.com.
This will prevent webip from running on node2.magedu.com until the constraint is removed using the 'crm_resource -U' command or manually with cibadmin
This will be the case even if node2.magedu.com is the last node in the cluster
This message can be disabled with -Q
crm(live)resource# list(list resources)
webip (ocf::heartbeat:IPaddr) Started
crm(live)resource# cd(go back one level)
crm(live)# status(show cluster status)
============
Last updated: Fri Apr 22 05:08:43 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
Note: the resource has migrated to node2.magedu.com.
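As the warning above says, `migrate` leaves behind a `cli-standby-webip` location constraint with a score of -INFINITY. A sketch of clearing it from the command line, using the resource name defined earlier:

```shell
# Remove the -INFINITY location constraint created by "crm resource migrate".
# The warning itself points at crm_resource -U; "unmigrate" in the crm shell
# is the equivalent operation.
crm_resource -U -r webip
# or, inside the crm shell:
#   crm resource unmigrate webip
```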
HA2:
[root@node2 ~]# ifconfig(show network interface information)
eth0 Link encap:Ethernet HWaddr 00:0C:29:8A:44:AB
inet addr:172.16.100.7 Bcast:172.16.100.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe8a:44ab/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:140623 errors:0 dropped:0 overruns:0 frame:0
TX packets:91933 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:23470489 (22.3 MiB) TX bytes:16734883 (15.9 MiB)
Interrupt:67 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:8A:44:AB
inet addr:172.16.100.1 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:14176 errors:0 dropped:0 overruns:0 frame:0
TX packets:14176 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1945507 (1.8 MiB) TX bytes:1945507 (1.8 MiB)
HA1:
crm(live)# resource(enter the resource level)
crm(live)resource# unmigrate webip(clear the migration constraint)
crm(live)resource# list(list resources)
webip (ocf::heartbeat:IPaddr) Started
crm(live)resource# cd(go back one level)
crm(live)# status(show cluster status)
============
Last updated: Fri Apr 22 05:12:58 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
Note: the resource is still on node2.magedu.com. "unmigrate" only removes the migration constraint; it neither moves the resource back nor moves it away.
crm(live)# resource(enter the resource level)
crm(live)resource# migrate webip(migrate the webip resource)
WARNING: Creating rsc_location constraint 'cli-standby-webip' with a score of -INFINITY for resource webip on node2.magedu.com.
This will prevent webip from running on node2.magedu.com until the constraint is removed using the 'crm_resource -U' command or manually with cibadmin
This will be the case even if node2.magedu.com is the last node in the cluster
This message can be disabled with -Q
crm(live)resource# cd(go back one level)
crm(live)# status(show cluster status)
============
Last updated: Fri Apr 22 05:14:46 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
crm(live)# quit(exit)
bye
[root@node1 ~]# yum -y install httpd(install httpd from the yum repository)
HA2:
[root@node2 ~]# yum -y install httpd(install httpd from the yum repository)
HA1:
[root@node1 ~]# setenforce 0(put SELinux into permissive mode)
HA2:
[root@node2 ~]# setenforce 0(put SELinux into permissive mode)
HA1:
[root@node1 ~]# echo "<h1>node1.magedu.com</h1>" >> /var/www/html/index.html(append the test page to index.html)
[root@node1 ~]# service httpd start(start the httpd service)
Starting httpd: [ OK ]
Test: browse to 172.16.100.6 from a Windows IE browser; the page is reachable.

[root@node1 ~]# service httpd stop(stop the httpd service)
Stopping httpd: [ OK ]
[root@node1 ~]# chkconfig httpd off(disable httpd from starting at boot)
HA2:
[root@node2 ~]# echo "<h1>node2.magedu.com</h1>" >> /var/www/html/index.html(append the test page to index.html)
[root@node2 ~]# service httpd start(start the httpd service)
Starting httpd: [ OK ]
Test: browse to 172.16.100.7 from a Windows IE browser; the page is reachable.

[root@node2 ~]# service httpd stop(stop the httpd service)
Stopping httpd: [ OK ]
[root@node2 ~]# chkconfig httpd off(disable httpd from starting at boot)
HA1:
[root@node1 ~]# crm(enter the crm shell)
crm(live)# ra(enter the ra level)
crm(live)ra# providers httpd(list providers for httpd; lsb agents have no provider, so nothing is printed)
crm(live)ra# classes(list RA classes and providers)
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
crm(live)ra# list lsb(list resource agents in the lsb class)
NetworkManager acpid anacron apmd atd auditd
autofs avahi-daemon avahi-dnsconfd bluetooth capi conman
corosync cpuspeed crond cups cups-config-daemon dnsmasq
dund firstboot functions gpm haldaemon halt
heartbeat hidd hplip httpd ip6tables ipmi
iptables ipvsadm irda irqbalance iscsi iscsid
isdn kdump killall krb524 kudzu lm_sensors
logd lvm2-monitor mcstrans mdmonitor mdmpd messagebus
microcode_ctl multipathd netconsole netfs netplugd network
nfs nfslock nscd ntpd openibd pacemaker
pand pcscd portmap psacct rawdevices rdisc
readahead_early readahead_later restorecond rhnsd rhsmcertd rpcgssd
rpcidmapd rpcsvcgssd saslauthd sendmail setroubleshoot single
smartd sshd syslog vncserver wdaemon winbind
wpa_supplicant xfs xinetd ypbind yum-updatesd
crm(live)ra# meta lsb:httpd(show the resource agent's metadata)
lsb:httpd
Apache is a World Wide Web server. It is used to serve \
HTML files and CGI.
Operations' defaults (advisory minimum):
start timeout=15
stop timeout=15
status timeout=15
restart timeout=15
force-reload timeout=15
monitor interval=15 timeout=15 start-delay=15
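The httpd primitive defined next sets only a start timeout. Per the advisory minimums just shown, a monitor operation could be added as well; a sketch (an alternative definition, not the one used below):

```shell
# crm configure fragment (sketch): lsb:httpd with a monitor op, using the
# advisory minimums printed by "meta lsb:httpd".
crm configure primitive httpd lsb:httpd \
  op start timeout=15 \
  op monitor interval=15 timeout=15 start-delay=15
```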
crm(live)ra# cd(go back one level)
crm(live)# configure(enter the configure level)
crm(live)configure# primitive httpd lsb:httpd op start timeout=20s(define primitive httpd: class lsb, agent httpd; op is the operation type, with a 20s start timeout)
crm(live)configure# show(show the configuration)
node node1.magedu.com
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta target-role="Started" is-managed="true"
location cli-standby-webip webip \
rule $id="cli-standby-rule-webip" -inf: #uname eq node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# verify(check the cluster configuration)
crm(live)configure# commit(commit)
crm(live)configure# show(show the configuration)
node node1.magedu.com
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta target-role="Started" is-managed="true"
location cli-standby-webip webip \
rule $id="cli-standby-rule-webip" -inf: #uname eq node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
HA2:
[root@node2 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 05:41:42 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
Note: webip started on node1 and httpd started on node2. By default the cluster balances placement, so resources tend to run on different nodes.
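If you do not want to use a group, the same "keep them together, in order" effect can be sketched with constraints instead, using the resource names defined above (hypothetical constraint IDs):

```shell
# crm configure fragment (sketch): colocation + order constraints as an
# alternative to putting webip and httpd into one group.
crm configure colocation httpd_with_webip inf: httpd webip   # run httpd where webip runs
crm configure order webip_before_httpd inf: webip httpd      # start webip first
crm configure verify
crm configure commit
```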
crm(live)configure# help group(show help for the group command)
The `group` command creates a group of resources.
Usage:
...............
group <name>(group name) <rsc> [<rsc>...](member resource names...)
[meta attr_list]
[params attr_list]
attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
...............
Example:
...............
group internal_www disk0 fs0 internal_ip apache \
meta target_role=stopped
...............
(END)
crm(live)configure# group webservice webip httpd(create group webservice and add the webip and httpd primitives to it)
INFO: resource references in location:cli-standby-webip updated
crm(live)configure# verify(check the cluster configuration)
crm(live)configure# commit(commit)
HA2:
[root@node2 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 05:45:59 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
Note: the webip and httpd resources now run on the same node.
Test: browse to 172.16.100.1 from a Windows IE browser; the page is reachable.

HA1:
crm(live)configure# cd(go back one level)
crm(live)# node(enter the node level)
crm(live)node# help(show help)
Node management and status commands.
Available commands:
status show nodes' status
show show node
standby put node into standby(make the current node a standby node)
online set node online
fence fence node
clearstate Clear node state
delete delete node
attribute manage attributes
utilization manage utilization attributes
status-attr manage status attributes
quit exit the program
help show help
end go back one level
crm(live)node# quit(exit)
bye
[root@node1 ~]# crm node standby(put the current node into standby)
[root@node1 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 05:51:24 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Node node1.magedu.com: standby
Online: [ node2.magedu.com ]
Note: the resources are gone. They should have started on node2, but with only two nodes, taking one offline leaves the cluster without quorum, and without quorum resources will not start: the default no-quorum-policy stops all resources. It can instead be set to "freeze" (keep running resources but start nothing new) or "ignore" (keep starting resources even without quorum).
[root@node1 ~]# crm node online(bring the current node back online)
[root@node1 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:02:16 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
[root@node1 ~]# crm configure(enter the configure level of the crm shell)
crm(live)configure# show(show the cluster configuration)
node node1.magedu.com \
attributes standby="off"
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta target-role="Started" is-managed="true"
group webservice webip httpd
location cli-standby-webip webservice \
rule $id="cli-standby-rule-webip" -inf: #uname eq node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
Note: even without quorum, the resources can still be made to start:
crm(live)configure# property no-quorum-policy=ignore(set the cluster property: ignore loss of quorum)
crm(live)configure# verify(check the cluster configuration)
crm(live)configure# commit(commit)
crm(live)configure# exit(exit)
bye
[root@node1 ~]# crm configure show(show the cluster configuration)
node node1.magedu.com \
attributes standby="off"
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta target-role="Started" is-managed="true"
group webservice webip httpd
location cli-standby-webip webservice \
rule $id="cli-standby-rule-webip" -inf: #uname eq node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
[root@node1 ~]# crm node standby(put the current node into standby)
[root@node1 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:27:37 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Node node1.magedu.com: standby
Online: [ node2.magedu.com ]
[root@node1 ~]# crm node online(bring the current node back online)
[root@node1 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:28:47 2016
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
HA2:
[root@node2 ~]# ssh node1 '/etc/init.d/corosync stop'(stop the corosync service on node1)
Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
Waiting for corosync services to unload:.......[ OK ]
[root@node2 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:30:14 2016
Stack: openais
Current DC: node2.magedu.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.magedu.com ]
OFFLINE: [ node1.magedu.com ]
[root@node2 ~]# crm configure show(show the resource configuration)
node node1.magedu.com \
attributes standby="off"
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta target-role="Started" is-managed="true"
group webservice webip httpd
location cli-standby-webip webservice \
rule $id="cli-standby-rule-webip" -inf: #uname eq node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
[root@node2 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:31:40 2016
Stack: openais
Current DC: node2.magedu.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.magedu.com ]
OFFLINE: [ node1.magedu.com ]
[root@node2 ~]# ssh node1 '/etc/init.d/corosync start'(start the corosync service on node1)
Starting Corosync Cluster Engine (corosync): [ OK ]
[root@node2 ~]# crm(enter the crm shell)
crm(live)# resource(enter the resource level)
crm(live)resource# stop webservice(stop the webservice group)
crm(live)resource# list(list resources)
Resource Group: webservice
webip (ocf::heartbeat:IPaddr) Stopped
httpd (lsb:httpd) Stopped
crm(live)resource# cleanup webservice(clean up the resource group's status)
Cleaning up webip on node1.magedu.com
Cleaning up webip on node2.magedu.com
Cleaning up httpd on node1.magedu.com
Cleaning up httpd on node2.magedu.com
Waiting for 5 replies from the CRMd..... OK
crm(live)resource# cleanup webip(clean up the webip resource's status)
Cleaning up webip on node1.magedu.com
Cleaning up webip on node2.magedu.com
Waiting for 3 replies from the CRMd... OK
crm(live)resource# cleanup httpd(clean up the httpd resource's status)
Cleaning up httpd on node1.magedu.com
Cleaning up httpd on node2.magedu.com
Waiting for 3 replies from the CRMd... OK
crm(live)resource# list(list resources)
Resource Group: webservice
webip (ocf::heartbeat:IPaddr) Stopped
httpd (lsb:httpd) Stopped
crm(live)resource# cd(go back one level)
crm(live)# node(enter the node level)
crm(live)node# clearstate node1.magedu.com(clear node1's state)
Do you really want to drop state for node node1.magedu.com? y
crm(live)node# clearstate node2.magedu.com(clear node2's state)
Do you really want to drop state for node node2.magedu.com? y
crm(live)node# cd(go back one level)
crm(live)# resource(enter the resource level)
crm(live)resource# start webservice(start the webservice group)
crm(live)resource# list(list resources)
Resource Group: webservice
webip (ocf::heartbeat:IPaddr) Stopped
httpd (lsb:httpd) Stopped
Note: the resources fail to start; something is wrong.
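When a resource refuses to start like this, the usual recovery sequence (the same commands used in the steps that follow) can be sketched as:

```shell
# Sketch: typical recovery steps when a resource will not start.
crm resource cleanup webservice   # clear stale operation history and failcounts
crm resource reprobe              # re-detect resource state on all nodes
crm resource refresh              # refresh the CIB from the LRM status
crm resource start webservice
```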
HA1:
[root@node1 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:39:49 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
OFFLINE: [ node1.magedu.com node2.magedu.com ]
HA2:
crm(live)resource# help(show help)
At this level resources may be managed.
All (or almost all) commands are implemented with the CRM tools
such as `crm_resource(8)`.
Available commands:
status show status of resources
start start a resource
stop stop a resource
restart restart a resource
promote promote a master-slave resource
demote demote a master-slave resource
manage put a resource into managed mode
unmanage put a resource into unmanaged mode
migrate migrate a resource to another node
unmigrate unmigrate a resource to another node
param manage a parameter of a resource
meta manage a meta attribute
utilization manage a utilization attribute
failcount manage failcounts
cleanup cleanup resource status
refresh refresh CIB from the LRM status
reprobe probe for resources not started by the CRM
quit exit the program
help show help
end go back one level
crm(live)resource# reprobe(re-probe resources on all nodes)
Waiting for 1 replies from the CRMd. OK
crm(live)resource# refresh(refresh the CIB from the LRM status)
Waiting for 1 replies from the CRMd. OK
crm(live)resource# cd(go back one level)
crm(live)# configure(enter the configure level)
crm(live)configure# show(show the resource configuration)
node node1.magedu.com \
attributes standby="off"
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta is-managed="true"
group webservice webip httpd \
meta target-role="Started"
location cli-standby-webip webservice \
rule $id="cli-standby-rule-webip" -inf: #uname eq node2.magedu.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1461278068"
crm(live)configure# edit(edit the configuration; the cli-standby-webip location constraint is deleted in the editor)
node node1.magedu.com
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta is-managed="true"
group webservice webip httpd \
meta target-role="Started"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
crm(live)configure# verify(check the cluster configuration)
crm(live)configure# commit(commit)
crm(live)configure# show(show the configuration)
node node1.magedu.com
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta is-managed="true"
group webservice webip httpd \
meta target-role="Started"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
crm(live)configure# quit(exit)
bye
[root@node2 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:46:48 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
[root@node2 ~]# crm(enter the crm shell)
crm(live)# resource(enter the resource level)
crm(live)resource# migrate webservice(manually migrate the group)
WARNING: Creating rsc_location constraint 'cli-standby-webservice' with a score of -INFINITY for resource webservice on node1.magedu.com.
This will prevent webservice from running on node1.magedu.com until the constraint is removed using the 'crm_resource -U' command or
manually with cibadmin
This will be the case even if node1.magedu.com is the last node in the cluster
This message can be disabled with -Q
crm(live)resource# status(show resource status)
Resource Group: webservice
webip (ocf::heartbeat:IPaddr) Started
httpd (lsb:httpd) Started
HA1:
[root@node1 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:50:13 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
Note: the group has started on node2.
Test: browse to 172.16.100.1 from a Windows IE browser; the page is reachable.

HA2:
crm(live)resource# cd(go back one level)
crm(live)# configure(enter the configure level)
crm(live)configure# edit(edit the configuration; the cli-standby-webservice migration constraint is deleted in the editor)
node node1.magedu.com
node node2.magedu.com
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta is-managed="true"
group webservice webip httpd \
meta target-role="Started"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
crm(live)configure# commit(commit)
crm(live)configure# exit(exit)
bye
[root@node2 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:54:45 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
[root@node2 ~]# crm node standby(put the current node into standby)
[root@node2 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:55:45 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Node node2.magedu.com: standby
Online: [ node1.magedu.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
[root@node2 ~]# crm node online(bring the current node back online)
[root@node2 ~]# crm status(show cluster status)
============
Last updated: Fri Apr 22 06:56:38 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
Note: the resources did not migrate back; they are still on node1. Each resource's stickiness is 0, and since the stickiness is 0 for both nodes there is nothing pulling them away, unlike heartbeat with auto_failback configured. Without a group, the only way to keep the two resources together is to use constraints;
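As a sketch of the two alternatives just described (assuming the same webip and httpd primitives from this session), a group bundles resources implicitly, while constraints express the same relations explicitly:

```shell
# crm shell sketch (hypothetical session; pick one option, not both).
# Option 1: a group gives implicit colocation plus start order:
#   crm configure group webservice webip httpd
# Option 2: explicit constraints, as defined later in this session:
#   crm configure colocation httpd_with_webip inf: httpd webip
#   crm configure order webip_before_httpd mandatory: webip httpd
```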
[root@node2 ~]# crm(enter the crm shell)
crm(live)# resource(enter resource mode)
crm(live)resource# stop webservice(stop the webservice group)
crm(live)resource# cleanup webservice(clean up the status of the webservice group)
Cleaning up webip on node1.magedu.com
Cleaning up webip on node2.magedu.com
Cleaning up httpd on node1.magedu.com
Cleaning up httpd on node2.magedu.com
Waiting for 5 replies from the CRMd..... OK
crm(live)resource# help(show help)
At this level resources may be managed.
All (or almost all) commands are implemented with the CRM tools
such as `crm_resource(8)`.
Available commands:
status show status of resources
start start a resource
stop stop a resource
restart restart a resource
promote promote a master-slave resource
demote demote a master-slave resource
manage put a resource into managed mode
unmanage put a resource into unmanaged mode
migrate migrate a resource to another node
unmigrate unmigrate a resource to another node
param manage a parameter of a resource
meta manage a meta attribute
utilization manage a utilization attribute
failcount manage failcounts
cleanup cleanup resource status
refresh refresh CIB from the LRM status
reprobe probe for resources not started by the CRM
quit exit the program
help show help
end go back one level
crm(live)resource# cd(go up one level)
crm(live)# configure(switch to configure mode)
crm(live)configure# help(show help)
This level enables all CIB object definition commands.
The configuration may be logically divided into four parts:
nodes, resources, constraints, and (cluster) properties and
attributes. Each of these commands support one or more basic CIB
objects.
Nodes and attributes describing nodes are managed using the
`node` command.
Commands for resources are:
- `primitive`
- `monitor`
- `group`
- `clone`
- `ms`/`master` (master-slave)
There are three types of constraints:
- `location`
- `colocation`
- `order`
Finally, there are the cluster properties, resource meta
attributes defaults, and operations defaults. All are just a set
of attributes. These attributes are managed by the following
commands:
- `property`
- `rsc_defaults`
- `op_defaults`
In addition to the cluster configuration, the Access Control
Lists (ACL) can be setup to allow access to parts of the CIB for
users other than `root` and `hacluster`. The following commands
manage ACL:
- `user`
- `role`
The changes are applied to the current CIB only on ending the
configuration session or using the `commit` command.
Comments start with `#` in the first line. The comments are tied
to the element which follows. If the element moves, its comments
will follow.
Available commands:
node define a cluster node
primitive define a resource
monitor add monitor operation to a primitive
group define a group
clone define a clone
ms define a master-slave resource
location a location preference
colocation colocate resources
order order resources
property set a cluster property
rsc_defaults set resource defaults
role define role access rights
user define user access rights
op_defaults set resource operations defaults
show display CIB objects
edit edit CIB objects
filter filter CIB objects
delete delete CIB objects
default-timeouts set timeouts for operations to minimums from the meta-data
rename rename a CIB object
refresh refresh from CIB
erase erase the CIB
ptest show cluster actions if changes were committed
cib CIB shadow management
cibstatus CIB status management and editing
template edit and import a configuration from a template
commit commit the changes to the CIB
verify verify the CIB with crm_verify
upgrade upgrade the CIB to version 1.0
save save the CIB to a file
load import the CIB from a file
xml raw xml
quit exit the program
help show help
end go back one level
crm(live)configure# delete webservice(delete the webservice group)
crm(live)configure# show(show the configuration)
node node1.magedu.com
node node2.magedu.com \
attributes standby="off"
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta is-managed="true"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1461279728"
crm(live)configure# commit(commit the changes)
crm(live)configure# show(show the configuration)
node node1.magedu.com
node node2.magedu.com \
attributes standby="off"
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta is-managed="true"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1461279728"
crm(live)configure# cd(go up one level)
crm(live)# status(show resource status)
============
Last updated: Fri Apr 22 07:07:29 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
Note: the two resources are running on different nodes again;
crm(live)# configure(enter configure mode)
crm(live)configure# help(show help)
This level enables all CIB object definition commands.
The configuration may be logically divided into four parts:
nodes, resources, constraints, and (cluster) properties and
attributes. Each of these commands support one or more basic CIB
objects.
Nodes and attributes describing nodes are managed using the
`node` command.
Commands for resources are:
- `primitive`
- `monitor`
- `group`
- `clone`
- `ms`/`master` (master-slave)
There are three types of constraints:
- `location`
- `colocation`
- `order`
Finally, there are the cluster properties, resource meta
attributes defaults, and operations defaults. All are just a set
of attributes. These attributes are managed by the following
commands:
- `property`
- `rsc_defaults`
- `op_defaults`
In addition to the cluster configuration, the Access Control
Lists (ACL) can be setup to allow access to parts of the CIB for
users other than `root` and `hacluster`. The following commands
manage ACL:
- `user`
- `role`
The changes are applied to the current CIB only on ending the
configuration session or using the `commit` command.
Comments start with `#` in the first line. The comments are tied
to the element which follows. If the element moves, its comments
will follow.
Available commands:
node define a cluster node
primitive define a resource
monitor add monitor operation to a primitive
group define a group
clone define a clone
ms define a master-slave resource
location a location preference(location constraint)
colocation colocate resources(colocation constraint)
order order resources(ordering constraint)
property set a cluster property
rsc_defaults set resource defaults
role define role access rights
user define user access rights
op_defaults set resource operations defaults
show display CIB objects
edit edit CIB objects
filter filter CIB objects
delete delete CIB objects
default-timeouts set timeouts for operations to minimums from the meta-data
rename rename a CIB object
refresh refresh from CIB
erase erase the CIB
ptest show cluster actions if changes were committed
cib CIB shadow management
cibstatus CIB status management and editing
template edit and import a configuration from a template
commit commit the changes to the CIB
verify verify the CIB with crm_verify
upgrade upgrade the CIB to version 1.0
save save the CIB to a file
load import the CIB from a file
xml raw xml
quit exit the program
help show help
end go back one level
crm(live)configure# help colocation(view help for colocation)
This constraint expresses the placement relation between two
or more resources. If there are more than two resources, then the
constraint is called a resource set. Collocation resource sets have
an extra attribute to allow for sets of resources which don't depend
on each other in terms of state. The shell syntax for such sets is
to put resources in parentheses.
Usage:
...............
colocation <id>(name) <score>(preference score): <rsc>[:<role>](role) <rsc>[:<role>] ...
...............
Example:
...............
colocation dummy_and_apache -inf: apache dummy
colocation c1 inf: A ( B C )
...............
(END)
crm(live)configure# colocation httpd_with_webip inf: webip httpd(define a colocation constraint named httpd_with_webip; a score of inf means the resources must stay together)
crm(live)configure# verify(check the syntax)
crm(live)configure# show xml(view the configuration in XML format)
<?xml version="1.0" ?>
<cib admin_epoch="0" cib-last-written="Fri Apr 22 01:08:19 2016" crm_feature_set="3.0.5" dc-uuid="node2.magedu.com" epoch="45" have-quorum="1"
num_updates="26" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1461279728"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="node2.magedu.com" type="normal" uname="node2.magedu.com">
<instance_attributes id="nodes-node2.magedu.com">
<nvpair id="nodes-node2.magedu.com-standby" name="standby" value="off"/>
</instance_attributes>
</node>
<node id="node1.magedu.com" type="normal" uname="node1.magedu.com"/>
</nodes>
<resources>
<primitive class="ocf" id="webip" provider="heartbeat" type="IPaddr">
<instance_attributes id="webip-instance_attributes">
<nvpair id="webip-instance_attributes-ip" name="ip" value="172.16.100.1"/>
<nvpair id="webip-instance_attributes-nic" name="nic" value="eth0"/>
<nvpair id="webip-instance_attributes-cidr_netmask" name="cidr_netmask" value="16"/>
</instance_attributes>
<meta_attributes id="webip-meta_attributes">
<nvpair id="webip-meta_attributes-is-managed" name="is-managed" value="true"/>
</meta_attributes>
</primitive>
<primitive class="lsb" id="httpd" type="httpd">
<operations>
<op id="httpd-start-0" interval="0" name="start" timeout="20s"/>
</operations>
</primitive>
</resources>
<constraints>
<rsc_colocation id="httpd_with_webip" rsc="webip" score="INFINITY" with-rsc="httpd"/>(wherever httpd runs, webip runs there too; this is not what we want)
</constraints>
</configuration>
</cib>
(END)
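The direction of a colocation constraint matters: in `colocation <id> <score>: A B`, resource A is placed relative to B, i.e. A follows B. A minimal sketch of the two readings, assuming the webip and httpd primitives above (only one can actually be defined under a given id):

```shell
# crm shell sketch (hypothetical; illustrates direction, not meant to run together).
# Reading 1: "webip follows httpd" -- httpd is placed first, webip is put with it:
#   crm configure colocation httpd_with_webip inf: webip httpd
# Reading 2: "httpd follows webip" -- the relation intended here:
#   crm configure colocation httpd_with_webip inf: httpd webip
```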
crm(live)configure# edit(edit the configuration)
node node1.magedu.com
node node2.magedu.com \
attributes standby="off"
primitive httpd lsb:httpd \
op start interval="0" timeout="20s"
primitive webip ocf:heartbeat:IPaddr \
params ip="172.16.100.1" nic="eth0" cidr_netmask="16" \
meta is-managed="true"
colocation httpd_with_webip inf: httpd webip
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1461279728"
crm(live)configure# show xml(view the configuration in XML format)
<?xml version="1.0" ?>
<cib admin_epoch="0" cib-last-written="Fri Apr 22 01:08:19 2016" crm_feature_set="3.0.5" dc-uuid="node2.magedu.com" epoch="45" have-quorum="1"
num_updates="26" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1461279728"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="node2.magedu.com" type="normal" uname="node2.magedu.com">
<instance_attributes id="nodes-node2.magedu.com">
<nvpair id="nodes-node2.magedu.com-standby" name="standby" value="off"/>
</instance_attributes>
</node>
<node id="node1.magedu.com" type="normal" uname="node1.magedu.com"/>
</nodes>
<resources>
<primitive class="ocf" id="webip" provider="heartbeat" type="IPaddr">
<instance_attributes id="webip-instance_attributes">
<nvpair id="webip-instance_attributes-ip" name="ip" value="172.16.100.1"/>
<nvpair id="webip-instance_attributes-nic" name="nic" value="eth0"/>
<nvpair id="webip-instance_attributes-cidr_netmask" name="cidr_netmask" value="16"/>
</instance_attributes>
<meta_attributes id="webip-meta_attributes">
<nvpair id="webip-meta_attributes-is-managed" name="is-managed" value="true"/>
</meta_attributes>
</primitive>
<primitive class="lsb" id="httpd" type="httpd">
<operations>
<op id="httpd-start-0" interval="0" name="start" timeout="20s"/>
</operations>
</primitive>
</resources>
<constraints>
<rsc_colocation id="httpd_with_webip" rsc="httpd" score="INFINITY" with-rsc="webip"/>(httpd must stay with webip)
</constraints>
</configuration>
</cib>
(END)
crm(live)configure# commit(commit the changes)
HA1:
[root@node1 ~]# crm status(check cluster status)
============
Last updated: Fri Apr 22 07:17:52 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
crm(live)configure# help order(view help for order)
This constraint expresses the order of actions on two resources
or more resources. If there are more than two resources, then the
constraint is called a resource set. Ordered resource sets have an
extra attribute to allow for sets of resources whose actions may run
in parallel. The shell syntax for such sets is to put resources in
parentheses.
Usage:
...............
order <id>(name) score-type(score type): <rsc>[:<action>](action) <rsc>[:<action>] ...
[symmetrical=<bool>]
score-type :: advisory | mandatory | <score>
...............
Example:
...............
order c_apache_1 mandatory: apache:start ip_1
order o1 inf: A ( B C )
...............
HA1:
[root@node1 ~]# crm resource(enter resource mode of the crm shell)
crm(live)resource# help
At this level resources may be managed.
All (or almost all) commands are implemented with the CRM tools
such as `crm_resource(8)`.
Available commands:
status show status of resources
start start a resource
stop stop a resource
restart restart a resource
promote promote a master-slave resource(promote the master instance of a master/slave resource)
demote demote a master-slave resource
manage put a resource into managed mode
unmanage put a resource into unmanaged mode
migrate migrate a resource to another node
unmigrate unmigrate a resource to another node
param manage a parameter of a resource
meta manage a meta attribute
utilization manage a utilization attribute
failcount manage failcounts
cleanup cleanup resource status
refresh refresh CIB from the LRM status
reprobe probe for resources not started by the CRM
quit exit the program
help show help
end go back one level
HA2:
crm(live)configure# order webip_before_httpd mandatory: webip httpd(define an order constraint named webip_before_httpd; mandatory means the order is required)
crm(live)configure# show xml
<?xml version="1.0" ?>
<cib admin_epoch="0" cib-last-written="Fri Apr 22 01:08:19 2016" crm_feature_set="3.0.5" dc-uuid="node2.magedu.com" epoch="46" have-quorum="1"
num_updates="3" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1461279728"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="node2.magedu.com" type="normal" uname="node2.magedu.com">
<instance_attributes id="nodes-node2.magedu.com">
<nvpair id="nodes-node2.magedu.com-standby" name="standby" value="off"/>
</instance_attributes>
</node>
<node id="node1.magedu.com" type="normal" uname="node1.magedu.com"/>
</nodes>
<resources>
<primitive class="lsb" id="httpd" type="httpd">
<operations>
<op id="httpd-start-0" interval="0" name="start" timeout="20s"/>
</operations>
</primitive>
<primitive class="ocf" id="webip" provider="heartbeat" type="IPaddr">
<instance_attributes id="webip-instance_attributes">
<nvpair id="webip-instance_attributes-ip" name="ip" value="172.16.100.1"/>
<nvpair id="webip-instance_attributes-nic" name="nic" value="eth0"/>
<nvpair id="webip-instance_attributes-cidr_netmask" name="cidr_netmask" value="16"/>
</instance_attributes>
<meta_attributes id="webip-meta_attributes">
<nvpair id="webip-meta_attributes-is-managed" name="is-managed" value="true"/>
</meta_attributes>
</primitive>
</resources>
<constraints>
<rsc_colocation id="httpd_with_webip" rsc="httpd" score="INFINITY" with-rsc="webip"/>
<rsc_order first="webip" id="webip_before_httpd" score="INFINITY" then="httpd"/>(start webip first, then httpd, which matches the service logic)
</constraints>
</configuration>
</cib>
(END)
crm(live)configure# commit(commit the changes)
crm(live)configure# exit(exit)
bye
[root@node2 ~]# crm status(check cluster status)
============
Last updated: Fri Apr 22 07:29:52 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
HA1:
crm(live)resource# exit(exit)
bye
[root@node1 ~]# crm node standby(put node1 into standby)
HA2:
[root@node2 ~]# crm_mon(monitor the cluster)
============
Last updated: Fri Apr 22 07:32:04 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Node node1.magedu.com: standby
Online: [ node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
HA1:
[root@node1 ~]# crm node online(bring node1 back online)
[root@node1 ~]# crm status(check cluster status)
============
Last updated: Fri Apr 22 07:33:07 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
Note: if the resources should prefer to run on node1, define a location constraint;
[root@node1 ~]# crm configure(enter configure mode of the crm shell)
crm(live)configure# help location(view help for location)
`location` defines the preference of nodes for the given
resource. The location constraints consist of one or more rules
which specify a score to be awarded if the rule matches.
Usage:
...............
location <id>(name) <rsc>(resource) {node_pref(node preference)|rules(rules)}
node_pref :: <score>: <node>(which node is preferred, and the score awarded for it)
rules ::
rule [id_spec](identifier) [$role=<role>] <score>:(score) <expression>(expression)
[rule [id_spec] [$role=<role>] <score>: <expression> ...]
id_spec :: $id=<id> | $id-ref=<id>
score :: <number> | <attribute> | [-]inf
expression :: <simple_exp> [bool_op <simple_exp> ...]
bool_op :: or | and
simple_exp :: <attribute> [type:]<binary_op> <value>
| <unary_op> <attribute>
| date <date_expr>
type :: string | version | number
binary_op :: lt | gt | lte | gte | eq | ne(comparison operators)
unary_op :: defined | not_defined
date_expr :: lt <end>
| gt <start>
| in_range start=<start> end=<end>
| in_range start=<start> <duration>
| date_spec <date_spec>
duration|date_spec ::
hours=<value>
| monthdays=<value>
| weekdays=<value>
| yearsdays=<value>
| months=<value>
| weeks=<value>
| years=<value>
| weekyears=<value>
| moon=<value>
...............
Examples:
...............
location conn_1 internal_www 100: node1
location conn_1 internal_www \
rule 50: #uname eq node1 \(#uname is the node's uname attribute)
rule pingd: defined pingd(the pingd node attribute)
location conn_2 dummy_float \
rule -inf: not_defined pingd or pingd number:lte 0
...............
(END)
crm(live)configure# location webip_on_node1 webip rule 100: #uname eq node1.magedu.com(define a location constraint named webip_on_node1 for resource webip: a rule awarding score 100 when the node's uname equals node1.magedu.com)
crm(live)configure# show xml(view the configuration in XML format)
<?xml version="1.0" ?>
<cib admin_epoch="0" cib-last-written="Fri Apr 22 01:08:19 2016" crm_feature_set="3.0.5" dc-uuid="node2.magedu.com" epoch="49" have-quorum="1"
num_updates="1" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1461279728"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="node2.magedu.com" type="normal" uname="node2.magedu.com">
<instance_attributes id="nodes-node2.magedu.com">
<nvpair id="nodes-node2.magedu.com-standby" name="standby" value="off"/>
</instance_attributes>
</node>
<node id="node1.magedu.com" type="normal" uname="node1.magedu.com">
<instance_attributes id="nodes-node1.magedu.com">
<nvpair id="nodes-node1.magedu.com-standby" name="standby" value="off"/>
</instance_attributes>
</node>
</nodes>
<resources>
<primitive class="ocf" id="webip" provider="heartbeat" type="IPaddr">
<instance_attributes id="webip-instance_attributes">
<nvpair id="webip-instance_attributes-ip" name="ip" value="172.16.100.1"/>
<nvpair id="webip-instance_attributes-nic" name="nic" value="eth0"/>
<nvpair id="webip-instance_attributes-cidr_netmask" name="cidr_netmask" value="16"/>
</instance_attributes>
<meta_attributes id="webip-meta_attributes">
<nvpair id="webip-meta_attributes-is-managed" name="is-managed" value="true"/>
</meta_attributes>
</primitive>
<primitive class="lsb" id="httpd" type="httpd">
<operations>
<op id="httpd-start-0" interval="0" name="start" timeout="20s"/>
</operations>
</primitive>
</resources>
<constraints>
<rsc_colocation id="httpd_with_webip" rsc="httpd" score="INFINITY" with-rsc="webip"/>
<rsc_location id="webip_on_node1" rsc="webip">
<rule id="webip_on_node1-rule" score="100">
<expression attribute="#uname" id="webip_on_node1-expression" operation="eq" value="node1.magedu.com"/>
</rule>
</rsc_location>
<rsc_order first="webip" id="webip_before_httpd" score="INFINITY" then="httpd"/>
</constraints>
</configuration>
</cib>
(END)
crm(live)configure# verify(check the configuration)
crm(live)configure# commit(commit the changes)
crm(live)configure# exit(exit)
bye
[root@node1 ~]# crm status(check resource status)
============
Last updated: Fri Apr 22 07:43:22 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
[root@node1 ~]# crm node standby(put the current node into standby)
[root@node1 ~]# crm status(check cluster status)
============
Last updated: Fri Apr 22 07:44:02 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Node node1.magedu.com: standby
Online: [ node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
[root@node1 ~]# crm node online(bring the current node back online)
[root@node1 ~]# crm status(check cluster status)
============
Last updated: Fri Apr 22 07:44:32 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
Note: the resources moved back to node1, because node1's location score is 100 while node2's is 0, and no default resource stickiness has been defined. If a resource's default stickiness is greater than its location preference, the preference loses its pull;
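The placement decision above can be sketched as simple score arithmetic. This is a simplified model, not pacemaker's actual placement engine: the current node gets its location score plus the stickiness bonus, and the highest total wins (assumed scores: 100 for node1 from the location constraint, 0 for node2):

```python
# Simplified sketch of a pacemaker-style score comparison (not the real engine).
def preferred_node(location_scores, current_node, stickiness):
    # The current node's total is its location score plus the stickiness bonus.
    totals = dict(location_scores)
    totals[current_node] = totals.get(current_node, 0) + stickiness
    # The node with the highest total score wins placement.
    return max(totals, key=totals.get)

# With stickiness 0, node1's preference (100) wins: the resource moves back.
print(preferred_node({"node1": 100, "node2": 0}, "node2", 0))    # node1
# With stickiness 200, staying on node2 (0 + 200) beats node1 (100).
print(preferred_node({"node1": 100, "node2": 0}, "node2", 200))  # node2
```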
[root@node1 ~]# crm configure(enter configure mode of the crm shell)
crm(live)configure# rsc_defaults resource-stickiness=200(set the default stickiness of every resource to 200)
crm(live)configure# verify(check the configuration)
crm(live)configure# commit(commit the changes)
crm(live)configure# exit(exit)
bye
[root@node1 ~]# crm status(check cluster status)
============
Last updated: Fri Apr 22 07:49:40 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node1.magedu.com
httpd (lsb:httpd): Started node1.magedu.com
[root@node1 ~]# crm node standby(put the current node into standby)
[root@node1 ~]# crm status(check cluster status)
============
Last updated: Fri Apr 22 07:50:07 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Node node1.magedu.com: standby
Online: [ node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
[root@node1 ~]# crm node online(bring the current node back online)
[root@node1 ~]# crm status(check cluster status)
============
Last updated: Fri Apr 22 07:51:15 2016
Stack: openais
Current DC: node2.magedu.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
webip (ocf::heartbeat:IPaddr): Started node2.magedu.com
httpd (lsb:httpd): Started node2.magedu.com
Note: when the stickiness is greater than the location preference, the resources will not migrate back;
Homework: add a third resource, a Filesystem that puts the web data files on NFS, and define additional constraints: httpd must run together with the filesystem, and the start order must be webip first, then the filesystem, then httpd;
[root@node1 ~]# crm(enter the crm shell)
crm(live)# ra(enter ra mode)
crm(live)ra# meta ocf:heartbeat:Filesystem(view the metadata of the ocf:heartbeat:Filesystem resource)
Manages filesystem mounts (ocf:heartbeat:Filesystem)
Resource script for Filesystem. It manages a Filesystem on a
shared storage medium.
The standard monitor operation of depth 0 (also known as probe)
checks if the filesystem is mounted. If you want deeper tests,
set OCF_CHECK_LEVEL to one of the following values:
10: read first 16 blocks of the device (raw read)
This doesn't exercise the filesystem at all, but the device on
which the filesystem lives. This is noop for non-block devices
such as NFS, SMBFS, or bind mounts.
20: test if a status file can be written and read
The status file must be writable by root. This is not always the
case with an NFS mount, as NFS exports usually have the
"root_squash" option set. In such a setup, you must either use
read-only monitoring (depth=10), export with "no_root_squash" on
your NFS server, or grant world write permissions on the
directory where the status file is to be placed.
Parameters (* denotes required, [] the default):
device* (string): block device
The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.
directory* (string): mount point
The mount point for the filesystem.
fstype* (string): filesystem type
The type of filesystem to be mounted.
options (string):
Any extra options to be given as -o options to mount.
For bind mounts, add "bind" here and set fstype to "none".
We will do the right thing for options such as "bind,ro".
statusfile_prefix (string, [.Filesystem_status/]): status file prefix
The prefix to be used for a status file for resource monitoring
with depth 20. If you don't specify this parameter, all status
files will be created in a separate directory.
run_fsck (string, [auto]):
Specify how to decide whether to run fsck or not.
"auto" : decide to run fsck depending on the fstype(default)
"force" : always run fsck regardless of the fstype
"no" : do not run fsck ever.
fast_stop (boolean, [yes]): fast stop
Normally, we expect no users of the filesystem and the stop
operation to finish quickly. If you cannot control the filesystem
users easily and want to prevent the stop action from failing,
then set this parameter to "no" and add an appropriate timeout
for the stop operation.
Operations' defaults (advisory minimum):
start timeout=60
stop timeout=60
notify timeout=60
monitor interval=20 timeout=40
(END)
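Using the parameters listed above, the homework can be sketched as the following configuration fragment. The NFS server address 172.16.100.10, export path /web/htdocs, mount point /var/www/html, and the resource name webstore are all hypothetical values chosen for illustration:

```shell
# crm configure sketch for the homework (hypothetical NFS server and paths):
primitive webstore ocf:heartbeat:Filesystem \
    params device="172.16.100.10:/web/htdocs" directory="/var/www/html" fstype="nfs" \
    op start interval="0" timeout="60s" \
    op stop interval="0" timeout="60s"
# httpd must run together with the filesystem:
colocation httpd_with_webstore inf: httpd webstore
# Start order: webip first, then the filesystem, then httpd:
order webip_before_webstore mandatory: webip webstore
order webstore_before_httpd mandatory: webstore httpd
```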