I recently tested RHCS on RedHat 6.5 and built a two-node HA cluster. This post shares the configuration and testing process, covering node configuration, cluster management server configuration, cluster creation and configuration, and cluster testing.
I. Test Environment
Hostname | OS | IP Address | Cluster IP | Installed Packages
HAmanager | RedHat 6.5 | 192.168.10.150 | - | luci, iSCSI target (for the quorum disk)
node1 | RedHat 6.5 | 192.168.10.104 | 192.168.10.103 | High Availability group, httpd
node2 | RedHat 6.5 | 192.168.10.105 | 192.168.10.103 | High Availability group, httpd
II. Node Configuration
1. Configure /etc/hosts entries on all three machines so they can resolve each other
[root@HAmanager ~]# cat /etc/hosts
192.168.10.104 node1 node1.localdomain
192.168.10.105 node2 node2.localdomain
192.168.10.150 HAmanager HAmanager.localdomain
[root@node1 ~]# cat /etc/hosts
192.168.10.104 node1 node1.localdomain
192.168.10.105 node2 node2.localdomain
192.168.10.150 HAmanager HAmanager.localdomain
[root@node2 ~]# cat /etc/hosts
192.168.10.104 node1 node1.localdomain
192.168.10.105 node2 node2.localdomain
192.168.10.150 HAmanager HAmanager.localdomain
2. Configure SSH mutual trust among the three machines
[root@HAmanager ~]# ssh-keygen -t rsa
[root@HAmanager ~]# ssh-copy-id -i node1
[root@node1 ~]# ssh-keygen -t rsa
[root@node1 ~]# ssh-copy-id -i node2
[root@node2 ~]# ssh-keygen -t rsa
[root@node2 ~]# ssh-copy-id -i node1
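To confirm the trust works, each machine should be able to run a command on its peer without being prompted for a password, for example (a quick check, not part of the original steps; the target hosts mirror the ssh-copy-id commands above):
[root@HAmanager ~]# ssh node1 hostname
[root@node1 ~]# ssh node2 hostname
[root@node2 ~]# ssh node1 hostname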
3. Stop and disable the NetworkManager and acpid services on both nodes
[root@node1 ~]# service NetworkManager stop
[root@node1 ~]# chkconfig NetworkManager off
[root@node1 ~]# service acpid stop
[root@node1 ~]# chkconfig acpid off
[root@node2 ~]# service NetworkManager stop
[root@node2 ~]# chkconfig NetworkManager off
[root@node2 ~]# service acpid stop
[root@node2 ~]# chkconfig acpid off
4. Configure a local yum repository on both nodes
[root@node1 ~]# cat /etc/yum.repos.d/rhel6.5.repo
[Server]
name=base
baseurl=file:///mnt/
enabled=1
gpgcheck=0
[HighAvailability]
name=base
baseurl=file:///mnt/HighAvailability
enabled=1
gpgcheck=0
[root@node2 ~]# cat /etc/yum.repos.d/rhel6.5.repo
[Server]
name=base
baseurl=file:///mnt/
enabled=1
gpgcheck=0
[HighAvailability]
name=base
baseurl=file:///mnt/HighAvailability
enabled=1
gpgcheck=0
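The baseurl paths above assume the RHEL 6.5 installation DVD (or its ISO image) is mounted at /mnt on each node. If it is not, mounting it first would look roughly like this (the device name and ISO path are assumptions):
[root@node1 ~]# mount /dev/cdrom /mnt
or, when working from an ISO file:
[root@node1 ~]# mount -o loop /path/to/rhel-server-6.5-x86_64-dvd.iso /mnt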
5. Install the cluster software packages on both nodes
[root@node1 ~]# yum groupinstall 'High Availability' -y
Installed:
ccs.x86_64 0:0.16.2-69.el6                 cman.x86_64 0:3.0.12.1-59.el6
omping.x86_64 0:0.0.4-1.el6                rgmanager.x86_64 0:3.0.12.1-19.el6
Dependency Installed:
cifs-utils.x86_64 0:4.8.1-19.el6           clusterlib.x86_64 0:3.0.12.1-59.el6
corosync.x86_64 0:1.4.1-17.el6             corosynclib.x86_64 0:1.4.1-17.el6
cyrus-sasl-md5.x86_64 0:2.1.23-13.el6_3.1  fence-agents.x86_64 0:3.1.5-35.el6
fence-virt.x86_64 0:0.2.3-15.el6           gnutls-utils.x86_64 0:2.8.5-10.el6_4.2
ipmitool.x86_64 0:1.8.11-16.el6            keyutils.x86_64 0:1.4-4.el6
libevent.x86_64 0:1.4.13-4.el6             libgssglue.x86_64 0:0.1-11.el6
libibverbs.x86_64 0:1.1.7-1.el6            librdmacm.x86_64 0:1.0.17-1.el6
libtirpc.x86_64 0:0.2.1-6.el6_4            libvirt-client.x86_64 0:0.10.2-29.el6
lm_sensors-libs.x86_64 0:3.1.1-17.el6      modcluster.x86_64 0:0.16.2-28.el6
nc.x86_64 0:1.84-22.el6                    net-snmp-libs.x86_64 1:5.5-49.el6
net-snmp-utils.x86_64 1:5.5-49.el6         nfs-utils.x86_64 1:1.2.3-39.el6
nfs-utils-lib.x86_64 0:1.1.5-6.el6         numactl.x86_64 0:2.0.7-8.el6
oddjob.x86_64 0:0.30-5.el6                 openais.x86_64 0:1.1.1-7.el6
openaislib.x86_64 0:1.1.1-7.el6            perl-Net-Telnet.noarch 0:3.03-11.el6
pexpect.noarch 0:2.3-6.el6                 python-suds.noarch 0:0.4.1-3.el6
quota.x86_64 1:3.17-20.el6                 resource-agents.x86_64 0:3.9.2-40.el6
ricci.x86_64 0:0.16.2-69.el6               rpcbind.x86_64 0:0.2.0-11.el6
sg3_utils.x86_64 0:1.28-5.el6              tcp_wrappers.x86_64 0:7.6-57.el6
telnet.x86_64 1:0.17-47.el6_3.1            yajl.x86_64 0:1.0.7-3.el6
Complete!
[root@node2 ~]# yum groupinstall 'High Availability' -y
Installed:
ccs.x86_64 0:0.16.2-69.el6                 cman.x86_64 0:3.0.12.1-59.el6
omping.x86_64 0:0.0.4-1.el6                rgmanager.x86_64 0:3.0.12.1-19.el6
Dependency Installed:
cifs-utils.x86_64 0:4.8.1-19.el6           clusterlib.x86_64 0:3.0.12.1-59.el6
corosync.x86_64 0:1.4.1-17.el6             corosynclib.x86_64 0:1.4.1-17.el6
cyrus-sasl-md5.x86_64 0:2.1.23-13.el6_3.1  fence-agents.x86_64 0:3.1.5-35.el6
fence-virt.x86_64 0:0.2.3-15.el6           gnutls-utils.x86_64 0:2.8.5-10.el6_4.2
ipmitool.x86_64 0:1.8.11-16.el6            keyutils.x86_64 0:1.4-4.el6
libevent.x86_64 0:1.4.13-4.el6             libgssglue.x86_64 0:0.1-11.el6
libibverbs.x86_64 0:1.1.7-1.el6            librdmacm.x86_64 0:1.0.17-1.el6
libtirpc.x86_64 0:0.2.1-6.el6_4            libvirt-client.x86_64 0:0.10.2-29.el6
lm_sensors-libs.x86_64 0:3.1.1-17.el6      modcluster.x86_64 0:0.16.2-28.el6
nc.x86_64 0:1.84-22.el6                    net-snmp-libs.x86_64 1:5.5-49.el6
net-snmp-utils.x86_64 1:5.5-49.el6         nfs-utils.x86_64 1:1.2.3-39.el6
nfs-utils-lib.x86_64 0:1.1.5-6.el6         numactl.x86_64 0:2.0.7-8.el6
oddjob.x86_64 0:0.30-5.el6                 openais.x86_64 0:1.1.1-7.el6
openaislib.x86_64 0:1.1.1-7.el6            perl-Net-Telnet.noarch 0:3.03-11.el6
pexpect.noarch 0:2.3-6.el6                 python-suds.noarch 0:0.4.1-3.el6
quota.x86_64 1:3.17-20.el6                 resource-agents.x86_64 0:3.9.2-40.el6
ricci.x86_64 0:0.16.2-69.el6               rpcbind.x86_64 0:0.2.0-11.el6
sg3_utils.x86_64 0:1.28-5.el6              tcp_wrappers.x86_64 0:7.6-57.el6
telnet.x86_64 1:0.17-47.el6_3.1            yajl.x86_64 0:1.0.7-3.el6
Complete!
6. Start the cluster services on both nodes and enable them at boot
[root@node1 ~]# service ricci start
[root@node1 ~]# chkconfig ricci on
[root@node1 ~]# chkconfig cman on
[root@node1 ~]# chkconfig rgmanager on
[root@node2 ~]# service ricci start
[root@node2 ~]# chkconfig ricci on
[root@node2 ~]# chkconfig cman on
[root@node2 ~]# chkconfig rgmanager on
7. Set the ricci user's password on both nodes
[root@node1 ~]# passwd ricci
New password:
BAD PASSWORD: it is too short
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@node2 ~]# passwd ricci
New password:
BAD PASSWORD: it is too short
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
8. Install the httpd service on both nodes, to be used later for testing application high availability
[root@node1 ~]# yum -y install httpd
[root@node1 ~]# echo "This is Node1" > /var/www/html/index.html
[root@node2 ~]# yum -y install httpd
[root@node2 ~]# echo "This is Node2" > /var/www/html/index.html
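As an optional sanity check, each test page can be fetched locally; httpd is started here only for the check and then stopped again, since rgmanager will manage it later and it should not be enabled with chkconfig:
[root@node1 ~]# service httpd start
[root@node1 ~]# curl http://192.168.10.104/
This is Node1
[root@node1 ~]# service httpd stop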
III. Cluster Management Server Configuration
1. Install the luci package on the cluster management server
[root@HAmanager ~]# yum -y install luci
Installed:
luci.x86_64 0:0.26.0-48.el6
Dependency Installed:
TurboGears2.noarch 0:2.0.3-4.el6
python-babel.noarch 0:0.9.4-5.1.el6
python-beaker.noarch 0:1.3.1-7.el6
python-cheetah.x86_64 0:2.4.1-1.el6
python-decorator.noarch 0:3.0.1-3.1.el6
python-decoratortools.noarch 0:1.7-4.1.el6
python-formencode.noarch 0:1.2.2-2.1.el6
python-genshi.x86_64 0:0.5.1-7.1.el6
python-mako.noarch 0:0.3.4-1.el6
python-markdown.noarch 0:2.0.1-3.1.el6
python-markupsafe.x86_64 0:0.9.2-4.el6
python-myghty.noarch 0:1.1-11.el6
python-nose.noarch 0:0.10.4-3.1.el6
python-paste.noarch 0:1.7.4-2.el6
python-paste-deploy.noarch 0:1.3.3-2.1.el6
python-paste-script.noarch 0:1.7.3-5.el6_3
python-peak-rules.noarch 0:0.5a1.dev-9.2582.1.el6
python-peak-util-addons.noarch 0:0.6-4.1.el6
python-peak-util-assembler.noarch 0:0.5.1-1.el6
python-peak-util-extremes.noarch 0:1.1-4.1.el6
python-peak-util-symbols.noarch 0:1.0-4.1.el6
python-prioritized-methods.noarch 0:0.2.1-5.1.el6
python-pygments.noarch 0:1.1.1-1.el6
python-pylons.noarch 0:0.9.7-2.el6
python-repoze-tm2.noarch 0:1.0-0.5.a4.el6
python-repoze-what.noarch 0:1.0.8-6.el6
python-repoze-what-pylons.noarch 0:1.0-4.el6
python-repoze-who.noarch 0:1.0.18-1.el6
python-repoze-who-friendlyform.noarch 0:1.0-0.3.b3.el6
python-repoze-who-testutil.noarch 0:1.0-0.4.rc1.el6
python-routes.noarch 0:1.10.3-2.el6
python-setuptools.noarch 0:0.6.10-3.el6
python-sqlalchemy.noarch 0:0.5.5-3.el6_2
python-tempita.noarch 0:0.4-2.el6
python-toscawidgets.noarch 0:0.9.8-1.el6
python-transaction.noarch 0:1.0.1-1.el6
python-turbojson.noarch 0:1.2.1-8.1.el6
python-weberror.noarch 0:0.10.2-2.el6
python-webflash.noarch 0:0.1-0.2.a9.el6
python-webhelpers.noarch 0:0.6.4-4.el6
python-webob.noarch 0:0.9.6.1-3.el6
python-webtest.noarch 0:1.2-2.el6
python-zope-filesystem.x86_64 0:1-5.el6
python-zope-interface.x86_64 0:3.5.2-2.1.el6
python-zope-sqlalchemy.noarch 0:0.4-3.el6
Complete!
[root@HAmanager ~]#
2. Start the luci service
[root@HAmanager ~]# service luci start
Adding following auto-detected host IDs (IP addresses/domain names), corresponding to `HAmanager.localdomain' address, to the configuration of self-managed certificate `/var/lib/luci/etc/cacert.config' (you can change them by editing `/var/lib/luci/etc/cacert.config', removing the generated certificate `/var/lib/luci/certs/host.pem' and restarting luci):
(none suitable found, you can still do it manually as mentioned above)
Generating a 2048 bit RSA private key
writing new private key to '/var/lib/luci/certs/host.pem'
Starting saslauthd: [ OK ]
Start luci...
Point your web browser to https://HAmanager.localdomain:8084 (or equivalent) to access luci
[root@HAmanager ~]# chkconfig luci on
IV. Creating and Configuring the Cluster
1. Use a browser to access the HA web management interface at https://192.168.10.150:8084
2. Create the cluster and add both nodes to it
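This step is performed in the luci web UI, supplying the node names and the ricci password set earlier. For reference, a roughly equivalent command-line sketch using ccs would be (the cluster name TestCluster2 is taken from the clustat output further down):
[root@node1 ~]# ccs -h node1 --createcluster TestCluster2
[root@node1 ~]# ccs -h node1 --addnode node1.localdomain
[root@node1 ~]# ccs -h node1 --addnode node2.localdomain
[root@node1 ~]# ccs -h node1 --sync --activate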
3. Add vCenter as a fence device
4. Look up the virtual machine UUIDs of the nodes
[root@node1 ~]# fence_vmware_soap -a 192.168.10.91 -z -l administrator@vsphere.local -p P@ssw0rd -o list
node1,564df192-7755-9cd6-8a8b-45d6d74eabbb
node2,564df4ed-cda1-6383-bbf5-f99807416184
5. Add a fence method and a fence instance for each of the two nodes
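Steps 3 and 5 are done through the luci UI; the resulting entries in /etc/cluster/cluster.conf should look roughly like the sketch below (the device name vcenter_fence matches the ccs_tool output later in this post, the UUIDs come from step 4, and ssl="on" corresponds to the -z flag used with fence_vmware_soap; exact attributes may differ):
<clusternode name="node1.localdomain" nodeid="1">
  <fence>
    <method name="1">
      <device name="vcenter_fence" uuid="564df192-7755-9cd6-8a8b-45d6d74eabbb"/>
    </method>
  </fence>
</clusternode>
<!-- node2 is analogous, with uuid="564df4ed-cda1-6383-bbf5-f99807416184" -->
<fencedevices>
  <fencedevice agent="fence_vmware_soap" name="vcenter_fence" ipaddr="192.168.10.91" login="administrator@vsphere.local" passwd="P@ssw0rd" ssl="on"/>
</fencedevices>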
6. Check the fence device status
[root@node1 ~]# fence_vmware_soap -a 192.168.10.91 -z -l administrator@vsphere.local -p P@ssw0rd -o status
Status: ON
7. Test the fence device
[root@node2 ~]# fence_check
fence_check run at Tue May 23 09:41:30 CST 2017 pid: 3455
Testing node1.localdomain method 1: success
Testing node2.localdomain method 1: success
8. Create a failover domain
9. Add cluster resources: an IP address resource and a script resource
10. Create the cluster service group and add the existing resources to it
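Steps 8 through 10 are also configured in luci. A roughly equivalent ccs sketch is shown below; the failover-domain name TestFailDomain, its options, and the script resource name httpd are assumptions, while the service name TestServGrp and the IP 192.168.10.103 match the rest of this post:
[root@node1 ~]# ccs -h node1 --addfailoverdomain TestFailDomain ordered
[root@node1 ~]# ccs -h node1 --addfailoverdomainnode TestFailDomain node1.localdomain 1
[root@node1 ~]# ccs -h node1 --addfailoverdomainnode TestFailDomain node2.localdomain 2
[root@node1 ~]# ccs -h node1 --addresource ip address=192.168.10.103
[root@node1 ~]# ccs -h node1 --addresource script name=httpd file=/etc/init.d/httpd
[root@node1 ~]# ccs -h node1 --addservice TestServGrp domain=TestFailDomain autostart=1 recovery=relocate
[root@node1 ~]# ccs -h node1 --addsubservice TestServGrp ip ref=192.168.10.103
[root@node1 ~]# ccs -h node1 --addsubservice TestServGrp script ref=httpd
[root@node1 ~]# ccs -h node1 --sync --activate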
11. Configure the quorum disk: install the iSCSI target service on the HAmanager server and present a 100 MB shared disk to both nodes
[root@HAmanager ~]# yum install scsi-target-utils -y
[root@HAmanager ~]# dd if=/dev/zero of=/iSCSIdisk/100m.img bs=1M seek=100 count=0
[root@HAmanager ~]# vi /etc/tgt/targets.conf
<target iqn.2016-08.disk.rh7:disk100m>
backing-store /iSCSIdisk/100m.img
initiator-address 192.168.10.104 #for node1
initiator-address 192.168.10.105 #for node2
</target>
[root@HAmanager ~]# service tgtd start
[root@HAmanager ~]# chkconfig tgtd on
[root@HAmanager ~]# tgt-admin --show
Target 1: iqn.2016-08.disk.rh7:disk100m
System information:
Driver: iscsi
State: ready
I_T nexus information:
LUN information:
LUN: 0
Type: controller
SCSI ID: IET 00010000
SCSI SN: beaf10
Size: 0 MB, Block size: 1
Online: Yes
Removable media: No
Prevent removal: No
Readonly: No
Backing store type: null
Backing store path: None
Backing store flags:
LUN: 1
Type: disk
SCSI ID: IET 00010001
SCSI SN: beaf11
Size: 105 MB, Block size: 512
Online: Yes
Removable media: No
Prevent removal: No
Readonly: No
Backing store type: rdwr
Backing store path: /sharedisk/100m.img
Backing store flags:
Account information:
ACL information:
192.168.10.104
192.168.10.105
[root@HAmanager ~]#
12. Install iscsi-initiator-utils on both nodes and log in to the iSCSI target
[root@node1 ~]# yum install iscsi-initiator-utils
[root@node1 ~]# chkconfig iscsid on
[root@node1 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.10.150
[root@node1 ~]# iscsiadm -m node
[root@node1 ~]# iscsiadm -m node -T iqn.2016-08.disk.rh7:disk100m --login
[root@node2 ~]# yum install iscsi-initiator-utils
[root@node2 ~]# chkconfig iscsid on
[root@node2 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.10.150
[root@node2 ~]# iscsiadm -m node
[root@node2 ~]# iscsiadm -m node -T iqn.2016-08.disk.rh7:disk100m --login
13. On node1, create partition sdb1 on the shared disk /dev/sdb
[root@node1 ~]# fdisk /dev/sdb
Create the sdb1 partition in the interactive prompt, then re-read the partition table:
[root@node1 ~]# partprobe /dev/sdb1
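For reference, the interactive fdisk session that creates sdb1 typically amounts to the following keystrokes at the fdisk prompts (a minimal sketch, accepting the defaults so the partition spans the whole disk):
n        (new partition)
p        (primary)
1        (partition number 1)
<Enter>  (default first cylinder)
<Enter>  (default last cylinder)
w        (write the partition table and exit)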
14. On node1, create the quorum disk on sdb1
[root@node1 ~]# mkqdisk -c /dev/sdb1 -l testqdisk
mkqdisk v3.0.12.1
Writing new quorum disk label 'testqdisk' to /dev/sdb1.
WARNING: About to destroy all data on /dev/sdb1; proceed [N/y] ? y
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...
[root@node1 ~]#
[root@node1 ~]# mkqdisk -L
mkqdisk v3.0.12.1
/dev/block/8:17:
/dev/disk/by-id/scsi-1IET_00010001-part1:
/dev/disk/by-path/ip-192.168.10.150:3260-iscsi-iqn.2016-08.disk.rh7:disk100m-lun-1-part1:
/dev/sdb1:
Magic: eb7a62c2
Label: testqdisk
Created: Mon May 22 22:52:01 2017
Host: node1.localdomain
Kernel Sector Size: 512
Recorded Sector Size: 512
[root@node1 ~]#
15. Check the quorum disk on node2; it is recognized correctly there as well
[root@node2 ~]# partprobe /dev/sdb1
[root@node2 ~]# mkqdisk -L
mkqdisk v3.0.12.1
/dev/block/8:17:
/dev/disk/by-id/scsi-1IET_00010001-part1:
/dev/disk/by-path/ip-192.168.10.150:3260-iscsi-iqn.2016-08.disk.rh7:disk100m-lun-1-part1:
/dev/sdb1:
Magic: eb7a62c2
Label: testqdisk
Created: Mon May 22 22:52:01 2017
Host: node1.localdomain
Kernel Sector Size: 512
Recorded Sector Size: 512
16. Configure the cluster to use this quorum disk
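This is set in the cluster's quorum-disk (QDisk) configuration in luci; in /etc/cluster/cluster.conf it should end up as a quorumd entry keyed on the label written by mkqdisk, roughly like the sketch below (the interval, tko and votes values are illustrative assumptions, and expected_votes is raised to account for the quorum disk's vote):
<quorumd label="testqdisk" interval="1" tko="10" votes="1"/>
<cman expected_votes="3"/>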
17. Restart the cluster so that the quorum disk takes effect
[root@node1 ~]# ccs -h node1 --stopall
node1 password:
Stopped node2.localdomain
Stopped node1.localdomain
[root@node1 ~]# ccs -h node1 --startall
Started node2.localdomain
Started node1.localdomain
[root@node1 ~]#
18. Check the cluster status
[root@node1 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1.localdomain 1 Online, Local, rgmanager
node2.localdomain 2 Online, rgmanager
/dev/block/8:17 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:TestServGrp node1.localdomain started
[root@node1 ~]#
19. Check the cluster node status
[root@node1 ~]# ccs_tool lsnode
Cluster name: icpl_cluster, config_version: 21
Nodename Votes Nodeid Fencetype
node1.localdomain 1 1 vcenter_fence
node2.localdomain 1 2 vcenter_fence
20. Check that the cluster configuration is in sync across the nodes
[root@node1 ~]# ccs -h node1 --checkconf
All nodes in sync.
21. Access the web service through the cluster IP
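For example, from any machine on the network (which page is returned depends on which node currently owns the service group):
[root@HAmanager ~]# curl http://192.168.10.103/
This is Node1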
V. Cluster Failover Testing
1. Power off the active node: automatic failover works correctly
[root@node1 ~]# poweroff
[root@node1 ~]# tail -f /var/log/messages
May 23 10:29:26 node1 modclusterd: shutdown succeeded
May 23 10:29:26 node1 rgmanager[2125]: Shutting down
May 23 10:29:26 node1 rgmanager[2125]: Shutting down
May 23 10:29:26 node1 rgmanager[2125]: Stopping service service:TestServGrp
May 23 10:29:27 node1 rgmanager[2125]: [ip] Removing IPv4 address 192.168.10.103/24 from eth0
May 23 10:29:36 node1 rgmanager[2125]: Service service:TestServGrp is stopped
May 23 10:29:36 node1 rgmanager[2125]: Disconnecting from CMAN
May 23 10:29:52 node1 rgmanager[2125]: Exiting
May 23 10:29:53 node1 ricci: shutdown succeeded
May 23 10:29:54 node1 oddjobd: oddjobd shutdown succeeded
May 23 10:29:54 node1 saslauthd[2315]: server_exit : master exited: 2315
[root@node2 ~]# tail -f /var/log/messages
May 23 10:29:45 node2 rgmanager[2130]: Member 1 shutting down
May 23 10:29:45 node2 rgmanager[2130]: Starting stopped service service:TestServGrp
May 23 10:29:45 node2 rgmanager[5688]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
May 23 10:29:49 node2 rgmanager[2130]: Service service:TestServGrp started
May 23 10:30:06 node2 qdiskd[1480]: Node 1 shutdown
May 23 10:30:06 node2 corosync[1437]: [QUORUM] Members[1]: 2
May 23 10:30:06 node2 corosync[1437]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 10:30:06 node2 corosync[1437]: [CPG ] chosen downlist: sender r(0) ip(192.168.10.105) ; members(old:2 left:1)
May 23 10:30:06 node2 corosync[1437]: [MAIN ] Completed service synchronization, ready to provide service.
May 23 10:30:06 node2 kernel: dlm: closing connection to node 1
May 23 10:30:06 node2 qdiskd[1480]: Assuming master role
[root@node2 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1.localdomain 1 Online, Local, rgmanager
node2.localdomain 2 Online, rgmanager
/dev/block/8:17 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:TestServGrp node2.localdomain started
[root@node2 ~]#
2. Stop the application service on the active node: automatic failover works correctly
[root@node2 ~]# /etc/init.d/httpd stop
[root@node2 ~]# tail -f /var/log/messages
May 23 11:14:02 node2 rgmanager[11264]: [script] Executing /etc/init.d/httpd status
May 23 11:14:02 node2 rgmanager[11289]: [script] script:icpl: status of /etc/init.d/httpd failed (returned 3)
May 23 11:14:02 node2 rgmanager[2127]: status on script "httpd" returned 1 (generic error)
May 23 11:14:02 node2 rgmanager[2127]: Stopping service service:TestServGrp
May 23 11:14:03 node2 rgmanager[11320]: [script] Executing /etc/init.d/httpd stop
May 23 11:14:03 node2 rgmanager[11384]: [ip] Removing IPv4 address 192.168.10.103/24 from eth0
May 23 11:14:08 node2 ricci[11416]: Executing '/usr/bin/virsh nodeinfo'
May 23 11:14:08 node2 ricci[11418]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/2116732044'
May 23 11:14:09 node2 ricci[11422]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1193918332'
May 23 11:14:13 node2 rgmanager[2127]: Service service:TestServGrp is recovering
May 23 11:14:17 node2 rgmanager[2127]: Service service:TestServGrp is now running on member 1
[root@node1 ~]# tail -f /var/log/messages
May 23 11:14:20 node1 rgmanager[2130]: Recovering failed service service:TestServGrp
May 23 11:14:20 node1 rgmanager[13006]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
May 23 11:14:24 node1 rgmanager[13092]: [script] Executing /etc/init.d/httpd start
May 23 11:14:24 node1 rgmanager[2130]: Service service:TestServGrp started
May 23 11:14:58 node1 rgmanager[13280]: [script] Executing /etc/init.d/httpd status
[root@node1 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1.localdomain 1 Online, Local, rgmanager
node2.localdomain 2 Online, rgmanager
/dev/block/8:17 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:TestServGrp node1.localdomain started
[root@node1 ~]#
3. Stop the network service on the active node: automatic failover works correctly
[root@node1 ~]# service network stop
[root@node2 ~]# tail -f /var/log/messages
May 23 22:11:16 node2 qdiskd[1480]: Assuming master role
May 23 22:11:17 node2 qdiskd[1480]: Writing eviction notice for node 1
May 23 22:11:17 node2 corosync[1437]: [TOTEM ] A processor failed, forming new configuration.
May 23 22:11:18 node2 qdiskd[1480]: Node 1 evicted
May 23 22:11:19 node2 corosync[1437]: [QUORUM] Members[1]: 2
May 23 22:11:19 node2 corosync[1437]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 22:11:19 node2 corosync[1437]: [CPG ] chosen downlist: sender r(0) ip(192.168.10.105) ; members(old:2 left:1)
May 23 22:11:19 node2 corosync[1437]: [MAIN ] Completed service synchronization, ready to provide service.
May 23 22:11:19 node2 kernel: dlm: closing connection to node 1
May 23 22:11:19 node2 rgmanager[2131]: State change: node1.localdomain DOWN
May 23 22:11:19 node2 fenced[1652]: fencing node1.localdomain
May 23 22:11:58 node2 fenced[1652]: fence node1.localdomain success
May 23 22:11:59 node2 rgmanager[2131]: Taking over service service:TestServGrp from down member node1.localdomain
May 23 22:11:59 node2 rgmanager[6145]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
May 23 22:12:03 node2 rgmanager[6234]: [script] Executing /etc/init.d/httpd start
May 23 22:12:03 node2 rgmanager[2131]: Service service:TestServGrp started
May 23 22:12:35 node2 corosync[1437]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 22:12:35 node2 corosync[1437]: [QUORUM] Members[2]: 1 2
May 23 22:12:35 node2 corosync[1437]: [QUORUM] Members[2]: 1 2
May 23 22:12:35 node2 corosync[1437]: [CPG ] chosen downlist: sender r(0) ip(192.168.10.105) ; members(old:1 left:0)
May 23 22:12:35 node2 corosync[1437]: [MAIN ] Completed service synchronization, ready to provide service.
May 23 22:12:41 node2 rgmanager[6425]: [script] Executing /etc/init.d/httpd status
May 23 22:12:43 node2 qdiskd[1480]: Node 1 shutdown
May 23 22:12:55 node2 kernel: dlm: got connection from 1
May 23 22:13:08 node2 rgmanager[2131]: State change: node1.localdomain UP
[root@node2 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1.localdomain 1 Online, Local, rgmanager
node2.localdomain 2 Online, rgmanager
/dev/block/8:17 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:TestServGrp node2.localdomain started
[root@node2 ~]#
Appendix: RHCS Terminology
1. Distributed cluster manager (CMAN, Cluster Manager)
Manages cluster membership and tracks the running state of the member nodes.
2. Distributed Lock Manager (DLM)
Every node runs a DLM daemon; when a user operates on a piece of metadata, the other nodes are notified and may only read that metadata.
3. Configuration file management (CCS, Cluster Configuration System)
Manages the cluster configuration file and keeps it synchronized. Each node runs a CCS daemon; when it detects a change to the configuration file (/etc/cluster/cluster.conf), it immediately propagates the change to the other nodes.
4. Fence device
How it works: when the active host fails, the standby host invokes the fence device, which reboots the failed host; once the fence operation succeeds, the fence device reports back to the standby, and the standby then takes over the services and resources of the failed host.
5. Conga cluster management software
Conga consists of two parts, luci and ricci. luci is the service that runs on the cluster management server (it can also be installed on a node), while ricci runs on every cluster node. Cluster management and configuration is carried out through communication between these two services, and the RHCS cluster can be managed through Conga's web interface.
6. High-availability service management (rgmanager)
Monitors node services and provides service failover: when a service on one node fails, the service is moved to another healthy node.