This article explains how MHA, a high-availability architecture for MySQL, works, and then walks through installing, configuring, and testing it.
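MHA role deployment

An MHA service has two roles: MHA Manager (the management node) and MHA Node (the data node).

MHA Manager: usually deployed on a dedicated machine, or directly on one of the slaves (the latter is not recommended). It can manage several master/slave clusters, each of which is called an application. It does two things:
(1) runs the automatic master switch and failover commands;
(2) runs the other helper scripts: manual master switch, and master/slave status checks.

MHA Node: runs on every MySQL server (master, slave, and manager). It speeds up failover by providing scripts that can parse and purge logs. Its jobs are:
(1) copying the master's binlog data;
(2) comparing the slaves' relay log files;
(3) periodically deleting relay logs without stopping the slave's SQL thread.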
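At present MHA mainly supports a one-master, multi-slave architecture. Building MHA requires at least three database servers in a replication cluster, one master and two slaves: one server acts as the master, one as the standby master, and one as a plain slave. Because at least three servers are needed, Taobao modified MHA with machine cost in mind, and Taobao's TMHA now supports one master and one slave.

We can also run stock MHA with one master and one slave. In that setup, if the master host itself goes down, MHA cannot switch over and cannot recover the missing binlog events. If only the mysqld process on the master crashes, the switch still succeeds and the binlog events can still be recovered.

Official introduction: https://code.google.com/p/mysql-master-ha/

The figure below shows how MHA Manager manages several master/slave replication groups. How MHA works can be summarized as follows:
(1) save the binary log events (binlog events) from the crashed master;
(2) identify the slave with the most recent updates;
(3) apply the differential relay log to the other slaves;
(4) apply the binary log events saved from the master;
(5) promote one slave to be the new master;
(6) point the other slaves at the new master for replication.

MHA components

(1) Manager tools:
- masterha_check_ssh: checks MHA's SSH configuration.
- masterha_check_repl: checks MySQL replication.
- masterha_manager: starts MHA.
- masterha_check_status: checks the current MHA running state.
- masterha_master_monitor: monitors whether the master is down.
- masterha_master_switch: controls failover (automatic or manual).
- masterha_conf_host: adds or removes configured server entries.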
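(2) Node tools (normally triggered by MHA Manager's scripts; no manual operation is needed):
- save_binary_logs: saves and copies the master's binary logs.
- apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves.
- filter_mysqlbinlog: strips unnecessary ROLLBACK events (MHA no longer uses this tool).
- purge_relay_logs: purges relay logs (without blocking the SQL thread).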
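(3) Custom extensions:
- secondary_check_script: checks master availability over multiple network routes;
- master_ip_failover_script: updates the master IP used by the application (the sample script must be edited before use);
- shutdown_script: forcibly shuts down the master node;
- report_script: sends a report;
- init_conf_load_script: loads initial configuration parameters;
- master_ip_online_change_script: updates the master node's IP address during an online switch (the sample script must be edited before use).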
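The six recovery steps summarized earlier can be pictured with a toy model. The sketch below is not MHA code: the `Replica` class, the `fail_over` function, and the event names are hypothetical, and it assumes every slave holds a prefix of the master's binlog. It only shows why the order "find the latest slave, level the others, then apply the saved master events" leaves all servers identical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Replica:
    """Hypothetical stand-in for a slave; `events` is what it has received so far."""
    name: str
    events: List[str] = field(default_factory=list)


def fail_over(master_binlog: List[str], replicas: List[Replica]) -> Tuple[Replica, List[Replica]]:
    """Toy model of the six steps; every replica is assumed to hold a prefix of master_binlog."""
    # Step 1: the crashed master's binlog events have been saved (master_binlog).
    # Step 2: the replica that received the most events is the latest one.
    latest = max(replicas, key=lambda r: len(r.events))
    # Step 3: apply the differential relay log so every replica matches the latest one.
    for r in replicas:
        r.events.extend(latest.events[len(r.events):])
    # Step 4: apply the saved master events that no replica had received.
    for r in replicas:
        r.events.extend(master_binlog[len(r.events):])
    # Step 5: promote one replica. Step 6: the others replicate from it.
    new_master = latest
    followers = [r for r in replicas if r is not new_master]
    return new_master, followers
```

Calling `fail_over(binlog, [Replica("MHA-S1", binlog[:4]), Replica("MHA-S2", binlog[:2])])` with a five-event binlog promotes MHA-S1 in this model and leaves both servers holding all five events. Real MHA may promote a different server than the latest one, depending on settings such as candidate_master.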
MHA environment preparation

OS: CentOS 6.8
MySQL: 5.7.18
MHA package: MHA 0.57

The hosts used throughout are MHA-M1 (10.180.2.163, master), MHA-S1 (10.180.2.164, slave and candidate master), and MHA-S2 (10.180.2.165, slave, also running MHA Manager).

Installing the MHA Node package

(1) On all nodes, install the Perl module that MHA Node needs (DBD::mysql) and download the MHA packages:

yum install perl-DBD-MySQL -y    (the EPEL repository may be needed)
https://mega.nz/#F!G4oRjARB!SWzFS59bUv9VrKwdAeIGVw (MHA 0.57)

(2) Install MHA Node on all nodes (including the Manager node):

tar xf mha4mysql-node-0.57.tar.gz
cd mha4mysql-node-0.57
perl Makefile.PL
make && make install

After installation the following files exist:

[root@MHA-S1 bin]# ll
total 48
-r-xr-xr-x 1 root root 16381 Aug 7 14:06 apply_diff_relay_logs
-r-xr-xr-x 1 root root 4807 Aug 7 14:06 filter_mysqlbinlog
lrwxrwxrwx 1 root root 26 Aug 8 17:10 mysql -> /usr/local/mysql/bin/mysql
lrwxrwxrwx 1 root root 32 Aug 8 17:09 mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog
-r-xr-xr-x 1 root root 8261 Aug 7 14:06 purge_relay_logs
-rwxr-xr-x 1 root root 314 Aug 8 16:21 purge_relay.sh
-r-xr-xr-x 1 root root 7525 Aug 7 14:06 save_binary_logs
[root@MHA-S1 bin]# pwd
/usr/local/bin

Add the directory to the system PATH:

echo "export PATH=\$PATH:/usr/local/bin" >> /etc/profile
source /etc/profile

Installing the MHA Manager package

tar xf mha4mysql-manager-0.57.tar.gz
cd mha4mysql-manager-0.57
perl Makefile.PL
make && make install

After installation the following scripts exist under /usr/local/bin:

[root@MHA-S2 bin]# pwd
/usr/local/bin
[root@MHA-S2 bin]# ll
total 140
-r-xr-xr-x 1 root root 16381 Aug 7 14:07 apply_diff_relay_logs
-r-xr-xr-x 1 root root 4807 Aug 7 14:07 filter_mysqlbinlog
-rwxr-xr-x 1 root root 166 Aug 9 17:18 manager.sh
-r-xr-xr-x 1 root root 1995 Aug 7 17:28 masterha_check_repl
-r-xr-xr-x 1 root root 1779 Aug 7 17:28 masterha_check_ssh
-r-xr-xr-x 1 root root 1865 Aug 7 17:28 masterha_check_status
-r-xr-xr-x 1 root root 3201 Aug 7 17:28 masterha_conf_host
-r-xr-xr-x 1 root root 2517 Aug 7 17:28 masterha_manager
-r-xr-xr-x 1 root root 2165 Aug 7 17:28 masterha_master_monitor
-r-xr-xr-x 1 root root 2373 Aug 7 17:28 masterha_master_switch
-r-xr-xr-x 1 root root 5171 Aug 7 17:28 masterha_secondary_check
-r-xr-xr-x 1 root root 1739 Aug 7 17:28 masterha_stop
-rwxr-xr-x 1 root root 2169 Aug 9 10:49 master_ip_failover
-rwxr-xr-x 1 root root 3648 Aug 7 17:30 master_ip_failover.old
-rwxr-xr-x 1 root root 10369 Aug 12 21:33 master_ip_online_change
-rwxr-xr-x 1 root root 9870 Aug 7 17:30 master_ip_online_change.old
lrwxrwxrwx 1 root root 26 Aug 8 17:10 mysql -> /usr/local/mysql/bin/mysql
lrwxrwxrwx 1 root root 32 Aug 8 17:09 mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog
-rw------- 1 root root 0 Aug 12 20:04 nohup.out
-rwxr-xr-x 1 root root 11867 Aug 7 17:30 power_manager
-r-xr-xr-x 1 root root 8261 Aug 7 14:07 purge_relay_logs
-rwxr-xr-x 1 root root 314 Aug 8 16:20 purge_relay.sh
-r-xr-xr-x 1 root root 7525 Aug 7 14:07 save_binary_logs
-rwxr-xr-x 1 root root 1360 Aug 7 17:30 send_report

Copy the sample scripts to /usr/local/bin. They are available once the package is unpacked, and this step is optional: the scripts are incomplete templates that the developers left for users to adapt. If you enable any configuration parameter that points at one of these scripts without having edited the script, MHA throws an error. This cost me a lot of time.

[root@MHA-S2 scripts]# ll

Configuring passwordless SSH login

ssh-keygen
ssh-copy-id root@xxx

(xxx must cover every host including the local one, otherwise the later check-ssh step fails.)

Setting up master/slave replication

See the earlier document on building a dual-master replication environment for the details. Make sure replication is running on both slaves:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

Set read_only on both slaves. The slaves serve read traffic; the setting is not written to the configuration file because a slave may be promoted to master at any time.

root@localhost:mysql3306.sock [(none)]>set global read_only=1

Create the monitoring user (run on the master):

grant all privileges on *.* to root@'%' identified by '123456';
flush privileges;

Replication is now in place. The next step is configuring MHA.

MHA environment configuration

(1) Create the MHA working directory:

mkdir -p /etc/mha

Edit the app1.cnf configuration file.

[app1.cnf listing: the option names and most values did not survive. Recoverable fragments: paths under /var/log/masterha/app1/, /var/log/masterha/, /data/mysql/ and /usr/local/bin/; the secondary check command "/usr/local/bin/masterha_secondary_check -s MHA-S1 -s MHA-..."; and server entries for MHA-S1 and the other hosts, each with a port setting.]

Two of the per-server options in that file deserve a note:

candidate_master: marks the server as a candidate master. With this set, the slave is promoted to master on failover even if it is not the slave with the most recent events in the cluster.

check_repl_delay: by default, MHA does not choose a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering that slave would take a long time. With check_repl_delay=0, MHA ignores replication delay when it chooses the new master. This is very useful for a host with candidate_master=1, because it guarantees that the candidate becomes the new master during the switch.

(2) Set how relay logs are purged (on each slave):

set global relay_log_purge=0

Note: during a switch, recovering the slaves depends on information in the relay logs, so automatic relay log purging must be turned OFF and the relay logs purged manually. By default a slave deletes its relay logs automatically once the SQL thread has executed them. In an MHA environment those relay logs may be needed to recover other slaves, so automatic deletion has to be disabled.

Periodic purging has to take replication delay into account. On an ext3 file system, deleting a large file takes a while and can cause serious replication delay. To avoid that, a hard link to the relay log is created temporarily, because on Linux deleting a large file through a hard link is fast. (Dropping a large table in MySQL is often done with a hard link for the same reason.)

MHA Node includes the purge_relay_logs tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds for the SQL thread to switch to a new relay log, and then executes SET GLOBAL relay_log_purge=0.

The purge_relay_logs script takes these parameters:

--user: MySQL user name
--password: MySQL password
--port: port number
--workdir: where the hard links to the relay logs are created; the default is /var/tmp. Creating a hard link across partitions fails, so point this at a location on the same partition as the relay logs. After the script succeeds, the hard-linked relay log files are deleted.
--disable_relay_log_purge: by default, if relay_log_purge=1 the script purges nothing and exits. With this parameter, when relay_log_purge=1 the script sets relay_log_purge to 0, purges the relay logs, and leaves the parameter OFF at the end.

(3) Schedule a periodic relay log cleanup script (for example once a day, on all servers)

[purge_relay.sh listing on MHA-S2: the variable assignments did not survive. The script checks $log_dir and then runs purge_relay_logs with --user=$user --password=$... --disable_relay_log_purge --port=$port --workdir=$work_dir, appending the output to $log_dir/purge_relay_logs.log.]

Add it to crontab:

[root@MHA-S2 bin]# crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh

Run the script by hand once to see whether it reports any error.

Checking the SSH configuration

[root@MHA-S2 bin]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Mon Aug 14 18:07:02 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Aug 14 18:07:02 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Aug 14 18:07:02 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Mon Aug 14 18:07:02 2017 - [info] Starting SSH connection tests..
Mon Aug 14 18:07:03 2017 - [debug]
Mon Aug 14 18:07:02 2017 - [debug] Connecting via SSH from root@MHA-M1(10.180.2.163:22) to root@MHA-S1(10.180.2.164:22)..
Mon Aug 14 18:07:02 2017 - [debug] ok.
Mon Aug 14 18:07:02 2017 - [debug] Connecting via SSH from root@MHA-M1(10.180.2.163:22) to root@MHA-S2(10.180.2.165:22)..
Mon Aug 14 18:07:03 2017 - [debug] ok.
Mon Aug 14 18:07:03 2017 - [debug]
Mon Aug 14 18:07:03 2017 - [debug] Connecting via SSH from root@MHA-S1(10.180.2.164:22) to root@MHA-M1(10.180.2.163:22)..
Mon Aug 14 18:07:03 2017 - [debug] ok.
Mon Aug 14 18:07:03 2017 - [debug] Connecting via SSH from root@MHA-S1(10.180.2.164:22) to root@MHA-S2(10.180.2.165:22)..
Mon Aug 14 18:07:03 2017 - [debug] ok.
Mon Aug 14 18:07:04 2017 - [debug]
Mon Aug 14 18:07:03 2017 - [debug] Connecting via SSH from root@MHA-S2(10.180.2.165:22) to root@MHA-M1(10.180.2.163:22)..
Mon Aug 14 18:07:03 2017 - [debug] ok.
Mon Aug 14 18:07:04 2017 - [debug] Connecting via SSH from root@MHA-S2(10.180.2.165:22) to root@MHA-S1(10.180.2.164:22)..
Mon Aug 14 18:07:04 2017 - [debug] ok.
Mon Aug 14 18:07:04 2017 - [info] All SSH connection tests passed successfully.

Checking the whole replication environment

Running masterha_check_repl reports an error:

Tue Aug 8 17:46:31 2017 - [info] Checking master_ip_failover_script status:
Tue Aug 8 17:46:31 2017 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=MHA-M1 --orig_master_ip=10.180.2.163 --orig_master_port=3306
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 93.
Execution of /usr/local/bin/master_ip_failover aborted due to compilation errors.
Tue Aug 8 17:46:31 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln229] Failed to get master_ip_failover_script status with return code 255:0.
Tue Aug 8 17:46:31 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/bin/masterha_check_repl line 48
Tue Aug 8 17:46:31 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Tue Aug 8 17:46:31 2017 - [info] Got exit code 1 (Not master dead).
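The compilation error above comes from a FIXME_xxx placeholder that is still present in the sample master_ip_failover script. A small pre-flight check can catch this before a script is referenced from app1.cnf. The following is a minimal sketch, not part of MHA; the function name `find_placeholders` and the assumption that every unfinished spot is marked with a word starting with FIXME are mine.

```python
import re

# Assumption: unfinished spots in the sample scripts are marked with words such as FIXME_xxx.
PLACEHOLDER = re.compile(r"\bFIXME\w*")


def find_placeholders(script_text):
    """Return (line_number, stripped_line) pairs for lines that still contain a placeholder."""
    return [
        (number, line.strip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if PLACEHOLDER.search(line)
    ]

# Typical use: read /usr/local/bin/master_ip_failover into a string, pass it to
# find_placeholders(), and enable master_ip_failover_script only if the result is empty.
```

An empty result does not prove the script is correct; it only rules out this one failure mode.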
It turns out there are two ways to handle failover: a virtual IP address, or a global configuration file. MHA does not prescribe either one and leaves the choice to the user. The virtual IP approach involves other software such as keepalived, and also requires editing the master_ip_failover script. For now, comment out the master_ip_failover_script option in app1.cnf so that the check passes:

#master_ip_failover_script= /usr/local/bin/master_ip_failover

Tue Aug 8 17:49:40 2017 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.

Checking the state of MHA Manager

[root@MHA-S2 mha]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

Starting it manually

[root@MHA-S2 mha]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 16774
[root@MHA-S2 mha]# ps -ef|grep masterha
root 16774 15297 4 17:52 pts/3 00:00:00 perl /usr/local/bin/masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover
[root@MHA-S2 mha]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:16774) is running(0:PING_OK), master:MHA-M1

--remove_dead_master_conf: after a failover, the old master's entry is removed from the configuration file. (If you repair the old master after an unplanned switch and want to add it back to MHA, remember to restore the server1 section in app1.cnf.)

--manager_log: where the log is written.

--ignore_last_failover: by default, if MHA detects consecutive outages less than 8 hours apart, it does not fail over again. The restriction exists to avoid a ping-pong effect. After a switch MHA writes an app1.failover.complete file into its working directory (the /data directory I configured above). If that file is still there at the next switch, the switch is refused unless the file was deleted by hand after the first one. This parameter tells MHA to ignore the file left by the previous switch, so it is set here for convenience.

Checking the startup log

[root@MHA-S2 app1]# vi manager.log
Tue Aug 8 17:52:37 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 8 17:52:37 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Aug 8 17:52:37 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Aug 8 17:52:37 2017 - [info] MHA::MasterMonitor version 0.57.
Tue Aug 8 17:52:38 2017 - [info] GTID failover mode = 1
Tue Aug 8 17:52:38 2017 - [info] Dead Servers:
Tue Aug 8 17:52:38 2017 - [info] Alive Servers:
Tue Aug 8 17:52:38 2017 - [info] MHA-M1(10.180.2.163:3306)
Tue Aug 8 17:52:38 2017 - [info] MHA-S1(10.180.2.164:3306)
Tue Aug 8 17:52:38 2017 - [info] MHA-S2(10.180.2.165:3306)
Tue Aug 8 17:52:38 2017 - [info] Alive Slaves:
Tue Aug 8 17:52:38 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Tue Aug 8 17:52:38 2017 - [info] GTID ON
Tue Aug 8 17:52:38 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Tue Aug 8 17:52:38 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Aug 8 17:52:38 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Tue Aug 8 17:52:38 2017 - [info] GTID ON
Tue Aug 8 17:52:38 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Tue Aug 8 17:52:38 2017 - [info] Current Alive Master: MHA-M1(10.180.2.163:3306)
Tue Aug 8 17:52:38 2017 - [info] Checking slave configurations..
Tue Aug 8 17:52:38 2017 - [info] Checking replication filtering settings..
Tue Aug 8 17:52:38 2017 - [info] binlog_do_db= , binlog_ignore_db=
Tue Aug 8 17:52:38 2017 - [info] Replication filtering check ok.
Tue Aug 8 17:52:38 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Aug 8 17:52:38 2017 - [info] Checking SSH publickey authentication settings on the current master..
Tue Aug 8 17:52:38 2017 - [info] HealthCheck: SSH to MHA-M1 is reachable.
Tue Aug 8 17:52:38 2017 - [info]
MHA-M1(10.180.2.163:3306) (current master)
 +--MHA-S1(10.180.2.164:3306)
 +--MHA-S2(10.180.2.165:3306)
Tue Aug 8 17:52:38 2017 - [warning] master_ip_failover_script is not defined.
Tue Aug 8 17:52:38 2017 - [warning] shutdown_script is not defined.
Tue Aug 8 17:52:38 2017 - [info] Set master ping interval 1 seconds.
Tue Aug 8 17:52:38 2017 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s MHA-S1 -s MHA-S2
Tue Aug 8 17:52:38 2017 - [info] Starting ping health check on MHA-M1(10.180.2.163:3306)..
Tue Aug 8 17:52:38 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

Configuring the VIP

The VIP can be managed in two ways: keepalived can float the virtual IP, or a script can bring the virtual IP up (no keepalived, heartbeat, or similar software needed). Only the script approach is shown here: edit master_ip_failover so that it manages the VIP.

First bring the VIP up on the current master:

[root@MHA-M1 ~]# /sbin/ifconfig eth2:1 10.180.2.168/19

The script:

[root@MHA-S2 bin]# cat master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

# The VIP and the interface alias (eth2:1) it is bound to.
my $vip           = '10.180.2.168/19';
my $key           = '1';
my $ssh_start_vip = "/sbin/ifconfig eth2:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth2:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {
        # Called while the dead master is being shut down: take the VIP off the old master.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # Called after the new master is recovered: bring the VIP up there.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        # Called by masterha_check_repl and at manager startup.
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# Bring the VIP up on the new master over SSH.
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# Take the VIP down on the old master over SSH (skipped when no ssh_user is given).
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

In app1.cnf, uncomment the master_ip_failover_script line that was commented out earlier, then run the MHA check again:

[root@MHA-S2 bin]# masterha_check_repl --conf=/etc/mha/app1.cnf
Wed Aug 9 10:49:42 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Aug 9 10:49:42 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Wed Aug 9 10:49:42 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Wed Aug 9 10:49:42 2017 - [info] MHA::MasterMonitor version 0.57.
Wed Aug 9 10:49:43 2017 - [info] GTID failover mode = 1
Wed Aug 9 10:49:43 2017 - [info] Dead Servers:
Wed Aug 9 10:49:43 2017 - [info] Alive Servers:
Wed Aug 9 10:49:43 2017 - [info] MHA-M1(10.180.2.163:3306)
Wed Aug 9 10:49:43 2017 - [info] MHA-S1(10.180.2.164:3306)
Wed Aug 9 10:49:43 2017 - [info] MHA-S2(10.180.2.165:3306)
Wed Aug 9 10:49:43 2017 - [info] Alive Slaves:
Wed Aug 9 10:49:43 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 10:49:43 2017 - [info] GTID ON
Wed Aug 9 10:49:43 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 10:49:43 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 10:49:43 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 10:49:43 2017 - [info] GTID ON
Wed Aug 9 10:49:43 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 10:49:43 2017 - [info] Current Alive Master: MHA-M1(10.180.2.163:3306)
Wed Aug 9 10:49:43 2017 - [info] Checking slave configurations..
Wed Aug 9 10:49:43 2017 - [info] Checking replication filtering settings..
Wed Aug 9 10:49:43 2017 - [info] binlog_do_db= , binlog_ignore_db=
Wed Aug 9 10:49:43 2017 - [info] Replication filtering check ok.
Wed Aug 9 10:49:43 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Aug 9 10:49:43 2017 - [info] Checking SSH publickey authentication settings on the current master..
Wed Aug 9 10:49:43 2017 - [info] HealthCheck: SSH to MHA-M1 is reachable.
Wed Aug 9 10:49:43 2017 - [info]
MHA-M1(10.180.2.163:3306) (current master)
 +--MHA-S1(10.180.2.164:3306)
 +--MHA-S2(10.180.2.165:3306)
Wed Aug 9 10:49:43 2017 - [info] Checking replication health on MHA-S1..
Wed Aug 9 10:49:43 2017 - [info] ok.
Wed Aug 9 10:49:43 2017 - [info] Checking replication health on MHA-S2..
Wed Aug 9 10:49:43 2017 - [info] ok.
Wed Aug 9 10:49:43 2017 - [info] Checking master_ip_failover_script status:
Wed Aug 9 10:49:43 2017 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=MHA-M1 --orig_master_ip=10.180.2.163 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/19===
Checking the Status of the script.. OK
MySQL Replication Health is OK.

That completes the installation and configuration of MHA. A simple test follows.

(1) Failover test

Kill the mysqld process on the master by hand and watch the switch:

[root@MHA-S2 tmp]# more manager.log
Wed Aug 9 17:47:11 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Wed Aug 9 17:47:11 2017 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s MHA-S1 -s MHA-S2 --user=root --master_host=MHA-M1 --master_ip=10.180.2.163 --master_port=3306 --master_user=root --master_password=123456 --ping_type=SELECT
Wed Aug 9 17:47:11 2017 - [info] Executing SSH check script: exit 0
Wed Aug 9 17:47:11 2017 - [info] HealthCheck: SSH to MHA-M1 is reachable.
Monitoring server MHA-S1 is reachable, Master is not reachable from MHA-S1. OK.
Monitoring server MHA-S2 is reachable, Master is not reachable from MHA-S2. OK.
Wed Aug 9 17:47:11 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Wed Aug 9 17:47:12 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Aug 9 17:47:12 2017 - [warning] Connection failed 2 time(s)..
Wed Aug 9 17:47:13 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Aug 9 17:47:13 2017 - [warning] Connection failed 3 time(s)..
Wed Aug 9 17:47:14 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Aug 9 17:47:14 2017 - [warning] Connection failed 4 time(s)..
Wed Aug 9 17:47:14 2017 - [warning] Master is not reachable from health checker!
Wed Aug 9 17:47:14 2017 - [warning] Master MHA-M1(10.180.2.163:3306) is not reachable!
Wed Aug 9 17:47:14 2017 - [warning] SSH is reachable.
Wed Aug 9 17:47:14 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Wed Aug 9 17:47:14 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Aug 9 17:47:14 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Wed Aug 9 17:47:14 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Wed Aug 9 17:47:14 2017 - [info] GTID failover mode = 1
Wed Aug 9 17:47:14 2017 - [info] Dead Servers:
Wed Aug 9 17:47:14 2017 - [info] MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Alive Servers:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306)
Wed Aug 9 17:47:14 2017 - [info] Alive Slaves:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Checking slave configurations..
Wed Aug 9 17:47:14 2017 - [info] Checking replication filtering settings..
Wed Aug 9 17:47:14 2017 - [info] Replication filtering check ok.
Wed Aug 9 17:47:14 2017 - [info] Master is down!
Wed Aug 9 17:47:14 2017 - [info] Terminating monitoring script.
Wed Aug 9 17:47:14 2017 - [info] Got exit code 20 (Master dead).
Wed Aug 9 17:47:14 2017 - [info] MHA::MasterFailover version 0.57.
Wed Aug 9 17:47:14 2017 - [info] Starting master failover.
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] * Phase 1: Configuration Check Phase..
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] GTID failover mode = 1
Wed Aug 9 17:47:14 2017 - [info] Dead Servers:
Wed Aug 9 17:47:14 2017 - [info] MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Checking master reachability via MySQL(double check)...
Wed Aug 9 17:47:14 2017 - [info] ok.
Wed Aug 9 17:47:14 2017 - [info] Alive Servers:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306)
Wed Aug 9 17:47:14 2017 - [info] Alive Slaves:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Starting GTID based failover.
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Aug 9 17:47:14 2017 - [info] Executing master IP deactivation script:
Wed Aug 9 17:47:14 2017 - [info] /usr/local/bin/master_ip_failover --orig_master_host=MHA-M1 --orig_master_ip=10.180.2.163 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/24===
Disabling the VIP on old master: MHA-M1
SIOCSIFFLAGS: Cannot assign requested address
Wed Aug 9 17:47:14 2017 - [info] done.
Wed Aug 9 17:47:14 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Aug 9 17:47:14 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] * Phase 3: Master Recovery Phase..
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] The latest binary log file/position on all slaves is 3306-binlog.000003:194
Wed Aug 9 17:47:14 2017 - [info] Retrieved Gtid Set: a5757eae-7981-11e7-82c7-005056b662d3:6-32210
Wed Aug 9 17:47:14 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] The oldest binary log file/position on all slaves is 3306-binlog.000003:194
Wed Aug 9 17:47:14 2017 - [info] Retrieved Gtid Set: a5757eae-7981-11e7-82c7-005056b662d3:6-32210
Wed Aug 9 17:47:14 2017 - [info] Oldest slaves:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] * Phase 3.3: Determining New Master Phase..
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] Searching new master from slaves..
Wed Aug 9 17:47:14 2017 - [info] Candidate masters from the configuration file:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] Non-candidate masters:
Wed Aug 9 17:47:14 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Aug 9 17:47:14 2017 - [info] New master is MHA-S1(10.180.2.164:3306)
Wed Aug 9 17:47:14 2017 - [info] Starting master failover..
Wed Aug 9 17:47:14 2017 - [info]
From:
MHA-M1(10.180.2.163:3306) (current master)
 +--MHA-S1(10.180.2.164:3306)
 +--MHA-S2(10.180.2.165:3306)
To:
MHA-S1(10.180.2.164:3306) (new master)
 +--MHA-S2(10.180.2.165:3306)
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] Waiting all logs to be applied..
Wed Aug 9 17:47:14 2017 - [info] done.
Wed Aug 9 17:47:14 2017 - [info] Getting new master's binlog name and position..
Wed Aug 9 17:47:14 2017 - [info] 3306-binlog.000003:61944788
Wed Aug 9 17:47:14 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='MHA-S1 or 10.180.2.164', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Aug 9 17:47:14 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: 3306-binlog.000003, 61944788, 1c2dc99f-7b57-11e7-a280-005056b665cb:1-2, a5757eae-7981-11e7-82c7-005056b662d3:1-32210
Wed Aug 9 17:47:14 2017 - [info] Executing master IP activate script:
Wed Aug 9 17:47:14 2017 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=MHA-M1 --orig_master_ip=10.180.2.163 --orig_master_port=3306 --new_master_host=MHA-S1 --new_master_ip=10.180.2.164 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/24===
Enabling the VIP - 10.180.2.168/24 on the new master - MHA-S1
Wed Aug 9 17:47:14 2017 - [info] OK.
Wed Aug 9 17:47:14 2017 - [info] Setting read_only=0 on MHA-S1(10.180.2.164:3306)..
Wed Aug 9 17:47:14 2017 - [info] ok.
Wed Aug 9 17:47:14 2017 - [info] ** Finished master recovery successfully.
Wed Aug 9 17:47:14 2017 - [info] * Phase 3: Master Recovery Phase completed.
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] * Phase 4: Slaves Recovery Phase..
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Wed Aug 9 17:47:14 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] -- Slave recovery on host MHA-S2(10.180.2.165:3306) started, pid: 18757. Check tmp log /var/log/masterha/app1/MHA-S2_3306_20170809174714.log if it takes time..
Wed Aug 9 17:47:15 2017 - [info]
Wed Aug 9 17:47:15 2017 - [info] Log messages from MHA-S2 ...
Wed Aug 9 17:47:15 2017 - [info]
Wed Aug 9 17:47:14 2017 - [info] Resetting slave MHA-S2(10.180.2.165:3306) and starting replication from the new master MHA-S1(10.180.2.164:3306)..
Wed Aug 9 17:47:14 2017 - [info] Executed CHANGE MASTER.
Wed Aug 9 17:47:15 2017 - [info] Slave started.
Wed Aug 9 17:47:15 2017 - [info] gtid_wait(1c2dc99f-7b57-11e7-a280-005056b665cb:1-2, a5757eae-7981-11e7-82c7-005056b662d3:1-32210) completed on MHA-S2(10.180.2.165:3306). Executed 0 events.
Wed Aug 9 17:47:15 2017 - [info] End of log messages from MHA-S2.
Wed Aug 9 17:47:15 2017 - [info] -- Slave on host MHA-S2(10.180.2.165:3306) started.
Wed Aug 9 17:47:15 2017 - [info] All new slave servers recovered successfully.
Wed Aug 9 17:47:15 2017 - [info]
Wed Aug 9 17:47:15 2017 - [info] * Phase 5: New master cleanup phase..
Wed Aug 9 17:47:15 2017 - [info]
Wed Aug 9 17:47:15 2017 - [info] Resetting slave info on the new master..
Wed Aug 9 17:47:15 2017 - [info] MHA-S1: Resetting slave info succeeded.
Wed Aug 9 17:47:15 2017 - [info] Master failover to MHA-S1(10.180.2.164:3306) completed successfully.
Wed Aug 9 17:47:15 2017 - [info] Deleted server1 entry from /etc/mha/app1.cnf .
Wed Aug 9 17:47:15 2017 - [info]
----- Failover Report -----
app1: MySQL Master failover MHA-M1(10.180.2.163:3306) to MHA-S1(10.180.2.164:3306) succeeded
Master MHA-M1(10.180.2.163:3306) is down!
Check MHA Manager logs at MHA-S2:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on MHA-M1(10.180.2.163:3306)
Selected MHA-S1(10.180.2.164:3306) as a new master.
MHA-S1(10.180.2.164:3306): OK: Applying all logs succeeded.
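MHA-S1(10.180.2.164:3306): OK: Activated master IP address.
MHA-S2(10.180.2.165:3306): OK: Slave started, replicating from MHA-S1(10.180.2.164:3306)
MHA-S1(10.180.2.164:3306): Resetting slave info succeeded.
Master failover to MHA-S1(10.180.2.164:3306) completed successfully.
Wed Aug 9 17:47:15 2017 - [info] Sending mail..
Unknown option: conf

That is the complete log of the switch. It shows that an MHA failover goes through these main steps:
1. Configuration check phase: checks the configuration of the whole cluster.
2. Dead master handling: removes the virtual IP and powers off the host (I have not implemented the power-off part yet; it needs more study).
3. Copy the binlog events by which the dead master is ahead of the latest slave and save them in a working directory on MHA Manager.
4. Identify the slave with the most recent updates.
5. Apply the binary log events (binlog events) saved from the master.
6. Promote one slave to be the new master.
7. Point the other slaves at the new master for replication.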
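The log line "Searching from candidate_master slaves which have received the latest relay log events" hints at how the new master is chosen. The sketch below is a simplified model of that preference order, written under my own assumptions: the dictionary keys and the function name `pick_new_master` are hypothetical, and real MHA also excludes servers for reasons such as excessive replication delay, which this sketch ignores.

```python
def pick_new_master(replicas):
    """Return the host of the preferred new master, or None if there are no replicas.

    Each replica is a dict with the hypothetical keys 'host',
    'candidate_master' (bool) and 'is_latest' (bool).
    """
    preferences = (
        lambda r: r["candidate_master"] and r["is_latest"],  # candidate that is also up to date
        lambda r: r["candidate_master"],                     # any candidate, even if behind
        lambda r: r["is_latest"],                            # otherwise an up-to-date slave
        lambda r: True,                                      # last resort: any slave
    )
    for wanted in preferences:
        matches = [r for r in replicas if wanted(r)]
        if matches:
            return matches[0]["host"]
    return None
```

With MHA-S1 marked as candidate_master and both slaves equally up to date, as in the log above, this model returns MHA-S1, which matches the "New master is MHA-S1" line.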
Notes:

1. After the failover completes you will find that the MHA Manager monitoring process has exited. The official documentation gives the following explanation and workaround:

    Running MHA Manager from daemontools
    Currently MHA Manager process does not run as a daemon. If failover completed successfully or the master process was killed by accident, the manager stops working. To run as a daemon, daemontools or any external daemon program can be used. Here is an example to run from daemontools.

Here we start the manager from a shell script with nohup instead, so that it keeps running in the background after the terminal session ends. The manager still exits once a failover has completed, so it has to be started again after the old master has rejoined:

    [root@MHA-S2 bin]# more manager.sh
    #!/bin/sh
    nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

2. Once the dead master has been repaired, it can rejoin the existing two-node MHA setup.

Old master:

    root@localhost:mysql3306.sock [tt]>show master status\G
    *************************** 1. row ***************************
    File: 3306-binlog.000004
    Position: 194
    Binlog_Do_DB:
    Binlog_Ignore_DB:
    Executed_Gtid_Set: a5757eae-7981-11e7-82c7-005056b662d3:1-32210
    1 row in set (0.00 sec)

Current master:

    root@localhost:mysql3306.sock [tt]>show master status\G
    *************************** 1. row ***************************
    File: 3306-binlog.000003
    Position: 61945043
    Binlog_Do_DB:
    Binlog_Ignore_DB:
    Executed_Gtid_Set: 1c2dc99f-7b57-11e7-a280-005056b665cb:1-3,
    a5757eae-7981-11e7-82c7-005056b662d3:1-32210
    1 row in set (0.00 sec)

Because GTID is enabled, we can switch over directly with CHANGE MASTER. First compare the data.

Old master:

    root@localhost:mysql3306.sock [tt]>select * from t1;
    +----+------+
    | id | c1   |
    +----+------+
    |  1 | a1   |
    |  2 | a2   |
    |  3 | a3   |
    |  4 | a4   |
    +----+------+
    4 rows in set (0.02 sec)

New master:

    root@localhost:mysql3306.sock [tt]>select * from t1;
    +----+------+
    | id | c1   |
    +----+------+
    |  1 | a1   |
    |  2 | a2   |
    |  3 | a3   |
    |  4 | a4   |
    |  5 | a5   |
    +----+------+

On the old master, run CHANGE MASTER TO directly:

    change master to master_host='MHA-S1',master_user='repl',master_password='123456',master_port=3306,master_auto_position=1;
    start slave;

Check the output:

    root@localhost:mysql3306.sock [tt]>show slave status\G
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: MHA-S1
    Master_User: repl
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File: 3306-binlog.000003
    Read_Master_Log_Pos: 61945043
    Relay_Log_File: MHA-M1-relay-bin.000004
    Relay_Log_Pos: 715
    Relay_Master_Log_File: 3306-binlog.000003
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes

Check whether the missing data has been filled in:

    root@localhost:mysql3306.sock [tt]>select * from t1;
    +----+------+
    | id | c1   |
    +----+------+
    |  1 | a1   |
    |  2 | a2   |
    |  3 | a3   |
    |  4 | a4   |
    |  5 | a5   |
    +----+------+

The missing row has arrived, so the old master has joined replication without problems.

Finally, app1.cnf has to be edited to put server1 back (the failover removed that entry):

    [server1]
    hostname=MHA-M1
    port=3306

Before restarting the monitor, check the MHA replication status:

    [root@MHA-S2 tmp]# masterha_check_repl --conf=/etc/mha/app1.cnf
    Sat Aug 12 20:37:01 2017 - [info] Replication filtering check ok.
    Sat Aug 12 20:37:01 2017 - [error][/usr/local/share/perl5/MHA/Server.pm, ln398] MHA-M1(10.180.2.163:3306): User repl does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
    Sat Aug 12 20:37:01 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/share/perl5/MHA/ServerManager.pm line 1403
    Sat Aug 12 20:37:01 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
    Sat Aug 12 20:37:01 2017 - [info] Got exit code 1 (Not master dead).

The check reports a privilege problem: the repl user is missing on MHA-M1. Fix it there, with binary logging disabled for the session so the statement is not replicated:

    MHA-M1:
    set session sql_log_bin=OFF;
    grant replication slave on *.* to repl@'%' identified by '123456';
    set session sql_log_bin=ON;

Run the MHA status check again:

    masterha_check_repl --conf=/etc/mha/app1.cnf
    Sat Aug 12 20:41:14 2017 - [info] Checking replication health on MHA-M1..
    Sat Aug 12 20:41:14 2017 - [info] ok.
    Sat Aug 12 20:41:14 2017 - [info] Checking replication health on MHA-S2..
    Sat Aug 12 20:41:14 2017 - [info] ok.
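    Sat Aug 12 20:41:14 2017 - [info] Checking master_ip_failover_script status:
    Sat Aug 12 20:41:14 2017 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=MHA-S1 --orig_master_ip=10.180.2.164 --orig_master_port=3306
    IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/24===
    Checking the Status of the script.. OK
    Sat Aug 12 20:41:15 2017 - [info] OK.
    Sat Aug 12 20:41:15 2017 - [warning] shutdown_script is not defined.
    Sat Aug 12 20:41:15 2017 - [info] Got exit code 0 (Not master dead).
    MySQL Replication Health is OK.

Finally, start the monitor and confirm that it is running:

    [root@MHA-S2 bin]# nohup monitor.sh &
    [root@MHA-S2 bin]# masterha_check_status --conf=/etc/mha/app1.cnf
    app1 (pid:32084) is running(0:PING_OK), master:MHA-S1

(2) Manual online switch test

In many situations the current master has to be migrated to another server: the master has a hardware fault, the RAID controller needs to be rebuilt, the master is being moved to a more powerful machine, and so on. Maintenance on the master degrades performance and causes downtime during which, at the very least, no data can be written. In addition, blocking or killing the sessions that are currently running can leave the old and new masters with inconsistent data. MHA provides a fast switch with a graceful write block. The switch takes only about 0.5-2s, and writes are rejected during that time. In many cases a 0.5-2s write block is acceptable, so switching the master does not require a scheduled maintenance window.

Judging from the switch log later in this article, an MHA online switch roughly proceeds as follows:
1. Check the configuration and replication health and identify the current master.
2. Reject updates on the current master (set read_only, kill application threads, FLUSH TABLES WITH READ LOCK).
3. Wait until the new master has executed all relay logs, then allow writes on it and move the VIP.
4. Point the other slaves at the new master in parallel.
5. Unlock the original master and, if requested, start it as a slave of the new master.
6. Reset the slave information on the new master.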
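Note that the application architecture has to deal with two issues during an online switch:
1. Automatically telling master and slave apart (the machine acting as master may change). Using a VIP essentially solves this.
2. Load balancing (you can define an approximate read/write ratio and the share of load each machine can carry; this has to be reconsidered whenever a machine leaves the cluster).

To keep the data fully consistent and finish the switch as quickly as possible, an MHA online switch only succeeds when all of the following conditions hold; otherwise the switch fails.
1. The IO thread is running on every slave.
2. The SQL thread is running on every slave.
3. In the SHOW SLAVE STATUS output of every slave, Seconds_Behind_Master is less than or equal to running_updates_limit seconds. If running_updates_limit is not given for the switch, it defaults to 1 second.
4. On the master, SHOW PROCESSLIST shows no update that has been running for longer than running_updates_limit seconds.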
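The four conditions above can be sketched as a single pre-check. The function name and the input layout (dictionaries shaped like rows of SHOW SLAVE STATUS and SHOW PROCESSLIST) are assumptions made for this illustration, and treating every statement that does not start with SELECT or SHOW as an "update" is a simplification; MHA performs its own checks internally.

```python
def online_switch_blockers(slaves, master_processlist, running_updates_limit=1):
    """Return the reasons an online switch would be refused (empty list = OK).

    slaves: host -> dict with Slave_IO_Running, Slave_SQL_Running and
            Seconds_Behind_Master, as shown by SHOW SLAVE STATUS.
    master_processlist: list of dicts with Command, Time and Info, as shown
            by SHOW PROCESSLIST on the master.
    """
    blockers = []
    for host, status in slaves.items():
        # Conditions 1 and 2: both replication threads must be running.
        if status["Slave_IO_Running"] != "Yes":
            blockers.append(f"{host}: IO thread is not running")
        if status["Slave_SQL_Running"] != "Yes":
            blockers.append(f"{host}: SQL thread is not running")
        # Condition 3: replication lag must stay within the limit.
        lag = status["Seconds_Behind_Master"]
        if lag is None or lag > running_updates_limit:
            blockers.append(f"{host}: replication lag {lag}s exceeds the limit")
    for thread in master_processlist:
        # Condition 4: no long-running update on the master.
        info = (thread.get("Info") or "").strip().lower()
        is_update = bool(info) and not info.startswith(("select", "show"))
        if is_update and thread["Time"] > running_updates_limit:
            blockers.append(f"master: update running for {thread['Time']}s")
    return blockers


# Hypothetical sample data.
healthy = {
    "MHA-M1": {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0},
    "MHA-S2": {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0},
}
lagging = dict(healthy)
lagging["MHA-S2"] = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 30}
processlist = [
    {"Command": "Binlog Dump GTID", "Time": 3738, "Info": None},
    {"Command": "Query", "Time": 0, "Info": "show processlist"},
]

print(online_switch_blockers(healthy, processlist))
print(online_switch_blockers(lagging, processlist))
```

Raising running_updates_limit, as the switch command in this article does with 10000, makes the same lagging slave pass the check.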
The online switch steps are as follows.

First stop the monitor:

    [root@MHA-S2 app1]# masterha_stop --conf=/etc/mha/app1.cnf
    Stopped app1 successfully.

Modify the master_ip_online_change script as follows:

    [root@MHA-S2 bin]# more master_ip_online_change
    #!/usr/bin/env perl

    # Copyright (C) 2011 DeNA Co.,Ltd.
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 2 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write to the Free Software
    # Foundation, Inc.,
    # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

    ## Note: This is a sample script and is not complete. Modify the script based on your environment.

    use strict;
    use warnings FATAL => 'all';

    use Getopt::Long;
    use MHA::DBHelper;
    use MHA::NodeUtil;
    use Time::HiRes qw( sleep gettimeofday tv_interval );
    use Data::Dumper;

    my $_tstart;
    my $_running_interval = 0.1;
    my (
      $command,          $orig_master_host, $orig_master_ip,
      $orig_master_port, $orig_master_user,
      $new_master_host,  $new_master_ip,    $new_master_port,
      $new_master_user,
    );

    my $vip = '10.180.2.168/19';    # Virtual IP
    my $key = "1";
    my $ssh_start_vip = "/sbin/ifconfig eth2:$key $vip";
    my $ssh_stop_vip  = "/sbin/ifconfig eth2:$key down";
    my $ssh_user = "root";
    my $new_master_password  = '123456';
    my $orig_master_password = '123456';

    GetOptions(
      'command=s' => \$command,
      #'ssh_user=s' => \$ssh_user,
      'orig_master_host=s' => \$orig_master_host,
      'orig_master_ip=s'   => \$orig_master_ip,
      'orig_master_port=i' => \$orig_master_port,
      'orig_master_user=s' => \$orig_master_user,
      #'orig_master_password=s' => \$orig_master_password,
      'new_master_host=s' => \$new_master_host,
      'new_master_ip=s'   => \$new_master_ip,
      'new_master_port=i' => \$new_master_port,
      'new_master_user=s' => \$new_master_user,
      #'new_master_password=s' => \$new_master_password,
    );

    exit &main();

    sub current_time_us {
      my ( $sec, $microsec ) = gettimeofday();
      my $curdate = localtime($sec);
      return $curdate . " " . sprintf( "%06d", $microsec );
    }

    sub sleep_until {
      my $elapsed = tv_interval($_tstart);
      if ( $_running_interval > $elapsed ) {
        sleep( $_running_interval - $elapsed );
      }
    }

    sub get_threads_util {
      my $dbh                    = shift;
      my $my_connection_id       = shift;
      my $running_time_threshold = shift;
      my $type                   = shift;
      $running_time_threshold = 0 unless ($running_time_threshold);
      $type                   = 0 unless ($type);
      my @threads;

      my $sth = $dbh->prepare("SHOW PROCESSLIST");
      $sth->execute();

      while ( my $ref = $sth->fetchrow_hashref() ) {
        my $id         = $ref->{Id};
        my $user       = $ref->{User};
        my $host       = $ref->{Host};
        my $command    = $ref->{Command};
        my $state      = $ref->{State};
        my $query_time = $ref->{Time};
        my $info       = $ref->{Info};
        $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
        next if ( $my_connection_id == $id );
        next if ( defined($query_time) && $query_time < $running_time_threshold );
        next if ( defined($command)    && $command eq "Binlog Dump" );
        next if ( defined($user)       && $user eq "system user" );
        next
          if ( defined($command)
          && $command eq "Sleep"
          && defined($query_time)
          && $query_time >= 1 );

        if ( $type >= 1 ) {
          next if ( defined($command) && $command eq "Sleep" );
          next if ( defined($command) && $command eq "Connect" );
        }

        if ( $type >= 2 ) {
          next if ( defined($info) && $info =~ m/^select/i );
          next if ( defined($info) && $info =~ m/^show/i );
        }

        push @threads, $ref;
      }
      return @threads;
    }

    sub main {
      if ( $command eq "stop" ) {
        ## Gracefully killing connections on the current master
        # 1. Set read_only= 1 on the new master
        # 2. DROP USER so that no app user can establish new connections
        # 3. Set read_only= 1 on the current master
        # 4. Kill current queries
        # * Any database access failure will result in script die.
        my $exit_code = 1;
        eval {
          ## Setting read_only=1 on the new master (to avoid accident)
          my $new_master_handler = new MHA::DBHelper();

          # args: hostname, port, user, password, raise_error(die_on_error)_or_not
          $new_master_handler->connect( $new_master_ip, $new_master_port,
            $new_master_user, $new_master_password, 1 );
          print current_time_us() . " Set read_only on the new master.. ";
          $new_master_handler->enable_read_only();
          if ( $new_master_handler->is_read_only() ) {
            print "ok.\n";
          }
          else {
            die "Failed!\n";
          }
          $new_master_handler->disconnect();

          # Connecting to the orig master, die if any database error happens
          my $orig_master_handler = new MHA::DBHelper();
          $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
            $orig_master_user, $orig_master_password, 1 );

          ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
          #$orig_master_handler->disable_log_bin_local();
          #print current_time_us() . " Dropping app user on the orig master..\n";
          #FIXME_xxx_drop_app_user($orig_master_handler);

          ## Waiting for N * 100 milliseconds so that current connections can exit
          my $time_until_read_only = 15;
          $_tstart = [gettimeofday];
          my @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
          while ( $time_until_read_only > 0 && $#threads >= 0 ) {
            if ( $time_until_read_only % 5 == 0 ) {
              printf
    "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
                current_time_us(), $#threads + 1, $time_until_read_only * 100;
              if ( $#threads < 5 ) {
                print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
                  foreach (@threads);
              }
            }
            sleep_until();
            $_tstart = [gettimeofday];
            $time_until_read_only--;
            @threads = get_threads_util( $orig_master_handler->{dbh},
              $orig_master_handler->{connection_id} );
          }

          ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
          print current_time_us() . " Set read_only=1 on the orig master.. ";
          $orig_master_handler->enable_read_only();
          if ( $orig_master_handler->is_read_only() ) {
            print "ok.\n";
          }
          else {
            die "Failed!\n";
          }

          ## Waiting for M * 100 milliseconds so that current update queries can complete
          my $time_until_kill_threads = 5;
          @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
          while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
            if ( $time_until_kill_threads % 5 == 0 ) {
              printf
    "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
                current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
              if ( $#threads < 5 ) {
                print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
                  foreach (@threads);
              }
            }
            sleep_until();
            $_tstart = [gettimeofday];
            $time_until_kill_threads--;
            @threads = get_threads_util( $orig_master_handler->{dbh},
              $orig_master_handler->{connection_id} );
          }

          print "Disabling the VIP on old master: $orig_master_host \n";
          &stop_vip();

          ## Terminating all threads
          print current_time_us() . " Killing all application threads..\n";
          $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
          print current_time_us() . " done.\n";
          #$orig_master_handler->enable_log_bin_local();
          $orig_master_handler->disconnect();

          ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@\n";
          exit $exit_code;
        }
        exit $exit_code;
      }
      elsif ( $command eq "start" ) {
        ## Activating master ip on the new master
        # 1. Create app user with write privileges
        # 2. Moving backup script if needed
        # 3. Register new master's ip to the catalog database

        # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
        # If exit code is 0 or 10, MHA does not abort
        my $exit_code = 10;
        eval {
          my $new_master_handler = new MHA::DBHelper();

          # args: hostname, port, user, password, raise_error_or_not
          $new_master_handler->connect( $new_master_ip, $new_master_port,
            $new_master_user, $new_master_password, 1 );

          ## Set read_only=0 on the new master
          #$new_master_handler->disable_log_bin_local();
          print current_time_us() . " Set read_only=0 on the new master.\n";
          $new_master_handler->disable_read_only();

          ## Creating an app user on the new master
          #print current_time_us() . " Creating app user on the new master..\n";
          #FIXME_xxx_create_app_user($new_master_handler);
          #$new_master_handler->enable_log_bin_local();
          $new_master_handler->disconnect();

          ## Update master ip on the catalog database, etc
          print "Enabling the VIP - $vip on the new master - $new_master_host \n";
          &start_vip();
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@\n";
          exit $exit_code;
        }
        exit $exit_code;
      }
      elsif ( $command eq "status" ) {

        # do nothing
        exit 0;
      }
      else {
        &usage();
        exit 1;
      }
    }

    # A simple system call that enable the VIP on the new master
    sub start_vip() {
      `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    }

    # A simple system call that disable the VIP on the old_master
    sub stop_vip() {
      `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
    }

    sub usage {
      print
    "Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
      die;
    }

Run the switch:

    [root@MHA-S2 tmp]# masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=MHA-M1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

Meaning of the parameters:

--orig_master_is_new_slave: with this option the original master becomes a slave of the new master after the switch. Without it, the original master is not started as a slave.
--running_updates_limit=10000: if the candidate master is lagging behind at switch time, the MHA switch fails. With this option the switch is allowed as long as the lag stays within the given number of seconds. How long the switch actually takes still depends on the size of the relay logs that have to be applied during recovery.

Check the state of each machine after the switch.

S2:

    root@localhost:mysql3306.sock [tt]>show slave status\G
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: MHA-M1
    Master_User: repl
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File: 3306-binlog.000004
    Read_Master_Log_Pos: 748
    Relay_Log_File: MHA-S2-relay-bin.000002
    Relay_Log_Pos: 420
    Relay_Master_Log_File: 3306-binlog.000004
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes

S1:

    root@localhost:mysql3306.sock [tt]>show slave status\G
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: MHA-M1
    Master_User: repl
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File: 3306-binlog.000004
    Read_Master_Log_Pos: 748
    Relay_Log_File: MHA-S1-relay-bin.000002
    Relay_Log_Pos: 420
    Relay_Master_Log_File: 3306-binlog.000004
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes

M1:

    root@localhost:mysql3306.sock [tt]>show slave status\G
    Empty set (0.00 sec)

Log of the online switch:

    [root@MHA-S2 tmp]# more sw.log
    [root@MHA-S2 bin]# masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=MHA-M1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
    Sat Aug 12 21:34:54 2017 - [info] MHA::MasterRotate version 0.57.
    Sat Aug 12 21:34:54 2017 - [info] Starting online master switch..
    Sat Aug 12 21:34:54 2017 - [info]
    Sat Aug 12 21:34:54 2017 - [info] * Phase 1: Configuration Check Phase..
    Sat Aug 12 21:34:54 2017 - [info]
    Sat Aug 12 21:34:54 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Sat Aug 12 21:34:54 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
    Sat Aug 12 21:34:54 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
    Sat Aug 12 21:34:54 2017 - [info] GTID failover mode = 1
    Sat Aug 12 21:34:54 2017 - [info] Current Alive Master: MHA-S1(10.180.2.164:3306)
    Sat Aug 12 21:34:54 2017 - [info] Alive Slaves:
    Sat Aug 12 21:34:54 2017 - [info] MHA-M1(10.180.2.163:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
    Sat Aug 12 21:34:54 2017 - [info] GTID ON
    Sat Aug 12 21:34:54 2017 - [info] Replicating from MHA-S1(10.180.2.164:3306)
    Sat Aug 12 21:34:54 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
    Sat Aug 12 21:34:54 2017 - [info] GTID ON
    Sat Aug 12 21:34:54 2017 - [info] Replicating from MHA-S1(10.180.2.164:3306)

    It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on MHA-S1(10.180.2.164:3306)? (YES/no): yes
    Sat Aug 12 21:35:07 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
    Sat Aug 12 21:35:07 2017 - [info] ok.
    Sat Aug 12 21:35:07 2017 - [info] Checking MHA is not monitoring or doing failover..
    Sat Aug 12 21:35:07 2017 - [info] Checking replication health on MHA-M1..
    Sat Aug 12 21:35:07 2017 - [info] ok.
    Sat Aug 12 21:35:07 2017 - [info] Checking replication health on MHA-S2..
    Sat Aug 12 21:35:07 2017 - [info] ok.
    Sat Aug 12 21:35:07 2017 - [info] MHA-M1 can be new master.
    Sat Aug 12 21:35:07 2017 - [info]
    From:
    MHA-S1(10.180.2.164:3306) (current master)
     +--MHA-M1(10.180.2.163:3306)
     +--MHA-S2(10.180.2.165:3306)

    To:
    MHA-M1(10.180.2.163:3306) (new master)
     +--MHA-S2(10.180.2.165:3306)
     +--MHA-S1(10.180.2.164:3306)

    Starting master switch from MHA-S1(10.180.2.164:3306) to MHA-M1(10.180.2.163:3306)? (yes/NO): yes
    Sat Aug 12 21:35:15 2017 - [info] Checking whether MHA-M1(10.180.2.163:3306) is ok for the new master..
    Sat Aug 12 21:35:15 2017 - [info] ok.
    Sat Aug 12 21:35:15 2017 - [info] MHA-S1(10.180.2.164:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
    Sat Aug 12 21:35:15 2017 - [info] MHA-S1(10.180.2.164:3306): Resetting slave pointing to the dummy host.
    Sat Aug 12 21:35:15 2017 - [info] ** Phase 1: Configuration Check Phase completed.
    Sat Aug 12 21:35:15 2017 - [info]
    Sat Aug 12 21:35:15 2017 - [info] * Phase 2: Rejecting updates Phase..
    Sat Aug 12 21:35:15 2017 - [info]
    Sat Aug 12 21:35:15 2017 - [info] Executing master ip online change script to disable write on the current master:
    Sat Aug 12 21:35:15 2017 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=MHA-S1 --orig_master_ip=10.180.2.164 --orig_master_port=3306 --orig_master_user='root' --new_master_host=MHA-M1 --new_master_ip=10.180.2.163 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
    Unknown option: orig_master_ssh_user
    Unknown option: new_master_ssh_user
    Unknown option: orig_master_is_new_slave
    Unknown option: orig_master_password
    Unknown option: new_master_password
    Sat Aug 12 21:35:15 2017 568580 Set read_only on the new master.. ok.
    Sat Aug 12 21:35:15 2017 573508 Waiting all running 2 threads are disconnected.. (max 1500 milliseconds)
    {'Time' => '272878','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '40','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-S2:46970'}
    {'Time' => '3738','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '55','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-M1:51506'}
    Sat Aug 12 21:35:16 2017 075020 Waiting all running 2 threads are disconnected.. (max 1000 milliseconds)
    {'Time' => '272879','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '40','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-S2:46970'}
    {'Time' => '3739','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '55','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-M1:51506'}
    Sat Aug 12 21:35:16 2017 576059 Waiting all running 2 threads are disconnected.. (max 500 milliseconds)
    {'Time' => '272879','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '40','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-S2:46970'}
    {'Time' => '3739','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '55','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-M1:51506'}
    Sat Aug 12 21:35:17 2017 076940 Set read_only=1 on the orig master.. ok.
    Sat Aug 12 21:35:17 2017 079645 Waiting all running 2 queries are disconnected.. (max 500 milliseconds)
    {'Time' => '272880','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '40','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-S2:46970'}
    {'Time' => '3740','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '55','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-M1:51506'}
    Disabling the VIP on old master: MHA-S1
    Sat Aug 12 21:35:17 2017 683769 Killing all application threads..
    Sat Aug 12 21:35:17 2017 686090 done.
    Sat Aug 12 21:35:17 2017 - [info] ok.
    Sat Aug 12 21:35:17 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
    Sat Aug 12 21:35:17 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
    Sat Aug 12 21:35:17 2017 - [info] ok.
    Sat Aug 12 21:35:17 2017 - [info] Orig master binlog:pos is 3306-binlog.000003:61945043.
    Sat Aug 12 21:35:17 2017 - [info] Waiting to execute all relay logs on MHA-M1(10.180.2.163:3306)..
    Sat Aug 12 21:35:17 2017 - [info] master_pos_wait(3306-binlog.000003:61945043) completed on MHA-M1(10.180.2.163:3306). Executed 0 events.
    Sat Aug 12 21:35:17 2017 - [info] done.
    Sat Aug 12 21:35:17 2017 - [info] Getting new master's binlog name and position..
    Sat Aug 12 21:35:17 2017 - [info] 3306-binlog.000004:748
    Sat Aug 12 21:35:17 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='MHA-M1 or 10.180.2.163', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    Sat Aug 12 21:35:17 2017 - [info] Executing master ip online change script to allow write on the new master:
    Sat Aug 12 21:35:17 2017 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=MHA-S1 --orig_master_ip=10.180.2.164 --orig_master_port=3306 --orig_master_user='root' --new_master_host=MHA-M1 --new_master_ip=10.180.2.163 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
    Unknown option: orig_master_ssh_user
    Unknown option: new_master_ssh_user
    Unknown option: orig_master_is_new_slave
    Unknown option: orig_master_password
    Unknown option: new_master_password
    Sat Aug 12 21:35:17 2017 865209 Set read_only=0 on the new master.
    Enabling the VIP - 10.180.2.168/19 on the new master - MHA-M1
    Sat Aug 12 21:35:17 2017 - [info] ok.
    Sat Aug 12 21:35:17 2017 - [info]
    Sat Aug 12 21:35:17 2017 - [info] * Switching slaves in parallel..
    Sat Aug 12 21:35:17 2017 - [info]
    Sat Aug 12 21:35:17 2017 - [info] -- Slave switch on host MHA-S2(10.180.2.165:3306) started, pid: 2327
    Sat Aug 12 21:35:17 2017 - [info]
    Sat Aug 12 21:35:18 2017 - [info] Log messages from MHA-S2 ...
    Sat Aug 12 21:35:18 2017 - [info]
    Sat Aug 12 21:35:18 2017 - [info] Waiting to execute all relay logs on MHA-S2(10.180.2.165:3306)..
    Sat Aug 12 21:35:18 2017 - [info] master_pos_wait(3306-binlog.000003:61945043) completed on MHA-S2(10.180.2.165:3306). Executed 0 events.
    Sat Aug 12 21:35:18 2017 - [info] done.
    Sat Aug 12 21:35:18 2017 - [info] Resetting slave MHA-S2(10.180.2.165:3306) and starting replication from the new master MHA-M1(10.180.2.163:3306)..
    Sat Aug 12 21:35:18 2017 - [info] Executed CHANGE MASTER.
    Sat Aug 12 21:35:18 2017 - [info] Slave started.
    Sat Aug 12 21:35:18 2017 - [info] End of log messages from MHA-S2 ...
    Sat Aug 12 21:35:18 2017 - [info]
    Sat Aug 12 21:35:18 2017 - [info] -- Slave switch on host MHA-S2(10.180.2.165:3306) succeeded.
    Sat Aug 12 21:35:18 2017 - [info] Unlocking all tables on the orig master:
    Sat Aug 12 21:35:18 2017 - [info] Executing UNLOCK TABLES..
    Sat Aug 12 21:35:18 2017 - [info] ok.
    Sat Aug 12 21:35:18 2017 - [info] Starting orig master as a new slave..
    Sat Aug 12 21:35:18 2017 - [info] Resetting slave MHA-S1(10.180.2.164:3306) and starting replication from the new master MHA-M1(10.180.2.163:3306)..
    Sat Aug 12 21:35:18 2017 - [info] Executed CHANGE MASTER.
    Sat Aug 12 21:35:19 2017 - [info] Slave started.
    Sat Aug 12 21:35:19 2017 - [info] All new slave servers switched successfully.
    Sat Aug 12 21:35:19 2017 - [info]
    Sat Aug 12 21:35:19 2017 - [info] * Phase 5: New master cleanup phase..
    Sat Aug 12 21:35:19 2017 - [info]
    Sat Aug 12 21:35:19 2017 - [info] MHA-M1: Resetting slave info succeeded.
    Sat Aug 12 21:35:19 2017 - [info] Switching master to MHA-M1(10.180.2.163:3306) completed successfully.
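The 0.5-2s write block mentioned earlier can be checked against this log. The sketch below takes the two lines printed by master_ip_online_change when it sets read_only=1 on the original master and read_only=0 on the new master, and computes the time between them. Treating that interval as the write-blocked window is an assumption (it ignores the moment the VIP actually moves), and the helper name script_timestamp is hypothetical.

```python
import re
from datetime import datetime

# Prefix printed by current_time_us(): date, then a 6-digit microsecond field.
STAMP = re.compile(r"^(\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) (\d{6}) ")


def script_timestamp(line):
    """Parse a prefix such as 'Sat Aug 12 21:35:17 2017 076940' into a datetime."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no script timestamp in: {line!r}")
    moment = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
    return moment.replace(microsecond=int(match.group(2)))


# Two lines copied from the switch log above.
read_only_on = "Sat Aug 12 21:35:17 2017 076940 Set read_only=1 on the orig master.. ok."
read_only_off = "Sat Aug 12 21:35:17 2017 865209 Set read_only=0 on the new master."

window = (script_timestamp(read_only_off) - script_timestamp(read_only_on)).total_seconds()
print(f"{window:.3f} s")  # prints "0.788 s"
```

Under that assumption the window in this run is a little under 0.8 seconds, which falls inside the 0.5-2s range quoted for MHA online switches.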