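MHA consists of two parts: the Manager package and the Node package. Details follow.

The Manager package mainly contains the following tools:
masterha_check_ssh        checks MHA's SSH configuration
masterha_check_repl       checks MySQL replication status
masterha_manager          starts MHA
masterha_check_status     checks the current MHA running state
masterha_master_monitor   detects whether the master is down
masterha_master_switch    controls failover (automatic or manual)
masterha_conf_host        adds or removes configured server entries

The Node package (these tools are normally triggered by the MHA Manager scripts and need no manual operation) mainly contains the following tools:
save_binary_logs          saves and copies the master's binary logs
apply_diff_relay_logs     identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog        strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs          purges relay logs (without blocking the SQL thread)

Note:
(1) To minimize data loss when the master goes down because of hardware failure, it is recommended to configure MySQL 5.5 semi-synchronous replication together with MHA.

1.1 Set up the development environment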

# mkdir -p /yangsq/ftp
# useradd -d /yangsq/ftp -s /sbin/nologin uftp
# passwd uftp
# chown -R uftp:uftp /yangsq/ftp

# yum list all|grep vsftpd
# yum -y install vsftpd.x86_64

# cp /etc/vsftpd/vsftpd.conf /etc/vsftpd/vsftpd.conf.bak

# chkconfig vsftpd on
# service vsftpd start

# yum install ftp.x86_64 -y

# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted
# setenforce 0

# ftp 21

yum -y install perl-DBD-MySQL
yum -y install perl-CPAN.x86_64

cd /yangsq/ftp
tar xvf mha4mysql-node-0.54.tar.gz

cd mha4mysql-node-0.54
perl Makefile.PL

make && make install
Installing /usr/local/share/perl5/MHA/BinlogPosFinderXid.pm
Installing /usr/local/share/perl5/MHA/SlaveUtil.pm
Installing /usr/local/share/perl5/MHA/NodeUtil.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinderElp.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFindManager.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinder.pm
Installing /usr/local/share/perl5/MHA/BinlogManager.pm
Installing /usr/local/share/perl5/MHA/NodeConst.pm
Installing /usr/local/share/perl5/MHA/BinlogHeaderParser.pm
Installing /usr/local/share/man/man1/filter_mysqlbinlog.1
Installing /usr/local/share/man/man1/purge_relay_logs.1
Installing /usr/local/share/man/man1/apply_diff_relay_logs.1
Installing /usr/local/share/man/man1/save_binary_logs.1
Installing /usr/local/bin/filter_mysqlbinlog
Installing /usr/local/bin/apply_diff_relay_logs
Installing /usr/local/bin/save_binary_logs
Installing /usr/local/bin/purge_relay_logs
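
After `make install` it can be worth confirming that the four Node tools really ended up on the PATH of every database host. The following is a minimal sketch, not part of MHA itself; the helper name `missing_tools` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print every name from the argument list
# that cannot be found on PATH. Prints nothing when all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The four Node tools listed in the install output above.
missing_tools save_binary_logs apply_diff_relay_logs filter_mysqlbinlog purge_relay_logs
```

An empty result means all four scripts are reachable; any printed name points at a host where the Node package still has to be installed.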

1.4 Install MHA Manager on yaolansvr_slave
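(1) yum reports "No package perl-Log-Dispatch available." The steps below switch to the 163 mirror repo file and add the EPEL repository before installing the Perl modules.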
# mv CentOS-Base.repo CentOS-Base.repo.bak
sftp> put C:\Users\Yaolan\Downloads\CentOS6-Base-163.repo
# yum clean all
# yum makecache
# rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

yum -y install perl-Config-Tiny
yum install perl-Log-Dispatch -y
yum install perl-Parallel-ForkManager -y
yum install perl-Time-HiRes -y

# tar xvf mha4mysql-manager-0.54.tar.gz
# cd mha4mysql-manager-0.54
# perl Makefile.PL 

# make && make install
Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm
Installing /usr/local/share/perl5/MHA/ManagerUtil.pm
Installing /usr/local/share/perl5/MHA/MasterFailover.pm
Installing /usr/local/share/perl5/MHA/MasterMonitor.pm
Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm
Installing /usr/local/share/perl5/MHA/Config.pm
Installing /usr/local/share/perl5/MHA/DBHelper.pm
Installing /usr/local/share/perl5/MHA/HealthCheck.pm
Installing /usr/local/share/perl5/MHA/FileStatus.pm
Installing /usr/local/share/perl5/MHA/MasterRotate.pm
Installing /usr/local/share/perl5/MHA/Server.pm
Installing /usr/local/share/perl5/MHA/ServerManager.pm
Installing /usr/local/share/perl5/MHA/SSHCheck.pm
Installing /usr/local/share/perl5/MHA/ManagerConst.pm
Installing /usr/local/share/man/man1/masterha_check_ssh.1
Installing /usr/local/share/man/man1/masterha_secondary_check.1
Installing /usr/local/share/man/man1/masterha_conf_host.1
Installing /usr/local/share/man/man1/masterha_check_status.1
Installing /usr/local/share/man/man1/masterha_stop.1
Installing /usr/local/share/man/man1/masterha_manager.1
Installing /usr/local/share/man/man1/masterha_master_monitor.1
Installing /usr/local/share/man/man1/masterha_check_repl.1
Installing /usr/local/share/man/man1/masterha_master_switch.1
Installing /usr/local/bin/masterha_manager
Installing /usr/local/bin/masterha_check_ssh
Installing /usr/local/bin/masterha_check_status
Installing /usr/local/bin/masterha_master_monitor
Installing /usr/local/bin/masterha_secondary_check
Installing /usr/local/bin/masterha_conf_host
Installing /usr/local/bin/masterha_check_repl
Installing /usr/local/bin/masterha_stop
Installing /usr/local/bin/masterha_master_switch

(1) ssh-copy-id: command not found. Fix: yum install openssh-clients -y
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa)

# yum install -y openssh-clients
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@
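
The host addresses in the commands above are not shown, so the sketch below uses placeholder names (`db-master`, `db-candidate`, `db-slave` are assumptions, not the real hosts). It only prints one `ssh-copy-id` command per host and executes nothing, which makes it easy to review before running.

```shell
#!/bin/sh
# Hypothetical helper: print one ssh-copy-id command per host.
# The key path matches the key generated by ssh-keygen above.
print_copy_id_cmds() {
    key=/root/.ssh/id_rsa.pub
    for host in "$@"; do
        printf 'ssh-copy-id -i %s root@%s\n' "$key" "$host"
    done
}

# Placeholder host names; replace them with the real node addresses.
print_copy_id_cmds db-master db-candidate db-slave
```

MHA expects key-based SSH between all nodes, not only from the manager outwards, so the same loop would typically be repeated on each host after generating a key there.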

(1) Set datadir and server-id. On the candidate master and the slave, change only server-id.
# vi /usr/mysql/etc/my.cnf
port = 3306

long_query_time =5


# binlog settings
log-bin = /data/mysql/data/mysql-bin.log
binlog_cache_size = 8M

join_buffer_size = 2M
sort_buffer_size = 2M
read_rnd_buffer_size = 2M
read_buffer_size = 2M
max_heap_table_size = 64M
thread_concurrency = 12
query_cache_type = 1
query_cache_size = 32M
ft_min_word_len = 4
thread_stack = 192K
tmp_table_size = 64M


max_connections = 1000
max_connect_errors = 200

innodb_buffer_pool_size = 1G
innodb_additional_mem_pool_size = 16M
innodb_log_buffer_size = 8M
innodb_log_file_size = 512M
innodb_log_files_in_group = 3
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 30

# service mysqld start 
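
Since the same my.cnf is reused on every node with only server-id changed, a quick way to catch a forgotten edit is to read the value back from each copy. This is a small sketch under that assumption; `get_server_id` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the server-id configured in a my.cnf file.
# Accepts both "server-id" and "server_id", with optional spaces around "=".
# If the option appears more than once, the last occurrence is printed.
get_server_id() {
    sed -n 's/^[[:space:]]*server[-_]id[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Usage on a node:
#   get_server_id /usr/mysql/etc/my.cnf
```

Running it on the master, the candidate master and the slave should give three different numbers.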

# mysqldump -A --flush-privileges --lock-all-tables --events --routines --triggers --master-data=2 > /yangsq/ftp/`date +%Y-%m-%d`_all.sql

mysql> grant replication slave on *.* to 'repl1'@'192.168.0.%' identified by '123456';
mysql> flush privileges;

# mysql<2015-06-29_all.sql
# head -n 30 2015-06-29_all.sql |grep -i "change master"
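
With --master-data=2 the dump header carries a commented-out CHANGE MASTER statement, and its file name and position are what the slave needs later. The sketch below pulls both values out of that header; the function name `dump_coordinates` is an assumption for this example.

```shell
#!/bin/sh
# Hypothetical helper: read a dump header on stdin and print
# "<binlog file> <position>" from the first CHANGE MASTER line found.
dump_coordinates() {
    sed -n "s/.*MASTER_LOG_FILE='\([^']*\)', *MASTER_LOG_POS=\([0-9][0-9]*\).*/\1 \2/p" | head -n 1
}

# Sample header line in the format written by --master-data=2.
printf '%s\n' "-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=120;" | dump_coordinates
```

On a real dump the call would be `head -n 30 2015-06-29_all.sql | dump_coordinates`.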

mysql> show slave status;
Empty set (0.00 sec)
mysql> change master to master_host='',master_user='repl1',master_password='123456',master_port=3306,master_log_file='mysql-bin.000008',master_log_pos=120;
mysql> start slave;
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.
Fix: datadir/auto.cnf is identical on the two hosts, even though select uuid() returns different values. Delete auto.cnf on the candidate master and restart the instance.

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_User: repl1
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000008
          Read_Master_Log_Pos: 408
               Relay_Log_File: yaolansvr_slave-relay-bin.000003
                Relay_Log_Pos: 571
        Relay_Master_Log_File: mysql-bin.000008
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
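
The two fields that matter most in this output are Slave_IO_Running and Slave_SQL_Running. A small filter like the one below can turn them into an exit code for scripts; `slave_threads_ok` is a name chosen for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: read "show slave status\G" output on stdin and
# succeed only if both replication threads report Yes.
slave_threads_ok() {
    awk -F': *' '
        /^ *Slave_IO_Running:/  { io  = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Usage on a slave:
#   mysql -e 'show slave status\G' | slave_threads_ok && echo "replication running"
```

The patterns require the colon directly after the field name, so the longer field Slave_SQL_Running_State is not picked up by mistake.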
1.6.3 Set read_only=1 on the other slave nodes (do not write it to my.cnf, so that the candidate master can accept writes once it is promoted to master)
# mysql -e "set global read_only=1"

(1) If the replication user is not created on the candidate master, the following error is reported:
Mon Jun 29 17:28:00 2015 - [info] Alive Slaves:
Mon Jun 29 17:28:00 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:28:00 2015 - [info]     Replicating from
Mon Jun 29 17:28:00 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jun 29 17:28:00 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:28:00 2015 - [info]     Replicating from
Mon Jun 29 17:28:00 2015 - [info]     Not candidate for the new Master (no_master is set)
Mon Jun 29 17:28:00 2015 - [info] Current Alive Master:
Mon Jun 29 17:28:00 2015 - [info] Checking slave configurations..
Mon Jun 29 17:28:00 2015 - [info] Checking replication filtering settings..
Mon Jun 29 17:28:00 2015 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Jun 29 17:28:00 2015 - [info]  Replication filtering check ok.
Mon Jun 29 17:28:00 2015 - [error][/usr/local/share/perl5/MHA/Server.pm, ln382] User repl1 does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.

Create the replication user on the candidate master only. It must be identical to the replication user on the master.
mysql> grant replication slave on *.* to 'repl1'@'192.168.0.%' identified by '123456';
mysql> flush privileges;

Mon Jun 29 18:02:41 2015 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln255] Got MySQL error when connecting :1045:Access denied for user 'monitor'@'' (using password: YES), but this is not mysql crash. Check MySQL server settings.

mysql> grant all privileges on *.* to 'monitor'@'192.168.0.%' identified by '123456';
mysql> flush privileges;

# mkdir -p /etc/mha/app1
# vi /etc/mha/app1/app1.cnf
[server default]
ssh_user=root #### ssh_user is the account set up for SSH trust; the MHA commands run as this account. Mechanism: ssh root@

secondary_check_script=masterha_secondary_check -s -s
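
Most of app1.cnf is not shown here, so the following is only a sketch of what a complete file could look like. The values for user, password, repl_user, ssh_user, ping_interval, the binlog directory and the work directory are taken from commands and log lines elsewhere in these notes. The three host names, the manager_log path and repl_password are assumptions and must be replaced. The function writes to whatever path it is given, so nothing under /etc is touched unless asked.

```shell
#!/bin/sh
# Hypothetical helper: write a sample MHA application config to the path in $1.
# Host names (db-master, db-candidate, db-slave), manager_log and repl_password
# are placeholders or assumptions; the parameter names are standard MHA parameters.
write_app1_cnf() {
    cat > "$1" <<'EOF'
[server default]
manager_workdir=/etc/mha/app1
manager_log=/var/log/manager.log
master_binlog_dir=/data/mysql/data
user=monitor
password=123456
repl_user=repl1
repl_password=123456
ssh_user=root
ping_interval=3

[server1]
hostname=db-master
port=3306

[server2]
hostname=db-candidate
port=3306
candidate_master=1

[server3]
hostname=db-slave
port=3306
no_master=1
EOF
}

# Usage: write_app1_cnf ./app1.cnf.sample
```

Setting master_binlog_dir explicitly matters when binary logs are not under /var/lib/mysql or /var/log/mysql, which are the only locations MHA searches by default.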




1.7.2 Disable automatic relay log purging on all database nodes
# mysql -e "set global relay_log_purge=0"
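
Two things follow from turning automatic purging off. A value set with SET GLOBAL is lost when mysqld restarts, and relay logs now accumulate until something removes them. The Node tool purge_relay_logs is meant for the second point and is usually run from cron. The sketch below only prints a crontab line. The schedule (04:00 daily), the log file and the helper name are assumptions, and a --password option would normally be needed for an account that has one.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab entry that runs purge_relay_logs daily.
# $1 = MySQL user, $2 = work directory, $3 = log file for the job output.
purge_cron_line() {
    printf '0 4 * * * /usr/local/bin/purge_relay_logs --user=%s --disable_relay_log_purge --workdir=%s >> %s 2>&1\n' "$1" "$2" "$3"
}

purge_cron_line monitor /var/tmp /var/log/purge_relay_logs.log
```

Staggering the minute or hour per slave avoids all slaves purging at the same moment.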

# ln -sv /usr/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog

(1) # cp /yangsq/ftp/mha4mysql-manager-0.54/samples/scripts/master_ip_failover /etc/mha/app1/master_ip_failover
The unmodified sample still contains FIXME_xxx placeholders, so either edit the copy or comment out master_ip_failover_script=/etc/mha/app1/master_ip_failover.
Otherwise the following error is reported: Mon Jun 29 17:05:26 2015 - [info]   /etc/mha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host= --orig_master_ip= --orig_master_port=3306 
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /etc/mha/app1/master_ip_failover line 93.
Mon Jun 29 17:52:11 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008 
Mon Jun 29 17:52:11 2015 - [info]   Connecting to root@ 
Failed to save binary log: Binlog not found from /var/lib/mysql,/var/log/mysql! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.

# masterha_check_ssh --conf=/etc/mha/app1/app1.cnf 

# masterha_check_ssh --conf=/etc/mha/app1/app1.cnf 
Mon Jun 29 17:33:39 2015 - [info] Reading server configurations from /etc/mha/app1/app1.cnf..
Mon Jun 29 17:33:39 2015 - [info] MHA::MasterMonitor version 0.54.
Mon Jun 29 17:33:39 2015 - [info] Dead Servers:
Mon Jun 29 17:33:39 2015 - [info] Alive Servers:
Mon Jun 29 17:33:39 2015 - [info]
Mon Jun 29 17:33:39 2015 - [info]
Mon Jun 29 17:33:39 2015 - [info]
Mon Jun 29 17:33:39 2015 - [info] Alive Slaves:
Mon Jun 29 17:33:39 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:33:39 2015 - [info]     Replicating from
Mon Jun 29 17:33:39 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jun 29 17:33:39 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:33:39 2015 - [info]     Replicating from
Mon Jun 29 17:33:39 2015 - [info]     Not candidate for the new Master (no_master is set)
Mon Jun 29 17:33:39 2015 - [info] Current Alive Master:
Mon Jun 29 17:33:39 2015 - [info] Checking slave configurations..
Mon Jun 29 17:33:39 2015 - [info] Checking replication filtering settings..
Mon Jun 29 17:33:39 2015 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Jun 29 17:33:39 2015 - [info]  Replication filtering check ok.
Mon Jun 29 17:33:39 2015 - [info] Starting SSH connection tests..
Mon Jun 29 17:34:21 2015 - [info] All SSH connection tests passed successfully.
Mon Jun 29 17:34:21 2015 - [info] Checking MHA Node version..
Mon Jun 29 17:34:31 2015 - [info]  Version check ok.
Mon Jun 29 17:34:31 2015 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jun 29 17:34:32 2015 - [info] HealthCheck: SSH to is reachable.
Mon Jun 29 17:34:32 2015 - [info] Master MHA Node version is 0.54.
Mon Jun 29 17:34:32 2015 - [info] Checking recovery script configurations on the current master..
Mon Jun 29 17:34:32 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008 
Mon Jun 29 17:34:32 2015 - [info]   Connecting to root@ 
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
  Binlog found at /data/mysql/data, up to mysql-bin.000008
Mon Jun 29 17:34:32 2015 - [info] Master setting check done.
Mon Jun 29 17:34:32 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Jun 29 17:34:32 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host= --slave_ip= --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info  --relay_dir=/data/mysql/data/  --slave_pass=xxx
Mon Jun 29 17:34:32 2015 - [info]   Connecting to root@ 
  Checking slave recovery environment settings..
    Opening /data/mysql/data/relay-log.info ... ok.
    Relay log found at /data/mysql/data, up to yaolansvr_slave-relay-bin.000003
    Temporary relay log file is /data/mysql/data/yaolansvr_slave-relay-bin.000003
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Jun 29 17:34:33 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host= --slave_ip= --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info  --relay_dir=/data/mysql/data/  --slave_pass=xxx
Mon Jun 29 17:34:33 2015 - [info]   Connecting to root@ 
reverse mapping checking getaddrinfo for bogon [] failed - POSSIBLE BREAK-IN ATTEMPT!
  Checking slave recovery environment settings..
    Opening /data/mysql/data/relay-log.info ... ok.
    Relay log found at /data/mysql/data, up to yaolansvr_slave01-relay-bin.000002
    Temporary relay log file is /data/mysql/data/yaolansvr_slave01-relay-bin.000002
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Jun 29 17:34:43 2015 - [info] Slaves settings check done.
Mon Jun 29 17:34:43 2015 - [info] (current master)

Mon Jun 29 17:34:43 2015 - [info] Checking replication health on
Mon Jun 29 17:34:43 2015 - [info]  ok.
Mon Jun 29 17:34:43 2015 - [info] Checking replication health on
Mon Jun 29 17:34:43 2015 - [info]  ok.
Mon Jun 29 17:34:43 2015 - [warning] master_ip_failover_script is not defined.
Mon Jun 29 17:34:43 2015 - [warning] shutdown_script is not defined.
Mon Jun 29 17:34:43 2015 - [info] Got exit code 0 (Not master dead).


# vi /etc/mha/app1/master_ip_failover 
my $vip = '';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down";

# /sbin/ifconfig eth0:1
# /sbin/ifconfig eth0:1 down
# ip a

# masterha_check_repl --conf=/etc/mha/app1/app1.cnf 

# masterha_check_status --conf=/etc/mha/app1/app1.cnf 
app1 is stopped(2:NOT_RUNNING).

# touch /etc/mha/app1/manager.log
# nohup masterha_manager --conf=/etc/mha/app1/app1.cnf < /dev/null > /etc/mha/app1/manager.log 2>&1 &
# tail -f /var/log/manager.log
Tue Jun 30 10:21:25 2015 - [info] Got terminate signal. Exit.
Tue Jun 30 10:21:58 2015 - [info] MHA::MasterMonitor version 0.54.
Tue Jun 30 10:21:58 2015 - [info] Dead Servers:
Tue Jun 30 10:21:58 2015 - [info] Alive Servers:
Tue Jun 30 10:21:58 2015 - [info]
Tue Jun 30 10:21:58 2015 - [info]
Tue Jun 30 10:21:58 2015 - [info]
Tue Jun 30 10:21:58 2015 - [info] Alive Slaves:
Tue Jun 30 10:21:58 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:21:58 2015 - [info]     Replicating from
Tue Jun 30 10:21:58 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:21:58 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:21:58 2015 - [info]     Replicating from
Tue Jun 30 10:21:58 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jun 30 10:21:58 2015 - [info] Current Alive Master:
Tue Jun 30 10:21:58 2015 - [info] Checking slave configurations..
Tue Jun 30 10:21:58 2015 - [info]  read_only=1 is not set on slave
Tue Jun 30 10:21:58 2015 - [warning]  relay_log_purge=0 is not set on slave
Tue Jun 30 10:21:58 2015 - [info]  read_only=1 is not set on slave
Tue Jun 30 10:21:58 2015 - [warning]  relay_log_purge=0 is not set on slave
Tue Jun 30 10:21:58 2015 - [info] Checking replication filtering settings..
Tue Jun 30 10:21:58 2015 - [info]  binlog_do_db= , binlog_ignore_db= 
Tue Jun 30 10:21:58 2015 - [info]  Replication filtering check ok.
Tue Jun 30 10:21:58 2015 - [info] Starting SSH connection tests..
Tue Jun 30 10:22:00 2015 - [info] All SSH connection tests passed successfully.
Tue Jun 30 10:22:00 2015 - [info] Checking MHA Node version..
Tue Jun 30 10:22:01 2015 - [info]  Version check ok.
Tue Jun 30 10:22:01 2015 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jun 30 10:22:01 2015 - [info] HealthCheck: SSH to is reachable.
Tue Jun 30 10:22:01 2015 - [info] Master MHA Node version is 0.54.
Tue Jun 30 10:22:01 2015 - [info] Checking recovery script configurations on the current master..
Tue Jun 30 10:22:01 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000009 
Tue Jun 30 10:22:01 2015 - [info]   Connecting to root@ 
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
  Binlog found at /data/mysql/data/, up to mysql-bin.000009
Tue Jun 30 10:22:01 2015 - [info] Master setting check done.
Tue Jun 30 10:22:01 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Jun 30 10:22:01 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host= --slave_ip= --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info  --relay_dir=/data/mysql/data/  --slave_pass=xxx
Tue Jun 30 10:22:01 2015 - [info]   Connecting to root@ 
  Checking slave recovery environment settings..
    Opening /data/mysql/data/relay-log.info ... ok.
    Relay log found at /data/mysql/data, up to yaolansvr_slave-relay-bin.000006
    Temporary relay log file is /data/mysql/data/yaolansvr_slave-relay-bin.000006
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Jun 30 10:22:02 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host= --slave_ip= --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info  --relay_dir=/data/mysql/data/  --slave_pass=xxx
Tue Jun 30 10:22:02 2015 - [info]   Connecting to root@ 
  Checking slave recovery environment settings..
    Opening /data/mysql/data/relay-log.info ... ok.
    Relay log found at /data/mysql/data, up to yaolansvr_slave01-relay-bin.000005
    Temporary relay log file is /data/mysql/data/yaolansvr_slave01-relay-bin.000005
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Jun 30 10:22:02 2015 - [info] Slaves settings check done.
Tue Jun 30 10:22:02 2015 - [info] (current master)

Tue Jun 30 10:22:02 2015 - [info] Checking master_ip_failover_script status:
Tue Jun 30 10:22:02 2015 - [info]   /etc/mha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host= --orig_master_ip= --orig_master_port=3306 
Tue Jun 30 10:22:02 2015 - [info]  OK.
Tue Jun 30 10:22:02 2015 - [warning] shutdown_script is not defined.
Tue Jun 30 10:22:02 2015 - [info] Set master ping interval 3 seconds.
Tue Jun 30 10:22:02 2015 - [info] Set secondary check script: masterha_secondary_check -s -s
Tue Jun 30 10:22:02 2015 - [info] Starting ping health check on
Tue Jun 30 10:22:02 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

# masterha_check_status --conf=/etc/mha/app1/app1.cnf 
app1 (pid:2243) is running(0:PING_OK), master:
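
For monitoring scripts it can be handy to reduce the masterha_check_status line to its state keyword. A sketch, with `mha_state` as a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: read masterha_check_status output on stdin and print
# the state keyword from the "(<code>:<STATE>)" part, e.g. PING_OK or NOT_RUNNING.
mha_state() {
    sed -n 's/.*([0-9][0-9]*:\([A-Z_0-9]*\)).*/\1/p' | head -n 1
}

# The two status lines that appear in these notes.
printf '%s\n' 'app1 (pid:2243) is running(0:PING_OK), master:' | mha_state
printf '%s\n' 'app1 is stopped(2:NOT_RUNNING).' | mha_state
```

On a manager host the call would be `masterha_check_status --conf=/etc/mha/app1/app1.cnf | mha_state`.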

# masterha_stop --conf=/etc/mha/app1/app1.cnf

# tail -f /var/log/manager.log
Tue Jun 30 10:28:02 2015 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Jun 30 10:28:02 2015 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --binlog_prefix=mysql-bin
Tue Jun 30 10:28:03 2015 - [info] Executing seconary network check script: masterha_secondary_check -s -s  --user=root  --master_host=  --master_ip=  --master_port=3306
Tue Jun 30 10:28:03 2015 - [info] HealthCheck: SSH to is reachable.
Monitoring server is reachable, Master is not reachable from OK.
Monitoring server is reachable, Master is not reachable from OK.
Tue Jun 30 10:28:03 2015 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Jun 30 10:28:05 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:05 2015 - [warning] Connection failed 1 time(s)..
Tue Jun 30 10:28:08 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:08 2015 - [warning] Connection failed 2 time(s)..
Tue Jun 30 10:28:11 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:11 2015 - [warning] Connection failed 3 time(s)..
Tue Jun 30 10:28:11 2015 - [warning] Master is not reachable from health checker!
Tue Jun 30 10:28:11 2015 - [warning] Master is not reachable!
Tue Jun 30 10:28:11 2015 - [warning] SSH is reachable.
Tue Jun 30 10:28:11 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1/app1.cnf again, and trying to connect to all servers to check server status..
Tue Jun 30 10:28:11 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 30 10:28:11 2015 - [info] Reading application default configurations from /etc/mha/app1/app1.cnf..
Tue Jun 30 10:28:11 2015 - [info] Reading server configurations from /etc/mha/app1/app1.cnf..
Tue Jun 30 10:28:12 2015 - [info] Dead Servers:
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] Alive Servers:
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] Alive Slaves:
Tue Jun 30 10:28:12 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info]     Replicating from
Tue Jun 30 10:28:12 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:12 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info]     Replicating from
Tue Jun 30 10:28:12 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:12 2015 - [info] Checking slave configurations..
Tue Jun 30 10:28:12 2015 - [warning]  relay_log_purge=0 is not set on slave
Tue Jun 30 10:28:12 2015 - [warning]  relay_log_purge=0 is not set on slave
Tue Jun 30 10:28:12 2015 - [info] Checking replication filtering settings..
Tue Jun 30 10:28:12 2015 - [info]  Replication filtering check ok.
Tue Jun 30 10:28:12 2015 - [info] Master is down!
Tue Jun 30 10:28:12 2015 - [info] Terminating monitoring script.
Tue Jun 30 10:28:12 2015 - [info] Got exit code 20 (Master dead).
Tue Jun 30 10:28:12 2015 - [info] MHA::MasterFailover version 0.54.
Tue Jun 30 10:28:12 2015 - [info] Starting master failover.
Tue Jun 30 10:28:12 2015 - [info] 
Tue Jun 30 10:28:12 2015 - [info] * Phase 1: Configuration Check Phase..
Tue Jun 30 10:28:12 2015 - [info] 
Tue Jun 30 10:28:12 2015 - [info] Dead Servers:
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] Checking master reachability via mysql(double check)..
Tue Jun 30 10:28:12 2015 - [info]  ok.
Tue Jun 30 10:28:12 2015 - [info] Alive Servers:
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] Alive Slaves:
Tue Jun 30 10:28:12 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info]     Replicating from
Tue Jun 30 10:28:12 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:12 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info]     Replicating from
Tue Jun 30 10:28:12 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:12 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jun 30 10:28:12 2015 - [info] 
Tue Jun 30 10:28:12 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Jun 30 10:28:12 2015 - [info] 
Tue Jun 30 10:28:12 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Jun 30 10:28:12 2015 - [info] Executing master IP deactivatation script:
Tue Jun 30 10:28:12 2015 - [info]   /etc/mha/app1/master_ip_failover --orig_master_host= --orig_master_ip= --orig_master_port=3306 --command=stopssh --ssh_user=root  
Disabling the VIP on old master: 
Tue Jun 30 10:28:13 2015 - [info]  done.
Tue Jun 30 10:28:13 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Jun 30 10:28:13 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Jun 30 10:28:13 2015 - [info] 
Tue Jun 30 10:28:13 2015 - [info] * Phase 3: Master Recovery Phase..
Tue Jun 30 10:28:13 2015 - [info] 
Tue Jun 30 10:28:13 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Jun 30 10:28:13 2015 - [info] 
Tue Jun 30 10:28:13 2015 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:120
Tue Jun 30 10:28:13 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Jun 30 10:28:13 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info]     Replicating from
Tue Jun 30 10:28:13 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:13 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info]     Replicating from
Tue Jun 30 10:28:13 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:13 2015 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:120
Tue Jun 30 10:28:13 2015 - [info] Oldest slaves:
Tue Jun 30 10:28:13 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info]     Replicating from
Tue Jun 30 10:28:13 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:13 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info]     Replicating from
Tue Jun 30 10:28:13 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:13 2015 - [info] 
Tue Jun 30 10:28:13 2015 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. XXXXXXXXXXXXXXXXXXXXXX
Tue Jun 30 10:28:13 2015 - [info] 
Tue Jun 30 10:28:13 2015 - [info] Fetching dead master's binary logs..
Tue Jun 30 10:28:13 2015 - [info] Executing command on the dead master save_binary_logs --command=save --start_file=mysql-bin.000009  --start_pos=120 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54
  Creating /var/tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000009 pos 120 to mysql-bin.000009 EOF into /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog ..
  Dumping binlog format description event, from position 0 to 120.. ok.
  Dumping effective binlog data from /data/mysql/data//mysql-bin.000009 position 120 to tail(143).. ok.
 Concat succeeded.
Tue Jun 30 10:28:14 2015 - [info] scp from root@ to local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog succeeded.
Tue Jun 30 10:28:14 2015 - [info] HealthCheck: SSH to is reachable.
Tue Jun 30 10:28:15 2015 - [info] HealthCheck: SSH to is reachable.
Tue Jun 30 10:28:15 2015 - [info] 
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.3: Determining New Master Phase..
Tue Jun 30 10:28:15 2015 - [info] 
Tue Jun 30 10:28:15 2015 - [info] XXXXXXXXX  Finding the latest slave that has all relay logs for recovering other slaves..
Tue Jun 30 10:28:15 2015 - [info] XXXXXXXXX  All slaves received relay logs to the same position. No need to resync each other.
Tue Jun 30 10:28:15 2015 - [info] Searching new master from slaves..
Tue Jun 30 10:28:15 2015 - [info]  Candidate masters from the configuration file:
Tue Jun 30 10:28:15 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:15 2015 - [info]     Replicating from
Tue Jun 30 10:28:15 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:15 2015 - [info]  Non-candidate masters:
Tue Jun 30 10:28:15 2015 - [info]  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:15 2015 - [info]     Replicating from
Tue Jun 30 10:28:15 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:15 2015 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Jun 30 10:28:15 2015 - [info] New master is
Tue Jun 30 10:28:15 2015 - [info] Starting master failover..
Tue Jun 30 10:28:15 2015 - [info] 
From: (current master)

To: (new master)
Tue Jun 30 10:28:15 2015 - [info] 
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Jun 30 10:28:15 2015 - [info] 
Tue Jun 30 10:28:15 2015 - [info] XXXXXXX  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 30 10:28:15 2015 - [info] Sending binlog..
Tue Jun 30 10:28:15 2015 - [info] scp from local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog to root@ succeeded.
Tue Jun 30 10:28:15 2015 - [info] 
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Jun 30 10:28:15 2015 - [info] 
Tue Jun 30 10:28:15 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Jun 30 10:28:15 2015 - [info] Starting recovery on
Tue Jun 30 10:28:15 2015 - [info]  Generating diffs succeeded.
Tue Jun 30 10:28:15 2015 - [info] Waiting until all relay logs are applied.
Tue Jun 30 10:28:15 2015 - [info]  done.
Tue Jun 30 10:28:16 2015 - [info] Getting slave status..
Tue Jun 30 10:28:16 2015 - [info] This slave('s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000009:120). No need to recover from Exec_Master_Log_Pos.
Tue Jun 30 10:28:16 2015 - [info] Connecting to the target slave host, running recover script..
Tue Jun 30 10:28:16 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host= --slave_ip=  --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --workdir=/var/tmp --target_version=5.6.20-log --timestamp=20150630102812 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54 --slave_pass=xxx
Tue Jun 30 10:28:16 2015 - [info] 
MySQL client version is 5.6.20. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog on This may take long time...
Applying log files succeeded.
Tue Jun 30 10:28:16 2015 - [info]  All relay logs were successfully applied.
Tue Jun 30 10:28:16 2015 - [info] Getting new master's binlog name and position..
Tue Jun 30 10:28:16 2015 - [info]  mysql-bin.000015:120
Tue Jun 30 10:28:16 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000015', MASTER_LOG_POS=120, MASTER_USER='repl1', MASTER_PASSWORD='xxx';
Tue Jun 30 10:28:16 2015 - [info] Executing master IP activate script:
Tue Jun 30 10:28:16 2015 - [info]   /etc/mha/app1/master_ip_failover --command=start --ssh_user=root --orig_master_host= --orig_master_ip= --orig_master_port=3306 --new_master_host= --new_master_ip= --new_master_port=3306 --new_master_user='monitor' --new_master_password='123456'  
Enabling the VIP - on the new master - 
Tue Jun 30 10:28:16 2015 - [info]  OK.
Tue Jun 30 10:28:16 2015 - [info] Setting read_only=0 on
Tue Jun 30 10:28:16 2015 - [info]  ok.
Tue Jun 30 10:28:16 2015 - [info] ** Finished master recovery successfully.
Tue Jun 30 10:28:16 2015 - [info] * Phase 3: Master Recovery Phase completed.
Tue Jun 30 10:28:16 2015 - [info] 
Tue Jun 30 10:28:16 2015 - [info] * Phase 4: Slaves Recovery Phase..
Tue Jun 30 10:28:16 2015 - [info] 
Tue Jun 30 10:28:16 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Jun 30 10:28:16 2015 - [info] 
Tue Jun 30 10:28:16 2015 - [info] -- Slave diff file generation on host started, pid: 2658. Check tmp log /etc/mha/app1/ if it takes time..
Tue Jun 30 10:28:16 2015 - [info] 
Tue Jun 30 10:28:16 2015 - [info] Log messages from ...
Tue Jun 30 10:28:16 2015 - [info] 
Tue Jun 30 10:28:16 2015 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 30 10:28:16 2015 - [info] End of log messages from
Tue Jun 30 10:28:16 2015 - [info] -- has the latest relay log events.
Tue Jun 30 10:28:16 2015 - [info] Generating relay diff files from the latest slave succeeded.
Tue Jun 30 10:28:16 2015 - [info] 
Tue Jun 30 10:28:16 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Jun 30 10:28:16 2015 - [info] 
Tue Jun 30 10:28:16 2015 - [info] -- Slave recovery on host started, pid: 2660. Check tmp log /etc/mha/app1/ if it takes time..
Tue Jun 30 10:28:18 2015 - [info] 
Tue Jun 30 10:28:18 2015 - [info] Log messages from ...
Tue Jun 30 10:28:18 2015 - [info] 
Tue Jun 30 10:28:16 2015 - [info] Sending binlog..
Tue Jun 30 10:28:17 2015 - [info] scp from local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog to root@ succeeded.
Tue Jun 30 10:28:17 2015 - [info] Starting recovery on
Tue Jun 30 10:28:17 2015 - [info]  Generating diffs succeeded.
Tue Jun 30 10:28:17 2015 - [info] Waiting until all relay logs are applied.
Tue Jun 30 10:28:17 2015 - [info]  done.
Tue Jun 30 10:28:17 2015 - [info] Getting slave status..
Tue Jun 30 10:28:17 2015 - [info] This slave('s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000009:120). No need to recover from Exec_Master_Log_Pos.
Tue Jun 30 10:28:17 2015 - [info] Connecting to the target slave host, running recover script..
Tue Jun 30 10:28:17 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host= --slave_ip=  --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --workdir=/var/tmp --target_version=5.6.20-log --timestamp=20150630102812 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54 --slave_pass=xxx
Tue Jun 30 10:28:17 2015 - [info] 
MySQL client version is 5.6.20. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog on This may take long time...
Applying log files succeeded.
Tue Jun 30 10:28:17 2015 - [info]  All relay logs were successfully applied.
Tue Jun 30 10:28:17 2015 - [info]  Resetting slave and starting replication from the new master
Tue Jun 30 10:28:18 2015 - [info]  Executed CHANGE MASTER.
Tue Jun 30 10:28:18 2015 - [info]  Slave started.
Tue Jun 30 10:28:18 2015 - [info] End of log messages from
Tue Jun 30 10:28:18 2015 - [info] -- Slave recovery on host succeeded.
Tue Jun 30 10:28:18 2015 - [info] All new slave servers recovered successfully.
Tue Jun 30 10:28:18 2015 - [info] 
Tue Jun 30 10:28:18 2015 - [info] * Phase 5: New master cleanup phase..
Tue Jun 30 10:28:18 2015 - [info] 
Tue Jun 30 10:28:18 2015 - [info] Resetting slave info on the new master..
Tue Jun 30 10:28:18 2015 - [info] Resetting slave info succeeded.
Tue Jun 30 10:28:18 2015 - [info] Master failover to completed successfully.
Tue Jun 30 10:28:18 2015 - [info] 

----- Failover Report -----

app1: MySQL Master failover to succeeded

Master is down!

Check MHA Manager logs at yaolansvr_slave:/var/log/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on
The latest slave has all relay logs for recovery.
Selected as a new master.
OK: Applying all logs succeeded.
OK: Activated master IP address.
This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
OK: Applying all logs succeeded. Slave started, replicating from
Resetting slave info succeeded.
Master failover to completed successfully.

# masterha_check_status --conf=/etc/mha/app1/app1.cnf
app1 is stopped(2:NOT_RUNNING).
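
For scripting, the state can be pulled out of that status line. This is only a minimal sketch: it assumes the output keeps the "(code:STATUS)" shape shown above, and mha_status_name is a hypothetical helper, not part of MHA.

```shell
# Hypothetical helper: print the status name (e.g. NOT_RUNNING) found in
# one line of masterha_check_status output passed as $1.
mha_status_name() {
    printf '%s\n' "$1" | sed -n 's/.*(\([0-9][0-9]*\):\([A-Z_]*\)).*/\2/p'
}

# Example with the line captured above. On a live manager host the line
# would come from: masterha_check_status --conf=/etc/mha/app1/app1.cnf
status=$(mha_status_name "app1 is stopped(2:NOT_RUNNING).")
echo "$status"
```

A status of NOT_RUNNING after a completed failover is expected here: the manager process exits once the failover finishes and has to be started again by hand.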

mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
|           0 |
+-------------+

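First, run cat /var/log/manager.log|grep -i "All other slaves should start" to find the CHANGE MASTER statement. Start the database that went down. After logging in, its slave status is empty; use the CHANGE MASTER statement to point it at the new master, then start the slave threads.
Then set read_only=1. Finally, check the replication environment. MHA Manager monitoring must be started again (confirm with ps aux|grep perl) and its status checked. Also delete app1.failover.complete and run # mysql -e "set global relay_log_purge=0". If app1.failover.complete is left in place, the next time MySQL is shut down the promotion to master fails with this error: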
Tue Jun 30 11:50:37 2015 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln297] Last failover was done at 2015/06/30 10:05:18. Current time is too early to do failover again. If you want to do failover, manually remove /etc/mha/app1/app1.failover.complete and run this script again.
Tue Jun 30 11:50:37 2015 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/local/bin/masterha_manager line 65
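
The error above comes from MHA's guard against repeated failovers: while the app1.failover.complete marker left by the previous failover is still recent, the manager refuses to fail over again. The check below is a rough sketch of a pre-flight test for that situation. failover_marker_is_recent is a hypothetical helper, and the 480-minute window is an assumption (believed to be MHA's default) that should be verified against the installed version.

```shell
# Hypothetical helper: succeed if the marker file $1 exists and was
# modified less than $2 minutes ago (default 480, an assumed window).
failover_marker_is_recent() {
    marker=$1
    minutes=${2:-480}
    [ -f "$marker" ] && [ -n "$(find "$marker" -mmin "-$minutes" 2>/dev/null)" ]
}

# Marker path taken from the error message above.
marker=/etc/mha/app1/app1.failover.complete
if failover_marker_is_recent "$marker"; then
    echo "recent failover marker found: remove $marker before restarting masterha_manager"
else
    echo "no recent failover marker"
fi
```

Removing the file by hand is what the error message itself recommends. masterha_manager also accepts --ignore_last_failover, which skips this check, at the cost of losing the protection against failover loops.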

(1) Whenever a slave is restarted, remember to run mysql -e "set global read_only=1".
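
The steps for bringing the old master back as a slave can be sketched as a small script. This is a hedged outline rather than a tested procedure: extract_change_master is a hypothetical helper, the log path is the one used in this walkthrough, and the mysql calls are left commented out because they need a live instance and the real replication password.

```shell
# Hypothetical helper: print the most recent CHANGE MASTER statement that
# MHA recorded in the manager log given as $1.
extract_change_master() {
    grep -i "All other slaves should start" "$1" |
        sed -n 's/.*Statement should be: *//p' |
        tail -n 1
}

log=/var/log/manager.log
if [ -r "$log" ]; then
    stmt=$(extract_change_master "$log")
    # The log masks the password as 'xxx'; substitute the real
    # replication password before running the statement.
    echo "$stmt"
    # On the restarted server:
    # mysql -e "$stmt"
    # mysql -e "start slave"
    # mysql -e "set global read_only=1"
fi
```

Taking only the last match matters when the log holds several failovers: an older CHANGE MASTER statement would point the slave at the wrong binlog position.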

