Posted: 2021-07-01 10:21:17
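The Node toolkit (these tools are normally invoked by MHA Manager scripts and need no manual operation) consists mainly of the following tools:
save_binary_logs: saves and copies the master's binary logs
apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog: strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs: purges relay logs (without blocking the SQL thread)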
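The basic workflow is roughly as follows:
(1) The Manager monitors the master periodically. The interval is set by the ping_interval parameter and defaults to once every 3 seconds. Monitoring can use MHA's built-in checks or call third-party software. MHA itself provides two check methods, SELECT (runs SELECT 1) and CONNECT (opens and closes a connection), chosen with the ping_type parameter; the default is SELECT.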
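(2) When a master failure is detected, the Manager runs a check against all Nodes over SSH, covering the following:
-- whether each MySQL instance can be connected to;
-- whether the master server is reachable over SSH;
-- the state of the SQL thread;
-- which servers are dead, which are alive, and which slave instances are alive;
-- each slave instance's configuration and replication filtering rules;
-- finally, the monitoring script exits and returns a code with a specific meaning.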
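(3) Master failover begins. It consists of the following phases:
-- Phase 1: Configuration Check Phase
In this phase, if the SQL thread of any slave instance has stopped, it is started automatically; the alive servers and slaves are then confirmed again.
-- Phase 2: Dead Master Shutdown Phase
This phase first calls master_ip_failover_script: if HA is VIP-based, the VIP is taken down; if it is based on a catalog database, the mapping record is updated. It then calls shutdown_script to force the host off, which avoids split-brain if the service restarts.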
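-- Phase 3: Master Recovery Phase
This phase has the following five sub-phases:
Phase 3.1: Getting Latest Slaves Phase
Checks every slave to find the latest and the oldest binary log file and position, and determines each slave's priority for becoming master, which depends on candidate_master, no_master, the order of the [server_xxx] sections, the amount of binary log difference, and other factors.
Phase 3.2: Saving Dead Master's Binlog Phase
If the dead master's server can still be reached over SSH, its binary log is extracted, starting from the latest binary log file and position obtained in the previous step and running to the last event. The extracted events are saved to a file in the working directory on the dead master (set by remote_workdir), and that file is then copied to the working directory on the Manager server (set by manager_workdir). If the dead master's system cannot be reached at all, there is no differential binary log to save. MHA also health-checks each slave node, mainly for SSH connectivity.
Phase 3.3: Determining New Master Phase
Next, apply_diff_relay_logs is called to recover the slaves' differential logs, meaning the relay log differences between the slaves. Once recovery finishes, all slaves hold consistent data and the new master can be chosen by priority.
Phase 3.4: New Master Diff Log Generation Phase
Generates the differential log between the dead master and the new master, that is, copies the binary log saved in Phase 3.2 to the new master's working directory (remote_workdir).
Phase 3.5: Master Log Apply Phase
Applies the differential log copied in the previous step to the new master; if an error occurs, it can also be recovered manually. The new master's binlog name and position are then recorded so that the other slaves can start replicating from them. Finally, writes are enabled on the new master by setting read_only to 0.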
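-- Phase 4: Slaves Recovery Phase
Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
Generates the differential log between each slave and the new master and copies it to each slave's working directory. This log is the portion that differs between the dead master and the new master, because the slaves were already synchronized with each other in Phase 3.3.
Phase 4.2: Starting Parallel Slave Log Apply Phase
Applies this differential log on each slave, then points each slave at the new master with CHANGE MASTER TO, and finally starts replication (start slave).
-- Phase 5: New master cleanup phase
Cleaning up the new master simply means resetting its slave info, that is, discarding its former slave settings. This completes the master failover.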
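The ordering logic of Phase 3.1 and the priority rule used when choosing the new master can be pictured with a small Python sketch. It is not MHA's actual Perl code. The Slave class, its field names (modelled on the Master_Log_File and Read_Master_Log_Pos columns of SHOW SLAVE STATUS) and the helper functions are hypothetical, and the selection rule is deliberately simplified: it ignores the amount of binary log difference and the other factors MHA weighs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slave:
    # Hypothetical record of what the Manager learned about one slave.
    host: str
    master_log_file: str          # e.g. "mysql-bin.000023"
    read_master_log_pos: int      # how far this slave read the dead master's binlog
    candidate_master: bool = False
    no_master: bool = False


def binlog_key(slave: Slave) -> Tuple[int, int]:
    # Order by the numeric binlog file suffix first, then by position in the file.
    sequence = int(slave.master_log_file.rsplit(".", 1)[1])
    return (sequence, slave.read_master_log_pos)


def latest_and_oldest(slaves: List[Slave]) -> Tuple[Slave, Slave]:
    # Phase 3.1: find the slave that received the most events and the one
    # that received the fewest.
    ordered = sorted(slaves, key=binlog_key)
    return ordered[-1], ordered[0]


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    # Simplified priority rule (an assumption, not MHA's full logic):
    # 1. never pick a slave marked no_master;
    # 2. prefer the first candidate_master in configuration order;
    # 3. otherwise fall back to the most advanced eligible slave.
    eligible = [s for s in slaves if not s.no_master]
    if not eligible:
        return None
    for s in eligible:
        if s.candidate_master:
            return s
    return max(eligible, key=binlog_key)
```

The list passed to pick_new_master is assumed to be in [server_xxx] order, so with two candidates the one defined first wins.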
######################################################################################################################################
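II. [Lab]
1. [Environment] All three machines are assumed to have MySQL 5.6 installed, with master-slave replication already configured.
Master: 192.168.245.129 (read-write)
Slave 1: 192.168.245.131 (designated to take over as master) (read-only)
Slave 2: 192.168.245.132 (read-only)
VIP: 192.168.245.100
Both slaves need to be set read-only. Setting this in the configuration file is not recommended, because a slave may be promoted to master at any time. Set it at runtime instead:
set global read_only=1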
2. Configure SSH trust among the three machines (omitted)
Purpose: the machines can access one another without entering a password.
3. Install the MHA software
# Install dependency packages that may be needed
[root@node1 software]# yum install perl-DBD-MySQL
[root@node1 software]# yum install perl-Config-Tiny
[root@node1 software]# yum install perl-Parallel-ForkManager*.rpm
[root@node1 software]# yum install perl-Mail-Sender*.rpm
[root@node1 software]# yum install perl-Mail-Sendmail*.rpm
[root@node1 software]# yum install perl-Log-Dispatch*.rpm
# Install MHA. The rpm packages are used here; the default install location is /usr/bin
[root@node1 software]# yum install mha4mysql-node-0.56-0.el6.noarch.rpm
[root@node1 software]# yum install mha4mysql-manager-0.56-0.el6.noarch.rpm
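4. Configure the VIP on the master server and test it
The VIP is created manually with a script, as follows:
[root@node1 scripts]# cat init_vip.sh
vip="192.168.245.100/32"
/sbin/ip addr add "$vip" dev eth0
[Test] From any slave, run the following to check that the VIP is reachable:
ping 192.168.245.100
Then check that the database answers on the VIP:
mysql -h 192.168.245.100 -udarren -pdarren
If both succeed, the VIP is set up correctly.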
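5. Configure and start MHA
(1) Create the MHA monitoring user (run this on the master so that replication creates the user on every server)
mysql> grant all privileges on *.* to 'root'@'%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
(2) Edit the MHA configuration file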
Main functions of purge_relay_logs:
a. Creates hard links to the relay logs (minimizes the performance impact of bulk-deleting large files)
b. SET GLOBAL relay_log_purge=1; FLUSH LOGS; SET GLOBAL relay_log_purge=0;
c. Deletes the relay logs (rm -f /path/to/archive_dir/*)
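Usage of purge_relay_logs and its parameters
1. purge_relay_logs --help
Usage: purge_relay_logs --user=root --password=rootpass --host=127.0.0.1
2. Parameter descriptions
--user: user name; defaults to root
--password: password
--port: port number
--host: host name; defaults to 127.0.0.1
--workdir: where to create the relay log hard links; defaults to /var/tmp. After the script succeeds, the hard-linked relay log files are deleted. Creating hard links across different partitions fails, so the location must be specified explicitly; a directory on the same partition as the relay logs is recommended.
--disable_relay_log_purge: by default, if relay_log_purge=1 the script does nothing and exits. With this option set, the script sets relay_log_purge to 0 itself, purges the relay logs, and leaves the parameter OFF (0) at the end.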
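3. Set up a cron job to purge relay logs
The purge_relay_logs script purges relay logs automatically without blocking the SQL thread. Because relay logs are generated continuously, deploy the script in crontab to purge them daily or hourly.
$ crontab -l
# purge relay logs at 5am
0 5 * * * app /usr/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1
Note that the user field (app) before the command is accepted only in system crontab files such as those under /etc/cron.d; in a per-user crontab, omit it.
(product)root@127.0.0.1 [(none)]> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)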
# Purge script
#!/bin/bash
user=root
passwd=root
port=3306
log_dir='/data/masterha/log'
work_dir='/data'
purge='/usr/bin/purge_relay_logs'
# Create the log directory if it does not exist
if [ ! -d "$log_dir" ]
then
    mkdir -p "$log_dir"
fi
"$purge" --user="$user" --password="$passwd" --disable_relay_log_purge --port="$port" --workdir="$work_dir" >> "$log_dir/purge_relay_logs.log" 2>&1
# Scheduled job
crontab -e
# Run every day at 5:10 am
10 5 * * * sh /data/scripts/purge_relay_log.sh
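The same command line and cron entry can also be assembled programmatically, which is convenient when several slaves need slightly different settings. The following Python sketch is only an illustration: build_purge_command and cron_line are hypothetical helpers, and they use only the purge_relay_logs options listed above.

```python
import shlex
from typing import List, Optional


def build_purge_command(
    user: str = "root",
    password: Optional[str] = None,
    host: str = "127.0.0.1",
    port: int = 3306,
    workdir: str = "/var/tmp",
    disable_relay_log_purge: bool = True,
    binary: str = "/usr/bin/purge_relay_logs",
) -> List[str]:
    # Assemble the argument vector from the options described in the text.
    argv = [
        binary,
        f"--user={user}",
        f"--host={host}",
        f"--port={port}",
        f"--workdir={workdir}",
    ]
    if password is not None:
        argv.append(f"--password={password}")
    if disable_relay_log_purge:
        argv.append("--disable_relay_log_purge")
    return argv


def cron_line(minute: int, hour: int, argv: List[str], log_file: str) -> str:
    # Per-user crontab format: five time fields followed by the command.
    command = " ".join(shlex.quote(arg) for arg in argv)
    return f"{minute} {hour} * * * {command} >> {shlex.quote(log_file)} 2>&1"
```

For example, cron_line(10, 5, build_purge_command(password="PASSWORD", workdir="/data"), "/data/masterha/log/purge_relay_logs.log") yields a daily 5:10 am entry comparable to the one in the script above. Passing the password on the command line exposes it in the process list, so a real deployment may prefer another mechanism.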
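On the manager node, create a masterha directory under /etc and copy the initial configuration files that MHA needs into it:
[root@node3 ~]# cd /etc
[root@node3 etc]# mkdir masterha
# Create the following MHA log directory; MHA reports an error if it is missing
[root@node3 etc]# mkdir -p /var/log/masterha/app1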
[root@node3 masterha]# ll
total 32
-rw-r--r--. 1 root root 503 Nov 9 01:26 app1.conf
-rwxr-xr-x. 1 root root 55 Nov 9 01:26 drop_vip.sh
-rwxr-xr-x. 1 root root 55 Nov 9 01:26 init_vip.sh
-rw-r--r--. 1 root root 357 Nov 9 01:26 masterha_default.conf
-rwxr-xr-x. 1 root root 3888 Nov 9 01:26 master_ip_failover
-rwxr-xr-x. 1 root root 10298 Nov 9 01:26 master_ip_online_change
Next, fix the VIP value: run grep "vip" * in the masterha directory to list the vip variable in every file, then change each one to 192.168.245.100.
Edit the app1.conf file:
[server default]
# MHA manager working directory
manager_workdir = /var/log/masterha/app1
manager_log = /var/log/masterha/app1/app1.log
remote_workdir = /var/log/masterha/app1
user=root
password=root
ssh_user=root
repl_user=repl
repl_password=repl4slave
ping_interval=1
shutdown_script=""
master_ip_online_change_script=""
report_script=""

[server1]
hostname=192.168.245.129
master_binlog_dir = /data/mysql/mysql_3306/logs
candidate_master=1
check_repl_delay=0

[server2]
hostname=192.168.245.131
master_binlog_dir=/data/mysql/mysql_3306/logs
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.245.132
port=3306
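Before running the MHA check commands, it can help to confirm that the file is structurally sound. The sketch below reads a configuration of this shape with Python's standard configparser and verifies that every server section ends up with a hostname. MHA itself parses the file in Perl, so this is only an approximation of its behavior; load_servers and SAMPLE are hypothetical names, and the sample text is an abbreviated copy of the settings above.

```python
import configparser
from typing import Dict

SAMPLE = """
[server default]
manager_workdir = /var/log/masterha/app1
manager_log = /var/log/masterha/app1/app1.log
ping_interval = 1

[server1]
hostname = 192.168.245.129
candidate_master = 1

[server2]
hostname = 192.168.245.131
candidate_master = 1

[server3]
hostname = 192.168.245.132
port = 3306
"""


def load_servers(text: str) -> Dict[str, Dict[str, str]]:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Settings in [server default] apply to every server unless overridden.
    defaults: Dict[str, str] = {}
    if parser.has_section("server default"):
        defaults = dict(parser["server default"])
    servers: Dict[str, Dict[str, str]] = {}
    for name in parser.sections():
        if name == "server default":
            continue
        merged = {**defaults, **dict(parser[name])}
        if "hostname" not in merged:
            raise ValueError(f"section [{name}] has no hostname")
        servers[name] = merged
    return servers
```

Calling load_servers(SAMPLE) returns one dictionary per [serverN] section, each carrying the inherited defaults such as ping_interval alongside its own settings.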
# Check the SSH connection status from the MHA Manager to all MHA Nodes:
[root@node3 masterha]# /usr/bin/masterha_check_ssh --conf=/etc/masterha/app1.conf
Mon Nov 16 01:24:21 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Nov 16 01:24:21 2015 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Mon Nov 16 01:24:21 2015 - [info] Reading server configuration from /etc/masterha/app1.conf..
Mon Nov 16 01:24:21 2015 - [info] Starting SSH connection tests..
Mon Nov 16 01:24:24 2015 - [debug]
Mon Nov 16 01:24:21 2015 - [debug] Connecting via SSH from root@192.168.245.129(192.168.245.129:22) to root@192.168.245.131(192.168.245.131:22)..
Mon Nov 16 01:24:23 2015 - [debug] ok.
Mon Nov 16 01:24:23 2015 - [debug] Connecting via SSH from root@192.168.245.129(192.168.245.129:22) to root@192.168.245.132(192.168.245.132:22)..
Mon Nov 16 01:24:24 2015 - [debug] ok.
Mon Nov 16 01:24:25 2015 - [debug]
Mon Nov 16 01:24:22 2015 - [debug] Connecting via SSH from root@192.168.245.131(192.168.245.131:22) to root@192.168.245.129(192.168.245.129:22)..
Mon Nov 16 01:24:23 2015 - [debug] ok.
Mon Nov 16 01:24:23 2015 - [debug] Connecting via SSH from root@192.168.245.131(192.168.245.131:22) to root@192.168.245.132(192.168.245.132:22)..
Mon Nov 16 01:24:25 2015 - [debug] ok.
Mon Nov 16 01:24:25 2015 - [debug]
Mon Nov 16 01:24:22 2015 - [debug] Connecting via SSH from root@192.168.245.132(192.168.245.132:22) to root@192.168.245.129(192.168.245.129:22)..
Mon Nov 16 01:24:24 2015 - [debug] ok.
Mon Nov 16 01:24:24 2015 - [debug] Connecting via SSH from root@192.168.245.132(192.168.245.132:22) to root@192.168.245.131(192.168.245.131:22)..
Mon Nov 16 01:24:25 2015 - [debug] ok.
Mon Nov 16 01:24:25 2015 - [info] All SSH connection tests passed successfully.
# Check the master-slave replication environment
[root@node3 masterha]# /usr/bin/masterha_check_repl --conf=/etc/masterha/app1.conf
Mon Nov 16 01:37:08 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Nov 16 01:37:08 2015 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Mon Nov 16 01:37:08 2015 - [info] Reading server configuration from /etc/masterha/app1.conf..
Mon Nov 16 01:37:08 2015 - [info] MHA::MasterMonitor version 0.56.
Mon Nov 16 01:37:09 2015 - [info] GTID failover mode = 0
Mon Nov 16 01:37:09 2015 - [info] Dead Servers:
Mon Nov 16 01:37:09 2015 - [info] Alive Servers:
Mon Nov 16 01:37:09 2015 - [info] 192.168.245.129(192.168.245.129:3306)
Mon Nov 16 01:37:09 2015 - [info] 192.168.245.131(192.168.245.131:3306)
Mon Nov 16 01:37:09 2015 - [info] 192.168.245.132(192.168.245.132:3306)
Mon Nov 16 01:37:09 2015 - [info] Alive Slaves:
Mon Nov 16 01:37:09 2015 - [info] 192.168.245.131(192.168.245.131:3306) Version=5.6.23-log (oldest major version between slaves) log-bin:enabled
Mon Nov 16 01:37:09 2015 - [info] Replicating from 192.168.245.129(192.168.245.129:3306)
Mon Nov 16 01:37:09 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Nov 16 01:37:09 2015 - [info] 192.168.245.132(192.168.245.132:3306) Version=5.6.21-log (oldest major version between slaves) log-bin:enabled
Mon Nov 16 01:37:09 2015 - [info] Replicating from 192.168.245.129(192.168.245.129:3306)
Mon Nov 16 01:37:09 2015 - [info] Current Alive Master: 192.168.245.129(192.168.245.129:3306)
Mon Nov 16 01:37:09 2015 - [info] Checking slave configurations..
Mon Nov 16 01:37:09 2015 - [info] read_only=1 is not set on slave 192.168.245.131(192.168.245.131:3306).
Mon Nov 16 01:37:09 2015 - [info] read_only=1 is not set on slave 192.168.245.132(192.168.245.132:3306).
Mon Nov 16 01:37:09 2015 - [info] Checking replication filtering settings..
Mon Nov 16 01:37:09 2015 - [info] binlog_do_db= , binlog_ignore_db=
Mon Nov 16 01:37:09 2015 - [info] Replication filtering check ok.
Mon Nov 16 01:37:09 2015 - [info] GTID (with auto-pos) is not supported
Mon Nov 16 01:37:09 2015 - [info] Starting SSH connection tests..
Mon Nov 16 01:37:12 2015 - [info] All SSH connection tests passed successfully.
Mon Nov 16 01:37:12 2015 - [info] Checking MHA Node version..
Mon Nov 16 01:37:13 2015 - [info] Version check ok.
Mon Nov 16 01:37:13 2015 - [info] Checking SSH publickey authentication settings on the current master..
Mon Nov 16 01:37:13 2015 - [info] HealthCheck: SSH to 192.168.245.129 is reachable.
Mon Nov 16 01:37:14 2015 - [info] Master MHA Node version is 0.56.
Mon Nov 16 01:37:14 2015 - [info] Checking recovery script configurations on 192.168.245.129(192.168.245.129:3306)..
Mon Nov 16 01:37:14 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/mysql_3306/logs --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000023
Mon Nov 16 01:37:14 2015 - [info] Connecting to root@192.168.245.129(192.168.245.129:22)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not.. ok.
Binlog found at /data/mysql/mysql_3306/logs, up to mysql-bin.000023
Mon Nov 16 01:37:15 2015 - [info] Binlog setting check done.
Mon Nov 16 01:37:15 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Nov 16 01:37:15 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.245.131 --slave_ip=192.168.245.131 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.6.23-log --manager_version=0.56 --relay_log_info=/data/mysql/mysql_3306/data/relay-log.info --relay_dir=/data/mysql/mysql_3306/data/ --slave_pass=xxx
Mon Nov 16 01:37:15 2015 - [info] Connecting to root@192.168.245.131(192.168.245.131:22)..
Checking slave recovery environment settings..
Opening /data/mysql/mysql_3306/data/relay-log.info ... ok.
Relay log found at /data/mysql/mysql_3306/data, up to relay-bin.000009
Temporary relay log file is /data/mysql/mysql_3306/data/relay-bin.000009
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Nov 16 01:37:15 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.245.132 --slave_ip=192.168.245.132 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.6.21-log --manager_version=0.56 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Mon Nov 16 01:37:15 2015 - [info] Connecting to root@192.168.245.132(192.168.245.132:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to node3-relay-bin.000007
Temporary relay log file is /data/mysql/data/node3-relay-bin.000007
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Nov 16 01:37:16 2015 - [info] Slaves settings check done.
Mon Nov 16 01:37:16 2015 - [info]
192.168.245.129(192.168.245.129:3306) (current master)
 +--192.168.245.131(192.168.245.131:3306)
 +--192.168.245.132(192.168.245.132:3306)
Mon Nov 16 01:37:16 2015 - [info] Checking replication health on 192.168.245.131..
Mon Nov 16 01:37:16 2015 - [info] ok.
Mon Nov 16 01:37:16 2015 - [info] Checking replication health on 192.168.245.132..
Mon Nov 16 01:37:16 2015 - [info] ok.
Mon Nov 16 01:37:16 2015 - [warning] master_ip_failover_script is not defined.
Mon Nov 16 01:37:16 2015 - [warning] shutdown_script is not defined.
Mon Nov 16 01:37:16 2015 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
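Output of this length is easy to misread, so a short script can pull out the verdict together with any warnings and errors. The Python sketch below assumes the log line layout seen in this capture (timestamp, then a bracketed level, then the message); summarize and LINE_RE are hypothetical names and are not part of MHA.

```python
import re
from typing import Dict, List, Union

# Matches lines such as:
#   Mon Nov 16 01:37:16 2015 - [warning] shutdown_script is not defined.
LINE_RE = re.compile(
    r"^\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4} - \[(?P<level>\w+)\] ?(?P<msg>.*)$"
)


def summarize(log_text: str) -> Dict[str, Union[bool, List[str]]]:
    warnings: List[str] = []
    errors: List[str] = []
    saw_ok = False
    for raw in log_text.splitlines():
        line = raw.strip()
        # The final verdict is printed without a timestamp prefix.
        if "MySQL Replication Health is OK." in line:
            saw_ok = True
        match = LINE_RE.match(line)
        if match is None:
            continue
        if match["level"] == "warning":
            warnings.append(match["msg"])
        elif match["level"] == "error":
            errors.append(match["msg"])
    return {"healthy": saw_ok and not errors, "warnings": warnings, "errors": errors}
```

Applied to the capture above, the warnings list would include the two notices that master_ip_failover_script and shutdown_script are not defined, which is a reminder that the VIP will not move on failover until those scripts are configured.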
If you encounter this error:
Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options
 at /usr/bin/apply_diff_relay_logs line 493
Mon Nov 16 01:32:36 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln205] Slaves settings check failed!
Mon Nov 16 01:32:36 2015 - [e