tar xf keepalived-1.2.12.tar.gz
cd keepalived-1.2.12
./configure --prefix=/usr/local/keepalived
make && make install
cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
mkdir /etc/keepalived
cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
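An optional sanity check after the installation (a minimal sketch, assuming a SysV-init system such as CentOS 6, which matches the init scripts copied above):

/usr/sbin/keepalived -v        # print the installed version to confirm the binary works
chkconfig --add keepalived     # register the copied init script
chkconfig keepalived on        # start keepalived automatically on boot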
(2) Configure the keepalived configuration file. On the master (192.168.0.50):
[root@master ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     saltstack@163.com
   }
   notification_email_from dba@dbserver.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 150
    advert_int 1
    nopreempt

    authentication {
        auth_type PASS
        auth_pass 1111
    }

    virtual_ipaddress {
        192.168.0.88
    }
}
[root@192.168.0.50 ~]#
Here router_id MySQL-HA names the keepalived group. The virtual IP 192.168.0.88 is bound to the host's eth1 interface, the state is set to BACKUP, keepalived runs in non-preemptive mode (nopreempt), and priority 150 sets the priority to 150. The configuration below differs slightly, but it means the same thing.
Configuration on the candidate master (192.168.0.60):
[root@192.168.0.60 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     saltstack@163.com
   }
   notification_email_from dba@dbserver.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 120
    advert_int 1
    nopreempt

    authentication {
        auth_type PASS
        auth_pass 1111
    }

    virtual_ipaddress {
        192.168.0.88
    }
}
[root@192.168.0.60 ~]#
(3) Start the keepalived service. Start it on the master and check the log:
[root@192.168.0.50 ~]# /etc/init.d/keepalived start
Starting keepalived:                                       [  OK  ]
[root@192.168.0.50 ~]# tail -f /var/log/messages
Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Configuration is using : 7231 Bytes
Apr 20 20:22:16 192 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Apr 20 20:22:16 192 kernel: IPVS: ipvs loaded.
Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Using LinkWatch kernel netlink reflector...
Apr 20 20:22:19 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Transition to MASTER STATE
Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Entering MASTER STATE
Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) setting protocol VIPs.
Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.0.88
Apr 20 20:22:20 192 Keepalived_healthcheckers[15334]: Netlink reflector reports IP 192.168.0.88 added
Apr 20 20:22:25 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.0.88
As you can see, the virtual IP 192.168.0.88 has been bound to interface eth1.
(4) Check the binding:
[root@192.168.0.50 ~]# ip addr | grep eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.0.50/24 brd 192.168.0.255 scope global eth1
    inet 192.168.0.88/32 scope global eth1
[root@192.168.0.50 ~]#
Start the keepalived service on the other server, the candidate master, and observe:
[root@192.168.0.60 ~]# /etc/init.d/keepalived start ; tail -f /var/log/messages
Starting keepalived:                                       [  OK  ]
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Registering gratuitous ARP shared channel
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Configuration is using : 62976 Bytes
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Using LinkWatch kernel netlink reflector...
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: VRRP_Instance(VI_1) Entering BACKUP STATE
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP 192.168.80.138 added
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP 192.168.0.60 added
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP fe80::20c:29ff:fe9d:6a9e added
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP fe80::20c:29ff:fe9d:6aa8 added
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Registering Kernel netlink reflector
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Registering Kernel netlink command channel
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Configuration is using : 7231 Bytes
Apr 20 20:26:18 192 kernel: IPVS: Registered protocols (TCP, UDP, AH, ESP)
Apr 20 20:26:18 192 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Apr 20 20:26:18 192 kernel: IPVS: ipvs loaded.
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Using LinkWatch kernel netlink reflector...
The output above shows that keepalived has been configured successfully: this node enters the BACKUP state.
Note:
Both servers above have keepalived configured in BACKUP state. keepalived supports two modes, master->backup and backup->backup, and they behave very differently. In master->backup mode, once the primary goes down the virtual IP automatically floats to the standby; but when the primary is repaired and keepalived is started again, it takes the virtual IP back, and this preemption happens even if non-preemptive mode (nopreempt) is configured. In backup->backup mode, the virtual IP also floats to the standby when the primary goes down, but when the original primary recovers and its keepalived service starts, it does not take the virtual IP back from the new primary, even if its priority is higher. To reduce the number of IP moves, the repaired primary is usually kept as the new standby.
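To verify the non-preemptive behavior after the repaired node rejoins, you can check which host currently holds the VIP. A minimal sketch, assuming the interface and VIP configured above:

# Run on each keepalived node; only the current VIP holder prints the message.
ip addr show eth1 | grep -q "192.168.0.88" && echo "this node holds the VIP 192.168.0.88"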
(5) Integrating keepalived with MHA (when the MySQL server process dies, MHA stops keepalived):
To bring the keepalived service into MHA, we only need to modify the script that is triggered during a switchover, master_ip_failover, and add the handling of keepalived for when the master goes down.
Edit the script /usr/local/bin/master_ip_failover on the MHA Manager. I am not familiar with Perl, so I paste the complete script here; the modified content is as follows (reference material on this is fairly scarce):
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.0.88';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip  = "/etc/init.d/keepalived stop";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # Master is dead: stop keepalived on the old master so the VIP floats away.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # New master elected: start keepalived there so it can take over the VIP.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP (starts keepalived) on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP (stops keepalived) on the old master
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
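Before wiring the script into MHA, you can call it by hand to make sure it parses and runs. A quick sketch of a status check, using the same arguments that masterha_check_repl passes in the log further below:

/usr/local/bin/master_ip_failover --command=status --ssh_user=root \
    --orig_master_host=192.168.0.50 --orig_master_ip=192.168.0.50 --orig_master_port=3306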
Now that the script has been modified, enable the parameter mentioned earlier and re-check the cluster status to see whether it still reports an error.
[root@192.168.0.20 ~]# grep 'master_ip_failover_script' /etc/masterha/app1.cnf
master_ip_failover_script= /usr/local/bin/master_ip_failover
[root@192.168.0.20 ~]#
[root@192.168.0.20 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Apr 20 23:10:01 2014 - [info] Slaves settings check done.
Sun Apr 20 23:10:01 2014 - [info]
192.168.0.50 (current master)
 +--192.168.0.60
 +--192.168.0.70
Sun Apr 20 23:10:01 2014 - [info] Checking replication health on 192.168.0.60..
Sun Apr 20 23:10:01 2014 - [info]  ok.
Sun Apr 20 23:10:01 2014 - [info] Checking replication health on 192.168.0.70..
Sun Apr 20 23:10:01 2014 - [info]  ok.
Sun Apr 20 23:10:01 2014 - [info] Checking master_ip_failover_script status:
Sun Apr 20 23:10:01 2014 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.50 --orig_master_ip=192.168.0.50 --orig_master_port=3306
Sun Apr 20 23:10:01 2014 - [info]  OK.
Sun Apr 20 23:10:01 2014 - [warning] shutdown_script is not defined.
Sun Apr 20 23:10:01 2014 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
As you can see, it no longer reports any error.
The content added to /usr/local/bin/master_ip_failover means: when the database on the master fails, MHA triggers a switchover, the MHA Manager stops the keepalived service on the master, the virtual IP drifts to the candidate slave, and the switchover completes. You can also add a script on the keepalived side that monitors whether MySQL is running normally and, if it is not, kills the keepalived process (a sketch of such a check script follows).
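A minimal sketch of such a check script. The path /etc/keepalived/check_mysql.sh and the MySQL credentials are assumptions; it could be run from cron or referenced from a keepalived vrrp_script block. If mysqld stops answering, keepalived is stopped so the VIP drifts to the candidate master:

#!/bin/bash
# /etc/keepalived/check_mysql.sh (hypothetical example)
# Stop keepalived when MySQL is no longer reachable, forcing the VIP to fail over.
MYSQL_USER=root
MYSQL_PASS=yourpassword   # placeholder credentials, adjust for your environment

if ! mysqladmin -u"$MYSQL_USER" -p"$MYSQL_PASS" ping >/dev/null 2>&1; then
    /etc/init.d/keepalived stop
fi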
Common commands
Check the whole replication environment:
    masterha_check_repl --conf=/data/mha/3306/mha.cnf
Check the SSH configuration:
    masterha_check_ssh --conf=/data/mha/3306/mha.cnf
Check the current MHA running status:
    masterha_check_status --conf=/data/mha/3306/mha.cnf
Start MHA (put the command in manager.sh):
    nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/3306/manager.log 2>&1 &
Stop MHA:
    masterha_stop --conf=/data/mha/3306/mha.cnf
That completes the installation and configuration of MHA. Below are some simple tests.
(1) Failover test
Manually kill the mysqld process on the master and watch the switchover (one way to trigger the test is sketched below).
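A sketch of triggering and observing the test; the manager log path follows the start command in the "Common commands" section above:

# On the current master (192.168.0.50): simulate a MySQL crash.
killall -9 mysqld mysqld_safe

# On the MHA Manager (192.168.0.20): watch the failover in the manager log.
tail -f /data/mha/3306/manager.log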
(Switchover log to be added.)
The above is the full log of the switchover. From it we can see that an MHA switchover mainly goes through the following steps (a post-failover check is sketched after the list):
1. Configuration check phase: the configuration of the whole cluster is verified.
2. Handling of the dead master, including removing the virtual IP and powering off the host (I have not implemented the power-off part here yet; it needs more research).
3. Copy the relay log difference between the dead master and the most up-to-date slave, and save it to the corresponding directory on the MHA Manager.
4. Identify the slave that contains the latest updates.
5. Apply the binary log events (binlog events) saved from the master.
6. Promote one slave to be the new master.
7. Make the other slaves connect to the new master and replicate from it.
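After the failover finishes, a quick way to confirm the new topology from the MHA Manager (a sketch, reusing the config path from the earlier check):

masterha_check_repl --conf=/etc/masterha/app1.cnf   # should report the promoted node as the current master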
(2) Manual switchover
In many situations it is necessary to migrate the existing master to another server, for example because of a hardware fault on the master, a RAID controller that needs to be rebuilt, or a move to a better-performing machine. Maintaining the master degrades performance and causes downtime, during which, at the very least, no data can be written. In addition, blocking or killing currently running sessions can cause data inconsistency between the two masters. MHA provides fast switching and gracefully blocks writes: the switchover takes only 0.5-2 seconds, during which data cannot be written. In many cases a 0.5-2 second write block is acceptable, so switching the master does not require scheduling a maintenance window.
The rough process of an MHA online switchover:
1. Verify the replication settings and identify the current master.
2. Identify the new master.
3. Block writes to the current master.
4. Wait for all slaves to catch up with replication.
5. Grant writes on the new master.
6. Reconfigure the slaves.
Note that for an online switchover the application architecture needs to consider the following two issues:
1. Automatically identifying the master and the slaves (the master machine may change). If a VIP is used, this problem is basically solved.
2. Load balancing (you can define a rough read/write ratio and the load each machine can take; this must be considered when a machine leaves the cluster).
To guarantee full data consistency and complete the switchover as quickly as possible, MHA's online switchover only succeeds when all of the following conditions are met; otherwise it fails (a quick pre-check is sketched after the list).
1. The IO thread is running on all slaves.
2. The SQL thread is running on all slaves.
3. In the output of SHOW SLAVE STATUS on every slave, Seconds_Behind_Master is less than or equal to running_updates_limit seconds; if running_updates_limit is not specified for the switchover, it defaults to 1 second.
4. On the master, no update in the SHOW PROCESSLIST output has been running for more than running_updates_limit seconds.
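A quick pre-switchover sanity check for conditions 1-3 (a sketch; the slave list and credentials are assumptions for this example):

for slave in 192.168.0.60 192.168.0.70; do
    echo "== $slave =="
    mysql -h"$slave" -uroot -pyourpassword -e "SHOW SLAVE STATUS\G" \
        | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
done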
Perform the switchover:
[root@manager tmp]# masterha_master_switch --conf=/data/mha/3306/mha.cnf --master_state=alive --new_master_host=192.168.0.102 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
The parameters mean:
--orig_master_is_new_slave: with this option, the original master is turned into a slave after the switchover; without it, the original master is not started.
--running_updates_limit=10000: during the switchover, if the candidate master lags, MHA cannot switch; with this parameter the switchover can proceed as long as the lag is within this range (in seconds). How long the switchover actually takes, however, is determined by the size of the relay logs applied during recovery.
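After the online switchover, because --orig_master_is_new_slave was passed, the original master should now be replicating from the new master. A quick check, run on the original master (credentials are placeholders):

mysql -uroot -p -e "SHOW SLAVE STATUS\G" \
    | grep -E "Master_Host|Slave_IO_Running|Slave_SQL_Running"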
References:
http://www.cnblogs.com/gomysql/p/3675429.html
http://www.178linux.com/61111
https://www.cnblogs.com/rayment/p/7355093.html