Date: 2021-07-01 10:21:17
consul server:192.168.0.10
consul client:192.168.0.20,192.168.0.30,192.168.0.40
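Installing Consul is straightforward: download it from https://www.consul.io/downloads.html and unpack the archive. It is a single binary with no other dependencies. I am using version 0.9.2 here. After unpacking, place the binary in /usr/local/bin and it is ready to use. Install it on all four servers listed above.
On all four machines, create the directories that hold the configuration files and the data, plus a directory for the Redis and MySQL health check scripts: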
mkdir /etc/consul.d/ -p && mkdir /data/consul/ -p
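mkdir /data/consul/shell -p
Next, write the relevant parameters into a configuration file. This is optional, since the same options can be passed directly on the command line, but that is harder to manage.
Configuration file for the consul server (192.168.0.10). For the meaning of each parameter, see the official documentation or the reference links given in this article: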
[root@db-server-yayun-01 ~]# cat /etc/consul.d/server.json
{
  "data_dir": "/data/consul",
  "datacenter": "dc1",
  "log_level": "INFO",
  "server": true,
  "bootstrap_expect": 1,
  "bind_addr": "192.168.0.10",
  "client_addr": "192.168.0.10",
  "ui": true
}
[root@db-server-yayun-01 ~]#
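For comparison, here is a minimal sketch of what the same settings look like when passed as command-line flags instead of a file. The helper name server_agent_cmd is hypothetical; it only prints the command so that it can be reviewed before being run. The values are copied from server.json.

```shell
#!/bin/bash
# Sketch: flag-only equivalent of /etc/consul.d/server.json.
# server_agent_cmd is a hypothetical helper; it prints the command
# rather than starting the agent.
server_agent_cmd() {
    local args=(
        consul agent
        -server
        -bootstrap-expect=1
        -data-dir=/data/consul
        -datacenter=dc1
        -log-level=INFO
        -bind=192.168.0.10
        -client=192.168.0.10
        -ui
    )
    echo "${args[*]}"
}

server_agent_cmd
```

Every option has to be repeated on each start this way, which is why the configuration directory is easier to manage.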
Configuration file for the consul clients (192.168.0.20, 192.168.0.30, 192.168.0.40):
[root@db-server-yayun-02 ~]# cat /etc/consul.d/client.json
{
  "data_dir": "/data/consul",
  "enable_script_checks": true,
  "bind_addr": "192.168.0.20",
  "retry_join": ["192.168.0.10"],
  "retry_interval": "30s",
  "rejoin_after_leave": true,
  "start_join": ["192.168.0.10"]
}
[root@db-server-yayun-02 ~]#
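The configuration files on the three client servers are nearly identical. The only difference is bind_addr, which you should change to the IP of each server. My test environment runs on virtual machines with multiple network cards, so the address must be specified explicitly; otherwise binding to 0.0.0.0 would work.
Now start the consul server first: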
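Since only bind_addr changes between the clients, the file can be generated rather than edited by hand. The following is a rough sketch; render_client_config is a hypothetical helper (not part of Consul) that prints a client.json for the address it is given, with the server address defaulting to 192.168.0.10.

```shell
#!/bin/bash
# Sketch: print a client.json for one host. Only bind_addr differs per host.
# render_client_config is a hypothetical helper, not a Consul command.
render_client_config() {
    local bind_addr=$1
    local server_addr=${2:-192.168.0.10}
    cat <<EOF
{
  "data_dir": "/data/consul",
  "enable_script_checks": true,
  "bind_addr": "${bind_addr}",
  "retry_join": ["${server_addr}"],
  "retry_interval": "30s",
  "rejoin_after_leave": true,
  "start_join": ["${server_addr}"]
}
EOF
}

# Example: print the file for 192.168.0.30.
render_client_config 192.168.0.30
```

On each client the output would be redirected to /etc/consul.d/client.json.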
nohup consul agent -config-dir=/etc/consul.d > /data/consul/consul.log &
Check the log:
[root@db-server-yayun-01 consul]# cat consul.log
==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
    Version: 'v0.9.2'
    Node ID: '5e612623-ec5b-386c-19be-d38876a9a46f'
    Node name: 'db-server-yayun-01'
    Datacenter: 'dc1'
    Server: true (bootstrap: true)
    Client Addr: 192.168.0.10 (HTTP: 8500, HTTPS: -1, DNS: 8600)
    Cluster Addr: 192.168.0.10 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
    2017/12/09 09:49:53 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:192.168.0.10:8300 Address:192.168.0.10:8300}]
    2017/12/09 09:49:53 [INFO] raft: Node at 192.168.0.10:8300 [Follower] entering Follower state (Leader: "")
    2017/12/09 09:49:53 [INFO] serf: EventMemberJoin: db-server-yayun-01.dc1 192.168.0.10
    2017/12/09 09:49:53 [INFO] serf: EventMemberJoin: db-server-yayun-01 192.168.0.10
    2017/12/09 09:49:53 [INFO] agent: Started DNS server 192.168.0.10:8600 (udp)
    2017/12/09 09:49:53 [INFO] consul: Adding LAN server db-server-yayun-01 (Addr: tcp/192.168.0.10:8300) (DC: dc1)
    2017/12/09 09:49:53 [INFO] consul: Handled member-join event for server "db-server-yayun-01.dc1" in area "wan"
    2017/12/09 09:49:53 [INFO] agent: Started DNS server 192.168.0.10:8600 (tcp)
    2017/12/09 09:49:53 [INFO] agent: Started HTTP server on 192.168.0.10:8500
    2017/12/09 09:50:00 [ERR] agent: failed to sync remote state: No cluster leader
    2017/12/09 09:50:00 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2017/12/09 09:50:00 [INFO] raft: Node at 192.168.0.10:8300 [Candidate] entering Candidate state in term 2
    2017/12/09 09:50:00 [INFO] raft: Election won. Tally: 1
    2017/12/09 09:50:00 [INFO] raft: Node at 192.168.0.10:8300 [Leader] entering Leader state
    2017/12/09 09:50:00 [INFO] consul: cluster leadership acquired
    2017/12/09 09:50:00 [INFO] consul: New leader elected: db-server-yayun-01
    2017/12/09 09:50:00 [INFO] consul: member 'db-server-yayun-01' joined, marking health alive
    2017/12/09 09:50:03 [INFO] agent: Synced node info
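The log shows (HTTP: 8500, HTTPS: -1, DNS: 8600). The HTTP port defaults to 8500 and is used by reload and the web UI; the DNS port is 8600 and is used for DNS resolution. The log also shows that this machine is the leader (consul: New leader elected: db-server-yayun-01), because it is the only server. For that reason, a production environment should always run 3 or 5 servers.
Next, start the three clients; the start command is the same on all three. Then check the log on one of them: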
nohup consul agent -config-dir=/etc/consul.d > /data/consul/consul.log &
[root@db-server-yayun-02 consul]# cat /data/consul/consul.log
==> Starting Consul agent...
==> Joining cluster...
    Join completed. Synced with 1 initial agents
==> Consul agent running!
    Version: 'v0.9.2'
    Node ID: '0ec901ab-6c66-2461-95e6-50a77a28ed72'
    Node name: 'db-server-yayun-02'
    Datacenter: 'dc1'
    Server: false (bootstrap: false)
    Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
    Cluster Addr: 192.168.0.20 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
    2017/12/09 10:06:10 [INFO] serf: EventMemberJoin: db-server-yayun-02 192.168.0.20
    2017/12/09 10:06:10 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2017/12/09 10:06:10 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2017/12/09 10:06:10 [INFO] agent: Started HTTP server on 127.0.0.1:8500
    2017/12/09 10:06:10 [INFO] agent: (LAN) joining: [192.168.0.10]
    2017/12/09 10:06:10 [INFO] agent: Retry join is supported for: aws azure gce softlayer
    2017/12/09 10:06:10 [INFO] agent: Joining cluster...
    2017/12/09 10:06:10 [INFO] agent: (LAN) joining: [192.168.0.10]
    2017/12/09 10:06:10 [INFO] serf: EventMemberJoin: db-server-yayun-01 192.168.0.10
    2017/12/09 10:06:10 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2017/12/09 10:06:10 [INFO] consul: adding server db-server-yayun-01 (Addr: tcp/192.168.0.10:8300) (DC: dc1)
    2017/12/09 10:06:10 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2017/12/09 10:06:10 [INFO] agent: Join completed. Synced with 1 initial agents
    2017/12/09 10:06:10 [INFO] agent: Synced node info
The log shows agent: Join completed. Synced with 1 initial agents, as well as Server: false (bootstrap: false). The latter is what distinguishes a client from a server.
Next, run the following commands to look at the cluster:
[root@db-server-yayun-02 ~]# consul members
Node                Address            Status  Type    Build  Protocol  DC
db-server-yayun-01  192.168.0.10:8301  alive   server  0.9.2  2         dc1
db-server-yayun-02  192.168.0.20:8301  alive   client  0.9.2  2         dc1
db-server-yayun-03  192.168.0.30:8301  alive   client  0.9.2  2         dc1
db-server-yayun-04  192.168.0.40:8301  alive   client  0.9.2  2         dc1
[root@db-server-yayun-02 ~]#
[root@db-server-yayun-02 ~]# consul operator raft list-peers
Node                ID                 Address            State   Voter  RaftProtocol
db-server-yayun-01  192.168.0.10:8300  192.168.0.10:8300  leader  true   2
[root@db-server-yayun-02 ~]#
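When this check is scripted, the member list can be summarized with awk. The sketch below assumes the column order shown in the consul members output above (Node, Address, Status, Type, ...); count_alive is a hypothetical helper that reads that output on stdin.

```shell
#!/bin/bash
# Sketch: count members of a given type whose Status column is "alive".
# count_alive is a hypothetical helper; it expects `consul members`
# output on stdin, with Status in column 3 and Type in column 4.
count_alive() {
    local type=$1
    awk -v t="$type" 'NR > 1 && $3 == "alive" && $4 == t { n++ } END { print n + 0 }'
}

# Sample input copied from the output above; on a live node use:
#   consul members | count_alive client
sample='Node Address Status Type Build Protocol DC
db-server-yayun-01 192.168.0.10:8301 alive server 0.9.2 2 dc1
db-server-yayun-02 192.168.0.20:8301 alive client 0.9.2 2 dc1
db-server-yayun-03 192.168.0.30:8301 alive client 0.9.2 2 dc1
db-server-yayun-04 192.168.0.40:8301 alive client 0.9.2 2 dc1'

printf '%s\n' "$sample" | count_alive client   # prints 3
printf '%s\n' "$sample" | count_alive server   # prints 1
```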
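Now look at the web UI. Consul ships with its own UI, which is very lightweight. Visit: http://192.168.0.10:8500/ui/
At this point the consul cluster is up. Simple, isn't it? However, as seen above, the client nodes have not registered any services yet and display 0 services. That is what the next part covers: how do we actually make Redis and MySQL highly available? Let's get started:
Consul use case 1 (redis sentinel)
(1) In a Redis Sentinel architecture, sentinels are deployed on the servers, but the business teams do not use the jedis sentinel driver at the app layer to discover the Redis master automatically; they connect directly to the master's IP. When the master goes down and another Redis node takes over as the new master, the application configuration has to be changed by hand to point to the new master.
(2) The Redis client driver has no read/write splitting configuration yet, so there is currently no good way to load-balance reads across slaves. Our own programs all support read/write splitting, so this does not affect us.
(3) Consul can meet both needs with two DNS services. One is the master service, which relies on Consul's own service health checks and probing to discover the new master automatically. The other is a slave service, which uses DNS itself to round-robin across the IPs of the Redis nodes in the slave role.
The architecture is as follows:
MySQL can be made highly available in the same way, with MHA playing the same role as sentinel. The architecture is as follows:
The rest of this article walks through the Redis high-availability setup. I will not cover MySQL in detail, but I will include the health check scripts it uses. The approach is the same.
Consul service definitions (Redis)
The consul cluster is already in place: the server is 192.168.0.10 and the clients are 20 through 40. We will use 20 as the Redis master and 30 and 40 as Redis slaves. Now define the services (the definitions must exist on 20, 30 and 40):
The configuration files on 20, 30 and 40 are as follows. Apart from address, which must be changed to the address of each server, they are identical.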
[root@db-server-yayun-02 consul.d]# pwd
/etc/consul.d
[root@db-server-yayun-02 consul.d]# ll
total 12
-rw-r--r--. 1 root root 221 Dec 9 09:44 client.json
-rw-r--r--. 1 root root 319 Dec 9 10:48 r-6029-redis-test.json
-rw-r--r--. 1 root root 321 Dec 9 10:48 w-6029-redis-test.json
[root@db-server-yayun-02 consul.d]#
Service definition file for the master:
[root@db-server-yayun-02 consul.d]# cat w-6029-redis-test.json
{
  "services": [
    {
      "name": "w-6029-redis-test",
      "tags": [
        "master-test-6029"
      ],
      "address": "192.168.0.20",
      "port": 6029,
      "checks": [
        {
          "script": "/data/consul/shell/check_redis_master.sh 6029 ",
          "interval": "15s"
        }
      ]
    }
  ]
}
[root@db-server-yayun-02 consul.d]#
Service definition file for the slave:
[root@db-server-yayun-02 consul.d]# cat r-6029-redis-test.json
{
  "services": [
    {
      "name": "r-6029-redis-test",
      "tags": [
        "slave-test-6029"
      ],
      "address": "192.168.0.20",
      "port": 6029,
      "checks": [
        {
          "script": "/data/consul/shell/check_redis_slave.sh 6029 ",
          "interval": "15s"
        }
      ]
    }
  ]
}
[root@db-server-yayun-02 consul.d]#
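Once every agent has registered, two domain names are available:
w-6029-redis-test.service.consul (resolves to the single master IP)
r-6029-redis-test.service.consul (resolves to the two slave IPs; each client request gets one at random)
Here "script": "/data/consul/shell/check_redis_slave.sh 6029 " means that a health check is run against Redis on port 6029. For more on health checks, see the official documentation.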
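As a rough illustration of how these names are built and queried, the sketch below follows Consul's DNS naming pattern, [tag.]<service>.service.<domain>. Both helpers are hypothetical. lookup_service assumes dig is installed and that an agent answers DNS on port 8600 at the given address (127.0.0.1 on the clients here, 192.168.0.10 on the server); it is defined but not called.

```shell
#!/bin/bash
# Sketch: build the DNS name Consul serves for a service.
# service_fqdn and lookup_service are hypothetical helpers.
service_fqdn() {
    local service=$1 tag=${2:-} domain=${3:-consul}
    local name="${service}.service.${domain}"
    if [ -n "$tag" ]; then
        name="${tag}.${name}"
    fi
    echo "$name"
}

# Resolve a service through a Consul agent's DNS port (8600).
# Assumption: dig is available and the agent listens on $2.
lookup_service() {
    local dns_server=${2:-127.0.0.1}
    dig +short @"$dns_server" -p 8600 "$(service_fqdn "$1")"
}

service_fqdn w-6029-redis-test                    # w-6029-redis-test.service.consul
service_fqdn r-6029-redis-test slave-test-6029    # slave-test-6029.r-6029-redis-test.service.consul
```

An application would then connect to the write name for the master and to the read name for a slave, instead of a fixed IP.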
[root@db-server-yayun-03 shell]# pwd
/data/consul/shell
[root@db-server-yayun-03 shell]# ll
total 16
-rwxr-xr-x. 1 root root 480 Dec 9 10:56 check_mysql_master.sh
-rwxr-xr-x. 1 root root 3004 Dec 9 10:55 check_mysql_slave.sh
-rwxr-xr-x. 1 root root 254 Dec 9 10:51 check_redis_master.sh
-rwxr-xr-x. 1 root root 379 Dec 9 10:51 check_redis_slave.sh
[root@db-server-yayun-03 shell]#
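The /data/consul/shell directory contains four scripts that perform the health checks for Redis and MySQL. The scripts are fairly simple. Roughly: if there is only a master, both reads and writes go to the master; if a slave is available, reads go to the slave. If replication on a slave is broken or lagging, its health check fails and the slave is no longer returned for the read service.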
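[root@db-server-yayun-03 shell]# cat check_redis_master.sh
#!/bin/bash
# Usage: check_redis_master.sh <port> [password]
myport=$1
auth=$2
# No password given: pass an empty placeholder to -a
if [ ! -n "$auth" ]
then
    auth='\"\"'
fi
comm="/usr/local/bin/redis-cli -p $myport -a $auth "
# 1 if this instance reports role:master, otherwise 0
role=`echo 'INFO Replication'|$comm |grep -Ec 'role:master'`
# Print the replication info so it shows up in the check output
echo 'INFO Replication'|$comm
# Not a master: fail the check
if [ $role -ne 1 ]
then
    exit 2
fi
[root@db-server-yayun-03 shell]#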
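[root@db-server-yayun-03 shell]# cat check_redis_slave.sh
#!/bin/bash
# Usage: check_redis_slave.sh <port> [password]
myport=$1
auth=$2
# No password given: pass an empty placeholder to -a
if [ ! -n "$auth" ]
then
    auth='\"\"'
fi
comm="/usr/local/bin/redis-cli -p $myport -a $auth "
# 2 when this instance is a slave and its link to the master is up
role=`echo 'INFO Replication'|$comm |grep -Ec '^role:slave|^master_link_status:up'`
# 2 when this instance is a master with no connected slaves
single=`echo 'INFO Replication'|$comm |grep -Ec '^role:master|^connected_slaves:0'`
# Print the replication info so it shows up in the check output
echo 'INFO Replication'|$comm
# Neither a healthy slave nor a lone master: fail the check
if [ $role -ne 2 -a $single -ne 2 ]
then
    exit 2
fi
[root@db-server-yayun-03 shell]#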
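The decision made by check_redis_slave.sh can be restated as a small function that works on the INFO replication text alone, which makes it easy to try without a running Redis. This is only a sketch of the same rule, not a replacement for the script; read_check_status is a hypothetical name. It relies on Consul's convention for script checks: exit code 0 is passing, 1 is warning, anything else is critical.

```shell
#!/bin/bash
# Sketch: decide the read-service check result from `INFO replication` text.
# read_check_status is a hypothetical helper; it reads the text on stdin and
# prints the exit code the check would use (0 = passing, 2 = critical).
read_check_status() {
    local info role link slaves
    # Strip carriage returns, since INFO output may use CRLF line endings.
    info=$(tr -d '\r')
    role=$(printf '%s\n' "$info" | awk -F: '$1 == "role" { print $2 }')
    link=$(printf '%s\n' "$info" | awk -F: '$1 == "master_link_status" { print $2 }')
    slaves=$(printf '%s\n' "$info" | awk -F: '$1 == "connected_slaves" { print $2 }')
    if [ "$role" = "slave" ] && [ "$link" = "up" ]; then
        echo 0    # healthy slave: serve reads
    elif [ "$role" = "master" ] && [ "$slaves" = "0" ]; then
        echo 0    # lone master: serve reads as a fallback
    else
        echo 2    # anything else is critical
    fi
}

printf 'role:slave\nmaster_link_status:up\n' | read_check_status      # prints 0
printf 'role:master\nconnected_slaves:2\n' | read_check_status        # prints 2
```

The second call shows why a master with working slaves is left out of the read service: reads only fall back to the master when no slave is connected.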
[root@db-server-yayun-03 shell]# cat check_mysql_master.sh
#!/bin/bash
port=$1
user="root"
passwod="123"
comm="/usr/local/mysql/bin/mysql -u$user -h 127.0.0.1 -P $port -p$passwod"
slave_info=`$comm -e "show slave status" |wc -l`
value=`$comm -Nse "select 1"`
# Check whether this instance is a slave
if [ $slave_info -ne 0 ]
then
    exit 2
fi
# Check whether MySQL is alive
if [ -z "$value" ]
then
    exit 2
fi
echo "MySQL $port Instance is Master........"
$comm -e "select * from information_schema.PROCESSLIST where user='repl' and COMMAND like '%Dump%'"
[root@db-server-yayun-03 shell]#
[root@db-server-yayun-03 shell]# cat check_mysql_slave.sh
#!/bin/bash
port=$1
user="root"
passwod="123"
repl_check_user="root"
repl_check_pwd="123"
master_comm="/usr/local/mysql/bin/mysql -u$user -h 127.0.0.1 -P $port -p$passwod"
slave_comm="/usr/local/mysql/bin/mysql -u$repl_check_user -P $port -p$repl_check_pwd"
# Check whether MySQL is alive
value=`$master_comm -Nse "select 1"`
if [ -z "$value" ]
then
    echo "MySQL Server is Down....."
    exit 2
fi
get_slave_count=0
is_slave_role=0
slave_mode_repl_delay=0
master_mode_repl_delay=0
master_mode_repl_dead=0
slave_mode_repl_status=0
max_delay=120
get_slave_hosts=`$master_comm -Nse "select substring_index(HOST,':',1) from information_schema.PROCESSLIST where user='repl' and COMMAND='Binlog Dump';" `
get_slave_count=`$master_comm -Nse "select count(1) from information_schema.PROCESSLIST where user='repl' and COMMAND='Binlog Dump';" `
is_slave_role=`$master_comm -e "show slave status\G"|grep -Ewc "Slave_SQL_Running|Slave_IO_Running"`
### Single-instance mode (get_slave_count=0 and is_slave_role=0)
function single_mode
{
    if [ $get_slave_count -eq 0 -a $is_slave_role -eq 0 ]
    then
        echo "MySQL $port Instance is Single Master........"
        exit 0
    fi
}
### Slave mode (get_slave_count=0 and is_slave_role=2)
function slave_mode
{
    # A slave passes only if replication is running and not lagging
    if [ $is_slave_role -ge 2 ]
    then
        echo "MySQL $port Instance is Slave........"
        $master_comm -e "show slave status\G" | egrep -w "Master_Host|Master_User|Master_Port|Master_Log_File|Read_Master_Log_Pos|Relay_Log_File|Relay_Log_Pos|Relay_Master_Log_File|Slave_IO_Running|Slave_SQL_Running|Exec_Master_Log_Pos|Relay_Log_Space|Seconds_Behind_Master"
        slave_mode_repl_delay=`$master_comm -e "show slave status\G" | grep -w "Seconds_Behind_Master" | awk '{print $NF}'`
        slave_mode_repl_status=`$master_comm -e "show slave status\G"|grep -Ec "Slave_IO_Running: Yes|Slave_SQL_Running: Yes"`
        # A NULL delay means replication is not running; treat it as a very large delay
        if [ X"$slave_mode_repl_delay" == X"NULL" ]
        then
            slave_mode_repl_delay=99999
        fi
        if [ $slave_mode_repl_delay != "NULL" -a $slave_mode_repl_delay -lt $max_delay -a $slave_mode_repl_status -ge 2 ]
        then
            exit 0
        fi
    fi
}
function master_mode
{
    ### A master is readable only when its slaves are lagging or replication is broken
    if [ $get_slave_count -gt 0 -a $is_slave_role -eq 0 ]
    then
        echo "MySQL $port Instance is Master........"
        $master_comm -e "select * from information_schema.PROCESSLIST where user='repl' and COMMAND like '%Dump%'"
        for my_slave in $get_slave_hosts
        do
            master_mode_repl_delay=`$slave_comm -h $my_slave -e "show slave status\G" | grep -w "Seconds_Behind_Master" | awk '{print $NF}' `
            master_mode_repl_thread=`$slave_comm -h $my_slave -e "show slave status\G"|grep -Ec "Slave_IO_Running: Yes|Slave_SQL_Running: Yes"`
            if [ X"$master_mode_repl_delay" == X"NULL" ]
            then
                master_mode_repl_delay=99999
            fi
            # A healthy slave exists, so the master should not serve reads
            if [ $master_mode_repl_delay -lt $max_delay -a $master_mode_repl_thread -ge 2 ]
            then
                exit 2
            fi
        done
        exit 0
    fi
}
single_mode
slave_mode
master_mode
exit 2
[root@db-server-yayun-03 shell]#
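"name": "r-6029-redis-test" is what becomes the domain name. The default suffix is service.consul, and Consul lets you change the domain with the domain parameter. After creating the configuration files, install Redis and set up master-slave replication (omitted here). Once replication is working, reload consul. The Redis info output: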
127.0.0.1:6029> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.0.40,port=6029,state=online,offset=6786,lag=0
slave1:ip=192.168.0.30,port=6029,state=online,offset=6786,lag=1
master_repl_offset:6786
repl_backlog_active:1
repl_backlog_size:67108864
repl_backlog_first_byte_offset:2
repl_backlog_histlen:6785
127.0.0.1:6029>
Reload consul (on the three clients, i.e. 20 through 40):
[root@db-server-yayun-02 ~]# consul reload
Configuration reload triggered
[root@db-server-yayun-02 ~]#
Check the consul log on one of the servers (20):
[root@db-server-yayun-02 consul]# tail -f consul.log
2017/12/09 10:09:59 [INFO] serf: EventMemberJoin: db-server-yayun-04 192.168.0.40
2017/12/09 11:14:55 [INFO] Caught signal: hangup
2017/12/09 11:14:55 [INFO] Reloading configuration...
2017/12/09 11:14:55 [INFO] agent: Synced service 'r-6029-redis-test'
2017/12/09 11:14:55 [INFO] agent: Synced service 'w-6029-redis-test'
2017/12/09 11:14