
MySQL High-Availability Architecture with MHA

2017-10-07 17:19

I. Introduction

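MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It is a tool set for failing over a failed master and promoting a slave within a MySQL replication environment. When the master fails, MHA can complete the failover automatically, typically within 0 to 30 seconds, and during the failover it preserves data consistency as far as possible, which is what makes it a practical high-availability solution.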

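The software consists of two components: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master of the cluster at regular intervals; when the master fails, it automatically promotes the slave holding the most recent data to be the new master and then repoints all other slaves to the new master. The whole failover is transparent to the application.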

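During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always feasible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it. Even if only one slave has received the latest binary log events, MHA can apply them to all other slaves and so keep the data on all nodes consistent.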

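MHA currently mainly supports a one-master, multiple-slave topology. Building an MHA setup requires at least three database servers in the replication cluster: one acting as master, one as candidate master, and one as slave.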



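The way MHA works can be summarized as follows:

(1) Save the binary log events (binlog events) from the crashed master;

(2) Identify the slave that has the most recent updates;

(3) Apply the differential relay log to the other slaves;

(4) Apply the binary log events saved from the master;

(5) Promote one slave to be the new master;

(6) Point the other slaves at the new master and resume replication.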
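
Step (2) can be illustrated with a small shell sketch. It assumes classic file/position replication and that each slave's Master_Log_File and Read_Master_Log_Pos (from SHOW SLAVE STATUS) have already been collected into lines of the form "host file position". The latest_slave helper and the sample values are hypothetical; they only illustrate the comparison and are not MHA's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper (not part of MHA): pick the slave that has read the
# furthest into the master's binary log.
# Input on stdin, one slave per line:
#   <host> <Master_Log_File> <Read_Master_Log_Pos>
latest_slave() {
    # Order by binlog file name (zero-padded suffixes sort correctly as text),
    # then numerically by position; the last row is the most advanced slave.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{ print $1 }'
}

# Sample positions (illustrative values only).
printf '%s\n' \
    '172.25.27.2 binlog.000003 774' \
    '172.25.27.3 binlog.000003 695' | latest_slave
```

With GTID-based replication, as used later in this article, the comparison would be made on the executed GTID sets rather than on file and position.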

Official project page: https://code.google.com/p/mysql-master-ha/

II. Deploying MHA

1. Lab environment:

Red Hat Enterprise Linux 6.5, MySQL 5.7.19

Role              IP            Hostname  server_id  Service type
Monitor host      172.25.27.4   server4   -          monitors the replication group
Master            172.25.27.1   server1   1
Candidate master  172.25.27.2   server2   2
Slave             172.25.27.3   server3   3

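server2 and server3 are slaves of server1. The master serves writes; the candidate master (server2, which is in fact a slave) serves reads, and the slave serves reads as well. If the master goes down, the candidate master is promoted to be the new master and the slave is repointed to it.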

2. Setting up master-slave replication

Download the MySQL RPM packages:

https://dev.mysql.com/downloads/mysql/

## On server1 (172.25.27.1):

[root@server1 mysql5.7.19]# ls
mysql-community-client-5.7.19-1.el6.x86_64.rpm
mysql-community-common-5.7.19-1.el6.x86_64.rpm
mysql-community-libs-5.7.19-1.el6.x86_64.rpm
mysql-community-libs-compat-5.7.19-1.el6.x86_64.rpm
mysql-community-server-5.7.19-1.el6.x86_64.rpm
[root@server1 mysql5.7.19]# yum install -y *
[root@server1 mysql5.7.19]# vim /etc/my.cnf

symbolic-links=0

server-id=1
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
binlog_do-db=test
binlog_ignore_db=mysql

log-error=/var/log/mysqld.log

[root@server1 mysql5.7.19]# scp /etc/my.cnf server2:/etc/
[root@server1 mysql5.7.19]# scp /etc/my.cnf server3:/etc/

[root@server1 mysql5.7.19]# /etc/init.d/mysqld start
Initializing MySQL database:                               [  OK  ]
Starting mysqld:                                           [  OK  ]
[root@server1 mysql5.7.19]# cat /var/log/mysqld.log | grep localhost        ## find the initial database password
2017-10-07T08:23:32.276339Z 1 [Note] A temporary password is generated for root@localhost: ekso,kwhk1B&
2017-10-07T08:23:39.043717Z 3 [Note] Access denied for user 'UNKNOWN_MYSQL_USER'@'localhost' (using password: NO)
[root@server1 mysql5.7.19]# mysql -p
Enter password:
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'Mypasswd+1';
Query OK, 0 rows affected (0.00 sec)
mysql> grant replication slave on *.* to repl@'172.25.27.%' identified by 'Mypasswd+1';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 |      691 | test         | mysql            | ce1d2cae-ab38-11e7-9244-525400ac4caf:1-2 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
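
Because /etc/my.cnf is copied to server2 and server3 unchanged, each copy still contains server-id=1 and has to be edited before mysqld is started on those hosts. A minimal sketch of that edit as a shell function follows; set_server_id is a hypothetical helper, and the demonstration works on a temporary file rather than the real /etc/my.cnf.

```shell
#!/bin/sh
# Hypothetical helper: replace the server-id line in a my.cnf-style file.
# $1 = path to the file, $2 = new numeric server id
set_server_id() {
    tmp="$1.tmp.$$"
    # Rewrite through a temporary file so the function does not depend on
    # the non-portable "sed -i"; writing back with cat keeps the original
    # file's owner and permissions.
    sed "s/^server-id=.*/server-id=$2/" "$1" > "$tmp" &&
        cat "$tmp" > "$1" &&
        rm -f "$tmp"
}

# Demonstration on a scratch copy.
cnf=$(mktemp)
printf 'server-id=1\ngtid_mode=ON\n' > "$cnf"
set_server_id "$cnf" 2
cat "$cnf"
rm -f "$cnf"
```

On server2 the call would be set_server_id /etc/my.cnf 2, and on server3 set_server_id /etc/my.cnf 3.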




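## On server2 and server3 (172.25.27.2 and 172.25.27.3):
## Install MySQL with yum exactly as on server1; the steps are not repeated here
[root@server2 ~]# vim /etc/my.cnf
server-id=2       ## use 2 on server2 and 3 on server3; any number works as long as it is unique among the three servers

## Then start the database and change the root password, exactly as on server1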
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'Mypasswd+1';
Query OK, 0 rows affected (0.01 sec)

mysql> grant replication slave on *.* to repl@'172.25.27.%' identified by 'Mypasswd+1';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> change master to master_host='172.25.27.1', master_user='repl', master_password='Mypasswd+1', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.07 sec)

mysql> start slave;
Query OK, 0 rows affected (0.02 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

## Alternatively, check with the following command
[root@server2 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | egrep 'Slave_IO|Slave_SQL'
mysql: [Warning] Using a password on the command line interface can be insecure.
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
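
The check above is easy to script. The following sketch reads the output of SHOW SLAVE STATUS\G on standard input and succeeds only when both replication threads report Yes. The function name replication_ok is hypothetical, and the sample input is copied from the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 only if both Slave_IO_Running and
# Slave_SQL_Running are "Yes" in "SHOW SLAVE STATUS\G" output on stdin.
replication_ok() {
    awk '
        # Exact match on the field name, so Slave_SQL_Running_State is ignored.
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Demonstration with the status lines shown above.
replication_ok <<'EOF' && echo "replication healthy"
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
EOF
```

On a slave it could be fed directly, for example: mysql -p -e 'show slave status\G' | replication_ok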


3. Testing replication

##SERVER 2/3:
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

##SERVER 1:
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
mysql> CREATE DATABASE test;
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+

##SERVER2/3 :
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+
5 rows in set (0.00 sec)


Replication works.

4. Installing MHA Manager

[root@server4 ~]# cd MHA/
[root@server4 MHA]# ls
master_ip_failover
master_ip_online_change
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm
perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
perl-Email-Date-Format-1.002-5.el6.noarch.rpm
perl-Log-Dispatch-2.27-1.el6.noarch.rpm
perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
perl-MIME-Lite-3.027-2.el6.noarch.rpm
perl-MIME-Types-1.28-2.el6.noarch.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm

[root@server4 MHA]# yum install -y *.rpm


5. Creating the monitoring user

Run this on the master, that is, on server1:

mysql> grant all on *.* to 'root'@'172.25.27.%' identified  by 'Mypasswd+2';


6. Creating the MHA working directory and configuration file

[root@server4 MHA]# mkdir /etc/masterha
[root@server4 MHA]# cd /etc/masterha/
[root@server4 masterha]# vim app.cnf

[server default]
manager_workdir=/etc/masterha
manager_log=/etc/masterha/mha.log
master_binlog_dir=/var/lib/mysql
#master_ip_failover_script=/etc/masterha/master_ip_failover
#master_ip_online_change_script= /etc/masterha/master_ip_online_change
password=Mypasswd+2
user=root
ping_interval=1
remote_workdir=/tmp
repl_password=Mypasswd+1
repl_user=repl
#report_script=/usr/local/send_report
#secondary_check_script=/usr/bin/masterha_secondary_check -s 172.25.27.2 -s 172.25.27.3
#shutdown_script=""
ssh_user=root

[server1]
hostname=172.25.27.1
port=3306
#candidate_master=1
#check_repl_delay=0

[server2]
hostname=172.25.27.2
port=3306
#candidate_master=1
#check_repl_delay=0

[server3]
hostname=172.25.27.3
port=3306
no_master=1
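
The [serverN] sections are what MHA treats as the members of the replication group, so it can be handy to list them when preparing the SSH and replication checks that follow. The sketch below extracts the hostname value of each [serverN] section; mha_hosts is a hypothetical helper, and the demonstration uses a scratch file with the same layout as the configuration above.

```shell
#!/bin/sh
# Hypothetical helper: print the hostname= value of every [serverN] section
# in an MHA application configuration file ($1 = path to the file).
mha_hosts() {
    awk -F= '
        # Track whether the current section is a [serverN] section;
        # "[server default]" does not match and is skipped.
        /^\[/ { in_server = ($0 ~ /^\[server[0-9]+\]/) }
        in_server && $1 == "hostname" { print $2 }
    ' "$1"
}

# Demonstration on a scratch file laid out like the configuration above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[server default]
user=root
ssh_user=root

[server1]
hostname=172.25.27.1
port=3306

[server2]
hostname=172.25.27.2
port=3306
EOF
mha_hosts "$conf"
rm -f "$conf"
```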


7. Installing MHA Node on all nodes:

##server1/2/3

[root@server1 ~]# ls
mha4mysql-node-0.56-0.el6.noarch.rpm
[root@server1 ~]# yum install -y mha4mysql-node-0.56-0.el6.noarch.rpm


8. Configuring passwordless SSH login

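The servers log in to one another over SSH with keys, so no password is required. Key-based login is only outlined briefly here. Note: do not disable password login, otherwise errors will occur.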

[root@server4 ~]# ssh-keygen
[root@server4 ~]# ssh-copy-id 172.25.27.1
[root@server4 ~]# ssh-copy-id 172.25.27.2
[root@server4 ~]# ssh-copy-id 172.25.27.3
[root@server4 ~]# ssh root@172.25.27.1
[root@server4 ~]# ssh root@172.25.27.2
[root@server4 ~]# ssh root@172.25.27.3


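Note that every server must be able to reach the others over SSH without a password, so key-based login also has to be configured on the three database servers; the steps are not repeated here.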
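
To see which key exchanges are still needed between the database servers, the ordered host pairs can be enumerated. The sketch below only prints the ssh-copy-id commands and does not execute them; ssh_pairs is a hypothetical helper, and it assumes that a key pair has already been generated with ssh-keygen on each source host.

```shell
#!/bin/sh
# Hypothetical helper: print "source destination" for every ordered pair of
# distinct hosts given as arguments.
ssh_pairs() {
    for src in "$@"; do
        for dst in "$@"; do
            if [ "$src" != "$dst" ]; then
                echo "$src $dst"
            fi
        done
    done
}

# Dry run: show the command to run on each source host (nothing is executed).
ssh_pairs 172.25.27.1 172.25.27.2 172.25.27.3 |
while read -r src dst; do
    echo "on $src run: ssh-copy-id root@$dst"
done
```

Three hosts give six ordered pairs, which correspond to the six connections that masterha_check_ssh tests.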

9. Checking the SSH configuration

Use the masterha_check_ssh tool to check MHA's SSH configuration:

[root@server4 ~]# masterha_check_ssh --conf=/etc/masterha/app.cnf
Mon Oct  9 11:15:21 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct  9 11:15:21 2017 - [info] Reading application default configuration from /etc/masterha/app.cnf..
Mon Oct  9 11:15:21 2017 - [info] Reading server configuration from /etc/masterha/app.cnf..
Mon Oct  9 11:15:21 2017 - [info] Starting SSH connection tests..
Mon Oct  9 11:15:22 2017 - [debug]
Mon Oct  9 11:15:21 2017 - [debug]  Connecting via SSH from root@172.25.27.1(172.25.27.1:22) to root@172.25.27.2(172.25.27.2:22)..
Mon Oct  9 11:15:21 2017 - [debug]   ok.
Mon Oct  9 11:15:21 2017 - [debug]  Connecting via SSH from root@172.25.27.1(172.25.27.1:22) to root@172.25.27.3(172.25.27.3:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:22 2017 - [debug]
Mon Oct  9 11:15:22 2017 - [debug]  Connecting via SSH from root@172.25.27.2(172.25.27.2:22) to root@172.25.27.1(172.25.27.1:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:22 2017 - [debug]  Connecting via SSH from root@172.25.27.2(172.25.27.2:22) to root@172.25.27.3(172.25.27.3:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:23 2017 - [debug]
Mon Oct  9 11:15:22 2017 - [debug]  Connecting via SSH from root@172.25.27.3(172.25.27.3:22) to root@172.25.27.1(172.25.27.1:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:22 2017 - [debug]  Connecting via SSH from root@172.25.27.3(172.25.27.3:22) to root@172.25.27.2(172.25.27.2:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:23 2017 - [info] All SSH connection tests passed successfully.


The output must not contain any error; otherwise the passwordless SSH configuration has a problem.

10. Checking the replication environment

Use the masterha_check_repl script to check the state of the whole cluster:

[root@server4 ~]# yum install -y mysql-server
[root@server4 ~]# mysql -h 172.25.27.1 -u repl -pMypasswd+1
[root@server4 ~]# mysql -h 172.25.27.1 -u root -pMypasswd+2
[root@server4 ~]# masterha_check_repl --conf=/etc/masterha/app.cnf
Mon Oct  9 11:32:21 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct  9 11:32:21 2017 - [info] Reading application default configuration from /etc/masterha/app.cnf..
Mon Oct  9 11:32:21 2017 - [info] Reading server configuration from /etc/masterha/app.cnf..
Mon Oct  9 11:32:21 2017 - [info] MHA::MasterMonitor version 0.56.
Mon Oct  9 11:32:22 2017 - [info] GTID failover mode = 1
Mon Oct  9 11:32:22 2017 - [info] Dead Servers:
Mon Oct  9 11:32:22 2017 - [info] Alive Servers:
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.1(172.25.27.1:3306)
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.2(172.25.27.2:3306)
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.3(172.25.27.3:3306)
Mon Oct  9 11:32:22 2017 - [info] Alive Slaves:
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.2(172.25.27.2:3306)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Mon Oct  9 11:32:22 2017 - [info]     GTID ON
Mon Oct  9 11:32:22 2017 - [info]     Replicating from 172.25.27.1(172.25.27.1:3306)
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.3(172.25.27.3:3306)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Mon Oct  9 11:32:22 2017 - [info]     GTID ON
Mon Oct  9 11:32:22 2017 - [info]     Replicating from 172.25.27.1(172.25.27.1:3306)
Mon Oct  9 11:32:22 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Oct  9 11:32:22 2017 - [info] Current Alive Master: 172.25.27.1(172.25.27.1:3306)
Mon Oct  9 11:32:22 2017 - [info] Checking slave configurations..
Mon Oct  9 11:32:22 2017 - [info]  read_only=1 is not set on slave 172.25.27.2(172.25.27.2:3306).
Mon Oct  9 11:32:22 2017 - [info]  read_only=1 is not set on slave 172.25.27.3(172.25.27.3:3306).
Mon Oct  9 11:32:22 2017 - [info] Checking replication filtering settings..
Mon Oct  9 11:32:22 2017 - [info]  binlog_do_db= test, binlog_ignore_db= mysql
Mon Oct  9 11:32:22 2017 - [info]  Replication filtering check ok.
Mon Oct  9 11:32:22 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct  9 11:32:22 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct  9 11:32:22 2017 - [info] HealthCheck: SSH to 172.25.27.1 is reachable.
Mon Oct  9 11:32:22 2017 - [info]
172.25.27.1(172.25.27.1:3306) (current master)
+--172.25.27.2(172.25.27.2:3306)
+--172.25.27.3(172.25.27.3:3306)

Mon Oct  9 11:32:22 2017 - [info] Checking replication health on 172.25.27.2..
Mon Oct  9 11:32:22 2017 - [info]  ok.
Mon Oct  9 11:32:22 2017 - [info] Checking replication health on 172.25.27.3..
Mon Oct  9 11:32:22 2017 - [info]  ok.
Mon Oct  9 11:32:22 2017 - [warning] master_ip_failover_script is not defined.
Mon Oct  9 11:32:22 2017 - [warning] shutdown_script is not defined.
Mon Oct  9 11:32:22 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.


There are no errors, only two warnings, and replication is reported as healthy.
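
Counting the log levels makes this judgment repeatable. The sketch below assumes the "timestamp - [level] message" line format seen in the output above; count_level is a hypothetical helper, and the sample lines are copied from that output.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of a given level (info, warning,
# error) in MHA-style log output read from stdin.
count_level() {
    # grep -c prints the number of matching lines (0 if none); "|| true"
    # hides grep's non-zero exit status when nothing matches.
    grep -c -- "- \[$1\]" || true
}

# Sample lines taken from the masterha_check_repl output above.
log='Mon Oct  9 11:32:22 2017 - [info]  ok.
Mon Oct  9 11:32:22 2017 - [warning] master_ip_failover_script is not defined.
Mon Oct  9 11:32:22 2017 - [warning] shutdown_script is not defined.'

echo "warnings: $(printf '%s\n' "$log" | count_level warning)"
echo "errors:   $(printf '%s\n' "$log" | count_level error)"
```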

11. Checking the status of MHA Manager:

Use the masterha_check_status script to check the state of the Manager:

[root@server4 ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app is stopped(2:NOT_RUNNING).


Note: when monitoring is running the status is "PING_OK"; "NOT_RUNNING" means that MHA monitoring has not been started.
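
For use in scripts, the state name can be pulled out of that status line. The sketch below relies only on the "(code:STATE)" pattern visible in the output shown in this article; mha_state is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: extract the state name (e.g. PING_OK, NOT_RUNNING)
# from a masterha_check_status line read on stdin.
mha_state() {
    # Match "(<digits>:<STATE>)"; "(pid:1272)" is skipped because "pid" is
    # not numeric.
    sed -n 's/.*([0-9][0-9]*:\([A-Z_]*\)).*/\1/p'
}

# Demonstration with the two status lines that appear in this article.
echo 'app is stopped(2:NOT_RUNNING).' | mha_state
echo 'app (pid:1272) is running(0:PING_OK), master:172.25.27.1' | mha_state
```

On the monitor host the input would come from the real command: masterha_check_status --conf=/etc/masterha/app.cnf | mha_state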

12. Starting MHA Manager monitoring

[root@server4 ~]# nohup masterha_manager --conf=/etc/masterha/app.cnf &
[1] 1272
[root@server4 ~]# nohup: ignoring input and appending output to `nohup.out'

[root@server4 ~]#


Check that MHA Manager monitoring is working:

[root@server4 ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app (pid:1272) is running(0:PING_OK), master:172.25.27.1


Monitoring is now active, and the master is 172.25.27.1.

13. Viewing the startup log

[root@server4 ~]# tail -n20 /etc/masterha/mha.log
Mon Oct  9 11:39:19 2017 - [info] Checking slave configurations..
Mon Oct  9 11:39:19 2017 - [info]  read_only=1 is not set on slave 172.25.27.2(172.25.27.2:3306).
Mon Oct  9 11:39:19 2017 - [info]  read_only=1 is not set on slave 172.25.27.3(172.25.27.3:3306).
Mon Oct  9 11:39:19 2017 - [info] Checking replication filtering settings..
Mon Oct  9 11:39:19 2017 - [info]  binlog_do_db= test, binlog_ignore_db= mysql
Mon Oct  9 11:39:19 2017 - [info]  Replication filtering check ok.
Mon Oct  9 11:39:19 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct  9 11:39:19 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct  9 11:39:19 2017 - [info] HealthCheck: SSH to 172.25.27.1 is reachable.
Mon Oct  9 11:39:19 2017 - [info]
172.25.27.1(172.25.27.1:3306) (current master)
+--172.25.27.2(172.25.27.2:3306)
+--172.25.27.3(172.25.27.3:3306)

Mon Oct  9 11:39:19 2017 - [warning] master_ip_failover_script is not defined.
Mon Oct  9 11:39:19 2017 - [warning] shutdown_script is not defined.
Mon Oct  9 11:39:19 2017 - [info] Set master ping interval 1 seconds.
Mon Oct  9 11:39:19 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Oct  9 11:39:19 2017 - [info] Starting ping health check on 172.25.27.1(172.25.27.1:3306)..
Mon Oct  9 11:39:19 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..


The line "Ping(SELECT) succeeded, waiting until MySQL doesn't respond.." shows that the whole system is now being monitored.

14. Master failure test

## On server2 and server3, check the current master
[root@server2 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 6
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60


The master is 172.25.27.1.

Next, simulate a failure of 172.25.27.1:

[root@server4 ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app (pid:1272) is running(0:PING_OK), master:172.25.27.1
[root@server1 ~]# ps -ax | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
922 ? S 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
1236 ? Sl 0:03 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
1706 pts/0 S+ 0:00 grep mysql

[root@server1 ~]# kill -9 922 1236
[root@server1 ~]# ps -ax | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
1713 pts/0 S+ 0:00 grep mysql


Now check the master again on server3:

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000003
Read_Master_Log_Pos: 774
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 695
Relay_Master_Log_File: binlog.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes


The master has switched to server2.

Next, repair server1 and have it rejoin the cluster as a slave:

[root@server1 ~]# /etc/init.d/mysqld start
Starting mysqld:                                           [  OK  ]
[root@server1 ~]# mysql -p
Enter password:
mysql> change master to master_host='172.25.27.2', master_user='repl', master_password='Mypasswd+1', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.05 sec)

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000003
Read_Master_Log_Pos: 774
Relay_Log_File: server1-relay-bin.000003
Relay_Log_Pos: 735
Relay_Master_Log_File: binlog.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes


15. Manual online switch

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000003
Read_Master_Log_Pos: 774
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 695
Relay_Master_Log_File: binlog.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes


The master is server2. Next, manually switch the master back to server1:

[root@server4 ~]# masterha_master_switch --conf=/etc/masterha/app.cnf --master_state=alive --new_master_host=172.25.27.1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000


You are prompted to confirm the switch; answer yes three times.

Meaning of the parameters:

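--orig_master_is_new_slave: with this option the original master becomes a slave of the new master after the switch. Without it, the original master is left out of the new replication topology.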

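--running_updates_limit=10000: the switch is refused if the candidate master is lagging behind. With this option the switch is allowed as long as the delay stays within the given limit (in seconds); how long the switch actually takes depends on the size of the relay logs that have to be applied during recovery.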

[root@server2 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 896
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 405
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes


Checked on server2: the master is now server1 and server2 has become a slave, so the switch succeeded.

Check the manager log (its tail records the automatic failover from step 14):

[root@server4 ~]# tail -n20 /etc/masterha/mha.log
Mon Oct  9 11:55:33 2017 - [info]
Mon Oct  9 11:55:33 2017 - [info] Resetting slave info on the new master..
Mon Oct  9 11:55:33 2017 - [info]  172.25.27.2: Resetting slave info succeeded.
Mon Oct  9 11:55:33 2017 - [info] Master failover to 172.25.27.2(172.25.27.2:3306) completed successfully.
Mon Oct  9 11:55:33 2017 - [info]

----- Failover Report -----

app: MySQL Master failover 172.25.27.1(172.25.27.1:3306) to 172.25.27.2(172.25.27.2:3306) succeeded

Master 172.25.27.1(172.25.27.1:3306) is down!

Check MHA Manager logs at server4:/etc/masterha/mha.log for details.

Started automated(non-interactive) failover.
Selected 172.25.27.2(172.25.27.2:3306) as a new master.
172.25.27.2(172.25.27.2:3306): OK: Applying all logs succeeded.
172.25.27.3(172.25.27.3:3306): OK: Slave started, replicating from 172.25.27.2(172.25.27.2:3306)
172.25.27.2(172.25.27.2:3306): Resetting slave info succeeded.
Master failover to 172.25.27.2(172.25.27.2:3306) completed successfully.


16. Manual failover

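We saw above that, while MHA Manager monitoring is running, a master failure causes one of the slaves to be promoted automatically. After one successful failover, however, the MHA Manager monitoring process exits. If the new master then fails, no automatic failover takes place and the switch has to be done by hand.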

Check the MHA Manager monitoring status:

[root@server4 ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app is stopped(2:NOT_RUNNING).


As expected, it is stopped.

Check which host is the master:

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 5
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.1
Master_User: repl
Master_Port: 3306


server1 is the master. Next, simulate a failure of the master:

[root@server1 ~]# ps -ax | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
1742 pts/0    S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
2060 pts/0    Sl     0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
2113 pts/0    S+     0:00 grep mysql
[root@server1 ~]# kill -9 1742 2060
[root@server1 ~]# ps -ax | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
2115 pts/0    S+     0:00 grep mysql


Check the state of server2 and server3:

[root@server2 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Reconnecting after a failed master event read
Master_Host: 172.25.27.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 896
Relay_Log_File: server2-relay-bin.000002
Relay_Log_Pos: 405
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Reconnecting after a failed master event read
Master_Host: 172.25.27.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 896
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 405
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes


Both slaves still point at server1, but server1 is down and can no longer serve requests, so the candidate master has to be told manually to take over:

[root@server4 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app.cnf --dead_master_host=172.25.27.1 --dead_master_port=3306 --new_master_host=172.25.27.2 --new_master_port=3306 --ignore_last_failover


Check that the takeover happened:

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000003
Read_Master_Log_Pos: 774
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 405
Relay_Master_Log_File: binlog.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes


server2 has taken over as master, so the switch succeeded.

Check the log:

[root@server4 ~]# tail -n20 /etc/masterha/mha.log
Mon Oct  9 11:55:33 2017 - [info]
Mon Oct  9 11:55:33 2017 - [info] Resetting slave info on the new master..
Mon Oct  9 11:55:33 2017 - [info]  172.25.27.2: Resetting slave info succeeded.
Mon Oct  9 11:55:33 2017 - [info] Master failover to 172.25.27.2(172.25.27.2:3306) completed successfully.
Mon Oct  9 11:55:33 2017 - [info]

----- Failover Report -----

app: MySQL Master failover 172.25.27.1(172.25.27.1:3306) to 172.25.27.2(172.25.27.2:3306) succeeded

Master 172.25.27.1(172.25.27.1:3306) is down!

Check MHA Manager logs at server4:/etc/masterha/mha.log for details.

Started automated(non-interactive) failover.
Selected 172.25.27.2(172.25.27.2:3306) as a new master.
172.25.27.2(172.25.27.2:3306): OK: Applying all logs succeeded.
172.25.27.3(172.25.27.3:3306): OK: Slave started, replicating from 172.25.27.2(172.25.27.2:3306)
172.25.27.2(172.25.27.2:3306): Resetting slave info succeeded.
Master failover to 172.25.27.2(172.25.27.2:3306) completed successfully.


Next, repair server1 manually and add it back to the cluster as a slave; this step is not repeated here.

17. Configuring a VIP

A VIP can be managed in two ways: keepalived can float the virtual IP, or a script can bring the virtual IP up (no keepalived, heartbeat, or similar software is needed).

1. Managing the VIP with a script.

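Here we modify /etc/masterha/master_ip_failover. The script could also be written in another language, such as PHP, but a PHP failover script is not covered here. The modified scripts are shown below. Note that when the VIP is managed by a script, the VIP has to be bound manually on the master server first.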

[root@server4 ~]# vim /etc/masterha/app.cnf
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script= /etc/masterha/master_ip_online_change
## Uncomment these two lines

[root@server4 ~]# cd /etc/masterha/
[root@server4 masterha]# vim master_ip_online_change

#!/usr/bin/env perl
use strict;
use warnings FATAL =>'all';

use Getopt::Long;

my $vip = '172.25.27.100/24';  # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $exit_code = 0;

my (
$command,              $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip,       $orig_master_port,         $orig_master_user,
$orig_master_password, $orig_master_ssh_user,     $new_master_host,
$new_master_ip,        $new_master_port,          $new_master_user,
$new_master_password,  $new_master_ssh_user,
);
GetOptions(
'command=s'                => \$command,
'orig_master_is_new_slave' => \$orig_master_is_new_slave,
'orig_master_host=s'       => \$orig_master_host,
'orig_master_ip=s'         => \$orig_master_ip,
'orig_master_port=i'       => \$orig_master_port,
'orig_master_user=s'       => \$orig_master_user,
'orig_master_password=s'   => \$orig_master_password,
'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
'new_master_host=s'        => \$new_master_host,
'new_master_ip=s'          => \$new_master_ip,
'new_master_port=i'        => \$new_master_port,
'new_master_user=s'        => \$new_master_user,
'new_master_password=s'    => \$new_master_password,
'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub main {

#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "\n\n\n***************************************************************\n";
print "Disabling the VIP - $vip on old master: $orig_master_host\n";
print "***************************************************************\n\n\n\n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "\n\n\n***************************************************************\n";
print "Enabling the VIP - $vip on new master: $new_master_host \n";
print "***************************************************************\n\n\n\n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}


[root@server4 masterha]# vim master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
$command,          $ssh_user,        $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '172.25.27.100/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

GetOptions(
'command=s'          => \$command,
'ssh_user=s'         => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s'   => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s'  => \$new_master_host,
'new_master_ip=s'    => \$new_master_ip,
'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}

sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0  unless  ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}


[root@server4 masterha]# chmod +x master_ip_*
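
Both Perl scripts build the same two ifconfig command strings from $vip and $key and run them on the remote host over ssh. The shell sketch below reproduces those strings, which is convenient when the VIP has to be bound by hand on the current master. vip_cmd and the iface variable are hypothetical names; the values are the ones used in the scripts above, and the sketch only prints the commands.

```shell
#!/bin/sh
# Values mirrored from the Perl scripts above.
vip='172.25.27.100/24'   # virtual IP
key=1                    # alias index, giving eth0:1
iface=eth0               # interface name (assumption: same on all hosts)

# Hypothetical helper: print the command that brings the VIP up or down.
vip_cmd() {
    case "$1" in
        up)   echo "/sbin/ifconfig $iface:$key $vip" ;;
        down) echo "/sbin/ifconfig $iface:$key down" ;;
        *)    echo "usage: vip_cmd up|down" >&2; return 1 ;;
    esac
}

# Print the commands only; nothing is changed on this machine.
vip_cmd up
vip_cmd down
```

To bind the VIP on a given master, the printed command could be run there over ssh, for example: ssh root@172.25.27.2 "$(vip_cmd up)"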

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000003
Read_Master_Log_Pos: 774
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 405
Relay_Master_Log_File: binlog.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

[root@server4 masterha]# masterha_master_switch --conf=/etc/masterha/app.cnf --master_state=alive --new_master_host=172.25.27.1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.27.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000005
Read_Master_Log_Pos: 234
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 361
Relay_Master_Log_File: binlog.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

[root@server1 ~]# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:ac:4c:af brd ff:ff:ff:ff:ff:ff
inet 172.25.27.1/24 brd 172.25.27.255 scope global eth0
inet 172.25.27.100/24 brd 172.25.27.255 scope global secondary eth0:1
inet6 fe80::5054:ff:feac:4caf/64 scope link
valid_lft forever preferred_lft forever


The switch succeeded and the virtual IP is up as well; the service can be reached through the virtual IP:

[root@server4 ~]# mysql -h 172.25.27.100 -u root -pMypasswd+2
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+
5 rows in set (0.00 sec)


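The master_ip_failover script written above can also be exercised by simulating a master failure and forcing a failover. That is not covered here; try it yourself if you are interested.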

2. Managing the virtual IP with keepalived

Managing the virtual IP with keepalived will be covered in a later article. Thanks for reading.