MHA Configuration and Switchover Methods in MySQL

This article walks through configuring MHA for MySQL and the switchover methods it offers: automatic failover, manual failover, and online switchover.

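Master node / MHA manager node: 172.31.217.183
Slave node / MHA member node: 172.31.217.182
Semi-synchronous replication is already enabled.

Database version: 5.7

Configure passwordless SSH login
On the master node:
root@bd-dev-mingshuo-183:/opt/soft# ssh-keygen -t rsa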

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
36:39:6b:1e:40:f2:85:31:db:d0:3e:ab:05:0e:fd:37 root@bd-dev-mingshuo-183
The key's randomart image is:
+--[ RSA 2048]----+
| +. |
| B. |
| ..+.o |
| .+o.o. |
| oooSo |
| .o++E |
| o+. . |
| .o . |
| . |
+-----------------+
root@bd-dev-mingshuo-183:/opt/soft# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.182
root@172.31.217.182's password:

Now try logging into the machine, with "ssh 'root@172.31.217.182'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

root@bd-dev-mingshuo-183:/u01# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.183
root@172.31.217.183's password:

Now try logging into the machine, with "ssh 'root@172.31.217.183'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

On the slave node:
ssh-keygen -t rsa

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.183
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.182

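Set the slave to read-only
On the slave node:
mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'read_only'\G
*************************** 1. row ***************************
Variable_name: read_only
Value: ON
1 row in set (0.00 sec)

read_only=1 means read-only and 0 means read-write. Making the slave read-only does not affect how it applies the replicated logs. Do not write this parameter into the option file, though: if this slave is later promoted to master, ordinary users would be unable to write to it. This setting is optional when configuring MHA.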

Deploy the packages
Install the manager package on the manager node.
Install the node package on all nodes.
Install the node package first:
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

yum install mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

Create the MHA management account on the master
grant all privileges on *.* to 'mha'@'172.31.217.%' identified by 'oracle';
flush privileges;

Create a directory for the MHA configuration file and the MHA logs
mkdir -p /u01/mha/log
chown -R mysql:mysql /u01/mha

Edit the configuration file
vi /u01/mha/mha.cnf

[server default]
manager_log=/u01/mha/log/manager.log
manager_workdir=/u01/mha/log

master_binlog_dir=/u01/mysql/3306/data
user=mha
password=oracle
ping_interval=2
repl_user=repl_user
repl_password=oracle
ssh_user=root

[server1]
hostname=172.31.217.183
port=3306

[server2]
hostname=172.31.217.182
port=3306

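Optional configuration parameters:
[server default] block:
ping_interval=1 // Interval between the ping packets sent to monitor the master. The default is 3 seconds; after three attempts with no response, failover starts automatically.
remote_workdir=/tmp // Where the remote MySQL host saves binlogs when a switchover occurs
report_script=/usr/local/send_report // Script that sends an alert after a switchover
shutdown_script= // Script that shuts down the failed host after a failure (its main purpose is to power the host off to prevent split brain; not used here)
Slave blocks:
candidate_master=1 // Marks the host as a candidate master. With this set, the slave is promoted to master on switchover even if it is not the slave with the most recent events in the cluster.
check_repl_delay=0 // By default, MHA does not choose a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering that slave would take a long time. With check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful on a host that has candidate_master=1, because it guarantees that the candidate becomes the new master during the switchover.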
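
Because shared settings live in [server default] and per-host settings such as candidate_master live in the [serverN] blocks, it can be handy to check which block a value actually sits in before starting the manager. The snippet below is a minimal shell sketch and not part of MHA: cnf_get is a hypothetical helper name, and it assumes plain key=value lines with no spaces around the equals sign, as in the mha.cnf shown earlier.

```shell
# Hypothetical helper (not an MHA tool): print the value of one key
# from one block of an MHA-style config file.
#   $1 = config file, $2 = block name without brackets, $3 = key
cnf_get() {
  awk -F= -v section="[$2]" -v key="$3" '
    # A line starting with "[" opens a new block; remember whether it is ours.
    /^\[/ { in_section = ($0 == section); next }
    # Inside our block, print everything after the first "=" for the key.
    in_section && $1 == key { print substr($0, index($0, "=") + 1); exit }
  ' "$1"
}

# Example call, assuming the file from this article exists:
#   cnf_get /u01/mha/mha.cnf "server default" ping_interval
```

If the key is missing from the block, the function prints nothing, which makes it easy to spot a setting that was put under the wrong block.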

Check replication and SSH login
masterha_check_ssh --conf=/u01/mha/mha.cnf
masterha_check_repl --conf=/u01/mha/mha.cnf

These checks failed many times along the way. Some of the fixes:
ln -s /opt/mysql-5.7.23/bin/mysql /usr/bin/mysql
ln -s /opt/mysql-5.7.23/bin/mysqlbinlog /usr/bin/mysqlbinlog
Uninstall mha4mysql-manager-0.58-0.el7.centos.noarch.rpm and install mha4mysql-manager-0.56-0.el6.noarch.rpm instead.

Start MHA
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &

Check the MHA status
root@bd-dev-mingshuo-183:/opt/soft# masterha_check_status --conf=/u01/mha/mha.cnf

mha (pid:24910) is running(0:PING_OK), master:172.31.217.183
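
For scripted monitoring, the state token and the master address can be pulled out of that status line. This is a rough sketch under assumptions: mha_state and mha_master are hypothetical helper names, and the patterns are written only against the single output line shown above, so other output shapes (for example a master reported by hostname) may not match.

```shell
# Print the state token (for example PING_OK) from one line of
# masterha_check_status output read on stdin.
mha_state() {
  sed -n 's/.*(\([0-9][0-9]*\):\([A-Z_][A-Z_]*\)).*/\2/p'
}

# Print the master IP address reported on the same line, if any.
mha_master() {
  sed -n 's/.*master:\([0-9.][0-9.]*\).*/\1/p'
}

# Example call, assuming the manager from this article is configured:
#   masterha_check_status --conf=/u01/mha/mha.cnf | mha_state
```

Both functions print nothing when the line does not match, so an empty result can be treated as "status unknown" by the caller.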

Configure the VIP
Add the following under the [server default] block:
master_ip_failover_script=/usr/local/bin/master_ip_failover

Copy master_ip_failover from the source package to /usr/local/bin/:
cd /opt/soft/MHAsoft/mha4mysql-manager-0.56/samples/scripts
cp -ra master_ip_failover /usr/local/bin/master_ip_failover

Edit /usr/local/bin/master_ip_failover
my $vip = '172.31.217.203/24';  # set this to your virtual IP
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth3:$key $vip";  # change eth3 to your interface name
my $ssh_stop_vip = "/sbin/ifconfig eth3:$key down";
Note: add the four lines above between the variable declaration and the GetOptions call, at the position marked here:
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

(add the lines above here)

GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);

Configure the VIP on the network interface
ifconfig eth3:1 172.31.217.203/24

ifconfig

eth3 Link encap:Ethernet HWaddr 54:0F:5D:2C:4D:77
inet addr:172.31.217.202 Bcast:172.31.217.255 Mask:255.255.255.0
inet6 addr: fe80::560f:5dff:fe2c:4d77/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:74742667 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000

RX bytes:52680755472 (49.0 GiB) TX bytes:740 (740.0 b)

eth3:1 Link encap:Ethernet HWaddr 54:0F:5D:2C:4D:77
inet addr:172.31.217.203 Bcast:172.31.217.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Stop MHA
masterha_stop --conf=/u01/mha/mha.cnf

Start MHA again
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &

This fails with:
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 98.
Execution of /usr/local/bin/master_ip_failover aborted due to compilation errors.
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln226] Failed to get master_ip_failover_script status with return code 255:0.
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_manager line 50
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Mon Sep 17 10:56:04 2018 - [info] Got exit code 1 (Not master dead).

The simplest fix is to comment out the lines that contain FIXME_xxx.
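
Commenting those lines out can also be done non-interactively. The function below is only a sketch: comment_out_fixme is a hypothetical name, it assumes the offending lines begin with FIXME_xxx (after optional indentation), and it keeps a .bak copy of the original script next to the edited one.

```shell
# Prefix every line that starts with FIXME_xxx (after optional
# indentation) with a Perl comment marker.
#   $1 = path to the failover script; the original is saved as "$1.bak"
comment_out_fixme() {
  sed -i.bak 's/^\([[:space:]]*\)\(FIXME_xxx.*\)$/\1# \2/' "$1"
}

# Example call, assuming the script location used in this article:
#   comment_out_fixme /usr/local/bin/master_ip_failover
```

Lines that are already commented do not match the pattern, so running the function a second time leaves the file unchanged.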

Start MHA again
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &
OK!

Shut down the master
mysqladmin -uroot -poracle shutdown

Check the standby
mysql> show slave status;
Empty set (0.00 sec)

mysql> show master status\G
*************************** 1. row ***************************
File: slave-relay-bin.000002
Position: 154
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)

The standby has been promoted to master automatically. The MHA manager that was running on the stopped master's node also stopped on its own.

Restore the previous master/slave relationship:
Bring the stopped master back up. It does not rejoin the cluster on its own.
Query the log position on the master:
mysql> show master status\G
*************************** 1. row ***************************
File: master-bin.000005
Position: 154
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
On the standby:
change master to
master_host='bd-dev-mingshuo-183',
master_port=3306,
master_user='repl_user',
master_password='oracle',
master_log_file='master-bin.000005',
master_log_pos=154;

start slave;
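
The File and Position values have to be copied by hand from the master's status output into the CHANGE MASTER statement, which is easy to get wrong. As an illustration only, the sketch below builds the statement from that output; change_master_from_status is a hypothetical helper name, and passing the replication password as an argument is a simplification for this test setup.

```shell
# Build a CHANGE MASTER TO statement from "show master status\G" output.
#   $1 = master host, $2 = master port, $3 = replication user,
#   $4 = replication password; the status output is read on stdin.
change_master_from_status() {
  awk -v host="$1" -v port="$2" -v user="$3" -v pass="$4" '
    $1 == "File:"     { file = $2 }
    $1 == "Position:" { pos = $2 }
    END {
      # Refuse to print a statement if either value was not found.
      if (file == "" || pos == "") exit 1
      # \047 is the octal escape for a single quote.
      printf "CHANGE MASTER TO MASTER_HOST=\047%s\047, MASTER_PORT=%s, MASTER_USER=\047%s\047, MASTER_PASSWORD=\047%s\047, MASTER_LOG_FILE=\047%s\047, MASTER_LOG_POS=%s;\n", host, port, user, pass, file, pos
    }
  '
}

# Example call, assuming the accounts used in this article:
#   mysql -uroot -p -e 'show master status\G' |
#     change_master_from_status bd-dev-mingshuo-183 3306 repl_user oracle
```

The function only prints the statement; reviewing it before running it on the standby keeps the manual check that the original procedure relies on.
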
Start the MHA manager on the master node. Note that the --ignore_last_failover option must be added here, otherwise it fails with:
Mon Sep 17 14:45:56 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 17 14:45:56 2018 - [info] Reading application default configuration from /u01/mha/mha.cnf..
Mon Sep 17 14:45:56 2018 - [info] Reading server configuration from /u01/mha/mha.cnf..
Mon Sep 17 14:45:56 2018 - [info] MHA::MasterMonitor version 0.56.
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln193] There is no alive slave. We can't do failover
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Mon Sep 17 14:45:56 2018 - [info] Got exit code 1 (Not master dead).

Start the MHA manager:
nohup masterha_manager --ignore_last_failover --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &

That covers automatic failover. Next, test a manual failover.
Stop the MHA manager:
masterha_stop --conf=/u01/mha/mha.cnf

Stop the master database:
mysqladmin -uroot -poracle shutdown

Switch manually:
masterha_master_switch --master_state=dead --conf=/u01/mha/mha.cnf --dead_master_host=172.31.217.183 --dead_master_port=3306 --new_master_host=172.31.217.182 --new_master_port=3306 --ignore_last_failover

That covers manual failover. Next, test an online switchover.
On the manager node:
Stop the MHA manager:
masterha_stop --conf=/u01/mha/mha.cnf

masterha_master_switch --conf=/u01/mha/mha.cnf --master_state=alive --new_master_host=172.31.217.182 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=100
Mon Sep 17 15:47:29 2018 - [info] MHA::MasterRotate version 0.56.
Mon Sep 17 15:47:29 2018 - [info] Starting online master switch..
Mon Sep 17 15:47:29 2018 - [info]

Mon Sep 17 15:47:29 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Sep 17 15:47:29 2018 - [info]

Mon Sep 17 15:47:29 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 17 15:47:29 2018 - [info] Reading application default configuration from /u01/mha/mha.cnf..
Mon Sep 17 15:47:29 2018 - [info] Reading server configuration from /u01/mha/mha.cnf..
Mon Sep 17 15:47:29 2018 - [info] GTID failover mode = 0
Mon Sep 17 15:47:29 2018 - [info] Current Alive Master: 172.31.217.183(172.31.217.183:3306)
Mon Sep 17 15:47:29 2018 - [info] Alive Slaves:
Mon Sep 17 15:47:29 2018 - [info] 172.31.217.182(172.31.217.182:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Mon Sep 17 15:47:29 2018 - [info] Replicating from bd-dev-mingshuo-183(172.31.217.183:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 172.31.217.183(172.31.217.183:3306)? (YES/no): YES
Mon Sep 17 15:47:33 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Sep 17 15:47:33 2018 - [info] ok.
Mon Sep 17 15:47:33 2018 - [info] Checking MHA is not monitoring or doing failover..
Mon Sep 17 15:47:33 2018 - [info] Checking replication health on 172.31.217.182..
Mon Sep 17 15:47:33 2018 - [info] ok.
Mon Sep 17 15:47:33 2018 - [info] 172.31.217.182 can be new master.
Mon Sep 17 15:47:33 2018 - [info]

From:
172.31.217.183(172.31.217.183:3306) (current master)
+--172.31.217.182(172.31.217.182:3306)

To:
172.31.217.182(172.31.217.182:3306) (new master)
+--172.31.217.183(172.31.217.183:3306)

Starting master switch from 172.31.217.183(172.31.217.183:3306) to 172.31.217.182(172.31.217.182:3306)? (yes/NO): yes
Mon Sep 17 15:47:55 2018 - [info] Checking whether 172.31.217.182(172.31.217.182:3306) is ok for the new master..
Mon Sep 17 15:47:55 2018 - [info] ok.
Mon Sep 17 15:47:55 2018 - [info] 172.31.217.183(172.31.217.183:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Sep 17 15:47:55 2018 - [info] 172.31.217.183(172.31.217.183:3306): Resetting slave pointing to the dummy host.
Mon Sep 17 15:47:55 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Sep 17 15:47:55 2018 - [info]

Mon Sep 17 15:47:55 2018 - [info] * Phase 2: Rejecting updates Phase..
Mon Sep 17 15:47:55 2018 - [info]

master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Mon Sep 17 15:48:32 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Sep 17 15:48:32 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Sep 17 15:48:32 2018 - [info] ok.
Mon Sep 17 15:48:32 2018 - [info] Orig master binlog:pos is master-bin.000007:154.
Mon Sep 17 15:48:32 2018 - [info] Waiting to execute all relay logs on 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info] master_pos_wait(master-bin.000007:154) completed on 172.31.217.182(172.31.217.182:3306). Executed 0 events.
Mon Sep 17 15:48:32 2018 - [info] done.
Mon Sep 17 15:48:32 2018 - [info] Getting new master's binlog name and position..
Mon Sep 17 15:48:32 2018 - [info] slave-relay-bin.000002:154
Mon Sep 17 15:48:32 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.31.217.182', MASTER_PORT=3306, MASTER_LOG_FILE='slave-relay-bin.000002', MASTER_LOG_POS=154, MASTER_USER='repl_user', MASTER_PASSWORD='xxx'
Mon Sep 17 15:48:32 2018 - [info] Setting read_only=0 on 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info] ok.
Mon Sep 17 15:48:32 2018 - [info]

Mon Sep 17 15:48:32 2018 - [info] * Switching slaves in parallel..
Mon Sep 17 15:48:32 2018 - [info]

Mon Sep 17 15:48:32 2018 - [info] Unlocking all tables on the orig master:
Mon Sep 17 15:48:32 2018 - [info] Executing UNLOCK TABLES..
Mon Sep 17 15:48:32 2018 - [info] ok.
Mon Sep 17 15:48:32 2018 - [info] Starting orig master as a new slave..
Mon Sep 17 15:48:32 2018 - [info] Resetting slave 172.31.217.183(172.31.217.183:3306) and starting replication from the new master 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info] Executed CHANGE MASTER.
Mon Sep 17 15:48:32 2018 - [info] Slave started.
Mon Sep 17 15:48:32 2018 - [info] All new slave servers switched successfully.
Mon Sep 17 15:48:32 2018 - [info]

Mon Sep 17 15:48:32 2018 - [info] * Phase 5: New master cleanup phase..
Mon Sep 17 15:48:32 2018 - [info]

Mon Sep 17 15:48:32 2018 - [info] 172.31.217.182: Resetting slave info succeeded.
Mon Sep 17 15:48:32 2018 - [info] Switching master to 172.31.217.182(172.31.217.182:3306) completed successfully.

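Note that at one point during the switchover you are asked:
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
In other words: writes on the current master have not been disabled, so applications connected to it will keep writing to it after the switchover. Is that acceptable?
I was only testing whether the online switchover works, so I answered yes.
After the switchover completed, the MHA manager was left stopped.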
