mgr(mysql group replication)配置
mysql group replication 组复制所需条件
1.只能为innodb引擎
2.所有表必须有主键
查看哪些表不是innodb引擎
select table_schema,table_name from information_schema.tables where (table_schema,table_name) not in( select distinct table_schema,table_name from information_schema.columns where COLUMN_KEY='PRI' ) and table_schema not in ('sys','mysql','information_schema','performance_schema');
没有主键的添加主键
ALTER TABLE db1.table1 ADD COLUMN `id` varchar(240) NOT NULL FIRST , ADD PRIMARY KEY (`id`);
其他限制请看官方文档
以下docker的mysql镜像源于 https://github.com/bbotte/bbotte.com/tree/master/Commonly-Dockerfile/mysql
下面为mgr单主模式
my.cnf配置添加如下
[mysqld]
read_only=1
disabled_storage_engines="BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
plugin_load_add='group_replication.so'
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
loose-group_replication_start_on_boot=off
loose-group_replication_bootstrap_group=off
loose-group_replication_local_address="172.17.1.10:3317"
loose-group_replication_group_seeds="172.17.1.10:3317,172.17.1.20:3327,172.17.1.30:3337"
#loose-group_replication_allow_local_disjoint_gtids_join=on
#loose-group_replication_start_on_boot=on
server_id=1
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
不同节点,group_replication_local_address、server_id 不同,需要更改
配置完成后,启动3个mysql实例,并在所有库执行
SET SQL_LOG_BIN=0;
CREATE USER irisrepluser@'%' IDENTIFIED BY '123456';
GRANT REPLICATION SLAVE ON *.* TO irisrepluser@'%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
CHANGE MASTER TO MASTER_USER='irisrepluser', MASTER_PASSWORD='123456' FOR CHANNEL 'group_replication_recovery';
主库执行
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
SELECT * FROM performance_schema.replication_group_members;
2个备库执行
set global group_replication_allow_local_disjoint_gtids_join=ON;
START GROUP_REPLICATION;
一般来说,现在3个mysql节点状态都为ONLINE
常用命令
SELECT * FROM performance_schema.replication_group_members;
show global variables like '%read_only%';
SELECT * FROM performance_schema.replication_group_member_stats\G
SELECT * FROM performance_schema.replication_applier_status\G
SELECT * FROM performance_schema.replication_connection_status\G
SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'group_replication_primary_member';
SELECT MEMBER_ID, MEMBER_HOST,MEMBER_PORT,MEMBER_STATE,IF(global_status.VARIABLE_NAME IS NOT NULL,'PRIMARY','SECONDARY') AS MEMBER_ROLE FROM performance_schema.replication_group_members LEFT JOIN performance_schema.global_status ON global_status.VARIABLE_NAME = 'group_replication_primary_member' AND global_status.VARIABLE_VALUE = replication_group_members.MEMBER_ID;
show global variables like '%group_replication%';
mgr状态错误
1.加载group_replication.so出错
my.cnf配置中添加 plugin_load_add=’group_replication.so’
show plugins;
2.mysql在docker中启动组复制不能启动
START GROUP_REPLICATION;
ERROR 3096 (HY000): The START GROUP_REPLICATION command failed as there was an error when initializing the group communication layer.
[Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
https://bugs.mysql.com/bug.php?id=86772 用新版本mysql解决问题
3.由于allow_local_disjoint_gtids_join关闭状态MEMBER_STATE为RECOVERING
mysql> START GROUP_REPLICATION;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.
mysql> set global group_replication_allow_local_disjoint_gtids_join=ON;
mysql> START GROUP_REPLICATION;
4.由于从库同步点问题MEMBER_STATE为RECOVERING
SELECT * FROM performance_schema.replication_group_members;
状态是RECOVERING 现在2个从库状态是RECOVERING,状态有问题,查日志
[ERROR] Slave SQL for channel 'group_replication_recovery': Worker 1 failed executing transaction 'b3b803ed-b358-11e9-85c2-0242ac11010a:1' at master log mysql-bin.000002, end_log_pos 369; Could not execute Write_rows event on table mysql.time_zone; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 369, Error_code: 1062
数据库初始化完毕后,做mgr集群,遇到上面错误或者是其他错误,那么就跳过,查看主库已经执行的gtid,因为数据库是刚初始化的,所以gtid_purged为空,如果gtid_purged有值的话,就跳过gtid_purged的值。即确保数据一致的情况下开始主从同步 主库查看现在执行过的gtid值:
show global variables like 'gtid_executed';
从库执行:
STOP GROUP_REPLICATION;
reset master;
set global gtid_purged = '主库的gtid_executed';
START GROUP_REPLICATION;
SELECT * FROM performance_schema.replication_group_members;
现在3个数据库状态都为ONLINE
5.由于hosts问题MEMBER_STATE为RECOVERING
[ERROR] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
[ERROR] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
如果是上面的报错,那么绑定hosts,把ip和主机名绑定,MEMBER_STATE的RECOVERING状态自然就变为ONLINE
6.由于主库没有创建同步账号irisrepluser导致不能同步数据
[ERROR] Plugin group_replication reported: 'There was an error when connecting to the donor server
. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_
group_members table are correct and DNS resolvable.'
[ERROR] Plugin group_replication reported: 'For details please check performance_schema.replicatio
n_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
7.数据库没有reset master导致日志有错误提示,可忽略
[ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: Group transactions:
8.如果有节点故障,需要恢复的话,先在主写入节点备份,故障节点导入
主写节点备份数据库
mysql -p123456 -e "show databases;"|grep -Evw "information_schema|mysql|Database|sys|performance_schema" |xargs mysqldump -p123456 -uroot --single-transaction --default-character-set=utf8 --master-data=1 --databases --triggers --routines --events > /root/mgr.sql
故障节点启动数据库后,再导入备份的数据库
reset master;
stop group_replication;
set global read_only=0;
source /root/mgr.sql;
set global read_only=1;
start group_replication;
线上数据库在mgr的操作
数据导入主节点,其他节点数据自动同步
线上数据库备份脚本如下:
#!/bin/bash
mysql -p123456 -e "show databases;"|grep -wEv "information_schema|mysql|Database|sys|performance_schema" |xargs mysqldump -p123456 -uroot --single-transaction --default-character-set=utf8 --master-data=1 --databases --triggers --routines --events > all.sql
sed -i '1,20s#SET @@SESSION.SQL_LOG_BIN= 0;#SET @@SESSION.SQL_LOG_BIN= 1;#' all.sql
默认全库备份导入时不生成二进制日志,要修改导入数据生成二进制日志,否则其他2个节点数据不会同步
在写节点导入线上数据库备份
mysql -uroot -p123456 -e "show databases\G"|grep Database|awk '{print $2}' |egrep -w -v "information_schema|mysql|percona|performance_schema|sys" |while read i ;do mysql -uroot -p123456 -e "drop database $i;";done
source /root/all.sql;
现在3个mysql节点MEMBER_STATE均为ONLINE 为什么不做全库–all-databases备份呢,因为会包含mysql库 MyIsam引擎的表,导入的操作是先drop表(比如 mysql.userl),再create。这样一来,数据库的授权和主从信息被删除,mgr集群出错
更改3台mysql配置文件
vim /etc/my.cnf
loose-group_replication_allow_local_disjoint_gtids_join=on
loose-group_replication_start_on_boot=on
这样3个mysql节点其中一台重启,集群不受影响
最后确认mgr状态
SELECT MEMBER_ID, MEMBER_HOST,MEMBER_PORT,MEMBER_STATE,IF(global_status.VARIABLE_NAME IS NOT NULL,'PRIMARY','SECONDARY') AS MEMBER_ROLE FROM performance_schema.replication_group_members LEFT JOIN performance_schema.global_status ON global_status.VARIABLE_NAME = 'group_replication_primary_member' AND global_status.VARIABLE_VALUE = replication_group_members.MEMBER_ID;
如果是断电导致数据库都停止,那么把数据库启动后,设置一下就可以
3台数据库my.conf配置
[mysqld]
read_only=1
disabled_storage_engines="BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
plugin_load_add='group_replication.so'
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
#loose-group_replication_start_on_boot=off
loose-group_replication_bootstrap_group=off
loose-group_replication_local_address="192.168.0.188:3316"
loose-group_replication_group_seeds="192.168.0.187:3316,192.168.0.188:3316,192.168.0.189:3316"
loose-group_replication_allow_local_disjoint_gtids_join=on
loose-group_replication_start_on_boot=on
主库执行
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
SELECT * FROM performance_schema.replication_group_members;
另外2个库执行
set global group_replication_allow_local_disjoint_gtids_join=ON;
START GROUP_REPLICATION;
参考 https://dev.mysql.com/doc/refman/5.7/en/group-replication.html