Automatic failover of slave nodes in replication architecture in MySQL 8.0.23

Automatic failover of slave nodes in replication architecture in MySQL 8.0.23

I have been in contact with MGR for some time. With the arrival of MySQL 8.0.23, the high-availability architecture based on MySQL Group Replicaion (MGR) provides new architectural ideas.

How can the slave in the disaster recovery room better support the MGR in the main computer room?

How many nodes can MGR fail?

This time I will briefly talk to you about some of the ideas and functions of MGR based on the above two questions.

1. Fault tolerance of the number of MySQL Group Relication members

I believe that everyone is familiar with the table above. I often ask in interviews: "How many MGR nodes of 4 nodes can fail at most?" Most people answer: "At most one node can fail. If two nodes fail, the brain will split and the system will not work."

Then let’s take a look at how MGR handles it. Is this the answer?

1) We have a 4-node MGR

Here is a question: This diagram is obviously in Single mode, but the arrow is not unidirectional. Is it drawn wrong?

2) At this time, Second-04 suddenly crashes, so what will happen to the MGR cluster?

The cluster status will become:

  • Each node will exchange its own information at a fixed time.
  • When no information from the Second-04 node is received, other members will wait for 5 seconds.
  • During this period, Second-04 definitely did not send any messages, so the healthy members considered Second-04 to be in a suspicious state and marked it as UNREACHABLE.
  • Then the healthy member continues to wait according to the parameter: group_replication_member_expel_timeout (at this time Second-04 is still in UNREACHABLE status).
  • When the group_replication_member_expel_timeout time is exceeded, the healthy member expels the Second-04 node from the cluster.

Now here comes the point, pay attention to the blackboard

In Second-04, when not expelled:

At this time, the cluster is (4 nodes - 3 healthy - 1 bad). If one node fails during this period, the cluster becomes (4 nodes - 2 healthy - 2 bad). The cluster does not meet the majority principle, and each node cannot be written (unless manual intervention is performed to forcibly specify the cluster member list).

In Second-04, after being ejected:

At this time, the cluster is (3 nodes - 3 healthy - 0 bad). The 4-node cluster degenerates into a 3-node healthy cluster. At this time, the cluster can still continue to fail a node and become (3 nodes - 2 healthy - 1 bad).

Therefore, whether one or two nodes in a 4-node cluster can fail depends on the stage of the cluster processing.

PS:

Let’s talk about the problem we just mentioned: This picture is obviously Single mode, but the arrow is not unidirectional. Is it drawn wrong?

First, in Single mode, the Second node cannot be written by default, but this is only because the super-read-only setting of the Second node is enabled.

Set the Second node super-read-only = 0. The Second node can be written normally and can synchronize with other nodes (Primary and other Second nodes). The transmission is still based on the Paxos protocol.

Run a train: The Second node reversely synchronizes other nodes without going through the conflict detection phase (the theoretical efficiency is higher than the multi-write mode). It has not been verified, so you can study it if you are interested.

2. Asynchronous Connection Failover

MySQL 8.0.22 introduced asynchronous replication connection failover. Many friends have published articles to introduce it. Here I will just briefly describe it:

1) One master and one slave in the same computer room, and a slave node in a remote computer room

2) Master fails, Slave-01 becomes the Master, and Slave-02 cannot connect to the original Master

3) If "asynchronous connection failover configuration" is configured for Slave-02, then after Slave-02 recognizes the failure of the original Master, it will automatically try to establish a replication relationship with the original Slave-01 (new Master) according to the pre-defined configuration:

This feature is very good. The referenced third-party tools (such as MHA's repair of master-slave relationships) can be replaced by MySQL native functions.

But after I finished testing, I had some doubts:

1. Doesn’t “asynchronous” replication failover support semi-synchronous architecture? It cannot ensure that data will not be lost, and it cannot completely replace MHA?
Answer: Actually, it supports enhanced semi-synchronization.

2. If the Master List for failover needs to be pre-configured, then if the architecture of computer room A changes, do we still need to maintain the nodes in computer room B?
A: Yes.

3. If computer room A is MGR, then the node (master) of MGR is abnormal, but the service is not shut down and can be accessed, then shouldn’t the node in computer room B be always connected?
A: Yes

Then, MySQL 8.0.23 was released, which brought this feature enhancement:

Slave can support MGR clusters and can dynamically identify MGR members to establish a Master-Slave relationship.

Finally, let's run a lap:

1) First we have a 3-node MGR cluster, version 8.0.22 (asynchronous connection failover works on the Slave's IO Thread, so the Slave version is 8.0.23)

+----------------------------+-------------+--------------+-------------+---------------------+
| now(6) | member_host | member_state | member_role | VIEW_ID |
+----------------------------+-------------+--------------+-------------+---------------------+
| 2021-01-22 13:41:27.902251 | mysql-01 | ONLINE | SECONDARY | 16112906030396799:9 |
| 2021-01-22 13:41:27.902251 | mysql-02 | ONLINE | PRIMARY | 16112906030396799:9 |
| 2021-01-22 13:41:27.902251 | mysql-03 | ONLINE | SECONDARY | 16112906030396799:9 |
+----------------------------+-------------+--------------+-------------+---------------------+

2) Then we specify the "failover list for Master connection" on the independent Slave node

SELECT asynchronous_connection_failover_add_managed('ch1', 'GroupReplication', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1', 'mysql-02', 3306, '', 80, 60);

Briefly explain the parameters:
ch1: chanel name GroupReplication: a hard-coded parameter. Currently supports MGR clusters aaaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1: MGR group name (parameter group_replication_group_name)
mysql-02: One of the MGR members 80: The priority of the Primary node (0-100). If there are multiple masters with the same priority, a node is randomly selected to serve as the master.
60: The priority of the Second node (0-100), which is basically prepared for Single mode

3) Specify replication channel information for Slave

CHANGE REPLICATION SOURCE TO SOURCE_USER='rpl_user', SOURCE_PASSWORD='123456', SOURCE_HOST='mysql-02',SOURCE_PORT=3306,SOURCE_RETRY_COUNT=2,SOURCE_CONNECTION_AUTO_FAILOVER=1,SOURCE_AUTO_POSITION=1 For CHANNEL 'ch1';

4) Start the slave and check the "Connected Transferable List"

If the io thread is not enabled, the MGR members will not be automatically recognized. And copy the user

rpl_user needs to have select permissions on performance_schema on the MGR node

start slave;
SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
+--------------+----------+------+-------------------+--------+--------------------------------------+
| CHANNEL_NAME | HOST | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME |
+--------------+----------+------+-------------------+--------+--------------------------------------+
| ch1 | mysql-01 | 3306 | | 60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
| ch1 | mysql-02 | 3306 | | 80 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
| ch1 | mysql-03 | 3306 | | 60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
+--------------+----------+------+-------------------+--------+--------------------------------------+

5) Then we stop group_replication in mysql-02 (not shutting down the service),

The slave list automatically eliminates mysql-02 and reconnects with other nodes -- mysql-02 (Primary):

stop group_replication;

-- Slave:
SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
+--------------+----------+------+-------------------+--------+--------------------------------------+
| CHANNEL_NAME | HOST | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME |
+--------------+----------+------+-------------------+--------+--------------------------------------+
| ch1 | mysql-01 | 3306 | | 80 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
| ch1 | mysql-03 | 3306 | | 60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
+--------------+----------+------+-------------------+--------+--------------------------------------+

show slave status\G
*************************** 1. row ***************************
        Slave_IO_State: Waiting for master to send event
         Master_Host: mysql-01
         Master_User: rpl_user
         Master_Port: 3306
        Connect_Retry: 60
       Master_Log_File: mybinlog.000003
     Read_Master_Log_Pos: 4904
        Relay_Log_File:mysql-01-relay-bin-ch1.000065
        Relay_Log_Pos: 439
    Relay_Master_Log_File: mybinlog.000003
       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      ...

Now the configuration is complete. Later, when MGR nodes are added or removed, Slave can automatically maintain this list. No other use cases will be posted.

PS:

If you want to manually switch the connection of the Master node (Primary) established by the Slave to another node (Second), you only need to delete the "Transferable List of Replication Connections" and readjust the Second priority and add it back.

-- Delete configuration SELECT asynchronous_connection_failover_delete_managed('ch1', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1');


-- Re-add and adjust the priority of Second to be higher than Primary
SELECT asynchronous_connection_failover_add_managed('ch1', 'GroupReplication', 'aaaaaaaaaaaaa-aaaa-aaaa-aaaaaaaaaa1', 'mysql-03', 3306, '', 60, 80);

Reference Links:

https://mysqlhighavailability.com/automatic-asynchronous-replication-connection-failover/

https://my.oschina.net/u/4591256/blog/4813037

https://dev.mysql.com/doc/refman/8.0/en/replication-functions-source-list.html

This is the end of this article about automatic failover of replication architecture slave nodes in MySQL 8.0.23. For more information about MySQL automatic failover, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Comprehensive analysis of MySql master-slave replication mechanism
  • Solution to MySQL replication failure caused by disk fullness
  • Detailed explanation of MySQL master-slave replication and read-write separation
  • How to copy MySQL table
  • MYSQL database GTID realizes master-slave replication (super convenient)
  • Implementation principle and configuration of MySql master-slave replication
  • A brief analysis of MySQL's WriteSet parallel replication
  • MySQL master-slave replication principle and points to note
  • How to dynamically modify the replication filter in mysql
  • A brief analysis of MySQL parallel replication
  • Analysis of three parameters of MySQL replication problem

<<:  A thorough analysis of HTML special characters

>>:  CSS example code for setting scroll bar style

Recommend

Detailed explanation of MySql slow query analysis and opening slow query log

I have also been researching MySQL performance op...

Talking about ContentType(s) from image/x-png

This also caused the inability to upload png files...

A brief discussion on the design and optimization of MySQL tree structure tables

Preface In many management and office systems, tr...

Pure CSS to implement iOS style open and close selection box function

1 Effect Demo address: https://www.albertyy.com/2...

js canvas implements verification code and obtains verification code function

This article example shares the specific code of ...

How to view and execute historical commands in Linux

View historical commands and execute specified co...

MySQL integrity constraints definition and example tutorial

Table of contents Integrity constraints Definitio...

How to replace all tags in html text

(?i) means do not match case. Replace all uppercas...

How to query a record in Mysql in which page of paging

Preface In practice, we may encounter such a prob...

Ant Design Blazor component library's routing reuse multi-tab function

Recently, there has been a growing demand for imp...

Pure CSS code to achieve drag effect

Table of contents 1. Drag effect example 2. CSS I...