In this blog post, we will explore how network partitions impact group replication and the way it detects and responds to failures. In case you haven’t checked out my previous blog post about group replication recovery strategies, please have a look at it for some insight. Topology: Scenario 1: One of the GR nodes […]
Effective Strategies for Recovering MySQL Group Replication From Failures
Group replication is a fault-tolerant, highly available replication topology that ensures that if the primary node goes down, one of the other candidate or secondary members takes over, so write and read operations can continue without any interruption. However, there are some scenarios where, due to outages, network partitions, or database crashes, the group membership could be broken, or we end […]
Understanding Basic Flow Control Activity in MySQL Group Replication: Part One
Flow control is not a new term, and we have already heard about it many times in Percona XtraDB Cluster/Galera-based environments. In very simple terms, it means a cluster node can’t keep up with the cluster write pace. The write rate is too high, or the nodes are oversaturated. Flow control helps avoid excessive […]
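As a quick, hedged illustration (not part of the original excerpt), the queue sizes that drive flow control and the thresholds that trigger it can be inspected on any group member with plain SQL; the column and variable names below are the standard MySQL 8.0 ones:

-- Certification/applier queue sizes per member; when these exceed the thresholds,
-- flow control starts throttling writers.
SELECT MEMBER_ID, COUNT_TRANSACTIONS_IN_QUEUE, COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats;

-- The thresholds (default 25000 transactions) at which throttling kicks in.
SHOW GLOBAL VARIABLES LIKE 'group_replication_flow_control%threshold';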
InnoDB ClusterSet Deployment With MySQLRouter
This blog post will cover the basic setup of the InnoDB ClusterSet environment, which provides disaster tolerance for InnoDB Cluster deployments by associating a primary InnoDB Cluster with one or more replicas in alternate locations/different data centers. InnoDB ClusterSet automatically manages replication from the primary cluster to the replica clusters via a specific ClusterSet Async replication channel. If the primary cluster becomes inaccessible due to a loss of network connectivity or a data center issue, you can make a replica cluster active in its place.
Now, let’s see in detail how exactly we can configure the topology.
We have used the sandbox environment available via the MySQL Shell utility for this setup.
Environment
Cluster1: 127.0.0.1:3308, 127.0.0.1:3309, 127.0.0.1:3310
Cluster2: 127.0.0.1:3311, 127.0.0.1:3312, 127.0.0.1:3313
Router: 127.0.0.1:6446/6447
Let’s set up the first cluster (“cluster1”)
- Deploying the sandboxes.
MySQL JS > dba.deploySandboxInstance(3308)
MySQL JS > dba.deploySandboxInstance(3309)
MySQL JS > dba.deploySandboxInstance(3310)
- Then, we need to perform some pre-checks before initiating the cluster.
### Connecting to the concerned nodes one by one.
MySQL JS > shell.connect('root@localhost:3308')
MySQL localhost:3308 ssl JS > shell.connect('root@localhost:3309')
MySQL localhost:3309 ssl JS > shell.connect('root@localhost:3310')
### The commands below check whether each instance satisfies the InnoDB Cluster requirements (group replication settings) and fix any missing requirements automatically. Here, we have configured a new user "iroot" for cluster deployment.
MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3308')
MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3308',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
MySQL localhost:3309 ssl JS > dba.checkInstanceConfiguration('root@localhost:3309')
MySQL localhost:3309 ssl JS > dba.configureInstance('root@127.0.0.1:3309',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
MySQL localhost:3310 ssl JS > dba.checkInstanceConfiguration('root@localhost:3310')
MySQL localhost:3310 ssl JS > dba.configureInstance('root@127.0.0.1:3310',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
Once all the instances are prepared, we can create the cluster with the seed node. The “createCluster()” command performs all the hidden steps of initializing group replication, and later on, the other nodes join the group via distributed recovery or the clone plugin.
InnoDB Cluster is built on top of group replication, which provides automatic membership management, fault tolerance, and automatic failover. It gives us an easy interface to deploy and manage complex topologies with DR support.
- We will bootstrap the cluster with an initial node (“localhost:3308”).
MySQL localhost:3310 ssl JS > shell.connect('iroot@localhost:3308') MySQL localhost:3308 ssl JS > cluster1 = dba.createCluster('Cluster1') MySQL localhost:3308 ssl JS > cluster1 = dba.getCluster()
Output:
MySQL localhost:3308 ssl JS > cluster1.status() { "clusterName": "Cluster1", "defaultReplicaSet": { "name": "default", "primary": "127.0.0.1:3308", "ssl": "REQUIRED", "status": "OK_NO_TOLERANCE", "statusText": "Cluster is NOT tolerant to any failures.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.31" } }, "topologyMode": "Single-Primary" }, "groupInformationSourceMember": "127.0.0.1:3308" }
- Here, we have successfully bootstrapped the first node. Next, the other nodes will join the cluster using the CLONE Plugin.
MySQL localhost:3308 ssl JS > cluster1.addInstance("iroot@localhost:3309",{password:'Iroot@1234'})
Output:
* Waiting for clone to finish... NOTE: 127.0.0.1:3309 is being cloned from 127.0.0.1:3308 ** Stage DROP DATA: Completed ** Clone Transfer FILE COPY ############################################################ 100% Completed PAGE COPY ############################################################ 100% Completed REDO COPY ############################################################ 100% Completed NOTE: 127.0.0.1:3309 is shutting down... * Waiting for server restart... ready * 127.0.0.1:3309 has restarted, waiting for clone to finish... ** Stage RESTART: Completed * Clone process has finished: 73.66 MB transferred in about 1 second (~73.66 MB/s) State recovery already finished for '127.0.0.1:3309' The instance '127.0.0.1:3309' was successfully added to the cluster.
MySQL localhost:3308 ssl JS > cluster1.addInstance("iroot@localhost:3310",{password:'Iroot@1234'})
Output:
* Waiting for clone to finish... NOTE: 127.0.0.1:3310 is being cloned from 127.0.0.1:3309 ** Stage DROP DATA: Completed ** Clone Transfer FILE COPY ############################################################ 100% Completed PAGE COPY ############################################################ 100% Completed REDO COPY ############################################################ 100% Completed NOTE: 127.0.0.1:3310 is shutting down... * Waiting for server restart... ready * 127.0.0.1:3310 has restarted, waiting for clone to finish... ** Stage RESTART: Completed * Clone process has finished: 73.66 MB transferred in about 1 second (~73.66 MB/s) State recovery already finished for '127.0.0.1:3310' The instance '127.0.0.1:3310' was successfully added to the cluster.
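While a clone-based join is running, its progress can also be watched from the joiner itself via the clone plugin’s Performance Schema tables. A minimal sketch (connecting to the joining node with the "iroot" admin user we created earlier):

$ mysql -h 127.0.0.1 -P 3310 -u iroot -p -e "SELECT STAGE, STATE, END_TIME FROM performance_schema.clone_progress;"
$ mysql -h 127.0.0.1 -P 3310 -u iroot -p -e "SELECT STATE, ERROR_NO, SOURCE FROM performance_schema.clone_status\G"

After the joiner restarts, the same tables show the completed stages of the last clone operation.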
- At this stage, our first cluster is ready with all three nodes.
MySQL localhost:3308 ssl JS > cluster1.status()
Output:
{ "clusterName": "Cluster1", "defaultReplicaSet": { "name": "default", "primary": "127.0.0.1:3308", "ssl": "REQUIRED", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.31" } }, "topologyMode": "Single-Primary" }, "groupInformationSourceMember": "127.0.0.1:3308" }
Let’s now proceed with the second cluster (“cluster2”) setup
- Deploying the sandboxes via MySQL Shell.
MySQL JS > dba.deploySandboxInstance(3311) MySQL JS > dba.deploySandboxInstance(3312) MySQL JS > dba.deploySandboxInstance(3313)
- Similarly, perform some pre-checks as we did for “cluster1” nodes.
# Connecting to the concerned nodes.
MySQL JS > shell.connect('root@localhost:3311')
MySQL JS > shell.connect('root@localhost:3312')
MySQL JS > shell.connect('root@localhost:3313')
# The commands below check whether each instance satisfies the InnoDB Cluster requirements (group replication settings) and fix any missing requirements automatically. Here, we have configured a new user "iroot" for cluster deployment.
MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3311')
MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3311',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3312')
MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3312',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3313')
MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3313',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
- Next, we will create the ClusterSet topology by syncing the node (127.0.0.1:3311) from the existing cluster1 nodes. Node (127.0.0.1:3311) will be the primary node for cluster2, and the rest of the nodes will join it via the clone/incremental recovery process.
1) First, connect to the “cluster1” primary node.
MySQL localhost:3308 ssl JS > \c iroot@127.0.0.1:3308
MySQL 127.0.0.1:3308 ssl JS > cluster1 = dba.getCluster()
2) Here, “cluster1” joins the ClusterSet topology:
MySQL 127.0.0.1:3308 ssl JS > myclusterset = cluster1.createClusterSet('firstclusterset')
Output:
ClusterSet successfully created. Use ClusterSet.createReplicaCluster() to add Replica Clusters to it. <ClusterSet:firstclusterset>
3) Verifying the status.
MySQL 127.0.0.1:3308 ssl JS > myclusterset.status({extended: 1})
Output:
{ "clusters": { "Cluster1": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3308", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-85,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5" } }, "domainName": "firstclusterset", "globalPrimaryInstance": "127.0.0.1:3308", "metadataServer": "127.0.0.1:3308", "primaryCluster": "Cluster1", "status": "HEALTHY", "statusText": "All Clusters available." }
4) Now, node (“127.0.0.1:3311”) will sync with the existing “cluster1” via the asynchronous replication process.
MySQL 127.0.0.1:3308 ssl JS > \c iroot@127.0.0.1:3311
MySQL 127.0.0.1:3311 ssl JS > cluster2 = myclusterset.createReplicaCluster("127.0.0.1:3311", "cluster2", {recoveryProgress: 1, timeout: 10})
Output:
... Replica Cluster 'cluster2' successfully created on ClusterSet 'firstclusterset'. ...
5) Next, the other nodes join “cluster2” via the clone process.
MySQL 127.0.0.1:3311 ssl JS > cluster2.addInstance("iroot@127.0.0.1:3312",{password:'Iroot@1234'}) MySQL 127.0.0.1:3311 ssl JS > cluster2.addInstance("iroot@127.0.0.1:3313",{password:'Iroot@1234'})
6) Finally, check the status of our ClusterSet environment.
MySQL 127.0.0.1:3311 ssl JS > myclusterset = dba.getClusterSet() MySQL 127.0.0.1:3311 ssl JS > myclusterset.status({extended: 1})
Output:
{ "clusters": { "Cluster1": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3308", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-124,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5" }, "cluster2": { "clusterRole": "REPLICA", "clusterSetReplication": { "applierStatus": "APPLIED_ALL", "applierThreadState": "Waiting for an event from Coordinator", "applierWorkerThreads": 4, "receiver": "127.0.0.1:3311", "receiverStatus": "ON", "receiverThreadState": "Waiting for source to send event", "source": "127.0.0.1:3308" }, "clusterSetReplicationStatus": "OK", "globalStatus": "OK", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3311": { "address": "127.0.0.1:3311", "memberRole": "PRIMARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3312": { "address": "127.0.0.1:3312", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3313": { "address": "127.0.0.1:3313", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-124,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", "transactionSetConsistencyStatus": "OK", "transactionSetErrantGtidSet": "", "transactionSetMissingGtidSet": "" } }, "domainName": "firstclusterset", "globalPrimaryInstance": "127.0.0.1:3308", "metadataServer": "127.0.0.1:3308", "primaryCluster": "Cluster1", "status": "HEALTHY", "statusText": "All Clusters available."
Here, the ClusterSet topology is now ready with all six nodes.
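Under the hood, the replica cluster receives changes over a dedicated asynchronous replication channel. As a quick sanity check (a sketch; the channel name starting with clusterset_ is what current MySQL Shell versions configure, so adjust the filter if yours differs), its receiver state can be queried on the replica cluster’s primary:

$ mysql -h 127.0.0.1 -P 3311 -u iroot -p -e "SELECT CHANNEL_NAME, SERVICE_STATE FROM performance_schema.replication_connection_status WHERE CHANNEL_NAME LIKE 'clusterset%'\G"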
In the next phase, we will bootstrap MySQLRouter with our newly created ClusterSet environment:
- First, we will generate a dedicated user for MySQLRouter monitoring/management.
MySQL 127.0.0.1:3311 ssl JS > \c iroot@localhost:3308
MySQL localhost:3308 ssl JS > cluster1 = dba.getCluster();
MySQL localhost:3308 ssl JS > cluster1.setupRouterAccount('router_usr')
Output:
Missing the password for new account router_usr@%. Please provide one. Password for new account: ********** Confirm password: ********** Creating user router_usr@%. Account router_usr@% was successfully created.
- Bootstrap the router with the user (“router_usr”) and router name (“Router1”).
[vagrant@localhost ~]$ sudo mysqlrouter --bootstrap iroot@127.0.0.1:3308 --account=router_usr --name='Router1' --user root --force
We are using --force here because, without --force, mysqlrouter won’t recognize the ClusterSet; this will reconfigure the existing ClusterSet setup.
Here, we will see some useful information that will later be required to connect to the database or manage the services.
# MySQL Router 'Router1' configured for the ClusterSet 'firstclusterset'
After this MySQL Router has been started with the generated configuration
    $ /etc/init.d/mysqlrouter restart
or
    $ systemctl start mysqlrouter
or
    $ mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf
ClusterSet 'firstclusterset' can be reached by connecting to:
## MySQL Classic protocol
- Read/Write Connections: localhost:6446
- Read/Only Connections: localhost:6447
## MySQL X protocol
- Read/Write Connections: localhost:6448
- Read/Only Connections: localhost:6449
- Finally, start the mysqlrouter service:
sudo mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf &
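Before routing any traffic, it doesn’t hurt to confirm the router is actually listening on the advertised ports (a simple check of my own; ss is assumed to be installed, and netstat -ltn works the same way):

$ ss -ltn | grep -E ':(6446|6447|6448|6449)'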
Validating the connection route
- Connect to the router port “6446” and create some demo table/data:
shell> mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6446 -e "create database sbtest;use sbtest;create table sbtest1 (id int(10) not null auto_increment primary key, user varchar(50));insert into sbtest1(user) values('test');"
- Connect to the router port “6447” for reads. By default, these connections are balanced among the nodes of the primary cluster (cluster1).
[vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;" +----+------+ | id | user | +----+------+ | 1 | test | +----+------+ +-------------+ | @@server_id | +-------------+ | 194452202 | +-------------+ [vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;" +----+------+ | id | user | +----+------+ | 1 | test | +----+------+ +-------------+ | @@server_id | +-------------+ | 2376678236 | +-------------+ [vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;" mysql: [Warning] Using a password on the command line interface can be insecure. +----+------+ | id | user | +----+------+ | 1 | test | +----+------+ +-------------+ | @@server_id | +-------------+ | 194452202 | +-------------+
So, by default, all connections route to the “primary” cluster, which in our case is “Cluster1”; however, we can change the primary component as required.
Changing ClusterSet topology
MySQL localhost:3308 ssl JS > myclusterset=dba.getClusterSet() MySQL localhost:3308 ssl JS > myclusterset.status({extended:1})
Output:
<ClusterSet:firstclusterset> { "clusters": { "Cluster1": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3308", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-143,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5" }, "cluster2": { "clusterRole": "REPLICA", "clusterSetReplication": { "applierStatus": "APPLIED_ALL", "applierThreadState": "Waiting for an event from Coordinator", "applierWorkerThreads": 4, "receiver": "127.0.0.1:3311", "receiverStatus": "ON", "receiverThreadState": "Waiting for source to send event", "source": "127.0.0.1:3308" }, "clusterSetReplicationStatus": "OK", "globalStatus": "OK", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3311": { "address": "127.0.0.1:3311", "memberRole": "PRIMARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3312": { "address": "127.0.0.1:3312", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3313": { "address": "127.0.0.1:3313", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-143,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", "transactionSetConsistencyStatus": "OK", "transactionSetErrantGtidSet": "", "transactionSetMissingGtidSet": "" } }, "domainName": "firstclusterset", "globalPrimaryInstance": "127.0.0.1:3308", "metadataServer": "127.0.0.1:3308", "primaryCluster": "Cluster1", "status": "HEALTHY", "statusText": "All Clusters available." }
- Changing the primary cluster from “cluster1” to “cluster2”:
MySQL localhost:3308 ssl JS > myclusterset.setPrimaryCluster('cluster2')
Output:
Switching the primary cluster of the clusterset to 'cluster2' * Verifying clusterset status ** Checking cluster cluster2 Cluster 'cluster2' is available ** Checking cluster Cluster1 Cluster 'Cluster1' is available * Reconciling 5 internally generated GTIDs * Refreshing replication account of demoted cluster * Synchronizing transaction backlog at 127.0.0.1:3311 ** Transactions replicated ############################################################ 100% * Updating metadata * Updating topology ** Changing replication source of 127.0.0.1:3309 to 127.0.0.1:3311 ** Changing replication source of 127.0.0.1:3310 to 127.0.0.1:3311 ** Changing replication source of 127.0.0.1:3308 to 127.0.0.1:3311 * Acquiring locks in replicaset instances ** Pre-synchronizing SECONDARIES ** Acquiring global lock at PRIMARY ** Acquiring global lock at SECONDARIES * Synchronizing remaining transactions at promoted primary ** Transactions replicated ############################################################ 100% * Updating replica clusters Cluster 'cluster2' was promoted to PRIMARY of the clusterset. The PRIMARY instance is '127.0.0.1:3311'
- If we check the output again, we can observe that “clusterRole: PRIMARY” has shifted to “cluster2”.
MySQL localhost:3308 ssl JS > myclusterset.status({extended:1})
Output:
{ "clusters": { "Cluster1": { "clusterRole": "REPLICA", "clusterSetReplication": { "applierStatus": "APPLIED_ALL", "applierThreadState": "Waiting for an event from Coordinator", "applierWorkerThreads": 4, "receiver": "127.0.0.1:3308", "receiverStatus": "ON", "receiverThreadState": "Waiting for source to send event", "source": "127.0.0.1:3311" }, "clusterSetReplicationStatus": "OK", "globalStatus": "OK", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-145,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", "transactionSetConsistencyStatus": "OK", "transactionSetErrantGtidSet": "", "transactionSetMissingGtidSet": "" }, "cluster2": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3311", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3311": { "address": "127.0.0.1:3311", "memberRole": "PRIMARY", "mode": "R/W", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3312": { "address": "127.0.0.1:3312", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3313": { "address": "127.0.0.1:3313", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-145,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5" } }, "domainName": "firstclusterset", "globalPrimaryInstance": "127.0.0.1:3311", "metadataServer": "127.0.0.1:3311", "primaryCluster": "cluster2", "status": "HEALTHY", "statusText": "All Clusters available." }
So, we have changed the Primary component from cluster1 to cluster2, but the routing is still set for cluster1. In order to send traffic to cluster2, we also have to change the routing option.
MySQL localhost:3308 ssl JS > myclusterset.listRouters()
Output:
{ "domainName": "firstclusterset", "routers": { "localhost.localdomain::Router1": { "hostname": "localhost.localdomain", "lastCheckIn": "2023-07-22 02:47:42", "roPort": "6447", "roXPort": "6449", "rwPort": "6446", "rwXPort": "6448", "targetCluster": "primary", "version": "8.0.32" },
- Changing the connection target from “cluster1” to “cluster2”:
MySQL localhost:3308 ssl JS > myclusterset.setRoutingOption('localhost.localdomain::Router1', 'target_cluster', 'cluster2')
MySQL localhost:3308 ssl JS > myclusterset.listRouters()
Output:
MySQL localhost:3308 ssl JS > myclusterset.listRouters() { "domainName": "firstclusterset", "routers": { "localhost.localdomain::Router1": { "hostname": "localhost.localdomain", "lastCheckIn": "2023-07-22 02:47:42", "roPort": "6447", "roXPort": "6449", "rwPort": "6446", "rwXPort": "6448", "targetCluster": "cluster2", "version": "8.0.32" }
Verifying the routing policy in the existing clusterset
MySQL localhost:3308 ssl JS > myclusterset.routingOptions()
Output:
{ "domainName": "firstclusterset", "global": { "invalidated_cluster_policy": "drop_all", "stats_updates_frequency": 0, "target_cluster": "primary" }, "routers": { "localhost.localdomain::Router1": { "target_cluster": "cluster2" } } }
There are situations when the primary cluster is not available or reachable. The immediate solution in such cases is to perform an emergency failover in order to avoid an application outage.
An emergency failover switches the InnoDB ClusterSet deployment over from the primary InnoDB Cluster to a selected replica cluster. During an emergency failover, data consistency is not assured due to asynchronous replication and other network factors, so for safety, the original primary cluster is marked as invalidated during the failover process.
So, if by any chance the original primary cluster remains online, it should be shut down. Later, the invalidated primary cluster can rejoin the ClusterSet via the rejoin/repair process.
Perform emergency failover
myclusterset.forcePrimaryCluster("cluster2")
myclusterset.setRoutingOption('localhost.localdomain::Router1', 'target_cluster', 'cluster2')
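Because an emergency failover is not consistency-safe, it is worth checking the invalidated cluster for errant transactions before letting it rejoin. A hedged sketch using standard GTID functions (the placeholder below stands for whatever @@global.gtid_executed returned on the old primary):

# Capture the executed GTID sets on the invalidated primary (3308) and the promoted primary (3311):
$ mysql -h 127.0.0.1 -P 3308 -u iroot -p -e "SELECT @@global.gtid_executed\G"
$ mysql -h 127.0.0.1 -P 3311 -u iroot -p -e "SELECT @@global.gtid_executed\G"
# Any GTIDs present on 3308 but absent on 3311 are errant and must be reconciled before rejoining:
$ mysql -h 127.0.0.1 -P 3311 -u iroot -p -e "SELECT GTID_SUBTRACT('<gtid_set_from_3308>', @@global.gtid_executed) AS errant_gtids\G"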
Summary
With the help of the ClusterSet implementation, deploying DR support across different regions is no longer a complex challenge. MySQL Shell and InnoDB Cluster handle all the configuration and syncing behind the scenes. Disaster recovery and failover time can be minimized with the help of the AdminAPI/MySQL Shell commands.
There is one caveat with how ClusterSet functions: it does not support automatic promotion of a new primary cluster if the existing one goes down. We must handle that with manual intervention or via some automated process of our own.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!
Percona Operator for MySQL Supports Group Replication
There are two Operators at Percona to deploy MySQL on Kubernetes:
- one is based on Percona XtraDB Cluster (PXC), production-ready and widely adopted
- another is based on Percona Server for MySQL, now in tech preview and not recommended for production yet
We wrote a blog post in the past explaining the thought process and reasoning behind creating the new Operator for MySQL. The goal for us is to provide production-grade solutions to run MySQL in Kubernetes and support various replication configurations:
- Synchronous replication
- with Percona XtraDB Cluster
- with Group Replication
- Asynchronous replication
With the latest 0.2.0 release of Percona Operator for MySQL (based on Percona Server for MySQL), we have added Group Replication support. In this blog post, we will briefly review the design of our implementation and see how to set it up.
Design
This is a high-level design of running a MySQL cluster with Group Replication:
MySQL Router acts as an entry point for all requests and routes the traffic to the nodes.
This is a deeper look at how the Operator deploys these components in Kubernetes:
Going from right to left:
- StatefulSet to deploy a cluster of MySQL nodes with Group Replication configured. Each node has its storage attached to it.
- Deployment object for stateless MySQL Router.
- Deployment is exposed with a Service. We use various TCP ports here:
- MySQL Protocol ports
- 6446 – read/write, routing traffic to Primary node
- 6447 – read-only, load-balancing the traffic across Replicas
- MySQL X Protocol – can be useful for CRUD operations, ex. asynchronous calls. Ports follow the same logic:
- 6448 – read/write
- 6449 – read-only
Action
Prerequisites: you need a Kubernetes cluster. Minikube would do.
The files used in this blog post can be found in this Github repo.
Deploy the Operator
kubectl apply --server-side -f https://raw.githubusercontent.com/spron-in/blog-data/master/ps-operator-gr-demo/bundle.yaml
Note the --server-side flag; without it, you will get the error:
The CustomResourceDefinition "perconaservermysqls.ps.percona.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
Our Operator follows the OpenAPIv3 schema to have proper validation. This unfortunately increases the size of our Custom Resource Definition manifest and, as a result, requires us to use the --server-side flag.
Deploy the Cluster
We are ready to deploy the cluster now:
kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/ps-operator-gr-demo/cr.yaml
I created this Custom Resource manifest specifically for this demo. Variables important to note:
- Line 10: clusterType: group-replication (a quick way to verify this on the live object is shown right after this list)
- Lines 31-47 are all about MySQL Router. Once Group Replication is enabled, the Operator will automatically deploy the router.
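Once the manifest is applied, one simple way to confirm which replication type ended up on the live Custom Resource is to grep it (a check of my own, not from the original post; the cluster is named my-cluster in this demo CR):

$ kubectl get ps my-cluster -o yaml | grep clusterType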
Get the status
The best way to see if the cluster is ready is to check the Custom Resource state:
$ kubectl get ps NAME REPLICATION ENDPOINT STATE AGE my-cluster group-replication 35.223.42.238 ready 18m
As you can see, it is ready. You can also see initializing if the cluster is still not ready, or error if something went wrong.
Here you can also see the endpoint you can connect to. In our case, it is the public IP address of the load balancer. As described in the design section above, there are multiple ports exposed:
$ kubectl get service my-cluster-router NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE my-cluster-router LoadBalancer 10.20.22.90 35.223.42.238 6446:30852/TCP,6447:31694/TCP,6448:31515/TCP,6449:31686/TCP 18h
Connect to the Cluster
To connect, we will need the user first. By default, there is a root user with a randomly generated password. The password is stored in the Secret object. You can always fetch the password with the following command:
$ kubectl get secrets my-cluster-secrets -ojson | jq -r .data.root | base64 -d SomeRandomPassword
I’m going to use port 6446, which would grant me read/write access and lead me directly to the Primary node through MySQL Router:
mysql -u root -p -h 35.223.42.238 --port 6446 Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 156329 Server version: 8.0.28-19 Percona Server (GPL), Release 19, Revision 31e88966cd3 Copyright (c) 2000, 2022, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
Group Replication in action
Let’s see if Group Replication really works.
List the members of the cluster by running the following command:
$ mysql -u root -p -P 6446 -h 35.223.42.238 -e 'SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members;' +-------------------------------------------+--------------+-------------+ | member_host | member_state | member_role | +-------------------------------------------+--------------+-------------+ | my-cluster-mysql-0.my-cluster-mysql.mysql | ONLINE | PRIMARY | | my-cluster-mysql-1.my-cluster-mysql.mysql | ONLINE | SECONDARY | | my-cluster-mysql-2.my-cluster-mysql.mysql | ONLINE | SECONDARY | +-------------------------------------------+--------------+-------------+
Now we will delete one Pod (MySQL node), which also happens to have a Primary role, and see what happens:
$ kubectl delete pod my-cluster-mysql-0 pod "my-cluster-mysql-0" deleted $ mysql -u root -p -P 6446 -h 35.223.42.238 -e 'SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members;' +-------------------------------------------+--------------+-------------+ | member_host | member_state | member_role | +-------------------------------------------+--------------+-------------+ | my-cluster-mysql-1.my-cluster-mysql.mysql | ONLINE | PRIMARY | | my-cluster-mysql-2.my-cluster-mysql.mysql | ONLINE | SECONDARY | +-------------------------------------------+--------------+-------------+
One node is gone, as expected. The my-cluster-mysql-1 node got promoted to the Primary role. I’m still using port 6446 and the same host to connect to the database, which indicates that MySQL Router is doing its job.
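A quick extra check (not in the original post) to see which member the read/write port now lands on; the hostname returned should match the newly promoted primary:

$ mysql -u root -p -P 6446 -h 35.223.42.238 -e 'SELECT @@hostname, @@port;'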
After some time, Kubernetes will recreate the Pod, and the node will join the cluster again automatically:
$ mysql -u root -p -P 6446 -h 35.223.42.238 -e 'SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members;' +-------------------------------------------+--------------+-------------+ | member_host | member_state | member_role | +-------------------------------------------+--------------+-------------+ | my-cluster-mysql-0.my-cluster-mysql.mysql | RECOVERING | SECONDARY | | my-cluster-mysql-1.my-cluster-mysql.mysql | ONLINE | PRIMARY | | my-cluster-mysql-2.my-cluster-mysql.mysql | ONLINE | SECONDARY | +-------------------------------------------+--------------+-------------+
The recovery phase might take some time, depending on the data size and the amount of changes, but eventually, it will come back ONLINE:
$ mysql -u root -p -P 6446 -h 35.223.42.238 -e 'SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members;' +-------------------------------------------+--------------+-------------+ | member_host | member_state | member_role | +-------------------------------------------+--------------+-------------+ | my-cluster-mysql-0.my-cluster-mysql.mysql | ONLINE | SECONDARY | | my-cluster-mysql-1.my-cluster-mysql.mysql | ONLINE | PRIMARY | | my-cluster-mysql-2.my-cluster-mysql.mysql | ONLINE | SECONDARY | +-------------------------------------------+--------------+-------------+
What’s coming up next?
Some exciting capabilities and features that we are going to ship pretty soon:
- Backup and restore support for clusters with Group Replication
- We have backups and restores in the Operator, but they currently do not work with Group Replication
- Monitoring of MySQL Router in Percona Monitoring and Management (PMM)
- Even though the Operator integrates nicely with PMM, it is currently possible to monitor only the MySQL nodes, not MySQL Router.
- Automated Upgrades of MySQL and database components in the Operator
- We have this in all our other Operators, and it is only logical to add it here
Percona is an open source company and we value our community and contributors. You are greatly encouraged to contribute to Percona Software. Please read our Contributions guide and visit our community webpage.
Docker Images for MySQL Group Replication 5.7.14
In this post, I will point you to Docker images for MySQL Group Replication testing.
There is a new release of the MySQL Group Replication plugin for MySQL 5.7.14. It’s a “beta” plugin, and it is probably the last (or at least one of the final) pre-release packages before Group Replication goes GA (during Oracle OpenWorld 2016, in our best guess).
Since it is close to GA, it would be great to get a better understanding of this new technology. Unfortunately, the MySQL Group Replication installation process isn’t very user-friendly.
Or, to put it another way, totally un-user-friendly! It consists of a mere “50 easy steps” – by which I think they mean “easy” to mess up.
Matt Lord, in his post http://mysqlhighavailability.com/mysql-group-replication-a-quick-start-guide/, acknowledges: “getting a working MySQL service consisting of 3 Group Replication members is not an easy “point and click” or automated single command style operation.”
I’m not providing a review of MySQL Group Replication 5.7.14 yet – I need to play around with it a lot more. To make this process easier for myself, and hopefully more helpful to you, I’ve prepared Docker images for the testing of MySQL Group Replication.
Docker Images
To start the first node, run:
docker run -d --net=cluster1 --name=node1 perconalab/mysql-group-replication --group_replication_bootstrap_group=ON
To join the subsequent nodes:
docker run -d --net=cluster1 --name=node2 perconalab/mysql-group-replication --group_replication_group_seeds='node1:6606'
Of course, you need to have the Docker network created first:
docker network create cluster1
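Once a couple of nodes are up, group membership can be verified from inside a container. This is a sketch that assumes the image allows a local passwordless root login (adjust the credentials to however the image is actually configured); note that MySQL 5.7 does not yet have the MEMBER_ROLE column:

docker exec -it node1 mysql -uroot -e "SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members;"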
I hope this will make the testing process easier!