Apr
20
2026
--

Deploying Cross-Site Replication in Percona Operator for MySQL (PXC)

Having a separate DR cluster for production databases is a modern day requirement or necessity for tech and other related businesses that rely heavily on their database systems. Setting up such a [DC -> DR] topology for Percona XtraDB Cluster (PXC), which is a virtually- synchronous cluster, can be a bit challenging in a complex Kubernetes environment.

Here, Percona Operator for MySQL comes in handy, with a minimal number of steps to configure such a topology, which ensures a remote side backup or a disaster recovery solution.

So without taking much time, let’s see how the overall setup and configurations look from a practical standpoint.

 

PXC Cross-Site/Disaster Recovery
PXC Cross-Site/Disaster Recovery

 

DC Configuration

1) Here we have a three-node PXC cluster running on the DC side.

shell> kubectl get pods -n pxc
NAME                                               READY   STATUS      RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running     0          23h
cluster1-haproxy-1                                 2/2     Running     0          23h
cluster1-haproxy-2                                 2/2     Running     0          23h
cluster1-pxc-0                                     3/3     Running     0          23h
cluster1-pxc-1                                     3/3     Running     0          7h37m
cluster1-pxc-2                                     3/3     Running     0          7h18m
percona-xtradb-cluster-operator-6756dbf588-vxjxt   1/1     Running     0          24h
xb-backup1-hlz2p                                   0/1     Completed   0          21h
xb-cron-cluster1-fs-pvc-2026480026-372f8-2gfhr     0/1     Completed   0          13h

2) There are some configuration options which have to be enabled in a custom resource file[cr.yaml] to allow cross-site replication.

  • Expose all source PXC nodes so they can be communicated from outside or DR cluster.
expose:
      	enabled: true
      	Type: LoadBalancer

  • Define a dedicated replication channel and enable the source option.
replicationChannels:
    - name: pxc1_to_pxc2
      isSource: true

  • Finally, applying the custom resource changes.
shell> kubectl apply -f cr.yaml

3) Now we will notice some “EXTERNAL IP” details for each PXC node. This is the endpoint that DR node [cluster1-pxc-0] will use to connect to DC.

shell> kubectl get svc
NAME                              TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                                          AGE
cluster1-haproxy                  ClusterIP      34.118.227.249   <none>          3306/TCP,3309/TCP,33062/TCP,33060/TCP,8404/TCP   4h1m
cluster1-haproxy-replicas         ClusterIP      34.118.225.41    <none>          3306/TCP                                         4h1m
cluster1-pxc                      ClusterIP      None             <none>          3306/TCP,33062/TCP,33060/TCP                     4h1m
cluster1-pxc-0                    LoadBalancer   34.118.234.140   34.29.145.138   3306:30425/TCP                                   4h1m
cluster1-pxc-1                    LoadBalancer   34.118.239.132   34.30.233.0     3306:31340/TCP                                   4h1m
cluster1-pxc-2                    LoadBalancer   34.118.236.64    35.225.0.19     3306:30642/TCP                                   4h1m
cluster1-pxc-unready              ClusterIP      None             <none>          3306/TCP,33062/TCP,33060/TCP                     4h1m
percona-xtradb-cluster-operator   ClusterIP      34.118.235.168   <none>          443/TCP                                          4h11m

At this point, we are done with the DC setup. Next, we will take a backup from Source which we later used to build the DR.

 

Backup

  • Defining access key/secrets to connect to the GCP/S3 bucket.
cat backup-secret-s3.yaml

apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-backup-s3
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <KEY>
  AWS_SECRET_ACCESS_KEY: <SECRET>

  • In the custom resource file [cr.yaml] , we also need to define the bucket , secret file and endpoint/region details.
backup:

 storages:
   s3-us-west:
      type: s3
      verifyTLS: true

    s3:
      bucket: <bucket>
      credentialsSecret: my-cluster-name-backup-s3
      region: us-west-2
      endpointUrl: https://storage.googleapis.com

shell> kubectl apply -f cr.yaml

  • Finally, we can take the backup by creating a [backup.yaml] file with below details.
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
#  finalizers:
#    - percona.com/delete-backup
  name: backup1
spec:
  pxcCluster: cluster1
  storageName:  s3-us-west

shell> kubectl apply -f cr.yaml

  • We can verify the successful backup as follows.
kubectl get pxc-backup
NAME      CLUSTER    STORAGE      DESTINATION                                     STATUS      COMPLETED   AGE
backup1   cluster1   s3-us-west   s3://<bucket>/cluster1-2026-04-07-15:55:46-full   Succeeded   125m        127m

As the backup is also ready, we can now move to the DR setup part.

 

DR Configuration

Below we have a similar PXC setup as having in DC in a separate Node/ K8s Cluster.

kubectl get pods -n pxc-dr
NAME                                               READY   STATUS      RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running     0          35h
cluster1-haproxy-1                                 2/2     Running     0          35h
cluster1-haproxy-2                                 2/2     Running     0          35h
cluster1-pxc-0                                     3/3     Running     0          35h
cluster1-pxc-1                                     3/3     Running     0          35h
cluster1-pxc-2                                     3/3     Running     0          35h
percona-xtradb-cluster-operator-6756dbf588-2wc5m   1/1     Running     0          38h
prepare-job-restore1-cluster1-8h4vn                0/1     Completed   0          35h
restore-job-restore1-cluster1-trfg6                0/1     Completed   0          35h
xb-cron-cluster1-fs-pvc-2026480025-372f8-wv6bt     0/1     Completed   0          28h
xb-cron-cluster1-fs-pvc-2026490025-372f8-gxd59     0/1     Completed   0          4h48m

First, we need to restore the backup on the DR server.

Data Restoration

  • Here we will create the [backup-secret-s3.yaml] file which contains the GCP/S3 credentials.
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-backup-s3
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <KEY>
  AWS_SECRET_ACCESS_KEY: <SECRET>

shell> kubectl apply -f backup-secret-s3.yaml

  • Next, we will create a [restore.yaml] file while mentioning the backup source and other useful information.
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
#  annotations:
#    percona.com/headless-service: "true"
spec:
  pxcCluster: cluster1
  backupSource:
#    verifyTLS: true
    destination: s3://<bucket>/cluster1-2026-04-07-15:55:46-full
    s3:
      bucket: <bucket>
      credentialsSecret: my-cluster-name-backup-s3
      endpointUrl: https://storage.googleapis.com/

shell> kubectl apply -f restore.yaml

  • Once the restoration is finished successfully, we will see the status below.
shell> kubectl get pxc-restore
NAME       CLUSTER    STATUS      COMPLETED   AGE
restore1   cluster1   Succeeded               27m

Now we can do the remaining DR changes in the custom resource file [cr.yaml]. Basically, we need to add the replication channel and all source EXTERNAL-IPs. This cross-DC replication supports Automatic Asynchronous Replication Connection Failover feature, so in case any of the DC node is down, the Replica can connect and resume from other available DC nodes.

replicationChannels:
    - name: pxc1_to_pxc2
      isSource: false
      sourcesList:
      - host: 34.29.145.138  
        port: 3306
        weight: 100

      - host: 34.30.233.0
        port: 3306
        weight: 100

      - host: 35.225.0.19
        port: 3306
        weight: 100

shell> kubectl apply -f cr.yaml

For backup and restoration on the PXC operator, the manuals below can be referenced further.

 

Replication

Initially, when we check the replication status, we can notice the following error. This is because with [caching_sha2_password] authentication, it should be a secure SSL/TLS communication, or else we can use SOURCE_PUBLIC_KEY_PATH/GET_SOURCE_PUBLIC_KEY  which basicaly enables the RSA key pair-based password exchange by requesting the public key from the source. 

shell> kubectl exec -it cluster1-pxc-0  -- sh
shell> mysql -uroot -p

mysql> show replica status\G;
*************************** 1. row ***************************
             Replica_IO_State: Connecting to source
                  Source_Host: 35.225.0.19
                  Source_User: replication
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: 
          Read_Source_Log_Pos: 4
               Relay_Log_File: cluster1-pxc-0-relay-bin-pxc1_to_pxc2.000001
                Relay_Log_Pos: 4
        Relay_Source_Log_File: 
           Replica_IO_Running: Connecting
          Replica_SQL_Running: Yes
...

Error:

Last_IO_Error: Error connecting to source 'replication@35.225.0.19:3306'. This was attempt 2/3, with a delay of 60 seconds between attempts. Message: Access denied for user 'replication'@'35.225.0.19.' (using password: YES)

Once we passed “GET_SOURCE_PUBLIC_KEY” in the “CHANGE REPLICATION” command the  error is resolved and DR successfully able to communicate with the DC.

mysql> STOP REPLICA;
mysql> STOP REPLICA IO_THREAD FOR CHANNEL 'pxc1_to_pxc2';
mysql> CHANGE REPLICATION SOURCE TO SOURCE_USER='replication', SOURCE_PASSWORD='password', GET_SOURCE_PUBLIC_KEY=1 FOR CHANNEL 'pxc1_to_pxc2';
mysql> START REPLICA;

Note  – The Replication user will be auto-created on the DC node. So, with the help of below command we can get the decoded password for “replication” user.

shell> kubectl get secret cluster1-secrets -o jsonpath="{.data.replication}" | base64 --decode

mysql> show replica status\G;
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 35.225.0.19
                  Source_User: replication
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000006
          Read_Source_Log_Pos: 3047027
               Relay_Log_File: cluster1-pxc-0-relay-bin-pxc1_to_pxc2.000001
                Relay_Log_Pos: 150132
        Relay_Source_Log_File: binlog.000006
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
...

The other PXC DR nodes will sync as usual with the Galera Synchronous replication process. 

Source Failover

The asynchronous connection failover is already enabled on the DR as we defined initially in the custom resource file. The “External IPs”  shows different here because they changed in this testing scenario.

mysql> select * from performance_schema.replication_asynchronous_connection_failover;
+--------------+---------------+------+-------------------+--------+--------------+
| CHANNEL_NAME | HOST          | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME |
+--------------+---------------+------+-------------------+--------+--------------+
| pxc1_to_pxc2 | 34.29.145.138 | 3306 |                   |    100 |              |
| pxc1_to_pxc2 | 34.45.151.96  | 3306 |                   |    100 |              |
| pxc1_to_pxc2 | 34.71.57.38   | 3306 |                   |    100 |              |
+--------------+---------------+------+-------------------+--------+--------------+
3 rows in set (0.00 sec)

Now, in case the existing Source DC[cluster1-pxc-2] is down, the DR will connect to one of the other available DC nodes based on the “Weight” and chronological order [pxc-2, pxc-1, pxc-0 etc].

  • Here, we temporarily take down the Source DC[cluster1-pxc-2] node.
kubectl get pods -n pxc
NAME                                               READY   STATUS      RESTARTS     AGE
cluster1-haproxy-0                                 2/2     Running     0            2d3h
cluster1-haproxy-1                                 2/2     Running     0            2d3h
cluster1-haproxy-2                                 2/2     Running     0            2d3h
cluster1-pxc-0                                     3/3     Running     0            2d3h
cluster1-pxc-1                                     3/3     Running     0            35h
cluster1-pxc-2                                     2/3     Running     1 (6s ago)   34h
percona-xtradb-cluster-operator-6756dbf588-vxjxt   1/1     Running     0            2d3h
xb-backup1-hlz2p                                   0/1     Completed   0            2d1h
xb-cron-cluster1-fs-pvc-2026480026-372f8-2gfhr     0/1     Completed   0            41h
xb-cron-cluster1-fs-pvc-2026490026-372f8-mgfpv     0/1     Completed   0            17h

  • The DR replication breaks as it can’t reach the DC [cluster1-pxc-2].
mysql> show replica status\G;
*************************** 1. row ***************************
             Replica_IO_State: Reconnecting after a failed source event read
                  Source_Host: 34.71.57.38
                  Source_User: replication
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000012
          Read_Source_Log_Pos: 198
               Relay_Log_File: cluster1-pxc-0-relay-bin-pxc1_to_pxc2.000002
                Relay_Log_Pos: 369
        Relay_Source_Log_File: binlog.000012
           Replica_IO_Running: Connecting
          Replica_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Source_Log_Pos: 198
              Relay_Log_Space: 602
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Source_SSL_Allowed: No
           Source_SSL_CA_File: 
           Source_SSL_CA_Path: 
              Source_SSL_Cert: 
            Source_SSL_Cipher: 
               Source_SSL_Key: 
        Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: Yes
                Last_IO_Errno: 2003
                Last_IO_Error: Error reconnecting to source 'replication@34.71.57.38:3306'. This was attempt 2/3, with a delay of 60 seconds between attempts. Message: Can't connect to MySQL server on '34.71.57.38:3306' (111)

  • Once it reaches the “source_retry_count” and “source_connect_retry”, the Replica connects to another Source DC[cluster1-pxc-1].
mysql> show replica status\G;
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 34.45.151.96
                  Source_User: replication
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000007
          Read_Source_Log_Pos: 198
               Relay_Log_File: cluster1-pxc-0-relay-bin-pxc1_to_pxc2.000003
                Relay_Log_Pos: 369
        Relay_Source_Log_File: binlog.000007
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
...

Quick Summary

In this blog post, we walk through the steps to configure Cross-Site Replication in the Percona PXC operator. Although we have used the operator native Xtrabackup to feed the data to the DR via the restore process, we can also use logical backup options like (mysqldump, mydumper, etc.) to accomplish the same goals. 

Using an “Asynchronous Replication” process to sync DR could lead to delays or replication lag due to its flow, or, more importantly, when working across data centres, where network latency is a big factor. However, adding a DR(PXC) cluster to DC(PXC) directly via synchronous replication could be more impactful or lead to flow control issues if any of the DR nodes struggle or experience performance/saturation issues. So, it’s equally important to consider all aspects or challenges before deploying in production.

The post Deploying Cross-Site Replication in Percona Operator for MySQL (PXC) appeared first on Percona.

Aug
02
2018
--

Amazon RDS Multi-AZ Deployments and Read Replicas

RDS Multi-AZ

Amazon RDS is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. One of the common questions that we get is “What is Multi-AZ and how it’s different from Read Replica, do I need both?”.  I have tried to answer this question in this blog post and it depends on your application needs. Are you looking for High Availability (HA), read scalability … or both?

Before we go to into detail, let me explain two common terms used with Amazon AWS.

Region – an AWS region is a separate geographical area like US East (N. Virginia), Asia Pacific (Mumbai), EU (London) etc. Each AWS Region has multiple, isolated locations known as Availability Zones.

Availability Zone (AZ) – AZ is simply one or more data centers, each with redundant power, networking and connectivity, housed in separate facilities. Data centers are geographically isolated within the same region.

What is Multi-AZ?

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments.

In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica of the master DB in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, failover support and to minimize latency during system backups. In the event of planned database maintenance, DB instance failure, or an AZ failure of your primary DB instance, Amazon RDS automatically performs a failover to the standby so that database operations can resume quickly without administrative intervention.

You can check in the AWS management console if a database instance is configured as Multi-AZ. Select the RDS service, click on the DB instance and review the details section.

AWS management console showing that instance is Multi-AZ

This screenshot from AWS management console (above) shows that the database is hosted as Multi-AZ deployment and the standby replica is deployed in us-east-1a AZ.

Benefits of Multi-AZ deployment:

  • Replication to a standby replica is synchronous which is highly durable.
  • When a problem is detected on the primary instance, it will automatically failover to the standby in the following conditions:
    • The primary DB instance fails
    • An Availability Zone outage
    • The DB instance server type is changed
    • The operating system of the DB instance is undergoing software patching.
    • A manual failover of the DB instance was initiated using Reboot with failover.
  • The endpoint of the DB instance remains the same after a failover, the application can resume database operations without manual intervention.
  • If a failure occurs, your availability impact is limited to the time that the automatic failover takes to complete. This helps to achieve increased availability.
  • It reduces the impact of maintenance. RDS performs maintenance on the standby first, promotes the standby to primary master, and then performs maintenance on the old master which is now a standby replica.
  • To prevent any negative impact of the backup process on performance, Amazon RDS creates a backup from the standby replica.

Amazon RDS does not failover automatically in response to database operations such as long-running queries, deadlocks or database corruption errors. Also, the Multi-AZ deployments are limited to a single region only, cross-region Multi-AZ is not currently supported.

Can I use an RDS standby replica for read scaling?

The Multi-AZ deployments are not a read scaling solution, you cannot use a standby replica to serve read traffic. Multi-AZ maintains a standby replica for HA/failover. It is available for use only when RDS promotes the standby instance as the primary. To service read-only traffic, you should use a Read Replica instead.

What is Read Replica?

Read replicas allow you to have a read-only copy of your database.

When you create a Read Replica, you first specify an existing DB instance as the source. Then Amazon RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot. You can use MySQL native asynchronous replication to keep Read Replica up-to-date with the changes. The source DB must have automatic backups enabled for setting up read replica.

Benefits of Read Replica

  • Read Replica helps in decreasing load on the primary DB by serving read-only traffic.
  • A Read Replica can be manually promoted as a standalone database instance.
  • You can create Read Replicas within AZ, Cross-AZ or Cross-Region.
  • You can have up to five Read Replicas per master, each with own DNS endpoint. Unlike a Multi-AZ standby replica, you can connect to each Read Replica and use them for read scaling.
  • You can have Read Replicas of Read Replicas.
  • Read Replicas can be Multi-AZ enabled.
  • You can use Read Replicas to take logical backups (mysqldump/mydumper) if you want to store the backups externally to RDS.
  • Read Replica helps to maintain a copy of databases in a different region for disaster recovery.

At AWS re:Invent 2017, AWS announced the preview for Amazon Aurora Multi-Master, this will allow users to create multiple Aurora writer nodes and helps in scaling reads/writes across multiple AZs. You can sign up for preview here.

Conclusion

While both (Multi-AZ and Read replica) maintain a copy of database but they are different in nature. Use Multi-AZ deployments for High Availability and Read Replica for read scalability. You can further set up a cross-region read replica for disaster recovery.

The post Amazon RDS Multi-AZ Deployments and Read Replicas appeared first on Percona Database Performance Blog.

Feb
25
2016
--

High availability with asynchronous replication… and transparent R/W split

High availability with asynchronous replication

High availability with asynchronous replicationIn this post, the first one of a Maxscale series, I describe how to use MariaDB’s MaxScale and MySQL-utilities with MySQL Asynchronous replication.

When we talk about high availability with asynchronous replication, we always think about MHA or PRM. But if we want to transparently use the slave(s) for READs, what can we use ?

Description:

  • Three MySQL servers, but one has very limited resources and will never be able to handle the production load. In fact this node is used for backup and some back-office queries.
  • We would like to use one of the nodes as a master and the other two as slaves, but only one will be addressed by the application for the READs. If needed, that same node will become the master.
  • The application doesn’t handle READ and WRITE connections, and it’s impossible to change it.

To achieve our goals, we will use MaxScale and it’s R/W filter. When using Maxscale and asynchronous replication with MariaDB, it’s possible to use MariaDB’s replication manager, which is a wonderful tool written in Go. Unfortunately, this tool doesn’t support standard MySQL. To replace it, I used then the Oracle’s MySQL-Utilities.

Our three nodes are:

  • percona1 (master)
  • percona2 (powerful slave)
  • percona3 (weak slave)

It’s mandatory in this solution to use GTID, as it’s the only method supported by the mysql-utilities we are using.

This is the MaxScale configuration:

[maxscale]
threads=4
[Splitter Service]
type=service
router=readwritesplit
servers=percona1, percona2
user=maxscale
passwd=264D375EC77998F13F4D0EC739AABAD4
[Splitter Listener]
type=listener
service=Splitter Service
protocol=MySQLClient
port=3306
socket=/tmp/ClusterMaster
[percona1]
type=server
address=192.168.90.2
port=3306
protocol=MySQLBackend
[percona2]
type=server
address=192.168.90.3
port=3306
protocol=MySQLBackend
[percona3]
type=server
address=192.168.90.4
port=3306
protocol=MySQLBackend
[Replication Monitor]
type=monitor
module=mysqlmon
servers=percona1, percona2, percona3
user=maxscale
passwd=264D375EC77998F13F4D0EC739AABAD4
monitor_interval=1000
script=/usr/local/bin/failover.sh
events=master_down
[CLI]
type=service
router=cli
[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
address=localhost
port=6603

As you can notice, the Splitter Service contains only the two nodes able to handle the load.

And to perform the failover, in the Replication Monitor section, we define a script to use when the master is down.

That script calls mysqlrpladmin from the mysql-utilities.

In the script we also define the following line to be sure the weak slave will never become a master.

never_master=192.168.90.4

When everything is setup and running, you should see something like this:

# maxadmin -pmariadb list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
percona1           | 192.168.90.2    |  3306 |          15 | Master, Running
percona2           | 192.168.90.3    |  3306 |        1025 | Slave, Running
percona3           | 192.168.90.4    |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

So as you can see, Maxscale discovers on its own which server is the master; this doesn’t need to be specified in the configuration.

You can also use mysqldrpladmin utility to verify the cluster’s health:

# /usr/bin/mysqlrpladmin --rpl-user=repl:replpercona --master=manager:percona@192.168.90.2:3306 --slaves=manager:percona@192.168.90.3:3306,manager:percona@192.168.90.4:3306  health
# Checking privileges.
#
# Replication Topology Health:
+---------------+-------+---------+--------+------------+---------+
| host          | port  | role    | state  | gtid_mode  | health  |
+---------------+-------+---------+--------+------------+---------+
| 192.168.90.2  | 3306  | MASTER  | UP     | ON         | OK      |
| 192.168.90.3  | 3306  | SLAVE   | UP     | ON         | OK      |
| 192.168.90.4  | 3306  | SLAVE   | UP     | ON         | OK      |
+---------------+-------+---------+--------+------------+---------+

Try it with --verbose ????

When we test with sysbench, and we stop the master, we can see that there are some errors due to disconnects. Also, during the promotion of the new master, sysbench can’t reconnect:

[  20s] queue length: 0, concurrency: 0
[  21s] threads: 8, tps: 2.00, reads: 28.00, writes: 8.00, response time: 107.61ms (95%), errors: 0.00, reconnects:  0.00
[  21s] queue length: 0, concurrency: 0
[  22s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  22s] queue length: 0, concurrency: 0
[  23s] threads: 8, tps: 1.00, reads: 14.00, writes: 4.00, response time: 100.85ms (95%), errors: 0.00, reconnects:  0.00
[  23s] queue length: 0, concurrency: 0
[  24s] threads: 8, tps: 0.00, reads: 11.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  24s] queue length: 0, concurrency: 1
[  25s] threads: 8, tps: 1.00, reads: 3.00, writes: 4.00, response time: 235.41ms (95%), errors: 0.00, reconnects:  0.00
[  25s] queue length: 0, concurrency: 0
[  26s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  26s] queue length: 0, concurrency: 0
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
[  27s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  27s] queue length: 0, concurrency: 3
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
[  28s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  28s] queue length: 0, concurrency: 4
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
[  29s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  29s] queue length: 0, concurrency: 5
[  30s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  30s] queue length: 0, concurrency: 5
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
[  31s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  31s] queue length: 0, concurrency: 7
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.5:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Target transaction rate: 1/sec
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
PANIC: unprotected error in call to Lua API (Failed to connect to the database)
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.5:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Target transaction rate: 1/sec
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
[   1s] threads: 8, tps: 1.99, reads: 27.93, writes: 7.98, response time: 211.49ms (95%), errors: 0.00, reconnects:  0.00
[   1s] queue length: 0, concurrency: 0
[   2s] threads: 8, tps: 1.00, reads: 14.00, writes: 4.00, response time: 51.01ms (95%), errors: 0.00, reconnects:  0.00
[   2s] queue length: 0, concurrency: 0
[   3s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[   3s] queue length: 0, concurrency: 0
[   4s] threads: 8, tps: 1.00, reads: 13.99, writes: 4.00, response time: 80.28ms (95%), errors: 0.00, reconnects:  0.00

It took 8 seconds to automatically failover.

Then we can see the status of the servers:

# maxadmin -pmariadb list serversServers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
percona1           | 192.168.90.2    |  3306 |          17 | Down
percona2           | 192.168.90.3    |  3306 |        1025 | Master, Running
percona3           | 192.168.90.4    |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

If we re-start percona1, we can see now:

# maxadmin -pmariadb list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
percona1           | 192.168.90.2    |  3306 |          17 | Running
percona2           | 192.168.90.3    |  3306 |        1025 | Master, Running
percona3           | 192.168.90.4    |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

To add the node again in the asynchronous replication as a slave, we need to use another MySQL utility, mysqlreplicate:

# mysqlreplicate --master=manager:percona@192.168.90.3 --slave=manager:percona@192.168.90.2 --rpl-user=repl:replpercona
# master on 192.168.90.3: ... connected.
# slave on 192.168.90.2: ... connected.
# Checking for binary logging on master...
# Setting up replication...
# ...done.

This is source of failover.sh:

#!/bin/bash
# failover.sh
# wrapper script to mysqlrpladmin
# user:password pair, must have administrative privileges.
user=manager:percona
# user:password pair, must have REPLICATION SLAVE privileges.
repluser=repl:replpercona
never_master=192.168.90.4
ARGS=$(getopt -o '' --long 'event:,initiator:,nodelist:' -- "$@")
eval set -- "$ARGS"
while true; do
    case "$1" in
        --event)
            shift;
            event=$1
            shift;
        ;;
        --initiator)
            shift;
            initiator=$1
            shift;
        ;;
        --nodelist)
            shift;
            nodelist=$1
            shift;
        ;;
        --)
            shift;
            break;
        ;;
    esac
done
# find the candidates
for i in $(echo $nodelist | sed s/,/n/g)
do
  if [[ "$i" =~ "$never_master" ]]
  then
     # do nothing
     echo nothing >/dev/null
  else
     if [[ "$i" =~ "$initiator" ]]
     then
	# do nothing
        echo nothing >/dev/null
     else
        candidates="$candidates,${user}@${i}"
     fi
  fi
  if [[ "$i" =~ "$initiator" ]]
  then
     # do nothing
     echo nothing >/dev/null
  else
     slaves="$slaves,${user}@${i}"
  fi
done
cmd="/usr/bin/mysqlrpladmin --rpl-user=$repluser --slaves=${slaves#?} --candidates=${candidates#?} failover"
# uncomment following line for debug
#echo $cmd >> /tmp/fred
eval $cmd

In the next post, we will focus on the monitoring module used in this configuration.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com