Running a separate disaster recovery (DR) cluster for production databases is now a necessity for any business that relies heavily on its database systems. Setting up such a [DC -> DR] topology for Percona XtraDB Cluster (PXC), which is a virtually synchronous cluster, can be challenging in a complex Kubernetes environment.
This is where Percona Operator for MySQL comes in handy: it needs only a few steps to configure such a topology, giving us a remote-side backup or disaster recovery solution.
So, without further ado, let’s see how the overall setup and configuration look from a practical standpoint.

DC Configuration
1) Here we have a three-node PXC cluster running on the DC side.
shell> kubectl get pods -n pxc
NAME                                               READY   STATUS      RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running     0          23h
cluster1-haproxy-1                                 2/2     Running     0          23h
cluster1-haproxy-2                                 2/2     Running     0          23h
cluster1-pxc-0                                     3/3     Running     0          23h
cluster1-pxc-1                                     3/3     Running     0          7h37m
cluster1-pxc-2                                     3/3     Running     0          7h18m
percona-xtradb-cluster-operator-6756dbf588-vxjxt   1/1     Running     0          24h
xb-backup1-hlz2p                                   0/1     Completed   0          21h
xb-cron-cluster1-fs-pvc-2026480026-372f8-2gfhr     0/1     Completed   0          13h
2) Some configuration options have to be enabled in the custom resource file [cr.yaml] to allow cross-site replication.
- Expose all source PXC nodes so they can be reached from outside, i.e. from the DR cluster.
expose:
  enabled: true
  type: LoadBalancer
- Define a dedicated replication channel and enable the source option.
replicationChannels:
- name: pxc1_to_pxc2
  isSource: true
- Finally, apply the custom resource changes.
shell> kubectl apply -f cr.yaml
3) Each PXC node now shows an “EXTERNAL-IP”. These are the endpoints that the DR node [cluster1-pxc-0] will use to connect to the DC.
shell> kubectl get svc
NAME                              TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                                          AGE
cluster1-haproxy                  ClusterIP      34.118.227.249   <none>          3306/TCP,3309/TCP,33062/TCP,33060/TCP,8404/TCP   4h1m
cluster1-haproxy-replicas         ClusterIP      34.118.225.41    <none>          3306/TCP                                         4h1m
cluster1-pxc                      ClusterIP      None             <none>          3306/TCP,33062/TCP,33060/TCP                     4h1m
cluster1-pxc-0                    LoadBalancer   34.118.234.140   34.29.145.138   3306:30425/TCP                                   4h1m
cluster1-pxc-1                    LoadBalancer   34.118.239.132   34.30.233.0     3306:31340/TCP                                   4h1m
cluster1-pxc-2                    LoadBalancer   34.118.236.64    35.225.0.19     3306:30642/TCP                                   4h1m
cluster1-pxc-unready              ClusterIP      None             <none>          3306/TCP,33062/TCP,33060/TCP                     4h1m
percona-xtradb-cluster-operator   ClusterIP      34.118.235.168   <none>          443/TCP                                          4h11m
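The EXTERNAL-IP values can also be collected with a small script instead of being copied by hand. The following is a minimal sketch, assuming the default column layout of `kubectl get svc`; the `external_ips` helper name is hypothetical and not part of kubectl or the operator.

```shell
# Print "NAME EXTERNAL-IP" for every LoadBalancer service found in the
# tabular output of `kubectl get svc` (read from stdin).
# Columns are located by header name, so an extra leading column is tolerated.
external_ips() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME") name_col = i
        if ($i == "TYPE") type_col = i
        if ($i == "EXTERNAL-IP") ip_col = i
      }
      next
    }
    # Skip services that have no address assigned yet.
    $type_col == "LoadBalancer" && $ip_col != "<none>" && $ip_col != "<pending>" {
      print $name_col, $ip_col
    }
  '
}
```

It would be used as `kubectl get svc | external_ips`. A service that still reports `<pending>` is skipped, which is a useful hint that the load balancer is not ready to be added to the DR configuration yet.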
At this point, we are done with the DC setup. Next, we will take a backup from the source, which we will later use to build the DR.
Backup
- Define the access key and secret used to connect to the GCP/S3 bucket.
shell> cat backup-secret-s3.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-backup-s3
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <KEY>
  AWS_SECRET_ACCESS_KEY: <SECRET>
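Values under `data:` in a Kubernetes Secret must be base64-encoded (plain-text values go under `stringData:` instead). A minimal sketch of producing the encoded values follows; the `encode_secret_value` helper name and the sample credential are hypothetical placeholders.

```shell
# Encode a value for use in a Secret's `data:` field.
# printf is used instead of echo so that no trailing newline ends up inside
# the encoded value; tr removes any line wrapping added by base64.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Placeholder credential, for illustration only.
encode_secret_value 'EXAMPLE-ACCESS-KEY-ID'
echo
```

The printed string is what would replace `<KEY>` (and, for the secret key, `<SECRET>`) in the manifest.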
- In the custom resource file [cr.yaml], we also need to define the bucket, the credentials secret, and the endpoint/region details.
backup:
  storages:
    s3-us-west:
      type: s3
      verifyTLS: true
      s3:
        bucket: <bucket>
        credentialsSecret: my-cluster-name-backup-s3
        region: us-west-2
        endpointUrl: https://storage.googleapis.com
…
shell> kubectl apply -f cr.yaml
- Finally, we can take the backup by creating a [backup.yaml] file with the details below.
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
#  finalizers:
#    - percona.com/delete-backup
  name: backup1
spec:
  pxcCluster: cluster1
  storageName: s3-us-west
…
shell> kubectl apply -f backup.yaml
- We can verify the successful backup as follows.
shell> kubectl get pxc-backup
NAME      CLUSTER    STORAGE      DESTINATION                                       STATUS      COMPLETED   AGE
backup1   cluster1   s3-us-west   s3://<bucket>/cluster1-2026-04-07-15:55:46-full   Succeeded   125m        127m
With the backup ready, we can now move on to the DR setup.
DR Configuration
Below we have a PXC setup similar to the one in the DC, running in a separate node/K8s cluster.
shell> kubectl get pods -n pxc-dr
NAME                                               READY   STATUS      RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running     0          35h
cluster1-haproxy-1                                 2/2     Running     0          35h
cluster1-haproxy-2                                 2/2     Running     0          35h
cluster1-pxc-0                                     3/3     Running     0          35h
cluster1-pxc-1                                     3/3     Running     0          35h
cluster1-pxc-2                                     3/3     Running     0          35h
percona-xtradb-cluster-operator-6756dbf588-2wc5m   1/1     Running     0          38h
prepare-job-restore1-cluster1-8h4vn                0/1     Completed   0          35h
restore-job-restore1-cluster1-trfg6                0/1     Completed   0          35h
xb-cron-cluster1-fs-pvc-2026480025-372f8-wv6bt     0/1     Completed   0          28h
xb-cron-cluster1-fs-pvc-2026490025-372f8-gxd59     0/1     Completed   0          4h48m
First, we need to restore the backup on the DR server.
Data Restoration
- Here we will create the [backup-secret-s3.yaml] file which contains the GCP/S3 credentials.
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-backup-s3
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <KEY>
  AWS_SECRET_ACCESS_KEY: <SECRET>
…
shell> kubectl apply -f backup-secret-s3.yaml
- Next, we will create a [restore.yaml] file that specifies the backup source and other required details.
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
#  annotations:
#    percona.com/headless-service: "true"
spec:
  pxcCluster: cluster1
  backupSource:
#    verifyTLS: true
    destination: s3://<bucket>/cluster1-2026-04-07-15:55:46-full
    s3:
      bucket: <bucket>
      credentialsSecret: my-cluster-name-backup-s3
      endpointUrl: https://storage.googleapis.com/
…
shell> kubectl apply -f restore.yaml
- Once the restoration is finished successfully, we will see the status below.
shell> kubectl get pxc-restore
NAME CLUSTER STATUS COMPLETED AGE
restore1 cluster1 Succeeded 27m
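Because the remaining DR changes come only after the restore has succeeded, the status check above can be scripted rather than polled by hand. The sketch below is a minimal example under stated assumptions: the `wait_for_state` helper and the `POLL_INTERVAL` variable are hypothetical, and reading the state from `.status.state` (with `Failed` as a terminal value) is an assumption about the operator's custom resources that should be verified against your operator version.

```shell
# Poll a command until it prints the wanted state.
#   $1      = state to wait for (e.g. Succeeded)
#   $2      = maximum number of attempts
#   $3...   = command that prints the current state
# Returns 0 when the state is reached, 1 on "Failed" or when attempts run out.
wait_for_state() {
  want=$1
  tries=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    state=$("$@" 2>/dev/null)
    [ "$state" = "$want" ] && return 0
    [ "$state" = "Failed" ] && return 1
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-10}"
  done
  return 1
}

# Example (assumes the state is exposed at .status.state):
#   wait_for_state Succeeded 60 \
#     kubectl get pxc-restore restore1 -o jsonpath='{.status.state}'
```

The same helper could be pointed at `pxc-backup backup1` on the DC side before starting the restore.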
Now we can make the remaining DR changes in the custom resource file [cr.yaml]. Basically, we need to add the replication channel and all source EXTERNAL-IPs. This cross-DC replication supports the Automatic Asynchronous Replication Connection Failover feature, so if any DC node goes down, the replica can connect to another available DC node and resume from there.
replicationChannels:
- name: pxc1_to_pxc2
  isSource: false
  sourcesList:
  - host: 34.29.145.138
    port: 3306
    weight: 100
  - host: 34.30.233.0
    port: 3306
    weight: 100
  - host: 35.225.0.19
    port: 3306
    weight: 100
…
shell> kubectl apply -f cr.yaml
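Since the external IPs can change (for example when the LoadBalancer services are recreated), it may help to generate the sourcesList entries rather than type them. This is a minimal sketch: the `sources_list` helper and the `SRC_PORT` / `SRC_WEIGHT` variables are hypothetical, and the defaults simply mirror the values used in this post.

```shell
# Print a sourcesList block with one entry per host given as an argument.
# SRC_PORT and SRC_WEIGHT are optional overrides for the defaults below.
sources_list() {
  port=${SRC_PORT:-3306}
  weight=${SRC_WEIGHT:-100}
  echo "sourcesList:"
  for host in "$@"; do
    printf '%s\n' "- host: $host" "  port: $port" "  weight: $weight"
  done
}

sources_list 34.29.145.138 34.30.233.0 35.225.0.19
```

The output still has to be indented to sit under the replication channel entry in cr.yaml before it is applied.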
For more on backup and restore with the PXC operator, see the manuals below.
- https://docs.percona.com/percona-operator-for-mysql/pxc/backups-ondemand.html
- https://docs.percona.com/percona-operator-for-mysql/pxc/backups-restore-to-new-cluster.html
Replication
Initially, when we check the replication status, we see the following error. This is because [caching_sha2_password] authentication requires either a secure SSL/TLS connection or RSA key pair-based password exchange. The latter is enabled with SOURCE_PUBLIC_KEY_PATH (a local copy of the source’s public key) or GET_SOURCE_PUBLIC_KEY (which requests the public key from the source).
shell> kubectl exec -it cluster1-pxc-0 -- sh
shell> mysql -uroot -p
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Connecting to source
Source_Host: 35.225.0.19
Source_User: replication
Source_Port: 3306
Connect_Retry: 60
Source_Log_File:
Read_Source_Log_Pos: 4
Relay_Log_File: cluster1-pxc-0-relay-bin-pxc1_to_pxc2.000001
Relay_Log_Pos: 4
Relay_Source_Log_File:
Replica_IO_Running: Connecting
Replica_SQL_Running: Yes
...
Error:
Last_IO_Error: Error connecting to source 'replication@35.225.0.19:3306'. This was attempt 2/3, with a delay of 60 seconds between attempts. Message: Access denied for user 'replication'@'35.225.0.19.' (using password: YES)
Once we pass “GET_SOURCE_PUBLIC_KEY” in the “CHANGE REPLICATION SOURCE TO” command, the error is resolved and the DR is able to communicate with the DC.
mysql> STOP REPLICA;
mysql> STOP REPLICA IO_THREAD FOR CHANNEL 'pxc1_to_pxc2';
mysql> CHANGE REPLICATION SOURCE TO SOURCE_USER='replication', SOURCE_PASSWORD='password', GET_SOURCE_PUBLIC_KEY=1 FOR CHANNEL 'pxc1_to_pxc2';
mysql> START REPLICA;
Note – The replication user is created automatically on the DC nodes. The command below retrieves the decoded password for the “replication” user.
shell> kubectl get secret cluster1-secrets -o jsonpath="{.data.replication}" | base64 --decode
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 35.225.0.19
Source_User: replication
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: binlog.000006
Read_Source_Log_Pos: 3047027
Relay_Log_File: cluster1-pxc-0-relay-bin-pxc1_to_pxc2.000001
Relay_Log_Pos: 150132
Relay_Source_Log_File: binlog.000006
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
...
The other DR PXC nodes will sync as usual through Galera synchronous replication.
Source Failover
Asynchronous connection failover is already enabled on the DR, as we defined it initially in the custom resource file. The external IPs shown here differ from the earlier ones because they changed during this testing scenario.
mysql> select * from performance_schema.replication_asynchronous_connection_failover;
+--------------+---------------+------+-------------------+--------+--------------+
| CHANNEL_NAME | HOST          | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME |
+--------------+---------------+------+-------------------+--------+--------------+
| pxc1_to_pxc2 | 34.29.145.138 | 3306 |                   |    100 |              |
| pxc1_to_pxc2 | 34.45.151.96  | 3306 |                   |    100 |              |
| pxc1_to_pxc2 | 34.71.57.38   | 3306 |                   |    100 |              |
+--------------+---------------+------+-------------------+--------+--------------+
3 rows in set (0.00 sec)
Now, if the current source DC node [cluster1-pxc-2] goes down, the DR will connect to one of the other available DC nodes, chosen by “Weight” and then by chronological order [pxc-2, pxc-1, pxc-0, etc.].
- Here, we temporarily take down the Source DC[cluster1-pxc-2] node.
shell> kubectl get pods -n pxc
NAME                                               READY   STATUS      RESTARTS      AGE
cluster1-haproxy-0                                 2/2     Running     0             2d3h
cluster1-haproxy-1                                 2/2     Running     0             2d3h
cluster1-haproxy-2                                 2/2     Running     0             2d3h
cluster1-pxc-0                                     3/3     Running     0             2d3h
cluster1-pxc-1                                     3/3     Running     0             35h
cluster1-pxc-2                                     2/3     Running     1 (6s ago)    34h
percona-xtradb-cluster-operator-6756dbf588-vxjxt   1/1     Running     0             2d3h
xb-backup1-hlz2p                                   0/1     Completed   0             2d1h
xb-cron-cluster1-fs-pvc-2026480026-372f8-2gfhr     0/1     Completed   0             41h
xb-cron-cluster1-fs-pvc-2026490026-372f8-mgfpv     0/1     Completed   0             17h
- The DR replication breaks as it can’t reach the DC [cluster1-pxc-2].
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Reconnecting after a failed source event read
Source_Host: 34.71.57.38
Source_User: replication
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: binlog.000012
Read_Source_Log_Pos: 198
Relay_Log_File: cluster1-pxc-0-relay-bin-pxc1_to_pxc2.000002
Relay_Log_Pos: 369
Relay_Source_Log_File: binlog.000012
Replica_IO_Running: Connecting
Replica_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Source_Log_Pos: 198
Relay_Log_Space: 602
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: Yes
Last_IO_Errno: 2003
Last_IO_Error: Error reconnecting to source 'replication@34.71.57.38:3306'. This was attempt 2/3, with a delay of 60 seconds between attempts. Message: Can't connect to MySQL server on '34.71.57.38:3306' (111)
- Once the retries governed by “source_retry_count” and “source_connect_retry” are exhausted, the replica connects to another source DC node [cluster1-pxc-1].
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 34.45.151.96
Source_User: replication
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: binlog.000007
Read_Source_Log_Pos: 198
Relay_Log_File: cluster1-pxc-0-relay-bin-pxc1_to_pxc2.000003
Relay_Log_Pos: 369
Relay_Source_Log_File: binlog.000007
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
...
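As a rough guide to how long the failover above takes, the replica retries the failed source a fixed number of times with a fixed delay before moving on. The sketch below works through that arithmetic with the values visible in the error message (3 attempts, 60 seconds apart); the `failover_wait_seconds` helper is hypothetical, and the result is only an approximate upper bound that ignores per-attempt connection timeouts.

```shell
# Approximate time, in seconds, the replica spends retrying the failed
# source before switching: number of attempts x delay between attempts.
failover_wait_seconds() {
  retry_count=$1
  connect_retry=$2
  echo $(( retry_count * connect_retry ))
}

# Values taken from the Last_IO_Error message ("attempt 2/3, with a delay
# of 60 seconds between attempts").
failover_wait_seconds 3 60   # prints 180
```

Lowering either setting on the replication channel would shorten this window, at the cost of failing over more readily on brief network interruptions.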
Quick Summary
In this blog post, we walked through the steps to configure cross-site replication with the Percona PXC operator. Although we used the operator-native XtraBackup to feed the data to the DR via the restore process, logical backup tools such as mysqldump or mydumper can accomplish the same goal.
Using asynchronous replication to sync the DR can introduce replication lag, both because of how it works and, more importantly, because network latency is a big factor when working across data centres. However, adding a DR (PXC) cluster to the DC (PXC) cluster directly via synchronous replication could have a bigger impact, leading to flow control issues if any DR node struggles or hits performance/saturation problems. So it’s important to consider all of these trade-offs before deploying in production.
The post Deploying Cross-Site Replication in Percona Operator for MySQL (PXC) appeared first on Percona.
