Percona XtraDB Cluster: Our Commitment to Open Source High Availability
At Percona, we’ve always been dedicated to providing robust, open source solutions that meet our users’ evolving needs. Percona XtraDB Cluster (PXC) stands as a testament to this commitment, offering a highly available and scalable solution for your MySQL and Percona Server for MySQL deployments. We understand that database high availability is critical for your […]
GCache and Record-Set Cache Encryption in Percona XtraDB Cluster – Part Two
Keeping Your Data Safe: An Introduction to Data-at-Rest Encryption in Percona XtraDB Cluster. In the first part of this blog post, we learned how to enable GCache and Record-Set cache encryption in Percona XtraDB Cluster. This part will explore the details of the implementation to understand what happens behind the scenes. How does it work […]
GCache and Record-Set Cache Encryption in Percona XtraDB Cluster – Part One
Keeping Your Data Safe: An Introduction to Data-at-Rest Encryption in Percona XtraDB Cluster. Data breaches are a major concern today. Whether you’re a business owner or value your privacy, the thought of your sensitive information being exposed is unsettling. Fortunately, data-at-rest encryption can help. Think of it as a sophisticated lock that scrambles your data, […]
Consequences of Forcing Bootstrap on Percona XtraDB Cluster
Recently, I was working on a very unfortunate case that revolved around diverging clusters, data loss, overlooked error log messages, and forced commands on Percona XtraDB Cluster (PXC). Even though PXC tries its best to explain what happens in the error log, I can vouch that those messages are easy to miss when you do not know what to expect.
This blog post is a cautionary tale and an invitation to try it yourself and break things (not in production, right?).
TLDR:
Can you tell right away what happened when you see this log?
2023-06-22T08:23:29.003334Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:group_post_state_exchange():433: Reversing history: 171 -> 44, this member has applied 127 more events than the primary component.Data loss is possible. Must abort.
Demonstration
Using the great https://github.com/datacharmer/dbdeployer:
$ dbdeployer deploy replication --topology=pxc --sandbox-binary=~/opt/pxc 8.0.31
Let’s write some data
$ ./sandboxes/pxc_msb_8_0_31/sysbench oltp_read_write --tables=2 --table-size=1000 prepare
Then let’s suppose someone wants to restart node 1. For some reason, they read somewhere in your internal documentation that they should bootstrap in that situation. With dbdeployer, this will translate to:
$ ./sandboxes/pxc_msb_8_0_31/node1/stop
stop /home/yoann-lc/sandboxes/pxc_msb_8_0_31/node1
$ ./sandboxes/pxc_msb_8_0_31/node1/start --wsrep-new-cluster
......................................................................................................^C
It fails, as it should.
In reality, those bootstrap mistakes happen in homemade start scripts, Puppet or Ansible modules, or even internal procedures applied in the wrong situation.
Why did it fail? First error to notice:
2023-06-22T08:00:48.322148Z 0 [ERROR] [MY-000000] [Galera] It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
Reminder: Bootstrap should only be used when every node has been double-checked to be down; it’s a manual operation. It fails here because it was not forced and because this node was not the last to be stopped in the cluster.
Good reflex: Connect to the other mysql nodes and check the ‘wsrep_cluster_size’ and ‘wsrep_cluster_status’ statuses before doing anything.
mysql> show global status where variable_name IN ('wsrep_local_state','wsrep_local_state_comment','wsrep_local_commits','wsrep_received','wsrep_cluster_size','wsrep_cluster_status','wsrep_connected');
Do not: Blindly apply what this log is telling you to do.
But we are here to “fix” around and find out, so let’s bootstrap.
$ sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' ./sandboxes/pxc_msb_8_0_31/node1/data/grastate.dat
$ ./sandboxes/pxc_msb_8_0_31/node1/start --wsrep-new-cluster
.. sandbox server started
At this point, notice that from node1, you have:
$ ./sandboxes/pxc_msb_8_0_31/node1/use -e "show global status where variable_name in ('wsrep_cluster_status', 'wsrep_cluster_size')"
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_size   | 1       |
| wsrep_cluster_status | Primary |
+----------------------+---------+
But from node2 and node3 you will have:
$ ./sandboxes/pxc_msb_8_0_31/node2/use -e "show global status where variable_name in ('wsrep_cluster_status', 'wsrep_cluster_size')"
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_size   | 2       |
| wsrep_cluster_status | Primary |
+----------------------+---------+
Looks fishy. But does your monitoring really alert you to this?
Let’s write some more data, obviously on node1, because why not? It looks healthy.
$ ./sandboxes/pxc_msb_8_0_31/node1/sysbench oltp_delete --tables=2 --table-size=1000 --events=127 run
127 will be useful later on.
Nightmare ensues
Fast forward a few days. You are still writing to your node. A new reason to restart node1 comes up; maybe you want to apply a parameter change.
$ ./sandboxes/pxc_msb_8_0_31/node1/restart
.............................................................................................................................................................^C
It fails?
Reviewing logs, you would find:
$ less sandboxes/pxc_msb_8_0_31/node1/data/msandbox.err
...
2023-06-22T08:23:29.003334Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:group_post_state_exchange():433: Reversing history: 171 -> 44, this member has applied 127 more events than the primary component.Data loss is possible. Must abort.
...
Voilà, we find our “127” again.
Good reflex: It depends. Proper recovery would need a post of its own, but this is a serious problem.
Do not: Force SST on this node. Because it will work, and all the data inserted on node1 will be lost.
What does it mean?
When forcing bootstrap, a node will always start. It will never try to connect to the other nodes, even if they are healthy. The other nodes won’t try to connect to it either; from their point of view, it simply never joined, so it’s not part of the cluster.
When the previously bootstrapped node1 is restarted in non-bootstrap mode, that’s the first time the nodes have all seen each other in a while.
Each time a transaction is committed, it is replicated along with a sequence number (seqno). The seqno is an ever-growing number. It is used by nodes to determine if incremental state transfer is possible, or if a node state is coherent with others.
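As a quick illustration using the dbdeployer sandbox from this demonstration, you can compare the last committed seqno of each node through the wsrep_last_committed status variable:

$ ./sandboxes/pxc_msb_8_0_31/node1/use -e "show global status like 'wsrep_last_committed'"
$ ./sandboxes/pxc_msb_8_0_31/node2/use -e "show global status like 'wsrep_last_committed'"

On a diverged cluster like the one built above, the two values will not match.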
Now that node1 is no longer in bootstrap mode, it connects to the other members and shares its state (last primary members, seqno). The other nodes correctly notice that this seqno looks suspicious: it is higher than their own, meaning the joining node could have applied more transactions. It could also mean it comes from some other cluster.
Because the nodes are in doubt, nothing happens. Node1 is denied joining and will not do anything: it won’t try to resynchronize automatically, and it won’t touch its data. Node2 and node3 are not impacted; they are kept as they are too.
How to proceed from there depends on the situation; there are no general guidelines. Ideally, a source of truth should be found. If both clusters applied writes, that’s the toughest situation to be in: a split-brain.
Note: seqnos are just numbers. Equal seqnos do not actually guarantee that the underlying applied transactions are identical, but they are still useful as a simple sanity check. If we were to mess around even more and apply 127 transactions on node2, or even modify the seqno manually in grastate.dat, we could get “interesting” results. Try it out (not in production, mind you)!
Note: If you are unaware of bootstrapping and how to properly recover, check out the documentation.
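As a rough sketch of the usual recovery flow when the whole cluster is down (this complements, but does not replace, the documentation), the idea is to bootstrap only the most advanced node:

# with every node confirmed down, check the recorded seqno on each node
$ cat ./sandboxes/pxc_msb_8_0_31/node1/data/grastate.dat   # repeat for node2 and node3
# a seqno of -1 means an unclean shutdown; running mysqld with --wsrep-recover
# prints the recovered position in the error log
# set safe_to_bootstrap: 1 only on the node with the highest seqno, bootstrap it,
# and let the other nodes rejoin normally (they will sync via IST/SST)

Paths here follow the dbdeployer sandbox used above; on a regular installation, grastate.dat lives in the MySQL data directory.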
Conclusion
Bootstrap is a last-resort procedure; don’t force it lightly. Do not force SST right away if a node refuses to join, either. You should always check the error log first.
Fortunately, PXC does not blindly let any node join without some sanity checks.
Exploring Data Dump and Load Utility With MySQL Shell
In this blog post, I will try to cover a few dump and data-loading utilities offered by MySQL Shell.
What is MySQL Shell?
It is a robust client and code editor for MySQL. In addition to having APIs for dealing with MySQL, MySQL Shell offers scripting capabilities for JavaScript and Python. It provides an AdminAPI to manage MySQL servers and create InnoDB Cluster and ReplicaSet deployments. It supports the X DevAPI to work on both SQL and NoSQL interfaces via the X Protocol. Finally, it provides utilities to make working with MySQL in Python and JavaScript mode easier.
Now, let’s jump into some practical stuff and see what these tools offer us.
Dump Utility
The dump utility consists of three tools: util.dumpInstance(), util.dumpSchemas(), and util.dumpTables(), so let’s discuss them one by one.
1. “util.dumpInstance()” – This was introduced in MySQL Shell 8.0.21 and supports the export of all databases, excluding the information_schema, mysql, ndbinfo, performance_schema, and sys schemas. The backups can be kept locally or can be taken remotely using the MySQL Shell utility.
Performing a complete instance dump with “util.dumpInstance()”:
a) Log in to the database via MySQL Shell.
shell> mysqlsh --uri root@localhost
b) Prior to doing the actual backup, we will first perform a dryRun procedure to check for any potential problems or compatibility issues.
mysql shell > util.dumpInstance("/home/backup/", {dryRun: true})
Output:
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Writing DDL - done
Starting data dump
0% (0 rows / ~1.60M rows), 0.00 rows/s, 0.00 B/s uncompressed, 0.00 B/s compressed
So, here we didn’t get any issues or warnings so we are good to proceed with the actual backup process.
c) Running the full instance backup and saving in the target location.
mysql shell >util.dumpInstance("/home/backup/")
Output:
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
101% (1.63M rows / ~1.60M rows), 114.56K rows/s, 22.13 MB/s uncompressed, 10.09 MB/s compressed
Dump duration: 00:00:15s
Total duration: 00:00:15s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 20.01 MB/s
Average compressed throughput: 9.12 MB/s
We have successfully finished the backup here. By default, it uses four threads and applies the zstd compression algorithm to the database backup.
d) Running the instance dump with more advanced options.
mysql shell > util.dumpInstance("/home/backup/", {threads: 8, maxRate: "100M", consistent: true, chunking: true, bytesPerChunk: "64M", compression: "zstd"})
Output:
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Running data dump using 8 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
101% (1.63M rows / ~1.60M rows), 468.47K rows/s, 84.22 MB/s uncompressed, 38.41 MB/s compressed
Dump duration: 00:00:03s
Total duration: 00:00:03s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 85.24 MB/s
Average compressed throughput: 38.84 MB/s
There are various options available to control and enhance the dump process.
- “dryRun”: Lists all of the compatibility issues before starting the dump.
- “consistent”: Automatically acquires a global read lock in order to provide a consistent backup. Enabled by default.
- “chunking”: Splits each table's data into a number of files. Enabled by default.
- “bytesPerChunk”: Specifies the approximate number of bytes that should be written for each data chunk.
- “threads”: The number of parallel threads to employ when dumping data from the MySQL instance. Four threads are used by default.
- “maxRate”: Maximum data read throughput for each thread during the dump, in bytes per second.
2) “util.dumpSchemas” – This was introduced in MySQL Shell 8.0.21 and supports the export of selected databases either locally or remotely.
Performing a database dump with “util.dumpSchemas()”. Here we will back up only a specific database (“sbtest”).
mysql shell > util.dumpSchemas(["sbtest"],"/home/backup/sbtest/",{threads :2})
Output:
Acquiring global read lock
Global read lock acquired
Initializing - done
1 schemas will be dumped and within them 16 tables, 0 views.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Running data dump using 2 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
104% (1.63M rows / ~1.56M rows), 424.56K rows/s, 81.96 MB/s uncompressed, 37.35 MB/s compressed
Dump duration: 00:00:03s
Total duration: 00:00:03s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 81.82 MB/s
Average compressed throughput: 37.28 MB/s
3) “util.dumpTables” – In MySQL Shell 8.0.22, the table dump utility “util.dumpTables()” was released.
Performing a selective table dump. Here we dump only the table “sbtest2” from the “sbtest” database.
mysql shell> util.dumpTables("sbtest", [ "sbtest2"], "/home/backup/sbtest/",{threads: 4})
Output:
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
101% (101.89K rows / ~100.58K rows), 0.00 rows/s, 0.00 B/s uncompressed, 0.00 B/s compressed
Dump duration: 00:00:00s
Total duration: 00:00:00s
Schemas dumped: 1
Tables dumped: 1
Uncompressed data size: 19.61 MB
Compressed data size: 8.93 MB
Compression ratio: 2.2
Rows written: 101892
Bytes written: 8.93 MB
Average uncompressed throughput: 19.61 MB/s
Average compressed throughput: 8.93 MB/s
Data load utility
So far, we’ve seen how to use various methods to take a dump. We will now demonstrate how to restore the same using a single restoration command for all sorts of backups.
The utility “util.loadDump()” for loading dumps was introduced in MySQL Shell 8.0.21. It enables the parallel loading of tables or table chunks during the data-loading process.
Load the dump files using util.loadDump().
a) Running a dry run process to validate any compatibility issues or errors prior to the actual restoration process.
mysql shell > util.loadDump("/home/backup/", {dryRun: true})
Output:
Loading DDL and Data from '/home/backup/' using 4 threads.
Opening dump...
dryRun enabled, no changes will be made.
Target is MySQL 5.7.38-41-57-log. Dump was produced from MySQL 5.7.38-41-57-log
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
Executing common postamble SQL
0% (0 bytes / 313.55 MB), 0.00 B/s, 16 / 16 tables done
Recreating indexes - done
No data loaded.
0 warnings were reported during the load.
Note – No data is loaded; the dry run just prints information about any warnings or errors that would occur during the restore.
b) Running the data restoration process.
mysql shell > util.loadDump("/home/backup/", {progressFile :"/home/backuplog.json",threads:4,backgroundThreads:4,maxBytesPerTransaction:"4096"})
Output:
Loading DDL and Data from '/home/backup/' using 4 threads.
Opening dump...
Target is MySQL 5.7.38-41-57-log. Dump was produced from MySQL 5.7.38-41-57-log
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
Executing common postamble SQL
100% (313.55 MB / 313.55 MB), 3.03 MB/s, 16 / 16 tables done
Recreating indexes - done
16 chunks (1.63M rows, 313.55 MB) for 16 tables in 1 schemas were loaded in 2 min 43 sec (avg throughput 1.92 MB/s)
0 warnings were reported during the load.
Here, we have successfully completed the restoration process.
There are various options available to control and enhance the data loading process.
- “threads”: The number of parallel threads to employ when loading data into the target MySQL instance. Four threads are used by default.
- “backgroundThreads”: The number of threads in the pool of background threads used to fetch the file data.
- “maxBytesPerTransaction”: The maximum number of bytes that can be loaded from a data file with a single LOAD DATA statement.
- “loadIndexes”: When set to false, secondary indexes are not built during the import; you must build them afterwards. This may come in handy if you're loading the DDL files and data files separately and want to modify the table structure after the DDL files have been loaded.
- “deferTableIndexes”: Postpones the creation of secondary indexes until after the table's contents are loaded, which may speed up the load. "off" means that all indexes are built as the table is loaded.
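For instance, a load that defers building secondary indexes until the table data is in place might look like the sketch below (same dump directory as above; the option values are illustrative):

mysql shell > util.loadDump("/home/backup/", {threads: 4, deferTableIndexes: "all"})

With loadIndexes: false instead, the indexes are skipped entirely and must be created manually later.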
Taking backups on cloud storage (GCP and AWS S3)
MySQL Shell also provides the capability to store dumps remotely in S3-compatible cloud storage such as Amazon S3 and Google Cloud Storage. Here, we’ll talk about how to do it with Google Cloud Storage and an Amazon S3 environment.
Let’s start by configuring it for Google Cloud storage.
Prerequisites
- Storage Bucket should exist in Google Cloud
- Access keys and secrets defined in a file(“/home/credentials”)
[gcb]
aws_access_key_id=xxxxxxxx
aws_secret_access_key=xxxxxxxx
- GCP cloud storage endpoint – “https://storage.googleapis.com”
a) Running full backup and putting it in the GCP bucket “ajpxctest”.
mysql shell> util.dumpInstance("sbtest",{s3BucketName: "ajpxctest",s3EndpointOverride: "https://storage.googleapis.com", s3CredentialsFile:"/home/credentials", s3Profile: "gcb", threads: 4})
Output:
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
104% (1.63M rows / ~1.56M rows), 172.12K rows/s, 34.60 MB/s uncompressed, 15.75 MB/s compressed
Dump duration: 00:00:12s
Total duration: 00:00:14s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 25.21 MB/s
Average compressed throughput: 11.49 MB/s
b) Then, using these same details, we will restore the data from the S3 bucket into the local database.
mysql shell> util.loadDump("sbtest",{s3BucketName: "ajpxctest", s3EndpointOverride: "https://storage.googleapis.com",s3CredentialsFile:"/home/credentials", s3Profile: "gcb", threads: 4})
Output:
Loading DDL and Data from AWS S3 bucket=ajpxctest, prefix='sbtest' using 4 threads.
Opening dump...
Target is MySQL 5.7.38-41-57-log. Dump was produced from MySQL 5.7.38-41-57-log
Fetching dump data from remote location...
Listing files - done
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
1 thds loading / 100% (313.55 MB / 313.55 MB), 6.23 MB/s, 14 / 16 tables done
Recreating indexes - done
Executing common postamble SQL
16 chunks (1.63M rows, 313.55 MB) for 16 tables in 1 schemas were loaded in 1 min 8 sec (avg throughput 7.41 MB/s)
0 warnings were reported during the load.
S3 options:
- “s3BucketName”: The name of the S3 bucket where the dump will be kept.
- “s3EndpointOverride”: The S3 endpoint or URL.
- “s3CredentialsFile”: The credentials file containing the key and secret details.
- “s3Profile”: The name of the S3 profile that identifies the credential details.
Let’s configure the same for Amazon S3
Prerequisites
- AWS S3 bucket should exist
- Configure AWS CLI
###~/.aws/credentials###
[default]
aws_access_key_id=xxxxxxx
aws_secret_access_key=xxxxxxx

###~/.aws/config###
[default]
region = us-east-1
a) Now, let’s perform the instance backup, which stores the dump in the “ajtestbkp” S3 bucket.
mysql shell> util.dumpInstance("sbtest",{s3BucketName: "ajtestbkp", threads: 4})
Output:
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
101% (1.63M rows / ~1.61M rows), 287.53K rows/s, 53.81 MB/s uncompressed, 24.51 MB/s compressed
Dump duration: 00:00:06s
Total duration: 00:00:07s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 49.93 MB/s
Average compressed throughput: 22.75 MB/s
b) Now, let’s restore the dump from S3 into our local database.
mysql shell> util.loadDump("sbtest",{s3BucketName: "ajtestbkp", threads: 4})
Output:
Loading DDL and Data from AWS S3 bucket=ajtestbkp, prefix='sbtest' using 4 threads.
Opening dump...
Target is MySQL 5.7.38-41-57-log. Dump was produced from MySQL 5.7.38-41-57-log
Fetching dump data from remote location...
Listing files - done
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
1 thds loading / 100% (313.55 MB / 313.55 MB), 8.80 MB/s, 15 / 16 tables done
Recreating indexes - done
Executing common postamble SQL
16 chunks (1.63M rows, 313.55 MB) for 16 tables in 1 schemas were loaded in 38 sec (avg throughput 9.14 MB/s)
0 warnings were reported during the load.
Summary
These utilities are a great addition to MySQL Shell and are very useful for backing up and restoring large tables, thanks to the different control options. We can handle the dump/load process better by defining options such as threads, maxRate, and maxBytesPerTransaction. The dump utilities support MySQL 5.6, 5.7, and 8.0; for data loading, however, the target database should be MySQL 5.7 or 8.0.
Percona XtraDB Cluster on Amazon EC2 and Two Interesting Changes in PXC 8.0
This article outlines the basic configurations for setting up and deploying Percona XtraDB Cluster 8.0 (PXC) on Amazon EC2, as well as what is new in the setup compared to Percona XtraDB Cluster 5.7.
What is Percona XtraDB Cluster an ideal fit for?
Percona XtraDB Cluster is a cost-effective, high-performance clustering solution for mission-critical data. It combines all the improvements and functionality found in MySQL 8 with Percona Server for MySQL’s enterprise features and Percona’s upgraded Galera library.
A Percona XtraDB Cluster environment is an ideal fit for applications requiring “five nines” (99.999%) uptime and high read workloads; industries like financial or healthcare businesses that require in-house or externally dedicated database resources.
How is a three-node cluster configured in an EC2 environment?
To describe the setup procedure, I’ll be using Amazon EC2 instances to build the environment; depending on business requirements, alternative infrastructure may be used to build the cluster environment.
Amazon EC2 is designed to provide uptime and high availability. When designing the architecture in an EC2 environment, it is preferable to place nodes in different Availability Zones so that the loss of an entire AZ does not mean the loss of the cluster and its data.
If a different region is planned for a node, we can also survive the loss of an entire region and its data. It is desirable that the nodes and regions have appropriate network connectivity, because the network latency between the two regions adds to the synchronous replication write latency. Alternatively, an async replica in a different region is an option.
I’m not going into too much depth on Amazon EC2 to keep this blog brief and readable.
To build the three-node Percona XtraDB Cluster 8.0 environment, we first spin up the following three nodes in EC2. I’m using Amazon Linux, but you can also use Ubuntu or any of the Percona-supported operating systems.
It is advised that a cluster’s nodes all have the same configuration.
PXCNode1 IP Address: 172.31.18.119
PXCNode2 IP Address: 172.31.86.114
PXCNode3 IP Address: 172.31.26.152
To install the Percona repository on all three nodes, use the following command.
$ sudo yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
Enable the Percona XtraDB Cluster 8.0 repository on all three nodes by running the following command.
$ sudo percona-release setup pxc-80
* Disabling all Percona Repositories
* Enabling the Percona XtraDB Cluster 8.0 repository
* Enabling the Percona Tools repository
<*> All done!
$
Using the following command, install the Percona XtraDB Cluster packages and software on all three nodes.
$ sudo yum install percona-xtradb-cluster
Before starting the nodes, update the basic variables listed below on each node.
The following default variables must be modified for the initial installation. Those coming from PXC 5.7 might wonder why wsrep_sst_auth is missing. The wsrep_sst_auth variable was removed in PXC 8 due to security concerns, as the user and password were saved in the config file and easily visible to OS users.
In PXC 8, a temporary user is created when a new node joins the existing cluster. For additional details on this security enhancement, check this article.
$ vi /etc/my.cnf
..
######## wsrep ###############
wsrep_provider=/usr/lib64/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://172.31.18.119,172.31.86.114,172.31.26.152
wsrep_slave_threads=8
wsrep_log_conflicts
wsrep_node_address=172.31.18.119
wsrep_cluster_name=pxc-cluster
#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=pxc-cluster-node-1
wsrep_sst_method=xtrabackup-v2
..
Percona XtraDB Cluster nodes utilize the following ports by default, so we must open them and ensure that the nodes can communicate with one another.
- 3306 is used for MySQL client connections and SST (State Snapshot Transfer) via mysqldump.
- 4444 is used for SST via Percona XtraBackup.
- 4567 is used for write-set replication traffic (over TCP) and multicast replication (over TCP and UDP).
- 4568 is used for IST (Incremental State Transfer).
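As a rough sketch of opening these ports on the host with firewalld (on EC2, the same rules are usually added to the instances’ security group instead; adjust to your environment):

# open the PXC ports for TCP, plus 4567 for UDP multicast
$ for port in 3306 4444 4567 4568; do sudo firewall-cmd --permanent --add-port=${port}/tcp; done
$ sudo firewall-cmd --permanent --add-port=4567/udp
$ sudo firewall-cmd --reload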
For example, to test access.
Node 1
# socat - TCP-LISTEN:4444
hello

Node 2
# echo "hello" | socat - TCP:172.31.18.119:4444
#
How is the first node bootstrapped?
After configuring each PXC node, you must bootstrap the cluster starting with the first node. All of the data that you wish to replicate to additional nodes must be present on the first node.
Run the command to bootstrap the first node.
# systemctl start mysql@bootstrap.service #
Since this is a brand new install, a temporary password is generated for the ‘root’ MySQL user, which we can find in mysqld.log.
# grep -i "A temporary password is generated " /var/log/mysqld.log
2022-09-13T06:52:37.700818Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: PmqQiGf*#6iy
#
Reset the temporary password using the following statement.
$ mysql -uroot -p
Enter password:
mysql> SET PASSWORD = 'GdKG*12#ULmE';
Query OK, 0 rows affected (0.03 sec)
How can the cluster’s remaining nodes be joined?
Before starting node2, you must copy the SSL certificates from node1 to node2 (and to node3). PXC 8 by default encrypts all replication communication, so this is a critical step that most users miss, causing cluster startup failures.
PXCNode1# scp /var/lib/mysql/*.pem 172.31.86.114:/var/lib/mysql/
PXCNode2# chown -R mysql.mysql /var/lib/mysql/*.pem
Start the node.
PXCNode2# systemctl start mysql
#
Verify the following status variables and make sure they appear as below once the node has been added to the cluster.
PXCNode2$ mysql -uroot -p -e " show global status where variable_name IN ('wsrep_local_state','wsrep_local_state_comment','wsrep_cluster_size','wsrep_cluster_status','wsrep_connected','wsrep_ready');"
Enter password:
+---------------------------+---------+
| Variable_name             | Value   |
+---------------------------+---------+
| wsrep_cluster_size        | 2       |
| wsrep_cluster_status      | Primary |
| wsrep_connected           | ON      |
| wsrep_local_state         | 4       |
| wsrep_local_state_comment | Synced  |
| wsrep_ready               | ON      |
+---------------------------+---------+
The third node may be added to the cluster using the same procedures, and its status will then look as follows.
PXCNode3$ mysql -uroot -p -e " show global status where variable_name IN ('wsrep_local_state','wsrep_local_state_comment','wsrep_cluster_size','wsrep_cluster_status','wsrep_connected','wsrep_ready');"
Enter password:
+---------------------------+---------+
| Variable_name             | Value   |
+---------------------------+---------+
| wsrep_cluster_size        | 3       |
| wsrep_cluster_status      | Primary |
| wsrep_connected           | ON      |
| wsrep_local_state         | 4       |
| wsrep_local_state_comment | Synced  |
| wsrep_ready               | ON      |
+---------------------------+---------+
Additional supporting factors
Additionally, the use of failover technologies like ProxySQL is advised; they help remove failing nodes from the active read pool and, in some cases, shift the primary.
It is advised to have a backup in place; the open source tool Percona XtraBackup takes physical copies of the dataset, which are significantly faster to restore. It is also strongly advised to back up the binary logs using mysqlbinlog in order to do point-in-time recovery. Backups should be encrypted, compressed, and transferred to S3 as soon as possible.
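As an illustrative sketch (the host, user, and file names are placeholders), the binary logs can be streamed continuously with mysqlbinlog and then compressed and encrypted before upload:

# stream binlogs continuously from the source server into the backup directory
$ mysqlbinlog --read-from-remote-server --host=172.31.18.119 --user=backup -p --raw --stop-never binlog.000001
# in a separate job, compress and encrypt each closed binlog file
$ gzip binlog.000001 && gpg --encrypt --recipient backups@example.com binlog.000001.gz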
To copy the binlogs to an S3 bucket, the command looks like this.
aws s3 sync /backups/binlogs/ s3://backup-bucket/
upload: ../../../../backups/binlogs/binlog.000001.gz.gpg to s3://backup-bucket/binlog.000001.gpg
For query analytics and time-based database performance insights, the open source tool Percona Monitoring and Management is highly recommended. It can be deployed from the Amazon Marketplace and needs to run on a separate host. It monitors operating system and MySQL metrics and provides sophisticated query analytics.
How to Build Percona XtraDB Cluster From Sources
Percona XtraDB Cluster (PXC) is a database clustering solution for MySQL. It ensures high availability, prevents downtime and data loss, and provides linear scalability for a growing environment.
Lately, the number of questions about how to build Percona software has increased. More and more people try to add their own patches, make modifications, and build the software themselves. But this raises the question of how to do it the same way Percona does, as sometimes a compiler flag can have a drastic impact on the final binary. This is a really essential question, so let’s look at how we build Percona XtraDB Cluster here at Percona.
At first glance, the build process is not difficult. Everyone knows the magic make install all command and assumes that, as a result, everything will be built and installed on the system. But databases are essential for applications, and everything needs to be done carefully.
Let’s divide the build process into the stages:
- Prepare build environments
- Prepare source tarballs
- Prepare source package
- Prepare binary packages
We can skip the third stage, but in that case, we wouldn’t be able to make repeatable builds, so I would suggest always creating source packages.
So let’s move forward and discuss each step.
One of the key points during a build is that the environment should be clean and all needed dependencies should be installed. The dependency list differs for each version. How do you get the correct dependency list? You can get all build requirements from the spec file (on rpm-based systems) or from the control file (on deb-based systems).
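For example (the file paths are placeholders for wherever the spec/control file lives in your checkout), the requirements can be read straight from those files or resolved with the distribution’s build-dep tooling, assuming the relevant plugins and deb-src entries are available:

# rpm-based: list BuildRequires from the spec file, or let dnf resolve and install them
$ grep BuildRequires percona-xtradb-cluster.spec
$ sudo dnf builddep percona-xtradb-cluster.spec
# deb-based: read Build-Depends from debian/control, or install them with apt
$ grep Build-Depends debian/control
$ sudo apt-get build-dep .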
As for the source tarball, you can easily obtain it in one of the following ways:
- Get source tarball from the website
We publish source tarball for each release we issue, and you can easily get it for any released version (this is the link for the 8.0.28 version):
- Download tarball from GitHub
Here is the link:
You can use build instructions from our website and proceed with the compilation process:
https://www.percona.com/doc/percona-xtradb-cluster/LATEST/install/compile.html#compile
So everything looks easy. But how do we prepare our release packages internally? As I mentioned earlier, each compiler option can have a significant effect.
Everyone tries to make life easier and automate tasks. That’s great, as automation is one of the keys to success: you can work on other tasks while the release build is in progress. So we have created a build script that is used for making builds. It can be found in the PXC GitHub repo and can be used by anyone to make their own builds.
This script can install all needed dependencies for the build environment and create source RPMs and debs, binary RPMs and debs, and the binary tarball, so it covers the whole build cycle. So how do we use it?
The script has various params:
--builddir=DIR        Absolute path to the dir where all actions will be performed
--get_sources         Source will be downloaded from github
--build_src_rpm       If it is 1 src rpm will be built
--build_source_deb    If it is 1 source deb package will be built
--build_rpm           If it is 1 rpm will be built
--build_deb           If it is 1 deb will be built
--build_tarball       If it is 1 tarball will be built
--install_deps        Install build dependencies (root privileges are required)
--branch              Branch for build
--repo                Repo for build
--rpm_release         RPM version (default = 1)
--deb_release         DEB version (default = 1)
--debug               Build debug tarball
So let’s see how we can make the build process easier:
1. Download the build script:
wget https://raw.githubusercontent.com/percona/percona-xtradb-cluster/8.0/build-ps/pxc_builder.sh
2. Create a build directory where you are going to perform all build actions:
mkdir /tmp/BUILD_PXC
3. Install dependencies(please note root permissions are required):
sudo ./pxc_builder.sh --builddir=/tmp/BUILD_PXC --install_deps=1
4. Download source code:
– From Percona repo (it is used by default):
./pxc_builder.sh --builddir=/tmp/BUILD_PXC --get_sources=1 --branch=Percona-XtraDB-Cluster-8.0.28-19
– From your own repo and branch:
./pxc_builder.sh --builddir=/tmp/BUILD_PXC --get_sources=1 --branch=<your_branch_name> --repo=<link_to_your_repo_on_github>
5. Prepare src rpm:
./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_src_rpm=1
Please note if you already have your source tarball, just create a directory named source_tarball in the build directory and put it into it.
6. Prepare source deb:
./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_source_deb=1
Please note if you already have your source tarball, just create a directory named source_tarball in the build directory and put it into it.
7. Prepare rpm:
./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_rpm=1
8. Prepare deb:
./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_deb=1
9. Prepare tarball:
./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_tarball=1
So, as you can see, the build procedure becomes easier, and you don’t need to think about which dependencies are needed, which cmake params should be used, etc. This build script takes care of everything for you and uses all the build flags (if you didn’t change them in your sources) that we use for release builds.
Managing MySQL Configurations with the PXC Kubernetes Operator V1.10.0 Part 1: Introduction
Introduction/FAQ
Question: I need to run a production-grade open source MySQL DB.
Answer: Percona to the rescue! Percona XtraDB Cluster (PXC) is an open source enterprise MySQL solution that helps you ensure data availability for your applications while improving security and simplifying the development of new applications in the most demanding public, private, and hybrid cloud environments.
Question: I forgot to mention that I need to run it on Kubernetes.
Answer: Percona to the rescue again! Percona Distribution for MySQL Operator based on Percona XtraDB Cluster contains everything you need to quickly and consistently deploy and scale Percona XtraDB Cluster instances in a Kubernetes-based environment on-premises or in the cloud.
Question: I have a lot of MySQL configurations to manage.
Answer: PXC Operator makes it easy to manage MySQL Configurations. Let’s explore.
For the rest of the article, the name of the PXC cluster is assumed to be cluster1, and this can be modified based on the user preference.
How can I change the MySQL configurations?
If you have not done it already, the first thing to do is install the PXC operator. Our Quickstart guide gives detailed instructions on how to get started.
There are three possible ways to modify the MySQL configurations as described in the Documentation for MySQL options:
- Custom Resource PerconaXtraDBCluster (pxc/pxcs/perconaxtradbclusters)
- Config map with name cluster1-pxc
- Secret with name cluster1-pxc
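As a quick illustration of the ConfigMap route (the file name is a placeholder; init.cnf is the configuration key described in the operator documentation), a plain my.cnf-style file can be turned into the expected ConfigMap like this:

$ cat <<EOF > pxc-config.cnf
[mysqld]
max_connections=500
EOF
$ kubectl create configmap cluster1-pxc --from-file=init.cnf=pxc-config.cnf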
Which option should I choose for managing configurations?
The choice of using the above options depends on the use case and the user’s preferences.
Following are some examples:
Using ConfigMap
- If the MySQL configuration is pretty big and/or you want to maintain the configuration separately rather than updating everything in the PXC object.
- If you want to provide permission to change MySQL configurations but not the other properties of the PXC object, like resources, affinity, etc. K8s RBAC can be used to achieve this: a Role/ClusterRole can be created that grants access only to the ConfigMap used for the MySQL configuration (see the sketch below).
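A minimal sketch of such a role, limited to that single ConfigMap (the role and user names are illustrative):

$ kubectl create role pxc-config-editor --verb=get,list,update,patch --resource=configmaps --resource-name=cluster1-pxc
$ kubectl create rolebinding pxc-config-editor-binding --role=pxc-config-editor --user=dba-team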
Using Secrets
- If there is any sensitive information that needs to be used in the configuration, Secrets are recommended. Even though k8s Secrets are just base64-encoded data, they have the advantage of integrating well with vaults, and it’s always best practice to use k8s Secrets rather than ConfigMaps when there is sensitive data.
What happens when I change MySQL configuration?
Any changes in MySQL configurations will generally recycle the pods in reverse order if the RollingUpdate strategy is used.
Example: If three replicas are used for the PXC cluster, cluster1-pxc-[0,1,2] pods would be created. When MySQL configuration is changed, cluster1-pxc-2 will be terminated first and the system will wait till the new pod cluster1-pxc-2 starts running and becomes healthy, then cluster1-pxc-1 will be terminated, and so on.
Following are the changes observed with the watch command.
# kubectl get po -l app.kubernetes.io/component=pxc --watch
NAME             READY   STATUS            RESTARTS   AGE
cluster1-pxc-0   3/3     Running           0          8m23s
cluster1-pxc-1   3/3     Running           0          10m
cluster1-pxc-2   3/3     Running           0          13m
cluster1-pxc-2   3/3     Terminating       0          13m
cluster1-pxc-2   0/3     Terminating       0          14m
cluster1-pxc-2   0/3     Pending           0          0s
cluster1-pxc-2   0/3     Init:0/1          0          1s
cluster1-pxc-2   0/3     PodInitializing   0          8s
cluster1-pxc-2   2/3     Running           0          10s
cluster1-pxc-2   3/3     Running           0          2m
cluster1-pxc-1   3/3     Terminating       0          14m
cluster1-pxc-1   0/3     Terminating       0          14m
cluster1-pxc-1   0/3     Pending           0          0s
cluster1-pxc-1   0/3     Init:0/1          0          1s
cluster1-pxc-1   0/3     PodInitializing   0          6s
cluster1-pxc-1   2/3     Running           0          8s
cluster1-pxc-1   3/3     Running           0          2m1s
cluster1-pxc-0   3/3     Terminating       0          13m
cluster1-pxc-0   0/3     Terminating       0          14m
cluster1-pxc-0   0/3     Pending           0          0s
cluster1-pxc-0   0/3     Init:0/1          0          0s
cluster1-pxc-0   0/3     PodInitializing   0          6s
cluster1-pxc-0   2/3     Running           0          8s
cluster1-pxc-0   3/3     Running           0          2m
In the upcoming post, we will see the precedence and the changes happening while modifying MySQL configurations. Stay tuned!
Preventing Stale Reads on Percona XtraDB Cluster by Enforcing Causality Checks on a Per-Query Basis
When we run a SELECT on a replica server and it returns a different result from the one we would have obtained had we queried the source server instead, due to changes in the dataset that have not yet been replicated or synchronized to the replica, we get what is known as a stale read. My colleague Marcelo has blogged about Stale Reads and How to Fix Them before.
The need to avoid stale reads is a constraint that leads people into migrating to Percona XtraDB Cluster (PXC) / Galera Cluster or, more recently, Group Replication. I am focusing on PXC and Galera in this short blog post, which, similar to Group Replication, implements not exactly a synchronous replication model but what is commonly referred to as a “virtually synchronous” replication (but not without some polemic). In short, it means a transaction that is committed in one node is guaranteed (and there’s a certification process behind this) to have “reached” other nodes, but that doesn’t mean the changes this transaction carries with it have been applied on the other nodes already. Thus, stale reads can happen in PXC too. For the long version, see the manual page about Certification in Percona XtraDB Cluster.
Why does PXC implement virtually synchronous replication instead of fully synchronous replication? There are a few challenges to accomplishing this, but we can nail it down to performance constraints. Fully synchronous replication requires not only that the transactions reach the other nodes but also that they be applied there. And then the nodes have to report back the success of the operation. For each transaction. This takes time and increases the overall commit time, which in practice makes writing to the database slow and impacts overall concurrency.
Evolution of “synchronous” replication on PXC
To be clear, no matter how you configure PXC, it won’t become a fully-synchronous system (neither will Group Replication, for that matter). BUT, it is possible to make it behave in such a way that it enforces data consistency. That’s what the variable wsrep_causal_reads has been used for. When this setting was introduced, it worked like a switch that one could use to kind of enable “synchronous” replication (as in data consistency) across the cluster – either for all transactions (globally) or at a connection basis (per session). In fact, what this setting does in practice is to enforce causality checks; quoting the manual: “When this variable is set to ON, the replica will wait until that event is applied before doing any other queries”. By waiting, it means blocking new writes until the expected consistency is reached.
Being able to enable “synchronous” replication for selected connections brought the potential to mix the best of both modes: operate with virtually synchronous replication by default (good performance) and use “synchronous” replication (once more, enforcing data consistency) when stale reads cannot be afforded.
The problem with this approach is that it requires some planning in advance to determine which connections should be configured with wsrep_causal_reads enabled. Without carefully planning ahead, what ends up happening in practice in many cases is having more connections configured with wsrep_causal_reads enabled than necessary. As a result, the cluster performance gets closer to that of operating with that setting applied globally.
The switch button that was wsrep_causal_reads evolved into the more granular wsrep_sync_wait, which allows the configuration of causality checks for different combinations of statements.
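For example, a session that cannot afford stale reads could turn on causality checks just for its reads at the session level (the value 1 enables the check for READ statements; other bits cover UPDATE/DELETE, INSERT/REPLACE, and SHOW):

mysql> SET SESSION wsrep_sync_wait = 1;
mysql> SELECT balance FROM accounts WHERE id = 100;

The accounts table here is just the same illustrative example used in the per-query hint shown further below.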
The cherry on top
While wsrep_sync_wait expanded the possibilities for causality checks, it still requires the “planning ahead” discussed above. I recently worked on the troubleshooting of a slow cluster and found it was configured with wsrep_sync_wait set to three (“Perform checks for READ, UPDATE, and DELETE statements”) all around, which is an even more “aggressive” configuration than the legacy wsrep_causal_reads. The justification for operating this way was that it was not possible to identify upfront which connection would be issuing a particular query that could not afford stale reads.
The good news is that starting with Percona XtraDB Cluster 8.0.26-16, which was released earlier this year, we can set wsrep_sync_wait on a per-query basis through the use of the optimizer hint SET_VAR: “The SET_VAR hint sets the session value of a system variable temporarily (for the duration of a single statement)”.
Here’s a simple example of employing SET_VAR to enforce the highest level of causality checks for a simple SELECT:
SELECT /*+ SET_VAR(wsrep_sync_wait=7) */ balance FROM accounts WHERE id=100;
If you find it is complicated to determine beforehand which connections should be configured to avoid stale reads, know that now you have the option to literally hard-code this constraint in your query. And if you end up deploying your database in a regular MySQL server, which is shipped without any wsrep_ settings, the optimizer hint is simply ignored and the query is processed normally.