The Percona Kubernetes Operator for Percona XtraDB Cluster is a special type of controller introduced to simplify complex deployments. The Operator extends the Kubernetes API with custom resources. Under the hood, the Operator uses Percona XtraDB Cluster (PXC) to provide a highly available, resilient, and scalable MySQL service in the Kubernetes space.
This solution comes with all the advantages and disadvantages provided by Kubernetes, plus some advantages of its own, like the capacity to scale reads on the nodes that are not the Primary.
Of course, there are some limitations, like the way PXC handles DDLs, which may impact the service; but there is always a cost to pay to get something, and expecting to have it all for free is unreasonable.
In this context, we need to cover what full read consistency is in this solution and why it is important to understand the role it plays.
Stale Reads
When using Kubernetes, we should talk about the service and not about the technology/product used to deliver that service.
In our case, the Percona Operator is there to deliver a MySQL service. We should then see it as a whole, as a single object. To be clearer, what we must consider is NOT the fact that we have a cluster behind the service, but that we have a service that, to be resilient and highly available, uses a cluster.
We should not care if a node/pod goes down unless the service is interrupted.
What we have as a plus in the Percona Operator solution is a certain level of READ scalability. This is achieved by optimizing the use of the non-Primary nodes: instead of having them sit there applying only replicated data, the Percona Operator provides access to them to scale the reads.
But… there is always a BUT, right?
Let us start with an image:
By design, the apply and commit finalize in Galera (PXC) may have (and does have) a delay between nodes. This means that, when using the defaults, applications may get inconsistent reads when accessing the data from a node other than the Primary.
The Operator provides access to the data using two different solutions:
- Using HAProxy (default)
- Using ProxySQL
When using HAProxy, you will have two entry points (see the connection example after this list):
- cluster1-haproxy, which will point to the Primary ONLY, for reads and writes. This is the default entry point for the applications to the MySQL database.
- cluster1-haproxy-replicas, which will point to all three nodes and is supposed to be used for READS only. This is the PLUS you can use if your application has READ/WRITE separation.
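For instance, assuming the default cluster name cluster1 and the standard MySQL port, an application with READ/WRITE separation could split its connections along these lines (user, schema, and table names here are only placeholders for illustration):

# writes (and fully consistent reads) go through the main entry point, which always lands on the Primary
mysql -h cluster1-haproxy -P 3306 -uapp_user -p -e "INSERT INTO test.orders VALUES (NULL, 'new order')"
# read-only traffic can instead use the replicas entry point, which balances over all three nodes
mysql -h cluster1-haproxy-replicas -P 3306 -uapp_user -p -e "SELECT COUNT(*) FROM test.orders"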
Please note that at the moment there is nothing preventing an application from using cluster1-haproxy-replicas also for writes, but that is dangerous and wrong, because it will generate a lot of certification conflicts and brute force (BF) aborts, given it will distribute writes all over the cluster, impacting performance as well (and not giving you any write scaling):
[marcotusa@instance-1 ~]$ for i in `seq 1 100`; do mysql -h cluster1-haproxy-replicas -e "insert into test.iamwritingto values(null,@@hostname)";done
+----------------+-------------+
| host           | count(host) |
+----------------+-------------+
| cluster1-pxc-1 |          34 |
| cluster1-pxc-2 |          33 |
| cluster1-pxc-0 |          33 |
+----------------+-------------+
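For clarity, the distribution shown above comes from counting which node each insert landed on, with something along the lines of the following query (test.iamwritingto is the same illustrative table used in the loop):

mysql -h cluster1-haproxy -e "SELECT host, COUNT(host) FROM test.iamwritingto GROUP BY host"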
When using ProxySQL there is a single entry point, but you may define query rules to automatically split the R/W requests coming from the application. This is the preferred method when an application has no way to separate the READS from the WRITES.
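For reference, a classic R/W split in ProxySQL is defined with query rules like the ones below, issued from the ProxySQL admin interface. This is only a minimal sketch: the hostgroup IDs 10 and 20 are placeholders and must match the writer/reader hostgroups actually configured in your deployment.

-- send SELECT ... FOR UPDATE to the writer hostgroup (placeholder id 10)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
-- send all other SELECTs to the reader hostgroup (placeholder id 20)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;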
Here I have done a comparison of the two methods, HAProxy and ProxySQL.
Now, as mentioned above, by default PXC (or any Galera-based solution) comes with some relaxed settings for performance purposes. This is normally fine in many standard cases, but if you use the Percona Operator and take advantage of the PLUS of scaling reads through the second access point with HAProxy, or through query rules with ProxySQL, you should NOT have stale reads, given the service must provide consistent data, as if you were acting on a single node.
To achieve that, you can change the defaults and set the PXC parameter wsrep_sync_wait.
When changing the parameter wsrep_sync_wait as explained in the documentation, the node initiates a causality check, blocking incoming queries while it catches up with the cluster.
Once all data on the node receiving the READ request is commit_finalized, the node performs the read.
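wsrep_sync_wait is a dynamic variable, so a quick way to experiment is to set it per session or globally on the nodes serving the reads; to make it persistent across all the pods you would normally put it in the PXC configuration of the Operator custom resource instead. A minimal sketch:

-- the default is 0: no causality checks, reads may be stale
SHOW GLOBAL VARIABLES LIKE 'wsrep_sync_wait';
-- value 3 (the one used in this test) enables the check for READ plus UPDATE/DELETE statements
SET GLOBAL wsrep_sync_wait = 3;
-- or enforce it only for the reads of the current session
SET SESSION wsrep_sync_wait = 1;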
But this has a performance impact, as said before.
What Is The Impact?
To test the performance impact, I used a cluster deployed in GKE with these characteristics:
- 3 Main nodes n2-standard-8 (8 vCPUs, 32 GB memory)
- 1 App node n2-standard-8 (8 vCPUs, 32 GB memory)
- PXC pods using:
  - 25GB of the 32 available
  - 6 CPU of the 8 available
- HAProxy:
  - 600m CPU
  - 1GB RAM
- PMM agent:
  - 500m CPU
  - 500MB RAM
On the application node, I ran two sysbench instances, one in read/write mode and the other in read-only mode. Finally, to test the stale reads, I used the stale-read test from my test suite.
Given I was looking for results with a moderate load, I used only 68/96/128 threads per sysbench instance.
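The exact invocations are not reported here, but the two sysbench instances were conceptually similar to the commands below (hosts, table count, table size, and duration are illustrative assumptions, not the exact values used in the test):

# read/write instance pointed at the main entry point
sysbench oltp_read_write --mysql-host=cluster1-haproxy --mysql-user=sbtest --mysql-password=... --tables=10 --table-size=1000000 --threads=128 --time=600 run
# read-only instance pointed at the replicas entry point
sysbench oltp_read_only --mysql-host=cluster1-haproxy-replicas --mysql-user=sbtest --mysql-password=... --tables=10 --table-size=1000000 --threads=128 --time=600 run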
Results
Marco, did we have stale reads or not? Yes, we did:
I had from 0 (with a very light load) up to 37% stale reads with a MODERATE load, where moderate means the sysbench run with 128 threads.
Setting wsrep_sync_wait=3, I of course had full consistency. But I also had a performance loss:
As you can see, I had an average loss of 11% in the case of READS:
While for WRITES the average loss was 16%.
Conclusions
At this point, we need to stop and think about what is worth doing. If my application is READ-heavy and needs to scale READs, it is probably worth enabling full synchronicity, given that scaling on the additional nodes allows me to have 2x or more READs.
If instead my application is write-critical, losing ~16% performance is probably not acceptable.
Finally, if my application is tolerant of stale reads, I will just go with the defaults and get all the benefits without the penalties.
Also keep in mind that the Percona Kubernetes Operator for Percona XtraDB Cluster is designed to offer a MySQL service, so the state of a single node is not as critical as it would be with a default PXC installation; Pods are by nature ephemeral objects, while the service is resilient.
References
Percona Kubernetes Operator for Percona XtraDB Cluster
https://github.com/Tusamarco/testsuite
https://en.wikipedia.org/wiki/Isolation_(database_systems)#Dirty_reads
https://galeracluster.com/library/documentation/mysql-wsrep-options.html#wsrep-sync-wait
https://www.slideshare.net/lefred.descamps/galera-replication-demystified-how-does-it-work