Percona XtraDB Cluster 5.7.19-29.22 is now available

Percona announces the release of Percona XtraDB Cluster 5.7.19-29.22 on September 22, 2017. Binaries are available from the downloads section or our software repositories.

NOTE: You can also run Docker containers from the images in the Docker Hub repository.

Percona XtraDB Cluster 5.7.19-29.22 is now the current release, based on the following:

All Percona software is open-source and free.

Upgrade Instructions

After you upgrade each node to Percona XtraDB Cluster 5.7.19-29.22, run the following command on one of the nodes:

$ mysql -uroot -p < /usr/share/mysql/pxc_cluster_view.sql

Then restart all nodes, one at a time:

$ sudo service mysql restart

New Features

  • Introduced the pxc_cluster_view table to get a unified view of the cluster. This table is exposed through the performance schema.

    mysql> select * from pxc_cluster_view;
    HOST_NAME  UUID                                  STATUS  LOCAL_INDEX  SEGMENT
    n1         b25bfd59-93ad-11e7-99c7-7b26c63037a2  DONOR   0            0
    n2         be7eae92-93ad-11e7-88d8-92f8234d6ce2  JOINER  1            0
    2 rows in set (0.01 sec)
  • PXC-803: Added support for new features in Percona XtraBackup 2.4.7:

    • wsrep_debug enables debug logging
    • encrypt_threads specifies the number of threads that XtraBackup should use for encrypting data (when encrypt=1). This value is passed using the --encrypt-threads option in XtraBackup.
    • backup_threads specifies the number of threads that XtraBackup should use to create backups. See the --parallel option in XtraBackup.


  • PXC-835: Limited wsrep_node_name to 64 bytes.
  • PXC-846: Improved logging to report the reason for an IST failure.
  • PXC-851: Added version compatibility check during SST with XtraBackup:
    • If a donor is 5.6 and a joiner is 5.7: A warning is printed to perform mysql_upgrade.
    • If a donor is 5.7 and a joiner is 5.6: An error is printed and SST is rejected.
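The PXC-851 rule can be sketched as a small decision function. This is only an illustration of the two cases listed above, not the code used by the SST script; the function names and the "ok"/"warn"/"reject" labels are assumptions made for the example.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '5.7.19-29.22'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def sst_compatibility(donor_version, joiner_version):
    """Classify an SST attempt as 'ok', 'warn' or 'reject' (hypothetical labels)."""
    donor = parse_major_minor(donor_version)
    joiner = parse_major_minor(joiner_version)
    if donor == joiner:
        return "ok"
    if donor < joiner:
        # Older donor, newer joiner: SST proceeds with a warning to run mysql_upgrade.
        return "warn"
    # Newer donor, older joiner: an error is printed and SST is rejected.
    return "reject"


print(sst_compatibility("5.6.37-26.21", "5.7.19-29.22"))
print(sst_compatibility("5.7.19-29.22", "5.6.37-26.21"))
```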

Fixed Bugs

  • PXC-825: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to include the --defaults-group-suffix when logging to syslog. For more information, see #1559498.
  • PXC-826: Fixed multi-source replication to PXC node slave. For more information, see #1676464.
  • PXC-827: Fixed handling of different binlog names between donor and joiner nodes when GTID is enabled. For more information, see #1690398.
  • PXC-830: Rejected the RESET MASTER operation when wsrep provider is enabled and gtid_mode is set to ON. For more information, see #1249284.
  • PXC-833: Fixed connection failure handling during SST by making the donor retry connection to joiner every second for a maximum of 30 retries. For more information, see #1696273.
  • PXC-839: Fixed GTID inconsistency when setting gtid_next.
  • PXC-840: Fixed typo in alias for systemd configuration.
  • PXC-841: Added check to avoid replication of DDL if sql_log_bin is disabled. For more information, see #1706820.
  • PXC-842: Fixed deadlocks during Load Data Infile (LDI) with log-bin disabled by ensuring that a new transaction (of 10 000 rows) starts only after the previous one is committed by both wsrep and InnoDB. For more information, see #1706514.
  • PXC-843: Fixed situation where the joiner hangs after SST has failed by dropping all transactions in the receive queue. For more information, see #1707633.
  • PXC-853: Fixed cluster recovery by enabling wsrep_ready whenever nodes become PRIMARY.
  • PXC-862: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to use the ssl-dhparams value from the configuration file.
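The retry policy described in PXC-833 (one attempt per second, at most 30 attempts) can be sketched as follows. The real logic lives in the SST tooling, so treat this as an illustration of the policy only; connect_with_retries and its parameters are hypothetical names.

```python
import time


def connect_with_retries(connect, max_retries=30, delay_seconds=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or max_retries attempts have failed."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_retries:
                # Wait before the next attempt; no wait after the final failure.
                sleep(delay_seconds)
    raise ConnectionError(f"giving up after {max_retries} attempts") from last_error
```

Passing the sleep function as a parameter keeps the sketch easy to exercise without actually waiting.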

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!


Percona XtraDB Cluster 5.6.37-26.21 is Now Available

Percona announces the release of Percona XtraDB Cluster 5.6.37-26.21 on September 20, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.37-26.21 is now the current release, based on the following:

All Percona software is open-source and free.


  • PXC-851: Added version compatibility check during SST with XtraBackup:
    • If donor is 5.6 and joiner is 5.7: A warning is printed to perform mysql_upgrade.
    • If donor is 5.7 and joiner is 5.6: An error is printed and SST is rejected.

Fixed Bugs

  • PXC-825: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to include the --defaults-group-suffix when logging to syslog. For more information, see #1559498.
  • PXC-827: Fixed handling of different binlog names between donor and joiner nodes when GTID is enabled. For more information, see #1690398.
  • PXC-830: Rejected the RESET MASTER operation when wsrep provider is enabled and gtid_mode is set to ON. For more information, see #1249284.
  • PXC-833: Fixed connection failure handling during SST by making the donor retry connection to joiner every second for a maximum of 30 retries. For more information, see #1696273.
  • PXC-841: Added check to avoid replication of DDL if sql_log_bin is disabled. For more information, see #1706820.
  • PXC-853: Fixed cluster recovery by enabling wsrep_ready whenever nodes become PRIMARY.
  • PXC-862: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to use the ssl-dhparams value from the configuration file.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!


The MySQL High Availability Landscape in 2017 (the Babies)

This post is the third of a series focusing on the MySQL high availability solutions available in 2017.

The first post looked at the elders, the technologies that have been around for more than ten years. The second post talked about the adults, the more recent and mature technologies. In this post, we will look at the emerging MySQL high availability solutions. The “baby” MySQL high availability solutions I chose for the blog are group replication, proxies and distributed storage.

Group replication

Group replication is the Oracle response to Galera. The term “InnoDB cluster” means a cluster using group replication. The goal is to offer similar functionality, especially the almost-synchronous replication.

At first glance, the group replication implementation appears to be rather elegant. The basis is the GTID replication mode. The nodes of an InnoDB cluster share a single UUID sequence. To control the replication lag, Oracle added a flow control layer. While Galera requires unanimity, group replication only requires a majority. The majority protocol in use is derived from Paxos. A majority protocol makes the cluster more resilient to a slow node.
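The difference between the two acknowledgment rules described above is easy to put in numbers. The sketch below is a simplified model of the rule as stated in this post, not the Paxos-derived protocol itself; the function names are made up for the example.

```python
def majority(group_size):
    """Smallest number of nodes that forms a majority of the group."""
    return group_size // 2 + 1


def can_commit(acks, group_size, require_all):
    """True when enough nodes acknowledged the transaction.

    require_all=True models the unanimity rule attributed to Galera above,
    require_all=False models the majority rule of group replication.
    """
    needed = group_size if require_all else majority(group_size)
    return acks >= needed


# With one slow node out of three, only the majority rule lets the commit through.
print(can_commit(acks=2, group_size=3, require_all=False))
print(can_commit(acks=2, group_size=3, require_all=True))
```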

As with Galera, when you add flow control you need queues. Group replication has two queues. There is one queue for the certification process and one queue for the appliers. What is interesting in the Oracle approach is the presence of a throttling mechanism. When flow control is requested by a node, instead of halting the processing of new transactions like Galera, the rate of transactions is throttled. That can help to meet strict timing SLAs.
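To make the halt-versus-throttle distinction concrete, here is a toy model. The proportional formula is purely an assumption chosen for illustration; it is not the quota computation used by group replication, and both function names are hypothetical.

```python
def halting_quota(queue_length, threshold, normal_quota):
    """Halt model: no new transactions while the queue is above the threshold."""
    return 0 if queue_length > threshold else normal_quota


def throttled_quota(queue_length, threshold, normal_quota, min_quota=1):
    """Throttle model: shrink the quota as the backlog grows, but never to zero."""
    if queue_length <= threshold:
        return normal_quota
    scaled = normal_quota * threshold // queue_length
    return max(min_quota, scaled)


# A backlog twice the threshold stops the writer in one model and halves it in the other.
print(halting_quota(200, 100, 1000))
print(throttled_quota(200, 100, 1000))
```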

Because the group replication logic is fairly similar to Galera, they suffer from the same limitations: large transactions, latency and hot rows. Group replication is recent. The first GA version is 5.7.17, from December 2016. It is natural then that it has a number of sharp edges. I won’t elaborate much here, but if you are interested, read here and here. I am confident that over time group replication will get more polished. Some automation, like the Galera SST process, would also be welcome.

Given that the technology is recent, I know of no Percona customer using group replication in production.


Proxies

Intelligent proxies can be viewed as another type of upcoming MySQL high availability solution. It is not strictly MySQL. In fact, this solution is more of a mix of other solutions.

The principle is simple: you connect to a proxy, and the proxy directs you to a valid MySQL server. The proxy has to monitor the states of the back-end servers, and maybe even perform actions on them. Of course, the proxy layer must not become a single point of failure. There should be more than one proxy host for basic HA. If more than one proxy is used at the same time, they’ll have to agree on the state of the back-end servers. For example, on a cluster using MySQL async replication, if the proxies are not sending the write traffic to the same host, things will quickly become messy.

There are a few ways of achieving this. The simplest solution is an active-passive setup where only one proxy is active at a given time. You’ll need some kind of logic to determine if the proxy host is available or not. Typical choices will use tools like keepalived or Pacemaker.

A second option is to have the proxies agree to a deterministic way of identifying a writer node. For example, with a Galera-based cluster, the sane back-end node with the lowest wsrep_local_index could be the writer node.
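A minimal sketch of that deterministic rule: every proxy applies the same function to the same node list and therefore picks the same writer. The dictionary keys "host" and "healthy" are assumptions for the example; wsrep_local_index is the Galera status value mentioned above.

```python
def pick_writer(nodes):
    """Return the host of the healthy node with the lowest wsrep_local_index, or None."""
    healthy = [node for node in nodes if node["healthy"]]
    if not healthy:
        return None
    return min(healthy, key=lambda node: node["wsrep_local_index"])["host"]


cluster = [
    {"host": "n1", "wsrep_local_index": 0, "healthy": False},
    {"host": "n2", "wsrep_local_index": 1, "healthy": True},
    {"host": "n3", "wsrep_local_index": 2, "healthy": True},
]
print(pick_writer(cluster))
```

Because the choice depends only on the observed cluster state, two proxies that see the same state need no communication to agree; they only disagree if their health checks disagree.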

Finally, the proxies could talk to each other and coordinate. Such an approach is promising. It could allow a single proxy to perform the monitoring and inform its peers of the results. It would also allow coordinated actions on the cluster when a failure is detected.

Currently, there are a few options in terms of proxies:

  • ProxySQL: An open-source proxy that understands the MySQL protocol and can do R/W splitting, query caching, sharding, SQL firewalling, etc. A new alpha-level feature, mirroring, targets the inter-proxy communication need.
  • MaxScale: No longer fully open-source (BSL), but understands the MySQL protocol. Can do R/W splitting, sharding, binlog serving, SQL firewalling, etc.
  • MySQL Router: MySQL Router is an open-source proxy developed by Oracle for InnoDB Cluster (Group replication). It understands the MySQL protocol and also supports the new X protocol. It can do R/W splitting.
  • HAProxy: HAProxy is a popular open-source TCP-level proxy. It doesn’t understand the MySQL protocol. It needs helper scripts, responding to HTTP-type requests, to figure out the node’s health.

In addition to these open-source proxies, there are two well-known commercial proxy-like solutions, Tungsten and ScaleArc. Both of these technologies are mature and are not “babies” in terms of age and traction. On top of these, there are also numerous hardware-based load balancer solutions.

The importance of proxies in MySQL high availability has led Percona to include ProxySQL in the latest releases of Percona XtraDB Cluster. In collaboration with the ProxySQL maintainer, René Cannaò, features have been added to make ProxySQL aware of the Percona XtraDB Cluster state.

Proxies are already often deployed in MySQL high availability solutions. Often proxies are only doing load-balancing work. We are starting to see deployments using proxies for more advanced things, like read/write splitting and sharding.

Distributed storage

Replication setup using distributed storage


This MySQL high availability solution is a project I am interested in. It is fair to say it is more a “fetus” than a real “baby,” since I know nobody using it in production. You can see this solution as a shared storage approach on steroids.

The simplest solution requires a three-node Ceph cluster. The nodes also run MySQL and the datadir is a Ceph RBD block device. Data in Ceph is automatically replicated to multiple hosts. This built-in data replication is an important component of the solution. Also, Ceph RBD supports snapshots and clones. A clone is a copy of the whole data set that consumes only the data that changed (delta) in terms of storage. Our three MySQL servers will thus not use three full copies of the dataset, but only one full copy and two deltas. As time passes, the deltas grow. When they are too large, we can simply generate new snapshots and clones and be back to day one. The generation of a new snapshot and clone takes a few seconds, and doesn’t require stopping MySQL.

The obvious use case for the distributed storage approach is a read-intensive workload on a very large dataset. The setup can handle a lot of writes. The higher the write load, the more frequently there will be a snapshot refresh. Keep in mind that refreshing a snapshot of a 10 TB data set takes barely more time than for a 1 GB data set.

For that purpose, I wrote an SST script for Percona XtraDB Cluster that works with Ceph. I blogged about it here. I also wrote a Ceph snapshot/clone backup script that can provision a slave from a master snapshot. I’ll blog about how to use this Ceph backup script in the near future.

Going further with distributed storage, multiple MySQL instances could use the same data pages. Ceph would be used as a distributed object store for InnoDB pages. This would make it possible to build an open-source, Aurora-like database. Coupled with Galera or Group replication, you could have a highly available MySQL cluster sharing a single copy of the dataset.

I started to modify MySQL, actually Percona Server for MySQL 5.7, to add support for Ceph/Rados. Rados is the object store protocol of Ceph. There is still a lot of effort needed to make it work. My primary job is not development, so progress is slow. My work can be found (here). The source compiles well but MySQL doesn’t fully start. I need to debug where things are going wrong.

Adding a feature to MySQL like that is an awesome way to learn the internals of MySQL. I would really appreciate any help if you are interested in this project.


Over the three articles in this series, we have covered the 2017 landscape of MySQL high availability solutions. The first focused on the old timers, “the elders”: replication, shared storage and NDB. The second article dealt with the solutions that are more recent and have good traction: Galera and RDS Aurora. The current article concludes the series by looking at what could be coming in terms of MySQL high availability solutions.

The main goal of this series is to help you plan the deployment of MySQL in a highly available way. I hope it provides hints and pointers toward better and more efficient solutions.


The MySQL High Availability Landscape in 2017 (The Adults)

In this blog post, we’ll look at some of the MySQL high availability solution options.

In the previous post of this series, we looked at the MySQL high availability (HA) solutions that have been around for a long time. I called these solutions “the elders.” Some of these solutions (like replication) are heavily used today and have been improved from release to release of MySQL.

This post focuses on the MySQL high availability solutions that have appeared over the last five years and gained a fair amount of traction in the community. I chose to include only two solutions in this group: Galera and RDS Aurora. I’ll use the term “Galera” generically: it covers Galera Cluster, MariaDB Cluster and Percona XtraDB Cluster. I debated for some time whether or not to include Aurora. I don’t like the fact that they use closed source code. Given the tight integration with the AWS environment, what is the commercial risk of opening the source code? That question evades me, but I am not on the business side of technology.


Galera

When I say “Galera,” it means a replication protocol supported by a library provided by Codership, a Finnish company. The library needs hooks inside the MySQL source code for the replication protocol to work. In Percona XtraDB Cluster, I counted 66 .cc files where the word “wsrep” is present. As you can see, it is not a small task to add support for Galera to MySQL. Not all the implementations are similar. Percona, for example, focused more on stability and usability at the expense of new features.

Let’s start with a quick description of a Galera-based cluster. Unless you don’t care about split-brain, a Galera cluster needs at least three nodes. The Galera replication protocol is nearly synchronous, which is a huge gain compared to regular MySQL replication. It performs transactions almost simultaneously on all the nodes, and the protocol ensures the same commit order. The transactions are almost synchronous because there are incoming queues on each node to improve performance. The presence of these incoming queues forces an extra step: the certification. The certification compares an incoming transaction with the ones already queued. If there is a conflict, it returns a deadlock error.
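The certification step can be pictured with a deliberately simplified model: each transaction is reduced to the set of rows it writes, and an incoming transaction fails if it touches a row that a queued transaction also touches. This is a sketch of the idea as described here, not Galera's actual certification algorithm; the (table, primary key) row identifiers are an assumption of the example.

```python
def certify(incoming_rows, queued_write_sets):
    """Return True when the incoming transaction conflicts with no queued write set."""
    incoming = set(incoming_rows)
    return all(incoming.isdisjoint(write_set) for write_set in queued_write_sets)


# Rows are identified here by (table, primary key) pairs.
queue = [{("orders", 1)}, {("orders", 2), ("stock", 7)}]

print(certify([("orders", 3)], queue))                # no overlap: passes
print(certify([("stock", 7), ("orders", 9)], queue))  # overlap: would be a deadlock error
```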

For performance reasons, the certification process must be quick so that the incoming queue stays in memory. Since the number of transactions defines the size of the queue, the presence of large transactions uses a lot of memory. There are safeguards against memory overload, so be aware that transactions like:

update ATableWithMillionsRows set colA=1;

will likely fail. That’s the first important limitation of a Galera-based cluster: the size of transactions is limited.
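One common way to live with that limitation is to split a huge statement into many small transactions. The helper below is a hedged sketch that only generates the statements; it assumes the table has an integer primary key (called id here, a hypothetical column name) with a known range, and it leaves execution and error handling to the caller.

```python
def batched_updates(table, pk_column, set_clause, min_id, max_id, batch_size):
    """Yield UPDATE statements that each cover at most batch_size primary key values."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield f"UPDATE {table} SET {set_clause} WHERE {pk_column} BETWEEN {start} AND {end};"
        start = end + 1


for statement in batched_updates("ATableWithMillionsRows", "id", "colA=1", 1, 25, 10):
    print(statement)
```

Each generated statement is meant to run as its own transaction, so every write set stays small.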

It is also critical to uniquely identify conflicting rows. The best way to achieve an efficient row comparison is to make sure all the tables have a primary key. In a Galera-based cluster, your tables need primary keys; otherwise you’ll run into trouble. That’s the second limitation of a Galera-based cluster: the need for primary keys. Personally, I think that a table should always have a primary key, but I have seen many oddities…

Another design characteristic is the need for an acknowledgment by all the nodes when a transaction commits. That means the network link with the largest latency between two nodes will set the floor value of the transactional latency. It is an important factor to consider when deploying a Galera-based cluster over a WAN. Similarly, an overloaded node can slow down the cluster if it cannot acknowledge the transaction in a timely manner. In most cases, adding slave threads will allow you to overcome the throughput limitations imposed by the network latency. Each transaction will suffer from latency, but more of them will be able to run at the same time and maintain the throughput.

The exception here is when there are “hot rows.” A hot row is a row that is hammered by updates all the time. The transactions affecting hot rows cannot be executed in parallel and are thus limited by the network latency.
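A back-of-the-envelope model shows why parallelism helps in general but not for hot rows. It assumes each commit costs one round trip to the slowest node and ignores every other cost, so the numbers are upper bounds; the function name is made up for the example.

```python
def max_commits_per_second(round_trip_ms, parallel_streams=1):
    """Upper bound on commits per second when each commit waits one round trip."""
    return parallel_streams * 1000.0 / round_trip_ms


# Independent transactions over a 10 ms link with 8 parallel streams.
print(max_commits_per_second(10, parallel_streams=8))

# Updates to a single hot row are serialized, so only one stream counts.
print(max_commits_per_second(10, parallel_streams=1))
```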

Since Galera-based clusters are very popular, they must also have some good points. The first and most obvious is the full-durability support. Even if the node on which you executed a transaction crashes a fraction of a second after the commit statement returned, the data is present in the other nodes’ incoming queues. In my opinion, it is the main reason for the demise of the shared storage solution. Before Galera-based clusters, the shared storage solution was the only other solution guaranteeing no data loss in case of a crash.

While the standby node is unusable with the shared storage solution, all the nodes of a Galera-based cluster are available and are almost in sync. All the nodes can be used for reads without stressing too much about replication lag. If you accept a higher risk of deadlock errors, you can even write on all nodes.

Last but not least, there is an automatic provisioning service for new nodes called SST. During the SST process, a joiner node asks the cluster for a donor. One of the existing nodes agrees to be the donor and initiates a full backup. The backup is streamed over the network to the joiner and restored there. When the backup completes, the joiner performs an IST to get the recent updates and, once applied, joins the cluster. The most common SST method uses the Percona XtraBackup utility. When using the XtraBackup SST method, the cluster remains fully available during the SST, although performance may degrade. This feature really simplifies the operational side of things.

The technology is very popular. Of course, I am a bit biased since I work for Percona and one of our flagship products is Percona XtraDB Cluster – an implementation of the Galera protocol. Other than standard MySQL replication, it is by far the most common HA solution used by the customers I work with.

RDS Aurora

The second “adult” MySQL high availability solution is RDS Aurora. I hesitated to add Aurora here, mainly because it is not an open-source technology. I must also admit that I haven’t followed the latest developments around Aurora very closely. So, let’s describe Aurora.

There are three major parts in Aurora: at least one database server, the writer node and the storage.

What makes Aurora special is that the storage layer has its own processing logic. I don’t know if the processing logic is part of the writer node AWS instance or part of the storage service directly, since the source code is not available. Anyway, I’ll call that layer the appliers. The appliers’ role is to apply redo log fragments, which allows the writer node to write only the redo log fragments (normally written to the InnoDB log files). The appliers read those fragments and modify the pages in the storage. If a node requests a page that has pending redo fragments to be applied, they get applied before returning the page.

From the writer node perspective, there are much fewer writes. There is also no direct upper bound in terms of a number of fragments to be queued, so it is a bit like having


 set to an extremely large value. Also, since Aurora doesn’t need to flush pages, if the writer node needs to read from the storage, and there are no free pages in the buffer pool, it can just discard one even if it is “dirty.” Actually, there are no dirty pages in the buffer pool.

So far, that seems to be very good for high write loads with spikes. What about the reader nodes? These reader nodes receive the updates from the writer nodes. If the update concerns a page they have in their buffer pool, they can modify it in place or discard it and read again from the storage. Again, without the source code, it is hard to tell the implementation. The point is, the readers have the same data as the master, they just can’t lag behind.

Apart from the impossibility of any reader lag, the other good point of Aurora is the storage. InnoDB pages are saved as objects in an object store, not like in a regular file on a file system. That means you don’t need to over-provision your storage ahead of time. You pay for what you are using – actually the maximum you ever use. InnoDB tablespaces do not shrink, even with Aurora.

Furthermore, if you have a 5TB dataset and your workload is such that you would need ten servers (one writer and nine readers), you still need only 5TB of storage if you are not replicating to other AZs. If we compare with regular MySQL and replication, you would have one master and nine slaves, each with 5TB of storage, for a total of 50TB. On top of that, you’ll have at least ten times the write IOPS.
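The storage comparison is simple arithmetic, sketched here for any dataset size and server count. It ignores replication to other availability zones and any per-server overhead; both function names are made up for the example.

```python
def replicated_storage_tb(dataset_tb, servers):
    """Regular replication: every server keeps its own full copy."""
    return dataset_tb * servers


def shared_storage_tb(dataset_tb, servers):
    """Shared storage layer: one copy, whatever the number of servers."""
    return dataset_tb


servers = 10  # one writer and nine readers
print(replicated_storage_tb(5, servers))
print(shared_storage_tb(5, servers))
```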

So, storage wise, we have something that could be very interesting for applications with large datasets and highly concurrent read heavy workloads. You ensure high availability with the ability to promote a reader to writer automatically. You access the primary or the readers through endpoints that automatically connect to the correct instances. Finally, you can replicate the storage to multiple availability zones for DR.

Of course, such an architecture comes with a trade-off. If you experiment with Aurora, you’ll quickly discover that the smallest instance types underperform while the largest ones perform in a more expected manner. It is also quite easy to overload the appliers. Just perform the following queries:

update ATableWithMillionsRows set colA=1;
select count(*) from ATableWithMillionsRows where colA=1;

given that the table ATableWithMillionsRows is larger than the buffer pool. The select will hang for a long time because the appliers are overloaded by the number of pages to update.

In terms of adoption, we have some customers at Percona using Aurora, but not that many. It could be that users of Aurora do not naturally go to Percona for services and support. I also wonder about the decision to keep the source code closed. It is certainly not a positive marketing factor in a community like the MySQL community. Since the Aurora technology seems tightly bound to their ecosystem, is there really a risk of the technology being reused by a competitor? With a better understanding of the technology through open access to the source, Amazon could have received valuable contributions. It would also be much easier to understand, tune and recommend Aurora.


Group Replication: the Sweet and the Sour

In this blog, we’ll look at group replication and how it deals with flow control (FC) and replication lag. 


In the last few months, we had two main actors in the MySQL ecosystem: ProxySQL and Group-Replication (with the evolution to InnoDB Cluster). 

While I have extensively covered the first, my last serious work on Group Replication dates back to some lab versions from years ago.

Given that Oracle decided to declare it GA, and Percona’s decision to provide some level of Group Replication support, I decided it was time for me to take a look at it again.

We’ve already seen a lot of coverage of many Group Replication topics. There are articles about Group Replication and performance, Group Replication and basic functionality (or the lack of it, like automatic node provisioning), Group Replication and ProxySQL, and so on.

But one question kept coming up over and over in my mind. If Group Replication and InnoDB Cluster have to work as an alternative to other (virtually) synchronous replication mechanisms, what changes do our customers need to consider if they want to move from one to the other?

Solutions using Galera (like Percona XtraDB Cluster) must take into account a central concept: clusters are data-centric. What matters is the data and the data state. Both must be the same on each node at any given time (commit/apply). To guarantee this, Percona XtraDB Cluster (and other solutions) use a set of data validation and Flow Control processes that work to ensure a consistent cluster data set on each node.

The upshot of this principle is that an application can query ANY node in a Percona XtraDB Cluster and get the same data, or write to ANY node and know that the data is visible everywhere in the cluster at (virtually) the same time.

Last but not least, inconsistent nodes should be excluded and either rebuilt or fixed before rejoining the cluster.

If you think about it, this is very useful. Guaranteeing consistency across nodes allows you to transparently split write/read operations, failover from one node to another with very few issues, and more.

When I conceived of this blog on Group Replication (or InnoDB Cluster), I put myself in the customer’s shoes. I asked myself: “Aside from all the other things we know (see above), what is the real impact of moving from Percona XtraDB Cluster to Group Replication/InnoDB Cluster for my application? Since Group Replication still (basically) uses replication with binlogs and relay logs, is there also a Flow Control mechanism?” An alarm bell started to ring in my mind.

My answer is: “Let’s do a proof of concept (PoC), and see what is really going on.”


I set up a simple set of servers using Group Replication with a very basic application performing writes on a single writer node, and (eventually) reads on the other nodes.

You can find the schema definition here. Mainly I used the four tables from my windmills test suite: nothing special or specifically designed for Group Replication. I’ve used this test a lot for Percona XtraDB Cluster in the past, so it was a perfect fit.

Test Definition

The application will do very simple work, and I wanted to test four main cases:

  1. One thread performing one insert at each transaction
  2. One thread performing 50 batched inserts at each transaction
  3. Eight threads performing one insert at each transaction
  4. Eight threads performing 50 batched inserts at each transaction

As you can see, a pretty simple set of operations. Then I decided to test it using the following four conditions on the servers:

  1. One slave worker, FC as default
  2. One slave worker, FC set to 25
  3. Eight slave workers, FC as default
  4. Eight slave workers, FC set to 25
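Combining the four workloads with the four server conditions gives sixteen runs. The snippet below simply enumerates that matrix; the dictionary keys are names invented for the example and are not options of the test application.

```python
from itertools import product

WORKLOADS = [
    {"threads": 1, "inserts_per_transaction": 1},
    {"threads": 1, "inserts_per_transaction": 50},
    {"threads": 8, "inserts_per_transaction": 1},
    {"threads": 8, "inserts_per_transaction": 50},
]

SERVER_CONDITIONS = [
    {"slave_workers": 1, "flow_control": "default"},
    {"slave_workers": 1, "flow_control": 25},
    {"slave_workers": 8, "flow_control": "default"},
    {"slave_workers": 8, "flow_control": 25},
]


def build_matrix():
    """Return one merged dictionary per (workload, server condition) pair."""
    return [{**workload, **condition}
            for workload, condition in product(WORKLOADS, SERVER_CONDITIONS)]


print(len(build_matrix()))
```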

Again nothing weird or strange from my point of view. I used four nodes:

  1. Gr1 Writer
  2. Gr2 Reader
  3. Gr3 Reader minimal latency (~10ms)
  4. Gr4 Reader minimal latency (~10ms)

Finally, I had to be sure I measured the lag in a way that allowed me to reference it consistently on all nodes. 

I think we can safely say that the incoming GTID (Received_transaction_set from replication_connection_status) is definitely the last change applied to the master that the slave node knows about. More recent changes could have occurred, but network delay can prevent them from being “received.” The other point of reference is GTID_EXECUTED, which refers to the latest GTID processed on the node itself.

The closest query that can track the distance will be:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-2),':',1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-2),':',1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

Or in the case of a single worker:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

The result will be something like this:

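+---------------+
| last_executed |
+---------------+
| 23607         |
+---------------+
+---------------+
| last_received |
+---------------+
| 23607         |
+---------------+
+----------+
| real_lag |
+----------+
|        0 |
+----------+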
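The same calculation can be done on the client side once the two GTID sets have been fetched. This sketch mirrors the single-worker query (last interval, upper bound) and assumes each set contains a single UUID with no gaps that matter; the function names and the abbreviated "uuid" prefix in the usage lines are assumptions of the example.

```python
def last_transaction_number(gtid_set):
    """Return the upper bound of the last interval in a single-UUID GTID set."""
    last_interval = gtid_set.strip().split(":")[-1]
    return int(last_interval.split("-")[-1])


def real_lag(received_set, executed_set):
    """Number of transactions received by the node but not yet executed on it."""
    return last_transaction_number(received_set) - last_transaction_number(executed_set)


print(real_lag("uuid:1-23607", "uuid:1-23607"))
print(real_lag("uuid:1-23650", "uuid:1-23607"))
```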

The whole set of tests can be found here, with all the commands you need to run the application (you can find it here) and replicate the tests. I will focus on the results (otherwise this blog post would be far too long), but I invite you to see the details.

The Results

Efficiency on Writer by Execution Time and Rows/Sec

Using the raw data from the tests (Excel spreadsheet available here), I was interested in identifying if and how the Writer is affected by the use of Group Replication and flow control.

Reviewing the graph, we can see that the Writer has a linear increase in the execution time (when using default flow control) that matches the increase in the load. Nothing there is concerning, and all-in-all we see what is expected if the load is light. The volume of rows at the end justifies the execution time.

It’s a different scenario if we use flow control. The execution time increases significantly in both cases (single worker/multiple workers). In the worst case (eight threads, 50 inserts batch) it becomes four times higher than the same load without flow control.

What happens to the inserted rows? In the application, I traced the rows inserted/sec. It is easy to see what is going on there:

We can see that the Writer with flow control activated inserts less than a third of the rows it processes without flow control. 

We can definitely say that flow control has a significant impact on the Writer performance. To clarify, let’s look at this graph:

Without flow control, the Writer processes a high volume of rows in a limited amount of time (results from the test of eight workers, eight threads, 50 insert batch). With flow control, the situation changes drastically. The Writer takes a long time processing a significantly smaller number of rows/sec. In short, performance drops significantly.

But hey, I’m OK with that if it means having a consistent data-set across all nodes. In the end, Percona XtraDB Cluster and similar solutions pay a significant performance price to match the data-centric principle.

Let’s see what happens on the other nodes.

Entries Lag

Well, this scenario is not so good:

When NOT using flow control, the nodes lag behind the writer significantly. Remember that by default flow control in Group Replication is set to 25000 entries (I mean 25K of entries!!!).

What happens is that as soon as I put some salt (see load) on the Writer, the slave nodes start to lag. When using the default single worker, that will have a significant impact. While using multiple workers, we see that the lag happens mainly on the node(s) with minimal (10ms) network latency. The sad thing is that it is not really going down with respect to the single thread worker, indicating that the simple minimal latency of 10ms is enough to affect replication.

Time to activate the flow control and have no lag:

Unfortunately, this is not the case. As we can see, the lag of single worker remains high for Gr2 (154 entries). While using multiple workers, the Gr3/4 nodes can perform much better, with significantly less lag (but still high at ~1k entries).

It is important to remember that at this time the Writer is processing one-third or less of the rows it is normally able to. It is also important to note that I set the entry limit in flow control to 25, and the Gr3 (and Gr4) nodes are still lagging more than 1K entries behind.

To clarify, let’s check the two graphs below:

Using the Writer (Master) as a baseline in entry #N, without flow control, the nodes (slaves) using Group Replication start to significantly lag behind the writer (even with a light load).

The distance in this PoC ranged from very minimal (with 58 entries), up to much higher loads (3849 entries):

Using flow control, the Writer (Master) diverges less, as expected. But even though it has a significant drop in performance (to one-third or less), the nodes still lag. The worst case is up to 1363 entries.

I need to underline here that we have no further way (that I am aware of, anyway) to tune the lag and prevent it from happening.

This means an application cannot transparently split writes/reads and expect consistency. The gap is too high.

A Graph That Tells Us a Story

I used Percona Monitoring and Management (PMM) to keep an eye on the nodes while doing the tests. One of the graphs really showed me that Group Replication still has some “limits” as the replication mechanism for a cluster:

This graph shows the MySQL queries executed on all the four nodes, in the testing using 8-50 threads-batch and flow control. 

As you can see, the Gr1 (Writer) is the first one to take off, followed by Gr2. Nodes Gr3 and Gr4 require a bit more, given the binlog transmission (and 10ms delay). Once the data is there, they match (inconsistently) the Gr2 node. This is an effect of flow control asking the Master to slow down. But as previously seen, the nodes will never match the Writer. When the load test is over, the nodes continue to process the queue for an additional ~130 seconds. Considering that the whole load takes 420 seconds on the Writer, this means that one-third of the total time on the Writer is spent syncing the slaves AFTERWARDS.

The above graph shows the same test without flow control. It is interesting to see how the Writer moved above 300 queries/sec, while Gr2 stayed around 200 and Gr3/4 far below. The Writer was able to process the whole load in ~120 seconds instead of 420, while Gr3/4 continue to process the load for an additional ~360 seconds.

This means that without flow control set, the nodes lag around 360 seconds behind the Master. With flow control set to 25, they lag 130 seconds.
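To put the two runs side by side, here is a small worked calculation using only the approximate timings quoted above (420s + 130s with flow control set to 25, 120s + 360s without). The function names are mine and are only there for the illustration.

```python
def catch_up_share(writer_seconds: float, extra_seconds: float) -> float:
    """Time the nodes keep applying after the load ends, relative to the Writer's run time."""
    return extra_seconds / writer_seconds


def total_seconds(writer_seconds: float, extra_seconds: float) -> float:
    """Time from the start of the load until the slowest nodes have applied everything."""
    return writer_seconds + extra_seconds


# Approximate figures read from the graphs described in this post.
print(f"flow control = 25: share {catch_up_share(420, 130):.2f}, total {total_seconds(420, 130):.0f}s")
print(f"no flow control:   share {catch_up_share(120, 360):.2f}, total {total_seconds(120, 360):.0f}s")
# flow control = 25: share 0.31, total 550s
# no flow control:   share 3.00, total 480s
```

In other words, with these numbers flow control shrinks the gap between the Writer and the nodes, but it does not shorten the time the slowest nodes need to apply the whole load.
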

This is a significant gap.


Going back to the reason why I started this PoC, it looks like my application(s) are not a good fit for Group Replication, given that I have set up Percona XtraDB Cluster to scale out the reads and efficiently move my writer to another node when I need to.

Group Replication is still based on asynchronous replication (as my colleague Kenny said). It makes sense in many other cases, but it doesn’t compare to solutions based on virtually synchronous replication. It still requires a lot of refinement.

On the other hand, for applications that can afford to have a significant gap between writers and readers it is probably fine. But … doesn’t standard replication already cover that? 

Reviewing the Oracle documentation, I can see why Group Replication as part of the InnoDB cluster could help improve high availability when compared to standard replication.

But I also think it is important to understand that Group Replication (and derived solutions like InnoDB cluster) is not comparable to, or a replacement for, data-centric solutions such as Percona XtraDB Cluster. At least up to now.

Good MySQL to everyone.


Where Do I Put ProxySQL?


In this blog post, we’ll look at how to deploy ProxySQL.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. It was created for DBAs by René Cannaò, as a means of solving complex replication topology issues. When bringing up ProxySQL with my clients, I always get questions about where it fits into the architecture. This post should clarify that.

Before continuing, you might want to know why you should use this software. The features that are of interest include:

  • MySQL firewall
  • Connection pooling
  • Shard lookup and automated routing
  • Ability to read/write split
  • Automatically switch to another master in case of active master failure
  • Query cache
  • Performance metrics
  • Other neat features!

Initial Configuration

In general, you install it on nodes that do not have a running MySQL database. You manage it via the MySQL command line on another port, usually 6032. Once it is started the configuration in /etc is not used, and you do everything within the CLI. The backend database is actually SQLite, and the db file is stored in /var/lib/proxysql.
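As a rough sketch of what "doing everything within the CLI" looks like, the helper below builds the admin statements that register one backend server and persist it. The helper itself, the hostgroup number and the hostname are hypothetical; the statements use the mysql_servers table and the LOAD/SAVE commands of the ProxySQL admin interface. You would paste the output into a session connected to the admin port.

```python
from typing import List


def proxysql_add_server_sql(hostgroup_id: int, hostname: str, port: int = 3306) -> List[str]:
    """Build the admin statements that add one backend and make the change active and durable."""
    if "'" in hostname:
        raise ValueError("hostname must not contain quotes")
    return [
        # Change the in-memory configuration first.
        "INSERT INTO mysql_servers (hostgroup_id, hostname, port) "
        f"VALUES ({int(hostgroup_id)}, '{hostname}', {int(port)});",
        # Apply it to the running proxy.
        "LOAD MYSQL SERVERS TO RUNTIME;",
        # Persist it to the SQLite file so it survives a restart.
        "SAVE MYSQL SERVERS TO DISK;",
    ]


# Hypothetical backend: hostgroup 10 on db1.example.com.
for statement in proxysql_add_server_sql(10, "db1.example.com"):
    print(statement)
```
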

There are many guides out there on initializing and installing it, so I won’t cover those details here. It can be as simple as:

apt-get install proxysql

ProxySQL Architecture

While most people first think of installing ProxySQL on a standalone node between the application and database, this has the potential to affect query performance due to the additional latency from network hops.



To have minimal impact on performance (and avoid the additional network hop), many recommend installing ProxySQL on the application servers. The application then connects to ProxySQL (acting as a MySQL server) on localhost, using a Unix domain socket and avoiding extra latency. ProxySQL then uses its routing rules to reach out and talk to the actual MySQL servers with its own connection pooling. The application doesn’t have any idea what happens beyond its connection to ProxySQL.


Reducing Your Network Attack Surface

Another consideration is reducing your network attack surface. This means attempting to control all of the possible vulnerabilities in your network’s hardware and software that are accessible to unauthenticated users.

Percona generally suggests that you put a ProxySQL instance on each application host, like in the second image above. This suggestion is certainly valid for reducing latency in your database environment (by limiting network jumps). But while this is good for performance, it can be bad for security.

Every instance must be able to talk to:

  • Every master
  • Every slave

As you can imagine, this is a security nightmare. With every instance, you have x many more connections spanning your network. That’s x many more connections an attacker might exploit.

Instead, it can be better to have one or more ProxySQL instances that are between your application and MySQL servers (like the first image above). This provides a reasonable DMZ-type setup that prevents opening too many connections across the network.
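A quick back-of-the-envelope count shows the difference between the two layouts. The sketch below counts host-to-host paths that must be open toward the MySQL servers; the host counts are invented for the example, and it deliberately ignores how many TCP connections each path carries.

```python
def db_paths_proxy_on_app_hosts(app_hosts: int, db_nodes: int) -> int:
    """Each application host runs its own ProxySQL, which must reach every MySQL node."""
    return app_hosts * db_nodes


def db_paths_proxy_layer(proxies: int, db_nodes: int) -> int:
    """Only the dedicated ProxySQL hosts need to reach the MySQL nodes."""
    return proxies * db_nodes


# Hypothetical environment: 20 application hosts, 1 master + 3 slaves, 2 proxies.
app_hosts, db_nodes, proxies = 20, 4, 2
print(db_paths_proxy_on_app_hosts(app_hosts, db_nodes))  # 80
print(db_paths_proxy_layer(proxies, db_nodes))           # 8
```

With a proxy layer, the application hosts only need to reach the ProxySQL hosts, so the firewall rules in front of the database tier stay small as the application tier grows.
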

That said, both architectures are valid production configurations – depending on your requirements.


Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication High Availability Webinar: Q & A

Thank you for attending the Wednesday, June 21, 2017 high availability webinar titled Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication. In this blog, I will provide answers to the Q & A for that webinar.

You can find the slides and a recording of the webinar here.

Is there a minimum MySQL server version for Group Replication?

MySQL Group Replication has been GA since MySQL Community 5.7.17. This is the lowest version that you should use for the Group Replication feature. Otherwise, you are using a beta version.

Since 5.7.17 was the GA release, it’s strongly recommended you use the latest 5.7 minor release. Bugs get fixed and features added in each of the minor releases (as can be seen in the Limitations section in the slide deck).

In MySQL 5.6 and earlier versions, Group Replication is not supported. Note that Percona Server for MySQL 5.7.17 and beyond also ships with Group Replication.

Can I use Percona XtraDB Cluster with MariaDB v10.2? or must I use Percona Server for MySQL?

Percona XtraDB Cluster is Percona Server for MySQL and Percona XtraBackup with the modified Galera library. You cannot run Percona XtraDB Cluster on MariaDB.

However, as Percona XtraDB Cluster is open source, it is possible that MariaDB/Codership implements our modifications into their codebase.

If Percona XtraDB Cluster only allows InnoDB tables, how do we typically deal with applications that need to use MyISAM tables?

You cannot use MyISAM with Percona XtraDB Cluster, Galera or Group Replication. However, there is experimental MyISAM support in Galera/Percona XtraDB Cluster. But we strongly recommend that you don’t use this in production. It effectively executes all statements in Total Order Isolation, which results in bad performance.

What is a typical business use case for the Group Replication? I specifically like the writes order feature.

Typical use cases are:

  • Environments with strict durability requirements
  • Write to multiple nodes simultaneously while keeping data consistent
  • Reducing failover time
  • Using other nodes for read-scaling, where reading stale data is more difficult for the application (as opposed to standard asynchronous replication)

The use cases for Galera and Percona XtraDB Cluster are similar.

Where do you run ProxySQL, on a separate server? We are using HAProxy.

You can deploy ProxySQL in many different ways. One common method of installation is to run ProxySQL on a separate layer of servers (ensuring there is failover on this layer). Another commonly used method is to run a ProxySQL daemon on every application server.

Do you support KVM?

Yes, there are no limitations on virtualization solutions.

Can you give some examples of an “arbitrator”?

Some useful links:

What does Percona XtraDB add to make it more performant than InnoDB?

The scalability and performance improvement of Percona XtraDB are listed on the Percona Server for MySQL documentation page:

How scalable is Percona XtraDB Cluster storage wise? Do we have any limitations?

Storage happens through the storage engine (which is InnoDB). Percona XtraDB Cluster does not have any different limitations than Percona Server for MySQL or MySQL.

However, we need to also consider the practical side of things: the larger the cluster gets, the longer certain operations take. For example, when adding a new node to the cluster another node must be the donor and provide all the data. This will take substantially longer with larger datasets. Certain operational aspects might therefore become more complex.

Is there any development to add multiple nodes simultaneously?

No, at the moment only one node can join the cluster at a time. Other nodes automatically wait until it is finished before joining.

Why does Galera say we cannot use READ COMMITTED isolation for multimaster mode, even though we can start the cluster with READ-COMMITTED?

You can use READ-COMMITTED as transaction isolation level. The limitation is that you cannot use SERIALIZABLE:

Galera Cluster and MariaDB currently do not prevent a user from using this transaction isolation level. Percona XtraDB Cluster implemented the strict mode to prevent these operations:

MariaDB 10.2 fixed the check constraints issue, When will Percona fix this issue?

There are currently no plans to support CHECK constraints in Percona Server for MySQL (and therefore Percona XtraDB Cluster as well).

As Percona Server is effectively a fully backwards-compatible (but modified) MySQL Community Server, CHECK constraints are a feature that normally would be implemented in MySQL Community first.

Can you share your performance benchmark git repository (if you have one)?

We don’t have a performance benchmark in a git repository. You can get detailed information about this benchmark in this blog: Performance improvements in Percona XtraDB Cluster 5.7.17-29.20.

On your slide pointing to scalability charts, how many nodes did you run your test against?

We used a three-node cluster for this performance benchmark.

The product is using Master-Master replication. As such what do you mean when you talk about failover in such configuration?
Where do you maintain the cluster state?

All technologies automatically maintain the cluster state as you add and remove nodes.

What are the network/IP requirements for ProxySQL?

There are no specific requirements. More documentation about ProxySQL can be found here:


The MySQL High Availability Landscape in 2017 (The Elders)

In this blog, we’ll look at different MySQL high availability options.

The dynamic MySQL ecosystem is evolving rapidly, and so are the many technologies built around MySQL. This is especially true for the technologies involved with the high availability (HA) aspects of MySQL. When I joined Percona back in 2009, some of these HA technologies were very popular – but have since been almost forgotten. During the same interval, new technologies have emerged. In order to give some perspective to the reader, and hopefully help to make better choices, I’ll review the MySQL HA landscape as it is in 2017. This review will be in three parts. The first part (this post) will cover the technologies that have been around for a long time: the elders. The second part will focus on the technologies that are very popular today: the adults. Finally, the last part will try to extrapolate which technologies could become popular in the upcoming years: the babies.

Quick disclaimer, I am reporting on the technologies I see the most. There are likely many other solutions not covered here, but I can’t talk about technologies I have barely or never used. Apart from the RDS-related technologies, all the technologies covered are open-source. The target audience for this post are people relatively new to MySQL.

The Elders

Let’s define the technologies in the elders group. These are technologies that anyone involved with MySQL for the last ten years is sure to be aware of. I could have called this group the “classics”. I include the following technologies in this group:

  • Replication
  • Shared storage
  • NDB cluster

Let’s review these technologies in the following sections.


Replication

Simple replication topology


MySQL replication is very well known. It is one of the main features behind the wide adoption of MySQL. Replication gets used almost everywhere. The reasons for that are numerous:

  • Replication is simple to set up. There are tons of how-to guides and scripts available to add a slave to a MySQL server. With Amazon RDS, adding a slave is just a few clicks.
  • Slaves allow you to easily scale reads. The slaves are accessible and can be used for reads. This is the most common way of scaling up a MySQL database.
  • Slaves have little impact on the master. Apart from the added network traffic, the presence of slaves does not impact the master performance significantly.
  • It is well known. No surprises here.
  • Used for failover. Your master died, promote a slave and use it as your new master.
  • Used for backups. You don’t want to overload your master with the backups, run them off a slave.

Of course, replication also has some issues:

  • Replication can lag. Replication used to be single-threaded. That means a master with a concurrent load could easily outpace a slave. MySQL 5.6 and MariaDB 10.0 have introduced some parallelism to the slave. Newer versions have further improved to a point where today’s slaves are many times faster than they were.
  • Slaves can diverge. When you modify data on the master, the slave must perform the exact same update. That seems easy, but there are many ways an update can be non-deterministic with statement-based replication. Many issues have been fixed, and the introduction of row-based replication has been another big step forward. Still, if you write directly to a slave you are asking for trouble. There is a read_only setting, but if the MySQL user has the “SUPER” privilege it is just ignored. That’s why there is now the “super_read_only” setting. Tools like pt-table-checksum and pt-table-sync from Percona Toolkit exist to solve this problem.
  • Replication can impact the master. I wrote above that the presence of slaves does not affect the master, but logging changes are more problematic. The most common issue is the InnoDB table-level locking for auto_increment values with statement-based replication. Only one thread can insert new rows at a time. You can avoid this issue with row-based replication and a properly configured innodb_autoinc_lock_mode setting.
  • Data gets lost. Replication is asynchronous. That means the master will reply “done” after a commit statement even though the slaves have not received updates yet. Some transactions can get lost if the master crashes.

Although an old technology, a lot of work has been done on replication. It is miles away from the replication implementation of 5.0.x. Here’s a list, likely incomplete, of the evolution of replication:

  • Row based replication (since 5.1). The binary internal representation of the rows is sent instead of the SQL statements. This makes replication more robust against slave divergence.
  • Global transaction ID (since 5.6). Transactions are uniquely identified. Replication can be set up without knowing the binlog file and offset.
  • Checksum (since 5.6). Binlog events have checksum values to validate their integrity.
  • Semi-sync replication (since 5.5). An addition to the replication protocol to make the master aware of the reception of events by the slaves. This helps to avoid losing data when a master crashes.
  • Multi-source replication (since 5.7). Allows a slave to have more than one master.
  • Multi-threaded replication (since 5.6). Allows a slave to use multiple threads. This helps to limit the slave lag.
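The difference between asynchronous and semi-sync replication when a master crashes can be illustrated with a toy model. This is an idealized sketch, not a description of the server's implementation: it assumes a single slave, ignores semi-sync timeouts and fallbacks to asynchronous mode, and all names are made up.

```python
def acknowledged_but_lost(commits: int, received_by_slave: int, semi_sync: bool) -> int:
    """Transactions the client saw as committed that exist on no slave after a master crash.

    commits: transactions committed on the master before it crashed.
    received_by_slave: how many of them reached the slave.
    """
    received = min(commits, received_by_slave)
    if semi_sync:
        # Idealized semi-sync: a commit is only acknowledged once the slave has received it.
        acknowledged = received
    else:
        # Asynchronous: the master replies "done" without waiting for the slave.
        acknowledged = commits
    return acknowledged - received


# Hypothetical crash: 1000 commits on the master, 990 received by the slave.
print(acknowledged_but_lost(1000, 990, semi_sync=False))  # 10
print(acknowledged_but_lost(1000, 990, semi_sync=True))   # 0
```
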

Managing replication is a tedious job. The community has written many tools to manage replication:

  • MMM. An old Perl tool that used to be quite popular, but had many issues. Now rarely used.
  • MHA. The most popular tool to manage replication. It excels at reconfiguring replication without losing data, and does a decent job of handling failover. It is also simple. No wonder it is popular.
  • PRM. A Pacemaker-based solution developed to replace MMM. It’s quite good at failover, but not as good as MHA at reconfiguring replication. It’s also quite complex, thanks to Pacemaker. Not used much.
  • Orchestrator. The new cool tool. It can manage complex topologies and has a nice web-based interface to monitor and control the topology.


Shared Storage

Simple shared storage topology


Back when I was working for MySQL ten years ago, shared storage HA setups were very common. A shared storage HA cluster uses a single copy of the database files, shared between two servers. One server is active, the other one is passive. In order to be shared, the database files reside on a device that can be mounted by both servers. The device can be physical (like a SAN), or logical (like a Linux DRBD device). On top of that, you need a cluster manager (like Pacemaker) to handle the resources and failovers. This solution was very popular because it allows for failover without losing any transactions.

The main drawback of this setup is the need for an idle standby server. The standby server cannot have any other assigned duties since it must always be ready to take over the MySQL server. A shared storage solution is also obviously not resilient to file-level corruption (but that situation is exceptional). Finally, it doesn’t play well with a cloud-based environment.

Today, newly-deployed shared storage HA setups are rare. The only ones I encountered over the last year were either old implementations needing support, or new setups that deployed because of existing corporate technology stacks. That should tell you about the technology’s loss of popularity.

NDB Cluster

A simple NDB Cluster topology


An NDB Cluster is a distributed clustering solution that has been around for a long time. I personally started working with this technology back in 2008. An NDB Cluster has three types of nodes: SQL, management and data. A full HA cluster requires a minimum of four nodes.

An NDB Cluster is not a general purpose database due to its distributed nature. For suitable workloads, it is extraordinarily good. For unsuitable workloads, it is miserable. A suitable workload for an NDB Cluster contains high concurrency, with a high rate of small primary key oriented transactions. Reaching one million trx/s on an NDB Cluster is nothing exceptional.

At the other end of the spectrum, a poor workload for an NDB Cluster is a single-threaded report query on a star-like schema. I have seen some extreme cases where just the network time of a reporting query amounted to more than 20 minutes.

Although NDB Clusters have improved, and are still improving, their usage has been pushed toward niche-type applications. Overall, the technology is losing ground and is now mostly used for Telcos and online gaming applications.


Upcoming HA Webinar Wed 6/21: Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication

Join Percona’s MySQL Practice Manager Kenny Gryp and QA Engineer, Ramesh Sivaraman as they present a high availability webinar around Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication on Wednesday, June 21, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

What are the implementation differences between Percona XtraDB Cluster 5.7, Galera Cluster 5.7 and MySQL Group Replication?

  • How do they work?
  • How do they behave differently?
  • Do these methods have any major issues?

This webinar will describe the differences and shed some light on how QA is done for each of the different technologies.

Register for the webinar here.

Ramesh Sivaraman, QA Engineer

Ramesh joined the Percona QA Team in March 2014. He has almost six years of experience in database administration and, before joining Percona, was giving MySQL database support to various service and product based internet companies. Ramesh’s professional interests include writing shell/Perl scripts to automate routine tasks and new technology. Ramesh lives in Kerala, the southern part of India, close to his family.

Kenny Gryp, MySQL Practice Manager

Kenny is currently MySQL Practice Manager at Percona.


Q & A: MySQL In the Cloud – Migration, Best Practices, High Availability, Scaling

In this blog, we will provide answers to the Q & A for the MySQL In the Cloud: Migration, Best Practices, High Availability, Scaling webinar.

First, we want to thank everybody for attending the June 7, 2017 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that we were unable to answer during the webinar:

How does Percona XtraDB cluster work with AWS for MySQL clustering?

Percona XtraDB Cluster works especially well in cloud environments, including Amazon EC2. Percona XtraDB Cluster requires only one network round trip per write transaction commit and keeps all reads local, which allows you to deploy high-performance multi-AZ and even multi-region clusters. The fact that each Percona XtraDB Cluster node contains all the data allows it to avoid reliance on the EBS storage. You can run Percona XtraDB Cluster on NVMe storage based i3 EC2 nodes to achieve high performance even with very IO-intensive workloads. Automatic provisioning and cluster self-healing allow you to easily scale the cluster. We have a simple tutorial on how to deploy Percona XtraDB Cluster on AWS; check it out here.

How do you approach master-master model? Are there enough reasons to use the model to implement multi-site scaling?

There are two distinct multi-master modes in existence. A synchronous Master-Master solution, like the one offered by Percona XtraDB Cluster (virtually synchronous to be exact), guarantees there are no data conflicts as you connect to the nodes located at different sites. The downside of this model is that writes can be expensive. As such, it works well in environments with low latency between the different sites, or when high latency for updates can be tolerated. Percona XtraDB Cluster is greatly optimized in that it requires only one network roundtrip to complete a commit transaction. This significantly reduces the added latency compared to many other solutions.
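To get a feel for what "one network round trip per commit" means across sites, here is a simple estimate of how many commits a single connection can issue back to back. It is a sketch under stated assumptions: the 1 ms local commit cost and the round-trip times are hypothetical, and commits from other connections proceed in parallel, so this is a per-connection ceiling rather than a cluster-wide one.

```python
def max_commits_per_second(rtt_ms: float, local_commit_ms: float = 1.0) -> float:
    """Ceiling for sequential commits on ONE connection when each commit waits one round trip."""
    return 1000.0 / (rtt_ms + local_commit_ms)


# Hypothetical round-trip times: same data center vs. two distant regions.
print(round(max_commits_per_second(1.0), 1))   # 500.0
print(round(max_commits_per_second(50.0), 1))  # 19.6
```

This is why the model suits sites with low latency between them, or workloads that can tolerate slower updates.
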

In contrast, asynchronous Master-Master means you can perform writes locally, without waiting on a network round trip. It comes with the downside of possible data conflicts. In MySQL, it can be implemented using MySQL Replication. MySQL Replication only detects conflicts at this point, however, and stops if it detects a conflict. It has no good built-in conflict resolution. Ensuring conflicts do not happen on the application level is hard and error prone, and only recommended in rare cases. Most applications out there do not use Active Master-Master, but rather design an architecture where each database replication set operates with only a single writable node.

Do the Percona tools work in the cloud, like in Amazon Aurora?

We try to make Percona software work in the cloud when it makes sense. For example, Percona Toolkit and Percona Monitoring and Management support Amazon RDS and Amazon Aurora. Percona XtraBackup does not, as it requires physical access to the database files (Amazon RDS and Aurora don’t provide that). Having said that, Amazon recently updated its Aurora migration documentation to include the use of XtraBackup. Amazon Aurora supports backups taken by Percona XtraBackup as a way to import data.

What is the fastest way to verify and validate backups created by XtraBackup for databases around 2-3TB?

In the big picture, you test backups by doing some sort of restore and validation. This can be done manually, but is much better if automated. There are three levels of such validation:

  • Basic Validation. Run --apply-log and ensure it completes successfully. Start the MySQL instance and run some basic queries to ensure it works. Often running some queries to see that recent data is present is a good idea.
  • Consistency Validation. Additionally, run CHECK TABLE on all tables to ensure there is no corruption. This way, table and index data structures are validated.
  • Full Validation. Restore the backup and connect the restored backup as a MySQL slave (possibly to one of the existing slaves). Let it catch up and then run pt-table-checksum to validate consistency and ensure that the data in backup matches what is in the source.
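If you automate this, the three levels can be expressed as a small plan that a scheduler turns into jobs. The sketch below only lists the steps as text; the function, the level names and the choice to make the levels cumulative are my assumptions for illustration, and each step still has to be mapped to the actual commands in your environment.

```python
from typing import List

LEVELS = ("basic", "consistency", "full")


def validation_steps(level: str) -> List[str]:
    """Return the ordered steps for a validation level (assumed cumulative here)."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    steps = [
        "prepare the backup (--apply-log) and check that it completes successfully",
        "start a MySQL instance on the restored data and run basic queries",
    ]
    if level in ("consistency", "full"):
        steps.append("run CHECK TABLE on all tables")
    if level == "full":
        steps.append("attach the restored instance as a slave and let it catch up")
        steps.append("run pt-table-checksum to compare it with the source")
    return steps


for number, step in enumerate(validation_steps("full"), start=1):
    print(f"{number}. {step}")
```
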

Running a checktable on databases on AWS IO optimized instances takes up to eight hours. Any other suggestions on how to replace checktable in validation?

Without knowing the table size, it is hard for me to assess whether eight hours is reasonable for your environment. However, generally speaking you should not run a Full Validation on every backup. Full Validation first and foremost validates the backup and restore pipeline. If you’re not seeing issues, doing it once per month is plenty. You want to do lighter checks on a daily and weekly basis. 

What approach would you recommend for a data warehouse needing about 80,000IOPS, currently on FusionIO bare metal? Which cloud solution would be my best bet?

This is a complicated question. To answer it properly requires more information. We need to know what type of operations your database performs. Working with a Percona Consultant to do an A&D for your environment would give you the best answer. In general though, EBS (even with a large number of provisioned IOPS) would not match FusionIO in IO request latency. I3 high IO instances with NVMe storage are a closer match. If budget is not a concern, you can look into X1 instances. These can have up to 2TB of memory and often allow getting all (or a large portion) of the database in memory for even higher performance.

Thanks for attending the MySQL In the Cloud: Migration, Best Practices, High Availability, Scaling webinar! Post any more MySQL in the cloud comments below.
