Aug 31, 2015

High-load clusters and desynchronized nodes on Percona XtraDB Cluster

There can be a lot of confusion, and a lack of planning, around nodes becoming desynchronized for various reasons in Percona XtraDB Cluster. This can happen in a few ways.

When I say “desynchronized” I mean a node that is permitted to build up a potentially large wsrep_local_recv_queue while some operation is happening.  For example, a node taking a backup would set wsrep_desync=ON during the backup and potentially fall behind replication by some amount.

Some of these operations may completely block Galera from applying transactions, while others may simply increase load on the server enough that it falls behind and applies at a reduced rate.

In all of these cases, flow control is NOT used while the node cannot apply transactions, but it MAY be used while the node is recovering from the operation.  For an example of this, see my last blog post about IST.

If a cluster is fairly busy, then the flow control that CAN kick in while a node catches up from one of these operations MAY be detrimental to performance.
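
If you want to watch this happening on a live cluster, a couple of quick status queries are enough (a minimal sketch; the node1/node3 prompts simply follow the example setup below):

node3> show global status like 'wsrep_local_recv_queue';
node1> show global status like 'wsrep_flow_control_paused';

wsrep_local_recv_queue shows how many writesets a node has received but not yet applied, and wsrep_flow_control_paused shows what fraction of time replication has been paused by flow control since the counter was last reset.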

Example setup

Let us take my typical 3-node cluster with a workload on node1.  We are taking a blocking backup of some kind on node3, so we execute the following steps:

  1. node3> set global wsrep_desync=ON;
  2. node3’s “backup” starts, beginning with FLUSH TABLES WITH READ LOCK;
  3. Galera applying is paused on node3 and the wsrep_local_recv_queue grows by some amount
  4. node3’s “backup” finishes, ending with UNLOCK TABLES;
  5. node3> set global wsrep_desync=OFF;
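
While the backup is running, it is easy to confirm what state node3 is in (again just a quick check against the same node3 used in the steps above):

node3> show global status like 'wsrep_local_state_comment';

With wsrep_desync=ON the node reports ‘Donor/Desynced’ rather than ‘Synced’, which matches what the myq_status output later in this post shows.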

During the backup

This covers up through step 3 above.  My node1 is unaffected by the backup on node3: I can see it averaging 5-6k writesets (transactions) per second, just as it did before we began:

[graph: node1 writesets applied per second]

node2 is also unaffected:

[graph: node2 writesets applied per second]

but node3 is not applying and its queue is building up:

[graph: node3 apply rate at zero while wsrep_local_recv_queue grows]

Unlock tables, still wsrep_desync=ON

Let’s examine briefly what happens when node3 is permitted to start applying, but wsrep_desync stays enabled:

[graph: node1 write throughput, essentially unchanged]

node1’s performance is pretty much the same because node3 is not using flow control yet. However, there is a problem:

[graph: node3 wsrep_local_recv_queue continuing to grow]

It’s hard to notice, but node3 is NOT catching up; instead, it is falling further behind!  We have potentially created a situation where node3 may never catch up.

The PXC nodes were close enough to the performance redline that node3 can only apply transactions about as fast as new ones come into node1 (and somewhat slower until it warms up a bit).
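
One rough way to confirm that a node is losing ground rather than gaining it (an illustrative check, not taken from the graphs above) is to sample the committed seqno on both nodes a minute or two apart:

node1> show global status like 'wsrep_last_committed';
node3> show global status like 'wsrep_last_committed';

If the gap between node1’s and node3’s values keeps growing from one sample to the next, the node is falling further behind, which is exactly the situation shown here.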

This represents a serious concern in PXC capacity planning:

Nodes need to be fast enough not only to handle the normal workload, but also to catch up after maintenance operations or failures cause them to fall behind.

Experienced MySQL DBAs will realize this isn’t all that different from master/slave replication.

Flow control as a way to recover

So here’s the trick: if we turn off wsrep_desync on node3 now, node3 will use flow control if and only if the incoming replication exceeds node3’s apply rate.  This gives node3 a good chance of catching up, but the tradeoff is reduced write throughput for the whole cluster.  Let’s see what this looks like in the context of all of our steps.  wsrep_desync is turned off at the peak of the replication queue size on node3, around 12:20 PM:

[graphs: node3 replication queue and cluster write throughput after wsrep_desync=OFF]

So at the moment node3 starts using flow control to keep from falling further behind, our write throughput (in this specific environment and workload) is reduced by approximately one third (YMMV).   The cluster will remain in this state until node3 catches up and returns to the ‘Synced’ state.  That catchup is still happening as I write this post, almost 4 hours after it started, and will likely take another hour or two to complete.
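
To tell when the catchup is actually done (a simple check, using the same status counters mentioned earlier), watch for node3 to report ‘Synced’ again and for its queue to drain back to roughly zero:

node3> show global status like 'wsrep_local_state_comment';
node3> show global status like 'wsrep_local_recv_queue';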

I can see a more real-time representation of this flow control by using myq_status on node1, summarizing every minute:

[root@node1 ~]# myq_status -i 1m wsrep
mycluster / node1 (idx: 1) / Galera 3.11(ra0189ab)
         Cluster  Node       Outbound      Inbound       FlowC     Conflct Gcache     Appl
    time P cnf  # stat laten msgs data que msgs data que pause snt lcf bfa   ist  idx  %ef
19:58:47 P   5  3 Sync 0.9ms 3128 2.0M   0   27 213b   0 25.4s   0   0   0 3003k  16k  62%
19:59:47 P   5  3 Sync 1.1ms 3200 2.1M   0   31 248b   0 18.8s   0   0   0 3003k  16k  62%
20:00:47 P   5  3 Sync 0.9ms 3378 2.2M  32   27 217b   0 26.0s   0   0   0 3003k  16k  62%
20:01:47 P   5  3 Sync 0.9ms 3662 2.4M  32   33 266b   0 18.9s   0   0   0 3003k  16k  62%
20:02:47 P   5  3 Sync 0.9ms 3340 2.2M  32   27 215b   0 27.2s   0   0   0 3003k  16k  62%
20:03:47 P   5  3 Sync 0.9ms 3193 2.1M   0   27 215b   0 25.6s   0   0   0 3003k  16k  62%
20:04:47 P   5  3 Sync 0.9ms 3009 1.9M  12   28 224b   0 22.8s   0   0   0 3003k  16k  62%
20:05:47 P   5  3 Sync 0.9ms 3437 2.2M   0   27 218b   0 23.9s   0   0   0 3003k  16k  62%
20:06:47 P   5  3 Sync 0.9ms 3319 2.1M   7   28 220b   0 24.2s   0   0   0 3003k  16k  62%
20:07:47 P   5  3 Sync 1.0ms 3388 2.2M  16   31 251b   0 22.6s   0   0   0 3003k  16k  62%
20:08:47 P   5  3 Sync 1.1ms 3695 2.4M  19   39 312b   0 13.9s   0   0   0 3003k  16k  62%
20:09:47 P   5  3 Sync 0.9ms 3293 2.1M   0   26 211b   0 26.2s   0   0   0 3003k  16k  62%

This reports around 20-25 seconds of flow control pause every minute, which is consistent with the roughly one-third performance reduction we see in the graphs above.

Watching node3 the same way proves it is sending the flow control (FlowC snt):

mycluster / node3 (idx: 2) / Galera 3.11(ra0189ab)
         Cluster  Node       Outbound      Inbound       FlowC     Conflct Gcache     Appl
    time P cnf  # stat laten msgs data que msgs data que pause snt lcf bfa   ist  idx  %ef
17:38:09 P   5  3 Dono 0.8ms    0   0b   0 4434 2.8M 16m 25.2s  31   0   0 18634  16k  80%
17:39:09 P   5  3 Dono 1.3ms    0   0b   1 5040 3.2M 16m 22.1s  29   0   0 37497  16k  80%
17:40:09 P   5  3 Dono 1.4ms    0   0b   0 4506 2.9M 16m 21.0s  31   0   0 16674  16k  80%
17:41:09 P   5  3 Dono 0.9ms    0   0b   0 5274 3.4M 16m 16.4s  27   0   0 22134  16k  80%
17:42:09 P   5  3 Dono 0.9ms    0   0b   0 4826 3.1M 16m 19.8s  26   0   0 16386  16k  80%
17:43:09 P   5  3 Jned 0.9ms    0   0b   0 4957 3.2M 16m 18.7s  28   0   0 83677  16k  80%
17:44:09 P   5  3 Jned 0.9ms    0   0b   0 3693 2.4M 16m 27.2s  30   0   0  131k  16k  80%
17:45:09 P   5  3 Jned 0.9ms    0   0b   0 4151 2.7M 16m 26.3s  34   0   0  185k  16k  80%
17:46:09 P   5  3 Jned 1.5ms    0   0b   0 4420 2.8M 16m 25.0s  30   0   0  245k  16k  80%
17:47:09 P   5  3 Jned 1.3ms    0   0b   1 4806 3.1M 16m 21.0s  27   0   0  310k  16k  80%

There are a lot of flow control messages (around 30) per minute.  In other words, flow control here is many brief ON/OFF toggles that delay writes for short periods, rather than a single steady “you can’t write” pause lasting 20 seconds.

It also, interestingly, spends a long time in the Donor/Desynced state (even though wsrep_desync was turned OFF hours before) and then moves to the Joined state (which has the same meaning as during an IST).
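
If this kind of frequent toggling is painful for your workload, the flow control threshold itself is tunable. As a hedged example (the value 500 is purely illustrative; the right number depends on your environment), raising gcs.fc_limit on the lagging node lets its receive queue grow larger before it sends pause messages, trading a longer catch-up window for fewer interruptions:

node3> set global wsrep_provider_options='gcs.fc_limit=500';

Note that this only changes when flow control engages; it does not make node3 apply writesets any faster.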

Does it matter?

As always, it depends.

If these are web requests and suddenly the database can only handle ~66% of the traffic, that’s likely a problem, but maybe it just slows down the website somewhat.  I want to emphasize that WRITES are what is affected here.  Reads on any and all nodes should be normal (though you probably don’t want to read from node3 since it is so far behind).

If this were some queue processing that had reduced throughput, I’d expect it to possibly catch up later.

This can only be answered for your application, but the takeaways for me are:

  • Don’t underestimate your capacity requirements
  • Being at the redline normally means you are well past the redline for abnormal events.
  • Plan for maintenance and failure recoveries
  • Where possible, build queuing into your workflows so diminished throughput in your architecture doesn’t generate failures.

Happy clustering!

Graphs in this post courtesy of VividCortex.


Oct 30, 2014

Get a handle on your HA at Percona Live London 2014

From left: Liz van Dijk, Frédéric Descamps and Kenny Gryp

If you’re following this blog, it’s quite likely you’re already aware of the Percona Live London 2014 conference coming up in just a few days. Just in case, though (you know, if you’re still looking for an excuse to sign up), I wanted to put a spotlight on the tutorial to be delivered by my esteemed colleagues Frédéric Descamps (@lefred) and Kenny Gryp (@gryp), and myself.

For the past two years at Percona, we’ve been spending a substantial amount of time working with customers taking their first steps toward creating highly available MySQL environments built on Galera. Percona XtraDB Cluster allows you to get it up and running very fast, but as any weathered “HA” DBA will tell you, building the cluster is only the beginning. (Percona XtraDB Cluster is an open source (free) high-availability and high-scalability solution for MySQL clustering.)

Any cluster technology is likely to introduce a great amount of complexity to your environment, and in our tutorial we want to show you not only how to get started, but also how to avoid many of the operational pitfalls we’ve encountered. Our tutorial, Percona XtraDB Cluster in a nutshell, will be taking place on Monday 3 November and is a full-day (6 hours) session, with an intense hands-on approach.

We’ll be covering a great deal of practical topics, such as:

  • Things to keep in mind when migrating an existing environment over to PXC
  • How to manage and maintain the cluster, keeping it in good shape
  • Load balancing requests across the cluster
  • Considerations for deploying PXC in the cloud

Planning on attending? Be sure to come prepared! Given the hands-on approach of the tutorial, make sure you bring your laptop with enough disk space (~20GB) and processing power to run at least 4 small VirtualBox VMs.

We look forward to seeing you there!


Feb 18, 2013

Percona MySQL University comes to Toronto on March 22

Percona CEO Peter Zaitsev leads a track at the inaugural Percona MySQL University event in Raleigh, N.C. on Jan. 29, 2013.

Following our events in Raleigh, Montevideo, and Buenos Aires, Percona MySQL University comes to Toronto on March 22nd.

This is going to be our densest event yet, absolutely packed with information. Even though we have just one track, we have 12 talks and 11 speakers. We had a unique opportunity this time because Percona’s Consulting, Support, Remote DBA, and Training teams are having an internal meeting at the start of the week, so many speakers were available.

Specifically, MySQL high availability, replication, and clustering are getting a lot of coverage.

Special thanks to FreshBooks, who kindly agreed to host this event at their office.

The event is FREE but space is limited, so reserve your spot by registering now!

More info
What is Percona MySQL University? It’s a series of one-day, free events designed to educate and inform developers and system architects on the latest and greatest MySQL products, services and technologies. The practical knowledge you’ll receive will help you be more successful in tackling your own MySQL challenges. Percona MySQL University is an opportunity to learn from, and connect with, some of the world’s top experts in MySQL performance and scaling. It’s also a fantastic networking opportunity among your MySQL peers.


Jan 24, 2013

Percona MySQL University in Montevideo and Buenos Aires

Following our Percona MySQL University event in Raleigh, NC, Percona MySQL University comes to South America! We’ll be holding full-day FREE MySQL technical educational events in Montevideo on February 5th, 2013 and in Buenos Aires on February 7th.

I’m very excited to bring these events to MySQL Community in Uruguay and Argentina. This is my first trip to South America and it looks like it is going to be a lot of fun!

With Percona MySQL University events, we focus on MySQL education for a broad group of users. We’ve specially prepared talks that will be interesting for everyone from people just starting with MySQL to experienced MySQL developers and DBAs. So whether you’re a college student or an experienced MySQL professional, you should find something of value.

Two very exciting topics we’re going to cover are MySQL 5.6, with “MySQL 5.6: Advantages in a nutshell,” and the great new replication and clustering solution for MySQL based on Galera technology, with “High availability and clustering with Percona XtraDB Cluster.” Some of the talks will be delivered in Spanish.

The event will also provide great networking opportunities, and I hope you will make many new friends in the MySQL space by attending.

Finally, let me thank ORT University and Gallito for supporting Percona MySQL University in Montevideo.

Space is limited, so do not delay: register now for Montevideo or Buenos Aires!

The post Percona MySQL University in Montevideo and Buenos Aires appeared first on MySQL Performance Blog.
