MaxScale: A new tool to solve your MySQL scalability problems

Ever since MySQL replication has existed, people have dreamed of a good solution to automatically split read from write operations, sending the writes to the MySQL master and load balancing the reads over a set of MySQL slaves. While if at first it seems easy to solve, the reality is far more complex.

First, the tool needs to make sure it parses and analyses correctly all the forms of SQL MySQL supports in order to sort writes from reads, something that is not as easy as it seems. Second, it needs to take into account if a session is in a transaction or not.

While in a transaction, the default transaction isolation level in InnoDB, Repeatable-read, and the MVCC framework insure that you’ll get a consistent view for the duration of the transaction. That means all statements executed inside a transaction must run on the master but, when the transaction commits or rollbacks, the following select statements on the session can be again load balanced to the slaves, if the session is in autocommit mode of course.

Then, what do you do with sessions that set variables? Do you restrict those sessions to the master or you replay them to the slave? If you replay the set variable commands, you need to associate the client connection to a set of MySQL backend connections, made of at least a master and a slave. What about temporary objects like with “create temporary table…”? How do you deal when a slave lags behind or what if worse, replication is broken? Those are just a few of the challenges you face when you want to build a tool to perform read/write splitting.

Over the last few years, a few products have tried to tackle the read/write split challenge. The MySQL_proxy was the first attempt I am aware of at solving this problem but it ended up with many limitations. ScaleARC does a much better job and is very usable but it stills has some limitations. The latest contender is MaxScale from MariaDB and this post is a road story of my first implementation of MaxScale for a customer.

Let me first introduce what is MaxScale exactly. MaxScale is an open source project, developed by MariaDB, that aims to be a modular proxy for MySQL. Most of the functionality in MaxScale is implemented as modules, which includes for example, modules for the MySQL protocol, client side and server side.

Other families of available modules are routers, monitors and filters. Routers are used to determine where to send a query, Read/Write splitting is accomplished by the readwritesplit router. The readwritesplit router uses an embedded MySQL server to parse the queries… quite clever and hard to beat in term of query parsing.

There are other routers available, the readconnrouter is basically a round-robin load balancer with optional weights, the schemarouter is a way to shard your data by schema and the binlog router is useful to manage a large number of slaves (have a look at Booking.com’s Jean-François Gagné’s talk at PLMCE15 to see how it can be used).

Monitors are modules that maintain information about the backend MySQL servers. There are monitors for a replicating setup, for Galera and for NDB cluster. Finally, the filters are modules that can be inserted in the software stack to manipulate the queries and the resultsets. All those modules have well defined APIs and thus, writing a custom module is rather easy, even for a non-developer like me, basic C skills are needed though. All event handling in MaxScale uses epoll and it supports multiple threads.

Over the last few months I worked with a customer having a challenging problem. On a PXC cluster, they have more than 30k queries/s and because of their write pattern and to avoid certification issues, they want to have the possibility to write to a single node and to load balance the reads. The application is not able to do the Read/Write splitting so, without a tool to do the splitting, only one node can be used for all the traffic. Of course, to make things easy, they use a lot of Java code that set tons of sessions variables. Furthermore, for ISO 27001 compliance, they want to be able to log all the queries for security analysis (and also for performance analysis, why not?). So, high query rate, Read/Write splitting and full query logging, like I said a challenging problem.

We experimented with a few solutions. One was a hardware load balancer that failed miserably – the implementation was just too simple, using only regular expressions. Another solution we tried was ScaleArc but it needed many rules to whitelist the set session variables and to repeat them to multiple servers. ScaleArc could have done the job but all the rules increases the CPU load and the cost is per CPU. The queries could have been sent to rsyslog and aggregated for analysis.

Finally, the HA implementation is rather minimalist and we had some issues with it. Then, we tried MaxScale. At the time, it was not GA and was (is still) young. Nevertheless, I wrote a query logging filter module to send all the queries to a Kafka cluster and we gave it a try. Kafka is extremely well suited to record a large flow of queries like that. In fact, at 30k qps, the 3 Kafka nodes are barely moving with cpu under 5% of one core. Although we encountered some issues, remember MaxScale is very young, it appeared to be the solution with the best potential and so we moved forward.

The folks at MariaDB behind MaxScale have been very responsive to the problems we encountered and we finally got to a very usable point and the test in the pilot environment was successful. The solution is now been deployed in the staging environment and if all goes well, it will be in production soon. The following figure is simplified view of the internals of MaxScale as configured for the customer:

MaxScale internals

The blocks in the figure are nearly all defined in the configuration file. We define a TCP listener using the MySQL protocol (client side) which is linked with a router, either the readwritesplit router or the readconn router.

The first step when routing a query is to assign the backends. This is where the read/write splitting decision is made. Also, as part of the steps required to route a query, 2 filters are called, regexp (optional) and Genlog. The regexp filter may be used to hot patch a query and the Genlog filter is the logging filter I wrote for them. The Genlog filter will send a json string containing about what can be found in the MySQL general query log plus the execution time.

Authentication attempts are also logged but the process is not illustrated in the figure. A key point to note, the authentication information is cached by MaxScale and is refreshed upon authentication failure, the refresh process is throttled to avoid overloading the backend servers. The servers are continuously monitored, the interval is adjustable, and the server status are used when the decision to assign a backend for a query is done.

In term of HA, I wrote a simple Pacemaker resource agent for MaxScale that does a few fancy things like load balancing with IPTables (I’ll talk about that in future post). With Pacemaker, we have a full fledge HA solution with quorum and fencing on which we can rely.

Performance wise, it is very good – a single core in a virtual environment was able to read/write split and log to Kafka about 10k queries per second. Although MaxScale supports multiple threads, we are still using a single thread per process, simply because it yields a slightly higher throughput and the custom Pacemaker agent deals with the use of a clone set of MaxScale instances. Remember we started early using MaxScale and the beta versions were not dealing gracefully with threads so we built around multiple single threaded instances.

So, since a conclusion is needed, MaxScale has proven to be a very useful and flexible tool that allows to elaborate solutions to problems that were very hard to tackle before. In particular, if you need to perform read/write splitting, then, try MaxScale, it is best solution for that purpose I have found so far. Keep in touch, I’ll surely write other posts about MaxScale in the near future.

The post MaxScale: A new tool to solve your MySQL scalability problems appeared first on MySQL Performance Blog.


Percona University: Back to school Feb. 12 in Raleigh, N.C.

Percona CEO Peter Zaitsev leads a track at the inaugural Percona University event in Raleigh, N.C. on Jan. 29, 2013.

Percona CEO Peter Zaitsev leads a track at the inaugural Percona University event in Raleigh, N.C. on Jan. 29, 2013.

About two years ago we held our first-ever Percona University event in Raleigh, N.C. It was a great success with high attendance and very positive feedback which led us to organize a number of similar educational events in different locations around the world.

And next month we’ll be back where it all started. On February 12, Percona University comes to Raleigh – and this time the full-day educational event will be much more cool. What have we changed? Take a look at the agenda.

First – this is no longer just a MySQL-focused event. While 10 years ago MySQL was the default, dominating choice for modern companies looking to store and process data effectively – this is no longer the case. And as such the event’s theme is “Smart Data.” In addition to MySQL, Percona and MariaDB technologies (which you would expect to be covered), we have talks about Hadoop, MongoDB, Cassandra, Redis, Kafka, SQLLite.

However the “core” data-store technologies is not the only thing successful data architects should know – one should also be well-versed in the modern approaches to the infrastructure and general data management. This is why we also have talks about Ansible and OpenStack, DBaaS and PaaS as well as a number of more talks about big-picture topics around architecture and technology management.

Second – this is our first multi-track Percona University event – we had so many great speakers interested in speaking that we could not fit them all into one track, so we have two tracks now with 25 sessions which makes that quite an educational experience!

Third – while we’re committed to having those events be very affordable, we decided to charge $10 per attendee. The reason for this is to encourage people to register who actually plan on attending – when hosting free events we found out that way too many registered and never showed up, which was causing the venues to rapidly fill past capacity and forcing us to turn away those who could actually be there. It was also causing us to order more food than needed, causing waste. We trust $10 will not prevent you from attending, but if it does cause hardship, just drop me a note and I’ll give you a free pass.

A few other things you need to know:

This is very much a technically focused event. I have encouraged all speakers to make it about technology rather than sales pitches or marketing presentations.

This is low-key educational event. Do not expect it to be very fancy. If you’re looking for the great conference experience consider attending the Percona Live MySQL Conference and Expo this April.

Although it’s a full-day event, you can come for just part of the day. We recognize many of you will not be able to take a full day from work and may be able to attend only in the morning or the afternoon. This is totally fine. The morning registration hours is when most people will register, however, there will be someone on the desk to get you your pass throughout the day.

Thinking of Attending? Take a look at the day’s sessions and then register as space is limited. The event will be held at North Carolina State University’s McKimmon Conference & Training Center. I hope to see you there!

The post Percona University: Back to school Feb. 12 in Raleigh, N.C. appeared first on MySQL Performance Blog.


Exploring message brokers

Message brokers are not regularly covered here but are, nonetheless, important web-related technologies. Some time ago, I was asked by one of our customer to review a selection of OSS message brokers and propose a couple of good candidates. The requirements were fairly simple: behave well when there’s a large backlog of messages, be able to create a cluster and in case of the failure of a node in a cluster, try to protect the data but never blocks the publishers even though that might imply data lost. Nothing fancy regarding queues and topics management. I decided to write my findings here, before I forget…

I don’t consider myself a message broker specialist and I spent only about a day or two on each so, I may have done some big mistakes configuration wise. I’ll take the blame if something is misconfigured or not used correctly.


RabbitMQ is well known and popular message broker and it has many powerful features. The documentation on the RabbitMQ web site is excellent and there are many books available. RabbitMQ is written in Erlang, not a widely used programming language but well adapted to such tasks. The company Pivotal develops and maintains RabbitMQ. I reviewed version 3.2.2 on CentOS 6 servers.

The installation was easy, I installed Erlang version R14B from epel and the RabbitMQ rpm. The only small issue I had was that the server is expecting “″ to be resolved in /etc/hosts and the openstack VMs I used were missing that. Easy to fix. I also installed and enabled the management plugin.

The RabbitMQ configuration is set in the rabbitmq.config file and it has tons of adjustable parameters. I used the defaults. In term of client API, RabbitMQ support a long list of languages and some standards protocols, like STOMP are available with a plugin. Queues and topics can be created either by the web interface or through the client API directly. If you have more than one node, they can be clustered and then, queues and topics, can be replicated to other servers.

I created 4 queues, wrote a ruby client and started inserting messages. I got a publishing rate of about 20k/s using multiple threads but I got a few stalls caused by the vm_memory_high_watermark, from my understanding during those stalls it writing to disk. Not exactly awesome given my requirements. Also, some part is always kept in memory even if a queue is durable so, even though I had plenty of disk space, the memory usage grew and eventually hit the vm_memory_high_watermark setting. The cpu load was pretty high during the load, between 40% and 50% on an 8 cores VM.

Even though my requirements were not met, I setup a replicated queue on 2 nodes and inserted a few millions objects. I killed one of the two nodes and insert were even faster but then… I did a mistake. I restarted the node and asked for a resync. Either I didn’t set it correctly or the resync is poorly implemented but it took forever to resync and it was slowing down as it progressed. At 58% done, it has been running for 17h, one thread at 100%. My patience was exhausted.

So, lots of feature, decent performance but behavior not compatible with the requirements.


Kafka has been designed originally by LinkedIn, it is written in Java and it is now under the Apache project umbrella. Sometimes you look at a technology and you just say: wow, this is really done the way it should be. At least I could say that for the purpose I had. What is so special about Kafka is the architecture, it stores the messages in flat files and consumers ask messages based on an offset. Think of it like a MySQL server (producer) saving messages (updates SQL) to its binlogs and slaves (consumers) ask messages based on an offset. The server is pretty simple and just don’t care about the consumers much. That simplicity makes it super fast and low on resource. Old messages can be retained on a time base (like expire_logs_days) and/or on a storage usage base.

So, if the server doesn’t keep track of what has been consumed on each topics, how do can you have multiple consumer. The missing element here is Zookeeper. The Kafka server uses Zookeeper for cluster membership and routing while the consumers can also use Zookeeper or something else for synchronization. The sample consumer provided with the server uses Zookeeper so you can launch many instances and they’ll synchronize automatically. For the ones that doesn’t know Zookeeper, it is a highly-available synchronous distributed storage system. If you know Corosync, it provides somewhat the same functionality.

Feature wise Kafka, isn’t that great. There’s no web frontend builtin although a few are available in the ecosystem. Routing and rules are inexistent and stats are just with JMX. But, the performance… I reached a publishing speed of 165k messages/s over a single thread, I didn’t bother tuning for more. Consuming was essentially disk bound on the server, 3M messages/s… amazing. That was without Zookeeker coordination. Memory and cpu usage were modest.

To test clustering, I created a replicated queue, inserted a few messages, stopped a replica, inserted a few millions more messages and restarted the replica. I took only a few seconds to resync.

So, Kafka is very good fit for the requirements, stellar performance, low resource usage and nice fit with the requirements.


ActiveMQ is another big player in the field with an impressive feature set. ActiveMQ is more in the RabbitMQ league than Kafka and like Kafka, it is written in Java. HA can be provided by the storage backend, levelDB supports replication but I got some issues with it. My requirements are not for full HA, just to make sure the publishers are never blocked so I dropped the storage backend replication in favor of a mesh of brokers.

My understanding of the mesh of brokers is that you connect to one of the members and you publish or consume a message. You don’t know on which node(s) the queue is located, the broker you connect to knows and routes your request. To further help, you can specify all the brokers on the connection string and the client library will just reconnect to another if the one you are connected to goes down. That looks pretty good for the requirements.

With the mesh of brokers setup, I got an insert rate of about 5000 msg/s over 15 threads and a single consumer was able to read 2000 msg/s. I let it run for a while and got 150M messages. At this point though, I lost the web interface and the publishing rate was much slower.

So, a big beast, lot of features, decent performance, on the edge with the requirements.


Kestrel is another interesting broker, this time, more like Kafka. Written in scala, the Kestrel broker speaks the memcached protocol. Basically, the key becomes the queue name and the object is the message. Kestrel is very simple, queues are defined in a configuration file but you can specify, per queue, storage limits, expiration and behavior when limits are reached. With a setting like “discardOldWhenFull = true”, my requirement of never blocking the publishers is easily met.

In term of clustering Kestrel is a bit limited but each can publish its availability to Zookeeper so that publishers and consumers can be informed of a missing server and adjust. Of course, if you have many Kestrel servers with the same queue defined, the consumers will need to query all of the broker to get the message back and strict ordering can be a bit hard.

In term of performance, a few simple bash scripts using nc to publish messages easily reached 10k messages/s which is very good. The rate is static over time and likely limited by the reconnection for each message. The presence of consumers slightly reduces the publishing rate but nothing drastic. The only issue I had was when a large number of messages expired, the server froze for some time but that was because I forgot to set maxExpireSweep to something like 100 and all the messages were removed in one pass.

So, fairly good impression on Kestrel, simple but works well.


For the requirements given by the customer, Kafka was like a natural fit. It offers a high guarantee that the service will be available and non-blocking under any circumstances. In addition, messages can easily be replicated for higher data availability. Kafka performance is just great and resource usage modest.

The post Exploring message brokers appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com