Common MongoDB Topologies

In this blog post, we’ll look at some common MongoDB topologies used in database deployments.

The question of the best architecture for MongoDB often arises in conversations between developers and architects. In this blog, we wanted to go over the main sharded and unsharded designs, with their pros and cons.

We will first look at "Replica Sets." Replica sets are the most basic form of high availability (HA) in MongoDB, and the building blocks for sharding. From there, we will cover sharding approaches and whether you need to go that route.

Replica Set

From the MongoDB manual:

A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

Short of sharding, this is the ideal way to run MongoDB. Things like high availability, failover and recovery become automated, typically with no action needed. If you expect large growth or more than 200GB of data, you should consider adding sharding on top of this to reduce your mean time to recovery on a restore from backup.


Pros:

  • Elections happen automatically and, with retry logic in the application, go unnoticed
  • Rebuilding a node, or adding an additional read-only node, is as easy as rs.add('hostname')
  • Secondaries can skip building indexes to improve write speed
  • Members can be:
    • hidden in another geographic location
    • set up with delayed replication
    • used as analytics nodes via tagging


Cons:

  • Depending on the size of the oplog used, you can use 10-100+% more space to hold the change data for replication
  • You must scale up, not out, which means more expensive hardware
  • Recovery happens on a single node; a sharded approach restores faster because shards restore in parallel

Flat Mongos (not load balanced)

This is one of MongoDB's more commonly suggested deployment designs. To understand why, we should talk about the driver and the fact that it supports a comma-separated (CSV) list of mongos hosts for fail-over.

You can’t distribute writes in a single replica set. Instead, they all need to go to the primary node. You can distribute reads to the secondaries using Read Preferences. The driver keeps track of what is a primary and what is a secondary and routes queries appropriately.
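To make that routing concrete, here is a minimal sketch in Python of how such a connection string is assembled. The hostnames and replica set name are made up for illustration:

```python
def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection URI for a replica set.

    The driver discovers which member is the primary on its own;
    readPreference only tells it where it may route reads.
    """
    host_part = ",".join(hosts)
    return ("mongodb://%s/?replicaSet=%s&readPreference=%s"
            % (host_part, replica_set, read_preference))

# Hypothetical three-member replica set
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
)
print(uri)
```

With `readPreference=secondaryPreferred`, reads go to secondaries when one is available; writes always go to the primary regardless of this setting.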

Conceptually, the driver buckets its connections by the mongos they go to. This allows the 3.0+ driver to be semi-stateless and use any connection to a specific mongos to perform a getMore against that mongos. In theory, this allows slightly more concurrency. Realistically, you only use one mongos at a time, since this is only a fail-over system.
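A simplified model of that fail-over behavior, with hypothetical host names and a stand-in health check (a real driver uses its own server monitoring rather than a callback like this):

```python
def pick_mongos(mongos_list, is_up):
    """Return the first responsive mongos from the configured list,
    mimicking driver fail-over down a CSV host list."""
    for host in mongos_list:
        if is_up(host):
            return host
    raise RuntimeError("no mongos available")

hosts = ["mongos1:27017", "mongos2:27017", "mongos3:27017"]
down = {"mongos1:27017"}  # pretend the first mongos is unreachable

chosen = pick_mongos(hosts, lambda h: h not in down)
print(chosen)
```

Note the ordering: as long as the first host is up, everything lands on it, which is why this layout gives fail-over but not load balancing.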


Pros:

  • Mongos runs on its own gear, so it will not run the application host out of memory
  • If a mongos doesn't respond, the driver "fails over" to the next in the list
  • Can be placed closer to the database or the application, depending on your network and sorting needs


Cons:

  • Most drivers can't use the mongos list evenly, so it is only good for fail-over (not evenness). Check your specific driver's documentation for support, and test thoroughly.

Load Balanced (preferred if possible)

According to the Mongo docs:

You may also deploy a group of mongos instances and use a proxy/load balancer between the application and the mongos. In these deployments, you must configure the load balancer for client affinity so that every connection from a single client reaches the same mongos.

This is the model used by platforms such as ObjectRocket. In this pattern, you move mongos nodes to their own tier, but then put them behind a load balancer. In this design, you can even out the use of mongos by using a least-connection system. The challenge, however, is that newer drivers have issues with getMores: a getMore may be sent over a new random connection, and the load balancer can't know which mongos owns the cursor. It thus has a one-in-N chance (where N is the number of mongos) of selecting the right one; otherwise you get a "Cursor Not Found" error.
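The usual fix is client affinity: pin each client to one mongos so its getMores always land on the mongos that owns the cursor. A toy sketch of IP-based pinning follows; this is illustrative only, not any particular load balancer's actual algorithm:

```python
import hashlib

def route_with_affinity(client_ip, mongos_pool):
    """Pin each client IP to a single mongos so follow-up getMore
    requests reach the mongos that owns the cursor. The trade-off is
    less even balancing than least-connection routing."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return mongos_pool[int(digest, 16) % len(mongos_pool)]

pool = ["mongos-a", "mongos-b", "mongos-c"]  # hypothetical mongos tier
first = route_with_affinity("10.0.0.5", pool)
again = route_with_affinity("10.0.0.5", pool)
```

The same client IP always maps to the same mongos, which is exactly the "client affinity" the MongoDB docs quoted above require.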


Pros:

  • Ability to have even use of all mongos
  • Mongos are separated from each other and from the applications, preventing memory and CPU contention
  • You can easily remove or add mongos to scale the layer without code changes
  • High availability at every level (multiple mongos, multiple configs, replica sets for the data, and even multiple applications for app failures)


Cons:

  • If batching is used, you can get "Cursor Not Found" errors when the wrong mongos receives a getMore or bulk connector connection, unless you switch to an IP-pinning algorithm (which sacrifices evenness)

App-Centric Mongos

By and large, this is one of the most typical deployment designs for MongoDB sharding. In it, we have each application host talking to a mongos on the local network interface. This ensures there is very little latency to the application from the mongos.

Additionally, this means if a mongos fails, at most its own host is affected instead of the wider range of all application hosts.


Pros:

  • A local mongos on the loopback interface means low to no latency
  • Limited scope of outage if that mongos fails
  • Can be geographically farther from the data storage in cases where you have a DR site


Cons:

  • Mongos is a memory hog; it can steal memory from your application when run on the same host
    • Made worse with large batches, many connections, and sorting
  • Mongos is single-threaded and can become a bottleneck for your application
  • A slow network can cause bad decision-making, including duplicate databases on different shards. The functional result is data intermittently written to two locations, which a DBA must remediate at some point (think MMM VIP ping-pong issues)
  • All sorting and limits are applied on the application host. If the sort uses an index this is OK, but if not, the entire result set must be held in memory by mongos and sorted, and then the limited number of results returned to the client. This is the typical cause of mongos OOM errors, due to the memory issues listed before.


The topologies above cover many of the deployment needs for MongoDB environments. We hope this helps; please list any questions in the comments below.


Webinar Wednesday, October 18, 2017: How to Scale with MongoDB

Join Percona's Senior Technical Services Engineer Adamo Tonete as he presents How to Scale with MongoDB on Wednesday, October 18, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

In this webinar, we will talk about how to scale with MongoDB, up to thousands of writes and reads per second. What are the common issues when you scale with MongoDB? Is it better to shard or to add further secondaries?

We will walk through many common scaling situations, and through the steps needed to deploy a sharded cluster: from a single instance to a sharded environment. We will also talk about common mistakes/pitfalls a company can make when scaling its database – and how to avoid such situations.

Register for the webinar.

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL database administrator for three years. As the main database member of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 24×7 environment. Before that, he worked as a Microsoft SQL Server DBA for a large e-commerce company, mainly on performance tuning and automation. Adamo has almost eight years of experience working as a DBA, and in the past three has moved to NoSQL technologies without giving up relational databases. He likes to play video games and study everything that is related to engines. Adamo lives with his wife in São Paulo, Brazil.


When Should I Enable MongoDB Sharding?

In this blog post, we will talk about MongoDB sharding and walk through the main reasons why you should start a cluster (independent of the approach you have chosen).

Note: I will cover this subject in my webinar How To Scale with MongoDB on Wednesday, October 18, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Sharding is the most complex architecture you can deploy using MongoDB, and there are two main approaches as to when to shard or not. The first is to configure the cluster as soon as possible – when you predict high throughput and fast data growth.

The second says you should use a cluster as the best alternative when the application demands more resources than the replica set can offer (such as low memory, an overloaded disk or high processor load). This approach is more corrective than preventative, but we will discuss that in the future.

1) Disaster recovery plan

Disaster recovery (DR) is a very delicate topic: how long would you tolerate an outage? If necessary, how long would it take you to restore the entire database? Depending on the database size and on disk speed, a backup/restore process might take hours or even days!
There is no hard number in gigabytes to justify a cluster. In general, though, you should consider sharding when the database exceeds 200GB, as the backup and restore processes can take quite a while to finish.
Let's consider the case where we have a replica set with a 300GB database. The full restore process might last around four hours, whereas if the database has two shards, it will take about two hours, and depending on the number of shards we can improve that time further. Simple math: with two shards, the restore process takes half the time of a single replica set.
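That arithmetic can be written down directly. The 75 GB/hour throughput below is an invented illustrative number; plug in what your own backup tooling actually achieves:

```python
def restore_hours(total_gb, shards, gb_per_hour=75.0):
    """Estimate wall-clock restore time when shards restore in
    parallel: each shard only has to restore its own slice."""
    per_shard_gb = total_gb / shards
    return per_shard_gb / gb_per_hour

single = restore_hours(300, shards=1)  # one replica set holds all 300GB
two = restore_hours(300, shards=2)     # each shard restores 150GB
print(single, two)
```

This matches the example in the text: halving the data per node halves the restore time, assuming the shards are restored concurrently and the data is evenly distributed.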

2) Hardware limitations

Disk and memory are inexpensive nowadays. However, this is not true when companies need to scale to very high numbers (such as terabytes of RAM). Suppose your cloud provider can only offer you up to 5,000 IOPS in the disk subsystem, but the application needs more than that to work correctly. To work around this performance limitation, it is better to start a cluster and divide the writes among instances. In that case, with two shards the application has 10,000 IOPS available for reads and writes in the disk subsystem.
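The same back-of-the-envelope math in code, assuming writes are spread evenly across shards (a hot shard key keeps you stuck at the single-instance ceiling):

```python
def cluster_write_iops(iops_per_instance, shards):
    """Aggregate disk IOPS available to the application when load is
    divided evenly across shards, each on its own disk subsystem."""
    return iops_per_instance * shards

# Hypothetical cloud cap of 5,000 IOPS per instance
total = cluster_write_iops(5000, shards=2)
print(total)
```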

3) Storage engine limitations

There are a few storage engine limitations that can be a bottleneck in your use case. MMAPv1 takes a lock per collection, while WiredTiger uses tickets that limit the number of reads and writes happening concurrently. Although we can tweak the number of tickets available in WiredTiger, there is a practical limit: raising the available tickets might generate processor overload instead of increasing performance. If one of these situations becomes a bottleneck in your system, you can start a cluster. Once you shard the collection, you distribute the load/lock among the different instances.

4) Hot data vs. cold data

Several databases only work with a small percentage of the data being stored. This is called hot data or working set. Cold data or historical data is rarely read, and demands considerable system resources when it is. So why spend money on expensive machines that only store cold data or low-value data? With a cluster deployment we can choose where the cold data is stored, and use cheap devices and disks to do so. The same is true for hot data – we can use better machines to have better performance. This methodology also speeds up writes and reads on the hot data, as the indexes are smaller and add less overhead to the system.

5) Geo-distributed data

It doesn’t matter whether this need comes from application design or legal compliance. If the data must stay within continent or country borders, a cluster helps make that happen. It is possible to limit data localization so that it is stored solely in a specific “part of the world.” The number of shards and their geographic positions is not essential for the application, as it only views the database. This is commonly used in worldwide companies for better performance, or simply to comply with the local law.

6) Infrastructure limitations

Infrastructure and hardware limitations are very similar. When thinking about infrastructure, however, we focus on specific cases where the instances must be small. An example is running MongoDB on Mesos. Some providers only offer a few cores and a limited amount of RAM. Even if you are willing to pay more, it is not possible to purchase more than what they offer. A cluster provides the option to split a small amount of data among a lot of shards, reaching the same performance a big and expensive machine provides.

7) Failure isolation

Consider that a replica set or a single instance holds all the data. If for any reason this instance/replica set goes down, the whole application goes down. In a cluster, if we lose one of five shards, 80% of the data is still available. Running a few shards helps to isolate failures. Obviously, running many instances makes it more likely that some instance is down at any given time, but as each shard must have at least three instances, the probability of an entire shard being down is minimal. For providers that offer different zones, it is good practice to have the members of each shard in different availability zones (or even different regions).
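The 80% figure comes from simple arithmetic, assuming data is spread evenly across shards:

```python
def data_still_available(total_shards, failed_shards):
    """Fraction of the data still reachable when entire shards are
    down, under an even data distribution across shards."""
    return (total_shards - failed_shards) / total_shards

available = data_still_available(total_shards=5, failed_shards=1)
print(available)
```

A single replica set is the degenerate case: `data_still_available(1, 1)` is 0, i.e., losing it takes down everything.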

8) Speed up queries

Queries can take too long, depending on the number of reads they perform. In a clustered deployment, queries can run in parallel and speed up the query response time. If a query runs in ten seconds in a replica set, it is very likely that the same query will run in five to six seconds if the cluster has two shards, and so on.

I hope this helps with MongoDB sharding. Having a cluster solves several other problems as well, and we have listed only a few of them. Don’t miss our webinar regarding scaling out MongoDB next Wednesday, October 18, 2017!


Percona Live Featured Tutorial with Derek Downey, David Turner and René Cannaò — ProxySQL Tutorial

Welcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Derek Downey (OSDB Practice Advocate, Pythian), David Turner (Storage SRE, Uber) and René Cannaò (MySQL SRE, Dropbox / ProxySQL). Their session is ProxySQL Tutorial. There is a stigma attached to database proxies when it comes to MySQL. This tutorial hopes to blow away that stigma by showing you what can be done with a proxy designed from the ground up to perform. I had a chance to speak with Derek, David and René and learn a bit more about ProxySQL:

Percona: How did you get into database technology? What do you love about it?

Derek Downey

Derek: I took a relational database course in college based on Oracle. Set theory and the relational model made a lot of sense to me. After a few years as a web developer at a small company, I transitioned to a hybrid SysAdmin/DBA role and got my first taste of the potential of “the cloud” (and some of the drawbacks).

I really came to understand that data is the lifeblood of any organization, and making sure it is always available through any disaster – from human error to hurricanes – is a unique and exciting challenge.

You should never notice the DBA if they’re doing their job right. There’s not much praise for a DBA on a job well done. But it’s a vital position to keep a company running. And that suits me just fine.

David: I started working for the Advanced Projects Group at the University of Missouri, now known as MOREnet. They were largely responsible for connecting all of the libraries and schools in the state to the Internet. I was initially helping them with their Internet presence as a webmaster. Later they needed help with their databases. I got very excited about working with Oracle at the time, and decided to join that team.

My relationship with MySQL started primarily because the cost of sharding Oracle was so high. Additionally, MySQL’s replication allowed us to use slaves. Oracle’s Dataguard/standby options wouldn’t allow reads from the slaves at that time. Lastly, MySQL was sort of “wild west” fun, since it lacked so many other features that Oracle had long ago. You had to get creative. It has been humbling to see how much innovation has come from the community and how far MySQL has come. And this is only the beginning!

René Cannaò

René: My career followed the classic path of a system administrator that ends up becoming a DBA. I used to work for a few companies as webmaster,  and finally as SysAdmin for a web hosting company. I always saw a similar pattern: “the bottleneck is in the database.” Nobody ever knew why the database was the bottleneck. I volunteered to improve the performance of this “unknown system.” Learning was a fun experience, and the result was extremely rewarding. I love understanding how databases operate and their internals. This is the only way to be able to get the maximum performance: “scientia potentia est”!

Percona: Your tutorial is called “ProxySQL Tutorial.” What exactly is ProxySQL, and what does it do?

Derek: I’ll leave it to René, the creator of ProxySQL, to give more detail on exactly what it is. But for a DBA trying to ensure their data is always available, it is a new and exciting tool in our toolbox.

René: ProxySQL is the MySQL data gateway. It’s the Stargate that can inspect, control, transform, manage, and route all traffic between clients and database servers. It builds reliable and fault-tolerant networks. It is a software bridge that empowers MySQL DBAs, built by DBAs for DBAs, allowing them to control all MySQL traffic where previously such traffic could not be controlled either on the client side (normally the developers’ realm) or server side (where there are not enough tools).

David Turner

David: Architecturally, ProxySQL is a separate process between the client and the database. Because traffic passes through it, ProxySQL can become many things (three of which got my attention). It can be a multiplexer, a filter, and a replicator.

Multiplexers reduce many signals down to a few. Web servers often open many static connections to MySQL. Since MySQL can only support a limited number of connections before performance suffers, ProxySQL’s ability to transparently manage tens of thousands of connections while only opening a few to the database is a great feature.

Administrators can update ProxySQL to filter and even rewrite queries based on patterns they decide on. As someone that has worked in operations and seen how long it can take to disable misbehaving applications, this is a very compelling feature. With ProxySQL in place, I can completely block a query from the database in no time.

ProxySQL’s replication or mirroring capability means that all of the queries sent to one database can now be sent to N databases. As someone that has to roll out new versions of MySQL, test index changes, and benchmark hardware this is also a compelling feature.

Percona: What are the advantages of using ProxySQL in a database environment?

René: ProxySQL is the bridge between the clients and the servers. It creates two layers, and controls all the communication between the two. Sitting in the middle, ProxySQL provides a lot of advantages normally impossible to achieve in a standard database environment, such as throttling or blocking queries, rewriting queries, implementing sharding, read/write splitting, caching, duplicating traffic, handling backend failures, failovers, integration with HA solutions, generating real-time statistics, etc. All this, without any application change, and completely transparent to the application.

Derek: For me, ProxySQL decouples the application from the data layer. This provides more control over the backend database environment to the DBA in regards to queries, maintenance and failovers, without impact to the application.

David: In addition to the roles noted above, ProxySQL can be part of a failover solution, routing queries to a new master when the current master fails. Other advantages are splitting queries over multiple databases to distribute the load, provide additional metrics, etc.

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Derek: This tutorial highlights what ProxySQL is trying to achieve, and discusses how to add ProxySQL to common architectures environments. Attendees will get hands-on experience with the technology, and learn how to install and configure ProxySQL to achieve query rewriting, seamless failover, and query mirroring.

David: Not only hands-on experience with ProxySQL, but an understanding of how much they can leverage with it. The more I use ProxySQL, the more advantages I see to it. For example, I did not realize that by clustering ProxySQL processes I can distribute the query matching and rewrites over many hosts, as well as use them as a caching layer.

René: ProxySQL is built upon very innovative technologies. Certain architectural concepts like hostgroups, chaining of query rules, granular routing and sharding, query retries and the very powerful Admin interface are concepts not always intuitive for DBAs with experience using other proxies. This tutorial helps understand these concepts, and attendees get hand-on experience in the configuration of ProxySQL in various scenarios.

Percona: What are you most looking forward to at Percona Live?

David: First, the people. Next, everything everyone is working on. We’re really lucky to work in such an innovative and collaborative industry. As databases evolve, we are on the ground floor of their evolution. What an exciting place to be.

Derek: I am mostly looking forward to reconnecting with my peers in the MySQL community. Both ones I’ve formerly worked with or previously met at Percona Live, as well as meeting new open source database professionals and hearing how they are providing solutions for their companies.

René: I am looking forward to attending sessions regarding new features in MySQL 8 and other new technologies. But moreover, I am excited to interact with MySQL users and get more input on how to improve ProxySQL so that it can become an indispensable tool in any MySQL environment.

Register for Percona Live Data Performance Conference 2017, and see Derek, David and René present their ProxySQL Tutorial. Use the code FeaturedTutorial and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.


MySQL Sharding Models for SaaS Applications

In this blog post, I'll discuss MySQL sharding models, and how they apply to SaaS application environments.

MySQL is one of the most popular database technologies used to build many modern SaaS applications, ranging from simple productivity tools to business-critical applications for the financial and healthcare industries.

Pretty much any large scale SaaS application powered by MySQL uses sharding to scale. In this blog post, we will discuss sharding choices as they apply to these kinds of applications.

In MySQL, unlike in some more modern technologies such as MongoDB, there is no standard sharding implementation that the vast majority of applications use. In fact, if anything “no standard” is the standard. The common practice is to roll your own sharding framework, as famous MySQL deployments such as Facebook and Twitter have done. MySQL Cluster – the MySQL software that has built-in Automatic Sharding functionality – is rarely deployed (for a variety of reasons). MySQL Fabric, which has been the official sharding framework, has no traction either.

When sharding today, you have a choice of rolling your own system from scratch, using comprehensive sharding platform such as Vitess or using a proxy solution to assist you with sharding. For proxy solutions, MySQL Router is the official solution. But in reality, third party solutions such as open source ProxySQL, commercial ScaleArc and semi-commercial (BSL)  MariaDB MaxScale are widely used. Keep in mind, however, that traffic routing is only one of the problems that exist in large scale sharding implementations.

Beneath all these "front end" choices (sharding in the application's database connection framework, or in a database proxy), there are some lower-level decisions you have to make: namely, how your data is going to be laid out and organized on the MySQL nodes.

When it comes to SaaS applications, at least one answer is simple. It typically makes sense to shard your data by “customer” or “organization” using some sort of mapping tables. In the vast majority of cases, single node (or replicated cluster) should be powerful enough to handle all the data and load coming from each customer.
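A minimal sketch of such a mapping table, with invented customer and shard names. In production this table lives in a database and is cached by the application tier:

```python
# Explicit customer -> shard mapping. Compared to hashing the customer
# id, an explicit table lets you move one large customer to a dedicated
# shard by updating a single row.
CUSTOMER_SHARD_MAP = {
    "acme-corp": "shard-1",
    "globex": "shard-2",
}
DEFAULT_SHARD = "shard-1"  # where newly signed-up customers land

def shard_for_customer(customer_id, mapping=CUSTOMER_SHARD_MAP):
    """Route every query for a customer to that customer's shard."""
    return mapping.get(customer_id, DEFAULT_SHARD)

print(shard_for_customer("globex"))
```

All names here are hypothetical; the point is only the lookup-then-route pattern that "shard by customer" implies.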

What Should I Ask Myself Now?

The next set of questions you should ask yourself concerns your SaaS application:

  • How much revenue per customer are you generating?
  • Do your customers (or regulations) require data segregation?
  • Are all the customers about the same, or are there outliers?
  • Are all your customers running the same database schema?

I address the answers in the sections below.

How Much Revenue?

How much revenue per customer you're generating is an important number. It defines how much infrastructure cost per customer you can afford. With "freemium" models, where customers generate less than $1 a month on average, you might need to ensure low overhead per customer (even if you have to compromise on customer isolation).

Typically with low revenue customers, you have to co-locate the data inside the same MySQL instance (potentially even same tables). In the case of high revenue customers, isolation in separate MySQL instances (or even containers or virtualized OS instances) might be possible.

Data Segregation?

Isolation is another important area of consideration. Some enterprise customers might require that their data is physically separate from others. There could also be government regulations in play that require customer data to be stored in a specific physical location. If this is the case, you’re looking at completely dedicated customer environments. Or at the very least, separate database instances (which come with additional costs).

Customer Types?

Customer size and requirements are also important. A system designed to handle all customers of approximately the same scale (for example, personal accounting) is going to be different than if you are in the business of blog hosting. Some blogs might be 10,000 times more popular than the average.

Same Database Schema?

Finally, there is the big question of whether all your customers run the same database schema and the same software version. If you want to support different software versions (for example, if your customers require a negotiated maintenance window for software upgrades) or different database schemas (for example, if the schema depends on the custom functionality and modules customers use), keeping such customers in different MySQL schemas makes sense.

Sharding Models

This gets us to the following sharding isolation models, ranging from lowest to highest:

  • Customers Share Schemas. This is the best choice when you have very large numbers of low-revenue customers. In this case, you map multiple customers to the same set of tables, and include something like a customer_id field in them to filter customer data. This approach minimizes per-customer overhead but reduces customer isolation. It's harder to backup/restore data for individual customers, and it is easier to introduce coding mistakes that access other customers' data. This method does not mean there is only one schema, but that there is a one-to-many relationship between schemas and customers. For example, you might have 100 schemas per MySQL instance, each handling 1,000 to 10,000 customers (depending on the application). Note that with a well-designed sharding implementation, you should be able to map customers individually to schemas. This allows you to have key customer data stored in dedicated schemas, or even on dedicated nodes.
  • Schema per Customer. This is probably the most common sharding approach in MySQL powered SaaS applications. Especially ones that have substantial revenue ($10+ per month / per customer). In this model, each customer’s data is stored in its own schema (database). This makes it very easy to backup/restore individual customers. It allows customers to have different schemas (i.e., add custom tables). It also allows them to run different versions of the application if desired. This approach allows the application server to use different MySQL users connecting on behalf of different customers, which adds an extra level of protection from accidental (or intentional) access of data that belongs to different customers. The schema per customer approach also makes it easier to move the shards around, and limits maintenance impact. The downside of this approach is higher overhead. It also results in a large number of tables per instance, and potentially larger numbers of files (which can be hard to manage).
  • Database Instance per Customer. You achieve even better isolation by having a MySQL instance per customer. This approach, however, increases overhead even further. The recent rise of light virtualization technologies and containers has reduced its usage.
  • OS Instance/Container per Customer. This approach improves isolation even further. It can be used for any customer, but can also be applied to selected customers in a model that uses Schema per Customer for the majority of them. A dedicated OS instance, with improved isolation and better performance SLAs, might be a feature of some premium customer tiers. This method not only allows better isolation, it also lets you handle outliers better. You might choose to run the majority of your customers on the hardware (or cloud instances) with the best price/performance numbers, and place some of the larger customers on the highest-performance nodes.
  • Environment per customer. Finally, if you take this all the way you can build completely separate environments for customers. This includes databases, application servers and other required components. This is especially useful if you need to deploy the application close to the customer – which includes the appliance model, or deployment in the customer’s data center or cloud provider. This also allows you to accommodate customers if their data must be stored in a specific location. This is often due to government regulations. It is worth noting that many SaaS applications, even if they do not quite have one environment per customer, have multiple independent environments. These are often hosted in different locations or availability zones. Such setups allow you to reduce the impact of large-scale failures to only a portion of your customers. This avoids overloading your customer service group and allowing the operational organization to focus on repairing smaller environments.

The farther you go down this route – from shared schema to an environment per customer – the more important it is to have a high level of automation. With a shared schema, you can often get by with little automation (some environments set up manually and all the schemas pre-created). If customer sign-up requires setting up a dedicated database instance or a whole environment, manual implementation doesn’t scale. For this type of setup, you need state-of-the-art automation and orchestration.


I hope this helps you understand your options for MySQL sharding models. Each of the different sharding models for SaaS applications powered by MySQL has benefits and drawbacks. As you can see, many of these approaches require you to work with a large number of tables in MySQL – that will be the topic of one of my next posts!


MongoDB PIT Backups: Part 2

MongoDB PIT Backups

This blog post is the second in a series covering MongoDB PIT backups. You can find the first part here.

Sharding Makes Everything Fun(ner)

The first blog post in this series looked at MongoDB backups in a simple single-replica set environment. In this post, we’ll look at the scale-out use case. When sharding, we have exactly the same problem as we do on a single replica set. However, now the problem is multiplied by the number of replica sets in the cluster. Additionally, we have a bonus problem: each replica set has unique data. That means that to get a truly consistent snapshot of the cluster, we need to orchestrate our backups to capture a single consistent point in time. Just so we’re on the same page, that means that every replica set needs to stop its backup at, or near, the same time that the slowest replica set stops. Are you sufficiently confused now? Let me get to a basic concept that I forgot to cover in the first post, and then I’ll give you a simple description of the problem.

Are you Write Concerned?

So far, I’ve neglected to talk about the very important role of “write concern” when taking consistent backups. In MongoDB, the database is not durable by default. By “durable,” I mean that an operation is “on disk” when the database acknowledges receipt of it from your application. There are likely several reasons for this design; the biggest one originally was probably throughput, given a lack of concurrency.

However, the side effect is possible data loss due to loss of operations applied only in memory. Changing the write concern to “journaled” (j: true) will change this behavior so that MongoDB journals changes before acknowledging them (you also need to be running with journaling enabled).

TIP: For true durability in a replica set, you should use a write concern of “majority” for operations and the writeConcernMajorityJournalDefault : true on all replica set members (new to v3.4). This has the added benefit of greatly decreasing the chance of rollback after an election.
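As a sketch of how these options fit together (the field names follow MongoDB’s write concern document; the helper function is illustrative, not a driver API):

```python
# Sketch of the write concern document shape a driver attaches to an
# operation. Field names follow MongoDB's write concern; the helper
# function below is purely illustrative.
durable = {"w": "majority", "j": True}   # ack after majority + journal
default = {"w": 1}                       # ack may precede any disk write

def is_durable(wc: dict) -> bool:
    # Crash-safe once journaled on disk, and rollback-safe once a
    # majority of the replica set has acknowledged the write.
    return bool(wc.get("j")) and wc.get("w") == "majority"

print(is_durable(durable), is_durable(default))  # True False
```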

Wow, you’re inconsistent

At the risk of being repetitive, the crux of this issue is that we need to run a backup on every shard (replica set). This is necessary because every shard has a different piece of the data set. Each piece of that data set is necessary to get an entire picture of the data set for the cluster (and thus, your application). Since we’re using mongodump, we’ll only have a consistent snapshot at the point in time when the backup completes. This means we must end each shard’s backup at a consistent point in time. We cannot expect that the backup will complete in exactly the same amount of time on every shard, which is what we’ll need for a consistent point in time across the cluster. This means that Shard1 might have a backup that is consistent to 12:05 PM, and another shard that is consistent to 12:06 PM. In a high traffic environment (the kind that’s likely to need horizontal scale), this could mean thousands of lost documents. Here’s a diagram:

MongoDB PIT Backups


Here’s the math to illustrate the problem:

  • Shard1’s backup will contain 30,000 documents (100 docs/sec * 60 secs * 5 mins)
  • Shard2’s backup will contain 36,000 documents (100 docs/sec * 60 secs * 6 mins)

In this example, to get a consistent point in time you’d need to remove all insert, update and delete operations that happened on Shard 2 from the time that Shard 1’s backup completed (6,000 documents). This means examining the timestamp of every operation in the oplog and reversing its operation. That’s a very intensive process, and will be unique for every mongodump that’s executed. Furthermore, this is a pretty tricky thing to do. The repeatable and much more efficient method is to have backups that finish in a consistent state, ready to restore when needed.
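The arithmetic above, assuming a steady rate of 100 inserted documents per second on each shard:

```python
# Reproduce the document math from the example above, assuming a steady
# write rate of 100 documents per second on each shard.
DOCS_PER_SEC = 100

def docs_in_backup(minutes: int) -> int:
    return DOCS_PER_SEC * 60 * minutes

shard1 = docs_in_backup(5)  # backup consistent to 12:05 PM
shard2 = docs_in_backup(6)  # backup consistent to 12:06 PM

# The 6,000-document gap is the inconsistency window between the shards.
print(shard1, shard2, shard2 - shard1)  # 30000 36000 6000
```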

Luckily, Percona has you covered!

You’re getting more consistent

Having data is important, but knowing what data you have is even more important. Here’s how you can be sure you know what you have in your MongoDB backups:

David Murphy has released his MongoDB Consistent Backup Tool in the Percona Labs GitHub account, and has written a very informative blog post about it. My goal with these blog posts is to make it even easier to understand the problem and how to solve it. We’ve already had an exhaustive discussion about the problem on both small and large scales. How about the solution?

It’s actually pretty simple. The solution, at a basic level, is to use a simple algorithm to decide when a cluster-wide consistent point-in-time can be reached. In the MongoDB Consistent Backup tool, this is done by the backup host kicking off backups on a “known good member” of each shard (that’s a pretty cool feature by itself) and then tracking the progress of each dump. At the same time the backup is kicked off, the backup host kicks off a separate thread that tails the oplog on each “known good member” until the mongodump on the slowest shard completes. By using this method, we have a very simple way of deciding when we can get a cluster-wide consistent snapshot: when the slowest member completes its piece of the workload. Here’s the same workload from Figure 4, but with the MongoDB Consistent Backup Tool methodology:

MongoDB PIT Backups


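The decision logic described above can be modeled with a toy sketch (this is not the tool’s actual code; the shard names and timings are made up):

```python
# Toy model of the consistent-backup decision (not the actual tool's
# code): tail every shard's oplog until the slowest mongodump completes,
# then replay each tail up to that common point in time.
def consistent_point(dump_end_times: dict) -> float:
    # The cluster-wide consistent snapshot is when the slowest shard finishes.
    return max(dump_end_times.values())

def oplog_window(dump_end_times: dict) -> dict:
    stop = consistent_point(dump_end_times)
    # Faster shards replay the oplog between their own dump end and the
    # cluster-wide stop time; the slowest shard has nothing to replay.
    return {shard: stop - end for shard, end in dump_end_times.items()}

ends = {"shard1": 300.0, "shard2": 360.0}  # seconds since backup start
print(consistent_point(ends), oplog_window(ends))
```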
TIP: The amount of time that it takes to perform these backups is often decided by two factors:

  1. How evenly the data is distributed across the shards (whether the cluster is balanced)
  2. How much data each shard contains

The takeaway here is that you may need to shard so that each shard has a manageable volume of data. This allows you to hit your backup/restore windows more easily.

…The Proverbial “Monkey Wrench”

There’s always a “gotcha” just when you think you’ve got your mind around any difficult concept. Of course, this is no different.

There is one very critical concept in sharding that we didn’t cover: tracking what data lies on which shard. This is important for routing the workload to the right place, and balancing the data across the shards. In MongoDB, this is handled by the config servers. If you cannot reach (or recover) your config servers, your entire cluster is lost! For obvious reasons, you need to back them up as well. With the Percona Labs MongoDB Consistent Backup Tool, there are actually two modes used to back up config servers: v3.2 and greater, and legacy. The reason is that in v3.2, config servers went from mirrors to a standard replica set. In v3.2 mode, we just treat the config servers like another replica set. They have their own mongodump and oplog tail thread. They get a backup that is consistent to the same point in time as all other shards in the cluster. If you’re on a version of MongoDB prior to v3.2, and you’re interested in an explanation of legacy mode, please refer back to David’s blog post.

The Wrap Up

We’ve examined the problems with getting consistent backups in a running MongoDB environment in this and the previous blog posts. Whether you have a single replica set or a sharded cluster, you should have a much better understanding of what the problems are and how Percona has you covered. If you’re still confused, or you’d just like to ask some additional questions, drop a comment in the section below. Or shoot me a tweet @jontobs, and I’ll make sure to get back to you.


MongoDB 3.4: Sharding Improvements

Sharding Improvements

In this blog post, we will discuss some of the Sharding improvements in the recent MongoDB 3.4 GA release.


Let’s go over what MongoDB Sharding “is” at a simplified, high level.

The concept of “sharding” exists to allow MongoDB to scale to very large data sets that may exceed the available resources of a single node or replica set. When a MongoDB collection is sharding-enabled, its data is broken into ranges called “chunks.” These are intended to be evenly distributed across many nodes or replica sets (called “shards”). MongoDB computes the ranges of a given chunk based on a mandatory document-key called a “shard key.” The shard key is used in all read and write queries to route a database request to the right shard.
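This routing can be modeled with a simplified sketch. The chunk boundaries and shard names below are made up for illustration; in a real cluster they live in the config metadata and change as the balancer moves chunks.

```python
import bisect

# Simplified model of shard-key routing: chunks are contiguous shard-key
# ranges, each owned by a shard. Bounds and shard names are illustrative.
chunk_bounds = [1000, 2000, 3000]                       # chunk split points
chunk_owner = ["shardA", "shardB", "shardA", "shardC"]  # one owner per chunk

def route(shard_key: int) -> str:
    # Find the chunk whose range contains the key, then return its shard.
    return chunk_owner[bisect.bisect_right(chunk_bounds, shard_key)]

print(route(42), route(1500), route(9999))  # shardA shardB shardC
```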

The MongoDB ecosystem introduced additional architectural components so that this could happen:

  1. A shard. A single MongoDB node or replica set used for storing the cluster data. There are usually many shards in a cluster and more shards can be added/removed to scale.
  2. “mongos” router. A sharding-aware router for handling client traffic. There can be one or more mongos instances in a cluster.
  3. The “config servers”. Used for storing the cluster metadata. Config servers are essentially regular MongoDB servers dedicated to storing the cluster metadata within the “config” database. Database traffic does not access these servers, only the mongos.

Under sharding, all client database traffic is directed to one or more of the mongos router process(es), which use the cluster metadata, such as the chunk ranges, the members of the cluster, etc., to service requests while keeping the details of sharding transparent to the client driver. Without the cluster metadata, the routing process(es) do not know where the data is, making the config servers a critical component of sharding. Due to this, at least three config servers are required for full fault tolerance.

Sharding: Chunk Balancer

To ensure chunks are always balanced among the cluster shards, a process named the “chunk balancer” (or simply “balancer”) runs periodically, moving chunks from shard to shard to ensure data is evenly distributed. When a chunk is migrated, the balancer doesn’t actually move any data itself; it merely coordinates the transfer between the source and destination shards and updates the cluster metadata when chunks have moved.

Before MongoDB 3.4, the chunk balancer would run on whichever mongos process could acquire a cluster-wide balancer lock first. From my perspective this was a poor architectural decision for these reasons:

  1. Predictability. Due to the first-to-lock nature, the mongos process running the balancer is essentially chosen at random. This can complicate troubleshooting as you try to chase down which mongos process is the active balancer to see what it is doing, its logs, etc. As a further example: it is common in some deployments for the mongos process to run locally on application servers, and in large organizations it is common for a DBA to not have access to application hosts – something I’ve run into many times myself.
  2. Efficiency. mongos was supposed to be a stateless router, not a critical administrative process! As all client traffic passes in-line through the mongos process, it is important for it to be as simple, reliable and efficient as possible.
  3. Reliability. In order to operate, the mongos process must read and write cluster metadata hosted within the config servers. As mongos is almost always running on a physically separate host from the config servers, any disruption (network, hardware, etc.) between the balancer and config server nodes will break balancing!

Luckily, MongoDB 3.4 has come to check this problem (and many others) off of my holiday wish list!

MongoDB 3.4: Chunk Balancer Moved to Config Servers

In MongoDB 3.4, the chunk balancer was moved to the primary config server, addressing my concerns about the chunk balancer as follows:

  1. Predictability. The balancer is always running in a single, predictable place: the primary config server.
  2. Efficiency. Removing the balancer from “mongos” allows it to worry about routing only. Also, as config servers are generally dedicated nodes that are never directly hit by client database traffic, in my opinion this is a more efficient place for the balancer to run.
  3. Reliability. Perhaps the biggest win I see with this change is the balancer can no longer lose connectivity with the cluster metadata that is stored on separate hosts. The balancer now runs inside the same node as the metadata!
  4. Centralized. As a freebie, now all the background/administrative components of Sharding are in one place!

Note: although we expect the overhead of the balancer to be negligible, keep in mind that a minor overhead is added to the config server Primary-node due to this change.

See more about this change here:

MongoDB 3.4: Required Config Server Replica Set

In MongoDB releases before 3.2, the set of cluster config servers received updates using a mode called Sync Cluster Connection Config (SCCC) to ensure all nodes received the same change. This essentially meant that any updates to cluster metadata would be sent N times from the mongos to the config servers in a fan-out pattern. This is another legacy design choice that always confused me, considering MongoDB already has facilities for reliably replicating data: MongoDB replication. Plus, without transactions in MongoDB, there are some areas where SCCC can fail.

Luckily MongoDB 3.2 introduced replica-set based config servers as an optional feature. This moved us away from the SCCC fan-out mode to traditional replication and write concerns for consistency. This brought many benefits: rebuilding a config server node became simpler, backups became more straightforward and flexible, and the move to standard replication provided a consistent method of updating the cluster metadata, simplifying the architecture.

MongoDB 3.4 requires Replica-Set based config servers, and removed the SCCC mode entirely. This might require some changes for some, but I think the benefits outweigh the cost. For more details on how to upgrade from SCCC to Replica-Set based config servers, see this article.

Note: the balancer in MongoDB 3.4 always runs on the config server that is the ‘PRIMARY’ of the replica set.

MongoDB 3.4: Parallel Balancing

As intended, MongoDB sharded clusters can get very big, with 10s, 100s or even 1000s of shards. Historically MongoDB’s balancer worked in serial, meaning it could only coordinate a single chunk-balancing round at any given time within the cluster. On very large clusters, this limitation poses a huge throughput limitation on balancing: all chunk moves have to wait in a serial queue.

In MongoDB 3.4, the chunk balancer can now perform several chunk moves in parallel, provided they occur between mutually exclusive source and destination shards. Given shards A, B, C and D, this means that a migration from A -> B can now happen at the same time as a migration from C -> D, as they involve mutually exclusive source and destination shards. Of course, you need four or more shards to really see the benefit of this change.
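The “mutually exclusive source and destination” rule can be sketched as follows (a toy scheduler, not MongoDB’s implementation; shard names are illustrative):

```python
# Sketch of the 3.4 parallel-balancing constraint: migrations may run
# concurrently only if no shard appears twice across the chosen
# (source, destination) pairs. Toy scheduler, shard names illustrative.
def schedule_round(pending):
    busy, this_round = set(), []
    for src, dst in pending:
        if src not in busy and dst not in busy:
            busy.update((src, dst))
            this_round.append((src, dst))
    return this_round

pending = [("A", "B"), ("C", "D"), ("A", "C"), ("E", "F")]
# A->B and C->D are disjoint and run together; A->C must wait because
# both A and C are already involved in this round; E->F can also run.
print(schedule_round(pending))  # [('A', 'B'), ('C', 'D'), ('E', 'F')]
```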

Of course, on large clusters this change could significantly increase network bandwidth usage, due to several balancing operations occurring at once. Be sure to test your network capacity with this change.

See more about this change here:


Of course, there were many other improvements to sharding and other areas in 3.4. We hope to cover more in the future. These are just some of my personal highlights.

For more information about what has changed in the new GA release, see: MongoDB 3.4 Release Notes. Also, please check out our beta release of Percona Server for MongoDB 3.4. This includes all the improvements in MongoDB 3.4 plus additional storage engines and features.



Percona Live Europe featured talk with Anthony Yeh — Launching Vitess: How to run YouTube’s MySQL sharding engine

Percona Live Europe featured talk

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Anthony Yeh, Software Engineer, Google. His talk will be on Launching Vitess: How to run YouTube’s MySQL sharding engine. Vitess is YouTube’s solution for scaling MySQL horizontally through sharding, built as a general-purpose, open-source project. Now that Vitess 2.0 has reached general availability, they’re moving beyond “getting started” guides and working with users to develop and document best practices for launching Vitess in their own production environments.

I had a chance to speak with Anthony and learn a bit more about Vitess:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Anthony: Before joining YouTube as a software engineer, I worked on photonic integrated circuits as a graduate student researcher at U.C. Berkeley. So I guess you could say I took a rather circuitous path to the database field. My co-presenter Dan and I have that in common. If you see him at the conference, I recommend asking him about his story.

I don’t actually think of myself as being in database development though; that’s probably more Sugu‘s area. I treat Vitess as just another distributed system, and my job is to make it more automated, more reliable, and easier to administer. My favorite part of this job is when open-source contributors send us new features and plug-ins, and all I have to do is review them. Keep those pull requests coming!

Percona: Your talk is going to be on “Launching Vitess: How to run YouTube’s MySQL sharding engine.” How has Vitess moved from a YouTube fix to a viable enterprise data solution?

Anthony: I joined Vitess a little over two years ago, right when they decided to expand the team’s focus to include external usability as a key goal. The idea was to transform Vitess from a piece of YouTube infrastructure that happens to be open-source, into an open-source solution that YouTube happens to use.

At first, the biggest challenge was getting people to tell us what they needed to make Vitess work well in their environments. Attending Percona Live is a great way to keep a pulse on how the industry uses MySQL, and talk with exactly the people who can give us that feedback. Progress really picked up early this year when companies like Flipkart and Pixel Federation started not only trying out Vitess on their systems, but contributing back features, plug-ins, and connectors.

My half of the talk will summarize all the things we’ve learned from these early adopters about migrating to Vitess and running it in various environments. We also convinced one of our Site Reliability Engineers to give the second half of the talk, to share firsthand what it’s like to run Vitess in production.

Percona: What new features and fixes can people look forward to in the latest release?

Anthony: The biggest new feature in Vitess 2.0 is something that was codenamed “V3” (sorry about the naming confusion). In a nutshell, this completes the transition of all sharding logic from the app into Vitess: at first you had to give us a shard name, then you just had to tell us the sharding key value. Now you just send a regular query and we do the rest.

To make this possible, Vitess has to parse and analyze the query, for which it then builds a distributed execution plan. For queries served by a single shard, the plan collapses to a simple routing decision without extra processing. But for things like cross-shard joins, Vitess will generate new queries and combine results from multiple shards for you, in much the same way your app would otherwise do it.

Percona: Why is sharding beneficial to databases? Are there pros and cons to sharding?

Anthony: The main pro for sharding is horizontal scalability, the holy grail of distributed databases. It offers the promise of a magical knob that you simply turn up when you need more capacity. The biggest cons have usually been that it’s a lot of work to make your app handle sharding, and it multiplies the operational overhead as you add more and more database servers.

The goal of Vitess is to create a generalized solution to these problems, so we can all stop building one-off sharding layers within our apps, and replace a sea of management scripts with a holistic, self-healing distributed database.

Percona: Vitess is billed as being for web applications based in cloud and dedicated hardware infrastructures. Was it designed specifically for one or the other, and does it work better for certain environments?

Anthony: Vitess started out on dedicated YouTube hardware and later moved into Borg, which is Google’s internal precursor to Kubernetes. So we know from experience that it works in both types of environments. But like any distributed system, there are lots of benefits to running Vitess under some kind of cluster orchestration system. We provide sample configs to get you started on Kubernetes, but we would love to also have examples for other orchestration platforms like Mesos, Swarm, or Nomad, and we’d welcome contributions in this area.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Anthony: I hope to meet people who have ideas about how to make Vitess better, and I look forward to learning more about how others are solving similar problems.

You can read more about Anthony and Vitess on the Vitess blog.

Want to find out more about Anthony, Vitess, YouTube and sharding? Register for Percona Live Europe 2016, and come see his talk Launching Vitess: How to run YouTube’s MySQL sharding engine.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!


MySQL Sharding with ProxySQL

MySQL Sharding with ProxySQL

This article demonstrates how MySQL sharding with ProxySQL works.

Recently a colleague of mine asked me to provide a simple example on how ProxySQL performs sharding.

In response, I’m writing this short tutorial in the hope it will illustrate ProxySQL’s sharding functionalities, and help people out there better understand how to use it.

ProxySQL is a very powerful platform that allows us to manipulate and manage our connections and queries in a simple but effective way. This article shows you how.

Before starting let’s clarify some basic concepts.

  • ProxySQL organizes its internal set of servers in Host Groups (HG), and each HG can be associated with users and Query Rules (QR)
  • Each QR can be final (apply = 1) or not (apply = 0), in which case ProxySQL continues to parse the remaining QRs
  • A QR can be a rewrite action, be a simple match, have a specific target HG, or be generic
  • QRs are defined using regex

You can see QRs as a sequence of filters and transformations that you can arrange as you like.

These simple basic rules give us enormous flexibility. They allow us to create very simple actions like a simple query re-write, or very complex chains that could see dozens of QR concatenated. Documentation can be found here.

The information related to HGs or QRs is easily accessible using the ProxySQL administrator interface, in the tables mysql_servers, mysql_query_rules and stats.stats_mysql_query_rules. The last one allows us to evaluate if and how the rule(s) is used.

With regards to sharding, what can ProxySQL do to help us achieve what we need (in a relatively easy way)? Some people/companies include sharding logic in the application, use multiple connections to reach the different targets, or have some logic to split the load across several schemas/tables. ProxySQL allows us to simplify the way connectivity and query distribution work, either by reading data in the query itself or by accepting hints.

No matter what the requirements, the sharding exercise can be summarized in a few different categories.

  • By splitting the data inside the same container (like having a shard by State where each State is a schema)
  • By physical data location (this can have multiple MySQL servers in the same room, as well as having them geographically distributed)
  • A combination of the two, where I split by State using a dedicated server, and within it split again by schema/table on some other attribute (say, by gender)

In the following examples, I show how to use ProxySQL to cover the three different scenarios defined above (and a bit more).

The example below will report text from the Admin ProxySQL interface and the MySQL console. I will mark each one as follows:

  • Mc for MySQL console
  • Pa for ProxySQL Admin

Please note that the MySQL console MUST use the -c flag to pass the comments in the query. This is because the default behavior in the MySQL console is to remove the comments.

I am going to illustrate procedures that you can replicate on your laptop, and when possible I will mention a real implementation. This is because I want you to directly test the ProxySQL functionalities.

For the examples described below I use ProxySQL v1.2.2, which is going to become the master branch in a few days. You can download it from:

git clone
git checkout v1.2.2

Then to compile:

cd <path to proxy source code>
make install

If you need full instructions on how to install and configure ProxySQL, read here and here.

Finally, you need to have the WORLD test DB loaded. WORLD test DB can be found here.

Shard inside the same MySQL Server using three different schemas split by continent

Obviously, you can have any number of shards and relative schemas. What is relevant here is demonstrating how traffic gets redirected to different targets (schemas), maintaining the same structure (tables), by discriminating the target based on relevant information in the data or passed in by the application.

OK, let us roll the ball.

| Continent     | count(Code) |
| Asia          |          51 | <--
| Europe        |          46 | <--
| North America |          37 |
| Africa        |          58 | <--
| Oceania       |          28 |
| Antarctica    |           5 |
| South America |          14 |

For this exercise, I will use three hosts in replica.

To summarize, I will need:

  • Three hosts: 192.168.1.[5-6-7]
  • Three schemas: Continent X + world schema
  • One user : user_shardRW
  • Three hostgroups: 10, 20, 30 (for future use)

First, we will create the schemas Asia, Africa, Europe and North_America:

Create schema [Asia|Europe|North_America|Africa];
create table Asia.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Asia' ;
create table Europe.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Europe' ;
create table Africa.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Africa' ;
create table North_America.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='North America' ;
create table Asia.Country as select * from  world.Country where Continent='Asia' ;
create table Europe.Country as select * from  world.Country where Continent='Europe' ;
create table Africa.Country as select * from  world.Country  where Continent='Africa' ;
create table North_America.Country as select * from  world.Country where Continent='North America' ;

Now, create the user

grant all on *.* to user_shardRW@'%' identified by 'test';

Now let us start to configure the ProxySQL:

insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('user_shardRW','test',1,10,'test_shard1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,100);

With this we have defined the user, the servers and the host groups.

Let us start to define the logic with the query rules:

delete from mysql_query_rules where rule_id > 30;
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (31,1,'user_shardRW',"^SELECT\s*(.*)\s*from\s*world.(\S*)\s(.*).*Continent='(\S*)'\s*(\s*.*)$","SELECT \1 from \4.\2 WHERE 1=1 \5",1);

I am now going to query the master (or a single node), but I am expecting ProxySQL to redirect the query to the right shard, catching the value of the continent:

 SELECT name,population from world.City  WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
| name | population |
| Roma |    2643581 |

You can say: “Hey! You are querying the schema World, of course you get back the correct data.”

This is not what really happened. ProxySQL did not query the schema World, but the schema Europe.

Let’s look at the details:

select * from stats_mysql_query_digest;
Original    :SELECT name,population from world.City  WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
Transformed :SELECT name,population from Europe.City WHERE ?=? and CountryCode=? order by population desc limit ?

Let me explain what happened.

Rule 31 in ProxySQL takes the fields we pass in the query, catches the CONTINENT value in the WHERE clause, keeps any other conditions that follow WHERE, and reorganizes the query, all using the regex.
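You can reproduce rule 31’s rewrite outside ProxySQL with the same match and replace patterns, here using Python’s re module (whitespace in the result is normalized for readability; ProxySQL’s digest output above is similarly normalized):

```python
import re

# Rule 31's match_pattern and replace_pattern, exactly as inserted above.
match_pattern = r"^SELECT\s*(.*)\s*from\s*world.(\S*)\s(.*).*Continent='(\S*)'\s*(\s*.*)$"
replace_pattern = r"SELECT \1 from \4.\2 WHERE 1=1 \5"

query = ("SELECT name,population from world.City  "
         "WHERE Continent='Europe' and CountryCode='ITA' "
         "order by population desc limit 1")

# The continent captured in the WHERE clause becomes the target schema.
rewritten = " ".join(re.sub(match_pattern, replace_pattern, query).split())
print(rewritten)
# SELECT name,population from Europe.City WHERE 1=1 and CountryCode='ITA' order by population desc limit 1
```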

Does this work for any table in the sharded schemas? Of course it does.

A query like:

SELECT name,population from world.Country WHERE Continent='Asia' ;

Will be transformed into:

SELECT name,population from Asia.Country WHERE ?=?

| name                 | population |
| Afghanistan          |   22720000 |
| United Arab Emirates |    2441000 |
| Armenia              |    3520000 |
<snip ...>
| Vietnam              |   79832000 |
| Yemen                |   18112000 |

Another possible approach to instruct ProxySQL to shard is to pass a hint inside a comment. Let’s see how.

First let me disable the rule I just inserted. This is not really needed, but we’ll do it so you can see how.

mysql> update mysql_query_rules set active=0 where rule_id=31;
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)


Now what I want is for *ANY* query that contains the comment /* continent=X */ to go to the continent X schema, same server.

To do so, I instruct ProxySQL to replace any reference to the world schema inside the query I am going to submit.

delete from mysql_query_rules where rule_id in (31,32,33,34,35,36);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (31,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Asia\s*\*.*",null,0,23,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (32,1,'user_shardRW','world.','Asia.',0,23,23);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (33,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Europe\s*\*.*",null,0,25,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (34,1,'user_shardRW','world.','Europe.',0,25,25);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (35,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Africa\s*\*.*",null,0,24,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (36,1,'user_shardRW','world.','Africa.',0,24,24);

How does this work?

I have defined two concatenated rules. The first captures the incoming query containing the desired value (like continent=Asia). If there is a match, ProxySQL exits that action, but while doing so it reads the Apply field. If Apply is 0, it reads the FlagOUT value and then jumps to the first rule (in sequence) whose FlagIN is equal to that FlagOUT.

The second rule gets the request and will replace the value of world with the one I have defined. In short, it replaces whatever is in the match_pattern with the value that is in the replace_pattern.

ProxySQL uses Google's Re2 library for RegEx. Re2 is very fast but has some limitations; for example, it does NOT support (at the time of writing) the g (global) flag. In other words, if I have a select with many tables, and as such several occurrences of "world", Re2 will replace ONLY the first instance.

As such, a query like:

Select /* continent=Europe */ * from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

Will be transformed into:

Select /* continent=Europe */ * from Europe.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

And fail.
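The failure mode is easy to reproduce. In Python, re.sub with count=1 mimics Re2's single (non-global) replacement as described above, leaving the remaining world.* references untouched:

```python
import re

query = ("Select /* continent=Europe */ * from world.Country "
         "join world.City on world.City.CountryCode=world.Country.Code "
         "where Country.code='ITA'")

# count=1 mimics the lack of a global flag: only the FIRST
# occurrence of "world." is replaced.
broken = re.sub(r"world\.", "Europe.", query, count=1)
print(broken)
# Only world.Country became Europe.Country; the other three world.*
# references still point at the unsharded schema, so the query fails.
```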

The other day, René and I were discussing how to solve this, given the lack of support in Re2. Finally, we opted for recursive actions.

What does this mean? It means that from v1.2.2, ProxySQL has a new functionality that allows recursive calls to a Query Rule. The maximum number of iterations ProxySQL can run is managed by the global variable mysql-query_processor_iterations, which defines how many operations a query process can execute as a whole (from start to end).

This new implementation allows us to reference a Query Rule to itself to be executed multiple times.

If you go back, you will notice that QR 34 has FlagIN and FlagOUT pointing to the same value of 25, and Apply=0. This brings ProxySQL to recursively call rule 34 until it has replaced ALL occurrences of the word world.

The result is the following:

Select /* continent=Europe */ Code, City.Name, City.population  from world.Country join world.City on world.City.CountryCode=world.Country.Code where City.population > 10000 group by Name order by City.Population desc limit 5;
| Code | Name          | population |
| RUS  | Moscow        |    8389200 |
| GBR  | London        |    7285000 |
| RUS  | St Petersburg |    4694000 |
| DEU  | Berlin        |    3386667 |
| ESP  | Madrid        |    2879052 |

You can see ProxySQL internal information using the following queries:

 select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
| 1      | 1    | 33      | NULL                | \S*\s*\/\*\s*continent=.*Europe\s*\*.* | NULL            | NULL      | 0     | 0      | 25      |
| 1      | 4    | 34      | NULL                | world.                                 | Europe.         | NULL      | 0     | 25     | 25      |
| 1      | 0    | 35      | NULL                | \S*\s*\/\*\s*continent=.*Africa\s*\*.* | NULL            | NULL      | 0     | 0      | 24      |
| 1      | 0    | 36      | NULL                | world.                                 | Africa.         | NULL      | 0     | 24     | 24      |


select * from stats_mysql_query_digest;
<snip and taking only digest_text>
Select Code, City.Name, City.population from Europe.Country join Europe.City on Europe.City.CountryCode=Europe.Country.Code where City.population > ? group by Name order by City.Population desc limit ?

As you can see ProxySQL has nicely replaced the word world in the query to Europe, and it ran Query Rule 34 four times (hits).

This obviously works for Insert/Update/Delete as well.

Queries like:

insert into  /* continent=Europe */  world.City values(999999,'AAAAAAA','ITA','ROMA',0) ;

Will be transformed (as shown by the query digest) into:

select digest_text from stats_mysql_query_digest;
| digest_text                               |
| insert into Europe.City values(?,?,?,?,?) |

And executed only on the desired schema.

Sharding by host

Using hint

How can I shard and redirect the queries to a host (instead of a schema)? This is even easier!

The main point is that whatever matches the rule should go to a defined HG. No rewrite is implied, which means less work.

So how is this done? As before, I have three nodes: 192.168.1.[5-6-7]. For this example, I will use the world DB (no continent schemas), distributed on each node, and I will retrieve the node's bind address to be sure I am going to the right place.

I instruct ProxySQL to send my query by using a HINT to a specific host. I choose the hint “shard_host_HG”, and I am going to inject it in the query as a comment.

As such the Query Rules will be:

delete from mysql_query_rules where rule_id in (40,41,42, 10,11,12);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (10,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Europe\s*\*.",10,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (11,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Asia\s*\*.",20,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (12,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Africa\s*\*.",30,0);
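Since these rules only match and redirect, their logic boils down to "first matching hint pattern wins". Here is a minimal Python sketch of that selection (the helper and default are hypothetical; the patterns mirror the rules above, with the slash unescaped for Python):

```python
import re

# Stand-ins for the three match-and-redirect rules above:
# each shard_host_HG hint pattern maps to a destination hostgroup.
HINT_RULES = [
    (r"/\*\s*shard_host_HG=.*Europe\s*\*.", 10),
    (r"/\*\s*shard_host_HG=.*Asia\s*\*.", 20),
    (r"/\*\s*shard_host_HG=.*Africa\s*\*.", 30),
]

DEFAULT_HOSTGROUP = 10  # whatever the user's default_hostgroup is

def destination_hostgroup(query):
    """Return the hostgroup of the first rule whose pattern matches."""
    for pattern, hostgroup in HINT_RULES:
        if re.search(pattern, query):
            return hostgroup
    return DEFAULT_HOSTGROUP

q = ("Select /* shard_host_HG=Africa */ City.Name, City.Population "
     "from world.Country join world.City "
     "on world.City.CountryCode=world.Country.Code "
     "where Country.code='ETH' limit 5")
print(destination_hostgroup(q))  # 30
```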

While the queries I am going to test are:

Select /* shard_host_HG=Europe */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' limit 5; SELECT * /* shard_host_HG=Europe */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
Select /* shard_host_HG=Asia */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='IND' limit 5; SELECT * /* shard_host_HG=Asia */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
Select /* shard_host_HG=Africa */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ETH' limit 5; SELECT * /* shard_host_HG=Africa */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';

Running the query for Africa, I will get:

Select /* shard_host_HG=Africa */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ETH' limit 5; SELECT * /* shard_host_HG=Africa */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
| Name        | Population |
| Addis Abeba |    2495000 |
| Dire Dawa   |     164851 |
| Nazret      |     127842 |
| Gonder      |     112249 |
| Dese        |      97314 |
| BIND_ADDRESS  |    |

That will give me:

select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
| 1      | 0    | 10      | NULL                | \/\*\s*shard_host_HG=.*Europe\s*\*.    | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 0    | 11      | NULL                | \/\*\s*shard_host_HG=.*Asia\s*\*.      | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 2    | 12      | NULL                | \/\*\s*shard_host_HG=.*Africa\s*\*.    | NULL            | NULL      | 0     | 0      | 0       | <-- Note the HITS (2, matching the queries we ran)

In this example, we have NO replace_pattern. This is only a matching and redirecting rule: the destination HG is defined by the value of the destination_hostgroup attribute set at insert time. In the case of Africa, it is HG 30.

The server in HG 30 is:

select hostgroup_id,hostname,port,status from mysql_servers ;
| hostgroup_id | hostname    | port | status |
| 10           | | 3306 | ONLINE |
| 20           | | 3306 | ONLINE |
| 30           | | 3306 | ONLINE | <---

This perfectly matches our returned value.

You can try the other two continents on your own.

Using destination_hostgroup

Another way to assign a query's final host is to use the destination_hostgroup together with the schemaname attribute, and use the use schema syntax in the query.

For example:

INSERT INTO mysql_query_rules (active,schemaname,destination_hostgroup,apply) VALUES
(1, 'shard00', 1, 1), (1, 'shard01', 1, 1), (1, 'shard02', 1, 1),
(1, 'shard03', 2, 1), (1, 'shard04', 2, 1), (1, 'shard05', 2, 1),
(1, 'shard06', 3, 1), (1, 'shard07', 3, 1), (1, 'shard08', 3, 1);

And then in the query do something like:

use shard02; Select * from tablex;
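Conceptually, this schemaname-based routing is just a lookup on the session's current schema. A minimal Python sketch (the mapping and helper are hypothetical, mirroring the idea of the rules above):

```python
# Hypothetical mirror of schemaname -> destination_hostgroup rules.
SCHEMA_TO_HOSTGROUP = {
    "shard00": 1, "shard01": 1, "shard02": 1,
    "shard03": 2, "shard04": 2, "shard05": 2,
    "shard06": 3, "shard07": 3, "shard08": 3,
}

def route_by_schema(current_schema):
    """Route purely on the schema selected via 'use shardXX'."""
    return SCHEMA_TO_HOSTGROUP.get(current_schema)

print(route_by_schema("shard02"))  # 1
```

Note that the lookup sees only the session schema, not the tables actually referenced in the query, which is exactly why this method is risky (as discussed below).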

I mention this method because it is currently one of the most common in large companies using sharding.

But it is not safe, because it relies on the query being executed in the desired HG. The risk of error is high.

Just think of a query doing a join against a specific shard:

use shard01; Select * from tablex join shard03.tabley on tablex.id = tabley.id;

This will probably generate an error, because shard03 is likely NOT present on the host containing shard01.

As such this approach can be used ONLY when you are 100% sure of what you are doing, and when you are sure NO query will have an explicit schema declaration.

Shard by host and by schema

Finally, it is obviously possible to combine the two approaches and shard by host with only a subset of schemas.

To do so, let’s use all three nodes and have the schema distribution as follow:

  • Europe on Server -> HG 10
  • Asia on Server -> HG 20
  • Africa on Server -> HG 30

I have already set the query rules using HINT, so what I have to do is to use them BOTH to combine the operations:

Select /* shard_host_HG=Asia */ /* continent=Asia */  City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='IND' limit 5; SELECT * /* shard_host_HG=Asia */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
| Name               | Population |
| Mumbai (Bombay)    |   10500000 |
| Delhi              |    7206704 |
| Calcutta [Kolkata] |    4399819 |
| Chennai (Madras)   |    3841396 |
| Hyderabad          |    2964638 |
5 rows in set (0.00 sec)
| BIND_ADDRESS  |    |
1 row in set (0.01 sec)

mysql> select digest_text from stats_mysql_query_digest;
| digest_text                                                                                                                                |
| SELECT * from information_schema.GLOBAL_VARIABLES where variable_name like ?                                                               |
| Select City.Name, City.Population from Asia.Country join Asia.City on Asia.City.CountryCode=Asia.Country.Code where Country.code=? limit ? |
2 rows in set (0.00 sec)
mysql> select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
| 1      | 0    | 10      | NULL                | \/\*\s*shard_host_HG=.*Europe\s*\*.    | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 2    | 11      | NULL                | \/\*\s*shard_host_HG=.*Asia\s*\*.      | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 0    | 12      | NULL                | \/\*\s*shard_host_HG=.*Africa\s*\*.    | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 0    | 13      | NULL                | NULL                                   | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 1    | 31      | NULL                | \S*\s*\/\*\s*continent=.*Asia\s*\*.*   | NULL            | NULL      | 0     | 0      | 23      |
| 1      | 4    | 32      | NULL                | world.                                 | Asia.           | NULL      | 0     | 23     | 23      |
| 1      | 0    | 33      | NULL                | \S*\s*\/\*\s*continent=.*Europe\s*\*.* | NULL            | NULL      | 0     | 0      | 25      |
| 1      | 0    | 34      | NULL                | world.                                 | Europe.         | NULL      | 0     | 25     | 25      |
| 1      | 0    | 35      | NULL                | \S*\s*\/\*\s*continent=.*Africa\s*\*.* | NULL            | NULL      | 0     | 0      | 24      |
| 1      | 0    | 36      | NULL                | world.                                 | Africa.         | NULL      | 0     | 24     | 24      |

As you can see rule 11 has two HITS, which means my queries will go to the associated HG. But given that the Apply for rule 11 is =0, ProxySQL will first continue to process the Query Rules.

As such, it will also transform the queries using rules 31 and 32, each one having the expected number of hits (one for the first, and four for the second because of the loop).
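The two-layer behavior can be sketched end to end in Python (a hedged stand-in, not ProxySQL's actual engine): pick the hostgroup from the shard_host_HG hint, then recursively rewrite the schema from the continent hint.

```python
import re

MAX_ITERATIONS = 50  # stands in for mysql-query_processor_iterations

# shard_host_HG hint -> hostgroup, as in the rules above.
HOST_RULES = {"Europe": 10, "Asia": 20, "Africa": 30}

def route_and_rewrite(query, default_hostgroup=10):
    """Two-layer sharding: choose the hostgroup from the shard_host_HG
    hint, then recursively rewrite world.* to the continent schema."""
    hostgroup = default_hostgroup
    for continent, hg in HOST_RULES.items():
        if re.search(r"/\*\s*shard_host_HG=" + continent + r"\s*\*/", query):
            hostgroup = hg
            break
    m = re.search(r"/\*\s*continent=(\w+)\s*\*/", query)
    if m:
        schema = m.group(1)
        for _ in range(MAX_ITERATIONS):
            rewritten = re.sub(r"world\.", schema + ".", query, count=1)
            if rewritten == query:  # nothing left to replace
                break
            query = rewritten
    return hostgroup, query

q = ("Select /* shard_host_HG=Asia */ /* continent=Asia */ City.Name "
     "from world.Country join world.City "
     "on world.City.CountryCode=world.Country.Code "
     "where Country.code='IND' limit 5")
hg, rewritten = route_and_rewrite(q)
print(hg)         # 20
print(rewritten)  # all world.* references now target Asia.*
```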

That was all we had to do to perform a perfect two layer sharding in ProxySQL.


ProxySQL allows the user to access data distributed by shards in a very simple way. Query rules based on a consolidated RegEx pattern, together with the ability to concatenate rules and the host group definition approach, give us huge flexibility with relative simplicity.



It is obvious that I need to acknowledge and give kudos for the work René Cannaò is doing to make ProxySQL a solid, fast and flexible product. I should also mention that I work with him very often (more often than he likes), asking him for fixes and discussing optimization strategies. He satisfies these requests with surprising speed and efficiency.


Webinar Tuesday, May 24: Understanding how your MongoDB schema affects scaling, and when to consider sharding for help


Please join David Murphy on Tuesday, May 24 at 10 am PDT (UTC-7) as he presents “Understanding how your MongoDB schema affects scaling, and when to consider sharding for help.”

David will discuss the pros and cons of a few MongoDB schema design patterns on a stand-alone machine, and then will look at how sharding affects them. He'll examine the assumptions you might have made that could cause havoc on your CPU, memory and network during a scatter-gather. This webinar will help answer the questions:

  • Would you still use the same schema if you knew you were going to shard?
  • Are your fetches using the same shard, or employing parallelism to boost performance?
  • Are you following the golden rules of schema design?

Register for this webinar here.

David Murphy, MongoDB Practice Manager

David joined Percona in October 2015 as Practice Manager for MongoDB. Before that, David joined the ObjectRocket by Rackspace team as the Lead DBA in Sept 2013. With the growth involved with any recently acquired startup, David’s role covered a wide range of evangelism, research, run book development, knowledge base design, consulting, technical account management, mentoring and much more. Before the world of MongoDB, David was a MySQL and NoSQL architect at Electronic Arts, working with some of the largest titles in the world, like FIFA, SimCity, and Battlefield, with tuning, design, and technology choice responsibilities. David maintains an active interest in database speaking and exploring new technologies.
