Aug
22
2017
--

The MySQL High Availability Landscape in 2017 (The Adults)

In this blog post, we’ll look at some of the MySQL high availability solution options.

In the previous post of this series, we looked at the MySQL high availability (HA) solutions that have been around for a long time. I called these solutions “the elders.” Some of these solutions (like replication) are heavily used today and have been improved from release to release of MySQL.

This post focuses on the MySQL high availability solutions that have appeared over the last five years and gained a fair amount of traction in the community. I chose to include this group only two solutions: Galera and RDS Aurora. I’ll use the term “Galera” generically: it covers Galera Cluster, MariaDB Cluster and Percona XtraDB Cluster. I debated for some time whether or not to include Aurora. I don’t like the fact that they use closed source code. Given the tight integration with the AWS environment, what is the commercial risk of opening the source code? That question evades me, but I am not on the business side of technology. ?

Galera

When I say “Galera,” it means a replication protocol supported by a library provided by Codeship, a Finnish company. The library needs hooks inside the MySQL source code for the replication protocol to work. In Percona XtraDB cluster, I counted 66 .cc files where the word “wsrep” is present. As you can see, it is not a small task to add support for Galera to MySQL. Not all the implementations are similar. Percona, for example, focused more on stability and usability at the expense of new features.

high availability

Let’s start with a quick description of a Galera-based cluster. Unless you don’t care about split-brain, a Galera cluster needs at least three nodes. The Galera replication protocol is nearly synchronous, which is a huge gain compared to regular MySQL replication. It performs transactions almost simultaneously on all the nodes, and the protocol ensures the same commit order. The transactions are almost synchronous because there are incoming queues on each node to improve performance. The presence of these incoming queues forces an extra step: the certification. The certification compares an incoming transaction with the ones already queued. If there a conflict, it returns a deadlock error.

For performance reasons, the certification process must be quick so that the incoming queue stays in memory. Since the number of transactions defines the size of the queue, the presence of large transactions uses a lot of memory. There are safeguards against memory overload, so be aware that transactions like:

update ATableWithMillionsRows set colA=1;

will likely fail. That’s the first important limitation of a Galera-based cluster: the size of the number transactions is limited.

It is also critical to uniquely identify conflicting rows. The best way to achieve an efficient row comparison is to make sure all the tables have a primary key. In a Galera-based cluster, your tables need primary keys otherwise you’ll run into trouble. That’s the second limitation of a Galera based cluster: the need for primary keys. Personally, I think that a table should always have a primary key – but I have seen many oddities…

Another design characteristic is the need for an acknowledgment by all the nodes when a transaction commits. That means the network link with the largest latency between two nodes will set the floor value of the transactional latency. It is an important factor to consider when deploying a Galera-based cluster over a WAN. Similarly, an overloaded node can slow down the cluster if it cannot acknowledge the transaction in a timely manner. In most cases, adding slave threads will allow you to overcome the throughput limitations imposed by the network latency. Each transaction will suffer from latency, but more of them will be able to run at the same time and maintain the throughput.

The exception here is when there are “hot rows.” A hot row is a row that is hammered by updates all the time. The transactions affecting hot rows cannot be executed in parallel and are thus limited by the network latency.

Since Galera-based clusters are very popular, they must also have some good points. The first and most obvious is the full-durability support. Even if the node on which you executed a transaction crashes a fraction of second after the commit statement returned, the data is present on the other nodes incoming queues. In my opinion, it is the main reason for the demise of the share storage solution. Before Galera-based clusters, the shared storage solution was the only other solution guaranteeing no data loss in case of a crash.

While the standby node is unusable with the shared storage solution, all the nodes of a Galera-based cluster are available and are almost in sync. All the nodes can be used for reads without stressing too much about replication lag. If you accept a higher risk of deadlock errors, you can even write on all nodes.

Finally, and not the least, there is an automatic provisioning service for the new nodes called SST. During the SST process, a joiner node asks the cluster for a donor. One of the existing nodes agrees to be the donor and initiate a full backup. The backup is streamed over the network to the joiner and restored there. When the backup completes, the joiner performs an IST to get the recent updates and, once applied, joins the cluster. The most common SST method uses the Percona XtraBackup utility. When using SST for XtraBackup, the cluster is fully available during the SST, although it may degrade performance. This feature really simplifies the operational side of things.

The technology is very popular. Of course, I am a bit biased since I work for Percona and one of our flagship products is Percona XtraDB Cluster – an implementation of the Galera protocol. Other than standard MySQL replication, it is by far the most common HA solution used by the customers I work with.

RDS Aurora

The second “adult” MySQL high availability solution is RDS Aurora. I hesitated to add Aurora here, mainly because it is not an open-source technology. I must also admit that I haven’t followed the latest developments around Aurora very closely. So, let’s describe Aurora.

There are three major parts in Aurora: at least one database server, the writer node and the storage.

high availability

What makes Aurora special is the storage layer has its own processing logic. I don’t know if the processing logic is part of the writer node AWS instance or part of the storage service directly, since the source code is not available. Anyway, I’ll call that layer the appliers. The applier role is to apply redo log fragments that then allow the writer node to write only the redo log fragments (normally written to the InnoDB log files). The appliers read those fragments and modify the pages in the storage. If a node requests a page that has pending redo fragments to be applied, they get applied before returning the page.

From the writer node perspective, there are much fewer writes. There is also no direct upper bound in terms of a number of fragments to be queued, so it is a bit like having

innodb_log_file_size

 set to an extremely large value. Also, since Aurora doesn’t need to flush pages, if the write node needs to read from the storage, and there are no free pages in the buffer pool, it can just discard one even if it is “dirty.” Actually, there are no dirty pages in the buffer pool.

So far, that seems to be very good for high write loads with spikes. What about the reader nodes? These reader nodes receive the updates from the writer nodes. If the update concerns a page they have in their buffer pool, they can modify it in place or discard it and read again from the storage. Again, without the source code, it is hard to tell the implementation. The point is, the readers have the same data as the master, they just can’t lag behind.

Apart from the impossibility of any reader lag, the other good point of Aurora is the storage. InnoDB pages are saved as objects in an object store, not like in a regular file on a file system. That means you don’t need to over-provision your storage ahead of time. You pay for what you are using – actually the maximum you ever use. InnoDB tablespaces do not shrink, even with Aurora.

Furthermore, if you have a 5TB dataset and your workload is such that you would need ten servers (one writer and nine readers), you still need only 5TB of storage if you are not replicating to other AZ. If we compare with regular MySQL and replication, you would have one master and nine slaves, each with 5TB of storage, for a total of 50TB. On top of that, you’ll have at least ten times the write IOPS.

So, storage wise, we have something that could be very interesting for applications with large datasets and highly concurrent read heavy workloads. You ensure high availability with the ability to promote a reader to writer automatically. You access the primary or the readers through endpoints that automatically connect to the correct instances. Finally, you can replicate the storage to multiple availability zones for DR.

Of course, such an architecture comes with a trade-off. If you experiment with Aurora, you’ll quickly discover that the smallest instance types underperform while the largest ones perform in a more expected manner. It is also quite easy to overload the appliers. Just perform the following queries:

update ATableWithMillionsRows set colA=1;
select count(*) from ATableWithMillionsRows where colA=1;

given that the table ATableWithMillionsRows is larger than the buffer pool. The select will hang for a long time because the appliers are overloaded by the number of pages to update.

In term of adoption, we have some customers at Percona using Aurora, but not that many. It could be that users of Aurora do not naturally go to Percona for services and support. I also wonder about the decision to keep the source code closed. It is certainly not a positive marketing factor in a community like the MySQL community. Since the Aurora technology seems extremely bounded to their ecosystem, is there really a risk for the technology to be reused by a competitor? With a better understanding of the technology through open access to the source, Amazon could have received valuable contributions. It would also be much easier to understand, tune and recommend Aurora.

Further reading:

Jul
20
2017
--

Where Do I Put ProxySQL?

ProxySQL

In this blog post, we’ll look at how to deploy ProxySQL.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. It was created for DBAs by René Cannaò, as a means of solving complex replication topology issues. When bringing up ProxySQL with my clients, I always get questions about where it fits into the architecture. This post should clarify that.

Before continuing, you might want to know why you should use this software. The features that are of interest include:

  • MySQL firewall
  • Connection pooling
  • Shard lookup and automated routing
  • Ability to read/write split
  • Automatically switch to another master in case of active master failure
  • Query cache
  • Performance metrics
  • Other neat features!

Initial Configuration

In general, you install it on nodes that do not have a running MySQL database. You manage it via the MySQL command line on another port, usually 6032. Once it is started the configuration in /etc is not used, and you do everything within the CLI. The backend database is actually SQLite, and the db file is stored in /var/lib/proxysql.

There are many guides out there on initializing and installing it, so I won’t cover those details here. It can be as simple as:

apt-get install proxysql

ProxySQL Architecture

While most first think to install ProxySQL on a standalone node between the application and database, this has the potential to affect query performance due to the additional latency from network hops.

 

ProxySQL

To have minimal impact on performance (and avoid the additional network hop), many recommend installing ProxySQL on the application servers. The application then connects to ProxySQL (acting as a MySQL server) on localhost, using Unix Domain Socket, and avoiding extra latency. It would then use its routing rules to reach out and talk to the actual MySQL servers with its own connection pooling. The application doesn’t have any idea what happens beyond its connection to ProxySQL.

ProxySQL

Reducing Your Network Attack Surface

Another consideration is reducing your network attack surface. This means attempting to control all of the possible vulnerabilities in your network’s hardware and software that are accessible to unauthenticated users.

Percona generally suggests that you put a ProxySQL instance on each application host, like in the second image above. This suggestion is certainly valid for reducing latency in your database environment (by limiting network jumps). But while this is good for performance, it can be bad for security.

Every instance must be able to talk to:

  • Every master
  • Every slave

As you can imagine, this is a security nightmare. With every instance, you have x many more connections spanning your network. That’s x many more connections an attacker might exploit.

Instead, it can be better to have one or more ProxySQL instances that are between your application and MySQL servers (like the first image above). This provides a reasonable DMZ-type setup that prevents opening too many connections across the network.

That said, both architectures are valid production configurations – depending on your requirements.

Jun
23
2017
--

Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication High Availability Webinar: Q & A

High Availability Webinar

High Availability WebinarThank you for attending the Wednesday, June 21, 2017 high availability webinar titled Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication. In this blog, I will provide answers to the Q & A for that webinar.

You can find the slides and a recording of the webinar here.

Is there a minimum MySQL server version for Group Replication?

MySQL Group Replication is GA since MySQL Community 5.7.17. This is the lowest version that you should use for the Group Replication feature. Otherwise, you are using a beta version.

Since 5.7.17 was the GA release, it’s strongly recommended you use the latest 5.7 minor release. Bugs get fixed and features added in each of the minor releases (as can be seen in the Limitations section in the slide deck).

In MySQL 5.6 and earlier versions, Group Replication is not supported. Note that Percona Server for MySQL 5.7.17 and beyond also ships with Group Replication.

Can I use Percona XtraDB Cluster with MariaDB v10.2? or must I use Percona Server for MySQL?

Percona XtraDB Cluster is Percona Server for MySQL and Percona XtraBackup with the modified Galera library. You cannot run Percona XtraDB Cluster on MariaDB.

However, as Percona XtraDB Cluster is open source, it is possible that MariaDB/Codership implements our modifications into their codebase.

If Percona XtraDB Cluster does not allow InnoDB tables, how do we typically deal with applications that need to use MyISAM tables?

You cannot use MyISAM with Percona XtraDB Cluster, Galera or Group Replication. However, there is experimental MyISAM support in Galera/Percona XtraDB Cluster. But we strongly recommend that you don’t use this in production. It effectively executes all statements in Total Order Isolation, which results in bad performance.

What is a typical business use case for the Group Replication? I specifically like the writes order feature.

Typical use cases are:

  • Environments with strict **durability** requirements
  • Write to multiple nodes simultaneously while keeping data **consistent**
  • Reducing failover time
  • Using other nodes for read-scaling, where reading stale data is more difficult for the application (as opposed to standard asynchronous replication)

The use cases for Galera and Percona XtraDB Cluster are similar.

Where do you run ProxySQL, on a separate server? We are using HAProxy.

You can deploy ProxySQL in many different ways. One common method of installation is to run ProxySQL on a separate layer of servers (ensuring there is failover on this layer). Another commonly used method is to run a ProxySQL daemon on every application server.

Do you support KVM?

Yes, there are no limitations on virtualization solutions.

Can you give some examples of an “arbitrator”?

Some useful links:

What does Percona XtraDB add to make it more performant than InnoDB?

The scalability and performance improvement of Percona XtraDB are listed on the Percona Server for MySQL documentation page: https://www.percona.com/doc/percona-server/LATEST/index.html

How scalable is Percona XtraDB Cluster storage wise? Do we have any limitations?

Storage happens through the storage engine (which is InnoDB). Percona XtraDB Cluster does not have any different limitations than Percona Server for MySQL or MySQL.

However, we need to also consider the practical side of things: the larger the cluster gets, the longer certain operations take. For example, when adding a new node to the cluster another node must be the donor and provide all the data. This will take substantially longer with larger datasets. Certain operational aspects might therefore become more complex.

Is there any development to add multiple nodes simultaneously?

No, at the moment only one node can join the cluster at the same time. Other nodes automatically wait until it is finished before joining.

Why does Galera say we cannot use READ COMMITTED isolation for multimaster mode, even though we can start the cluster with READ-COMMITTED?

You can use READ-COMMITTED as transaction isolation level. The limitation is that you cannot use SERIALIZABLE: http://galeracluster.com/documentation-webpages/isolationlevels.html.

Galera Cluster and MariaDB currently do not prevent a user from using this transaction isolation level. Percona XtraDB Cluster implemented the strict mode to prevent these operations: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html#explicit-table-locking

MariaDB 10.2 fixed the check constraints issue, When will Percona fix this issue?

There are currently no plans to support CHECK constraints in Percona Server for MySQL (and therefore Percona XtraDB Cluster as well).

As Percona Server is effectively a fully backwards-compatible (but modified) MySQL Community Server, CHECK constraints is a feature that normally would be implemented in MySQL Community first.

Can you share your performance benchmark git repository (if you have one)?

We don’t have a performance benchmark in git repository. You can get detailed information about this benchmark in this blog: Performance improvements in Percona XtraDB Cluster 5.7.17-29.20.

On your slide pointing to scalability charts, how many nodes did you run your test against?

We used a three-node cluster for this performance benchmark.

The product is using Master-Master replication. As such what do you mean when you talk about failover in such configuration?
Where do you maintain the cluster state?

All technologies automatically maintain the cluster state as you add and remove nodes.

What are the network/IP requirements for Proxy SQL?

There are no specific requirements. More documentation about ProxySQL can be found here: https://github.com/sysown/proxysql/wiki.

Jun
20
2017
--

The MySQL High Availability Landscape in 2017 (The Elders)

High Availability

In this blog, we’ll look at different MySQL high availability options.

The dynamic MySQL ecosystem is rapidly evolving many technologies built around MySQL. This is especially true for the technologies involved with the high availability (HA) aspects of MySQL. When I joined Percona back in 2009, some of these HA technologies were very popular – but have since been almost forgotten. During the same interval, new technologies have emerged. In order to give some perspective to the reader, and hopefully help to make better choices, I’ll review the MySQL HA landscape as it is in 2017. This review will be in three parts. The first part (this post) will cover the technologies that have been around for a long time: the elders. The second part will focus on the technologies that are very popular today: the adults. Finally, the last part will try to extrapolate which technologies could become popular in the upcoming years: the babies.

Quick disclaimer, I am reporting on the technologies I see the most. There are likely many other solutions not covered here, but I can’t talk about technologies I have barely or never used. Apart from the RDS-related technologies, all the technologies covered are open-source. The target audience for this post are people relatively new to MySQL.

The Elders

Let’s define the technologies in the elders group. These are technologies that anyone involved with MySQL for last ten years is sure to be aware of. I could have called this group the “classics”.  I include the following technologies in this group:

  • Replication
  • Shared storage
  • NDB cluster

Let’s review these technologies in the following sections.

Replication

Simple replication topology

 

MySQL replication is very well known. It is one of the main features behind the wide adoption of MySQL. Replication gets used almost everywhere. The reasons for that are numerous:

  • Replication is simple to setup. There are tons of how-to guides and scripts available to add a slave to a MySQL server. With Amazon RDS, adding a slave is just a few clicks.
  • Slaves allow you to easily scale reads. The slaves are accessible and can be used for reads. This is the most common way of scaling up a MySQL database.
  • Slaves have little impact on the master. Apart from the added network traffic, the presence of slaves does not impact the master performance significantly.
  • It is well known. No surprises here.
  • Used for failover. Your master died, promote a slave and use it as your new master.
  • Used for backups. You don’t want to overload your master with the backups, run them off a slave.

Of course, replication also has some issues:

  • Replication can lag. Replication used to be single-threaded. That means a master with a concurrent load could easily outpace a slave. MySQL 5.6 and MariaDB 10.0 have introduced some parallelism to the slave. Newer versions have further improved to a point where today’s slaves are many times faster than they were.
  • Slaves can diverge. When you modify data on the master, the slave must perform the exact same update. That seems easy, but there are many ways an update can be non-deterministic with statement-based replication. They fixed many issues, and the introduction of row-based replication has been another big step forward. Still, if you write directly to a slave you are asking for trouble. There is a read_only setting, but if the MySQL user has the “SUPER” privilege it is just ignored. That’s why there is now the “super_read_only” setting. Tools like pt-table-checksum and pt-table-sync from the Percona toolkit exist to solve this problem.
  • Replication can impact the master. I wrote above that the presence of slaves does not affect the master, but logging changes are more problematic. The most common issue is the InnoDB table-level locking for auto_increment values with statement-based replication. Only one thread can insert new rows at a time. You can avoid this issue with row-based replication and properly configuring settings.
  • Data gets lost. Replication is asynchronous. That means the master will reply “done” after a commit statement even though the slaves have not received updates yet. Some transactions can get lost if the master crashes.

Although an old technology, a lot of work has been done on replication. It is miles away from the replication implementation of 5.0.x. Here’s a list, likely incomplete, of the evolution of replication:

  • Row based replication (since 5.1). The binary internal representation of the rows is sent instead of the SQL statements. This makes replication more robust against slave divergence.
  • Global transaction ID (since 5.6). Transactions are uniquely identified. Replication can be setup without knowing the binlog file and offset.
  • Checksum (since 5.6). Binlog events have checksum values to validate their integrity.
  • Semi-sync replication (since 5.5). An addition to the replication protocol to make the master aware of the reception of events by the slaves. This helps to avoid losing data when a master crashes.
  • Multi-source replication (since 5.7). Allows a slave to have more than one master.
  • Multi-threaded replication (since 5.6). Allows a slave to use multiple threads. This helps to limit the slave lag.

Managing replication is a tedious job. The community has written many tools to manage replication:

  • MMM. An old Perl tool that used to be quite popular, but had many issues. Now rarely used.
  • MHA. The most popular tool to manage replication. It excels at reconfiguring replication without losing data, and does a decent at handling failover.  It is also simple. No wonder it is popular.
  • PRM. A Pacemaker-based solution developed to replace MMM. It’s quite good at failover, but not as good as MHA at reconfiguring replication. It’s also quite complex, thanks to Pacemaker. Not used much.
  • Orchestrator. The new cool tool. It can manage complex topologies and has a nice web-based interface to monitor and control the topology.

 

Shared Storage

Simple shared storage topology

 

Back when I was working for MySQL ten years ago, shared storage HA setups were very common. A shared storage HA cluster uses one copy of the database files between one of two servers. One server is active, the other one is passive. In order to be shared, the database files reside on a device that can be mounted by both servers. The device can be physical (like a SAN), or logical (like a Linux DRBD device). On top of that, you need a cluster manager (like Pacemaker) to handle the resources and failovers. This solution is very popular because it allows for failover without losing any transactions.

The main drawback of this setup is the need for an idle standby server. The standby server cannot have any other assigned duties since it must always be ready to take over the MySQL server. A shared storage solution is also obviously not resilient to file-level corruption (but that situation is exceptional). Finally, it doesn’t play well with a cloud-based environment.

Today, newly-deployed shared storage HA setups are rare. The only ones I encountered over the last year were either old implementations needing support, or new setups that deployed because of existing corporate technology stacks. That should tell you about the technology’s loss of popularity.

NDB Cluster

A simple NDB Cluster topology

 

An NDB Cluster is a distributed clustering solution that has been around for a long time. I personally started working with this technology back in 2008. An NDB Cluster has three types of nodes: SQL, management and data. A full HA cluster requires a minimum of four nodes.

An NDB Cluster is not a general purpose database due to its distributed nature. For suitable workloads, it is extraordinary good. For unsuitable workloads, it is miserable. A suitable workload for an NDB Cluster contains high concurrency, with a high rate of small primary key oriented transactions. Reaching one million trx/s on an NDB Cluster is nothing exceptional.

At the other end of the spectrum, a poor workload for an NDB Cluster is a single-threaded report query on a star-like schema. I have seen some extreme cases where just the network time of a reporting query amounted to more than 20 minutes.

Although NDB Clusters have improved, and are still improving, their usage has been pushed toward niche-type applications. Overall, the technology is losing ground and is now mostly used for Telcos and online gaming applications.

Apr
22
2017
--

Better Than Linear Scaling

Scalability

In this blog, we’ll look at how to achieve better-than-linear scaling.

Scalability is the capability of a system, network or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. For example, we consider a system scalable if it is capable of increasing its total output under an increased load when resources (typically hardware) are added: https://en.wikipedia.org/wiki/Scalability.

It is often accepted as a fact that systems (in particular databases) can’t scale better than linearly. By this I mean when you double resources, the expected performance doubles, at best (and often is less than doubled).  

We can attribute this assumption to Amdahl’s law (https://en.wikipedia.org/wiki/Amdahl%27s_law), and later to the Universal Scalability Law (http://www.perfdynamics.com/Manifesto/USLscalability.html). Both these laws prescribe that it is impossible to achieve better than linear scalability. To be totally precise, this is practically correct for single server systems when the added resources are only CPU units.

Multi-nodes systems

However, I think databases systems no longer should be seen as single server systems. MongoDB and Cassandra for a long time have had multi-node auto-sharding capabilities. We are about to see the rise of strongly-consistent SQL based multi-node systems. And even MySQL is frequently deployed with manual sharding on multi-nodes.

The products like Vitess (http://vitess.io/) proposes auto-sharding for MySQL, and with ProxySQL (which I will use in my experiment) you can setup a basic sharding schema.

I describe multi-nodes setups, because in this environment it is possible to achieve much better than linear scalability. I will show this below.

Why is this important?

Understanding scalability of multi-node systems is important for resource planning, and understanding how much of a potential performance gain we can expect when we add more nodes. This is especially interesting for cloud deployments.

How is it possible?

I’ve written about how the size of available memory (cache) affects the performance. When we add additional nodes to the deployment, effectively we increase not only CPU cores, but also the memory that comes with the node (and we are adding extra IO capacity). So, with increasing node counts, we also increase available memory (and cache). As we can see from these graphs, the effect of extra memory could be non-linear (and actually better than linear). Playing on this fact, we can achieve better-than-linear scaling in a sharded setup. I am going to show the experimental setup of how to achieve this.

Experimental setup

To show the sharded setup we will use ProxySQL in front of N MySQL servers (shards). We also will use sysbench with 60 tables (4 million rows each, uniform distribution).

  • For one shard, this shard contains all 60 tables
  • For two shards, each shard contains 30 tables each
  • For three shards, each shard contains 20 tables each
  • For six shards, each shard contains ten tables each

So schematically, it looks like this:

One shard:

Scaling

Two shards:

Scaling

Six shards:

Scaling

We want to measure how the performance (for both throughput and latency) changes when we go from 1 to 2, to 3, to 4, to 5 and to 6 shards.

For the single shard, I used a Google Cloud instance with eight virtual CPUs and 16GB of RAM, where 10GB is allocated for the innodb_buffer_pool_size.

The database size (for all 60 tables) is about 51GB for the data, and 7GB for indexes.

For this we will use a sysbench read-only uniform workload, and ProxySQL helps to perform query routing. We will use ProxySQL query rules, and set sharding as:

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "DELETE FROM mysql_query_rules"
shards=$1
for i in {1..60}
do
hg=$(( $i % $shards + 1))
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES ($i,1,'root','sbtest$is',$hg,1);"
done
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "LOAD MYSQL QUERY RULES TO RUNTIME;"

Command line for sysbench 1.0.4:
sysbench oltp_read_only.lua --mysql-socket=/tmp/proxysql.sock --mysql-user=root --mysql-password=test --tables=60 --table-size=4000000 --threads=60 --report-interval=10 --time=900 --rand-type=pareto run

The results

Nodes Throughput Speedup vs. 1 node Latency, ms
1 245 1.00 244.88
2 682 2.78 87.95
3 1659 6.77 36.16
4 2748 11.22 21.83
5 3384 13.81 17.72
6 3514 14.34 17.07

Scaling
As we can see, the performance improves by a factor much better than just linearly.

With five nodes, the improvement is 13.81 times compared to the single node.

The 6th node does not add much benefit, as at this time data practically fits into memory (with five nodes, the total cache size is 50GB compared to the 51GB data size)

Factors that affects multi-node scaling

How can we model/predict the performance gain? There are multiple factors to take into account: the size of the active working set, the available memory size and (also importantly) the distribution of the access to the working set (with uniform distribution being the best case scenario, and with access to the one with only one row being the opposite corner-case, where speedup is impossible). Also we need to keep network speed in mind: if we come close to using all available network bandwidth, it will be impossible to get significant improvement.

Conclusion

In multi-node, auto-scaling, auto-sharding distributed systems, the traditional scalability models do not provide much help. We need to have a better framework to understand how multiple nodes affect performance.

Jan
26
2017
--

Percona Live Featured Tutorial with Frédéric Descamps — MySQL InnoDB Cluster & Group Replication in a Nutshell: Hands-On Tutorial

Percona Live Featured Tutorial

Percona Live Featured TutorialWelcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Frédéric Descamps, MySQL Community Manager at Oracle. Frédéric is probably better known in the community as “LeFred” (Twitter: @lefred)! His tutorial is MySQL InnoDB Cluster and Group Replication in a Nutshell: Hands-On Tutorial (along with  Part 2). Frédéric is delivering this talk with Percona’s MySQL Practice Manager Kenny Gryp. Attendees will get their hands on virtual machines and migrate standard Master/Slave architecture to the new MySQL InnoDB Cluster (native Group Replication) with minimal downtime. I had a chance to speak with Frédéric and learn a bit more about InnoDB Cluster and Group Replication:

Percona: How did you get into database technology? What do you love about it?

Frédéric: I started with SQL on VMS and DBASE during my IT courses. I wrote my thesis on SQLServer replication. To be honest, the implementation at the time wasn’t particularly good. At the same time, I enjoyed hacking a Perl module that allowed you to run SQL queries against plain text files. Then I worked as a Progress developer for MFG/Pro (a big customized ERP). When I decided to work exclusively for an open source company, I was managing more and more customers’ databases in MySQL (3.23 and 4.x), so my colleagues and I took MySQL training. It was still MySQL AB at that time. I passed my MySQL 4.1 Core and Professional Certification, and I ended up delivering MySQL Training for MySQL AB too. Some might also remember that I worked for a company called Percona for a while, before moving on to Oracle!

Percona: Your and Kenny’s tutorial is called “MySQL InnoDB Cluster & Group Replication in a nutshell: hands-on tutorial.” Why would somebody want to migrate to this technology?

Frédéric: The main reason to migrate to MySQL InnoDB is to achieve high availability (HA) easily for your MySQL database, while still maintaining performance.

Thanks to  MySQLShell’s adminAPI, you can create a cluster in less than five minutes. All the management can be done remotely, with just some shell commands. Additionally, for Group Replication our engineers have leveraged existing standard MySQL infrastructures (GTIDs, Multi-Source Replication, Multi-threaded Slave Applier, Binary Logs, etc.), which are already well known by the majority of MySQL DBAs.

Percona: How can moving to a high availability environment help businesses?

Frédéric: Time is money! Many businesses can’t afford downtime anymore. Having a reliable HA solution is crucial for businesses on the Internet. Expectations have changed: users want this to be natively supported inside the database technology versus externally. For example, when I started to work in IT, it was very common to have several maintenance windows during the night. But currently, and almost universally, the customer base is spread worldwide. Any time is during business hours somewhere in the world!

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Frédéric: I think that current HA solutions are too complex to setup and manage. They use external tools. In our tutorial, Kenny and I will demonstrate how MySQL InnoDB Cluster, being an internal implementation, is extremely easy to use. We will also cover some scenarios where things go wrong, and how to deal with them. Performance and ease-of-use were two key considerations in the design of InnoDB Cluster.

Percona: What are you most looking forward to at Percona Live?

Frédéric: Like every time I’ve attended, my main goal is to bring to the audience all the new and amazing stuff we are implementing in MySQL. MySQL has evolved quickly these last few years, and we don’t really plan to slow down. I also really enjoy the feedback from users and other MySQL professionals. This helps focus us on what really matters for our users. And finally, it’s a great opportunity to re-connect with ex-colleagues and friends.

You can find out more about Frédéric Descamps and his work with InnoDB Cluster at his Twitter handle @lefredRegister for Percona Live Data Performance Conference 2017, and see Frédéric and Kenny present their MySQL InnoDB Cluster and Group Replication in a Nutshell: Hands-On Tutorial talk. Use the code FeaturedTalk and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Oct
10
2016
--

Take Percona’s One-Click 2017 Top Database Concerns Poll

2017 top database concerns

2017 top database concernsTake Percona’s One-Click 2017 Top Database Concerns Poll.

With 2017 coming quick around the corner, it’s time to start thinking about what next year is going to bring to the open source database community. We just finished Percona Live Europe 2017, and at the conference we looked at new technologies, new techniques, and new ideas. With all this change comes some uncertainty and concern regarding change: how is a changing ecosystem going to affect your database environment? Are you up to speed on the latest technologies and how they can impact your world?

Are your biggest concerns for 2017 scalability, performance, security, monitoring, updates and bugs, or staffing issues? What do you think is going to be your biggest issue in the coming year??

 

Please take a few seconds and answer the following poll. It will help the community get an idea of how the new year could impact their critical database environments.

If you’ve faced specific issues, feel free to comment below. We’ll post a follow-up blog with the results!

Note: There is a poll embedded within this post, please visit the site to participate in this post’s poll.

You can see the results of our last blog poll on security here.

Jun
20
2016
--

Webinar Thursday June 23: Choosing a MySQL High Availability Solution Today

MySQL High Availability

Please join Percona, Technical Account Manager, Michael Patrick on Thursday, June 23, 2016 at 10 AM PDT (UTC-7) as he presents “Choosing a MySQL High Availability Solution Today.”

High availability (HA) is one of the solutions to improve performance, avoid data outages, and recover quickly from disasters. An HA environment helps guarantee that your database doesn’t have a single point of failure, accommodates rapid growth and exponentially increasing database size, and enables the applications that power your business.

Michael will discuss various topologies for achieving High Availability with MySQL.

Topics include:

  • Percona XtraDB Cluster
  • DRBD
  • MHA
  • MySQL Orchestrator

Each solution has advantages and challenges. Attendees will gain a deeper understanding of how to choose the best solution for their needs while avoiding some of the pitfalls of making the wrong choices. Avoid the costly mistakes that commonly cause outages and lost revenue. Plus get the latest and greatest developments in the technologies!

Register now.

MySQL High AvailabilityMichael Patrick
Technical Account Manager

Mike came to Percona in 2015 after working for a variety of large corporations running hundreds of MySQL and Percona XtraDB Clusters in production environments. He is skilled in performance tuning, server auditing, high availability, multi-data center replication, migration, and other MySQL-related activities. Mike holds a B.S. in Computer Science from East Tennessee State University. In his off time, he enjoys Martial Arts and Cave Exploration. He lives in East Tennessee with his wife and he has four children.

Mar
08
2016
--

Orchestrator: MySQL Replication Topology Manager

Orchestrator MySQL topology manager

Orchestrator MySQL replication topology managerThis blog post discusses Orchestrator: MySQL Replication Topology Manager.

What is Orchestrator?

Orchestrator is a replication topology manager for MySQL.

It has many great features:

  • The topology and status of the replication tree is automatically detected and monitored
  • Either a GUI, CLI or API can be used to check the status and perform operations
  • Supports automatic failover of the master, and the replication tree can be fixed when servers in the tree fail – either manually or automatically
  • It is not dependent on any specific version or flavor of MySQL (MySQL, Percona Server, MariaDB or even MaxScale binlog servers)
  • Orchestrator supports many different types of topologies, from a single master -> slave  to complex multi-layered replication trees consisting of hundreds of servers
  • Orchestrator can make topology changes and will do so based on the state at that moment; it does not require a configuration to be defined with what corresponds to the database topology
  • The GUI is not only there to report the status – one of the cooler things you can do is change replication just by doing a drag and drop in the web interface (of course you can do this and much more through the CLI and API as well)

Here’s a gif that demonstrates this (click on an image to see a larger version):

Orchestrator Drag N Drop

Orchestrator’s manual is quite extensive and detailed, so the goal of this blogpost is not to go through every installation and configuration step. It will just give a global overview on how Orchestrator works, while mentioning some important and interesting settings.

How Does It Work?

Orchestrator is a go application (binaries, including

rpm

  and

deb

  packages are available for download).

It requires it’s own MySQL database as a backend server to store all information related to the Orchestrator managed database cluster topologies.

There should be at least one Orchestrator daemon, but it is recommended to run many Orchestrator daemons on different servers at the same time – they will all use the same backend database but only one Orchestrator is going to be “active” at any given moment in time. (You can check who is active under the

Status

  menu on the web interface, or in the database in the

active_node

  table.)

Using MySQL As Database Backend, Isn’t That A SPOF?

If the Orchestrator MySQL database is gone, it doesn’t mean the monitored MySQL clusters stop working. Orchestrator just won’t be able to control the replication topologies anymore. This is similar to how MHA works: everything will work but you can not perform a failover until MHA is back up again.

At this moment, it’s required to have a MySQL backend and there is no clear/tested support for having this in high availability (HA) as well. This might change in the future.

Database Server Installation Requirements

Orchestrator only needs a MySQL user with limited privileges (

SUPER, PROCESS, REPLICATION SLAVE, RELOAD

) to connect to the database servers. With those permissions, it is able to check the replication status of the node and perform replication changes if necessary. It supports different ways of replication: binlog file positions, MySQL&MariaDB GTID, Pseudo GTID and Binlog servers.

There is no need to install any extra software on the database servers.

Automatic Master Failure Recovery

One example of what Orchestrator can do is promote a slave if a master is down. It will choose the most up to date slave to be promoted.

Let’s see what it looks like:

dtdggKGF0f

In this test we lost

rep1

 (master) and Orchestrator promoted

rep4

  to be the new master, and started replicating the other servers from the new master.

With the default settings, if

rep1

 comes back 

rep4

  is going to continue the replication from

rep1

. This behavior can be changed with the setting 

ApplyMySQLPromotionAfterMasterFailover:True

 in the configuration.

Command Line Interface

Orchestrator has a nice command line interface too. Here are some examples:

Print the topology:

> orchestrator -c topology -i rep1:3306 cli
rep1:3306   [OK,5.6.27-75.0-log,ROW,>>]
+ rep2:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep3:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep4:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep5:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]

Move a slave:

orchestrator -c relocate -i rep2:3306 -d rep4:3306

Print the topology again:

> orchestrator -c topology -i rep1:3306 cli
rep1:3306     [OK,5.6.27-75.0-log,ROW,>>]
+ rep3:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep4:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]
  + rep2:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep5:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]

As we can see, 

rep2

  now is replicating from

rep4

 .

Long Queries

One nice addition to the GUI is how it displays slow queries on all servers inside the replication tree. You can even kill bad queries from within the GUI.

Orchestrator GUI Long Queries

Orchestrator Configuration Settings

Orchestrator’s daemon configuration can be found in 

/etc/orchestrator.conf.json

. There are many configuration options, some of which we elaborate here:

  • SlaveLagQuery

      – Custom queries can be defined to check slave lag.

  • AgentAutoDiscover

      – If set to

    True

     , Orchestrator will auto-discover the topology.

  • HTTPAuthPassword

      and

    HTTPAuthUser

      –  Avoids everybody being able to access the Web GUI and change your topology.

  • RecoveryPeriodBlockSeconds

      – Avoids flapping.

  • RecoverMasterClusterFilters

      –  Defines which clusters should auto failover/recover.

  • PreFailoverProcesses

      – Orchestrator will execute this command before the failover.

  • PostFailoverProcesses

      – Orchestrator will execute this command after the failover.

  • ApplyMySQLPromotionAfterMasterFailover

      –  Detaches the promoted slave after failover.

  • DataCenterPattern

      – If there are multiple data centers, you can mark them using a pattern (they will get different colors in the GUI):Orchestrator thinks every server is in a different DC now.

Limitations

While being a very feature-rich application, there are still some missing features and limitations of which we should be aware.

One of the key missing features is that there is no easy way to promote a slave to be the new master. This could be useful in scenarios where the master server has to be upgraded, there is a planned failover, etc. (this is a known feature request).

Some known limitations:

  • Slaves can not be manually promoted to be a master
  • Does not support multi-source replication
  • Does not support all types of parallel replication
  • At this moment, combining this with Percona XtraDB Cluster (Galera) is not supported

Is Orchestrator Your High Availability Solution?

In order to integrate this in your HA architecture or include in your fail-over processes you still need to manage many aspects manually, which can all be done by using the different hooks available in Orchestrator:

  • Updating application connectivity:
    • VIP handling,
    • Updating DNS
    • Updating Proxy server (MaxScale , HAProxy , ProxySQL…) connections.
  • Automatically setting slaves to read only to avoid writes happening on non-masters and causing data inconsistencies
  • Fencing (STONITH) of the dead master, to avoid split-brain in case a crashed master comes back online (and applications still try to connect to it)
  • If semi-synchronous replication needs to be used to avoid data loss in case of master failure, this has to be manually added to the hooks as well

The work that needs to be done is comparable to having a setup with MHA or MySQLFailover.

This post also doesn’t completely describe the decision process that Orchestrator takes to determine if a server is down or not. The way we understand it right now, one active Orchestrator node will make the decision if a node is down or not. It does check a broken node’s slaves replication state to determine if Orchestrator isn’t the only one losing connectivity (in which it should just do nothing with the production servers). This is already a big improvement compared to MySQLFailover, MHA or even MaxScale’s failoverscripts, but it still might cause some problems in some cases (more information can be found on Shlomi Noach’s blog).

Summary

The amount of flexibility and power and fun that this tool gives you with a very simple installation process is yet to be matched. Shlomi Noach did a great job developing this at Outbrain, Booking.com and now at GitHub.

If you are looking for MySQL Topology Manager, Orchestrator is definitely worth looking at.

Jul
10
2015
--

How to create a rock-solid MySQL database backup & recovery strategy

Percona MySQL and MongoDB WebinarsHave you ever wondered what could happen if your MySQL database goes down?

Although it’s evident such a crash will cause downtime – and surely some business impact in terms of revenue – can you do something to reduce this impact?

The simple answer is “yes” by doing regular backups (of course) but are you 100% sure that your current backup strategy will really come through when an outage occurs? And how much precious time will pass (and how much revenue will be lost) before you get your business back online?

I usually think of backups as the step after HA fails. Let’s say we’re in M<>M replication and something occurs that kills the db but the HA can’t save the day. Let’s pretend that the UPS fails and those servers are completely out. You can’t failover; you have to restore data. Backups are a key piece of “Business Continuity.” Also factor in the frequent need to restore data that’s been altered by mistake. No ‘WHERE’ clause or DROP TABLE in prod instead of DEV. These instances are where backups are invaluable.

Let’s take some time and discuss the possible backup strategies with MySQL…  how to make backups efficiently and also examine the different tools that are available. We’ll cover these topics and more during my July 15  webinar: “Creating a Best-in-Class Backup and Recovery System for Your MySQL Environment” starting at 10 a.m. Pacific time.

On a related note, did you know that most online backups are possible with mysqldump and you can save some space on backups by using simple Linux tools? I’ll also cover this so be sure to join me next Wednesday. Oh, and it’s a free webinar, too!

Stay tuned!

The post How to create a rock-solid MySQL database backup & recovery strategy appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com