Aug
01
2017
--

Group Replication: the Sweet and the Sour

Group Replication

In this blog, we’ll look at group replication and how it deals with flow control (FC) and replication lag. 

Overview

In the last few months, we had two main actors in the MySQL ecosystem: ProxySQL and Group-Replication (with the evolution to InnoDB Cluster). 

While I have extensively covered the first, my last serious work on Group Replication dates back to a lab version from years past.

Given that Oracle decided to declare it GA, and Percona’s decision to provide some level of Group Replication support, I decided it was time for me to take a look at it again.

We’ve already seen a lot of coverage of many Group Replication topics. There are articles about Group Replication and performance, Group Replication and basic functionality (or the lack of it, like automatic node provisioning), Group Replication and ProxySQL, and so on.

But one question kept coming up over and over in my mind. If Group Replication and InnoDB Cluster have to work as an alternative to other (virtually) synchronous replication mechanisms, what changes do our customers need to consider if they want to move from one to the other?

Solutions using Galera (like Percona XtraDB Cluster) must take into account a central concept: clusters are data-centric. What matters is the data and the data state. Both must be the same on each node at any given time (commit/apply). To guarantee this, Percona XtraDB Cluster (and other solutions) use a set of data validation and Flow Control processes that work to ensure a consistent cluster data set on each node.

The upshot of this principle is that an application can query ANY node in a Percona XtraDB Cluster and get the same data, or write to ANY node and know that the data is visible everywhere in the cluster at (virtually) the same time.

Last but not least, inconsistent nodes should be excluded and either rebuilt or fixed before rejoining the cluster.

If you think about it, this is very useful. Guaranteeing consistency across nodes allows you to transparently split write/read operations, failover from one node to another with very few issues, and more.

When I conceived of this blog on Group Replication (or InnoDB Cluster), I put myself in the customer’s shoes. I asked myself: “Aside from all the other things we know (see above), what is the real impact of moving from Percona XtraDB Cluster to Group Replication/InnoDB Cluster for my application? Since Group Replication still (basically) uses replication with binlogs and relay logs, is there also a Flow Control mechanism?” An alarm bell started to ring in my mind.

My answer is: “Let’s do a proof of concept (PoC), and see what is really going on.”

The POC

I set up a simple set of servers using Group Replication with a very basic application performing writes on a single writer node, and (eventually) reads on the other nodes. 

You can find the schema definition here. Mainly I used the four tables from my windmills test suite — nothing special or specifically designed for Group Replication. I’ve used this test a lot for Percona XtraDB Cluster in the past, so it was a perfect fit.

Test Definition

The application will do very simple work, and I wanted to test four main cases:

  1. One thread performing one insert at each transaction
  2. One thread performing 50 batched inserts at each transaction
  3. Eight threads performing one insert at each transaction
  4. Eight threads performing 50 batched inserts at each transaction

As you can see, a pretty simple set of operations. Then I decided to test it using the following four conditions on the servers:

  1. One slave worker FC as default
  2. One slave worker FC set to 25
  3. Eight slave workers FC as default
  4. Eight slave workers FC set to 25
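For reference, the following is a minimal sketch of how these four server conditions could be configured on the group members. It assumes the MySQL 5.7 Group Replication flow control thresholds (group_replication_flow_control_applier_threshold and group_replication_flow_control_certifier_threshold, both defaulting to 25000) and the standard multi-threaded applier variables; in practice these usually go in my.cnf, and changing the applier settings requires restarting the applier:

-- Flow control threshold set to 25 (conditions 2 and 4); the default is 25000
SET GLOBAL group_replication_flow_control_applier_threshold = 25;
SET GLOBAL group_replication_flow_control_certifier_threshold = 25;

-- Eight applier workers (conditions 3 and 4); the default is a single applier thread
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_preserve_commit_order = ON;
SET GLOBAL slave_parallel_workers = 8;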

Again nothing weird or strange from my point of view. I used four nodes:

  1. Gr1 Writer
  2. Gr2 Reader
  3. Gr3 Reader, with minimal latency (~10ms)
  4. Gr4 Reader, with minimal latency (~10ms)

Finally, I had to be sure I measured the lag in a way that allowed me to reference it consistently on all nodes. 

I think we can safely say that the incoming GTID set (Received_transaction_set from performance_schema.replication_connection_status) is definitely the last change applied on the master that the slave node knows about. More recent changes could have occurred, but network delay can prevent them from being “received.” The other point of reference is GTID_EXECUTED, which refers to the latest GTID processed on the node itself.

The closest query that can track the distance will be:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-2),':',1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-2),':',1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

Or in the case of a single worker:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

The result will be something like this:

+---------------+
| last_executed |
+---------------+
| 23607         |
+---------------+
+---------------+
| last_received |
+---------------+
| 23607         |
+---------------+
+----------+
| real_lag |
+----------+
|        0 |
+----------+

The whole set of tests can be found here, with all the commands you need to run the application (you can find it here) and replicate the tests. I will focus on the results (otherwise this blog post would be far too long), but I invite you to see the details.

The Results

Efficiency on Writer by Execution Time and Rows/Sec

Using the raw data from the tests (Excel spreadsheet available here), I was interested in identifying if and how the Writer is affected by the use of Group Replication and flow control.

Reviewing the graph, we can see that the Writer has a linear increase in the execution time (when using default flow control) that matches the increase in the load. Nothing there is concerning, and all-in-all we see what is expected if the load is light. The volume of rows at the end justifies the execution time.

It’s a different scenario if we use flow control. The execution time increases significantly in both cases (single worker/multiple workers). In the worst case (eight threads, 50 inserts batch) it becomes four times higher than the same load without flow control.

What happens to the inserted rows? In the application, I traced the rows inserted/sec. It is easy to see what is going on there:

We can see that the Writer with flow control activated inserts less than a third of the rows it processes without flow control. 

We can definitely say that flow control has a significant impact on the Writer performance. To clarify, let’s look at this graph:

Without flow control, the Writer processes a high volume of rows in a limited amount of time (results from the test of eight workers, eight threads, 50 insert batch). With flow control, the situation changes drastically. The Writer takes a long time processing a significantly smaller number of rows/sec. In short, performance drops significantly.

But hey, I’m OK with that if it means having a consistent data-set across all nodes. In the end, Percona XtraDB Cluster and similar solutions pay a significant performance price to match the data-centric principle. 

Let’s see what happens on the other nodes.

Entries Lag

Well, this scenario is not so good:

When NOT using flow control, the nodes lag behind the writer significantly. Remember that by default, flow control in Group Replication is set to 25000 entries (I mean 25K entries!!!).

What happens is that as soon as I put some salt (read: load) on the Writer, the slave nodes start to lag. When using the default single worker, the lag becomes significant. When using multiple workers, we see that the lag happens mainly on the node(s) with minimal (10ms) network latency. The sad thing is that the lag does not really go down compared to the single worker, indicating that a minimal latency of 10ms is enough to affect replication.

Time to activate the flow control and have no lag:

Unfortunately, this is not the case. As we can see, the lag of the single worker remains high for Gr2 (154 entries). When using multiple workers, the Gr3/Gr4 nodes perform much better, with significantly less lag (but still high at ~1K entries).

It is important to remember that at this time the Writer is processing one-third or less of the rows it is normally able to. It is also important to note that I set the flow control entry limit to 25, and the Gr3 (and Gr4) nodes are still lagging more than 1K entries behind.

To clarify, let’s check the two graphs below:

Using the Writer (Master) as a baseline in entry #N, without flow control, the nodes (slaves) using Group Replication start to significantly lag behind the writer (even with a light load).

The distance in this PoC ranged from very minimal (with 58 entries), up to much higher loads (3849 entries):

Using flow control, the Writer (Master) diverges less, as expected. But even with a significant drop in performance (to one-third or less of its normal throughput), the nodes still lag. The worst case is up to 1363 entries. 

I need to underline here that we have no further way (that I am aware of, anyway) to tune the lag and prevent it from happening.

This means an application cannot transparently split writes/reads and expect consistency. The gap is too high.

A Graph That Tells Us a Story

I used Percona Monitoring and Management (PMM) to keep an eye on the nodes while doing the tests. One of the graphs really showed me that Group Replication still has some “limits” as the replication mechanism for a cluster:

This graph shows the MySQL queries executed on all four nodes, in the test using eight threads with 50-insert batches and flow control. 

As you can see, the Gr1 (Writer) is the first one to take off, followed by Gr2. Nodes Gr3 and Gr4 require a bit more time, given the binlog transmission (and 10ms delay). Once the data is there, they match (inconsistently) the Gr2 node. This is an effect of flow control asking the Master to slow down. But as previously seen, the nodes will never match the Writer. When the load test is over, the nodes continue to process the queue for an additional ~130 seconds. Considering that the whole load takes 420 seconds on the Writer, this means that one-third of the total time on the Writer is spent syncing the slaves AFTERWARDS.

The above graph shows the same test without flow control. It is interesting to see how the Writer moved above 300 queries/sec, while Gr2 stayed around 200 and Gr3/4 far below. The Writer was able to process the whole load in ~120 seconds instead of 420, while Gr3/4 continued to process the load for an additional ~360 seconds.

This means that without flow control set, the nodes lag around 360 seconds behind the Master. With flow control set to 25, they lag 130 seconds.

This is a significant gap.

Conclusions

Going back to the reason I started this PoC: it looks like my application(s) are not a good fit for Group Replication, given that I use Percona XtraDB Cluster to scale out reads and to efficiently move my writer to another node when I need to. 

Group Replication is still based on asynchronous replication (as my colleague Kenny said). It makes sense in many other cases, but it doesn’t compare to solutions based on virtually synchronous replication. It still requires a lot of refinement.

On the other hand, for applications that can afford to have a significant gap between writers and readers it is probably fine. But … doesn’t standard replication already cover that? 

Reviewing the Oracle documentation (https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html), I can see why Group Replication, as part of InnoDB Cluster, could help improve high availability when compared to standard replication. 

But I also think it is important to understand that Group Replication (and derived solutions like InnoDB Cluster) is not comparable to, or a replacement for, data-centric solutions such as Percona XtraDB Cluster. At least not up to now.

Good MySQL to everyone.

Jul
21
2017
--

Faster Node Rejoins with Improved IST performance

IST Performance

In this blog, we’ll look at how improvements to Percona XtraDB Cluster improved IST performance.

Introduction

Starting with version 5.7.17-29.20, Percona XtraDB Cluster significantly improved performance. Depending on the workload, the increase in throughput is in the range of 3-10x. (More details here.) These optimization fixes also helped improve IST (Incremental State Transfer) performance. This blog is aimed at studying the IST impact.

IST

IST stands for Incremental State Transfer. When a node of the cluster leaves the cluster for a short period of time and then rejoins, it needs to catch up with the cluster state. As part of this sync process, an existing node of the cluster (aka DONOR) donates the missing write-sets to the rejoining node (aka JOINER). In short, the flow involves applying the missing write-sets on the JOINER, just as during active workload replication.

Percona XtraDB Cluster / Galera already can apply write-sets in parallel using multiple applier threads. Unfortunately, due to commit contention, the commit action was serialized. This was fixed in the above Percona XtraDB Cluster release, allowing commits to proceed in parallel.

IST uses the same path for applying write-sets, except that it is more like a batch operation.

IST Performance

Let’s look at IST performance before and now.

Setup

  1. Two-node cluster (node-1 and node-2), with gcache configured large enough to avoid purging, as we need IST
  2. Start workload against node-1 for 30 seconds
  3. Shutdown node-2
  4. Start workload that performs 4M requests against node-1. Workload produces ~3.5M write-sets that are cached in gcache and used later for IST
  5. Start node-2 with N-applier threads
  6. Wait until IST is done
  7. Repeat steps 3-6 with different values of N (a sketch of steps 5 and 6 follows this list).
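As a rough sketch of steps 5 and 6 (assuming wsrep_slave_threads is the applier-thread setting being varied as N, and that wsrep_local_state_comment reports 'Synced' once IST completes), the relevant statements on node-2 would be:

-- Step 5: run node-2 with N applier threads (here N = 8);
-- in practice this is usually set in my.cnf before starting the node
SET GLOBAL wsrep_slave_threads = 8;

-- Step 6: poll the node state; IST is finished once the node reports 'Synced'
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';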

Observations:

  • IST is 4x faster with PXC-5.7.17 (compared to previous releases)
  • Improved performance means a quicker node rejoin, and an overall increase in cluster productivity, as the joiner node is available to process the workload sooner

Conclusion

Percona XtraDB Cluster 5.7.17 significantly improved IST performance. A faster re-join of the node effectively means better cluster productivity and flexibility in planning maintenance windows. So what are you waiting for? Upgrade to Percona XtraDB Cluster 5.7.17 or the latest Percona XtraDB Cluster 5.7 release and experience the power!

Jul
05
2017
--

Differences in PREPARE Statement Error Handling with Binary and Text Protocol (Percona XtraDB Cluster / Galera)

PREPARE Statement

In this blog, we’ll look at the differences in how a PREPARE statement handles errors in binary and text protocols.

Introduction

Since Percona XtraDB Cluster is a multi-master solution, when an application executes conflicting workloads, one of the workloads gets rolled back with a DEADLOCK error. While the same holds true even if you fire the workload through a PREPARE statement, there are differences between using the MySQL connector API (with binary protocol) and the MySQL client (with text protocol). Let’s look at these differences with the help of an example.

Base Workload

  • Say we have a two-node cluster (n1 and n2) with the following base schema and tables:
    use test;
    create table t (i int, k int, primary key pk(i)) engine=innodb;
    insert into t values (1, 10), (2, 20), (3, 30);
    select * from t;
  • Workload (w1) on node-1 (n1):
    prepare st1 from 'update t set k = k + 100 where i > 1';
    execute st1;
  • Workload (w2) on node-2 (n2):
    prepare st1 from 'update t set k = k + 100 where i > 1';
    execute st1;
  • The workloads are conflicting, and the first node to commit wins. Let’s assume n1 commits first, and the write-set is replicated to n2 while n2 is busy executing the local workload.

Different Scenarios Based on Configuration

wsrep_retry_autocommit > 0
  • n2 tries to apply the replicated workload, thereby forcefully aborting (brute-force abort) the local workload (w2).
  • The forcefully aborted workload is retried (based on wsrep_retry_autocommit value (default is 1)).
  • w2 succeeds on retry, and the new state of table t would be:
    (1, 10), (2, 220), (3, 230);
  • The user can increase wsrep_retry_autocommit to a higher value, though we don’t recommend this, as a higher value increases the pressure on the system to retry failed workloads. A retry doesn’t ensure things always pass if additional conflicting replicated workloads are lined up in the certification queue. Also, it is important to note that a retry works only if AUTO_COMMIT=ON. For a begin-commit transaction, retry is not applicable: if we abort this transaction, then the complete transaction is rolled back. The workload executor application should have the logic to expect workload failure due to conflict and handle it accordingly (see the example after this list).
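As a small illustration of the setting discussed above (the values here are arbitrary, and the variable can also be set in my.cnf):

-- Check the current retry setting (the default is 1)
SHOW GLOBAL VARIABLES LIKE 'wsrep_retry_autocommit';

-- Allow up to three automatic retries of brute-force-aborted autocommit statements
SET GLOBAL wsrep_retry_autocommit = 3;

-- Disable retries entirely, so the DEADLOCK error is always surfaced to the client
SET GLOBAL wsrep_retry_autocommit = 0;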

wsrep_retry_autocommit = 0
  • Let’s say the user sets wsrep_retry_autocommit=0 on node-2 and executes the same workload.
  • This time it doesn’t retry the local workload (w2). Instead, it sends an error to the end-user:
    mysql> execute st1;
    ERROR 1213 (40001): WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction
    mysql> select * from t;
    +---+------+
    | i | k    |
    +---+------+
    | 1 |   10 |
    | 2 |  120 |
    | 3 |  130 |
    +---+------+
    3 rows in set (0.00 sec)
  • Only the replicated workload (w1) is applied, and local workload w2 fails with a DEADLOCK error.

Use of MySQL Connector API

Now let’s try to execute these workloads through the Connector API. The Connector API uses a binary protocol, which means it directly invokes the PREPARE statement and executes the statement using dedicated command codes (COM_STMT_PREPARE and COM_STMT_EXECUTE).

While the conflicting transaction’s internal abort remains the same, the COM_QUERY command code (text protocol) handling corrects the “query interrupted” error and resets it to a “deadlock” error based on the wsrep_conflict_state value. It also has logic to retry the auto-commit enabled statement.

This error handling and retry logic is not present when the COM_STMT_PREPARE and COM_STMT_EXECUTE command codes are used. This means the user sees a different (non-masked) error when we try the same workload through an application using the MySQL Connector API:

Output of application from node-2:
Statement init OK!
Statement prepare OK!
Statement execution failed: Query execution was interrupted
exact error-code to be specific: ER_QUERY_INTERRUPTED - 1317 - "Query execution was interrupted"

As you can see above, the statement/workload execution fails with “Query execution was interrupted” error vs. “deadlock” error (as seen with the normal MySQL client).

Conclusion

While the core execution remains the same, error handling is different with the text and binary protocols. This issue/limitation was always present, but it’s more prominent since sysbench-1.0 started using the PREPARE statement by default.

What does this mean for the end user/application?

  • Nothing has changed from the Percona XtraDB Cluster perspective, except that if your application is planning to use a binary protocol (like the Connector API), then the application should be able to handle both errors and retry the failed transaction accordingly.

Jun
23
2017
--

Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication High Availability Webinar: Q & A

High Availability Webinar

Thank you for attending the Wednesday, June 21, 2017 high availability webinar titled Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication. In this blog, I will provide answers to the Q & A for that webinar.

You can find the slides and a recording of the webinar here.

Is there a minimum MySQL server version for Group Replication?

MySQL Group Replication has been GA since MySQL Community 5.7.17. This is the lowest version you should use for the Group Replication feature; otherwise, you are using a beta version.

Since 5.7.17 was the GA release, it’s strongly recommended you use the latest 5.7 minor release. Bugs get fixed and features added in each of the minor releases (as can be seen in the Limitations section in the slide deck).

In MySQL 5.6 and earlier versions, Group Replication is not supported. Note that Percona Server for MySQL 5.7.17 and beyond also ships with Group Replication.

Can I use Percona XtraDB Cluster with MariaDB v10.2? or must I use Percona Server for MySQL?

Percona XtraDB Cluster is Percona Server for MySQL and Percona XtraBackup with the modified Galera library. You cannot run Percona XtraDB Cluster on MariaDB.

However, as Percona XtraDB Cluster is open source, it is possible that MariaDB/Codership implements our modifications into their codebase.

If Percona XtraDB Cluster only allows InnoDB tables, how do we typically deal with applications that need to use MyISAM tables?

You cannot use MyISAM with Percona XtraDB Cluster, Galera or Group Replication. However, there is experimental MyISAM support in Galera/Percona XtraDB Cluster. But we strongly recommend that you don’t use this in production. It effectively executes all statements in Total Order Isolation, which results in bad performance.

What is a typical business use case for the Group Replication? I specifically like the writes order feature.

Typical use cases are:

  • Environments with strict durability requirements
  • Writing to multiple nodes simultaneously while keeping data consistent
  • Reducing failover time
  • Using other nodes for read scaling, where reading stale data is a problem for the application (as opposed to standard asynchronous replication)

The use cases for Galera and Percona XtraDB Cluster are similar.

Where do you run ProxySQL, on a separate server? We are using HAProxy.

You can deploy ProxySQL in many different ways. One common method of installation is to run ProxySQL on a separate layer of servers (ensuring there is failover on this layer). Another commonly used method is to run a ProxySQL daemon on every application server.

Do you support KVM?

Yes, there are no limitations on virtualization solutions.

Can you give some examples of an “arbitrator”?

Some useful links:

What does Percona XtraDB add to make it more performant than InnoDB?

The scalability and performance improvement of Percona XtraDB are listed on the Percona Server for MySQL documentation page: https://www.percona.com/doc/percona-server/LATEST/index.html

How scalable is Percona XtraDB Cluster storage wise? Do we have any limitations?

Storage happens through the storage engine (which is InnoDB). Percona XtraDB Cluster does not have any different limitations than Percona Server for MySQL or MySQL.

However, we need to also consider the practical side of things: the larger the cluster gets, the longer certain operations take. For example, when adding a new node to the cluster another node must be the donor and provide all the data. This will take substantially longer with larger datasets. Certain operational aspects might therefore become more complex.

Is there any development to add multiple nodes simultaneously?

No, at the moment only one node can join the cluster at the same time. Other nodes automatically wait until it is finished before joining.

Why does Galera say we cannot use READ COMMITTED isolation for multimaster mode, even though we can start the cluster with READ-COMMITTED?

You can use READ-COMMITTED as transaction isolation level. The limitation is that you cannot use SERIALIZABLE: http://galeracluster.com/documentation-webpages/isolationlevels.html.

Galera Cluster and MariaDB currently do not prevent a user from using this transaction isolation level. Percona XtraDB Cluster implemented the strict mode to prevent these operations: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html#explicit-table-locking
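For example (a small sketch; the SERIALIZABLE rejection assumes pxc_strict_mode is at its ENFORCING default on Percona XtraDB Cluster 5.7):

-- Supported: READ COMMITTED works in multi-master mode
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Not supported: SERIALIZABLE is rejected when pxc_strict_mode = ENFORCING
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;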

MariaDB 10.2 fixed the check constraints issue. When will Percona fix this issue?

There are currently no plans to support CHECK constraints in Percona Server for MySQL (and therefore Percona XtraDB Cluster as well).

As Percona Server is effectively a fully backwards-compatible (but modified) MySQL Community Server, CHECK constraints are a feature that would normally be implemented in MySQL Community first.

Can you share your performance benchmark git repository (if you have one)?

We don’t have a performance benchmark in a git repository. You can get detailed information about this benchmark in this blog: Performance improvements in Percona XtraDB Cluster 5.7.17-29.20.

On your slide pointing to scalability charts, how many nodes did you run your test against?

We used a three-node cluster for this performance benchmark.

The product is using master-master replication. As such, what do you mean when you talk about failover in such a configuration?
Where do you maintain the cluster state?

All technologies automatically maintain the cluster state as you add and remove nodes.

What are the network/IP requirements for ProxySQL?

There are no specific requirements. More documentation about ProxySQL can be found here: https://github.com/sysown/proxysql/wiki.

Jun
20
2017
--

The MySQL High Availability Landscape in 2017 (The Elders)

High Availability

In this blog, we’ll look at different MySQL high availability options.

The dynamic MySQL ecosystem is rapidly evolving, with many technologies built around MySQL. This is especially true for the technologies involved with the high availability (HA) aspects of MySQL. When I joined Percona back in 2009, some of these HA technologies were very popular – but have since been almost forgotten. During the same interval, new technologies have emerged. In order to give some perspective to the reader, and hopefully help to make better choices, I’ll review the MySQL HA landscape as it is in 2017. This review will be in three parts. The first part (this post) will cover the technologies that have been around for a long time: the elders. The second part will focus on the technologies that are very popular today: the adults. Finally, the last part will try to extrapolate which technologies could become popular in the upcoming years: the babies.

Quick disclaimer: I am reporting on the technologies I see the most. There are likely many other solutions not covered here, but I can’t talk about technologies I have barely or never used. Apart from the RDS-related technologies, all the technologies covered are open-source. The target audience for this post is people relatively new to MySQL.

The Elders

Let’s define the technologies in the elders group. These are technologies that anyone involved with MySQL for the last ten years is sure to be aware of. I could have called this group the “classics”. I include the following technologies in this group:

  • Replication
  • Shared storage
  • NDB cluster

Let’s review these technologies in the following sections.

Replication

Simple replication topology

 

MySQL replication is very well known. It is one of the main features behind the wide adoption of MySQL. Replication gets used almost everywhere. The reasons for that are numerous:

  • Replication is simple to set up. There are tons of how-to guides and scripts available to add a slave to a MySQL server. With Amazon RDS, adding a slave is just a few clicks.
  • Slaves allow you to easily scale reads. The slaves are accessible and can be used for reads. This is the most common way of scaling up a MySQL database.
  • Slaves have little impact on the master. Apart from the added network traffic, the presence of slaves does not impact the master performance significantly.
  • It is well known. No surprises here.
  • Used for failover. Your master died, promote a slave and use it as your new master.
  • Used for backups. You don’t want to overload your master with the backups, run them off a slave.

Of course, replication also has some issues:

  • Replication can lag. Replication used to be single-threaded. That means a master with a concurrent load could easily outpace a slave. MySQL 5.6 and MariaDB 10.0 have introduced some parallelism to the slave. Newer versions have further improved to a point where today’s slaves are many times faster than they were.
  • Slaves can diverge. When you modify data on the master, the slave must perform the exact same update. That seems easy, but there are many ways an update can be non-deterministic with statement-based replication. Many of these issues have been fixed, and the introduction of row-based replication has been another big step forward. Still, if you write directly to a slave you are asking for trouble. There is a read_only setting, but if the MySQL user has the “SUPER” privilege it is just ignored. That’s why there is now the “super_read_only” setting (see the example after this list). Tools like pt-table-checksum and pt-table-sync from the Percona Toolkit exist to solve this problem.
  • Replication can impact the master. I wrote above that the presence of slaves does not affect the master, but logging changes is more problematic. The most common issue is the InnoDB table-level locking for auto_increment values with statement-based replication. Only one thread can insert new rows at a time. You can avoid this issue with row-based replication and proper configuration settings.
  • Data gets lost. Replication is asynchronous. That means the master will reply “done” after a commit statement even though the slaves have not received updates yet. Some transactions can get lost if the master crashes.
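As a minimal sketch of the super_read_only setting mentioned above (available in MySQL 5.7; it appeared earlier in Percona Server):

-- Block writes on a slave even for users holding the SUPER privilege
SET GLOBAL super_read_only = ON;

-- read_only is implied; verify both settings
SHOW GLOBAL VARIABLES LIKE '%read_only';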

Although an old technology, a lot of work has been done on replication. It is miles away from the replication implementation of 5.0.x. Here’s a list, likely incomplete, of the evolution of replication:

  • Row based replication (since 5.1). The binary internal representation of the rows is sent instead of the SQL statements. This makes replication more robust against slave divergence.
  • Global transaction ID (since 5.6). Transactions are uniquely identified. Replication can be set up without knowing the binlog file and offset (see the example after this list).
  • Checksum (since 5.6). Binlog events have checksum values to validate their integrity.
  • Semi-sync replication (since 5.5). An addition to the replication protocol to make the master aware of the reception of events by the slaves. This helps to avoid losing data when a master crashes.
  • Multi-source replication (since 5.7). Allows a slave to have more than one master.
  • Multi-threaded replication (since 5.6). Allows a slave to use multiple threads. This helps to limit the slave lag.
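For instance, with GTIDs a slave can be attached to a master without hunting for a binlog file and position. A minimal sketch (host and credentials are placeholders), assuming gtid_mode=ON and enforce_gtid_consistency=ON on both servers:

-- On the slave: use GTID auto-positioning instead of a file/offset pair
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_AUTO_POSITION = 1;
START SLAVE;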

Managing replication is a tedious job. The community has written many tools to manage replication:

  • MMM. An old Perl tool that used to be quite popular, but had many issues. Now rarely used.
  • MHA. The most popular tool to manage replication. It excels at reconfiguring replication without losing data, and does a decent job at handling failover. It is also simple. No wonder it is popular.
  • PRM. A Pacemaker-based solution developed to replace MMM. It’s quite good at failover, but not as good as MHA at reconfiguring replication. It’s also quite complex, thanks to Pacemaker. Not used much.
  • Orchestrator. The new cool tool. It can manage complex topologies and has a nice web-based interface to monitor and control the topology.

 

Shared Storage

Simple shared storage topology

 

Back when I was working for MySQL ten years ago, shared storage HA setups were very common. A shared storage HA cluster uses one copy of the database files shared between two servers. One server is active, the other one is passive. In order to be shared, the database files reside on a device that can be mounted by both servers. The device can be physical (like a SAN), or logical (like a Linux DRBD device). On top of that, you need a cluster manager (like Pacemaker) to handle the resources and failovers. This solution is very popular because it allows for failover without losing any transactions.

The main drawback of this setup is the need for an idle standby server. The standby server cannot have any other assigned duties since it must always be ready to take over the MySQL server. A shared storage solution is also obviously not resilient to file-level corruption (but that situation is exceptional). Finally, it doesn’t play well with a cloud-based environment.

Today, newly-deployed shared storage HA setups are rare. The only ones I encountered over the last year were either old implementations needing support, or new setups that were deployed because of existing corporate technology stacks. That should tell you something about the technology’s loss of popularity.

NDB Cluster

A simple NDB Cluster topology

 

An NDB Cluster is a distributed clustering solution that has been around for a long time. I personally started working with this technology back in 2008. An NDB Cluster has three types of nodes: SQL, management and data. A full HA cluster requires a minimum of four nodes.

An NDB Cluster is not a general purpose database due to its distributed nature. For suitable workloads, it is extraordinarily good. For unsuitable workloads, it is miserable. A suitable workload for an NDB Cluster involves high concurrency, with a high rate of small, primary-key-oriented transactions. Reaching one million trx/s on an NDB Cluster is nothing exceptional.

At the other end of the spectrum, a poor workload for an NDB Cluster is a single-threaded report query on a star-like schema. I have seen some extreme cases where just the network time of a reporting query amounted to more than 20 minutes.

Although NDB Clusters have improved, and are still improving, their usage has been pushed toward niche-type applications. Overall, the technology is losing ground and is now mostly used for Telcos and online gaming applications.

Jun
19
2017
--

Upcoming HA Webinar Wed 6/21: Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication

High Availability

Join Percona’s MySQL Practice Manager Kenny Gryp and QA Engineer Ramesh Sivaraman as they present a high availability webinar covering Percona XtraDB Cluster, Galera Cluster and MySQL Group Replication on Wednesday, June 21, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

What are the implementation differences between Percona XtraDB Cluster 5.7, Galera Cluster 5.7 and MySQL Group Replication?

  • How do they work?
  • How do they behave differently?
  • Do these methods have any major issues?

This webinar will describe the differences and shed some light on how QA is done for each of the different technologies.

Register for the webinar here.

Ramesh Sivaraman, QA Engineer

Ramesh joined the Percona QA Team in March 2014. He has almost six years of experience in database administration and, before joining Percona, provided MySQL database support to various service- and product-based internet companies. Ramesh’s professional interests include writing shell/Perl scripts to automate routine tasks, and new technology. Ramesh lives in Kerala, the southern part of India, close to his family.

Kenny Gryp, MySQL Practice Manager

Kenny is currently MySQL Practice Manager at Percona.

Jun
02
2017
--

Percona XtraDB Cluster 5.7.18-29.20 is now available

Percona XtraDB Cluster 5.7

Percona announces the release of Percona XtraDB Cluster 5.7.18-29.20 on June 2, 2017. Binaries are available from the downloads section or our software repositories.

NOTE: You can also run Docker containers from the images in the Docker Hub repository.

NOTE: Due to a new package dependency, Ubuntu/Debian users should use apt-get dist-upgrade or apt-get install percona-xtradb-cluster-57 to upgrade.

Percona XtraDB Cluster 5.7.18-29.20 is now the current release, based on the following:

All Percona software is open-source and free.

Fixed Bugs

  • PXC-749: Fixed memory leak when running INSERT on a table without primary key defined and wsrep_certify_nonPK disabled (set to 0).

    NOTE: We recommend you define primary keys on all tables for correct write-set replication.

  • PXC-812: Fixed SST script to leave the DONOR keyring when JOINER clears the datadir.

  • PXC-813: Fixed SST script to use UTC time format.

  • PXC-816: Fixed hook for caching GTID events in asynchronous replication. For more information, see #1681831.

  • PXC-820: Enabled querying of pxc_maint_mode by another client during the transition period.

  • PXC-823: Fixed SST flow to gracefully shut down JOINER node if SST fails because DONOR leaves the cluster due to network failure. This ensures that the DONOR is then able to recover to synced state when network connectivity is restored. For more information, see #1684810.

  • PXC-824: Fixed graceful shutdown of Percona XtraDB Cluster node to wait until applier thread finishes.

Other Improvements

  • PXC-819: Added five new status variables to expose required values from wsrep_ist_receive_status and wsrep_flow_control_interval as numbers, rather than strings that need to be parsed (see the query example after this list):

    • wsrep_flow_control_interval_low
    • wsrep_flow_control_interval_high
    • wsrep_ist_receive_seqno_start
    • wsrep_ist_receive_seqno_current
    • wsrep_ist_receive_seqno_end
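For example, once running 5.7.18-29.20 the new values can be queried directly as numeric status variables (a small sketch):

SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_interval_%';
SHOW GLOBAL STATUS LIKE 'wsrep_ist_receive_seqno_%';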

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Jun
02
2017
--

Percona XtraDB Cluster 5.6.36-26.20 is Now Available

Percona XtraDB Cluster 5.6

Percona announces the release of Percona XtraDB Cluster 5.6.36-26.20 on June 2, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.36-26.20 is now the current release, based on the following:

All Percona software is open-source and free.

NOTE: Due to end of life, Percona will stop producing packages for the following distributions after July 31, 2017:

  • Red Hat Enterprise Linux 5 (Tikanga)
  • Ubuntu 12.04 LTS (Precise Pangolin)

You are strongly advised to upgrade to latest stable versions if you want to continue using Percona software.

Fixed Bugs

  • PXC-749: Fixed memory leak when running INSERT on a table without primary key defined and wsrep_certify_nonPK disabled (set to 0).

    NOTE: We recommend you define primary keys on all tables for correct write-set replication.

  • PXC-813: Fixed SST script to use UTC time format.

  • PXC-823: Fixed SST flow to gracefully shut down JOINER node if SST fails because DONOR leaves the cluster due to network failure. This ensures that the DONOR is then able to recover to synced state when network connectivity is restored. For more information, see #1684810.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

May
31
2017
--

ProxySQL-Assisted Percona XtraDB Cluster Maintenance Mode

Percona XtraDB Cluster Maintenance Mode

Percona XtraDB Cluster Maintenance ModeIn this blog post, we’ll look at how Percona XtraDB Cluster maintenance mode uses ProxySQL to take cluster nodes offline without impacting workloads.

Percona XtraDB Cluster Maintenance Mode

Since Percona XtraDB Cluster offers a high availability solution, it must consider a data flow where a cluster node gets taken down for maintenance (through isolation from a cluster or complete shutdown).

Percona XtraDB Cluster facilitated this by introducing a maintenance mode. Percona XtraDB Cluster maintenance mode reduces the number of abrupt workload failures if a node is taken down using ProxySQL (as a load balancer).

The central idea is delaying the core node action and allowing ProxySQL to divert the workload.

How ProxySQL Manages Percona XtraDB Cluster Maintenance Mode

With Percona XtraDB Cluster maintenance mode, ProxySQL marks the node as OFFLINE when a user triggers a shutdown signal (or wants to put a specific node into maintenance mode):

  • When a user triggers a shutdown, the Percona XtraDB Cluster node sets pxc_maint_mode to SHUTDOWN (from the DISABLED default) and sleeps for x seconds (dictated by pxc_maint_transition_period — 10 secs by default). ProxySQL auto-detects this change and marks the node as OFFLINE. With this change, ProxySQL avoids opening new connections for any DML transactions, but continues to service existing queries until pxc_maint_transition_period expires. Once the sleep period is complete, Percona XtraDB Cluster delivers a real shutdown signal — thereby giving ProxySQL enough time to transition the workload.

  • If the user needs to take a node into maintenance mode, the user can simply set pxc_maint_mode to MAINTENANCE. With that, pxc_maint_mode is updated and the client connection updating it sleeps for x seconds (as dictated by pxc_maint_transition_period) before giving control back to the user. ProxySQL auto-detects this change and marks the node as OFFLINE. With this change, ProxySQL avoids opening new connections for any DML transactions but continues to service existing queries (see the example after this list).

  • ProxySQL auto-detects this change in maintenance state and then automatically re-routes traffic, thereby reducing abrupt workload failures.

Technical Details:

  • The ProxySQL Galera checker script continuously monitors the state of individual nodes by checking the pxc_maint_mode variable status (in addition to the existing wsrep_local_state) using the ProxySQL scheduler feature.
  • The scheduler is a Cron-like implementation integrated inside ProxySQL, with millisecond granularity.
  • If proxysql_galera_checker detects pxc_maint_mode = SHUTDOWN | MAINTENANCE, then it marks the node as OFFLINE_SOFT. This avoids the creation of new connections (or workloads) on the node.

Sample proxysql_galera_checker log:

Thu Dec  8 11:21:11 GMT 2016 Enabling config
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25000 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25100 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25200 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Changing server 10:127.0.0.1:25200 to status OFFLINE_SOFT due to SHUTDOWN
Thu Dec  8 11:21:17 GMT 2016 Number of writers online: 2 : hostgroup: 10
Thu Dec  8 11:21:17 GMT 2016 Enabling config
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25000 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25100 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25200 , status OFFLINE_SOFT , wsrep_local_state 4

Ping us below with any questions or comments.

May
25
2017
--

What About ProxySQL and Mirroring?

ProxySQL and Mirroring

In this blog post, we’ll look at how ProxySQL and mirroring go together.

Overview

Let me be clear: I love ProxySQL, and I think it is a great component for expanding architecture flexibility and high availability. But not all that shines is gold! In this post, I want to correctly set some expectations, and avoid selling carbon for gold (carbon has its own uses, while gold has others).

First of all, we need to cover the basics of how ProxySQL manages traffic dispatch (I don’t want to call it mirroring, and I’ll explain further below).

ProxySQL receives a connection from the application, and through it we can have a simple SELECT or a more complex transaction. ProxySQL gets each query, passes it to the Query Processor, processes it, identifies whether the query is mirrored, duplicates the whole MySQL session ProxySQL internal object and associates it with a mirror queue (which refers to a mirror thread pool). If the pool is free (has an available active slot in the set of concurrently active threads), then the query is processed right away. If not, it will stay in the queue. If the queue is full, the query is lost.

Whatever is returned from the query goes to /dev/null, and as such no result set is passed back to the client.

The whole process is not free for a server. If you check the CPU utilization, you will see that the “mirroring” in ProxySQL actually doubles the CPU utilization. This means that the traffic on server A is impacted because of resource contention.

Summarizing, ProxySQL will:

  1. Send the queries for execution in a different order
  2. Completely ignore any transaction isolation
  3. Execute a different number of queries on B with respect to A
  4. Add significant load on the server resources

These points, coupled with the expectations I mention in the reasoning at the end of this article, make it quite clear to me that at the moment we cannot consider ProxySQL a valid mechanism to duplicate a consistent load from server A to server B.

Personally, I don’t think that the ProxySQL development team (Rene :D) should waste time on fixing this issue, as there are so many other things to cover and improve on in ProxySQL.

After working extensively with ProxySQL, and doing a deep QA on mirroring, I think that we should either keep it as a basic, blind traffic dispatcher or fully re-conceptualize it. But once we have clarified that, ProxySQL “traffic dispatch” (I still don’t want to call it mirroring) remains a very interesting feature that can have useful applications – especially since it is easy to set up.

The following test results should help set the correct expectations.

The tests were simple: load data in a Percona XtraDB Cluster and use ProxySQL to replicate the load on a MySQL master-slave environment.

  • Machines for MySQL/Percona XtraDB Cluster: VM with CentOS 7, 4 CPU 3 GB RAM, attached storage
  • Machine for ProxySQL: VM CentOS 7, 8 CPU 8GB RAM

Why did I choose to give ProxySQL a higher volume of resources? I knew in advance I might need to play a bit with a couple of settings that required more memory and CPU cycles. I wanted to be sure I didn’t get any problems from ProxySQL in relation to CPU and memory.

The application that I was using to add load is a Java application I developed to perform my tests. The app is at https://github.com/Tusamarco/blogs/blob/master/stresstool_base_app.tar.gz, and the whole set I used to do the tests is here: https://github.com/Tusamarco/blogs/tree/master/proxymirror.

I used four different tables:

+------------------+
| Tables_in_mirror |
+------------------+
| mirtabAUTOINC    |
| mirtabMID        |
| mirtabMIDPart    |
| mirtabMIDUUID    |
+------------------+

OK, so let’s start. Note that the meaningful tests are the ones below. For the whole set, refer to the full test package. First, set up ProxySQL:

delete from mysql_servers where hostgroup_id in (500,501,700,701);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.5',500,3306,60000,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.5',501,3306,100,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.21',501,3306,20000,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.231',501,3306,20000,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.7',700,3306,1,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.7',701,3306,1,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.25',701,3306,1,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.43',701,3306,1,400);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
delete from mysql_users where username='load_RW';
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('load_RW','test',1,500,'test',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
delete from mysql_query_rules where rule_id=202;
insert into mysql_query_rules (rule_id,username,destination_hostgroup,mirror_hostgroup,active,retries,apply) values(202,'load_RW',500,700,1,3,1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

Test 1

The first test is mainly a simple functional test during which I insert records using one single thread in Percona XtraDB Cluster and MySQL. No surprise, here I have 3000 loops and at the end of the test I have 3000 records on both platforms.

To have a baseline we can see that the ProxySQL CPU utilization is quite low:

ProxySQL and Mirroring

At the same time, the number of “questions” against Percona XtraDB Cluster and MySQL is very similar:

Percona XtraDB Cluster

ProxySQL and Mirroring

MySQL

ProxySQL and Mirroring

The other two metrics we want to keep an eye on are Mirror_concurrency and Mirror_queue_length. These two refer respectively to mysql-mirror_max_concurrency and mysql-mirror_max_queue_length:

ProxySQL and Mirroring

These two new variables and metrics were introduced in ProxySQL 1.4.0, with the intent to control and manage the load ProxySQL generates internally related to the mirroring feature. In this case, you can see we have a max of three concurrent connections and zero queue entries (all good).
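Both limits and both metrics can be inspected from the ProxySQL admin interface; a small sketch (the exact stats variable names may vary slightly between ProxySQL versions, hence the LIKE patterns):

-- The configured mirroring limits
SELECT * FROM global_variables WHERE variable_name LIKE 'mysql-mirror%';

-- The current mirror concurrency and queue depth
SELECT * FROM stats.stats_mysql_global WHERE variable_name LIKE 'Mirror%';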

Now that we have a baseline, and we know that at a functional level “it works,” let’s see what happens when we increase the load.

Test 2

The scope of the test was identifying how ProxySQL behaves with a standard configuration and increasing load. It turns out that as soon as ProxySQL has a little bit more load, it starts to lose some queries along the way.

Executing 3000 loops for 40 threads only results in 120,000 rows inserted in all the four tables in Percona XtraDB Cluster. But the tables on the secondary (mirrored) platform only have a variable number of inserted rows, between 101,359 and 104,072. This demonstrates a consistent loss of data.

After reviewing and comparing the connections running in Percona XtraDB Cluster and the secondary, we can see that (as expected) Percona XtraDB Cluster’s number of connections is scaling and serving the number of incoming requests, while the connections on the secondary are limited by the default value of mysql-mirror_max_concurrency=16.

ProxySQL and Mirroring

It is also interesting to note that the ProxySQL transaction process queue maintains its connection to the secondary longer than the connection to Percona XtraDB Cluster.

ProxySQL and Mirroring

As we can see above, the queue shows an evident bell curve that reaches 6K entries (which is well below the mysql-mirror_max_queue_length limit of 32K). Yet queries were dropped by ProxySQL, which indicates the queue is not really enough to accommodate the pending work.

ProxySQL and Mirroring

CPU-wise, ProxySQL (as expected) takes a few more cycles, but nothing crazy. The overhead for the simple mirroring queue processing can be seen when the main load stops, around 12:47.

Another interesting graph to keep an eye on is the one describing the executed commands inside Percona XtraDB Cluster and the secondary:

Percona XtraDB Cluster

ProxySQL and Mirroring

Secondary

ProxySQL and Mirroring

As you can see, the traffic on the secondary was significantly less (669 on average, compared to Percona XtraDB Cluster’s 1.17K). Then it spikes when the main load on the Percona XtraDB Cluster node terminates. In short it is quite clear that ProxySQL is not able to scale following the traffic existing in Percona XtraDB Cluster, and actually loses a significant amount of data on the secondary.

Doubling the load in Test 3 shows the same behavior, with ProxySQL reaching its limit for traffic duplication.

But can this be optimized?

The answer is, of course, yes! This is what mysql-mirror_max_concurrency is for, so let’s see what happens if we increase the value from 16 to 100 (just to make it crazy high).
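In the ProxySQL admin interface that change looks roughly like this (100 being the deliberately high value used for the test):

UPDATE global_variables SET variable_value = '100'
 WHERE variable_name = 'mysql-mirror_max_concurrency';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;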

Test 4 (two app nodes writing)

The first thing that comes to attention is that both Percona XtraDB Cluster and secondary report the same number of rows in the tables (240,000). That is a good first win.

Second, note the number of running connections:

ProxySQL and Mirroring

The graphs are now much closer, and the queue drops to just a few entries.

Commands executed in Percona XtraDB Cluster:

And commands executed in the secondary:

Average execution reports the same value, and very similar trends.

Finally, what was the CPU cost and effect?

Percona XtraDB Cluster and secondary CPU utilization:

     

As expected, some difference in the CPU usage distribution exists. But the trend is consistent between the two nodes, and the operations.

The ProxySQL CPU utilization is definitely higher than before:

But it’s absolutely manageable, and still reflects the initial distribution.

What about CRUD? So far I’ve only tested the insert operation, but what happens if we run a full CRUD set of tests?

Test 7 (CRUD)

First of all, let’s review the executed commands in Percona XtraDB Cluster:

And the secondary:

While in appearance we have very similar workloads, selects aside, the behavior significantly diverges. This is because on the secondary the different operations are not encapsulated by the transaction. They are executed as they are received. We can see a significant difference in update and delete operations between the two.

Also, the threads in the execution show a different picture between the two platforms:

Percona XtraDB Cluster

Secondary

It appears quite clear that Percona XtraDB Cluster is constantly running more threads and more connections. Nevertheless, both platforms process a similar total number of questions:

Percona XtraDB Cluster

Secondary

Both have an average of around 1.17K questions/second.

This is also another indication of how much concurrent operations impact behavior, with no respect for isolation or execution order. Below we can clearly see the different behavior by reviewing the CPU utilization:

Percona XtraDB Cluster

Secondary

Conclusions

To close this article, I want to go back to the start. We cannot consider the mirror function in ProxySQL as a real mirroring, but more as traffic redirection (check here for more reasoning on mirroring from my side).

Using ProxySQL with this approach is still partially effective in testing the load and the effect it has on a secondary platform. As we know, data consistency is not guaranteed in this scenario, and Selects, Updates and Deletes are affected (given the different data-set and result-set they manage).

The server behavior changes between the original and the mirror, if not in quantity then in quality.

I am convinced that when we need a tool able to test our production load on a different or new platform, we would do better to look at something else: possibly Query Playback, recently reviewed and significantly patched by Dropbox (https://github.com/Percona-Lab/query-playback).

In the end, ProxySQL is already a cool tool. If it doesn’t cover mirroring well, I can live with that. I am interested in having it working as it should (and it does in many other functionalities).

Acknowledgments

As usual, to Rene, who worked on fixing and introducing new functionalities associated with mirroring, like queue and concurrency control.

To the Percona team who developed Percona Monitoring and Management (PMM): all the graphs here (except 3) come from PMM (some of them I customized).
