Sep
25
2017
--

Percona Live Europe: Tutorials Day

Percona Live Tutorials

Percona Live Europe TutorialsWelcome to the first day of the Percona Live Open Source Database Conference Europe 2017: Tutorials day! Technically the first day of the conference, this day focused on provided hands-on tutorials for people interested in learning directly how to use open source tools and technologies.

Today attendees went to training sessions taught by open source database experts and got first-hand experience configuring, working with, and experimenting with various open source technologies and software.

The first full day (which includes opening keynote speakers and breakout sessions) starts Tuesday 9/26 at 9:15 am.

Some of the tutorial topics covered today were:

Percona Live Europe TutorialsMonitoring MySQL Performance with Percona Monitoring and Management (PMM)

Michael Coburn, Percona

This was a hands-on tutorial covering how to set up monitoring for MySQL database servers using the Percona Monitoring and Management (PMM) platform. PMM is an open-source collection of tools for managing and monitoring MySQL and MongoDB performance. It provides thorough time-based analysis for database servers to ensure that they work as efficiently as possible.

We learned about:

  • The best practices on MySQL monitoring
  • Metrics and time series
  • Data collection, management and visualization tools
  • Monitoring deployment
  • How to use graphs to spot performance issues
  • Query analytics
  • Alerts
  • Trending and capacity planning
  • How to monitor HA

Percona Live Europe TutorialsHands-on ProxySQL

Rene Cannao, ProxySQL

ProxySQL is an open source proxy for MySQL that can provide HA and high performance with no changes in the application, using several built-in features and integration with clustering software. Those were only a few of the features we learned about in this hands-on tutorial.

Percona Live Europe TutorialsMongoDB: Sharded Cluster Tutorial

Jason Terpko, ObjectRocket
Antonios Giannopoulos, ObjectRocket

This tutorial guided us through the many considerations when deploying a sharded cluster. It covered the services that make up a sharded cluster, configuration recommendations for these services, shard key selection, use cases, and how data is managed within a sharded cluster. Maintaining a sharded cluster also has its challenges. We reviewed these challenges and how you can prevent them with proper design or ways to resolve them if they exist today.

Percona Live Europe TutorialsInnoDB Architecture and Performance Optimization

Peter Zaitsev, Percona

InnoDB is the most commonly used storage engine for MySQL and Percona Server for MySQL. It is the focus of most of the storage engine development by the MySQL and Percona Server for MySQL development teams.

In this tutorial, we looked at the InnoDB architecture, including new feature developments for InnoDB in MySQL 5.7 and Percona Server for MySQL 5.7. Peter explained how to use InnoDB in a database environment to get the best application performance and provide specific advice on server configuration, schema design, application architecture and hardware choices.

Peter updated this tutorial from previous versions to cover new MySQL 5.7 and Percona Server for MySQL 5.7 InnoDB features.

Join us tomorrow for the first full day of the Percona Live Open Source Database Conference Europe 2017!

Aug
01
2017
--

Group Replication: the Sweet and the Sour

Group Replication

In this blog, we’ll look at group replication and how it deals with flow control (FC) and replication lag. 

Overview

In the last few months, we had two main actors in the MySQL ecosystem: ProxySQL and Group-Replication (with the evolution to InnoDB Cluster). 

While I have extensively covered the first, my last serious work on Group Replication dates back to some lab version years past.

Given that Oracle decided to declare it GA, and Percona’s decision to provide some level of Group Replication support, I decided it was time for me to take a look at it again.

We’ve seen a lot of coverage already too many Group Replication topics. There are articles about Group Replication and performance, Group Replication and basic functionalities (or lack of it like automatic node provisioning), Group Replication and ProxySQL, and so on.

But one question kept coming up over and over in my mind. If Group Replication and InnoDB Cluster have to work as an alternative to other (virtually) synchronous replication mechanisms, what changes do our customers need to consider if they want to move from one to the other?

Solutions using Galera (like Percona XtraDB Cluster) must take into account a central concept: clusters are data-centric. What matters is the data and the data state. Both must be the same on each node at any given time (commit/apply). To guarantee this, Percona XtraDB Cluster (and other solutions) use a set of data validation and Flow Control processes that work to the ensure a consistent cluster data set on each node.

The upshot of this principle is that an application can query ANY node in a Percona XtraDB Cluster and get the same data, or write to ANY node and know that the data is visible everywhere in the cluster at (virtually) the same time.

Last but not least, inconsistent nodes should be excluded and either rebuild or fixed before rejoining the cluster.

If you think about it, this is very useful. Guaranteeing consistency across nodes allows you to transparently split write/read operations, failover from one node to another with very few issues, and more.

When I conceived of this blog on Group Replication (or InnoDB Cluster), I put myself in the customer shoes. I asked myself: “Aside from all the other things we know (see above), what is the real impact of moving from Percona XtraDB Cluster to Group Replication/InnoDB Cluster for my application? Since Group Replication still (basically) uses replication with binlogs and relaylog, is there also a Flow Control mechanism?” An alarm bell started to ring in my mind.

My answer is: “Let’s do a proof of concept (PoC), and see what is really going on.”

The POC

I setup a simple set of servers using Group Replication with a very basic application performing writes on a single writer node, and (eventually) reads on the other nodes. 

You can find the schema definition here. Mainly I used the four tables from my windmills test suite — nothing special or specifically designed for Group Replication. I’ve used this test a lot for Percona XtraDB Cluster in the past, so was a perfect fit.

Test Definition

The application will do very simple work, and I wanted to test four main cases:

  1. One thread performing one insert at each transaction
  2. One thread performing 50 batched inserts at each transaction
  3. Eight threads performing one insert to each transaction
  4. Eight threads performing 50 batched inserts at each transaction

As you can see, a pretty simple set of operations. Then I decided to test it using the following four conditions on the servers:

  1. One slave worker FC as default
  2. One slave worker FC set to 25
  3. Eight slave workers FC as default
  4. Eight slave workers FC set to 25

Again nothing weird or strange from my point of view. I used four nodes:

  1. Gr1 Writer
  2. Gr2 Reader
  3. Gr3 Reader minimal latency (~10ms)
  4. Gr4 Reader minimal latency (~10ms)

Finally, I had to be sure I measured the lag in a way that allowed me to reference it consistently on all nodes. 

I think we can safely say that the incoming GTID (last_ Received_transaction_set from replication_connection_status) is definitely the last change applied to the master that the slave node knows about. More recent changes could have occurred, but network delay can prevent them from being “received.” The other point of reference is GTID_EXECUTED, which refers to the latest GTID processed on the node itself.

The closest query that can track the distance will be:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX( @@global.GTID_EXECUTED,':',-2),':',1),'-',-1) last_executed;select  @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX( Received_transaction_set,':',-2),':',1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier'; select (@last_rec - @last_exec) as real_lag

Or in the case of a single worker:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX( @@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;select  @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier'; select (@last_rec - @last_exec) as real_lag;

The result will be something like this:

+---------------+
| last_executed |
+---------------+
| 23607         |
+---------------+
+---------------+
| last_received |
+---------------+
| 23607         |
+---------------+
+----------+
| real_lag |
+----------+
|        0 |
+----------+

The whole set of tests can be found here, with all the commands you need to run the application (you can find it here) and replicate the tests. I will focus on the results (otherwise this blog post would be far too long), but I invite you to see the details.

The Results

Efficiency on Writer by Execution Time and Rows/Sec

Using the raw data from the tests (Excel spreadsheet available here), I was interested in identifying if and how the Writer is affected by the use of Group Replication and flow control.

Reviewing the graph, we can see that the Writer has a linear increase in the execution time (when using default flow control) that matches the increase in the load. Nothing there is concerning, and all-in-all we see what is expected if the load is light. The volume of rows at the end justifies the execution time.

It’s a different scenario if we use flow control. The execution time increases significantly in both cases (single worker/multiple workers). In the worst case (eight threads, 50 inserts batch) it becomes four times higher than the same load without flow control.

What happens to the inserted rows? In the application, I traced the rows inserted/sec. It is easy to see what is going on there:

We can see that the Writer with flow control activated inserts less than a third of the rows it processes without flow control. 

We can definitely say that flow control has a significant impact on the Writer performance. To clarify, let’s look at this graph:

Without flow control, the Writer processes a high volume of rows in a limited amount of time (results from the test of eight workers, eight threads, 50 insert batch). With flow control, the situation changes drastically. The Writer takes a long time processing a significantly smaller number of rows/sec. In short, performance drops significantly.

But hey, I’m OK with that if it means having a consistent data-set cross all nodes. In the end, Percona XtraDB Cluster and similar solutions pay a significant performance price match the data-centric principle. 

Let’s see what happen on the other nodes.

Entries Lag

Well, this scenario is not so good:

When NOT using flow control, the nodes lag behind the writer significantly. Remember that by default flow control in Group Replication is set to 25000 entries (I mean 25K of entries!!!).

What happens is that as soon as I put some salt (see load) on the Writer, the slave nodes start to lag. When using the default single worker, that will have a significant impact. While using multiple workers, we see that the lag happens mainly on the node(s) with minimal (10ms) network latency. The sad thing is that is not really going down with respect to the single thread worker, indicating that the simple minimal latency of 10ms is enough to affect replication.

Time to activate the flow control and have no lag:

Unfortunately, this is not the case. As we can see, the lag of single worker remains high for Gr2 (154 entries). While using multiple workers, the Gr3/4 nodes can perform much better, with significantly less lag (but still high at ~1k entries).

It is important to remember that at this time the Writer is processing one-third or less of the rows it is normally able to. It is also important to note that I set 25 to the entry limit in flow control, and the Gr3 (and Gr4) nodes are still lagging more than 1K entries behind.

To clarify, let check the two graphs below:

Using the Writer (Master) as a baseline in entry #N, without flow control, the nodes (slaves) using Group Replication start to significantly lag behind the writer (even with a light load).

The distance in this PoC ranged from very minimal (with 58 entries), up to much higher loads (3849 entries):

Using flow control, the Writer (Master) diverges less, as expected. If it has a significant drop in performance (one-third or less), the nodes still lag. The worst-case is up to 1363 entries. 

I need to underline here that we have no further way (that I am aware of, anyway) to tune the lag and prevent it from happening.

This means an application cannot transparently split writes/reads and expect consistency. The gap is too high.

A Graph That Tells Us a Story

I used Percona Monitoring and Management (PMM) to keep an eye on the nodes while doing the tests. One of the graphs really showed me that Group Replication still has some “limits” as the replication mechanism for a cluster:

This graph shows the MySQL queries executed on all the four nodes, in the testing using 8-50 threads-batch and flow control. 

As you can see, the Gr1 (Writer) is the first one to take off, followed by Gr2. Nodes Gr3 and Gr4 require a bit more, given the binlog transmission (and 10ms delay). Once the data is there, they match (inconsistently) the Gr2 node. This is an effect of flow control asking the Master to slow down. But as previously seen, the nodes will never match the Writer. When the load test is over, the nodes continue to process the queue for additional ~130 seconds. Considering that the whole load takes 420 seconds on the Writer, this means that one-third of the total time on the Writer is spent syncing the slave AFTERWARDS.

The above graph shows the same test without flow control. It is interesting to see how the Writer moved above 300 queries/sec, while G2 stayed around 200 and Gr3/4 far below. The Writer was able to process the whole load in ~120 seconds instead 420, while Gr3/4 continue to process the load for an additional ~360 seconds.

This means that without flow control set, the nodes lag around 360 seconds behind the Master. With flow control set to 25, they lag 130 seconds.

This is a significant gap.

Conclusions

Going back to the reason why I was started this PoC, it looks like my application(s) are not a good fit for Group Replication given that I have set Percona XtraDB Cluster to scale out the reads and efficiently move my writer to another when I need to. 

Group Replication is still based on asynchronous replication (as my colleague Kenny said). It makes sense in many other cases, but it doesn’t compare to solutions based on virtually synchronous replication. It still requires a lot of refinement.

On the other hand, for applications that can afford to have a significant gap between writers and readers it is probably fine. But … doesn’t standard replication already cover that? 

Reviewing the Oracle documentations (https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html), I can see why Group Replication as part of the InnoDB cluster could help improve high availability when compared to standard replication. 

But I also think it is important to understand that Group Replication (and derived solutions like InnoDB cluster) are not comparable or a replacement for data-centric solutions as Percona XtraDB Cluster. At least up to now.

Good MySQL to everyone.

Jun
23
2017
--

Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication High Availability Webinar: Q & A

High Availability Webinar

High Availability WebinarThank you for attending the Wednesday, June 21, 2017 high availability webinar titled Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication. In this blog, I will provide answers to the Q & A for that webinar.

You can find the slides and a recording of the webinar here.

Is there a minimum MySQL server version for Group Replication?

MySQL Group Replication is GA since MySQL Community 5.7.17. This is the lowest version that you should use for the Group Replication feature. Otherwise, you are using a beta version.

Since 5.7.17 was the GA release, it’s strongly recommended you use the latest 5.7 minor release. Bugs get fixed and features added in each of the minor releases (as can be seen in the Limitations section in the slide deck).

In MySQL 5.6 and earlier versions, Group Replication is not supported. Note that Percona Server for MySQL 5.7.17 and beyond also ships with Group Replication.

Can I use Percona XtraDB Cluster with MariaDB v10.2? or must I use Percona Server for MySQL?

Percona XtraDB Cluster is Percona Server for MySQL and Percona XtraBackup with the modified Galera library. You cannot run Percona XtraDB Cluster on MariaDB.

However, as Percona XtraDB Cluster is open source, it is possible that MariaDB/Codership implements our modifications into their codebase.

If Percona XtraDB Cluster does not allow InnoDB tables, how do we typically deal with applications that need to use MyISAM tables?

You cannot use MyISAM with Percona XtraDB Cluster, Galera or Group Replication. However, there is experimental MyISAM support in Galera/Percona XtraDB Cluster. But we strongly recommend that you don’t use this in production. It effectively executes all statements in Total Order Isolation, which results in bad performance.

What is a typical business use case for the Group Replication? I specifically like the writes order feature.

Typical use cases are:

  • Environments with strict **durability** requirements
  • Write to multiple nodes simultaneously while keeping data **consistent**
  • Reducing failover time
  • Using other nodes for read-scaling, where reading stale data is more difficult for the application (as opposed to standard asynchronous replication)

The use cases for Galera and Percona XtraDB Cluster are similar.

Where do you run ProxySQL, on a separate server? We are using HAProxy.

You can deploy ProxySQL in many different ways. One common method of installation is to run ProxySQL on a separate layer of servers (ensuring there is failover on this layer). Another commonly used method is to run a ProxySQL daemon on every application server.

Do you support KVM?

Yes, there are no limitations on virtualization solutions.

Can you give some examples of an “arbitrator”?

Some useful links:

What does Percona XtraDB add to make it more performant than InnoDB?

The scalability and performance improvement of Percona XtraDB are listed on the Percona Server for MySQL documentation page: https://www.percona.com/doc/percona-server/LATEST/index.html

How scalable is Percona XtraDB Cluster storage wise? Do we have any limitations?

Storage happens through the storage engine (which is InnoDB). Percona XtraDB Cluster does not have any different limitations than Percona Server for MySQL or MySQL.

However, we need to also consider the practical side of things: the larger the cluster gets, the longer certain operations take. For example, when adding a new node to the cluster another node must be the donor and provide all the data. This will take substantially longer with larger datasets. Certain operational aspects might therefore become more complex.

Is there any development to add multiple nodes simultaneously?

No, at the moment only one node can join the cluster at the same time. Other nodes automatically wait until it is finished before joining.

Why does Galera say we cannot use READ COMMITTED isolation for multimaster mode, even though we can start the cluster with READ-COMMITTED?

You can use READ-COMMITTED as transaction isolation level. The limitation is that you cannot use SERIALIZABLE: http://galeracluster.com/documentation-webpages/isolationlevels.html.

Galera Cluster and MariaDB currently do not prevent a user from using this transaction isolation level. Percona XtraDB Cluster implemented the strict mode to prevent these operations: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html#explicit-table-locking

MariaDB 10.2 fixed the check constraints issue, When will Percona fix this issue?

There are currently no plans to support CHECK constraints in Percona Server for MySQL (and therefore Percona XtraDB Cluster as well).

As Percona Server is effectively a fully backwards-compatible (but modified) MySQL Community Server, CHECK constraints is a feature that normally would be implemented in MySQL Community first.

Can you share your performance benchmark git repository (if you have one)?

We don’t have a performance benchmark in git repository. You can get detailed information about this benchmark in this blog: Performance improvements in Percona XtraDB Cluster 5.7.17-29.20.

On your slide pointing to scalability charts, how many nodes did you run your test against?

We used a three-node cluster for this performance benchmark.

The product is using Master-Master replication. As such what do you mean when you talk about failover in such configuration?
Where do you maintain the cluster state?

All technologies automatically maintain the cluster state as you add and remove nodes.

What are the network/IP requirements for Proxy SQL?

There are no specific requirements. More documentation about ProxySQL can be found here: https://github.com/sysown/proxysql/wiki.

Jun
20
2017
--

The MySQL High Availability Landscape in 2017 (The Elders)

High Availability

In this blog, we’ll look at different MySQL high availability options.

The dynamic MySQL ecosystem is rapidly evolving many technologies built around MySQL. This is especially true for the technologies involved with the high availability (HA) aspects of MySQL. When I joined Percona back in 2009, some of these HA technologies were very popular – but have since been almost forgotten. During the same interval, new technologies have emerged. In order to give some perspective to the reader, and hopefully help to make better choices, I’ll review the MySQL HA landscape as it is in 2017. This review will be in three parts. The first part (this post) will cover the technologies that have been around for a long time: the elders. The second part will focus on the technologies that are very popular today: the adults. Finally, the last part will try to extrapolate which technologies could become popular in the upcoming years: the babies.

Quick disclaimer, I am reporting on the technologies I see the most. There are likely many other solutions not covered here, but I can’t talk about technologies I have barely or never used. Apart from the RDS-related technologies, all the technologies covered are open-source. The target audience for this post are people relatively new to MySQL.

The Elders

Let’s define the technologies in the elders group. These are technologies that anyone involved with MySQL for last ten years is sure to be aware of. I could have called this group the “classics”.  I include the following technologies in this group:

  • Replication
  • Shared storage
  • NDB cluster

Let’s review these technologies in the following sections.

Replication

Simple replication topology

 

MySQL replication is very well known. It is one of the main features behind the wide adoption of MySQL. Replication gets used almost everywhere. The reasons for that are numerous:

  • Replication is simple to setup. There are tons of how-to guides and scripts available to add a slave to a MySQL server. With Amazon RDS, adding a slave is just a few clicks.
  • Slaves allow you to easily scale reads. The slaves are accessible and can be used for reads. This is the most common way of scaling up a MySQL database.
  • Slaves have little impact on the master. Apart from the added network traffic, the presence of slaves does not impact the master performance significantly.
  • It is well known. No surprises here.
  • Used for failover. Your master died, promote a slave and use it as your new master.
  • Used for backups. You don’t want to overload your master with the backups, run them off a slave.

Of course, replication also has some issues:

  • Replication can lag. Replication used to be single-threaded. That means a master with a concurrent load could easily outpace a slave. MySQL 5.6 and MariaDB 10.0 have introduced some parallelism to the slave. Newer versions have further improved to a point where today’s slaves are many times faster than they were.
  • Slaves can diverge. When you modify data on the master, the slave must perform the exact same update. That seems easy, but there are many ways an update can be non-deterministic with statement-based replication. They fixed many issues, and the introduction of row-based replication has been another big step forward. Still, if you write directly to a slave you are asking for trouble. There is a read_only setting, but if the MySQL user has the “SUPER” privilege it is just ignored. That’s why there is now the “super_read_only” setting. Tools like pt-table-checksum and pt-table-sync from the Percona toolkit exist to solve this problem.
  • Replication can impact the master. I wrote above that the presence of slaves does not affect the master, but logging changes are more problematic. The most common issue is the InnoDB table-level locking for auto_increment values with statement-based replication. Only one thread can insert new rows at a time. You can avoid this issue with row-based replication and properly configuring settings.
  • Data gets lost. Replication is asynchronous. That means the master will reply “done” after a commit statement even though the slaves have not received updates yet. Some transactions can get lost if the master crashes.

Although an old technology, a lot of work has been done on replication. It is miles away from the replication implementation of 5.0.x. Here’s a list, likely incomplete, of the evolution of replication:

  • Row based replication (since 5.1). The binary internal representation of the rows is sent instead of the SQL statements. This makes replication more robust against slave divergence.
  • Global transaction ID (since 5.6). Transactions are uniquely identified. Replication can be setup without knowing the binlog file and offset.
  • Checksum (since 5.6). Binlog events have checksum values to validate their integrity.
  • Semi-sync replication (since 5.5). An addition to the replication protocol to make the master aware of the reception of events by the slaves. This helps to avoid losing data when a master crashes.
  • Multi-source replication (since 5.7). Allows a slave to have more than one master.
  • Multi-threaded replication (since 5.6). Allows a slave to use multiple threads. This helps to limit the slave lag.

Managing replication is a tedious job. The community has written many tools to manage replication:

  • MMM. An old Perl tool that used to be quite popular, but had many issues. Now rarely used.
  • MHA. The most popular tool to manage replication. It excels at reconfiguring replication without losing data, and does a decent at handling failover.  It is also simple. No wonder it is popular.
  • PRM. A Pacemaker-based solution developed to replace MMM. It’s quite good at failover, but not as good as MHA at reconfiguring replication. It’s also quite complex, thanks to Pacemaker. Not used much.
  • Orchestrator. The new cool tool. It can manage complex topologies and has a nice web-based interface to monitor and control the topology.

 

Shared Storage

Simple shared storage topology

 

Back when I was working for MySQL ten years ago, shared storage HA setups were very common. A shared storage HA cluster uses one copy of the database files between one of two servers. One server is active, the other one is passive. In order to be shared, the database files reside on a device that can be mounted by both servers. The device can be physical (like a SAN), or logical (like a Linux DRBD device). On top of that, you need a cluster manager (like Pacemaker) to handle the resources and failovers. This solution is very popular because it allows for failover without losing any transactions.

The main drawback of this setup is the need for an idle standby server. The standby server cannot have any other assigned duties since it must always be ready to take over the MySQL server. A shared storage solution is also obviously not resilient to file-level corruption (but that situation is exceptional). Finally, it doesn’t play well with a cloud-based environment.

Today, newly-deployed shared storage HA setups are rare. The only ones I encountered over the last year were either old implementations needing support, or new setups that deployed because of existing corporate technology stacks. That should tell you about the technology’s loss of popularity.

NDB Cluster

A simple NDB Cluster topology

 

An NDB Cluster is a distributed clustering solution that has been around for a long time. I personally started working with this technology back in 2008. An NDB Cluster has three types of nodes: SQL, management and data. A full HA cluster requires a minimum of four nodes.

An NDB Cluster is not a general purpose database due to its distributed nature. For suitable workloads, it is extraordinary good. For unsuitable workloads, it is miserable. A suitable workload for an NDB Cluster contains high concurrency, with a high rate of small primary key oriented transactions. Reaching one million trx/s on an NDB Cluster is nothing exceptional.

At the other end of the spectrum, a poor workload for an NDB Cluster is a single-threaded report query on a star-like schema. I have seen some extreme cases where just the network time of a reporting query amounted to more than 20 minutes.

Although NDB Clusters have improved, and are still improving, their usage has been pushed toward niche-type applications. Overall, the technology is losing ground and is now mostly used for Telcos and online gaming applications.

Jun
19
2017
--

Upcoming HA Webinar Wed 6/21: Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication

High Availability

High AvailabilityJoin Percona’s MySQL Practice Manager Kenny Gryp and QA Engineer, Ramesh Sivaraman as they present a high availability webinar around Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication on Wednesday, June 21, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

What are the implementation differences between Percona XtraDB Cluster 5.7, Galera Cluster 5.7 and MySQL Group Replication?

  • How do they work?
  • How do they behave differently?
  • Do these methods have any major issues?

This webinar will describe the differences and shed some light on how QA is done for each of the different technologies.

Register for the webinar here.

High AvailabilityRamesh Sivaraman, QA Engineer

Ramesh joined the Percona QA Team in March 2014. He has almost six years of experience in database administration and, before joining Percona, was giving MySQL database support to various service and product based internet companies. Ramesh’s professional interests include writing shell/Perl script to automate routine tasks and new technology. Ramesh lives in Kerala, the southern part of India, close to his family.

High AvailabilityKenny Gryp, MySQL Practice Manager

Kenny is currently MySQL Practice Manager at Percona.

May
16
2017
--

Percona Live Open Source Database Conference 2017 Slides and Videos Available

Percona Live

Percona LiveThe slides and videos from the Percona Live Open Source Database Conference 2017 are available for viewing and download. The videos and slides cover the keynotes, breakout sessions and MySQL and MongoDB 101 sessions.

To view slides, go to the Percona Live agenda, and select the talk you want slides for from the schedule, and click through to the talk web page. The slides are available below the talk description. There is also a page with all the slides that is searchable by topic, talk title, speaker, company or keywords.

To view videos, go to the Percona Live 2017 video page. The available videos are searchable by topic, talk title, speaker, company or keywords.

There are a few slides and videos outstanding due to unforeseen circumstances. However, we will upload those as they become available.

Some examples of videos and slide decks from the Percona Live conference:

MongoDB 101: Efficient CRUD Queries in MongoDB
Adamo Tonete, Senior Technical Engineer, Percona
Video: https://www.percona.com/live/17/content/efficient-crud-queries-mongodb
Slides: https://www.percona.com/live/17/sessions/efficient-crud-queries-mongodb

MySQL 101: Choosing a MySQL High Availability Solution
Marcos Albe, Principal Technical Services Engineer, Percona
Video: https://www.percona.com/live/17/content/choosing-mysql-high-availability-solution
Slides: https://www.percona.com/live/17/sessions/choosing-mysql-high-availability-solution

Breakout Session: Using the MySQL Document Store
Mike Zinner, Sr. Software Development Director and Alfredo Kojima, Sr. Software Development Manager, Oracle
Video: https://www.percona.com/live/17/content/using-mysql-document-store
Slides: https://www.percona.com/live/17/sessions/using-mysql-document-store

Keynote: Continuent is Back! But What Does Continuent Do Anyway?
Eero Teerikorpi, Founder and CEO and MC Brown, VP Products, Continuent
Video: https://www.percona.com/live/17/content/continuent-back-what-does-continuent-do-anyway
Slides: https://www.percona.com/live/17/sessions/continuent-back-what-does-continuent-do-anyway

Please let us know if you have any issues. Enjoy the videos!

Percona Live Europe 2017
Percona Live Europe 2017: Dublin, Ireland!

This year’s Percona Live Europe will take place September 25th-27th, 2017, in Dublin, Ireland. Put it on your calendar now! Information on speakers, talks, sponsorship and registration will be available in the coming months.

We have developed multiple sponsorship options to allow participation at a level that best meets your partnering needs. Our goal is to create a significant opportunity for our partners to interact with Percona customers, other partners and community members. Sponsorship opportunities are available for Percona Live Europe 2017.

Download a prospectus here.

We look forward to seeing you there!

Apr
19
2017
--

Performance improvements in Percona XtraDB Cluster 5.7.17-29.20

Percona XtraDB Cluster

In our latest release of Percona XtraDB Cluster, we’ve introduced major performance improvements to the MySQLwrite-set replication layer. In this post, we want to show what these improvements look like.

For the test, we used the sysbench OLTP_RW, UPDATE_KEY and UPDATE_NOKEY workloads with 100 tables, 4mln rows each, which gives about 100GB of datasize. In all the tests we use a three-node setup, connected via a 10GB network, with the sysbench load directed to the one primary node.

In the first chart, we show improvements comparing to the previous version (5.7.16):

Percona XtraDB Cluster

The main improvements come from concurrent workloads, under multiple threads.

The previous chart is for cases using enabled binary logs, but in some situations we will have deployments without binary logs enabled (Percona XtraDB Cluster does not require them). The latest release significantly improves performance for this case as well.

Here is a chart showing throughput without binary logs:

Percona XtraDB Cluster

Where does Percona XtraDB Cluster place in comparison with similar technologies? To find out, we’ll compare this release with MySQL 5.7.17 Group Replication and with the recently released MariaDB 10.2.5 RC.

For MySQL 5.7.17 Group Replication, I’ve reviewed two cases: “durable” with sync_binlog=1, and “relaxed durability” with sync_binlog=0.

Also for MySQL 5.7.17 Group Replication, we want to review two cases with different flow_control settings. The first setting is flow_control=25000 (the default setting). It provides better performance, but with the drawbacks that non-primary nodes will fall behind significantly and MySQL Group Replication does not provide a way to protect from reading stale data. So with a default flow_control=25000, we risk reading very outdated data. We also tested MySQL Group Replication with flow_control=1000 to minimize stale data on non-primary nodes.

A note on the Flow Control topic: it is worth mentioning that we also changed the flow_control default for Percona XtraDB Cluster. The default value is 100 instead of 16 (as in version 5.7.16).

Comparison chart with sync_binlog=1 (for MySQL Group Replication):

Percona XtraDB Cluster

Comparison chart with sync_binlog=0 (for MySQL Group Replication):

Percona XtraDB Cluster

So there are couple conclusions we can make out of these charts.

  1. The new version of Percona XtraDB Cluster performs on the level with MySQL Group Replication
  2. flow_control for MySQl Group Replication really makes a difference for performance, and default flow_control=25000 is better (with the risk of a lot of outdated data on non-primary nodes)

The reference our benchmark files and config files are here.

Mar
23
2017
--

Percona Live Featured Session with Luís Soares: The New MySQL Replication Features in MySQL 8

Percona Live Featured Session

Percona Live Featured SessionWelcome to another post in the series of Percona Live featured session blogs! In these blogs, we’ll highlight some of the session speakers that will be at this year’s Percona Live conference. We’ll also discuss how these sessions can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured session, we’ll meet Luís Soares, Principal Software Engineer at Oracle. His session is The New MySQL Replication Features in MySQL 8 (with fellow presenter Lars Thalmann, Development Director at Oracle). The most popular high availability (HA) techniques deployed are based on making services redundant, in particular by means of replication. This fits quite naturally in the MySQL universe, as MySQL server has provided a mature replication solution for over a decade now. Moreover, the new replication developments (and their roadmap) show that MySQL is also catering for the requirements posed by popular environments such as the cloud.

I had a chance to speak with Luís about MySQL 8.0 replication:

Percona: How did you get into database technology? What do you love about it?

Luis SoaresLuís: My background is in distributed systems, particularly in database replication, message passing technologies and fault-tolerance. It all started while I was taking my BSc in computer science. As I finished it, I felt very much drawn towards these subjects. That alone made me enroll in a Master’s course that allowed me to focus almost full time on database replication using group communication technologies. I continued to study this field for years, and further deepened my knowledge on this area. That was great fun and quite a learning experience!

Years went by, and eventually I ended up working at MySQL on the replication team. This happened after I came to a MySQL user conference to present some of the work that I was doing at the time.

These are very fond memories! But I digress!

Back to the point. In general, the thing I love about working on database replication is that I am constantly facing new and interesting problems. Data replication in itself is hard. Add to that the semantics and requirements of a database server, and complexity increases quite a bit. Also, building a generic database replication service that fits in a large set of use cases requires a lot of discipline and careful thinking when designing new features. And let’s not forget the Web itself, which is constantly changing. New technologies come and go at a fast pace. The volume of data that has to be handled, year after year, increases considerably. This poses scalability and integration challenges that need to be addressed.

All in all, these are very exciting times to work with high availability, data replication and data integration.

Now specifically about MySQL, I love the fact that I work on a popular database technology that embraced replication very early in its life cycle. Replication awareness runs deep in the product and in its ecosystem. Consequently, MySQL has an extensive user base exploring many different use case scenarios around replication. And this is extremely motivating, rewarding and exciting. I can honestly say that my day-to-day work is never boring!

Percona: Your talk is called The New MySQL Replication Features in MySQL 8. What are the key replication features in MySQL 8.0, and why are they important?

Luís: It was a huge amount of work to get the MySQL Group Replication plugin out with MySQL 5.7.17. Group Replication is a new plugin that gives the user some replication nice properties by resorting to group communication and state machine replication. This makes the system able to protect data against split brain situations, enables fault-tolerance and high availability and provides coordination between servers committing transactions that change the data.

In addition to Group Replication, the team has also invested quite a bit on core replication features. Some of these features were already released, and others will be released at some point in time in a MySQL DMR.

In the first 8.0 DMR (MySQL 8.0.0) replication has better instrumentation for row-based replication. The user can observe the row-based replication applier progress by querying performance schema tables. There is also an enhanced global transaction identifier.

GTIDs history management, as the user can set the variable GTID_PURGED in scenarios other than those where the server has an empty GTID execution history. And the user can now specify the stop condition when starting the relay log applier, even if there are multiple applier threads started.

All these features combined are of great help, since they reduce operations overhead through automation, better observability and coordination between servers.

Work continues on many fronts: performance, availability, scalability, efficiency and observability. Stay tuned!

Percona: How do these features make DBAs lives easier? What problems do they solve?

Luís: As mentioned above, the features in MySQL 8.0.0 take some of the operations burden from the DBA. Moreover, they allow the user to better observe what is happening inside the replication pipeline. This alone is quite interesting, since DBAs need to make decisions both when designing new deployments and when tackling issues, possibly having to meet very tight deadlines.

Simply put, these features will help DBAs to diagnose and fix problems faster.

Percona: What do you want attendees to take away from your session? Why should they attend?

Luís: Our session is primarily about the shiny new replication features already in MySQL 8. This is the first takeaway. To know, first hand, what is in MySQL 8 replication-wise. But there is another takeaway, and quite an interesting one. Part of the session is dedicated to presenting the overall ideas around MySQL replication. So attendees will get an overview of the roadmap, and will be able to participate and provide feedback along the way. They will learn more about the big picture, and we will bring together some of the hot MySQL technologies that we keep hearing about nowadays: Group Replication, InnoDB Clusters, Multi-Threaded Replication and more!

It will be fun.

Percona: What are you most looking forward to at Percona Live 2017?

Luís: As a conference participant, I look forward to doing some networking with the vibrant MySQL community. I must say, that I really enjoy engaging in nice technical discussions about my favorite topics: fault-tolerance, replication, dependability and distributed systems overall. The conference gives me a great opportunity to do this.

As a MySQL developer, and one that has been developing MySQL replication for quite some time now, I look forward to talking about the recent work that my team has done and getting all the feedback I can.

As a bystander, conferences like Percona Live make me realize how much MySQL has grown, and how much it has evolved. Replication, for instance, has had so many interesting features, release after release over the last eight or nine years. The community has embraced and deployed them, often worked/interacted with the developers to improve them by providing feedback, feature requests or contributions. And this means that they are part of the story too!

These conferences are always a great learning experience! After spending a week with the MySQL community, I always feel refreshed, energized, extra motivated and with lots of food for thought when I get back home.

Go MySQL!

Register for Percona Live Data Performance Conference 2017, and see Luís present his session on The New MySQL Replication Features in MySQL 8 (with fellow presenter Lars Thalmann, Development Director at Oracle). Use the code FeaturedTalk and receive $100 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community, as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Feb
24
2017
--

Quest for Better Replication in MySQL: Galera vs. Group Replication

Group Replication

Group ReplicationUPDATE: Some of the language in the original post was considered overly-critical of Oracle by some community members. This was not my intent, and I’ve modified the language to be less so. I’ve also changed term “synchronous” (which the use of is inaccurate and misleading) to “virtually synchronous.” This term is more accurate and already used by both technologies’ founders, and should be less misleading.

I also wanted to thank Jean-François Gagné for pointing out the incorrect sentence about multi-threaded slaves in Group Replication, which I also corrected accordingly.

In today’s blog post, I will briefly compare two major virtually synchronous replication technologies available today for MySQL.

More Than Asynchronous Replication

Thanks to the Galera plugin, founded by the Codership team, we’ve had the choice between asynchronous and virtually synchronous replication in the MySQL ecosystem for quite a few years already. Moreover, we can choose between at least three software providers: Codership, MariaDB and Percona, each with its own Galera implementation.

The situation recently became much more interesting when MySQL Group Replication went into GA (stable) stage in December 2016.

Oracle, the upstream MySQL provider, introduced its own replication implementation that is very similar in concept. Unlike the others mentioned above, it isn’t based on Galera. Group Replication was built from the ground up as a new solution. MySQL Group Replication shares many very similar concepts to Galera. This post doesn’t cover MySQL Cluster, another and fully-synchronous solution, that existed much earlier then Galera — it is a much different solution for different use cases.

In this post, I will point out a couple of interesting differences between Group Replication and Galera, which hopefully will be helpful to those considering switching from one to another (or if they are planning to test them).

This is certainly not a full list of all the differences, but rather things I found interesting during my explorations.

It is also important to know that Group Replication has evolved a lot before it went GA (its whole cluster layer was replaced). I won’t mention how things looked before the GA stage, and will just concentrate on latest available 5.7.17 version. I will not spend too much time on how Galera implementations looked in the past, and will use Percona XtraDB Cluster 5.7 as a reference.

Multi-Master vs. Master-Slave

Galera has always been multi-master by default, so it does not matter to which node you write. Many users use a single writer due to workload specifics and multi-master limitations, but Galera has no single master mode per se.

Group Replication, on the other hand, promotes just one member as primary (master) by default, and other members are put into read-only mode automatically. This is what happens if we try to change data on non-master node:

mysql> truncate test.t1;
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

To change from single primary mode to multi-primary (multi-master), you have to start group replication with the 

group_replication_single_primary_mode

variable disabled.
Another interesting fact is you do not have any influence on which cluster member will be the master in single primary mode: the cluster auto-elects it. You can only check it with a query:

mysql> SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME like 'group_replication%';
+----------------------------------+--------------------------------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 329333cd-d6d9-11e6-bdd2-0242ac130002 |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

Or just:

mysql> show status like 'group%';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 329333cd-d6d9-11e6-bdd2-0242ac130002 |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)

To show the hostname instead of UUID, here:

mysql> select member_host as "primary master" from performance_schema.global_status join performance_schema.replication_group_members where variable_name='group_replication_primary_member' and member_id=variable_value;
+----------------+
| primary master |
+----------------+
| f18ff539956d   |
+----------------+
1 row in set (0.00 sec)

Replication: Majority vs. All

Galera delivers write transactions synchronously to ALL nodes in the cluster. (Later, applying happens asynchronously in both technologies.) However, Group Replication needs just a majority of the nodes confirming the transaction. This means a transaction commit on the writer succeeds and returns to the client even if a minority of nodes still have not received it.

In the example of a three-node cluster, if one node crashes or loses the network connection, the two others continue to accept writes (or just the primary node in Single-Primary mode) even before a faulty node is removed from the cluster.

If the separated node is the primary one, it denies writes due to the lack of a quorum (it will report the error

ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.

). If one of the nodes receives a quorum, it will be elected to primary after the faulty node is removed from the cluster, and will then accept writes.

With that said, the “majority” rule in Group Replication means that there isn’t a guarantee that you won’t lose any data if the majority nodes are lost. There is a chance these could apply some transactions that aren’t delivered to the minority at the moment they crash.

In Galera, a single node network interruption makes the others wait for it, and pending writes can be committed once either the connection is restored or the faulty node removed from cluster after the timeout. So the chance of losing data in a similar scenario is lower, as transactions always reach all nodes. Data can be lost in Percona XtraDB Cluster only in a really bad luck scenario: a network split happens, the remaining majority of nodes form a quorum, the cluster reconfigures and allows new writes, and then shortly after the majority part is damaged.

Schema Requirements

For both technologies, one of the requirements is that all tables must be InnoDB and have a primary key. This requirement is now enforced by default in both Group Replication and Percona XtraDB Cluster 5.7. Let’s look at the differences.

Percona XtraDB Cluster:

mysql> create table nopk (a char(10));
Query OK, 0 rows affected (0.08 sec)
mysql> insert into nopk values ("aaa");
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits use of DML command on a table (test.nopk) without an explicit primary key with pxc_strict_mode = ENFORCING or MASTER
mysql> create table m1 (id int primary key) engine=myisam;
Query OK, 0 rows affected (0.02 sec)
mysql> insert into m1 values(1);
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits use of DML command on a table (test.m1) that resides in non-transactional storage engine with pxc_strict_mode = ENFORCING or MASTER
mysql> set global pxc_strict_mode=0;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into nopk values ("aaa");
Query OK, 1 row affected (0.00 sec)
mysql> insert into m1 values(1);
Query OK, 1 row affected (0.00 sec)

Before Percona XtraDB Cluster 5.7 (or in other Galera implementations), there were no such enforced restrictions. Users unaware of these requirements often ended up with problems.

Group Replication:

mysql> create table nopk (a char(10));
Query OK, 0 rows affected (0.04 sec)
mysql> insert into nopk values ("aaa");
ERROR 3098 (HY000): The table does not comply with the requirements by an external plugin.
2017-01-15T22:48:25.241119Z 139 [ERROR] Plugin group_replication reported: 'Table nopk does not have any PRIMARY KEY. This is not compatible with Group Replication'
mysql> create table m1 (id int primary key) engine=myisam;
ERROR 3161 (HY000): Storage engine MyISAM is disabled (Table creation is disallowed).

I am not aware of any way to disable these restrictions in Group Replication.

GTID

Galera has it’s own Global Transaction ID, which has existed since MySQL 5.5, and is independent from MySQL’s GTID feature introduced in MySQL 5.6. If MySQL’s GTID is enabled on a Galera-based cluster, both numerations exist with their own sequences and UUIDs.

Group Replication is based on a native MySQL GTID feature, and relies on it. Interestingly, a separate sequence block range (initially 1M) is pre-assigned for each cluster member.

WAN Support

The MySQL Group Replication documentation isn’t very optimistic on WAN support, claiming that both “Low latency, high bandwidth network connections are a requirement” and “Group Replication is designed to be deployed in a cluster environment where server instances are very close to each other, and is impacted by both network latency as well as network bandwidth.” These statements are found here and here. However there is network traffic optimization: Message Compression.

I don’t see group communication level tunings available yet, as we find in the Galera evs.* series of

wsrep_provider_options

.

Galera founders actually encourage trying it in geo-distributed environments, and some WAN-dedicated settings are available (the most important being WAN segments).

But both technologies need a reliable network for good performance.

State Transfers

Galera has two types of state transfers that allow syncing data to nodes when needed: incremental (IST) and full (SST). Incremental is used when a node has been out of a cluster for some time, and once it rejoins the other nodes has the missing write sets still in Galera cache. Full SST is helpful if incremental is not possible, especially when a new node is added to the cluster. SST automatically provisions the node with fresh data taken as a snapshot from one of the running nodes (donor). The most common SST method is using Percona XtraBackup, which takes a fast and non-blocking binary data snapshot (hot backup).

In Group Replication, state transfers are fully based on binary logs with GTID positions. If there is no donor with all of the binary logs (included the ones for new nodes), a DBA has to first provision the new node with initial data snapshot. Otherwise, the joiner will fail with a very familiar error:

2017-01-16T23:01:40.517372Z 50 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236

The official documentation mentions that provisioning the node before adding it to the cluster may speed up joining (the recovery stage). Another difference is that in the case of state transfer failure, a Galera joiner will abort after the first try, and will shutdown its mysqld instance. The Group Replication joiner will then fall-back to another donor in an attempt to succeed. Here I found something slightly annoying: if no donor can satisfy joiner demands, it will still keep trying the same donors over and over, for a fixed number of attempts:

[root@cd81c1dadb18 /]# grep 'Attempt' /var/log/mysqld.log |tail
2017-01-16T22:57:38.329541Z 12 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2017-01-16T22:57:38.539984Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
2017-01-16T22:57:38.806862Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 3/10'
2017-01-16T22:58:39.024568Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 4/10'
2017-01-16T22:58:39.249039Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 5/10'
2017-01-16T22:59:39.503086Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 6/10'
2017-01-16T22:59:39.736605Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 7/10'
2017-01-16T23:00:39.981073Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 8/10'
2017-01-16T23:00:40.176729Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 9/10'
2017-01-16T23:01:40.404785Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 10/10'

After the last try, even though it fails, mysqld keeps running and allows client connections…

Auto Increment Settings

Galera adjusts the auto_increment_increment and auto_increment_offset values according to the number of members in a cluster. So, for a 3-node cluster,

auto_increment_increment

  will be “3” and

auto_increment_offset

  from “1” to “3” (depending on the node). If a number of nodes change later, these are updated immediately. This feature can be disabled using the 

wsrep_auto_increment_control

 setting. If needed, these settings can be set manually.

Interestingly, in Group Replication the

auto_increment_increment

 seems to be fixed at 7, and only

auto_increment_offset

 is set differently on each node. This is the case even in the default Single-Primary mode! this seems like a waste of available IDs, so make sure that you adjust the

group_replication_auto_increment_increment

 setting to a saner number before you start using Group Replication in production.

Multi-Threaded Slave Side Applying

Galera developed its own multi-threaded slave feature, even in 5.5 versions, for workloads that include tables in the same database. It is controlled with the  wsrep_slave_threads variable. Group Replication uses a feature introduced in MySQL 5.7, where the number of applier threads is controlled with slave_parallel_workers. Galera will do multi-threaded replication based on potential conflicts of changed/locked rows. Group Replication parallelism is based on an improved LOGICAL_CLOCK scheduler, which uses information from writesets dependencies. This can allow it to achieve much better results than in normal asynchronous replication MTS mode. More details can be found here: http://mysqlhighavailability.com/zooming-in-on-group-replication-performance/

Flow Control

Both technologies use a technique to throttle writes when nodes are slow in applying them. Interestingly, the default size of the allowed applier queue in both is much different:

Moreover, Group Replication provides separate certifier queue size, also eligible for the Flow Control trigger:

 group_replication_flow_control_certifier_threshold

. One thing I found difficult, is checking the actual applier queue size, as the only exposed one via performance_schema.replication_group_member_stats is the

Count_Transactions_in_queue

 (which only shows the certifier queue).

Network Hiccup/Partition Handling

In Galera, when the network connection between nodes is lost, those who still have a quorum will form a new cluster view. Those who lost a quorum keep trying to re-connect to the primary component. Once the connection is restored, separated nodes will sync back using IST and rejoin the cluster automatically.

This doesn’t seem to be the case for Group Replication. Separated nodes that lose the quorum will be expelled from the cluster, and won’t join back automatically once the network connection is restored. In its error log we can see:

2017-01-17T11:12:18.562305Z 0 [ERROR] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2017-01-17T11:12:18.631225Z 0 [Note] Plugin group_replication reported: 'getstart group_id ce427319'
2017-01-17T11:12:21.735374Z 0 [Note] Plugin group_replication reported: 'state 4330 action xa_terminate'
2017-01-17T11:12:21.735519Z 0 [Note] Plugin group_replication reported: 'new state x_start'
2017-01-17T11:12:21.735527Z 0 [Note] Plugin group_replication reported: 'state 4257 action xa_exit'
2017-01-17T11:12:21.735553Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
2017-01-17T11:12:21.735558Z 0 [Note] Plugin group_replication reported: 'new state x_start'

Its status changes to:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 329333cd-d6d9-11e6-bdd2-0242ac130002 | f18ff539956d | 3306 | ERROR |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
1 row in set (0.00 sec)

It seems the only way to bring it back into the cluster is to manually restart Group Replication:

mysql> START GROUP_REPLICATION;
ERROR 3093 (HY000): The START GROUP_REPLICATION command failed since the group is already running.
mysql> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (5.00 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.96 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 24d6ef6f-dc3f-11e6-abfa-0242ac130004 | cd81c1dadb18 | 3306 | ONLINE |
| group_replication_applier | 329333cd-d6d9-11e6-bdd2-0242ac130002 | f18ff539956d | 3306 | ONLINE |
| group_replication_applier | ae148d90-d6da-11e6-897e-0242ac130003 | 0af7a73f4d6b | 3306 | ONLINE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
3 rows in set (0.00 sec

Note that in the above output, after the network failure, Group Replication did not stop. It waits in an error state. Moreover, in Group Replication a partitioned node keeps serving dirty reads as if nothing happened (for non-super users):

cd81c1dadb18 {test} ((none)) > SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 24d6ef6f-dc3f-11e6-abfa-0242ac130004 | cd81c1dadb18 | 3306 | ERROR |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
1 row in set (0.00 sec)
cd81c1dadb18 {test} ((none)) > select * from test1.t1;
+----+-------+
| id | a |
+----+-------+
| 1 | dasda |
| 3 | dasda |
+----+-------+
2 rows in set (0.00 sec)
cd81c1dadb18 {test} ((none)) > show grants;
+-------------------------------------------------------------------------------+
| Grants for test@% |
+-------------------------------------------------------------------------------+
| GRANT SELECT, INSERT, UPDATE, DELETE, REPLICATION CLIENT ON *.* TO 'test'@'%' |
+-------------------------------------------------------------------------------+
1 row in set (0.00 sec)

A privileged user can disable

super_read_only

, but then it won’t be able to write:

cd81c1dadb18 {root} ((none)) > insert into test1.t1 set a="split brain";
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.
cd81c1dadb18 {root} ((none)) > select * from test1.t1;
+----+-------+
| id | a |
+----+-------+
| 1 | dasda |
| 3 | dasda |
+----+-------+
2 rows in set (0.00 sec)

I found an interesting thing here, which I consider to be a bug. In this case, a partitioned node can actually perform DDL, despite the error:

cd81c1dadb18 {root} ((none)) > show tables in test1;
+-----------------+
| Tables_in_test1 |
+-----------------+
| nopk |
| t1 |
+-----------------+
2 rows in set (0.01 sec)
cd81c1dadb18 {root} ((none)) > create table test1.split_brain (id int primary key);
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.
cd81c1dadb18 {root} ((none)) > show tables in test1;
+-----------------+
| Tables_in_test1 |
+-----------------+
| nopk |
| split_brain |
| t1 |
+-----------------+
3 rows in set (0.00 sec)

In a Galera-based cluster, you are automatically protected from that, and a partitioned node refuses to allow both reads and writes. It throws an error: 

ERROR 1047 (08S01): WSREP has not yet prepared node for application use

. You can force dirty reads using the 

wsrep_dirty_reads

 variable.

There many more subtle (and less subtle) differences between these technologies – but this blog post is long enough already. Maybe next time ?

Article with Similar Subject

http://lefred.be/content/group-replication-vs-galera/

Feb
15
2017
--

Group Replication: Shipped Too Early

Group Replication

Group ReplicationThis blog post is my overview of Group Replication technology.

With Oracle clearly entering the “open source high availability solutions” arena with the release of their brand new Group Replication solution, I believe it is time to review the quality of the first GA (production ready) release.

TL;DR: Having examined the technology, it is my conclusion that Oracle seems to have released the GA version of Group Replication too early. While the product is definitely “working prototype” quality, the release seems rushed and unfinished. I found a significant number of issues, and I would personally not recommend it for production use.

It is obvious that Oracle is trying hard to ship technology to compete with Percona XtraDB Cluster, which is probably why they rushed to claim Group Replication GA quality.

If you’re all set to follow along and test Group Replication yourself, simplify the initial setup by using this Docker image. We can review some of the issues you might face together.

For the record, I tested the version based on MySQL 5.7.17 release.

No automatic provisioning

First off, the first thing you’ll find is there is NO way to automatically setup of a new node.

If you need to setup new node or recover an existing node from a fatal failure, you’ll need to manually provision the slave.

Of course, you can clone a slave using Percona XtraBackup or LVM by employing some self-developed scripts. But given the high availability nature of the product, one would expect Group Replication to automatically re-provision any failed node.

Bug: stale reads on nodes

Please see this bug:

One line summary: while any secondary nodes are “catching up” to whatever happened on a first node (it takes time to apply changes on secondary nodes), reads on a secondary node could return stale data (as shown in the bug report).

This behavior brings us back to the traditional asynchronous replication slave behavior (i.e., Group Replication’s predecessor).

It also contradicts the Group Replication documentation, which states: “There is a built-in group membership service that keeps the view of the group consistent and available for all servers at any given point in time.” (See https://dev.mysql.com/doc/refman/5.7/en/group-replication.html.)

I might also mention here that Percona XtraDB Cluster prevents stale reads (see https://www.percona.com/doc/percona-xtradb-cluster/5.7/wsrep-system-index.html#wsrep_sync_wait).

Bug: nodes become unusable after a big transaction, refusing to execute further transactions

There are two related bugs:

One line summary: after running a big transaction, any secondary nodes become unusable and refuse to perform any further transactions.

Obscure error messages

It is not uncommon to see cryptic error messages while testing Group Replication. For example:

mysql> commit;
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.

This is fairly useless and provides little help until I check the mysqld error log. The log provides a little bit more information:

2017-02-09T02:05:36.996776Z 18 [ERROR] Plugin group_replication reported: '[GCS] Gcs_packet's payload is too big. Only the packets smaller than 2113929216 bytes can be compressed.'

Discussion:

The items highlighted above might not seem too bad at first, and you could assume that your workload won’t be affected. However, stale reads and node dysfunctions basically prevent me from running a more comprehensive evaluation.

My recommendation:

If you care about your data, then I recommend not using Group Replication in production. Currently, it looks like it might cause plenty of headaches, and it is easy to get inconsistent results.

For the moment, Group Replication appears an advanced – but broken – traditional MySQL asynchronous replication.

I understand Oracle’s dilemma. Usually people are hesitant to test a product that is not GA. So in order to get feedback from users, Oracle needs to push the product to GA. Oracle must absolutely solve the issues above during future QA cycles.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com