Apr
19
2017
--

Performance improvements in Percona XtraDB Cluster 5.7.17-29.20

Percona XtraDB Cluster

In our latest release of Percona XtraDB Cluster, we’ve introduced major performance improvements to the MySQL write-set replication layer. In this post, we want to show what these improvements look like.

For the test, we used the sysbench OLTP_RW, UPDATE_KEY and UPDATE_NOKEY workloads with 100 tables of 4 million rows each, which gives about 100GB of data. In all the tests we used a three-node setup, connected via a 10Gb network, with the sysbench load directed at a single primary node.
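
For reference, an OLTP_RW run of this kind can be launched with a sysbench 1.0-style command along the lines of the sketch below; the host, password and thread count are placeholders, and the exact options we used are in the benchmark files linked at the end of this post.

sysbench oltp_read_write \
  --tables=100 --table-size=4000000 \
  --mysql-host=PRIMARY_NODE --mysql-user=sbtest --mysql-password=*** \
  --threads=64 --time=300 --report-interval=10 \
  run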

In the first chart, we show the improvements compared to the previous version (5.7.16):

Percona XtraDB Cluster

The main improvements show up in concurrent workloads, with multiple threads.

The previous chart is for the case with binary logs enabled, but some deployments run without binary logs (Percona XtraDB Cluster does not require them). The latest release significantly improves performance for this case as well.

Here is a chart showing throughput without binary logs:

Percona XtraDB Cluster

Where does Percona XtraDB Cluster place in comparison with similar technologies? To find out, we’ll compare this release with MySQL 5.7.17 Group Replication and with the recently released MariaDB 10.2.5 RC.

For MySQL 5.7.17 Group Replication, I’ve reviewed two cases: “durable” with sync_binlog=1, and “relaxed durability” with sync_binlog=0.

Also for MySQL 5.7.17 Group Replication, we want to review two cases with different flow_control settings. The first is flow_control=25000 (the default). It provides better performance, but with the drawback that non-primary nodes can fall behind significantly, and MySQL Group Replication does not provide a way to protect against reading stale data. So with the default flow_control=25000, we risk reading very outdated data. We also tested MySQL Group Replication with flow_control=1000 to minimize stale data on non-primary nodes.
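
For reference, a sketch of how one might switch between these two settings at runtime (the thresholds below are the Group Replication flow control variables as I understand them; both are dynamic in 5.7.17):

-- default: throttle only when the queues exceed 25000 transactions
SET GLOBAL group_replication_flow_control_applier_threshold = 25000;
SET GLOBAL group_replication_flow_control_certifier_threshold = 25000;

-- stricter setting used to minimize stale data on non-primary nodes
SET GLOBAL group_replication_flow_control_applier_threshold = 1000;
SET GLOBAL group_replication_flow_control_certifier_threshold = 1000;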

A note on the Flow Control topic: we also changed the flow_control default for Percona XtraDB Cluster. The default value is now 100 instead of 16 (as it was in version 5.7.16).
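
In Galera terms this limit is the gcs.fc_limit provider option (my assumption about the exact knob being referenced here); a sketch of checking flow control activity and raising the limit at runtime:

-- flow control status counters
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';

-- raise the flow control queue limit to the new default
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=100';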

Comparison chart with sync_binlog=1 (for MySQL Group Replication):

Percona XtraDB Cluster

Comparison chart with sync_binlog=0 (for MySQL Group Replication):

Percona XtraDB Cluster

There are a couple of conclusions we can draw from these charts.

  1. The new version of Percona XtraDB Cluster performs on par with MySQL Group Replication.
  2. flow_control for MySQL Group Replication really makes a difference for performance, and the default flow_control=25000 performs better (at the risk of a lot of outdated data on non-primary nodes).

For reference, our benchmark files and config files are here.

Mar
23
2017
--

Percona Live Featured Session with Luís Soares: The New MySQL Replication Features in MySQL 8

Percona Live Featured Session

Welcome to another post in the series of Percona Live featured session blogs! In these blogs, we’ll highlight some of the session speakers that will be at this year’s Percona Live conference. We’ll also discuss how these sessions can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured session, we’ll meet Luís Soares, Principal Software Engineer at Oracle. His session is The New MySQL Replication Features in MySQL 8 (with fellow presenter Lars Thalmann, Development Director at Oracle). The most popular high availability (HA) techniques deployed are based on making services redundant, in particular by means of replication. This fits quite naturally in the MySQL universe, as MySQL server has provided a mature replication solution for over a decade now. Moreover, the new replication developments (and their roadmap) show that MySQL is also catering for the requirements posed by popular environments such as the cloud.

I had a chance to speak with Luís about MySQL 8.0 replication:

Percona: How did you get into database technology? What do you love about it?

Luís: My background is in distributed systems, particularly in database replication, message passing technologies and fault-tolerance. It all started while I was taking my BSc in computer science. As I finished it, I felt very much drawn towards these subjects. That alone made me enroll in a Master’s course that allowed me to focus almost full time on database replication using group communication technologies. I continued to study this field for years, and further deepened my knowledge in this area. That was great fun and quite a learning experience!

Years went by, and eventually I ended up working at MySQL on the replication team. This happened after I came to a MySQL user conference to present some of the work that I was doing at the time.

These are very fond memories! But I digress!

Back to the point. In general, the thing I love about working on database replication is that I am constantly facing new and interesting problems. Data replication in itself is hard. Add to that the semantics and requirements of a database server, and complexity increases quite a bit. Also, building a generic database replication service that fits in a large set of use cases requires a lot of discipline and careful thinking when designing new features. And let’s not forget the Web itself, which is constantly changing. New technologies come and go at a fast pace. The volume of data that has to be handled, year after year, increases considerably. This poses scalability and integration challenges that need to be addressed.

All in all, these are very exciting times to work with high availability, data replication and data integration.

Now specifically about MySQL, I love the fact that I work on a popular database technology that embraced replication very early in its life cycle. Replication awareness runs deep in the product and in its ecosystem. Consequently, MySQL has an extensive user base exploring many different use case scenarios around replication. And this is extremely motivating, rewarding and exciting. I can honestly say that my day-to-day work is never boring!

Percona: Your talk is called The New MySQL Replication Features in MySQL 8. What are the key replication features in MySQL 8.0, and why are they important?

Luís: It was a huge amount of work to get the MySQL Group Replication plugin out with MySQL 5.7.17. Group Replication is a new plugin that gives the user some nice replication properties by resorting to group communication and state machine replication. This makes the system able to protect data against split-brain situations, enables fault-tolerance and high availability, and provides coordination between servers committing transactions that change the data.

In addition to Group Replication, the team has also invested quite a bit on core replication features. Some of these features were already released, and others will be released at some point in time in a MySQL DMR.

In the first 8.0 DMR (MySQL 8.0.0), replication has better instrumentation for row-based replication. The user can observe the row-based replication applier progress by querying performance schema tables. There is also enhanced global transaction identifier (GTID) history management: the user can set the GTID_PURGED variable in scenarios other than those where the server has an empty GTID execution history. And the user can now specify the stop condition when starting the relay log applier, even if there are multiple applier threads started.
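
As a hedged sketch of what this looks like in practice (the table and variable names below are from the MySQL 8.0 DMR as I understand it, and the GTID values are placeholders):

-- observe row-based applier progress per worker
SELECT * FROM performance_schema.replication_applier_status_by_worker\G

-- extend GTID_PURGED even when the GTID execution history is not empty
SET GLOBAL GTID_PURGED = '+aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-100';

-- stop condition honored even with multiple applier threads running
START SLAVE UNTIL SQL_AFTER_GTIDS = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-200';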

All these features combined are of great help, since they reduce operations overhead through automation, better observability and coordination between servers.

Work continues on many fronts: performance, availability, scalability, efficiency and observability. Stay tuned!

Percona: How do these features make DBAs lives easier? What problems do they solve?

Luís: As mentioned above, the features in MySQL 8.0.0 take some of the operations burden from the DBA. Moreover, they allow the user to better observe what is happening inside the replication pipeline. This alone is quite interesting, since DBAs need to make decisions both when designing new deployments and when tackling issues, possibly having to meet very tight deadlines.

Simply put, these features will help DBAs to diagnose and fix problems faster.

Percona: What do you want attendees to take away from your session? Why should they attend?

Luís: Our session is primarily about the shiny new replication features already in MySQL 8. This is the first takeaway. To know, first hand, what is in MySQL 8 replication-wise. But there is another takeaway, and quite an interesting one. Part of the session is dedicated to presenting the overall ideas around MySQL replication. So attendees will get an overview of the roadmap, and will be able to participate and provide feedback along the way. They will learn more about the big picture, and we will bring together some of the hot MySQL technologies that we keep hearing about nowadays: Group Replication, InnoDB Clusters, Multi-Threaded Replication and more!

It will be fun.

Percona: What are you most looking forward to at Percona Live 2017?

Luís: As a conference participant, I look forward to doing some networking with the vibrant MySQL community. I must say that I really enjoy engaging in nice technical discussions about my favorite topics: fault-tolerance, replication, dependability and distributed systems overall. The conference gives me a great opportunity to do this.

As a MySQL developer, and one that has been developing MySQL replication for quite some time now, I look forward to talking about the recent work that my team has done and getting all the feedback I can.

As a bystander, conferences like Percona Live make me realize how much MySQL has grown, and how much it has evolved. Replication, for instance, has gained so many interesting features, release after release, over the last eight or nine years. The community has embraced and deployed them, and often worked with the developers to improve them by providing feedback, feature requests or contributions. And this means that they are part of the story too!

These conferences are always a great learning experience! After spending a week with the MySQL community, I always feel refreshed, energized, extra motivated and with lots of food for thought when I get back home.

Go MySQL!

Register for Percona Live Data Performance Conference 2017, and see Luís present his session on The New MySQL Replication Features in MySQL 8 (with fellow presenter Lars Thalmann, Development Director at Oracle). Use the code FeaturedTalk and receive $100 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community, as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Feb
24
2017
--

Quest for Better Replication in MySQL: Galera vs. Group Replication

Group Replication

UPDATE: Some of the language in the original post was considered overly critical of Oracle by some community members. This was not my intent, and I’ve modified the language to be less so. I’ve also changed the term "synchronous" (whose use here was inaccurate and misleading) to "virtually synchronous." This term is more accurate, is already used by both technologies’ founders, and should be less misleading.

I also wanted to thank Jean-François Gagné for pointing out the incorrect sentence about multi-threaded slaves in Group Replication, which I also corrected accordingly.

In today’s blog post, I will briefly compare two major virtually synchronous replication technologies available today for MySQL.

More Than Asynchronous Replication

Thanks to the Galera plugin, created by the Codership team, we’ve had the choice between asynchronous and virtually synchronous replication in the MySQL ecosystem for quite a few years already. Moreover, we can choose between at least three software providers: Codership, MariaDB and Percona, each with its own Galera implementation.

The situation recently became much more interesting when MySQL Group Replication went into GA (stable) stage in December 2016.

Oracle, the upstream MySQL provider, introduced its own replication implementation that is very similar in concept. Unlike the others mentioned above, it isn’t based on Galera: Group Replication was built from the ground up as a new solution, although it shares many concepts with Galera. This post doesn’t cover MySQL Cluster, another and fully synchronous solution that existed much earlier than Galera; it is a very different solution for different use cases.

In this post, I will point out a couple of interesting differences between Group Replication and Galera, which hopefully will be helpful to those considering switching from one to another (or if they are planning to test them).

This is certainly not a full list of all the differences, but rather things I found interesting during my explorations.

It is also important to know that Group Replication evolved a lot before it went GA (its whole cluster layer was replaced). I won’t discuss how things looked before the GA stage, and will just concentrate on the latest available version, 5.7.17. I will also not spend too much time on how Galera implementations looked in the past, and will use Percona XtraDB Cluster 5.7 as a reference.

Multi-Master vs. Master-Slave

Galera has always been multi-master by default, so it does not matter to which node you write. Many users use a single writer due to workload specifics and multi-master limitations, but Galera has no single master mode per se.

Group Replication, on the other hand, promotes just one member as primary (master) by default, and other members are put into read-only mode automatically. This is what happens if we try to change data on non-master node:

mysql> truncate test.t1;
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

To change from single-primary mode to multi-primary (multi-master), you have to start group replication with the group_replication_single_primary_mode variable disabled.
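
A minimal sketch of what that configuration could look like (option names as in MySQL 5.7; both settings can only be changed while Group Replication is stopped on all members):

[mysqld]
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON
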
Another interesting fact is that you have no influence over which cluster member becomes the master in single-primary mode: the cluster auto-elects it. You can only check it with a query:

mysql> SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME like 'group_replication%';
+----------------------------------+--------------------------------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 329333cd-d6d9-11e6-bdd2-0242ac130002 |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

Or just:

mysql> show status like 'group%';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 329333cd-d6d9-11e6-bdd2-0242ac130002 |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)

To show the hostname instead of the UUID:

mysql> select member_host as "primary master" from performance_schema.global_status join performance_schema.replication_group_members where variable_name='group_replication_primary_member' and member_id=variable_value;
+----------------+
| primary master |
+----------------+
| f18ff539956d   |
+----------------+
1 row in set (0.00 sec)

Replication: Majority vs. All

Galera delivers write transactions synchronously to ALL nodes in the cluster. (Later, applying happens asynchronously in both technologies.) However, Group Replication needs just a majority of the nodes confirming the transaction. This means a transaction commit on the writer succeeds and returns to the client even if a minority of nodes still have not received it.

In the example of a three-node cluster, if one node crashes or loses the network connection, the two others continue to accept writes (or just the primary node in Single-Primary mode) even before a faulty node is removed from the cluster.

If the separated node is the primary one, it denies writes due to the lack of a quorum (it will report ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.). If the remaining nodes retain a quorum, one of them will be elected primary after the faulty node is removed from the cluster, and will then accept writes.

With that said, the "majority" rule in Group Replication means there is no guarantee that you won’t lose any data if the majority of nodes is lost. There is a chance that the lost nodes applied some transactions that were never delivered to the surviving minority at the moment they crashed.

In Galera, a network interruption on a single node makes the others wait for it, and pending writes can be committed once either the connection is restored or the faulty node is removed from the cluster after a timeout. So the chance of losing data in a similar scenario is lower, as transactions always reach all nodes. Data can be lost in Percona XtraDB Cluster only in a really bad luck scenario: a network split happens, the remaining majority of nodes forms a quorum, the cluster reconfigures and allows new writes, and then shortly afterwards the majority part is damaged.

Schema Requirements

For both technologies, one of the requirements is that all tables must be InnoDB and have a primary key. This requirement is now enforced by default in both Group Replication and Percona XtraDB Cluster 5.7. Let’s look at the differences.

Percona XtraDB Cluster:

mysql> create table nopk (a char(10));
Query OK, 0 rows affected (0.08 sec)
mysql> insert into nopk values ("aaa");
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits use of DML command on a table (test.nopk) without an explicit primary key with pxc_strict_mode = ENFORCING or MASTER
mysql> create table m1 (id int primary key) engine=myisam;
Query OK, 0 rows affected (0.02 sec)
mysql> insert into m1 values(1);
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits use of DML command on a table (test.m1) that resides in non-transactional storage engine with pxc_strict_mode = ENFORCING or MASTER
mysql> set global pxc_strict_mode=0;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into nopk values ("aaa");
Query OK, 1 row affected (0.00 sec)
mysql> insert into m1 values(1);
Query OK, 1 row affected (0.00 sec)

Before Percona XtraDB Cluster 5.7 (or in other Galera implementations), there were no such enforced restrictions. Users unaware of these requirements often ended up with problems.

Group Replication:

mysql> create table nopk (a char(10));
Query OK, 0 rows affected (0.04 sec)
mysql> insert into nopk values ("aaa");
ERROR 3098 (HY000): The table does not comply with the requirements by an external plugin.
2017-01-15T22:48:25.241119Z 139 [ERROR] Plugin group_replication reported: 'Table nopk does not have any PRIMARY KEY. This is not compatible with Group Replication'
mysql> create table m1 (id int primary key) engine=myisam;
ERROR 3161 (HY000): Storage engine MyISAM is disabled (Table creation is disallowed).

I am not aware of any way to disable these restrictions in Group Replication.

GTID

Galera has its own Global Transaction ID, which has existed since MySQL 5.5, and is independent from MySQL’s GTID feature introduced in MySQL 5.6. If MySQL’s GTID is enabled on a Galera-based cluster, both numerations exist, each with its own sequences and UUIDs.

Group Replication is based on the native MySQL GTID feature, and relies on it. Interestingly, a separate sequence block range (initially 1M) is pre-assigned to each cluster member.
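
A quick, hedged way to see both of these (the block size variable exists in 5.7.17; the GTID ranges in gtid_executed show the interleaved blocks per member):

-- the pre-assigned GTID block size per member
SHOW GLOBAL VARIABLES LIKE 'group_replication_gtid_assignment_block_size';

-- the resulting (interleaved) GTID ranges
SELECT @@GLOBAL.gtid_executed\G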

WAN Support

The MySQL Group Replication documentation isn’t very optimistic about WAN support, stating that "Low latency, high bandwidth network connections are a requirement" and that "Group Replication is designed to be deployed in a cluster environment where server instances are very close to each other, and is impacted by both network latency as well as network bandwidth." These statements are found here and here. However, there is a network traffic optimization: Message Compression.

I don’t see group communication level tunings available yet, like those we find in Galera’s evs.* series of wsrep_provider_options.

Galera’s founders actually encourage trying it in geo-distributed environments, and some WAN-dedicated settings are available (the most important being WAN segments).

But both technologies need a reliable network for good performance.

State Transfers

Galera has two types of state transfers that allow syncing data to nodes when needed: incremental (IST) and full (SST). Incremental is used when a node has been out of the cluster for some time and, once it rejoins, the other nodes still have the missing write-sets in their Galera cache. A full SST is used when an incremental transfer is not possible, especially when a new node is added to the cluster. SST automatically provisions the node with fresh data taken as a snapshot from one of the running nodes (the donor). The most common SST method is Percona XtraBackup, which takes a fast and non-blocking binary data snapshot (hot backup).
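
The relevant Percona XtraDB Cluster settings, as a minimal sketch (the gcache size is just an example value; a larger gcache increases the chance that a returning node can use IST instead of a full SST):

[mysqld]
wsrep_sst_method = xtrabackup-v2
wsrep_provider_options = "gcache.size=2G"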

In Group Replication, state transfers are fully based on binary logs with GTID positions. If there is no donor with all of the binary logs (including the ones needed by the new node), a DBA has to provision the new node with an initial data snapshot first. Otherwise, the joiner will fail with a very familiar error:

2017-01-16T23:01:40.517372Z 50 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
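
In that case the joiner has to be provisioned by hand first. A hedged sketch of the procedure, assuming a binary snapshot (for example one taken with Percona XtraBackup) has already been restored on the joiner; the GTID set and credentials below are placeholders:

-- tell the joiner which GTIDs are already contained in the restored snapshot
RESET MASTER;
SET GLOBAL GTID_PURGED = '8a94f357-aab4-11df-86ab-c80aa9429562:1-12345';

-- point the recovery channel at valid replication credentials, then join
CHANGE MASTER TO MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'rpl_pass'
  FOR CHANNEL 'group_replication_recovery';
START GROUP_REPLICATION;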

The official documentation mentions that provisioning the node before adding it to the cluster may speed up joining (the recovery stage). Another difference is what happens when a state transfer fails: a Galera joiner aborts after the first try and shuts down its mysqld instance, while the Group Replication joiner falls back to another donor in an attempt to succeed. Here I found something slightly annoying: if no donor can satisfy the joiner’s demands, it still keeps trying the same donors over and over, for a fixed number of attempts:

[root@cd81c1dadb18 /]# grep 'Attempt' /var/log/mysqld.log |tail
2017-01-16T22:57:38.329541Z 12 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2017-01-16T22:57:38.539984Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
2017-01-16T22:57:38.806862Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 3/10'
2017-01-16T22:58:39.024568Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 4/10'
2017-01-16T22:58:39.249039Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 5/10'
2017-01-16T22:59:39.503086Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 6/10'
2017-01-16T22:59:39.736605Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 7/10'
2017-01-16T23:00:39.981073Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 8/10'
2017-01-16T23:00:40.176729Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 9/10'
2017-01-16T23:01:40.404785Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 10/10'

After the last try, even though it fails, mysqld keeps running and allows client connections…

Auto Increment Settings

Galera adjusts the auto_increment_increment and auto_increment_offset values according to the number of members in the cluster. So, for a 3-node cluster, auto_increment_increment will be "3" and auto_increment_offset will range from "1" to "3" (depending on the node). If the number of nodes changes later, these are updated immediately. This behavior can be disabled using the wsrep_auto_increment_control setting. If needed, these settings can also be set manually.

Interestingly, in Group Replication the auto_increment_increment seems to be fixed at 7, and only auto_increment_offset is set differently on each node. This is the case even in the default Single-Primary mode! This seems like a waste of available IDs, so make sure you adjust the group_replication_auto_increment_increment setting to a saner number before you start using Group Replication in production.
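
A minimal sketch of that adjustment (the value 3 assumes a three-node group, and the option has to be set before Group Replication is started):

[mysqld]
loose-group_replication_auto_increment_increment = 3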

Multi-Threaded Slave Side Applying

Galera developed its own multi-threaded slave feature, available even in the 5.5 versions, which works for workloads that include tables in the same database. It is controlled with the wsrep_slave_threads variable. Group Replication uses a feature introduced in MySQL 5.7, where the number of applier threads is controlled with slave_parallel_workers. Galera parallelizes applying based on potential conflicts of changed/locked rows, while Group Replication parallelism is based on an improved LOGICAL_CLOCK scheduler, which uses information from write-set dependencies. This can allow it to achieve much better results than the normal asynchronous replication MTS mode. More details can be found here: http://mysqlhighavailability.com/zooming-in-on-group-replication-performance/
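
As a rough sketch, the applier parallelism knobs on the two sides look like this (the values are only examples, and the two groups of settings are alternatives, not meant to be combined):

[mysqld]
# Group Replication / MySQL 5.7 multi-threaded applier
slave_parallel_type    = LOGICAL_CLOCK
slave_parallel_workers = 8

# Galera / Percona XtraDB Cluster applier threads
wsrep_slave_threads    = 8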

Flow Control

Both technologies use a technique to throttle writes when nodes are slow in applying them. Interestingly, the default size of the allowed applier queue is very different in the two: in Group Replication the flow control threshold defaults to 25000 transactions, while the Galera flow control limit defaults to only 16 write-sets.

Moreover, Group Replication provides a separate certifier queue size that can also trigger Flow Control: group_replication_flow_control_certifier_threshold. One thing I found difficult is checking the actual applier queue size, as the only one exposed via performance_schema.replication_group_member_stats is Count_Transactions_in_queue (which only shows the certifier queue).
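
A sketch of the checks I ended up relying on (the Galera status counter shows the local receive/applier queue, which is the closest equivalent):

-- Group Replication: certifier queue only
SELECT MEMBER_ID, COUNT_TRANSACTIONS_IN_QUEUE
  FROM performance_schema.replication_group_member_stats;

-- Galera / Percona XtraDB Cluster: local receive (applier) queue
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';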

Network Hiccup/Partition Handling

In Galera, when the network connection between nodes is lost, those who still have a quorum will form a new cluster view. Those who lost a quorum keep trying to re-connect to the primary component. Once the connection is restored, separated nodes will sync back using IST and rejoin the cluster automatically.

This doesn’t seem to be the case for Group Replication. Separated nodes that lose the quorum will be expelled from the cluster, and won’t rejoin automatically once the network connection is restored. In the expelled node’s error log we can see:

2017-01-17T11:12:18.562305Z 0 [ERROR] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2017-01-17T11:12:18.631225Z 0 [Note] Plugin group_replication reported: 'getstart group_id ce427319'
2017-01-17T11:12:21.735374Z 0 [Note] Plugin group_replication reported: 'state 4330 action xa_terminate'
2017-01-17T11:12:21.735519Z 0 [Note] Plugin group_replication reported: 'new state x_start'
2017-01-17T11:12:21.735527Z 0 [Note] Plugin group_replication reported: 'state 4257 action xa_exit'
2017-01-17T11:12:21.735553Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
2017-01-17T11:12:21.735558Z 0 [Note] Plugin group_replication reported: 'new state x_start'

Its status changes to:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 329333cd-d6d9-11e6-bdd2-0242ac130002 | f18ff539956d | 3306 | ERROR |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
1 row in set (0.00 sec)

It seems the only way to bring it back into the cluster is to manually restart Group Replication:

mysql> START GROUP_REPLICATION;
ERROR 3093 (HY000): The START GROUP_REPLICATION command failed since the group is already running.
mysql> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (5.00 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.96 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 24d6ef6f-dc3f-11e6-abfa-0242ac130004 | cd81c1dadb18 | 3306 | ONLINE |
| group_replication_applier | 329333cd-d6d9-11e6-bdd2-0242ac130002 | f18ff539956d | 3306 | ONLINE |
| group_replication_applier | ae148d90-d6da-11e6-897e-0242ac130003 | 0af7a73f4d6b | 3306 | ONLINE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
3 rows in set (0.00 sec)

Note that in the above output, after the network failure, Group Replication did not stop. It waits in an error state. Moreover, in Group Replication a partitioned node keeps serving dirty reads as if nothing happened (for non-super users):

cd81c1dadb18 {test} ((none)) > SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 24d6ef6f-dc3f-11e6-abfa-0242ac130004 | cd81c1dadb18 | 3306 | ERROR |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
1 row in set (0.00 sec)
cd81c1dadb18 {test} ((none)) > select * from test1.t1;
+----+-------+
| id | a |
+----+-------+
| 1 | dasda |
| 3 | dasda |
+----+-------+
2 rows in set (0.00 sec)
cd81c1dadb18 {test} ((none)) > show grants;
+-------------------------------------------------------------------------------+
| Grants for test@% |
+-------------------------------------------------------------------------------+
| GRANT SELECT, INSERT, UPDATE, DELETE, REPLICATION CLIENT ON *.* TO 'test'@'%' |
+-------------------------------------------------------------------------------+
1 row in set (0.00 sec)

A privileged user can disable super_read_only, but even then it won’t be able to write:

cd81c1dadb18 {root} ((none)) > insert into test1.t1 set a="split brain";
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.
cd81c1dadb18 {root} ((none)) > select * from test1.t1;
+----+-------+
| id | a |
+----+-------+
| 1 | dasda |
| 3 | dasda |
+----+-------+
2 rows in set (0.00 sec)

I found an interesting thing here, which I consider to be a bug. In this case, a partitioned node can actually perform DDL, despite the error:

cd81c1dadb18 {root} ((none)) > show tables in test1;
+-----------------+
| Tables_in_test1 |
+-----------------+
| nopk |
| t1 |
+-----------------+
2 rows in set (0.01 sec)
cd81c1dadb18 {root} ((none)) > create table test1.split_brain (id int primary key);
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.
cd81c1dadb18 {root} ((none)) > show tables in test1;
+-----------------+
| Tables_in_test1 |
+-----------------+
| nopk |
| split_brain |
| t1 |
+-----------------+
3 rows in set (0.00 sec)

In a Galera-based cluster, you are automatically protected from that, and a partitioned node refuses to allow both reads and writes. It throws the error ERROR 1047 (08S01): WSREP has not yet prepared node for application use. You can force dirty reads using the wsrep_dirty_reads variable.
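
For completeness, a sketch of allowing dirty reads on a non-primary Galera node (session scope shown here):

SET SESSION wsrep_dirty_reads = ON;
SELECT * FROM test1.t1; -- now succeeds even without a quorum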

There are many more subtle (and less subtle) differences between these technologies – but this blog post is long enough already. Maybe next time?

Article with Similar Subject

http://lefred.be/content/group-replication-vs-galera/

Feb
15
2017
--

Group Replication: Shipped Too Early

Group Replication

This blog post is my overview of Group Replication technology.

With Oracle clearly entering the “open source high availability solutions” arena with the release of their brand new Group Replication solution, I believe it is time to review the quality of the first GA (production ready) release.

TL;DR: Having examined the technology, it is my conclusion that Oracle seems to have released the GA version of Group Replication too early. While the product is definitely “working prototype” quality, the release seems rushed and unfinished. I found a significant number of issues, and I would personally not recommend it for production use.

It is obvious that Oracle is trying hard to ship technology to compete with Percona XtraDB Cluster, which is probably why they rushed to claim Group Replication GA quality.

If you’re all set to follow along and test Group Replication yourself, simplify the initial setup by using this Docker image. We can review some of the issues you might face together.

For the record, I tested the version based on MySQL 5.7.17 release.

No automatic provisioning

First off, you’ll find that there is NO way to automatically set up a new node.

If you need to set up a new node or recover an existing node from a fatal failure, you’ll need to manually provision the slave.

Of course, you can clone a slave using Percona XtraBackup or LVM by employing some self-developed scripts. But given the high availability nature of the product, one would expect Group Replication to automatically re-provision any failed node.

Bug: stale reads on nodes

Please see this bug:

One line summary: while any secondary nodes are "catching up" to whatever happened on the first node (it takes time to apply changes on secondary nodes), reads on a secondary node could return stale data (as shown in the bug report).

This behavior brings us back to the traditional asynchronous replication slave behavior (i.e., Group Replication’s predecessor).

It also contradicts the Group Replication documentation, which states: “There is a built-in group membership service that keeps the view of the group consistent and available for all servers at any given point in time.” (See https://dev.mysql.com/doc/refman/5.7/en/group-replication.html.)

I might also mention here that Percona XtraDB Cluster prevents stale reads (see https://www.percona.com/doc/percona-xtradb-cluster/5.7/wsrep-system-index.html#wsrep_sync_wait).
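
For comparison, this is roughly how causality checks are requested on the Percona XtraDB Cluster side (a sketch; the value 1 covers READ statements):

-- make reads wait until the node has applied all preceding write-sets
SET SESSION wsrep_sync_wait = 1;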

Bug: nodes become unusable after a big transaction, refusing to execute further transactions

There are two related bugs:

One line summary: after running a big transaction, any secondary nodes become unusable and refuse to perform any further transactions.

Obscure error messages

It is not uncommon to see cryptic error messages while testing Group Replication. For example:

mysql> commit;
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.

This is fairly useless and provides little help until I check the mysqld error log. The log provides a little bit more information:

2017-02-09T02:05:36.996776Z 18 [ERROR] Plugin group_replication reported: '[GCS] Gcs_packet's payload is too big. Only the packets smaller than 2113929216 bytes can be compressed.'

Discussion:

The items highlighted above might not seem too bad at first, and you could assume that your workload won’t be affected. However, stale reads and node dysfunctions basically prevent me from running a more comprehensive evaluation.

My recommendation:

If you care about your data, then I recommend not using Group Replication in production. Currently, it looks like it might cause plenty of headaches, and it is easy to get inconsistent results.

For the moment, Group Replication appears to be an advanced – but broken – form of traditional MySQL asynchronous replication.

I understand Oracle’s dilemma. Usually people are hesitant to test a product that is not GA. So in order to get feedback from users, Oracle needs to push the product to GA. Oracle must absolutely solve the issues above during future QA cycles.

Feb
15
2017
--

Docker Images for Percona Server for MySQL Group Replication

Group Replication

In this blog post, we’ll point to a new Docker image for Percona Server for MySQL Group Replication.

Our most recent release of Percona Server for MySQL (Percona Server for MySQL 5.7.17) comes with Group Replication plugins. Unfortunately, since this technology is very new, it requires some fairly complicated steps to set up and get running. To help with that process, I’ve prepared Docker images that simplify the setup procedure.

You can find the image here: https://hub.docker.com/r/perconalab/pgr-57/.

To start the first node (bootstrap the group):

docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 perconalab/pgr-57

To add nodes into the group after:

docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 -e CLUSTER_JOIN=CONTAINER_ID_FROM_THE_FIRST_STEP perconalab/pgr-57

You can also get a full script that starts “N” number of nodes, here: https://github.com/Percona-Lab/percona-docker/blob/master/pgr-57/start_node.sh
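
Once the containers are up, group membership can be verified from any node, for example with something like this (a sketch, assuming the mysql client is available inside the container and using the root password set above):

docker exec -ti CONTAINER_ID mysql -uroot -ppassw0rd \
  -e "SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members"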

 

Feb
01
2017
--

WAN Synchronous Clusters: Dealing with Latency Using Concurrency

WAN Latency

In this blog, we’ll discuss how to use concurrency to help with WAN latency when using synchronous clusters.

WAN Latency Problem

Our customers often ask us for help or advice with WAN clustering problems. Historically, the usual solution for MySQL WAN deployments is having the primary site in one data center, and a stand-by backup site in another data center (replicating from the primary asynchronously). These days, however, there is a huge desire to employ available synchronous replication solutions for MySQL. These solutions include things like Galera (i.e., Percona XtraDB Cluster) or the recently released MySQL Group Replication. This trend is attributable to the fact that these solutions are less problematic and provide more automatic failover and failback procedures. But it’s also because businesses want to write in both data centers simultaneously.

Unfortunately, WAN link reliability and latency makes the synchronous replication solution a big challenge. In many cases, these challenges force geographically separate data centers to still replicate asynchronously.

From a requirements point of view, the Galera founders’ official documentation has WAN-related recommendations and some dedicated options (like segments), as described in Jay’s blog post. So WAN deployments are absolutely possible, and even an advertised option, in Galera. The MySQL Group Replication team, however, seems to discourage such use cases, as we can read:

Group Replication is designed to be deployed in a cluster environment where server instances are very close to each other, and is impacted by both network latency as well as network bandwidth.

Remedy

While perhaps obvious to some, I would like to point out a simple dependency that might be a viable solution in some deployments that face significant network latency. That solution is concurrency! When you face limited write throughput due to transaction commit latency, you can employ more writer threads. By using separate connections to MySQL, you can commit more transactions at the same time overall.

Let me demonstrate with example results based on a very simple test case. I tested both Percona XtraDB Cluster (with Galera replication) and MySQL Group Replication. I configured a minimal cluster of three nodes in each case, running as Docker containers on the same host (simulating a WAN network). For this setup, latency is around 0.05ms on average. Then, I introduced an artificial network latency of 50ms and 100ms into one of the node’s network interfaces. I later repeated the same tests using VirtualBox VM instances, running on a completely different server. The results were very similar. The command to simulate additional network latency is:

# tc qdisc add dev eth0 root netem delay 50ms

The resulting delay in pings to the other nodes in the cluster:

# ping -c 2 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=50.0 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=50.1 ms
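
When the test run is finished, the artificial delay can be removed again with the matching delete command:

# tc qdisc del dev eth0 root netem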

The test is very simple: execute 500 small insert transactions, each inserting just a single row (but that is less relevant now).

For testing, I used a simple mysqlslap command:

mysqlslap --password=*** --host=$IP --user=root --delimiter=";" --number-of-queries=500 --create-schema=test --concurrency=$i --query="insert into t1 set a='fooBa'"

and a simple single table:

CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`a` char(5) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB

Interestingly, without the increased latency, the same test takes much longer against the Group Replication cluster, even though by default Group Replication runs with group_replication_single_primary_mode enabled and group_replication_enforce_update_everywhere_checks disabled. Theoretically, it should be a lighter operation from a "data consistency checks" point of view. Also, with WAN-type latencies, Percona XtraDB Cluster seems to be slightly faster in this particular test. Here are the test results for the three different network latencies:

XtraDB Cluster latency/seconds

Threads   100ms    50ms     0.05ms
1         51.671   25.583   0.268
4         13.936   8.359    0.187
8         7.84     4.18     0.146
16        4.641    2.353    0.13
32        2.33     1.16     0.122
64        1.808    0.925    0.098

GR latency/seconds

Threads   100ms    50ms     0.05ms
1         55.513   29.339   5.059
4         14.889   7.916    2.184
8         7.673    4.195    1.294
16        4.52     2.507    0.767
32        3.417    1.479    0.473
64        2.099    0.809    0.267

WAN latency

I used the same InnoDB settings for both clusters, with each node running in a separate Docker container or VirtualBox VM. Similar tests could produce quite different results on real production systems, where more CPU cores provide better multi-concurrency conditions.

My goal also wasn’t to benchmark Galera versus Group Replication, but rather to show that the same concurrency-to-write-throughput dependency applies to both technologies. I might be missing some tuning on the Group Replication side, so I don’t claim any verified winner here.

Just to provide some more details, I was using Percona XtraDB Cluster 5.7.16 and MySQL with Group Replication 5.7.17.

One important note: when you expect higher concurrency to provide better throughput, you must make sure the concurrency is not limited by server settings. For example, you must look at innodb_thread_concurrency (I used 0, so unlimited), slave_parallel_workers for GR and wsrep_slave_threads for Galera (among others related to IO operations, etc.).
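
A quick sketch of checking these limits on a running node (the comments reflect the values I used):

SHOW GLOBAL VARIABLES LIKE 'innodb_thread_concurrency'; -- 0 = unlimited
SHOW GLOBAL VARIABLES LIKE 'slave_parallel_workers';     -- Group Replication applier
SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';        -- Galera applier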

Apart from “concurrency tuning,” which could involve application changes if not architectural re-design, there are of course more possible optimizations for WAN environments. For example:

https://www.percona.com/blog/2016/03/14/percona-xtradb-cluster-in-a-high-latency-network-environment/ (to deal with latency)

and:

https://www.percona.com/blog/2013/12/19/automatic-replication-relaying-galera-3/

for saving/minimizing network utilization using binlog_row_image=minimal and other variables.

But these are out of the scope of this post. I hope this simple post helps you deal with the speed of light better!

Dec
16
2016
--

Percona Live 2017 Sneak Peek Schedule Up Now! See the Available Sessions!

Percona Live 2017

We are excited to announce that the sneak peek schedule for the Percona Live 2017 Open Source Database Conference is up! The Percona Live Open Source Database Conference 2017 is April 24th – 27th, at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The Percona Live Open Source Database Conference 2017 is the premier event for the rich and diverse MySQL, MongoDB and open source database ecosystems. This conference provides an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience.

Below are some of our top picks for MySQL, MongoDB and open source database sessions:

Tutorials

MySQL 101 Tracks

MongoDB 101 Tracks

Breakout Talks

Register for the Percona Live Open Source Database Conference here.

Early Bird Discounts

Just a reminder to everyone out there: our Early Bird discount rate for the Percona Live Open Source Database Conference 2017 is only available ‘til January 8, 2017, 11:30 pm PST! This rate gets you all the excellent and amazing opportunities that Percona Live offers, at a very reasonable price!

Sponsor Percona Live

Become a conference sponsor! We have sponsorship opportunities available for this annual MySQL, MongoDB and open source database event. Sponsors become a part of a dynamic and growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

Aug
08
2016
--

Docker Images for MySQL Group Replication 5.7.14

MySQL Group Replication

In this post, I will point you to Docker images for MySQL Group Replication testing.

There is a new release of the MySQL Group Replication plugin for MySQL 5.7.14. It’s a "beta" plugin and it is probably the last (or at least one of the final) pre-release packages before Group Replication goes GA (during Oracle OpenWorld 2016, in our best guess).

Since it is close to GA, it would be great to get a better understanding of this new technology. Unfortunately, the MySQL Group Replication installation process isn’t very user-friendly.

Or, to put it another way, totally un-user-friendly! It consists of a mere “50 easy steps” – by which I think they mean “easy” to mess up.

Matt Lord, in his post http://mysqlhighavailability.com/mysql-group-replication-a-quick-start-guide/, acknowledges: “getting a working MySQL service consisting of 3 Group Replication members is not an easy “point and click” or automated single command style operation.”

I’m not providing a review of MySQL Group Replication 5.7.14 yet – I need to play around with it a lot more. To make this process easier for myself, and hopefully more helpful to you, I’ve prepared Docker images for the testing of MySQL Group Replication.

Docker Images

To start the first node, run:

docker run -d --net=cluster1 --name=node1  perconalab/mysql-group-replication --group_replication_bootstrap_group=ON

To join all following nodes:

docker run -d --net=cluster1 --name=node2  perconalab/mysql-group-replication --group_replication_group_seeds='node1:6606'

Of course, you need to have Docker Network running:

docker network create cluster1

I hope this will make the testing process easier!

Oct
29
2015
--

MySQL 5.7 first impressions on group-replication

During the last few weeks I’ve been testing and playing a bit with the new group-replication plugin available for MySQL 5.7. Before continuing, I’d like to clarify some aspects: the plugin is only available in labs and is not yet ready for production. The current version is 0.6. I used 5.7.9 GA running in a 3-node Vagrant cluster with CentOS 7.
As an additional note, I’ve tested the previous version of the plugin (0.5) against 5.7.8-rc, and there are some good changes, so I recommend starting with the GA version.

For my tests I followed the instructions from this post. It’s not as straightforward as it looks; there were some issues that needed to be handled, but I finally managed to get a consistent 3-node cluster running:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

My impressions about installation

1- The previous version relied only on Corosync for group communication, meaning that you needed to install and configure an extra piece of software. For those who aren’t very familiar with it (like me), it can be a bit tricky and complex. From version 0.6.0 for MySQL 5.7.9, a new communication engine called XCom was added. This is now the default communication framework, and it is included in the plugin, so no extra pieces of software are needed.

2- When you initialize the MySQL database (the mysqld --initialize function now replaces the mysql_install_db script) you need to disable binary logging in the configuration file, otherwise information such as ‘create database mysql’ will be pushed to the binary logs and cause issues with nodes joining the cluster, due to errors like:
2015-10-21T20:18:52.059231Z 8 [Warning] Slave: Can't create database 'mysql'; database exists Error_code: 1007

3- In group replication there isn’t a concept like SST (State Snapshot Transfer) which basically drops and recreates the datadir if it finds data discrepancies. With group replication you may end up having different datasets and replication will continue working (more on this later in the post).

4- For Incremental State Transfer (a.k.a. IST in Galera), group replication relies on the binary logs present on any of the potential donors (at the moment, donor selection is done randomly). So, if a node is disconnected, when it comes back online it requests binary logs from the donor using the same IO thread as regular replication. The problem here is that if the binary log was purged on the donor, then the joiner can’t be recovered and a full rebuild is needed. This is a similar approach to the gcache in Galera, except that when the gcache cannot provide the transactions needed for IST, an SST is performed instead. Group replication can’t do this (yet?).

These are some of the installation issues I faced. Now that we have the cluster running, what works? Well let’s try some samples.

Simple write tests
I tried running simple write operations like a few inserts, create tables and so on using sysbench like this:

sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run

And I checked the status on the other nodes. It does what it is supposed to do; the data and records are found on the rest of the nodes, just like this:
Node1:

mysql> select * from sbtest1 order by 1 limit 4;
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 8 | 12410 | 05879259692-02024081996-76022818191-82933803603-81845875017-31928300264-16934042125-67052432228-92123768050-95121478647 | 79362588344-09017007031-35411714211-52054317597-26283585383 |
| 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 |
| 22 | 12544 | 57481185690-89398636500-16888148413-67987678267-15604944838-94210794401-18107184012-91338377776-83386272438-09451188763 | 35227182905-15234265621-59793845249-15413569710-23749555118 |
| 29 | 12436 | 29279855805-99348203463-85191104223-39587263726-81794340135-73817557808-54578801760-64404111877-55434439976-37212880746 | 59222897263-22759023974-22020489960-93434521232-77981152534 |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
4 rows in set (0.08 sec)

Node2

mysql> select * from sbtest1 order by 1 limit 4;
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 8 | 12410 | 05879259692-02024081996-76022818191-82933803603-81845875017-31928300264-16934042125-67052432228-92123768050-95121478647 | 79362588344-09017007031-35411714211-52054317597-26283585383 |
| 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 |
| 22 | 12544 | 57481185690-89398636500-16888148413-67987678267-15604944838-94210794401-18107184012-91338377776-83386272438-09451188763 | 35227182905-15234265621-59793845249-15413569710-23749555118 |
| 29 | 12436 | 29279855805-99348203463-85191104223-39587263726-81794340135-73817557808-54578801760-64404111877-55434439976-37212880746 | 59222897263-22759023974-22020489960-93434521232-77981152534 |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
4 rows in set (0.08 sec)

Well we expected this so, yay, we are fine.

What about trying to write to 2 nodes at the same time? This should exercise things like conflict resolution during certification. In a nutshell, if we expect to use group replication to write to multiple nodes at the same time, we need a way to resolve conflicts in the data. These are most common in PK/UK violations; in other words, 2 transactions trying to insert the same record/id. This is not recommended, because it is not an approach we can use to scale up writes (same as with Galera), but it’s still possible to do.

An easier way to test this is to run sysbench on more than one member of the cluster and wait for a failure. As expected, it does what it is supposed to do:
Node1

[root@node1 data]# sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run |grep tps
[ 1s] threads: 8, tps: 0.00, reads: 0.00, writes: 1416.75, response time: 25.65ms (95%), errors: 0.00, reconnects: 0.00
[ 2s] threads: 8, tps: 0.00, reads: 0.00, writes: 1893.78, response time: 20.94ms (95%), errors: 0.00, reconnects: 0.00
[ 3s] threads: 8, tps: 0.00, reads: 0.00, writes: 1421.19, response time: 28.44ms (95%), errors: 0.00, reconnects: 0.00
[ 4s] threads: 8, tps: 0.00, reads: 0.00, writes: 1397.83, response time: 34.92ms (95%), errors: 0.00, reconnects: 0.00
[ 5s] threads: 8, tps: 0.00, reads: 0.00, writes: 1734.31, response time: 22.75ms (95%), errors: 0.00, reconnects: 0.00

Node2

[root@node2 vagrant]# sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
ALERT: failed to execute MySQL query: `UPDATE sbtest6 SET k=k+1 WHERE id=12608`:
ALERT: Error 1180 Got error 149 during COMMIT
FATAL: failed to execute function `event': (null)
ALERT: failed to execute MySQL query: `UPDATE sbtest6 SET k=k+1 WHERE id=12468`:
ALERT: Error 1180 Got error 149 during COMMIT
FATAL: failed to execute function `event': (null)

Node2 eventually failed, but what happened? Let’s check the error log to see what it reports:

2015-10-27T17:12:33.894759Z 3 [Note] InnoDB: Blocking transaction: ID: 223804 - Blocked transaction ID: 223820 - MySQL thread id 59, OS thread handle 139838350866176, query id 197913 localhost root query end
UPDATE sbtest5 SET k=k+1 WHERE id=12510

At the commit stage there was a conflict with a transaction already committed on Node1, so the operation was failed and rolled back. So far, so good.

What about a node going down?
One of the tests I ran was to kill one of the nodes during operations, to see if it resumes replication properly when it comes back to life. For this we need to set a few variables in the configuration file as follows:

loose-group_replication_group_name="8a94f357-aab4-11df-86ab-c80aa9429562"
loose-group_replication_start_on_boot=1
loose-group_replication_recovery_user='rpl_user'
loose-group_replication_recovery_password='rpl_pass'

Note: Interestingly, the replication credentials are not saved into a table (as is done with slave_master_info in regular replication). I guess this is still on the to-do list, but it is something to keep in mind, since it implies a security risk.
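
As a side note, the recovery channel itself accepts credentials the same way any regular replication channel does, so something like the following might be an alternative to keeping the password in the configuration file. I have not verified whether this particular build actually picks up credentials set this way, so treat it as a sketch:

mysql> CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass'
    ->   FOR CHANNEL 'group_replication_recovery';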

Back to our test. I ran the regular sysbench command on one of the nodes, then went to node2 and killed the mysqld daemon. After the usual crash recovery messages we can see:

[root@node2 data]# killall mysqld
[root@node2 data]# tail -500 error.log
2015-10-27T17:15:26.460674Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2015-10-27T17:15:26.460711Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2015-10-27T17:15:26.461001Z 0 [Note] InnoDB: Waiting for purge to start
2015-10-27T17:15:26.514015Z 0 [Note] InnoDB: 5.7.9 started; log sequence number 77370253
2015-10-27T17:15:26.515417Z 0 [Note] Plugin 'FEDERATED' is disabled.
2015-10-27T17:15:26.525466Z 0 [Note] InnoDB: not started
2015-10-27T17:15:26.525914Z 0 [Note] InnoDB: Loading buffer pool(s) from /data/ib_buffer_pool
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configuring Xcom group: XCom Group ID=1827295128 Name=8a94f357-aab4-11df-86ab-c80aa9429562
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Peer Nodes: 192.168.70.2:10300
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Peer Nodes: 192.168.70.3:10300
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Total number of peers: 2
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Local Node: 192.168.70.3:10300
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Bootstrap: false
2015-10-27T17:15:26.723392Z 0 [Note] InnoDB: Buffer pool(s) load completed at 151027 17:15:26
2015-10-27T17:15:27.135089Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
2015-10-27T17:15:27.136449Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2015-10-27T17:15:27.145198Z 0 [Note] IPv6 is available.
2015-10-27T17:15:27.145247Z 0 [Note] - '::' resolves to '::';
2015-10-27T17:15:27.145265Z 0 [Note] Server socket created on IP: '::'.
2015-10-27T17:15:27.171615Z 0 [Warning] 'user' entry 'root@localhost' ignored in --skip-name-resolve mode.
2015-10-27T17:15:27.171711Z 0 [Warning] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve mode.
2015-10-27T17:15:27.172447Z 0 [Warning] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode.
2015-10-27T17:15:27.173089Z 0 [Warning] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode.
2015-10-27T17:15:27.192881Z 0 [Warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode.
2015-10-27T17:15:27.205764Z 0 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=node2-relay-bin' to avoid this problem.
2015-10-27T17:15:27.676222Z 1 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2015-10-27T17:15:27.685374Z 3 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './node2-relay-bin-group_replication_applier.000002' position: 51793711
2015-10-27T17:15:27.685985Z 0 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2015-10-27T17:15:27.686009Z 0 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
2015-10-27T17:15:27.686017Z 0 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 2'
[XCOM BINDING DEBUG] ::join()
connecting to 192.168.70.3 10300
state 0 action xa_init
connecting to 192.168.70.3 10300
connected to 192.168.70.3 10300
connecting to 192.168.70.3 10300
connected to 192.168.70.3 10300
...
[XCOM BINDING DEBUG] ::join():: I am NOT the boot node.
[XCOM BINDING DEBUG] ::join():: xcom_client_open_connection to 192.168.70.2:10300
connecting to 192.168.70.2 10300
connected to 192.168.70.2 10300
[XCOM BINDING DEBUG] ::join():: Calling xcom_client_add_node
cli_err 0
state 3489 action xa_snapshot
new state x_recover
state 3505 action xa_complete
new state x_run
get_nodeno(get_site_def()) = 2 task_now() = 1445966128.920615 n = 0 median_time() = 1.000000 executed_msg={84bca5ce 12385 1}
 [XCOM BINDING DEBUG] ::join():: GCS_OK
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
2015-10-27T17:15:28.926806Z 0 [Note] Event Scheduler: Loaded 0 events
2015-10-27T17:15:28.928485Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.9-log' socket: '/data/mysql.sock' port: 3306 MySQL Community Server (GPL)
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
[XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet.
....
[XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message
[XCOM_BINDING_DEBUG] ::process_possible_control_message()::Install new view
[XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data
[XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data
[XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data
2015-10-27T17:15:30.084101Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 46750407:7'
[XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 1382963399
2015-10-27T17:15:30.091414Z 5 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/86400'
2015-10-27T17:15:30.107078Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='node3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2015-10-27T17:15:30.117379Z 5 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor e5263489-7cb7-11e5-a8ee-0800275ff74d at node3 port: 3306.'
2015-10-27T17:15:30.118109Z 6 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2015-10-27T17:15:30.130001Z 7 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './node2-relay-bin-group_replication_recovery.000001' position: 4
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0
[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0
[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0
[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0
[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0
[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0
[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0
[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0
[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134
2015-10-27T17:15:30.169817Z 6 [ERROR] Slave I/O for channel 'group_replication_recovery': error connecting to master 'rpl_user@node3:3306' - retry-time: 60 retries: 1, Error_code: 1130
2015-10-27T17:15:30.169856Z 6 [Note] Slave I/O thread for channel 'group_replication_recovery' killed while connecting to master
2015-10-27T17:15:30.169862Z 6 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2015-10-27T17:15:30.174955Z 5 [ERROR] Plugin group_replication reported: 'There was an error when connecting to the donor server. Check group replication recovery's connection credentials.'
2015-10-27T17:15:30.175573Z 5 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/86400'
2015-10-27T17:15:30.178016Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='node3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='node1', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2015-10-27T17:15:30.189233Z 5 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor a0ef74a1-7cb3-11e5-845e-0800275ff74d at node1 port: 3306.'
2015-10-27T17:15:30.190787Z 8 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2015-10-27T17:15:30.610531Z 8 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'rpl_user@node1:3306',replication started in log 'FIRST' at position 4
get_nodeno(get_site_def()) = 2 task_now() = 1445966131.000425 n = 74 (n - old_n) / (task_now() - old_t) = 11.136156

During this process we can check the status on any node as follows:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | RECOVERING |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

Again, as expected, the node connected to the cluster, requested the binary logs from the latest executed GTID position, and applied the remaining changes to come back online.
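
If you want to double-check that a recovered member really caught up, comparing the executed GTID sets between members is a quick sanity check. A sketch (the GTID sets are placeholders, copied from the output of the first query on each node):

-- run on the recovered node and on a donor:
mysql> SELECT @@global.gtid_executed\G
-- then, substituting the two sets:
mysql> SELECT GTID_SUBTRACT('<gtid_executed from donor>',
    ->                      '<gtid_executed from recovered node>') AS missing;
-- an empty result means the recovered node is not missing anything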

The final test I’ve done so far is about data consistency. For example, what if I stop group replication on a node and make some data changes? When the node rejoins the group, will it send those changes?

Let’s see a very simple example:
Node2:

mysql> select * from sbtest1 where id=15;
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.12 sec)

Node1:

mysql> select * from sbtest1 where id=15;
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> stop group_replication;
Query OK, 0 rows affected (0.03 sec)
mysql> delete from sbtest1 where id=15;
Query OK, 1 row affected (0.02 sec)
mysql> start group_replication;
Query OK, 0 rows affected (0.02 sec)
mysql> select * from sbtest1 where id=15;
Empty set (0.00 sec)

And now Node2 again:

mysql> select * from sbtest1 where id=15;
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

Hmmmm, not cool, what if I try to remove a row from Node2?

mysql> delete from sbtest1 where id=15;
Query OK, 1 row affected (0.01 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

Hmmmm, strange, everything seems to be working correctly. Is it? Let’s check node1 again:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.01 sec)
mysql> stop group_replication;
Query OK, 0 rows affected (4.01 sec)
mysql> start group_replication;
Query OK, 0 rows affected (2.41 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | OFFLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
1 row in set (0.00 sec)
mysql> exit
Bye
[root@node1 data]# tail -100 error.log
2015-10-27T17:52:50.075274Z 15 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Delete_rows event on table test.sbtest1; Can't find record in 'sbtest1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 346, Error_code: 1032
2015-10-27T17:52:50.075274Z 15 [Warning] Slave: Can't find record in 'sbtest1' Error_code: 1032
2015-10-27T17:52:50.075274Z 15 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0
2015-10-27T17:52:50.075294Z 2 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2015-10-27T17:52:50.075308Z 2 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
2015-10-27T17:52:50.075312Z 2 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1'
[XCOM BINDING DEBUG] ::join()
connecting to 192.168.70.2 10300
state 3489 action xa_init
connecting to 192.168.70.2 10300
....
XCOM BINDING DEBUG] ::join():: I am NOT the boot node.
[XCOM BINDING DEBUG] ::join():: Skipping own address.
[XCOM BINDING DEBUG] ::join():: xcom_client_open_connection to 192.168.70.3:10300
connecting to 192.168.70.3 10300
connected to 192.168.70.3 10300
[XCOM BINDING DEBUG] ::join():: Calling xcom_client_add_node
cli_err 0
state 3489 action xa_snapshot
new state x_recover
state 3505 action xa_complete
new state x_run
[XCOM BINDING DEBUG] ::join():: GCS_OK
[XCOM_BINDING_DEBUG] ::cb_xcom_receive_global_view():: message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 0
[XCOM_BINDING_DEBUG] ::cb_xcom_receive_global_view():: node set: peer: 0 flag: 1 peer: 1 flag: 1 peer: 2 flag: 1
[XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: Processing new view on Handler
[XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: My node_id is 2
[XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: State Exchange started.
get_nodeno(get_site_def()) = 2 task_now() = 1445968372.450627 n = 0 median_time() = 1.000000 executed_msg={84bca5ce 12585 1}
 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 1
[XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message
[XCOM_BINDING_DEBUG] ::process_possible_control_message()::Still waiting for more State Exchange messages
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 2
[XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message
[XCOM_BINDING_DEBUG] ::process_possible_control_message()::Still waiting for more State Exchange messages
[XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12586 message_id.node= 0
[XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message
[XCOM_BINDING_DEBUG] ::process_possible_control_message()::Install new view
[XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data
[XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data
[XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data
2015-10-27T17:52:52.455340Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 46750407:11'
2015-10-27T17:52:52.456474Z 16 [ERROR] Plugin group_replication reported: 'Can't evaluate the group replication applier execution status. Group replication recovery will shutdown to avoid data corruption.'
2015-10-27T17:52:52.456503Z 16 [ERROR] Plugin group_replication reported: 'Fatal error during the Recovery process of Group Replication. The server will leave the group.'
[XCOM BINDING DEBUG] ::leave()
[XCOM BINDING DEBUG] ::leave():: Skipping own address.
[XCOM BINDING DEBUG] ::leave():: xcom_client_open_connection to 192.168.70.3:10300
connecting to 192.168.70.3 10300
[XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 750756806
connected to 192.168.70.3 10300
[XCOM BINDING DEBUG] ::leave():: Calling xcom_client_remove_node
cli_err 0
handle_remove_node /export/home2/pb2/build/sb_0-16846472-1445524610.82/build/BUILD/mysql-server/plugin/group_replication/gcs/src/bindings/xcom/xcom/xcom_base.c:1987 nodes: 0x3d05fa8 nodes->node_list_len = 1 nodes->node_list_val: 0x3da7da0 node_address n.address: 0x3d238d0 192.168.70.2:10300
getstart group_id 84bca5ce
state 3551 action xa_terminate
new state x_start
state 3489 action xa_exit
Exiting xcom thread
new state x_start
[XCOM BINDING DEBUG] ::leave():: Installing Leave view
[XCOM_BINDING_DEBUG] ::install_view():: No exchanged data
[XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 750756806
[XCOM BINDING DEBUG] ::leave():: Exiting with error=GCS_OK

So it looks like a member with data inconsistencies might erroneously be reported as ONLINE, but whenever group replication is restarted on it, recovery will fail and it won’t be able to rejoin the cluster. It seems there should be better error handling when a data inconsistency is found.
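
Until that improves, one way to catch this kind of local, non-replicated change before restarting group replication is to look for errant transactions. Group transactions are tagged with the group name as their GTID UUID, while statements executed with the plugin stopped get the server’s own server_uuid, so subtracting a healthy member’s set from the suspect member’s set should expose them. A sketch (the quoted set is a placeholder):

-- on the suspect member:
mysql> SELECT @@global.gtid_executed\G
-- then compare against a healthy member's set:
mysql> SELECT GTID_SUBTRACT(@@global.gtid_executed,
    ->        '<gtid_executed copied from a healthy member>') AS errant_transactions;
-- anything returned here was executed locally and never replicated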

What about the operational perspective?
It looks very limited so far: just a few variables and status counters, plus a couple of status tables in performance_schema, as follows:

mysql> show global variables like '%group_repli%';
+---------------------------------------------------+---------------------------------------+
| Variable_name | Value |
+---------------------------------------------------+---------------------------------------+
| group_replication_allow_local_lower_version_join | OFF |
| group_replication_auto_increment_increment | 7 |
| group_replication_bootstrap_group | OFF |
| group_replication_components_stop_timeout | 31536000 |
| group_replication_gcs_engine | xcom |
| group_replication_group_name | 8a94f357-aab4-11df-86ab-c80aa9429562 |
| group_replication_local_address | 192.168.70.3:10300 |
| group_replication_peer_addresses | 192.168.70.2:10300,192.168.70.3:10300 |
| group_replication_pipeline_type_var | STANDARD |
| group_replication_recovery_complete_at | TRANSACTIONS_CERTIFIED |
| group_replication_recovery_password | |
| group_replication_recovery_reconnect_interval | 60 |
| group_replication_recovery_retry_count | 86400 |
| group_replication_recovery_ssl_ca | |
| group_replication_recovery_ssl_capath | |
| group_replication_recovery_ssl_cert | |
| group_replication_recovery_ssl_cipher | |
| group_replication_recovery_ssl_crl | |
| group_replication_recovery_ssl_crlpath | |
| group_replication_recovery_ssl_key | |
| group_replication_recovery_ssl_verify_server_cert | OFF |
| group_replication_recovery_use_ssl | OFF |
| group_replication_recovery_user | rpl_user |
| group_replication_start_on_boot | ON |
+---------------------------------------------------+---------------------------------------+
24 rows in set (0.00 sec)
mysql> show global status like '%group_repli%';
+-----------------------------+-------+
| Variable_name | Value |
+-----------------------------+-------+
| Com_group_replication_start | 8 |
| Com_group_replication_stop | 7 |
+-----------------------------+-------+
2 rows in set (0.01 sec)
mysql> show tables from performance_schema like '%group%';
+----------------------------------------+
| Tables_in_performance_schema (%group%) |
+----------------------------------------+
| replication_group_member_stats |
| replication_group_members |
+----------------------------------------+
2 rows in set (0.00 sec)

Most of the values above are self-descriptive. I still need to dig in a bit more to figure out what some of them do.
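
The replication_group_member_stats table is the closest thing to certification monitoring at the moment; dumping it vertically is the easiest way to eyeball the counters (the exact column set may vary between builds):

mysql> SELECT * FROM performance_schema.replication_group_member_stats\G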

Conclusions:

So far the work done on group replication is very impressive. Of course, there is still a long road to travel, and at this point a side-by-side comparison between group replication and Galera doesn’t seem fair.

Even though I like the idea of reusing an established component, I don’t like the need to install and configure Corosync, because it’s another piece of software that could eventually fail. Fortunately this can be avoided with the newer version of the plugin, which can use the new XCom communication framework. I tested both versions, and XCom is far easier to set up and configure; however, the error log can become very verbose, maybe too verbose in my opinion.
With regards to installation and configuration, it’s pretty easy once you find the proper way to do it. There are only a few variables to configure to get a working cluster, and most of the settings work just fine by default (like group_replication_auto_increment_increment).

I would still like to see some automatic handling of data inconsistencies (like SST in Galera), but in my opinion this new feature is a good approach to consider when looking for high availability solutions in the future. A lot of tests still need to be done, and I’d also like to see some benchmarks. These are just my first impressions, and we should wait some time before this feature reaches GA. Paraphrasing that song: “it’s a long way to the top if you wanna rock ‘n’ roll.”

The post MySQL 5.7 first impressions on group-replication appeared first on MySQL Performance Blog.
