Jul
20
2018
--

InnoDB Cluster in a Nutshell Part 3: MySQL Shell

MySQL InnoDB Cluster MySQL Shell

MySQL InnoDB Cluster MySQL ShellWelcome to the third part of this series. I’m glad you’re still reading, as hopefully this means you find this subject interesting at least. Previously we presented the first two components of MySQL InnoDB Cluster: Group Replication and MySQL Router and now we will discuss the last component, MySQL Shell.

MySQL Shell

This is the last component in the cluster and I love it. Oracle have created this tool to centralize cluster management, providing a friendly, command-line based user interface.

The tool can be defined as an advanced MySQL shell, which is much more powerful than the well known MySQL client. With the capacity to work with both relational and document (JSON) data, the tool provides an extended capability to interact with the database from a single place.

MySQL Shell is also able to understand different languages:

  • JavaScript (default) which includes several built-in functions to administer the cluster—create, destroy, restart, etc.—in a very easy way.
  • Python it provides an easy way to write Python code to interact with the database. This is particularly useful for developers who don’t need to have SQL skills or run applications to test code.
  • SQL to work in classic mode to query database as we used to do with the old MySQL client.

A very interesting feature provided with MySQL Shell is the ability to establish different connections to different servers/clusters from within the same shell. There is no need to exit to connect to a different server, just issuing the command \connect will make this happen. As DBA, I find this pretty useful when handling multiple clusters/servers.

Some of the features present in this tool:

  • Capacity to use both Classic and X protocols.
  • Online switch mode to change languages (JavaScript, Python and SQL)
  • Auto-completion of commands using tab, a super expected feature in MySQL client.
  • Colored formatting output that also supports different formats like Table, Tab-separated and Json formats.
  • Batch mode that processes batches of commands allowing also an interactive mode to print output according each line is processed.

Some sample commands

Samples of new tool and execution modes:

#switch modes
\sql
\js
\py
#connect to instance
\connect user@host:[port]
#create a cluster (better to handle through variables)
var cluster=dba.createCluster('percona')
#add instances to cluster
cluster.addInstance(‘root@192.168.70.2:3306’)
#check cluster status
cluster.status()
#using another variable
var cluster2=dba.getCluster(‘percona’)
cluster.status()
#get cluster structure
cluster.describe()
#rejoin instance to cluster - needs to be executed locally to the instance
cluster.rejoinInstance()
#rejoin instance to cluster - needs to be executed locally to the instance
cluster.rejoinInstance()
#recover from lost quorum
cluster.forceQuorumUsingPartitionOf(‘root@localhost:3306’)
#recover from lost quorum
cluster.rebootClusterFromCompleteOutage()
#destroy cluster
cluster.dissolve({force:true});

Personally, I think this tool is a very good replacement for the classic MySQL client. Sadly, mysql-server installations do not include MySQL shell by default, but it is worth getting used to. I recommend you try it.

Conclusion

We finally reached the end of this series. I hope you have enjoyed this short introduction to what seems to be Oracle’s bid to have a built-in High Availability solution based on InnoDB. It may become a good competitor to Galera-based solutions. Still, there is a long way to go, as the tool was only just released as GA (April 2018). There are a bunch of things that need to be addressed before it becomes consistent enough to be production-ready. In my personal opinion, it is not—yet. Nevertheless, I think it is a great tool that will eventually be a serious player in the HA field as it’s an excellent, flexible and easy to deploy solution.

The post InnoDB Cluster in a Nutshell Part 3: MySQL Shell appeared first on Percona Database Performance Blog.

Jul
09
2018
--

InnoDB Cluster in a nutshell – Part 1

innodb cluster in a nutshell part 1

Since MySQL 5.7 we have a new player in the field, MySQL InnoDB Cluster. This is an Oracle High Availability solution that can be easily installed over MySQL to get High Availability with multi-master capabilities and automatic failover.

This solution consists in 3 components: InnoDB Group Replication, MySQL Router and MySQL Shell, you can see how these components interact in this graphic:

Graphic describing InnoDB Group Replication, MySQL Router and MySQL Shell

In this three blog post series, we will cover each of this components to get a sense of what this tool provides and how it can help with architecture decisions.

Group Replication

This is the actual High Availability solution, and a while ago I wrote a short review of it when it still was in its labs stage. It has improved a lot since then.

This solution is based on a plugin that has to be installed (not installed by default) and works on the top of built-in replication. So it relies on binary logs and relay logs to apply writes to members of the cluster.

The main concept about this new type of replication is that all members of a cluster (i.e. each node) are considered equals. This means there is no master-slave (where slaves follow master) but members that apply transactions based on a consensus algorithm. This algorithm forces all members of a cluster to commit or reject a given transaction following decisions made by each individual member.

In practical terms, this means each member of the cluster has to decide if a transaction can be committed (i.e. no conflicts) or should be rolled back but all other members follow this decision. In other words, the transaction is either committed or rolled back according to the majority of members in a consistent state.

To achieve this, there is a service that exposes a view of cluster status indicating what members form the cluster and the current status of each of them. Additionally Group Replication requires GTID and Row Based Replication (or

binlog_format=ROW

 ) to distribute each writeset with row changes between members. This is done via binary logs and relay logs but before each transaction is pushed to binary/relay logs it has to be acknowledged by a majority of members of the clusters, in other words through consensus. This process is synchronous, unlike legacy replication. After a transaction is replicated we have a certification process to commit the transaction, and thus making it durable.

Here appears a new concept, the certification process, which is the process that confirms if a writeset can be applied/committed (i.e. a row change can be done without conflicts) after replication of the transaction is complete.

Basically this process consists of inspecting writesets to check if there are conflicts (i.e. same row updated by concurrent transactions). Based on an order set in the writeset, the conflict is resolved by ‘first-commiter wins’ while the second is rolled back in the originator. Finally, the transaction is pushed to binary/relay logs and committed.

Solution features

Some other features provided by this solution are:

  • Single-primary or multi-primary modes meaning that the cluster can operate with a single writer and multiple readers (recommended and default setup); or with multiple writers where all nodes are capable to accept write transactions. The latter is at the cost of a performance penalty due to conflict resolution.
  • Automatic failure detection, where an internal mechanism is able to detect a failed node (i.e. a crash, network problems, etc) and decide to exclude it from the cluster automatically. Also if a member can’t communicate with the cluster and gets isolated, it can’t accept transactions. This ensures that cluster data is not impacted by this situation.
  • Fault tolerance. This is the strategy that the cluster uses to support failing members. As described above, this is based on a majority. A cluster needs at least three members to support one node failure because the other two members will keep the majority. The bigger the number of nodes, the bigger the number of failing nodes the cluster supports. The maximum number of members (nodes) in a cluster is currently limited to 7. If it has seven members, then the majority is kept by four or more active members. In other words, a cluster of seven would support up to three failing nodes.

We will not cover installation and configuration aspects now. This will probably come with a new series of blogs where we can cover not only deployment but also use cases and so on.

In the next post we will talk about the next cluster component: MySQL Router, so stay tuned.

The post InnoDB Cluster in a nutshell – Part 1 appeared first on Percona Database Performance Blog.

Aug
01
2017
--

Group Replication: the Sweet and the Sour

Group Replication

In this blog, we’ll look at group replication and how it deals with flow control (FC) and replication lag. 

Overview

In the last few months, we had two main actors in the MySQL ecosystem: ProxySQL and Group-Replication (with the evolution to InnoDB Cluster). 

While I have extensively covered the first, my last serious work on Group Replication dates back to some lab version years past.

Given that Oracle decided to declare it GA, and Percona’s decision to provide some level of Group Replication support, I decided it was time for me to take a look at it again.

We’ve seen a lot of coverage already too many Group Replication topics. There are articles about Group Replication and performance, Group Replication and basic functionalities (or lack of it like automatic node provisioning), Group Replication and ProxySQL, and so on.

But one question kept coming up over and over in my mind. If Group Replication and InnoDB Cluster have to work as an alternative to other (virtually) synchronous replication mechanisms, what changes do our customers need to consider if they want to move from one to the other?

Solutions using Galera (like Percona XtraDB Cluster) must take into account a central concept: clusters are data-centric. What matters is the data and the data state. Both must be the same on each node at any given time (commit/apply). To guarantee this, Percona XtraDB Cluster (and other solutions) use a set of data validation and Flow Control processes that work to the ensure a consistent cluster data set on each node.

The upshot of this principle is that an application can query ANY node in a Percona XtraDB Cluster and get the same data, or write to ANY node and know that the data is visible everywhere in the cluster at (virtually) the same time.

Last but not least, inconsistent nodes should be excluded and either rebuild or fixed before rejoining the cluster.

If you think about it, this is very useful. Guaranteeing consistency across nodes allows you to transparently split write/read operations, failover from one node to another with very few issues, and more.

When I conceived of this blog on Group Replication (or InnoDB Cluster), I put myself in the customer shoes. I asked myself: “Aside from all the other things we know (see above), what is the real impact of moving from Percona XtraDB Cluster to Group Replication/InnoDB Cluster for my application? Since Group Replication still (basically) uses replication with binlogs and relaylog, is there also a Flow Control mechanism?” An alarm bell started to ring in my mind.

My answer is: “Let’s do a proof of concept (PoC), and see what is really going on.”

The POC

I setup a simple set of servers using Group Replication with a very basic application performing writes on a single writer node, and (eventually) reads on the other nodes. 

You can find the schema definition here. Mainly I used the four tables from my windmills test suite — nothing special or specifically designed for Group Replication. I’ve used this test a lot for Percona XtraDB Cluster in the past, so was a perfect fit.

Test Definition

The application will do very simple work, and I wanted to test four main cases:

  1. One thread performing one insert at each transaction
  2. One thread performing 50 batched inserts at each transaction
  3. Eight threads performing one insert to each transaction
  4. Eight threads performing 50 batched inserts at each transaction

As you can see, a pretty simple set of operations. Then I decided to test it using the following four conditions on the servers:

  1. One slave worker FC as default
  2. One slave worker FC set to 25
  3. Eight slave workers FC as default
  4. Eight slave workers FC set to 25

Again nothing weird or strange from my point of view. I used four nodes:

  1. Gr1 Writer
  2. Gr2 Reader
  3. Gr3 Reader minimal latency (~10ms)
  4. Gr4 Reader minimal latency (~10ms)

Finally, I had to be sure I measured the lag in a way that allowed me to reference it consistently on all nodes. 

I think we can safely say that the incoming GTID (last_ Received_transaction_set from replication_connection_status) is definitely the last change applied to the master that the slave node knows about. More recent changes could have occurred, but network delay can prevent them from being “received.” The other point of reference is GTID_EXECUTED, which refers to the latest GTID processed on the node itself.

The closest query that can track the distance will be:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX( @@global.GTID_EXECUTED,':',-2),':',1),'-',-1) last_executed;select  @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX( Received_transaction_set,':',-2),':',1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier'; select (@last_rec - @last_exec) as real_lag

Or in the case of a single worker:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX( @@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;select  @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier'; select (@last_rec - @last_exec) as real_lag;

The result will be something like this:

+---------------+
| last_executed |
+---------------+
| 23607         |
+---------------+
+---------------+
| last_received |
+---------------+
| 23607         |
+---------------+
+----------+
| real_lag |
+----------+
|        0 |
+----------+

The whole set of tests can be found here, with all the commands you need to run the application (you can find it here) and replicate the tests. I will focus on the results (otherwise this blog post would be far too long), but I invite you to see the details.

The Results

Efficiency on Writer by Execution Time and Rows/Sec

Using the raw data from the tests (Excel spreadsheet available here), I was interested in identifying if and how the Writer is affected by the use of Group Replication and flow control.

Reviewing the graph, we can see that the Writer has a linear increase in the execution time (when using default flow control) that matches the increase in the load. Nothing there is concerning, and all-in-all we see what is expected if the load is light. The volume of rows at the end justifies the execution time.

It’s a different scenario if we use flow control. The execution time increases significantly in both cases (single worker/multiple workers). In the worst case (eight threads, 50 inserts batch) it becomes four times higher than the same load without flow control.

What happens to the inserted rows? In the application, I traced the rows inserted/sec. It is easy to see what is going on there:

We can see that the Writer with flow control activated inserts less than a third of the rows it processes without flow control. 

We can definitely say that flow control has a significant impact on the Writer performance. To clarify, let’s look at this graph:

Without flow control, the Writer processes a high volume of rows in a limited amount of time (results from the test of eight workers, eight threads, 50 insert batch). With flow control, the situation changes drastically. The Writer takes a long time processing a significantly smaller number of rows/sec. In short, performance drops significantly.

But hey, I’m OK with that if it means having a consistent data-set cross all nodes. In the end, Percona XtraDB Cluster and similar solutions pay a significant performance price match the data-centric principle. 

Let’s see what happen on the other nodes.

Entries Lag

Well, this scenario is not so good:

When NOT using flow control, the nodes lag behind the writer significantly. Remember that by default flow control in Group Replication is set to 25000 entries (I mean 25K of entries!!!).

What happens is that as soon as I put some salt (see load) on the Writer, the slave nodes start to lag. When using the default single worker, that will have a significant impact. While using multiple workers, we see that the lag happens mainly on the node(s) with minimal (10ms) network latency. The sad thing is that is not really going down with respect to the single thread worker, indicating that the simple minimal latency of 10ms is enough to affect replication.

Time to activate the flow control and have no lag:

Unfortunately, this is not the case. As we can see, the lag of single worker remains high for Gr2 (154 entries). While using multiple workers, the Gr3/4 nodes can perform much better, with significantly less lag (but still high at ~1k entries).

It is important to remember that at this time the Writer is processing one-third or less of the rows it is normally able to. It is also important to note that I set 25 to the entry limit in flow control, and the Gr3 (and Gr4) nodes are still lagging more than 1K entries behind.

To clarify, let check the two graphs below:

Using the Writer (Master) as a baseline in entry #N, without flow control, the nodes (slaves) using Group Replication start to significantly lag behind the writer (even with a light load).

The distance in this PoC ranged from very minimal (with 58 entries), up to much higher loads (3849 entries):

Using flow control, the Writer (Master) diverges less, as expected. If it has a significant drop in performance (one-third or less), the nodes still lag. The worst-case is up to 1363 entries. 

I need to underline here that we have no further way (that I am aware of, anyway) to tune the lag and prevent it from happening.

This means an application cannot transparently split writes/reads and expect consistency. The gap is too high.

A Graph That Tells Us a Story

I used Percona Monitoring and Management (PMM) to keep an eye on the nodes while doing the tests. One of the graphs really showed me that Group Replication still has some “limits” as the replication mechanism for a cluster:

This graph shows the MySQL queries executed on all the four nodes, in the testing using 8-50 threads-batch and flow control. 

As you can see, the Gr1 (Writer) is the first one to take off, followed by Gr2. Nodes Gr3 and Gr4 require a bit more, given the binlog transmission (and 10ms delay). Once the data is there, they match (inconsistently) the Gr2 node. This is an effect of flow control asking the Master to slow down. But as previously seen, the nodes will never match the Writer. When the load test is over, the nodes continue to process the queue for additional ~130 seconds. Considering that the whole load takes 420 seconds on the Writer, this means that one-third of the total time on the Writer is spent syncing the slave AFTERWARDS.

The above graph shows the same test without flow control. It is interesting to see how the Writer moved above 300 queries/sec, while G2 stayed around 200 and Gr3/4 far below. The Writer was able to process the whole load in ~120 seconds instead 420, while Gr3/4 continue to process the load for an additional ~360 seconds.

This means that without flow control set, the nodes lag around 360 seconds behind the Master. With flow control set to 25, they lag 130 seconds.

This is a significant gap.

Conclusions

Going back to the reason why I was started this PoC, it looks like my application(s) are not a good fit for Group Replication given that I have set Percona XtraDB Cluster to scale out the reads and efficiently move my writer to another when I need to. 

Group Replication is still based on asynchronous replication (as my colleague Kenny said). It makes sense in many other cases, but it doesn’t compare to solutions based on virtually synchronous replication. It still requires a lot of refinement.

On the other hand, for applications that can afford to have a significant gap between writers and readers it is probably fine. But … doesn’t standard replication already cover that? 

Reviewing the Oracle documentations (https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html), I can see why Group Replication as part of the InnoDB cluster could help improve high availability when compared to standard replication. 

But I also think it is important to understand that Group Replication (and derived solutions like InnoDB cluster) are not comparable or a replacement for data-centric solutions as Percona XtraDB Cluster. At least up to now.

Good MySQL to everyone.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com