Dec
19
2017
--

Webinar Q&A: Percona XtraDB Cluster 101

Percona XtraDB Cluster

Percona XtraDB ClusterIn this blog, we will answer questions from our webinar on Percona XtraDB Cluster 101.

Recently (7 Dec 2017) I presented a webinar about Percona XtraDB Cluster 101. Firstly, thanks to all the attendees: we had a great webinar with quite some interesting questions and feedback.

Through this blog, I’ll answer most of the questions that were raised during the webinar.

Q. How does the need for the acknowledgment from other nodes affect the speed of writes?

A. There are two parts to replication: delivering a transaction (including acknowledgment) and applying the transaction. Generally, the first part is pretty quick and dictated by the network latency. The second part is time-consuming, but happens asynchronously. So acknowledging a transaction from other nodes is not that time-consuming.

Q. How can geo-distributed nodes affect the speed of writes?

A. The longest node dictates cluster performance (in terms of latency). You can’t write faster than the time it takes for a packet to reach the longest node (round-trip-latency). So geo-distribution does affect write performance.

Q. Would you consider Master -> Slave replication in RDS a traditional replication of MySQL? And how easy is it replicating from PXC to RDS if its possible

A. If an application doesn’t need high availability, then a user can explore the MASTER-SLAVE replication. But I would argue that if I am going to spend time booting two servers (MASTER and SLAVE), then why not boot both as MASTER (through Percona XtraDB Cluster). This ensures HA and write-scalability. Percona XtraDB Cluster is flexible for all topologies, and can act as async-master or async-slave too.

Q. Moving forward, is there a plan to deal with version control tools like Flyway that still uses Get locks?

A. Statements like GET_LOCK that establish local locks at the said node are not cluster-safe, so they are blocked with

pxc_strict_mode=ENFORCING

 and not recommended for use. With that said, if the application/user tries to use these statements in a non-conflicting way (with the load directed to single master) it could still work.

Q. With an ASYNC slave, can you use GTID? I know there is a bug(s) that prevent this currently from working 100% in MariaDB (though the bug is close to being fixed – MDEV-10715)?

A. Yes, you can use GTID with async-slave. MariaDB has different implementation of GTID so I am not in a position to comment on the latter part.

Q. Do tables need to have a primary key for the cluster to work?

A. Yes, all tables that you plan to use in a cluster should have a primary key (

pxc_strict_mode=ENFORCING

 enforces this criterion). This is mainly needed for conflict resolution, when the same conflicting workload is executed on multiple-nodes.

Q. What is the best wsrep_sst_method? (for huge database)

A. Percona XtraDB Cluster recommends using XtraBackup. It doesn’t lock the tables for the complete SST life-time, so you can continue to use the node while it is acting as DONOR.

Q. Is the “show processlist” node-specific? Is there an equivalent command to show the whole cluster process list?

A. Yes, show processlist is node specific. There is currently no way to cluster-wide-processlist.

Q. does PXC support partitioned tables?

A. Yes, using InnoDB native partitioning.

Q. These nodes (PXC-nodes) are api nodes or data nodes ?

A. Data nodes.

Q. If cluster went down then everytime it follow SST/IST?

A. It depends. If there is DONOR that has a missing write-set, then the node can rejoin through IST else SST.

Q. Around how much time it will take to join the cluster?

A. The time a node takes to join back depends on the size of the data. Generally, the time for SST is longer than IST. The good part is with 5.7.17+ we have considerably reduced the time for IST, so that a node can join faster than before.

Q. How does IST (incremental state transfer) process affect cluster performance?

A. IST is asynchronous and doesn’t emit FLOW_CONTROL, so cluster can continue to perform as normal. A small slice of DONOR bandwidth is used to send data to the JOINER, but it is not that significant to affect the overall cluster performance.

Q. How do you handle a situation when three simultaneous transactions try to insert auto_increment value?

A. Percona XtraDB Cluster has a concept of wsrep_auto_increment_control that adjust the increment size on each node based on a number of nodes in the cluster. Please check this link for more details.

Q. Imagine that a table A has a trigger on insert that inserts data into another table B. And there are two concurrent transactions: TA inserts into table A (and the trigger makes an insert into B) and TB that inserts the same data directly into B. Will such a conflict – insert from TB and from trigger – be detected?

A. Yes. A transaction can touch multiple data-objects and when the conflict resolution is done, it will check all the objects that transaction is planning to modify before certifying a transaction is safe to apply.

Q. How PXC will make sure of data integrity with parallel processing?

A. Percona XtraDB Cluster has conflict resolution protocol. This protocol is based on FIRST COMMITTER WIN principle that ensures only the first transaction (from a group of a conflicting transaction) commits to cluster.

Q. I’ve created a three node cluster and replication is working. I’d like to copy our production data to the cluster since exporting and importing from MySQL takes a long time. Should I have waited to bootstrap the cluster until the data directory is transferred?

A. If you already have a cluster in place then you are simply adding new tables to the cluster. You can start adding (LOADING) the tables and these tables are immediately replicated to the other nodes of the cluster. An alternative would be to start the first node of the cluster with the pre-loaded data that then becomes cluster state. Other joining nodes copy it over through SST.

Q. Do we have an option to autospinup the compute nodes in a cloud? If PXC will have that option or do we manually need to spinup the Instance and setup the replication?

A. You will have to manually configure it.

Q. Why does XtraBackup not work due bootstrapping but works perfectly after bootstrapping? rsync is working in both cases.

A. Not sure I get the question completely, but XtraBackup works in all scenarios. If you are facing any issue, please log it on launchpad.

Q. As per flow control, one node waits for the other node to be in sync. Won’t there be latency in writing the data?

A. The transaction originated from one node needs to get replicated on other nodes of the cluster. This is what we can call latency and is dictated by network latency. Flow-control is mainly to regulate a scenario wherein one node of the cluster falls way behind other nodes of the cluster.

Q. Can we set up PXC using AWS EC2?

A. Yes.

Once again, thanks for taking time to attend the webinar. If you have more questions, then please post them to the Percona XtraDB Cluster forum here. Also, we have a lot of blogs about Percona XtraDB Cluster. Make sure you check them out here.

Dec
06
2017
--

Webinar Thursday, December 7, 2017: Percona XtraDB Cluster (PXC) 101

Percona XtraDB Cluster

Percona XtraDB ClusterJoin Percona’s Software Engineer (PXC Lead), Krunal Bauskar as he presents Percona XtraDB Cluster 101 on Thursday, December 7, 2017, at 7:00 am PST / 10:00 am EST (UTC-8).

Tags: Percona XtraDB Cluster, MySQL, High Availability, Clustering

Experience Level: Beginner

Percona XtraDB Cluster (PXC) is a multi-master solution that offers virtual synchronous replication among clustering node. It is based on the Codership Galera replication library. In this session, we will explore some key features of Percona XtraDB Cluster that make it enterprise ready including some recently added 5.7 exclusive features.

This webinar is an introductory and will cover the following topics:

  • ProxySQL load balancer
  • Multi-master replication
  • Synchronous replication
  • Data at rest encryption
  • Improved SST Security through simplified configuration
  • Easy to setup encrypted between-nodes communication
  • ProxySQL-assisted Percona XtraDB Cluster maintenance mode
  • Automatic node provisioning
  • Percona XtraDB Cluster “strict-mode”

Register for the webinar now.

Percona XtraDB ClusterKrunal Bauskar, C/C++ Engineer

Krunal joined Percona in September 2015. Before joining Percona he worked as part of the InnoDB team at MySQL/Oracle. He authored most of the temporary table revamp work besides a lot of other features. In the past, he was associated with Yahoo! Labs researching on big data problems and database startup which is now part of Teradata. His interest mainly includes data-management at any scale, and he has been practicing it for more than a decade now. He loves to spend time with his family or get involved in social work, unless he is out for some near-by exploration drive. He is located out of Pune, India.

Nov
28
2017
--

Best Practices for Percona XtraDB Cluster on AWS

Percona XtraDB Cluster on AWS 2 small

In this blog post I’ll look at the performance of Percona XtraDB Cluster on AWS using different service instances, and recommend some best practices for maximizing performance.

You can use Percona XtraDB Cluster in AWS environments. We often get questions about how best to deploy it, and how to optimize both performance and spend when doing so. I decided to look into it with some benchmark testing.

For these benchmark tests, I used the following configuration:

  • Region:
    • Availability zones: US East – 1, zones: b, c, d
    • Sysbench 1.0.8
    • ProxySQL 1.4.3
    • 10 tables, 40mln records – ~95GB dataset
    • Percona XtraDB Cluster 5.7.18
    • Amazon Linux AMI

We evaluated different AWS instances to provide the best recommendation to run Percona XtraDB Cluster. We used instances

  • With General Purpose storage volumes, 200GB each
  • With IO provisioned volumes, 200GB, 10000 IOS
  • I3 instances with local attached NVMe storage.

We also used different instance sizes:

Instance vCPU Memory
r4.large 2 15.25
r4.xlarge 4 30.5
r4.2xlarge 8 61
r4.4xlarge 16 122
i3.large 2 15.25
i3.xlarge 4 30.5
i3.2xlarge 8 61
i3.4xlarge 16 122

 

While I3 instances with NVMe storage do not provide the same functionality for handling shared storage and snapshots as General Purpose and IO provisioned volumes, since Percona XtraDB Cluster provides data duplication by itself we think it is still valid to include them in this comparison.

We ran benchmarks in the US East 1 (N. Virginia) Region, and we used different availability zones for each of the Percona XtraDB Cluster zones (mostly zones “b”, “c” and “d”):

Percona XtraDB Cluster on AWS 1

The client was directly connected and used ProxySQL, so we were able to measure ProxySQL’s performance overhead as well.

ProxySQL is an advanced method to access Percona XtraDB Cluster. It can perform a health check of the nodes and route the traffic to the ONLINE node. It can also split read and write traffic and route read traffic to different nodes (although we didn’t use this capability in our benchmark).

In our benchmarks, we used 1,4, 16, 64 and 256 user threads. For this detailed review, however, we’ll look at the 64 thread case. 

Results

First, let’s review the average throughput (higher is better) and latency (lower is better) results (we measured 99% percentile with one-second resolution):

Percona XtraDB Cluster on AWS 2

Results summary, raw performance:

The performance for Percona XtraDB Cluster running on GP2 volumes is often pretty slow, so it is hard to recommend this volume type for the serious workloads.

IO provisioned volumes perform much better, and should be considered as the primary target for Percona XtraDB Cluster deployments. I3 instances show even better performance.

I3 instances use locally attached volumes and do not provide equal functionality as EBS IO provisioned volumes — although some of these limitations are covered by Percona XtraDB Cluster’s ability to keep copies of data on each node.

Results summary for jitter:

Along with average throughput and latency, it is important to take into account “jitter” — how stable is the performance during the runs?

Percona XtraDB Cluster on AWS 3

Latency variation for GP2 volumes is significant — practically not acceptable for serious usage. Let’s review the latency for only IO provisioning and NVMe volumes. The following chart provides better scale for just these two:

Percona XtraDB Cluster on AWS 4

At this scale, we see that NVMe provides a 99% better response time and is more stable. There is still variation for IO provisioned volumes.

Results summary, cost

When speaking about instance and volume types, it would be impractical to avoid mentioning of the instance costs. We need to analyze how much we need to pay to achieve the better performance. So we prepared data how much does it cost to produce throughput of 1000 transactions per second.

We compare on-demand and reserved instances pricing (reserved for one year / all upfront / tenancy-default):

Percona XtraDB Cluster on AWS 5

Because IO provisioned instances give much better performance, the price performance is comparable if not better than GP2 instances.

I3 instances are a clear winner.

It is also interesting to compare the raw cost of benchmarked instances:

Percona XtraDB Cluster on AWS 6

We can see that IO provisioned instances are the most expensive, and using reserved instances does not provide much savings. To understand the reason for this, let’s take a look at how cost is calculated for components:

Percona XtraDB Cluster on AWS 7

So for IO provisioned volumes, the majority of the cost comes from IO provisioning (which is the same for both on-demand and reserved instances).

Percona XtraDB Cluster scalability

Another interesting effort is looking at how Percona XtraDB Cluster performance scales with the instance size. As we double resources (both CPU and Memory) while increasing the instance size, how does it affect Percona XtraDB Cluster?

So let’s take a look at throughput:

Percona XtraDB Cluster on AWS 8

Throughput improves with increasing the instance size. Let’s calculate speedup with increasing instance size for IO provisioned and I3 instances:

Speedup X Times to Large Instance IO1 i3
large 1 1
xlarge 2.67 2.11
2xlarge 5.38 4.31
4xlarge 5.96 7.83

 

Percona XtraDB Cluster can scale (improve performance) with increasing instance size. Keep in mind, however, that it depends significantly on the workload. You may not get the same performance speedup as in this benchmark.

ProxySQL overhead

As mentioned above, ProxySQL adds additional functionality to the cluster. It can also add overhead, however. We would like to understand the expected performance penalty, so we compared the throughput and latency with and without ProxySQL.

Out of box, the ProxySQL performance was not great and required additional tuning. 

ProxySQL specific configuration:

  • Use connection through TCP-IP address, not through local socket
  • Adjust  mysql-max_stmts_per_connection variable for optimal value (default:50) – optimal – 1000
  • Ensure that “monitor@<host>” user has permissions as it’s important for proper handling of prepared statement.
    • CREATE USER ‘monitor’@‘172.30.%.%’ IDENTIFIED BY ‘monitor’;

Throughput:

Percona XtraDB Cluster on AWS 9

Response time:

Percona XtraDB Cluster on AWS 10

ProxySQL performance penalty in throughput

ProxySQL performance penalty IO1 i3
large 0.97 0.98
xlarge 1.03 0.97
2xlarge 0.95 0.95
4xlarge 0.96 0.93

 

It appears that ProxySQL adds 3-7% overhead. I wouldn’t consider this a significant penalty for additional functionality.

Summary

Amazon instances

First, the results show that instances based on General Purpose volumes do not provide acceptable performance and should be avoided in general for serious production usage. The choice is between IO provisioned instances and NVMe based instances.

IO provisioned instances are more expensive, but offer much better performance than General Purpose volumes. If we also look at price/performance metric, IO provisioned volumes are comparable with General Purpose volumes. You should use IO provisioned volumes if you are looking for the functionality provided by EBS volumes.

If you do not need EBS volumes, however, then i3 instances with NVMe volumes are a better choice. Both are cheaper and provide better performance than IO provisioned instances. Percona XtraDB Cluster provides data duplication on its own, which mitigates the need for EBS volumes to some extent.

ProxySQL overhead

We recommend using Percona XtraDB Cluster in combination with ProxySQL, as ProxySQL provides additional management and routing functionality. In general, the overhead for ProxySQL is not significant. But in our experience, however, ProxySQL has to be properly tuned — otherwise the performance penalty could be a bottleneck.

Percona XtraDB Cluster scalability

AWS has great capability to increase the instance size (both CPU and memory) if we exceed the capacity of the current instance. From our experiments, we see that Percona XtraDB Cluster can scale along with and benefit from increased instance size.

Below is a chart showing the speedup in relation to the instance size:

Percona XtraDB Cluster on AWS 11

So increasing the instance size is a feasible strategy for improving Percona XtraDB Cluster performance in an AWS environment.

Thanks for reading this benchmark! Put any questions or thoughts in the comments below.

Nov
15
2017
--

Understanding how an IST donor is selected

IST donor cluster

IST donor clusterIn a clustering environment, we often see a node that needs to be taken down for maintenance. For a node to rejoin, it should re-sync with the cluster state. In PXC (Percona XtraDB Cluster), there are 2 ways for the rejoining node to re-sync: State Snapshot Transfer (SST) and Incremental State Transfer (IST). SST involves a full data transfer (which could be time consuming). IST is an incremental data transfer whereby only missing write-sets are donated by a DONOR to the rejoining node (aka as JOINER).

In this article I will try to show how a DONOR for the IST process is selected.

Selecting an IST DONOR

First, a word about gcache. Each node retains some write-sets in its cache known as gcache. Once this gcache is full it is purged to make room for new write-sets. Based on gcache configuration, each node may retain a different span of write-sets. The wider the span, the greater the probability of the node acting as prospective DONOR. The lowest seqno in gcache can be queried using ( 

show status like 'wsrep_local_cached_downto'

 )

Let’s understand the IST DONOR algorithm with a topology and working example:

  • Say we have 3 node cluster: N1, N2, N3.
  • To start with, all 3 nodes are in sync (wsrep_last_committed is the same for all 3 nodes, let’s say 100).
  • N3 is schedule for maintenance and is taken down.
  • In meantime N1 and N2 processes workload, thereby moving them from 100 -> 1100.
  • N1 and N2 also purges the gcache. Let’s say wsrep_local_cached_downto for N1 and N2 is 110 and 90 respectively.
  • Now N3 is restarted and discovers that the cluster has made progress from 100 -> 1100 and so it needs the write-sets from (101, 1100).
  • It starts looking for a prospective DONOR.
    • N1 can service data from (110, 1100) but the request is for (101, 1100) so N1 can’t act as DONOR
    • N2 can service data from (90, 1100) and the request is for (101, 1100) so N2 can act as DONOR.

Safety gap and how it affects DONOR selection

So far so good. But can N2 reliably act as DONOR? While N3 is evaluating the prospective DONOR, what if N2 purges more data and now wsrep_local_cached_downto on N2 is 105? In order to accommodate this, the N3 algorithm adds a safety gap.

safety gap = (Current State of Cluster – Lowest available seqno from any of the existing node of the cluster) * 0.008

So the N2 range is considered to be (90 + (1100 – 90) * 0.008, 1100) = (98, 1100).

Can now N2 act as DONOR ? Yes: (98, 1100) < (101, 1100)

What if N2 had purged up to 95 and then N3 started looking for prospective DONOR?

In this case the N2 range would be (95 + (1100 – 95) * 0.008, 1100) = (103, 1100), ruling N2 out from the prospective DONOR list.

Twist at the end

Considering the latter case above (N2 purged up to 95), it has been proven that N2 can’t act as the IST DONOR and the only way for N3 to join is through SST.

What if I say that N3 still joins back using IST? CONFUSED?

Once N3 falls back from IST to SST it will select a SST donor. This selection is done sequentially and nominates N1 as the first choice. N1 doesn’t have the required write-sets, so SST is forced.

But what if I configure

wsrep_sst_donor=N2

  on N3? This will cause N2 to get selected instead of N1. But wait: N2 doesn’t qualify either as with safety gap, the range is (103, 1100).

That’s true. But the request has IST + SST request, so even though N3 ruled out N2 as the IST DONOR, a request is sent for one last try. If N2 can service the request using IST, it is allowed to do so.  Otherwise it falls back to SST.

Interesting! This is a well thought out algorithm from Codership: I applaud them for this and the many other important control functions that go on backstage of the galera cluster.

Oct
30
2017
--

Percona XtraDB Cluster 5.7.19-29.22-3 is now available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.7Percona announces the release of Percona XtraDB Cluster 5.7.19-29.22-3 on October 27, 2017. Binaries are available from the downloads section or our software repositories.

NOTE: You can also run Docker containers from the images in the Docker Hub repository.

Percona XtraDB Cluster 5.7.19-29.22-3 is now the current release, based on the following:

All Percona software is open-source and free.

Fixed Bugs

  • Added access checks for DDL commands to make sure they do not get replicated if they failed without proper permissions. Previously, when a user tried to perform certain DDL actions that failed locally due to lack of privileges, the command could still be replicated to other nodes, because access checks were performed after replication.This vulnerability is identified as CVE-2017-15365.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Oct
30
2017
--

Percona XtraDB Cluster 5.6.37-26.21-3 is Now Available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.6.34-26.19Percona announces the release of Percona XtraDB Cluster 5.6.37-26.21-3 on October 27, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.37-26.21-3 is now the current release, based on the following:

All Percona software is open-source and free.

Fixed Bugs

  • Added access checks for DDL commands to make sure they do not get replicated if they failed without proper permissions. Previously, when a user tried to perform certain DDL actions that failed locally due to lack of privileges, the command could still be replicated to other nodes, because access checks were performed after replication.This vulnerability is identified as CVE-2017-15365.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sep
22
2017
--

Percona XtraDB Cluster 5.7.19-29.22 is now available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.7Percona announces the release of Percona XtraDB Cluster 5.7.19-29.22 on September 22, 2017. Binaries are available from the downloads section or our software repositories.

NOTE: You can also run Docker containers from the images in the Docker Hub repository.

Percona XtraDB Cluster 5.7.19-29.22 is now the current release, based on the following:

All Percona software is open-source and free.

Upgrade Instructions

After you upgrade each node to Percona XtraDB Cluster 5.7.19-29.22, run the following command on one of the nodes:

$ mysql -uroot -p < /usr/share/mysql/pxc_cluster_view.sql

Then restart all nodes, one at a time:

$ sudo service mysql restart

New Features

  • Introduced the pxc_cluster_view table to get a unified view of the cluster. This table is exposed through the performance schema.

    mysql> select * from pxc_cluster_view;
    -----------------------------------------------------------------------------
    HOST_NAME  UUID                                  STATUS  LOCAL_INDEX  SEGMENT
    -----------------------------------------------------------------------------
    n1         b25bfd59-93ad-11e7-99c7-7b26c63037a2  DONOR   0            0
    n2         be7eae92-93ad-11e7-88d8-92f8234d6ce2  JOINER  1            0
    -----------------------------------------------------------------------------
    2 rows in set (0.01 sec)
  • PXC-803: Added support for new features in Percona XtraBackup 2.4.7:

    • wsrep_debug enables debug logging
    • encrypt_threads specifies the number of threads that XtraBackup should use for encrypting data (when encrypt=1). This value is passed using the --encrypt-threads option in XtraBackup.
    • backup_threads specifies the number of threads that XtraBackup should use to create backups. See the --parallel option in XtraBackup.

Improvements

  • PXC-835: Limited wsrep_node_name to 64 bytes.
  • PXC-846: Improved logging to report reason of IST failure.
  • PXC-851: Added version compatibility check during SST with XtraBackup:
    • If a donor is 5.6 and a joiner is 5.7: A warning is printed to perform mysql_upgrade.
    • If a donor is 5.7 and a joiner is 5.6: An error is printed and SST is rejected.

Fixed Bugs

  • PXC-825: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to include the --defaults-group-suffix when logging to syslog. For more information, see #1559498.
  • PXC-826: Fixed multi-source replication to PXC node slave. For more information, see #1676464.
  • PXC-827: Fixed handling of different binlog names between donor and joiner nodes when GTID is enabled. For more information, see #1690398.
  • PXC-830: Rejected the RESET MASTER operation when wsrep provider is enabled and gtid_mode is set to ON. For more information, see #1249284.
  • PXC-833: Fixed connection failure handling during SST by making the donor retry connection to joiner every second for a maximum of 30 retries. For more information, see #1696273.
  • PXC-839: Fixed GTID inconsistency when setting gtid_next.
  • PXC-840: Fixed typo in alias for systemd configuration.
  • PXC-841: Added check to avoid replication of DDL if sql_log_bin is disabled. For more information, see #1706820.
  • PXC-842: Fixed deadlocks during Load Data Infile (LDI) with log-bin disabled by ensuring that a new transaction (of 10 000 rows) starts only after the previous one is committed by both wsrep and InnoDB. For more information, see #1706514.
  • PXC-843: Fixed situation where the joiner hangs after SST has failed by dropping all transactions in the receive queue. For more information, see #1707633.
  • PXC-853: Fixed cluster recovery by enabling wsrep_ready whenever nodes become PRIMARY.
  • PXC-862: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to use the ssl-dhparams value from the configuration file.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sep
20
2017
--

Percona XtraDB Cluster 5.6.37-26.21 is Now Available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.6.34-26.19Percona announces the release of Percona XtraDB Cluster 5.6.37-26.21 on September 20, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.37-26.21 is now the current release, based on the following:

All Percona software is open-source and free.

Improvements

  • PXC-851: Added version compatibility check during SST with XtraBackup:
    • If donor is 5.6 and joiner is 5.7: A warning is printed to perform mysql_upgrade.
    • If donor is 5.7 and joiner is 5.6: An error is printed and SST is rejected.

Fixed Bugs

  • PXC-825: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to include the --defaults-group-suffix when logging to syslog. For more information, see #1559498.
  • PXC-827: Fixed handling of different binlog names between donor and joiner nodes when GTID is enabled. For more information, see #1690398.
  • PXC-830: Rejected the RESET MASTER operation when wsrep provider is enabled and gtid_mode is set to ON. For more information, see #1249284.
  • PXC-833: Fixed connection failure handling during SST by making the donor retry connection to joiner every second for a maximum of 30 retries. For more information, see #1696273.
  • PXC-841: Added check to avoid replication of DDL if sql_log_bin is disabled. For more information, see #1706820.
  • PXC-853: Fixed cluster recovery by enabling wsrep_ready whenever nodes become PRIMARY.
  • PXC-862: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to use the ssl-dhparams value from the configuration file.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sep
05
2017
--

Webinar Wednesday, September 6, 2017: Percona Roadmap and Software News Update – Q3 2017

Percona Roadmap

Percona RoadmapCome and listen to Percona CEO Peter Zaitsev on Wednesday, September 6, 2017 at 10am PT / 1pm ET (UTC-7) discuss the Percona roadmap, as well as what’s new in Percona open source software.

 

During this webinar Peter will talk about newly released features in Percona software, show a few quick demos and share with you highlights from the Percona open source software roadmap. This discussion will cover Percona Server for MySQL and MongoDB, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster and Percona Monitoring and Management.

Peter will also talk about new developments in Percona commercial services and finish with a Q&A.

Register for the webinar before seats fill up for this exciting webinar Wednesday, September 6, 2017 at 10am PT / 1pm ET (UTC-7).

Peter ZaitsevPeter Zaitsev, Percona CEO and Co-Founder

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30+ countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University, where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. Fortune and DZone have both tapped Peter as a contributor, and his recent ebook Practical MySQL Performance Optimization is one of percona.com’s most popular downloads.
Aug
22
2017
--

The MySQL High Availability Landscape in 2017 (The Adults)

In this blog post, we’ll look at some of the MySQL high availability solution options.

In the previous post of this series, we looked at the MySQL high availability (HA) solutions that have been around for a long time. I called these solutions “the elders.” Some of these solutions (like replication) are heavily used today and have been improved from release to release of MySQL.

This post focuses on the MySQL high availability solutions that have appeared over the last five years and gained a fair amount of traction in the community. I chose to include this group only two solutions: Galera and RDS Aurora. I’ll use the term “Galera” generically: it covers Galera Cluster, MariaDB Cluster and Percona XtraDB Cluster. I debated for some time whether or not to include Aurora. I don’t like the fact that they use closed source code. Given the tight integration with the AWS environment, what is the commercial risk of opening the source code? That question evades me, but I am not on the business side of technology. ?

Galera

When I say “Galera,” it means a replication protocol supported by a library provided by Codeship, a Finnish company. The library needs hooks inside the MySQL source code for the replication protocol to work. In Percona XtraDB cluster, I counted 66 .cc files where the word “wsrep” is present. As you can see, it is not a small task to add support for Galera to MySQL. Not all the implementations are similar. Percona, for example, focused more on stability and usability at the expense of new features.

high availability

Let’s start with a quick description of a Galera-based cluster. Unless you don’t care about split-brain, a Galera cluster needs at least three nodes. The Galera replication protocol is nearly synchronous, which is a huge gain compared to regular MySQL replication. It performs transactions almost simultaneously on all the nodes, and the protocol ensures the same commit order. The transactions are almost synchronous because there are incoming queues on each node to improve performance. The presence of these incoming queues forces an extra step: the certification. The certification compares an incoming transaction with the ones already queued. If there a conflict, it returns a deadlock error.

For performance reasons, the certification process must be quick so that the incoming queue stays in memory. Since the number of transactions defines the size of the queue, the presence of large transactions uses a lot of memory. There are safeguards against memory overload, so be aware that transactions like:

update ATableWithMillionsRows set colA=1;

will likely fail. That’s the first important limitation of a Galera-based cluster: the size of the number transactions is limited.

It is also critical to uniquely identify conflicting rows. The best way to achieve an efficient row comparison is to make sure all the tables have a primary key. In a Galera-based cluster, your tables need primary keys otherwise you’ll run into trouble. That’s the second limitation of a Galera based cluster: the need for primary keys. Personally, I think that a table should always have a primary key – but I have seen many oddities…

Another design characteristic is the need for an acknowledgment by all the nodes when a transaction commits. That means the network link with the largest latency between two nodes will set the floor value of the transactional latency. It is an important factor to consider when deploying a Galera-based cluster over a WAN. Similarly, an overloaded node can slow down the cluster if it cannot acknowledge the transaction in a timely manner. In most cases, adding slave threads will allow you to overcome the throughput limitations imposed by the network latency. Each transaction will suffer from latency, but more of them will be able to run at the same time and maintain the throughput.

The exception here is when there are “hot rows.” A hot row is a row that is hammered by updates all the time. The transactions affecting hot rows cannot be executed in parallel and are thus limited by the network latency.

Since Galera-based clusters are very popular, they must also have some good points. The first and most obvious is the full-durability support. Even if the node on which you executed a transaction crashes a fraction of second after the commit statement returned, the data is present on the other nodes incoming queues. In my opinion, it is the main reason for the demise of the share storage solution. Before Galera-based clusters, the shared storage solution was the only other solution guaranteeing no data loss in case of a crash.

While the standby node is unusable with the shared storage solution, all the nodes of a Galera-based cluster are available and are almost in sync. All the nodes can be used for reads without stressing too much about replication lag. If you accept a higher risk of deadlock errors, you can even write on all nodes.

Finally, and not the least, there is an automatic provisioning service for the new nodes called SST. During the SST process, a joiner node asks the cluster for a donor. One of the existing nodes agrees to be the donor and initiate a full backup. The backup is streamed over the network to the joiner and restored there. When the backup completes, the joiner performs an IST to get the recent updates and, once applied, joins the cluster. The most common SST method uses the Percona XtraBackup utility. When using SST for XtraBackup, the cluster is fully available during the SST, although it may degrade performance. This feature really simplifies the operational side of things.

The technology is very popular. Of course, I am a bit biased since I work for Percona and one of our flagship products is Percona XtraDB Cluster – an implementation of the Galera protocol. Other than standard MySQL replication, it is by far the most common HA solution used by the customers I work with.

RDS Aurora

The second “adult” MySQL high availability solution is RDS Aurora. I hesitated to add Aurora here, mainly because it is not an open-source technology. I must also admit that I haven’t followed the latest developments around Aurora very closely. So, let’s describe Aurora.

There are three major parts in Aurora: at least one database server, the writer node and the storage.

high availability

What makes Aurora special is the storage layer has its own processing logic. I don’t know if the processing logic is part of the writer node AWS instance or part of the storage service directly, since the source code is not available. Anyway, I’ll call that layer the appliers. The applier role is to apply redo log fragments that then allow the writer node to write only the redo log fragments (normally written to the InnoDB log files). The appliers read those fragments and modify the pages in the storage. If a node requests a page that has pending redo fragments to be applied, they get applied before returning the page.

From the writer node perspective, there are much fewer writes. There is also no direct upper bound in terms of a number of fragments to be queued, so it is a bit like having

innodb_log_file_size

 set to an extremely large value. Also, since Aurora doesn’t need to flush pages, if the write node needs to read from the storage, and there are no free pages in the buffer pool, it can just discard one even if it is “dirty.” Actually, there are no dirty pages in the buffer pool.

So far, that seems to be very good for high write loads with spikes. What about the reader nodes? These reader nodes receive the updates from the writer nodes. If the update concerns a page they have in their buffer pool, they can modify it in place or discard it and read again from the storage. Again, without the source code, it is hard to tell the implementation. The point is, the readers have the same data as the master, they just can’t lag behind.

Apart from the impossibility of any reader lag, the other good point of Aurora is the storage. InnoDB pages are saved as objects in an object store, not like in a regular file on a file system. That means you don’t need to over-provision your storage ahead of time. You pay for what you are using – actually the maximum you ever use. InnoDB tablespaces do not shrink, even with Aurora.

Furthermore, if you have a 5TB dataset and your workload is such that you would need ten servers (one writer and nine readers), you still need only 5TB of storage if you are not replicating to other AZ. If we compare with regular MySQL and replication, you would have one master and nine slaves, each with 5TB of storage, for a total of 50TB. On top of that, you’ll have at least ten times the write IOPS.

So, storage wise, we have something that could be very interesting for applications with large datasets and highly concurrent read heavy workloads. You ensure high availability with the ability to promote a reader to writer automatically. You access the primary or the readers through endpoints that automatically connect to the correct instances. Finally, you can replicate the storage to multiple availability zones for DR.

Of course, such an architecture comes with a trade-off. If you experiment with Aurora, you’ll quickly discover that the smallest instance types underperform while the largest ones perform in a more expected manner. It is also quite easy to overload the appliers. Just perform the following queries:

update ATableWithMillionsRows set colA=1;
select count(*) from ATableWithMillionsRows where colA=1;

given that the table ATableWithMillionsRows is larger than the buffer pool. The select will hang for a long time because the appliers are overloaded by the number of pages to update.

In term of adoption, we have some customers at Percona using Aurora, but not that many. It could be that users of Aurora do not naturally go to Percona for services and support. I also wonder about the decision to keep the source code closed. It is certainly not a positive marketing factor in a community like the MySQL community. Since the Aurora technology seems extremely bounded to their ecosystem, is there really a risk for the technology to be reused by a competitor? With a better understanding of the technology through open access to the source, Amazon could have received valuable contributions. It would also be much easier to understand, tune and recommend Aurora.

Further reading:

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com