Apr
27
2017
--

Percona Live 2017: Day Three Keynotes

Percona Live Keynotes

Welcome to the third (and final) day of the Percona Live Open Source Database Conference 2017, and the third (and final) set of Percona Live keynotes! The enthusiasm hasn’t waned here at Percona Live, and we had a full house on Thursday morning!

Day three of the conference kicked off with three keynotes talks, and ended with the Community Awards Ceremony:

Percona Live Keynote AirBnBSpinaltap: Airbnb’s Change Data Capture System

Xinyao Hu (AirBnB)

In this talk, Xinyao introduced Airbnb’s change data change system, Spinaltap. He briefly covered its design, and focused on various use cases inside Airbnb. These use cases covered both online serving production and offline large distributed batch processing.

Percona Live Keynote PerconaHow Percona Contributes to the Open Source Database Ecosystem

Peter Zaitsev (Percona)

Peter Zaitsev, CEO of Percona, discussed the growth and adoption of open source databases, and Percona’s commitment to remaining an unbiased champion of the open source database ecosystem. Percona remains committed to providing open source support and solutions to its customers, users and the community. He also provided updates and highlighted exciting new developments in Percona Server software for MySQL and MongoDB.

Percona Live Keynote BookingMonitoring Booking.com without looking at MySQL

Jean-François Gagné (Booking.com)

Jean-François Gagné presented a fascinating talk about using a metric for observing Booking.com’s system health: bookings per second. It wasn’t a technical deep-dive (not MySQL- or Linux-related) but it is one of the most important metric Booking.com has to detect problems (and customer behavior) on the website. Many things impact this metric, including the time of the day, the day of the week or the season of the year.

Percona Live Keynote Community AwardsCommunity Award Ceremony

Daniel Nichter (Square), Emily Slocombe (SurveyMonkey)

The MySQL Community Awards initiative is an effort to acknowledge and thank individuals and corporations for their contributions to the MySQL ecosystem. It is a from-the-community, by-the-community and for-the-community effort. Awards are given for Community Contributor, Application, and Corporate Contributor. More information can be found here: http://mysqlawards.org.

This year’s winners were:

  • Community: René Cannaò, Simon Mudd, Shlomi Noach
  • Application: Sysbench, Gh-ost
  • Corporate: GitHub, Percona

Congrats to the winners, the entire open source community, and to all the Percona Live attendees this year. There are still sessions today, check them out.

It’s been a great conference, and we’re looking forward to seeing you all at Percona Live Europe!

Apr
23
2017
--

Percona XtraDB Cluster: “dh key too small” error during an SST using SSL

dh key too small

dh key too smallIf you’ve tried to use SSL in Percona XtraDB Cluster and saw an error in the logs like SSL3_CHECK_CERT_AND_ALGORITHM:dh key too small, we’ve implemented some changes in Percona XtraDB Cluster 5.6.34 and 5.7.16 that get rid of these errors.

Some background

dh key too small refers to the Diffie-Hellman parameters used by the SSL code that are shorter than recommended.

Due to the Logjam vulnerability (https://weakdh.org/), the required key-lengths for the Diffie-Hellman parameters were changed from 512 bits to 2048 bits. Unfortunately, older versions of OpenSSL/socat still use 512 bits (and thus caused the error to appear).

Changes made to Percona XtraDB Cluster

Since versions of socat greater than 1.7.3 now use 2048 bits for the Diffie-Hellman parameters, we only do extra work for the older versions of socat (less than 1.7.3). The SST code now:

  1. Looks for a file with the DH params
    1. Uses the “ssl_dhparams” option in the [sst] section if it exists
    2. Looks for a “dhparams.pem” file in the datadir
  2. If the file is specified and exists, uses that file as a source for the DH parameters
  3. If the file does not exist, creates a dhparams.pem file in the datadir

Generating the dhparams yourself

Unfortunately, the time it can take several minutes to create the dhparams file. We recommend that the dhparams.pem be created prior to starting the SST.

openssl dhparam -out path/to/datadir/dhparams.pem 2048

Apr
23
2017
--

Percona XtraDB Cluster Transaction Replay Anomaly

dh key too small

Replay AnomalyIn this blog post, we’ll look at a transaction replay anomaly in Percona XtraDB Cluster.

Introduction

Percona XtraDB Cluster/Galera replays a transaction if the data is non-conflicting but, the transaction happens to have conflicting locks.

Anomaly

Let’s understand this with an example:

  • Let’s assume a two-node cluster (node-1 and node-2)
  • Base table “t” is created as follows:
create database test;
use test;
create table t (i int, c char(20), primary key pk(i)) engine=innodb;
insert into t values (1, 'abc'), (2, 'abc'), (4, 'abc');
select * from t;
mysql> select * from t;
+---+------+
| i | c |
+---+------+
| 1 | abc |
| 2 | abc |
| 4 | abc |
+---+------+

  • node-2 starts runs a transaction (trx-2):
trx-2: update t set c = 'pqr';

  • node-2 creates a write-set and is just about to replicate it. At the same time, node-1 executes the following transaction (trx-1), and is first to add it to the group-channel (before node-2 adds transaction (trx-2))
trx-1: insert into t values (3, 'a');

  • trx-1 is replicated on node-2, and it proceeds with the apply action. Since there is a lock conflict (no certification conflict), node-2 local transaction (trx-2) is aborted and scheduled for replay.
  • trx-1 causes addition of (3, ‘a’) and then node-2 transaction is REPLAYed.
  • REPLAY is done using the pre-created write-set that only modifies existing entries (1,2,4).

End-result:

mysql> select * from t;
+---+------+
| i | c |
+---+------+
| 1 | pqr |
| 2 | pqr |
| 3 | a |
| 4 | pqr |
+---+------+

  • At first, nothing looks wrong. If you look closely, however, the REPLAYed transaction “UPDATE t set c= ‘pqr’” is last to commit. But the effect of it is not seen as there is still a row (3, ‘a’) that has ‘a’ instead of ‘pqr’.
| mysql-bin.000003 | 792 | Gtid | 2 | 857 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:4' |
| mysql-bin.000003 | 857 | Query | 2 | 925 | BEGIN |
| mysql-bin.000003 | 925 | Table_map | 2 | 972 | table_id: 219 (test.t) |
| mysql-bin.000003 | 972 | Write_rows | 2 | 1014 | table_id: 219 flags: STMT_END_F existing|
| mysql-bin.000003 | 1014 | Xid | 2 | 1045 | COMMIT /* xid=4 */ |
| mysql-bin.000003 | 1045 | Gtid | 3 | 1110 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:5' |
| mysql-bin.000003 | 1110 | Query | 3 | 1187 | BEGIN |
| mysql-bin.000003 | 1187 | Table_map | 3 | 1234 | table_id: 219 (test.t) |
| mysql-bin.000003 | 1234 | Update_rows | 3 | 1324 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1324 | Xid | 3 | 1355 | COMMIT /* xid=5 */ |
+------------------+------+----------------+-----------+-------------+---------------------------------------------------------------------------------+
21 rows in set (0.00 sec)

  • We have used a simple char string, but if there is a constraint here, like c should have X after UPDATE is complete, than the CONSTRAINT will be violated even though the application reports UPDATE as a success.
  • Is it interesting to note what happens on node-1:
    • node-1 applies the local transaction (trx-1) and then gets the replicated write-set from node-2 (trx-2) that has changes only for (1,2,4). Thereby data consistency is not compromised.
Apr
22
2017
--

BEWARE: Increasing fc_limit can affect SELECT latency

SELECT Latency

SELECT LatencyIn this blog post, we’ll look at how increasing the fc_limit can affect SELECT latency.

Introduction

Recent Percona XtraDB Cluster optimizations have exposed fc_limit contention. It was always there, but was never exposed as the Commit Monitor contention was more significant. As it happens with any optimization, once we solve the bigger contention issues, smaller contention issues start popping up. We have seen this pattern in InnoDB, and Percona XtraDB Cluster is no exception. In fact, it is good because it tells us that we are on the right track.

If you haven’t yet checked the performance blogs, then please visit here and here.

What is FC_LIMIT?

Percona XtraDB Cluster has the concept of Flow Control. If any member of the cluster (not garbd) is unable to match the apply speed with the replicated write-set speed, then the queue builds up. If this queue crosses some threshold (dictated by gcs.fc_limit), then flow control kicks in. Flow control causes members of the cluster to temporary halt/slow-down so that the slower node can catch up.

The user can, of course, disable this by setting wsrep_desync=1 on the slower node, but make sure you understand the effect of doing so. Unless you have a good reason, you should avoid setting it.

mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name | Value |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 16, 16 ] |
+-----------------------------+------------+
1 row in set (0.01 sec)

Increasing fc_limit

Until recently, the default fc_limit was 16 (starting with Percona XtraDB Cluster 5.7.17-29.20, the default is 100). This worked until now, since Percona XtraDB Cluster failed to scale and rarely hit the limit of 16. With new optimizations, Percona XtraDB Cluster nodes can process more write-sets in a given time period, and thereby can replicate more write-sets (anywhere in the range of three to ten times). Of course, the replicating/slave nodes are also performing at a higher speed. But depending on the slave threads, it is easy to start hitting this limit.

So what is the solution?

  • Increase fc_limit from 16 to something really big. Say 1600.

Is this correct?

YES and NO.

Why YES?

  • If you don’t care about the freshness of data on the replicated nodes, then increasing the limit to a higher value is not an issue. Say setting it to 10K means that the replicating node is holding 10K write-sets to replicate, and a SELECT fired during this time will not view changes from these 10K write-sets.
  • But if you insist on having fresh data, then Percona XtraDB Cluster has a solution for this (set wsrep_sync_wait=7).
  • Setting wsrep_sync_wait places the SELECT request in a queue that is serviced only after existing replicated write-sets (at the point when the SELECT was fired) are done with. If the queue has 8K write-sets, then SELECT is placed at the 8K+1 position. As the queue progresses, SELECT gets serviced only when all those 8K write-sets are done. This insanely increases SELECT latency and can cause all Monitoring ALARM to go ON.

Why NO?

  • For the reason mentioned above, we feel it is not a good idea to increase the fc_limit beyond some value unless you don’t care about data freshness and in turn don’t care to set wsrep_sync_wait.
  • We did a small experiment with the latest Percona XtraDB Cluster release to understand the effects.
- Started 2 node cluster.
- Fired 64-threads workload on node-1 of the cluster.
- node-2 is acting as replicating slave without any active workload.
- Set wsrep_sync_wait=7 on node-2 to ensure data-freshness.
Using default fc_limit (= 16)
-----------------------------
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.03 sec)
Increasing it from 16 -> 1600
-----------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=1600";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.46 sec)
That is whopping 15x increase in SELECT latency.
Increasing it even further (1600 -> 25000)
-------------------------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=25000";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (7.07 sec)

Note: wsrep_sync_wait=7 will enforce the check for all DMLs (INSERT/UPDATE/DELETE/SELECT). We highlighted the SELECT example, as that is more concerning at first go. But latency for other DMLs also increases for the same reasons as mentioned above.

Conclusion

Let’s conclude with the following observation:

  • Avoid increasing fc_limit to an insanely high value as it can affect SELECT latency (if you are running a SELECT session with wsrep_sync_wait=7 for data freshness).
Apr
21
2017
--

Percona Monitoring and Management 1.1.3 is Now Available

Percona Monitoring and Management

Percona Monitoring and ManagementPercona announces the release of Percona Monitoring and Management 1.1.3 on April 21, 2017.

For installation instructions, see the Deployment Guide.

This release includes several new graphs in dashboards related to InnoDB and MongoDB operation, as well as smaller fixes and improvements.

New in PMM Server

  • PMM-649: Added the InnoDB Page Splits and InnoDB Page Reorgs graphs to the MySQL InnoDB Metrics Advanced dashboard.
  • Added the following graphs to the MongoDB ReplSet dashboard:
    • Oplog Getmore Time
    • Oplog Operations
    • Oplog Processing Time
    • Oplog Buffered Operations
    • Oplog Buffer Capacity
  • Added descriptions for graphs in the following dashboards:
    • MongoDB Overview
    • MongoDB ReplSet
    • PMM Demo

Changes in PMM Client

  • PMM-491: Improved pmm-admin error messages.
  • PMM-523: Added the --verbose option for pmm-admin add.
  • PMM-592: Added the --force option for pmm-admin stop.
  • PMM-702: Added the db.serverStatus().metrics.repl.executor stats to mongodb_exporter. These new stats will be used for graphs in future releases.
  • PMM-731: Added real-time checks to pmm-admin check-network output.
  • The following commands no longer require connection to PMM Server:
    • pmm-admin start --all
    • pmm-admin stop --all
    • pmm-admin restart --all
    • pmm-admin show-passwords

    NOTE: If you want to start, stop, or restart a specific service, connection to PMM Server is still required.

About Percona Monitoring and Management

Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

Apr
21
2017
--

Simplified Percona XtraDB Cluster SSL Configuration

Percona XtraDB Cluster SST Traffic Encryption

Percona XtraDB Cluster SSLIn this blog post, we’ll look at a feature that recently added to Percona XtraDB Cluster 5.7.16, that makes it easier to configure Percona XtraDB Cluster SSL for all related communications. It uses mode “encrypt=4”, and configures SSL for both IST/Galera communications and SST communications using the same SSL files. “encrypt=4” is a new encryption mode added in Percona XtraDB Cluster 5.7.16 (we’ll cover it in a later blog post).

If this option is used, this will override all other Galera/SST SSL-related file options. This is to ensure that a consistent configuration is applied.
Using this option also means that the Galera/SST communications are using the same keys as client connections.

Example

This example shows how to startup a cluster using this option. We will use the default SSL files created by the bootstrap node. Basically, there are two steps:

  1. Set
    pxc_encrypt_cluster_traffic=ON

     on all nodes

  2. Ensure that all nodes share the same SSL files

Step 1: Configuration (on all nodes)

We enable the

pxc_encrypt_cluster_traffic

 option in the configuration files on all nodes. The default value of this option is “OFF”, so we enable it here.

[mysqld]
 pxc_encrypt_cluster_traffic=ON

Step 2: Startup the bootstrap node

After initializing and starting up the bootstrap node, the datadir will contain the necessary data files. Here is some SSL-related log output:

[Note] Auto generated SSL certificates are placed in data directory.
 [Warning] CA certificate ca.pem is self signed.
 [Note] Auto generated RSA key files are placed in data directory.

The required files are ca.pem, server-cert.pem and server-key.pem, which are the Certificate Authority (CA) file, the server certificate and the server private key, respectively.

Step 3: Copy the SSL files to all other nodes

Galera views the cluster as a set of homogeneous nodes, so the same configuration is expected on all nodes. Therefore, we have to copy the CA file, the server’s certificate and the server’s private key. By default, MySQL names these: ca.pem, server-cert.pem, and server-key.pem, respectively.

Step 4: Startup the other nodes

This is some log output showing that the SSL certificate files have been found. The other nodes should be using the files that were created on the bootstrap node.

[Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
[Note] Skipping generation of SSL certificates as certificate files are present in data directory.
[Warning] CA certificate ca.pem is self signed.
[Note] Skipping generation of RSA key pair as key files are present in data directory.

This is some log output (with

log_error_verbosity=3

), showing the SST reporting on the configuration used.

WSREP_SST: [DEBUG] pxc_encrypt_cluster_traffic is enabled, using PXC auto-ssl configuration
WSREP_SST: [DEBUG] with encrypt=4 ssl_ca=/my/data//ca.pem ssl_cert=/my/data//server-cert.pem ssl_key=/my/data//server-key.pem

Customization

The “ssl-ca”, “ssl-cert”, and “ssl-key” options in the “[mysqld]” section can be used to specify the location of the SSL files. If these are not specified, then the datadir is searched (using the default names of “ca.pem”, “server-cert.pem” and “server-key.pem”).

[mysqld]
 pxc_encrypt_cluster_traffic=ON
 ssl-ca=/path/to/ca.pem
 ssl-cert=/path/to/server-cert.pem
 ssl-key=/path/to/server-key.pem

If you want to implement this yourself, the equivalent configuration file options are:

[mysqld]
wsrep_provider_options=”socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem”
[sst]
encrypt=4
ssl-ca=ca.pem
ssl-cert=server-cert.pem
ssl-key=server-key.pem

How it works

  1. Determine the location of the SSL files
    1. Uses the values if explicitly specified (via the “ssl-ca”, “ssl-cert” and “ssl-key” options in the “[mysqld]” section)
    2. If the SSL file options are not specified, we look in the data directory for files named “ca.pem”, “server-cert.pem” and “server-key.pem” for the CA file, the server certificate, and the server key, respectively.
  2. Modify the configuration
    1. Overrides the values for socket.ssl_ca, socket.ssl_cert, and socket.ssl_key in
      wsrep_provider_options

       in the “[mysqld]” section.

    2. Sets “encrypt=4” in the “[sst]” section.
    3. Overrides the values for ssl-ca, ssl-cert and ssl-key in the “[sst]” section.

This is not a dynamic setting, and is only available on startup.

Apr
21
2017
--

Enabling Percona XtraDB Cluster SST Traffic Encryption

Percona XtraDB Cluster SST Traffic Encryption

Percona XtraDB Cluster SST Traffic EncryptionIn this blog post, we’ll look at enabling Percona XtraDB Cluster SST Traffic Encryption, and some of the changes to the SSL-based encryption of SST traffic in Percona XtraDB Cluster 5.7.16.

Some background

Percona XtraDB Cluster versions prior to 5.7 support encryption methods 0, 1, 2 and 3:

  • encrypt = 0 : (default) No encryption
  • encrypt = 1 : Symmetric encryption using AES-128, user-supplied key
  • encrypt = 2 : SSL-based encryption with a CA and cert files (via socat)
  • encrypt = 3 : SSL-based encryption with cert and key files (via socat)

We are deprecating modes encrypt=1,2,3 in favor of the new mode, encrypt=4. “encrypt=3” is not recommended, since it does not verify the cert being used (it cannot verify since no Certificate Authority (CA) file is provided). “encrypt=2” and “encrypt=3” use a slightly different way of building the SSL files than MySQL does. In order to remove confusion, we’ve deprecated these modes in favor of “encrypt=4”, which can use the MySQL generated SSL files.

New feature: encrypt= 4

The previous SSL methods (encrypt=2 and encrypt=3), are based on socat usage, http://www.dest-unreach.org/socat/doc/socat-openssltunnel.html. The certs are not built the same way as the certs created by MySQL (for encryption of client communication with MySQL). To simplify SSL configuration and usage, we added a new encryption method (encrypt=4) so that the SSL files generated by MySQL can now be used for SSL encryption of SST traffic.

For instructions on how to create these files, see https://www.percona.com/doc/percona-xtradb-cluster/LATEST/howtos/encrypt-traffic.html.

Configuration

In general, galera views the cluster as homogeneous, so it expects that all nodes are identically configured. This extends to the SSL configuration, so the preferred configuration is that all machines share the same certs/keys. The security implication is that possession of these certs/keys allows a machine to join the cluster and receive all of the data. So proper care must be taken to secure the SSL files.

The mode “encrypt=4” uses the same option names as MySQL, so it reads the SSL information from “ssl-ca”, “ssl-cert”, and “ssl-key” in the “[sst]” section of the configuration file.

Example my.cnf:

[sst]
 encrypt=4
 ssl-ca=/path/to/ca.pem
 ssl-cert=/path/to/server-cert.pem
 ssl-key=/path/to/server-key.pem

All three options (ssl-ca, ssl-cert, and ssl-key) must be specified otherwise the SST will return an error.

ssl-ca

This is the location of the Certificate Authority (CA) file. Only servers that have certificates generated from this CA file will be allowed to connect if SSL is enabled.

ssl-cert

This is the location fo the Certificate file. This is the digital certificate that will be sent to the other side of the SSL connection. The remote server will then verify that this certificate was generated from the Certificate Authority file in use by the remote server.

ssl-key

This is the location of the private key for the certificate specified in ssl-cert.

Apr
20
2017
--

Tracking IST Progress in Percona XtraDB Cluster

Percona XtraDB Cluster SST Traffic Encryption

ISTIn this blog post, we’ll look at how Percona XtraDB Cluster uses IST.

Introduction

Percona XtraDB Cluster uses the concept of an Incremental State Transfer (IST). When a node of the cluster leaves the cluster for a short period of time, it can rejoin the cluster by getting the delta set of missing changes from any active node in the cluster.

This process of getting the delta set of changes is named as IST in Percona XtraDB Cluster.

Tracking IST Progress

The number of write-sets/changes that the joining node needs to catch up on when rejoining the cluster is dictated by:

  1. The duration the node was not present in the cluster
  2. The workload of the cluster during that time frame

This catch-up process can be time-consuming. Until this process is complete, the rejoining node is not ready to process any active workloads.

We believe that any process that is time-consuming should have a progress monitor attached to it. This is exactly what we have done.

In the latest release of Percona XtraDB Cluster 5.7.17-29.20, we added an IST progress monitor that is exposed through SHOW STATUS. This helps you to monitor the percentage of write-sets which has been applied by the rejoining node.

Let’s see this in a working example:

  • Start a two-node cluster
  • Process some basic workloads, allow cluster replication
  • Shutdown node-2
  • Node-1 then continues to process more workloads (the workload fits the allocated gcache)
  • Restart Node-2, causing it to trigger an IST
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| wsrep_ist_receive_status | 3% complete, received seqno 1421771 of 1415410-1589676 |
+--------------------------+--------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 52% complete, received seqno 1506799 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 97% complete, received seqno 1585923 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| wsrep_ist_receive_status | |
+--------------------------+-------+
1 row in set (0.00 sec)

As you can see, the wsrep_ist_receive_status monitoring string indicates the percentage completed, currently received write-set and the range of write-sets applicable to the IST.

Once the IST activity is complete, the variable shows an empty-string.

Closing Comments

I hope you enjoy this newly added feature. Percona Engineering would be happy to hear from you, about more such features that can help you make effective use of Percona XtraDB Cluster. We will try our best to include them in our future plans (based on feasibility).

Note: Special thanks for Kenn Takara and Roel Van de Paar for helping me edit this post.

Apr
20
2017
--

More Trackable Flow Control for Percona XtraDB Cluster

Percona XtraDB Cluster SST Traffic Encryption

Percona XtraDB ClusterIn this blog post, we’ll discuss trackable flow control in Percona XtraDB Cluster.

Introduction

Percona XtraDB Cluster has a self-regulating mechanism called Flow Control. This mechanism helps to avoid a situation wherein the weakest/slowest member of the cluster falls significantly behind other members of the cluster.

When a member of a cluster is slow at applying write-sets (while simultaneously continuing to receive write-sets from the cluster group channel), then the incoming/receive queue grows in size. If this queue crosses a set threshold (gcs.fc_limit), the node emits a FLOW_CONTROL message asking other members to slow down or halt processing.

While this happens transparently for the end-user, it’s important for the cluster administrator to know whether a node is using flow-control. If it is, it can affect overall cluster productivity.

Finding if a node is in flow control

FLOW_CONTROL is not a persistent state. A node enters FLOW_CONTROL once the queue size exceeds the set threshold. It is released back again once the queue size falls back below the low-end watermark.

So how can one view the higher and lower watermarks?

Up until now, this was exposed only in the log files. However, starting with Percona XtraDB Cluster 5.7.17-29.20, this is now exposed and available through SHOW STATUS:

mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name | Value |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 12, 23 ] |
+-----------------------------+------------+
1 row in set (0.00 sec)
mysql> set global wsrep_provider_options="gcs.fc_limit=100";
Query OK, 0 rows affected (0.00 sec)
mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+-------------+
| Variable_name | Value |
+-----------------------------+-------------+
| wsrep_flow_control_interval | [ 71, 141 ] |
+-----------------------------+-------------+
1 row in set (0.00 sec)
mysql> set global wsrep_provider_options="gcs.fc_factor=0.75";
Query OK, 0 rows affected (0.00 sec)
mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+--------------+
| Variable_name | Value |
+-----------------------------+--------------+
| wsrep_flow_control_interval | [ 106, 141 ] |
+-----------------------------+--------------+
1 row in set (0.01 sec)

As you can see, the wsrep_flow_control_interval status variable emits a range representing the lower and higher watermark. A value set of (12, 23) means that if the incoming queue size is greater than 23, then FLOW_CONTROL is enabled. If the size falls below 12, FLOW_CONTROL is released.

Still, this does not show us if the node is using FLOW_CONTROL at any given moment.

To address this, we simultaneously introduced a wsrep_flow_control_status status variable into the same release. This boolean status variable tells the user if the node is in FLOW_CONTROL or not. Once the node is out of flow-control, the variable is switched to OFF, and vice versa to ON when the node is employing flow-control:

show status like '%flow%';
.....
| wsrep_flow_control_sent | 28156 |
| wsrep_flow_control_recv | 28156 |
| wsrep_flow_control_interval | [ 106, 141 ] |
| wsrep_flow_control_status | ON |

Finally, the wsrep_flow_control_sent/recv counter can be used to track FLOW_CONTROL status. This shows an aggregated count of how many times flow-control kicked in. You can clear them using FLUSH STATUS.

mysql> show status like '%flow%';
.....
| wsrep_flow_control_sent | 31491 |
| wsrep_flow_control_recv | 31491 |
| wsrep_flow_control_interval | [ 106, 141 ] |
| wsrep_flow_control_status | OFF |

Takeaway thoughts

We hope you enjoy the new features added to Percona XtraDB Cluster, including flow control tracking!

We like feedback from customers (via the usual Customer Support website), and where possible from the community (via Percona JIRA and Launchpad), on which features they most want to see. We recently re-engineered Percona XtraDB Cluster to have much-increased performance (a request originating from the community), and the new IST Progress Meter and Flow Control Monitoring were introduced to aid in solving customer-facing challenges.

If you have a feature you would like to see, let us know!

Note: Special thanks for Kenn Takara and Roel Van de Paar for helping me edit this post.

Apr
19
2017
--

How We Made Percona XtraDB Cluster Scale

Percona XtraDB Cluster SST Traffic Encryption

Percona XtraDB ClusterIn this blog post, we’ll look at the actions and efforts Percona experts took to scale Percona XtraDB Cluster.

Introduction

When we first started analyzing Percona XtraDB Cluster performance, it was pretty bad. We would see contention even with 16 threads. Performance was even worse with sync binlog=1, although the same pattern was observed even with the binary log disabled. The effect was not only limited to OLTP workloads, as even other workloads (like update-key/non-key) were also affected in a wider sense than OLTP.

That’s when we started analyzing the contention issues and found multiple problems. We will discuss all these problems and the solutions we adapted. But before that, let’s look at the current performance level.

Check this blog post for more details.

The good news is Percona XtraDB Cluster is now optimized to scale well for all scenarios, and the gain is in the range of 3x-10x.

Understanding How MySQL Commits a Transaction

Percona XtraDB Cluster contention is associated mainly with Commit Monitor contention, which comes into the picture during commit time. It is important to understand the context around it.

When a commit is invoked, it proceeds in two phases:

  • Prepare phase: mark the transaction as PREPARE, updating the undo segment to capture the updated state.
    • If bin-log is enabled, redo changes are not persisted immediately. Instead, a batch flush is done during Group Commit Flush stage.
    • If bin-log is disabled, then redo changes are persisted immediately.
  • Commit phase: Mark the transaction commit in memory.
    • If bin-log is enabled, Group Commit optimization kicks in, thereby causing a flush of redo-logs (that persists changes done to db-objects + PREPARE state of transaction) and this action is followed by a flush of the binary logs. Since the binary logs are flushed, redo log capturing of transaction commit doesn’t need to flush immediately (Saving fsync)
    • If bin-log is disabled, redo logs are flushed on completion of the transaction to persist the updated commit state of the transaction.

What is a Monitor in Percona XtraDB Cluster World?

Monitors help maintain transaction ordering. For example, the Commit Monitor ensures that no transaction with a global-seqno greater than the current commit-processing transaction’s global seqno is allowed to proceed.

How Percona XtraDB Cluster Commits a Transaction

Percona XtraDB Cluster follows the existing MySQL semantics of course, but has its own step to commit the transaction in the replication world. There are two important themes:

  1. Apply/Execution of transaction can proceed in parallel
  2. Commit is serialized based on cluster-wide global seqno.

Let’s understand the commit flow with Percona XtraDB Cluster involved (Percona XtraDB Cluster registers wsrep as an additional storage engine for replication).

  • Prepare phase:
    • wsrep prepare: executes two main actions:
      • Replicate the transaction (adding the write-set to group-channel)
      • Entering CommitMonitor. Thereby enforcing ordering of transaction.
    • binlog prepare: nothing significant (for this flow).
    • innobase prepare: mark the transaction in PREPARE state.
      • As discussed above, the persistence of the REDO log depends on if the binlog is enabled/disabled.
  • Commit phase
    • If bin-log is enabled
      • MySQL Group Commit Logic kicks in. The semantics ensure that the order of transaction commit is the same as the order of them getting added to the flush-queue of the group-commit.
    • If bin-log is disabled
      • Normal commit action for all registered storage engines is called with immediate persistence of redo log.
    • Percona XtraDB Cluster then invokes the post_commit hook, thereby releasing the Commit Monitor so that the next transaction can make progress.

With that understanding, let’s look at the problems and solutions:

PROBLEM-1:

Commit Monitor is exercised such that the complete commit operation is serialized. This limits the parallelism associated with the prepare-stage. With log-bin enabled, this is still ok since redo logs are flushed at group-commit flush-stage (starting with 5.7). But if log-bin is disabled, then each commit causes an independent redo-log-flush (in turn probable fsync).

OPTIMIZATION-1:

Split the replication pre-commit hook into two explicit actions: replicate (add write-set to group-channel) + pre-commit (enter commit-monitor).

The replicate action is performed just like before (as part of storage engine prepare). That will help complete the InnoDB prepare action in parallel (exploring much-needed parallelism in REDO flush with log-bin disabled).

On completion of replication, the pre-commit hook is called. That leads to entering the Commit Monitor for enforcing the commit ordering of the transactions. (Note: Replication action assigns the global seqno. So even if a transaction with a higher global seqno finishes the replication action earlier (due to CPU scheduling) than the transaction with a lower global seqno, it will wait in the pre-commit hook.)

Improved parallelism in the innodb-prepare stage helps accelerate log-bin enabled flow, and the same improved parallelism significantly helps in the log-bin disabled case by reducing redo-flush contention, thereby reducing fsyncs.


PROBLEM-2:

MySQL Group Commit already has a concept of ordering transactions based on the order of their addition to the GROUP COMMIT queue (FLUSH STAGE queue to be specific). Commit Monitor enforces the same, making the action redundant but limiting parallelism in MySQL Group Commit Logic (including redo-log flush that is now delayed to the flush stage).

With the existing flow (due to the involvement of Commit Monitor), only one transaction can enter the GROUP COMMIT Queue, thereby limiting optimal use of Group Commit Logic.

OPTIMIZATION-2:

Release the Commit Monitor once the transaction is successfully added to flush-stage of group-commit. MySQL will take it from there to maintain the commit ordering. (We call this interim-commit.)

Releasing the Commit Monitor early helps other transactions to make progress and real MySQL Group Commit Leader-Follower Optimization (batch flushing/sync/commit) comes into play.

This also helps ensure batch REDO log flushing.


PROBLEM-3:

This problem is specific to when the log-bin is disabled. Percona XtraDB Cluster still generates the log-bin, as it needs it for forming a replication write-set (it just doesn’t persist this log-bin information). If disk space is not a constraint, then I would suggest operating Percona XtraDB Cluster with log-bin enabled.

With log-bin disabled, OPTIMIZATION-1 is still relevant, but OPTIMIZATION-2 isn’t, as there is no group-commit protocol involved. Instead, MySQL ensures that the redo-log (capturing state change of transaction) is persisted before reporting COMMIT as a success. As per the original flow, the Commit Monitor is not released till the commit action is complete.

OPTIMIZATION-3:

The transaction is already committed to memory and the state change is captured. This is about persisting the REDO log only (REDO log modification is already captured by mtr_commit). This means we can release the Commit Monitor just before the REDO flush stage kicks in. Correctness is still ensured as the REDO log flush always persists the data sequentially. So even if trx-1 loses its slots before the flush kicks in, and trx-2 is allowed to make progress, trx-2’s REDO log flush ensures that trx-1’s REDO log is also flushed.


Conclusion

With these three main optimizations, and some small tweaks, we have tuned Percona XtraDB Cluster to scale better and made it fast enough for the growing demands of your applications. All of this is available with the recently released Percona XtraDB Cluster 5.7.17-29.20. Give it a try and watch your application scale in a multi-master environment, making Percona XtraDB Cluster the best fit for your HA workloads.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com