Jul 03, 2018

Linux OS Tuning for MySQL Database Performance


In this post we will review the most important Linux settings to adjust for performance tuning and optimization of a MySQL database server. We’ll note how some of the Linux parameter settings used for OS tuning may vary according to the system type: physical, virtual or cloud. Other posts have addressed MySQL parameters, like Alexander’s blog MySQL 5.7 Performance Tuning Immediately After Installation. That post remains highly relevant for the latest versions of MySQL, 5.7 and 8.0. Here we will focus more on the Linux operating system parameters that can affect database performance.

Server and Operating System

Here are some Linux parameters that you should check and consider modifying if you need to improve database performance.

Kernel – vm.swappiness

This value represents the kernel’s tendency to swap out memory pages. On a database server with ample RAM, we should keep this value as low as possible: the extra I/O caused by swapping can slow down the service or even render it unresponsive. A value of 0 disables swapping completely, while 1 causes the kernel to perform the minimum amount of swapping. In most cases the latter setting is the better choice:

# Set the swappiness value as root
echo 1 > /proc/sys/vm/swappiness
# Alternatively, using sysctl
sysctl -w vm.swappiness=1
# Verify the change
cat /proc/sys/vm/swappiness
1
# Alternatively, using sysctl
sysctl vm.swappiness
vm.swappiness = 1

The change should be also persisted in /etc/sysctl.conf:

vm.swappiness = 1
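
A minimal sketch of the same change using a sysctl drop-in file, which many distributions prefer over editing /etc/sysctl.conf directly (the file name below is only an example):

# Persist the setting in a drop-in file and reload all sysctl configuration
echo "vm.swappiness = 1" | sudo tee /etc/sysctl.d/60-mysql-tuning.conf
sudo sysctl --system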

Filesystems – XFS/ext4/ZFS
XFS

XFS is a high-performance, journaling file system designed for high scalability. It provides near-native I/O performance even when the file system spans multiple storage devices. XFS has features that make it suitable for very large file systems, supporting files up to 8EiB in size, and it offers fast recovery, fast transactions, delayed allocation for reduced fragmentation, and near-raw I/O performance with direct I/O.

The default options for mkfs.xfs are good for optimal speed, so the simple command:

# Use default mkfs options
mkfs.xfs /dev/target_volume

will provide the best performance while ensuring data safety. Regarding mount options, the defaults should fit most cases. On some filesystems you can see a performance increase by adding the noatime mount option to /etc/fstab. For XFS filesystems the default atime behaviour is relatime, which has almost no overhead compared to noatime and still maintains sane atime values. If you create an XFS file system on a LUN that has a battery-backed, non-volatile cache, you can further increase the performance of the filesystem by disabling the write barrier with the mount option nobarrier. This helps you avoid flushing data more often than necessary. If a BBU (battery backup unit) is not present, however, or you are unsure about it, leave barriers enabled, otherwise you may jeopardize data consistency. With these options in place, an /etc/fstab file should look like the one below:

/dev/sda2              /datastore              xfs     defaults,nobarrier
/dev/sdb2              /binlog                 xfs     defaults,nobarrier
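
After mounting, you can confirm which options are active and inspect the filesystem geometry (the mount point matches the example above):

# Check the active mount options and the XFS geometry
grep /datastore /proc/mounts
xfs_info /datastore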

ext4

ext4 was developed as the successor to ext3, with added performance improvements. It is a solid option that will fit most workloads. We should note here that it supports files up to 16TB in size, a smaller limit than XFS. This is something you should consider if extreme tablespace size/growth is a requirement. Regarding mount options, the same considerations apply: we recommend the defaults for a robust filesystem without risks to data consistency. However, if an enterprise storage controller with a BBU cache is present, the following mount options will provide the best performance:

/dev/sda2              /datastore              ext4     noatime,data=writeback,barrier=0,nobh,errors=remount-ro
/dev/sdb2              /binlog                 ext4     noatime,data=writeback,barrier=0,nobh,errors=remount-ro

Note: The data=writeback option results in only metadata being journaled, not actual file data. This carries the risk of corrupting recently modified files in the event of a sudden power loss, a risk which is minimised by the presence of a BBU-enabled controller. nobh only works with the data=writeback option enabled.

ZFS

ZFS is a combined filesystem and logical volume manager offering enterprise-grade protection against data corruption. There are certainly cases where the rich feature set of ZFS makes it an essential option to consider, most notably when advanced volume management is a requirement. ZFS tuning for MySQL can be a complex topic and falls outside the scope of this blog. For further reference, there is a dedicated blog post on the subject by Yves Trudeau.

Disk Subsystem – I/O scheduler 

Most modern Linux distributions come with the noop or deadline I/O scheduler by default, both of which provide better performance than the cfq and anticipatory schedulers. However, it is always good practice to check the scheduler for each device; if the value shown is different from noop or deadline, the policy can be changed without rebooting the server:

# View the I/O scheduler setting. The value in square brackets shows the running scheduler
cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# Change the setting (the redirection must run with root privileges, hence tee)
echo noop | sudo tee /sys/block/sdb/queue/scheduler

To make the change persistent, you must modify the GRUB configuration file:

# Change the line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop"

AWS Note: There are cases where the I/O scheduler has a value of none, most notably on AWS VM instance types where EBS volumes are exposed as NVMe block devices. The setting has no use with modern PCIe/NVMe devices, because they have a very large internal queue and bypass the I/O scheduler altogether. In this case, none is the optimal setting for such disks.
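
A quick check on such an instance (the device name below is illustrative) should simply report none:

# NVMe devices bypass the traditional I/O schedulers
cat /sys/block/nvme0n1/queue/scheduler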

Disk Subsystem – Volume optimization

Ideally, different disk volumes should be used for the OS installation, binlog, data and the redo log, if possible. The separation of OS and data partitions, not just logically but physically, will improve database performance. The RAID level can also have an impact: RAID-5 should be avoided, as the parity calculations needed to ensure integrity are costly. The best performance without compromising redundancy is achieved by using an advanced controller with a battery-backed cache unit and, preferably, RAID-10 volumes spanned across multiple disks.

AWS Note: For further information about EBS volumes and AWS storage optimisation, Amazon has documentation at the following links:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-optimized-instances.html

Database settings

System Architecture – NUMA settings

Non-uniform memory access (NUMA) is a memory design in which a processor in a multiprocessor system can access its own local memory faster than non-local memory (memory local to another CPU). This may result in suboptimal database performance and potentially swapping: when the buffer pool memory allocation is larger than the amount of RAM available local to the node, and the default memory allocation policy is selected, swapping occurs. A NUMA-enabled server will report different distances between CPU nodes, whereas a uniform one will report a single distance:

# NUMA system
numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65525 MB
node 0 free: 296 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 9538 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 65536 MB
node 2 free: 12701 MB
node 3 cpus: 24 25 26 27 28 29 30 31
node 3 size: 65535 MB
node 3 free: 7166 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10
# Uniform system
numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 64509 MB
node 0 free: 4870 MB
node distances:
node   0
  0:  10

In the case of a NUMA system, where numactl shows different distances across nodes, the MySQL variable innodb_numa_interleave should be enabled to ensure memory interleaving. Percona Server provides improved NUMA support by introducing the flush_caches variable; when enabled, it helps with allocation fairness across nodes. To determine whether or not allocation is equal across nodes, you can examine numa_maps for the mysqld process with this script:

# The perl script numa_maps.pl will report memory allocation per CPU node:
# 3595 is the pid of the mysqld process
perl numa_maps.pl < /proc/3595/numa_maps
N0        :     16010293 ( 61.07 GB)
N1        :     10465257 ( 39.92 GB)
N2        :     13036896 ( 49.73 GB)
N3        :     14508505 ( 55.35 GB)
active    :          438 (  0.00 GB)
anon      :     54018275 (206.06 GB)
dirty     :     54018275 (206.06 GB)
kernelpagesize_kB:         4680 (  0.02 GB)
mapmax    :          787 (  0.00 GB)
mapped    :         2731 (  0.01 GB)
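
As a rough sketch, the relevant my.cnf settings discussed above could look like the following (innodb_numa_interleave requires MySQL/Percona Server 5.7+, and flush_caches is specific to Percona Server’s mysqld_safe; check your version’s documentation for the exact option name and section):

[mysqld]
# Interleave InnoDB buffer pool allocation across NUMA nodes
innodb_numa_interleave = ON

[mysqld_safe]
# Percona Server only: flush and drop OS caches before mysqld allocates memory
flush_caches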

Conclusion

In this blog post we examined a few important OS related settings and explained how they can be tuned for better database performance.

While you are here …

You might also find value in this recorded webinar: Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance.

 


May 25, 2018

Webinar Tues, 5/29: MySQL, Percona XtraDB Cluster, ProxySQL, Kubernetes: How they work together

Please join Percona’s Principal Architect Alex Rubin as he presents MySQL, Percona XtraDB Cluster, ProxySQL, Kubernetes: How they work together to give you a highly available cluster database environment on Tuesday, May 29th at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

In this webinar, Alex will discuss how to deploy a highly available MySQL database environment on Kubernetes/Openshift using Percona XtraDB Cluster (PXC) together with MySQL Proxy to implement read/write splitting.

If you have never used Kubernetes and Openshift, or never used PXC / MySQL Proxy, Alex will do a quick introduction to these technologies. There will also be a demo where Alex sets up a PXC cluster with ProxySQL in Openshift Origin and tries to break it.

By the end of this webinar you will have a better understanding of:

  • How to deploy Percona XtraDB Cluster with ProxySQL for HA solutions
  • How to leverage Kubernetes/Openshift in your environments
  • How to troubleshoot performance issues

Register for the webinar

Alexander Rubin, Principal Consultant

Alexander joined Percona in 2013. He has worked with MySQL since 2000 as a DBA and application developer. Before joining Percona, he did MySQL consulting as a principal consultant for over 7 years (starting with MySQL AB in 2006, then Sun Microsystems and then Oracle). He has helped many customers design large, scalable and highly available MySQL systems and optimize MySQL performance. Alexander has also helped customers design Big Data stores with Apache Hadoop and related technologies.


Apr 17, 2018

Webinar Wednesday, April 18, 2018: Percona XtraDB Cluster 5.7 Tutorial


Please join Percona’s Architect, Tibi Köröcz, as he presents Percona XtraDB Cluster 5.7 Tutorial on Wednesday, April 18, 2018, at 7:00 am PDT (UTC-7) / 10:00 am EDT (UTC-4).

Never used Percona XtraDB Cluster before? Come join this 45-minute tutorial where we will introduce you to the concepts of a fully functional Percona XtraDB Cluster.

In this tutorial, we will show you how you can install Percona XtraDB Cluster with ProxySQL, and monitor it with Percona Monitoring and Management (PMM).

We will also cover topics like bootstrap, IST, SST, Certification, common-failure situations and online schema changes.

Register for the webinar now.

Tibor Köröcz, Senior Consultant

Tibi joined Percona in 2015 as a Consultant. Before joining Percona, among many other things, he worked at the world’s largest car hire booking service as a Senior Database Engineer. He enjoys trying and working with the latest technologies and applications that can help or work with MySQL. In his spare time, he likes to spend time with his friends, travel around the world and play ultimate frisbee.


Dec 01, 2017

This Week in Data with Colin Charles 17: AWS Re:Invent, a New Book on MySQL Cluster and Another Call Out for Percona Live 2018


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The CFP for Percona Live Santa Clara 2018 closes December 22, 2017: please consider submitting as soon as possible. We want to make an early announcement of talks, so we’ll definitely do a first pass even before the CFP date closes. Keep in mind the expanded view of what we are after: it’s more than just MySQL and MongoDB. And don’t forget that with one day less, there will be intense competition to fit all the content in.

A new book on MySQL Cluster is out: Pro MySQL NDB Cluster by Jesper Wisborg Krogh and Mikiya Okuno. At 690 pages, it is a weighty tome, and something I fully plan on reading, considering I haven’t played with NDBCLUSTER for quite some time.

Did you know that since MySQL 5.7.17, connection control plugins are included? They help DBAs introduce an increasing delay in server response to clients after a certain number of consecutive failed connection attempts. Read more in the connection control plugins documentation.
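
A rough sketch of enabling them (assuming a Linux build of MySQL 5.7.17 or later; the threshold and delay values are arbitrary examples):

-- Load both connection control plugins and add a delay after 3 consecutive failed attempts
INSTALL PLUGIN CONNECTION_CONTROL SONAME 'connection_control.so';
INSTALL PLUGIN CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS SONAME 'connection_control.so';
SET GLOBAL connection_control_failed_connections_threshold = 3;
SET GLOBAL connection_control_min_connection_delay = 1000; -- in milliseconds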

While there are a tonne of announcements coming out from the Amazon re:Invent 2017 event, I highly recommend also reading Some data of interest as AWS reinvent 2017 ramps up by James Governor. Telemetry data from sumologic’s 1,500 largest customers suggest that NoSQL database usage has overtaken relational database workloads! Read The State of Modern Applications in the Cloud. Page 8 tells us that MySQL is the #1 database on AWS (I don’t see MariaDB Server being mentioned which is odd; did they lump it in together?), and MySQL, Redis & MongoDB account for 40% of database adoption on AWS. In other news, Andy Jassy also mentions that less than 1.5 months after hitting 40,000 database migrations, they’ve gone past 45,000 over the Thanksgiving holiday last week. Have you started using AWS Database Migration Service?

Upcoming appearances

  • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, I can only imagine it’s going to be bigger and better in 2017, since it is now two days long!

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Apr 23, 2017

Percona XtraDB Cluster: “dh key too small” error during an SST using SSL


If you’ve tried to use SSL in Percona XtraDB Cluster and saw an error in the logs like SSL3_CHECK_CERT_AND_ALGORITHM:dh key too small, we’ve implemented some changes in Percona XtraDB Cluster 5.6.34 and 5.7.16 that get rid of these errors.

Some background

dh key too small refers to the Diffie-Hellman parameters used by the SSL code that are shorter than recommended.

Due to the Logjam vulnerability (https://weakdh.org/), the required key-lengths for the Diffie-Hellman parameters were changed from 512 bits to 2048 bits. Unfortunately, older versions of OpenSSL/socat still use 512 bits (and thus caused the error to appear).

Changes made to Percona XtraDB Cluster

Since versions of socat greater than 1.7.3 now use 2048 bits for the Diffie-Hellman parameters, we only do extra work for the older versions of socat (less than 1.7.3). The SST code now:

  1. Looks for a file with the DH params
    1. Uses the “ssl_dhparams” option in the [sst] section if it exists
    2. Looks for a “dhparams.pem” file in the datadir
  2. If the file is specified and exists, uses that file as a source for the DH parameters
  3. If the file does not exist, creates a dhparams.pem file in the datadir

Generating the dhparams yourself

Unfortunately, it can take several minutes to create the dhparams file. We recommend that the dhparams.pem file be created prior to starting the SST.

openssl dhparam -out path/to/datadir/dhparams.pem 2048
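
Once generated, the file can be referenced explicitly via the ssl_dhparams option mentioned above (the path is illustrative):

[sst]
ssl_dhparams=/path/to/datadir/dhparams.pem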

Apr 23, 2017

Percona XtraDB Cluster Transaction Replay Anomaly


In this blog post, we’ll look at a transaction replay anomaly in Percona XtraDB Cluster.

Introduction

Percona XtraDB Cluster/Galera replays a transaction if the data is non-conflicting but the transaction happens to hold conflicting locks.

Anomaly

Let’s understand this with an example:

  • Let’s assume a two-node cluster (node-1 and node-2)
  • Base table “t” is created as follows:
create database test;
use test;
create table t (i int, c char(20), primary key pk(i)) engine=innodb;
insert into t values (1, 'abc'), (2, 'abc'), (4, 'abc');
mysql> select * from t;
+---+------+
| i | c    |
+---+------+
| 1 | abc  |
| 2 | abc  |
| 4 | abc  |
+---+------+

  • node-2 starts running a transaction (trx-2):
trx-2: update t set c = 'pqr';

  • node-2 creates a write-set and is just about to replicate it. At the same time, node-1 executes the following transaction (trx-1) and is the first to add it to the group channel (before node-2 adds its transaction, trx-2):
trx-1: insert into t values (3, 'a');

  • trx-1 is replicated on node-2, and it proceeds with the apply action. Since there is a lock conflict (but no certification conflict), node-2’s local transaction (trx-2) is aborted and scheduled for replay.
  • trx-1 causes the addition of (3, ‘a’), and then node-2’s transaction is REPLAYed.
  • The REPLAY is done using the pre-created write-set that only modifies the existing entries (1,2,4).

End-result:

mysql> select * from t;
+---+------+
| i | c    |
+---+------+
| 1 | pqr  |
| 2 | pqr  |
| 3 | a    |
| 4 | pqr  |
+---+------+

  • At first, nothing looks wrong. If you look closely, however, the REPLAYed transaction “UPDATE t set c = ‘pqr’” is the last to commit, yet its effect is not fully visible: there is still a row (3, ‘a’) that has ‘a’ instead of ‘pqr’.
| mysql-bin.000003 | 792 | Gtid | 2 | 857 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:4' |
| mysql-bin.000003 | 857 | Query | 2 | 925 | BEGIN |
| mysql-bin.000003 | 925 | Table_map | 2 | 972 | table_id: 219 (test.t) |
| mysql-bin.000003 | 972 | Write_rows | 2 | 1014 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1014 | Xid | 2 | 1045 | COMMIT /* xid=4 */ |
| mysql-bin.000003 | 1045 | Gtid | 3 | 1110 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:5' |
| mysql-bin.000003 | 1110 | Query | 3 | 1187 | BEGIN |
| mysql-bin.000003 | 1187 | Table_map | 3 | 1234 | table_id: 219 (test.t) |
| mysql-bin.000003 | 1234 | Update_rows | 3 | 1324 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1324 | Xid | 3 | 1355 | COMMIT /* xid=5 */ |
+------------------+------+----------------+-----------+-------------+---------------------------------------------------------------------------------+
21 rows in set (0.00 sec)

  • We have used a simple char string, but if there were a constraint here, for example that c should have the value X after the UPDATE completes, then the constraint would be violated even though the application reports the UPDATE as a success.
  • It is interesting to note what happens on node-1:
    • node-1 applies the local transaction (trx-1) and then gets the replicated write-set from node-2 (trx-2), which has changes only for (1,2,4). Thereby, data consistency is not compromised.
Apr 22, 2017

BEWARE: Increasing fc_limit can affect SELECT latency


In this blog post, we’ll look at how increasing the fc_limit can affect SELECT latency.

Introduction

Recent Percona XtraDB Cluster optimizations have exposed fc_limit contention. It was always there, but was never exposed as the Commit Monitor contention was more significant. As it happens with any optimization, once we solve the bigger contention issues, smaller contention issues start popping up. We have seen this pattern in InnoDB, and Percona XtraDB Cluster is no exception. In fact, it is good because it tells us that we are on the right track.

If you haven’t yet checked the performance blogs, then please visit here and here.

What is FC_LIMIT?

Percona XtraDB Cluster has the concept of Flow Control. If any member of the cluster (other than garbd) is unable to match the apply speed with the replicated write-set speed, then a queue builds up. If this queue crosses a threshold (dictated by gcs.fc_limit), then flow control kicks in. Flow control causes the members of the cluster to temporarily halt/slow down so that the slower node can catch up.

The user can, of course, disable this by setting wsrep_desync=1 on the slower node, but make sure you understand the effect of doing so. Unless you have a good reason, you should avoid setting it.
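
For illustration only, and only when you do have a good reason, this is a dynamic setting:

# Temporarily exclude the slow node from flow control, then re-enable it
mysql> set global wsrep_desync=ON;
...
mysql> set global wsrep_desync=OFF;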

mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name | Value |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 16, 16 ] |
+-----------------------------+------------+
1 row in set (0.01 sec)

Increasing fc_limit

Until recently, the default fc_limit was 16 (starting with Percona XtraDB Cluster 5.7.17-29.20, the default is 100). This was sufficient in the past, since Percona XtraDB Cluster did not scale well enough to hit the limit of 16 very often. With the new optimizations, Percona XtraDB Cluster nodes can process more write-sets in a given time period, and thereby can replicate more write-sets (anywhere in the range of three to ten times more). Of course, the replicating/slave nodes are also performing at a higher speed. But depending on the number of slave threads, it is easy to start hitting this limit.

So what is the solution?

  • Increase fc_limit from 16 to something really big. Say 1600.

Is this correct?

YES and NO.

Why YES?

  • If you don’t care about the freshness of data on the replicated nodes, then increasing the limit to a higher value is not an issue. Say setting it to 10K means that the replicating node is holding 10K write-sets to replicate, and a SELECT fired during this time will not view changes from these 10K write-sets.
  • But if you insist on having fresh data, then Percona XtraDB Cluster has a solution for this (set wsrep_sync_wait=7).
  • Setting wsrep_sync_wait places the SELECT request in a queue that is serviced only after the already-replicated write-sets (as of the point when the SELECT was fired) have been applied. If the queue has 8K write-sets, then the SELECT is placed at position 8K+1. As the queue progresses, the SELECT gets serviced only when all those 8K write-sets are done. This dramatically increases SELECT latency and can cause monitoring alarms to fire.

Why NO?

  • For the reason mentioned above, we feel it is not a good idea to increase the fc_limit beyond some value unless you don’t care about data freshness and in turn don’t care to set wsrep_sync_wait.
  • We did a small experiment with the latest Percona XtraDB Cluster release to understand the effects.
- Started 2 node cluster.
- Fired 64-threads workload on node-1 of the cluster.
- node-2 is acting as replicating slave without any active workload.
- Set wsrep_sync_wait=7 on node-2 to ensure data-freshness.
Using default fc_limit (= 16)
-----------------------------
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.03 sec)
Increasing it from 16 -> 1600
-----------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=1600";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.46 sec)
That is a whopping 15x increase in SELECT latency.
Increasing it even further (1600 -> 25000)
-------------------------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=25000";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (7.07 sec)

Note: wsrep_sync_wait=7 will enforce the check for all statements (INSERT/UPDATE/DELETE/SELECT). We highlighted the SELECT example, as that is the most concerning at first glance. But latency for the other statements also increases for the same reasons mentioned above.

Conclusion

Let’s conclude with the following observation:

  • Avoid increasing fc_limit to an insanely high value as it can affect SELECT latency (if you are running a SELECT session with wsrep_sync_wait=7 for data freshness).
Apr 21, 2017

Simplified Percona XtraDB Cluster SSL Configuration


In this blog post, we’ll look at a feature recently added to Percona XtraDB Cluster 5.7.16 that makes it easier to configure Percona XtraDB Cluster SSL for all related communications. It uses mode “encrypt=4”, and configures SSL for both IST/Galera communications and SST communications using the same SSL files. “encrypt=4” is a new encryption mode added in Percona XtraDB Cluster 5.7.16 (we’ll cover it in a later blog post).

If this option is used, it overrides all other Galera/SST SSL-related file options, ensuring that a consistent configuration is applied. Using this option also means that the Galera/SST communications use the same keys as client connections.

Example

This example shows how to startup a cluster using this option. We will use the default SSL files created by the bootstrap node. Basically, there are two steps:

  1. Set pxc_encrypt_cluster_traffic=ON on all nodes
  2. Ensure that all nodes share the same SSL files

Step 1: Configuration (on all nodes)

We enable the pxc_encrypt_cluster_traffic option in the configuration files on all nodes. The default value of this option is “OFF”, so we enable it here.

[mysqld]
 pxc_encrypt_cluster_traffic=ON

Step 2: Startup the bootstrap node

After initializing and starting up the bootstrap node, the datadir will contain the necessary data files. Here is some SSL-related log output:

[Note] Auto generated SSL certificates are placed in data directory.
 [Warning] CA certificate ca.pem is self signed.
 [Note] Auto generated RSA key files are placed in data directory.

The required files are ca.pem, server-cert.pem and server-key.pem, which are the Certificate Authority (CA) file, the server certificate and the server private key, respectively.

Step 3: Copy the SSL files to all other nodes

Galera views the cluster as a set of homogeneous nodes, so the same configuration is expected on all nodes. Therefore, we have to copy the CA file, the server’s certificate and the server’s private key. By default, MySQL names these: ca.pem, server-cert.pem, and server-key.pem, respectively.
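
For example, a quick sketch using scp (the host name and datadir path are illustrative):

# Copy the bootstrap node's SSL files into the second node's datadir
scp /var/lib/mysql/{ca.pem,server-cert.pem,server-key.pem} node2:/var/lib/mysql/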

Step 4: Startup the other nodes

This is some log output showing that the SSL certificate files have been found. The other nodes should be using the files that were created on the bootstrap node.

[Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
[Note] Skipping generation of SSL certificates as certificate files are present in data directory.
[Warning] CA certificate ca.pem is self signed.
[Note] Skipping generation of RSA key pair as key files are present in data directory.

This is some log output (with log_error_verbosity=3) showing the SST reporting on the configuration used.

WSREP_SST: [DEBUG] pxc_encrypt_cluster_traffic is enabled, using PXC auto-ssl configuration
WSREP_SST: [DEBUG] with encrypt=4 ssl_ca=/my/data//ca.pem ssl_cert=/my/data//server-cert.pem ssl_key=/my/data//server-key.pem

Customization

The “ssl-ca”, “ssl-cert”, and “ssl-key” options in the “[mysqld]” section can be used to specify the location of the SSL files. If these are not specified, then the datadir is searched (using the default names of “ca.pem”, “server-cert.pem” and “server-key.pem”).

[mysqld]
 pxc_encrypt_cluster_traffic=ON
 ssl-ca=/path/to/ca.pem
 ssl-cert=/path/to/server-cert.pem
 ssl-key=/path/to/server-key.pem

If you want to implement this yourself, the equivalent configuration file options are:

[mysqld]
wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem"
[sst]
encrypt=4
ssl-ca=ca.pem
ssl-cert=server-cert.pem
ssl-key=server-key.pem

How it works

  1. Determine the location of the SSL files
    1. Uses the values if explicitly specified (via the “ssl-ca”, “ssl-cert” and “ssl-key” options in the “[mysqld]” section)
    2. If the SSL file options are not specified, we look in the data directory for files named “ca.pem”, “server-cert.pem” and “server-key.pem” for the CA file, the server certificate, and the server key, respectively.
  2. Modify the configuration
    1. Overrides the values for socket.ssl_ca, socket.ssl_cert, and socket.ssl_key in wsrep_provider_options in the “[mysqld]” section.
    2. Sets “encrypt=4” in the “[sst]” section.
    3. Overrides the values for ssl-ca, ssl-cert and ssl-key in the “[sst]” section.

This is not a dynamic setting, and is only available on startup.

Apr 21, 2017

Enabling Percona XtraDB Cluster SST Traffic Encryption


In this blog post, we’ll look at enabling Percona XtraDB Cluster SST Traffic Encryption, and some of the changes to the SSL-based encryption of SST traffic in Percona XtraDB Cluster 5.7.16.

Some background

Percona XtraDB Cluster versions prior to 5.7 support encryption methods 0, 1, 2 and 3:

  • encrypt = 0 : (default) No encryption
  • encrypt = 1 : Symmetric encryption using AES-128, user-supplied key
  • encrypt = 2 : SSL-based encryption with a CA and cert files (via socat)
  • encrypt = 3 : SSL-based encryption with cert and key files (via socat)

We are deprecating modes encrypt=1,2,3 in favor of the new mode, encrypt=4. “encrypt=3” is not recommended, since it does not verify the cert being used (it cannot verify since no Certificate Authority (CA) file is provided). “encrypt=2” and “encrypt=3” use a slightly different way of building the SSL files than MySQL does. In order to remove confusion, we’ve deprecated these modes in favor of “encrypt=4”, which can use the MySQL generated SSL files.

New feature: encrypt= 4

The previous SSL methods (encrypt=2 and encrypt=3), are based on socat usage, http://www.dest-unreach.org/socat/doc/socat-openssltunnel.html. The certs are not built the same way as the certs created by MySQL (for encryption of client communication with MySQL). To simplify SSL configuration and usage, we added a new encryption method (encrypt=4) so that the SSL files generated by MySQL can now be used for SSL encryption of SST traffic.

For instructions on how to create these files, see https://www.percona.com/doc/percona-xtradb-cluster/LATEST/howtos/encrypt-traffic.html.
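
Alternatively, on MySQL/Percona Server 5.7 the bundled tool can generate a usable set of files directly in the datadir (the path below is illustrative):

# Generates ca.pem, server-cert.pem and server-key.pem (among others) in the datadir
mysql_ssl_rsa_setup --datadir=/var/lib/mysql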

Configuration

In general, Galera views the cluster as homogeneous, so it expects that all nodes are identically configured. This extends to the SSL configuration, so the preferred configuration is that all machines share the same certs/keys. The security implication is that possession of these certs/keys allows a machine to join the cluster and receive all of the data, so proper care must be taken to secure the SSL files.

The mode “encrypt=4” uses the same option names as MySQL, so it reads the SSL information from “ssl-ca”, “ssl-cert”, and “ssl-key” in the “[sst]” section of the configuration file.

Example my.cnf:

[sst]
 encrypt=4
 ssl-ca=/path/to/ca.pem
 ssl-cert=/path/to/server-cert.pem
 ssl-key=/path/to/server-key.pem

All three options (ssl-ca, ssl-cert, and ssl-key) must be specified otherwise the SST will return an error.

ssl-ca

This is the location of the Certificate Authority (CA) file. Only servers that have certificates generated from this CA file will be allowed to connect if SSL is enabled.

ssl-cert

This is the location of the Certificate file. This is the digital certificate that will be sent to the other side of the SSL connection. The remote server will then verify that this certificate was generated from the Certificate Authority file in use by the remote server.

ssl-key

This is the location of the private key for the certificate specified in ssl-cert.

Apr 20, 2017

Tracking IST Progress in Percona XtraDB Cluster


In this blog post, we’ll look at how Percona XtraDB Cluster uses IST.

Introduction

Percona XtraDB Cluster uses the concept of an Incremental State Transfer (IST). When a node of the cluster leaves the cluster for a short period of time, it can rejoin the cluster by getting the delta set of missing changes from any active node in the cluster.

This process of getting the delta set of changes is called IST in Percona XtraDB Cluster.

Tracking IST Progress

The number of write-sets/changes that the joining node needs to catch up on when rejoining the cluster is dictated by:

  1. The duration the node was not present in the cluster
  2. The workload of the cluster during that time frame

This catch-up process can be time-consuming. Until this process is complete, the rejoining node is not ready to process any active workloads.

We believe that any process that is time-consuming should have a progress monitor attached to it. This is exactly what we have done.

In the latest release of Percona XtraDB Cluster 5.7.17-29.20, we added an IST progress monitor that is exposed through SHOW STATUS. This helps you monitor the percentage of write-sets that have been applied by the rejoining node.

Let’s see this in a working example:

  • Start a two-node cluster
  • Process some basic workloads, allow cluster replication
  • Shutdown node-2
  • Node-1 then continues to process more workloads (the workload fits the allocated gcache)
  • Restart Node-2, causing it to trigger an IST
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| wsrep_ist_receive_status | 3% complete, received seqno 1421771 of 1415410-1589676 |
+--------------------------+--------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 52% complete, received seqno 1506799 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 97% complete, received seqno 1585923 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| wsrep_ist_receive_status | |
+--------------------------+-------+
1 row in set (0.00 sec)

As you can see, the wsrep_ist_receive_status monitoring string indicates the percentage completed, the currently received write-set, and the range of write-sets applicable to the IST.

Once the IST activity is complete, the variable shows an empty string.
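
A simple way to keep an eye on the progress from the shell (the polling interval and credentials handling are illustrative):

# Poll the IST progress every 5 seconds until the status string becomes empty
watch -n 5 "mysql -e \"show status like 'wsrep_ist_receive_status'\""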

Closing Comments

I hope you enjoy this newly added feature. Percona Engineering would be happy to hear from you about more such features that could help you make effective use of Percona XtraDB Cluster, and we will try our best to include them in our future plans (based on feasibility).

Note: Special thanks to Kenn Takara and Roel Van de Paar for helping me edit this post.
