Jan
19
2021
--

The MySQL Clone Wars: Plugin vs. Percona XtraBackup

Large replication topologies are quite common nowadays, and this kind of architecture often requires a quick method to rebuild a replica from another server.

The Clone Plugin, available since MySQL 8.0.17, is a great feature that allows cloning databases out of the box. It is easy to rebuild a replica or to add new nodes to a cluster using the plugin. Before the release of the plugin, the best open-source alternative was Percona XtraBackup for MySQL Databases.

In this blog post, we compare both alternatives for cloning purposes. If you need to perform backups, Percona XtraBackup is a better tool as it supports compression and incremental backups, among other features not provided by the plugin. The plugin supports compression only for network transmission, not for storage.

But one of the plugin’s strong points is simplicity. Once installed and configured, cloning a database is straightforward. Just issuing a command from the destination database is enough.
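To give an idea of that simplicity, here is a minimal sketch of the recipient-side commands (host names, the clone user, and the password are placeholders, not values from our tests; the clone user needs BACKUP_ADMIN on the donor and CLONE_ADMIN on the recipient):

INSTALL PLUGIN clone SONAME 'mysql_clone.so';
SET GLOBAL clone_valid_donor_list = 'source_host:3306';
CLONE INSTANCE FROM 'clone_user'@'source_host':3306 IDENTIFIED BY 'clone_password';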

Percona XtraBackup, on the other hand, is a more complex tool. The cloning process involves several stages: backup, stream, write, and prepare. These stages can take place in parallel: we can stream the backup to the new server using netcat and, at the same time, write it into the destination directory. The only sequential stage is the last one: prepare.
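As a sketch of that pipeline (host names, the port, and the paths are placeholders; the listening syntax may vary between netcat flavors):

# On the destination server: listen and unpack the stream in parallel
nc -l 9999 | xbstream -x -C /var/lib/mysql --parallel=4

# On the source server: stream the backup over the network
xtrabackup --backup --stream=xbstream --parallel=4 --target-dir=./ | nc destination_host 9999

# On the destination, once the stream has finished: the sequential prepare stage
xtrabackup --prepare --target-dir=/var/lib/mysql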

Test Characteristics

We used sysbench to create 200 tables of 124 MB each, for a total of roughly 24 GB. Both the source and replica virtual machines have 4 cores, 8 GB of RAM, and 60 GB of storage. We created the disks on the same datastore.

During the tests, we did not generate additional operations on the database. We measured only the clone process, reducing the benchmark complexity. Otherwise, we would have to take into consideration things like application response time, or the number of transactions executed. This is beyond the scope of this assessment.

We tested different combinations of clone and Percona XtraBackup operations. For XtraBackup, we tested 1 to 4 threads, with and without compression. In the case of compression, we allocated the same number of threads to compression and decompression. For the clone plugin, we tested auto (which lets the server decide how many threads will perform the clone) and 1 to 4 threads, also with and without compression. Finally, we executed all the tests using three different network limits: 500 Mbps, 1,000 Mbps, and 4,000 Mbps. This makes a total of 54 tests, executed 12+1 times each.

All times are in seconds. In the graphs below, lower values are better.

Method

Clone

Of the parameters that control the clone plugin, the following were configured on the recipient server:

  • clone_max_concurrency=<maximum number of threads> Defines the maximum number of threads used for a remote cloning operation when autotuning is enabled. Otherwise, it is the exact number of threads that remote cloning uses.
  • clone_autotune_concurrency If enabled (the default), the clone operation uses up to clone_max_concurrency threads; the default for clone_max_concurrency is 16.
  • clone_enable_compression If enabled, the remote clone operation uses compression.
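All three are dynamic, so a given test combination can be set on the recipient before issuing the clone command; for example (the values shown are just one of the tested combinations):

SET GLOBAL clone_autotune_concurrency = OFF;
SET GLOBAL clone_max_concurrency = 4;
SET GLOBAL clone_enable_compression = ON;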

Percona XtraBackup

To stream the backup we used the xbstream format and sent the data to the remote server using netcat. We applied the following parameters:

  • parallel=<number of threads> Xtrabackup and xbstream parameter that defines the number of threads used for backup and restore operations.
  • rebuild-threads The number of threads used for the rebuild (prepare) operation.
  • decompress_threads and compress_threads Xtrabackup and xbstream parameters that define the number of threads used for compression and decompression operations.
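These map onto command-line options; with compression enabled, the source side of the streaming pipeline shown earlier would look roughly like this (4 threads, as in one of the tested combinations; the host name is a placeholder):

xtrabackup --backup --stream=xbstream --parallel=4 --compress --compress-threads=4 --target-dir=./ | nc destination_host 9999

On the destination, the stream is unpacked with xbstream and decompressed (xtrabackup --decompress with --parallel) before the prepare stage.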

Some people use additional parameters like innodb-read-io-threads, innodb-write-io-threads, or innodb-io-capacity, but these parameters only affect the behavior of InnoDB background threads. They have no impact during backup and restore operations.

Results

Clone

No compression

For the lower bandwidth tests, the number of threads used does not make a difference. Once we increase the bandwidth, the time is cut in half when we move from one thread to two. Going beyond that improves things only slightly; we probably reach the disk I/O limit.

Clone Plugin performance without compression.

The auto option is consistently the fastest one.

Compression

Compression is supposed to improve performance for lower bandwidth connections, but we see that this is not the case here. Bandwidth has no impact on execution time, and compression makes the clone slower. Again, auto gives the best results, equivalent to 4 threads.
clone plugin with compression

Percona XtraBackup

No Compression

Without compression, we see again that the number of threads does not make any difference in the lower bandwidth test. When we increase bandwidth, the number of threads matters, but we quickly reach I/O limits.

Percona Xtrabackup stream without compression

Compression

When using compression, the process takes less time to complete in almost every case compared with the uncompressed option, even when bandwidth is not the limiting factor.

Percona Xtrabackup stream with compression

Conclusion

We see that, when using compression, the clone plugin is the slower option, while Percona XtraBackup gives great results for all bandwidths. Without compression, the clone plugin is faster when using more than two threads, and XtraBackup is faster with fewer threads.

Xtrabackup vs. Clone plugin - results summary

Below, we have a chart comparing the worst and best results. As expected, the worst results correspond to single-thread executions.

The Clone Plugin is a great option for simplicity. Percona XtraBackup is excellent to save bandwidth and provides better results with fewer threads. With enough threads and bandwidth available, both solutions provide comparable results.

Jan
05
2021
--

MongoDB 101: 5 Configuration Options That Impact Performance and How to Set Them

As with any database platform, MongoDB performance is of paramount importance to keeping your application running quickly. In this blog post, we’ll show you five configuration options that can impact the performance of your MongoDB deployment and help keep your database fast and performing at its peak.

MongoDB Performance Overview

MongoDB performance depends on several factors: OS settings, DB configuration settings, DB internal settings, memory settings, and application settings. This post focuses on the MongoDB configuration options related to performance and how to set them; these are options set in the database configuration itself that can impact how your deployment performs.

Configuration Options

So how do we ensure our performance configuration options are enabled or set up correctly?  And which ones are the most important?  We’ll now go through five configuration options that will help your MongoDB environment be performant!

MongoDB uses a configuration file in the YAML file format.  The configuration file is usually found in the following locations, depending on your Operating System:

DEFAULT CONFIGURATION FILE

  • On Linux, a default /etc/mongod.conf configuration file is included when using a package manager to install MongoDB.
  • On Windows, a default <install directory>/bin/mongod.cfg configuration file is included during the installation.
  • On macOS, a default /usr/local/etc/mongod.conf configuration file is included when installing from MongoDB’s official Homebrew tap.

 

storage.wiredTiger.engineConfig.cacheSizeGB

Our first configuration option to help with your MongoDB performance is storage.wiredTiger.engineConfig.cacheSizeGB.

storage:
   wiredTiger:
       engineConfig:
           cacheSizeGB: <value>

WiredTiger has been MongoDB’s default storage engine since MongoDB 3.2 (it was introduced as an option in 3.0), so we’ll be examining MongoDB memory performance from a WiredTiger perspective. By default, MongoDB reserves 50% of the available memory minus 1 GB for the WiredTiger cache, or 256 MB, whichever is greater. For example, a system with 16 GB of RAM would have a WiredTiger cache size of 7.5 GB:

( 0.5 * (16-1) )

The size of this cache is important to ensure WiredTiger is performant. It’s worth taking a look to see if you should alter it from the default. A good rule of thumb is that the size of the cache should be large enough to hold the entire application working set.

Note that if you’re in a containerized environment, you may also need to set the option to 50% minus 1 GB of the memory that is available to the container. MongoDB may honor the container’s memory limit, or it may see the host’s memory limit, depending on what the system call returns when MongoDB asks. You can verify what MongoDB believes the memory limit is by running:

db.hostInfo()

and checking the hostInfo.system.memLimitMB value. This field is available from MongoDB 3.6.13+ and MongoDB 4.0.9+ onward.
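As a quick sanity check, this mongo shell sketch applies the same 50% minus 1 GB rule (with the 256 MB floor) to whatever memory limit MongoDB detected:

// print the detected memory limit and the cache size the default rule would produce
var memMB = db.hostInfo().system.memLimitMB;
var cacheGB = Math.max(0.5 * (memMB / 1024 - 1), 0.25);
print("detected memory limit (MB): " + memMB);
print("default-rule cache size (GB): " + cacheGB.toFixed(2));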

How do we know whether to increase or decrease our cache size? Look at the cache usage statistics:

db.serverStatus().wiredTiger.cache
{
"application threads page read from disk to cache count" : 9,
"application threads page read from disk to cache time (usecs)" : 17555,
"application threads page write from cache to disk count" : 1820,
"application threads page write from cache to disk time (usecs)" : 1052322,
"bytes allocated for updates" : 20043,
"bytes belonging to page images in the cache" : 46742,
"bytes belonging to the history store table in the cache" : 173,
"bytes currently in the cache" : 73044,
"bytes dirty in the cache cumulative" : 38638327,
"bytes not belonging to page images in the cache" : 26302,
"bytes read into cache" : 43280,
"bytes written from cache" : 20517382,
"cache overflow score" : 0,
"checkpoint blocked page eviction" : 0,
"eviction calls to get a page" : 5973,
"eviction calls to get a page found queue empty" : 4973,
"eviction calls to get a page found queue empty after locking" : 20,
"eviction currently operating in aggressive mode" : 0,
"eviction empty score" : 0,
"eviction passes of a file" : 0,
"eviction server candidate queue empty when topping up" : 0,
"eviction server candidate queue not empty when topping up" : 0,
"eviction server evicting pages" : 0,
"eviction server slept, because we did not make progress with eviction" : 735,
"eviction server unable to reach eviction goal" : 0,
"eviction server waiting for a leaf page" : 2,
"eviction state" : 64,
"eviction walk target pages histogram - 0-9" : 0,
"eviction walk target pages histogram - 10-31" : 0,
"eviction walk target pages histogram - 128 and higher" : 0,
"eviction walk target pages histogram - 32-63" : 0,
"eviction walk target pages histogram - 64-128" : 0,
"eviction walk target strategy both clean and dirty pages" : 0,
"eviction walk target strategy only clean pages" : 0,
"eviction walk target strategy only dirty pages" : 0,
"eviction walks abandoned" : 0,
"eviction walks gave up because they restarted their walk twice" : 0,
"eviction walks gave up because they saw too many pages and found no candidates" : 0,
"eviction walks gave up because they saw too many pages and found too few candidates" : 0,
"eviction walks reached end of tree" : 0,
"eviction walks started from root of tree" : 0,
"eviction walks started from saved location in tree" : 0,
"eviction worker thread active" : 4,
"eviction worker thread created" : 0,
"eviction worker thread evicting pages" : 902,
"eviction worker thread removed" : 0,
"eviction worker thread stable number" : 0,
"files with active eviction walks" : 0,
"files with new eviction walks started" : 0,
"force re-tuning of eviction workers once in a while" : 0,
"forced eviction - history store pages failed to evict while session has history store cursor open" : 0,
"forced eviction - history store pages selected while session has history store cursor open" : 0,
"forced eviction - history store pages successfully evicted while session has history store cursor open" : 0,
"forced eviction - pages evicted that were clean count" : 0,
"forced eviction - pages evicted that were clean time (usecs)" : 0,
"forced eviction - pages evicted that were dirty count" : 0,
"forced eviction - pages evicted that were dirty time (usecs)" : 0,
"forced eviction - pages selected because of too many deleted items count" : 0,
"forced eviction - pages selected count" : 0,
"forced eviction - pages selected unable to be evicted count" : 0,
"forced eviction - pages selected unable to be evicted time" : 0,
"forced eviction - session returned rollback error while force evicting due to being oldest" : 0,
"hazard pointer blocked page eviction" : 0,
"hazard pointer check calls" : 902,
"hazard pointer check entries walked" : 25,
"hazard pointer maximum array length" : 1,
"history store key truncation calls that returned restart" : 0,
"history store key truncation due to mixed timestamps" : 0,
"history store key truncation due to the key being removed from the data page" : 0,
"history store score" : 0,
"history store table insert calls" : 0,
"history store table insert calls that returned restart" : 0,
"history store table max on-disk size" : 0,
"history store table on-disk size" : 0,
"history store table out-of-order resolved updates that lose their durable timestamp" : 0,
"history store table out-of-order updates that were fixed up by moving existing records" : 0,
"history store table out-of-order updates that were fixed up during insertion" : 0,
"history store table reads" : 0,
"history store table reads missed" : 0,
"history store table reads requiring squashed modifies" : 0,
"history store table remove calls due to key truncation" : 0,
"history store table writes requiring squashed modifies" : 0,
"in-memory page passed criteria to be split" : 0,
"in-memory page splits" : 0,
"internal pages evicted" : 0,
"internal pages queued for eviction" : 0,
"internal pages seen by eviction walk" : 0,
"internal pages seen by eviction walk that are already queued" : 0,
"internal pages split during eviction" : 0,
"leaf pages split during eviction" : 0,
"maximum bytes configured" : 8053063680,
"maximum page size at eviction" : 376,
"modified pages evicted" : 902,
"modified pages evicted by application threads" : 0,
"operations timed out waiting for space in cache" : 0,
"overflow pages read into cache" : 0,
"page split during eviction deepened the tree" : 0,
"page written requiring history store records" : 0,
"pages currently held in the cache" : 24,
"pages evicted by application threads" : 0,
"pages queued for eviction" : 0,
"pages queued for eviction post lru sorting" : 0,
"pages queued for urgent eviction" : 902,
"pages queued for urgent eviction during walk" : 0,
"pages read into cache" : 20,
"pages read into cache after truncate" : 902,
"pages read into cache after truncate in prepare state" : 0,
"pages requested from the cache" : 33134,
"pages seen by eviction walk" : 0,
"pages seen by eviction walk that are already queued" : 0,
"pages selected for eviction unable to be evicted" : 0,
"pages selected for eviction unable to be evicted as the parent page has overflow items" : 0,
"pages selected for eviction unable to be evicted because of active children on an internal page" : 0,
"pages selected for eviction unable to be evicted because of failure in reconciliation" : 0,
"pages walked for eviction" : 0,
"pages written from cache" : 1822,
"pages written requiring in-memory restoration" : 0,
"percentage overhead" : 8,
"tracked bytes belonging to internal pages in the cache" : 5136,
"tracked bytes belonging to leaf pages in the cache" : 67908,
"tracked dirty bytes in the cache" : 493,
"tracked dirty pages in the cache" : 1,
"unmodified pages evicted" : 0
}

There’s a lot of data here about WiredTiger’s cache, but we can focus on a handful of fields: maximum bytes configured, bytes currently in the cache, tracked dirty bytes in the cache, pages read into cache, and pages written from cache.

Looking at these values, we can determine whether we need to increase the size of the WiredTiger cache for our instance. Additionally, for read-heavy applications we can look at the wiredTiger.cache.pages read into cache value; if this value is consistently high, increasing the cache size may improve overall read performance.
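A small mongo shell sketch that pulls out just those fields (the field names are taken from the output above):

var c = db.serverStatus().wiredTiger.cache;
print("configured maximum (GB): " + (c["maximum bytes configured"] / 1024 / 1024 / 1024).toFixed(2));
print("currently in cache (GB): " + (c["bytes currently in the cache"] / 1024 / 1024 / 1024).toFixed(2));
print("dirty in cache (MB): " + (c["tracked dirty bytes in the cache"] / 1024 / 1024).toFixed(2));
print("pages read into cache: " + c["pages read into cache"]);
print("pages written from cache: " + c["pages written from cache"]);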

storage.wiredTiger.engineConfig.directoryForIndexes

Our second configuration option is storage.wiredTiger.engineConfig.directoryForIndexes.

storage:
   wiredTiger:
       engineConfig:
           directoryForIndexes: <true or false>

Setting this value to true creates two directories in your storage.dbPath directory: one named collection, which holds your collection data files, and another named index, which holds your index data files. This allows you to create separate storage volumes for collections and indexes if you wish, spreading the disk I/O across the volumes. It can help separate index I/O from collection I/O and reduce storage-based latencies, although index I/O is unlikely to be costly due to its smaller size; with most modern storage options you can get similar benefits by simply striping your disk across two volumes (RAID 0).
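Note that this option has to be set before the node’s data files are created (or the node re-synced), since existing files are not moved. As a hedged sketch, dedicating a separate volume to the index directory on Linux could look like this (device names, paths, and the service name are examples only):

systemctl stop mongod
mkfs.xfs /dev/nvme1n1                            # example device dedicated to index files
mount /dev/nvme1n1 /var/lib/mongodb/index
chown -R mongodb:mongodb /var/lib/mongodb/index
systemctl start mongod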

storage.wiredTiger.collectionConfig.blockCompressor

Our third configuration option is storage.wiredTiger.collectionConfig.blockCompressor.

storage:
   wiredTiger:
       collectionConfig:
           blockCompressor: <value>

This option sets the compression algorithm for all of your collection data. Possible values for this parameter are none, snappy (the default), zlib, and zstd (available from MongoDB 4.2). So how does compression help your performance? The WiredTiger cache generally stores data uncompressed, with the exception of some very large documents, and that uncompressed data then needs to be written to disk.

Compression Types:

Snappy compression is fairly straightforward: it gathers data up to a maximum of 32 KB, compresses it, and, if compression is successful, writes the block rounded up to the nearest 4 KB.

Zlib compression works a little differently; it will gather more data and compress enough to fill a 32KB block on disk. This is more CPU-intensive but generally results in better compression ratios (independent of the inherent differences between snappy and zlib).

Zstd is a newer compression algorithm, developed by Facebook, that offers improvements over zlib (better compression ratios, lower CPU usage, and faster performance).

Which compression algorithm to choose depends greatly on your workload.  For most write-heavy workloads, snappy compression will perform better than zlib and zstd but will require more disk space.  For read-heavy workloads, zstd is often the best choice because of its better decompression rates.
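The setting only applies to collections created after it takes effect; if needed, you can also override it per collection at creation time. A hedged example (the collection name is made up):

// create one collection with zlib while the instance default remains unchanged
db.createCollection("events", {
  storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
})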

storage.directoryPerDB

Another configuration option to help with MongoDB performance is storage.directoryPerDB.

storage:
   directoryPerDB: <true or false>

Similar to the configuration option above, storage.wiredTiger.engineConfig.directoryForIndexes, setting this value to true creates a separate directory in your storage.dbPath for each database in your MongoDB instance. This allows you to create separate storage volumes for each database if you wish, spreading the disk I/O across the volumes. It can help when you have multiple databases with intensive I/O needs. Additionally, if you use this parameter in tandem with storage.wiredTiger.engineConfig.directoryForIndexes, your directory structure will look like this:

-Database_name
    -Collection
    -Index

net.compression.compressors

Our final configuration option that can help keep your database performant is the net.compression.compressors configuration option.

net:
   compression:
       compressors: <value>

This option allows you to compress the network traffic between your mongos, mongod, and even your mongo shell. There are currently three types of compression available: snappy, zlib, and zstd. Compression has been enabled by default since MongoDB 3.6. In MongoDB 3.6 and 4.0, snappy was the default; since MongoDB 4.2, the default is the snappy, zstd, and zlib compressors, in that order.

It’s also important to note that both sides of a network conversation must share at least one compressor for compression to happen. For example, if your shell uses zlib compression but your mongod is set to only accept snappy compression, then no compression will occur between the two; if both accept zstd compression, then zstd will be used.

When compression is enabled, it can be very helpful in reducing replication lag and overall network latency, as the size of the data moving across the network is decreased, sometimes dramatically. In cloud environments, setting this configuration option can also lead to decreased data transfer costs.
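For example, to prefer zstd while still accepting the others, the server side could be configured like this, with the client asking for a matching compressor (shell option availability depends on your client version):

net:
   compression:
       compressors: zstd,snappy,zlib

mongo --networkMessageCompressors=zstd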

Summary:

In this blog post, we’ve gone over five MongoDB configuration options to ensure you have a more performant MongoDB deployment.  We hope that these configuration options will help you build more performant MongoDB deployments and avoid slowdowns and bottlenecks.   Thanks for reading!

Additional Resources: MongoDB Best Practices 2020 Edition

Aug
05
2019
--

Webinar 8/7: Performance Analyses and Troubleshooting Technologies for Databases

Please join Percona CEO Peter Zaitsev as he presents “Performance Analyses and Troubleshooting Technologies for Databases” on Wednesday, August 7th, 2019 at 11:00 AM PDT (UTC-7).

Register Now

Have you heard about the USE Method (Utilization – Saturation – Errors), RED (Rate – Errors – Duration), or Golden Signals (Latency – Traffic – Errors – Saturation)?

In this presentation, we will talk briefly about these different-but-similar “focuses” and discuss how we can apply them to data infrastructure performance analysis, troubleshooting, and monitoring.

We will use MySQL as an example, but most of this talk applies to other database technologies as well.

If you can’t attend, sign up anyway and we’ll send you the slides and the recording afterward.

Feb
22
2019
--

Measuring Percona Server for MySQL On-Disk Decryption Overhead

Percona Server for MySQL 8.0 comes with enterprise grade total data encryption features. However, there is always the question of how much overhead – or performance penalty – comes with the data decryption. As we saw in my networking performance post, SSL under high concurrency might be problematic. Is this the case for data decryption?

To measure any overhead, I will start with a simplified read-only workload, where data gets decrypted during read IO.

MySQL decryption schematic

During query execution, the data in memory is already decrypted so there is no additional processing time. The decryption happens only for blocks that require a read from storage.

For the benchmark I will use the following workload:

sysbench oltp_read_only --mysql-ssl=off --tables=20 --table-size=10000000 --threads=$i --time=300 --report-interval=1 --rand-type=uniform run

The data size for this workload is about 50GB, so I will use

innodb_buffer_pool_size = 5GB

to emulate heavy disk read IO during the benchmark. In the second run, I will use

innodb_buffer_pool_size = 60GB

so that all data is kept in memory and there are no disk read IO operations.

I will only use table-level encryption at this time (i.e., no encryption for the binary log, system tablespace, or redo and undo logs).
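For reference, table-level encryption relies on a keyring plugin plus a per-table ENCRYPTION attribute; a minimal sketch (file paths and the table name below are placeholders, not my test configuration) looks like this:

# my.cnf
early-plugin-load=keyring_file.so
keyring_file_data=/var/lib/mysql-keyring/keyring

-- then, per table:
ALTER TABLE sbtest1 ENCRYPTION='Y';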

The server I am using has AES hardware CPU acceleration. Read more at https://en.wikipedia.org/wiki/AES_instruction_set

Benchmark N1, heavy read IO

benchmark heavy IO percona server for mysql 8 encryption

Threads encrypted storage no encryption encryption overhead
1 389.11 423.47 1.09
4 1531.48 1673.2 1.09
16 5583.04 6055 1.08
32 8250.61 8479.61 1.03
64 8558.6 8574.43 1.00
96 8571.55 8577.9 1.00
128 8570.5 8580.68 1.00
256 8576.34 8585 1.00
512 8573.15 8573.73 1.00
1024 8570.2 8562.82 1.00
2048 8422.24 8286.65 0.98

Benchmark N2, data in memory, no read IO

benchmark data in memory percona server for mysql 8 encryption

Threads Encryption No encryption
1 578.91 567.65
4 2289.13 2275.12
16 8304.1 8584.06
32 13324.02 13513.39
64 20007.22 19821.48
96 19613.82 19587.56
128 19254.68 19307.82
256 18694.05 18693.93
512 18431.97 18372.13
1024 18571.43 18453.69
2048 18509.73 18332.59

Observations

For a high number of threads, there is no measurable difference between encrypted and unencrypted storage. This is because a lot of CPU resources are spent on contention and waits, so the relative time spent in decryption is negligible.

However, we can see some performance penalty for a low number of threads: up to 9%, even with hardware-accelerated decryption. When data fully fits into memory, there is no measurable difference between encrypted and unencrypted storage.

So if you have hardware support then you should see little impact when using storage encryption with MySQL. The easiest way to check if you have support for this is to look at CPU flags and search for ‘aes’ string:

> lscpu | grep aes
Flags: ... tsc_deadline_timer aes xsave avx f16c ...

Aug
07
2018
--

Resource Usage Improvements in Percona Monitoring and Management 1.13

In Percona Monitoring and Management (PMM) 1.13 we have adopted Prometheus 2, and with this comes a dramatic improvement in resource usage, along with performance improvements!

What does it mean for you? This means you can have a significantly larger number of servers and database instances monitored by the same PMM installation. Or you can reduce the instance size you use to monitor your environment and save some money.

Let’s look at some stats!

CPU Usage

PMM 1.13 reduction in CPU usage by 5x

Percona Monitoring and Management 1.13 reduction in CPU usage after adopting Prometheus 2 by 8x

We can see an approximate 5x and 8x reduction of CPU usage on these two PMM Servers. Depending on the workload, we see CPU usage reductions ranging between 3x and 10x.

Disk Writes

There is also less disk write bandwidth required:

PMM 1.13 reduction in disk write bandwidth

On this instance, the bandwidth reduction is “just” 1.5x. Note this is disk IO for the entire PMM system, which includes more than just the Prometheus component. Prometheus 2 itself promises a much more significant IO bandwidth reduction, according to official benchmarks.

According to the same benchmark, you should expect disk space usage reduction by 33-50% for Prometheus 2 vs Prometheus 1.8. The numbers will be less for Percona Monitoring and Management, as it also stores Query Statistics outside of Prometheus.

Resource usage on the monitored hosts

Also, resource usage on the monitored hosts is significantly reduced:

Percona Monitoring and Management 1.13 reduction of resource usage by Prometheus 2

Why does CPU usage go down on a monitored host with a Prometheus 2 upgrade? This is because PMM uses TLS for the Prometheus to monitored host communication. Before Prometheus 2, a full handshake was performed for every scrape, taking a lot of CPU time. This was optimized with Prometheus 2, resulting in a dramatic CPU usage decrease.

Query performance is also a lot better with Prometheus 2, meaning dashboards load visually a lot faster, though we did not run any specific benchmarks here to share hard numbers. Note, though, that this improvement only applies when you’re querying data stored in Prometheus 2.

If you’re querying data that was originally stored in Prometheus 1.8, it will go through the “Remote Read” interface, which is quite a bit slower and uses a lot more CPU and memory resources.

If you love better efficiency and performance, consider upgrading to PMM 1.13!

Aug
01
2018
--

Saving With MyRocks in The Cloud

The main focus of a previous blog post was the performance of MyRocks when using fast SSD devices. However, I figured that MyRocks would be beneficial for use in cloud workloads, where storage is either slow or expensive.

In that earlier post, we demonstrated the benefits of MyRocks, especially for heavy IO workloads. Meanwhile, Mark wrote in his blog that the CPU overhead in MyRocks might be significant for CPU-bound workloads, but that this should not be an issue for IO-bound workloads.

In the cloud, the cost of resources is a major consideration. Let’s review the annual cost of the processing and storage resources:

Resource                  Cost/year, $   IO cost, $/year   Total, $/year
c5.9xlarge                7881           –                 7881
1TB io1 5000 IOPS         1500           3900              5400
1TB io1 10000 IOPS        1500           7800              9300
1TB io1 15000 IOPS        1500           11700             13200
1TB io1 20000 IOPS        1500           15600             17100
1TB io1 30000 IOPS        1500           23400             24900
3.4TB GP2 (10000 IOPS)    4800           –                 4800

 

The scenario

The server version is Percona Server 5.7.22

For instances, I used c5.9xlarge instances. The reason for c5 was that it provides high performance Nitro virtualization: Brendan Gregg describes this in his blog post. The rationale for 9xlarge instances was to be able to utilize io1 volumes with a 30000 IOPS throughput – smaller instances will cap io1 throughput at a lower level.

I also used huge gp2 volumes: 3400GB, as this volume provides guaranteed 10000 IOPS even if we do not use io1 volumes. This is a cheaper alternative to io1 volumes to achieve 10000 IOPS.

For the workload, I used sysbench-tpcc 5000W (50 tables * 100W), which for InnoDB used about 471GB of storage space.
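For reference, a sketch of the sysbench-tpcc invocation (connection options and the exact values here are illustrative, not copied from my test scripts):

./tpcc.lua --mysql-user=root --mysql-db=sbt --db-driver=mysql --tables=50 --scale=100 --threads=64 prepare
./tpcc.lua --mysql-user=root --mysql-db=sbt --db-driver=mysql --tables=50 --scale=100 --threads=64 --time=300 --report-interval=1 run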

For the cache, I used 27GB and 54GB buffer pool sizes, so the workload is IO-heavy.

I wanted to compare how InnoDB and RocksDB performed under this scenario.

If you are curious I prepared my terraform+ansible deployment files here: https://github.com/vadimtk/terraform-ansible-percona

Before jumping to the results, I should note that for MyRocks I used LZ4 compression for all levels, which results in a final data size of 91GB. That is five times smaller than the InnoDB size. This alone provides operational benefits; for example, copying the InnoDB files (471GB) from a backup volume takes longer than an hour, while for MyRocks it is about five times faster.
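The compression is configured through the MyRocks column family options; a hedged sketch of the my.cnf fragment (the exact option string is an assumption, not copied from my configs):

# assumption: LZ4 on every level, including the bottommost one
rocksdb_default_cf_options=compression=kLZ4Compression;bottommost_compression=kLZ4Compression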

The benchmark results

So let’s review the results.

InnoDB versus MyRocks throughput in the cloud

Or presenting average throughput in a tabular form:

cachesize IOPS engine avg TPS
27 5000 innodb 132.66
27 5000 rocksdb 481.03
27 10000 innodb 285.93
27 10000 rocksdb 1224.14
27 10000gp2 innodb 227.19
27 10000gp2 rocksdb 1268.89
27 15000 innodb 436.04
27 15000 rocksdb 1839.66
27 20000 innodb 584.69
27 20000 rocksdb 2336.94
27 30000 innodb 753.86
27 30000 rocksdb 2508.97
54 5000 innodb 197.51
54 5000 rocksdb 667.63
54 10000 innodb 433.99
54 10000 rocksdb 1600.01
54 10000gp2 innodb 326.12
54 10000gp2 rocksdb 1559.98
54 15000 innodb 661.34
54 15000 rocksdb 2176.83
54 20000 innodb 888.74
54 20000 rocksdb 2506.22
54 30000 innodb 1097.31
54 30000 rocksdb 2690.91

 

We can see that MyRocks outperformed InnoDB in every single combination, but it is also important to note the following:

MyRocks on io1 5000 IOPS showed the performance that InnoDB showed in io1 15000 IOPS.

That means that InnoDB requires three times more in storage throughput. If we take a look at the storage cost, it corresponds to three times more expensive storage. Given that MyRocks requires less storage, it is possible to save even more on storage capacity.

On the most economical storage (3400GB gp2, which will provide 10000 IOPS) MyRocks showed 4.7 times better throughput.

For the 30000 IOPS storage, MyRocks was still better by 2.45 times.

However, it is worth noting that MyRocks showed a greater variance in throughput during the runs. Let’s review the charts with 1-second resolution for gp2 and io1 30000 IOPS storage:

Throughput at 1-second resolution for gp2 and io1 30000 IOPS storage, MyRocks versus InnoDB

Such variance might be problematic for workloads that require stable throughput and where periodical slowdowns are unacceptable.

Conclusion

MyRocks is suitable and beneficial not only for fast SSD, but also for cloud deployments. By requiring less IOPS, MyRocks can provide better performance and save on the storage costs.

However, before evaluating MyRocks, make sure that your workload is IO-bound, i.e., the working set is much bigger than the available memory. For CPU-intensive workloads (where the working set fits into memory), MyRocks will be less beneficial or may even perform worse than InnoDB (as described in the blog post A Look at MyRocks Performance).

Jun
18
2018
--

Webinar Tues 19/6: MySQL: Scaling and High Availability – Production Experience from the Last Decade(s)

Please join Percona’s CEO, Peter Zaitsev as he presents MySQL: Scaling and High Availability – Production Experience Over the Last Decade(s) on Tuesday, June 19th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

Percona is known as the MySQL performance experts. With over 4,000 customers, we’ve studied, mastered and executed many different ways of scaling applications. Percona can help ensure your application is highly available. Come learn from our playbook, and leave this talk knowing your MySQL database will run faster and more optimized than before.

Register Now

About Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

May
07
2018
--

Webinar Wednesday, May 9, 2018: MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM)

Please join Percona’s CEO, Peter Zaitsev, as he presents MySQL Troubleshooting and Performance Optimization with PMM on Wednesday, May 9, 2018, at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

Optimizing MySQL performance and troubleshooting MySQL problems are two of the most critical and challenging tasks for MySQL DBAs. The databases powering your applications must handle heavy traffic loads while remaining responsive and stable so that you can deliver an excellent user experience. Further, DBAs’ bosses expect solutions that are cost-efficient.

In this webinar, Peter discusses how you can optimize and troubleshoot MySQL performance and demonstrates how Percona Monitoring and Management (PMM) enables you to solve these challenges using free and open source software. We will look at specific, common MySQL problems and review the essential components in PMM that allow you to diagnose and resolve them.

Register for the webinar now.

Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. The Inc. 5000 recognized Percona in 2013, 2014, 2015 and 2016. Peter was an early employee at MySQL AB, eventually leading the company’s High-Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High-Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He was also tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization is one of percona.com’s most popular downloads.

Jan
18
2018
--

Does the Meltdown Fix Affect Performance for MySQL on Bare Metal?

In this blog post, we’ll look at whether the Meltdown fix affects performance for MySQL on bare metal servers.

Since the news about the Meltdown bug broke, there have been a lot of reports on the performance hit from the proposed fixes. We have looked at how the fix affects MySQL (Percona Server for MySQL) under a sysbench workload.

In this case, we used bare metal boxes with the following specifications:

  • Two-socket Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz (in total 56 entries in /proc/cpuinfo)
  • Ubuntu 16.04
  • Memory: 256GB
  • Storage: Samsung SM863 1.9TB SATA SSD
  • Percona Server for MySQL 5.7.20
  • Kernel (vulnerable) 4.13.0-21
  • Kernel (with Meltdown fix) 4.13.0-25

Please note, the current kernel for Ubuntu 16.04 contains only a Meltdown fix, and not one for Spectre.

The database size is 100GB, in a sysbench workload with 100 tables of 4 million rows each, using a Pareto distribution. We validated the kernels with the https://github.com/speed47/spectre-meltdown-checker tool.
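For completeness, running the checker amounts to fetching and executing a single script as root (a sketch; the script lives in the repository linked above):

wget https://raw.githubusercontent.com/speed47/spectre-meltdown-checker/master/spectre-meltdown-checker.sh
sudo sh spectre-meltdown-checker.sh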

We used both a Unix socket connection and a TCP host connection to measure the possible overhead of the TCP network connection. We also performed read-write and read-only benchmarks.

The results are below for a various number of threads:

Meltdown Fix Affect Performance

Where

  • Nokpti: kernel without KPTI patch (4.13.0-21)
  • Pti: kernel with KPTI patch (4.13.0-25), with PTI enabled
  • Nopti: kernel with KPTI patch (4.13.0-25), with PTI disabled
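For reference, the nopti runs correspond to booting the patched kernel with page-table isolation disabled through a kernel boot parameter; on Ubuntu that looks roughly like this (the existing options in GRUB_CMDLINE_LINUX_DEFAULT are only an example):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nopti"

sudo update-grub
sudo reboot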

 

testname bp socket threads pti nopti nokpti nopti_pct pti_pct
1 OLTP_RO in-memory tcp_socket 1 709.93 718.47 699.50 -2.64 -1.47
4 OLTP_RO in-memory tcp_socket 8 5473.05 5500.08 5483.40 -0.30 0.19
3 OLTP_RO in-memory tcp_socket 64 21716.18 22036.98 21548.46 -2.22 -0.77
2 OLTP_RO in-memory tcp_socket 128 21606.02 22010.36 21548.62 -2.10 -0.27
5 OLTP_RO in-memory unix_socket 1 750.41 759.33 776.88 2.31 3.53
8 OLTP_RO in-memory unix_socket 8 5851.80 5896.86 5986.89 1.53 2.31
7 OLTP_RO in-memory unix_socket 64 23052.10 23552.26 23191.48 -1.53 0.60
6 OLTP_RO in-memory unix_socket 128 23215.38 23602.64 23146.42 -1.93 -0.30
9 OLTP_RO io-bound tcp_socket 1 364.03 369.68 370.51 0.22 1.78
12 OLTP_RO io-bound tcp_socket 8 3205.05 3225.21 3210.63 -0.45 0.17
11 OLTP_RO io-bound tcp_socket 64 15324.66 15456.44 15364.25 -0.60 0.26
10 OLTP_RO io-bound tcp_socket 128 17705.29 18007.45 17748.70 -1.44 0.25
13 OLTP_RO io-bound unix_socket 1 421.74 430.10 432.88 0.65 2.64
16 OLTP_RO io-bound unix_socket 8 3322.19 3367.46 3367.34 -0.00 1.36
15 OLTP_RO io-bound unix_socket 64 15977.28 16186.59 16248.42 0.38 1.70
14 OLTP_RO io-bound unix_socket 128 18729.10 19111.55 18962.02 -0.78 1.24
17 OLTP_RW in-memory tcp_socket 1 490.76 495.21 489.49 -1.16 -0.26
20 OLTP_RW in-memory tcp_socket 8 3445.66 3459.16 3414.36 -1.30 -0.91
19 OLTP_RW in-memory tcp_socket 64 11165.77 11167.44 10861.44 -2.74 -2.73
18 OLTP_RW in-memory tcp_socket 128 12176.96 12226.17 12204.85 -0.17 0.23
21 OLTP_RW in-memory unix_socket 1 530.08 534.98 540.27 0.99 1.92
24 OLTP_RW in-memory unix_socket 8 3734.93 3757.98 3772.17 0.38 1.00
23 OLTP_RW in-memory unix_socket 64 12042.27 12160.86 12138.01 -0.19 0.80
22 OLTP_RW in-memory unix_socket 128 12930.34 12939.02 12844.78 -0.73 -0.66
25 OLTP_RW io-bound tcp_socket 1 268.08 270.51 270.71 0.07 0.98
28 OLTP_RW io-bound tcp_socket 8 1585.39 1589.30 1557.58 -2.00 -1.75
27 OLTP_RW io-bound tcp_socket 64 4828.30 4782.42 4620.57 -3.38 -4.30
26 OLTP_RW io-bound tcp_socket 128 5158.66 5172.82 5321.03 2.87 3.15
29 OLTP_RW io-bound unix_socket 1 280.54 282.06 282.35 0.10 0.65
32 OLTP_RW io-bound unix_socket 8 1582.69 1584.52 1601.26 1.06 1.17
31 OLTP_RW io-bound unix_socket 64 4519.45 4485.72 4515.28 0.66 -0.09
30 OLTP_RW io-bound unix_socket 128 5524.28 5460.03 5275.53 -3.38 -4.50

 

As you can see, there is very little difference between runs (in the 3-4% range), which falls within the variance of the test.

Similar experiments were done on different servers and workloads:

There, we also see a negligible difference that falls within measurement variance.

Overhead analysis

To understand why we do not see much effect in MySQL (InnoDB workloads), let’s take a look at where we expect the overhead from the proposed fix to appear.

The main overhead is expected from system calls, so let’s test syscall execution on the kernel before and after the fix (thanks to Alexey Kopytov for the idea of how to test it with sysbench).

We will use the following script syscall.lua:

-- syscall.lua: each sysbench event issues 10,000 no-op system calls
local ffi = require("ffi")

ffi.cdef[[long syscall(long, long, long, long);]]

function event()
   for i = 1, 10000 do
      ffi.C.syscall(0, 0, 0, 0)
   end
end

Basically, we measure the time for executing 10000 system calls (this will be one event).

To run benchmark:

sysbench syscall.lua --time=60 --report-interval=1 run

 

And the results are following:

  • On the kernel without the fix (4.13.0-21): 455 events/sec
  • On the kernel with the fix (4.13.0-26): 250 events/sec

This means that the time to execute 10,000 system calls increased from 2.197ms to 4ms.

While this increase looks significant, it does not have much effect on MySQL (the InnoDB engine). In MySQL, you can expect most system calls to be made for IO or network communication.

We can assume that executing 10,000 IO events on fast storage takes about 1000ms, so adding an extra 2ms for the system calls corresponds to about 0.2% of overhead (which is practically invisible in MySQL workloads).

I expect the effect would be much more visible if we worked with MyISAM tables cached in OS memory, since in that case the syscall overhead shows up even when accessing data in memory.

Conclusion:

From our results, we do not see a measurable effect from KPTI patches (to mitigate the Meltdown vulnerability) running on bare metal servers with Ubuntu 16.04 and 4.13 kernel series.

Reference commands and configs:

sysbench oltp_read_only.lua   {--mysql-socket=/tmp/mysql.sock|--mysql-host=127.0.0.1} --mysql-user=root
--mysql-db=sbtest100t4M --rand-type=pareto  --tables=100  --table-size=4000000 --num-threads=$threads --report-interval=1
--max-time=180 --max-requests=0  run

RW:

sysbench oltp_read_write.lua   {--mysql-socket=/tmp/mysql.sock|--mysql-host=127.0.0.1} --mysql-user=root
--mysql-db=sbtest100t4M --rand-type=pareto  --tables=100  --table-size=4000000 --num-threads=$threads --report-interval=1
--max-time=180 --max-requests=0  run

mysqld:
Percona Server 5.7.20-19

numactl --physcpubind=all --interleave=all   /usr/bin/env LD_PRELOAD=/data/opt/alexey.s/bin64_5720.ps/lib/mysql/libjemalloc.so.1 ./bin/mysqld
--defaults-file=/data/opt/alexey.s/my-perf57.cnf --basedir=. --datadir=/data/sam/sbtest100t4M   --user=root  --innodb_flush_log_at_trx_commit=1
--innodb-buffer-pool-size=150GB --innodb-log-file-size=10G --innodb-buffer-pool-instances=8  --innodb-io-capacity-max=20000
--innodb-io-capacity=10000 --loose-innodb-page-cleaners=8 --ssl=0

My.cnf file:

[mysqld]
user=root
port=3306
innodb_status_file=0
innodb_data_file_path=ibdata1:100M:autoextend
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = true
innodb_log_buffer_size = 128M
innodb_log_file_size = 10G
innodb_log_files_in_group = 2
innodb_write_io_threads=8
innodb_read_io_threads=8
innodb_io_capacity=15000
innodb_io_capacity_max=25000
innodb_lru_scan_depth=8192
#innodb_buffer_pool_size=${BP}G
innodb_doublewrite=1
innodb_thread_concurrency=0
innodb-checksum-algorithm=strict_crc32
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_purge_threads=8
loose-innodb-page-cleaners=8
innodb_buffer_pool_instances=8
innodb_change_buffering=none
innodb_adaptive_hash_index=OFF
sync_binlog=0
max_connections=5000
table_open_cache=5000
query_cache_type=OFF
thread_cache_size=16
back_log=2000
connect_timeout=15
skip-grant-tables
sort_buffer_size=262144
key_buffer_size=8388608
join_buffer_size=262144
server-id=1
max_connections=50000
skip_name_resolve=ON
max_prepared_stmt_count=1048560
performance_schema=OFF
performance-schema-instrument='wait/synch/%=ON'
innodb_monitor_enable=all
innodb_flush_neighbors=0
metadata_locks_hash_instances=256
table_open_cache_instances=64

Oct
18
2017
--

How to Choose the MySQL innodb_log_file_size

In this blog post, I’ll provide some guidance on how to choose the MySQL innodb_log_file_size.

Like many database management systems, MySQL uses logs to achieve data durability (when using the default InnoDB storage engine). This ensures that when a transaction is committed, data is not lost in the event of crash or power loss.

MySQL’s InnoDB storage engine uses a fixed-size (circular) Redo log space. The size is controlled by innodb_log_file_size and innodb_log_files_in_group (default 2). Multiply those values and you get the Redo log space that is available to use. While technically it shouldn’t matter whether you change innodb_log_file_size or innodb_log_files_in_group to control the Redo space size, most people just work with innodb_log_file_size and leave innodb_log_files_in_group alone.
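A quick way to check the currently configured Redo space is to multiply the two variables directly on the server:

SELECT @@innodb_log_file_size * @@innodb_log_files_in_group / 1024 / 1024 / 1024 AS redo_space_gb;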

Configuring InnoDB’s Redo space size is one of the most important configuration options for write-intensive workloads. However, it comes with trade-offs. The more Redo space you have configured, the better InnoDB can optimize write IO. However, increasing the Redo space also means longer recovery times when the system loses power or crashes for other reasons.  

It is not easy or straightforward to predict how much time a system crash recovery takes for a specific innodb_log_file_size value – it depends on the hardware, the MySQL version, and the workload. It can vary widely (a 10x difference or more, depending on the circumstances). However, around five minutes per 1GB of innodb_log_file_size is a decent ballpark number. If this is really important for your environment, I would recommend testing it by simulating a system crash under full load (after the database has completely warmed up).

While recovery time can be a guideline for the limit of the InnoDB Log File size, there are a couple of other ways you can look at this number – especially if you have Percona Monitoring and Management installed.

Check Percona Monitoring and Management’s “MySQL InnoDB Metrics” Dashboard. If you see a graph like this:

innodb_log_file_size

where Uncheckpointed Bytes is pushing very close to the Max Checkpoint Age, you can almost be sure your current innodb_log_file_size is limiting your system’s performance. Increasing it can provide substantial performance improvements.

If you see something like this instead:

innodb_log_file_size 2

where the number of Uncheckpointed Bytes is well below the Max Checkpoint Age, then increasing the log file size won’t give you a significant improvement.

Note: many MySQL settings are interconnected. While a specific log file size might be good enough for smaller innodb_buffer_pool_size, larger InnoDB Buffer Pool values might warrant larger log files for optimal performance.

Another thing to keep in mind: the recovery time we spoke about earlier really depends on the Uncheckpointed Bytes rather than the total log file size. If you do not see the recovery time increasing with a larger innodb_log_file_size, check out the InnoDB Checkpoint Age graph – it might be that you just can’t fully utilize large log files with your workload and configuration.

Another way to look at the log file size is in context of log space usage:

innodb_log_file_size 3

This graph shows the amount of Data Written to the InnoDB log files per hour, as well as the total size of the InnoDB log files. In the graph above, we have 2GB of log space and some 12GB written to the Log files per hour. This means we cycle through logs every ten minutes.

InnoDB has to flush every dirty page in the buffer pool at least once per log file cycle time.

InnoDB gets better performance when it does that less frequently, and there is less wear and tear on SSD devices. I like to see this number at no less than 15 minutes. One hour is even better.  
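If you do not have Percona Monitoring and Management at hand, you can estimate the same log cycle time from the server itself using the Innodb_os_log_written status counter (bytes written to the Redo log); a minimal sketch:

SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
-- wait for a representative interval, for example 10 minutes, then run it again
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
-- (second value - first value) is the number of Redo bytes written over the interval;
-- divide the total Redo space by that rate to get the log cycle time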

Summary

Getting the innodb_log_file_size right is important to achieve a balance between a reasonably fast crash recovery time and good system performance. Remember your recovery time objective; it is not as trivial as you might imagine. I hope the techniques described in this post help you find the optimal value for your situation!
