Feb
22
2018
--

How to Restore MySQL Logical Backup at Maximum Speed

Restore MySQL Logical Backup

Restore MySQL Logical BackupThe ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.

Even if you lost all data from a production server, physical backups (data files snapshot created with an offline copy or with Percona XtraBackup) could show the same internal database structure corruption as in production data. Backups in a simple plain text format allow you to avoid such corruptions and migrate between database formats (e.g., during a software upgrade and downgrade), or even help with migration from completely different database solution.

Unfortunately, the restore speed for logical backups is usually bad, and for a big database it could require days or even weeks to get data back. Thus it’s important to tune backups and MySQL for the fastest data restore and change settings back before production operations.

Disclaimer

All results are specific to my combination of hardware and dataset, but could be used as an illustration for MySQL database tuning procedures related to logical backup restore.

Benchmark

There is no general advice for tuning a MySQL database for a bulk logical backup load, and any parameter should be verified with a test on your hardware and database. In this article, we will explore some variables that help that process. To illustrate the tuning procedure, I’ve downloaded IMDB CSV files and created a MySQL database with pyimdb.

You may repeat the whole benchmark procedure, or just look at settings changed and resulting times.

Database:

  • 16GB – InnoDB database size
  • 6.6GB – uncompressed mysqldump sql
  • 5.8GB – uncompressed CSV + create table statements.

The simplest restore procedure for logical backups created by the mysqldump tool:

mysql -e 'create database imdb;'
time mysql imdb < imdb.sql
# real 129m51.389s

This requires slightly more than two hours to restore the backup into the MySQL instance started with default settings.

I’m using the Docker image percona:latest – it contains Percona Server 5.7.20-19 running on a laptop with 16GB RAM, Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, two disks: SSD KINGSTON RBU-SNS and HDD HGST HTS721010A9.

Let’s start with some “good” settings: buffer pool bigger than default, 2x1GB transaction log files, disable sync (because we are using slow HDD), and set big values for IO capacity,
the load should be faster with big batches thus use 1GB for max_allowed_packet.

Values were chosen to be bigger than the default MySQL parameters because I’m trying to see the difference between the usually suggested values (like 80% of RAM should belong to InnoDB buffer pool).

docker run --publish-all --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
  --innodb_buffer_pool_size=4GB
  --innodb_log_file_size=1G
  --skip-log-bin
  --innodb_flush_log_at_trx_commit=0
  --innodb_flush_method=nosync
  --innodb_io_capacity=2000
  --innodb_io_capacity_max=3000
  --max_allowed_packet=1G
  time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
  # real 59m34.252s

The load is IO bounded, and there is no reaction on set global foreign_key_checks=0 and unique_checks=0 because these variables are already disabled in the dump file.

How can we reduce IO?

Disable InnoDB double write: --innodb_doublewrite=0

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 44m49.963s

A huge improvement, but we still have an IO-bounded load.

We will not be able to improve load time significantly for IO bounded load. Let’s move to SSD:

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 33m36.975s

Is it vital to disable disk sync for the InnoDB transaction log?

sudo rm -rf mysql/*
docker rm p57
docker run -v /home/ihanick/Private/Src/tmp/data-movies/imdb.sql:/root/imdb.sql -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
--innodb_buffer_pool_size=4GB
--innodb_log_file_size=1G
--skip-log-bin
--innodb_flush_log_at_trx_commit=0
--innodb_io_capacity=700
--innodb_io_capacity_max=1500
--max_allowed_packet=1G
--innodb_doublewrite=0
# real 33m49.724s

There is no significant difference.

By default, mysqldump produces SQL data, but it could also save data to CSV format:

cd /var/lib/mysql-files
mkdir imdb
chown mysql:mysql imdb/
time mysqldump --max_allowed_packet=128M --tab /var/lib/mysql-files/imdb imdb1
# real 1m45.983s
sudo rm -rf mysql/*
docker rm p57
docker run -v /srv/ihanick/tmp/imdb:/var/lib/mysql-files/imdb -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
--innodb_buffer_pool_size=4GB
--innodb_log_file_size=1G
--skip-log-bin
--innodb_flush_log_at_trx_commit=0
--innodb_io_capacity=700
--innodb_io_capacity_max=1500
--max_allowed_packet=1G
--innodb_doublewrite=0
time (
mysql -e 'drop database imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1 ;
for i in $PWD/*.txt ; do mysqlimport imdb1 $i ; done
)
# real 21m56.049s
1.5X faster, just because of changing the format from SQL to CSV!

We’re still using only one CPU core, let’s improve the load with the –use-threads=4 option:

time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
mysqlimport --use-threads=4 imdb1 $PWD/*.txt
)
# real 15m38.147s

In the end, the load is still not fully parallel due to a big table: all other tables are loaded, but one thread is still active.

Let’s split CSV files into smaller ones. For example, 100k rows in each file and load with GNU/parallel:

# /var/lib/mysql-files/imdb/test-restore.sh
apt-get update ; apt-get install -y parallel
cd /var/lib/mysql-files/imdb
time (
cd split1
for i in ../*.txt ; do echo $i ; split -a 6 -l 100000 -- $i `basename $i .txt`. ; done
for i in `ls *.*|sed 's/^[^.]+.//'|sort -u` ; do
mkdir ../split-$i
for j in *.$i ; do mv $j ../split-$i/${j/$i/txt} ; done
done
)
# real 2m26.566s
time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
parallel 'mysqlimport imdb1 /var/lib/mysql-files/imdb/{}/*.txt' ::: split-*
)
#real 16m50.314s

Split is not free, but you can split your dump files right after backup.

The load is parallel now, but the single big table strikes back with ‘setting auto-inc lock’ in SHOW ENGINE INNODB STATUSG

Using the --innodb_autoinc_lock_mode=2 option fixes this issue: 16m2.567s.

We got slightly better results with just mysqlimport --use-threads=4. Let’s check if hyperthreading helps and if the problem caused by “parallel” tool:

  • Using four parallel jobs for load: 17m3.662s
  • Using four parallel jobs for load and two threads: 16m4.218s

There is no difference between GNU/Parallel and --use-threads option of mysqlimport.

Why 100k rows? With 500k rows: 15m33.258s

Now we have performance better than for mysqlimport --use-threads=4.

How about 1M rows at once? Just 16m52.357s.

I see periodic flushing logs message with bigger transaction logs (2x4GB): 12m18.160s:

--innodb_buffer_pool_size=4GB --innodb_log_file_size=4G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2 --performance-schema=0

Let’s compare the number with myloader 0.6.1 also running with four threads (myloader have only -d parameter, myloader execution time is under corresponding mydumper command):

# oversized statement size to get 0.5M rows in one statement, single statement per chunk file
mydumper -B imdb1 --no-locks --rows 500000 --statement-size 536870912 -o 500kRows512MBstatement
17m59.866s
mydumper -B imdb1 --no-locks -o default_options
17m15.175s
mydumper -B imdb1 --no-locks --chunk-filesize 128 -o chunk128MB
16m36.878s
mydumper -B imdb1 --no-locks --chunk-filesize 64 -o chunk64MB
18m15.266s

It will be great to test mydumper with CSV format, but unfortunately, it wasn’t implemented in the last 1.5 years: https://bugs.launchpad.net/mydumper/+bug/1640550.

Returning back to parallel CSV files load, even bigger transaction logs 2x8GB: 11m15.132s.

What about a bigger buffer pool: --innodb_buffer_pool_size=12G? 9m41.519s

Let’s check six-year-old server-grade hardware: Intel(R) Xeon(R) CPU E5-2430 with SAS raid (used only for single SQL file restore test) and NVMe (Intel Corporation PCIe Data Center SSD, used for all other tests).

I’m using similar options as for previous tests, with 100k rows split for CSV files load:

--innodb_buffer_pool_size=8GB --innodb_log_file_size=8G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2

  • Single SQL file created by mysqldump loaded for 117m29.062s = 2x slower.
  • 24 parallel processes of mysqlimport: 11m51.718s
  • Again hyperthreading making a huge difference! 12 parallel jobs: 18m3.699s.
  • Due to higher concurrency, adaptive hash index is a reason for locking contention. After disabling it with --skip-innodb_adaptive_hash_index: 10m52.788s.
  • In many places, disable unique checks referred as a performance booster: 10m52.489s
    You can spend more time reading advice about unique_checks, but it might help for some databases with many unique indexes (in addition to primary one).
  • The buffer pool is smaller than the dataset, can you change old/new pages split to make insert faster? No: --innodb_old_blocks_pct=5 : 10m59.517s.
  • O_DIRECT is also recommended: --innodb_flush_method=O_DIRECT: 11m1.742s.
  • O_DIRECT is not able to improve performance by itself, but if you can use a bigger buffer pool: O_DIRECT + 30% bigger buffer pool: --innodb_buffeer_pool_size=11G: 10m46.716s.

Conclusions

  • There is no common solution to improve logical backup restore procedure.
  • If you have IO-bounded restore: disable InnoDB double write. It’s safe because even if the database crashes during restore, you can restart the operation.
  • Do not use SQL dumps for databases > 5-10GB. CSV files are much faster for mysqldump+mysql. Implement mysqldump --tabs+mysqlimport or use mydumper/myloader with appropriate chunk-filesize.
  • The number of rows per load data infile batch is important. Usually 100K-1M, use binary search (2-3 iterations) to find a good value for your dataset.
  • InnoDB log file size and buffer pool size are really important options for backup restore performance.
  • O_DIRECT reduces insert speed, but it’s good if you can increase the buffer pool size.
  • If you have enough RAM or SSD, the restore procedure is limited by CPU. Use a faster CPU (higher frequency, turboboost).
  • Hyperthreading also counts.
  • A powerful server could be slower than your laptop (12×2.4GHz vs. 4×2.8+turboboost).
  • Even with modern hardware, it’s hard to expect backup restore faster than 50MBps (for the final size of InnoDB database).
  • You can find a lot of different advice on how to improve backup load speed. Unfortunately, it’s not possible to implement improvements blindly, and you should know the limits of your system with general Unix performance tools like vmstat, iostat and various MySQL commands like SHOW ENGINE INNODB STATUS (all can be collected together with pt-stalk).
  • Percona Monitoring and Management (PMM) also provides good graphs, but you should be careful with QAN: full slow query log during logical database dump restore can cause significant processing load.
  • Default MySQL settings could cost you 10x backup restore slowdown
  • This benchmark is aimed at speeding up the restore procedure while the application is not running and the server is not used in production. Make sure that you have reverted all configuration parameters back to production values after load. For example, if you disable the InnoDB double write buffer during restore and left it enabled in production, you may have scary data corruption due to partial InnoDB pages writes.
  • If the application is running during restore, in most cases you will get an inconsistent database due to missing support for locking or correct transactions for restore methods (discussed above).
Feb
22
2018
--

Percona Live 2018 Featured Talk – Scaling a High-Traffic Database: Moving Tables Across Clusters with Bryana Knight

Percona Live 2018 Featured Talk

Percona Live 2018 Featured TalkWelcome to the first interview blog for the upcoming Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Bryana Knight, Platform Engineer at GitHub. Her talk is titled Scaling a High-Traffic Database: Moving Tables Across Clusters. Facing an immediate need to distribute load, GitHub came up with creative ways to move a significant amount of traffic off of their main MySQL cluster – with no user impact. In our conversation, we discussed how Bryana and GitHub solved some of these issues:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Bryana: I started at GitHub as a full-stack engineer working on a new business offering, and was then shortly offered the opportunity to transition to the database services team. Our priorities back then included reviewing every single database migration for GItHub.com. Having spent my whole career as a full-stack engineer, I had to level-up pretty quickly on MySQL, data modeling, data access patterns – basically everything databases. I spent the first few months learning our schema and setup through lots of reading, mentorship from other members of my team, reviewing migrations for most of our tables, and asking a million questions.

Originally, my team spent a lot of time addressing immediate performance concerns. Then we started partnering with product engineering teams to build out the backends for new features. Now we are focused on the longterm scalability and availability of our database, stemming from how we access it. I work right between our DBA’s and our product and API engineers.

Percona: Your talk is titled “Scaling a High-Traffic Database: Moving Tables Across Clusters”. What were the challenges GitHub faced that required redistributing your tables?

Bryana GitHubBryana: This biggest part of the GitHub codebase is an 8-year-old monolith. As a company, we’ve been fortunate enough to see a huge amount of user growth since the company started. User growth means data growth. The schema and setup that worked for GitHub early on, and very much allowed GitHub to get to where it is today with tons of features and an extremely robust API, is not necessarily the right schema and setup for the size GitHub is today. 

We were seeing that higher than “normal” load was starting to have a more noticeable effect. The monolith aspect of our database, organic growth, plus inefficiencies in our code base were putting a lot of pressure on the master of our primary database cluster, which held our most core tables (think users, repos, permissions). From the database perspective, this meant contention, locking, and replica lag. From the user’s perspective, this meant anything from longer page loads to delays in UI updates and notifications, to timeouts. 

Percona: What were some of the other options you looked at (if any)?

Bryana: Moving tables out of our main cluster was not the only action we took to alleviate some of the pressure in our database. However, it was the highest impact change we could make in the medium-term to give us the breathing room we needed and improve performance and availability. We also prioritized efforts around moving more reads to replicas and off the master, throttling more writes where possible, index improvements and query optimizations. Moving these tables gave us the opportunity to start thinking more long-term about how we can store and access our data differently to allow us to scale horizontally while maintaining our healthy pace of feature development.

Percona: What were the issues that needed to be worked out between the different teams you mention in your description? How did they impact the project?

Bryana: Moving tables out of our main database required collaboration between multiple teams. The team I’m on, database-services, was responsible for coming up with the strategy to move tables without user impact, writing the code to handle query isolation and routing, connection switching, backgrounding writes, and so on. Our database-infrastructure team determined where the tables we were moving should go (new cluster or existing), setup the clusters, and advised us on how to safely copy the data. In some cases, we were able to use MySQL replication. When that wasn’t possible, they weighed in on other options. 

We worked with production engineers to isolate data access to these tables and safely split JOINs with other tables. Everybody needed to be sure we weren’t affecting performance and user experience when doing this. We discussed with our support team the risk of what we were doing. Then we worked with them to determine if we should preemptively status yellow when there was a higher risk of user impact. During the actual cut-overs, representatives from all these groups would get on a war-room-like video call and “push the button”, and we always made sure to have a roll-out and roll-back plan. 

Percona: Why should people attend your talk? What do you hope people will take away from it?

Bryana: In terms of database performance, there are a lot of little things you can do immediately to try and make improvements: things like adding indexes, tweaking queries, and denormalizing data. There are also more drastic, architectural changes you can pursue, that many companies need to do when they get to certain scale. The topic of this talk is a valid strategy that fits between these two extremes. It relieved some ongoing performance problems and availability risk, while giving us some breathing room to think long term. I think other applications and databases might be in a similar situation and this could work for them. 

Percona: What are you looking forward to at Percona Live (besides your talk)?

This is actually the first time I’m attending a Percona Live conference. I’m hoping to learn from some of the talks around scaling a high traffic database and sharding. I’m also looking forward to seeing some talks from the wonderful folks on GitHub database-infrastructure team.

Want to find out more about this Percona Live 2018 featured talk, and Bryana and GitHub’s migration? Register for Percona Live 2018, and see her talk Scaling a High-Traffic Database: Moving Tables Across Clusters. Register now to get the best price!

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Feb
12
2018
--

Webinar Thursday, February 15, 2018: Basic Internal Troubleshooting Tools for MySQL Server

Troubleshooting Tools for MySQL

Troubleshooting Tools for MySQLPlease join Percona’s Principal Support Engineer, Sveta Smirnova, as she presents “Basic Internal Troubleshooting Tools for MySQL Server” on Thursday, February 15, 2018, at 10:00 am PST (UTC-8) / 1:00 pm EST (UTC-5).

 

MySQL Server has many built-in troubleshooting tools. They are always available and can provide many insights on what is happening internally. Many graphical tools, such as Percona Monitoring and Management (PMM), use built-ins to get data for their nice graphs.

Even if you are only going to use graphical tools, it is always good to know what data they can collect. This way, you can see their limitations and won’t have incorrect expectations in the heat of battle. Built-in troubleshooting tools are accessible via SQL commands. Most of them are standard across the server, but details are component-specific.

In this webinar, I will discuss how to use them, how to troubleshoot component-specific issues and how to find additional information. I will cover SHOW commands, Information Schema, status variables and few component-specific syntaxes. I will NOT cover Performance Schema (there will be a separate webinar on that), and I will use PMM graphs to illustrate the topics whenever possible.

Register for the Basic Internal Troubleshooting Tools for MySQL Server webinar now.

Internal Troubleshooting for MySQLSveta Smirnova, Principal Technical Services Engineer

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues, bugs, finding patterns that can solve typical issues quicker. She likes teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona Sveta worked as Support Engineer in MySQL Bugs Analysis Support Group in MySQL AB-Sun-Oracle. She is the author of book “MySQL Troubleshooting” and JSON UDF functions for MySQL.

Feb
08
2018
--

Fsync Performance on Storage Devices

fsync

fsync performanceWhile preparing a post on the design of ZFS based servers for use with MySQL, I stumbled on the topic of fsync call performance. The fsync call is very expensive, but it is essential to databases as it allows for durability (the “D” of the ACID acronym).

Let’s first review the type of disk IO operations executed by InnoDB in MySQL. I’ll assume the default InnoDB variable values.

The first and most obvious type of IO are pages reads and writes from the tablespaces. The pages are most often read one at a time, as 16KB random read operations. Writes to the tablespaces are also typically 16KB random operations, but they are done in batches. After every batch, fsync is called on the tablespace file handle.

To avoid partially written pages in the tablespaces (a source of data corruption), InnoDB performs a doublewrite. During a doublewrite operation, a batch of dirty pages, from 1 to about 100 pages, is first written sequentially to the doublewrite buffer and fsynced. The doublewrite buffer is a fixed area of the ibdata1 file, or a specific file with the latest Percona Server for MySQL 5.7. Only then do the writes to the tablespaces of the previous paragraph occur.

That leaves us with the writes to the InnoDB log files. During those writes, the transaction information — a kind of binary diff of the affected pages — is written to the log files and then the log file is fsynced. The duration of the fsync call can be a major contributor to the COMMIT latency.

Because the fsync call takes time, it greatly affects the performance of MySQL. Because of this, you probably noticed there are many status variables that relate to fsyncs. To overcome the inherent limitations of the storage devices, group commit allows multiple simultaneous transactions to fsync the log file once for all the transactions waiting for the fsync. There is no need for a transaction to call fsync for a write operation that another transaction already forced to disk. A series of write transactions sent over a single database connection cannot benefit from group commit.

Fsync Results

In order to evaluate the fsync performance, I used the following Python script:

#!/usr/bin/python
import os, sys, mmap
# Open a file
fd = os.open( "testfile", os.O_RDWR|os.O_CREAT|os.O_DIRECT )
m = mmap.mmap(-1, 512)
for i in range (1,1000):
   os.lseek(fd,os.SEEK_SET,0)
   m[1] = "1"
   os.write(fd, m)
   os.fsync(fd)
# Close opened file
os.close( fd )

The script opens a file with the O_DIRECT flag, writes and fsyncs it 1000 times and close the file. I added O_DIRECT after an internal discussion with my colleagues, but it doesn’t change the results and it doesn’t remove the need for calling fsync. We’ll discuss in more detail the impacts of O_DIRECT after we reviewed the results. The script is called with the time command like below:

root@lab:/tmp/testfsync# time python /root/fsync.py
real 0m18.320s
user 0m0.060s
sys 0m0.096s

In the above example using a 7.2k rpm drive, the fsync rate is about 56/s for a latency of 18ms. A 7.2k RPM drive performs 120 rotations per second. On average, the fsyncs require a bit more than two rotations to complete. The filesystem appears to make very little differences: ext4 and XFS show similar results. That means if MySQL uses such storage devices for the InnoDB log files, the latency of each transaction is at least 18ms. If the application workload requires 200 write transactions per second, they’ll need to be executed using at least four database connections.

So, let’s begin with rotational devices. These are becoming a bit less common now with databases, especially without a raid controller. I could only find a few.

Drive RPM Rate Latency Notes
WDC WD2500BJKT 5400 22/s 45 ms Laptop SATA from 2009
ST2000LM003 5400 15/s 66 ms USB-3 portable drive
ST3750528AS 7200 40/s 25 ms Desktop grade SATA
WD2502ABYS-18B7A0 7200 56/s 18 ms Desktop grade SATA
HUA723020ALA641 7200 50/s 20 ms Enterprise grade SATA, md mirror
Dell SAS unknown 7200 58/s 17 ms Behind Perc ctrl but no write cache
HDWE150 7200 43/s 23 ms Recent Desktop grade SATA, 5TB

 

I unfortunately didn’t have access to any 10k or 15k RPM drives that were not behind a raid controller with a write cache. If you have access to such drives, run the above script a few times and send me your results, that would help create a more complete picture! So, we can see a correlation between the rotational speed and the fsync rate, which makes sense. The faster a disk turns, the faster it can fsync. The fsync call saves the data and then updates the metadata. Hence, the heads need to move. That’s probably the main explanation for the remaining disparity. A good point, all drives appears to be fully complying with the SATA flush command even though they all have an enabled write cache. Disabling the drives write caches made no difference.

With the above number, the possible transaction rates in fully ACID mode is pretty depressing. But those drives were rotating ones, what about SSD drives? SSD are memory devices and are much faster for random IO operations. There are extremely fast for reads, and good for writes. But as you will see below, not that great for fsyncs.

Drive rate latency notes
SAMSUNG MZ7LN512 160/s 6.3ms Consumer grade SATA
Crucial_CT480M500SSD1 108/s 9.3ms Consumer grade SATA
Intel 520 2031/s 0.49ms Consumer grade SATA
SAMSUNG MZVPV512HDGL 104/s 9.6ms Consumer grade NVMe
Samsung SSD 960 PRO 267/s 3.8ms High-end consumer grade NVMe
Intel PC-3100 1274/s 0.79ms Low-end consumer grade NVMe (cheat?)
Intel 750 2038/s 0.49ms High-end consumer grade NVMe
Intel PC-3700 7380/s 0.14ms High-end enterprise-grade NVMe

 

Again, this is a small sample of the devices I have access to. All SSD/Flash have write caches, but only the high-end devices have capacitors to flush their write cache to the flash with a loss of power. The PC-3100 device is actually in my home server, and it is obviously cheating. If you look at the card specs on the Intel website, it doesn’t have the “Enhanced Power Loss Data Protection” and “End-to-End Data Protection” features. The much more expansive PC-3700 does. I use the PC-3100 as a ZFS L2ARC device, so I am good. In general, the performance of a flash device varies a bit more than rotational devices, since factors like the number of recent writes and the filling factor come into play.

Even when using a high-end NVMe device like the PC-3700, you can’t reach 10k fully ACID transactions per second at low thread concurrency. How do you reach the higher levels? The answer here is the good old raid controller with a protected write cache. The write cache is basically using DRAM memory protected from power loss by a battery. SAN controllers have similar caches. The writes to the InnoDB log files are sequential writes interleaved with fsyncs. The raid controller concatenates the sequential writes, eventually writing one big chunk on disk and… ignoring the fsyncs. Here’s the result from the only device I had access to:

Drive rate latency notes
Dell Perc with BBU 23000/s 0.04ms Array of 7.2k rpm drives

 

That’s extremely fast but, of course, it is memory. I modified the script to loop 10k times instead of 1k. In theory, something a single slave thread doing simple transactions could reach a rate of 20k/s or more while being fully ACID.

Discussion

We must always consider the results we got in the previous section in the context of a given application. For example, a server using an Intel PC-3700 NVMe card can do more than 7000 fully ACID transactions per second even if it is fully durable provided those transactions are issued by a sufficient number of threads. Adding threads will not allow scaling infinitely. At some point, other bottlenecks like mutex contention or page flushing will dominate.

We often say that Galera-based cluster solutions like Percona XtraDB Cluster (PXC) add latency to the transactions, since it involves communication over the network. With the Galera protocol, a commit operation returns only when all the nodes have received the data. Thus, tt is a good practice to relax the local durability and use innodb_flush_log_at_trx_commit set to 0 or 2. On a local network, the ping time is always below 1ms and often below 0.1ms. As a result, the transaction latency is often smaller.

About fdatasync

The fsync system is not the only system call that persists data to disk. There is also the fdatasync call. fdatasync persists the data to disk but does not update the metadata information like the file size and last update time. Said otherwise, it performs one write operation instead of two. In the Python script, if I replace os.fsync with os.fdatasync, here are the results for a subset of devices:

Drive rpm rate latency notes
ST2000LM003 5400 72/s 13 ms USB-3 portable drive
WD2502ABYS-18B7A0 7200 118/s 8.5 ms Desktop grade SATA
SAMSUNG MZ7LN512 N/A 333/s 3.0ms Consumer grade SATA
Crucial_CT480M500SSD1 N/A 213/s 4.7ms Consumer grade SATA
Samsung SSD 960 PRO N/A 714/s 1.4ms High-end consumer grade NVMe

 

In all cases, the resulting rates have more than doubled. The fdatasync call has a troubled history, as there were issues with it many years ago. Because of those issues, InnoDB never uses fdatasync, only fsyncs. You can find the following comments in the InnoDB os/os0file.cc:

/* We let O_SYNC only affect log files; note that we map O_DSYNC to
O_SYNC because the datasync options seemed to corrupt files in 2001
in both Linux and Solaris */

2001 is a long time ago. Given the above results, maybe we should reconsider the use of fdatasync. From the Linux main page on fdatasync, you find:

fdatasync() is similar to fsync(), but does not flush modified
metadata unless that metadata is needed in order to allow a
subsequent data retrieval to be correctly handled. For example,
changes to st_atime or st_mtime (respectively, time of last
access and time of last modification; see stat(2)) do not require
flushing because they are not necessary for a subsequent data
read to be handled correctly. On the other hand, a change to
the file size (st_size, as made by say ftruncate(2)), would
require a metadata flush.

So, even with fdatasync, operations like extending an InnoDB tablespace will update the metadata correctly. This appears to be an interesting low-hanging fruit in term of MySQL performance. In fact, webscalesql already have fdatasync available

O_DIRECT

Why do we need a fsync or fdatasync with O_DIRECT? With O_DIRECT, the OS is not buffering anything along the way. So the data should be persisted right? Actually, the OS is not buffering but the device very likely is. Here are a few results to highlight the point using a 7.2k rpm SATA drive:

Test rate latency
O_DIRECT, drive Write cache enabled 4651/s 0.22ms
O_DIRECT, drive Write cache disabled 101/s 9.9ms
ASYNC + fdatasync, Write cache enabled 119/s 8.4ms
ASYNC + fdatasync, Write cache disabled 117/s 8.5ms

 

The drive write cache was enabled/disabled using the hdparm command. Clearly, there’s no way the drive can persist 4651 writes per second. O_DIRECT doesn’t send the SATA flush command to the disk, so we are only writing to the drive write cache. If the drive write cache is disabled, the rate falls to a more reasonable value of 101/s. What is interesting — and I don’t really understand why — is that opening the file in async mode and performing fdatasync is significantly faster. As expected, the presence of the drive write cache has no impacts on ASYNC + fdatasync. When the fdatasync call occurs, the data is still in the OS file cache.

If you want to use only O_DIRECT, you should make sure all the storage write caches are crash safe. That’s why MySQL adds a fsync call after a write to a file opened with O_DIRECT.

ZFS

These days, I find it difficult to write a blog post without talking about ZFS. How does ZFS handles fsyncs and fdatasyncs? ZFS, like a database, performs write ahead logging in the ZIL. That means calls like fsync and fdatasync return when the data has been persisted to the ZIL, and not to the actual filesystem. The real write operation is done a few seconds later by a background thread. That means the added write for the metadata does not impact performance right away. My home server uses ZFS over a pair of 7.2k RPM drive and doesn’t have a SLOG device. The ZIL is thus stored on the 7.2k RPM drives. The results are the following:

Drive rpm rate latency
ZFS fsync 7200 104/s 9.6 ms
ZFS fdatasync 7200 107/s 9.3 ms

 

Remember that with ZFS, you need to disable the O_DIRECT mode. The fdatasync rate appears to be slightly faster, but it is not really significant. With ZFS, the fsync/fdatasync performance relates to where the ZIL is stored. If there is no SLOG device, the ZIL is stored with the data and thus, the persitence performance of the devices used for the data matter. If there is a SLOG device, the persistence performance is governed by the specs of the device(s) on which the SLOG is located. That’s a very important aspect we have to consider when designing a MySQL server that will use ZFS. The design of such server will be discussed in more details in a future post.

Feb
08
2018
--

Tutorial Schedule for Percona Live 2018 Is Live

Percona Live 2018

Percona Live 2018Percona has revealed the line-up of in-depth tutorials for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, Calif. Secure your spot now with Advanced Registration prices (available until March 4, 2018). Sponsorship opportunities for the conference are still available.

Percona Live 2018 Open Source Database Conference is the premier open source database event. The theme for the upcoming conference is “Championing Open Source Databases,” with a range of topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations and Business/Case Studies. 

Tutorials take place throughout the day on April 23, 2018. Tutorials provide practical, in-depth knowledge of critical open source database issues. Topics include:

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference will be held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of the Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

Sponsorships

Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Continuent, VividCortex
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – SolarWinds, TwinDB, Yelp
  • Media Sponsors – Datanami, EnterpriseTech, HPCWire, ODBMS.org
Feb
07
2018
--

ProxySQL Query Cache: What It Is, How It Works

ProxySQL query cache

In this blog post, I’ll present the ProxySQL query cache functionality. This is a query caching mechanism on top of ProxySQL. As there are already many how-tos regarding the ProxySQL prerequisites and installation process, we are going to skip these steps. For those who are already familiar with ProxySQL query cache configuration, let’s go directly to the query rules and the performance results.

Before talking about the ProxySQL query cache, let’s take a look at other caching mechanisms available for MySQL environments.

MySQL query cache is a query caching mechanism – deprecated as of MySQL 5.7.20 and removed in MySQL 8.0 – on top of MySQL itself (based on the official MySQL documentation).

The MySQL query cache stores the text of a SELECT statement together with the corresponding result sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.

Although MySQL query cache is supposed to improve performance, there are cases where MySQL query cache is not scaling well and can degrade performance due to its locking and invalidation algorithms.

You can find a really interesting post regarding MySQL query cache here.

There is also another method to cache results in a MySQL environment. It’s external caching (i.e., Memcached, Redis, etc.), but this also has some drawbacks. Introducing such a mechanism requires some changes on the application side.

But what is ProxySQL query cache if there are already the MySQL query cache and other external caching mechanisms? At the moment, although we’ve done some tests, we are not going to compare ProxySQL query cache performance against other caching mechanisms. We’ll address this in a future blog post. We will only focus on ProxySQL itself.

What is ProxySQL Query Cache

ProxySQL query cache is an in-memory key-value storage that uses:

  • as key: a combination of username, schema and query text. It is a hash derived from username, schema name and the query itself. Combining these ensures that users access only their resultsets and for the correct schema.
  • as value: the resultset returned by the backend (mysqld or another proxy).

There is some more metadata stored for each resultset:

  • length: length of the resultset
  • expire_ms: defines when the entry will expire
  • access_ms: records the last time an entry was accessed
  • ref_count: a reference count to identify resultset currently in use

Based on the configuration, the resultsets are cached on the wire while queries are executed, and the resultset is returned to the application. If the application re-executes the same query within the time slot defined by “expire_ms”, the resultset is returned by the embedded ProxySQL query cache.

The only way to invalidate entries from the ProxySQL query cache is through a time-to-live in milliseconds. This is in contrast to MySQL query cache, where the query cache gets invalidated each time a table gets updated. At the moment, it is only possible to tune the total amount of memory used by the query cache, using the variable “mysql-query_cache_size_MB”. The current implementation of mysql-query_cache_size_MB doesn’t impose a hard limit. Instead, it is used as an argument by the purging thread.

It’s obvious that it’s not easy to directly compare these two cache mechanisms, as each of them has its own way to invalidate results. Please also note a significant difference between MySQL and ProxySQL when query cache is enabled. ProxySQL query cache may serve stale resultsets due to the way it invalidates cached data (cached data are valid for the interval specified by “cache_ttl”, while MySQL’s cached data get invalidated each time data change). Every query that is cached may return stale data, and this may or may not be acceptable by the application.

How it Works

Before doing any benchmarks, I will try to give you a short description of how ProxySQL query cache gets enabled. Unlike MySQL query cache, where a common caching space exists for all tables, in ProxySQL query cache we have to define what traffic gets cached. This is done by defining query rules that match traffic that is going to be cached and setting a “cache_ttl” for the cached results. There are many ways to define matches for incoming traffic, either by query or digest using patterns. All we need to cache the resultset is to define the matching criteria and the TTL. If a query passed the matching criteria, the resultset is cached so the next requests are served directly from the ProxySQL instance instead of querying the hostgroup DB nodes (if cache_ttl has not expired).

Let’s use an example to make it clear how ProxySQL query cache is enabled.

In our setup we have three backend DB servers in a master-slave topology, with Percona Server for MySQL 5.7 and a ProxySQL ver. 1.4.3 instance with sysbench 1.0 installed. Backend servers are within the same reader hostgroup, and traffic is balanced among these servers using the same priority.

As I’ve already said, we won’t look at the ProxySQL installation. There are many topologies you can implement: deploying ProxySQL on each application server thus removing the “single point of failure” weakness, for example. But in our case, we will just present the ProxySQL query cache having a single instance. In general, you would expect to have better performance with the ProxySQL instance closer to the application.

Configuration

With the ProxySQL instance up and running, let’s confirm that all servers are OK. Querying ProxySQL admin shows that all servers are ONLINE:

Admin> select * from mysql_servers;
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname  | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 2        	| 10.0.2.12 | 3306 | ONLINE | 1  	| 0       	| 1000        	| 0               	| 0   	| 0          	|     	|
| 1        	| 10.0.2.11 | 3306 | ONLINE | 1  	| 0       	| 1000        	| 0               	| 0   	| 0          	|     	|
| 2        	| 10.0.2.13 | 3306 | ONLINE | 1  	| 0       	| 1000            | 0               	| 0   	| 0          	|     	|
| 2        	| 10.0.2.11 | 3306 | ONLINE | 1  	| 0       	| 1000        	| 0               	| 0   	| 0          	|     	|
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+

As you can see, there are two hostgroups: the “1” used for WRITES and the “2” used for READS.

Some random connects proves that traffic is correctly routed to the DB backends:

[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db1-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db2-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db1-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db3-atsaloux |
+--------------+

Let’s first take a look at some statistics. Before using sysbench, the “stats_mysql_query_digest” table (where digests are stored) is empty:

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC;
Empty set (0.00 sec)

The “stats_mysql_query_digest: table contains statistics related to the queries routed through the ProxySQL server. How many times each query was executed and the total execution time are two of the several provided statistics.

Before doing any benchmarks, I had to create some data. The following sysbench commands were used for selects by PK or by RANGE. For simplicity, we are not going to execute benchmarks inside transactions — although ProxySQL query cache is effective.

--threads

 will be adjusted for each benchmark:

sysbench --threads=16 --max-requests=0 --time=60 --mysql-user=percona --mysql-password=percona --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=1000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=1 --oltp-simple-ranges=0 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run
sysbench --threads=16 --max-requests=0 --time=60 --mysql-user=percona --mysql-password=percona --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=1000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=0 --oltp-simple-ranges=1 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run

Before running the full benchmark, a simple sysbench was run to get the queries digests that are used for the ProxySQL query cache configuration.

After running the first benchmark with ProxySQL query cache disabled, I queried the “stats_mysql_query_digest” table again and got the following results where it logs all executed queries.

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC;
+------------+-------------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time    | hostgroup | digest             | digest_text                                    |
+------------+-------------+-----------+--------------------+------------------------------------------------+
| 301536     | 20962929791 | 2         | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=?               |
| 3269       | 30200073    | 2         | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
+------------+-------------+-----------+--------------------+------------------------------------------------+
2 row in set (0.01 sec)

Add mysql_query_rules To Be Cached

Output above provides all the needed information in order to enable ProxySQL query cache. What we need to do now add the query rules that match the results that should be cached. In this case we use a matching pattern criteria and a cache_ttl of 5000ms. Taking this into consideration, we added the following rules:

Admin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (1,1,'0xBF001A0C13781C1D',5000,1);
Query OK, 1 row affected (0.00 sec)
Admin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (2,1,'0x290B92FD743826DA',5000,1);
Query OK, 1 row affected (0.00 sec)

We shouldn’t forget that we must load query rules at runtime. If we don’t want to lose these rules (i.e., after a ProxySQL restart), we should also save to disk:

Admin> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Now let’s reset the stats_mysql_query_digest results:

Admin> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1; -- we reset the counters
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)
----------

And re-run the benchmarks with query cache enabled. To confirm what traffic was cached, we have to query the stats_mysql_query_digest once again:

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;
+------------+------------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time   | hostgroup | digest             | digest_text                                    |
+------------+------------+-----------+--------------------+------------------------------------------------+
| 108681     | 6304585632 | 2         | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
| 343277     | 0          | -1        | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
+------------+------------+-----------+--------------------+------------------------------------------------+
2 rows in set (0.00 sec)

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;
+------------+-----------+-----------+--------------------+----------------------------------+
| count_star | sum_time  | hostgroup | digest             | digest_text                      |
+------------+-----------+-----------+--------------------+----------------------------------+
| 79629      | 857050510 | 2         | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? |
| 441194     | 0         | -1        | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? |
+------------+-----------+-----------+--------------------+----------------------------------+
2 rows in set (0.00 sec)

Cached queries are the ones marked with a special hostgroup -1 (this means that these queries were not sent to any hostgroup), and the total execution time for the queries cached is 0 (this means that the request was served within the same events loop).

Below you can see the benchmark results. Let’s look at what happens for selects by PK and selects by RANGE:

ProxySQL query cache

 

ProxySQL query cache

Points of Interest

  • In all cases, when threads and backend servers are increasing, ProxySQL performs better. This is achieved due to it’s connection pooling and multiplexing capabilities.
  • Enabling ProxySQL query cache provides a significant performance boost.
  • ProxySQL query cache can achieve a ~2X performance boost at a minimum.
  • This boost can be considerably valuable in cases where MySQL performance may fall to 50% (i.e., select by RANGE).
  • We shouldn’t forget that results are affected by hardware specs as well, but it’s obvious that ProxySQL with query cache enabled gives a really high throughput.

ProxySQL Query Cache Limitations

Current known limitations:

  • It is not possible to define query cache invalidation other than with cache_ttl.
  • There is no way to enforce query cache purge.
  • mysql-query_cache_size_MB is not strictly enforced, but only used as a metric to trigger automatic purging of expired entries.
  • Although access_ms is recorded, it is not used as a metric to expire an unused metric when mysql-query_cache_size_MB is achieved.
  • Query cache does not support prepared statements.
  • Query cache may serve stale data.

Conclusion

ProxySQL is generally a very powerful and easy-to-use tool.

With regards to query cache, it seems to scale very well and achieve a significant performance boost. Although having complex configs (not only for query cache) can add some extra overhead, it’s easy to maintain.

cache_ttl can be a limitation, but if it’s correctly configured in conjunction with max_replication_lag when configuring nodes in hostgroups, it does not add any significant drawback. In any case, it depends whether or not this is acceptable by the application.

Feb
05
2018
--

Four Ways MySQL Executes GROUP BY

MySQL GROUP BY

MySQL GROUP BYIn this blog post, I’ll look into four ways MySQL executes GROUP BY. 

In my previous blog post, we learned that indexes or other means of finding data might not be the most expensive part of query execution. For example, MySQL GROUP BY could potentially be responsible for 90% or more of the query execution time. 

The main complexity when MySQL executes GROUP BY is computing aggregate functions in a GROUP BY statement. How this works is shown in the documentation for UDF Aggregate Functions. As we see, the requirement is that UDF functions get all values that constitute the single group one after another. That way, it can compute the aggregate function value for the single group before moving to another group.

The problem, of course, is that in most cases the source data values aren’t grouped. Values coming from a variety of groups follow one another during processing. As such, we need a special step to handle MySQL GROUP BY.

Let’s look at the same table we looked at before:

mysql> show create table tbl G
*************************** 1. row ***************************
      Table: tbl
Create Table: CREATE TABLE `tbl` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `k` int(11) NOT NULL DEFAULT '0',
 `g` int(10) unsigned NOT NULL,
 PRIMARY KEY (`id`),
 KEY `k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

And the same GROUP BY statements executed in different ways:

1: Index Ordered GROUP BY in MySQL

mysql> select k, count(*) c from tbl group by k order by k limit 5;
+---+---+
| k | c |
+---+---+
| 2 | 3 |
| 4 | 1 |
| 5 | 2 |
| 8 | 1 |
| 9 | 1 |
+---+---+
5 rows in set (0.00 sec)
mysql> explain select k, count(*) c from tbl group by k order by k limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: index
possible_keys: k
         key: k
     key_len: 4
         ref: NULL
        rows: 5
    filtered: 100.00
       Extra: Using index
1 row in set, 1 warning (0.00 sec)

In this case, we have an index on the column we use for GROUP BY. This way, we can just scan data group by group and perform GROUP BY on the fly (inexpensively).

It works especially well when we use LIMIT to restrict the number of groups we retrieve or when a “covering index” is in use, as a sequential index-only scan is a very fast operation.

If you have a small number of groups though, and no covering index, index order scans can cause a lot of IO. So this might not be the most optimal plan.

2: External Sort GROUP BY in MySQL

mysql> explain select SQL_BIG_RESULT g, count(*) c from tbl group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 998490
    filtered: 100.00
       Extra: Using filesort
1 row in set, 1 warning (0.00 sec)
mysql> select SQL_BIG_RESULT g, count(*) c from tbl group by g limit 5;
+---+---+
| g | c |
+---+---+
| 0 | 1 |
| 1 | 2 |
| 4 | 1 |
| 5 | 1 |
| 6 | 2 |
+---+---+
5 rows in set (0.88 sec)

If we do not have an index that allows us to scan the data in group order, we can instead get data sorted through an external sort (also referred to as “filesort” in MySQL).

You may notice I’m using an SQL_BIG_RESULT hint here to get this plan. Without it, MySQL won’t choose this plan in this case.

In general, MySQL prefers to use this plan only if we have a large number of groups, because in this case sorting is more efficient than having a temporary table (which we will talk about next).

3: Temporary Table GROUP BY in MySQL

mysql> explain select  g, sum(g) s from tbl group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 998490
    filtered: 100.00
       Extra: Using temporary
1 row in set, 1 warning (0.00 sec)
mysql> select  g, sum(g) s from tbl group by g order by null limit 5;
+---+------+
| g | s    |
+---+------+
| 0 |    0 |
| 1 |    2 |
| 4 |    4 |
| 5 |    5 |
| 6 |   12 |
+---+------+
5 rows in set (7.75 sec)

In this case, MySQL also does a full table scan. But instead of running additional sort passes, it creates a temporary table instead. This temporary table contains one row per group, and with each incoming row the value for the corresponding group is updated. Lots of updates! While this might be reasonable in-memory, it becomes very expensive if the resulting table is so large that updates are going to cause a lot of disk IO. In this case, external sort plans are usually better.

Note that while MySQL selects this plan by default for this use case, if we do not supply any hints it is almost 10x slower than the plan we get using the SQL_BIG_RESULT hint.

You may notice I added “ORDER BY NULL” to this query. This is to show you “clean” the temporary table only plan. Without it, we get this plan:

mysql> explain select  g, sum(g) s from tbl group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 998490
    filtered: 100.00
       Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

In it, we get the “worst of both worlds” with Using Temporary Table and filesort.  

MySQL 5.7 always returns GROUP BY results sorted in group order, even if this the query doesn’t require it (which can then require an expensive additional sort pass). ORDER BY NULL signals the application doesn’t need this.

You should note that in some cases – such as JOIN queries with aggregate functions accessing columns from different tables – using temporary tables for GROUP BY might be the only option.

If you want to force MySQL to use a plan that does temporary tables for GROUP BY, you can use the SQL_SMALL_RESULT  hint.

4:  Index Skip-Scan-Based GROUP BY in MySQL

The previous three GROUP BY execution methods apply to all aggregate functions. Some of them, however, have a fourth method.

mysql> explain select k,max(id) from tbl group by k G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: range
possible_keys: k
         key: k
     key_len: 4
         ref: NULL
        rows: 2
    filtered: 100.00
       Extra: Using index for group-by
1 row in set, 1 warning (0.00 sec)
mysql> select k,max(id) from tbl group by k;
+---+---------+
| k | max(id) |
+---+---------+
| 0 | 2340920 |
| 1 | 2340916 |
| 2 | 2340932 |
| 3 | 2340928 |
| 4 | 2340924 |
+---+---------+
5 rows in set (0.00 sec)

This method applies only to very special aggregate functions: MIN() and MAX(). These do not really need to go through all the rows in the group to compute the value at all.

They can just jump to the minimum or maximum group value in the group directly (if there is such an index).

How can you find MAX(ID) value for each group if the index is only built on (K) column? This is an InnoDB table. Remember InnoDB tables effectively append the PRIMARY KEY to all indexes. (K) becomes (K,ID), allowing us to use Skip-Scan optimization for this query.

This optimization is only enabled if there is a large number of rows per group. Otherwise, MySQL prefers more conventional means to execute this query (like Index Ordered GROUP BY detailed in approach #1).

While we’re on MIN()/MAX() aggregate functions, other optimizations apply to them as well. For example, if you have an aggregate function with no GROUP BY (effectively  having one group for all tables), MySQL fetches those values from indexes during a statistics analyzes phase and avoids reading tables during the execution stage altogether:

mysql> explain select max(k) from tbl G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: NULL
  partitions: NULL
        type: NULL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: NULL
    filtered: NULL
       Extra: Select tables optimized away
1 row in set, 1 warning (0.00 sec)

Filtering and Group By

We have looked at four ways MySQL executes GROUP BY.  For simplicity, I used GROUP BY on the whole table, with no filtering applied. The same concepts apply when you have a WHERE clause:

mysql> explain select  g, sum(g) s from tbl where k>4 group by g order by NULL limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: range
possible_keys: k
         key: k
     key_len: 4
         ref: NULL
        rows: 1
    filtered: 100.00
       Extra: Using index condition; Using temporary
1 row in set, 1 warning (0.00 sec)

For this case, we use the range on the K column for data filtering/lookup and do a GROUP BY when there is a temporary table.

In some cases, the methods do not conflict. In others, however, we have to choose either to use one index for GROUP BY or another index for filtering:

mysql> alter table tbl add key(g);
Query OK, 0 rows affected (4.17 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select  g, sum(g) s from tbl where k>1 group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: index
possible_keys: k,g
         key: g
     key_len: 4
         ref: NULL
        rows: 16
    filtered: 50.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select  g, sum(g) s from tbl where k>4 group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: range
possible_keys: k,g
         key: k
     key_len: 4
         ref: NULL
        rows: 1
    filtered: 100.00
       Extra: Using index condition; Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

Depending on specific constants used in this query, we can see that we either use an index ordered scan for GROUP BY (and  “give up”  benefiting from the index to resolve the WHERE clause), or use an index to resolve the WHERE clause (but use a temporary table to resolve GROUP BY).

In my experience, this is where MySQL GROUP BY does not always make the right choice. You might need to use FORCE INDEX to execute queries the way you want them to.

Summary

I hope this article provides a good overview of how MySQL executes GROUP BY.  In my next blog post, we will look into techniques you can use to optimize GROUP BY queries.

Jan
31
2018
--

Aurora Hash Join Optimization (with a Gentle Reminder on Lab Features)

Aurora Hash Join Lab Mode

Aurora Hash Join Lab ModeThe Aurora hash join feature for relational databases has been around for a while now. But unlike MySQL Block Nested Loop algorithm, an Aurora hash join only caters to a specific number of use cases. When implemented with the optimizer properly, they can provide great benefits with certain workloads. Below we’ll see a brief example of a quick win.

This new feature is available in Aurora lab mode version 1.16. Because this is a lab feature, it’s important to make sure to test your queries before upgrading, especially if you are looking to scale up to the new R4 instances before the Superbowl to avoid hitting the same problem I discuss below.

When lab mode is enabled and

hash_join

  is ON, you can verify the optimizer feature from the

optimizer_switch

 variable:

mysql> SELECT @@aurora_version, @@aurora_lab_mode, @@optimizer_switch G
*************************** 1. row ***************************
  @@aurora_version: 1.16
 @@aurora_lab_mode: 1
@@optimizer_switch: index_merge=on,...,hash_join=on,hash_join_cost_based=on

Hash joins work well when joining large result sets because – unlike block nested loop in the same query – the optimizer scans the larger table and matches it against the hashed smaller table instead of the other way around. Consider the tables and query below:

+----------+----------+
| tbl      | rows     |
+----------+----------+
| branches |    55143 |
| users    |   103949 |
| history  | 27168887 |
+----------+----------+
EXPLAIN
SELECT SQL_NO_CACHE COUNT(*)
FROM branches b
   INNER JOIN users u ON (b.u_id = u.u_id)
   INNER JOIN history h ON (u.u_id = h.u_id);

With hash joins enabled, we can see from the Extra column in the EXPLAIN output how it builds the join conditions:

mysql> EXPLAIN
    -> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    ->    INNER JOIN users u ON (b.u_id = u.u_id)
    ->    INNER JOIN history h ON (u.u_id = h.u_id);
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | Extra                                                    |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+
|  1 | SIMPLE      | u     | index | PRIMARY       | PRIMARY | 4       | NULL |   103342 | Using index                                              |
|  1 | SIMPLE      | h     | ALL   | NULL          | NULL    | NULL    | NULL | 24619023 | Using join buffer (Hash Join Outer table h)              |
|  1 | SIMPLE      | b     | index | user_id       | user_id | 4       | NULL |    54129 | Using index; Using join buffer (Hash Join Inner table b) |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+

Without hash joins, it’s a straightforward Cartesian (almost) product of all three tables:

mysql> SET optimizer_switch='hash_join=off';
Query OK, 0 rows affected (0.02 sec)
mysql> EXPLAIN
    -> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    ->    INNER JOIN users u ON (b.u_id = u.u_id)
    ->    INNER JOIN history h ON (u.u_id = h.u_id);
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref            | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
|  1 | SIMPLE      | h     | ALL    | NULL          | NULL    | NULL    | NULL           | 24619023 | NULL        |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | percona.h.u_id |        1 | Using index |
|  1 | SIMPLE      | b     | ref    | user_id       | user_id | 4       | percona.h.u_id |        7 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+

Now, the execution times without hash joins enabled:

mysql> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    ->    INNER JOIN users u ON (b.u_id = u.u_id)
    ->    INNER JOIN history h ON (u.u_id = h.u_id);
+-----------+
| COUNT(*)  |
+-----------+
| 128815553 |
+-----------+
1 row in set (1 min 6.95 sec)
mysql> SET optimizer_switch='hash_join=off';
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    ->    INNER JOIN users u ON (b.u_id = u.u_id)
    ->    INNER JOIN history h ON (u.u_id = h.u_id);
+-----------+
| COUNT(*)  |
+-----------+
| 128815553 |
+-----------+
1 row in set (2 min 28.27 sec)

Clearly with this optimization enabled, we have more than a 50% gain from the example query.

Now while this type of query might be rare, most of us know we need to avoid really large JOINs as they are not scalable. But at some point, we find some that take advantage of the feature. Here is an excerpt from an actual production query I’ve recently worked on. It shows the good execution plan versus the one using hash joins.

This particular EXPLAIN output only differs in the row where without a hash join, it uses an index, and the query executes normally. With the hash join enabled, the optimizer thought it was better to use it instead:

...
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: eq_ref
possible_keys: PRIMARY,r_type_id_ix,r_id_r_type_id_dt_ix
          key: PRIMARY
      key_len: 4
          ref: db.x.p_id
         rows: 1
        Extra: Using where
...
...
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: index
possible_keys: PRIMARY,r_type_id_ix,r_id_r_type_id_dt_ix
          key: r_id_r_type_id_dt_ix
      key_len: 18
          ref: NULL
         rows: 715568233
        Extra: Using where; Using index; Using join buffer (Hash Join Inner table t)
...

Needless to say, it caused problems. Unfortunately, a bug on Aurora 1.16 exists where hash joins cannot be turned off selectively (it is enabled by default) from the parameter group. If you try this, you get an error “Error saving: Invalid parameter value: hash_join=off for: optimizer_switch”. The only way to disable the feature is to turn off

lab_mode

, which requires an instance restart. An alternative is to simply add

SET optimizer_switch='hash_join=off';

 from the application, especially if you rely on some of the other lab mode features in Aurora.

To summarize, the new hash join feature is a great addition. But as it’s a lab feature, be careful when upgrading!

Jan
22
2018
--

Webinar Wednesday, January 24, 2018: Differences between MariaDB and MySQL

MariaDB and MySQL

MariaDB and MySQLJoin Percona’s Chief Evangelist, Colin Charles as he presents Differences Between MariaDB and MySQL on Wednesday, January 24, 2018, at 7:00 am PDT / 10:00 am EDT (UTC-7).

Tags: MariaDB, MySQL, Percona Server for MySQL, DBA, SysAdmin, DevOps
Experience Level: Novice

MariaDB and MySQL. Are they syntactically similar? Where do these two query languages differ? Why would I use one over the other?

MariaDB is on the path of gradually diverging from MySQL. One obvious example is the internal data dictionary currently under development for MySQL 8.0. This is a major change to the way metadata is stored and used within the server. MariaDB doesn’t have an equivalent feature. Implementing this feature could mark the end of datafile-level compatibility between MySQL and MariaDB.

There are also non-technical differences between MySQL and MariaDB, including:

  • Licensing: MySQL offers their code as open-source under the GPL, and provides the option of non-GPL commercial distribution in the form of MySQL Enterprise. MariaDB can only use the GPL because they derive their work from the MySQL source code under the terms of that license.
  • Support services: Oracle provides technical support, training, certification and consulting for MySQL, while MariaDB has their own support services. Some people will prefer working with smaller companies, as traditionally it affords them more leverage as a customer.
  • Community contributions: MariaDB touts the fact that they accept more community contributions than Oracle. Part of the reason for this disparity is that developers like to contribute features, bug fixes and other code without a lot of paperwork overhead (and they complain about the Oracle Contributor Agreement). However, MariaDB has its own MariaDB Contributor Agreement — which more or less serves the same purpose.

Colin will take a look at some of the differences between MariaDB and MySQL and help answer some of the common questions our Database Performance Experts get about the two databases.

Register for the webinar now.

Colin CharlesColin Charles, Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team for MariaDB Server in 2009, worked in MySQL since 2005, and been a MySQL user since 2000. Before joining MySQL, Colin worked actively on the Fedora and OpenOffice.org projects. He’s well-known within many open source communities and speaks on the conference circuit.

Jan
18
2018
--

Does the Meltdown Fix Affect Performance for MySQL on Bare Metal?

Meltdown Fix Affect Performance small

In this blog post, we’ll look at does the Meltdown fix affect performance for MySQL on bare metal servers.

Since the news about the Meltdown bug, there were a lot of reports on the performance hit from proposed fixes. We have looked at how the fix affects MySQL (Percona Server for MySQL) under a sysbench workload.

In this case, we used bare metal boxes with the following specifications:

  • Two-socket Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz (in total 56 entries in /proc/cpuinfo)
  • Ubuntu 16.04
  • Memory: 256GB
  • Storage: Samsung SM863 1.9TB SATA SSD
  • Percona Server for MySQL 5.7.20
  • Kernel (vulnerable) 4.13.0-21
  • Kernel (with Meltdown fix) 4.13.0-25

Please note, the current kernel for Ubuntu 16.04 contains only a Meltdown fix, and not one for Spectre.

We performed the validation with the https://github.com/speed47/spectre-meltdown-checker tool. The database size is 100GB in a sysbench workload with 100 tables, 4mln rows each with Pareto distribution.

We have used a socket connection and TCP host connection to measure a possible overhead from the TCP network connection. We also perform read-write and read-only benchmarks.

The results are below for a various number of threads:

Meltdown Fix Affect Performance

Where

  • Nokpti: kernel without KPTI patch (4.13.0-21)
  • Pti: kernel with KPTI patch (4.13.0-25), with PTI enabled
  • Nopti: kernel with KPTI patch (4.13.0-25), with PTI disabled

 

testname bp socket threads pti nopti nokpti nopti_pct pti_pct
1 OLTP_RO in-memory tcp_socket 1 709.93 718.47 699.50 -2.64 -1.47
4 OLTP_RO in-memory tcp_socket 8 5473.05 5500.08 5483.40 -0.30 0.19
3 OLTP_RO in-memory tcp_socket 64 21716.18 22036.98 21548.46 -2.22 -0.77
2 OLTP_RO in-memory tcp_socket 128 21606.02 22010.36 21548.62 -2.10 -0.27
 5 OLTP_RO in-memory unix_socket 1 750.41 759.33 776.88 2.31 3.53
8 OLTP_RO in-memory unix_socket 8 5851.80 5896.86 5986.89 1.53 2.31
7 OLTP_RO in-memory unix_socket 64 23052.10 23552.26 23191.48 -1.53 0.60
6 OLTP_RO in-memory unix_socket 128 23215.38 23602.64 23146.42 -1.93 -0.30
9 OLTP_RO io-bound tcp_socket 1 364.03 369.68 370.51 0.22 1.78
12 OLTP_RO io-bound tcp_socket 8 3205.05 3225.21 3210.63 -0.45 0.17
11 OLTP_RO io-bound tcp_socket 64 15324.66 15456.44 15364.25 -0.60 0.26
10 OLTP_RO io-bound tcp_socket 128 17705.29 18007.45 17748.70 -1.44 0.25
13 OLTP_RO io-bound unix_socket 1 421.74 430.10 432.88 0.65 2.64
16 OLTP_RO io-bound unix_socket 8 3322.19 3367.46 3367.34 -0.00 1.36
15 OLTP_RO io-bound unix_socket 64 15977.28 16186.59 16248.42 0.38 1.70
14 OLTP_RO io-bound unix_socket 128 18729.10 19111.55 18962.02 -0.78 1.24
17 OLTP_RW in-memory tcp_socket 1 490.76 495.21 489.49 -1.16 -0.26
20 OLTP_RW in-memory tcp_socket 8 3445.66 3459.16 3414.36 -1.30 -0.91
19 OLTP_RW in-memory tcp_socket 64 11165.77 11167.44 10861.44 -2.74 -2.73
18 OLTP_RW in-memory tcp_socket 128 12176.96 12226.17 12204.85 -0.17 0.23
21 OLTP_RW in-memory unix_socket 1 530.08 534.98 540.27 0.99 1.92
24 OLTP_RW in-memory unix_socket 8 3734.93 3757.98 3772.17 0.38 1.00
23 OLTP_RW in-memory unix_socket 64 12042.27 12160.86 12138.01 -0.19 0.80
22 OLTP_RW in-memory unix_socket 128 12930.34 12939.02 12844.78 -0.73 -0.66
25 OLTP_RW io-bound tcp_socket 1 268.08 270.51 270.71 0.07 0.98
28 OLTP_RW io-bound tcp_socket 8 1585.39 1589.30 1557.58 -2.00 -1.75
27 OLTP_RW io-bound tcp_socket 64 4828.30 4782.42 4620.57 -3.38 -4.30
26 OLTP_RW io-bound tcp_socket 128 5158.66 5172.82 5321.03 2.87 3.15
29 OLTP_RW io-bound unix_socket 1 280.54 282.06 282.35 0.10 0.65
32 OLTP_RW io-bound unix_socket 8 1582.69 1584.52 1601.26 1.06 1.17
31 OLTP_RW io-bound unix_socket 64 4519.45 4485.72 4515.28 0.66 -0.09
30 OLTP_RW io-bound unix_socket 128 5524.28 5460.03 5275.53 -3.38 -4.50

 

As you can see, there is very little difference between runs (in 3-4% range), which fits into variance during the test.

Similar experiments were done on different servers and workloads:

There also we see a negligible difference that fits into measurement variance.

Overhead analysis

To understand why we do not see much effect in MySQL (InnoDB workloads), let’s take a look where we expect to see the overhead from the proposed fix.

The main overhead is expected from a system call, so let’s test syscall execution on the kernel before the fix and after the fix (thanks for Alexey Kopytov for an idea how to test it with sysbench).

We will use the following script syscall.lua:

ffi.cdef[[long syscall(long, long, long, long);]]
function event()
 for i = 1, 10000 do
 ffi.C.syscall(0, 0, 0, 0)
 end
end

Basically, we measure the time for executing 10000 system calls (this will be one event).

To run benchmark:

sysbench syscall.lua --time=60 --report-interval=1 run

 

And the results are following:

  • On the kernel without the fix (4.13.0-21): 455 events/sec
  • On the kernel with the fix (4.13.0-26): 250 events/sec

This means that time to execute 10000 system calls increased from 2.197ms to 4ms.

While this increase looks significant, it does not have much effect on MySQL (InnoDB engine). In MySQL, you can expect most system calls done for IO or network communication.

We can assume that the time to execute 10000 IO events on the fast storage takes 1000ms, so adding an extra 2ms for the system calls corresponds to adding 0.2% in overhead (which is practically invisible in MySQL workloads).

I expect the effect will be much more visible if we work with MyISAM tables cached in OS memory. In this case, the syscall overhead would be much more visible when accessing data in memory.

Conclusion:

From our results, we do not see a measurable effect from KPTI patches (to mitigate the Meltdown vulnerability) running on bare metal servers with Ubuntu 16.04 and 4.13 kernel series.

Reference commands and configs:

sysbench oltp_read_only.lua   {--mysql-socket=/tmp/mysql.sock|--mysql-host=127.0.0.1} --mysql-user=root
--mysql-db=sbtest100t4M --rand-type=pareto  --tables=100  --table-size=4000000 --num-threads=$threads --report-interval=1
--max-time=180 --max-requests=0  run

RW:

sysbench oltp_read_write.lua   {--mysql-socket=/tmp/mysql.sock|--mysql-host=127.0.0.1} --mysql-user=root
--mysql-db=sbtest100t4M --rand-type=pareto  --tables=100  --table-size=4000000 --num-threads=$threads --report-interval=1
--max-time=180 --max-requests=0  run

mysqld:
Percona Server 5.7.20-19

numactl --physcpubind=all --interleave=all   /usr/bin/env LD_PRELOAD=/data/opt/alexey.s/bin64_5720.ps/lib/mysql/libjemalloc.so.1 ./bin/mysqld
--defaults-file=/data/opt/alexey.s/my-perf57.cnf --basedir=. --datadir=/data/sam/sbtest100t4M   --user=root  --innodb_flush_log_at_trx_commit=1
--innodb-buffer-pool-size=150GB --innodb-log-file-size=10G --innodb-buffer-pool-instances=8  --innodb-io-capacity-max=20000
--innodb-io-capacity=10000 --loose-innodb-page-cleaners=8 --ssl=0

My.cnf file:

[mysqld]
user=root
port=3306
innodb_status_file=0
innodb_data_file_path=ibdata1:100M:autoextend
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = true
innodb_log_buffer_size = 128M
innodb_log_file_size = 10G
innodb_log_files_in_group = 2
innodb_write_io_threads=8
innodb_read_io_threads=8
innodb_io_capacity=15000
innodb_io_capacity_max=25000
innodb_lru_scan_depth=8192
#innodb_buffer_pool_size=${BP}G
innodb_doublewrite=1
innodb_thread_concurrency=0
innodb-checksum-algorithm=strict_crc32
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_purge_threads=8
loose-innodb-page-cleaners=8
innodb_buffer_pool_instances=8
innodb_change_buffering=none
innodb_adaptive_hash_index=OFF
sync_binlog=0
max_connections=5000
table_open_cache=5000
query_cache_type=OFF
thread_cache_size=16
back_log=2000
connect_timeout=15
skip-grant-tables
sort_buffer_size=262144
key_buffer_size=8388608
join_buffer_size=262144
server-id=1
max_connections=50000
skip_name_resolve=ON
max_prepared_stmt_count=1048560
performance_schema=OFF
performance-schema-instrument='wait/synch/%=ON'
innodb_monitor_enable=all
innodb_flush_neighbors=0
metadata_locks_hash_instances=256
table_open_cache_instances=64

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com