The Impact of Swapping on MySQL Performance

In this blog, I’ll look at the impact of swapping on MySQL performance. 

It’s common sense that when you’re running MySQL (or really any other DBMS) you don’t want to see any I/O in your swap space. Scaling the cache size (using


 in MySQL’s case) is standard practice to make sure there is enough free memory so swapping isn’t needed.   

But what if you make some mistake or miscalculation, and swapping happens? How much does it really impact performance? This is exactly what I set out to investigate.

My test system has the following:

  • 32GB of physical memory
  • OS (and swap space) on a (pretty old) Intel 520 SSD device
  • Database stored on Intel 750 NVMe storage

To simulate a worst case scenario, I’m using Uniform Sysbench Workload:

sysbench --test=/usr/share/doc/sysbench/tests/db/select.lua   --report-interval=1 --oltp-table-size=700000000 --max-time=0 --oltp-read-only=off --max-requests=0 --num-threads=64 --rand-type=uniform --db-driver=mysql --mysql-password=password --mysql-db=test_innodb  run

To better visualize the performance of the metrics that matter for this test, I have created the following custom graph in our Percona Monitoring and Management (PMM) tool. It shows performance disk IO and swapping activity on the same graph.

Here are the baseline results for


. The results are a reasonable ballpark number for a system with 32GB of memory.

Impact of Swapping on MySQL PMM 1

As you can see in the baseline scenario, there is almost no swapping, with around 600MB/sec read from the disk. This gives us about 44K QPS. The 95% query response time (reported by sysbench) is about 3.5ms.

Next, I changed the configuration to


, which is the total amount of memory available. As memory is required for other purposes, it caused swapping activity:

Impact of Swapping on MySQL PMM 2

We can see that performance stabilizes after a bit at around 20K QPS, with some 380MB/sec disk IO and 125MB/sec swap IO. The 95% query response time has grown to around 9ms.

Now let’s look at an even worse case. This time, we’ll set our configuration to


 (on a 32GB system).

Impact of Swapping on MySQL PMM 3

Now we have around 6K QPS. Disk IO has dropped to 250MB/sec, and swap IO is up to 190MB/sec. The 95% query response time is around 35ms. As the graph shows, the performance becomes more variable, confirming the common assumption that intense swapping affects system stability.

Finally, let’s remember MySQL 5.7 has the Online Buffer Pool Resize feature, which was created to solve exactly this problem (among other reasons). It changes the buffer pool size if you accidentally set it too large. As we have tested


, and demonstrated it worked well, let’s scale it back to that value:

mysql> set global innodb_buffer_pool_size=24*1024*1024*1024;
Query OK, 0 rows affected (0.00 sec)

Impact of Swapping on MySQL PMM 4

Now the graph shows both good and bad news. The good news is that the feature works as intended, and after the resize completes we get close to the same results before our swapping experiment. The bad news is everything pretty much grinds to halt for 15 minutes or so while resizing occurs. There is almost no IO activity or intensive swapping while the buffer pool resize is in progress.   

I also performed other sysbench runs for selects using Pareto random type rather than Uniform type, creating more realistic testing (skewed) data access patterns. I further performed update key benchmarks using both Uniform and Pareto access distribution.

You can see the results below:

Impact of Swapping on MySQL Pareto 1

Impact of Swapping on MySQL Pareto 2

As you can see, the results for selects are as expected. Accesses with Pareto distributions are better and are affected less – especially by minor swapping.  

If you look at the update key results, though, you find that minor swapping causes performance to improve for Pareto distribution. The results at 48GB of memory are pretty much the same.

Before you say that that is impossible, let me provide an explanation: I limited


 on this system to avoid unbound InnoDB history length growth. These workloads tend to be bound by InnoDB purge performance. It looks like swapping has impacted the user threads more than it did the purge threads, causing such an unusual performance profile. This is something that might not be repeatable between systems.


When I started, I expected severe performance drop even with very minor swapping. I surprised myself by getting swap activity to more than 100MB/sec, with performance “only” halved.  

While you should continue to plan your capacity so that there is no constant swapping on the database system, these results show that a few MB/sec of swapping activity it is not going to have a catastrophic impact.

This assumes your swap space is on an SSD, of course! SSDs handle random IO (which is what paging activity usually is) much better than HDDs.


The write cache: Swap insanity tome III

Swapping has always been something bad for MySQL performance but it is even more important for HA systems. It is so important to avoid swapping with HA that NDB cluster basically forbids calling malloc after the startup phase and hence its rather complex configuration.

Probably most readers of this blog know (or should know) about Linux swappiness setting, which basically controls how important is the file cache for Linux. Basically, with InnoDB, since the file cache is not important we add “vm.swappiness = 0″ to “/etc/sysctl.conf” and run “sysctl -p” and we are done.

Swappiness solves part of the swapping issue but not all. With Numa systems, the picture is more complex and swapping can occur because of a memory imbalance between the physical cpus, the sockets and not cores. Jeremy Cole explained this here and here. In summary, you need to interleave the allocation of memory for the MySQL process using the numactl utility, drop the file cache and pre-allocate the innodb buffer pool with the innodb_buffer_pool_populate option.

That solves most of the swapping issues but… I recently ended up in a situation where, after having done all that, a server was swapping episodically for a few minutes, enough to cause serious issues to the pacemaker setup the server is one of the nodes. That caused resource failovers. Resource failover when there’s no fencing and when the servers are not responsive because of swapping is pretty bad for a cluster and we often ended up is strange, unexpected states. What was I missing?

In order to figure out, I started of data gathering metrics like: pt-stalk, vmstat, top, iostat, etc all saving to disk with the current time stamp in front of each line. I quickly found that the issues happened during the backups but also, during the push of the backups to S3. Why? The push to S3 was especially puzzling since MySQL was not involved. The server was running with nearly 7GB of free memory but the push to S3 involves a split operation to create files of 5GB since the backup size is about 28GB and the file upload size limit is 5GB. So, about 28GB was written to disk in a short while. Here’s the vmstat output during the split process, the cron job started at 10am.

Fri Mar 22 10:03:22 UTC 2013 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
Fri Mar 22 10:03:22 UTC 2013 r b swpd free buff cache si so bi bo in cs us sy id wa st
Fri Mar 22 10:03:22 UTC 2013 2 1 291992 224892 138848 7254284 0 0 148504 476 24190 24233 8 6 81 5 0
Fri Mar 22 10:03:24 UTC 2013 2 1 291992 222228 138848 7258064 0 0 194040 96 22443 20425 7 6 83 5 0
Fri Mar 22 10:03:26 UTC 2013 3 1 291992 224088 138852 7260196 0 0 172386 156 27143 27637 8 7 80 5 0
Fri Mar 22 10:03:28 UTC 2013 1 4 291992 219104 138832 7262820 0 0 174854 160 18893 17002 6 5 83 6 0
Fri Mar 22 10:03:30 UTC 2013 1 2 291992 324640 138836 7153440 0 0 143736 132 19318 17425 7 5 76 12 0
Fri Mar 22 10:03:32 UTC 2013 0 2 291984 292988 138840 7183556 0 0 138480 1206 19126 16359 3 5 81 12 0
Fri Mar 22 10:03:34 UTC 2013 4 0 291984 216932 138832 7255856 0 0 169072 936 20541 16555 3 7 83 8 0
Fri Mar 22 10:03:36 UTC 2013 3 7 300144 242052 138836 7216444 0 4080 53400 53714 18422 16234 2 7 72 18 0
Fri Mar 22 10:03:38 UTC 2013 0 4 395888 355136 138844 7197992 0 47872 3350 125712 24148 21633 3 4 70 23 0
Fri Mar 22 10:03:40 UTC 2013 4 4 495212 450516 138864 7208188 0 49664 2532 164356 30539 21120 2 3 81 13 0

Note the free column going down and the cache one going up and then swapping starting at 10:03:36. Also, the server is reading a lot (~150MB/s) but doing very little physical writes. The writes were cached and the write cache was causing the problem. It happened that the setting vm.dirty_ratio was set to 20 on a 48GB server, allowing nearly 10GB of “write cache”. A quick survey of the servers I have access to showed that the common values are 20 and 40. I simply set the dirty_ratio to 2 (1GB of write cache) and the issue is gone since.

So, add to your verification list the point of making sure you look at the Linux settings for the dirty pages. You can either set “vm.dirty_ratio” or “vm.dirty_bytes”, but keep in mind that only one is used; setting one to a non-zero value sets the other to zero.

The post The write cache: Swap insanity tome III appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com