How to Choose the MySQL innodb_log_file_size


In this blog post, I’ll provide some guidance on how to choose the MySQL innodb_log_file_size.

Like many database management systems, MySQL uses logs to achieve data durability (when using the default InnoDB storage engine). This ensures that when a transaction is committed, data is not lost in the event of crash or power loss.

MySQL’s InnoDB storage engine uses a fixed size (circular) Redo log space. The size is controlled by innodb_log_file_size and innodb_log_files_in_group (default 2). You multiply those values and get the Redo log space that available to use. While technically it shouldn’t matter whether you change either the innodb_log_file_size or innodb_log_files_in_group variable to control the Redo space size, most people just work with the innodb_log_file_size and leave innodb_log_files_in_group alone.

Configuring InnoDB’s Redo space size is one of the most important configuration options for write-intensive workloads. However, it comes with trade-offs. The more Redo space you have configured, the better InnoDB can optimize write IO. However, increasing the Redo space also means longer recovery times when the system loses power or crashes for other reasons.  

It is not easy or straightforward to predict how much time a system crash recovery takes for a specific innodb_log_file_size value – it depends on the hardware, MySQL version and workload. It can vary widely (10 times difference or more, depending on the circumstances). However, around five minutes per 1GB of innodb_log_file_size is a decent ballpark number. If this is really important for your environment, I would recommend testing it by a simulating system crash under full load (after the database has completely warmed up).   

While recovery time can be a guideline for the limit of the InnoDB Log File size, there are a couple of other ways you can look at this number – especially if you have Percona Monitoring and Management installed.

Check Percona Monitoring and Management’s “MySQL InnoDB Metrics” Dashboard. If you see a graph like this:


where Uncheckpointed Bytes is pushing very close to the Max Checkpoint Age, you can almost be sure your current innodb_log_file_size is limiting your system’s performance. Increasing it can provide substantial performance improvements.

If you see something like this instead:

innodb_log_file_size 2

where the number of Uncheckpointed Bytes is well below the Max Checkpoint Age, then increasing the log file size won’t give you a significant improvement.

Note: many MySQL settings are interconnected. While a specific log file size might be good enough for smaller innodb_buffer_pool_size, larger InnoDB Buffer Pool values might warrant larger log files for optimal performance.

Another thing to keep in mind: the recovery time we spoke about early really depends on the Uncheckpointed Bytes rather than total log file size. If you do not see recovery time increasing with a larger innodb_log_file_size, check out InnoDB Checkpoint Age graph – it might be you just can’t fully utilize large log files with your workload and configuration.

Another way to look at the log file size is in context of log space usage:

innodb_log_file_size 3

This graph shows the amount of Data Written to the InnoDB log files per hour, as well as the total size of the InnoDB log files. In the graph above, we have 2GB of log space and some 12GB written to the Log files per hour. This means we cycle through logs every ten minutes.

InnoDB has to flush every dirty page in the buffer pool at least once per log file cycle time.

InnoDB gets better performance when it does that less frequently, and there is less wear and tear on SSD devices. I like to see this number at no less than 15 minutes. One hour is even better.  


Getting the innodb_log_file_file size is important to achieve the balance between reasonably fast crash recovery time and good system performance. Remember, your recovery time objective it is not as trivial as you might imagine. I hope the techniques described in this post help you to find the optimal value for your situation!


What is a big innodb_log_file_size?

big innodb_log_file_size

In this post, we’ll discuss what constitutes a big


, and how it can affect performance.

In the comments for our post on Percona Server 5.7 performance improvements, someone asked why we use


 with an indication that it might be too big?

In my previous post (https://www.percona.com/blog/2016/05/17/mysql-5-7-read-write-benchmarks/), the example used


. Is that too big? Let’s take a more detailed look at this.

First, let me start by rephrasing my warning: the log file size should be set as big as possible, but not bigger than necessary. A bigger log file size is better for performance, but it has a drawback (a significant one) that you need to worry about: the recovery time after a crash. You need to balance recovery time in the rare event of a crash recovery versus maximizing throughput during peak operations. This limitation can translate to a 20x longer crash recovery process!

But how big is “big enough”? Is it 48MB (the default value), 1-2GB (which I often see in production), or 10-15GB (like we use for benchmarks)?

I wrote about how the 


 is related to background flushing five years ago, and I recommend this post if you are interested in details:

InnoDB Flushing: Theory and solutions

Since that time many improvements have been made both in Percona Server and MySQL, but a small


 still affects the throughput.

How? Let’s review how writes happen in InnoDB. Practically all data page writes happen in the background. It seems like background writes shouldn’t affect user query performance, but it does. The more intense background writes are, the more resources are taken away from the user foreground workload. There are three big forces that rule background writes:

  1. How close checkpoint age is to the async point (again, see previous material https://www.percona.com/blog/2011/04/04/innodb-flushing-theory-and-solutions/). This is adaptive flushing.
  2. How close is

     to the percentage of actual dirty pages.  You can see this in the LRU flushing metrics.

  3. What amount of free pages are defined by

    . This is also in LRU flushing metrics.

So in this equation 


 defines the async point, and how big checkpoint age can be.

To show a practical application of these forces, I’ve provided some chart data. I will use charts from the Percona Monitoring and Management tool and data from Percona Server 5.7.

Before jumping to graphs, let me remind you that the max checkpoint age is defined not only by


, but also 


 (which is usually “2” by default). So 


 will have 4GB of log space, from which MySQL will use about 3.24GB (MySQL makes extra reservations to avoid a situation when we fully run out of log space).

Below are graphs from a tpcc-mysql benchmark with 1500 warehouses, which provides about 150GB of data. I used


, and I made two runs:

  1. with
  2. with

Other details about my setup:

  • CPU: 56 logical CPU threads servers Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
  • OS: Ubuntu 16.04
  • Kernel 4.4.0-21-generic
  • The storage device is Samsung SM863 SATA SSD, single device, with ext4 filesystem
  • MySQL versions: Percona Server 5.7.11
  • innodb_io_capacity=5000 / innodb_io_capacity_max=7500

On the first chart, let’s look at the max checkpoint age, current checkpoint age and amount of flushed pages per second:

. . . and also a related graph of how many pages are flushed by different forces (LRU flushing and adaptive flushing). You can receive this data by enabling

innodb_monitor_enable = '%'


From these charts, we can see that with 2GB innodb_log_file_size InnoDB is forced by adaptive flushing to flush (write) more pages, because the current checkpoint age (uncheckpointed bytes) is very close to Max Checkpoint Age. To see the checkpoint age in MySQL, you can use the innodb_metrics table and metrics 





In the case using


, the main flushing is done via LRU flushing (to keep 5000 pages (


) free per buffer pool instance). From the first graph we can figure that uncheckpointed bytes never reach 12GB, so in this case using


 is overkill. We might be fine with innodb_log_file_size=8GB – but we wouldn’t know unless we set the 


 big enough. MySQL 5.7 comes with a very convenient improvement: now it is much easier to change the


, but it still requires a server restart. I wish we could change it online, like we can for


 (I do not see technical barriers for this).

Let’s also look into the InnoDB buffer pool content:

We can see that there are more modified pages in the case with 15GB log files (which is good, as more modified pages means less work done in the background).

And the most interesting question: how does it affect throughput?



, the throughput is about 20% worse. With a 2GB log size, you can see that often zero transactions are processed within one second – this is bad, and says that the flushing algorithm still needs improvements in cases when the checkpoint age is close to or at the async point.

This should make a convincing case that using big


 is beneficial. In this particular case, probably 8GB (with


) would be enough.

What about the recovery time? To measure this, I killed mysqld when the checkpoint age (uncheckpointed bytes) was about 10GB. It appeared to take 20 mins to start mysqld after the crash. In another experiment with 25GB of uncheckpointed bytes, it took 45 mins. Unfortunately, crash recovery in MySQL is still singlethreaded, so it takes a long time to read and apply 10GB worth of changes (even from the fast storage).

We can see that recovery is single-threaded from the CPU usage chart during recovery:


The system uses 2% of the CPU (which corresponds to a single CPU).

In many cases, crash recovery is not a huge concern. People don’t always have to wait for MySQL recovery – since even one minute of downtime can be too long, often the instance fails over to a slave (especially with async replication), or the crashed node just leaves the cluster (if you use Percona XtraDB Cluster).

I would still like to see improvements in this area. Crash recovery is the biggest showstopper for using a big


, and I think it is possible to add parallelism similar to multithreaded slaves into the crash recovery process.

You can find the raw results, scripts and configs here.


Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com