High Memory Usage on ProxySQL Server

High Memory Usage on ProxySQL Server

High Memory Usage on ProxySQL ServerProxySQL is a very useful tool for gaining high availability, load balancing, query routing, query caching, query rewriting, multiplexing, and data masking. It is a proven tool and is used largely in production. There can be an instance of ProxySQL using a lot of memory on the server. This post is related to one of the causes that can use memory on the ProxySQL server and how we handle it.


How do we know where the memory is getting used?

Proxysql has a stats database which we can see with an admin login. This database contains metrics gathered by ProxySQL concerning its internal functioning.

This post mostly talks about stats in MySQL query digest in ProxySQL.

Memory used by query digest can grow unlimited, and you can query the memory usage with the below query with an admin login. The memory usage is shown in bytes. 

MySQL [(none)]> select * from stats_memory_metrics;
| Variable_Name                | Variable_Value |
| SQLite3_memory_bytes         | 3732504        |
| jemalloc_resident            | 171573248      |
| jemalloc_active              | 123863040      |
| jemalloc_allocated           | 21793784       |
| jemalloc_mapped              | 237305856      |
| jemalloc_metadata            | 10041824       |
| jemalloc_retained            | 57868288       |
| Auth_memory                  | 835            |
| query_digest_memory          | 177248         |
| mysql_query_rules_memory     | 8410           |
| mysql_firewall_users_table   | 0              |
| mysql_firewall_users_config  | 0              |
| mysql_firewall_rules_table   | 0              |
| mysql_firewall_rules_config  | 329            |
| stack_memory_mysql_threads   | 67108864       |
| stack_memory_admin_threads   | 8388608        |
| stack_memory_cluster_threads | 0              |

This is the test instance. So you don’t see it growing so fast due to idle load. But in the production environment where there are tons of unique queries running from ProxySQL, it can grow fast and be troublesome. Once it restarts, it clears the memory and empties the stats_mysql_query_digest table.

You can check this memory usage using Percona Monitoring and Management (PMM) as well. Let’s see where it shows.

mysql> select * from stats_memory_metrics where variable_name = 'query_digest_memory';
| Variable_Name       | Variable_Value |
| query_digest_memory | 3944           |

Under PMM, you can check “Memory used to store data” under the ProxySQL instance summary ? Memory usage pane.

Here is how it released the memory on restarting and also emptied the query_digest as well.

[root@yunus-shaikh-node3 ~]# systemctl restart proxysql

[root@yunus-shaikh-node3 ~]# mysql -h -u admin -P 6032 -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.30 (ProxySQL Admin Module)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>  select * from stats_mysql_query_digest;
Empty set (0.00 sec)

MySQL [(none)]> select * from stats_memory_metrics where variable_name='query_digest_memory';
| Variable_Name       | Variable_Value |
| query_digest_memory | 0              |

But restarting ProxySQL is not a good solution for a production environment, so how do we release the memory?

Ways to do it without restarting ProxySQL

1) You can disable mysql_query_digest if you don’t want it. It is not a big problem, so keeping it enabled might benefit from seeing what queries your application is running. ProxySQL table (stats_mysql_query_digest) can reduce the overhead of analyzing slow logs and sorting slow queries. So we don’t recommend it to be disabled as it can be taken care of better.

Below are the steps that can be used to disable it.

set mysql-query_digests = 0;

2) ProxySQL has provided a table stats_mysql_query_digest_reset. If we run a select on it, it will display the contents on stats_mysql_query_digest and clears the table. This can be used if you don’t need the history of stats_mysql_query_digest anymore after reading once. This is similar to restarting ProxySQL as it does not backup anything and just deletes the data in stats_mysql_query_digest.

We had two entries in the digest, and when we selected it from stats_mysql_query_digest_reset then, it cleared the stats_mysql_query_digest table and also released the memory for the same.

MySQL [(none)]> select count(*) from stats_mysql_query_digest;
| count(*) |
| 2        |
1 row in set (0.00 sec)

MySQL [(none)]> select count(*) from stats_mysql_query_digest_reset;
| count(*) |
| 2        |
1 row in set (0.00 sec)

MySQL [(none)]> select count(*) from stats_mysql_query_digest;
| count(*) |
| 0        |
1 row in set (0.01 sec)

MySQL [(none)]> select * from stats_memory_metrics where variable_name='query_digest_memory';
| Variable_Name       | Variable_Value |
| query_digest_memory | 0              |

NOTE – This does not save your mysql_query_digest. It just deletes everything.

3) There is a variable admin-stats_mysql_query_digest_to_disk which allows you to save the query_digest in the history table history_mysql_query_digest after n number of seconds. N is the value you define to variables in seconds.

set admin-stats_mysql_query_digest_to_disk = <No of seconds in which you want to move from memory to disk>;

Every N seconds, it will move the contents from stats_mysql_query_digest to history_mysql_query_digest. So the digest will then be stored on a disk. Memory will be released.

4) The below step will directly store the MySQL digest on DISK. This will stop using the stats_mysql_query_digest table and will only store all data in history_mysql_query_digest.


Other variables that can be used for reducing the query digest in the memory include:

1) mysql-query_digests_keep_comment – This is by default false, which can help to reduce the digest text, and that means the size as well for the query digest. If you need, you can enable it.

2) mysql-query_digests_max_digest_length – This is by default set to 2048. You can reduce it to a minimum of 16. This will result in the queries like:

select * from test
select * from te

The digest text will be limited to 16 characters counting with space and comments in the query. This will result in queries cut down to some extent. Again, it will save memory.

3) mysql-query_digests_grouping_limit – This will convert the queries like;

set mysql-query_digests_grouping_limit=2;

select * from test2 where id IN (1,2,3,4);
select * from test2 where id IN (?,?,...)

4) mysql-query_digests_normalize_digest_text

When set to FALSE (default), ProxySQL will cache the SQL digest and related information in the table stats.stats_mysql_query_digest by the schema.

When this variable is TRUE, queries statistics store digest_text on a different internal hash table. In this way, ProxySQL will be able to normalize data, digest_text is internally stored elsewhere, and it deduplicates data.

When you query stats_mysql_query_digest, the data is merged. This drastically reduces memory usage on setups with many schemas but similar query patterns.

This will save us from restarting ProxySQL every time if the memory usage is high due to mysql_query_digest.


If you see the ProxySQL memory usage going high, you should look at the memory stats table to define where the memory is getting used. If you find that stats_mysql_query_digest is using the memory, then you can implement one of the abovementioned options to release the memory. You might also want to check why your application generates a lot of unique queries to fill the query digest. That can also help you to minimize the memory used by query digest.

The query digest doesn’t always need to cause a problem. If the memory metrics table shows something else, then you would have to go to troubleshoot in that way. If you think there is a memory leak, then you can refer to the below page on detecting memory leaks

In general, the recommended way of setting up the mysql_query_digest for ProxySQL would be to allow it to store in memory which can be faster than storing it on disk. So at some point in time, you would need to clear your memory before it grows. Here you can decide on your requirement if you need to store your query digest for a more extended period, then you can move it to disk. Query digest works so that it does not repeat unique queries in the digest. It only stores distinct queries when it is in one place.

When you move it from memory to disk (i.e. under history_mysql_query_digest), it empties the stats_mysql_query_digest table. So it will start recording all the queries that were also recorded previously and moved to the history table. When you move that data again to the history table, it will duplicate the query_digest there. So plan accordingly.

It would be best to decide how you want to set up your mysql_query_digest as per the requirement. It is always better to review it and move it to disk later. If not needed after a certain time, you can remove it from the disk too.

Proxysql Major Version 2.0


MySQL Memory Management, Memory Allocators and Operating System

memory management mysql bug

memory management mysql bugWhen users experience memory usage issues with any software, including MySQL®, their first response is to think that it’s a symptom of a memory leak. As this story will show, this is not always the case.

This story is about a bug.

All Percona Support customers are eligible for bug fixes, but their options vary. For example, Advanced+ customers are offered a HotFix build prior to the public release of software with the patch. Premium customers do not even have to use Percona software: we may port our patches to upstream for them. But for Percona products all Support levels have the right to have a fix.

Even so, this does not mean we will fix every unexpected behavior, even if we accept that behavior to be a valid bug. One of the reasons for such a decision might be that while the behavior is clearly wrong for Percona products, this is still a feature request.

A bug as a case study

A good recent example of such a case is PS-5312 – the bug is repeatable with upstream and reported at

This reports a situation whereby access to InnoDB fulltext indexes leads to growth in memory usage. It starts when someone queries a fulltext index, grows until a maximum, and is not freed for quite a long time.

Yura Sorokin from the Percona Engineering Team investigated if this is a memory leak and found that it is not.

When InnoDB resolves a fulltext query, it creates a memory heap in the function


This heap may grow up to 80MB. Additionally, it has a big number of blocks (


) which are not always used continuously and this, in turn, leads to memory fragmentation.

In the function


, the memory heap is freed. InnoDB does this for each of the allocated blocks. At the end of the function, it calls


which belongs to one of the memory allocator libraries, such as




. From the


point of view, everything is done correctly: there is no memory leak.

However while


should release memory when called, it is not required to return it back to the operating system. If the memory allocator decides that the same memory blocks will be required soon, it may still keep them for the


process. This explains why you might see that


  still uses a lot of memory after the job is finished and all de-allocations are done.

This in practice is not a big issue and should not cause any harm. But if you need the memory to be returned to the operating system quicker, you could try alternative memory allocators, such as jemalloc. The latter was proven to solve the issue with PS-5312.

Another factor which improves memory management is the number of CPU cores: the more we used for the test, the faster the memory was returned to the operating system. This, probably, can be explained by the fact that if you have multiple CPUs, then the memory allocator can dedicate one of them just for releasing memory to the operating system.

The very first implementation of InnoDB full text indexes introduced this flaw. As our engineer Yura Sorokin found:

Options to fix

We have a few options to fix this:

  1. Change implementation of InnoDB fulltext index
  2. Use custom memory library like jemalloc

Both have their advantages and disadvantages.

Option 1 means we are introducing an incompatibility with upstream, which may lead to strange bugs in future versions. This also means a full rewrite of the InnoDB fulltext code which is always risky in GA versions, used by our customers.

Option 2 means we may hit flaws in the jemalloc library which is designed for performance and not for the safest memory allocation.

So we have to choose between these two not ideal solutions.

Since option 1 may lead to a situation when Percona Server will be incompatible with upstream, we prefer option 2 and look forward for the upstream fix of this bug.


If you are seeing a high memory usage by the


process, it is not always a symptom of a memory leak. You can use memory instrumentation in Performance Schema to find out how allocated memory is used. Try alternative memory libraries for better processing of allocations and freeing of memory. Search the user manual for


to find out how to set it up at these pages here and here.


Linux OS Tuning for MySQL Database Performance

Linux OS tuning for MySQL database performance

Linux OS tuning for MySQL database performanceIn this post we will review the most important Linux settings to adjust for performance tuning and optimization of a MySQL database server. We’ll note how some of the Linux parameter settings used OS tuning may vary according to different system types: physical, virtual or cloud. Other posts have addressed MySQL parameters, like Alexander’s blog MySQL 5.7 Performance Tuning Immediately After Installation. That post remains highly relevant for the latest versions of MySQL, 5.7 and 8.0. Here we will focus more on the Linux operating system parameters that can affect database performance.

Server and Operating System

Here are some Linux parameters that you should check and consider modifying if you need to improve database performance.

Kernel – vm.swappiness

The value represents the tendency of the kernel  to swap out memory pages. On a database server with ample amounts of RAM, we should keep this value as low as possible. The extra I/O can slow down or even render the service unresponsive. A value of 0 disables swapping completely while 1 causes the kernel to perform the minimum amount of swapping. In most cases the latter setting should be OK:

# Set the swappiness value as root
echo 1 > /proc/sys/vm/swappiness
# Alternatively, using sysctl
sysctl -w vm.swappiness=1
# Verify the change
cat /proc/sys/vm/swappiness
# Alternatively, using sysctl
sysctl vm.swappiness
vm.swappiness = 1

The change should be also persisted in /etc/sysctl.conf:

vm.swappiness = 1

Filesystems – XFS/ext4/ZFS

XFS is a high-performance, journaling file system designed for high scalability. It provides near native I/O performance even when the file system spans multiple storage devices.  XFS has features that make it suitable for very large file systems, supporting files up to 8EiB in size. Fast recovery, fast transactions, delayed allocation for reduced fragmentation and near raw I/O performance with DIRECT I/O.

The default options for mkfs.xfs are good for optimal speed, so the simple command:

# Use default mkfs options
mkfs.xfs /dev/target_volume

will provide best performance while ensuring data safety. Regarding mount options, the defaults should fit most cases. On some filesystems you can see a performance increase by adding the noatime mount option to the /etc/fstab.  For XFS filesystems the default atime behaviour is relatime, which has almost no overhead compared to noatime and still maintains sane atime values.  If you create an XFS file system on a LUN that has a battery backed, non-volatile cache, you can further increase the performance of the filesystem by disabling the write barrier with the mount option nobarrier. This helps you to avoid flushing data more often than necessary. If a BBU (backup battery unit) is not present, however, or you are unsure about it, leave barriers on, otherwise you may jeopardize data consistency. With this options on, an /etc/fstab file should look like the one below:

/dev/sda2              /datastore              xfs     defaults,nobarrier
/dev/sdb2              /binlog                 xfs     defaults,nobarrier


ext4 has been developed as the successor to ext3 with added performance improvements. It is a solid option that will fit most workloads. We should note here that it supports files up to 16TB in size, a smaller limit than xfs. This is something you should consider if extreme table space size/growth is a requirement. Regarding mount options, the same considerations apply. We recommend the defaults for a robust filesystem without risks to data consistency. However, if an enterprise storage controller with a BBU cache is present, the following mount options will provide the best performance:

/dev/sda2              /datastore              ext4     noatime,data=writeback,barrier=0,nobh,errors=remount-ro
/dev/sdb2              /binlog                 ext4     noatime,data=writeback,barrier=0,nobh,errors=remount-ro

Note: The data=writeback option results in only metadata being journaled, not actual file data. This has the risk of corrupting recently modified files in the event of a sudden power loss, a risk which is minimised with a presence of a BBU enabled controller. nobh only works with the data=writeback option enabled.


ZFS is a filesystem and LVM combined enterprise storage solution with extended protection vs data corruption. There are certainly cases where the rich feature set of ZFS makes it an essential option to consider, most notably when advance volume management is a requirement. ZFS tuning for MySQL can be a complex topic and falls outside the scope of this blog. For further reference, there is a dedicated blog post on the subject by Yves Trudeau:

Disk Subsystem – I/O scheduler 

Most modern Linux distributions come with noop or deadline I/O schedulers by default, both providing better performance than the cfq and anticipatory ones. However it is always a good practice to check the scheduler for each device and if the value shown is different than noop or deadline the policy can change without rebooting the server:

# View the I/O scheduler setting. The value in square brackets shows the running scheduler
cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# Change the setting
sudo echo noop > /sys/block/sdb/queue/scheduler

To make the change persistent, you must modify the GRUB configuration file:

# Change the line:
# to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop"

AWS Note: There are cases where the I/O scheduler has a value of none, most notably in AWS VM instance types where EBS volumes are exposed as NVMe block devices. This is because the setting has no use in modern PCIe/NVMe devices. The reason is that they have a very large internal queue and they bypass the IO scheduler altogether. The setting in this case is none and it is the optimal in such disks.

Disk Subsystem – Volume optimization

Ideally different disk volumes should be used for the OS installation, binlog, data and the redo log, if this is possible. The separation of OS and data partitions, not just logically but physically, will improve database performance. The RAID level can also have an impact: RAID-5 should be avoided as the checksum needed to ensure integrity is costly. The best performance without making compromises to redundancy is achieved by the use of an advanced controller with a battery-backed cache unit and preferably RAID-10 volumes spanned across multiple disks.

AWS Note: For further information about EBS volumes and AWS storage optimisation, Amazon has documentation at the following links:

Database settings

System Architecture – NUMA settings

Non-uniform memory access (NUMA) is a memory design where an SMP’s system processor can access its own local memory faster than non-local memory (the one assigned local to other CPUs). This may result in suboptimal database performance and potentially swapping. When the buffer pool memory allocation is larger than size of the RAM available local to the node, and the default memory allocation policy is selected, swapping occurs. A NUMA enabled server will report different node distances between CPU nodes. A uniformed one will report a single distance:

# NUMA system
numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65525 MB
node 0 free: 296 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 9538 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 65536 MB
node 2 free: 12701 MB
node 3 cpus: 24 25 26 27 28 29 30 31
node 3 size: 65535 MB
node 3 free: 7166 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10
# Uniformed system
numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 64509 MB
node 0 free: 4870 MB
node distances:
node   0
  0:  10

In the case of a NUMA system, where numactl shows different distances across nodes, the MySQL variable innodb_numa_interleave should be enabled to ensure memory interleaving. Percona Server provides improved NUMA support by introducing the flush_caches variable. When enabled, it will help with allocation fairness across nodes. To determine whether or not allocation is equal across nodes, you can examine numa_maps for the mysqld process with this script:

# The perl script will report memory allocation per CPU node:
# 3595 is the pid of the mysqld process
perl < /proc/3595/numa_maps
N0        :     16010293 ( 61.07 GB)
N1        :     10465257 ( 39.92 GB)
N2        :     13036896 ( 49.73 GB)
N3        :     14508505 ( 55.35 GB)
active    :          438 (  0.00 GB)
anon      :     54018275 (206.06 GB)
dirty     :     54018275 (206.06 GB)
kernelpagesize_kB:         4680 (  0.02 GB)
mapmax    :          787 (  0.00 GB)
mapped    :         2731 (  0.01 GB)


In this blog post we examined a few important OS related settings and explained how they can be tuned for better database performance.

While you are here …

You might also find value in this recorded webinar Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance


The post Linux OS Tuning for MySQL Database Performance appeared first on Percona Database Performance Blog.


What To Do When MySQL Runs Out of Memory: Troubleshooting Guide

MySQL memory troubleshooting

MySQL memory troubleshootingTroubleshooting crashes is never a fun task, especially if MySQL does not report the cause of the crash. For example, when MySQL runs out of memory. Peter Zaitsev wrote a blog post in 2012: Troubleshooting MySQL Memory Usage with a lots of useful tips. With the new versions of MySQL (5.7+) and performance_schema we have the ability to troubleshoot MySQL memory allocation much more easily.

In this blog post I will show you how to use it.

First of all, there are 3 major cases when MySQL will crash due to running out of memory:

  1. MySQL tries to allocate more memory than available because we specifically told it to do so. For example: you did not set innodb_buffer_pool_size correctly. This is very easy to fix
  2. There is some other process(es) on the server that allocates RAM. It can be the application (java, python, php), web server or even the backup (i.e. mysqldump). When the source of the problem is identified, it is straightforward to fix.
  3. Memory leaks in MySQL. This is a worst case scenario, and we need to troubleshoot.

Where to start troubleshooting MySQL memory leaks

Here is what we can start with (assuming it is a Linux server):

Part 1: Linux OS and config check
  1. Identify the crash by checking mysql error log and Linux log file (i.e. /var/log/messages or /var/log/syslog). You may see an entry saying that OOM Killer killed MySQL. Whenever MySQL has been killed by OOM “dmesg” also shows details about the circumstances surrounding it.
  2. Check the available RAM:
    • free -g
    • cat /proc/meminfo
  3. Check what applications are using RAM: “top” or “htop” (see the resident vs virtual memory)
  4. Check mysql configuration: check /etc/my.cnf or in general /etc/my* (including /etc/mysql/* and other files). MySQL may be running with the different my.cnf (run
    ps  ax| grep mysql


  5. Run
    vmstat 5 5

     to see if the system is reading/writing via virtual memory and if it is swapping

  6. For non-production environments we can use other tools (like Valgrind, gdb, etc) to examine MySQL usage
Part 2:  Checks inside MySQL

Now we can check things inside MySQL to look for potential MySQL memory leaks.

MySQL allocates memory in tons of places. Especially:

  • Table cache
  • Performance_schema (run:
    show engine performance_schema status

      and look at the last line). That may be the cause for the systems with small amount of RAM, i.e. 1G or less

  • InnoDB (run
    show engine innodb status

      and check the buffer pool section, memory allocated for buffer_pool and related caches)

  • Temporary tables in RAM (find all in-memory tables by running:
    select * from information_schema.tables where engine='MEMORY'


  • Prepared statements, when it is not deallocated (check the number of prepared commands via deallocate command by running show global status like ‘
    Com_prepare_sql';show global status like 'Com_dealloc_sql'


The good news is: starting with MySQL 5.7 we have memory allocation in performance_schema. Here is how we can use it

  1. First, we need to enable collecting memory metrics. Run:
    UPDATE setup_instruments SET ENABLED = 'YES'
    WHERE NAME LIKE 'memory/%';
  2. Run the report from sys schema:
    select event_name, current_alloc, high_alloc
    from sys.memory_global_by_current_bytes
    where current_count > 0;
  3. Usually this will give you the place in code when memory is allocated. It is usually self-explanatory. In some cases we can search for bugs or we might need to check the MySQL source code.

For example, for the bug where memory was over-allocated in triggers ( the select shows:

mysql> select event_name, current_alloc, high_alloc from memory_global_by_current_bytes where current_count > 0;
| event_name                                                                     | current_alloc | high_alloc  |
| memory/innodb/buf_buf_pool                                                     | 7.29 GiB      | 7.29 GiB    |
| memory/sql/sp_head::main_mem_root                                              | 3.21 GiB      | 3.62 GiB    |

The largest chunk of RAM is usually the buffer pool but ~3G in stored procedures seems to be too high.

According to the MySQL source code documentation, sp_head represents one instance of a stored program which might be of any type (stored procedure, function, trigger, event). In the above case we have a potential memory leak.

In addition we can get a total report for each higher level event if we want to see from the birds eye what is eating memory:

mysql> select  substring_index(
    ->     substring_index(event_name, '/', 2),
    ->     '/',
    ->     -1
    ->   )  as event_type,
    ->   round(sum(CURRENT_NUMBER_OF_BYTES_USED)/1024/1024, 2) as MB_CURRENTLY_USED
    -> from performance_schema.memory_summary_global_by_event_name
    -> group by event_type
    -> having MB_CURRENTLY_USED>0;
| event_type         | MB_CURRENTLY_USED |
| innodb             |              0.61 |
| memory             |              0.21 |
| performance_schema |            106.26 |
| sql                |              0.79 |
4 rows in set (0.00 sec)

I hope those simple steps can help troubleshoot MySQL crashes due to running out of memory.

Links to more resources that might be of interest

The post What To Do When MySQL Runs Out of Memory: Troubleshooting Guide appeared first on Percona Database Performance Blog.


Chunk Change: InnoDB Buffer Pool Resizing

innodb buffer pool chunk size

Since MySQL 5.7.5, we have been able to resize dynamically the InnoDB Buffer Pool. This new feature also introduced a new variable — innodb_buffer_pool_chunk_size — which defines the chunk size by which the buffer pool is enlarged or reduced. This variable is not dynamic and if it is incorrectly configured, could lead to undesired situations.

Let’s see first how innodb_buffer_pool_size , innodb_buffer_pool_instances  and innodb_buffer_pool_chunk_size interact:

The buffer pool can hold several instances and each instance is divided into chunks. There is some information that we need to take into account: the number of instances can go from 1 to 64 and the total amount of chunks should not exceed 1000.

So, for a server with 3GB RAM, a buffer pool of 2GB with 8 instances and chunks at default value (128MB) we are going to get 2 chunks per instance:

This means that there will be 16 chunks.

I’m not going to explain the benefits of having multiple instances, I will focus on resizing operations. Why would you want to resize the buffer pool? Well, there are several reasons, such as:

  • on a virtual server you can add more memory dynamically
  • for a physical server, you might want to reduce database memory usage to make way for other processes
  • on systems where the database size is smaller than available RAM
  • if you expect a huge growth and want to increase the buffer pool on demand

Reducing the buffer pool

Let’s start reducing the buffer pool:

| innodb_buffer_pool_size | 2147483648 |
| innodb_buffer_pool_instances | 8     |
| innodb_buffer_pool_chunk_size | 134217728 |
mysql> set global innodb_buffer_pool_size=1073741824;
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
| Variable_name           | Value      |
| innodb_buffer_pool_size | 1073741824 |
1 row in set (0.00 sec)

If we try to decrease it to 1.5GB, the buffer pool will not change and a warning will be showed:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
| Level   | Code | Message                                                                         |
| Warning | 1210 | InnoDB: Cannot resize buffer pool to lesser than chunk size of 134217728 bytes. |
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
| Variable_name           | Value      |
| innodb_buffer_pool_size | 2147483648 |
1 row in set (0.01 sec)

Increasing the buffer pool

When we try to increase the value from 1GB to 1.5GB, the buffer pool is resized but the requested innodb_buffer_pool_size is considered to be incorrect and is truncated:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
| Level   | Code | Message                                                         |
| Warning | 1292 | Truncated incorrect innodb_buffer_pool_size value: '1610612736' |
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
| Variable_name           | Value      |
| innodb_buffer_pool_size | 2147483648 |
1 row in set (0.01 sec)

And the final size is 2GB. Yes! you intended to set the value to 1.5GB and you succeeded in setting it to 2GB. Even if you set 1 byte higher, like setting: 1073741825, you will end up with a buffer pool of 2GB.

mysql> set global innodb_buffer_pool_size=1073741825;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_%size' ;
| Variable_name                 | Value      |
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 2147483648 |
2 rows in set (0.01 sec)

Interesting scenarios

Increasing size in the config file

Let’s suppose one day you get up willing to change or tune some variables in your server, and you decide that as you have free memory you will increase the buffer pool. In this example, we are going to use a server with 

innodb_buffer_pool_instances = 16

  and 2GB of buffer pool size which will be increased to 2.5GB

So, we set in the configuration file:

innodb_buffer_pool_size = 2684354560

But then after restart, we found:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
| Variable_name                 | Value      |
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 4294967296 |
2 rows in set (0.00 sec)

And the error log says:

2018-05-02T21:52:43.568054Z 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 16, chunk size = 128M

So, after we have set innodb_buffer_pool_size in the config file to 2.5GB, the database gives us a 4GB buffer pool, because of the number of instances and the chunk size. What the message doesn’t tell us is the number of chunks, and this would be useful to understand why such a huge difference.

Let’s take a look at how that’s calculated.

Increasing instances and chunk size

Changing the number of instances or the chunk size will require a restart and will take into consideration the buffer pool size as an upper limit to set the chunk size. For instance, with this configuration:

innodb_buffer_pool_size = 2147483648
innodb_buffer_pool_instances = 32
innodb_buffer_pool_chunk_size = 134217728

We get this chunk size:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
| Variable_name                 | Value      |
| innodb_buffer_pool_chunk_size | 67108864   |
| innodb_buffer_pool_size       | 2147483648 |
2 rows in set (0.00 sec)

However, we need to understand how this is really working. To get the innodb_buffer_pool_chunk_size it will make this calculation: innodb_buffer_pool_size / innodb_buffer_pool_instances with the result rounded to a multiple of 1MB.

In our example, the calculation will be 2147483648 / 32 = 67108864 which 67108864%1048576=0, no rounding needed. The number of chunks will be one chunk per instance.

When does it consider that it needs to use more chunks per instance? When the difference between the required size and the innodb_buffer_pool_size configured in the file is greater or equal to 1MB.

That is why, for instance, if you try to set the innodb_buffer_pool_size equal to 1GB + 1MB – 1B you will get 1GB of buffer pool:

innodb_buffer_pool_size = 1074790399
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:26:43.328313Z 0 [Note] InnoDB: Initializing buffer pool, total size = 1G, instances = 16, chunk size = 64M

But if you set the innodb_buffer_pool_size equals to 1GB + 1MB you will get 2GB of buffer pool:

innodb_buffer_pool_size = 1074790400
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:25:48.204032Z 0 [Note] InnoDB: Initializing buffer pool, total size = 2G, instances = 16, chunk size = 64M

This is because it considers that two chunks will fit. We can say that this is how the InnoDB Buffer pool size is calculated:

  if innodb_buffer_pool_size / innodb_buffer_pool_instances < innodb_buffer_pool_chunk_size
    innodb_buffer_pool_chunk_size = roundDownMB(innodb_buffer_pool_size / innodb_buffer_pool_instances)
  innodb_buffer_amount_chunks_per_instance = roundDown(innodb_buffer_pool_size / innodb_buffer_pool_instances / innodb_buffer_pool_chunk_size)
  if innodb_buffer_amount_chunks_per_instance * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size - innodb_buffer_pool_size > 1024*1024
innodb_buffer_pool_size = innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size * innodb_buffer_amount_chunks_per_instance

What is the best setting?

In order to analyze the best setting you will need to know that there is a upper limit of 1000 chunks. In our example with 16 instances, we can have no more than 62 chunks per instance.

Another thing to consider is what each chunk represents in percentage terms. Continuing with the example, each chunk per instance represent 1.61%, which means that we can increase or decrease the complete buffer pool size in multiples of this percentage.

From a management point of view, I think that you might want to consider at least a range of 2% to 5% to increase or decrease the buffer. I performed some tests to see the impact of having small chunks and I found no issues but this is something that needs to be thoroughly tested.

The post Chunk Change: InnoDB Buffer Pool Resizing appeared first on Percona Database Performance Blog.


AWS now offers a virtual machine with over 4TB of memory

 Earlier this year, Amazon’s AWS group said that it was working on bringing instance types with between 4 to 16TB of memory to its users. It’s now starting to fulfill this promise as the company today launched its largest EC2 machine (in terms of memory size) yet: the x1e.32xlarge instance with a whopping 4.19TB of RAM. Previously, EC2’s largest instance only featured just… Read More


Thread_Statistics and High Memory Usage


thread_statisticsIn this blog post, we’ll look at how using thread_statistics can cause high memory usage.

I was recently working on a high memory usage issue for one of our clients, and made some interesting discoveries: high memory usage with no bounds. It was really tricky to diagnose.

Below, I am going to show you how to identify that having thread_statistics enabled causes high memory usage on busy systems with many threads.

Part 1: Issue Background

I had a server with 55.0G of available memory. Percona Server for MySQL version:

Version | 5.6.35-80.0-log Percona Server (GPL), Release 80.0, Revision f113994f31
                 Built On | debian-linux-gnu x86_64

We have calculated approximately how much memory MySQL can use in a worst case scenario for max_connections=250:

mysql> select ((@@key_buffer_size+@@innodb_buffer_pool_size+@@innodb_log_buffer_size+@@innodb_additional_mem_pool_size+@@net_buffer_length+@@query_cache_size)/1024/1024/1024)+((@@sort_buffer_size+@@myisam_sort_buffer_size+@@read_buffer_size+@@join_buffer_size+@@read_rnd_buffer_size+@@thread_stack)/1024/1024/1024*250);
| ((@@key_buffer_size+@@innodb_buffer_pool_size+@@innodb_log_buffer_size+@@innodb_additional_mem_pool_size+@@net_buffer_length+@@query_cache_size)/1024/1024/1024)+((@@sort_buffer_size+@@myisam_sort_buffer_size+@@read_buffer_size+@@join_buffer_size+@@read_rnd |
| 12.445816040039 |
1 row in set (0.00 sec)

So in our case, this shouldn’t be more than ~12.5G.

After MySQL Server has been restarted, it allocated about 7G. After running for a week, it reached 44G:

# Memory #####################################################
       Total | 55.0G
        Free | 8.2G
        Used | physical = 46.8G, swap allocated = 0.0, swap used = 0.0, virtual = 46.8G
      Shared | 560.0k
     Buffers | 206.5M
      Caches | 1.0G
       Dirty | 7392 kB
     UsedRSS | 45.3G
  Swappiness | 60
 DirtyPolicy | 20, 10
 DirtyStatus | 0, 0

# Top Processes ##############################################
 2074 mysql     20   0 53.091g 0.044t   8952 S 126.0 81.7  34919:49 mysqld

I checked everything that could be related to the high memory usage (for example, operating system settings such as Transparent Huge Pages (THP), etc.). But I still didn’t find the cause (THP was disabled on the server). I asked my teammates if they had any ideas.

Part 2: Team Is on Rescue

After brainstorming and reviewing the status, metrics and profiles again and again, my colleague (Yves Trudeau) pointed out that User Statistics is enabled on the server.

User Statistics adds several INFORMATION_SCHEMA tables, several commands, and the


 variable. The tables and commands can be used to better understand different server activity, and to identify the different load sources. Check out the documentation for more information.

mysql> show global variables like 'user%';
| Variable_name | Value |
| userstat      | ON    |
mysql> show global variables like 'thread_statistics%';
| Variable_name                 | Value                     |
| thread_statistics             | ON                        |

Since we saw many threads running, it was a good option to verify this as the cause of the issue.

Part 3: Cause Verification – Did It Really Eat Our Memory?

I decided to apply some calculations, and the following test cases to verify the cause:

  1. Looking at the THREAD_STATISTICS table in the INFORMATION_SCHEMA, we can see that for each connection there is a row like the following:
    mysql> select * from THREAD_STATISTICS limit 1G
    *************************** 1. row ***************************
    THREAD_ID: 3566
    BUSY_TIME: 0
    CPU_TIME: 0
    1 row in set (0,00 sec)
  2. We have 22 columns, each of them BIGINT, which gives us ~ 176 bytes per row.
  3. Let’s calculate how many rows we have in this table at this time, and check once again in an hour:
    mysql> select count(*) from information_schema.thread_statistics;
    | count(*) |
    | 7864343  |
    1 row in set (15.35 sec)

    In an hour:

    mysql> select count(*) from information_schema.thread_statistics;
    | count(*) |
    | 12190801 |
    1 row in set (24.46 sec)
  4. Now let’s check on how much memory is currently in use:

    2096 mysql 20 0 12.164g 0.010t 16036 S 173.4 18.9 2274:51 mysqld
  5. We have 12190801 rows in the THREAD_STATISTICS table, which is ~2G in size.
  6. Issuing the following statement cleans up the statistics:

    mysql> flush thread_statistics;
    mysql> select count(*) from information_schema.thread_statistics;
    | count(*) |
    | 0        |
    1 row in set (00.00 sec)
  7. Now, let’s check again on how much memory is in use:

    ID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2096 mysql 20 0 12.164g 7.855g 16036 S 99.7 14.3 2286:24 mysqld

As we can see, memory usage drops to the approximate value of 2G that we had calculated earlier!

That was the root cause of the high memory usage in this case.


User Statistics (basically Thread_Statistics) is a great feature that allows us to identify load sources and better understand server activity. At the same time, though, it can be dangerous (from the memory usage point of view) to use as a permanent monitoring solution due to no limitations on memory usage.

As a reminder, thread_statistics is NOT enabled by default when you enable User_Statistics. If you have enabled Thread_Statistics for monitoring purposes, please don’t forget to pay attention to it.

As a next step, we are considering submitting a feature request to implement some default limits that can prevent Out of Memory issues on busy systems.


Troubleshooting hardware resource usage webinar: Q & A

InnoDB locks and transaction isolation

Troubleshooting hardware resourceIn this blog, I provide answers to the Q & A for the Troubleshooting hardware resource usage webinar.

First, I want to thank everybody who attended the May 26 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: How did you find the memory IO LEAK?

A: Do you mean the replication bug I was talking about in the webinar? I wrote about this bug here. See also comments in the bug report itself.

Q: Do you have common formulas you use to tune MySQL?

A: There are best practices: relate thread concurrency to number of CPU cores you have, set InnoDB buffer pool size large enough so it can contain all your working dataset (which is not always possible), and do not set the Query Cache size larger than 512MB (or even better, turn it off) to avoid issues with global lock set when it needs to be de-fragmented. I prefer not to call them “formulas,” because all options need to be adjusted to match the workload. If this weren’t the case, MySQL Server would have an automatic configuration. There is also a separate webinar on configuration (Troubleshooting configuration issues) where I discuss these practices.

Q: Slide 11: is this real time? Can we get this info for a job that has already finished?

A: Yes, this is real time. No, it is not possible to get this info for a thread that does not exist.

Q: Slide 11: what do negative numbers mean?

A: Numbers are taken from the 


 field for table


 in Performance Schema. These values, in turn, are calculated as (memory allocated by thread) – (memory freed by thread). Negative numbers here mean either a memory leak or incorrect calculation of memory used by the thread. I reported this behavior in the MySQL Bugs database. Please subscribe to the bug report and wait to see how InnoDB and Performance Schema developers answer.

Q: Are TokuDB memory usage stats recorded in the 


  table also?  Do we have to set something to enable this collection? I ran the query, but it shows 0 for everything.

A: TokuDB currently does not support Performance Schema, thus its memory statistics are not instrumented. See the user manual on how memory instrumentation works.

Q: With disk what we will check for disk I/O?

A: I quite don’t understand the question. Are you asking on which disk we should check IO statistics? For datadir and other disks, look at the locations where MySQL stores data and log files (if you set custom locations).

Q: How can we put CPU in parallel to process multiple client requests? Put multiple requests In memory by parallel way. By defining transaction. Or there any query hints?

A: We cannot directly put CPU in parallel, but we can affect it indirectly by tuning InnoDB threads-related options (

innodb_threads_concurrency, innodb_read_io_threads, innodb_write_io_threads

) and using the thread pool.

Q: Is there any information the Performance Schema that is not found in the SYS schema?

A: Yes. For example, sys schema does not have a view for statistics about prepared statements, while Performance Schema does, because sys schema takes its statement statistics from digest tables (which make no sense for prepared statements).

Q: What is your favorite tool to investigate a memory issue with a task/job that has already finished?

A: I don’t know that there is such a tool suitable for use in production. In a test environment, you can use valgrind or similar tools. You can also make core dumps of the mysqld process and investigate them after the issue is gone.


OOM relation to vm.swappiness=0 in new kernel

I have recently been involved in diagnosing the reasons behind OOM invocation that would kill the MySQL server process. Of course these servers were primarily running MySQL. As such the MySQL server process was the one with the largest amount of memory allocated.

But the strange thing was that in all the cases, there was no swapping activity seen and there were enough pages in the page cache. Ironically all of these servers were CentOS 6.4 running kernel version 2.6.32-358. Another commonality was the fact that vm.swappiness was set to 0. This is a pretty much standard practice and one that is applied on nearly every server that runs MySQL.

Looking into this further I realized that there was a change introduced in kernel 3.5-rc1 that altered the swapping behavior when “vm.swappiness=0″.

Below is the description of the commit that changed “vm.swappiness=0″ behavior, together with the diff:

$ git show fe35004fbf9eaf67482b074a2e032abb9c89b1dd
commit fe35004fbf9eaf67482b074a2e032abb9c89b1dd
Author: Satoru Moriya <>
Date: Tue May 29 15:06:47 2012 -0700
mm: avoid swapping out with swappiness==0
Sometimes we'd like to avoid swapping out anonymous memory. In
particular, avoid swapping out pages of important process or process
groups while there is a reasonable amount of pagecache on RAM so that we
can satisfy our customers' requirements.
OTOH, we can control how aggressive the kernel will swap memory pages with
/proc/sys/vm/swappiness for global and
/sys/fs/cgroup/memory/memory.swappiness for each memcg.
But with current reclaim implementation, the kernel may swap out even if
we set swappiness=0 and there is pagecache in RAM.
This patch changes the behavior with swappiness==0. If we set
swappiness==0, the kernel does not swap out completely (for global reclaim
until the amount of free pages and filebacked pages in a zone has been
reduced to something very very small (nr_free + nr_filebacked < high
Signed-off-by: Satoru Moriya <>
Acked-by: Minchan Kim <>
Reviewed-by: Rik van Riel <>
Acked-by: Jerome Marchand <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 67a4fd4..ee97530 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1761,10 +1761,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
* proportional to the fraction of recently scanned pages on
* each list that were recently referenced and in active use.
- ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
+ ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
ap /= reclaim_stat->recent_rotated[0] + 1;
- fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
+ fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
fp /= reclaim_stat->recent_rotated[1] + 1;
@@ -1777,7 +1777,7 @@ out:
unsigned long scan;
 scan = zone_nr_lru_pages(mz, lru);
- if (priority || noswap) {
+ if (priority || noswap || !vmscan_swappiness(mz, sc)) {
scan >>= priority;
if (!scan && force_scan)

This change was merged into the RHEL kernel 2.6.32-303:

* Mon Aug 27 2012 Jarod Wilson <> [2.6.32-303.el6]
- [mm] avoid swapping out with swappiness==0 (Satoru Moriya) [787885]

This obviously changed the way we think about “vm.swappiness=0″. Previously, setting this to 0 was thought to reduce the tendency to swap userland processes but not disable that completely. As such it was expected to see little swapping instead of OOM.

This applies to all RHEL/CentOS kernels > 2.6.32-303 and to other distributions that provide newer kernels such as Debian and Ubuntu. Or any other distribution where this change has been backported as in RHEL.

Let me share with you memory zones related statistics that were logged to the system log from one of the OOM event.

Mar 11 11:01:45 db01 kernel: Node 0 DMA: 4*4kB 2*8kB 2*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15680kB
Mar 11 11:01:45 db01 kernel: Node 0 DMA32: 6*4kB 22*8kB 444*16kB 374*32kB 129*64kB 26*128kB 15*256kB 17*512kB 2*1024kB 0*2048kB 0*4096kB = 45448kB
Mar 11 11:01:45 db01 kernel: Node 0 Normal: 825*4kB 1012*8kB 382*16kB 169*32kB 69*64kB 74*128kB 14*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 42436kB
Mar 11 11:01:45 db01 kernel: 452844 total pagecache pages
Mar 11 11:01:45 db01 kernel: 0 pages in swap cache
Mar 11 11:01:45 db01 kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 11 11:01:45 db01 kernel: Free swap  = 4128760kB
Mar 11 11:01:45 db01 kernel: Total swap = 4128760kB
Mar 11 11:01:45 db01 kernel: Node 0 DMA free:15680kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 11 11:01:45 db01 kernel: Node 0 DMA32 free:45448kB min:25140kB low:31424kB high:37708kB active_anon:1741812kB inactive_anon:520348kB active_file:4792kB inactive_file:462576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:386328kB writeback:76268kB mapped:936kB shmem:0kB slab_reclaimable:20420kB slab_unreclaimable:6964kB kernel_stack:0kB pagetables:572kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:142592 all_unreclaimable? no
Mar 11 11:01:45 db01 kernel: Node 0 Normal free:42436kB min:42316kB low:52892kB high:63472kB active_anon:3041852kB inactive_anon:643624kB active_file:340156kB inactive_file:1003512kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:0kB dirty:979444kB writeback:22040kB mapped:15616kB shmem:180kB slab_reclaimable:41052kB slab_unreclaimable:35996kB kernel_stack:2720kB pagetables:19912kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:31552 all_unreclaimable? no

As can be seen the amount of free memory and the amount of memory in the page cache was greater than the high watermark, which prevented any swapping activity. Yet unnecessary memory pressure caused OOM to be invoked which killed the MySQL server process.

MySQL getting OOM’ed is bad for many reasons and can have an undesirable impact such as causing loss of uncommitted transactions or transactions not yet flushed to the log because of innodb_flush_log_at_trx_commit=0, or a much more heavy impact because of cold caches upon restart.

I prefer the old behavior of vm.swappiness and as such I now set it to a value of “1″. Setting vm.swappiness=0 would mean that you will now have to be much more accurate in how you configure the size of various global and session buffers.

The post OOM relation to vm.swappiness=0 in new kernel appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by