May
02
2019
--

MySQL Memory Management, Memory Allocators and Operating System

memory management mysql bug

memory management mysql bugWhen users experience memory usage issues with any software, including MySQL®, their first response is to think that it’s a symptom of a memory leak. As this story will show, this is not always the case.

This story is about a bug.

All Percona Support customers are eligible for bug fixes, but their options vary. For example, Advanced+ customers are offered a HotFix build prior to the public release of software with the patch. Premium customers do not even have to use Percona software: we may port our patches to upstream for them. But for Percona products all Support levels have the right to have a fix.

Even so, this does not mean we will fix every unexpected behavior, even if we accept that behavior to be a valid bug. One of the reasons for such a decision might be that while the behavior is clearly wrong for Percona products, this is still a feature request.

A bug as a case study

A good recent example of such a case is PS-5312 – the bug is repeatable with upstream and reported at bugs.mysql.com/95065

This reports a situation whereby access to InnoDB fulltext indexes leads to growth in memory usage. It starts when someone queries a fulltext index, grows until a maximum, and is not freed for quite a long time.

Yura Sorokin from the Percona Engineering Team investigated if this is a memory leak and found that it is not.

When InnoDB resolves a fulltext query, it creates a memory heap in the function

fts_query_phrase_search

This heap may grow up to 80MB. Additionally, it has a big number of blocks (

mem_block_t

) which are not always used continuously and this, in turn, leads to memory fragmentation.

In the function

exit

, the memory heap is freed. InnoDB does this for each of the allocated blocks. At the end of the function, it calls

free()

which belongs to one of the memory allocator libraries, such as

malloc

or

jemalloc

. From the

mysqld

point of view, everything is done correctly: there is no memory leak.

However while

free()

should release memory when called, it is not required to return it back to the operating system. If the memory allocator decides that the same memory blocks will be required soon, it may still keep them for the

mysqld

process. This explains why you might see that

mysqld

  still uses a lot of memory after the job is finished and all de-allocations are done.

This in practice is not a big issue and should not cause any harm. But if you need the memory to be returned to the operating system quicker, you could try alternative memory allocators, such as jemalloc. The latter was proven to solve the issue with PS-5312.

Another factor which improves memory management is the number of CPU cores: the more we used for the test, the faster the memory was returned to the operating system. This, probably, can be explained by the fact that if you have multiple CPUs, then the memory allocator can dedicate one of them just for releasing memory to the operating system.

The very first implementation of InnoDB full text indexes introduced this flaw. As our engineer Yura Sorokin found:

Options to fix

We have a few options to fix this:

  1. Change implementation of InnoDB fulltext index
  2. Use custom memory library like jemalloc

Both have their advantages and disadvantages.

Option 1 means we are introducing an incompatibility with upstream, which may lead to strange bugs in future versions. This also means a full rewrite of the InnoDB fulltext code which is always risky in GA versions, used by our customers.

Option 2 means we may hit flaws in the jemalloc library which is designed for performance and not for the safest memory allocation.

So we have to choose between these two not ideal solutions.

Since option 1 may lead to a situation when Percona Server will be incompatible with upstream, we prefer option 2 and look forward for the upstream fix of this bug.

Conclusion

If you are seeing a high memory usage by the

mysqld

process, it is not always a symptom of a memory leak. You can use memory instrumentation in Performance Schema to find out how allocated memory is used. Try alternative memory libraries for better processing of allocations and freeing of memory. Search the user manual for

LD_PRELOAD

to find out how to set it up at these pages here and here.

Jul
23
2014
--

Why TokuDB hates Transparent HugePages

If you try to install the TokuDB storage engine on a modern Linux distribution it might fail with following error message:

2014-07-17 19:02:55 13865 [ERROR] TokuDB will not run with transparent huge pages enabled.
2014-07-17 19:02:55 13865 [ERROR] Please disable them to continue.
2014-07-17 19:02:55 13865 [ERROR] (echo never > /sys/kernel/mm/transparent_hugepage/enabled)

You might be curious why TokuDB refuses to start with Transparent HugePages. Are they not a good thing… allowing smaller kernel page tables and less TLB misses when accessing data in the buffer pool? I was curious, so I asked Tim Callaghan this very question.

This problem originates with TokuDB using jemalloc memory allocator, which uses a particular trick to deal with memory fragmentation. The classical problem with memory allocators is fragmentation – if you allocated a say 2MB chunk from the operating system (typically using mmap),  as the process runs it is likely some of that 2MB memory block will become free but not all of it, hence it can’t be given back to operating system completely. jemalloc uses a clever trick being able to give back portions of memory allocated in such a way through madvise(…, MADV_DONTNEED) call.

Now what happens when you use transparent huge pages? In this case the operating system (and CPU, really) works with pages of a much larger size which only can be unmapped from the address space in its entirety – which does not work when smaller objects are freed which produce smaller free “holes.”

As a result, without being able to free memory efficiently the amount of allocated memory may grow unbound until the process starts to swap out – and in the end being killed by “out of memory” killer at least under some workloads. This is not a behavior you want to see from the database server. As such requiring to disable huge pages is a better choice.

Having said that this is pretty crude requirement/solution – disabling huge pages on complete operating system image to make one application work while others might be negatively impacted. I hope with a future jemalloc version/kernel releases there will be solution where jemalloc simply prevents huge pages usage for its allocations.

Using jemalloc and its approach to remove pages from resident space also makes TokuDB a lot different than typical MySQL instances running Innodb from the process space. With Innodb VSZ and RSS are often close. In fact we often monitor VSZ to ensure it is not excessively large to avoid danger of process starting to swap actively or be killed with OOM killer. TokuDB however often can look like this

[root@smt1 mysql]# ps aux | grep mysqld
mysql 14604 21.8 50.6 12922416 4083016 pts/0 Sl Jul17 1453:27 /usr/sbin/mysqld –basedir=/usr –datadir=/var/lib/mysql –plugin-dir=/usr/lib64/mysql/plugin –user=mysql –log-error=/var/lib/mysql/smt1.pz.percona.com.err –pid-file=/var/lib/mysql/smt1.pz.percona.com.pid
root 28937 0.0 0.0 103244 852 pts/2 S+ 10:38 0:00 grep mysqld

In this case TokuDB is run with defaults on 8GB system – it takes approximately 50% of memory in terms of RSS size, however the VSZ of the process is over 12GB – this is a lot more than memory available.

This is completely fine for TokuDB. If I would not have Transparent HugePages disabled, though, my RSS would be a lot closer to VSZ causing intense swapping or even process killed by OOM killer.

In addition to explaining this to me, Tim Callaghan was also kind enough to share some links on this issue from other companies such as Oracle, NuoDB , Splunk, SAP, SAP(2), which provide more background information on this topic.

The post Why TokuDB hates Transparent HugePages appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com