May 24, 2021

MongoDB Tuning Anti-Patterns: How Tuning Memory Can Make Things Much Worse


It’s your busiest day of the year, and the website has crawled to a halt and finally crashed… all because you did not understand how MongoDB uses memory and left your system open to cluster instability, poor performance, and unpredictable behavior. Understanding how MongoDB uses memory and planning for it can save you a lot of headaches, tears, and grief. Over the last 5 years, I have too often been called in to fix problems that were easily avoidable. Let me share with you how MongoDB uses memory, and how to avoid potentially disastrous mistakes when using MongoDB.

In most databases, more data cached in RAM is better. The same is true in MongoDB. However, the cache competes for memory with other memory-intensive processes as well as with the kernel.

To improve performance, many people simply throw resources at the most visible issue. In the case of MongoDB, however, allocating more memory can sometimes actually hurt performance. How is this possible? The short answer is that MongoDB relies on both its internal memory cache and the operating system’s cache. Sysadmins, DBAs, and developers generally see the OS cache as “unallocated” memory, so they take memory away from the OS and allocate it to MongoDB instead. Why is this potentially a bad thing? Let me explain.

How MongoDB Uses the Memory for Caching Data

Whenever you run a query, some pages are copied from the data files into an internal memory cache of the mongod process for future reuse. This way, part of your data and indexes can be cached and retrieved very quickly when needed. This is what the WiredTiger Cache (WTC) does. The goal of the WTC is to store the most frequently and recently used pages to provide the fastest possible access to your data. That’s awesome for database performance.

By default, a mongod process uses up to 50% of the available RAM for that cache; more precisely, the default is the larger of 50% of (RAM - 1GB) or 256MB. If needed, you can change the size of the WTC using the storage.wiredTiger.engineConfig.cacheSizeGB configuration option.
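As a quick illustration, the function below computes the default WTC size that MongoDB documents for recent versions: the larger of 50% of (RAM - 1GB) or 256MB. The function name is hypothetical, and the sketch assumes mongod sees the whole machine’s RAM; check the documentation for your MongoDB version and environment.

```python
def default_wt_cache_gb(ram_gb: float) -> float:
    """Hypothetical helper: default WiredTiger cache size in GB.

    Rule: the larger of 50% of (RAM - 1 GB) or 256 MB.
    """
    return max(0.5 * (ram_gb - 1.0), 0.25)  # 256 MB = 0.25 GB


for ram in (1, 4, 16, 64):
    print(f"{ram:>3} GB RAM -> {default_wt_cache_gb(ram):.2f} GB WiredTiger cache")
```

On the 64GB machine used as an example later in this post, this gives 31.5GB, the “around 32GB” used there.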

Remember that data is compressed in the on-disk files, whereas the cache stores uncompressed pages.

When the WTC gets close to full, evictions become more frequent. An eviction happens when the requested pages are not in the cache and mongod has to drop existing pages to make room for the incoming pages read from the file system. Besides marking the least recently used pages as available for reuse, the eviction walk algorithm does a few other things (LRU page list sorting and WT page reconciliation), and altogether this can eventually cause slowness because of more intensive IO.
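To make the eviction idea concrete, here is a deliberately simplified LRU page cache. It assumes a fixed capacity measured in pages and strict least-recently-used ordering; the class name is hypothetical, and WiredTiger’s real eviction (eviction threads, page reconciliation, and more) is considerably more involved.

```python
from collections import OrderedDict


class LRUPageCache:
    """Hypothetical toy LRU page cache; not WiredTiger's eviction code."""

    def __init__(self, capacity_pages: int) -> None:
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page_id -> page contents, least recently used first
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, page_id: int) -> str:
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # now the most recently used page
            return self.pages[page_id]
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
            self.evictions += 1
        contents = f"page-{page_id}"  # stand-in for a read from the file system
        self.pages[page_id] = contents
        return contents


cache = LRUPageCache(capacity_pages=3)
for page in [1, 2, 3, 1, 4, 5, 1]:
    cache.get(page)
print(cache.hits, cache.misses, cache.evictions)  # 2 5 2
```

Every miss on a full cache costs an eviction plus a read. In mongod, that read goes to the file system, which is where the OS file cache comes into play.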

Given how the WTC works, one might think it is a good idea to assign as much as 80-90% of the memory to it (if you are familiar with MySQL, this is what you would do when configuring the InnoDB buffer pool). Most of the time this is a mistake, and to understand why, let’s look at another way mongod uses memory.

How MongoDB Uses the Memory for File Buffering

A brief detour: let’s talk about the OS for a moment. The OS also caches regular filesystem disk blocks in memory to speed up their retrieval when they are requested multiple times. The system provides this feature regardless of which application is using the disk, and it’s really beneficial when an application needs frequent disk access. When an IO operation is triggered, the data can be returned by reading the blocks from memory instead of actually accessing the disk, so the request is served faster. This kind of memory managed by the OS is reported as “Cached” in /proc/meminfo. We can also call it “File Buffering”.

# cat /proc/meminfo 
MemTotal:        1882064 kB
MemFree:         1376380 kB
MemAvailable:    1535676 kB
Buffers:            2088 kB
Cached:           292324 kB
SwapCached:            0 kB
Active:           152944 kB
Inactive:         252628 kB
Active(anon):     111328 kB
Inactive(anon):    16508 kB
Active(file):      41616 kB
Inactive(file):   236120 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Dirty:                40 kB
Writeback:             0 kB
AnonPages:        111180 kB
Mapped:            56396 kB
...
[truncated]
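To keep an eye on how much memory the OS is using for file buffering, a small parser over /proc/meminfo is enough. The sketch below uses a few fields copied from the sample output above; on a Linux host you would pass it the contents of /proc/meminfo instead. The function name is hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Hypothetical helper: parse /proc/meminfo-style text into {field: int}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank lines and elisions such as "..."
        fields[name.strip()] = int(parts[0])
    return fields


sample = """\
MemTotal:        1882064 kB
MemFree:         1376380 kB
MemAvailable:    1535676 kB
Buffers:            2088 kB
Cached:           292324 kB
"""

info = parse_meminfo(sample)
file_buffering_kb = info["Buffers"] + info["Cached"]
print(f"File buffering: {file_buffering_kb} kB")  # File buffering: 294412 kB
```

On a live system, something like `with open("/proc/meminfo") as f: info = parse_meminfo(f.read())` reads the current values.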

Keep in mind that MongoDB relies entirely on the operating system for file buffering.

On a dedicated server running a single mongod process, the more you use the database, the more disk blocks will be stored in memory. Eventually, almost all of the memory reported in the “Cached” and “Buffers” fields of the output shown above will be used exclusively for disk blocks requested by mongod.

Importantly, the cached memory holds disk blocks exactly as they are on disk. Since the blocks in the WT files are compressed, the blocks in memory are compressed too. Thanks to compression, you can keep a lot of your MongoDB data and indexes there.

Suppose you have a 4x compression ratio: in a 10GB file buffer (cached memory), you can store up to 40GB of uncompressed data. That’s a lot more, for free.

Putting Things Together

The following picture gives you a rough overview of memory usage.

[Figure: MongoDB Memory Usage]

Suppose we have a dedicated 64GB RAM machine and a 120GB dataset. Because of compression, the database uses around 30GB of storage, assuming a 4x compression ratio, which is quite common.

Without changing anything in the configuration, around 32GB will be used by the WTC to hold 32GB of uncompressed data. Part of the remaining memory will be used by the OS and other applications; let’s say 4GB. The remaining 28GB of RAM will be used mainly for file buffering, and in those 28GB we can store almost the entire compressed database. Only 2GB of the compressed file data will not fit in the file buffer, or, to look at it another way, 8GB of the uncompressed 120GB. So, when a page that is not among the 32GB currently in the WTC is accessed, the IO will most likely read the disk block from the file buffer instead of doing a real disk access, with at least 10x better latency, maybe 100x. The overall performance of MongoDB will be great because most of the time it won’t read from disk. That’s awesome.
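The arithmetic of this example can be captured in a small helper for trying other configurations. It is a rough model under the same assumptions as the example (a fixed amount of memory for the OS and other applications, a uniform compression ratio, and all remaining RAM available for file buffering); the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryPlan:
    wt_cache_gb: float                   # uncompressed pages held by the WiredTiger cache
    file_buffer_gb: float                # RAM left for the OS file cache
    compressed_data_gb: float            # size of the data files on disk
    compressed_not_buffered_gb: float    # compressed data that does not fit in the file buffer
    uncompressed_not_buffered_gb: float  # the same amount, expressed uncompressed


def plan_memory(ram_gb: float, dataset_gb: float, compression_ratio: float,
                wt_cache_gb: float, os_and_other_gb: float) -> MemoryPlan:
    """Hypothetical helper: split RAM between the WTC, the OS, and file buffering."""
    compressed = dataset_gb / compression_ratio
    file_buffer = max(ram_gb - wt_cache_gb - os_and_other_gb, 0.0)
    not_buffered = max(compressed - file_buffer, 0.0)
    return MemoryPlan(wt_cache_gb, file_buffer, compressed,
                      not_buffered, not_buffered * compression_ratio)


# The example from the text: 64GB RAM, 120GB dataset, 4x compression, default-sized WTC.
default = plan_memory(64, 120, 4, wt_cache_gb=32, os_and_other_gb=4)
print(default)

# Same machine with the WTC raised to 90% of RAM.
oversized = plan_memory(64, 120, 4, wt_cache_gb=0.9 * 64, os_and_other_gb=4)
print(f"{oversized.file_buffer_gb:.1f} GB file buffer, "
      f"{oversized.compressed_not_buffered_gb:.1f} GB compressed data not buffered")
```

With the WTC at 90%, only about 2.4GB remain for file buffering, so roughly 27.6GB of the 30GB of compressed data (about 110GB uncompressed) must come from real disk reads whenever a page is not already in the WTC. The model ignores that the larger WTC itself holds more data, but it shows how quickly the file buffer shrinks.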

Running Multiple mongods on the Same Machine Is Bad

As I mentioned, people hate to see (apparently) unallocated memory on their systems. Not everyone with that misconception increases the WTC; some see it as an opportunity to add more mongods to the same box to use that “unused” memory.

Each mongod process would like all of its data files to be cached in memory by the OS as well. You can limit the size of each WTC, but you cannot control the disk requests or how the file buffer is shared among the processes. As a result, each mongod gets less memory for file buffering, which triggers more real disk IO. In addition, the processes will compete for other resources, such as the CPU.

Another problem is that multiple mongod processes make troubleshooting more complicated. It becomes harder to identify the root cause of any issue. Which mongod is using more memory for file buffering? Is the other mongod’s slowness affecting the performance of my mongod?

Troubleshooting is much easier on a dedicated machine running a single mongod.

If one of the mongods goes haywire and consumes more CPU time and memory, all the mongods on the machine will slow down because fewer resources are available.

In short, never deploy more than one mongod on the same machine. If you really must, consider Docker containers: when running mongod in a container, you can limit the amount of memory it can use. In that case, calculate how much memory the server needs in total and how much to reserve for each container to get the best possible performance from mongod.
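A rough sketch of that calculation follows. It assumes N identical containers on one host, a fixed reservation for the host OS, and (as an assumption for illustration) applies the same default sizing rule, the larger of 50% of (limit - 1GB) or 256MB, to each container’s memory limit. MongoDB’s documentation advises setting cacheSizeGB explicitly below the RAM actually available to a container rather than relying on the host-wide default. All names are hypothetical, and real sizing depends on your workload.

```python
from dataclasses import dataclass


@dataclass
class ContainerPlan:
    memory_limit_gb: float  # memory limit to give each container
    cache_size_gb: float    # candidate value for storage.wiredTiger.engineConfig.cacheSizeGB


def plan_containers(host_ram_gb: float, host_reserved_gb: float,
                    containers: int) -> ContainerPlan:
    """Hypothetical helper: split host RAM evenly across mongod containers."""
    if containers < 1:
        raise ValueError("need at least one container")
    limit = (host_ram_gb - host_reserved_gb) / containers
    # Assumption: reuse the default 50%-of-(RAM - 1 GB) rule, floored at 256 MB.
    cache = max(0.5 * (limit - 1.0), 0.25)
    return ContainerPlan(limit, cache)


print(plan_containers(host_ram_gb=64, host_reserved_gb=4, containers=2))
# ContainerPlan(memory_limit_gb=30.0, cache_size_gb=14.5)
```

Under these assumptions, each mongod keeps roughly 15.5GB of its container’s limit for file buffering and the rest of the process.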

It Is Not Recommended to Have a Very Large WTC

Increasing the WTC significantly beyond the 50% default is also a bad habit.

With a larger cache, you can store more uncompressed data, but at the same time you leave little memory for file buffering. More queries can benefit from the larger WTC, but when evictions happen, mongod could trigger a lot of real disk accesses, slowing down the database.

For this reason, in most cases it is not recommended to increase the WTC beyond the default 50%. The goal is to leave enough memory for buffering disk blocks, which helps you get very good and more stable performance.

Conclusion

When you think about mongod, keep in mind that it behaves as if it were the only process running in the universe: it tries to use as much memory as it can. But there are two caches, the WT cache (uncompressed documents) and the file buffer (WiredTiger’s compressed files), and performance will suffer if you starve one to feed the other.

Never deploy multiple mongods on the same box, or at least consider containers. As for the WTC, remember that most of the time the default size (up to 50% of the RAM) works well.
