March 29, 2023

Compression Methods in MongoDB: Snappy vs. Zstd


Compression is valuable in any database, offering advantages such as a reduced storage footprint and shorter data transmission times.

Storage reduction alone results in significant cost savings, and we can save more data in the same space. As the amount of data grows, the need for efficient data compression becomes increasingly important to save storage space, reduce I/O overhead, and improve query performance.

In this blog, we will discuss both the data and network-level compression offered in MongoDB: snappy and zstd for data block compression, and zstd for network compression.

Percona Server for MongoDB (PSMDB) supports all types of compression and enterprise-grade features for free. I am using PSMDB 6.0.4 here.

Data compression

MongoDB offers various block compression methods used by the WiredTiger storage engine, like snappy, zlib, and zstd.

When data is written to disk, MongoDB compresses it with the specified block compression method before writing. When a data block is read back, MongoDB decompresses it in memory and serves it to the incoming request.

Block compression is a type of compression that compresses data in blocks rather than compressing the entire data set at once. Block compression can improve performance by allowing data to be read and written in smaller chunks.

By default, MongoDB provides a snappy block compression method for storage and network communication.

Snappy is a compression library developed by Google. It is designed to be fast and memory-efficient, making it a good fit for MongoDB workloads.

Benefits of snappy compression in MongoDB:

  1. Fast compression and decompression speeds
  2. Low CPU usage
  3. A streamable format that allows for quick processing
  4. Minimal impact on query performance

Zstandard compression, or zstd, is a newer block compression method provided by MongoDB starting with v4.2, and it provides higher compression rates. Zstd is a compression library developed by Facebook.

Zstd typically offers a higher compression ratio than snappy, meaning that it can compress data more effectively and achieve a smaller compressed size for the same input data.
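As a quick standalone illustration of that trade-off (outside MongoDB itself), here is a minimal sketch using the python-snappy and zstandard packages; the repetitive JSON-like payload is made up for the example:

# pip install python-snappy zstandard
import snappy
import zstandard

# A repetitive JSON-like payload, standing in for typical document data
data = b'{"name": "Verna Grant", "age": 44, "email": "guzwev@gizusuzu.mv"}' * 10000

snappy_size = len(snappy.compress(data))
zstd_size = len(zstandard.ZstdCompressor(level=6).compress(data))

print(f"original: {len(data)} bytes")
print(f"snappy:   {snappy_size} bytes")
print(f"zstd(6):  {zstd_size} bytes")  # typically noticeably smaller than snappy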

Benefits of zstd compression in MongoDB:

  1. Higher compression ratios than Snappy
  2. Highly configurable compression levels
  3. Fast compression and decompression speeds
  4. Minimal impact on query performance

To enable zstd block compression, you need to specify the block compressor as “zstd” in the configuration file:

storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: zstd
    engineConfig:
      cacheSizeGB: 4
      zstdCompressionLevel: 6 #(available since v5.0)

 

In the above example, zstdCompressionLevel is set to 6, which is the default.

zstdCompressionLevel specifies the level of compression applied when using the zstd compressor. Values can range from 1 to 22.

The higher the specified value for zstdCompressionLevel, the higher the compression applied. It therefore becomes very important to test for the optimal value for your use case before implementing it in production.
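After restarting mongod, one way to sanity-check what the server actually picked up is to read back the parsed configuration with getCmdLineOpts; a minimal sketch with the Python driver (the connection string is just an example):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
# getCmdLineOpts returns the options mongod was started with, as parsed
opts = client.admin.command("getCmdLineOpts")
print(opts["parsed"]["storage"]["wiredTiger"])
# expect blockCompressor: zstd and zstdCompressionLevel: 6 if the config above was applied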

Here, we are going to test snappy and zstd compression with the following configurations.

Host config: 4vCPU, 14 GB RAM

DB version: PSMDB 6.0.4

OS: CentOS Linux 7

I’ve used the mgeneratejs command to insert sample documents.

mgeneratejs '{"name": "$name", "age": "$age", "emails": {"$array": {"of": "$email", "number": 3}}}' -n 120000000 | mongoimport --uri mongodb://localhost:27017/<db> --collection <coll_name> --mode insert

Sample record:

{
  _id: ObjectId("64195975e40cea62af1be510"),
  name: 'Verna Grant',
  age: 44,
  emails: [ 'guzwev@gizusuzu.mv', 'ba@ewobisrut.tl', 'doz@bi.ag' ]
}

I’ve created a collection with a specific block compression method using the below command. This does not affect any existing collections or any new collections created afterward.

db.createCollection("user", {storageEngine: {wiredTiger: {configString: "block_compressor=zstd"}}})

Any new collection created in the default manner will use the default snappy compression, or whichever block compression method is specified in the mongod config file.
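To verify which block compressor a given collection actually uses, we can inspect its WiredTiger creation string via collStats; a small sketch with the Python driver (the database name "test" is an assumption, the collection matches the example above):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
stats = client["test"].command("collStats", "user")
print(stats["wiredTiger"]["creationString"])
# look for block_compressor=zstd in the output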

At the time of insert ops, no other queries or DML ops were running in the database.

Snappy

Data size: 14.95GB

Data size after compression: 10.75GB

Avg latency: 12.22ms

Avg CPU usage: 34%

Avg insert ops rate: 16K/s

Time taken to import 120000000 documents: 7292 seconds

[Chart: metrics during the snappy compression test]

Zstd (with default compression level 6)

Data size: 14.95GB

Data size after compression: 7.69GB

Avg latency: 12.52ms

Avg CPU usage: 31.72%

Avg insert ops rate: 14.8K/s

Time taken to import 120000000 documents: 7412 seconds

[Chart: metrics during the zstd compression test]

We can see from the above comparison that zstd saves almost 3GB of additional disk space compared to snappy, without noticeably impacting CPU or memory.
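The same collStats output can be used to compute the effective compression ratio (logical data size vs. size on disk); a rough sketch, reusing the client from the previous snippet. Note that storageSize also includes allocated-but-unused space, so treat the ratio as approximate:

stats = client["test"].command("collStats", "user")
logical = stats["size"]         # uncompressed (logical) data size in bytes
on_disk = stats["storageSize"]  # compressed size on disk in bytes
print(f"compression ratio: {logical / on_disk:.2f}x")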

Network compression

MongoDB also offers network compression.

This can further reduce the amount of data that needs to be transmitted between server and client over the network. This, in turn, requires less bandwidth and network resources, which can improve performance and reduce costs.

MongoDB supports the same algorithms for network compression: snappy, zstd, and zlib. Each offers a different trade-off between compression ratio and CPU cost.

To enable network compression in mongod and mongos, you can specify the compression algorithm by adding the following line to the configuration file.

net:
  compression:
    compressors: snappy

We can also specify multiple compression algorithms, listed in order of preference:

net:
  compression:
    compressors: snappy,zstd,zlib

 

The client must also enable at least one of the compression methods specified in the server config for data over the network to be compressed; otherwise, the data between the client and server remains uncompressed.

In the below example, I am using the Python driver to connect to my server, first with no compression and then with the zstd compression algorithm.

I am doing simple find ops on the sample record shown above.
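A sketch of that load with pymongo (the query filter and iteration count are illustrative):

import pymongo

client = pymongo.MongoClient("mongodb://user:pwd@xx.xx.xx.xx:27017/?replicaSet=rs1&authSource=admin")
coll = client["test"]["user"]  # assumes db "test", collection "user"

# Repeatedly read documents to generate steady outbound traffic on the server
for _ in range(100000):
    doc = coll.find_one({"name": "Verna Grant"})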

This is the outbound data traffic without any compression method. Here, we can see the data transmitted is around 2.33MB/s.

Now, I’ve enabled the zstd compression algorithm in both the server and the client:

client = pymongo.MongoClient("mongodb://user:pwd@xx.xx.xx.xx:27017/?replicaSet=rs1&authSource=admin&compressors=zstd")

Here we can see the average outbound data transmission is around 1MB/s, which is almost a 50% reduction.
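The effect is also visible server-side: serverStatus reports both logical and post-compression ("physical") network byte counters, so we can compare them. A minimal sketch reusing the client connected with compressors=zstd above (the physicalBytesOut counter is only present in recent MongoDB versions, hence the guard):

net = client.admin.command("serverStatus")["network"]
logical_out = net["bytesOut"]               # bytes before wire compression
physical_out = net.get("physicalBytesOut")  # bytes actually sent on the wire
if physical_out:
    print(f"wire compression saved {100 * (1 - physical_out / logical_out):.1f}% of outbound bytes")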

Note that network compression can have a significant impact on network performance and CPU usage. In my case, there was hardly anything else running, so I did not see any significant CPU usage.

Conclusion

Choosing between snappy and zstd compression depends on the specific use cases. By understanding the benefits of each algorithm and how they are implemented in MongoDB, you can choose the right compression setting for your specific use case and save some disk space.

It is important to choose the appropriate compression algorithm based on your specific requirements and resources, and to test your applications with and without network compression to determine the optimal configuration.

I also recommend using Percona Server for MongoDB, which provides MongoDB enterprise-grade features for free, without requiring any license. You can learn more about it in the blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

Percona also offers some more great products for MongoDB, like Percona Backup for MongoDB, Percona Kubernetes Operator for MongoDB, and Percona Monitoring and Management.

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

 

Download Percona Distribution for MongoDB Today!

February 3, 2022

Optimize SST in Percona XtraDB Cluster with ZSTD Compression


Percona XtraDB Cluster (PXC) offers a great deal of flexibility when it comes to state transfer (SST) options (used when a new node is automatically provisioned with data). For many environments, on-the-fly compression saves significant network bandwidth when sending what can sometimes be terabytes of data. The usual choice for compression here is the built-in Percona XtraBackup compress option (which uses qpress internally), or the compressor/decompressor options with a compression tool of your choice. In the second case, the popular option is gzip or its multi-threaded version pigz, which offers a better compression rate than qpress.

In this writeup, I would like to mention another important compression alternative that has been gaining popularity recently: zstd.

I decided to do a simple test of various SST settings in terms of compression method and number of parallel threads. Note that my test is limited to one hardware scenario and a generic mix of TPCC and sysbench data.

The specs of my test box: PXC 8.0.25 on 2x Qemu-KVM VMs, each with 6GB RAM, 8 vCPUs (i7 11th gen), disk storage on a fast NVMe drive, and a 1Gbps virtual network link. My goal is only to give some hints and to encourage testing various options, as the potential benefit may be quite significant in some environments.

In order to set a particular compression method, I used the following configuration options, where x stands for the number of parallel threads.

  • No compression
[sst]
backup_threads=x

  • qpress used internally by XtraBackup
[sst]
backup_threads=x
[xtrabackup]
compress
parallel=x
compress-threads=x

  • qpress
[sst]
compressor='qpress -io -Tx 1'
decompressor='qpress -dio'
backup_threads=x
[xtrabackup]
parallel=x

  • pigz
[sst]
compressor='pigz -px'
decompressor='pigz -px -d'
backup_threads=x
[xtrabackup]
parallel=x

  • zstd
[sst]
compressor='zstd -1 -Tx'
decompressor='zstd -d -Tx'
backup_threads=x
[xtrabackup]
parallel=x

On each SST test, I measured the complete time of starting the new node, the network data received by the joiner during the SST process, and the data written to the joiner’s disk.
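For the network measurement, the per-interface counters under /sys/class/net are enough; a small sketch of the kind of helper I used on the joiner (the interface name eth0 is an assumption, adjust to your VM):

from pathlib import Path

IFACE = "eth0"  # assumption: adjust to the VM's actual network interface

def rx_mb():
    # Cumulative bytes received on the interface since boot, converted to MB
    raw = Path(f"/sys/class/net/{IFACE}/statistics/rx_bytes").read_text()
    return int(raw) / 1024 ** 2

before = rx_mb()
input("Start the joiner now and press Enter once the SST has finished... ")
print(f"received during SST: {rx_mb() - before:.0f} MB")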

Here are the results:

SST time in seconds:

Threads  No compression  qpress built-in  qpress  gzip (pigz)  zstd
1        102             156              130     976          118
2        92              123              112     474          92
4        85              106              109     258          95
8        86              99               109     182          97

Data received by the joiner during SST [MB]:

No compression  qpress built-in  qpress  gzip (pigz)  zstd
20762           6122             6138    4041         4148

Data written by the joiner to disk during SST [MB]:

No compression  qpress built-in  qpress  gzip (pigz)  zstd
20683           26515            20683   20684        20683

And some graphical views for convenience:

[Chart: SST time comparison across compression methods and thread counts]

In this test case, the small gain of using multiple threads with no compression or with lightweight compression is due to the fact that the network link and disk IO became the bottleneck faster than the CPU.

The test shows how expensive gzip is in terms of CPU utilization compared to the other compression methods; CPU remained the main bottleneck even with 8 threads.

Excellent results came with zstd, which, while offering roughly the same good compression rate as gzip, completely outperforms it in terms of CPU utilization, and all of that with the lowest compression level of “1”!

One thing that needs clarification is the difference between the two methods using qpress (quicklz) compression. When using the compress option of Percona XtraBackup, the tool first compresses each file and sends it with a .qp suffix to the joiner. The joiner then has to decompress those files before it can prepare the backup. This makes it the more expensive variant, as it requires more disk space during the process, which is also why the joiner writes noticeably more data to disk in that case (see the table above).

Any real-life examples of introducing better compression methods are very welcome in the comments! I wonder if zstd turns out to be as effective in your real use cases.
