Jan 14, 2021

How to Store MySQL Audit Logs in MongoDB in a Maintenance-Free Setup

I was once helping one of our customers load MySQL audit logs into a MySQL database and analyze them. But I immediately thought: “Hey, this is not the most efficient solution! MySQL, or a typical RDBMS in general, was not really meant to store logs after all.”

So, I decided to explore an alternative – which seemed more sensible to me – and use MongoDB as the storage for logs, for three main reasons:

  • its schema-less nature fits the audit log well, where different types of events may use different fields
  • it speaks JSON natively, and the audit log plugin can write in JSON format
  • its capped collections feature avoids additional maintenance overhead

As a side note, audit logging is available in MySQL Enterprise Edition, but a similar, yet free, solution is available in Percona Server for MySQL. In both cases, it works by installing the audit log plugin.
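For reference, in Percona Server for MySQL the plugin is loaded with a single statement, as documented by Percona:

```sql
INSTALL PLUGIN audit_log SONAME 'audit_log.so';
```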

Ad Hoc Import

The simplest scenario is to just set the audit log format to JSON:

audit_log_format = JSON

And as soon as it collects some data, import the log file into a MongoDB collection via the mongoimport command, like this:

# mongoimport --username percona --password P3rc0n4 --host 10.95.83.225 --port 27017 --db auditlogs --collection audit1 --file /var/lib/mysql/audit.log
2020-12-31T16:24:43.782+0000 connected to: 10.95.83.225:27017
2020-12-31T16:24:44.316+0000 imported 25462 documents

mongo > db.audit1.countDocuments({})
25462

Of course, this works, but I prefer an automated solution, so I looked at available options for live-streaming the logs.

Syslog

The first thing that looked useful was the ability to send the audit log directly to syslog instead of a file. Knowing that both rsyslog and syslog-ng have MongoDB output modules, it felt like a very easy approach. So I installed the rsyslog-mongodb module package on my test Ubuntu VM running Percona Server for MySQL, and configured the audit log with:

[mysqld]
audit_log_handler = syslog
audit_log_format = JSON

Here is an example rsyslog (version 8.2) configuration:

# cat /etc/rsyslog.d/49-ship-syslog.conf
action(type="ommongodb"
uristr="mongodb://percona:P3rc0n4@10.95.83.225:27017/?authSource=auditlogs"
db="auditlogs" collection="mysql_node1_log")

This worked; however, the inserted documents looked like this:

mongo > db.mysql_node1_log.findOne().pretty()
{
"_id" : ObjectId("5fece941f17f487c7d1d158b"),
"msg" : " {\"audit_record\":{\"name\":\"Connect\",\"record\":\"7_1970-01-01T00:00:00\",\"timestamp\":\"2020-12-30T20:55:29Z\",\"connection_id\":\"9\",\"status\":0,\"user\":\"root\",\"priv_user\":\"root\",\"os_login\":\"root\",\"proxy_user\":\"\",\"host\":\"localhost\",\"ip\":\"\",\"db\":\"\"}}"
}

Because syslog escapes the double-quote characters, the whole audit record appears as a single string inside the MongoDB collection, instead of a JSON object. No matter what I tried in rsyslog, like custom templates and property values, I could not disable the escaping. Therefore, although feeding MongoDB with audit logs works this way, it becomes pretty useless when it comes to analyzing the logs later. The same issue applies to syslog-ng and the syslog-ng-mod-mongodb module. And since MongoDB does not offer before-insert triggers, I could not easily “fix” the inserted data on the fly.
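Since there is no way to fix the data at insert time, the only workaround would be repairing the records client-side afterwards. A minimal sketch (with field values shortened from the example document above) showing that the escaped string is still valid JSON once read back:

```python
import json

# A document as inserted by rsyslog's ommongodb module: the whole audit
# record ends up as one string in the "msg" field (shortened here).
doc = {
    "msg": ' {"audit_record":{"name":"Connect","connection_id":"9",'
           '"status":0,"user":"root","host":"localhost"}}'
}

# The string is still valid JSON, so it can be parsed back into a real
# object, e.g. before re-inserting it into another collection.
record = json.loads(doc["msg"])["audit_record"]
print(record["name"], record["status"])  # Connect 0
```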

Fluentd For The Rescue!

This forced me to look for alternative solutions. One of them would be to use a FIFO file: tail the audit log continuously to feed it, and read from it to insert the logs into MongoDB. I wanted a more robust way, though, and decided to try Fluentd instead. It was created as a versatile log collector, is highly flexible, works with many different applications out of the box, and, most importantly, is an open source project that speaks JSON natively. Making it do the job I wanted turned out to be easier than I expected.

Here is what I did:

  • Installed the Fluentd package (I chose td-agent variant here for an even easier user experience)
  • Installed MongoDB plugin for Fluentd with (don’t use the usual ‘gem install’ here):
td-agent-gem install fluent-plugin-mongo

  • Configured audit log as a source and output directive for MongoDB:
# cat /etc/td-agent/td-agent.conf
####
...
<source>
 @type tail
 path /var/lib/mysql/audit.log
 pos_file /var/log/td-agent/audit.access_log.pos
 <parse>
  @type json
 </parse>
 tag mongo.audit.log
</source>
<match mongo.audit.log>
 @type mongo
 database auditlogs #(required)
 collection audit_log #(optional; default="untagged")
 capped
 capped_size 100m
 host 10.95.83.225 #(optional; default="localhost")
 port 27017 #(optional; default=27017)
 user percona
 password P3rc0n4
 <buffer>
  flush_interval 1s
 </buffer>
</match>

  • Added the user Fluentd runs as to the mysql group, to allow it to read the audit log:
# id td-agent
uid=114(td-agent) gid=121(td-agent) groups=121(td-agent)
# usermod -a -G mysql td-agent
# id td-agent
uid=114(td-agent) gid=121(td-agent) groups=121(td-agent),120(mysql)

  • Switched the audit log back to a file handler, with rotation enabled:
[mysqld]
audit_log_handler = file
audit_log_format = JSON
audit_log_file = audit.log
audit_log_rotate_on_size = 10M
audit_log_rotations = 3

  • Restarted both services to apply changes:
# systemctl restart mysql
# systemctl restart td-agent

  • Checked the Fluentd log to see if it reads the audit log as expected, also for when Percona Server for MySQL rotates it:
# tail -f /var/log/td-agent/td-agent.log
2020-12-31 02:41:39 +0000 [info]: adding match pattern="mongo.audit.log" type="mongo"
...
2020-12-31 02:41:40 +0000 [info]: #0 following tail of /var/lib/mysql/audit.log
...
2020-12-31 02:52:14 +0000 [info]: #0 detected rotation of /var/lib/mysql/audit.log; waiting 5 seconds
2020-12-31 02:52:14 +0000 [info]: #0 following tail of /var/lib/mysql/audit.log

  • Ran sysbench against the MySQL instance and verified that the new collection in MongoDB gets updated:
mongo > db.audit_log.countDocuments({})
281245

mongo > db.audit_log.stats()
{
 "ns" : "auditlogs.audit_log",
 "size" : 104857293,
 "count" : 281245,
 "avgObjSize" : 372,
 "storageSize" : 26357760,
 "capped" : true,
 "max" : -1,
 "maxSize" : 104857600,
(...)

Yay, it works like a charm! Not only are the audit logs rotated automatically on the Percona Server for MySQL side, but the destination collection’s size cap also works on the MongoDB side, so I am safe when it comes to disk space on both hosts!

Here, there is a little caveat: if for some reason you drop the destination collection manually on MongoDB, incoming inserts will re-create it without the capped setting! Therefore, either let the collection be created by Fluentd on its service startup, or create it manually with the capped setting, and don’t drop it later.
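To be safe, the capped collection can be created up front from the mongo shell; the size below is just an example matching the 100m cap used in the Fluentd configuration above:

```
mongo > use auditlogs
mongo > db.createCollection("audit_log", { capped: true, size: 104857600 })
```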

Now, we can try some example aggregations to get some useful audit stats:

mongo > db.audit_log.aggregate([ { $group: { _id: {name: "$audit_record.name", command: "$audit_record.command_class"}, count: {$sum:1}}}, { $sort: {count:-1}} ])
{ "_id" : { "name" : "Execute", "command" : "error" }, "count" : 267086 }
{ "_id" : { "name" : "Query", "command" : "begin" }, "count" : 14054 }
{ "_id" : { "name" : "Close stmt", "command" : "error" }, "count" : 76 }
{ "_id" : { "name" : "Query", "command" : "show_variables" }, "count" : 7 }
{ "_id" : { "name" : "Query", "command" : "select" }, "count" : 6 }
{ "_id" : { "name" : "Quit" }, "count" : 5 }
{ "_id" : { "name" : "Query", "command" : "show_tables" }, "count" : 4 }
{ "_id" : { "name" : "Init DB", "command" : "error" }, "count" : 2 }
{ "_id" : { "name" : "Field List", "command" : "show_fields" }, "count" : 2 }
{ "_id" : { "name" : "Query", "command" : "show_databases" }, "count" : 2 }
{ "_id" : { "name" : "Connect" }, "count" : 1 }

mongo > db.audit_log.aggregate([ { $match: { "audit_record.status": {$gt: 0} } }, { $group: { _id: {command_class: "$audit_record.command_class", status: "$audit_record.status"}, count: {$sum:1}}}, { $sort: {count:-1}} ])
{ "_id" : { "command_class" : "error", "status" : 1049 }, "count" : 2 }
{ "_id" : { "command_class" : "show_tables", "status" : 1046 }, "count" : 2 }
{ "_id" : { "command_class" : "create_table", "status" : 1050 }, "count" : 2 }
{ "_id" : { "command_class" : "drop_table", "status" : 1051 }, "count" : 2 }
{ "_id" : { "command_class" : "drop_table", "status" : 1046 }, "count" : 2 }
{ "_id" : { "command_class" : "create_table", "status" : 1046 }, "count" : 1 }
{ "_id" : { "command_class" : "create_table", "status" : 1113 }, "count" : 1 }

References

https://www.percona.com/doc/percona-server/LATEST/management/audit_log_plugin.html
https://dev.mysql.com/doc/refman/8.0/en/audit-log.html
https://www.rsyslog.com/doc/v8-stable/configuration/modules/ommongodb.html
https://docs.fluentd.org/output/mongo

Jan 14, 2021

Thinking About Deploying MongoDB? Read This First.

Are you thinking about deploying MongoDB? Is it the right choice for you?

Choosing a database is an important step when designing an application. A wrong choice can have a negative impact on your organization in terms of development and maintenance, and it can also lead to poor performance.

Generally speaking, any kind of database can manage any kind of workload, but every database has specific workloads that fit it better than others.

You shouldn’t choose MongoDB just because it’s cool and a lot of companies are already using it. You need to understand whether it fits your workload and expectations. So, choose the right tool for the job.

In this article, we are going to discuss a few things you need to know before choosing and deploying MongoDB.

MongoDB Manages JSON-style Documents and Developers Appreciate That

The basic component of a MongoDB database is the JSON-style document. Technically it is BSON, which adds some extra datatypes (e.g., datetime) that aren’t legal JSON.

We can consider a document the equivalent of a record in a relational database. Documents are grouped into a collection, the equivalent of a relational table.

JSON-style documents are widely used by programmers worldwide to implement web services, applications, and data exchange. Having a database that can manage that data natively is really effective.

MongoDB is often appreciated by developers because they can start using it without specific knowledge of database administration and design, and without studying a complex query language. Indeed, the MongoDB query language itself is represented by JSON documents.

Developers can create, save, retrieve, and update their JSON-style documents with ease. Great! This usually leads to a significant reduction in development time.

MongoDB is Schemaless

Are you familiar with relational databases? Surely you are, as relational databases have long been used and studied at school and university, and they are still the most widely used databases in the market today.

You know that a relational schema needs a predefined, fixed structure for its tables. Any time you add or change a column, you need to run a DDL query, and additional time is needed to change your application code to handle the new structure. In the case of a massive change that requires multiple column changes and/or the creation of new tables, the application changes could be substantial. MongoDB’s lack of schema enforcement means none of that is required: you just insert a document into a collection, and that’s all. Let’s suppose you have a collection with user data, and at some point you need to add a new “date_of_birth” field. You simply start inserting the new JSON documents with the additional field. That’s all; no need to change anything in the schema.
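As an illustration, here is a hypothetical sketch of what that looks like from the application side, using plain Python dicts to stand in for documents (with a real server these would go through a driver such as pymongo):

```python
# Two documents destined for the same hypothetical "users" collection.
# The newer one simply carries the extra "date_of_birth" field; no DDL,
# no migration, and older documents are left untouched.
old_doc = {"name": "alice", "email": "alice@example.com"}
new_doc = {"name": "bob", "email": "bob@example.com",
           "date_of_birth": "1990-04-01"}

# The application just needs to tolerate the missing field.
profiles = []
for doc in (old_doc, new_doc):
    profiles.append((doc["name"], doc.get("date_of_birth", "unknown")))

print(profiles)  # [('alice', 'unknown'), ('bob', '1990-04-01')]
```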

You can even insert completely different JSON documents, representing different entities, into the same collection. This is technically feasible, but not recommended.

MongoDB greatly shortens the application development cycle for a non-technical reason as well: it removes the need to coordinate a schema change migration project with the DBA team. There is no need to wait until the DBA team does a QA dress rehearsal and then the production release (with rollback plans) that, as often as not, requires some production downtime.

MongoDB Has No Foreign Keys, Stored Procedures, or Triggers. Joins Are Supported, but Atypical.

Relational database design relies on SQL queries being able to join multiple tables on specific fields. It may also require foreign keys for assuring data consistency and for running automatic changes on semantically connected fields.

What about stored procedures? They can be useful for embedding some application logic into the database, to simplify certain tasks or to improve security.

And what about triggers? They are useful to automatically “trigger” changes to the data based on specific events, like adding, changing, or deleting a row. They help manage data consistency and, in some cases, simplify the application code.

Well, none of these is available in MongoDB. So, be aware of that.

Note: to be honest, there is an aggregation stage ($lookup) that can implement the equivalent of a LEFT OUTER JOIN, but this is the only case.

How to Survive Without JOINs?

Joins must be managed in your application code: if you need to join two collections, you read the first one, select the join field’s values, and use them to query the second collection, and so on. This seems expensive in terms of application development, and it can also lead to more queries being executed. Indeed it is, but the good news is that in many cases you don’t have to manage joins at all.
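To make the idea concrete, here is a hypothetical sketch of such an application-side join, with plain Python lists standing in for the two collections (with a real server, the two lookups would be find() calls through a driver):

```python
# First "collection": users; second "collection": their orders.
users = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob"},
]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25},
    {"order_id": 11, "user_id": 1, "total": 40},
    {"order_id": 12, "user_id": 2, "total": 15},
]

# Step 1: read the first collection and collect the join-field values.
user_ids = {u["_id"] for u in users}

# Step 2: query the second collection by those values (an {$in: [...]}
# query in MongoDB terms) and group the results by user.
orders_by_user = {}
for o in orders:
    if o["user_id"] in user_ids:
        orders_by_user.setdefault(o["user_id"], []).append(o)

# Step 3: stitch the two result sets together in application code.
joined = [dict(u, orders=orders_by_user.get(u["_id"], [])) for u in users]
print(joined[0]["name"], len(joined[0]["orders"]))  # alice 2
```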

Remember that MongoDB is a schemaless database; it doesn’t require normalization. If you design your collections properly, you can embed and duplicate data within a single collection, without the need to create an additional collection. This way you won’t need to run any joins, because all the data you need is already in one collection.

Foreign keys are not available, but as long as you can embed related data into the same document, you don’t really need them.

Stored procedures can be implemented easily as external scripts written in your preferred language. Triggers can be implemented externally the same way, with the help of the Change Streams API watching a collection.

If you have a lot of collections with referenced fields, you have to implement a lot of joins in your code, or do a lot of checks to assure consistency. This is possible, but at a higher development cost. MongoDB could be the wrong choice in such a case.

MongoDB Replication and Sharding Are Easy to Deploy

MongoDB was not designed as a standalone application; it was designed to be a piece of a larger puzzle. A mongod server can work together with other mongod instances to implement replication and sharding efficiently, without the need for any additional third-party tool.

A replica set is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability by design, and, with caveats regarding potentially stale data, you also get read scalability for free. A replica set should be the basis for all production deployments.

A sharded cluster is deployed as a group of several replica sets, with the capability to split and distribute the data evenly across them. A sharded cluster provides write scalability in addition to redundancy, high availability, and read scalability, and this topology is suitable for very large data sets. The number of shards you can add is, in theory, unlimited.

Both topologies can be scaled up at any time by adding more servers and shards. More importantly, no application changes are required, since each topology is completely transparent from the application’s perspective.

Finally, deploying such topologies is straightforward. You need to spend some time up front understanding a few basic concepts, but then, in a matter of hours, you can deploy even a very large sharded cluster. With many servers, instead of doing everything manually, you can automate a lot of it using Ansible playbooks or similar tools.

Further readings:

Deploy a MongoDB Replica Set with Transport Encryption (Part 1)

MongoDB Sharding 101 Webinar

MongoDB Has Indexes and They Are Really Important

MongoDB allows you to create indexes on the JSON document’s fields. Indexes are used the same way as in a relational database: they help resolve queries faster and decrease the usage of machine resources such as memory, CPU time, and disk IOPS.

You should create all the indexes that will help any of the regularly executed queries, updates, or deletes from your application.

MongoDB has really advanced indexing capabilities. It provides TTL indexes, geospatial indexes, indexes on array elements, and partial and sparse indexes. If you need more details about the available index types, take a look at the following articles:

MongoDB Index Types and MongoDB explain() (part 1)

Using Partial and Sparse Indexes in MongoDB

Create all the indexes you need for your collections. They will help you a lot to improve the overall performance of your database.

MongoDB is Memory Intensive

MongoDB is memory intensive; it needs a lot of it. The same is true of many other databases; memory is the most important resource most of the time.

MongoDB uses RAM for caching the most frequently and recently accessed data and indexes. The larger this cache, the better the overall performance, because MongoDB can retrieve a lot of data faster. Also, by default, MongoDB writes are committed only to memory before client confirmation is returned. Writes to disk are done asynchronously: first to the journal file (typically within 50 ms), and later to the normal data files (once per minute).

The storage engine used by MongoDB is WiredTiger. In the past there was MMAPv1, but it is no longer available in recent versions. The WiredTiger storage engine uses a substantial memory cache (the WiredTiger cache) for caching data and indexes.

Besides the WiredTiger cache, MongoDB relies on the OS file system cache for accessing disk pages. This is another important optimization, and significant memory may be required for it as well.

In addition, MongoDB needs memory for managing other things such as client connections, in-memory sorts, temporary data used when executing aggregation pipelines, and other minor items.

In the end, be prepared to provide enough memory to MongoDB.

But how much memory do you need? The rule of thumb is to evaluate the “working set” size.

The “working set” is the amount of data most frequently requested by your application. Usually, an application needs only a limited amount of data; it doesn’t need to read the entire data set during normal operations. For example, in the case of time-series data, you most probably read only the last few hours’ or days’ entries, and only occasionally legacy data. In such a case, your working set is the amount of memory needed to hold just those few days of data.

Let’s suppose your data set is 100GB and you evaluate your working set at around 20%; then you need to provide at least 20GB for the WiredTiger cache.

Since MongoDB uses by default 50% of the RAM for the WiredTiger cache (we usually suggest not increasing it significantly), you should provide around 40GB of memory in total for your server.

Every case is different, and sometimes it can be difficult to evaluate the working set size correctly. Anyway, the main recommendation is to spend a significant part of your budget on as much memory as you can. This will surely be beneficial for MongoDB.
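The arithmetic above can be sketched as a small helper; the 50% cache fraction is simply the default mentioned earlier:

```python
# Rule-of-thumb sizing: the WiredTiger cache should hold the working set,
# and the cache is (by default) roughly 50% of the server's RAM.
def suggested_ram_gb(data_set_gb, working_set_ratio, cache_fraction=0.5):
    working_set_gb = data_set_gb * working_set_ratio
    return working_set_gb / cache_fraction

print(suggested_ram_gb(100, 0.20))  # 40.0, matching the example above
```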

What Are the Suitable Use Cases for MongoDB?

Actually, a lot. I have seen MongoDB deployed in a wide variety of environments.

For example, MongoDB is suitable for:

  • events logging
  • content management
  • gaming applications
  • payment applications
  • real-time analytics
  • Internet of Things applications
  • content caching
  • time-series data applications

And many others.

We can say that you can use MongoDB for basically everything; it is a general-purpose database. The key point is how you use it.

For example, if you plan to use MongoDB the same way as a relational database, with normalized data, a lot of collections, and a myriad of joins managed by the application, then MongoDB is certainly not the right choice. Use a relational database.

The best way to use MongoDB is to adhere to a few best practices and model your collections with some basic rules in mind, like embedding documents instead of creating multiple referenced collections.

Percona Server for MongoDB: The Enterprise-Class Open Source Alternative

Percona develops and maintains its own open source version of MongoDB: Percona Server for MongoDB (PSMDB).

PSMDB is a drop-in replacement for MongoDB Community Edition and is 100% compatible with it. The great advantage of PSMDB is that you get enterprise-class features for free, like:

  • data-at-rest encryption
  • audit logging
  • LDAP authentication
  • LDAP authorization
  • log redaction
  • Kerberos authentication
  • hot backup
  • in-memory storage engine

Without PSMDB, all these advanced features are available only with a MongoDB Enterprise subscription.

Please take a look at the following links for more details about PSMDB:

Percona Server for MongoDB Feature Comparison

Percona Server for MongoDB

Remember, you can get in touch with Percona at any time for more details or to get help.

Conclusion

Let’s have a look at the following list of the most important things to check before choosing MongoDB as the backend database for your applications:

  • Your applications primarily deal with JSON documents
  • Your data has unpredictable and frequent schema changes over time
  • You have several collections with a lot of external references for assuring consistency, and the majority of queries need joins
  • You need to replicate the stored procedures and triggers you have in your relational database
  • You need HA and read scalability
  • You need to scale your data to a very large size
  • You need to scale because of a huge amount of writes

And finally, remember the following:

Take a look at Percona Server for MongoDB 

Jan 13, 2021

Percona 2020 Recap: Great Content and Software Releases

The Percona team provided the community with some excellent content and several new releases in 2020. I wanted to highlight some of your favorites (based on popularity) in case you missed them.

First up is our most-read blog from last year, which, ironically, was published before 2020. Ananias Tsalouchidis’s blog on when you should use Aurora and when you should use RDS MySQL continued to attract readers all year long. People don’t always understand the key differences between the two, so having a guide is great and timely for many.

What about the most read blogs or watched videos published in 2020?

PostgreSQL Takes Our Most-Read Spot of 2020

The Percona blog is known for its great in-depth MySQL coverage, but our experts in the MongoDB and PostgreSQL space have also written some quality content over the last few years. It is exciting to see that the most popular blog published last year was outside of MySQL: Ibrar Ahmed’s deep dive into handling null values in PostgreSQL.

Interested in the top six PostgreSQL reads from 2020? Here they are:

We also had some fantastic conference talks this year you may want to check out. Here are the most-watched PostgreSQL videos of 2020:

Awesome PostgreSQL talks and blogs from the community:

Our Percona University Online posted its first PostgreSQL training last year; if you are looking for a deeper understanding of indexes (and who isn’t), check out our training, Deep Dive Into PostgreSQL Indexes.

MySQL is Still as Popular as Ever

Even though PostgreSQL took this year’s top spot, not far behind was a great blog series by our CEO, Peter Zaitsev, on solving MySQL bottlenecks. His three-part series, 18 things you can do to remove MySQL bottlenecks caused by high traffic, was not only highly read, it also spawned one of the most-watched webinars of the year. Scalability and performance are critical and can mean life or death for any application. A vital read, and a great link to bookmark for when you hit one of those odd performance issues you cannot seem to track down!

Interested in the top five MySQL reads from 2020? Here they are:

Interested in watching some outstanding MySQL sessions? Check out some of the most-watched MySQL sessions of 2020:

Awesome MySQL talks and blogs from the community:

Our Percona University Online posted its first MySQL training; if you are looking at how to upgrade to MySQL 8, it is worth watching. Check out the training, How to Upgrade to MySQL 8.0.

The Staying Power of MongoDB is Undeniable

MongoDB’s growth in 2020 was undeniable, which is why it’s no surprise that another one of our top blogs was on MongoDB. Percona’s most-read tech blog on MongoDB published in 2020 was Vinicius Grippa’s must-read work outlining the best practices for running MongoDB. Whether you are new to MongoDB or not, it is worth reading and double-checking to ensure you have MongoDB optimized.

Interested in the top five MongoDB reads from 2020? Here they are:

Interested in watching some MongoDB sessions? Check out some of the most-watched MongoDB sessions of 2020:

Awesome MongoDB talks and blogs from the community:

More Popular Blogs and Discussions

Sometimes topics cross databases and delve into general advice. Let’s look at some of the more popular talks and blogs that are not tied to a specific database.

If you like videos, you may want to check out these great Percona Live Sessions from last year:

Other Popular Blogs:

Finally, Some Great Percona Software Released This Year

Here is the list of interesting software changes and news on Percona software in 2020:

Percona Distributions for MongoDB and MySQL:

  • What are Percona distributions? We take the best components from the community and ensure they work together. This way, you know your backup, HA, monitoring, etc., will all work together seamlessly.

Percona XtraDB Cluster 8.0 (PXC) was released with improved performance, scalability, and security. Long-awaited features include:

  • Streaming replication to support larger transactions
  • More granular and improved logging and troubleshooting options
  • New system tables help you find out more about the state of the cluster.
  • Percona XtraDB Cluster 8.0 now automatically launches the upgrade as needed (even for minor releases), avoiding manual intervention and simplifying operation in the cloud.

Percona Distribution for PostgreSQL 13. Version 13 of PostgreSQL was a leap forward, and our distribution was updated to support all the additional functionality. Better indexing, better performance, and better security! Sign me up!

Percona Monitoring And Management (PMM) jumped forward from 2.2 to 2.13 adding some very cool features like:

  • Alert manager integration and integrated alerting
  • A brand new Query Analyzer with awesome features to allow you to find problem queries quicker and more efficiently
  • Enhanced metrics for AWS RDS monitoring
  • Added support for External Exporters so you can monitor 3rd party and custom services through the installed PMM-agent
  • New security threat tool allows for alerts and visibility into the most common security issues
  • Support for group replication
  • Better MongoDB and PostgreSQL monitoring
  • Better support for larger environments (Monitor More Stuff Faster)
  • Plus a ton of misc small enhancements!

Percona Kubernetes Operator for Percona XtraDB Cluster continued to evolve with several new features helping users build their own DIY DBaaS:

  • Auto-Tuning MySQL Parameters
  • Integration with Percona Monitoring and Management
  • Full data encryption at rest
  • Support for Percona XtraDB Cluster 8.0
  • Support for the latest version of Open Shift and Amazon’s Elastic Container Service
  • Dual support for ProxySQL and HA Proxy
  • Automated minor upgrades
  • Clone backups to set up a new PXC cluster on a different Kubernetes cluster

Percona Kubernetes Operator for Percona Server for MongoDB added several features, including:

  • Support for Percona Server for MongoDB 4.4
  • Automated management of system users
  • Support for the latest version of Open Shift and Amazon’s Elastic Container Service
  • Automated minor upgrades

While 2020 was far from the best year for many of us and we are glad it is behind us, it did generate some good content that we can use in 2021 and going forward to help us better manage and run our databases. Thanks for reading and happy database tuning!

Jan 8, 2021

MongoDB 101: How to Tune Your MongoDB Configuration After Upgrading to More Memory

In this post, we will discuss what to do when you add more memory to your MongoDB deployment, a common practice when you are scaling resources.

Why Might You Need to Add More Memory?

Scaling is a way of adding more resources to your environment. There are two main ways to accomplish it: vertical scaling and horizontal scaling.

  • Vertical scaling is increasing the hardware capacity of a given instance, resulting in a more powerful server.
  • Horizontal scaling is adding more servers to your architecture. A pretty standard approach for horizontal scaling, especially for databases, is load balancing plus sharding.

As your application grows, the working set gets bigger, and we start to see bottlenecks as data that doesn’t fit into memory has to be retrieved from disk. Reading from disk is a costly operation, even with modern NVMe drives, so we need to apply one of the scaling solutions mentioned above.

In this case, we will discuss adding more RAM, which is usually the fastest and easiest way to scale hardware vertically, and how having more memory can be a major help for MongoDB performance.

How to Calculate Memory Utilization in MongoDB

Before we add memory to our MongoDB deployment, we need to understand our current memory utilization. This is best done by querying serverStatus and requesting data on the WiredTiger cache.

Since MongoDB 3.2, WiredTiger has been the default storage engine. By default, MongoDB reserves 50% of the available memory minus 1 GB for the WiredTiger cache, or 256 MB, whichever is greater.

For example, a system with 16 GB of RAM would have a WiredTiger cache size of 7.5 GB:

0.5 * (16 - 1) = 7.5

The size of this cache is important to ensure WiredTiger performs well, so it’s worth checking whether you should alter it from the default. A good rule is that the cache should be large enough to hold the entire application working set.
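The default sizing rule can be written as a tiny helper for quick estimates (sizes in GB, with 256 MB taken as 0.25 GB):

```python
# Default WiredTiger cache size: 50% of (RAM - 1 GB), floored at 256 MB.
def default_wt_cache_gb(ram_gb):
    return max(0.5 * (ram_gb - 1), 0.25)

print(default_wt_cache_gb(16))  # 7.5
print(default_wt_cache_gb(1))   # 0.25 (the 256 MB floor)
```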

How do we know whether to alter it? Let’s look at the cache usage statistics:

db.serverStatus().wiredTiger.cache
{
"application threads page read from disk to cache count" : 9,
"application threads page read from disk to cache time (usecs)" : 17555,
"application threads page write from cache to disk count" : 1820,
"application threads page write from cache to disk time (usecs)" : 1052322,
"bytes allocated for updates" : 20043,
"bytes belonging to page images in the cache" : 46742,
"bytes belonging to the history store table in the cache" : 173,
"bytes currently in the cache" : 73044,
"bytes dirty in the cache cumulative" : 38638327,
"bytes not belonging to page images in the cache" : 26302,
"bytes read into cache" : 43280,
"bytes written from cache" : 20517382,
"cache overflow score" : 0,
"checkpoint blocked page eviction" : 0,
"eviction calls to get a page" : 5973,
"eviction calls to get a page found queue empty" : 4973,
"eviction calls to get a page found queue empty after locking" : 20,
"eviction currently operating in aggressive mode" : 0,
"eviction empty score" : 0,
"eviction passes of a file" : 0,
"eviction server candidate queue empty when topping up" : 0,
"eviction server candidate queue not empty when topping up" : 0,
"eviction server evicting pages" : 0,
"eviction server slept, because we did not make progress with eviction" : 735,
"eviction server unable to reach eviction goal" : 0,
"eviction server waiting for a leaf page" : 2,
"eviction state" : 64,
"eviction walk target pages histogram - 0-9" : 0,
"eviction walk target pages histogram - 10-31" : 0,
"eviction walk target pages histogram - 128 and higher" : 0,
"eviction walk target pages histogram - 32-63" : 0,
"eviction walk target pages histogram - 64-128" : 0,
"eviction walk target strategy both clean and dirty pages" : 0,
"eviction walk target strategy only clean pages" : 0,
"eviction walk target strategy only dirty pages" : 0,
"eviction walks abandoned" : 0,
"eviction walks gave up because they restarted their walk twice" : 0,
"eviction walks gave up because they saw too many pages and found no candidates" : 0,
"eviction walks gave up because they saw too many pages and found too few candidates" : 0,
"eviction walks reached end of tree" : 0,
"eviction walks started from root of tree" : 0,
"eviction walks started from saved location in tree" : 0,
"eviction worker thread active" : 4,
"eviction worker thread created" : 0,
"eviction worker thread evicting pages" : 902,
"eviction worker thread removed" : 0,
"eviction worker thread stable number" : 0,
"files with active eviction walks" : 0,
"files with new eviction walks started" : 0,
"force re-tuning of eviction workers once in a while" : 0,
"forced eviction - history store pages failed to evict while session has history store cursor open" : 0,
"forced eviction - history store pages selected while session has history store cursor open" : 0,
"forced eviction - history store pages successfully evicted while session has history store cursor open" : 0,
"forced eviction - pages evicted that were clean count" : 0,
"forced eviction - pages evicted that were clean time (usecs)" : 0,
"forced eviction - pages evicted that were dirty count" : 0,
"forced eviction - pages evicted that were dirty time (usecs)" : 0,
"forced eviction - pages selected because of too many deleted items count" : 0,
"forced eviction - pages selected count" : 0,
"forced eviction - pages selected unable to be evicted count" : 0,
"forced eviction - pages selected unable to be evicted time" : 0,
"forced eviction - session returned rollback error while force evicting due to being oldest" : 0,
"hazard pointer blocked page eviction" : 0,
"hazard pointer check calls" : 902,
"hazard pointer check entries walked" : 25,
"hazard pointer maximum array length" : 1,
"history store key truncation calls that returned restart" : 0,
"history store key truncation due to mixed timestamps" : 0,
"history store key truncation due to the key being removed from the data page" : 0,
"history store score" : 0,
"history store table insert calls" : 0,
"history store table insert calls that returned restart" : 0,
"history store table max on-disk size" : 0,
"history store table on-disk size" : 0,
"history store table out-of-order resolved updates that lose their durable timestamp" : 0,
"history store table out-of-order updates that were fixed up by moving existing records" : 0,
"history store table out-of-order updates that were fixed up during insertion" : 0,
"history store table reads" : 0,
"history store table reads missed" : 0,
"history store table reads requiring squashed modifies" : 0,
"history store table remove calls due to key truncation" : 0,
"history store table writes requiring squashed modifies" : 0,
"in-memory page passed criteria to be split" : 0,
"in-memory page splits" : 0,
"internal pages evicted" : 0,
"internal pages queued for eviction" : 0,
"internal pages seen by eviction walk" : 0,
"internal pages seen by eviction walk that are already queued" : 0,
"internal pages split during eviction" : 0,
"leaf pages split during eviction" : 0,
"maximum bytes configured" : 8053063680,
"maximum page size at eviction" : 376,
"modified pages evicted" : 902,
"modified pages evicted by application threads" : 0,
"operations timed out waiting for space in cache" : 0,
"overflow pages read into cache" : 0,
"page split during eviction deepened the tree" : 0,
"page written requiring history store records" : 0,
"pages currently held in the cache" : 24,
"pages evicted by application threads" : 0,
"pages queued for eviction" : 0,
"pages queued for eviction post lru sorting" : 0,
"pages queued for urgent eviction" : 902,
"pages queued for urgent eviction during walk" : 0,
"pages read into cache" : 20,
"pages read into cache after truncate" : 902,
"pages read into cache after truncate in prepare state" : 0,
"pages requested from the cache" : 33134,
"pages seen by eviction walk" : 0,
"pages seen by eviction walk that are already queued" : 0,
"pages selected for eviction unable to be evicted" : 0,
"pages selected for eviction unable to be evicted as the parent page has overflow items" : 0,
"pages selected for eviction unable to be evicted because of active children on an internal page" : 0,
"pages selected for eviction unable to be evicted because of failure in reconciliation" : 0,
"pages walked for eviction" : 0,
"pages written from cache" : 1822,
"pages written requiring in-memory restoration" : 0,
"percentage overhead" : 8,
"tracked bytes belonging to internal pages in the cache" : 5136,
"tracked bytes belonging to leaf pages in the cache" : 67908,
"tracked dirty bytes in the cache" : 493,
"tracked dirty pages in the cache" : 1,
"unmodified pages evicted" : 0
}


There’s a lot of data here about WiredTiger’s cache, but we can focus on the following fields:

  • wiredTiger.cache.maximum bytes configured – The current maximum cache size.
  • wiredTiger.cache.bytes currently in the cache – The size of the data currently in the cache. This is typically around 80% of your cache size plus the amount of “dirty” cache that has not yet been written to disk. It should not be greater than the maximum bytes configured; a value equal to or greater than that maximum is a strong indicator that you should already have scaled.
  • wiredTiger.cache.tracked dirty bytes in the cache – The size of the dirty data in the cache. This should be less than five percent of your cache size and can be another indicator that you need to scale. Once it goes over five percent of the cache size, WiredTiger gets more aggressive about removing data from the cache and in some cases may force application threads to evict data from the cache before they can successfully write to it.
  • wiredTiger.cache.pages read into cache – The number of pages read into the cache. Its per-second average tells you how much data is coming into your cache.
  • wiredTiger.cache.pages written from cache – The number of pages written from the cache to disk. This will be especially heavy around checkpoints, and if this value keeps growing, your checkpoints will take longer and longer.
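The thresholds above can also be checked programmatically. A minimal sketch, using field names from the serverStatus output shown earlier and the 80% / 5% rules of thumb from this section:

```javascript
// Evaluate WiredTiger cache pressure from a serverStatus().wiredTiger.cache document.
function cachePressure(cache) {
  const max = cache["maximum bytes configured"];
  const used = cache["bytes currently in the cache"];
  const dirty = cache["tracked dirty bytes in the cache"];
  return {
    fillRatio: used / max,   // approaching 1.0 means you should already have scaled
    dirtyRatio: dirty / max, // above 0.05 makes WiredTiger evict aggressively
    underPressure: used / max >= 0.8 || dirty / max >= 0.05,
  };
}

// The sample numbers from the output above describe an idle instance:
const sample = {
  "maximum bytes configured": 8053063680,
  "bytes currently in the cache": 73044,
  "tracked dirty bytes in the cache": 493,
};
console.log(cachePressure(sample).underPressure); // false
```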

Looking at the above values, we can determine if we need to increase the size of the WiredTiger cache for our instance. We might also look at the WiredTiger concurrency read and write ticket usage. It’s fine for some tickets to be in use, but if the number in use keeps growing toward the number of CPU cores, you’re approaching CPU saturation. You can check ticket usage in Percona Monitoring and Management (PMM) or by running the following query:


db.serverStatus().wiredTiger.concurrentTransactions
{
"write" : {
"out" : 0,
"available" : 128,
"totalTickets" : 128
},
"read" : {
"out" : 1,
"available" : 127,
"totalTickets" : 128
}
}


The wiredTiger.cache.pages read into cache value may also be indicative of an issue for read-heavy applications. If this value is consistently a large part of your cache size, increasing your memory may improve overall read performance.
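Since pages read into cache is a cumulative counter, judging its per-second average means sampling it twice and dividing by the interval. A sketch of that calculation (the helper name is ours):

```javascript
// Per-second rate from two samples of a cumulative counter such as
// "pages read into cache", taken intervalSeconds apart.
function perSecondRate(prevCount, currCount, intervalSeconds) {
  return (currCount - prevCount) / intervalSeconds;
}

// e.g. the counter moved from 43280 to 44480 between two samples 60 s apart:
console.log(perSecondRate(43280, 44480, 60)); // 20 pages/s
```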

Example

Using the following numbers as our example starting point, we can see the cache is small and there is definite memory pressure on it:

We are also using the default WiredTiger cache size, and we know the system has 16 GB of memory, giving a cache of (0.5 * (16-1)) = 7.5 GB. Based on our knowledge of our (imaginary) application, the working set is 16 GB, so we want the cache to be larger than that. To leave room for further growth, since our working set will only keep growing, we could resize our server’s RAM from 16 GB to 48 GB. With the default settings, this would increase the WiredTiger cache to (0.5 * (48-1)) = 23.5 GB, leaving 24.5 GB of RAM for the OS and its filesystem cache.

If we wanted a larger WiredTiger cache, we would set storage.wiredTiger.engineConfig.cacheSizeGB to the desired value. For example, say we want to allocate 30 GB to the WiredTiger cache to really avoid any reads from disk in the near term, leaving 18 GB for the OS and its filesystem cache. We would add the following to our mongod.conf file:

storage:
   wiredTiger:
       engineConfig:
           cacheSizeGB: 30

For either the default setting or the specific settings to recognize the added memory and take effect, we will need to restart the mongod process.

Also note that unlike a lot of other database systems where the database cache is typically sized closer to 80-90% of system memory, MongoDB’s sweet spot is in the 50-70% range.  This is because MongoDB only uses the WiredTiger cache for uncompressed pages, while the operating system caches the compressed pages and writes them to the database files.  By leaving free memory to the operating system, we increase the likelihood of getting the page from the OS cache instead of needing to do a disk read.

Summary

In this article, we’ve gone over how to update your MongoDB configuration after you’ve upgraded to more memory.   We hope that this helps you tune your MongoDB configuration so that you can get the most out of your increased RAM.   Thanks for reading!

Additional Resources:

MongoDB Best Practices 2020 Edition

Tuning MongoDB for Bulk Loads

Jan
07
2021
--

Webinar January 19: Maximize the Benefits of Using Open Source MongoDB with Percona Distribution for MongoDB

Benefits of Using Open Source MongoDB with Percona Distribution for MongoDB

Many organizations require an Enterprise subscription of MongoDB for the support coverage and additional features that it provides. However, many are unaware that there is an open source alternative offering all the features and benefits of a MongoDB Enterprise subscription without the licensing fees and the vendor lock-in.

In this joint NessPRO webinar geared at the Israeli marketplace, we will cover the Percona approach and the features that make Percona Server for MongoDB an enterprise-grade, license-free alternative to MongoDB Enterprise Edition. In this discussion, we will address:

  • Brief History of MongoDB
  • MongoDB Enterprise versus Percona Server for MongoDB features, including:
    • Authentication and Authorization
    • Encryption
    • Governance
    • Audit logging
    • Backups
    • Kubernetes Operator
    • Monitoring & Alerting

Please join Michal Nosek, Senior Pre-Sales Solution Engineer, Percona, and Eli Alafi, Director, Data & Innovation at NessPRO on Tuesday, January 19th, 2021 at 10:00 am GMT+2 for the webinar “How to Maximize the Benefits of Using Open Source MongoDB with Percona Distribution for MongoDB”.

Register for Webinar

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

Jan
06
2021
--

One Shard Support in Kubernetes Operator for Percona Server for MongoDB

One Shard Support in Kubernetes Operator for Percona Server for MongoDB


So far, Percona Kubernetes Operator for Percona Server for MongoDB (PSMDB) has supported only managing replica sets, but from version 1.6.0 it is possible to start a sharding cluster, although at the moment only with one shard. This is a step toward supporting full sharding, with multiple shards being added in a future release.

The components added to make this work are config server replica set and mongos support, together with everything around them: services, probes, statuses, etc. Besides starting a sharded cluster from scratch, it is also possible to migrate from a single replica set to a sharded setup – and back.

Configuration Options for Sharding

A new section was added into the cr.yaml configuration called “sharding” where you can enable/disable sharding altogether. You can also change the number of running pods for config server replica set and mongos, set antiAffinityTopologyKey, podDisruptionBudget, resources, and define how the mongos service will be exposed.

Here’s how a simple config might look:

sharding:
  enabled: true
  configsvrReplSet:
    size: 3
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      enabled: true
      exposeType: LoadBalancer

The default number of pods for the config server replica set and mongos is three, but you can use fewer if you enable the “allowUnsafeConfigurations” option. There are more configuration options inside cr.yaml, but some of them are commented out, since they are more specific to particular use cases or environments.
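For example, a test environment might run a minimal one-pod-per-component layout. This sketch assumes allowUnsafeConfigurations sits at the spec level of cr.yaml, next to the sharding section; check the cr.yaml shipped with your Operator version for the exact placement:

```yaml
spec:
  allowUnsafeConfigurations: true   # permit fewer than the default three pods
  sharding:
    enabled: true
    configsvrReplSet:
      size: 1
    mongos:
      size: 1
```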

This is how the pods and services might look when you start the sharded cluster:

NAME                                               READY   STATUS    RESTARTS   AGE
my-cluster-name-cfg-0                              2/2     Running   0          2m38s
my-cluster-name-cfg-1                              2/2     Running   1          2m10s
my-cluster-name-cfg-2                              2/2     Running   1          103s
my-cluster-name-mongos-556bdd5b79-bkgd2            1/1     Running   0          2m36s
my-cluster-name-mongos-556bdd5b79-klkh6            1/1     Running   0          2m36s
my-cluster-name-mongos-556bdd5b79-nbgd9            1/1     Running   0          2m36s
my-cluster-name-rs0-0                              2/2     Running   0          2m40s
my-cluster-name-rs0-1                              2/2     Running   1          2m11s
my-cluster-name-rs0-2                              2/2     Running   1          104s
percona-server-mongodb-operator-587658ccc8-k6zpt   1/1     Running   0          3m14s

NAME                     TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)           AGE
my-cluster-name-cfg      ClusterIP      None          <none>        27017/TCP         2m58s
my-cluster-name-mongos   LoadBalancer   10.51.244.4   34.78.50.13   27017:31685/TCP   2m56s
my-cluster-name-rs0      ClusterIP      None          <none>        27017/TCP         3m

Here you can see that in this example, we have the mongos service configured to be exposed with a LoadBalancer and available through an external IP. Currently, clients connect to the mongos instances through the load balancer service in a round-robin fashion, but in the future it is planned to support session affinity (sticky sessions) so that the same client connects to the same mongos instance most of the time.

Migrating From Replica Set to One Shard Setup (and Back)

MongoDB (in general) supports migrating from a replica set to a sharded setup and from sharding back to a replica set, but it requires manual steps whose number depends on the complexity of the existing architecture. Our Kubernetes Operator currently supports automatic migration from a replica set to one shard, and back from one shard to a replica set.

These are the steps that PSMDB Kubernetes Operator does when we enable sharding but have an existing replica set:

  • restart existing replica set members with the “--shardsvr” option included
  • deploy config server replica set and mongos as they are defined in cr.yaml (default is three pods for each)
    • create stateful set for config replica set
    • setup Kubernetes service for mongos and config replica set
  • add existing replica set as a shard in sharding cluster

In this process, data is preserved, but additional steps might be needed for application users, since they become shard-local users that are not available through mongos (so they need to be created again from mongos).

When we migrate from the one-shard setup back to a replica set, data is also preserved and the steps mentioned above are reverted, but in this case application users are lost, since they were stored in the config replica set, which no longer exists, so they will need to be recreated.

SmartUpdate Strategy for Sharding Cluster

As you may know, both Percona Kubernetes Operators (Percona XtraDB Cluster and PSMDB) have a SmartUpdate strategy, which tries to upgrade clusters automatically and with as little interruption to the application as possible.

When we are talking about sharding, this is what the steps look like:

  • disable the balancer
  • upgrade config replica set (secondaries first, step down primary, upgrade primary)
  • upgrade data replica set (secondaries first, step down primary, upgrade primary)
  • upgrade mongos pods
  • enable balancer

This is how this process might look in the Operator logs when we upgrade the cluster from one PSMDB version to another (some parts stripped for brevity):

{"level":"info","msg":"update Mongo version to 4.2.7-7 (fetched from db)"}
{"level":"info","msg":"waiting for config RS update"}
{"level":"info","msg":"statefullSet was changed, start smart update","name":"my-cluster-name-cfg"}
{"level":"info","msg":"balancer disabled"}
{"level":"info","msg":"primary pod is my-cluster-name-cfg-0.my-cluster-name-cfg.psmdb-test.svc.cluster.local:27017"}
{"level":"info","msg":"apply changes to secondary pod my-cluster-name-cfg-2"}
{"level":"info","msg":"pod my-cluster-name-cfg-2 started"}
{"level":"info","msg":"apply changes to secondary pod my-cluster-name-cfg-1"}
{"level":"info","msg":"pod my-cluster-name-cfg-1 started"}
{"level":"info","msg":"doing step down..."}
{"level":"info","msg":"apply changes to primary pod my-cluster-name-cfg-0"}
{"level":"info","msg":"pod my-cluster-name-cfg-0 started"}
{"level":"info","msg":"smart update finished for statefulset","statefulset":"my-cluster-name-cfg"}
{"level":"info","msg":"statefullSet was changed, start smart update","name":"my-cluster-name-rs0"}
{"level":"info","msg":"primary pod is my-cluster-name-rs0-0.my-cluster-name-rs0.psmdb-test.svc.cluster.local:27017"}
{"level":"info","msg":"apply changes to secondary pod my-cluster-name-rs0-2"}
{"level":"info","msg":"pod my-cluster-name-rs0-2 started"}
{"level":"info","msg":"apply changes to secondary pod my-cluster-name-rs0-1"}
{"level":"info","msg":"pod my-cluster-name-rs0-1 started"}
{"level":"info","msg":"doing step down..."}
{"level":"info","msg":"apply changes to primary pod my-cluster-name-rs0-0"}
{"level":"info","msg":"pod my-cluster-name-rs0-0 started"}
{"level":"info","msg":"smart update finished for statefulset","statefulset":"my-cluster-name-rs0"}
{"level":"info","msg":"update Mongo version to 4.2.8-8 (fetched from db)"}
{"level":"info","msg":"waiting for mongos update"}
{"level":"info","msg":"balancer enabled"}

Conclusion

Although adding support for a one-shard cluster may not sound too important, since it doesn’t allow sharding data across multiple shards, it is a big milestone that lays the foundation for full sharding support in the future. Beyond that, it may allow you to expose your data to applications in different ways through mongos instances, so if you’re interested, please check the documentation and release notes for more details.

Jan
05
2021
--

MongoDB 101: 5 Configuration Options That Impact Performance and How to Set Them

MongoDB configuration options that impact Performance

As with any database platform, MongoDB performance is of paramount importance to keeping your application running quickly. In this blog post, we’ll show you five configuration options that can impact performance for your MongoDB deployment and help keep your database fast and performing at its peak.

MongoDB Performance Overview

MongoDB performance depends on several factors: OS Settings, DB Configuration Settings, DB Internal Settings, Memory Settings, and Application Settings. This post focuses on the MongoDB database configuration options around performance and how to set them. These are options set in the database configuration itself that can impact your performance.

Configuration Options

So how do we ensure our performance configuration options are enabled or set up correctly?  And which ones are the most important?  We’ll now go through five configuration options that will help your MongoDB environment be performant!

MongoDB uses a configuration file in the YAML file format.  The configuration file is usually found in the following locations, depending on your Operating System:

DEFAULT CONFIGURATION FILE

  • On Linux, a default /etc/mongod.conf configuration file is included when using a package manager to install MongoDB.
  • On Windows, a default <install directory>/bin/mongod.cfg configuration file is included during the installation.
  • On macOS, a default /usr/local/etc/mongod.conf configuration file is included when installing from MongoDB’s official Homebrew tap.


storage.wiredTiger.engineConfig.cacheSizeGB

Our first configuration option to help with your MongoDB performance is storage.wiredTiger.engineConfig.cacheSizeGB.

storage:
   wiredTiger:
       engineConfig:
           cacheSizeGB: <value>

Since MongoDB 3.2, MongoDB has used WiredTiger as its default Storage Engine, so we’ll be examining MongoDB memory performance from a WiredTiger perspective. By default, MongoDB will reserve 50% of the available memory minus 1 GB for the WiredTiger cache, or 256 MB, whichever is greater. For example, a system with 16 GB of RAM would have a WiredTiger cache size of 7.5 GB.

( 0.5 * (16-1) )

The size of this cache is important to ensure WiredTiger is performant. It’s worth taking a look to see if you should alter it from the default. A good rule of thumb is that the size of the cache should be large enough to hold the entire application working set.

Note that if you’re in a containerized environment, you may also need to set the configuration option to 50% of the memory available to the container, minus 1 GB. MongoDB may adhere to the container’s memory limits or it may pick up the host’s memory limit, depending on how the system call is answered when MongoDB asks. You can verify what MongoDB believes the memory limit is by running:

db.hostInfo()

And checking the hostInfo.system.memLimitMB value. This is available from MongoDB 3.6.13 and MongoDB 4.0.9 onward.
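Applying the same 50% minus 1 GB rule to the container’s limit can be sketched as follows (memLimitMB being the value reported above; the helper name is ours):

```javascript
// Suggested WiredTiger cache size (GB) inside a container:
// 50% of (container memory limit - 1 GB), with a 256 MB floor.
// memLimitMB corresponds to db.hostInfo().system.memLimitMB.
function containerCacheSizeGB(memLimitMB) {
  const limitGB = memLimitMB / 1024;
  return Math.max(0.5 * (limitGB - 1), 0.25);
}

console.log(containerCacheSizeGB(4096)); // 1.5 GB for a 4 GB container
```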

How do we know whether to increase or decrease our cache size? Look at the cache usage statistics:

db.serverStatus().wiredTiger.cache
{
"application threads page read from disk to cache count" : 9,
"application threads page read from disk to cache time (usecs)" : 17555,
"application threads page write from cache to disk count" : 1820,
"application threads page write from cache to disk time (usecs)" : 1052322,
"bytes allocated for updates" : 20043,
"bytes belonging to page images in the cache" : 46742,
"bytes belonging to the history store table in the cache" : 173,
"bytes currently in the cache" : 73044,
"bytes dirty in the cache cumulative" : 38638327,
"bytes not belonging to page images in the cache" : 26302,
"bytes read into cache" : 43280,
"bytes written from cache" : 20517382,
"cache overflow score" : 0,
"checkpoint blocked page eviction" : 0,
"eviction calls to get a page" : 5973,
"eviction calls to get a page found queue empty" : 4973,
"eviction calls to get a page found queue empty after locking" : 20,
"eviction currently operating in aggressive mode" : 0,
"eviction empty score" : 0,
"eviction passes of a file" : 0,
"eviction server candidate queue empty when topping up" : 0,
"eviction server candidate queue not empty when topping up" : 0,
"eviction server evicting pages" : 0,
"eviction server slept, because we did not make progress with eviction" : 735,
"eviction server unable to reach eviction goal" : 0,
"eviction server waiting for a leaf page" : 2,
"eviction state" : 64,
"eviction walk target pages histogram - 0-9" : 0,
"eviction walk target pages histogram - 10-31" : 0,
"eviction walk target pages histogram - 128 and higher" : 0,
"eviction walk target pages histogram - 32-63" : 0,
"eviction walk target pages histogram - 64-128" : 0,
"eviction walk target strategy both clean and dirty pages" : 0,
"eviction walk target strategy only clean pages" : 0,
"eviction walk target strategy only dirty pages" : 0,
"eviction walks abandoned" : 0,
"eviction walks gave up because they restarted their walk twice" : 0,
"eviction walks gave up because they saw too many pages and found no candidates" : 0,
"eviction walks gave up because they saw too many pages and found too few candidates" : 0,
"eviction walks reached end of tree" : 0,
"eviction walks started from root of tree" : 0,
"eviction walks started from saved location in tree" : 0,
"eviction worker thread active" : 4,
"eviction worker thread created" : 0,
"eviction worker thread evicting pages" : 902,
"eviction worker thread removed" : 0,
"eviction worker thread stable number" : 0,
"files with active eviction walks" : 0,
"files with new eviction walks started" : 0,
"force re-tuning of eviction workers once in a while" : 0,
"forced eviction - history store pages failed to evict while session has history store cursor open" : 0,
"forced eviction - history store pages selected while session has history store cursor open" : 0,
"forced eviction - history store pages successfully evicted while session has history store cursor open" : 0,
"forced eviction - pages evicted that were clean count" : 0,
"forced eviction - pages evicted that were clean time (usecs)" : 0,
"forced eviction - pages evicted that were dirty count" : 0,
"forced eviction - pages evicted that were dirty time (usecs)" : 0,
"forced eviction - pages selected because of too many deleted items count" : 0,
"forced eviction - pages selected count" : 0,
"forced eviction - pages selected unable to be evicted count" : 0,
"forced eviction - pages selected unable to be evicted time" : 0,
"forced eviction - session returned rollback error while force evicting due to being oldest" : 0,
"hazard pointer blocked page eviction" : 0,
"hazard pointer check calls" : 902,
"hazard pointer check entries walked" : 25,
"hazard pointer maximum array length" : 1,
"history store key truncation calls that returned restart" : 0,
"history store key truncation due to mixed timestamps" : 0,
"history store key truncation due to the key being removed from the data page" : 0,
"history store score" : 0,
"history store table insert calls" : 0,
"history store table insert calls that returned restart" : 0,
"history store table max on-disk size" : 0,
"history store table on-disk size" : 0,
"history store table out-of-order resolved updates that lose their durable timestamp" : 0,
"history store table out-of-order updates that were fixed up by moving existing records" : 0,
"history store table out-of-order updates that were fixed up during insertion" : 0,
"history store table reads" : 0,
"history store table reads missed" : 0,
"history store table reads requiring squashed modifies" : 0,
"history store table remove calls due to key truncation" : 0,
"history store table writes requiring squashed modifies" : 0,
"in-memory page passed criteria to be split" : 0,
"in-memory page splits" : 0,
"internal pages evicted" : 0,
"internal pages queued for eviction" : 0,
"internal pages seen by eviction walk" : 0,
"internal pages seen by eviction walk that are already queued" : 0,
"internal pages split during eviction" : 0,
"leaf pages split during eviction" : 0,
"maximum bytes configured" : 8053063680,
"maximum page size at eviction" : 376,
"modified pages evicted" : 902,
"modified pages evicted by application threads" : 0,
"operations timed out waiting for space in cache" : 0,
"overflow pages read into cache" : 0,
"page split during eviction deepened the tree" : 0,
"page written requiring history store records" : 0,
"pages currently held in the cache" : 24,
"pages evicted by application threads" : 0,
"pages queued for eviction" : 0,
"pages queued for eviction post lru sorting" : 0,
"pages queued for urgent eviction" : 902,
"pages queued for urgent eviction during walk" : 0,
"pages read into cache" : 20,
"pages read into cache after truncate" : 902,
"pages read into cache after truncate in prepare state" : 0,
"pages requested from the cache" : 33134,
"pages seen by eviction walk" : 0,
"pages seen by eviction walk that are already queued" : 0,
"pages selected for eviction unable to be evicted" : 0,
"pages selected for eviction unable to be evicted as the parent page has overflow items" : 0,
"pages selected for eviction unable to be evicted because of active children on an internal page" : 0,
"pages selected for eviction unable to be evicted because of failure in reconciliation" : 0,
"pages walked for eviction" : 0,
"pages written from cache" : 1822,
"pages written requiring in-memory restoration" : 0,
"percentage overhead" : 8,
"tracked bytes belonging to internal pages in the cache" : 5136,
"tracked bytes belonging to leaf pages in the cache" : 67908,
"tracked dirty bytes in the cache" : 493,
"tracked dirty pages in the cache" : 1,
"unmodified pages evicted" : 0
}

There’s a lot of data here about WiredTiger’s cache, but we can focus on the following fields:

  • wiredTiger.cache.maximum bytes configured – The current maximum cache size.
  • wiredTiger.cache.bytes currently in the cache – The size of the data currently in the cache. This should not be greater than the maximum bytes configured.
  • wiredTiger.cache.tracked dirty bytes in the cache – The size of the dirty data in the cache. This should be less than five percent of your cache size value.
  • wiredTiger.cache.pages read into cache – The number of pages read into the cache.
  • wiredTiger.cache.pages written from cache – The number of pages written from the cache to disk.

Looking at the above values, we can determine if we need to increase the size of the WiredTiger cache for our instance. Additionally, we can look at the wiredTiger.cache.pages read into cache value for read-heavy applications. If this value is consistently high, increasing the cache size may improve overall read performance.

storage.wiredTiger.engineConfig.directoryForIndexes

Our second configuration option is storage.wiredTiger.engineConfig.directoryForIndexes.

storage:
   wiredTiger:
       engineConfig:
           directoryForIndexes: <true or false>

Setting this value to true creates two directories in your storage.dbPath directory: one named collection, which holds your collection data files, and one named index, which holds your index data files. This allows you to create separate storage volumes for collections and indexes if you wish, which can separate index I/O from collection I/O, spread disk I/O across the volumes, and reduce storage-based latencies, although index I/O is unlikely to be costly given its smaller size. With most modern storage options, you can get the same performance benefit by just striping your disk across two volumes (RAID 0).

storage.wiredTiger.collectionConfig.blockCompressor

Our third configuration option is storage.wiredTiger.collectionConfig.blockCompressor.

storage:
   wiredTiger:
       collectionConfig:
           blockCompressor: <value>

This option sets the compression options for all of your collection data.  Possible values for this parameter are none, snappy (the default), zlib, and zstd.  So how does compression help your performance?  The WiredTiger cache generally stores changes uncompressed, with the exception of some very large documents, and that uncompressed data must then be written to disk; this is where the block compressor comes in.

Compression Types:

Snappy compression is fairly straightforward: it gathers your data up to a maximum of 32KB, compresses it, and, if compression is successful, writes the block rounded up to the nearest 4KB.

Zlib compression works a little differently; it will gather more data and compress enough to fill a 32KB block on disk. This is more CPU-intensive but generally results in better compression ratios (independent of the inherent differences between snappy and zlib).

Zstd is a newer compression algorithm, developed by Facebook, that offers improvements over zlib (better compression ratios, lower CPU usage, and faster performance).

Which compression algorithm to choose depends greatly on your workload.  For most write-heavy workloads, snappy compression will perform better than zlib and zstd but will require more disk space.  For read-heavy workloads, zstd is often the best choice because of its better decompression rates.
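To get a feel for the block arithmetic described above, here is an illustrative Python sketch (not WiredTiger's actual code). The standard-library zlib stands in for snappy, which is not in the standard library; the 32KB gather size and 4KB allocation unit come from the description above:

```python
import zlib

BLOCK = 32 * 1024   # WiredTiger gathers up to 32KB of data per block
ALLOC = 4 * 1024    # ...and the written block is rounded up to the nearest 4KB

def on_disk_size(data: bytes) -> int:
    """Approximate the on-disk size of one block, per the scheme above.
    zlib stands in for snappy here purely for illustration."""
    block = data[:BLOCK]
    compressed = zlib.compress(block)
    # If compression doesn't help, the uncompressed block is kept.
    size = min(len(compressed), len(block))
    # Round up to the next 4KB allocation unit.
    return -(-size // ALLOC) * ALLOC

# Highly repetitive data compresses well: a full 32KB block of a single
# repeated byte fits in one 4KB allocation unit on disk.
print(on_disk_size(b"a" * BLOCK))   # 4096
```

The same arithmetic shows why incompressible data sees no benefit: a 32KB block of random bytes still occupies the full 32KB after the round-up.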

storage.directoryPerDB

Another configuration option to help with MongoDB performance is storage.directoryPerDB.

storage:
   directoryPerDB: <true or false>

Similar to the above configuration file option, storage.wiredTiger.engineConfig.directoryForIndexes, setting this value to true creates a separate directory in your storage.dbPath for each database in your MongoDB instance.  This allows you to create separate storage volumes for each database if you wish, which can spread the disk I/O across the volumes and help when you have multiple databases with intensive I/O needs. Additionally, if you use this parameter in tandem with storage.wiredTiger.engineConfig.directoryForIndexes, your directory structure will look like this:

-Database_name
    -collection
    -index

net.compression.compressors

Our final configuration option that can help keep your database performant is the net.compression.compressors configuration option.

net:
   compression:
       compressors: <value>

This option allows you to compress the network traffic between your mongos, mongod, and even your mongo shell. There are currently three types of compression available: snappy, zlib, and zstd.

Compression has been enabled by default since MongoDB 3.6. In MongoDB 3.6 and 4.0, snappy was the default; since MongoDB 4.2, the defaults are the snappy, zstd, and zlib compressors, in that order.  It’s also important to note that you must have at least one mutual compressor on each side of your network conversation for compression to happen.  For example, if your shell uses zlib compression but your mongod is set to only accept snappy compression, then no compression will occur between the two. If both accept zstd compression, then zstd compression will be used between them.

When compression is enabled, it can be very helpful in reducing replication lag and overall network latency, since the size of the data moving across the network is decreased, sometimes dramatically.  In cloud environments, setting this configuration option can also lead to decreased data transfer costs.
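As an illustration of that negotiation rule, here is a simplified Python model (a sketch, not actual driver code): the connection ends up using the first compressor in the client's ordered list that the server also supports, or no compression when the lists don't overlap:

```python
def negotiate_compressor(client, server):
    """Return the compressor a connection would use: the first entry in the
    client's ordered preference list that the server also supports, or
    None (no compression) when there is no mutual compressor."""
    server_set = set(server)
    return next((c for c in client if c in server_set), None)

# zlib-only shell vs snappy-only mongod: no mutual compressor, no compression.
print(negotiate_compressor(["zlib"], ["snappy"]))                    # None
# Both sides accept zstd, and it is first in the client's list.
print(negotiate_compressor(["zstd", "snappy"], ["snappy", "zstd"]))  # zstd
```

In practice, drivers and the shell typically opt in via the compressors connection-string option (for example, compressors=zstd,snappy,zlib).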

Summary:

In this blog post, we’ve gone over five MongoDB configuration options that can help you build more performant MongoDB deployments and avoid slowdowns and bottlenecks.  Thanks for reading!

Additional Resources: MongoDB Best Practices 2020 Edition

Jan
04
2021
--

Converting MongoDB to Percona Server for MongoDB Webinar Q&A

Converting MongoDB to Percona Server for MongoDB Webinar Q&A

We had great attendance, questions, and feedback from our “Converting MongoDB to Percona Server for MongoDB” webinar, which was recorded and can be viewed here. You can view another Q&A from a previous webinar on “Converting MongoDB to Percona Server for MongoDB” here. Without further ado, here are your questions and their responses.

 

Q: If migrating from MongoDB Enterprise Edition to Percona Server for MongoDB, can implementations that use LDAP or Kerberos be migrated using the replica set takeover method to avoid excessive downtime?

A: The intended design when using these two features is that Percona Server for MongoDB (PSMDB) remains a true drop-in replacement. This means no configuration changes will be necessary. This should be tested before go-live to ensure success. 

The above is valid for the Audit plugin as well, an enterprise-version feature that is included for free in Percona Server for MongoDB.

 

Q: Does the replica set takeover method also work for sharded implementations of MongoDB that use configdb replica sets?

A: Yes, it does, although there are a few more steps to consider as you do not want chunks migrating while you are upgrading. 

To convert a sharded cluster to Percona Server for MongoDB, you would:

  1. Disable the balancer (sh.stopBalancer())
  2. Convert the config servers (replica set takeover method)
  3. Upgrade the shards (replica set takeover method)
  4. Upgrade the mongos instances
  5. Re-enable the balancer (sh.startBalancer())

 

Q: Is it necessary to make any change on the driver/application side?

A: No. The drivers have the same compatibility for PSMDB.

 

Q: Does Percona Monitoring and Management (PMM) support alerting?

A: Yes, PMM supports alerting. This blog post discusses the new and upcoming PMM native alerting, this documentation shows how to configure Prometheus AlertManager integration, and this documentation shows how to utilize Grafana Alerts. 

 

Q: When is best to migrate from MySQL to MongoDB, and in what scenario would MongoDB be the best replacement?

A: This is a complicated question without a simple answer. It depends on the application workload, internal business directives, internal technology directives, and the availability of in-house expertise. In general, we recommend choosing the right tool for the job: MySQL was built for structured, relational, and transactional workloads, while MongoDB was built for an unstructured JSON document model without many transactions linking collections. While you technically can cross-pollinate both models between MySQL and MongoDB, we do not recommend doing so without a good reason.  This is the perfect scenario in which to engage Percona consulting for true expert input into the decision-making process.

Dec
07
2020
--

Running MongoDB on Amazon EKS Distro

MongoDB on AWS EKS-D

Last year AWS was about to ban the “multi-cloud” term in its co-branding guides for Partners, removed the ban after criticism from the community and partners, and now embraces a multi-cloud strategy.

One of the products that AWS announced during its last re:Invent was Amazon EKS Distro, a Kubernetes distribution based on and used by Amazon Elastic Kubernetes Service. It is interesting because it is the first step toward a new service, EKS Anywhere, which enables AWS customers to run EKS anywhere, even on bare metal or any other cloud, and later allows them to seamlessly migrate from on-prem EKS directly to AWS.

In this blog post, we will show how easy it is to spin up Amazon EKS Distro (EKS-D) and set up MongoDB with Percona Kubernetes Operator for Percona Server for MongoDB.

Let the Show Begin

Give Me the Cluster

I just spun up a brand new Ubuntu 20.10 virtual machine. You can spin it up anywhere; I myself use Multipass, which gives a command-line interface to launch Linux machines locally in seconds.

Installing EKS-D on Ubuntu is a one-command “effort”:

$ sudo snap install eks --classic --edge
Run configure hook of "eks" snap if present                                                                                                                                                                                                                                                     
eks (1.18/edge) v1.18.9 from Canonical✓ installed

EKS on Ubuntu gives the same look and feel as microk8s — it has its own command line (eks) and allows you to add/remove nodes easily if needed. Read more here.

Check if EKS is up and running:

# eks status
eks is running
high-availability: no
  datastore master nodes: 127.0.0.1:19001
  datastore standby nodes: none

eks kubectl gives you direct access to the regular Kubernetes API. Hint: you can get the configuration from eks and put it into the .kube folder to control EKS with kubectl (you may need to install it). I’m lazy and will continue using eks kubectl.

# mkdir ~/.kube/ ; eks config > ~/.kube/config
# eks kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-node-rrsbd                          1/1     Running   1          15m
calico-kube-controllers-555fc8cc5c-2ll8f   1/1     Running   0          15m
coredns-6788f546c9-x8q7l                   1/1     Running   0          15m
metrics-server-768748c8f4-qpxnp            1/1     Running   0          15m
hostpath-provisioner-66667bf7f-pfg8s       1/1     Running   0          15m

hostpath-provisioner is running, which means the host-path-based storage class needed for the database is already there.

Give Me the Database

As promised, we will use the Percona Kubernetes Operator for Percona Server for MongoDB to spin up the database. The process is the same as the one described in our minikube installation guide (as long as you run one node only).

Get the code from GitHub:

# git clone -b v1.5.0 https://github.com/percona/percona-server-mongodb-operator
# cd percona-server-mongodb-operator

Deploy the operator:

# eks kubectl apply -f deploy/bundle.yaml
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbs.psmdb.percona.com created
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbbackups.psmdb.percona.com created
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbrestores.psmdb.percona.com created
role.rbac.authorization.k8s.io/percona-server-mongodb-operator created
serviceaccount/percona-server-mongodb-operator created
rolebinding.rbac.authorization.k8s.io/service-account-percona-server-mongodb-operator created
deployment.apps/percona-server-mongodb-operator created

I have one node in my fancy EKS cluster, so in deploy/cr.yaml I will:

  • Change the number of nodes in the replica set to 1 (size: 1)

  • Remove the antiAffinity configuration

  • Set the allowUnsafeConfigurations flag to true. Setting this flag to true allows users to run unsafe configurations (like a 1-node MongoDB cluster); this is useful for development or testing purposes, but of course not recommended for production.

spec:
...
  allowUnsafeConfigurations: true
  replsets:
  - name: rs0
    size: 1
#    affinity:
#      antiAffinityTopologyKey: "kubernetes.io/hostname"

Now, give me the database:

# eks kubectl apply -f deploy/cr.yaml
1 minute later…
# eks kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
percona-server-mongodb-operator-6b5dbccbd5-jh9x8   1/1     Running   0          7m35s
my-cluster-name-rs0-0                              2/2     Running   0          77s

Simple as that!

Conclusion

Setting up Kubernetes locally can be easily done not only with EKS Distro, but with minikube, microk8s, k3s, and other distributions. The true value of EKS-D will be shown once EKS Anywhere goes live in 2021 and unlocks the multi-cloud Kubernetes. Percona has always been an open source company: we embrace, value, and heavily invest in multi-cloud ecosystems. Our Kubernetes Operators for Percona XtraDB Cluster and MongoDB enable businesses to run their data on Kubernetes on any public or private cloud without lock-in. We also provide full support for our operators and databases running on your Kubernetes cluster.

Dec
02
2020
--

Swag for Review: Special Winter Set for Percona MongoDB Products

Percona Swag MongoDB

We continue to give away exclusive Percona swag in exchange for your honest reviews of our products. Today we are glad to announce a special winter set for reviews of specific products.

Percona is developing three excellent products created for MongoDB database technology:

  1. Percona Server for MongoDB is a free, open-source drop-in replacement for MongoDB Community Edition but with enterprise-grade functionality.
  2. Percona Backup for MongoDB is a distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets.
  3. Percona Distribution for MongoDB is the only truly open-source solution powerful enough for enterprise applications. Percona Server for MongoDB and Percona Backup for MongoDB are key components of Percona Distribution for MongoDB.

If you work with MongoDB, you have probably tried some of these products, and now you have a great opportunity to get wonderful gifts from Percona.

All you need to do is take 10 minutes and write a review of any of our MongoDB products. We will send you the latest in Percona gear – 100% free and shipped to you anywhere in the world!

Get an awesome hoodie or hat from Percona for a review on G2.

  1. Percona Server for MongoDB
  2. Percona Backup for MongoDB

Leave a review on SourceForge and get a t-shirt or a metal mug.

  1. Percona Server for MongoDB
  2. Percona Backup for MongoDB

If you use all the products, you can leave reviews on all the platforms and get the full set. But please don’t miss your chance – the giveaway ends on December 31, 2020.

Please note that the feedback platforms moderate reviews, so avoid writing reviews that are too short, implausible, or overly promotional.

Any meaningful review (ie: not just a star rating) earns swag; whether it is positive, negative, or mixed. We believe in open source and learning from our users, so please write honestly about your experience using Percona software.

To claim your swag, email the Percona community team and include:

  1. The screenshot or link to your review
  2. Your postal address
  3. Your phone number (for delivery use only, never for marketing)
  4. The swag you’ve chosen
  5. If you have chosen a t-shirt or hoodie, please also let us know the color (grey, black, or blue) and your size.

It’s that simple!
