Mar 10, 2023

Capped Collection in MongoDB

In this blog post, we will discuss capped collections in MongoDB. A capped collection is a fixed-size collection that inserts documents in a circular fashion: once the allocated space is full, the oldest data is overwritten to make room for new documents. For example, if we define a capped collection with a size of 1GB, MongoDB will purge the oldest documents whenever the collection is full and a new document needs to be inserted.
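
This circular overwrite behavior can be sketched with a fixed-length ring buffer. Python's collections.deque with maxlen (used here purely as an illustration, not MongoDB code) evicts the oldest entry when a new one arrives, much like a capped collection bounded by document count:

```python
from collections import deque

# A deque with maxlen behaves like a capped collection limited by
# document count: inserting into a full buffer silently drops the
# oldest entry, preserving insertion order for the rest.
capped = deque(maxlen=3)

for doc in ["log1", "log2", "log3", "log4"]:
    capped.append(doc)  # "log1" is evicted when "log4" arrives

print(list(capped))  # ['log2', 'log3', 'log4']
```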

A capped collection guarantees that it maintains the documents' insertion order, so it doesn't require an extra index to retrieve documents in that order; this helps a capped collection sustain high insertion throughput. Each document in a capped collection contains the _id field, which is indexed by default, and deletion happens starting from the document with the oldest _id. MongoDB automatically increases the provided size of a capped collection to make it an integer multiple of 256. Avoid updating documents in a capped collection; if you must update, the update cannot increase the original size of the document, and it is recommended to keep updates light, because without an index each update scans the whole collection. Create an index to avoid the collection scan.
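
The size rounding mentioned above can be expressed as a one-line formula. This is a sketch of the documented rounding behavior only; MongoDB also enforces a minimum allocation for very small sizes, which this illustration ignores:

```python
def capped_size(requested_bytes: int) -> int:
    """Round a requested capped-collection size up to the next
    integer multiple of 256 bytes, as MongoDB does."""
    return -(-requested_bytes // 256) * 256  # ceiling division

print(capped_size(500000))  # 500224: the 500000-byte request is raised
print(capped_size(512))     # 512: already a multiple of 256, unchanged
```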

A capped collection can be created as below:

db.createCollection( "logs", { capped: true, size: 500000 } );  // size is in bytes.

We can also specify the maximum number of docs in the capped collection.

rs1:PRIMARY> db.createCollection( "logs", { capped: true, size: 500000, max: 500 } );  // The size parameter is always required, even when max is defined.
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1676896611, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1676896611, 1)
}
rs1:PRIMARY>

Convert an existing collection to a capped collection:

rs1:PRIMARY> db.runCommand({ "convertToCapped" : "log_old", size: 500000, max : 50 })
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1676896802, 3),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1676896802, 3)
}

Verify if it’s a capped collection:

rs1:PRIMARY> db.log_old.isCapped()
true
rs1:PRIMARY>

Query a capped collection:

MongoDB guarantees that documents are retrieved in the same order in which they were inserted.

db.log_old.find().sort( { $natural: 1 } )

To return the documents in reverse insertion order, use the sort() method with the $natural parameter set to -1.

db.log_old.find().sort( { $natural: -1 } )

Change a capped collection size:

From MongoDB v6.0 onwards, capped collection size can be resized. However, before resizing the capped collection, ensure featureCompatibilityVersion is set to at least “6.0”.

db.runCommand( { collMod: "log", cappedSize: 100000 } )  // cappedSize is in bytes and must be between 0 and 1 PB.

If you try to resize a capped collection on a version older than v6.0, or without featureCompatibilityVersion set to “6.0”, it will fail with the error “unknown option to collMod: cappedSize”:

rs1:PRIMARY> db.version()
4.4.16-16
rs1:PRIMARY> db.runCommand( { collMod: "logs", cappedSize: 100000 } )
{
"operationTime" : Timestamp(1678096200, 1),
"ok" : 0,
"errmsg" : "unknown option to collMod: cappedSize",
"code" : 72,
"codeName" : "InvalidOptions",
"$clusterTime" : {
"clusterTime" : Timestamp(1678096200, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs1:PRIMARY>

Advantages:

  1. High insertion throughput: a capped collection keeps documents in insertion order and avoids index maintenance overhead on writes, which supports a high rate of inserts.
  2. Capped collections are useful for storing log information, as they keep the data ordered by events.

Disadvantages:
There are some restrictions on capped collections.

  1. A capped collection can’t be sharded.
  2. A capped collection can’t have TTL indexes.

Summary

A capped collection can be useful for storing log information, as writing to it approaches the speed of writing log data directly to a file system, without index overhead. MongoDB itself uses a capped collection for replication: the oplog.rs collection relies on this storage mechanism for its solid performance. The oplog.rs collection is a special capped collection in which we cannot create an index, insert a document, or drop the collection. Capped collections have both advantages and disadvantages, so before using one, make sure you understand your application's requirements and decide accordingly.

We also encourage you to try our products for MongoDB, like Percona Server for MongoDB, Percona Backup for MongoDB, and Percona Operator for MongoDB. We also recommend checking out our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

Aug 26, 2020

Percona Announces Updates to MongoDB Solutions

On August 26, 2020, Percona announced the latest release of Percona Distribution for MongoDB, which includes new releases of Percona Backup for MongoDB 1.3, now with Point-in-Time Recovery, and Percona Server for MongoDB 4.4. These new releases include several key features:

Percona Backup for MongoDB

Point in Time Recovery

Percona Backup for MongoDB now provides Point-in-Time Recovery (PITR). With PITR, an administrator can recover the entire replica set to a specific timestamp. Along with taking backup snapshots, users can enable incremental backup, which captures the oplog 24/7 for each replica set (including those in clusters) to the same storage used for backup snapshots.

This is especially important in cases of data corruption. PITR enables you to reset the system to a time before the offending incident occurred, thus restoring the environment to a healthy state. For example, if you drop an important collection at 2020-08-24T11:05:30, you can restore the environment to 2020-08-24T11:05:29 to recover as much data as possible before the damage occurred.
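
The restore-point arithmetic from the example above can be sketched in Python. This is illustrative only; an actual PITR restore is driven by Percona Backup for MongoDB, not by this snippet:

```python
from datetime import datetime, timedelta

# Incident: an important collection was dropped at this moment.
incident = datetime.fromisoformat("2020-08-24T11:05:30")

# Restore to one second before the incident, keeping as much data
# as possible while excluding the damaging operation itself.
restore_point = incident - timedelta(seconds=1)

print(restore_point.isoformat())  # 2020-08-24T11:05:29
```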

Percona Server for MongoDB

  • Refine Shard Keys
    With the ability to extend your shard keys, you are no longer stuck with a potentially bad decision made early in your implementation. By extending the shard key, you enable data restructuring with no downtime.
  • Hashed Compound Shard Keys
Hashed shard keys provide simple, even distribution across shards; this is now available for compound shard keys as well as single-field ones.
  • Mirrored Reads
    This option pre-warms the cache of secondary replicas to reduce the impact of primary elections following an outage or after planned maintenance.
  • Hedged Reads
To improve tail latency guarantees during times of heavy load, you can send reads to two replica set members at once and use the faster response. This is only available in a cluster, with a read preference other than the default of “primary”.
  • New Aggregation Operators
    Providing a better way to do MapReduce.

For more details on what’s new in the community MongoDB 4.4 release, please review our expert’s blog.

To learn how Percona Backup for MongoDB can benefit your business, please visit our website.

To learn more about the latest release of Percona Distribution for MongoDB, check out the release notes.

Download Percona Distribution for MongoDB

Aug 10, 2020

Securing MongoDB: Top Five Security Concerns

“I think most of the time hackers behind the attacks do it just for fun, because they can and because it’s very simple,” says Diachenko. Source: “Meowing” attack completely destroyed more than 1000 databases

These are the words of Bob Diachenko, one of the most respected cybersecurity researchers, in relation to the recent “Meowing” attack, which destroyed between 1,000 and 4,000 databases around the world.

Whether it’s for fun or for money, this is not the first reported attack, and it won’t be the last of 2020. With an increasing number of databases in the cloud, the question is no longer if it will happen again, but when.

Because of its NoSQL origin and document architecture design, MongoDB is much more flexible and scalable than SQL databases such as MySQL. As a result, typically much more data is stored in MongoDB than in traditional SQL databases. MongoDB databases commonly exceed a terabyte of data. The large amount of data exposed in a single database makes breaches involving MongoDB much more devastating.

The good news is that most of the actions you should take to avoid this are simple to execute. If you are using the open-source Percona Distribution for MongoDB, you also have extra features such as LDAP authentication support, which is present in the Enterprise and Atlas MongoDB versions.

What does MongoDB offer to mitigate security threats? Let’s explore the top five measures that we can take when securing MongoDB.

Authentication in MongoDB

Most breaches involving MongoDB occur because of a deadly combination of authentication disabled and MongoDB opened to the internet. MongoDB provides support for authentication on a per-database level; users exist in the context of a single logical database. However, MongoDB on its own does not support features like password complexity requirements, age-based rotation, or centralized identification of user roles versus service accounts.

Thankfully, LDAP can fill many of these gaps. Many connectors allow the use of Windows Active Directory (AD) systems to talk with LDAP.

LDAP support is available in MongoDB Enterprise, but not in MongoDB Community Edition. However, it is available in other open-source versions of MongoDB, such as Percona Server for MongoDB.

Note: LDAP in the Percona Distribution for MongoDB will work with MongoDB Compass.

In the following links, we describe how to use LDAP:

Percona Server for MongoDB LDAP Enhancements: User-to-DN Mapping

Percona Server for MongoDB Authentication Using Active Directory

Authorization in MongoDB

Enabling access control on a MongoDB deployment enforces authentication, requiring users to identify themselves. When accessing a MongoDB deployment that has access control enabled, users can only perform the actions determined by their roles. Replica sets and sharded clusters also require internal authentication between members when access control is enabled. It is essential to follow the principle of least privilege. No one should have more permissions than they need to do their job, and even a DBA should log in with a non-elevated account.

MongoDB grants access to data and commands through role-based authorization and includes built-in roles that provide the different levels of access commonly needed in a database system. Additionally, it is possible to create user-defined roles.

To create a role in MongoDB and add it to a user:

db.createRole({
  role: "write_foo2_Collection",
  privileges: [
    { resource: { db: "percona", collection: "foo2" }, actions: ["insert", "remove"] }
  ],
  roles: ["read"]
})

db.updateUser("client_read", { roles: ["write_foo2_Collection"] })
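
For driver-based automation, the same role can be created by sending the createRole database command. The sketch below only builds the command document so its structure can be inspected without a server; the commented lines show how a driver such as PyMongo would submit it (connection string is a placeholder):

```python
# Build the createRole command document that a driver would send.
# Constructing the document requires no running MongoDB deployment.
create_role_cmd = {
    "createRole": "write_foo2_Collection",
    "privileges": [
        {
            "resource": {"db": "percona", "collection": "foo2"},
            "actions": ["insert", "remove"],
        }
    ],
    "roles": ["read"],
}

# With a live connection this would be submitted as, for example:
#   from pymongo import MongoClient
#   MongoClient("mongodb://localhost:27017").percona.command(create_role_cmd)
print(sorted(create_role_cmd))  # ['createRole', 'privileges', 'roles']
```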

Transport Encryption in MongoDB

MongoDB has support for using transport encryption between the client and the nodes, and between the nodes in the cluster. Encrypting traffic ensures that no one can “sniff” sensitive data on the network. For example, tools like Wireshark or Tcpdump can easily capture unencrypted sensitive data such as usernames and passwords.

MongoDB supports X.509 certificate authentication for use with a secure TLS/SSL connection. Members can use X.509 certificates to verify their membership in the replica set and shards.

It is necessary to create certificates on all nodes and have a certificate authority (CA) that signs them. As using a CA can be costly, it is also possible to use self-signed certificates. Using a public CA is not necessary inside a private infrastructure.

Here are the detailed steps to create the certificates and configure Mongo:

MongoDB: Deploy a Replica Set with Transport Encryption

Data Encryption in MongoDB

One of the most severe problems with MongoDB was that data files didn’t have encryption at rest. Since version 3.6.8, Percona Server for MongoDB has offered at rest encryption for the MongoDB Community Edition. In upstream MongoDB software, data encryption at rest is available in MongoDB Enterprise version only.

The example below shows how to activate WiredTiger encryption for data at rest in Percona Server for MongoDB. First, it is necessary to edit the encryption options in mongod.conf:

# Encryption variables in mongod.conf shell
[root@app ~]# grep security -A2 /etc/mongod.conf
security:
  enableEncryption: true
  encryptionKeyFile: /data/key/mongodb.key

By default, Percona Server for MongoDB uses the AES256-CBC cipher mode. It is necessary to create the key with OpenSSL as below:

# Create Encryption KeyShell
[root@app ~]# mkdir /data/key
[root@app ~]# openssl rand -base64 32 > /data/key/mongodb.key
[root@app ~]# chmod 600 /data/key/mongodb.key
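
If OpenSSL is not available, an equivalent 32-byte base64-encoded key can be generated with Python's standard library. This is an alternative sketch; the output format matches what `openssl rand -base64 32` produces:

```python
import base64
import os

# Generate 32 cryptographically random bytes and base64-encode them,
# matching the output format of `openssl rand -base64 32`.
key = base64.b64encode(os.urandom(32)).decode("ascii")

# 32 raw bytes always encode to 44 base64 characters.
print(len(key))  # 44
```

Write the resulting string to /data/key/mongodb.key and restrict its permissions with chmod 600, as above.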

Now start Percona Server for MongoDB:

[root@app ~]# systemctl start mongod

To check whether encryption is successfully enabled in the database, use the command below:

# Security outputShell
mongo > db.serverCmdLineOpts().parsed.security
{ "enableEncryption" : true, "encryptionKeyFile" : "/data/key/mongodb.key" }

Note: for existing nodes, a new initial sync is necessary to encrypt the data.

Auditing in MongoDB

Auditing is not designed to mitigate a security threat but helps when investigating unauthorized access or tracking data access and modification. The general database auditing concept is about tracking the use of database records and authority. When we audit a database, each operation on the data can be monitored and logged to an audit trail. This includes information about which database object or data record was touched, which account performed the action, and when the activity occurred.

MongoDB Atlas offers audit logging natively, as do MongoDB Enterprise Edition and Percona Server for MongoDB. To enable the audit log in Percona Server for MongoDB, add these options on the command line:

mongod --dbpath /var/lib/mongodb --auditDestination file --auditFormat BSON --auditPath /var/lib/mongodb/auditLog.bson

Or in the MongoDB configuration file:

auditLog:
   destination: file
   format: BSON
   path: /var/lib/mongodb/auditLog.bson

Conclusion

It is vital that you secure your database before the deployment phase. The five measures mentioned above can be automated using tools like Ansible or Puppet, and they are a good start toward securing your database. A MongoDB database with these five security measures in place would not have been affected by the Meowing attack.

For further information on database security, please watch our recent webinar from Akira Kurogane, MongoDB Lead, Percona – On-Demand Webinar: Securing MongoDB.

You can also access additional expert guidance and thought-leadership via our website.


Learn more about the history of Oracle, the growth of MongoDB, and what really qualifies software as open source. If you are a DBA, or an executive looking to adopt or renew with MongoDB, this is a must-read!

Download “Is MongoDB the New Oracle?”
