In this blog post, we will discuss capped collections in MongoDB. A capped collection is a fixed-size collection that inserts documents in a circular fashion: once the allocated space is full, the data at the beginning is overwritten. For example, if we define a capped collection of size 1GB and it is full when a new document has to be inserted, MongoDB purges the oldest documents to make room.
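To make the circular behavior concrete, here is a toy in-memory model. It is only a sketch, not MongoDB code: the class name CappedSim is hypothetical, and the document size is approximated by the length of its JSON text, whereas MongoDB measures the BSON size.

```javascript
// Toy in-memory model of a capped collection (illustration only, not MongoDB code).
class CappedSim {
  constructor(sizeBytes) {
    this.sizeBytes = sizeBytes; // total byte budget of the collection
    this.usedBytes = 0;         // bytes currently in use
    this.docs = [];             // entries kept in insertion order, oldest first
  }

  insert(doc) {
    // Assumption: approximate the document size by its JSON length.
    const bytes = JSON.stringify(doc).length;
    if (bytes > this.sizeBytes) {
      throw new Error("document is larger than the whole collection");
    }
    // Purge the oldest documents until the new one fits.
    while (this.usedBytes + bytes > this.sizeBytes) {
      const oldest = this.docs.shift();
      this.usedBytes -= oldest.bytes;
    }
    this.docs.push({ doc, bytes });
    this.usedBytes += bytes;
  }

  // Return the documents in insertion order.
  find() {
    return this.docs.map((entry) => entry.doc);
  }
}
```

With a budget of 21 and documents of the form { n: 1 } (7 characters each as JSON), only three documents fit, so inserting n = 1 through 5 leaves n = 3, 4, and 5 in the model.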
A capped collection guarantees that it preserves the insertion order of its documents. Because of this, it doesn’t need an extra index to return documents in insertion order, which helps it sustain high insertion throughput. Each document in a capped collection contains the _id field, which is indexed by default, and the oldest documents (in insertion order) are deleted first. MongoDB automatically rounds the size you provide up to an integer multiple of 256. You should avoid updating documents in a capped collection. If you do have to update, you can, but only if the update doesn’t increase the original size of the document, and it is recommended to be light on updates because, without a suitable index, each update scans the whole collection. Create an index to avoid the collection scan.
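The rounding rule mentioned above is simple arithmetic. The helper below is a sketch of that calculation only; the function name roundUpTo256 is hypothetical and is not part of any MongoDB API.

```javascript
// Round a requested capped collection size up to the next multiple of 256.
function roundUpTo256(sizeBytes) {
  if (!Number.isInteger(sizeBytes) || sizeBytes <= 0) {
    throw new RangeError("sizeBytes must be a positive integer");
  }
  return Math.ceil(sizeBytes / 256) * 256;
}
```

For the size used in this post, roundUpTo256(500000) gives 500224, while a size that is already a multiple of 256, such as 512, is returned unchanged.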
A capped collection can be created as below:
db.createCollection( "logs", { capped: true, size: 500000 } ); // size is in bytes.
We can also specify the maximum number of docs in the capped collection.
rs1:PRIMARY> db.createCollection( "logs", { capped: true, size: 500000, max: 500 } ); // Size parameter is always needed even if we define the max document number.
{
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1676896611, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1676896611, 1)
}
rs1:PRIMARY>
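Since size is mandatory and max is optional, a small wrapper can enforce that rule before calling the shell. This is a sketch under assumptions: the function name createCappedIfMissing is hypothetical, and db is expected to be the shell's database object, of which only getCollectionNames() and createCollection() are used.

```javascript
// Create a capped collection only if no collection with that name exists yet.
// `db` is assumed to be the mongo shell database object.
function createCappedIfMissing(db, name, sizeBytes, maxDocs) {
  // size is always required, even when a maximum document count is given.
  if (!Number.isInteger(sizeBytes) || sizeBytes <= 0) {
    throw new RangeError("size (in bytes) is required for a capped collection");
  }
  if (db.getCollectionNames().includes(name)) {
    return false; // leave the existing collection untouched
  }
  const options = { capped: true, size: sizeBytes };
  if (maxDocs !== undefined) {
    options.max = maxDocs; // optional cap on the number of documents
  }
  db.createCollection(name, options);
  return true;
}

// Possible use in the shell:
// createCappedIfMissing(db, "logs", 500000, 500);
```

Returning false for an existing collection is a design choice for this sketch; it does not check whether that existing collection is capped.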
Convert a collection to a capped collection:
rs1:PRIMARY> db.runCommand({ "convertToCapped" : "log_old", size: 500000, max : 50 })
{
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1676896802, 3),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1676896802, 3)
}
Verify if it’s a capped collection:
rs1:PRIMARY> db.log_old.isCapped()
true
rs1:PRIMARY>
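The check and the conversion can be combined so that the conversion runs only when needed. The following is a minimal sketch: the function name convertIfNotCapped and the shape of its return value are hypothetical, while getCollection(), isCapped(), and the convertToCapped command are the shell calls shown above.

```javascript
// Convert a collection to capped only if it is not capped already.
// `db` is assumed to be the mongo shell database object.
function convertIfNotCapped(db, name, sizeBytes) {
  if (db.getCollection(name).isCapped()) {
    return { ok: 1, converted: false }; // nothing to do
  }
  const result = db.runCommand({ convertToCapped: name, size: sizeBytes });
  return { ok: result.ok, converted: result.ok === 1 };
}

// Possible use in the shell:
// convertIfNotCapped(db, "log_old", 500000);
```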
Query a capped collection:
MongoDB guarantees that documents are retrieved in the same order in which they were inserted.
db.log_old.find().sort( { $natural: 1 } )
To return the docs in reverse insertion order, use the sort method with the $natural parameter set to -1.
db.log_old.find().sort( { $natural: -1 } )
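A common use of the reverse natural order is reading the most recent entries of a log. The helper below is a sketch assuming a shell collection object such as db.log_old; the function name lastEntries is hypothetical.

```javascript
// Return the n most recently inserted documents, newest first.
// `collection` is assumed to be a mongo shell collection object.
function lastEntries(collection, n) {
  return collection.find().sort({ $natural: -1 }).limit(n).toArray();
}

// Possible use in the shell:
// lastEntries(db.log_old, 10);
```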
Change a capped collection size:
From MongoDB v6.0 onwards, a capped collection can be resized. However, before resizing the capped collection, ensure that featureCompatibilityVersion is set to at least “6.0”.
db.runCommand( { collMod: "log", cappedSize: 100000 } ) // cappedSize must be greater than 0 and at most 1PB.
If you try to resize a capped collection in a version older than v6.0 (without setting featureCompatibilityVersion to 6.0), it will fail with the error “unknown option to collMod: cappedSize”:
rs1:PRIMARY> db.version()
4.4.16-16
rs1:PRIMARY> db.runCommand( { collMod: "logs", cappedSize: 100000 } )
{
    "operationTime" : Timestamp(1678096200, 1),
    "ok" : 0,
    "errmsg" : "unknown option to collMod: cappedSize",
    "code" : 72,
    "codeName" : "InvalidOptions",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1678096200, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
rs1:PRIMARY>
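To avoid running into this error, the featureCompatibilityVersion can be checked before attempting the resize. The sketch below assumes that the getParameter admin command returns a document of the form { featureCompatibilityVersion: { version: "6.0" }, ok: 1 }; the function name resizeCapped and the shape of the "skipped" result are hypothetical.

```javascript
// Resize a capped collection only when featureCompatibilityVersion is 6.0 or later.
// `db` is assumed to be the mongo shell database object.
function resizeCapped(db, name, newSizeBytes) {
  const reply = db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 });
  const fcv = reply.featureCompatibilityVersion.version; // e.g. "4.4" or "6.0"
  const major = parseInt(fcv.split(".")[0], 10);
  if (major < 6) {
    // Skip the collMod call instead of letting it fail with InvalidOptions.
    return { ok: 0, skipped: true, reason: "featureCompatibilityVersion is " + fcv };
  }
  return db.runCommand({ collMod: name, cappedSize: newSizeBytes });
}

// Possible use in the shell:
// resizeCapped(db, "logs", 100000);
```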
Advantages:
- Supports high insertion throughput: a capped collection keeps data in insertion order and avoids the overhead of extra indexes, so it can sustain high insertion throughput.
- A capped collection is useful for storing log information, as it keeps the data ordered by event.
Disadvantages:
Capped collections come with some restrictions:
- A capped collection can’t be sharded.
- A capped collection can’t have TTL indexes.
Summary
A capped collection can be useful for storing log information, as its speed is close to that of writing log information directly to a file system, without the index overhead. MongoDB itself uses a capped collection, oplog.rs, as the storage mechanism for the replication oplog because of its solid performance. The oplog.rs collection is a special capped collection in which we cannot create an index, insert a document, or drop the collection. Capped collections have advantages and disadvantages, so before using one, make sure you understand the application’s requirements and decide accordingly.
We also encourage you to try our products for MongoDB, like Percona Server for MongoDB, Percona Backup for MongoDB, and Percona Operator for MongoDB. We also recommend checking out our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?