MongoDB and non-existent collections

MongoDB and non-existent collections

MongoDB and non-existent collectionsIn this blog, I will discuss how I found some of my basic SQL assumptions that don’t hold true when dealing with MongoDB and non-existent collections.

Coming from a MySQL background, I have some assumptions about databases that don’t apply to MongoDB (or other kinds of databases that are neither SQL-based nor relationally-inspired).

An example of this is the assumption that data is organized in rows that are part of a table, with all tables having a strict schema (i.e., a single type of row). When working with MongoDB, this assumption must be transformed into the idea that data is organized in documents that are part of a collection and have a flexible schema so that different types of documents can reside in the same collection.

That’s an easy adjustment to make because a dynamic schema is one of the defining features of MongoDB. There are other less-obvious assumptions that need to be adjusted or redefined as you get familiar with a new product like MongoDB (for example, MySQL does not currently support built-in sharding, while MongoDB does).

There is a more fundamental kind of assumption, and by “fundamental” I mean an assumption that is deeply ingrained because you rely on it so often it’s automatic (i.e., unconscious). We’re usually hit by these when changing programming languages, especially in dynamic ones (“Will I be able to add a number to a string? If so, how will it behave?”). These can make it hard to adjust to a new database (or programming language, operating system, etc.) because we don’t consciously think about them and so we may forget to verify if they hold in the new system. This can happen in “real” life too: try going to a country where cars drive on the other side of the road from yours!

While working on a MongoDB benchmark recently, I was hit by one of these assumptions. I thought sharing my mistake may help others who are also coming to MongoDB from an SQL background.

One of my computing assumptions can be summarized as “reading from a non-existent source will fail with an error.”

Sure enough, it seems to be true for my operating system:

telecaster:~ fernandoipar$ cat notexists
cat: notexists: No such file or directory

And for MySQL:

mysql> select 1 from notexists;
ERROR 1146 (42S02): Table 'test.notexists' doesn't exist

But what happens in MongoDB?

> db.notexists.find()
> db.notexists.find().count()

No errors. You get no results. Interestingly, you can even count the number of documents in a cursor that is associated with a non-existent collection.

As I said, I hit this while working on a benchmark. How? I was comparing the throughput for different engines and various configurations, and after preparing the graphs they all showed the same behavior. While it’s not impossible for this to be accurate, it was very unlikely given what I was trying to measure. Some investigation led me to discover a mistake in the preparation phase of my benchmarks. To save time and to use the same data baseline for all tests, I was only running sysbench prepare once, backing up the data directory for each engine, and then restoring this backup before each experiment. The error was that I was restoring one subdirectory below MongoDB’s expected dbpath (i.e., to /data/db/db instead of /data/db), and so my scripts were reading from non-existent collections.

On a MySQL experiment, this would have immediately blown up in my face; with MongoDB, that is not the case.

On reflection, this behavior makes sense for MongoDB in that it is consistent with the write behavior. You don’t need to create a new collection, or even a new database. It’s enough that you write a document to it, and it gets created for you. If writing to a non-existent collection produces no errors, reading from one shouldn’t either.

Still, sometimes an application needs to know if a collection exists. How can you do this? There are multiple ways to do this, and I think the best approach is to verify their existence during the application initialization stage. Here are a couple of examples:

> db.notexists.stats()
{ "ok" : 0, "errmsg" : "Collection [test.notexists] not found." }
> db.getCollectionNames().indexOf("notexists") >= 0

I hope you find this useful, and if you’ve been hit by similar problems (such as MongoDB and non-existent collections) due to broken assumptions when moving to MongoDB with an SQL background, I’d love to read about them in the comments!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com