I've come to believe that MongoDB is a poor choice for a general-purpose application database.
Mongo certainly has its valid use cases, and holds up under a number of workloads that are difficult for SQL to keep up with. However, as the persistence layer for the majority of applications, I don't think it's a good choice. Here's why.
6 Reasons Why I Think Mongo is a Poor Choice
1. All data in most applications is relational.
Users belong to teams; users create entities; entities have (sometimes complex) relationships to one another.
Yes, you can use references between documents to create these relationships, but then you'll need to figure out querying.
Yes, I know you can model things in hierarchical/nested fashion - but not everything can actually be nested. Also, once you get 2 arrays deep in a document, update operators get really difficult to maintain or are unsupported.
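As an illustration of that nested-array pain, here's a sketch of an update that targets an element two arrays deep. The collection and field names (orders, items, sku, quantity) are hypothetical; the point is that the positional `$` operator only reaches one level, so anything deeper forces you into arrayFilters:

```javascript
// Hypothetical shape: a document with orders[] -> items[] -> quantity.
// The positional "$" operator only reaches one array level, so nested
// arrays require filtered positional operators ($[o], $[i]) plus
// matching arrayFilters entries.
const filter = { "orders.items.sku": "ABC-123" };
const update = {
  $set: { "orders.$[o].items.$[i].quantity": 5 },
};
const options = {
  arrayFilters: [{ "o.items.sku": "ABC-123" }, { "i.sku": "ABC-123" }],
};

// db.collection("users").updateOne(filter, update, options)
```

Every additional level of nesting adds another filtered positional operator and another arrayFilters entry to keep in sync.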
Yes, I know Mongo recommends denormalization - but in most applications data is frequently updated and denormalization makes this a giant pain that causes bugs like "it's updated here but not there". (And, you have to implement all those updates in your application tier.)
Mongo only supports SQL join-like queries in aggregation pipelines (see $lookup), which can read but not perform updates - so if you need to update on the basis of a relationship, that's more logic for your application tier.
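For reference, a $lookup stage joining users to their teams might look like the sketch below. The collection names ("users", "teams") and the teamId field are hypothetical:

```javascript
// A hypothetical read-only "join" of users to their teams via $lookup.
const pipeline = [
  {
    $lookup: {
      from: "teams",          // collection to join against
      localField: "teamId",   // field on the user document
      foreignField: "_id",    // field on the team document
      as: "team",             // output: array of matched team docs
    },
  },
  { $unwind: "$team" }, // flatten the one-element array into an object
];

// db.collection("users").aggregate(pipeline) runs this as a read;
// there is no stage that performs an update across the joined data.
```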
2. There's always a schema.
If you're running business logic against data stored in a DB, there's a schema.
If it's not enforced by the database, then it's enforced explicitly in the application tier or the API layer, or through a schema enforcement mechanism like Mongoose or a JSON Schema implementation.
Or, you suffer a tremendous number of runtime errors, because the application code has an implicit contract with the data structure.
If you've rolled your own schema enforcement or suffer from lots of runtime errors, you're doing more work that your DB should be doing for you.
As of Mongo 3.6, you can use a JSON Schema-like definition ($jsonSchema) to enforce document schemas on a collection; this should be getting more attention and have better tooling.
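As a sketch of what that looks like (the "users" collection and its fields here are hypothetical), the validator is attached when the collection is created:

```javascript
// A hypothetical $jsonSchema validator for a "users" collection.
// Writes that violate it are rejected by the server.
const validator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["email", "createdAt"],
    properties: {
      email: { bsonType: "string", description: "must be a string" },
      createdAt: { bsonType: "date" },
      age: { bsonType: "int", minimum: 0 },
    },
  },
};

// db.createCollection("users", { validator }) enforces this server-side.
```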
3. ObjectIds are hard to work with.
The default document ID is a BSON ObjectId. There's some complicated logic behind how these work, but we mostly know them as 24-character hex strings.
I have more than a couple of issues with BSON IDs.
First, there's an order, but it's not visible to the naked eye. Incrementing integers have a clear sequence that's easy to spot.
Second, programmatic comparison (at least in Node.js) is a huge pain. BSON IDs are not strings but a different type entirely; yet they serialize to strings and often go over the network as strings - which means you need to check that you're comparing strings to strings, or objects to objects, in order to do equality matching.
Third, BSON IDs are difficult to remember, read, and generally interact with. And if you need an incrementing integer ID, you'll have to implement one yourself.
All of this leads to a bunch of mental friction that you don't have with integer IDs.
The only handy thing I've found is that you can assign the BSON ID for a document prior to actually creating it in the DB, and have pretty strong guarantees that it will be unique. You can even do this from the client, if your API is loose about that.
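The comparison pain above can be blunted by normalizing both sides to strings before comparing. This is a sketch, not library code: idsEqual is a hypothetical helper name (the real ObjectId class also has an .equals() method), and the stand-in object below only mimics an ObjectId's string serialization:

```javascript
// Normalize an id (ObjectId instance or hex string) to a string
// before comparing, so mixed types still match.
function idsEqual(a, b) {
  return String(a) === String(b);
}

// A minimal stand-in that serializes the way a real ObjectId would.
const objectIdLike = {
  toString() {
    return "507f1f77bcf86cd799439011";
  },
};

idsEqual(objectIdLike, "507f1f77bcf86cd799439011"); // true: types differ, strings match
```

Strict equality (`===`) between the object and the string would be false, which is exactly the bug this pattern guards against.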
4. Migrations are still a thing.
Because there's still a schema if you're running business logic against Mongo documents, you'll need to migrate it as you add and remove document properties.
The suggested approach is to read any format but write the updated format. This sounds nice in theory, but in practice I haven't ever seen a clean implementation that isn't a maintenance mess.
Meanwhile, tooling around migrations is pretty minimal. I ended up writing my own migration framework.
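The "read any format, write the updated format" approach tends to accrete version branches like the sketch below. The document shape is hypothetical: a user document whose v1 "name" field was split into firstName/lastName in v2, tracked by a schemaVersion field:

```javascript
// Upgrade any historical document shape to the current one on read.
// schemaVersion, name, firstName, and lastName are hypothetical fields.
function upgradeUser(doc) {
  if (doc.schemaVersion === 2) return doc; // already current

  // v1 stored a single "name" string; split it into first/last.
  const [firstName, ...rest] = (doc.name || "").split(" ");
  return {
    ...doc,
    firstName,
    lastName: rest.join(" "),
    name: undefined,
    schemaVersion: 2,
  };
}
```

Every reader has to call upgradeUser and every writer has to persist v2; miss one code path and both shapes leak through the application.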
5. Aggregation pipelines are brutish to work with.
Every developer I've seen (including myself) struggles mightily with them.
Complex SQL statements can be tough to reason about and optimize, but aggregation pipelines have given me more trouble.
Reasoning about complex pipelines, where the output of one stage (and potentially its substages) is piped into the next, is very hard. I'm still confused about when a property should be referenced with $name versus $$name.
Once you've got a non-trivial amount of data, it's really easy to write an aggregation pipeline that exceeds the magical 100MB limit for a pipeline stage and throws an error. You can pass in allowDiskUse: true, but then you risk writing to disk, which slows the whole thing down.
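The flag is passed as an aggregate option; here's a sketch, where the "events" collection and its fields are hypothetical:

```javascript
// allowDiskUse lets stages spill to temporary files on disk instead of
// erroring at the 100MB in-memory limit per stage.
const pipeline = [
  { $group: { _id: "$userId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }, // large sorts are a common place to hit the limit
];
const options = { allowDiskUse: true };

// const cursor = db.collection("events").aggregate(pipeline, options);
```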
And, while aggregation pipelines allow you to do something that mimics SQL "joins", you can't use them to perform updates.
6. Vendor Lock-In
MongoDB is technically open source, but behind it is a venture-backed company that has taken on millions in VC money.
Their monetization play is to have a bunch of hosted services, principally MongoDB Atlas.
In order to protect that play, MongoDB, Inc. put strict licensing around the database, which restricts competitors from offering hosted services. (They bought out one of the only independent hosting services, mLab; Amazon's AWS now offers a MongoDB API-compatible service called DocumentDB.)
Unlike with SQL, you don't have a number of vendors to choose from. With SQL, you can often swap between different flavors, unless you rely on vendor-specific features (e.g., Postgres-only features).
To be fair, I don't have any complaints about Mongo's hosted service, Atlas. It supported everything I needed for a production application at decent scale.
When Mongo Might be a Good Choice
Ok, that's a lot of criticism.
Here are a few places that I would consider Mongo for:
- Storing semi-unstructured data
Let's suppose I have data coming from a third party, or I'm scraping data, and it arrives semi-structured. Mongo is a great place to dump it for later processing without worrying about strict schema adherence.
- A small prototype that I want to prove out
You can rapidly iterate through some different schema design choices, shove JSON from the frontend into the DB, and quickly prototype out an application. But, like most prototypes, they're designed to be thrown away once their usefulness runs out.
There are probably other valid use cases.
What would make me change my mind?
Specifically, what would make me change my mind about using MongoDB as a general purpose application database?
- Foreign key constraints (I should be able to enforce that documents linked by ID across collections actually exist)
- Better tooling around schema enforcement
- Better tooling for creating aggregation pipelines
- Built-in support for incrementing IDs
- Looser licensing to avoid vendor lock-in
Have I given up on NoSQL?
There are plenty of non-MongoDB NoSQL databases that make a ton of sense for different use cases.
Redis is a shining example that excels at caching, storing session data, etc.
In some specific cases, a graph database can make a lot of sense as well. Neo4j is the incumbent, but TigerGraph is doing interesting things.
CouchDB and Apache Cassandra are also worth considering under specific application constraints.
If you're already in the AWS ecosystem, DynamoDB might make sense to use.