Relational NoSQL with MongoDB
I had this conversation with a colleague of mine, who is a SQL Server DBA. She said she knows almost nothing about "NoSQL" but believed "it's the future!". Here I am to tell you that, if you know relational databases, you already know some basic MongoDB, one of the most used NoSQL databases out there. Maybe you are ready for the future!
The scary world of NoSQL
So, what is NoSQL?
I know! I know! It's all about JSON and stuff!
Well... no. I mean, not really. The difference is more in the way you retrieve data than the way it is stored.
There are a lot of NoSQL db's around with different data structures that are used for specific problems. Like graphs for networks of links, peoples, cities; or associative arrays (or KV, key-value) used for caching. And there are the document-oriented db's that uses syntax like XML or JSON to handle data, MongoDB is one of them.
The thing they have in common is that they tend to avoid use classic "Structured Query Language" for retrieving data. Not only SQL, they say.
But is it too different from SQL?
It may be. They favor other syntaxes because they're better suited for its data structure or use case.* So if you have a KV structure, you only need the key to search the value, no projection needed.
That's not the case for MongoDB.
* There are some differences in distributing data, network partition and other stuff like the CAP Theorem. But I don't want to go there today.
The same concepts, different structure
Imagine we have a Video On Demand (a.k.a. VOD) application for superhero movies.
Today you may go ahead and create a relational database with a Movies
table
for our catalog, like this one:
id | title | year | publish_date | studio_fk |
---|---|---|---|---|
1 | "Justice League" | 2017 | 2018-02-10 | 1 |
2 | "Deadpool" | 2016 | 2017-10-19 | 3 |
3 | "Wonder Woman" | 2017 | 2017-09-19 | 1 |
4 | "Captain America: Civil War" | 2016 | 2017-09-19 | 2 |
Along with a Studios
table. Indexes, foreign keys, normalization and stuff.
You know the drill. Let's not think about it for now.
I already told you MongoDB uses JSON-like syntax to store data, not tables. So, how, say, the "Deadpool" entry would look like?
{
"_id": 2,
"title": "Deadpool",
"year": 2016,
"publish_date": Date("2017-10-19"),
"studio": {
"name": "21th Century Fox",
"logo": "url_to_fox_logo.png"
}
}
Wow! Not that different!
You did see the pattern, right? There do are some odd stuff, but mostly the
columns' names are translated to JSON fields
and the values are attributed
accordingly.
Other thing in common in this case is that the _id
acts like a Primary Key: it
must exist and its value must be unique. If you don't provide it when inserting
an entry, the database generates one for you.
So, there is a PK. Are there indexes too?
Yes, there are. The indexes are defined for every collection of documents
called... well... collections
! And collections act pretty much like tables.
The only real difference is that you do not need to specify the schema of the
data. If the index can't find the key it treats it as null
or forbids you to
insert the document, depends on how you created the index.
Pretty much like in SQL Server.
Nice! And the studio thing there in the end?
Oh! I almost forgot it! If you need to denormalizate your data, for performance (or even sanity) reasons, you may embed another document inside your own, and even query it.
Speaking of queries
How can you ask for data in this database? I'll start with the SQL syntax again.
If you need the list of the movie titles from the "Warner" studio, in the reverse order of publish date, you'll write something like this:
SELECT title, publish_date
FROM dbo.Movies AS movie
JOIN dbo.Studios AS studio ON movie.studio_fk = studios.id
WHERE studio.name='Warner'
ORDER BY publish_date DESC
-- See? I know some SQL too! :D
Let's translate it to the language MongoDB understand, which happens to be JavaScript.
var projection = { title: 1, publish_date: 1, _id: 0 };
var filter = { "studio.name": "Warner" };
var order = { publish_date: -1 };
db.movies.find(filter, projection).sort(order);
Notice the similarities! The db.movies
means we are dealing with the movies
collection in current db, just like the FROM
clause. The filter
function
acts like the WHERE
, with a little help of the denormalization here. The
SELECT
clause is converted to a projection
json object. The ORDER BY
goes
inside the sort
method, with the -1
indicating the descending order.
The main difference in this example is that you need to explicitly indicate the
_id
field should not be retrieved with a "_id": 0
in the projection
, as
the default is that it's always retrieved.
Equal but not the Same
There's a lot common in both database systems, but what's really different in MongoDB? Besides the data structure and the query language, there are a few things that the developers of MongoDB decided to keep aside. Mostly because, in their vision, those features make it difficult to deal with distributed data the way they do.
The biggest one I think is the absence of Transactions.
That's why many developers and dba's make use of embed documents when designing a MongoDB database. Of course, when you use embed documents, there's a limit for the document size, and sometimes it may cause some updates absurdly complex and slow. I recommend to use it with caution. There's a great tutorial on how to create transaction-like operations in the MongoDB docs, so you may "normalize" your data across collections and treat some complex updates on more than one document.
Summing it up
Let's see what we know now:
- The JSON documents in MongoDB are basically SQL entries with
{}
- Collections are like Tables
_id
is an obligatory PrimaryKey- When querying MongoDB we use all the elements from a SQL query, but written in JavaScript.
- There are no Transactions
That's the basic. I hope you'll feel more comfortable to search more info about NoSQL databases after knowing that it's not that different. Maybe it will help you to improve your SQL skills and mindset. Who knows?
I used MongoDB as an example because of those similarities. If you want to know more databases, with other paradigms, I recommend the book Seven Databases in Seven Weeks.
And that's a wrap. May the Force be with you.
Thanks to my colleague and friend Mariana Sampaio for reviewing this post