Back in the 90s and early 2000s, applications looked like the following:

Everything you need to know about data modelling for document databases

Today, building applications is different from what it was back in 2000. Building a modern application involves quite a few different technologies.

Back in the day, you could estimate and predict the size of your audience. Today, virality and different marketing schemes make the size of the audience much harder to predict. You never know: your next app could be the big hit.

We have spent so much time in the relational world that it takes time, effort and practice to bend your mind and understand how to do things the NoSQL way.

In relational data stores, data is organized in tables made up of rows and columns. Data types are usually simple ones such as string, int, float and datetime. It is hard to store complex data types such as arrays, hash tables and complex objects; this gap between the object model and the relational model is called the impedance mismatch.

Key/value and document stores, on the other hand, can store simple data types too, but they are also comfortable storing complex data types, usually in JSON format. JSON is basically a special class of string for representing data structures. Also, the schema in key/value and document stores is implicit and unenforced.

For many years now, we have been taught to normalize our data model to reduce redundancy and improve efficiency.

Relational data stores are like a garage where you take your car apart and store the parts on shelves, keeping related parts together.

In a document database, you persist the vehicle the way it is: “Come as you are”.

The same data can be represented as a JSON document. It is much easier, more compact and simpler to represent the aggregate in JSON. It is a denormalized form of the data. It also makes it more convenient to distribute the data across multiple servers or clusters. The main problem with relational data stores here is the JOIN operation, which is expensive.
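
As a minimal sketch of this idea, assume a hypothetical car with an engine and wheels (the table and field names below are made up for illustration and are not taken from the original figures). The same aggregate can be held as normalized rows that must be joined, or as a single JSON document:

```python
import json

# Hypothetical normalized rows, roughly as a relational store would hold them.
cars = [{"car_id": 1, "make": "Example", "model": "Roadster"}]
engines = [{"engine_id": 10, "car_id": 1, "cylinders": 4}]
wheels = [
    {"wheel_id": 100, "car_id": 1, "position": "front-left"},
    {"wheel_id": 101, "car_id": 1, "position": "front-right"},
]


def build_car_document(car_id):
    """Assemble the aggregate by 'joining' the rows in application code."""
    car = next(c for c in cars if c["car_id"] == car_id)
    engine = next(e for e in engines if e["car_id"] == car_id)
    return {
        "make": car["make"],
        "model": car["model"],
        "engine": {"cylinders": engine["cylinders"]},
        "wheels": [w["position"] for w in wheels if w["car_id"] == car_id],
    }


# The denormalized aggregate: one document instead of three tables.
car_document = build_car_document(1)

# The JSON string is what a document store would persist as the value.
car_json = json.dumps(car_document)
```

Reading the car back is then a single fetch of `car_json`, with no join at query time.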

Here is the domain model for this particular example:

While ORMs come to the rescue for relational databases, performance usually becomes a problem. One can argue that serialization and deserialization of JSON documents can also become a problem; however, serialization frameworks are usually a lot faster these days.

Schema-free databases make it easy to develop software rapidly; however, you need to understand and embrace the principles behind data modelling for document databases.

Questions you need to keep in mind during modelling are:

How is data going to be persisted?

How are you going to retrieve or query data?

Is your application write or read bound?

Key Patterns:

We start by defining unique keys, which we call predictable keys because we can query them easily. The following keys have corresponding values which are JSON documents.



-> User data as value

user::[email address] -> User data as value


product::12345 [ID of the product] -> Product data as value

Then we have unpredictable keys as follows:


session::a5ab2863-db93-430e-8da3-feeb1998521f -> Session document data

Unpredictable keys are queried with Map-Reduce functions in most key/value and NoSQL databases.
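
The two kinds of keys can be sketched as follows. This is only an illustration: a plain Python dict stands in for the key/value store, and the helper names and sample values (such as `jane@example.com`) are assumptions, not part of any particular database client.

```python
import uuid

# A plain dict stands in for the key/value store in this sketch.
store = {}


def user_key(identifier):
    """Predictable key: can be rebuilt from data we already know."""
    return f"user::{identifier}"


def product_key(product_id):
    """Predictable key built from the product ID."""
    return f"product::{product_id}"


def new_session_key():
    """Unpredictable key: a random UUID that cannot be guessed later."""
    return f"session::{uuid.uuid4()}"


store[user_key("jane@example.com")] = {"name": "Jane"}
store[product_key(12345)] = {"title": "Example product"}

session_key = new_session_key()
store[session_key] = {"user": user_key("jane@example.com")}

# A predictable key can be recomputed at read time, so a direct GET works.
jane = store[user_key("jane@example.com")]
```

The session document, by contrast, can only be fetched directly by someone who already holds `session_key`.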

Counter-Id Pattern:

Almost all key/value and document stores provide very fast atomic counters. These atomic counters are safe for increment and decrement operations. The idea behind the counter-id pattern is to use such a counter to generate predictable document keys.

Additionally, we can get the current value of the counter at any time, and hence we can predictably query users or iterate over them. This is a pretty easy and clever technique in my opinion. Since we only ever increment, we know the size of the data, and we can even run multi-get operations on it or do some paging.

This is similar to an identity column in an RDBMS. Each time we want to store a document, we increment the counter, get the counter value, create a predictable key from it and store the key/value pair.
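
The steps above can be sketched roughly as follows. The `InMemoryStore` class is a toy stand-in written for this example (a lock imitates the atomic counter of a real store), and the `user::count` counter key is an assumed naming choice.

```python
import threading


class InMemoryStore:
    """Toy stand-in for a key/value store with an atomic counter."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        # The lock imitates the atomic increment a real store provides.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + delta
            return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def add_user(store, user):
    """Increment the counter, build a predictable key, store the document."""
    user_id = store.incr("user::count")
    key = f"user::{user_id}"
    store.set(key, user)
    return key


def iter_users(store):
    """Iterate over all users by walking the IDs from 1 up to the counter."""
    count = store.get("user::count") or 0
    for user_id in range(1, count + 1):
        yield store.get(f"user::{user_id}")


store = InMemoryStore()
first_key = add_user(store, {"name": "Ann"})
second_key = add_user(store, {"name": "Bob"})
names = [user["name"] for user in iter_users(store)]
```

Because the counter only grows, paging becomes a matter of choosing a sub-range of IDs and fetching those keys.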

Lookup Pattern:

The lookup pattern is usually a two-step process:

Initially, we store the data with an unpredictable key (such as a GUID or UUID), then we create references to that key using predictable keys (such as email, username, etc.).

In this example, we store the user data under a GUID, then we store references to that key under the email and the username. To retrieve the user data in a predictable way, we use the email or the username to get the GUID, then we do another GET operation with the key we obtained from the first query in order to get the data.

This makes it easy to retrieve documents without Map-Reduce operations. This pattern is quite useful for large datasets. Most data stores provide map-reduce operations with eventual consistency, backed by B-trees on disk with O(log n) complexity. The lookup pattern, however, provides immediate consistency with O(1), constant lookup time. Hence the lookup pattern is very fast, scales very well and provides read-your-own-writes consistency along with consistent performance.
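
A minimal sketch of the two-step lookup, again with a dict standing in for the store. The `email::` and `username::` key prefixes and the sample user are assumptions made for this example.

```python
import uuid

# A plain dict stands in for the key/value store in this sketch.
store = {}


def create_user(email, username, profile):
    """Store the document under a UUID key, then add lookup references."""
    user_key = f"user::{uuid.uuid4()}"
    store[user_key] = profile
    # Lookup documents: predictable key -> unpredictable document key.
    store[f"email::{email}"] = user_key
    store[f"username::{username}"] = user_key
    return user_key


def get_user_by(kind, value):
    """Two GETs: first resolve the reference, then fetch the document."""
    user_key = store.get(f"{kind}::{value}")
    if user_key is None:
        return None
    return store.get(user_key)


ann_key = create_user("ann@example.com", "ann", {"name": "Ann"})
by_email = get_user_by("email", "ann@example.com")
by_username = get_user_by("username", "ann")
missing = get_user_by("email", "nobody@example.com")
```

Both reads are plain key fetches, so neither step needs an index or a Map-Reduce view.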

You can use these patterns together. For example, you can combine the counter-id pattern with the lookup pattern:

First we get an ID from our counter and store the data under the key “user::54”. Then we add additional lookup keys such as (“twitter::firat”, “user::54”), plus any other keys that we will need to reach this particular document. Again, we need a two-step process to get the data. First we do a GET operation on “twitter::firat”, which gives us the result “user::54”, then we do another GET operation on “user::54” and we have the user document.

Yet again, these are pretty fast operations since they run in constant lookup time.
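
The combination might look like the sketch below. The counter is seeded at 53 purely so that the next ID is 54, matching the example in the text; the dict-based store, the `user::count` counter key and the profile contents are assumptions for illustration.

```python
# A plain dict stands in for the key/value store; the counter is seeded
# at 53 so that the next ID handed out is 54, as in the example above.
store = {"user::count": 53}


def next_id(counter_key):
    """Increment and return the counter (a real store does this atomically)."""
    store[counter_key] = store.get(counter_key, 0) + 1
    return store[counter_key]


def create_user(profile, lookup_keys):
    """Counter-id pattern for the main key, lookup pattern for references."""
    user_key = f"user::{next_id('user::count')}"
    store[user_key] = profile
    for lookup_key in lookup_keys:
        store[lookup_key] = user_key
    return user_key


def get_by_lookup(lookup_key):
    """First GET resolves the reference, second GET fetches the document."""
    user_key = store.get(lookup_key)
    if user_key is None:
        return None
    return store.get(user_key)


user_key = create_user({"name": "Example User"}, ["twitter::firat"])
profile = get_by_lookup("twitter::firat")
```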

Pros of combining counter-id and lookup pattern:

These are extremely fast binary operations.

Linear performance.

Enables several ways to find a single document.

Provides consistency: reads are always consistent.

Cons of combining counter-id and lookup pattern:

Increases the number of documents in storage, which is usually not a problem.

Below you can find the differences between RDBMS and Document Stores for persisting and querying the data.

Related data patterns:

To embed or to reference? That is the question.

Embedding is easy. You can persist the document the way it is.

Pros with embedding:

The document is easy to retrieve: a single read operation brings back the whole document, and a single write or update operation takes care of your insert or update.
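
As a small illustration of embedding, assume a hypothetical order document that carries its line items inside it (the key and field names are invented for this sketch, and a dict stands in for the store):

```python
# A plain dict stands in for the document store in this sketch.
store = {}

# The line items are embedded in the order instead of living in
# separate documents that would have to be fetched one by one.
order = {
    "customer": "Ann",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

store["order::1"] = order    # a single write persists the whole aggregate
loaded = store["order::1"]   # a single read brings it all back

total_qty = sum(item["qty"] for item in loaded["items"])
```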

Cons with embedding:

In most dat

