Understanding Indexing in MongoDB

TLDR: Indexing is just a way of grouping documents, dividing collections into groups to speed up performance

Indexes increase query performance as well as it is used for searches

The idea of indexing in MongoDB is similar to index of any book, They increase the speed of finding a page. The index in MongoDB increase the speed of finding documents

First, let’s understand how you declare index in MongoDB

collectionName.createIndex({field:value}) //for creating indexcollectionName.dropIndex({field:value}) //for removing index

Here field is the “fieldName” which will be indexed. “Value” could be -1 or 1 or “text”.

It defines the type of index, 1 or -1 increase find() query performance whereas “text” is used for searching.

1 and -1 give the order of the index. Ascending = -1 & Descending =1

Now, How indexes work under the hood?

Imagine a collection of users, each document containing various information, one of them being score.

Let’s say we want all users to score 23.

When no index exists, MongoDB goes through each document to find the queried document, This is called Collection scan, MongoDB has a shorthand for this COLLSCAN (This is called Table scan in Relational databases)

How can we optimize this?

To optimize this we can create a table with one column for score and another column for references which will contain IDs of documents with that particular score. Now we only need to scan that table rather than scanning the whole database. This is much faster. This is exactly what an Index is.

Indexes help MongoDB narrow the dataset it will have to scan. This is called Index Scan, MongoDB has a shorthand for this too IXNSCAN

Here is a visual representation of a score Index and its mapping.

Performance improvement by Index is only visible when the number of documents crosses 100K or so.

You can compare it yourself by comparing two queries one with an indexed field and one without an Index

db.<collection name>.find(query).explain()

An Object will be returned

object.winingPlan.stage will tell the type of scan COLLSCAN or IXNSCAN

but it will not tell you the time taken in the execution

use explain(‘executionStat’) method before the query method like find

db.<collection name>explain('executionStat').find(query)

executionTimeMillis will tell you the time taken in the execution

Indexes are not free, Indexes cost space. Indexes may increase read speed but whenever something is written, the index needs to be updated to resolve this we use B- tree we do some calculations before insertion so that it is faster.

Indexes are not a Table of groups, conceptually it is actually a B-Tree (Binary Tree) not just MongoDB, SQL database also uses B Tree for indexing

this Video best explains B Tree


Finding Magic.