Demystifying MongoDB write operations
In this post, we will try to understand the different factors which control the write operations in MongoDB. We will try to tie in the common concepts like checkpointing, journaling, replication that we hear so often in the context of write operations. There are different default configurations across different Mongo versions, so it’s important to check the default configurations before you modify the configurations.
A brief introduction to WiredTiger (WT)
WiredTiger (WT) has been the default storage engine for MongoDB since version 3.2. When WT starts on a node, its cache defaults to the larger of 50% of (RAM − 1 GB) and 256 MB. So on a system with 16 GB of memory, the WT cache will be around 7.5 GB. WT uses this memory for both read and write operations. It claims only about half the memory because it offloads optimization work to the OS: data in the WT cache is stored uncompressed, whereas the data cached by the OS is in its compressed on-disk form, often around a 1:10 ratio compared to the WT cache data size.
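As a quick sanity check, the default cache-size formula can be sketched as a small function (the 50% of (RAM − 1 GB) rule and the 256 MB floor are MongoDB's documented defaults; the function name is mine, not a MongoDB API):

```javascript
// Default WiredTiger cache size: the larger of 50% of (RAM - 1 GB)
// and 256 MB. The function name is illustrative only.
function defaultWtCacheGB(totalRamGB) {
  const half = 0.5 * (totalRamGB - 1);
  const floorGB = 256 / 1024; // 256 MB expressed in GB
  return Math.max(half, floorGB);
}

// A 16 GB machine gets roughly 7.5 GB of WT cache.
console.log(defaultWtCacheGB(16)); // → 7.5
```

On very small machines the 256 MB floor wins, which is why tiny instances still get a usable cache.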
To inspect the WT cache status, you can run the command below in the Mongo shell:

db.serverStatus().wiredTiger.cache

This gives a response like the following:
{
"application threads page read from disk to cache count" : 9,
"application threads page read from disk to cache time (usecs)" : 17555,
"application threads page write from cache to disk count" : 1820,
"application threads page write from cache to disk time (usecs)" : 1052322,
"bytes allocated for updates" : 20043,
"bytes belonging to page images in the cache" : 46742,
"bytes belonging to the history store table in the cache" : 173,
"bytes currently in the cache" : 73044,
"bytes dirty in the cache cumulative" : 38638327,
"bytes not belonging to page images in the cache" : 26302,
"bytes read into cache" : 43280,
"bytes written from cache" : 20517382,
"cache overflow score" : 0,
"checkpoint blocked page eviction" : 0,
"eviction calls to get a page" : 5973,
"eviction calls to get a page found queue empty" : 4973,
"eviction calls to get a page found queue empty after locking" : 20,
"eviction currently operating in aggressive mode" : 0,
"eviction empty score" : 0,
"internal pages split during eviction" : 0,
"leaf pages split during eviction" : 0,
"maximum bytes configured" : 8053063680,
"maximum page size at eviction" : 376,
"modified pages evicted" : 902,
"modified pages evicted by application threads" : 0,
"operations timed out waiting for space in cache" : 0,
"overflow pages read into cache" : 0,
"page split during eviction deepened the tree" : 0,
"page written requiring history store records" : 0,
"pages currently held in the cache" : 24,
"pages queued for eviction post lru sorting" : 0,
"pages queued for urgent eviction" : 902,
"pages queued for urgent eviction during walk" : 0,
"pages read into cache" : 20,
"pages read into cache after truncate" : 902,
"pages read into cache after truncate in prepare state" : 0,
"pages requested from the cache" : 33134,
"pages seen by eviction walk" : 0,
"pages seen by eviction walk that are already queued" : 0,
"pages walked for eviction" : 0,
"pages written from cache" : 1822,
"pages written requiring in-memory restoration" : 0,
"percentage overhead" : 8,
"tracked bytes belonging to internal pages in the cache" : 5136,
"tracked bytes belonging to leaf pages in the cache" : 67908,
"tracked dirty bytes in the cache" : 493,
"tracked dirty pages in the cache" : 1,
"unmodified pages evicted" : 0
}
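Two of these counters are already enough to estimate how full and how dirty the cache is. A minimal sketch, using the sample numbers above (the helper name is mine, not a MongoDB API):

```javascript
// Compute cache fill and dirty ratios from the serverStatus output.
// In a live shell you would pass db.serverStatus().wiredTiger.cache.
function cacheRatios(cache) {
  const max = cache["maximum bytes configured"];
  return {
    fill: cache["bytes currently in the cache"] / max,
    dirty: cache["tracked dirty bytes in the cache"] / max,
  };
}

const r = cacheRatios({
  "maximum bytes configured": 8053063680,
  "bytes currently in the cache": 73044,
  "tracked dirty bytes in the cache": 493,
});
// Both ratios are tiny here: a nearly empty, nearly clean cache,
// far below the thresholds discussed in the eviction section below.
console.log(r.fill, r.dirty);
```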
Different stages of a Write operation
When a write operation is received, it is written to WT's cache as a dirty page and to the journal's in-memory buffer.
Journaling
Journaling refers to the process of appending every write operation to a write-ahead log (WAL) so that data can be recovered if the process fails. Whenever the Mongo process restarts, it checks the last checkpoint and the journal. If the journal contains entries that have not yet been checkpointed, Mongo replays them, creates a new checkpoint, and proceeds with the initialisation.
When a write operation is received, it is stored in the in-memory buffer of the WAL, which is flushed to the on-disk journal every 100 ms by default. Because the journal is an append-only log, these writes are sequential and fast to complete.
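The flush interval can be tuned in mongod.conf via storage.journal.commitIntervalMs (documented range 1–500 ms; availability may vary by MongoDB version). The value below is illustrative, not a recommendation:

```yaml
storage:
  journal:
    # Flush the journal buffer every 50 ms instead of the 100 ms default.
    # Lower values trade throughput for a smaller window of potential loss.
    commitIntervalMs: 50
```

Note that a write issued with write concern j: true is acknowledged only after the journal flush, regardless of this interval.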
Checkpointing
Checkpointing is the process of flushing data from the in-memory cache to the on-disk data files, adding it to Mongo's internal B+ tree. When Mongo starts, it can reliably pick up from the latest checkpoint and resume its operations. By default, a checkpoint is taken every 60 seconds; dirty data in the WT cache is also flushed earlier whenever the dirty-cache ratio goes over a certain threshold.
Configuring the Checkpointing process
A set of background threads, called eviction threads, is solely responsible for flushing dirty pages to the data files. These threads usually run without affecting the application workload. A higher number of eviction threads flushes data faster, though it also utilises more resources, making the application threads slower.
When the dirty cache goes over a certain percentage of the WT cache, the eviction threads start flushing data to the data files. This is controlled via eviction_dirty_target, which is set to 5% by default; flushing continues until the dirty data is back below 5% of the WT cache.
When the dirty-cache ratio rises above 20%, the work is also distributed amongst the application threads to clear the backlog, which in turn slows the application. This threshold is configurable via eviction_dirty_trigger.
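These knobs can be passed to WT at startup through storage.wiredTiger.engineConfig.configString in mongod.conf. A sketch with illustrative values (configString is a low-level, use-at-your-own-risk escape hatch, and the thread counts below are examples, not recommendations):

```yaml
storage:
  wiredTiger:
    engineConfig:
      # Illustrative: widen the eviction thread pool and state the
      # default dirty thresholds (percent of cache) explicitly.
      configString: "eviction=(threads_min=4,threads_max=8),eviction_dirty_target=5,eviction_dirty_trigger=20"
```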
Configuring block sizes in WiredTiger
The WT on-disk files are made up of blocks known as pages, which are often compressed before being written to the disk.
When writing to Mongo, pages are evicted from memory once they cross the maximum configured size. Evicted pages go through a reconciliation process that converts the in-memory representation of the data into an on-disk representation.
The higher the number of page evictions, the greater the cost of updating the block manager, initialising the reconciliation process, compressing the data, and writing it to the on-disk files. Depending on the size of the data being dealt with, the block size can be tuned accordingly.
The size of the blocks is defined by allocation_size, which can be set via storage.wiredTiger.collectionConfig.configString before creating the collection.
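In mongod.conf this looks like the fragment below (the 8 KB value is illustrative; WT's default allocation_size is 4 KB, and the setting applies only to collections created after the change):

```yaml
storage:
  wiredTiger:
    collectionConfig:
      # Illustrative: 8 KB blocks instead of WiredTiger's 4 KB default.
      configString: "allocation_size=8KB"
```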
Changing WiredTiger's cache size
By default, WT takes up 50% of the available memory on the system (more precisely, 50% of RAM minus 1 GB), because it deliberately leaves the rest to the kernel, which caches the compressed on-disk data and handles other offloaded tasks. The size of the cache can be increased or decreased depending on the type of load running on the system.
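The cache size can be set in mongod.conf via storage.wiredTiger.engineConfig.cacheSizeGB (it can also be adjusted at runtime through the wiredTigerEngineRuntimeConfig server parameter). The value below is illustrative:

```yaml
storage:
  wiredTiger:
    engineConfig:
      # Illustrative: cap the WT cache at 4 GB instead of the default
      # of 50% of (RAM - 1 GB).
      cacheSizeGB: 4
```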
Conclusion
Before starting to tune the database, it's important to understand the patterns of the load the database is experiencing and to gather performance metrics.
Every configuration change comes with a tradeoff, and that tradeoff should be understood before the change is made.
The ideas above apply to most databases out there today; I took MongoDB simply to make the discussion concrete.
I hope you liked the article. Please let me know if you have any queries. Happy reading!!