~/posts/2025-02-16_Integrating-Snapshotter-with-a-memory-datastore-in-Golang.md


This is a follow-up article to this post that I wrote comparing the Copy-on-Write and Redirect-on-Write mechanisms.

This post covers a more practical example of how we can integrate Copy-on-Write techniques into a Golang in-memory datastore.

A point-in-time snapshot is a copy of the existing data that represents the state of the in-memory data at that specific moment in time.

Goals

  • Don't affect the throughput performance of the current request processing layer.
  • Ability to take multiple snapshot instances simultaneously.
  • Ability to snapshot and restore on systems with different shards.
  • Shouldn't depend on any existing data files; only the in-memory data structures should be used.

Design

Dummy Store

As an example, we will implement a DummyStore, which is a simple wrapper on top of a map-based store.
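
A minimal sketch of what such a wrapper could look like (the field names and the sync.RWMutex are illustrative, not the exact implementation; the full code is linked in the references):

type DummyStore struct {
	// store holds the actual key-value data.
	store map[string]interface{}
	// mLock guards the map against concurrent reads and writes.
	mLock sync.RWMutex
}

func NewDummyStore() *DummyStore {
	return &DummyStore{store: make(map[string]interface{})}
}

// Get returns the current value for a key, if present.
func (d *DummyStore) Get(key string) (interface{}, bool) {
	d.mLock.RLock()
	defer d.mLock.RUnlock()
	val, ok := d.store[key]
	return val, ok
}

// Set stores a value for a key. The copy-on-write hook is added to this
// path later in the post.
func (d *DummyStore) Set(key string, value interface{}) {
	d.mLock.Lock()
	defer d.mLock.Unlock()
	d.store[key] = value
}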

Implementing Copy-on-Write

The snapshotting technique is similar to the copy-on-write mechanism, i.e., no additional data has to be stored until the data is modified. This means additional memory is only required if there are changes to the underlying data.

Impact on current latency benchmarks

  • For reads, there should be minimal latency impact, since the get path is not touched even while a snapshot is running. The one thing that may affect read latency is that the snapshot has to iterate through all the keys, so an implicit lock inside the data structure may be required.
  • For writes, if a snapshot is in progress, each write has to go to two places, plus an additional read from a map.

Flow

The initiation flow:

ShardThread::CallSnapshotter -> Snapshotter::Start -> Store::StartSnapshot -> SnapshotMap::Buffer
-> PITFlusher::Flush

When the iteration is over:

Store::StopSnapshot -> SnapshotMap::FlushAllData -> PITFlusher::FlushAllData -> Snapshotter::Close

Changes for ShardThread and Store

The snapshot would start on every ShardThread and fetch the Store object. Every Store object needs to implement the SnapshotStore interface, which contains the StartSnapshot and StopSnapshot methods. These methods are called on the store by the snapshotter object.

StartSnapshot

When the StartSnapshot method is called, the Store keeps note of the SnapshotID in a map; there can be multiple snapshot instances per store. For any read or write operation, the Store object checks whether a snapshot is running at that instant. If no snapshot is running, it continues as usual. If a snapshot is running, then for any subsequent write operation, the previous data is stored in the snapshot's own structure, a map we'll call the SnapshotMap. If there are multiple writes to the same key and the data already exists in the SnapshotMap, nothing further needs to be done for the snapshot. Similarly for reads: if a snapshot is running and the incoming request comes from the snapshot layer, the Store first checks whether the key exists in the SnapshotMap; if it does, that preserved value is returned, otherwise the current value from the Store is returned.

The Store should also fetch the list of keys from its store attribute and iterate through them to produce the snapshot.
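
Below is a rough sketch of what the write and snapshot-read paths could look like while a snapshot is active. It assumes the DummyStore additionally keeps a snapshots map from snapshot ID to the snapshot's SnapshotMap; that field, the snapshotGet helper, and TempGet are assumptions for illustration, while TempAdd is the method described later in this post.

// Assumed additional field on DummyStore:
//   snapshots map[uint64]*SnapshotMap // active snapshots, keyed by snapshot ID

// Set writes a value and, if any snapshot is running, first preserves the
// previous value in that snapshot's SnapshotMap (copy-on-write).
func (d *DummyStore) Set(key string, value interface{}) {
	d.mLock.Lock()
	defer d.mLock.Unlock()

	if oldVal, existed := d.store[key]; existed {
		for _, snapMap := range d.snapshots {
			// TempAdd is a no-op if this key was already copied earlier.
			snapMap.TempAdd(key, oldVal)
		}
	}

	d.store[key] = value
}

// snapshotGet is used by the snapshot iteration: a value preserved in the
// SnapshotMap takes precedence over the live value in the store.
func (d *DummyStore) snapshotGet(snapMap *SnapshotMap, key string) (interface{}, bool) {
	if val, ok := snapMap.TempGet(key); ok {
		return val, true
	}
	d.mLock.RLock()
	defer d.mLock.RUnlock()
	val, ok := d.store[key]
	return val, ok
}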

StopSnapshot

When the Store object finishes iterating through all the keys, it calls the StopSnapshot method. StopSnapshot lets the SnapshotMap know that no more updates are coming. The SnapshotMap then talks to the PITFlusher to finish syncing all the chunks to disk, and the main snapshot process is closed.
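
Continuing the assumptions above, a sketch of how StopSnapshot could hand over to the SnapshotMap (FlushAllData is the method from the flow; the rest is illustrative):

// StopSnapshot removes the snapshot from the set of active snapshots so the
// write path stops preserving old values, then drains whatever is still buffered.
func (d *DummyStore) StopSnapshot(snapshotID uint64) error {
	d.mLock.Lock()
	snapMap, ok := d.snapshots[snapshotID]
	delete(d.snapshots, snapshotID)
	d.mLock.Unlock()

	if !ok {
		return fmt.Errorf("no running snapshot with id %d", snapshotID)
	}
	// FlushAllData pushes the remaining buffered updates to the PITFlusher,
	// which syncs them to disk and closes the snapshot.
	return snapMap.FlushAllData()
}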

Point-in-time Flusher

The PITFlusher serializes the store updates received from the SnapshotMap into a binary format, currently gob, and appends them to a file.

Implementation

Let's write some code now.

The main snapshot object is defined as follows:

type (
	SnapshotStore interface {
		StartSnapshot(uint64, Snapshotter) error
		StopSnapshot(uint64) error
	}
	PointInTimeSnapshot struct {
		ctx        context.Context
		cancelFunc context.CancelFunc

		// ID is the identity of the snapshot.
		ID uint64

		// store is the datastore being snapshotted.
		store SnapshotStore

		// SnapshotMap holds the copy-on-write data captured during the snapshot.
		SnapshotMap *SnapshotMap

		// flusher writes serialized batches of updates to disk.
		flusher *PITFlusher

		StartedAt time.Time
		EndedAt   time.Time

		// exitCh signals that the snapshot has finished.
		exitCh chan bool
	}
)

Every snapshot object has an ID which is used as the identity of the snapshot. The snapshot object has two underlying tasks:

  • SnapshotMap
  • Flusher

The SnapshotMap is a temporary data store that the actual store object is given a reference to. The DummyStore object adds data to the SnapshotMap using TempAdd whenever there are writes during a snapshotting process. While taking the snapshot, the store checks whether data is already present in the SnapshotMap because of writes that happened after the snapshotting process started.

type (
	SnapshotMap struct {
		// tempRepr holds the previous values of keys that were overwritten
		// while the snapshot was running.
		tempRepr map[string]interface{}
		// buffer accumulates updates until a batch is handed to the flusher.
		buffer  []StoreMapUpdate
		flusher *PITFlusher
		// closing is set once StopSnapshot signals that no more updates are coming.
		closing   bool
		mLock     *sync.RWMutex
		totalKeys uint64
	}
	StoreMapUpdate struct {
		Key   string
		Value interface{}
	}
)

The SnapshotMap batches the writes as an array of StoreMapUpdate objects and passes them to the Flusher once a batch of 1000 updates has accumulated.
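
A sketch of what TempAdd and the batching could look like (TempAdd and Buffer are named in the design above; the exact signatures, TempGet, and handing the batch over via updatesCh are assumptions):

// TempAdd preserves the pre-write value of a key the first time it is
// modified during the snapshot; later writes to the same key are ignored.
func (s *SnapshotMap) TempAdd(key string, value interface{}) {
	s.mLock.Lock()
	defer s.mLock.Unlock()
	if _, ok := s.tempRepr[key]; ok {
		return
	}
	s.tempRepr[key] = value
}

// TempGet returns a preserved value, if one exists, for the snapshot read path.
func (s *SnapshotMap) TempGet(key string) (interface{}, bool) {
	s.mLock.RLock()
	defer s.mLock.RUnlock()
	val, ok := s.tempRepr[key]
	return val, ok
}

// Buffer is called by the snapshot iteration for every key. It appends the
// resolved value to the current batch and hands the batch to the flusher
// once 1000 updates have accumulated.
func (s *SnapshotMap) Buffer(key string, value interface{}) {
	s.mLock.Lock()
	s.buffer = append(s.buffer, StoreMapUpdate{Key: key, Value: value})
	s.totalKeys++
	var batch []StoreMapUpdate
	if len(s.buffer) >= 1000 {
		batch = s.buffer
		s.buffer = make([]StoreMapUpdate, 0, 1000)
	}
	s.mLock.Unlock()

	if batch != nil {
		s.flusher.updatesCh <- batch
	}
}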

The Flusher receives data in batches from the SnapshotMap and serializes it into a binary format. In this example I am using encoding/gob for the binary encoding, and I am planning to move to Protobuf. The Flusher then appends the serialized updates to an open snapshot file on disk.

type (
	PITFlusher struct {
		ctx        context.Context
		snapshotID uint64
		// updatesCh receives batches of updates from the SnapshotMap.
		updatesCh chan []StoreMapUpdate
		exitCh    chan bool
		// dlq is a dead-letter queue of update batches.
		dlq [][]StoreMapUpdate

		totalKeys uint64
		// flushFile is the open snapshot file the serialized updates are appended to.
		flushFile *os.File
	}
)

Once the process is complete, the Flusher syncs and closes the file.
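
A minimal sketch of the flush path, assuming bytes and encoding/gob are imported and flushFile was opened in append mode (the start and flush method names are assumptions):

// start drains batches from updatesCh until the channel is closed, encodes
// each batch with encoding/gob, and appends the bytes to the snapshot file.
func (p *PITFlusher) start() {
	for updates := range p.updatesCh {
		if err := p.flush(updates); err != nil {
			// Park batches that could not be written so they can be retried.
			p.dlq = append(p.dlq, updates)
		}
	}
	// No more updates are coming: sync pending writes, close the file, and
	// signal completion.
	p.flushFile.Sync()
	p.flushFile.Close()
	p.exitCh <- true
}

func (p *PITFlusher) flush(updates []StoreMapUpdate) error {
	// Note: concrete types stored in StoreMapUpdate.Value must be registered
	// with gob.Register before encoding.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(updates); err != nil {
		return err
	}
	if _, err := p.flushFile.Write(buf.Bytes()); err != nil {
		return err
	}
	p.totalKeys += uint64(len(updates))
	return nil
}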

Test cases and benchmarks

  • Snapshot data less than the buffer size without any subsequent writes (a rough test sketch for these scenarios follows the list)
  • Snapshot data less than the buffer size with localized subsequent writes
  • Snapshot data less than the buffer size with spread out subsequent writes
  • Snapshot data more than the buffer size without any subsequent writes
  • Snapshot data more than the buffer size with localized subsequent writes
  • Snapshot data more than the buffer size with spread out subsequent writes
  • Ensure current get path is not affected
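
As a rough illustration, a test for one of these scenarios could be shaped like this, with keyCount set below or above the buffer size of 1000 depending on the case. NewPointInTimeSnapshot and Run are hypothetical helpers standing in for the actual wiring; the real tests are in the repository linked below.

func TestSnapshotWithoutChanges(t *testing.T) {
	const keyCount = 1_000_000 // set below 1000 to exercise the small-snapshot cases

	store := NewDummyStore()
	for i := 0; i < keyCount; i++ {
		store.Set(fmt.Sprintf("key-%d", i), i)
	}

	// NewPointInTimeSnapshot and Run are hypothetical: they wire up the
	// SnapshotMap and PITFlusher, call StartSnapshot on the store, iterate
	// all the keys, and call StopSnapshot when the iteration is done.
	snap := NewPointInTimeSnapshot(store)
	if err := snap.Run(); err != nil {
		t.Fatal(err)
	}

	// Every key should have been pushed through the SnapshotMap to disk.
	if snap.SnapshotMap.totalKeys != keyCount {
		t.Fatalf("expected %d keys in the snapshot, got %d", keyCount, snap.SnapshotMap.totalKeys)
	}
}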

Results

=== RUN   TestSnapshotWithoutChangesWithNoRangeAccess
2025/02/17 09:10:38 Closing snapshot 281711000 . Total time taken 203.7495ms for total keys 1000000
--- PASS: TestSnapshotWithoutChangesWithNoRangeAccess (0.67s)
=== RUN   TestSnapshotWithoutChangesWithLowRangeAccess
2025/02/17 09:10:39 Closing snapshot 896770000 . Total time taken 248.569833ms for total keys 1000000
--- PASS: TestSnapshotWithoutChangesWithLowRangeAccess (0.66s)
=== RUN   TestSnapshotWithChangesWithLowRangeAccess
2025/02/17 09:10:40 Closing snapshot 554695000 . Total time taken 837.278666ms for total keys 1000000
--- PASS: TestSnapshotWithChangesWithLowRangeAccess (1.25s)
=== RUN   TestNoSnapshotWithChangesWithLowRangeAccess
--- PASS: TestNoSnapshotWithChangesWithLowRangeAccess (1.15s)

Runs pretty fast when there are no writes while snapshotting: around 190-230ms to snapshot and write 1,000,000 keys to disk without simultaneous writes. It definitely slows down when there are writes in the system, taking around 800-900ms to snapshot and write 1,000,000 keys to disk with low-range simultaneous writes. This was expected, as assumed in the previous post linked below, since there are two additional read and write operations to maintain both copies. It can be improved, since the temporary snapshot store I am using is a little inefficient, but that may not matter much as it finishes in under a second most of the time.

References

  • The code for the above tests is available here
  • Previous blog in the series available here

Hope you liked reading the article.

Please reach out to me here for more ideas or improvements.