This is a follow-up to this post that I wrote comparing Copy on Write and Redirect on Write mechanisms.
This post will cover a more practical example of how we can integrate Copy on Write techniques into a Golang in-memory datastore.
A point-in-time snapshot is a copy of the existing data that represents what was in memory at that specific time.
Goals
- Don't affect the throughput performance of the current request processing layer.
- Ability to take multiple snapshot instances simultaneously.
- Ability to snapshot and restore on systems with different shards.
- Shouldn't depend on existing data files; only on the in-memory data structures.
Design
Dummy Store
As an example, we will implement a DummyStore, which is a simple wrapper on top of a map store.
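To make the later snippets concrete, here is a minimal sketch of what such a wrapper could look like, assuming the sync package is imported; the field names and the Get/Put methods are illustrative, not the actual implementation:
type DummyStore struct {
	mu    sync.RWMutex
	store map[string]interface{}
}

func NewDummyStore() *DummyStore {
	return &DummyStore{store: make(map[string]interface{})}
}

// Get returns the live value for a key.
func (d *DummyStore) Get(key string) (interface{}, bool) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	v, ok := d.store[key]
	return v, ok
}

// Put overwrites the live value for a key.
func (d *DummyStore) Put(key string, value interface{}) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.store[key] = value
}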
Implementing Copy on write
The snapshotting technique would be similar to the copy-on-write mechanism, i.e., additional data wouldn't have to be stored until the data has to be modified. This means additional memory is only required if there are changes to the underlying data.
Impact on current latency benchmarks
- For reads, there should be minimal latency change since the get path is untouched even while a snapshot is running. One thing that may impact read latency is that the snapshot has to iterate through all the keys, so an implicit lock inside the data structure may be required.
- For writes, if a snapshot is in progress, every write has to go to two places, plus an additional read on a map.
Flow
The initiation flow:
ShardThread::CallSnapshotter -> Snapshotter::Start -> Store::StartSnapshot -> SnapshotMap::Buffer -> PITFlusher::Flush
When the iteration is over:
Store::StopSnapshot -> SnapshotMap::FlushAllData -> PITFlusher::FlushAllData -> Snapshotter::Close
Changes for ShardThread and Store
The snapshot would start on every ShardThread and fetch the Store object. Every Store object needs to implement the SnapshotStore interface, which contains the StartSnapshot and StopSnapshot methods.
The StartSnapshot and StopSnapshot methods are called on the store by the snapshotter object.
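As a rough sketch of that initiation step on a single shard, using the SnapshotStore interface shown later in the Implementation section (the ShardThread fields and the exact CallSnapshotter signature are assumptions):
type ShardThread struct {
	store SnapshotStore // the shard's Store object
}

// CallSnapshotter kicks off the snapshot on this shard's store. The store is
// expected to call StopSnapshot on itself once it has iterated all its keys.
func (t *ShardThread) CallSnapshotter(id uint64, snap Snapshotter) error {
	return t.store.StartSnapshot(id, snap)
}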
StartSnapshot
When the StartSnapshot method is called, the Store should keep note of the SnapshotID in a map. There can also be multiple snapshot instances per store.
For any read or write operation, the Store object should check whether a snapshot is running at that instant. If no snapshot is running, it continues as usual.
If a snapshot is running, then for any subsequent write operation, the previous data is stored in the snapshot's own object, maybe a map. Let's call this the SnapshotMap. If there are multiple write operations to the same object and the data already exists in the SnapshotMap, nothing more needs to be done for the snapshot.
Similarly for reads: if a snapshot is running and the incoming request comes from the snapshot layer, check whether there is anything in the SnapshotMap for the key. If not, return the current value from the Store.
The Store should also fetch the list of keys in its store attribute and iterate through them.
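A rough sketch of these read and write paths, extending the earlier DummyStore sketch with a hypothetical activeSnapshots map and SnapshotGet/TempGet helpers (the key iteration itself is sketched in the next section):
type DummyStore struct {
	mu              sync.RWMutex
	store           map[string]interface{}
	activeSnapshots map[uint64]*SnapshotMap // SnapshotID -> the snapshot's SnapshotMap
}

// Put preserves the previous value in every active SnapshotMap before
// overwriting it, so each snapshot still sees the point-in-time data.
func (d *DummyStore) Put(key string, value interface{}) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if old, exists := d.store[key]; exists {
		for _, sm := range d.activeSnapshots {
			sm.TempAdd(key, old) // TempAdd skips keys it has already preserved
		}
	}
	d.store[key] = value
}

// SnapshotGet is the read path used by the snapshot layer: it prefers the
// preserved value in the SnapshotMap and falls back to the live value.
func (d *DummyStore) SnapshotGet(id uint64, key string) (interface{}, bool) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	if sm, ok := d.activeSnapshots[id]; ok {
		if v, ok := sm.TempGet(key); ok {
			return v, true
		}
	}
	v, ok := d.store[key]
	return v, ok
}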
StopSnapshot
When the Store object has finished iterating through all the keys, it calls the StopSnapshot method. StopSnapshot lets the SnapshotMap know that no more updates are coming. The SnapshotMap then talks to the PITFlusher to finish syncing all the chunks to disk, and the main snapshot process is then closed.
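Continuing the sketch, the key iteration and the stop path could look roughly like this; Buffer and FlushAllData are the SnapshotMap calls named in the flow above, with assumed signatures, and snapshotAllKeys is a hypothetical helper:
// snapshotAllKeys walks every key in the store, buffers its point-in-time
// value into the SnapshotMap, and then stops the snapshot. Running it in a
// goroutine from StartSnapshot is an assumption.
func (d *DummyStore) snapshotAllKeys(id uint64, sm *SnapshotMap) error {
	d.mu.RLock()
	keys := make([]string, 0, len(d.store))
	for k := range d.store {
		keys = append(keys, k)
	}
	d.mu.RUnlock()

	for _, k := range keys {
		// Prefer the preserved value if the key was overwritten after the
		// snapshot started; otherwise the live value is the snapshot value.
		if v, ok := d.SnapshotGet(id, k); ok {
			sm.Buffer(k, v)
		}
	}
	return d.StopSnapshot(id)
}

// StopSnapshot removes the snapshot from the active set and tells the
// SnapshotMap that no more updates are coming.
func (d *DummyStore) StopSnapshot(id uint64) error {
	d.mu.Lock()
	sm, ok := d.activeSnapshots[id]
	if !ok {
		d.mu.Unlock()
		return fmt.Errorf("no active snapshot with ID %d", id)
	}
	delete(d.activeSnapshots, id)
	d.mu.Unlock()

	return sm.FlushAllData() // flush the remaining buffer and close the flusher
}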
Point-in-time Flusher
The PITFlusher serializes the store updates coming from the SnapshotMap into a binary format, currently gob, and appends them to a file.
Implementation
Let's write some code now.
The main snapshot object is defined as follows:
type (
	SnapshotStore interface {
		StartSnapshot(uint64, Snapshotter) error
		StopSnapshot(uint64) error
	}
	PointInTimeSnapshot struct {
		ctx         context.Context
		cancelFunc  context.CancelFunc
		ID          uint64
		store       SnapshotStore
		SnapshotMap *SnapshotMap
		flusher     *PITFlusher
		StartedAt   time.Time
		EndedAt     time.Time
		exitCh      chan bool
	}
)
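As a rough sketch of how these fields could be wired together (NewPITFlusher and NewSnapshotMap are hypothetical helpers, not names from the actual code):
func NewPointInTimeSnapshot(ctx context.Context, id uint64, store SnapshotStore) (*PointInTimeSnapshot, error) {
	ctx, cancel := context.WithCancel(ctx)

	// NewPITFlusher is assumed to open the on-disk snapshot file and start the
	// flush loop; NewSnapshotMap is assumed to point the map at that flusher.
	flusher, err := NewPITFlusher(ctx, id)
	if err != nil {
		cancel()
		return nil, err
	}

	return &PointInTimeSnapshot{
		ctx:         ctx,
		cancelFunc:  cancel,
		ID:          id,
		store:       store,
		SnapshotMap: NewSnapshotMap(flusher),
		flusher:     flusher,
		StartedAt:   time.Now(),
		exitCh:      make(chan bool),
	}, nil
}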
Every snapshot object has an ID which is used as the identity of the snapshot.
The snapshot object has two underlying tasks:
- SnapshotMap
- Flusher
The SnapshotMap is a temporary data store that the actual store object is given a reference to. The DummyStore object adds data to the SnapshotMap using TempAdd whenever there are writes during a snapshotting process. While taking a snapshot, it checks whether the data is present in the SnapshotMap because of writes made after the snapshotting process was started.
type (
	SnapshotMap struct {
		tempRepr  map[string]interface{}
		buffer    []StoreMapUpdate
		flusher   *PITFlusher
		closing   bool
		mLock     *sync.RWMutex
		totalKeys uint64
	}
	StoreMapUpdate struct {
		Key   string
		Value interface{}
	}
)
The SnapshotMap batches the writes as an array of StoreMapUpdate objects and passes them to the Flusher once the batch size of 1000 updates is reached.
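A rough sketch of TempAdd and the batching path; the flushBatchSize constant, the TempGet helper, and the exact locking are assumptions:
const flushBatchSize = 1000

// TempAdd preserves the pre-write value of a key. If the key was already
// preserved earlier in this snapshot, the first (older) value wins.
func (s *SnapshotMap) TempAdd(key string, value interface{}) {
	s.mLock.Lock()
	defer s.mLock.Unlock()
	if _, ok := s.tempRepr[key]; ok {
		return // already preserved for this snapshot; nothing to do
	}
	s.tempRepr[key] = value
}

// TempGet returns the preserved value for a key, if any; the snapshot read
// path checks this before falling back to the live store.
func (s *SnapshotMap) TempGet(key string) (interface{}, bool) {
	s.mLock.RLock()
	defer s.mLock.RUnlock()
	v, ok := s.tempRepr[key]
	return v, ok
}

// Buffer appends one point-in-time key/value to the current batch and hands
// the batch to the flusher once it reaches flushBatchSize updates.
func (s *SnapshotMap) Buffer(key string, value interface{}) {
	s.mLock.Lock()
	s.buffer = append(s.buffer, StoreMapUpdate{Key: key, Value: value})
	s.totalKeys++
	var batch []StoreMapUpdate
	if len(s.buffer) >= flushBatchSize {
		batch = s.buffer
		s.buffer = nil
	}
	s.mLock.Unlock()

	if batch != nil {
		s.flusher.updatesCh <- batch // the PITFlusher serializes and writes it
	}
}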
The Flusher receives data in batches from the SnapshotMap and serializes it into a binary format. In this example I am using encoding/gob for the encoding; I am planning to move it to Protobuf. The Flusher then appends the serialized updates to an open snapshot file.
type (
	PITFlusher struct {
		ctx        context.Context
		snapshotID uint64
		updatesCh  chan []StoreMapUpdate
		exitCh     chan bool
		dlq        [][]StoreMapUpdate
		totalKeys  uint64
		flushFile  *os.File
	}
)
Once the process is complete, the Flusher syncs and closes the file.
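A hypothetical sketch of that flush loop, assuming encoding/gob and a single encoder over the open file:
// run drains batches from updatesCh, gob-encodes each one onto the snapshot
// file, and on exit syncs and closes the file. Failed batches are kept in the
// dlq so they are not silently lost. Note that custom concrete types stored in
// the Value interface{} field need to be registered with gob.Register.
func (f *PITFlusher) run() {
	enc := gob.NewEncoder(f.flushFile) // one encoder so the file decodes as a single stream
	for {
		select {
		case batch := <-f.updatesCh:
			if err := enc.Encode(batch); err != nil {
				f.dlq = append(f.dlq, batch)
				continue
			}
			f.totalKeys += uint64(len(batch))
		case <-f.exitCh:
			f.flushFile.Sync()
			f.flushFile.Close()
			return
		case <-f.ctx.Done():
			f.flushFile.Sync()
			f.flushFile.Close()
			return
		}
	}
}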
Test cases and benchmarks
- Snapshot data less than the buffer size without any subsequent writes
- Snapshot data less than the buffer size with localized subsequent writes
- Snapshot data less than the buffer size with spread out subsequent writes
- Snapshot data more than the buffer size without any subsequent writes
- Snapshot data more than the buffer size with localized subsequent writes
- Snapshot data more than the buffer size with spread out subsequent writes
- Ensure the current get path is not affected
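As an illustration, the first scenario could be shaped roughly like this; newTestStore, takeSnapshot, and readSnapshotFile are hypothetical helpers, not the actual test code:
func TestSnapshotBelowBufferSizeNoWrites(t *testing.T) {
	// Fewer keys than the 1000-update flush batch, and no concurrent writes.
	store := newTestStore(500)

	snap := takeSnapshot(t, store)

	// Every key in the store should have made it into the snapshot file.
	got := readSnapshotFile(t, snap)
	if len(got) != 500 {
		t.Fatalf("expected 500 keys in snapshot, got %d", len(got))
	}
}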
Results
=== RUN TestSnapshotWithoutChangesWithNoRangeAccess
2025/02/17 09:10:38 Closing snapshot 281711000 . Total time taken 203.7495ms for total keys 1000000
--- PASS: TestSnapshotWithoutChangesWithNoRangeAccess (0.67s)
=== RUN TestSnapshotWithoutChangesWithLowRangeAccess
2025/02/17 09:10:39 Closing snapshot 896770000 . Total time taken 248.569833ms for total keys 1000000
--- PASS: TestSnapshotWithoutChangesWithLowRangeAccess (0.66s)
=== RUN TestSnapshotWithChangesWithLowRangeAccess
2025/02/17 09:10:40 Closing snapshot 554695000 . Total time taken 837.278666ms for total keys 1000000
--- PASS: TestSnapshotWithChangesWithLowRangeAccess (1.25s)
=== RUN TestNoSnapshotWithChangesWithLowRangeAccess
--- PASS: TestNoSnapshotWithChangesWithLowRangeAccess (1.15s)
It runs pretty fast when there are no writes while snapshotting: around 190-230ms to snapshot and write 1000000 keys to disk without simultaneous writes. It definitely slows down when there are writes in the system: around 800-900ms to snapshot and write 1000000 keys to disk with low-range simultaneous writes. This was assumed as much in the linked post, since there are two additional read and write operations to maintain both copies. It can be improved, as the temporary snapshot store I am using is a little inefficient, but that may not matter since it finishes in under a second most of the time.
Hope you liked reading the article.
Please reach out to me here for more ideas or improvements.