Problems with a global lock space#

I was recently working on an in-memory datastore with a flat layout of the underlying data structure. The datastore contained key-value pairs, similar to a hash. In the MVP phase of the implementation, concurrent access was controlled by taking a global lock on the entire datastore for every operation.

A global lock on the datastore meant that every datastore operation had to wait for the read/write lock to become available. This slowed down all operations because of the additional wait time, and it also let expensive queries affect unrelated ones, since even inexpensive queries had to wait for the lock to be freed.
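To make the starting point concrete, the MVP looked roughly like the sketch below: a single sync.RWMutex guarding the whole map. The GlobalStore type and its methods are illustrative names for this post, not the actual implementation.

import "sync"

// A minimal sketch of the MVP: one RWMutex guarding the entire datastore.
// GlobalStore and its methods are illustrative, not the real implementation.
type GlobalStore struct {
	mu   sync.RWMutex
	data map[string]string
}

func (s *GlobalStore) Get(key string) (string, bool) {
	s.mu.RLock() // every read waits on the single global lock
	defer s.mu.RUnlock()
	val, ok := s.data[key]
	return val, ok
}

func (s *GlobalStore) Set(key, value string) {
	s.mu.Lock() // a single slow write blocks every other operation
	defer s.mu.Unlock()
	s.data[key] = value
}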

This required us to divide the lock space into smaller buckets/stores so that the blast radius of a query is scoped to its bucket alone. A higher number of buckets allows a higher number of concurrent requests. However, a very high number of buckets also leads to a large lock space, which means more CPU cycles are spent finding the relevant lock for a bucket.

A high-level approach that we will use here:

  1. Define the number of buckets, let’s call it N.
  2. For every key access to the datastore, hash the key.
  3. Using the hash, find the allocated bucket by calculating the modulo with N.
  4. Find the lock object (a sync.RWMutex) in the bucket.
  5. Invoke the appropriate Lock method.

At first glance, it's a simple approach: use a hashing function, e.g. the FNV (Fowler-Noll-Vo) algorithm, to convert the provided string key into a hash.

Once the hash is calculated, check whether a Golang map already holds a lock object for that slot. If it does, return the lock object.
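A minimal sketch of that map-based lookup, assuming a hypothetical DefaultBucketCount constant and getLock helper (neither is part of the final implementation), could look like this:

import (
	"hash/fnv"
	"sync"
)

// DefaultBucketCount and getLock are illustrative; the map itself would also
// need its own synchronization once it is accessed concurrently.
const DefaultBucketCount = 32

var bucketLocks = make(map[uint32]*sync.RWMutex, DefaultBucketCount)

func getLock(key string) *sync.RWMutex {
	// Hash the key with FNV-1a and map it to a bucket via modulo.
	hashFn := fnv.New32a()
	hashFn.Write([]byte(key)) // Write on an fnv hash never returns an error
	slot := hashFn.Sum32() % DefaultBucketCount

	// Return the existing lock for the bucket, or lazily create one.
	if lock, ok := bucketLocks[slot]; ok {
		return lock
	}
	lock := &sync.RWMutex{}
	bucketLocks[slot] = lock
	return lock
}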

Arrays or Maps?#

As we can see from the above example, using a Golang map results in a very simple implementation. However, let's discuss whether a map is the right data structure to use. A map is a generic, dynamically allocated data structure that offers little control over how its data is laid out in memory.

Let's run a simple benchmark using maps and arrays to see how long it takes to fetch keys from each. We will benchmark the code below with different numbers of keys:

  • Small — 10 keys
  • Medium — 1000 keys
  • Large — 1000000 keys

// Benchmarking Map key fetch
func BenchmarkSmallMapFetch(b *testing.B) {
	var (
		smallMap = make(map[int]uint32, 10)
	)

	for i := 0; i < 10; i++ {
		smallMap[i] = uint32(i)
	}

	for i := 0; i < b.N; i++ {
		for innerIdx := 0; innerIdx < 10; innerIdx++ {
			_ = smallMap[innerIdx]
		}
	}
}

// Benchmarking Array key fetch
func BenchmarkSmallArrayFetch(b *testing.B) {
	var (
		smallArray [10]uint32
	)

	for i := 0; i < 10; i++ {
		smallArray[i] = uint32(i)
	}

	for i := 0; i < b.N; i++ {
		for innerIdx := 0; innerIdx < 10; innerIdx++ {
			_ = smallArray[innerIdx]
		}
	}
}

The results of the benchmarks are as follows:

BenchmarkSmallArrayFetch-10 264653072 4.508 ns/op
BenchmarkMediumArrayFetch-10 3629518 335.8 ns/op
BenchmarkLargeArrayFetch-10 3643 324678 ns/op

BenchmarkSmallMapFetch-10 23385183 49.75 ns/op
BenchmarkMediumMapFetch-10 241534 5011 ns/op
BenchmarkLargeMapFetch-10 27 39847242 ns/op

If you compare the results, fetching data from maps is at least 10 times slower than from arrays. As the number of data points increases, maps fall further behind, in some cases becoming over 100x slower.

There are various reasons why arrays are faster than maps in common use cases. An array is a contiguous block of memory (often on the stack), whereas a map's buckets are allocated on the heap and reached through extra indirection, so lookups lose memory locality. Given the sequential layout of an array, reads align well with CPU cache lines, resulting in faster access. Maps are also dynamically sized, which adds overhead in Golang's runtime for hashing, memory allocation, and garbage collection.

Implementing striped locks using Arrays#

We introduce the following models to the implementation:

  • Lock
  • LockStore
  • LockHasher

type Lock struct {
	mutex    *sync.RWMutex
	name     LockName
	refCount int32 // active references; updated atomically (see the Lock methods below)
}

type LockStore struct {
	hash      [32]*Lock
	lockCount uint8
}

type LockHasher struct {
	ctx         context.Context
	concurrency uint32
	lockStores  [DefaultLockConcurrency]*LockStore
}

`Lock` encapsulates the actual mutex. It also stores the name of the lock, and `refCount` tracks the active references to the lock.

LockStore stores the locks in a fixed-size array, indexed by the lock name.

LockHasher controls and stores the number of buckets/slots, and uses the key to route each request to a specific LockStore.

After that, we expose simple methods for operations on all the above objects.

// ================== LockHasher ==================
// GetHash maps a string key to a bucket slot: FNV-1a hash modulo the
// configured concurrency.
func (lockHsh *LockHasher) GetHash(strKey string) (hashSlot uint32, err error) {
	var (
		hashFn hash.Hash32
	)
	if strKey == "" {
		strKey = DefaultLockIdentifier
	}
	hashFn = fnv.New32a()
	if _, err = hashFn.Write([]byte(strKey)); err != nil {
		return
	}
	hashSlot = hashFn.Sum32() % lockHsh.concurrency
	return
}

// GetLockStore returns the LockStore responsible for the given key.
func (lockHsh *LockHasher) GetLockStore(strKey string) (lockH *LockStore, err error) {
	var (
		hashSlot uint32
	)
	if hashSlot, err = lockHsh.GetHash(strKey); err != nil {
		return
	}
	if lockH = lockHsh.lockStores[hashSlot]; lockH == nil {
		err = fmt.Errorf("lock store not found for %s", strKey)
		return
	}
	return
}

// ================== LockStore ==================
// setup pre-populates the store with the locks it is expected to serve.
func (lockSt *LockStore) setup() (err error) {
	availableLocks := []LockName{
		LockOne,
		LockTwo,
	}
	for _, lockName := range availableLocks {
		if _, err = lockSt.AddLock(lockName); err != nil {
			return
		}
	}
	return
}

// AddLock registers a new lock under the given name, erroring if the slot is taken.
func (lockSt *LockStore) AddLock(name LockName) (lock *Lock, err error) {
	lock = &Lock{
		mutex:    &sync.RWMutex{},
		name:     name,
		refCount: 0,
	}
	if lockSt.hash[uint8(lock.name)] != nil {
		err = fmt.Errorf("slot already filled for %d", lock.name)
		return
	}
	lockSt.hash[uint8(lock.name)] = lock
	lockSt.lockCount++
	return
}

// GetLock fetches the lock registered under the given name.
func (lockSt *LockStore) GetLock(name LockName) (lock *Lock, err error) {
	if lock = lockSt.hash[uint8(name)]; lock == nil {
		err = fmt.Errorf("lock not found for %d", name)
		return
	}
	return
}

// RemoveLock deregisters the lock stored under the given name.
func (lockSt *LockStore) RemoveLock(name LockName) (err error) {
	if lockSt.hash[uint8(name)] == nil {
		err = fmt.Errorf("lock not found for %d", name)
		return
	}
	lockSt.hash[uint8(name)] = nil
	lockSt.lockCount--
	return
}

// ================== Lock ==================
// refCount is updated via sync/atomic because multiple readers can hold
// RLock concurrently and would otherwise race on the counter.
func (lock *Lock) WLock() {
	lock.mutex.Lock()
	atomic.AddInt32(&lock.refCount, 1)
}

func (lock *Lock) WUnlock() {
	lock.mutex.Unlock()
	atomic.AddInt32(&lock.refCount, -1)
}

func (lock *Lock) RLock() {
	lock.mutex.RLock()
	atomic.AddInt32(&lock.refCount, 1)
}

func (lock *Lock) RUnlock() {
	lock.mutex.RUnlock()
	atomic.AddInt32(&lock.refCount, -1)
}
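
To tie the pieces together, here is a sketch of how the hasher might be constructed and used end to end. NewLockHasher and writeWithLock are assumed helpers for illustration; the original code does not include a constructor.

// NewLockHasher is an assumed constructor for this sketch: it allocates one
// LockStore per slot and pre-populates each store with its locks.
func NewLockHasher(ctx context.Context) (lockHsh *LockHasher, err error) {
	lockHsh = &LockHasher{
		ctx:         ctx,
		concurrency: DefaultLockConcurrency,
	}
	for idx := range lockHsh.lockStores {
		store := &LockStore{}
		if err = store.setup(); err != nil {
			return
		}
		lockHsh.lockStores[idx] = store
	}
	return
}

// Example call site: route the key to its store, pick a named lock, and lock it.
func writeWithLock(lockHsh *LockHasher, key string) (err error) {
	var (
		store *LockStore
		lock  *Lock
	)
	if store, err = lockHsh.GetLockStore(key); err != nil {
		return
	}
	if lock, err = store.GetLock(LockOne); err != nil {
		return
	}
	lock.WLock()
	defer lock.WUnlock()
	// ... perform the write against the bucket owned by this store ...
	return
}

Note that this sketch allocates every LockStore eagerly up front, which keeps GetLockStore allocation-free on the hot path.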

The above example is a naive first pass at striped locks. We will go into more complex implementations in subsequent articles.

Next steps#

This is the first blog in a series of articles covering the process of building an in-memory datastore, starting from the basics and working up to complex, memory-optimised datastores. Follow along!!