Crate cachedb


In-memory Key/Value store with LRU expire and concurrent access

Description

Items are stored in N sharded/bucketized HashMaps to improve concurrency. Every Item is always behind a RwLock. Querying an item returns a guard associated with this lock. Items that are not locked are kept in a list to implement a least-recently-used expire policy. Locked items are removed from that LRU list and put back onto it when they become unlocked. Locked Items will not block the hosting HashMap.
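
The layout described above can be sketched roughly as follows. This is an illustrative, simplified model only; the type and field names are made up for this sketch and are not the crate's actual definitions:

    use std::collections::{HashMap, VecDeque};
    use std::sync::{Mutex, RwLock};

    // Each entry lives behind its own RwLock, so holding a guard on one item
    // does not block the HashMap of the bucket that hosts it.
    struct Entry<V> {
        value: RwLock<V>,
    }

    // One bucket: a locked HashMap of boxed entries plus the per-bucket LRU
    // list of keys whose entries are currently unlocked (not in use).
    struct Bucket<K, V> {
        map: Mutex<HashMap<K, Box<Entry<V>>>>,
        lru: Mutex<VecDeque<K>>,
    }

    // The store itself is just N such buckets; a key is bucketized to pick
    // which bucket it lives in.
    struct ShardedCache<K, V, const N: usize> {
        buckets: [Bucket<K, V>; N],
    }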

LRU List and expire configuration

Items that are not in use are pushed onto the tail of a least-recently-used list. Whenever a CacheDb decides to expire Items, these are taken from the head of the LRU list and dropped.

There are some configuration options to tune the caching behavior. These configurations are all per-bucket, not global. Thus one has to take the number of buckets into account when configuring the CacheDb, for example as sketched below. When exact accounting (for example for ‘highwater’) is needed, one must use a single-bucket CacheDb.
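
As a hypothetical example of that per-bucket accounting (the numbers and constant names below are made up for illustration): to approximate a global limit of 100000 entries on a store with 16 buckets, each per-bucket limit has to be set to roughly 100000/16:

    // Hypothetical numbers: per-bucket limits must take the bucket count into
    // account, since all configuration is accounted per bucket, not globally.
    const BUCKETS: usize = 16;
    const GLOBAL_HIGHWATER: usize = 100_000;                        // desired total entries
    const PER_BUCKET_HIGHWATER: usize = GLOBAL_HIGHWATER / BUCKETS; // 6_250 per bucket

    // With more than one bucket this stays approximate; for exact accounting
    // (as noted above) a single-bucket CacheDb is needed.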

It is also important to know that the min/max capacity configurations account against the capacity of the underlying containers, not the number of used entries. This allows for better memory utilization after a container got resized, but it should be respected so that caching won’t make the containers grow overly large.

The default configuration is chosen to create a rather generic cache that may grow pretty huge and is conservative about expiring entries. Thus it should be usable as-is for many use cases. If required, max_capacity_limit or highwater are the first knobs to tune.

Implementation Discussion

The HashMap stores the Items in Boxed entries. Entries protect the actual item with a RwLock. The API allows access to items only through these locks, returning wrapped guards thereof. Since concurrent access to the Entries shall not block the HashMap, some ‘unsafe’ code is required to implement hand-over-hand locking, which is hidden behind a safe API.

New Items are constructed atomically by passing a closure producing the item to the respective lookup function. While an Item is being constructed it holds a write lock, which ensures that on concurrent construction/queries only one constructor wins and all others acquire the newly constructed item.
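
A simplified sketch of that construct-under-the-write-lock pattern is shown below. It deliberately uses an Arc and an Option slot instead of the crate's hand-over-hand locking, and the function name is made up; it only illustrates the idea that exactly one constructor closure runs while every other caller waits on the entry lock and then sees the finished value:

    use std::collections::HashMap;
    use std::hash::Hash;
    use std::sync::{Arc, Mutex, RwLock};

    fn get_or_construct<K: Hash + Eq, V>(
        map: &Mutex<HashMap<K, Arc<RwLock<Option<V>>>>>,
        key: K,
        construct: impl FnOnce() -> V,
    ) -> Arc<RwLock<Option<V>>> {
        // Find or insert the entry while holding the map lock, then release
        // the map lock before touching the entry lock.
        let entry = {
            let mut m = map.lock().unwrap();
            m.entry(key)
                .or_insert_with(|| Arc::new(RwLock::new(None)))
                .clone()
        };

        // Whoever takes the write lock first and finds the slot empty runs
        // the constructor exactly once; every other caller blocks on the lock
        // and then sees the already constructed value.
        {
            let mut slot = entry.write().unwrap();
            if slot.is_none() {
                *slot = Some(construct());
            }
        }
        entry
    }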

Proof that no lifetime guarantees are violated

This is actually simple: the returned guard has a Rust lifetime bound to the CacheDb object. Thus no access can outlive the hosting collection.

Proof that no data races exist

In most parts the Mutex and RwLock ensure that no data races can happen; this is validated by Rust.

The unsafe part of the implementation detaches a LockGuard from its hosting collection to free the mutex on the HashMap. This could lead to potential UB when the HashMap drops a value that is still in use/locked. However, this can never happen because there is no way to drop Entries in an uncontrolled way. The guard lifetimes are tied to the hosting HashMap, so they cannot outlive it. Dropping items from the HashMap is only done from the LRU list, which never contains locked (and thus in-use) Entries. The ‘remove(key)’ member function explicitly checks that an Entry is not in use, or delays the removal until all locks on the Item are released.

While the HashMap may reallocate its tables and thus move the Boxes containing the Entries around, this is not a problem since the lock guards contain references to the Entries directly, not to the outer Box.
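
This point can be illustrated in isolation with plain std types (a standalone example, not crate code): the heap allocation behind a Box keeps its address even when the collection owning the Box reallocates and moves the Box itself around.

    use std::collections::HashMap;

    fn main() {
        let mut map: HashMap<u32, Box<u64>> = HashMap::new();
        map.insert(1, Box::new(42));

        // Address of the heap value behind the Box before the map grows.
        let before = &*map[&1] as *const u64;

        // Force the table to reallocate a few times.
        for i in 2..10_000u32 {
            map.insert(i, Box::new(u64::from(i)));
        }

        let after = &*map[&1] as *const u64;
        // The Box pointer may have moved inside the table, but the value it
        // points to has not; references to that value stay valid as long as
        // the Box itself is not dropped, which is what the 'unsafe' part
        // described above relies on.
        assert_eq!(before, after);
    }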

Proof that locking is deadlock free

Locks acquired in the same order can never deadlock. Deadlocks happen only when two or more threads wait on a resource while already holding a resource another thread is trying to obtain.

On lookup the HashMap will be locked. When the element is found, the LRU list is locked and the element may be removed from it (when it was not in use). Once done with the LRU list, its lock is released.
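
Sketched in simplified Rust (reusing the hypothetical types from the Description above and re-declared here so the sketch stands alone; this is not the crate's real code), the lookup path takes its locks in this fixed order:

    use std::collections::{HashMap, VecDeque};
    use std::hash::Hash;
    use std::sync::{Mutex, RwLock};

    struct Entry<V> { value: RwLock<V> }
    struct Bucket<K, V> {
        map: Mutex<HashMap<K, Box<Entry<V>>>>,
        lru: Mutex<VecDeque<K>>,
    }

    fn with_entry<K: Hash + Eq, V, R>(
        bucket: &Bucket<K, V>,
        key: &K,
        f: impl FnOnce(&V) -> R,
    ) -> Option<R> {
        // 1. Lock the bucket's HashMap and find the entry.
        let map = bucket.map.lock().unwrap();
        let entry = map.get(key)?;

        // 2. Lock the LRU list, unlink the key (the entry is now in use) and
        //    release the LRU lock again before taking any further lock.
        {
            let mut lru = bucket.lru.lock().unwrap();
            lru.retain(|k| k != key);
        }

        // 3. Lock the entry itself and run the caller's code under the read
        //    lock. (The real crate detaches this guard from the map with a
        //    bit of unsafe code so the map lock can be released here; this
        //    safe sketch simply keeps the map locked for the duration.)
        let guard = entry.value.read().unwrap();
        Some(f(&guard))
    }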

It is worth mentioning that code using the CacheDb can still deadlock when it acquires locks in the wrong order. The simplest advice is to hold only one exclusive lock at a time per thread. When that is impractical, one needs to carefully consider locking order or employ other tactics to counter deadlocks.

TESTS

The ‘test::multithreaded_stress’ test can be controlled by the following environment variables:

  • ‘STRESS_THREADS’ sets the number of threads to spawn. Defaults to 10.
  • ‘STRESS_WAIT’ threads randomly wait up to this many milliseconds to fake some work. Defaults to 5.
  • ‘STRESS_ITERATIONS’ how many iterations each thread shall do. Defaults to 100.
  • ‘STRESS_RANGE’ how many unique keys the test uses. Defaults to 1000.

The default values are rather small to make the test suite complete in a short time. For dedicated stress testing, at least STRESS_ITERATIONS and STRESS_THREADS have to be increased significantly. Try ‘STRESS_ITERATIONS=10000 STRESS_RANGE=10000 STRESS_THREADS=10000’ for a harder test.
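
For illustration, such knobs are typically read with a small helper like the one below; this is an assumed sketch, not necessarily the crate's actual test code:

    // Read a numeric knob from the environment, falling back to a default.
    fn env_or(name: &str, default: usize) -> usize {
        std::env::var(name)
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(default)
    }

    fn main() {
        let threads = env_or("STRESS_THREADS", 10);
        let wait_ms = env_or("STRESS_WAIT", 5);
        let iterations = env_or("STRESS_ITERATIONS", 100);
        let range = env_or("STRESS_RANGE", 1000);
        println!("threads={threads} wait_ms={wait_ms} iterations={iterations} range={range}");
    }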

Structs

Marker for blocking locks, waits until the lock becomes available.
CacheDb implements the concurrent (bucketed) Key/Value store. Keys must implement ‘Bucketize’, which has laxer requirements than a full hash implementation. ‘N’ is the number of buckets to use. This is const because it incurs less dereferencing and management overhead. Buckets by themselves are not very expensive, thus it is recommended to use a generously large number here. Think about the expected number of concurrent accesses times four.
A Duration type to represent a span of time, typically used for system timeouts.
Guard for the read lock. Puts unused entries into the LRU list.
Guard for the write lock. Puts unused entries into the LRU list.
A measurement of a monotonically nondecreasing clock. Opaque and useful only with Duration.
Marker for recursive locking. Allows to obtain a read-lock multiple times by a single thread.
Marker for trying to obtain a lock in a fallible way.

Enums

The errors that CacheDb implements itself. Note that the constructors can return other errors as well (‘DynResult’ is returned in those cases).

Traits

Defines into which bucket a key falls. The default implementation uses the Hash trait for this. Custom implementations can override this with something simpler. Implementing this is recommended because a very good distribution of the resulting value is not as important as it is for the HashMap.
Collects the traits a Key must implement; any user-defined Key type must implement this trait and any traits it derives from. The ‘Debug’ trait is only required when the feature ‘logging’ is enabled.
Trait for implementing read/write flavors on Mutex. Note that there are no recursive locks for Mutex; use ReentrantMutex for that.
Trait for implementing read flavors on RwLocks.
Trait for implementing read/write flavors on ReentrantMutex.
Trait for implementing write flavors on RwLocks.

Type Definitions

Result type that boxes the error. Allows constructors to return arbitrary errors.
A mutual exclusion primitive useful for protecting shared data
An RAII implementation of a “scoped lock” of a mutex. When this structure is dropped (falls out of scope), the lock will be unlocked.
A mutex which can be recursively locked by a single thread.
An RAII implementation of a “scoped lock” of a reentrant mutex. When this structure is dropped (falls out of scope), the lock will be unlocked.
A reader-writer lock
RAII structure used to release the shared read access of a lock when dropped.
RAII structure used to release the exclusive write access of a lock when dropped.