Similar to `std::sync::RwLock`, yet it locks only on writes, and reads are wait-free.

`VLock` is for read-heavy scenarios. Updates are strictly serialized and may have to wait until readers of older data are finished. It works by keeping multiple versions of the shared data and counting references to each of the versions. It is named `VLock` because it uses versioning and behaves sort of like a lock. Naming is hard.
§Why bother?
`VLock` is a fast¹ and scalable lock with stable read performance for non-copy types, at the expense of slower writes.
It’s a common pattern among various sorts of systems: there is a number of processing threads, each making decisions via shared state, and some other code that updates that shared state once in a while. In network infrastructure, for example, this maps to data and control planes, be it in routers or in load balancers at various levels. In data-heavy applications, it can be a hot view of some data stored in a database.
If that roughly describes your use case, you may benefit from this library at the expense of keeping one or more extra copies of the data in memory, slightly more state, and slower write performance. The extra copies are usually necessary even when `RwLock` is used, to minimize time spent under the lock, and `VLock` actually reuses old versions, which may save you some time by avoiding allocations. The extra state is one `usize` plus a `usize` for every version… and there’s not much you can do about slower writes. If read performance is critical and readers can’t afford to wait for writes, you may benefit significantly.
As a bonus, the implementation is simple, with about 200 lines of actual code and no external dependencies.
§How does it work?
As mentioned above, it uses a combination of versioning and reference counting with a lock to serialize writes.
`VLock<T, N>` has a fixed size `N`, which is the maximum number of versions to allocate. A version is identified by its offset in the allocated space. A state holds both an offset and a counter in the same atomic. `VLock` keeps the current state, plus a state for every version.
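To make the "offset and counter in the same atomic" idea concrete, here is a minimal sketch of such a packed state word. The bit layout and names are illustrative assumptions, not `vlock`'s actual internals:

```rust
// Illustrative bit layout (an assumption, not vlock's actual internals):
// the low bits hold the version offset, the remaining bits hold the
// reader counter, so both can be read and updated in one atomic operation.
const OFFSET_BITS: u32 = 8; // enough for N up to 256 (an assumption)
const OFFSET_MASK: usize = (1 << OFFSET_BITS) - 1;

fn pack(offset: usize, count: usize) -> usize {
    (count << OFFSET_BITS) | (offset & OFFSET_MASK)
}

fn unpack(state: usize) -> (usize, usize) {
    (state & OFFSET_MASK, state >> OFFSET_BITS)
}

fn main() {
    // Round-trips: offset 3 with 42 outstanding readers.
    assert_eq!(unpack(pack(3, 42)), (3, 42));
    assert_eq!(unpack(pack(0, 0)), (0, 0));
}
```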
Every time `read` is called, the current state counter is incremented, and the associated offset is used to return the current version to the reader. After the returned pointer is dropped, the per-version state counter is decremented.
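The read path described above can be sketched as follows. This is a simplified model with assumed names and bit layout, not the crate's real code; the point is that a single `fetch_add` both registers the reader and reveals the current offset, which is why reads are wait-free:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumed layout: low bits are the version offset, high bits the counter.
const OFFSET_BITS: u32 = 8;
const OFFSET_MASK: usize = (1 << OFFSET_BITS) - 1;

// Begin a read: one fetch_add counts the reader in and returns the
// packed state, from which the current offset is extracted. No waiting.
fn read_begin(current: &AtomicUsize) -> usize {
    let state = current.fetch_add(1 << OFFSET_BITS, Ordering::Acquire);
    state & OFFSET_MASK
}

// End a read (what dropping the returned pointer would do): decrement
// the per-version counter. It may temporarily wrap below zero; the
// writer later credits the harvested count, balancing it back to zero.
fn read_end(counts: &[AtomicUsize], offset: usize) {
    counts[offset].fetch_sub(1, Ordering::Release);
}

fn main() {
    let current = AtomicUsize::new(2); // current offset 2, zero readers
    let counts: [AtomicUsize; 4] = std::array::from_fn(|_| AtomicUsize::new(0));
    let off = read_begin(&current);
    assert_eq!(off, 2);
    // One reader is now recorded in the current state's counter.
    assert_eq!(current.load(Ordering::Relaxed) >> OFFSET_BITS, 1);
    read_end(&counts, off);
}
```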
During `update`, the first unused version is picked by checking whether its per-version counter is 0; if nothing is available, a new version is initialized if the size allows. This is where waiting for readers happens. Once a free offset is found and its data is updated, the current state counter is reset and the new offset is written (both in one atomic operation). The counter value returned by that operation is then added to the per-version counter of the previously current version, so that counter reaches 0 once all readers are done with it.
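The publish-and-harvest step can be sketched like this, again with assumed names and bit layout rather than the crate's real internals. A single `swap` installs the new offset with a zeroed counter and harvests the old packed state in the same atomic operation:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumed layout: low bits are the version offset, high bits the counter.
const OFFSET_BITS: u32 = 8;
const OFFSET_MASK: usize = (1 << OFFSET_BITS) - 1;

// Publish a freshly written version: swap in the new offset with a
// zeroed counter, then credit the reads started on the old version
// to that version's own counter.
fn publish(current: &AtomicUsize, counts: &[AtomicUsize], new_offset: usize) {
    let old = current.swap(new_offset & OFFSET_MASK, Ordering::AcqRel);
    let (old_offset, started) = (old & OFFSET_MASK, old >> OFFSET_BITS);
    // Once every such reader has decremented, this balances to zero.
    counts[old_offset].fetch_add(started, Ordering::AcqRel);
}

// A version is reusable when its counter has balanced out to zero.
fn is_free(counts: &[AtomicUsize], offset: usize) -> bool {
    counts[offset].load(Ordering::Acquire) == 0
}

fn main() {
    let current = AtomicUsize::new(0); // current offset 0, zero readers
    let counts: [AtomicUsize; 2] = std::array::from_fn(|_| AtomicUsize::new(0));
    // A reader starts on version 0...
    current.fetch_add(1 << OFFSET_BITS, Ordering::Acquire);
    // ...the writer publishes version 1, harvesting that one started read...
    publish(&current, &counts, 1);
    assert!(!is_free(&counts, 0)); // one reader still outstanding
    // ...and once the reader drops, version 0 is reusable.
    counts[0].fetch_sub(1, Ordering::Release);
    assert!(is_free(&counts, 0));
}
```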
In other words, tracking access can be thought of as balancing two counters: one incrementing in the current state, marking the start of read access to the data, and the other decrementing in the per-version state, marking the end of that access. The two are then added together. If the resulting count is greater than 0, there are still some readers; if it is 0, the version is definitely unused.
Old versions are not dropped, so the lock acts like a pool. That may be handy when dealing with heap-allocated data structures that have to be updated or rebuilt, but it may also leave some old garbage lying around in memory.
§Usage
```rust
use std::sync::Arc;
use std::thread;
use std::time;

use vlock::VLock;

let lock = Arc::new(VLock::<String, 4>::new(String::from("hi there!")));
let lock_clone = Arc::clone(&lock);
let t = thread::spawn(move || {
    for _ in 0..5 {
        println!("{}", *lock_clone.read());
        thread::sleep(time::Duration::from_millis(1));
    }
    lock_clone.update(
        |_, value| {
            value.clear();
            value.push_str("bye!");
        },
        String::new,
    );
});
thread::sleep(time::Duration::from_millis(2));
lock.update(
    |_, value| {
        value.clear();
        value.push_str("here's some text for you");
    },
    String::new,
);
if let Err(err) = t.join() {
    println!("thread has failed: {err:?}");
}
assert_eq!(*lock.read(), "bye!");
```
¹ Based on synthetic benchmarks on x86_64 laptops, read performance was 1.3-2.0 times faster than `ArcSwap`, and may be an order of magnitude faster than the fastest `RwLock` implementation in certain cases. Writes of `VLock` are more efficient than `ArcSwap`. Compared to `RwLock`, writes are generally 1 to 10 times slower than the `parking_lot` implementation, but an improvement over the `std` implementation. With `SeqLock`, results were mixed: in some scenarios reads of `VLock` were 4 times slower, in some about 1:1, and in others 2 times quicker. Although write performance of `VLock` is significantly worse than that of `SeqLock`, it can be used for non-copy types. Note that write performance of `VLock` may degrade significantly when readers are not progressing and `N` is small; in other words, `VLock` is susceptible to write starvation by prioritizing reads.