Crate vec_belt

Crate vec_belt 

Source
Expand description

§VecBelt

VecBelt<T> is a high-performant, concurrent, lock-free data structure suitable for multi-threaded bulk-appending (takes a &self) and single-threaded consuming (takes a &mut self).

Under the hood, VecBelt<T> is implemented as a linked list of dynamically sized fragments that may hold several elements at once, queued by append(&self, ..). New fragments of the current length (hence an exponential size to reduce the amount of allocations necessary for many appends) are allocated as necessary. clear(&mut self, ..)-ing the VecBelt<T> merges all of these fragments into one large fragment able to contain all items contiguously, which is then passed to a consumer closure. This implies subsequent appends will be faster, as there will be substantially less fragment allocations due to the large size of the merged (and reused) fragment.

§VecBelt<T> vs Mutex<Vec<T>>

  • VecBelt<T> operates much faster than Mutex<Vec<T>>, except on very low contentions (in my case it was 4 threads times 4 appends per threads). See benchmarks below.
  • VecBelt<T> is non-blocking, i.e. it doesn’t invoke system calls to park or unpark threads in any way (worst case scenario, it yields to the CPU scheduler. Backed by crossbeam-utils’s exponential Backoff). This works because acquisitions to the sychronization primitive is extremely short-lived.
  • VecBelt<T> is append-only when accessed immutably from threads, while Mutex<Vec<T>> grants every operations.
  • VecBelt<T> may only append collections that coerce into a slice ([T] or &[T] where T: Copy), which notably don’t include most iterator adapters. This is because append(&self, ..) requires all transfer operations to not diverge (e.g. panic!(..)), otherwise the fragment slice may end up with uninitialized memory which will lead to undefined behavior when used or even dropped.

§Benchmarks

This benchmark was executed on a 13th Gen Intel(R) Core(TM) i5-13420H (12 CPUs), ~2.1 GHz running Windows 11 Pro 64-bit. Visualized results are in bench_result/report/index.html on GitHub.

The benchmark tests the append time on 4, 16, and 64 threads, each running 4, 16, 64, and 256 append operations for both VecBelt<usize> and Mutex<Vec<usize>>. Both data containers are initially backed by a buffer of length 32768, while each appends transfer data of length 64.

Tests are timed exactly when all worker threads are busy-waiting for append operations, and ended exactly after all operations are done (note that the worker threads might live a bit longer). All worker threads are then shut down outside of timings before moving on to the next tests.

§4 Threads

AppendsVecBelt<T> (avg.)Mutex<Vec<T>> (avg.)
443.305 µs43.938 µs
1644.387 µs46.197 µs
6452.276 µs61.159 µs
128109.80 µs170.24 µs

§16 Threads

AppendsVecBelt<T> (avg.)Mutex<Vec<T>> (avg.)
475.426 µs90.784 µs
16119.66 µs138.42 µs
64191.75 µs265.53 µs
128801.14 µs1.4654 ms

§64 Threads

AppendsVecBelt<T> (avg.)Mutex<Vec<T>> (avg.)
4322.48 µs369.00 µs
16460.20 µs546.95 µs
641.0637 ms1.8535 ms
1283.6124 ms6.9506 ms

§License

Licensed under either of

at your option.

Structs§

ConsumeIter
Owned iterator created by ConsumeSlice that may be used to read owned elements from the slice.
ConsumeSlice
An owned slice created by VecBelt::append. Unconsumed elements will be dropped.
VecBelt
A high-performant, concurrent, lock-free data structure suitable for multi-threaded bulk-appending (takes a &self) and single-threaded consuming (takes a &mut self).

Traits§

Transfer
Types that may be passed to append.

Functions§

transfer
Create a manually-implemented Transfer with a length and a transferer closure.