Crate kcas

source ·
Expand description

KCAS

A lock-free multi-word compare-and-swap library. It is no_std-compatible and optionally does not allocate on the heap. Because it only requires single-width atomic compare-and-swap, it is lock-free on most platforms, including RISC-V, AArch64, and x86-64.

Usage

Example

Here is an example using RefStateWrapper:

use kcas::{KCasWord, State, RefStateWrapper};
use kcas::Error;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread::{ScopedJoinHandle};

// Allocate memory on the stack for 2 threads, each operating on 3 words.
let state: State<2, 3> = State::new();

// each thread gets its own wrapper
let mut first_wrapper: RefStateWrapper<2, 3> = RefStateWrapper::construct(&state).unwrap();
let mut second_wrapper: RefStateWrapper<2, 3> = RefStateWrapper::construct(&state).unwrap();

let first_target: AtomicUsize = AtomicUsize::new(1);
let second_target: AtomicUsize = AtomicUsize::new(2);
let third_target: AtomicUsize = AtomicUsize::new(3);

// run k-CAS operations in two threads
std::thread::scope(|scope| {
    let first_handle: ScopedJoinHandle<Result<(), Error>> = scope.spawn(|| {
        first_wrapper.kcas([
            KCasWord::new(&first_target, 1, 4),
            KCasWord::new(&second_target, 2, 5),
            KCasWord::new(&third_target, 3, 6),
        ])
    });
    let second_handle: ScopedJoinHandle<Result<(), Error>> = scope.spawn(|| {
        second_wrapper.kcas([
            KCasWord::new(&first_target, 1, 7),
            KCasWord::new(&second_target, 2, 8),
            KCasWord::new(&third_target, 3, 9),
        ])
    });
});

// one of the two operations should succeed
assert!(
    (first_target.load(Ordering::Acquire) == 4
        && second_target.load(Ordering::Acquire) == 5
        && third_target.load(Ordering::Acquire) == 6
    ) || (
        first_target.load(Ordering::Acquire) == 7
        && second_target.load(Ordering::Acquire) == 8
        && third_target.load(Ordering::Acquire) == 9
    )
);

Details

Begin by instantiating a State, which is a struct shared between threads so threads can assist each other. State reserves all the memory it needs to share during initialization. State requires two generic const arguments:

  • NUM_THREADS, the maximum number of threads which can operate on that State at any given time, and
  • NUM_WORDS, the number of words to operate upon during each k-CAS operation.

See the [Limitations] section for limits on these parameters.

Next, wrap State in a “wrapper” implementation. Each wrapper reserves a thread id from the underlying State and thus is designed to execute k-CAS operations for exactly one thread at a time. There are three wrapper implementations:

  • RefStateWrapper’s constructor takes a &State where State typically lives on the stack. std::thread::scope allows borrowing RefStateWrapper across threads.
  • ArcStateWrapper’s constructor takes Arc<State> where State lives on the heap and the Arc is cloned and moved across threads. This can only be used in a std environment or a no_std environment which uses [alloc].
  • UnsafeStateWrapper’s constructor takes a NonNull<State> for greater control over where State lives and how it is used across threads.

Next, call the wrapper’s kcas method, which performs the k-CAS operation defined by its argument, a KCasWord array of size NUM_WORDS. Each individual CAS operation occurs in order.

The wrapper can be dropped once all k-CAS operations are finished for a particular thread. The drop call will return the wrapper’s thread id back into the pool of available thread ids.

Limitations

Target addresses must be used across threads in the same order

When two target addresses are passed into k-CAS operations in two threads in the opposite order, a deadlock can occur. As a result, any addresses used in two or more threads must be passed in the same order.

To illustrate, let’s say we would like a thread to perform a 3-CAS operation across three target addresses (1, 2, 3). If another thread is also performing operations, this second thread can operate on any of the first thread’s addresses individually in any position - for example, the second thread could operate across (4, 1, 5), (4, 5, 2), or (3, 4, 5). But, if the second thread would like to operate on two of the first thread’s addresses, those addresses must be specified in the same order. In this case, (1, 2, 4), (4, 1, 3), (1, 5, 3) are examples of valid orderings. However, (2, 1, 4) is not a valid ordering because 1 was specified before 2 in the first thread, but 1 was specified after 2 in the second thread.

The number of bits used by values bounds the size of NUM_THREADS

While there is no limit to the size of NUM_WORDS besides memory consumption, NUM_THREADS is limited by the number of value bits you would like to use in your application for real content. As an intermediate step, KCAS swaps tagged markers into the provided target addresses where the most significant bits are a thread id and the least significant bits are a sequence number. KCAS differentiates between tagged markers and real values by looking at the most significant bits: if they are all 0s or all 1s, then they are real values; otherwise, they are treated as tagged markers.

This limitation makes the KCAS library unsuitable for performing k-CAS operations on random values which must take up all bits of a usize. This also limits the number of threads available for performing k-CAS operations on pointers. For example, on 64-bit systems,

  • if you would like to perform k-CAS on 48-bit x86-64 pointers and the top 16 bits of the value will always be all 0s or all 1s, NUM_THREADS can be as high as 2^12 or 65536.
  • for 52-bit ARMv8.2 LVA pointers, you may set NUM_THREADS as high as 2^12 or 4096.
  • for 57-bit 5-level paging pointers, you may set NUM_THREADS as high as 2^7 or 128.

To programmatically determine how many value bits are reserved by the KCAS library, you may call get_bit_length_of_num_threads.

We plan to remove this limitation in the future by introducing an alternative implementation where tagged markers do not scale with the number of threads. This design will rely on a different mechanism for helping threads which should make the algorithm wait-free.

Structs

  • A wrapper for performing k-CAS operations which stores State inside an Arc. This makes it convenient to use ArcStateWrapper when threads might outlive the functions which created them.
  • A structure containing all the information needed to perform a single CAS operation.
  • A wrapper for performing k-CAS operations which stores State as a reference. This makes it suited for scenarios where threads will not outlive the functions which created them.
  • Attempted to convert a usize into a Stage but it was out of bounds: {0}
  • Holds KCAS operation state shared between all threads. This allows threads to help each other.
  • A wrapper for performing k-CAS operations which stores State as an unsafe NonNull. This makes it suited for scenarios where the user would like complete control over where the State lives and how it is made available across threads. The user is responsible for dropping State after all threads are finished using it.

Enums

Functions