Crate housekeeping

Expand description

Cleaning up while you’re away.

Reference counting (e.g. via Arc) is a common strategy for cleaning up resources shared between threads. However, it has limitations; reference counted objects cannot be (easily) stored in atomic variables, due to subtle race conditions. housekeeping is a concurrent memory reclaimer; it helps manage atomically-swappable resources and clean them up safely.

housekeeping is designed to run periodically, e.g. in between batches of executed tasks. It is inspired by seize and crossbeam_epoch, but is simpler and offers lower-level interfaces.

§Usage

Consider a program that needs to share atomically swappable data. At first glance, it may be implemented using Arc:

// The data stored is reference-counted with 'Arc'.
let data: Arc<Data> = Default::default();
let data_ptr = Arc::into_raw(data);
let data: AtomicPtr<Data> = AtomicPtr::new(data_ptr.cast_mut());

std::thread::scope(|s| {
    s.spawn(|| {
        // Replace the existing data.
        let new_data = Arc::new(somehow_compute());
        let new_data_ptr = Arc::into_raw(new_data);
        let old_data_ptr = data.swap(new_data_ptr.cast_mut(), Ordering::AcqRel);

        // Clean up the existing data.
        mem::drop(unsafe { Arc::from_raw(old_data_ptr) });
    });

    // Use the data.
    let data_ptr = data.load(Ordering::Acquire).cast_const();
    unsafe { Arc::increment_strong_count(data_ptr) }; // <-- unsound!!!
    let data = unsafe { Arc::from_raw(data_ptr) };
    somehow_consume(&data);
    mem::drop(data);
});

let _ = unsafe { Arc::from_raw(data.into_inner().cast_const()) };

Unfortunately, this is unsound. The spawned thread could drop old_data (causing it to be deallocated) in between the main thread’s atomic load and its call to Arc::increment_strong_count().

This situation calls for housekeeping. The code can be translated quite easily:

let cleanups = Arc::new(Cleanups::new());
// The data stored is protected by 'cleanups'.
let data: Box<Data> = Default::default();
let data_ptr = Box::into_raw(data);
let data: AtomicPtr<Data> = AtomicPtr::new(data_ptr);

std::thread::scope(|s| {
    s.spawn(|| {
        // Obtain a guard for safely handling concurrent memory.
        let guard = cleanups.clone().register();

        // Replace the existing data.
        let new_data: Box<Data> = Box::new(somehow_compute());
        let new_data_ptr = Box::into_raw(new_data);
        let old_data_ptr = data.swap(new_data_ptr, Ordering::AcqRel);

        // Defer the cleanup of the existing data.
        // SAFETY:
        // - '*mut Data' can be sent between threads safely.
        // - The closure is 'move' and is 'static.
        unsafe { guard.defer_unchecked(move || {
            mem::drop(Box::from_raw(old_data_ptr))
        }) };
    });

    // Obtain a guard for safely handling concurrent memory.
    let guard = cleanups.clone().register();

    // Use the data.
    let data_ptr = data.load(Ordering::Acquire).cast_const();
    // SAFETY: Safe as long as 'guard' is held.
    somehow_consume(unsafe { &*data_ptr });
});

let _ = unsafe { Box::from_raw(data.into_inner()) };

Every thread requires a Guard, which it can obtain by calling register(). While a thread is holding a Guard, it can safely access shared memory; shared memory visible to the thread will not be cleaned up. There is no need for reference counting, so Arc can be replaced with Box.

If a Guard is held for a long time, it should be refresh()-ed periodically. This is important for maintaining forward progress. If the guard cannot be refreshed (e.g. if a slow I/O operation is occurring), it should be dropped and a new guard should be registered afterwards.

§Rationale

seize and crossbeam_epoch have different goals and limitations from housekeeping. seize only supports atomic pointers, and prioritizes tail latency over average throughput. crossbeam_epoch supports other more atomic variables, but it has a complex implementation and does not support MIRI.

housekeeping has a very simple implementation, allows arbitrary resources to be cleaned up, and is compatible with MIRI and Loom. It offers flexible low-level interfaces (e.g. RawGuard) which give the user more control, e.g. with how cleanup actions are batched and stored.

§Crate Details

housekeeping does not have any dependencies; it is a small crate that does one thing and does it well. It is OS agnostic and no_std compatible, although it require atomic operations and heap allocations.

§Feature Flags

housekeeping has the following feature flags:

loom: Enable Loom support. This should only be enabled during testing. When this flag is enabled, housekeeping can only be used within Loom tests. Its atomic operations will be exposed to Loom so that different memory orderings can be explored.

§Algorithm

housekeeping implements quiescent state based reclamation (QSBR), which is quite similar to conventional epoch-based reclamation. It operates as follows:

During the execution of the program, objects may be shared; they may be part of a global view. For example, a concurrent hash table can be shared between threads, in which case it and its elements are globally visible. Sharing is transitive: anything accessible from a shared object is also considered shared.

Removing a shared object is non-trivial; multiple threads may be accessing it concurrently. The object cannot be dropped or recycled until all threads are guaranteed not to observe it. First, the object will be detached from the global view, so that any threads known to observe the updated global view cannot access it. Then, the object is retired (its drop/recycle is deferred) until a point where all threads are known not to observe it.

QSBR tracks the progress of the synchronization between threads. It measures progress in units of phases. At any time, every thread (more specifically, every guard) is using a particular phase. Each phase represents a snapshot of global memory; all threads using a phase will observe memory at least as recent as that snapshot.

When a guard is refreshed or dropped, it will try to make progress: if all threads are using the same phase, a new phase will be made (with a more recent snapshot of memory), and all threads will be directed to move to it. Eventually, shared objects that were removed in old phases are guaranteed not to be used by any threads, so cleanups deferred in those phases can be executed.

For efficiency, cleanups are collected in thread-local batches when they are being deferred. Guards hold a local batch that they accumulate into; when the batch is full, it is added to the global list for the current phase. By default, SimpleBatch is used, but a custom batch type can be specified to support different kinds of cleanup operations.

Most of the complexity of this crate is in the QSBR implementation (how phases are managed and the synchronizing operations involved). It is quite difficult to implement correctly, so it has been re-exported publicly (as the qsbr module) so others can build on it.

Modules§

qsbr: Core logic for QSBR.

Structs§

Cleanups: Deferred cleanup actions.
Guard: A guard for concurrent memory.
RawGuard: A raw guard for deferring cleanups.
SimpleBatch: A batch of simple cleanup functions.