Module architecture

Architecture overview of the underlying design of Refuse.

§Overview

Refuse is an incremental, tracing garbage collector. Incremental garbage collectors can only run when they know no threads are currently accessing collectable memory. This fits the access pattern of an RwLock: the collector can acquire a “write” lock to ensure that no other threads can read while it is running.

Originally, Refuse used an RwLock and a shared allocation arena. This did not perform well in multi-threaded benchmarks, so the global RwLock was replaced with atomic tracking of the number of currently acquired CollectionGuards and of whether the collector is currently trying to start collection.

Each thread allocates its own independent allocation arena and stores a copy of it in thread-local storage. It also registers a copy with the global collector. Refuse’s public API ensures that no access is provided to the local thread’s data without first having acquired a CollectionGuard. This ensures that the collector can guarantee exclusive access to the underlying data.
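
As a rough illustration, here is a minimal sketch of this per-thread registration pattern. All names (Arena, ALL_ARENAS, LOCAL_ARENA) are illustrative stand-ins, not Refuse’s actual types:

```rust
use std::sync::{Arc, Mutex, OnceLock};

#[derive(Default)]
struct Arena; // stand-in for the real per-thread allocation arena

// The global collector's view of every thread's arena.
static ALL_ARENAS: OnceLock<Mutex<Vec<Arc<Arena>>>> = OnceLock::new();

thread_local! {
    // Each thread lazily creates its arena and registers a handle globally.
    static LOCAL_ARENA: Arc<Arena> = {
        let arena = Arc::new(Arena::default());
        ALL_ARENAS
            .get_or_init(|| Mutex::new(Vec::new()))
            .lock()
            .unwrap()
            .push(arena.clone());
        arena
    };
}

fn main() {
    LOCAL_ARENA.with(|arena| {
        // In the real API, access to `arena` would require a CollectionGuard.
        let _ = arena;
    });
    assert_eq!(ALL_ARENAS.get().unwrap().lock().unwrap().len(), 1);
}
```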

§Allocation Arenas

Each thread is given its own allocation arena, which is a data structure designed for concurrently reading portions of its data while still being able to perform new allocations from the owning thread.

At the root of each arena is a map of types to type-erased Bin<T>s. A Bin<T> is the root of a linked list of Slabs<T>. Each Slabs<T> contains a list of Slab<T>s and an optional next Slabs<T>. Each Slab<T> holds 256 Slot<T>s. Each slot is a combination of the slot’s state and the slot’s data, as sketched below.
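
In Rust terms, the hierarchy looks roughly like the following; the field layouts are illustrative, not Refuse’s actual definitions:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::AtomicU64;

/// One allocation: packed state (generation, allocated flag, marking bits)
/// alongside the possibly-uninitialized value.
struct Slot<T> {
    state: AtomicU64,
    data: UnsafeCell<MaybeUninit<T>>,
}

/// A fixed-size block of 256 slots.
struct Slab<T> {
    slots: Box<[Slot<T>; 256]>,
}

/// A linked-list node holding a batch of slabs plus an optional next node,
/// so the list can grow while other threads read earlier nodes.
struct Slabs<T> {
    slabs: Vec<Slab<T>>,
    next: Option<Box<Slabs<T>>>,
}

/// The per-type root; the arena maps each type to a type-erased Bin<T>.
struct Bin<T> {
    slabs: Slabs<T>,
}

fn main() {} // definitions only; nothing to run
```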

The slot’s state is stored in an atomic and keeps track of the following (one possible bit layout is sketched after this list):

  • A 32-bit generation. When loading a Ref, its generation is validated to ensure the Ref still refers to the same allocation.
  • Whether the slot is allocated or not
  • Garbage collector marking bits
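
One plausible packing of this state into a single atomic word is shown below; the word width and exact bit assignments are assumptions for illustration, since the text only specifies what the state tracks:

```rust
// One assumed packing: low 32 bits = generation, bit 32 = allocated flag,
// remaining high bits = the collector's marking bits.
const GENERATION_MASK: u64 = 0xFFFF_FFFF;
const ALLOCATED_BIT: u64 = 1 << 32;
const MARK_SHIFT: u32 = 33;

fn generation(state: u64) -> u32 {
    (state & GENERATION_MASK) as u32
}

fn is_allocated(state: u64) -> bool {
    state & ALLOCATED_BIT != 0
}

fn marking_bits(state: u64) -> u64 {
    state >> MARK_SHIFT
}

fn main() {
    let state = 7 | ALLOCATED_BIT | (3 << MARK_SHIFT);
    assert_eq!(generation(state), 7);
    assert!(is_allocated(state));
    assert_eq!(marking_bits(state), 3);
}
```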

The owning thread and the collector are the only two parties that can modify non-atomic data in a Bin<T>. Other threads may need to load a reference to a Ref<T>’s underlying data while the owning thread is allocating. This is made safe by:

  • Because allocations for a new slab can’t be referred to until after the allocating function returns, we can update Slabs::next safely while other threads read the data structure.
  • Each slot’s state is controlled with atomic operations. This ensures consistent access for both the reading thread and the allocating thread.
  • The slot’s state is generational, minimizing the chance of an invalid reference being promoted (see the sketch after this list). Even if a “stale” ref contains a reused generation, the load will still point to valid data because of the order of initialization.
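
A minimal sketch of the generation check, assuming the packed-state layout from the previous sketch; the helper name is hypothetical:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const GENERATION_MASK: u64 = 0xFFFF_FFFF;
const ALLOCATED_BIT: u64 = 1 << 32;

/// Returns true if the slot is allocated and still holds the generation the
/// Ref was created with; otherwise the Ref is stale and the load fails.
fn ref_is_valid(state: &AtomicU64, expected_generation: u32) -> bool {
    let state = state.load(Ordering::Acquire);
    state & ALLOCATED_BIT != 0
        && (state & GENERATION_MASK) as u32 == expected_generation
}

fn main() {
    let slot_state = AtomicU64::new(41 | ALLOCATED_BIT);
    assert!(ref_is_valid(&slot_state, 41));
    assert!(!ref_is_valid(&slot_state, 40)); // stale generation fails the load
}
```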

§Collection

Refuse is a naive, mark-and-sweep collector. Each collection is divided into three phases:

  • Exclusive Access Acquisition
  • Tracing and marking
  • Sweeping

Refuse keeps track of two metrics (one plausible way to maintain them is sketched after this list):

  • average_collection_locking: The average duration to acquire exclusive access.
  • average_collection: The average duration of a total collection process, including exclusive access acquisition.
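
The text does not specify how these averages are maintained; an exponentially weighted moving average is one plausible, allocation-free approach:

```rust
use std::time::Duration;

// Blend a new sample into the running average, weighting history 7:1
// against the newest sample (an assumed ratio).
fn update_average(average: Duration, sample: Duration) -> Duration {
    (average * 7 + sample) / 8
}

fn main() {
    let mut avg = Duration::from_millis(2);
    avg = update_average(avg, Duration::from_millis(10));
    assert_eq!(avg, Duration::from_millis(3));
}
```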

§Exclusive Access Acquisition

Refuse’s goal is to be usable in nearly any application, including games. Games typically do not want to dip below 60 frames per second (FPS), which leaves roughly 16ms per frame: any garbage collection pause longer than that will cause FPS drops.

Refuse tries to minimize pauses by waiting for exclusive access only for a multiple of average_collection_locking. If access isn’t acquired by the deadline, collection is rescheduled for the near future with an increased multiple. If this process fails several times consecutively, garbage collection is forced by waiting indefinitely.
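
A sketch of this escalating-deadline schedule; the cutoff and multiples here are assumed values, only the overall strategy comes from the description above:

```rust
use std::time::Duration;

// Returns how long to wait for exclusive access on this attempt, or None
// once collection should be forced by waiting indefinitely.
fn lock_deadline(average_collection_locking: Duration, failed_attempts: u32) -> Option<Duration> {
    const MAX_ATTEMPTS: u32 = 5; // assumed cutoff for "several times"
    if failed_attempts >= MAX_ATTEMPTS {
        None
    } else {
        // Allow a growing multiple of the average locking time per attempt.
        Some(average_collection_locking * (failed_attempts + 2))
    }
}

fn main() {
    let avg = Duration::from_micros(250);
    assert_eq!(lock_deadline(avg, 0), Some(Duration::from_micros(500)));
    assert_eq!(lock_deadline(avg, 5), None);
}
```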

Access is controlled by a single AtomicUsize. A single bit keeps track of whether the collector is trying to collect or not. The remaining bits keep track of how many CollectionGuards are acquired and not yielding.

CollectionGuard::acquire() checks if the collection bit is set. If it is, it waits until the current collection finishes and checks again. If the bit is not set, the count is atomically incremented.

When the final CollectionGuard drops or yields, it notifies the collector thread so that it can begin collecting.
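
Putting the last three paragraphs together, a simplified sketch of guard counting with a single AtomicUsize might look like this (the bit choice, memory orderings, and wake-up mechanism are simplifications, and the real implementation also handles yielding guards):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Top bit: the collector wants (or holds) exclusive access.
// Remaining bits: the number of acquired, non-yielding CollectionGuards.
const COLLECTING_BIT: usize = 1 << (usize::BITS - 1);
const GUARD_COUNT_MASK: usize = !COLLECTING_BIT;

static STATE: AtomicUsize = AtomicUsize::new(0);

// Try to register a new CollectionGuard. Fails while a collection is
// pending; the caller would wait for the collection to finish and retry.
fn try_acquire_guard() -> bool {
    let mut current = STATE.load(Ordering::Acquire);
    loop {
        if current & COLLECTING_BIT != 0 {
            return false;
        }
        match STATE.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,
            Err(actual) => current = actual,
        }
    }
}

// Drop a guard. When the last guard releases during a pending collection,
// the collector thread would be woken here.
fn release_guard() {
    let previous = STATE.fetch_sub(1, Ordering::AcqRel);
    if previous & GUARD_COUNT_MASK == 1 && previous & COLLECTING_BIT != 0 {
        // notify_collector(); // hypothetical wake-up of the collector thread
    }
}

fn main() {
    assert!(try_acquire_guard());
    release_guard();
}
```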

§Tracing and marking

The goal of this phase is to identify all allocations that can currently be reached by any Root<T>. When a slot is initially allocated, its marking bits are 0. Each time the collector runs, new non-zero marking bits are selected by incrementing the previous marking bits and skipping 0 on wrap.

All Bin<T>s of all threads are scanned for any Slot<T> that is allocated and has a non-zero root count. Each such slot is then marked. If the slot didn’t already contain the current marking bits, it is traced, which allows any references found to be marked.

This process continues until all found references are marked and traced.
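
The loop amounts to a standard worklist-based marking pass. In the toy sketch below, allocations are indices into a Vec rather than real slots:

```rust
// Mark every slot reachable from the roots; anything still carrying an old
// mark value after this pass is garbage.
fn mark_reachable(roots: &[usize], edges: &[Vec<usize>], current_mark: u8, marks: &mut [u8]) {
    let mut worklist: Vec<usize> = roots.to_vec();
    while let Some(slot) = worklist.pop() {
        if marks[slot] == current_mark {
            continue; // already marked this collection; skip re-tracing
        }
        marks[slot] = current_mark;
        // Tracing: queue every reference found inside this allocation.
        worklist.extend_from_slice(&edges[slot]);
    }
}

fn main() {
    // 0 -> 1 -> 2 is reachable from the root; 3 is not.
    let edges = vec![vec![1], vec![2], vec![], vec![]];
    let mut marks = vec![0u8; 4];
    mark_reachable(&[0], &edges, 1, &mut marks);
    assert_eq!(marks, vec![1, 1, 1, 0]);
}
```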

§Sweeping

The goal of this phase is to free allocations that are no longer reachable. This is done by scanning all Bin<T>s of all threads, looking for any allocated Slot<T>s that do not contain the current marking bits. When found, the slot is deallocated and its contained data has its Drop implementation invoked.
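
Continuing the toy representation from the marking sketch, the sweep pass reduces to:

```rust
// Deallocate every allocated slot whose mark is not the current one;
// dropping the Option runs the contained data's Drop implementation.
fn sweep<T>(slots: &mut [Option<T>], marks: &[u8], current_mark: u8) {
    for (slot, mark) in slots.iter_mut().zip(marks) {
        if slot.is_some() && *mark != current_mark {
            *slot = None;
        }
    }
}

fn main() {
    let mut slots = vec![Some("a"), Some("b"), None, Some("d")];
    let marks: [u8; 4] = [1, 0, 0, 1];
    sweep(&mut slots, &marks, 1);
    assert_eq!(slots, vec![Some("a"), None, None, Some("d")]);
}
```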