Architecture overview of the underlying design of Refuse.
§Overview
Refuse is an incremental, tracing garbage collector. The collector can only
run when it knows no threads are currently accessing collectable memory.
This fits the access pattern of an RwLock: the collector can acquire a
“write” lock to ensure that no other thread can read while it is running.
Originally, Refuse used an RwLock and a shared allocation arena. This did
not perform well in multi-threaded benchmarks, so the global RwLock was
replaced with an atomic that tracks the number of currently acquired
CollectionGuards and whether the collector is currently trying to start
collection.
Each thread allocates its own independent allocation arena and stores a copy
of it in thread-local storage. It also registers a copy with the global
collector. Refuse’s public API ensures that no access is provided to the
local thread’s data without first having acquired a CollectionGuard.
This ensures that the collector can guarantee exclusive access to the
underlying data.
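In practice that looks roughly like the sketch below. CollectionGuard::acquire() is referenced elsewhere in this document, but the other method signatures here are assumptions for illustration, not necessarily the crate’s exact API.

```rust
use refuse::{CollectionGuard, Ref};

fn main() {
    // Acquiring a guard proves to the collector that this thread may
    // access collectable memory; no arena access happens without one.
    let guard = CollectionGuard::acquire();

    // Allocation goes through the guard, routing it to this thread's
    // own arena. (The exact `Ref::new` signature is an assumption.)
    let data = Ref::new(42_u32, &guard);

    // Loading validates the slot's generation before returning a
    // reference to the underlying data.
    assert_eq!(data.load(&guard), Some(&42));

    // Dropping the final guard lets the collector begin collecting.
    drop(guard);
}
```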
§Allocation Arenas
Each thread is given its own allocation arena, which is a data structure designed for concurrently reading portions of its data while still being able to perform new allocations from the owning thread.
At the root of each arena is a map of types to type-erased Bin<T>s. A
Bin<T> is the root of a linked-list of Slabs<T>. Each Slabs<T>
contains a list of Slab<T>s and an optional next Slabs<T>. Each
Slab<T> holds 256 Slot<T>s. Each slot is a combination of the slot’s
state and the slot’s data.
The slot’s state is stored in an atomic and keeps track of:
- A 32-bit generation. When loading a Ref, its generation is validated to ensure it refers to the same allocation.
- Whether the slot is allocated or not
- Garbage collector marking bits
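This hierarchy can be sketched as plain Rust types. The field names and the exact state-word layout below are assumptions for illustration, not the crate’s verbatim definitions:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::AtomicU64;

/// One allocation slot: an atomic state word plus possibly
/// uninitialized data. Hypothetical state-word layout:
///   bits 0..32 - generation, validated on every Ref load
///   bit  32    - allocated flag
///   bits 33..  - garbage collector marking bits
struct Slot<T> {
    state: AtomicU64,
    data: UnsafeCell<MaybeUninit<T>>,
}

/// A fixed block of 256 slots.
struct Slab<T> {
    slots: [Slot<T>; 256],
}

/// A linked-list node owning a list of slabs. Only the owning thread
/// appends to `slabs` or sets `next`.
struct Slabs<T> {
    slabs: Vec<Slab<T>>,
    next: Option<Box<Slabs<T>>>,
}

/// The per-type root stored in the arena's type map.
struct Bin<T> {
    first: Slabs<T>,
}
```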
The owning thread and the collector are the only types that can modify
non-atomic data in a Bin<T>. Other threads may need to load a reference to
a Ref<T>’s underlying data while the owning thread is allocating. This is
made safe by:
- Because allocations for a new slab can’t be referred to until after the allocating function returns, we can update Slabs::next safely while other threads read the data structure.
- Each slot’s state is controlled with atomic operations. This ensures consistent access for both the reading thread and the allocating thread.
- The slot’s state is generational, minimizing the chance of an invalid reference being promoted. Even if a “stale” ref contains a reused generation, the load will still point to valid data because of the order of initialization.
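As an illustration of the generational check, here is a minimal sketch of how a reader might validate a Ref against the slot’s atomic state. The bit layout is the assumption carried over from the sketch above:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed bit layout, matching the earlier sketch.
const GENERATION_MASK: u64 = 0xFFFF_FFFF; // bits 0..32
const ALLOCATED_BIT: u64 = 1 << 32;

/// Returns true if the slot is still allocated and still holds the
/// generation this Ref was created with.
fn is_live(state: &AtomicU64, ref_generation: u32) -> bool {
    // One atomic load gives readers a consistent snapshot even while
    // the owning thread is allocating elsewhere in the arena.
    let state = state.load(Ordering::Acquire);
    state & ALLOCATED_BIT != 0
        && (state & GENERATION_MASK) as u32 == ref_generation
}
```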
§Collection
Refuse is a naive, mark-and-sweep collector. Each collection is divided into three phases:
- Exclusive Access Acquisition
- Tracing and marking
- Sweeping
Refuse keeps track of two metrics:
- average_collection_locking: The average duration to acquire exclusive access.
- average_collection: The average duration of a total collection process, including exclusive access acquisition.
§Exclusive Access Acquisition
Refuse’s goal is to be able to be used in nearly any application, including games. Games typically do not want to dip below 60 frames-per-second (FPS), which means that if a garbage collection pause is longer than 16ms, it will cause FPS drops.
Refuse tries to minimize pauses by waiting for exclusive access only for a
multiple of average_collection_locking. If access isn’t acquired by the
deadline, collection is rescheduled in the near future with an increased
multiple. If this process fails several times consecutively, garbage
collection is forced by waiting indefinitely.
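A sketch of that scheduling policy, with an assumed cutoff constant; the crate’s actual values and structure may differ:

```rust
use std::time::Duration;

// Assumed number of failed attempts before collection is forced.
const MAX_ATTEMPTS: u32 = 5;

/// Computes how long to wait for exclusive access on a given attempt.
/// `None` means wait indefinitely, forcing the collection.
fn lock_deadline(
    average_collection_locking: Duration,
    attempt: u32,
) -> Option<Duration> {
    if attempt >= MAX_ATTEMPTS {
        // Too many consecutive failures: no deadline, wait as long
        // as it takes.
        None
    } else {
        // Each failed attempt increases the multiple of the average
        // locking duration before giving up and rescheduling.
        Some(average_collection_locking * (attempt + 1))
    }
}
```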
Access is controlled by a single AtomicUsize. A single bit keeps track
of whether the collector is trying to collect or not. The remaining bits
keep track of how many CollectionGuards are acquired and not yielding.
CollectionGuard::acquire() checks if the collection bit is set. If it
is, it waits until the current collection finishes and checks again. If the
bit is not set, the count is atomically incremented.
When the final CollectionGuard drops or yields, it notifies the
collector thread so that it can begin collecting.
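A compare-and-swap sketch of this protocol follows; the bit assignment and function names are assumptions for illustration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical layout: the highest bit signals that the collector
// wants exclusive access; the remaining bits count acquired,
// non-yielding guards.
const COLLECTING_BIT: usize = 1 << (usize::BITS - 1);
const GUARD_COUNT_MASK: usize = !COLLECTING_BIT;

/// Attempts to register one CollectionGuard. Returns false if a
/// collection is pending; the caller then waits for it to finish
/// and retries.
fn try_acquire(state: &AtomicUsize) -> bool {
    let mut current = state.load(Ordering::Acquire);
    loop {
        if current & COLLECTING_BIT != 0 {
            return false; // the collector has signaled its intent
        }
        match state.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,
            Err(actual) => current = actual,
        }
    }
}

/// Releases one guard. Returns true when this was the final guard a
/// pending collector was waiting on, in which case the caller
/// notifies the collector thread.
fn release(state: &AtomicUsize) -> bool {
    let previous = state.fetch_sub(1, Ordering::AcqRel);
    previous & COLLECTING_BIT != 0 && previous & GUARD_COUNT_MASK == 1
}
```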
§Tracing and marking
The goal of this phase is to identify all allocations that can currently be
reached by any Root<T>. When a slot is initially allocated, the marking
bits are 0. Each time the collector runs, a new non-zero marking-bits value
is selected by incrementing the previous value and skipping 0 on wrap.
All Bin<T>s of all threads are scanned for any Slot<T> that is allocated
and has a non-zero root count. Each such slot is then marked. If the
slot didn’t already contain the current marking bits, it is Traced,
which allows any references found to be marked.
This process continues until all found references are marked and traced.
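A sketch of both steps, using an assumed u8 for the marking bits and hypothetical slot indices in place of real slot references:

```rust
/// Selects the marking bits for a new collection: increment the
/// previous value, skipping 0 on wrap, since 0 means "never marked".
fn next_mark(previous: u8) -> u8 {
    match previous.wrapping_add(1) {
        0 => 1,
        mark => mark,
    }
}

/// A worklist over hypothetical slot indices: marks every slot
/// reachable from the roots, tracing each slot at most once per
/// collection.
fn trace(roots: &[usize], marks: &mut [u8], references: &[Vec<usize>], mark: u8) {
    let mut pending: Vec<usize> = roots.to_vec();
    while let Some(slot) = pending.pop() {
        if marks[slot] == mark {
            continue; // already marked during this collection
        }
        marks[slot] = mark;
        // Tracing yields the slot's outgoing references, which are
        // marked (and traced) in turn until nothing new is found.
        pending.extend_from_slice(&references[slot]);
    }
}
```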
§Sweeping
The goal of this phase is to free allocations that are no longer reachable.
This is done by scanning all Bin<T>s of all threads looking for any
allocated Slot<T>s that do not contain the current marking bits. When
found, the slot is deallocated and its contained data has its Drop
implementation invoked.
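A minimal sketch over a hypothetical flattened view of all slots, reusing the assumed u8 marking bits from above:

```rust
/// Frees any occupied slot whose mark differs from the current
/// marking bits, running the contained value's Drop implementation.
fn sweep<T>(slots: &mut [(u8, Option<T>)], current_mark: u8) {
    for (mark, data) in slots.iter_mut() {
        if data.is_some() && *mark != current_mark {
            // Replacing with None drops the old value in place.
            *data = None;
        }
    }
}
```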