lgalloc
A memory allocator for large objects backed by anonymous mappings with huge page hints.
Lgalloc stands for large (object) allocator.
We spell it lgalloc and pronounce it el-gee-alloc.
[dependencies]
lgalloc = "0.7"
Example
use std::mem::ManuallyDrop;
When to use lgalloc
Lgalloc is designed for programs that allocate and recycle many large memory regions (2 MiB+).
It pools regions by size class and reuses them without returning virtual address space to the kernel, which avoids the mmap/munmap overhead and kernel mmap_lock contention that dominate at high thread counts.
On Linux, it requests transparent huge pages via MADV_HUGEPAGE, reducing TLB misses for large working sets.
Lgalloc is a low-level API. Callers get a raw pointer, a capacity, and a handle; they are responsible for building higher-level abstractions (vectors, buffers) on top.
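To illustrate what building a vector on top of a raw pointer and a capacity can look like, here is a minimal sketch that uses only the standard library. The memory comes from `std::alloc` as a stand-in for an lgalloc region, and `write_through_vec` is a hypothetical name; nothing here is lgalloc's API. The point is the `ManuallyDrop` wrapper, which keeps the vector from freeing memory it does not own.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::ManuallyDrop;
use std::ptr::NonNull;

/// Writes into a raw region through a `Vec` view and returns a copy of the data.
/// Assumption: `std::alloc` stands in for the allocator that owns the region.
pub fn write_through_vec() -> Vec<u8> {
    const CAPACITY: usize = 2 << 20; // 2 MiB

    let layout = Layout::array::<u8>(CAPACITY).expect("valid layout");
    // SAFETY: the layout has a non-zero size.
    let ptr = NonNull::new(unsafe { alloc(layout) }).expect("allocation failed");

    // SAFETY: `ptr` is valid for `CAPACITY` bytes and the initial length is 0.
    // `ManuallyDrop` ensures the vector's destructor never runs.
    let mut vec = ManuallyDrop::new(unsafe {
        Vec::from_raw_parts(ptr.as_ptr(), 0, CAPACITY)
    });

    // Stay within the capacity so the vector never reallocates.
    vec.extend_from_slice(&[1, 2, 3, 4]);
    let copy = vec.to_vec();

    // Return the memory to the allocator it came from. The vector view is
    // neither used nor dropped after this point.
    // SAFETY: `ptr` was allocated above with the same layout.
    unsafe { dealloc(ptr.as_ptr(), layout) };

    copy
}
```

With a real region, the last step would be the allocator's own deallocation call rather than `dealloc`.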
Usage constraints
- No fork. Anonymous mappings are shared with child processes after fork. Two processes writing to the same mapping causes undefined behavior. There is no way to mark mappings as non-inheritable.
- No mlock. Callers must not lock pages (mlock) on regions managed by lgalloc, or must unlock them before returning the region. The background worker calls madvise on returned regions, which fails on locked pages.
- Do not free with another allocator. Memory obtained from allocate must be returned via deallocate. Passing the pointer to free, Vec::from_raw_parts without ManuallyDrop, or any other allocator is undefined behavior.
- Minimum allocation is 2 MiB. Size classes range from 2^21 (2 MiB) to 2^36 (64 GiB). Requests below 2 MiB return AllocError::InvalidSizeClass.
- Capacity may be rounded up. The returned capacity can be larger than requested because allocations are rounded to power-of-two size classes.
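The rounding and the size limits above can be sketched as plain arithmetic. The helper names `size_class_for` and `rounded_capacity` are hypothetical, the sketch works in bytes, and rejecting requests above 64 GiB is an assumption. It is not lgalloc's implementation.

```rust
/// Smallest and largest size classes named above: 2^21 and 2^36 bytes.
pub const MIN_SIZE_CLASS: u32 = 21;
pub const MAX_SIZE_CLASS: u32 = 36;

/// Hypothetical helper: maps a request in bytes to a power-of-two size class.
/// Returns `None` below 2 MiB and, by assumption, above 64 GiB.
pub fn size_class_for(bytes: u64) -> Option<u32> {
    if bytes < (1u64 << MIN_SIZE_CLASS) || bytes > (1u64 << MAX_SIZE_CLASS) {
        return None;
    }
    // Round up to the next power of two, then take its exponent.
    Some(bytes.next_power_of_two().trailing_zeros())
}

/// Capacity in bytes that a request occupies after rounding up.
pub fn rounded_capacity(bytes: u64) -> Option<u64> {
    size_class_for(bytes).map(|class| 1u64 << class)
}
```

Under this model, a 3 MiB request lands in size class 22 and occupies 4 MiB, which is why callers should always work with the returned capacity rather than the requested one.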
Thread safety
Handle is Send and Sync.
Allocations can be made on one thread and freed on another.
Each thread maintains a local cache; the global pool uses lock-free work-stealing to redistribute regions.
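The allocate-on-one-thread, free-on-another pattern can be sketched with a stand-in handle. `RegionHandle` and `allocate_here_free_there` are hypothetical and backed by `std::alloc`; they only show what a `Send` handle makes possible and do not reflect lgalloc's `Handle` type.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for an allocation handle; not lgalloc's `Handle`.
pub struct RegionHandle {
    ptr: NonNull<u8>,
    layout: Layout,
}

// SAFETY: the handle uniquely owns its memory, so moving it to another
// thread cannot introduce aliasing.
unsafe impl Send for RegionHandle {}

impl RegionHandle {
    pub fn allocate(bytes: usize) -> Self {
        assert!(bytes > 0, "zero-sized allocations are not supported");
        let layout = Layout::from_size_align(bytes, 1).expect("valid layout");
        // SAFETY: the layout has a non-zero size.
        let ptr = NonNull::new(unsafe { alloc_zeroed(layout) }).expect("allocation failed");
        RegionHandle { ptr, layout }
    }

    pub fn free(self) {
        // SAFETY: `ptr` was allocated with `layout` and is freed exactly once.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}

/// Allocates and writes on a worker thread, then reads and frees on the caller's thread.
pub fn allocate_here_free_there() -> u8 {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        let handle = RegionHandle::allocate(2 << 20);
        // SAFETY: the region is at least one byte long.
        unsafe { handle.ptr.as_ptr().write(42) };
        tx.send(handle).expect("receiver alive");
    });

    let handle = rx.recv().expect("sender alive");
    producer.join().expect("producer panicked");

    // SAFETY: the byte was initialized by the producer before sending.
    let first = unsafe { handle.ptr.as_ptr().read() };
    handle.free();
    first
}
```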
How it works
Lgalloc is size-classed: each power-of-two size from 2 MiB to 64 GiB has its own pool. Within a size class, contiguous areas of increasing size back individual regions.
- Each thread maintains a bounded local cache of regions.
- On allocation, the thread checks its local cache, then the global dirty pool, then the global clean pool, then steals from other threads.
- On deallocation, the region goes to the local cache or, if full, to the global dirty pool. With eager_return enabled, MADV_DONTNEED is called before pushing to the global pool.
- An optional background worker moves dirty regions to the clean pool by calling MADV_FREE (Linux) or MADV_DONTNEED (other platforms), which marks pages as lazily reclaimable.
- When all pools are empty, lgalloc creates a new area via mmap(MAP_ANONYMOUS) and applies MADV_HUGEPAGE. Area sizes double on each refill, controlled by the growth_dampener config.
- Regions are never unmapped during normal operation. This avoids munmap syscall overhead but grows virtual address space. Areas are unmapped when the global state is dropped (process exit).
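The lookup order above can be made concrete with a small single-threaded model of one size class. All names (`SizeClassModel`, `Source`, `clean_one`) are hypothetical, regions are plain ids, and a fresh area yields a single region here. The real allocator is lock-free and multi-threaded, so treat this as a sketch of the ordering only.

```rust
/// Where a region came from; only used to make the lookup order visible.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    LocalCache,
    GlobalDirty,
    GlobalClean,
    Stolen,
    FreshArea,
}

/// Single-threaded model of one size class. Regions are represented by ids.
pub struct SizeClassModel {
    pub local: Vec<u32>,
    pub local_limit: usize,
    pub global_dirty: Vec<u32>,
    pub global_clean: Vec<u32>,
    pub other_threads: Vec<u32>,
    next_id: u32,
}

impl SizeClassModel {
    pub fn new(local_limit: usize) -> Self {
        SizeClassModel {
            local: Vec::new(),
            local_limit,
            global_dirty: Vec::new(),
            global_clean: Vec::new(),
            other_threads: Vec::new(),
            next_id: 0,
        }
    }

    /// Local cache, then global dirty, then global clean, then stealing,
    /// and only then a new area.
    pub fn allocate(&mut self) -> (u32, Source) {
        if let Some(region) = self.local.pop() {
            return (region, Source::LocalCache);
        }
        if let Some(region) = self.global_dirty.pop() {
            return (region, Source::GlobalDirty);
        }
        if let Some(region) = self.global_clean.pop() {
            return (region, Source::GlobalClean);
        }
        if let Some(region) = self.other_threads.pop() {
            return (region, Source::Stolen);
        }
        let region = self.next_id;
        self.next_id += 1;
        (region, Source::FreshArea)
    }

    /// Local cache if there is room, otherwise the global dirty pool.
    pub fn deallocate(&mut self, region: u32) {
        if self.local.len() < self.local_limit {
            self.local.push(region);
        } else {
            self.global_dirty.push(region);
        }
    }

    /// Models one step of the background worker: dirty to clean.
    pub fn clean_one(&mut self) -> bool {
        match self.global_dirty.pop() {
            Some(region) => {
                self.global_clean.push(region);
                true
            }
            None => false,
        }
    }
}
```

In this model a region freed into a full local cache becomes visible to every thread through the global dirty pool, which is the mechanism that lets memory freed on one thread be reused on another.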
Platform notes
- Linux: requests transparent huge pages via MADV_HUGEPAGE. The kernel uses 2 MiB pages when /sys/kernel/mm/transparent_hugepage/enabled is always or madvise. If THP is disabled, the hint has no effect, and a single warning is printed to stderr.
- macOS ARM: the kernel does not expose a userspace huge page API. Lgalloc uses the base 16 KiB page size.
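To check ahead of time whether the huge page hint can take effect on a Linux host, a program can read the sysfs file mentioned above. The kernel marks the active mode with square brackets, for example `always [madvise] never`. The helpers below are a hypothetical sketch and not part of lgalloc.

```rust
use std::fs;

/// Extracts the active mode from the sysfs file contents, i.e. the word
/// enclosed in square brackets.
pub fn active_thp_mode(contents: &str) -> Option<&str> {
    let start = contents.find('[')? + 1;
    let end = start + contents[start..].find(']')?;
    Some(&contents[start..end])
}

/// Whether an madvise-based hint can result in huge pages under this setting.
pub fn thp_hint_effective(contents: &str) -> bool {
    matches!(active_thp_mode(contents), Some("always") | Some("madvise"))
}

/// Reads the live setting. Returns `None` if the file is missing or
/// unreadable, as on non-Linux systems or kernels built without THP.
pub fn read_thp_mode() -> Option<String> {
    let contents = fs::read_to_string("/sys/kernel/mm/transparent_hugepage/enabled").ok()?;
    active_thp_mode(&contents).map(str::to_owned)
}
```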
To do
- Testing is very limited.
- Allocating areas of doubling sizes seems to stress the mmap system call. Consider a different strategy, such as constant-sized blocks or a limit on what areas we allocate. There's probably a trade-off between area size and number of areas.
- Fixed-size areas could allow us to move areas between size classes.
- Reference-counting can determine when an area isn't referenced anymore, although this is not trivial because it's a lock-free system.