# Orthotope

Orthotope is a Rust allocator library with:

- a pre-mapped arena
- fixed size classes up to 16 MiB
- per-thread caches
- a shared central pool
- a tracked large-allocation path

It is aimed at allocation-heavy workloads such as ML inference, tensor pipelines, batched embedding or reranking services, and other high-throughput systems.
## Installation
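The crate is presumably added as a Cargo dependency. The crate name and version below are assumptions inferred from the project title, not confirmed registry metadata:

```toml
# Cargo.toml — crate name and version assumed, not confirmed
[dependencies]
orthotope = "0.1"
```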
## API
```rust
use orthotope::{allocate, deallocate};

let ptr = allocate(1024)?;
unsafe { deallocate(ptr) };
# Ok::<(), orthotope::AllocError>(())
```
- `allocate(size)` returns `Result<NonNull<u8>, AllocError>`
- `deallocate(ptr)` is the primary free path
- `deallocate_with_size(ptr, size)` validates the recorded size before freeing
- `global_stats()` returns a best-effort snapshot of the global allocator's shared state
Only free live pointers returned by Orthotope. Small-object double free remains undefined behavior.
For direct instance-oriented use, the crate also exposes `Allocator`, `AllocatorConfig`,
`ThreadCache`, and `SizeClass` at the crate root. Use one `ThreadCache` per thread when
calling `Allocator::allocate_with_cache` or `Allocator::deallocate_with_cache` directly.
`Allocator::stats()` and `ThreadCache::stats()` expose the same best-effort snapshot model
for instance-oriented use. An empty cache may be rebound to a different allocator
instance, but reusing a non-empty cache across allocators panics instead of silently
rehoming cached blocks.
For opt-in drop-in usage in existing binaries, the crate also exposes
OrthotopeGlobalAlloc:
```rust
use orthotope::OrthotopeGlobalAlloc;

#[global_allocator]
static GLOBAL: OrthotopeGlobalAlloc = OrthotopeGlobalAlloc::new();
```
The shim intentionally falls back to `std::alloc::System` for layouts with `size == 0`
or `align() > 64`. It also has a best-effort fallback for rare reentrant TLS-cache
borrow cases. `global_stats()` only reports Orthotope-managed allocations, not
system-fallback allocations.
## Behavior
- small allocations use thread-local reuse first, then central-pool refill, then arena carving
- each thread cache owns class-local slabs carved from contiguous arena spans
- small-cache arena refill reserves one contiguous span, registers it as a local slab, and splits it into class-sized blocks
- frees are routed by a 64-byte allocation header
- small-allocation headers are refreshed in place on reuse instead of rebuilding a fresh header object
- requests above 16 MiB use the large-allocation path
- default alignment is 64 bytes
- custom allocator alignment must be a power of two and at least 64 bytes
- the global convenience API uses `AllocatorConfig::default()`
- freed large allocations return to an arena-backed reusable pool for later same-size or smaller large requests, using smallest-fitting reuse first
- rebinding an empty caller-owned `ThreadCache` to another allocator clears stale local slab metadata before the new allocator starts carving fresh slabs
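The smallest-fitting reuse of freed large spans can be sketched as a size-ordered free pool. This is an illustrative standalone model (the names `LargePool`, `release`, and `acquire` are invented here), not the crate's actual data structure:

```rust
use std::collections::BTreeMap;

// Illustrative model: freed large spans keyed by size; each bucket
// holds the base addresses of reusable spans of exactly that size.
#[derive(Default)]
struct LargePool {
    free: BTreeMap<usize, Vec<usize>>, // span size -> base addresses
}

impl LargePool {
    // A freed large allocation returns its span to the pool.
    fn release(&mut self, base: usize, size: usize) {
        self.free.entry(size).or_default().push(base);
    }

    // Smallest-fitting reuse: take the smallest free span whose size
    // covers the request (same-size or larger spans qualify).
    fn acquire(&mut self, request: usize) -> Option<(usize, usize)> {
        let size = *self.free.range(request..).next()?.0;
        let bucket = self.free.get_mut(&size)?;
        let base = bucket.pop()?;
        if bucket.is_empty() {
            self.free.remove(&size);
        }
        Some((base, size))
    }
}
```

Keying by size in a `BTreeMap` makes the smallest-fit lookup a single ordered-range query rather than a scan of all free spans.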
Small-object provenance in v1 is limited to header validation plus an arena-range ownership check on the decoded block start. Foreign pointers are rejected where detectable, but small-object double free remains undefined behavior and same-arena pointer forgery is not guaranteed to be detected.
Large allocations are also tracked in a live registry. Duplicate large frees are rejected when the pointer still decodes to a valid large-allocation header for the same live allocation instance, and successful large frees return those arena-backed spans to fit-based reuse for future same-size or smaller large requests.
Because large blocks may later be reused at the same address, stale large pointers after
address reuse are not guaranteed to be distinguishable by the raw-pointer free API.
Using such pointers still violates the unsafe contract.
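The live-registry behavior and the address-reuse caveat can be illustrated with a standalone model, where a set of live base addresses stands in for the real header-validated registry (`LargeRegistry` and its methods are invented names):

```rust
use std::collections::HashSet;

// Illustrative model: duplicate frees of a live large allocation are
// rejected, but once an address is reused by a new allocation, a stale
// pointer to the old one is indistinguishable from the new block.
struct LargeRegistry {
    live: HashSet<usize>, // base addresses of live large allocations
}

impl LargeRegistry {
    fn new() -> Self {
        Self { live: HashSet::new() }
    }
    fn record(&mut self, base: usize) {
        self.live.insert(base);
    }
    // Returns true if the free was accepted, false if rejected.
    fn free(&mut self, base: usize) -> bool {
        self.live.remove(&base)
    }
}
```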
When using `OrthotopeGlobalAlloc`, `GlobalAlloc::dealloc` cannot return typed errors.
If Orthotope detects an invalid free on the Orthotope-managed path, the shim aborts the
process instead of continuing in an invalid state. The only tolerated leak path is a
reentrant TLS-cache borrow during panic unwind.
Small-request classes:

- `1..=64`
- `65..=256`
- `257..=4096`
- `4097..=6144`
- `6145..=8192`
- `8193..=16_384`
- `16_385..=32_768`
- `32_769..=65_536`
- `65_537..=131_072`
- `131_073..=262_144`
- `262_145..=1_048_576`
- `1_048_577..=16_777_216`
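The class boundaries above can be expressed as a lookup over the upper bounds. This is a sketch of the mapping only; the crate's real implementation may be table-driven or branchless:

```rust
// Upper bounds of the twelve small-request classes, in bytes.
const CLASS_UPPER_BOUNDS: [usize; 12] = [
    64, 256, 4096, 6144, 8192, 16_384,
    32_768, 65_536, 131_072, 262_144, 1_048_576, 16_777_216,
];

// Map a request size to its class index. Returns None for zero-size
// requests and for requests above 16 MiB (the large-allocation path).
fn size_class(size: usize) -> Option<usize> {
    if size == 0 {
        return None;
    }
    CLASS_UPPER_BOUNDS.iter().position(|&upper| size <= upper)
}
```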
## Benchmarking
Benchmark results are summarized in BENCHMARK.
In the current local run, Orthotope was:
- fastest on 7 of 9 workloads against system, mimalloc, and jemalloc
- about 49x faster than mimalloc on `same_thread_small_churn/70000`
- about 2.1x faster than mimalloc and 2.1x faster than jemalloc on `mixed_size_churn`
- about 8.8x faster than mimalloc and 6.3x faster than jemalloc on `large_path`
The `bench/` directory contains the harness used to produce these numbers. It runs each workload against Orthotope, the system allocator, mimalloc, and jemalloc, and prints a markdown table of medians.