# Orthotope

Orthotope is a Rust allocator library with:

- a pre-mapped arena
- fixed size classes up to 16 MiB
- per-thread caches
- a shared central pool
- a tracked large-allocation path
It is aimed at allocation-heavy workloads such as ML inference, tensor pipelines, batched embedding or reranking services, and other high-throughput systems.
## Installation
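The crate can presumably be added as an ordinary Cargo dependency; a minimal sketch, assuming the crate is published under the name `orthotope` (the version below is a placeholder, not a confirmed release):

```toml
[dependencies]
orthotope = "0.1"
```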
## API
```rust
use orthotope::{allocate, deallocate, AllocError};

let ptr = allocate(1024)?;
unsafe { deallocate(ptr) };
# Ok::<(), AllocError>(())
```
- `allocate(size)` returns `Result<NonNull<u8>, AllocError>`
- `deallocate(ptr)` is the primary free path
- `deallocate_with_size(ptr, size)` validates the recorded size before freeing
- `global_stats()` returns a best-effort snapshot of the global allocator's shared state
Only free live pointers returned by Orthotope. Small-object double free remains undefined behavior.
For direct instance-oriented use, the crate also exposes `Allocator`, `AllocatorConfig`,
`ThreadCache`, and `SizeClass` at the crate root. Use one `ThreadCache` per thread when
calling `Allocator::allocate_with_cache` or `Allocator::deallocate_with_cache` directly.
`Allocator::stats()` and `ThreadCache::stats()` expose the same best-effort snapshot model
for instance-oriented use. An empty cache may be rebound to a different allocator
instance, but reusing a non-empty cache across allocators panics instead of silently
rehoming cached blocks.
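The rebinding rule can be illustrated with a self-contained sketch. The type and fields below (`ThreadCacheSketch`, `bound_to`, `cached_blocks`) are hypothetical stand-ins, not Orthotope's actual implementation:

```rust
// Hypothetical sketch of the cache-rebinding rule: an empty cache may be
// rebound to a different allocator instance, but a non-empty cache panics
// instead of silently rehoming its cached blocks.
struct ThreadCacheSketch {
    bound_to: Option<u64>, // allocator instance id this cache is bound to
    cached_blocks: usize,  // number of locally cached free blocks
}

impl ThreadCacheSketch {
    fn rebind(&mut self, allocator_id: u64) {
        if self.bound_to == Some(allocator_id) {
            return; // already bound to this allocator
        }
        assert!(
            self.cached_blocks == 0,
            "cannot rebind a non-empty ThreadCache across allocators"
        );
        // An empty cache drops stale slab metadata and rebinds cleanly.
        self.bound_to = Some(allocator_id);
    }
}

fn main() {
    let mut cache = ThreadCacheSketch { bound_to: Some(1), cached_blocks: 0 };
    cache.rebind(2); // empty: rebinding succeeds
    assert_eq!(cache.bound_to, Some(2));

    cache.cached_blocks = 4;
    let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        cache.rebind(3)
    }));
    assert!(result.is_err()); // non-empty: rebinding panics
    println!("ok");
}
```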
## Behavior
- small allocations use thread-local reuse first, then central-pool refill, then arena carving
- each thread cache owns class-local slabs carved from contiguous arena spans
- small-cache arena refill reserves one contiguous span, registers it as a local slab, and splits it into class-sized blocks
- frees are routed by a 64-byte allocation header
- small-allocation headers are refreshed in place on reuse instead of rebuilding a fresh header object
- requests above 16 MiB use the large-allocation path
- default alignment is 64 bytes
- custom allocator alignment must be a power of two and at least 64 bytes
- the global convenience API uses `AllocatorConfig::default()`
- freed large allocations return to an arena-backed reusable pool for later same-size or smaller large requests, using smallest-fitting reuse first
- rebinding an empty caller-owned `ThreadCache` to another allocator clears stale local slab metadata before the new allocator starts carving fresh slabs
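The three-tier small-allocation order above (thread-local reuse, then central-pool refill, then arena carving) can be sketched as a simplified, self-contained illustration; the `Tiers` type and its fields are hypothetical, not Orthotope's internals:

```rust
// Hypothetical sketch of the small-allocation fallback order:
// thread-local reuse -> central-pool refill -> arena carving.
struct Tiers {
    thread_local: Vec<usize>, // addresses of blocks cached by this thread
    central_pool: Vec<usize>, // addresses of blocks in the shared pool
    arena_cursor: usize,      // next unused offset in the arena
}

impl Tiers {
    fn allocate(&mut self, class_size: usize) -> usize {
        if let Some(block) = self.thread_local.pop() {
            return block; // tier 1: thread-local reuse
        }
        if let Some(block) = self.central_pool.pop() {
            return block; // tier 2: central-pool refill
        }
        // tier 3: carve a fresh class-sized block from the arena
        let block = self.arena_cursor;
        self.arena_cursor += class_size;
        block
    }
}

fn main() {
    let mut tiers = Tiers {
        thread_local: vec![0x1000],
        central_pool: vec![0x2000],
        arena_cursor: 0x3000,
    };
    assert_eq!(tiers.allocate(64), 0x1000); // thread-local hit
    assert_eq!(tiers.allocate(64), 0x2000); // central-pool hit
    assert_eq!(tiers.allocate(64), 0x3000); // arena carve
    println!("ok");
}
```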
Small-object provenance in v1 is limited to header validation plus an arena-range ownership check on the decoded block start. Foreign pointers are rejected where detectable, but small-object double free remains undefined behavior and same-arena pointer forgery is not guaranteed to be detected.
Large allocations are also tracked in a live registry. Duplicate large frees are rejected when the pointer still decodes to a valid large-allocation header for the same live allocation instance, and successful large frees return those arena-backed spans to fit-based reuse for future same-size or smaller large requests.
Because large blocks may later be reused at the same address, stale large pointers after
address reuse are not guaranteed to be distinguishable by the raw-pointer free API.
Using such pointers still violates the unsafe contract.
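Smallest-fitting reuse of freed large spans can be sketched with an ordered map keyed by span size; the `LargePool` type below is an illustrative stand-in, not the crate's actual data structure:

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of smallest-fit reuse for freed large spans:
// a request takes the smallest pooled span whose size covers it.
#[derive(Default)]
struct LargePool {
    // freed spans keyed by size; BTreeMap keeps sizes ordered,
    // so range(size..) finds the smallest fitting size first
    free: BTreeMap<usize, Vec<usize>>, // span size -> span start offsets
}

impl LargePool {
    fn release(&mut self, start: usize, size: usize) {
        self.free.entry(size).or_default().push(start);
    }

    // Return (start, span_size) of the smallest span covering `size`.
    fn acquire(&mut self, size: usize) -> Option<(usize, usize)> {
        let fit = *self.free.range(size..).next()?.0;
        let spans = self.free.get_mut(&fit)?;
        let start = spans.pop()?;
        if spans.is_empty() {
            self.free.remove(&fit);
        }
        Some((start, fit))
    }
}

fn main() {
    let mut pool = LargePool::default();
    pool.release(0, 32 << 20);        // 32 MiB span
    pool.release(1 << 30, 20 << 20);  // 20 MiB span
    // A 17 MiB request reuses the 20 MiB span, not the 32 MiB one.
    assert_eq!(pool.acquire(17 << 20), Some((1 << 30, 20 << 20)));
    println!("ok");
}
```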
Small-request classes:
`1..=64`, `65..=256`, `257..=4096`, `4097..=6144`, `6145..=8192`, `8193..=16_384`, `16_385..=32_768`, `32_769..=65_536`, `65_537..=131_072`, `131_073..=262_144`, `262_145..=1_048_576`, `1_048_577..=16_777_216`
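Under the class boundaries listed above, mapping a request size to the smallest class that fits could be sketched as follows (illustrative only; `size_class` and the bounds table are not the crate's internals):

```rust
// Upper bounds of the small-request classes listed above, in ascending order.
const CLASS_UPPER_BOUNDS: [usize; 12] = [
    64, 256, 4096, 6144, 8192, 16_384, 32_768,
    65_536, 131_072, 262_144, 1_048_576, 16_777_216,
];

// Return the capacity of the smallest class covering `size`,
// or None when the request exceeds 16 MiB (large-allocation path).
fn size_class(size: usize) -> Option<usize> {
    CLASS_UPPER_BOUNDS.iter().copied().find(|&upper| size <= upper)
}

fn main() {
    assert_eq!(size_class(1), Some(64));
    assert_eq!(size_class(65), Some(256));
    assert_eq!(size_class(4097), Some(6144));
    assert_eq!(size_class(16_777_217), None); // routed to the large path
    println!("ok");
}
```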
## Benchmarking
Benchmark results are summarized in `benchmark`.
In the current local run, Orthotope was:
- fastest on `embedding_batch`, `mixed_size_churn`, and `large_path` in this local capture
- fastest on four of the five same-thread hot-path reuse sizes, with `mimalloc` narrowly leading only `64`
- about `2.1x` faster than `mimalloc` and `2.2x` faster than `jemalloc` on `mixed_size_churn`
- about `2.3x` faster than the system allocator and about `6.7x` to `8.8x` faster than `jemalloc` and `mimalloc` on `large_path`