pub struct MlxBufferPool { /* private fields */ }Expand description
Arena-style buffer pool that reuses Metal buffer allocations.
§Design
- Buffers are bucketed by their allocated size rounded up to the nearest power of two. This reduces fragmentation at the cost of occasionally over-allocating by up to 2x.
release()returns a single buffer;reset()returns all outstanding buffers handed out since the last reset.- The
MlxDeviceis passed in at every [alloc] call (rather than stored in the pool). This keeps the pool free of lifetime parameters so it can be embedded in any owner struct (e.g. the per-decode-tokenDecodeBufferscache in hf2q’s qwen35 forward path).
§Why an arena reset matters
In the per-decode-token hot path, each token allocates ~1750 Metal buffers
for scratch / intermediate / parameter storage across attention, FFN, and
linear-attention layers. Direct MlxDevice::alloc_buffer() calls hit
Metal’s allocator each time (5-30 µs each); pooling reuses the underlying
metal::Buffer objects across token boundaries so steady-state allocation
cost amortizes to near zero. See ADR-012 §Optimize / Task #15 for the
MoE dwq46 0.90× parity gap that motivated this work.
Implementations§
Source§impl MlxBufferPool
impl MlxBufferPool
Sourcepub fn new() -> Self
pub fn new() -> Self
Create a new empty buffer pool. The Metal device is passed to
[alloc] at every call site, so the pool itself is lifetime-free.
Sourcepub fn alloc(
&mut self,
device: &MlxDevice,
byte_len: usize,
dtype: DType,
shape: Vec<usize>,
) -> Result<MlxBuffer>
pub fn alloc( &mut self, device: &MlxDevice, byte_len: usize, dtype: DType, shape: Vec<usize>, ) -> Result<MlxBuffer>
Allocate a buffer from the pool.
If a free buffer of compatible size exists in the pool, it is reused
(with updated dtype/shape metadata). Otherwise a new Metal buffer is
allocated from device at the bucket size so future reuse is
possible for any request up to that bucket.
Each successful alloc registers the buffer in the pool’s in-use
list (ARC clone — cheap), so a subsequent [reset] returns it to
the free list automatically.
Sourcepub fn alloc_batch<I>(
&mut self,
device: &MlxDevice,
requests: I,
) -> Result<Vec<MlxBuffer>>
pub fn alloc_batch<I>( &mut self, device: &MlxDevice, requests: I, ) -> Result<Vec<MlxBuffer>>
Allocate several buffers and commit residency-set updates once.
Sourcepub fn release(&mut self, buffer: MlxBuffer)
pub fn release(&mut self, buffer: MlxBuffer)
Return a single buffer to the pool’s free list for future reuse.
The Metal memory is not deallocated — it stays resident on the GPU
for fast reuse. release is the per-buffer alternative to [reset];
see the module docs for guidance on which to use.
Mixing release and reset within the same arena cycle is not
supported — the pool’s in-use list does not deduplicate, so a buffer
returned via release and then bulk-returned via reset would land in
the free list twice (each entry holds an ARC clone of the same Metal
buffer; the duplication wastes a free-list slot but is not a memory
leak — both clones drop together once popped). Pick one pattern per
arena cycle.
Sourcepub fn reset(&mut self)
pub fn reset(&mut self)
Bulk-return every buffer handed out by [alloc] since the last reset
to the pool’s free list.
§Caller contract
All MlxBuffer values returned by alloc since the last reset must be
out-of-scope (dropped) at the time reset is called. Reset transfers
the pool’s ARC clones to the free list, where they become available to
subsequent [alloc] calls. If a caller is still holding an MlxBuffer
and a later alloc re-issues the underlying buffer, the two callers
will share GPU memory (aliasing). The Metal ARC keeps the storage
alive in either case, but writes from the new caller will be visible
to the stale caller — a correctness bug, not a memory error.
In Rust’s ownership model, locally-bound MlxBuffer values fall out of
scope at the end of their lexical block, making the per-decode-token
arena pattern safe by construction:
loop {
pool.reset(); // start of token — recycle previous token's buffers
forward_pass(&pool); // many alloc(), no explicit release
} // forward_pass returns; locals droppedSourcepub fn register_existing(
&mut self,
device: &MlxDevice,
buffer: &MlxBuffer,
) -> Result<()>
pub fn register_existing( &mut self, device: &MlxDevice, buffer: &MlxBuffer, ) -> Result<()>
Register an externally-allocated buffer with this pool’s residency set without taking ownership.
§Why this exists
alloc bucket-rounds requests up to the next power of
two, which is acceptable for transient per-token scratch (the worst
case is ~2× over-allocation on a few megabytes) but unacceptable for
large static weight tensors. hf2q’s Qwen3.5-MoE weight set totals
~17.26 GB; bucket-rounding would balloon that to ~25.55 GB
(+8.3 GB / +48% blowup) — unshippable on a 128 GB unified-memory
M5 Max once KV cache and intermediates are layered on top.
register_existing provides a residency-only path: the caller
allocates the buffer at its exact size via
MlxDevice::alloc_buffer (or
loads it via GgufFile::load_tensor_into_pool),
retains the MlxBuffer handle, and asks the pool to add the
underlying Metal allocation to its residency set so it gets the
MTLResidencySet hint on the next dispatch.
§Ownership semantics
- The pool does not take ownership of the buffer. The caller’s
MlxBufferhandle remains the canonical owner. - The pool does not recycle this buffer on
reset(it is not added toin_use). - The pool does include this buffer in its residency set so it is hinted-resident on the next encoder dispatch.
- On pool
Drop, the residency-set membership is removed but the underlying Metal buffer is not freed — the caller’sMlxBufferhandle keeps the ARC alive.
§HF2Q_NO_RESIDENCY=1 escape hatch
When the environment variable HF2Q_NO_RESIDENCY=1 is set, the
process boots its MlxDevice without any
residency set (see device.rs). In that mode this method returns
Ok(()) without touching anything — operators who suspect a
residency-induced regression can opt out without recompiling.
§Idempotence
Registering the same buffer twice (identified by its
metal::Buffer.contents() pointer) is a no-op on the second call —
the residency set membership is tracked in a HashMap keyed by
contents pointer.
§Errors
Returns MlxError::InvalidArgument if the buffer was allocated on a
different MlxDevice than any previously registered buffer.
Sourcepub fn free_count(&self) -> usize
pub fn free_count(&self) -> usize
Return all free buffers’ count (for diagnostics).
Sourcepub fn free_bytes(&self) -> usize
pub fn free_bytes(&self) -> usize
Total number of bytes held in the free list.
Sourcepub fn in_use_count(&self) -> usize
pub fn in_use_count(&self) -> usize
Number of buffers currently in-use (alloc’d but not yet reset).