Skip to main content

MlxBufferPool

Struct MlxBufferPool 

Source
pub struct MlxBufferPool { /* private fields */ }
Expand description

Arena-style buffer pool that reuses Metal buffer allocations.

§Design

  • Buffers are bucketed by their allocated size rounded up to the nearest power of two. This reduces fragmentation at the cost of occasionally over-allocating by up to 2x.
  • release() returns a single buffer; reset() returns all outstanding buffers handed out since the last reset.
  • The MlxDevice is passed in at every [alloc] call (rather than stored in the pool). This keeps the pool free of lifetime parameters so it can be embedded in any owner struct (e.g. the per-decode-token DecodeBuffers cache in hf2q’s qwen35 forward path).

§Why an arena reset matters

In the per-decode-token hot path, each token allocates ~1750 Metal buffers for scratch / intermediate / parameter storage across attention, FFN, and linear-attention layers. Direct MlxDevice::alloc_buffer() calls hit Metal’s allocator each time (5-30 µs each); pooling reuses the underlying metal::Buffer objects across token boundaries so steady-state allocation cost amortizes to near zero. See ADR-012 §Optimize / Task #15 for the MoE dwq46 0.90× parity gap that motivated this work.

Implementations§

Source§

impl MlxBufferPool

Source

pub fn new() -> Self

Create a new empty buffer pool. The Metal device is passed to [alloc] at every call site, so the pool itself is lifetime-free.

Source

pub fn alloc( &mut self, device: &MlxDevice, byte_len: usize, dtype: DType, shape: Vec<usize>, ) -> Result<MlxBuffer>

Allocate a buffer from the pool.

If a free buffer of compatible size exists in the pool, it is reused (with updated dtype/shape metadata). Otherwise a new Metal buffer is allocated from device at the bucket size so future reuse is possible for any request up to that bucket.

Each successful alloc registers the buffer in the pool’s in-use list (ARC clone — cheap), so a subsequent [reset] returns it to the free list automatically.

Source

pub fn alloc_batch<I>( &mut self, device: &MlxDevice, requests: I, ) -> Result<Vec<MlxBuffer>>
where I: IntoIterator<Item = (usize, DType, Vec<usize>)>,

Allocate several buffers and commit residency-set updates once.

Source

pub fn release(&mut self, buffer: MlxBuffer)

Return a single buffer to the pool’s free list for future reuse.

The Metal memory is not deallocated — it stays resident on the GPU for fast reuse. release is the per-buffer alternative to [reset]; see the module docs for guidance on which to use.

Mixing release and reset within the same arena cycle is not supported — the pool’s in-use list does not deduplicate, so a buffer returned via release and then bulk-returned via reset would land in the free list twice (each entry holds an ARC clone of the same Metal buffer; the duplication wastes a free-list slot but is not a memory leak — both clones drop together once popped). Pick one pattern per arena cycle.

Source

pub fn reset(&mut self)

Bulk-return every buffer handed out by [alloc] since the last reset to the pool’s free list.

§Caller contract

All MlxBuffer values returned by alloc since the last reset must be out-of-scope (dropped) at the time reset is called. Reset transfers the pool’s ARC clones to the free list, where they become available to subsequent [alloc] calls. If a caller is still holding an MlxBuffer and a later alloc re-issues the underlying buffer, the two callers will share GPU memory (aliasing). The Metal ARC keeps the storage alive in either case, but writes from the new caller will be visible to the stale caller — a correctness bug, not a memory error.

In Rust’s ownership model, locally-bound MlxBuffer values fall out of scope at the end of their lexical block, making the per-decode-token arena pattern safe by construction:

loop {
    pool.reset();          // start of token — recycle previous token's buffers
    forward_pass(&pool);   // many alloc(), no explicit release
}                          // forward_pass returns; locals dropped
Source

pub fn register_existing( &mut self, device: &MlxDevice, buffer: &MlxBuffer, ) -> Result<()>

Register an externally-allocated buffer with this pool’s residency set without taking ownership.

§Why this exists

alloc bucket-rounds requests up to the next power of two, which is acceptable for transient per-token scratch (the worst case is ~2× over-allocation on a few megabytes) but unacceptable for large static weight tensors. hf2q’s Qwen3.5-MoE weight set totals ~17.26 GB; bucket-rounding would balloon that to ~25.55 GB (+8.3 GB / +48% blowup) — unshippable on a 128 GB unified-memory M5 Max once KV cache and intermediates are layered on top.

register_existing provides a residency-only path: the caller allocates the buffer at its exact size via MlxDevice::alloc_buffer (or loads it via GgufFile::load_tensor_into_pool), retains the MlxBuffer handle, and asks the pool to add the underlying Metal allocation to its residency set so it gets the MTLResidencySet hint on the next dispatch.

§Ownership semantics
  • The pool does not take ownership of the buffer. The caller’s MlxBuffer handle remains the canonical owner.
  • The pool does not recycle this buffer on reset (it is not added to in_use).
  • The pool does include this buffer in its residency set so it is hinted-resident on the next encoder dispatch.
  • On pool Drop, the residency-set membership is removed but the underlying Metal buffer is not freed — the caller’s MlxBuffer handle keeps the ARC alive.
§HF2Q_NO_RESIDENCY=1 escape hatch

When the environment variable HF2Q_NO_RESIDENCY=1 is set, the process boots its MlxDevice without any residency set (see device.rs). In that mode this method returns Ok(()) without touching anything — operators who suspect a residency-induced regression can opt out without recompiling.

§Idempotence

Registering the same buffer twice (identified by its metal::Buffer.contents() pointer) is a no-op on the second call — the residency set membership is tracked in a HashMap keyed by contents pointer.

§Errors

Returns MlxError::InvalidArgument if the buffer was allocated on a different MlxDevice than any previously registered buffer.

Source

pub fn free_count(&self) -> usize

Return all free buffers’ count (for diagnostics).

Source

pub fn free_bytes(&self) -> usize

Total number of bytes held in the free list.

Source

pub fn in_use_count(&self) -> usize

Number of buffers currently in-use (alloc’d but not yet reset).

Source

pub fn clear(&mut self)

Clear all free buffers, releasing Metal memory. Does not affect in-use tracking.

Trait Implementations§

Source§

impl Default for MlxBufferPool

Source§

fn default() -> Self

Returns the “default value” for a type. Read more
Source§

impl Drop for MlxBufferPool

Source§

fn drop(&mut self)

Executes the destructor for this type. Read more
Source§

fn pin_drop(self: Pin<&mut Self>)

🔬This is a nightly-only experimental API. (pin_ergonomics)
Execute the destructor for this type, but different to Drop::drop, it requires self to be pinned. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.