Skip to main content

MlxBufferPool

Struct MlxBufferPool 

Source
pub struct MlxBufferPool { /* private fields */ }
Expand description

Arena-style buffer pool that reuses Metal buffer allocations.

§Design

  • Buffers are bucketed by their allocated size rounded up to the nearest power of two. This reduces fragmentation at the cost of occasionally over-allocating by up to 2x.
  • release() returns a single buffer; reset() returns all outstanding buffers handed out since the last reset.
  • The MlxDevice is passed in at every [alloc] call (rather than stored in the pool). This keeps the pool free of lifetime parameters so it can be embedded in any owner struct (e.g. the per-decode-token DecodeBuffers cache in hf2q’s qwen35 forward path).

§Why an arena reset matters

In the per-decode-token hot path, each token allocates ~1750 Metal buffers for scratch / intermediate / parameter storage across attention, FFN, and linear-attention layers. Direct MlxDevice::alloc_buffer() calls hit Metal’s allocator each time (5-30 µs each); pooling reuses the underlying metal::Buffer objects across token boundaries so steady-state allocation cost amortizes to near zero. See ADR-012 §Optimize / Task #15 for the MoE dwq46 0.90× parity gap that motivated this work.

Implementations§

Source§

impl MlxBufferPool

Source

pub fn new() -> Self

Create a new empty buffer pool. The Metal device is passed to [alloc] at every call site, so the pool itself is lifetime-free.

Source

pub fn alloc( &mut self, device: &MlxDevice, byte_len: usize, dtype: DType, shape: Vec<usize>, ) -> Result<MlxBuffer>

Allocate a buffer from the pool.

If a free buffer of compatible size exists in the pool, it is reused (with updated dtype/shape metadata). Otherwise a new Metal buffer is allocated from device at the bucket size so future reuse is possible for any request up to that bucket.

Each successful alloc registers the buffer in the pool’s in-use list (ARC clone — cheap), so a subsequent [reset] returns it to the free list automatically.

Source

pub fn release(&mut self, buffer: MlxBuffer)

Return a single buffer to the pool’s free list for future reuse.

The Metal memory is not deallocated — it stays resident on the GPU for fast reuse. release is the per-buffer alternative to [reset]; see the module docs for guidance on which to use.

Mixing release and reset within the same arena cycle is not supported — the pool’s in-use list does not deduplicate, so a buffer returned via release and then bulk-returned via reset would land in the free list twice (each entry holds an ARC clone of the same Metal buffer; the duplication wastes a free-list slot but is not a memory leak — both clones drop together once popped). Pick one pattern per arena cycle.

Source

pub fn reset(&mut self)

Bulk-return every buffer handed out by [alloc] since the last reset to the pool’s free list.

§Caller contract

All MlxBuffer values returned by alloc since the last reset must be out-of-scope (dropped) at the time reset is called. Reset transfers the pool’s ARC clones to the free list, where they become available to subsequent [alloc] calls. If a caller is still holding an MlxBuffer and a later alloc re-issues the underlying buffer, the two callers will share GPU memory (aliasing). The Metal ARC keeps the storage alive in either case, but writes from the new caller will be visible to the stale caller — a correctness bug, not a memory error.

In Rust’s ownership model, locally-bound MlxBuffer values fall out of scope at the end of their lexical block, making the per-decode-token arena pattern safe by construction:

loop {
    pool.reset();          // start of token — recycle previous token's buffers
    forward_pass(&pool);   // many alloc(), no explicit release
}                          // forward_pass returns; locals dropped
Source

pub fn free_count(&self) -> usize

Return all free buffers’ count (for diagnostics).

Source

pub fn free_bytes(&self) -> usize

Total number of bytes held in the free list.

Source

pub fn in_use_count(&self) -> usize

Number of buffers currently in-use (alloc’d but not yet reset).

Source

pub fn clear(&mut self)

Clear all free buffers, releasing Metal memory. Does not affect in-use tracking.

Trait Implementations§

Source§

impl Default for MlxBufferPool

Source§

fn default() -> Self

Returns the “default value” for a type. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.