pub struct MlxBufferPool { /* private fields */ }Expand description
Arena-style buffer pool that reuses Metal buffer allocations.
§Design
- Buffers are bucketed by their allocated size rounded up to the nearest power of two. This reduces fragmentation at the cost of occasionally over-allocating by up to 2x.
release()returns a single buffer;reset()returns all outstanding buffers handed out since the last reset.- The
MlxDeviceis passed in at every [alloc] call (rather than stored in the pool). This keeps the pool free of lifetime parameters so it can be embedded in any owner struct (e.g. the per-decode-tokenDecodeBufferscache in hf2q’s qwen35 forward path).
§Why an arena reset matters
In the per-decode-token hot path, each token allocates ~1750 Metal buffers
for scratch / intermediate / parameter storage across attention, FFN, and
linear-attention layers. Direct MlxDevice::alloc_buffer() calls hit
Metal’s allocator each time (5-30 µs each); pooling reuses the underlying
metal::Buffer objects across token boundaries so steady-state allocation
cost amortizes to near zero. See ADR-012 §Optimize / Task #15 for the
MoE dwq46 0.90× parity gap that motivated this work.
Implementations§
Source§impl MlxBufferPool
impl MlxBufferPool
Sourcepub fn new() -> Self
pub fn new() -> Self
Create a new empty buffer pool. The Metal device is passed to
[alloc] at every call site, so the pool itself is lifetime-free.
Sourcepub fn alloc(
&mut self,
device: &MlxDevice,
byte_len: usize,
dtype: DType,
shape: Vec<usize>,
) -> Result<MlxBuffer>
pub fn alloc( &mut self, device: &MlxDevice, byte_len: usize, dtype: DType, shape: Vec<usize>, ) -> Result<MlxBuffer>
Allocate a buffer from the pool.
If a free buffer of compatible size exists in the pool, it is reused
(with updated dtype/shape metadata). Otherwise a new Metal buffer is
allocated from device at the bucket size so future reuse is
possible for any request up to that bucket.
Each successful alloc registers the buffer in the pool’s in-use
list (ARC clone — cheap), so a subsequent [reset] returns it to
the free list automatically.
Sourcepub fn release(&mut self, buffer: MlxBuffer)
pub fn release(&mut self, buffer: MlxBuffer)
Return a single buffer to the pool’s free list for future reuse.
The Metal memory is not deallocated — it stays resident on the GPU
for fast reuse. release is the per-buffer alternative to [reset];
see the module docs for guidance on which to use.
Mixing release and reset within the same arena cycle is not
supported — the pool’s in-use list does not deduplicate, so a buffer
returned via release and then bulk-returned via reset would land in
the free list twice (each entry holds an ARC clone of the same Metal
buffer; the duplication wastes a free-list slot but is not a memory
leak — both clones drop together once popped). Pick one pattern per
arena cycle.
Sourcepub fn reset(&mut self)
pub fn reset(&mut self)
Bulk-return every buffer handed out by [alloc] since the last reset
to the pool’s free list.
§Caller contract
All MlxBuffer values returned by alloc since the last reset must be
out-of-scope (dropped) at the time reset is called. Reset transfers
the pool’s ARC clones to the free list, where they become available to
subsequent [alloc] calls. If a caller is still holding an MlxBuffer
and a later alloc re-issues the underlying buffer, the two callers
will share GPU memory (aliasing). The Metal ARC keeps the storage
alive in either case, but writes from the new caller will be visible
to the stale caller — a correctness bug, not a memory error.
In Rust’s ownership model, locally-bound MlxBuffer values fall out of
scope at the end of their lexical block, making the per-decode-token
arena pattern safe by construction:
loop {
pool.reset(); // start of token — recycle previous token's buffers
forward_pass(&pool); // many alloc(), no explicit release
} // forward_pass returns; locals droppedSourcepub fn free_count(&self) -> usize
pub fn free_count(&self) -> usize
Return all free buffers’ count (for diagnostics).
Sourcepub fn free_bytes(&self) -> usize
pub fn free_bytes(&self) -> usize
Total number of bytes held in the free list.
Sourcepub fn in_use_count(&self) -> usize
pub fn in_use_count(&self) -> usize
Number of buffers currently in-use (alloc’d but not yet reset).