pub struct Arena {
    pub buffer: Buffer,
    pub f16_buffer: Option<Buffer>,
    pub offsets: HashMap<NodeId, usize>,
    pub lens: HashMap<NodeId, usize>,
    pub size: usize,
    pub scratch_off: usize,
    pub scratch_bytes: usize,
}
One contiguous arena buffer + per-node byte offsets. Lives for the entire executable graph’s lifetime.
Fields
buffer: Buffer
Underlying GPU buffer. Bound as a single STORAGE_READ_WRITE resource for every kernel; offsets disambiguate per-node access.
f16_buffer: Option<Buffer>
Optional shadow buffer holding f16 versions of every value
written via write_f32. Sized at half the arena byte budget
(each f32 element pairs with an f16 element at the same logical
index, i.e. f16_off = f32_off / 2). Created only when the
device exposes the SHADER_F16 feature; matmul kernels with
f16-typed B input bind both buffer (for f32 activations) and
f16_buffer (for f16 weights). Halves global memory traffic
on the dominant matmul reads.
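
The offset relation above can be sketched as plain arithmetic. This is a minimal illustration, not the crate's code: the helper names `f16_shadow_offset` and `f16_shadow_len` are hypothetical, and it assumes slots are aligned to whole f32 elements.

```rust
/// Bytes per element in the f32 arena and in the f16 shadow buffer.
const F32_BYTES: usize = 4;
const F16_BYTES: usize = 2;

/// Map a byte offset in the f32 arena to the byte offset of the same
/// logical element in the f16 shadow buffer (f16_off = f32_off / 2).
/// Assumption: `f32_off` is a multiple of 4 (element-aligned).
fn f16_shadow_offset(f32_off: usize) -> usize {
    debug_assert!(f32_off % F32_BYTES == 0);
    (f32_off / F32_BYTES) * F16_BYTES
}

/// Byte length of the f16 mirror for a slot holding `f32_len` bytes of f32 data.
fn f16_shadow_len(f32_len: usize) -> usize {
    (f32_len / F32_BYTES) * F16_BYTES
}
```

Under these assumptions, a slot at byte 256 of the f32 arena would mirror to byte 128 of the shadow buffer.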
offsets: HashMap<NodeId, usize>
Per-node byte offset into buffer.
lens: HashMap<NodeId, usize>
Per-node byte length.
size: usize
Total arena size in bytes.
scratch_off: usize
Byte offset of the tail scratch zone (also size - scratch_bytes).
Set when callers request scratch via from_plan_with_scratch.
Reusable across ops since scratch is temporary: only one
op writes to it at a time within a schedule.
scratch_bytes: usize
Size in bytes of the tail scratch zone (0 when not used).
Implementations
impl Arena
pub fn from_plan_with_scratch(
    device: &Device,
    plan: &MemoryPlan,
    scratch_bytes: usize,
) -> Self
Build an arena from a memory plan with an extra tail scratch zone
of scratch_bytes reserved past the plan’s arena_size. Useful for
ops that need throwaway temp storage that doesn’t fit in a
workgroup-shared variable.
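
A minimal sketch of the layout arithmetic this implies, assuming the scratch zone is appended directly after the plan's arena_size with no extra padding. `ScratchLayout` and `layout_with_scratch` are hypothetical names that mirror only the size-related fields of Arena.

```rust
/// Hypothetical stand-in for the size-related fields of `Arena`.
#[derive(Debug, PartialEq)]
struct ScratchLayout {
    size: usize,
    scratch_off: usize,
    scratch_bytes: usize,
}

/// Append `scratch_bytes` past the plan's arena size.
/// Assumption: no alignment padding is inserted before the scratch zone.
fn layout_with_scratch(plan_arena_size: usize, scratch_bytes: usize) -> ScratchLayout {
    let size = plan_arena_size + scratch_bytes;
    ScratchLayout {
        size,
        // Matches the documented invariant: scratch_off == size - scratch_bytes.
        scratch_off: size - scratch_bytes,
        scratch_bytes,
    }
}
```

With a 1024-byte plan and 256 bytes of scratch, this sketch yields a 1280-byte arena whose scratch zone starts at byte 1024.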
pub fn from_plan(device: &Device, plan: &MemoryPlan) -> Self
Build an arena from a memory plan. Allocates one big buffer sized to fit every node’s offset+length.
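
One way to read "sized to fit every node's offset+length" is as a maximum over slot end positions. The sketch below assumes that reading; `required_arena_size` is a hypothetical helper, and `NodeId` is modeled as a plain `u32` for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's node identifier type.
type NodeId = u32;

/// Smallest buffer size that covers every slot: max over nodes of offset + len.
/// A maximum (not a sum) is used because a planner may let slots with
/// disjoint lifetimes share the same bytes.
fn required_arena_size(
    offsets: &HashMap<NodeId, usize>,
    lens: &HashMap<NodeId, usize>,
) -> usize {
    offsets
        .iter()
        .map(|(id, &off)| off + lens.get(id).copied().unwrap_or(0))
        .max()
        .unwrap_or(0)
}
```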
pub fn has(&self, id: NodeId) -> bool
pub fn offset(&self, id: NodeId) -> usize
pub fn len_of(&self, id: NodeId) -> usize
pub fn param_fits_f16_mirror(&self, id: NodeId) -> bool
Whether this node’s f16 mirror fits in the capped f16 shadow buffer.
pub fn set_actual_len(&mut self, id: NodeId, bytes: usize)
Override the actual data length (in bytes) for a node. The backend calls this after planning to record true elem*4 sizes instead of the alignment-padded slot sizes.
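
To illustrate the gap between the two sizes, here is a small sketch. The helper names and the 256-byte alignment used in the usage note are assumptions for illustration; the planner's real alignment is not stated here.

```rust
/// Round `n` up to the next multiple of `align` (`align` must be non-zero).
fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

/// True data length of an f32 tensor: element count times 4 bytes.
fn actual_len(elems: usize) -> usize {
    elems * 4
}

/// Slot size a planner might reserve for the same tensor.
/// Assumption: slots are padded up to a fixed alignment.
fn padded_slot_len(elems: usize, align: usize) -> usize {
    align_up(actual_len(elems), align)
}
```

For a 100-element tensor and an assumed 256-byte alignment, the slot would be 512 bytes while the actual data is 400 bytes, which is the value set_actual_len would record.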
pub fn write_f32(&self, queue: &Queue, id: NodeId, data: &[f32])
Write f32 data into the node’s slot. The queue performs an
async transfer; subsequent kernel dispatches on the same queue
see the new bytes. When the device supports SHADER_F16, also
downcasts and writes the same data into the f16 shadow buffer
at offset f32_offset / 2 — so matmul kernels with f16 weight
bindings can read directly from there at half the bandwidth.
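
A buffer upload ultimately takes raw bytes, so the f32 slice has to be viewed or copied as bytes first. The sketch below shows a std-only copy in native byte order; `f32s_to_bytes` is a hypothetical helper, and the actual implementation may use a zero-copy cast instead.

```rust
/// Copy an f32 slice into a byte vector using the host's native byte order.
/// Assumption: the GPU reads the buffer with the same byte order as the host.
fn f32s_to_bytes(data: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len() * 4);
    for v in data {
        out.extend_from_slice(&v.to_ne_bytes());
    }
    out
}
```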
pub fn write_f16_shadow(&self, queue: &Queue, id: NodeId, data: &[f32])
Downcast host f32 data into the f16 shadow buffer at id’s slot.
Used when the redundant f32 write_buffer call is skipped but CoopF16Vk
still needs a fresh f16 mirror (e.g. on input-upload hash-cache hits).
pub fn read_f32(&self, device: &Device, queue: &Queue, id: NodeId) -> Vec<f32>
Read a node’s bytes back to host f32. Uses a fresh staging buffer;
hot paths should call read_f32_pooled with a reused ReadbackStaging.
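
After the staging buffer is mapped, the bytes still have to be reinterpreted as f32 on the host. A minimal std-only sketch of that last step follows; `bytes_to_f32s` is a hypothetical helper, and it assumes native byte order and silently ignores any trailing bytes that do not form a whole element.

```rust
/// Decode native-endian bytes into f32 values, 4 bytes per element.
/// Trailing bytes that do not fill a whole element are dropped.
fn bytes_to_f32s(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```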
pub fn read_bytes_range(
    &self,
    device: &Device,
    queue: &Queue,
    byte_off: usize,
    len: usize,
) -> Vec<u8>
Read a byte range from the arena (used for packed GGUF weights).
pub fn write_bytes_range(&self, queue: &Queue, byte_off: usize, data: &[u8])
Write raw bytes into the arena at byte_off.
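
Callers working with raw ranges may want to validate them before touching the GPU. The check below is a sketch under assumptions: `check_range` is a hypothetical helper, and the 4-byte alignment rule reflects what wgpu-style copy APIs typically require, not a documented guarantee of this crate.

```rust
/// Assumed copy alignment in bytes for offsets and lengths.
const COPY_ALIGN: usize = 4;

/// Validate that `[byte_off, byte_off + len)` lies inside an arena of
/// `arena_size` bytes and that both offset and length are copy-aligned.
fn check_range(arena_size: usize, byte_off: usize, len: usize) -> Result<(), String> {
    let end = byte_off
        .checked_add(len)
        .ok_or_else(|| "range end overflows usize".to_string())?;
    if end > arena_size {
        return Err(format!("range end {end} exceeds arena size {arena_size}"));
    }
    if byte_off % COPY_ALIGN != 0 || len % COPY_ALIGN != 0 {
        return Err(format!("offset and length must be multiples of {COPY_ALIGN}"));
    }
    Ok(())
}
```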
Auto Trait Implementations
impl !RefUnwindSafe for Arena
impl !UnwindSafe for Arena
impl Freeze for Arena
impl Send for Arena
impl Sync for Arena
impl Unpin for Arena
impl UnsafeUnpin for Arena
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.