Buffer arena for the wgpu backend. Mirrors the rlx-metal arena
shape: pre-plan one big storage buffer at compile time, sub-allocate
per-node offsets at known positions, treat I/O as write_buffer /
read_buffer against those offsets.
wgpu’s storage buffers are fine for both reads and writes from
compute shaders; there’s no shared-memory requirement at the API
level (unlike Metal where StorageModeShared matters). On Apple
Silicon wgpu’s Metal backend gives us unified memory automatically.
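The arena shape described above can be sketched without any GPU code. This is a minimal illustration, not the crate's actual planner: `Slot`, `align_up`, and `plan_offsets` are hypothetical names, every node is assumed to occupy f32-sized elements, and the liveness-aware slot reuse that `plan_f32_uniform` performs is deliberately left out. The 256-byte alignment in the usage note mirrors wgpu's default `min_storage_buffer_offset_alignment`; whether the real arena aligns slots that way is an assumption.

```rust
/// Hypothetical per-node placement inside the single arena buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Slot {
    /// Byte offset of the node's data from the start of the arena.
    offset: u64,
    /// Size of the node's data in bytes.
    len_bytes: u64,
}

/// Round `n` up to the next multiple of `align` (which must be a power of two).
fn align_up(n: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (n + align - 1) & !(align - 1)
}

/// Lay nodes out end to end, one slot per node, with no reuse.
/// Returns the slots and the total arena size in bytes.
fn plan_offsets(elem_counts: &[u64], align: u64) -> (Vec<Slot>, u64) {
    let mut cursor = 0u64;
    let mut slots = Vec::with_capacity(elem_counts.len());
    for &n in elem_counts {
        let len_bytes = n * 4; // assumption: every element is f32-sized
        slots.push(Slot { offset: cursor, len_bytes });
        // The next slot starts at the next aligned position.
        cursor = align_up(cursor + len_bytes, align);
    }
    (slots, cursor)
}
```

For example, `plan_offsets(&[3, 100, 1], 256)` places the three nodes at byte offsets 0, 256, and 768 and yields a 1024-byte arena. Uploads and readbacks then only need a node's `offset` and `len_bytes`.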
Structs
- Arena - One contiguous arena buffer plus per-node byte offsets. Lives for the entire executable graph’s lifetime.
- ReadbackLayout - Layout for batched output readback into a staging buffer.
- ReadbackStaging - Reusable MAP_READ staging buffer for output readback.
- TinyReadbackStaging - Fixed 256 B MAP_READ staging for scalar (≤16 B) readback; avoids map_buffer_on_submit + full-layout decode on MoltenVK hot paths.
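To make the relationship between the batched layout and the tiny staging buffer concrete, here is a rough sketch of how a contiguous staging layout could be planned. All names (`Output`, `CopyRegion`, `plan_readback`, `fits_tiny`, `staging_bytes`) are hypothetical, and the rule that the tiny path applies to exactly one output of at most 16 bytes is an assumption drawn from the descriptions above, not the crate's confirmed logic. The constant 4 mirrors `wgpu::COPY_BUFFER_ALIGNMENT`, which buffer-to-buffer copy offsets and sizes must respect.

```rust
/// Mirrors `wgpu::COPY_BUFFER_ALIGNMENT` so the sketch needs no crates.
const COPY_ALIGN: u64 = 4;
/// Largest payload assumed to qualify for the tiny path (from "≤16 B").
const TINY_MAX_BYTES: u64 = 16;
/// Fixed size of the tiny staging buffer (from "Fixed 256 B").
const TINY_STAGING_BYTES: u64 = 256;

/// Hypothetical description of one output living in the arena.
struct Output {
    arena_offset: u64,
    len_bytes: u64,
}

/// One arena-to-staging copy to append to a command encoder.
#[derive(Debug, PartialEq)]
struct CopyRegion {
    src_offset: u64,
    dst_offset: u64,
    size: u64,
}

/// Pack all outputs back to back in the staging buffer.
/// Returns the copies and the total staging size in bytes.
fn plan_readback(outputs: &[Output]) -> (Vec<CopyRegion>, u64) {
    let mut cursor = 0u64;
    let mut copies = Vec::with_capacity(outputs.len());
    for o in outputs {
        // Round the copy size up to the copy alignment.
        let size = (o.len_bytes + COPY_ALIGN - 1) / COPY_ALIGN * COPY_ALIGN;
        copies.push(CopyRegion { src_offset: o.arena_offset, dst_offset: cursor, size });
        cursor += size;
    }
    (copies, cursor)
}

/// Assumed tiny-path rule: exactly one output of at most 16 bytes.
fn fits_tiny(outputs: &[Output]) -> bool {
    outputs.len() == 1 && outputs[0].len_bytes <= TINY_MAX_BYTES
}

/// Staging size needed: the fixed tiny buffer, or the packed layout total.
fn staging_bytes(outputs: &[Output]) -> u64 {
    if fits_tiny(outputs) {
        TINY_STAGING_BYTES
    } else {
        plan_readback(outputs).1
    }
}
```

Packing outputs contiguously means one mapped range covers every output, which is what lets several nodes share a single submit and a single poll.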
Functions
- decode_mapped_readback_f32 - Decode f32 outputs after submit + map callback (used with schedule_readback_map).
- decode_tiny_mapped_f32 - After submit: decode one f32 vector from an already-mapped tiny staging buffer.
- encode_readback_copies - Append arena→staging copies to an encoder (no submit).
- map_readback_f32 - Map staging after submit and decode f32 outputs (one poll).
- plan_f32_uniform - Plan memory using f32-sized slots regardless of declared IR dtype, with liveness-aware slot reuse (see rlx_compile::memory::plan_memory_f32_uniform).
- read_f32_many_pooled - Read several nodes with one submit + one poll (contiguous staging layout).
- read_f32_pooled - Read one node via a reused staging buffer (one submit + one poll).
- read_tiny_f32_after_submit - After submit: map only len bytes and decode one f32 vector.
- schedule_readback_map - Schedule map_async on the encoder so mapping starts with submit (wgpu 29+).
- use_tiny_readback - True when fused readback can use the tiny scalar fast path.
- wait_readback_map - Poll until a readback map callback completes (fast path for tiny outputs).
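The decode step shared by the readback functions above amounts to reinterpreting ranges of mapped bytes as f32 values. The sketch below shows that step on a plain byte slice, so it runs without a GPU. `Region` and `decode_f32_regions` are hypothetical names rather than the crate's API, and little-endian byte order is assumed; the real functions may well use a zero-copy cast instead.

```rust
/// Hypothetical location of one output inside the mapped staging bytes.
struct Region {
    offset: usize,
    len_bytes: usize,
}

/// Decode each region of `mapped` into a vector of f32 values.
/// Returns `None` if a region is out of bounds or not a multiple of 4 bytes.
fn decode_f32_regions(mapped: &[u8], regions: &[Region]) -> Option<Vec<Vec<f32>>> {
    let mut out = Vec::with_capacity(regions.len());
    for r in regions {
        let end = r.offset.checked_add(r.len_bytes)?;
        let bytes = mapped.get(r.offset..end)?;
        if bytes.len() % 4 != 0 {
            return None;
        }
        // Assumption: values are stored little-endian.
        let values: Vec<f32> = bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        out.push(values);
    }
    Some(out)
}
```

In a real readback the `mapped` slice would come from the mapped range of the staging buffer once the map callback has completed, and the buffer would be unmapped after decoding so it can be reused.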