Module buffer

Buffer arena for the wgpu backend. Mirrors the shape of the rlx-metal arena: plan one large storage buffer at compile time, sub-allocate each node at a known byte offset, and treat I/O as write_buffer / read_buffer calls against those offsets.

Compute shaders can both read and write wgpu storage buffers, and the API imposes no shared-memory requirement (unlike Metal, where StorageModeShared matters). On Apple Silicon, wgpu’s Metal backend provides unified memory automatically.
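
The sub-allocation idea described above can be sketched in plain Rust, independent of wgpu. The names `align_up` and `plan_offsets` are hypothetical and not part of this module, and the 256-byte alignment used in the usage note is only an assumption (it matches wgpu's default `min_storage_buffer_offset_alignment`, but the real arena may pick a different value).

```rust
/// Round `n` up to the next multiple of `align` (`align` must be non-zero).
fn align_up(n: u64, align: u64) -> u64 {
    (n + align - 1) / align * align
}

/// Hypothetical planner: given per-node byte sizes, return each node's byte
/// offset inside one contiguous arena plus the total arena size.
/// Every offset is a multiple of `align`.
fn plan_offsets(sizes: &[u64], align: u64) -> (Vec<u64>, u64) {
    let mut offsets = Vec::with_capacity(sizes.len());
    let mut cursor = 0u64;
    for &size in sizes {
        // The node starts at the current (already aligned) cursor.
        offsets.push(cursor);
        // Advance past the node and re-align for the next one.
        cursor = align_up(cursor + size, align);
    }
    (offsets, cursor)
}
```

With sizes `[16, 300, 4]` and an alignment of 256, this yields offsets `[0, 256, 768]` and a total of 1024 bytes. Host writes and reads would then target those offsets in the single buffer.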

Structs§

Arena
One contiguous arena buffer + per-node byte offsets. Lives for the entire executable graph’s lifetime.
ReadbackLayout
Layout for batched output readback into a staging buffer.
ReadbackStaging
Reusable MAP_READ staging buffer for output readback.
TinyReadbackStaging
Fixed 256 B MAP_READ staging for scalar (≤16 B) readback — avoids map_buffer_on_submit + full-layout decode on MoltenVK hot paths.
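
To illustrate what a batched readback layout might track, here is a minimal sketch that packs several outputs back to back in a staging buffer and decodes one of them from the mapped bytes. `Slot`, `layout_readback`, and `decode_slot_f32` are hypothetical names, not the fields or functions of `ReadbackLayout`. The 4-byte packing granularity is an assumption based on wgpu's `COPY_BUFFER_ALIGNMENT`, and little-endian decoding is assumed as well.

```rust
/// Hypothetical record for one output inside the staging buffer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Slot {
    src_offset: u64, // byte offset of the node in the arena
    dst_offset: u64, // byte offset of its copy in the staging buffer
    len: u64,        // byte length of the output
}

/// Pack `(arena_offset, byte_len)` pairs contiguously into staging.
/// Returns the slots and the total staging size in bytes.
fn layout_readback(outputs: &[(u64, u64)]) -> (Vec<Slot>, u64) {
    const COPY_ALIGN: u64 = 4; // assumed copy granularity
    let mut slots = Vec::with_capacity(outputs.len());
    let mut cursor = 0u64;
    for &(src_offset, len) in outputs {
        slots.push(Slot { src_offset, dst_offset: cursor, len });
        // Keep the next destination offset copy-aligned.
        cursor += (len + COPY_ALIGN - 1) / COPY_ALIGN * COPY_ALIGN;
    }
    (slots, cursor)
}

/// Decode one slot's little-endian f32 values from the mapped staging bytes.
/// Panics if the slot lies outside `mapped`.
fn decode_slot_f32(mapped: &[u8], slot: &Slot) -> Vec<f32> {
    let start = slot.dst_offset as usize;
    let end = start + slot.len as usize;
    mapped[start..end]
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}
```

A layout like this lets one encoder carry all arena→staging copies, so several outputs can share a single submit and a single map.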

Functions§

decode_mapped_readback_f32
Decode f32 outputs after submit + map callback (used with schedule_readback_map).
decode_tiny_mapped_f32
After submit: decode one f32 vector from an already-mapped tiny staging buffer.
encode_readback_copies
Append arena→staging copies to an encoder (no submit).
map_readback_f32
Map staging after submit and decode f32 outputs (one poll).
plan_f32_uniform
Plan memory using f32-sized slots regardless of declared IR dtype, with liveness-aware slot reuse (see rlx_compile::memory::plan_memory_f32_uniform).
read_f32_many_pooled
Read several nodes with one submit + one poll (contiguous staging layout).
read_f32_pooled
Read one node via a reused staging buffer (one submit + one poll).
read_tiny_f32_after_submit
After submit: map only len bytes and decode one f32 vector.
schedule_readback_map
Schedule map_async on the encoder so mapping starts with submit (wgpu 29+).
use_tiny_readback
True when fused readback can use the tiny scalar fast path.
wait_readback_map
Poll until a readback map callback completes (fast path for tiny outputs).
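
The liveness-aware slot reuse mentioned for `plan_f32_uniform` can be sketched as a greedy pass over nodes in execution order. This is only an illustration under stated assumptions: `Node` and `plan_slots` are hypothetical, each element is sized as 4 bytes (f32) regardless of dtype, and the actual algorithm in `rlx_compile::memory::plan_memory_f32_uniform` may differ.

```rust
/// Hypothetical node description: element count plus the first and last
/// execution step at which its value is live.
struct Node {
    elems: u64,
    first: usize,
    last: usize,
}

/// Greedy slot assignment. Nodes must be sorted by `first`.
/// Returns the slot index per node and the byte size of each slot.
fn plan_slots(nodes: &[Node]) -> (Vec<usize>, Vec<u64>) {
    let mut slot_sizes: Vec<u64> = Vec::new();
    let mut busy_until: Vec<usize> = Vec::new();
    let mut assignment = Vec::with_capacity(nodes.len());
    for node in nodes {
        // f32-uniform sizing: 4 bytes per element, whatever the IR dtype.
        let bytes = node.elems * 4;
        // Reuse the first slot that is large enough and whose previous
        // occupant died strictly before this node becomes live.
        let reuse = (0..slot_sizes.len())
            .find(|&s| busy_until[s] < node.first && slot_sizes[s] >= bytes);
        let slot = match reuse {
            Some(s) => s,
            None => {
                slot_sizes.push(bytes);
                busy_until.push(0);
                slot_sizes.len() - 1
            }
        };
        busy_until[slot] = node.last;
        assignment.push(slot);
    }
    (assignment, slot_sizes)
}
```

For a chain of four nodes where each value is consumed only by the next step, this sketch needs two slots instead of four, which is the kind of saving slot reuse aims for.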