Module buffer

The f32-uniform GPU arena. As in rlx-cuda and rlx-wgpu, every tensor is an f32 slot at a byte offset in one contiguous buffer. The arena is allocated as HOST_VISIBLE | HOST_COHERENT memory and kept persistently mapped, so host upload and readback are plain memcpy calls with no staging buffer or transfer command. (On discrete GPUs, a DEVICE_LOCAL arena with a staging buffer would offer higher bandwidth; that is a documented follow-up, with correctness coming first.)
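The slot-at-a-byte-offset layout can be illustrated with a small host-only sketch. This is not the crate's `Arena` API, which is not shown on this page: `MockArena`, `Slot`, `alloc`, `upload`, and `readback` are hypothetical names, and a `Vec<u8>` stands in for the persistently mapped allocation so the example runs without a GPU. It assumes a simple bump allocator, which the description above does not confirm.

```rust
/// Size of one f32 element in bytes.
const F32_SIZE: usize = std::mem::size_of::<f32>();

/// Hypothetical stand-in for the mapped arena. The Vec<u8> plays the role of
/// the persistently mapped HOST_VISIBLE | HOST_COHERENT allocation.
struct MockArena {
    bytes: Vec<u8>,
    next: usize, // bump pointer, in bytes (assumed allocation strategy)
}

/// A tensor slot: a byte offset into the arena plus an f32 element count.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Slot {
    offset: usize,
    len: usize,
}

impl MockArena {
    fn new(capacity_bytes: usize) -> Self {
        Self { bytes: vec![0u8; capacity_bytes], next: 0 }
    }

    /// Reserve room for `len` f32 values; returns None when the arena is full.
    /// Every size is a multiple of 4, so offsets stay f32-aligned.
    fn alloc(&mut self, len: usize) -> Option<Slot> {
        let size = len.checked_mul(F32_SIZE)?;
        let end = self.next.checked_add(size)?;
        if end > self.bytes.len() {
            return None;
        }
        let slot = Slot { offset: self.next, len };
        self.next = end;
        Some(slot)
    }

    /// Upload is a plain byte copy into the slot's range: no staging buffer.
    fn upload(&mut self, slot: Slot, data: &[f32]) {
        assert_eq!(data.len(), slot.len, "data must fill the slot exactly");
        let range = slot.offset..slot.offset + slot.len * F32_SIZE;
        for (chunk, value) in self.bytes[range].chunks_exact_mut(F32_SIZE).zip(data) {
            chunk.copy_from_slice(&value.to_ne_bytes());
        }
    }

    /// Readback is the same copy in the other direction.
    fn readback(&self, slot: Slot) -> Vec<f32> {
        let range = slot.offset..slot.offset + slot.len * F32_SIZE;
        self.bytes[range]
            .chunks_exact(F32_SIZE)
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}
```

With a 32-byte mock arena, allocating a 3-element slot and then a 2-element slot places them at byte offsets 0 and 12. In the real arena the copy would target the mapped pointer instead of a `Vec<u8>`; because the memory is HOST_COHERENT, no explicit flush or invalidate call is needed around the copy, though host and device access must still be synchronized.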

Structs

Arena
