File-read → megakernel ring-slot pump. Linux-only.
The two halves needed for mapped-read → GPU-visible-memory → compute
already existed separately before this module: AsyncUringStream owns
the io_uring submission + completion queue and the GPU-mapped DMA buffer,
while crate::megakernel::Megakernel::publish_slot owns the host-side
ring-slot writer that signals a persistent GPU kernel. Nothing composed
them; a caller had to reach into both manually on every dispatch.
UringMegakernelPump wires them together so a host thread can run one
compact loop:
pump.submit_file_scan(fd, offset, len, tenant, opcode, [a0,a1,a2])?;
pump.drain_into_ring(&mut ring_bytes)?;
// …later…
let epoch = pump.observe_epoch(&control_bytes);
§Flow
- submit_file_scan posts an IORING_OP_READ_FIXED that targets GpuMappedBuffer[chunk_idx * slot_len..]. The bytes land in host-visible GPU memory, so the kernel sees them the moment the ring-slot status flips to PUBLISHED.
- The (tenant, opcode, args) payload is staged in pending: Vec<PendingPublish> keyed by chunk_idx.
- drain_into_ring polls the io_uring CQ and, for each success, writes the staged slot into the caller-supplied ring buffer via Megakernel::publish_slot. Errors surface with a structured PipelineError that names the failing chunk.
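The stage-then-publish flow above can be modelled in a few lines. The following is a minimal sketch and not the crate's implementation: it touches no real io_uring or GPU memory, and `MockPump`, `Completion`, `PumpError`, the fields of `PendingPublish`, the 24-byte slot layout, and the `STATUS_PUBLISHED` value are all hypothetical stand-ins chosen for illustration.

```rust
// Simplified model of the stage-then-publish flow. Every type, constant and
// the slot layout below are HYPOTHETICAL; nothing here is the crate's real API.

const SLOT_LEN: usize = 24; // assumed layout: status + tenant + opcode + 3 args, 4 bytes each
const STATUS_PUBLISHED: u32 = 1; // assumed status value

#[derive(Debug, Clone, PartialEq)]
pub struct PendingPublish {
    pub chunk_idx: usize,
    pub tenant: u32,
    pub opcode: u32,
    pub args: [u32; 3],
}

/// Stand-in for an io_uring completion: a negative `result` is a negated errno.
pub struct Completion {
    pub chunk_idx: usize,
    pub result: i32,
}

#[derive(Debug, PartialEq)]
pub enum PumpError {
    ReadFailed { chunk_idx: usize, errno: i32 },
    NothingStaged { chunk_idx: usize },
    RingTooSmall { chunk_idx: usize },
}

#[derive(Default)]
pub struct MockPump {
    pending: Vec<PendingPublish>,
}

impl MockPump {
    /// Stage the payload that will be published once the read completes.
    pub fn stage(&mut self, p: PendingPublish) {
        self.pending.push(p);
    }

    /// For each successful completion, write the staged payload into the
    /// caller-supplied ring. Returns the number of slots published.
    pub fn drain_into_ring(
        &mut self,
        completions: &[Completion],
        ring: &mut [u8],
    ) -> Result<usize, PumpError> {
        let mut published = 0;
        for c in completions {
            let chunk_idx = c.chunk_idx;
            let pos = self
                .pending
                .iter()
                .position(|p| p.chunk_idx == chunk_idx)
                .ok_or(PumpError::NothingStaged { chunk_idx })?;
            let p = self.pending.swap_remove(pos);
            if c.result < 0 {
                // The error names the failing chunk.
                return Err(PumpError::ReadFailed { chunk_idx, errno: -c.result });
            }
            let start = chunk_idx * SLOT_LEN;
            let slot = ring
                .get_mut(start..start + SLOT_LEN)
                .ok_or(PumpError::RingTooSmall { chunk_idx })?;
            // Payload words go in first, the status word last. A real writer
            // would also need memory-ordering guarantees, which this sketch
            // does not model.
            let words = [p.tenant, p.opcode, p.args[0], p.args[1], p.args[2]];
            for (i, w) in words.iter().enumerate() {
                slot[4 + i * 4..8 + i * 4].copy_from_slice(&w.to_le_bytes());
            }
            slot[0..4].copy_from_slice(&STATUS_PUBLISHED.to_le_bytes());
            published += 1;
        }
        Ok(published)
    }
}
```

The point of the model is the ordering: the payload is staged at submit time, and the slot is only marked published after the matching read has completed successfully.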
§Backpressure
The pump does not allocate new ring slots on its own -
submit_file_scan takes a caller-assigned slot_idx. The host
thread is responsible for slot bookkeeping (e.g., round-robin
over slot_count published slots with the kernel draining
them).
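One way a host thread might do this bookkeeping is a round-robin allocator that refuses to hand out a slot the kernel has not yet drained. This is a sketch under assumptions: `SlotRing` and its methods are hypothetical names, not part of this module, and how the host learns that a slot has been drained is left to the caller.

```rust
/// HYPOTHETICAL host-side slot bookkeeping; not part of the pump's API.
pub struct SlotRing {
    in_flight: Vec<bool>, // true while a slot is published but not yet drained
    next: usize,          // next slot to hand out, in round-robin order
}

impl SlotRing {
    pub fn new(slot_count: usize) -> Self {
        assert!(slot_count > 0, "slot_count must be non-zero");
        SlotRing { in_flight: vec![false; slot_count], next: 0 }
    }

    /// Returns the next slot index in round-robin order, or `None` when that
    /// slot is still in flight (the caller should back off and retry).
    pub fn acquire(&mut self) -> Option<usize> {
        let idx = self.next;
        if self.in_flight[idx] {
            return None;
        }
        self.in_flight[idx] = true;
        self.next = (idx + 1) % self.in_flight.len();
        Some(idx)
    }

    /// Marks a slot as drained so that it can be reused.
    pub fn release(&mut self, slot_idx: usize) {
        self.in_flight[slot_idx] = false;
    }
}
```

In this scheme, a `None` from `acquire` is the backpressure signal: the host stops submitting until the kernel has drained the oldest slot.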
§Linux-only
This module only compiles on target_os = "linux"; the io_uring
surface itself is Linux-specific. Callers gate their pipeline
code the same way.
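Caller-side gating can look like the sketch below, which uses the standard `cfg` attribute. The `pipeline` module, `backend`, and `backend_name` are hypothetical names used only to show the pattern; the strings they return are placeholders.

```rust
// HYPOTHETICAL caller code: only the `cfg(target_os = "linux")` attribute
// itself is standard Rust; the module and function names are illustrative.

#[cfg(target_os = "linux")]
mod pipeline {
    /// On Linux this is where the io_uring-backed pump would be used.
    pub fn backend() -> &'static str {
        "io_uring pump"
    }
}

#[cfg(not(target_os = "linux"))]
mod pipeline {
    /// Elsewhere the pump module is not compiled, so provide a fallback.
    pub fn backend() -> &'static str {
        "unsupported"
    }
}

/// Resolves to whichever `pipeline` module was compiled for this target.
pub fn backend_name() -> &'static str {
    pipeline::backend()
}
```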
Structs§
- UringMegakernelPump - Compose an AsyncUringStream with the megakernel ring-slot writer so the host can drive the compatibility mapped-read ingest loop with one compact pump. Native NVMe → BAR1 ingest is owned by super::driver::NvmeGpuIngestDriver::new_gpudirect.