Module builder

IR program builders - construct the megakernel Program from vyre IR.

Two flavours:

Interpreted (build_program_sharded) - If-tree opcode dispatch.
JIT (build_program_jit) - payload processor fused directly.
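
The difference between the two flavours can be illustrated with a toy model. The sketch below is not vyre code: `run_interpreted`, `run_fused`, the `(opcode, arg)` slot layout, and the opcode numbering are all hypothetical, chosen only to show the shape of an if-tree dispatch next to a body with the payload processor inlined.

```rust
// Hypothetical illustration only: none of these names come from vyre.

/// Interpreted flavour: each slot carries an opcode tag, and the body walks
/// an if-tree over that tag on every iteration.
fn run_interpreted(slots: &[(u32, u32)], mut acc: u32) -> u32 {
    for &(opcode, arg) in slots {
        if opcode == 0 {
            acc = acc.wrapping_add(arg);
        } else if opcode == 1 {
            acc = acc.wrapping_mul(arg);
        } else if opcode == 2 {
            acc ^= arg;
        }
        // Unknown opcodes fall through as no-ops in this toy model.
    }
    acc
}

/// Fused flavour: the payload processor is a generic parameter, so the
/// compiler monomorphizes the loop around it and no opcode test remains.
fn run_fused<F: Fn(u32, u32) -> u32>(args: &[u32], acc: u32, payload: F) -> u32 {
    args.iter().fold(acc, |a, &x| payload(a, x))
}
```

In this model the interpreted body pays for the tag comparisons on every slot but can handle any mix of opcodes, while the fused body handles exactly one payload processor with no dispatch. Whether the real builders make the same trade-off is an assumption, not something the descriptions above confirm.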

Functions

build_program: Build the default megakernel IR (256 lanes × 1 workgroup, no custom opcodes).
build_program_jit: Build the JIT megakernel IR where payload processor logic is fused into the body stream.
build_program_jit_slots: Build the JIT megakernel IR for an explicit number of ring slots.
build_program_priority: Build a priority-aware megakernel IR.
build_program_priority_slots: Build a priority-aware megakernel IR for an explicit ring slot count.
build_program_sharded: Build the megakernel IR with a custom workgroup size and optional custom opcodes.
build_program_sharded_no_io: Build the megakernel IR without the IO polling sidecar.
build_program_sharded_once_slots: Build a finite one-pass sharded megakernel IR for host-submitted batches.
build_program_sharded_once_slots_control_report_shared: Build a finite one-pass megakernel that reports completion through the control buffer only.
build_program_sharded_once_slots_shared: Shared-Arc variant of build_program_sharded_once_slots for hot runtime dispatchers that must not clone the megakernel template every launch.
build_program_sharded_slots: Build the megakernel IR for an explicit number of ring slots.
build_program_sharded_slots_shared: Build the sharded megakernel IR as a shared immutable template.
build_program_sharded_with_io_polling: Build the megakernel IR with the experimental IO polling sidecar.
build_program_sharded_with_workspace_adapter: Build the sharded megakernel IR with a consumer-owned resident workspace.
build_program_with_self_loading_miss_handler: Build the megakernel IR with a self-loading load-miss handler.
persistent_body: The body that runs once per iteration per lane. Exposed for tests and downstream crates that splice additional opcodes.
persistent_body_jit: The JIT body that runs once per iteration per lane.
persistent_body_priority: Priority-aware loop body. Replaces the per-lane 1:1 slot mapping with the scheduler’s priority scan.
persistent_body_priority_slots: Priority-aware loop body for an explicit ring slot count.
try_build_program_with_self_loading_miss_handler: Fallible variant of build_program_with_self_loading_miss_handler.
try_persistent_body: Fallible persistent body builder with explicit staging-allocation reporting.
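
Two naming patterns recur in the list above: `_shared` variants that hand out a shared immutable template, and `try_` variants that report failure instead of hiding it. A minimal sketch of both patterns follows. Every name in it (`Template`, `BuildError`, `try_build_template`, `build_template`, `build_template_shared`, `launch`) is hypothetical and stands in for the real types, whose signatures are not shown on this page. It also assumes the non-`try_` form panics on failure, which the descriptions do not state.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a built program; not the vyre `Program` type.
#[derive(Debug, PartialEq)]
struct Template {
    slots: u32,
    body: Vec<u32>,
}

// Hypothetical error type for the fallible builder.
#[derive(Debug, PartialEq)]
enum BuildError {
    ZeroSlots,
}

/// Fallible builder: returns an error instead of panicking.
fn try_build_template(slots: u32) -> Result<Template, BuildError> {
    if slots == 0 {
        return Err(BuildError::ZeroSlots);
    }
    Ok(Template { slots, body: (0..slots).collect() })
}

/// Infallible wrapper (assumed behaviour): panics if the build fails.
fn build_template(slots: u32) -> Template {
    try_build_template(slots).expect("slot count must be non-zero")
}

/// Shared variant: build once and wrap the result in an `Arc`.
fn build_template_shared(slots: u32) -> Arc<Template> {
    Arc::new(build_template(slots))
}

/// A launch only bumps the reference count; the template body is not copied.
fn launch(template: &Arc<Template>) -> Arc<Template> {
    Arc::clone(template)
}
```

With this shape a hot dispatcher can call `launch` on every submission and all launches observe the same allocation, which is the behaviour the `_shared` descriptions point at.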
