Skip to main content

Module megakernel

Module megakernel 

Source
Expand description

Persistent megakernel - the vyre Program that runs forever on the GPU, decoding host-fed ring opcodes from a host-fed ring buffer. Persistent megakernel - the GPU becomes a VIR0 bytecode interpreter.

One dispatch compiles the program; the kernel loops forever, pulling packed bytecode slots from a host-fed ring buffer and executing each. The host never re-dispatches - it only writes new slots and observes atomic counters in the control buffer.

§Layout

  • protocol - ring-buffer slot layout, control words, opcodes.
  • handlers - built-in opcode handlers + extension mechanism.
  • builder - IR Program construction (interpreted + JIT).
  • execution - compiled persistent-kernel handle and dispatch path.
  • resident - host mirrors for GPU-resident runtime buffers.
  • readback - strict output-buffer decoding after dispatch.
  • recovery - device-loss classification and pipeline rebuild.

§Coordination protocol

  1. Read control[SHUTDOWN]; if non-zero, Node::Return.
  2. Read this slot’s status; skip idle slots without tenant metadata loads.
  3. If PUBLISHED, read tenant_id; authorize via tenant-mask table.
  4. CAS ring_buffer[status] from PUBLISHED → CLAIMED.
  5. Dispatch on opcode through If-tree (or JIT fused body).
  6. atomic_add(control[DONE_COUNT], 1).
  7. Store DONE into the status word.

Re-exports§

pub use automata_worklist::AutomataStateIndex;
pub use automata_worklist::AutomataWorklistEvidence;
pub use automata_worklist::AutomataWorklistMode;
pub use automata_worklist::AutomataWorklistPolicy;
pub use automata_worklist::AutomataWorklistRecommendation;
pub use automata_worklist::AutomataWorklistRequest;
pub use automata_worklist::AUTOMATA_WORKLIST_EVIDENCE_SCHEMA_VERSION;
pub use builder::build_program;
pub use builder::build_program_jit;
pub use builder::build_program_jit_slots;
pub use builder::build_program_priority;
pub use builder::build_program_priority_slots;
pub use builder::build_program_sharded;
pub use builder::build_program_sharded_no_io;
pub use builder::build_program_sharded_once_slots;
pub use builder::build_program_sharded_once_slots_control_report_shared;
pub use builder::build_program_sharded_once_slots_shared;
pub use builder::build_program_sharded_slots;
pub use builder::build_program_sharded_slots_shared;
pub use builder::build_program_sharded_with_io_polling;
pub use builder::build_program_sharded_with_workspace_adapter;
pub use builder::persistent_body;
pub use builder::persistent_body_jit;
pub use builder::persistent_body_priority;
pub use builder::persistent_body_priority_slots;
pub use builder::try_build_program_with_self_loading_miss_handler;
pub use builder::build_program_with_self_loading_miss_handler;
pub use descriptor::BatchDescriptor;
pub use descriptor::BuiltinOpcode;
pub use descriptor::PackedOpDescriptor;
pub use descriptor::SlotDescriptor;
pub use descriptor::SlotOpcode;
pub use descriptor::WindowClass;
pub use descriptor::WindowDescriptor;
pub use execution::Megakernel;
pub use execution::MegakernelDispatchOutput;
pub use execution::MegakernelDispatchStats;
pub use execution::MegakernelResidentHandles;
pub use handlers::OpcodeHandler;
pub use io::IoCompletion;
pub use io::IoRequest;
pub use io::MegakernelIoQueue;
pub use io::IO_SLOT_COUNT;
pub use io::IO_SLOT_WORDS;
pub use mixed_work::mixed_work_protocol_evidence;
pub use mixed_work::validate_mixed_work_protocol;
pub use mixed_work::MixedWorkProtocolError;
pub use mixed_work::MixedWorkProtocolEvidence;
pub use mixed_work::MixedWorkProtocolPlan;
pub use mixed_work::MixedWorkQueueClass;
pub use mixed_work::MixedWorkUnit;
pub use mixed_work::MixedWorkUnitType;
pub use mixed_work::OutputSlabId;
pub use mixed_work::ResidentArtifactId;
pub use mixed_work::MIXED_WORK_PROTOCOL_SCHEMA_VERSION;
pub use planner::build_bellman_tn_order_program;
pub use planner::build_kfac_autotune_step_program;
pub use planner::build_persistent_fixpoint_program;
pub use planner::build_scallop_lineage_with_scratch;
pub use planner::build_scallop_provenance_wide_program;
pub use planner::build_sinkhorn_clustering_program;
pub use planner::build_sinkhorn_full_clustering_program;
pub use planner::build_scallop_lineage_with_program_and_scratch;
pub use planner::default_worker_groups_from_limits;
pub use planner::dispatch_grid_for;
pub use planner::padded_slot_count;
pub use planner::plan_compact_fusion_into;
pub use planner::select_fused_subset;
pub use planner::select_fused_subset_compact;
pub use planner::select_fused_subset_compact_into;
pub use planner::select_fused_subset_into;
pub use planner::select_fused_subset_with_rate;
pub use planner::select_optimal_fused_subset;
pub use planner::try_detect_cross_arm_redundancy;
pub use planner::try_prune_redundant_work_items_into;
pub use planner::try_prune_redundant_work_items_with_scratch_into;
pub use planner::worker_workgroup_size;
pub use planner::CompactFusionPlanningScratch;
pub use planner::CrossArmRedundancy;
pub use planner::FusionSelectionScratch;
pub use planner::MegakernelCaps;
pub use planner::MegakernelConfig;
pub use planner::MegakernelGridLimits;
pub use planner::MegakernelGridPlan;
pub use planner::MegakernelGridRequest;
pub use planner::MegakernelLaunchGeometry;
pub use planner::MegakernelReport;
pub use planner::MegakernelSizingPolicy;
pub use planner::MegakernelTelemetry;
pub use planner::MegakernelWorkItem;
pub use planner::MegakernelWorkloadHints;
pub use planner::RedundantWorkItemPruneScratch;
pub use planner::prune_redundant_work_items_into;
pub use planner::prune_redundant_work_items_with_scratch_into;
pub use policy::diffuse_priority_across_siblings;
pub use policy::diffuse_priority_across_siblings_into;
pub use policy::try_diffuse_priority_across_siblings;
pub use policy::try_diffuse_priority_across_siblings_into;
pub use policy::MegakernelDispatchTopology;
pub use policy::MegakernelExecutionMode;
pub use policy::MegakernelGraphBlasSwitchClass;
pub use policy::MegakernelLaunchCacheStats;
pub use policy::MegakernelLaunchPolicy;
pub use policy::MegakernelLaunchRecommendation;
pub use policy::MegakernelLaunchRequest;
pub use policy::MegakernelPromotionEvidence;
pub use policy::MegakernelPromotionRoute;
pub use policy::MegakernelQueuePressure;
pub use policy::MegakernelTopologyEvidence;
pub use policy::PriorityDrainReason;
pub use policy::PriorityDrainRecommendation;
pub use policy::PriorityRequeueAccounting;
pub use policy::HOT_WINDOW_PROMOTION_EVIDENCE_SCHEMA_VERSION;
pub use policy::PRIORITY_COUNTER_DRAIN_FIX;
pub use policy::PRIORITY_COUNTER_DRAIN_HEADROOM;
pub use policy::TOPOLOGY_EVIDENCE_SCHEMA_VERSION;
pub use protocol::control;
pub use protocol::control_byte_len;
pub use protocol::count_done_ring_slots;
pub use protocol::debug;
pub use protocol::debug_log_byte_len;
pub use protocol::encode_control;
pub use protocol::encode_empty_debug_log;
pub use protocol::encode_empty_ring;
pub use protocol::opcode;
pub use protocol::read_debug_log;
pub use protocol::read_done_count;
pub use protocol::read_epoch;
pub use protocol::read_metrics;
pub use protocol::read_observable;
pub use protocol::ring_byte_len;
pub use protocol::slot;
pub use protocol::try_count_done_ring_slots;
pub use protocol::try_encode_control;
pub use protocol::try_encode_control_into;
pub use protocol::try_encode_empty_debug_log;
pub use protocol::try_encode_empty_debug_log_into;
pub use protocol::try_encode_empty_ring;
pub use protocol::try_encode_empty_ring_into;
pub use protocol::try_read_debug_log;
pub use protocol::try_read_done_count;
pub use protocol::try_read_epoch;
pub use protocol::try_read_metrics;
pub use protocol::try_read_observable;
pub use protocol::DebugRecord;
pub use protocol::ProtocolError;
pub use protocol::ARG0_WORD;
pub use protocol::ARGS_PER_SLOT;
pub use protocol::CONTROL_MIN_WORDS;
pub use protocol::OPCODE_WORD;
pub use protocol::PRIORITY_WORD;
pub use protocol::SLOT_WORDS;
pub use protocol::STATUS_WORD;
pub use protocol::TENANT_WORD;
pub use readback::MegakernelReadback;
pub use readback::MegakernelReadbackCounters;
pub use telemetry::MegakernelRuntimeEvidence;
pub use telemetry::RuntimeEvidenceMetricCoverage;
pub use telemetry::RuntimeEvidenceMetricFamily;
pub use telemetry::TelemetryDecodeCapacityEvidence;
pub use telemetry::TelemetryDecodeScratch;
pub use telemetry::RUNTIME_IO_EVIDENCE_SCHEMA_VERSION;
pub use telemetry::TELEMETRY_DECODE_CAPACITY_SCHEMA_VERSION;
pub use recovery::backend_error_indicates_device_loss;
pub use recovery::MegakernelRecoveryDecision;
pub use recovery::MegakernelRecoveryPolicy;
pub use resident::MegakernelResidentBuffers;
pub use resident::MegakernelResidentDispatchScratch;
pub use rule_catalog::BatchRuleProgram;
pub use rule_catalog::BatchRuleRejection;
pub use scheduler::default_priority_offsets;
pub use scheduler::priority_partition_active_lane_count;
pub use scheduler::priority_partition_probe_budget;
pub use scheduler::priority_partition_probe_count;
pub use scheduler::priority_scan_body;
pub use scheduler::priority_scan_body_with_stride;
pub use scheduler::try_default_priority_offsets;
pub use scheduler::write_default_priority_offsets;
pub use speculation::PairedSpeculationSample;
pub use speculation::PairedSpeculationUpdate;
pub use speculation::PairedSpeculationWindow;
pub use task::TaskPriority;
pub use task::TaskQueueSnapshot;
pub use task::TaskState;
pub use task::TaskWorkItem;
pub use telemetry::ControlSnapshot;
pub use telemetry::CountMinSketch;
pub use telemetry::MegakernelRuntimeCounters;
pub use telemetry::RingOccupancy;
pub use telemetry::RingSlotSnapshot;
pub use telemetry::RingStatus;
pub use telemetry::RingTelemetry;
pub use telemetry::SketchTelemetry;
pub use telemetry::WindowTelemetry;
pub use workspace_adapter::MegakernelWorkspaceAdapter;
pub use workspace_layout::build_workspace_regions;
pub use workspace_layout::first_workspace_region;
pub use workspace_layout::next_record_workspace_region;
pub use workspace_layout::next_workspace_region;
pub use workspace_layout::workspace_record_words;
pub use workspace_layout::MegakernelWorkspaceLayoutError;
pub use workspace_layout::MegakernelWorkspaceRegion;
pub use workspace_layout::MegakernelWorkspaceRegionSpec;

Modules§

advanced
Advanced Tier-2 LEGO building blocks for the persistent megakernel.
automata_worklist
Non-blocking automata worklist policy for resident megakernel scheduling.
builder
IR program builders - construct the megakernel Program from vyre IR.
descriptor
Typed host-side descriptors for publishing work into the megakernel ring.
execution
Compiled persistent-megakernel handle and dispatch path.
handlers
Built-in opcode handler bodies - STORE_U32, ATOMIC_ADD, PRINTF, SHUTDOWN.
io
IO subsystem - GPU↔runtime DMA request queue for the persistent megakernel.
ir_util
Shared IR fragments used by megakernel builders and schedulers.
mixed_work
Runtime-owned mixed-work protocol for resident megakernel batches.
planner
Megakernel planning, fusion selection, sizing, and work-item contracts.
policy
Resident megakernel launch policy and queue-pressure decisions.
protocol
Ring-buffer protocol constants - slot layout, control words, opcodes, debug log.
readback
Typed host readback view for persistent megakernel outputs.
recovery
Device-loss classification and persistent-pipeline rebuild policy.
resident
Host mirrors for megakernel GPU-resident runtime buffers.
ring
Ring producer / consumer traits for the megakernel host protocol.
rule_catalog
DFA rule catalog packing for batched megakernel dispatch.
scaling
Occupancy-aware grid scaling for megakernels.
scheduler
Work scheduler - priority-aware slot scanning for the persistent megakernel.
speculation
Runtime-side paired speculation races for megakernel dispatch.
task
Resident task queue ABI for pause, resume, requeue, and priority aging.
telemetry
Host-side telemetry decoders for the megakernel ring and control buffers.
workspace_adapter
Consumer-neutral resident workspace adapter for megakernel programs.
workspace_layout
Generic resident workspace region layout for megakernel adapters.

Enums§

RingSlotTransition
Explicit host-observable ring slot lifecycle transition.

Traits§

MegakernelDispatch
Backend-neutral megakernel dispatch contract.