pub struct CudaMegakernelPlanCache { /* private fields */ }
Bounded LRU cache for CUDA megakernel topology plans.
Implementations
impl CudaMegakernelPlanCache
pub fn with_max_entries(max_entries: usize) -> Self
Create a cache with an explicit entry bound.
pub fn get_or_insert_with(
    &mut self,
    key: CudaMegakernelPlanCacheKey,
    build: impl FnOnce() -> CudaMegakernelTopologyDecision,
) -> Result<CudaMegakernelCachedPlan, CudaMegakernelMemoryError>
Return the plan cached under key, or call build to select a topology decision, cache it, and return the resulting plan.
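The get-or-insert pattern on a bounded LRU cache can be sketched as follows. This is a minimal illustration, not the crate's implementation: `SketchLruCache`, its public counter fields, and the linear-scan `VecDeque` storage are all assumptions made for the example, and the sketch omits the `CudaMegakernelMemoryError` path that the real method returns.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a bounded LRU cache (not the real type).
pub struct SketchLruCache<K, V> {
    max_entries: usize,
    // Least recently used entry at the front, most recently used at the back.
    entries: VecDeque<(K, V)>,
    pub hits: u64,
    pub misses: u64,
    pub evictions: u64,
}

impl<K: PartialEq, V: Clone> SketchLruCache<K, V> {
    /// Create a cache holding at most `max_entries` values (clamped to >= 1).
    pub fn with_max_entries(max_entries: usize) -> Self {
        Self {
            max_entries: max_entries.max(1),
            entries: VecDeque::new(),
            hits: 0,
            misses: 0,
            evictions: 0,
        }
    }

    /// Return the cached value for `key`, or build, store, and return one.
    pub fn get_or_insert_with(&mut self, key: K, build: impl FnOnce() -> V) -> V {
        if let Some(pos) = self.entries.iter().position(|(k, _)| *k == key) {
            // Hit: move the entry to the most-recently-used position.
            let entry = self.entries.remove(pos).expect("position is in bounds");
            let value = entry.1.clone();
            self.entries.push_back(entry);
            self.hits += 1;
            return value;
        }
        // Miss: evict the least recently used entry if the cache is full.
        self.misses += 1;
        if self.entries.len() >= self.max_entries {
            self.entries.pop_front();
            self.evictions += 1;
        }
        // `build` runs only on a miss, so expensive selection is skipped on hits.
        let value = build();
        self.entries.push_back((key, value.clone()));
        value
    }

    /// Number of entries currently stored.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Taking the builder as `impl FnOnce()` means the caller only pays for topology selection when the key is absent. A production cache would likely replace the linear scan with a hash map plus a recency list.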
pub fn get_or_select_topology(
    &mut self,
    graph_layout_hash: u64,
    analysis_kind: CudaMegakernelAnalysisKind,
    device: CudaMegakernelDeviceKey,
    sample: CudaMegakernelScheduleSample,
    graph: CudaMegakernelGraphShape,
    memory: CudaMegakernelMemoryBudget,
    launch_overhead_ns: f64,
    fusion_pressure: f64,
) -> Result<CudaMegakernelCachedPlan, CudaMegakernelMemoryError>
Return a cached topology plan or select and cache one from the current CUDA telemetry sample.
This is the hot-path convenience API: callers provide stable graph, analysis, device, and telemetry inputs, while the cache owns the pressure bucketing needed to avoid stale sparse/dense decisions.
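One way to picture pressure bucketing is to quantize a memory-pressure ratio into a small integer that becomes part of the cache key. The sketch below is an assumption-heavy illustration: `SketchPlanKey`, `SKETCH_BUCKETS`, `pressure_bucket`, and the choice of eight buckets are hypothetical and are not taken from the crate; the real key's fields and bucket boundaries are not documented here.

```rust
/// Hypothetical cache key; the real `CudaMegakernelPlanCacheKey` may differ.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SketchPlanKey {
    pub graph_layout_hash: u64,
    pub pressure_bucket: u8,
}

/// Assumed number of buckets across the 0..=100% pressure range.
pub const SKETCH_BUCKETS: u8 = 8;

/// Map `used_bytes / budget_bytes` onto 0..=SKETCH_BUCKETS.
/// Anything at or over budget (or a zero budget) lands in the top bucket.
pub fn pressure_bucket(used_bytes: u64, budget_bytes: u64) -> u8 {
    if budget_bytes == 0 {
        return SKETCH_BUCKETS;
    }
    // Widen to u128 so the multiplication cannot overflow.
    let scaled = (used_bytes as u128 * SKETCH_BUCKETS as u128) / budget_bytes as u128;
    scaled.min(SKETCH_BUCKETS as u128) as u8
}

/// Build a key whose identity changes only when pressure crosses a bucket edge.
pub fn sketch_key(graph_layout_hash: u64, used_bytes: u64, budget_bytes: u64) -> SketchPlanKey {
    SketchPlanKey {
        graph_layout_hash,
        pressure_bucket: pressure_bucket(used_bytes, budget_bytes),
    }
}
```

With this scheme, small fluctuations in pressure map to the same key and reuse the cached decision, while a jump across a bucket boundary produces a new key and forces a fresh selection instead of reusing a stale sparse or dense choice.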
pub fn get_or_plan_execution(
    &mut self,
    graph_layout_hash: u64,
    analysis_kind: CudaMegakernelAnalysisKind,
    device: CudaMegakernelDeviceKey,
    sample: CudaMegakernelScheduleSample,
    graph: CudaMegakernelGraphShape,
    bytes_per_node: u64,
    bytes_per_edge: u64,
    frontier_bytes: u64,
    scratch_bytes: u64,
    output_bytes: u64,
    budget_bytes: u64,
    launch_overhead_ns: f64,
    fusion_pressure: f64,
) -> Result<CudaMegakernelExecutionPlan, CudaMegakernelMemoryError>
Return a cache-backed, memory-validated CUDA megakernel execution plan.
The cache key uses sparse-plan memory pressure because sparse is the lower-bound resident footprint shared by every topology. A cache hit reuses the prior topology decision, then this method validates the exact current dense/fused/sparse byte budget before returning a launchable plan. If the cached non-sparse topology no longer fits, the method downgrades to sparse only after proving the sparse plan fits.
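The validate-then-downgrade step described above can be sketched as a small pure function. All names here (`SketchTopology`, `SketchFootprints`, `SketchMemoryError`, `validate_or_downgrade`) are hypothetical stand-ins, and the per-topology byte counts are taken as precomputed inputs rather than derived from `bytes_per_node`, `bytes_per_edge`, and the other parameters, since the crate's footprint formula is not documented here.

```rust
/// Hypothetical topology choices mirroring the dense/fused/sparse wording.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SketchTopology {
    Sparse,
    Fused,
    Dense,
}

/// Hypothetical error carrying the footprint that did not fit.
#[derive(Debug, PartialEq, Eq)]
pub struct SketchMemoryError {
    pub required_bytes: u64,
    pub budget_bytes: u64,
}

/// Assumed exact byte footprint of each topology for the current graph.
pub struct SketchFootprints {
    pub sparse: u64,
    pub fused: u64,
    pub dense: u64,
}

/// Re-check a cached topology against the current budget.
/// Falls back to sparse only if sparse itself is proven to fit.
pub fn validate_or_downgrade(
    cached: SketchTopology,
    bytes: &SketchFootprints,
    budget_bytes: u64,
) -> Result<SketchTopology, SketchMemoryError> {
    let required = match cached {
        SketchTopology::Sparse => bytes.sparse,
        SketchTopology::Fused => bytes.fused,
        SketchTopology::Dense => bytes.dense,
    };
    if required <= budget_bytes {
        // The cached decision still fits; reuse it unchanged.
        return Ok(cached);
    }
    if bytes.sparse <= budget_bytes {
        // Cached non-sparse plan no longer fits, but sparse does.
        return Ok(SketchTopology::Sparse);
    }
    // Not even the lower-bound sparse footprint fits: report an error.
    Err(SketchMemoryError {
        required_bytes: bytes.sparse,
        budget_bytes,
    })
}
```

Because sparse is treated as the smallest footprint, a failure of the sparse check means no topology can launch within the budget, which is why the sketch reports the sparse requirement in the error.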
pub fn stats(&self) -> CudaMegakernelPlanCacheStats
Return cache counters.