pub struct MegakernelLaunchPolicy {
    pub sizing: MegakernelSizingPolicy,
    pub min_hit_capacity: u32,
    pub hit_capacity_multiplier: u32,
    pub saturated_waves: u32,
    pub hot_opcode_threshold: u32,
    pub hot_window_threshold: u32,
    pub jit_queue_len_threshold: u32,
    pub priority_age_threshold: u32,
    pub sparse_frontier_threshold_bps: u16,
    pub dense_frontier_threshold_bps: u16,
    pub memory_pressure_threshold_bps: u16,
    pub fusion_edge_threshold: u32,
    pub scratch_bytes_per_hit: u32,
}
Single policy surface for megakernel launch sizing and telemetry-driven routing.
Fields§
sizing: MegakernelSizingPolicy
Sizing policy for worker counts and grid geometry.
min_hit_capacity: u32
Minimum capacity for sparse-hit results.
hit_capacity_multiplier: u32
Multiplier for expected hits to determine capacity.
saturated_waves: u32
Number of waves that define a saturated queue.
hot_opcode_threshold: u32
Threshold for promoting hot opcodes to JIT.
hot_window_threshold: u32
Threshold for promoting hot windows to JIT.
jit_queue_len_threshold: u32
Queue length threshold to prefer JIT over interpreter.
priority_age_threshold: u32
Priority age threshold to trigger aging promotions.
sparse_frontier_threshold_bps: u16
Frontier density at or below this value uses sparse expansion.
dense_frontier_threshold_bps: u16
Frontier density at or above this value uses dense propagation.
memory_pressure_threshold_bps: u16
Memory pressure at or above this value uses the memory-constrained path.
fusion_edge_threshold: u32
Minimum graph edge count before dense hot work is eligible for fusion.
scratch_bytes_per_hit: u32
Conservative resident scratch bytes needed per sparse-hit entry.
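The three `_bps` thresholds can be read as a three-way split on frontier density. The sketch below assumes that `bps` means basis points (1 bps = 0.01%, so 10,000 bps = 100%) and that densities strictly between the two thresholds fall into a middle route; neither assumption is stated on this page. `FrontierRoute`, `density_bps`, and `classify_frontier` are hypothetical names used only for illustration.

```rust
/// Hypothetical three-way route; not a type from this crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FrontierRoute {
    Sparse,
    Balanced,
    Dense,
}

/// Convert an active/total count into basis points (assumed meaning of `bps`).
fn density_bps(active: u64, total: u64) -> u16 {
    if total == 0 {
        return 0;
    }
    // Widen to u128 so `active * 10_000` cannot overflow, then clamp to 100%.
    let bps = (active as u128 * 10_000) / total as u128;
    bps.min(10_000) as u16
}

/// "At or below" selects sparse, "at or above" selects dense, as in the field docs.
/// The middle band is an assumption of this sketch.
fn classify_frontier(density: u16, sparse_bps: u16, dense_bps: u16) -> FrontierRoute {
    if density <= sparse_bps {
        FrontierRoute::Sparse
    } else if density >= dense_bps {
        FrontierRoute::Dense
    } else {
        FrontierRoute::Balanced
    }
}
```

For example, 5 active vertices out of 1,000 gives a density of 50 bps, which a sparse threshold of 100 bps would route to sparse expansion.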
Implementations§
impl MegakernelLaunchPolicy
pub fn launch_cache_stats() -> MegakernelLaunchCacheStats
Return launch recommendation cache telemetry for the current thread.
pub fn reset_launch_cache_for_thread()
Clear launch recommendation cache entries and counters for this thread.
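A per-thread cache with counters, as these two functions describe, could be built on `thread_local!`. The following is a minimal sketch under that assumption; `LaunchCache`, `CacheStats`, the `u64` key, and the `u32` value are hypothetical stand-ins and do not reflect the real layout of `MegakernelLaunchCacheStats`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical counters; the real stats type may carry different fields.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct CacheStats {
    hits: u64,
    misses: u64,
    entries: usize,
}

#[derive(Default)]
struct LaunchCache {
    map: HashMap<u64, u32>,
    hits: u64,
    misses: u64,
}

thread_local! {
    // Each thread gets its own cache, so no locking is required.
    static LAUNCH_CACHE: RefCell<LaunchCache> = RefCell::new(LaunchCache::default());
}

/// Return the cached value for `key`, computing and storing it on a miss.
fn cached_recommend(key: u64, compute: impl FnOnce() -> u32) -> u32 {
    LAUNCH_CACHE.with(|cell| {
        let mut cache = cell.borrow_mut();
        let cached = cache.map.get(&key).copied();
        if let Some(value) = cached {
            cache.hits += 1;
            return value;
        }
        cache.misses += 1;
        let value = compute();
        cache.map.insert(key, value);
        value
    })
}

/// Snapshot the counters for the calling thread.
fn cache_stats() -> CacheStats {
    LAUNCH_CACHE.with(|cell| {
        let cache = cell.borrow();
        CacheStats { hits: cache.hits, misses: cache.misses, entries: cache.map.len() }
    })
}

/// Drop all entries and zero the counters for the calling thread.
fn reset_cache() {
    LAUNCH_CACHE.with(|cell| *cell.borrow_mut() = LaunchCache::default());
}
```

Because the state is thread-local, statistics read on one thread say nothing about caches on other threads, which matches the "for the current thread" wording above.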
pub fn recommend(
    &self,
    request: MegakernelLaunchRequest,
) -> Result<MegakernelLaunchRecommendation, BackendError>
Recommend geometry, hit capacity, and interpreter/JIT route.
§Errors
Returns BackendError when required adapter limits are zero or derived
launch values cannot fit the u32 ring protocol.
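The overflow error described above can be illustrated with checked `u32` arithmetic. This sketch assumes that hit capacity is derived as `max(min_hit_capacity, expected_hits * hit_capacity_multiplier)`; that formula is a guess based on the field descriptions, not the crate's documented behavior, and `hit_capacity`, `scratch_bytes`, and the `String` error are hypothetical.

```rust
/// Derive a hit capacity, failing instead of wrapping when the product
/// does not fit in u32. The sizing formula is an assumption of this sketch.
fn hit_capacity(
    expected_hits: u32,
    min_hit_capacity: u32,
    multiplier: u32,
) -> Result<u32, String> {
    let scaled = expected_hits.checked_mul(multiplier).ok_or_else(|| {
        format!("expected_hits {expected_hits} x multiplier {multiplier} does not fit in u32")
    })?;
    // Never go below the configured floor.
    Ok(scaled.max(min_hit_capacity))
}

/// Resident scratch needed for `capacity` entries, computed in u64 so the
/// product of two u32 values cannot overflow.
fn scratch_bytes(capacity: u32, scratch_bytes_per_hit: u32) -> u64 {
    u64::from(capacity) * u64::from(scratch_bytes_per_hit)
}
```

With 100 expected hits, a floor of 64, and a multiplier of 4, this yields a capacity of 400; with only 4 expected hits the floor of 64 applies.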
pub fn recommend_with_topology_evidence(
    &self,
    request: MegakernelLaunchRequest,
) -> Result<(MegakernelLaunchRecommendation, MegakernelTopologyEvidence), BackendError>
Recommend a launch and emit topology evidence for parity benches.
§Errors
Returns BackendError when the underlying recommendation cannot be
built from the request or adapter limits.
pub fn recommend_with_promotion_evidence(
    &self,
    request: MegakernelLaunchRequest,
) -> Result<(MegakernelLaunchRecommendation, MegakernelPromotionEvidence), BackendError>
Recommend a launch and emit hot opcode/window promotion evidence.
§Errors
Returns BackendError when the underlying recommendation cannot be
built from the request or adapter limits.
pub fn recommend_with_previous_topology(
    &self,
    request: MegakernelLaunchRequest,
    previous_topology: MegakernelDispatchTopology,
) -> Result<MegakernelLaunchRecommendation, BackendError>
Recommend a launch while preserving the previous topology inside a narrow hysteresis band.
CUDA resident graphs and long-running dataflow streams should use this entry point when they can track the last successful topology. It prevents borderline frontier-density or memory-pressure telemetry from repeatedly switching kernel variants, invalidating launch plans, and disturbing cache locality at scale.
§Errors
Returns BackendError when required adapter limits are zero or derived
launch values cannot fit the u32 ring protocol.
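The hysteresis idea can be sketched with a single threshold and a symmetric band around it. The band width, the two-variant `Topology` enum, and `choose_with_hysteresis` are all hypothetical; the page does not say how wide the band is or which variants `MegakernelDispatchTopology` has.

```rust
/// Hypothetical two-way topology, standing in for the real dispatch topology.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Topology {
    Sparse,
    Dense,
}

/// Switch topology only when the signal clearly leaves the band around the
/// threshold; inside the band, keep whatever was used last time.
fn choose_with_hysteresis(
    density_bps: u16,
    threshold_bps: u16,
    band_bps: u16, // assumed half-width of the band
    previous: Topology,
) -> Topology {
    let lower = threshold_bps.saturating_sub(band_bps);
    let upper = threshold_bps.saturating_add(band_bps);
    if density_bps < lower {
        Topology::Sparse
    } else if density_bps > upper {
        Topology::Dense
    } else {
        previous
    }
}
```

With a threshold of 5,000 bps and a band of 200 bps, a reading of 4,900 bps keeps the previous topology whichever it was, so telemetry that jitters around the threshold does not flip the kernel variant on every launch.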
pub fn autotune_hit_capacity_multiplier(
    &self,
    candidate_multipliers: &[u32],
    costs: &[f64],
) -> u32
Select the best hit_capacity_multiplier from a candidate set.
candidate_multipliers are the multipliers to try; costs[i] is the observed dispatch latency (or any other metric to minimize) when candidate_multipliers[i] was used. The multiplier with the lowest observed cost is selected.
Returns the chosen multiplier. If candidate_multipliers is empty, returns the policy’s existing hit_capacity_multiplier.
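The selection rule amounts to an argmin over paired slices. Below is a minimal sketch of that rule, not the crate's implementation: `pick_min_cost` is a hypothetical free function, and skipping non-finite costs, truncating to the shorter slice, and breaking ties in favor of the earliest candidate are assumptions this page does not confirm.

```rust
/// Return the candidate paired with the lowest finite cost, or `fallback`
/// when no usable (candidate, cost) pair exists.
fn pick_min_cost(candidates: &[u32], costs: &[f64], fallback: u32) -> u32 {
    candidates
        .iter()
        .zip(costs.iter()) // pairs stop at the shorter slice
        .filter(|(_, cost)| cost.is_finite()) // assumption: ignore NaN and infinities
        .min_by(|a, b| a.1.total_cmp(b.1)) // first minimum wins on ties
        .map(|(candidate, _)| *candidate)
        .unwrap_or(fallback)
}
```

Calling `pick_min_cost(&[2, 4, 8], &[3.0, 1.5, 2.0], 1)` selects `4`, the multiplier with the lowest measured cost, while an empty candidate slice returns the fallback unchanged.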
pub fn autotune_workgroup_size(
    &self,
    candidate_sizes: &[u32],
    costs: &[f64],
    current_size: u32,
) -> u32
Select the best workgroup size from a candidate set.
candidate_sizes[i] is paired with costs[i] (lower is better). Returns the chosen size, or the policy’s sizing.default_workgroup_size_x() as a fallback.
pub fn natural_gradient_autotune_step(
    m_inv_sqrt: &[f64],
    grad: &[f64],
    n: u32,
    learning_rate: f64,
) -> Vec<f64>
Compute the next-step parameter delta for a continuous autotune knob using a Fisher-preconditioned natural-gradient step.
m_inv_sqrt: inverse square root of the Fisher block (n×n, row-major). Passing an identity matrix reduces the natural gradient to plain gradient descent.
grad: plain gradient ∂latency/∂param (length n).
Returns the parameter delta -learning_rate · m_inv_sqrt · grad.
P-DRIVER-8: every continuous autotune knob (workgroup size, hit-capacity, fixpoint iteration count, …) should follow the natural-gradient direction by default. Fisher-preconditioned descent converges 5-10× faster than plain gradient descent on the elongated-valley latency surfaces typical of GPU autotuning.
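The returned delta is a scaled matrix-vector product. The sketch below computes it for a row-major n×n matrix; `natural_gradient_step` is a hypothetical stand-in that takes `n` as `usize` and panics on mismatched lengths, both of which are choices made for this example rather than documented behavior.

```rust
/// Compute `-learning_rate * M * grad` for a row-major n x n matrix `M`.
fn natural_gradient_step(
    m_inv_sqrt: &[f64],
    grad: &[f64],
    n: usize,
    learning_rate: f64,
) -> Vec<f64> {
    // Assumption of this sketch: shape errors are programmer errors, so panic.
    assert_eq!(m_inv_sqrt.len(), n * n, "matrix must hold n * n entries");
    assert_eq!(grad.len(), n, "gradient must hold n entries");
    if n == 0 {
        return Vec::new(); // chunks_exact(0) would panic
    }
    m_inv_sqrt
        .chunks_exact(n) // one chunk per matrix row
        .map(|row| {
            let dot: f64 = row.iter().zip(grad).map(|(m, g)| m * g).sum();
            -learning_rate * dot
        })
        .collect()
}
```

With the 2×2 identity matrix, `grad = [1.0, -2.0]`, and a learning rate of 0.5, the result is `[-0.5, 1.0]`, which is exactly the plain gradient-descent step. A non-identity preconditioner rescales each direction, which is what shortens the path along an elongated valley.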
pub fn try_natural_gradient_autotune_step(
    m_inv_sqrt: &[f64],
    grad: &[f64],
    n: u32,
    learning_rate: f64,
) -> Result<Vec<f64>, BackendError>
Compute the next-step parameter delta with fallible output staging.
§Errors
Returns BackendError when host staging cannot be reserved for the
natural-gradient vector.
pub fn natural_gradient_autotune_step_into(
    m_inv_sqrt: &[f64],
    grad: &[f64],
    n: u32,
    learning_rate: f64,
    out: &mut Vec<f64>,
)
Compute the natural-gradient autotune step into caller-owned storage.
pub fn try_natural_gradient_autotune_step_into(
    m_inv_sqrt: &[f64],
    grad: &[f64],
    n: u32,
    learning_rate: f64,
    out: &mut Vec<f64>,
) -> Result<(), BackendError>
Compute the natural-gradient autotune step into caller-owned storage with fallible host staging.
§Errors
Returns BackendError when host staging cannot be reserved for the
natural-gradient vector.
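One way to get fallible host staging with caller-owned storage is `Vec::try_reserve_exact`, which reports allocation failure as an error instead of aborting. The sketch below assumes that approach; `StepError` and `try_step_into` are hypothetical and stand in for `BackendError` and the real method, whose internals are not shown on this page.

```rust
use std::collections::TryReserveError;

/// Hypothetical error type standing in for `BackendError`.
#[derive(Debug, PartialEq)]
enum StepError {
    /// Slice lengths do not match `n` (a check added by this sketch).
    Shape,
    /// The output buffer could not be grown.
    Alloc(TryReserveError),
}

/// Write `-learning_rate * M * grad` into `out`, reusing its allocation.
fn try_step_into(
    m_inv_sqrt: &[f64],
    grad: &[f64],
    n: usize,
    learning_rate: f64,
    out: &mut Vec<f64>,
) -> Result<(), StepError> {
    let expected = n.checked_mul(n).ok_or(StepError::Shape)?;
    if m_inv_sqrt.len() != expected || grad.len() != n {
        return Err(StepError::Shape);
    }
    out.clear(); // keeps capacity, so repeated calls avoid reallocating
    out.try_reserve_exact(n).map_err(StepError::Alloc)?;
    for row in 0..n {
        let mut dot = 0.0;
        for col in 0..n {
            dot += m_inv_sqrt[row * n + col] * grad[col];
        }
        out.push(-learning_rate * dot);
    }
    Ok(())
}
```

Reusing one `out` buffer across autotune iterations means the reservation only allocates when `n` grows beyond the buffer's current capacity.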
Trait Implementations§
impl Clone for MegakernelLaunchPolicy
fn clone(&self) -> MegakernelLaunchPolicy
1.0.0 (const: unstable) · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Copy for MegakernelLaunchPolicy
impl Debug for MegakernelLaunchPolicy
impl Default for MegakernelLaunchPolicy
impl Eq for MegakernelLaunchPolicy
impl Hash for MegakernelLaunchPolicy
impl PartialEq for MegakernelLaunchPolicy
fn eq(&self, other: &MegakernelLaunchPolicy) -> bool
Tests for self and other values to be equal, and is used by ==.
impl StructuralPartialEq for MegakernelLaunchPolicy
Auto Trait Implementations§
impl Freeze for MegakernelLaunchPolicy
impl RefUnwindSafe for MegakernelLaunchPolicy
impl Send for MegakernelLaunchPolicy
impl Sync for MegakernelLaunchPolicy
impl Unpin for MegakernelLaunchPolicy
impl UnsafeUnpin for MegakernelLaunchPolicy
impl UnwindSafe for MegakernelLaunchPolicy
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.