pub struct CompiledGraph { /* private fields */ }
A compiled graph ready for execution.
Created by crate::Session::compile. Holds the fused and memory-planned
graph together with all pre-allocated execution state. Call
CompiledGraph::run repeatedly with different inputs: zero
allocation per call.
Implementations§
impl CompiledGraph
pub fn set_param(&mut self, name: &str, data: &[f32])
Set a named parameter (model weight). Call once per parameter after compilation.
pub fn run(&mut self, inputs: &[(&str, &[f32])]) -> Vec<Vec<f32>>
Execute the graph with named inputs.
Returns one Vec<f32> per graph output (copies from arena).
pub fn run_raw(&mut self, inputs: &[(&str, &[f32])]) -> Vec<(*const f32, usize)>
Execute and return raw pointers to output data (zero-copy).
Data is valid until the next run/run_raw call.
§Safety
The returned pointers point into the arena. Do not use after the next call to run/run_raw (arena data will be overwritten).
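The lifetime rule above can be sketched with a stand-in type. This is a minimal illustration, not the crate's implementation: `MockGraph`, its `arena` field, and `copy_output` are hypothetical, and only the `(*const f32, usize)` return shape is taken from the `run_raw` signature. The point is to copy data out of the arena before the next run overwrites it.

```rust
/// Hypothetical stand-in for a graph whose outputs live in a reused arena.
struct MockGraph {
    arena: Vec<f32>,
}

impl MockGraph {
    /// Mimics the shape of `run_raw`: writes outputs into the arena and
    /// returns (pointer, length) pairs that alias it.
    fn run_raw(&mut self, input: &[f32]) -> Vec<(*const f32, usize)> {
        // Toy "graph": double every input element.
        for (dst, src) in self.arena.iter_mut().zip(input) {
            *dst = 2.0 * src;
        }
        vec![(self.arena.as_ptr(), input.len().min(self.arena.len()))]
    }
}

/// Copies one raw output into an owned Vec so it survives later runs.
///
/// # Safety
/// `ptr` must point to `len` initialized f32 values that are not being
/// written for the duration of this call.
unsafe fn copy_output(ptr: *const f32, len: usize) -> Vec<f32> {
    unsafe { std::slice::from_raw_parts(ptr, len).to_vec() }
}

fn raw_demo() -> (Vec<f32>, Vec<f32>) {
    let mut g = MockGraph { arena: vec![0.0; 4] };
    let outs = g.run_raw(&[1.0, 2.0, 3.0]);
    // Copy before the next run: the arena is about to be overwritten.
    let first = unsafe { copy_output(outs[0].0, outs[0].1) };
    let outs = g.run_raw(&[10.0, 20.0, 30.0]);
    let second = unsafe { copy_output(outs[0].0, outs[0].1) };
    (first, second)
}
```

Because raw pointers carry no borrow, the compiler will not stop a caller from reading `outs` after a second run; copying (or finishing all reads) before the next call is the caller's responsibility.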
pub fn run_slots(&mut self, inputs: &[&[f32]]) -> &[(usize, usize)]
Fastest execution: inputs by slot index (order matches graph input declaration).
Returns output (offset, len) pairs. Read data via arena_ptr().add(offset).
No HashMap lookups, no Vec allocation, no name matching.
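A sketch of the offset-based read pattern described above, using a stand-in type. `MockSlots` and its fields are hypothetical, and the sketch assumes that `arena_ptr()` yields a `*const f32` so that offsets count f32 elements rather than bytes; the real return type is not shown on this page.

```rust
/// Hypothetical stand-in: an arena plus a fixed table of output slots.
struct MockSlots {
    arena: Vec<f32>,
    /// (offset, len) per output, in f32 elements (assumed unit).
    outputs: Vec<(usize, usize)>,
}

impl MockSlots {
    fn arena_ptr(&self) -> *const f32 {
        self.arena.as_ptr()
    }

    /// Mimics `run_slots`: inputs by position, outputs as (offset, len).
    fn run_slots(&mut self, inputs: &[&[f32]]) -> &[(usize, usize)] {
        // Toy graph: output 0 = input 0 + input 1, element-wise.
        let (off, len) = self.outputs[0];
        for i in 0..len {
            self.arena[off + i] = inputs[0][i] + inputs[1][i];
        }
        &self.outputs
    }
}

fn slots_demo() -> Vec<f32> {
    let mut g = MockSlots { arena: vec![0.0; 8], outputs: vec![(2, 3)] };
    let a = [1.0f32, 2.0, 3.0];
    let b = [10.0f32, 20.0, 30.0];
    // Copy the pair out so the mutable borrow of `g` ends here.
    let (off, len) = g.run_slots(&[&a[..], &b[..]])[0];
    // SAFETY: off + len is within the arena, and no run happens while we read.
    let out = unsafe { std::slice::from_raw_parts(g.arena_ptr().add(off), len) };
    out.to_vec()
}
```

Inputs are matched purely by position here, which is why the order must follow the graph's input declaration order.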
pub fn bind_handle(&mut self, name: &str, data: &[f32]) -> bool
Bind a persistent buffer (KV-cache, optimizer state, etc.).
Stays alive across run() calls; the backend uses it as the
graph input with the matching name.
Returns true if the backend supports persistent handles.
pub fn read_handle(&self, name: &str) -> Option<Vec<f32>>
Read the current contents of a persistent buffer.
pub fn bind_gpu_handle(&mut self, name: &str, data: &[f32]) -> bool
GPU-resident MLX input (no-op on non-MLX backends).
pub fn has_gpu_handle(&self, name: &str) -> bool
pub fn set_gpu_handle_feed( &mut self, handle_name: &str, output_index: usize, ) -> bool
pub fn read_gpu_handle(&self, name: &str) -> Option<Vec<f32>>
pub fn run_feed_gpu_handle( &mut self, inputs: &[(&str, &[f32])], handle_name: &str, output_index: usize, ) -> Option<Vec<f32>>
Runs the graph, refreshes the GPU handle handle_name from the output at output_index, and returns that output vector.
pub fn set_active_extent(&mut self, extent: Option<(usize, usize)>)
Hint subsequent run calls to process only the first actual
rows along the bucket axis (out of upper, the compile extent).
Backends that support per-kernel active-extent dispatch honor
this; others ignore it. Pass None to clear.
See BucketedCompileCache::run_padded for the canonical caller.
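To make the padding idea concrete, here is a minimal sketch of what a padded caller might do on the host side. Both helpers are hypothetical and not part of this crate; the sketch assumes row-major data with the bucket axis as the leading axis, and takes `actual` and `upper` in the sense used above (rows really present versus the compiled extent).

```rust
/// Zero-pads a row-major buffer holding `actual` rows up to `upper` rows.
/// Returns None if the sizes are inconsistent.
fn pad_rows(data: &[f32], row_len: usize, actual: usize, upper: usize) -> Option<Vec<f32>> {
    if actual > upper || data.len() != actual.checked_mul(row_len)? {
        return None;
    }
    let mut padded = vec![0.0f32; upper.checked_mul(row_len)?];
    padded[..data.len()].copy_from_slice(data);
    Some(padded)
}

/// Keeps only the first `actual` rows of a padded row-major output.
fn valid_rows(padded: &[f32], row_len: usize, actual: usize) -> Option<&[f32]> {
    padded.get(..actual.checked_mul(row_len)?)
}
```

With an active-extent hint in place, a backend that honors it can skip work on the padded rows; a backend that ignores it still produces correct values in the first `actual` rows, which is all `valid_rows` reads.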
pub fn set_moe_resident_experts(&mut self, mask: &[bool])
TIDE merged MoE placement: mask[expert] marks an expert as device-resident if any layer has it resident.
pub fn set_moe_resident_experts_per_layer(&mut self, masks: &[&[bool]])
Per-layer MoE placement, one mask per MoE layer in forward order. Preferred over the merged mask on CPU.
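One plausible reading of how the merged mask relates to the per-layer masks is a plain OR across layers. The helper below is a hypothetical sketch under that assumption, not code from this crate; it also assumes every layer mask has the same length (the expert count).

```rust
/// OR-merges per-layer residency masks into one mask: an expert is marked
/// resident if at least one layer marks it resident.
/// Assumes every layer mask has the same length (the expert count).
fn merge_layer_masks(per_layer: &[&[bool]]) -> Vec<bool> {
    let num_experts = per_layer.first().map_or(0, |m| m.len());
    let mut merged = vec![false; num_experts];
    for layer in per_layer {
        for (m, &resident) in merged.iter_mut().zip(layer.iter()) {
            *m |= resident;
        }
    }
    merged
}
```

The merged form loses the information about which layer holds which expert, which is consistent with the per-layer variant being the preferred one where the backend can use it.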
pub fn enable_moe_topk_capture(&mut self, num_experts: usize) -> bool
Capture MoE router TopK on the next forward pass (CPU). Returns false if unsupported.
pub fn take_moe_topk_capture(&mut self) -> Option<Vec<Vec<u32>>>
Per-layer expert indices from the last forward (MoE router TopK order).
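A captured `Vec<Vec<u32>>` can be summarized into per-layer usage counts, for example to decide which experts are worth keeping resident. The helper below is a hypothetical sketch; it assumes each inner Vec is a flat list of expert indices for one layer, which this page does not spell out.

```rust
/// Counts how often each expert index appears in each layer's captured list.
/// Indices >= num_experts are skipped rather than trusted.
fn expert_histogram(capture: &[Vec<u32>], num_experts: usize) -> Vec<Vec<usize>> {
    capture
        .iter()
        .map(|layer| {
            let mut counts = vec![0usize; num_experts];
            for &e in layer {
                if let Some(c) = counts.get_mut(e as usize) {
                    *c += 1;
                }
            }
            counts
        })
        .collect()
}
```

Since the capture method returns an Option, a caller would typically run this only on the `Some` case and treat `None` as "nothing was captured".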
pub fn take_moe_residency_stats(&mut self) -> Option<MoeResidencyStats>
GroupedMatMul GPU/CPU token accounting from the last forward (CPU).
pub fn commit_no_wait(&mut self, inputs: &[(&str, &[f32])])
Encode and commit a forward pass without waiting for the device.
Outputs of intermediate calls are overwritten; use run_pipelined
when you need each call's outputs back. Pair with sync_pending
to drain. CPU is synchronous, so there this falls back to run.
pub fn sync_pending(&mut self)
Wait for every command queued by commit_no_wait. On CPU this is a no-op.
pub fn run_pipelined( &mut self, input_sets: &[Vec<(&str, &[f32])>], ) -> Vec<Vec<Vec<f32>>>
Pipelined batch run. Issues one commit per input set, syncs once
at the end. On Metal, each commit gets its own output snapshot
(allocated + blit-copied), so subsequent commits stomping the
shared arena don’t corrupt earlier runs’ outputs.
Returns out[run_idx][output_idx][element_idx].
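The nested input and output shapes can be easier to read in a small example. `mock_run_pipelined` below is a hypothetical stand-in that only reproduces the argument and return types of `run_pipelined`; its toy "graph" has a single output, the element-wise sum of all inputs in a set.

```rust
/// Hypothetical stand-in with the same argument and return shapes as
/// `run_pipelined`. One output per run: the element-wise sum of the inputs.
fn mock_run_pipelined(input_sets: &[Vec<(&str, &[f32])>]) -> Vec<Vec<Vec<f32>>> {
    input_sets
        .iter()
        .map(|set| {
            // Use the shortest input length so zipping never goes out of range.
            let len = set.iter().map(|(_, d)| d.len()).min().unwrap_or(0);
            let mut sum = vec![0.0f32; len];
            for (_name, data) in set {
                for (s, x) in sum.iter_mut().zip(data.iter()) {
                    *s += x;
                }
            }
            vec![sum] // out[run_idx] holds one Vec per graph output
        })
        .collect()
}

fn pipelined_demo() -> Vec<Vec<Vec<f32>>> {
    let x0 = [1.0f32, 2.0];
    let x1 = [3.0f32, 4.0];
    let bias = [0.5f32, 0.5];
    // One Vec of named inputs per run; names repeat across runs.
    let input_sets = vec![
        vec![("x", &x0[..]), ("bias", &bias[..])],
        vec![("x", &x1[..]), ("bias", &bias[..])],
    ];
    mock_run_pipelined(&input_sets)
}
```

Indexing then follows the order run, output, element: `out[1][0][1]` is element 1 of output 0 from the second input set.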
pub fn set_param_typed(&mut self, name: &str, data: &[u8], dtype: DType)
Set a named parameter from raw bytes in the given dtype. The backend handles the widen-to-f32 (or zero-widen, when supported natively) on the way in. Lets callers feed F16/BF16 weights without a host-side cast.
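For reference, this is roughly what the host-side cast that set_param_typed lets callers skip would look like for BF16. It is a sketch, not the backend's code: both helper names are hypothetical, and little-endian byte order is an assumption. The widening itself relies on BF16 being the upper 16 bits of an IEEE-754 f32.

```rust
/// Widens little-endian BF16 bytes to f32. BF16 is the upper 16 bits of an
/// IEEE-754 f32, so widening is a 16-bit left shift of the bit pattern.
/// Returns None if the byte length is not a multiple of 2.
fn widen_bf16_le(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|c| {
                let bits = u16::from_le_bytes([c[0], c[1]]);
                f32::from_bits((bits as u32) << 16)
            })
            .collect(),
    )
}

/// Truncating f32 -> BF16 encoder, included only to build example input.
fn f32_to_bf16_le(x: f32) -> [u8; 2] {
    ((x.to_bits() >> 16) as u16).to_le_bytes()
}
```

Passing the raw bytes through set_param_typed avoids materializing this widened copy on the host, and a backend with native support can skip the widening entirely.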
Trait Implementations§
impl Clone for CompiledGraph
Auto Trait Implementations§
impl Freeze for CompiledGraph
impl !RefUnwindSafe for CompiledGraph
impl Send for CompiledGraph
impl !Sync for CompiledGraph
impl Unpin for CompiledGraph
impl UnsafeUnpin for CompiledGraph
impl !UnwindSafe for CompiledGraph
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.