pub struct Graph { /* private fields */ }
A CUDA graph representing a DAG of GPU operations.
Nodes represent individual operations (kernel launches, memory copies, memsets, or empty barriers). Dependencies are directed edges that enforce execution ordering between nodes.
The graph can be instantiated into a GraphExec for repeated
low-overhead execution.
Implementations§
impl Graph
pub fn add_kernel_node(
    &mut self,
    function_name: &str,
    grid: (u32, u32, u32),
    block: (u32, u32, u32),
    shared_mem: u32,
) -> usize
Adds a kernel launch node to the graph.
Returns the index of the newly created node, which can be used
to establish dependencies via add_dependency.
§Parameters
- function_name - Name of the kernel function.
- grid - Grid dimensions (x, y, z).
- block - Block dimensions (x, y, z).
- shared_mem - Dynamic shared memory in bytes.
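The grid tuple is usually derived from the problem size and the chosen block size. The following is a minimal sketch for a one-dimensional launch; `grid_1d` is a hypothetical helper written for illustration and is not part of this crate.

```rust
/// Hypothetical helper (not part of this crate): computes a 1-D grid that
/// covers `n` elements when each block runs `block_x` threads.
fn grid_1d(n: u32, block_x: u32) -> (u32, u32, u32) {
    assert!(block_x > 0, "block size must be non-zero");
    // Ceiling division so a trailing partial block is still launched.
    // Clamp to 1 because CUDA grid dimensions must be at least 1; the
    // kernel is then expected to bounds-check its own indices.
    (n.div_ceil(block_x).max(1), 1, 1)
}
```

For 1000 elements and 256 threads per block this yields `(4, 1, 1)`, which would be passed as `grid` alongside a `block` of `(256, 1, 1)`.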
pub fn add_memcpy_node(
    &mut self,
    direction: MemcpyDirection,
    size: usize,
) -> usize
Adds a memory copy node to the graph.
Returns the index of the newly created node.
§Parameters
- direction - Direction of the memory copy.
- size - Size of the transfer in bytes.
pub fn add_memset_node(&mut self, size: usize, value: u8) -> usize
Adds a memset node to the graph.
Returns the index of the newly created node.
§Parameters
- size - Number of bytes to set.
- value - Byte value to fill with.
pub fn add_empty_node(&mut self) -> usize
Adds an empty (no-op) node to the graph.
Empty nodes are useful as synchronisation barriers: they do no work of their own but can serve as join points for multiple dependency chains.
Returns the index of the newly created node.
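To illustrate why a join point helps, the sketch below compares the edge lists needed with and without an empty node between a group of producers and a group of consumers. It works on plain node indices and `(from, to)` pairs; both function names are hypothetical and nothing here calls into this crate.

```rust
/// Edges needed when every consumer depends directly on every producer.
fn direct_edges(producers: &[usize], consumers: &[usize]) -> Vec<(usize, usize)> {
    producers
        .iter()
        .flat_map(|&p| consumers.iter().map(move |&c| (p, c)))
        .collect()
}

/// Edges needed when a single empty node (`barrier`) sits between the groups.
fn barrier_edges(
    producers: &[usize],
    barrier: usize,
    consumers: &[usize],
) -> Vec<(usize, usize)> {
    // Every producer feeds the barrier, and the barrier feeds every consumer.
    let fan_in = producers.iter().map(|&p| (p, barrier));
    let fan_out = consumers.iter().map(|&c| (barrier, c));
    fan_in.chain(fan_out).collect()
}
```

With three producers and two consumers the direct form needs six edges while the barrier form needs five, and the gap widens as either group grows. Each pair could then be passed to add_dependency.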
pub fn add_dependency(&mut self, from: usize, to: usize) -> CudaResult<()>
Adds a dependency edge from node `from` to node `to`.
Node `to` will not begin execution until node `from` has
completed. Both indices must refer to existing nodes.
§Errors
Returns CudaError::InvalidValue if either index is out of bounds
or if from == to (self-dependency).
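The documented error conditions can be sketched as a standalone check. This is an illustration of the rules as stated, not the crate's implementation; `SketchError` and `check_edge` are hypothetical stand-ins.

```rust
/// Stand-in for the crate's error type; only the documented variant is modelled.
#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidValue,
}

/// Hypothetical check mirroring the documented rules for a new edge.
fn check_edge(node_count: usize, from: usize, to: usize) -> Result<(), SketchError> {
    // Both endpoints must refer to existing nodes.
    if from >= node_count || to >= node_count {
        return Err(SketchError::InvalidValue);
    }
    // A node cannot depend on itself.
    if from == to {
        return Err(SketchError::InvalidValue);
    }
    Ok(())
}
```

Note that a longer cycle such as `0 -> 1 -> 0` passes a per-edge check like this one. The documentation lists cycle detection under topological_sort and instantiate rather than here.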
pub fn node_count(&self) -> usize
Returns the total number of nodes in the graph.
pub fn dependency_count(&self) -> usize
Returns the total number of dependency edges in the graph.
pub fn dependencies(&self) -> &[(usize, usize)]
Returns a slice of all dependency edges as (from, to) pairs.
pub fn get_node(&self, index: usize) -> Option<&GraphNode>
Returns the node at the given index, or None if out of bounds.
pub fn topological_sort(&self) -> CudaResult<Vec<usize>>
Performs a topological sort of the graph nodes.
Returns the node indices in an order that respects all dependency edges, or an error if the graph contains a cycle.
§Errors
Returns CudaError::InvalidValue if the graph contains a
dependency cycle.
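The documentation does not say which algorithm is used. One common choice is Kahn's algorithm, sketched below over the same `(from, to)` edge representation that dependencies returns. `kahn_order` is a hypothetical free function that returns `None` on a cycle instead of a CudaError.

```rust
use std::collections::VecDeque;

/// Returns a topological order of nodes `0..node_count`, or `None` if the
/// edges contain a cycle. Assumes every index in `edges` is below `node_count`.
fn kahn_order(node_count: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut in_degree = vec![0usize; node_count];
    let mut successors: Vec<Vec<usize>> = vec![Vec::new(); node_count];
    for &(from, to) in edges {
        successors[from].push(to);
        in_degree[to] += 1;
    }

    // Start with every node that has no unmet dependencies.
    let mut ready: VecDeque<usize> =
        (0..node_count).filter(|&n| in_degree[n] == 0).collect();
    let mut order = Vec::with_capacity(node_count);

    while let Some(node) = ready.pop_front() {
        order.push(node);
        // Completing `node` releases one dependency of each successor.
        for &next in &successors[node] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Nodes on a cycle never reach in-degree zero, so they are never emitted.
    if order.len() == node_count {
        Some(order)
    } else {
        None
    }
}
```

For a diamond with edges `(0, 1)`, `(0, 2)`, `(1, 3)`, `(2, 3)` this sketch produces the order `[0, 1, 2, 3]`. The order returned by the crate for the same graph may differ, since several valid orders exist.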
pub fn instantiate(&self) -> CudaResult<GraphExec>
Instantiates the graph into an executable form.
The returned GraphExec can be launched on a stream with
minimal CPU overhead. The graph is validated (topological sort)
during instantiation.
§Errors
Returns CudaError::InvalidValue if the graph contains cycles.