pub struct Graph { /* private fields */ }
A CUDA graph — a DAG of CUDA operations.
Implementations
impl Graph
pub fn new(context: &Context) -> Result<Self>
Create an empty graph in the given context. Use this as a starting point for explicit node construction — note that baracuda v0.1 does not yet expose typed node builders, so in practice you’ll almost always build graphs via stream capture instead.
pub fn instantiate(&self) -> Result<GraphExec>
Compile this graph into an executable form that can be launched.
pub fn instantiate_with_flags(&self, flags: u64) -> Result<GraphExec>
Like Self::instantiate, but passes flags to
cuGraphInstantiateWithFlags (see instantiate_flags).
pub fn node_count(&self) -> Result<usize>
Approximate number of nodes in the graph (useful for debugging).
pub fn add_empty_node(&self, dependencies: &[GraphNode]) -> Result<GraphNode>
Add an empty “join / barrier” node with the given dependencies. Returns the new node so it can be used as a dependency of later nodes.
pub unsafe fn add_kernel_node(
    &self,
    dependencies: &[GraphNode],
    function: &Function,
    grid: impl Into<Dim3>,
    block: impl Into<Dim3>,
    shared_mem_bytes: u32,
    args: &mut [*mut c_void],
) -> Result<GraphNode>
Add a kernel-launch node. args is the same *mut c_void array you’d
pass to cuLaunchKernel — build it from baracuda_types::KernelArg
pointers, same as the crate::LaunchBuilder does.
Safety
Same responsibilities as crate::LaunchBuilder::launch: argument
count, order, and types must match the kernel’s C signature, and
pointer-typed arguments must remain live as long as any executable
derived from this graph is running.
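As a rough illustration of the args layout described above, the sketch below builds the pointer array in plain std Rust without calling baracuda. It assumes a hypothetical kernel `saxpy(float a, const float* x, float* y, unsigned int n)`; `SaxpyArgs` and `build_args` are names invented for this example, and the device addresses are placeholder integers rather than real allocations.

```rust
use std::ffi::c_void;

/// Host-side storage for the arguments of a hypothetical kernel
/// `saxpy(float a, const float* x, float* y, unsigned int n)`.
/// Field order and types mirror the kernel's C signature.
pub struct SaxpyArgs {
    pub a: f32,
    pub x: u64, // placeholder device address (assumed 64-bit)
    pub y: u64, // placeholder device address (assumed 64-bit)
    pub n: u32,
}

/// Build the pointer array: one `*mut c_void` per kernel parameter, each
/// pointing at the storage that holds that parameter's value.
/// The pointers are only valid while `storage` is alive and not moved.
pub fn build_args(storage: &mut SaxpyArgs) -> [*mut c_void; 4] {
    [
        &mut storage.a as *mut f32 as *mut c_void,
        &mut storage.x as *mut u64 as *mut c_void,
        &mut storage.y as *mut u64 as *mut c_void,
        &mut storage.n as *mut u32 as *mut c_void,
    ]
}
```

Each array entry points at a value rather than holding the value itself, which is why a mismatch in count, order, or type against the kernel signature cannot be caught by the compiler.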
pub fn add_memset_u32_node(
    &self,
    dependencies: &[GraphNode],
    dst: CUdeviceptr,
    value: u32,
    count: usize,
) -> Result<GraphNode>
Add a 1-D memset node that fills count elements starting at dst
with the 4-byte pattern value. Operates in the graph’s parent
context.
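Note that count is measured in 4-byte elements, not bytes. The host-side model below is only a sketch of that arithmetic: `memset_u32_model` is a hypothetical helper operating on an ordinary slice, not a baracuda function, and it touches no device memory.

```rust
/// Host-side model of a 1-D 32-bit memset: writes `value` into the first
/// `count` elements of `dst` and returns the number of bytes covered.
/// Panics if `count` exceeds `dst.len()`.
pub fn memset_u32_model(dst: &mut [u32], value: u32, count: usize) -> usize {
    dst[..count].fill(value);
    count * std::mem::size_of::<u32>()
}
```

Filling 3 elements therefore covers 12 bytes and leaves any remaining elements of the buffer untouched.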
pub fn clone_graph(&self) -> Result<Self>
Deep-copy this graph (including its topology). The clone is independent — destroying one does not affect the other.
pub fn add_memcpy_node(
    &self,
    dependencies: &[GraphNode],
    params: &CUDA_MEMCPY3D,
) -> Result<GraphNode>
Add a memcpy node. params is a fully-populated CUDA_MEMCPY3D.
pub unsafe fn add_host_node(
    &self,
    dependencies: &[GraphNode],
    fn_: unsafe extern "C" fn(*mut c_void),
    user_data: *mut c_void,
) -> Result<GraphNode>
Add a host-function node. Pass the closure’s trampoline address as fn_, along with a user-data pointer that remains valid for the lifetime of any executable graph derived from this graph.
Safety
fn_ will be invoked on a CUDA-internal host thread with
user_data as its argument. The pointer must remain valid as long
as any GraphExec containing this node is alive.
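One common way to satisfy the lifetime requirement is to box the callback state, leak it to a raw pointer, and reclaim it only after every executable graph that could run the callback is gone. The sketch below shows that pattern in plain std Rust; `CallbackState`, `on_host_node`, `leak_state`, and `release_state` are hypothetical names for this example, and nothing here calls baracuda.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicU32, Ordering};

/// State shared with the host callback. It must outlive every executable
/// graph that contains the host node.
pub struct CallbackState {
    pub hits: AtomicU32,
}

/// Trampoline with the `unsafe extern "C" fn(*mut c_void)` shape expected
/// for `fn_`. It recovers the typed state from the opaque pointer.
pub unsafe extern "C" fn on_host_node(user_data: *mut c_void) {
    let state = unsafe { &*(user_data as *const CallbackState) };
    state.hits.fetch_add(1, Ordering::SeqCst);
}

/// Leak a boxed state to obtain a pointer that stays valid until
/// `release_state` is called.
pub fn leak_state() -> *mut c_void {
    let boxed = Box::new(CallbackState { hits: AtomicU32::new(0) });
    Box::into_raw(boxed) as *mut c_void
}

/// Reclaim the state once no executable graph can run the callback anymore.
/// Returns the final hit count.
pub unsafe fn release_state(user_data: *mut c_void) -> u32 {
    let state = unsafe { Box::from_raw(user_data as *mut CallbackState) };
    state.hits.load(Ordering::SeqCst)
}
```

An atomic counter is used because the callback runs on a CUDA-internal host thread, so any state it touches should be safe to access from a thread other than the one that created it.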
pub fn add_child_graph_node(
    &self,
    dependencies: &[GraphNode],
    child: &Graph,
) -> Result<GraphNode>
Add a child-graph node — executes child in its entirety when
reached.
pub fn add_event_record_node(
    &self,
    dependencies: &[GraphNode],
    event: &Event,
) -> Result<GraphNode>
Add an event-record node — records event when executed.
pub fn add_event_wait_node(
    &self,
    dependencies: &[GraphNode],
    event: &Event,
) -> Result<GraphNode>
Add an event-wait node — blocks downstream nodes until event has
been recorded.
pub fn add_mem_alloc_node(
    &self,
    dependencies: &[GraphNode],
    device: &Device,
    bytesize: usize,
) -> Result<(GraphNode, CUdeviceptr)>
Add a stream-ordered memory allocation node. When the graph runs,
the node allocates bytesize bytes on device (from the device’s
default pool). The resulting device pointer is returned in the
output tuple alongside the new node.
pub fn add_mem_free_node(
    &self,
    dependencies: &[GraphNode],
    dptr: CUdeviceptr,
) -> Result<GraphNode>
Add a stream-ordered memory-free node for dptr (which is
typically the dptr returned by a prior
Self::add_mem_alloc_node on the same graph).
pub fn add_batch_mem_op_node(
    &self,
    dependencies: &[GraphNode],
    ops: &mut [CUstreamBatchMemOpParams],
) -> Result<GraphNode>
Add a batch-memop node: a single node that performs a sequence of 32-bit or 64-bit wait-value and write-value operations on device memory, atomically with respect to the graph’s execution order.
ops may include any mix of baracuda_cuda_sys::types::CUstreamBatchMemOpParams
entries built with that type’s wait_value_* / write_value_* helpers.
pub fn add_dependencies(
    &self,
    from: &[GraphNode],
    to: &[GraphNode],
) -> Result<()>
Add dependency edges from each node in from to its counterpart in
to. Both slices must have the same length.
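The pairing is by index: from[i] becomes a prerequisite of to[i]. The sketch below models that rule on plain integers so it can be run without a GPU; `NodeId` and `pair_edges` are hypothetical stand-ins for this example and are not part of baracuda.

```rust
/// Hypothetical stand-in for a graph node handle, used only in this sketch.
pub type NodeId = u32;

/// Pair `from[i]` with `to[i]` to form one edge per index, rejecting
/// slices of different length.
pub fn pair_edges(from: &[NodeId], to: &[NodeId]) -> Result<Vec<(NodeId, NodeId)>, String> {
    if from.len() != to.len() {
        return Err(format!("length mismatch: {} vs {}", from.len(), to.len()));
    }
    Ok(from.iter().copied().zip(to.iter().copied()).collect())
}
```

To make one node depend on several predecessors, repeat it in to: `from = [1, 2]` with `to = [3, 3]` yields the edges 1 → 3 and 2 → 3.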
pub fn remove_dependencies(
    &self,
    from: &[GraphNode],
    to: &[GraphNode],
) -> Result<()>
Remove previously-added dependency edges.
pub fn debug_dot_print(&self, path: &str, flags: u32) -> Result<()>
Dump a Graphviz-compatible representation of this graph to path.
Pass flags = 0 for the default verbose output.
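For readers unfamiliar with the DOT format, the sketch below hand-writes a minimal digraph from an edge list. It is only meant to show the general shape of such a file: `to_dot` is a hypothetical helper, and the file CUDA actually writes carries far more per-node detail and will not match this layout.

```rust
/// Render an edge list as a minimal Graphviz DOT digraph.
/// Node `k` is emitted under the label `nk`.
pub fn to_dot(name: &str, edges: &[(u32, u32)]) -> String {
    let mut out = format!("digraph {} {{\n", name);
    for (src, dst) in edges {
        out.push_str(&format!("    n{} -> n{};\n", src, dst));
    }
    out.push_str("}\n");
    out
}
```

A DOT file can be rendered with the Graphviz command-line tool, for example `dot -Tsvg graph.dot -o graph.svg`.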
pub fn conditional_handle(
    &self,
    default_launch_value: u32,
    flags: u32,
) -> Result<CUgraphConditionalHandle>
Create a conditional handle tied to this parent graph. Set the
handle’s value from inside a kernel (via
cudaGraphSetConditional(handle, val)) to control whether, or how
many times, the conditional body executes.
pub fn add_conditional_node(
    &self,
    dependencies: &[GraphNode],
    handle: CUgraphConditionalHandle,
    type_: i32,
    size: u32,
) -> Result<(GraphNode, Graph)>
Add a conditional node (IF / WHILE / SWITCH). Returns (node, body)
— populate the body graph with the code to execute conditionally.
type_ is one of
baracuda_cuda_sys::types::CUgraphConditionalNodeType.
size is the count of body graphs (1 for IF/WHILE; up to N for SWITCH).
impl Graph
pub fn retain_user_object(
    &self,
    object: &UserObject,
    count: u32,
    flags: u32,
) -> Result<()>
Have this graph take count references to object. When the
graph is destroyed (or when release_user_object
is called), those references are dropped.
flags is reserved (pass 0) in CUDA 12.x; CUDA 13 adds
CU_GRAPH_USER_OBJECT_MOVE = 1.