Graph

Struct Graph 

Source
pub struct Graph { /* private fields */ }

Implementations§

Source§

impl Graph

Source

pub unsafe fn from_raw_in_context( handle: cudaGraph_t, ctx: Arc<Context>, ) -> Result<Self>

Wraps an existing CUDA graph handle associated with ctx and takes ownership of it.

§Safety

handle must be a valid CUDA graph handle associated with ctx. Ownership of handle is transferred to the returned Graph, and the handle must not be destroyed elsewhere after calling this function.

Source

pub fn create_buffer<T>(&mut self, length: usize) -> Result<GraphBuffer<T>>
where T: DeviceRepr + Send + Sync,

Allocates graph-retained device memory.

The returned buffer can be used with graph-buffer node APIs. Any graph or executable graph that records the buffer retains the underlying device allocation for replay.

§Errors

Returns an error if CUDA cannot allocate device memory, the requested byte count overflows, or CUDA reports runtime initialization diagnostics.
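The overflow case mentioned above can be pictured with a small sketch. It assumes the requested byte count is length multiplied by the element size; checked_byte_count is a hypothetical helper used only for illustration and is not part of this crate.

```rust
use std::mem::size_of;

/// Hypothetical helper (not a crate API): computes the number of bytes
/// needed for `length` elements of `T`, or `None` if the multiplication
/// would overflow `usize`.
fn checked_byte_count<T>(length: usize) -> Option<usize> {
    // `checked_mul` returns `None` instead of wrapping on overflow.
    length.checked_mul(size_of::<T>())
}
```

Under that assumption, a call such as checked_byte_count::<u64>(usize::MAX) yields None, which corresponds to the overflow error described for this method.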

Source

pub fn zeroes_buffer<T>(&mut self, length: usize) -> Result<GraphBuffer<T>>
where T: DeviceRepr + Send + Sync,

Allocates graph-retained device memory initialized to zero bytes.

§Errors

Returns an error if CUDA cannot allocate or initialize device memory, the requested byte count overflows, or CUDA reports runtime initialization diagnostics.

Source

pub fn buffer_from_slice<T>(&mut self, values: &[T]) -> Result<GraphBuffer<T>>
where T: DeviceRepr + Send + Sync,

Allocates graph-retained device memory initialized from a host slice.

§Errors

Returns an error if CUDA cannot allocate or copy device memory, the requested byte count overflows, or CUDA reports runtime initialization diagnostics.

Source

pub fn instantiate(&self) -> Result<ExecutableGraph>

Source

pub fn instantiate_with_flags( &self, flags: GraphInstantiateFlags, ) -> Result<ExecutableGraph>

Instantiates this graph as an executable graph. Any structural or intra-node constraints that were not previously validated are checked during instantiation. On success, returns the instantiated executable graph.

flags controls the behavior of instantiation and subsequent graph launches. Valid flags are:

  • GraphInstantiateFlags::AUTO_FREE_ON_LAUNCH, which configures a graph containing memory allocation nodes to automatically free any unfreed memory allocations before the graph is relaunched.

  • GraphInstantiateFlags::DEVICE_LAUNCH, which configures the graph for launch from the device. If this flag is passed, the executable graph handle returned can be used to launch the graph from both the host and device. This flag can only be used on platforms which support unified addressing. This flag cannot be used in conjunction with GraphInstantiateFlags::AUTO_FREE_ON_LAUNCH.

  • GraphInstantiateFlags::USE_NODE_PRIORITY, which causes the graph to use the priorities from the per-node attributes rather than the priority of the launch stream during execution. Priorities are only available on kernel nodes and are copied from stream priority during stream capture.
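The compatibility rule between DEVICE_LAUNCH and AUTO_FREE_ON_LAUNCH can be modeled with plain bit flags. This is only a sketch: the constants below are local stand-ins whose bit values are assumptions, and flags_are_compatible is a hypothetical function, not something GraphInstantiateFlags is known to provide.

```rust
// Local stand-ins for the documented flags. The bit values are
// illustrative assumptions, not taken from the crate.
const AUTO_FREE_ON_LAUNCH: u32 = 1 << 0;
const DEVICE_LAUNCH: u32 = 1 << 1;
const USE_NODE_PRIORITY: u32 = 1 << 2;

/// Hypothetical check: returns `false` only when both mutually
/// exclusive flags are set in the same combination.
fn flags_are_compatible(flags: u32) -> bool {
    let exclusive_pair = AUTO_FREE_ON_LAUNCH | DEVICE_LAUNCH;
    (flags & exclusive_pair) != exclusive_pair
}
```

In this model, combining either flag with USE_NODE_PRIORITY is accepted, while combining DEVICE_LAUNCH with AUTO_FREE_ON_LAUNCH is rejected.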

If the graph contains any allocation or free nodes, there can be at most one executable graph in existence for that graph at a time. An attempt to instantiate a second executable graph before dropping the first results in an error. The same also applies if the graph contains any device-updatable kernel nodes.

If the graph contains kernels which call device-side ExecutableGraph::launch from multiple devices, this results in an error.

Graphs instantiated for launch on the device have additional restrictions which do not apply to host graphs:

  • The graph’s nodes must reside on a single device.
  • The graph can only contain kernel nodes, memcpy nodes, memset nodes, and child graph nodes.
  • The graph cannot be empty and must contain at least one kernel, memcpy, or memset node. Operation-specific restrictions are outlined below.
  • Kernel nodes:
    • Use of CUDA Dynamic Parallelism is not permitted.
    • Cooperative launches are permitted as long as MPS is not in use.
  • Memcpy nodes:
    • Only copies involving device memory and/or pinned device-mapped host memory are permitted.
    • Copies involving CUDA arrays are not permitted.
    • Both operands must be accessible from the current device, and the current device must match the device of other nodes in the graph.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn try_clone(&self) -> Result<Self>

Creates a copy of this graph. All parameters are copied into the cloned graph. The original graph may be modified after this call without affecting the clone.

Child graph nodes in the original graph are recursively copied into the clone.

Cloning is not supported for graphs that contain memory allocation nodes, memory free nodes, or conditional nodes.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn add_dependency(&mut self, from: GraphNode, to: GraphNode) -> Result<()>

Source

pub fn add_dependencies( &mut self, from: &[GraphNode], to: &[GraphNode], ) -> Result<()>

Source

pub fn add_dependencies_with_data( &mut self, from: &[GraphNode], to: &[GraphNode], edge_data: &[GraphEdgeData], ) -> Result<()>

Elements in from and to at corresponding indices define each dependency to add. Each node in from and to must belong to this graph.

If from and to are empty, the call returns without modifying the graph. Specifying an existing dependency returns an error.
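The index-wise pairing described above can be sketched without any CUDA types. Here nodes are represented by plain u32 identifiers, pair_dependencies is a hypothetical helper, and returning None for slices of different lengths is an assumption of the sketch rather than documented behavior.

```rust
/// Hypothetical model: pairs `from[i]` with `to[i]` to form one
/// dependency edge per index. Returns `None` when the slices differ in
/// length (an assumption of this sketch).
fn pair_dependencies(from: &[u32], to: &[u32]) -> Option<Vec<(u32, u32)>> {
    if from.len() != to.len() {
        return None;
    }
    // Each tuple is (upstream node, downstream node).
    Some(from.iter().copied().zip(to.iter().copied()).collect())
}
```

With from = [1, 2] and to = [3, 4], the model produces the edges (1, 3) and (2, 4); two empty slices produce no edges, matching the no-op case.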

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn remove_dependency( &mut self, from: GraphNode, to: GraphNode, ) -> Result<()>

Source

pub fn remove_dependencies( &mut self, from: &[GraphNode], to: &[GraphNode], ) -> Result<()>

Source

pub fn remove_dependencies_with_data( &mut self, from: &[GraphNode], to: &[GraphNode], edge_data: &[GraphEdgeData], ) -> Result<()>

Elements in from and to at corresponding indices define each dependency to remove. Each node in from and to must belong to this graph.

If from and to are empty, the call returns without modifying the graph. Specifying an edge that does not exist in the graph, with data matching edge_data, results in an error. Passing an empty edge_data slice is equivalent to passing default edge data for each edge.
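The rule that an empty edge_data slice stands for default data on every edge can be modeled as follows. EdgeData here is a local stand-in with made-up fields, expand_edge_data is a hypothetical helper, and treating any other length mismatch as None is an assumption of the sketch.

```rust
/// Local stand-in for per-edge data; the fields are illustrative only.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct EdgeData {
    from_port: u8,
    to_port: u8,
    kind: u8,
}

/// Hypothetical model: an empty slice expands to one default entry per
/// edge; otherwise the slice must supply exactly one entry per edge.
fn expand_edge_data(edge_count: usize, edge_data: &[EdgeData]) -> Option<Vec<EdgeData>> {
    if edge_data.is_empty() {
        Some(vec![EdgeData::default(); edge_count])
    } else if edge_data.len() == edge_count {
        Some(edge_data.to_vec())
    } else {
        None
    }
}
```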

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn add_edges(&mut self, edges: &[GraphEdge]) -> Result<()>

Source

pub fn remove_edges(&mut self, edges: &[GraphEdge]) -> Result<()>

Source

pub fn add_empty_node( &mut self, dependencies: &[GraphNode], ) -> Result<GraphNode>

Creates a node that performs no operation and adds it to the graph with the given dependencies. The dependency list may be empty, in which case the node is placed at the graph root. The list may not contain duplicate entries.

An empty node performs no operation during execution, but can be used for transitive ordering. For example, a phased execution graph with 2 groups of n nodes with a barrier between them can be represented using an empty node and 2*n dependency edges, rather than no empty node and n^2 dependency edges.
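The edge counts in that example can be checked with a small model that builds both edge lists. Nodes are plain u32 identifiers here (the first group is 0..n, the second group is n..2n, and the barrier is 2n); both functions are hypothetical helpers for illustration.

```rust
/// Edges when an empty barrier node sits between the two groups:
/// every first-group node points at the barrier, and the barrier
/// points at every second-group node.
fn barrier_edges(n: u32) -> Vec<(u32, u32)> {
    let barrier = 2 * n;
    let mut edges = Vec::new();
    for a in 0..n {
        edges.push((a, barrier));
    }
    for b in n..2 * n {
        edges.push((barrier, b));
    }
    edges
}

/// Edges when every first-group node must point directly at every
/// second-group node.
fn direct_edges(n: u32) -> Vec<(u32, u32)> {
    let mut edges = Vec::new();
    for a in 0..n {
        for b in n..2 * n {
            edges.push((a, b));
        }
    }
    edges
}
```

For n = 4 the barrier form needs 8 edges and the direct form needs 16; the gap grows quadratically with n.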

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation or reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn add_event_record_node( &mut self, dependencies: &[GraphNode], event: &Event, ) -> Result<GraphNode>

Creates an event record node and adds it to the graph with the given dependencies and event. The dependency list may be empty, in which case the node is placed at the graph root. The list may not contain duplicate entries.

Each graph launch records event to capture execution of the node’s dependencies.

These nodes may not be used in loops or conditionals.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn add_event_wait_node( &mut self, dependencies: &[GraphNode], event: &Event, ) -> Result<GraphNode>

Creates an event wait node and adds it to the graph with the given dependencies and event. The dependency list may be empty, in which case the node is placed at the graph root. The list may not contain duplicate entries.

The graph node waits for all work captured in event. See sys::cuEventRecord for details on what is captured by an event. Synchronization is performed efficiently on the device when applicable. event may come from a different context or device than the launch stream.

These nodes may not be used in loops or conditionals.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub unsafe fn add_host_node( &mut self, dependencies: &[GraphNode], params: &HostNodeParams, ) -> Result<GraphNode>

Creates a CPU execution node and adds it to the graph with the given dependencies and host-node parameters. The dependency list may be empty, in which case the node is placed at the graph root. The list may not contain duplicate entries.

When the graph is launched, the node invokes the specified CPU function. Host nodes are not supported under MPS with pre-Volta GPUs.

Graph objects are not threadsafe.

§Safety

CUDA stores the raw callback function and user-data pointer in the graph node for later replay. The caller must ensure params remains valid according to HostNodeParams::new for every graph instantiation and launch that can execute this node.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub unsafe fn add_kernel_node<'a, P>( &mut self, dependencies: &[GraphNode], function: DeviceFunction, config: &LaunchConfig, params: P, ) -> Result<GraphNode>
where P: KernelLaunchArgs<'a>,

Creates a kernel execution node and adds it to the graph with the given dependencies, launch configuration, and kernel parameters. The dependency list may be empty, in which case the node is placed at the graph root. The list may not contain duplicate entries.

When the graph is launched, the node invokes the kernel on the grid and blocks specified by LaunchConfig. LaunchConfig::shared_memory_bytes sets the amount of dynamic shared memory available to each thread block. Kernel parameters are passed with KernelParameters or tuples of shared or mutable references.

Kernels launched using graphs must not use texture and surface references. Reading or writing through any texture or surface reference is undefined behavior. This restriction does not apply to texture and surface objects.

Runtime kernel handles queried via sys::cudaLibraryGetKernel or sys::cudaGetKernel may be used. The symbol passed to sys::cudaGetKernel must be registered with the same CUDA Runtime instance. Passing a symbol that belongs to a different runtime instance results in undefined behavior.

Graph objects are not threadsafe.

§Safety

CUDA copies the kernel argument values during this call and stores those copied values in the graph node for later replay. If an argument value is itself a pointer, only the pointer address is copied. The caller must ensure every copied pointer value remains valid for every graph instantiation, update, and launch that can execute this node. Mutable pointer arguments must also remain exclusive for the work ordered by those launches.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub unsafe fn add_memory_copy_node_1d( &mut self, dependencies: &[GraphNode], params: &MemoryCopy1DNodeParams, ) -> Result<GraphNode>

Creates a new 1D memcpy node and adds it to the graph with the given dependencies. The dependency list may be empty, in which case the node is placed at the root of the graph. The list may not contain duplicate entries.

When the graph is launched, the node copies count bytes from src to dst. The transfer direction is described by MemoryCopyKind. MemoryCopyKind::Default is recommended when unified virtual addressing is available, in which case the transfer direction is inferred from the pointer values. Launching a memcpy node with dst and src pointers that do not match the direction of the copy results in undefined behavior.

Memcpy nodes have additional restrictions for managed memory if any device in the system does not support concurrent managed access.

Graph objects are not threadsafe.

§Safety

CUDA stores the raw source and destination addresses in the graph node for later replay. The caller must ensure params remains valid according to MemoryCopy1DNodeParams::new for every graph instantiation and launch that can execute this node.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub unsafe fn add_memory_copy_node_1d_device_to_device<D, S>( &mut self, dependencies: &[GraphNode], dst: &mut D, src: &S, ) -> Result<GraphNode>
where D: ByteBufferMut + ?Sized, S: ByteBuffer + ?Sized,

Creates a device-to-device memcpy node from typed byte buffers.

The node copies src.byte_len() bytes. dst must have at least that many bytes.

§Safety

CUDA stores the raw source and destination addresses in the graph node for later replay. The caller must ensure dst and src remain valid for every graph instantiation and launch that can execute this node. dst must not be accessed through another mutable path while graph launches using this node can write it.

§Errors

Returns an error if dst is smaller than src, if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics.

Source

pub fn add_buffer_memory_copy_node_1d_device_to_device<T>( &mut self, dependencies: &[GraphNode], dst: &mut GraphBuffer<T>, src: &GraphBuffer<T>, ) -> Result<GraphNode>
where T: DeviceRepr + Send + Sync,

Creates a device-to-device memcpy node between graph-retained buffers.

The node copies src.byte_len() bytes. dst must have at least that many bytes. The graph retains both allocations so the baked CUDA graph pointers remain live for future instantiation and replay.

§Errors

Returns an error if dst is smaller than src, if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics.

Source

pub unsafe fn add_memory_copy_node( &mut self, dependencies: &[GraphNode], params: &MemoryCopy3DNodeParams, ) -> Result<GraphNode>

Creates a memcpy node and adds it to the graph with the given dependencies. The dependency list may be empty, in which case the node is placed at the graph root. The list may not contain duplicate entries.

When the graph is launched, the node performs the memcpy described by params. See sys::cudaMemcpy3D for a description of the structure and its restrictions.

Memcpy nodes have additional restrictions for managed memory if any device in the system does not support concurrent managed access.

Graph objects are not threadsafe.

§Safety

CUDA stores the raw source and destination addresses in the graph node for later replay. The caller must ensure params remains valid according to MemoryCopy3DNodeParams for every graph instantiation and launch that can execute this node.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub unsafe fn add_memory_copy_node_to_symbol( &mut self, dependencies: &[GraphNode], params: &MemoryCopyToSymbolNodeParams, ) -> Result<GraphNode>

§Safety

CUDA stores the raw symbol and source pointer in the graph node for later replay. The caller must ensure params remains valid according to MemoryCopyToSymbolNodeParams::new for every graph instantiation and launch that can execute this node.

Source

pub unsafe fn add_memory_copy_node_from_symbol( &mut self, dependencies: &[GraphNode], params: &MemoryCopyFromSymbolNodeParams, ) -> Result<GraphNode>

§Safety

CUDA stores the raw destination and symbol pointer in the graph node for later replay. The caller must ensure params remains valid according to MemoryCopyFromSymbolNodeParams::new for every graph instantiation and launch that can execute this node.

Source

pub unsafe fn add_memory_set_node( &mut self, dependencies: &[GraphNode], params: &MemorySetNodeParams, ) -> Result<GraphNode>

Creates a new memset node and adds it to the graph with the given dependencies. The dependency list may be empty, in which case the node is placed at the root of the graph. The list may not contain duplicate entries.

The element size must be 1, 2, or 4 bytes. When the graph is launched, the node performs the memset described by params.

Graph objects are not threadsafe.

§Safety

CUDA stores the destination address in the graph node for later replay. The caller must ensure params remains valid according to MemorySetNodeParams::new for every graph instantiation and launch that can execute this node.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn add_child_graph_node( &mut self, dependencies: &[GraphNode], child_graph: &Self, ) -> Result<GraphNode>

Creates a new node which executes an embedded graph, and adds it to the graph with the given dependencies. The dependency list may be empty, in which case the node is placed at the root of the graph. The list may not contain duplicate entries.

If child_graph contains allocation nodes, free nodes, or conditional nodes, this call returns an error.

The node executes an embedded child graph. The child graph is cloned in this call.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn add_memory_free_node( &mut self, dependencies: &[GraphNode], allocation: &MemoryAllocationNodeInfo, ) -> Result<GraphNode>

Creates a new memory free node for a graph allocation and adds it to the graph. The dependency list may be empty, in which case the node is placed at the root of the graph. The list may not contain duplicate entries.

Graph::add_memory_free_node returns crate::error::Status::InvalidValue if the caller attempts to free:

  • an allocation twice in the same graph.
  • an address that was not returned by an allocation node.
  • an invalid address.

The following restrictions apply to graphs which contain allocation and/or memory free nodes:

  • Nodes and edges of the graph cannot be deleted.
  • The graph can only be used in a child node if the ownership is moved to the parent.
  • Only one instantiation of the graph may exist at any point in time.
  • The graph cannot be cloned.

Graph objects are not threadsafe.

§Errors

Returns Error::GraphNodeMismatch if allocation did not come from this graph. Returns an error if CUDA rejects the graph operation or if a previous asynchronous launch reported an error.

Source

pub unsafe fn add_memory_free_node_raw( &mut self, dependencies: &[GraphNode], ptr: DevicePtr, ) -> Result<GraphNode>

Creates a new memory free node from a raw device address.

§Safety

CUDA stores the raw address in the graph. The caller must ensure ptr is a graph allocation that may be freed by this graph, is ordered after the allocation node, and is not freed more than once or by another graph in a way that violates CUDA graph allocation ownership rules.

Source

pub fn add_memory_allocation_node( &mut self, dependencies: &[GraphNode], params: &MemoryAllocationNodeParams<'_>, ) -> Result<(GraphNode, MemoryAllocationNodeInfo)>

Creates a new allocation node and adds it to the graph with the given dependencies and allocation parameters. The dependency list may be empty, in which case the node is placed at the root of the graph. The list may not contain duplicate entries.

When Graph::add_memory_allocation_node creates an allocation node, it returns the allocation metadata in MemoryAllocationNodeInfo. The allocation’s address remains fixed across instantiations and launches.

If the allocation is freed in the same graph, by creating a free node using Graph::add_memory_free_node, the allocation can be accessed by nodes ordered after the allocation node but before the free node. These allocations cannot be freed outside the owning graph, and they can only be freed once in the owning graph.

If the allocation is not freed in the same graph, then it can be accessed not only by nodes in the graph which are ordered after the allocation node, but also by stream operations ordered after the graph’s execution but before the allocation is freed.

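Allocations which are not freed in the same graph can be freed by:

  • passing the allocation to CUDA's stream-ordered or synchronous free function (cudaFreeAsync or cudaFree);
  • launching a graph with a free node for that allocation; or
  • instantiating the owning graph with GraphInstantiateFlags::AUTO_FREE_ON_LAUNCH, which frees any unfreed allocations before the graph is relaunched.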

It is not possible to free an allocation in both the owning graph and another graph. If the allocation is freed in the same graph, a free node cannot be added to another graph. If the allocation is freed in another graph, a free node can no longer be added to the owning graph.

The following restrictions apply to graphs which contain allocation and/or memory free nodes:

  • Nodes and edges of the graph cannot be deleted.
  • The graph can only be used in a child node if the ownership is moved to the parent.
  • Only one instantiation of the graph may exist at any point in time.
  • The graph cannot be cloned.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation or if a previous asynchronous launch reported an error.

Source

pub fn nodes(&self) -> Result<Vec<GraphNode>>

Returns this graph’s nodes.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn root_nodes(&self) -> Result<Vec<GraphNode>>

Returns this graph’s root nodes.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn edges(&self) -> Result<Vec<GraphEdge>>

Returns this graph’s dependency edges.

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects the graph operation, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics. Callbacks must not call CUDA functions; see Stream::add_callback.

Source

pub fn topology_summary(&self) -> Result<GraphTopologySummary>

Returns a compact summary of this graph’s native CUDA topology.

The summary is computed from CUDA graph introspection APIs and counts node kinds, root nodes, and dependency edges in this graph. Child graph nodes are counted as child nodes here; callers that need recursive details can query the child graph returned by GraphNode::child_graph.
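What such a summary counts can be illustrated with a self-contained model. The Summary struct and summarize function below are hypothetical and do not reflect the fields of GraphTopologySummary; nodes are plain u32 identifiers, and a root is taken to be any node with no incoming edge.

```rust
use std::collections::HashSet;

/// Hypothetical summary type for this sketch only.
#[derive(Debug, PartialEq)]
struct Summary {
    nodes: usize,
    roots: usize,
    edges: usize,
}

/// Counts nodes, root nodes (no incoming edge), and edges.
fn summarize(node_ids: &[u32], edges: &[(u32, u32)]) -> Summary {
    // Collect every node that appears as the target of an edge.
    let has_incoming: HashSet<u32> = edges.iter().map(|&(_, to)| to).collect();
    let roots = node_ids
        .iter()
        .filter(|&&id| !has_incoming.contains(&id))
        .count();
    Summary {
        nodes: node_ids.len(),
        roots,
        edges: edges.len(),
    }
}
```

For nodes 0 through 3 with edges (0, 2), (1, 2), and (2, 3), the model reports four nodes, two roots, and three edges.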

Graph objects are not threadsafe.

§Errors

Returns an error if CUDA rejects a topology query, if a previous asynchronous launch reported an error, or if CUDA reports runtime initialization diagnostics.

Source

pub fn write_dot(&self, path: &str, flags: GraphDebugDotFlags) -> Result<()>

Writes a DOT-formatted description of the graph to path. By default this includes the graph topology, node types, node ID, kernel names, and memcpy direction. flags can request more detailed information about each node type, such as parameter values, kernel attributes, node handles, and function handles.

§Errors

Returns an error if path contains an interior NUL byte or if CUDA Runtime cannot write the DOT file.
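The interior NUL restriction exists because a C API expects a NUL-terminated string. A minimal sketch using the standard library shows which paths would fail that conversion; whether this crate converts the path with CString internally is an assumption, and path_to_c_string is a hypothetical helper.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper: converts a Rust path string into a C string.
/// `CString::new` fails if the input contains a NUL byte anywhere,
/// because the C side would treat it as the end of the string.
fn path_to_c_string(path: &str) -> Result<CString, NulError> {
    CString::new(path)
}
```

A path such as "graph.dot" converts cleanly, while "graph\0.dot" is rejected, and NulError::nul_position reports where the offending byte was found.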

Source

pub fn as_raw(&self) -> cudaGraph_t

Source

pub fn context(&self) -> Option<&Context>

Source

pub fn into_raw(self) -> cudaGraph_t

Consumes the graph and returns the raw CUDA graph handle without destroying it.

The caller becomes responsible for eventually destroying the returned handle with CUDA.
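The ownership hand-off between into_raw and a from_raw-style constructor follows a common Rust pattern, sketched here with a stand-in type. Handle, its usize payload, and the DESTROYED counter are all hypothetical; the sketch only shows that releasing the raw value skips the destructor and that re-wrapping it restores automatic cleanup.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times the stand-in destructor has run.
static DESTROYED: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for an owning wrapper around a raw handle.
struct Handle(usize);

impl Drop for Handle {
    fn drop(&mut self) {
        // A real wrapper would destroy the underlying resource here.
        DESTROYED.fetch_add(1, Ordering::SeqCst);
    }
}

impl Handle {
    /// Releases ownership: the destructor does not run, and the caller
    /// becomes responsible for the raw value.
    fn into_raw(self) -> usize {
        let this = ManuallyDrop::new(self);
        this.0
    }

    /// Re-takes ownership of a raw value.
    ///
    /// # Safety
    /// `raw` must not be owned or destroyed elsewhere.
    unsafe fn from_raw(raw: usize) -> Self {
        Handle(raw)
    }
}
```

The same reasoning explains the safety contract on from_raw_in_context: once a handle is wrapped, destroying it elsewhere would lead to a double destroy when the wrapper is dropped.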

Trait Implementations§

Source§

impl Debug for Graph

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

§

impl !RefUnwindSafe for Graph

§

impl !UnwindSafe for Graph

§

impl Freeze for Graph

§

impl Send for Graph

§

impl Sync for Graph

§

impl Unpin for Graph

§

impl UnsafeUnpin for Graph

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.