//! Graph capture and replay for compute backends
//!
//! Graph capture records a sequence of operations once so that it can be
//! replayed later with less per-launch overhead than issuing each operation
//! individually. This is a runtime-level concept (CUDA Graphs, Vulkan command
//! buffers, etc.) that benefits any compute workload, not just ML.

/// A captured computation sequence that can be replayed.
///
/// # Replay semantics
///
/// On capture-capable backends (CUDA), `launch()` replays the recorded
/// computation against the same buffers, at the same device addresses,
/// that were used during capture. Callers therefore update input data in
/// place, then call `launch()` to re-execute with the new values.
///
/// On non-capture backends (CPU, WebGPU), `capture_graph` executes the
/// closure eagerly and returns `NoOpGraph`, whose `launch()` is a no-op
/// because the computation has already run. Callers that need repeated
/// execution on these backends must invoke the operations directly rather
/// than through `launch()`.
///
/// Use `R::supports_graph_capture()` to check capability without
/// side effects, then branch:
///
/// ```ignore
/// if R::supports_graph_capture() {
///     let (graph, _) = R::capture_graph(client, |c| hot_path(c))?;
///     loop { update_inputs(); graph.launch()?; read_outputs(); }
/// } else {
///     loop { update_inputs(); hot_path(client)?; }
/// }
/// ```
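///
/// # Illustrative mock
///
/// As a rough, self-contained sketch of these semantics (hypothetical, not
/// this crate's API), the example models one capture-capable backend and one
/// eager backend. `MockRuntime`, `MockGraph`, `CaptureBackend`,
/// `EagerBackend`, `run_hot_path`, and `demo_counts` are illustrative names;
/// the mock drops the `client` argument, uses `String` errors, and stands in
/// for recorded device work with a Rust closure that `launch()` re-runs. It
/// counts how many times the hot path actually executes.
///
/// ```rust
/// use std::cell::Cell;
/// use std::rc::Rc;
///
/// // Hypothetical mock types for illustration only; not this crate's API.
/// type Result<T> = std::result::Result<T, String>;
///
/// trait MockGraph {
///     fn launch(&self) -> Result<()>;
/// }
///
/// trait MockRuntime {
///     type Graph: MockGraph;
///     fn supports_graph_capture() -> bool;
///     // Simplified: no client argument and no captured outputs.
///     fn capture_graph<F: Fn() -> Result<()> + 'static>(f: F) -> Result<Self::Graph>;
/// }
///
/// // Capture-capable mock: stores the work and runs it on every launch().
/// // Real CUDA Graphs replay recorded kernels, not Rust closures.
/// struct ReplayGraph(Box<dyn Fn() -> Result<()>>);
/// impl MockGraph for ReplayGraph {
///     fn launch(&self) -> Result<()> { (self.0)() }
/// }
///
/// struct CaptureBackend;
/// impl MockRuntime for CaptureBackend {
///     type Graph = ReplayGraph;
///     fn supports_graph_capture() -> bool { true }
///     fn capture_graph<F: Fn() -> Result<()> + 'static>(f: F) -> Result<ReplayGraph> {
///         Ok(ReplayGraph(Box::new(f))) // record only; nothing runs yet
///     }
/// }
///
/// // Non-capture mock: runs the work once during "capture"; launch() does nothing.
/// struct EagerNoOpGraph;
/// impl MockGraph for EagerNoOpGraph {
///     fn launch(&self) -> Result<()> { Ok(()) }
/// }
///
/// struct EagerBackend;
/// impl MockRuntime for EagerBackend {
///     type Graph = EagerNoOpGraph;
///     fn supports_graph_capture() -> bool { false }
///     fn capture_graph<F: Fn() -> Result<()> + 'static>(f: F) -> Result<EagerNoOpGraph> {
///         f()?; // executes eagerly
///         Ok(EagerNoOpGraph)
///     }
/// }
///
/// // The branch recommended above: replay if supported, else call directly.
/// fn run_hot_path<R, F>(hot_path: F, iters: u32) -> Result<()>
/// where
///     R: MockRuntime,
///     F: Fn() -> Result<()> + 'static,
/// {
///     if R::supports_graph_capture() {
///         let graph = R::capture_graph(hot_path)?;
///         for _ in 0..iters {
///             graph.launch()?;
///         }
///     } else {
///         for _ in 0..iters {
///             hot_path()?;
///         }
///     }
///     Ok(())
/// }
///
/// // Counts how many times the hot path actually ran in three scenarios.
/// fn demo_counts() -> (u32, u32, u32) {
///     let counter = Rc::new(Cell::new(0u32));
///     let hot_path = || {
///         let c = Rc::clone(&counter);
///         move || -> Result<()> {
///             c.set(c.get() + 1);
///             Ok(())
///         }
///     };
///     run_hot_path::<CaptureBackend, _>(hot_path(), 3).unwrap();
///     let replayed = counter.replace(0);
///     run_hot_path::<EagerBackend, _>(hot_path(), 3).unwrap();
///     let direct = counter.replace(0);
///     // Pitfall: launching the no-op graph never re-runs the work.
///     let graph = EagerBackend::capture_graph(hot_path()).unwrap();
///     for _ in 0..3 {
///         graph.launch().unwrap();
///     }
///     (replayed, direct, counter.get())
/// }
/// ```
///
/// Under these assumptions `demo_counts()` returns `(3, 3, 1)`: replay and
/// direct calls each run the hot path three times, while launching the eager
/// backend's no-op graph three times leaves the single eager execution during
/// capture as the only run.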
/// No-op graph for backends without capture support (CPU, WebGPU).
///
/// Operations execute eagerly during "capture", so `launch()` does nothing.
pub struct NoOpGraph;