1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
//! CUDA backend module: device lifecycle, allocation pools, and kernel dispatch.
//!
//! `allocations` owns transient device and pinned-host pools plus the
//! `cuda_check` error wrapper. `module_cache` owns loaded PTX modules.
//! `resident` owns long-lived CUDA allocations and in-flight handle guards.
//! `dispatch` owns the `CudaBackend` struct, launch geometry, and
//! kernel-launch orchestration. The public surface is re-exported below.
/// Shared non-wrapping atomic accounting primitives.
pub
/// Device-side allocation pools, pinned-host pools, and `cuda_check`.
/// Capability, feature-flag, and validation-cache policy.
pub
/// Checked CUDA copy primitives shared by host, resident, and graph paths.
pub
/// cudaGraph capture-and-replay path. Records one full Program dispatch into
/// a `CUgraph` then replays it on demand to reduce hot-path launch overhead.
/// cudaGraph replay path.
pub
/// CUDA backend handle, launch geometry, and kernel-launch orchestration -
/// including the cooperative-launch path that routes through
/// `cuLaunchCooperativeKernel` when the caller opts in via
/// `DispatchConfig::cooperative`.
/// Host-borrowed buffer dispatch path.
pub
/// Checked CUDA host-memory registration boundary.
pub
/// Raw CUDA kernel launch boundary.
pub
/// Checked launch-parameter byte sizing.
pub
/// Loaded PTX module cache and submodular eviction policy.
pub
/// Shared monotonic ordering helpers for staging hot paths.
pub
/// CUDA output readback range handling.
pub
/// Shared dispatch-plan assembly helpers.
pub
/// PTX target probing against the live CUDA driver.
pub
/// Resident buffer management - long-lived device allocations.
pub
/// Resident-buffer dispatch path.
pub
/// Shared resident-dispatch contracts and checked accounting.
pub
/// Host and device copies for resident buffers.
pub
/// Shared resident readback interval fusion.
pub
/// Shared resident upload interval fusion.
pub
/// Shared fallible staging reservation helpers for backend hot paths.
pub
/// Atomic CUDA runtime telemetry counters.
pub
pub use *;
pub use ModuleCacheKey;
pub use CudaDispatchPlan;
pub use ResidentUseGuard;
pub use CudaResidentDispatchStep;
// Public surface - these names appear on the crate root.
pub use CachedCudaGraph;
pub use CudaBackend;
pub use CudaPtxSourceCacheSnapshot;
pub use CudaResidentBuffer;
pub use CudaTelemetrySnapshot;