rakka_accel_cuda/lib.rs
//! # rakka-accel-cuda
//!
//! GPU acceleration via the actor model. Wraps NVIDIA CUDA libraries as
//! actors on top of [`rakka`](../rakka). See `README.md` and the
//! architecture document under `docs/` for the full design.
//!
//! ## Foundation Phase F1 (current)
//!
//! - Two-tier supervision: [`device::DeviceActor`] (stable address) ↔
//!   [`device::ContextActor`] (owns `Arc<CudaContext>`, restartable).
//! - [`gpu_ref::GpuRef`] with generation-token validity checks.
//! - [`dispatcher::GpuDispatcher`] pinning actor execution to a single
//!   OS thread.
//! - [`completion::HostFnCompletion`] for sub-microsecond stream
//!   completion via `cuLaunchHostFunc`.
//! - [`stream::PerActorAllocator`] as the default §5.7 strategy.
//! - [`kernel::BlasActor`] performing cuBLAS SGEMM as the canonical
//!   demo.
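//!
//! The generation-token idea behind the validity checks can be sketched
//! without any CUDA dependency. This is a minimal illustration only, not
//! the actual `GpuRef` implementation: `Generation`, `GenHandle`, and
//! `Stale` are hypothetical names invented for this example. It assumes
//! that the context owner bumps a shared counter on every restart and
//! that a handle is valid only while the counter still matches the value
//! recorded when the handle was created.
//!
//! ```rust
//! use std::sync::atomic::{AtomicU64, Ordering};
//! use std::sync::Arc;
//!
//! // Hypothetical shared counter, bumped each time the context restarts.
//! pub struct Generation(AtomicU64);
//!
//! impl Generation {
//!     pub fn new() -> Arc<Self> {
//!         Arc::new(Self(AtomicU64::new(0)))
//!     }
//!
//!     pub fn current(&self) -> u64 {
//!         self.0.load(Ordering::Acquire)
//!     }
//!
//!     // Invalidates every handle created under an earlier generation
//!     // and returns the new generation number.
//!     pub fn bump(&self) -> u64 {
//!         self.0.fetch_add(1, Ordering::AcqRel) + 1
//!     }
//! }
//!
//! // Error returned when a handle outlives the context that produced it.
//! #[derive(Debug, PartialEq)]
//! pub struct Stale {
//!     pub minted_at: u64,
//!     pub current: u64,
//! }
//!
//! // Hypothetical handle: remembers the generation it was created under.
//! pub struct GenHandle<T> {
//!     value: T,
//!     minted_at: u64,
//!     live: Arc<Generation>,
//! }
//!
//! impl<T> GenHandle<T> {
//!     pub fn new(value: T, live: &Arc<Generation>) -> Self {
//!         Self { value, minted_at: live.current(), live: Arc::clone(live) }
//!     }
//!
//!     // Grants access only if no restart happened since creation.
//!     pub fn get(&self) -> Result<&T, Stale> {
//!         let current = self.live.current();
//!         if current == self.minted_at {
//!             Ok(&self.value)
//!         } else {
//!             Err(Stale { minted_at: self.minted_at, current })
//!         }
//!     }
//! }
//! ```
//!
//! In this sketch a handle created before `bump()` returns `Err(Stale)`
//! afterwards, while a handle created after the bump is accepted, so a
//! stale reference surfaces as a recoverable error rather than a
//! use-after-free.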
//!
//! Phases F2–F5 (cuDNN, cuFFT, NCCL, TensorRT, the `PythonGpuBridge`)
//! and the four blueprint sub-crates are deferred.

// Subjective clippy lints that fight the actor-message design:
//   * `type_complexity`: actor messages and kernel envelopes return
//     tuples of typed `Arc<CudaSlice<T>>` keep-alives; refactoring to
//     `type` aliases would worsen the public API.
//   * `too_many_arguments`: kernel-launcher fns mirror the underlying
//     CUDA library entry points (cuDNN conv, cuSPARSE SpMV) which take
//     8–10 args; collapsing to a config struct just moves the fields.
//   * `arc_with_non_send_sync`: CUDA driver handles (CudaGraph,
//     cudnnHandle) are `!Send` by design and only ever shared inside
//     the producing actor.
//   * `large_enum_variant`: kernel-message enums have one large
//     conv-descriptor variant; boxing it would fragment the hot path.
#![allow(
    clippy::type_complexity,
    clippy::too_many_arguments,
    clippy::arc_with_non_send_sync,
    clippy::large_enum_variant
)]

pub mod completion;
pub mod device;
pub mod dispatcher;
pub mod error;
pub mod gpu_ref;
pub mod graph;
pub mod host;
pub mod kernel;
pub mod memory;
#[cfg(feature = "nccl")]
pub mod multi_device;
#[cfg(feature = "telemetry")]
pub mod observability;
pub mod p2p;
pub mod pipeline;
pub mod placement;
pub mod prelude;
pub mod replay;
pub mod stream;
#[cfg(feature = "streams")]
pub mod streams_pipeline;