§rakka-accel-cuda
GPU acceleration via the actor model. Wraps NVIDIA CUDA libraries as
actors on top of rakka. See README.md and the
architecture document under docs/ for the full design.
§Foundation Phase F1 (current)
- Two-tier supervision: device::DeviceActor (stable address) ↔ device::ContextActor (owns Arc<CudaContext>, restartable).
- gpu_ref::GpuRef with generation-token validity checks.
- dispatcher::GpuDispatcher pinning actor execution to a single OS thread.
- completion::HostFnCompletion for sub-microsecond stream completion via cuLaunchHostFunc.
- stream::PerActorAllocator as the default §5.7 strategy.
- kernel::BlasActor performing cuBLAS SGEMM as the canonical demo.
Phases F2–F5 (cuDNN, cuFFT, NCCL, TensorRT, the PythonGpuBridge)
and the four blueprint sub-crates are deferred.
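
The generation-token check mentioned for gpu_ref::GpuRef can be illustrated with a minimal, CPU-only sketch. It assumes that a restart of the inner context invalidates every handle issued before it. All names here (Context, Handle, HandleError, resolve) are hypothetical stand-ins and are not the crate's actual API.

```rust
// Hypothetical names throughout; this is not the crate's actual API.

/// Stand-in for the restartable inner tier: each restart bumps the generation.
struct Context {
    generation: u64,
}

/// Stand-in for a buffer handle: it records the generation it was issued under.
#[derive(Clone, Copy, Debug)]
struct Handle {
    generation: u64,
    offset: usize,
}

#[derive(Debug, PartialEq)]
enum HandleError {
    /// The handle was issued by an earlier incarnation of the context.
    Stale { handle_gen: u64, current_gen: u64 },
}

impl Context {
    fn new() -> Self {
        Context { generation: 0 }
    }

    /// Issue a handle stamped with the current generation.
    fn alloc(&self, offset: usize) -> Handle {
        Handle { generation: self.generation, offset }
    }

    /// Simulate a supervisor restart: old device memory is gone,
    /// so every previously issued handle must become invalid.
    fn restart(&mut self) {
        self.generation += 1;
    }

    /// Validate the handle before use instead of touching freed memory.
    fn resolve(&self, h: Handle) -> Result<usize, HandleError> {
        if h.generation == self.generation {
            Ok(h.offset)
        } else {
            Err(HandleError::Stale {
                handle_gen: h.generation,
                current_gen: self.generation,
            })
        }
    }
}

fn main() {
    let mut ctx = Context::new();
    let h = ctx.alloc(64);
    assert_eq!(ctx.resolve(h), Ok(64));

    ctx.restart();
    // The old handle is rejected rather than dereferenced.
    assert!(ctx.resolve(h).is_err());

    // Handles issued after the restart are valid again.
    let fresh = ctx.alloc(128);
    assert_eq!(ctx.resolve(fresh), Ok(128));
}
```

The point of the design is that a handle can be copied freely into messages, because validity is checked at the point of use rather than tied to the handle's lifetime.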
Modules§
- completion
- Completion strategies (§5.10).
- device
- DeviceActor (outer tier) + ContextActor (inner tier) (§5.11).
- dispatcher
- GpuDispatcher (§5.1): pinned single-thread runtime that ensures the actor’s CUDA context stays current on the same OS thread for the actor’s whole lifetime.
- error
- Error taxonomy and the supervisor decider for context-poisoning recovery (§5.3, §5.11 of the architecture document).
- gpu_ref
- GpuRef<T>: opaque, message-friendly handle to a GPU buffer (§5.8).
- graph
- GraphActor: record a CUDA stream capture once, replay it many times.
- host
- Host-side support: pinned (page-locked) memory pool + PinnedBuf<T>.
- kernel
- Kernel-actor wrappers around CUDA library handles (§3.2).
- memory
- Managed (unified) memory.
- p2p
- P2P (peer-to-peer) topology + cross-device async memcpy.
- pipeline
- Multi-stream pipeline pattern.
- placement
- PlacementActor: picks the best-fit DeviceActor for each request based on a configurable PlacementPolicy.
- prelude
- Common imports for users of rakka-accel-cuda.
- replay
- Deterministic-replay harness.
- stream
- Stream allocation strategies (§5.7).
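
The thread pinning described for the dispatcher module can be sketched with the standard library alone. The sketch assumes the dispatcher is essentially one dedicated worker thread draining a job queue. PinnedDispatcher, Job, and run are hypothetical names chosen for illustration, and no CUDA calls are made.

```rust
use std::sync::mpsc;
use std::thread::{self, ThreadId};

/// Hypothetical job type: a boxed closure executed on the pinned thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a pinned dispatcher: one worker thread, one queue.
struct PinnedDispatcher {
    sender: Option<mpsc::Sender<Job>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl PinnedDispatcher {
    fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let worker = thread::spawn(move || {
            // A real implementation would make the CUDA context current
            // here, once, before entering the loop.
            for job in receiver {
                job();
            }
        });
        PinnedDispatcher { sender: Some(sender), worker: Some(worker) }
    }

    /// Run `f` on the pinned thread and block until its result comes back.
    fn run<T, F>(&self, f: F) -> T
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = tx.send(f());
        });
        self.sender
            .as_ref()
            .expect("dispatcher is running")
            .send(job)
            .expect("worker thread is alive");
        rx.recv().expect("worker thread returned a result")
    }
}

impl Drop for PinnedDispatcher {
    fn drop(&mut self) {
        // Dropping the sender closes the queue, which ends the worker loop.
        self.sender.take();
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

fn main() {
    let dispatcher = PinnedDispatcher::new();
    let first: ThreadId = dispatcher.run(|| thread::current().id());
    let second: ThreadId = dispatcher.run(|| thread::current().id());
    // Every job observes the same OS thread, distinct from the caller's.
    assert_eq!(first, second);
    assert_ne!(first, thread::current().id());
}
```

Because every job runs on the same thread, any thread-local state set up at the top of the worker loop stays valid for all later jobs, which is the property the module description relies on for the CUDA context.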