Local-GPU worker — two-tier supervision adapter (doc §4, §5.3).
WorkerActor is the stable parent: it is addressable, supervised by
the engine-core, and never restarts. Its child ContextActor is
restartable and owns the runtime-specific resources (CUDA
context, weights, etc.). When the runner reports
CudaContextPoisoned the parent panics with the
rakka_accel::cuda::error::CONTEXT_POISONED_TAG marker so that
rakka_accel::cuda::error::device_supervisor_strategy routes the
failure to Directive::Restart.
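The marker-based routing described above can be illustrated with a minimal, std-only sketch. Everything in it is an assumption made for illustration: the tag value, the two-variant `Directive` enum, and the `decide` and `run_and_decide` helpers are hypothetical stand-ins, not the rakka_accel API. It only shows the general idea of panicking with a tagged message and letting a decider map that tag to a restart.

```rust
use std::any::Any;
use std::panic;

/// Hypothetical stand-in for the CONTEXT_POISONED_TAG marker; the real value is not shown in the source.
const CONTEXT_POISONED_TAG: &str = "[context-poisoned]";

/// Hypothetical, simplified directive set.
#[derive(Debug, PartialEq, Eq)]
enum Directive {
    Restart,
    Stop,
}

/// Extracts the message from a panic payload when it is a string.
fn panic_message(payload: &(dyn Any + Send)) -> Option<&str> {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        Some(*s)
    } else if let Some(s) = payload.downcast_ref::<String>() {
        Some(s.as_str())
    } else {
        None
    }
}

/// Decider: a panic carrying the marker maps to Restart, anything else to Stop.
fn decide(payload: &(dyn Any + Send)) -> Directive {
    match panic_message(payload) {
        Some(msg) if msg.contains(CONTEXT_POISONED_TAG) => Directive::Restart,
        _ => Directive::Stop,
    }
}

/// Runs `f`, catching a panic and returning the directive chosen for it.
/// Returns None when `f` completes without panicking.
fn run_and_decide<F: FnOnce() + panic::UnwindSafe>(f: F) -> Option<Directive> {
    match panic::catch_unwind(f) {
        Ok(()) => None,
        Err(payload) => Some(decide(payload.as_ref())),
    }
}
```

Calling `run_and_decide(|| panic!("{} device lost", CONTEXT_POISONED_TAG))` yields `Some(Directive::Restart)` in this sketch, while a panic without the marker yields `Some(Directive::Stop)`.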
The supervision policy (3 retries / 60s, decider, marker tags) is
re-used verbatim from rakka-accel’s error module — that’s the
upstream substrate for the doc’s §5.11 two-tier pattern. The
body this crate adds is the runtime-polymorphic
Box<dyn ModelRunner> slot, which is inference-specific.
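As a rough illustration of a "3 retries / 60s" policy, the sketch below tracks restarts in a sliding window. It assumes the policy means at most three restarts within any 60-second window, which the source does not spell out. `RestartBudget`, `on_failure`, and the local `Directive` enum are hypothetical names, not the rakka-accel implementation.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical, simplified directive set.
#[derive(Debug, PartialEq, Eq)]
enum Directive {
    Restart,
    Stop,
}

/// Hypothetical restart budget: at most `max_retries` restarts per `window`.
struct RestartBudget {
    max_retries: usize,
    window: Duration,
    restarts: VecDeque<Instant>,
}

impl RestartBudget {
    fn new(max_retries: usize, window: Duration) -> Self {
        Self {
            max_retries,
            window,
            restarts: VecDeque::new(),
        }
    }

    /// Records a failure observed at `now` and decides whether to restart.
    fn on_failure(&mut self, now: Instant) -> Directive {
        // Forget restarts that have aged out of the window.
        while let Some(&oldest) = self.restarts.front() {
            if now.duration_since(oldest) >= self.window {
                self.restarts.pop_front();
            } else {
                break;
            }
        }
        if self.restarts.len() < self.max_retries {
            // Budget remains: record this restart and allow it.
            self.restarts.push_back(now);
            Directive::Restart
        } else {
            // Budget exhausted within the window: give up.
            Directive::Stop
        }
    }
}
```

With `RestartBudget::new(3, Duration::from_secs(60))`, three failures in quick succession are each answered with a restart, a fourth inside the same window is answered with a stop, and restarts become available again once earlier ones age out.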
Per-runtime crates supply the runner via the WorkerSlot factory.
Remote runtimes go through inference-remote-core::RemoteWorkerActor
instead.
Structs§
- ContextActor
  Restartable child holding the CUDA context (or the remote-network analogue). Distinct from rakka_accel::cuda::device::ContextActor: that one specialises to CUDA memory / streams; this one holds the polymorphic Box<dyn ModelRunner> so the same supervision shape covers remote-network runners too.
- WorkerActor
- WorkerSlot
  What the parent hands to its child on construction. The runner owns the GPU context indirectly (via cudarc::driver::CudaContext, rakka_accel::cuda::device::DeviceState, or whatever the backend uses); when the parent decides to rebuild, it constructs a fresh WorkerSlot and the child cell starts anew.
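The rebuild flow for WorkerSlot can be sketched without any actor runtime. The types below are hypothetical simplifications: the real ModelRunner trait, WorkerSlot fields, and actor types are not shown in the source, so `generate`, `EchoRunner`, `ContextChild`, and `WorkerParent` are invented here purely to show a parent that discards its child and builds a new one from a fresh slot.

```rust
/// Hypothetical minimal runner trait; the real ModelRunner signature is not shown in the source.
trait ModelRunner {
    fn generate(&mut self, prompt: &str) -> String;
}

/// Toy runner whose call counter stands in for per-context state.
struct EchoRunner {
    calls: usize,
}

impl ModelRunner for EchoRunner {
    fn generate(&mut self, prompt: &str) -> String {
        self.calls += 1;
        format!("echo#{}: {}", self.calls, prompt)
    }
}

/// Hypothetical slot: what the parent hands to the child on construction.
struct WorkerSlot {
    runner: Box<dyn ModelRunner>,
}

/// Restartable child that owns the runner (and, through it, the context).
struct ContextChild {
    runner: Box<dyn ModelRunner>,
}

impl ContextChild {
    fn new(slot: WorkerSlot) -> Self {
        Self { runner: slot.runner }
    }

    fn handle(&mut self, prompt: &str) -> String {
        self.runner.generate(prompt)
    }
}

/// Stable parent that keeps the slot factory and can rebuild its child.
struct WorkerParent {
    make_slot: Box<dyn Fn() -> WorkerSlot>,
    child: ContextChild,
}

impl WorkerParent {
    fn new(make_slot: Box<dyn Fn() -> WorkerSlot>) -> Self {
        let child = ContextChild::new(make_slot());
        Self { make_slot, child }
    }

    /// Drops the old child (and its runner) and starts a new one from a fresh slot.
    fn rebuild(&mut self) {
        self.child = ContextChild::new((self.make_slot)());
    }
}
```

In this sketch, state held by the runner does not survive `rebuild`: the counter in `EchoRunner` starts again from zero, mirroring how a fresh slot gives the child a clean context.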