atomr-infer-runtime
Runtime-agnostic actors on top of `atomr-core`: the gateway, per-request lifecycle, coordinator, deployment manager, and two-tier supervision. None of them knows or cares whether the underlying backend is a GPU or a remote network call.
Actors
| Actor | Doc § | Purpose |
|---|---|---|
| `ApiGatewayActor` | §4, §6.1 | OpenAI-compatible HTTP endpoint; spawns a `RequestActor` per request. |
| `RequestActor` | §6.1 | Per-request lifecycle; aggregates `TokenChunk`s into `Token`s. |
| `DpCoordinatorActor` | §4 | Cluster-singleton routing CRDT; picks an engine for a deployment. |
| `EngineCoreActor` (local) | §5.1 | Per-replica local-GPU orchestrator; owns a `Box<dyn ModelRunner>`. |
| `WorkerActor` + `ContextActor` | §5.3, §5.11 | Two-tier supervision; restarts on `ContextPoisoned`. |
| `DeploymentPlacementActor` | §7.2 | Picks nodes for new deployments; delegates GPU choice to `atomr_accel_cuda::placement::PlacementActor`. |
| `DeploymentManagerActor` | §4 | Cluster-singleton catalog of deployments. |
| `MetricsActor` | §7.7, §12.4 | Per-deployment counters and budget tracking. |
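The gateway/request split in the table can be pictured with plain threads and channels. This is a std-only approximation, not the atomr actor API: the `TokenChunk` fields, `request_lifecycle`, and `handle_request` are hypothetical names, a spawned thread stands in for a `RequestActor`, and a word-splitting loop stands in for an engine core streaming chunks back.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical streamed fragment; the real `TokenChunk` type may differ.
#[derive(Debug, Clone, PartialEq)]
struct TokenChunk {
    text: String,
    is_last: bool,
}

/// Stand-in for `RequestActor`: drain chunks until the final one arrives,
/// then hand back the aggregated tokens.
fn request_lifecycle(rx: mpsc::Receiver<TokenChunk>) -> Vec<String> {
    let mut tokens = Vec::new();
    for chunk in rx {
        tokens.push(chunk.text);
        if chunk.is_last {
            break;
        }
    }
    tokens
}

/// Stand-in for `ApiGatewayActor`: one short-lived worker per request.
fn handle_request(prompt: &str) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let request = thread::spawn(move || request_lifecycle(rx));

    // Fake engine core: stream each word of the prompt back as a chunk.
    let words: Vec<&str> = prompt.split_whitespace().collect();
    for (i, word) in words.iter().enumerate() {
        let chunk = TokenChunk { text: word.to_string(), is_last: i + 1 == words.len() };
        tx.send(chunk).expect("request worker hung up early");
    }
    drop(tx); // closing the channel also ends the lifecycle for an empty prompt
    request.join().expect("request worker panicked")
}

fn main() {
    // prints ["hello", "from", "the", "gateway"]
    println!("{:?}", handle_request("hello from the gateway"));
}
```

Giving each request its own worker keeps per-request state (partial output, cancellation) isolated from the gateway, which is the property the per-request actor is meant to provide.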
Remote-network engine cores live in `atomr-infer-remote-core`: same actor shapes, different internals (an HTTP/2 worker pool instead of CUDA streams).
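Because local and remote engine cores share one actor shape, the coordinator only needs a common interface. A minimal sketch of that idea, assuming a hypothetical `EngineCore` trait (not the crate's actual API) and using a simple least-in-flight rule as a stand-in for the `DpCoordinatorActor`'s CRDT-backed choice:

```rust
/// Hypothetical common shape seen by the coordinator.
trait EngineCore {
    fn name(&self) -> &str;
    fn in_flight(&self) -> usize;
    fn execute(&mut self, prompt: &str) -> String;
}

/// Would drive CUDA streams in a real local-GPU build.
struct LocalGpuEngine {
    in_flight: usize,
}

/// Would drive an HTTP/2 worker pool in a remote build.
struct RemoteHttpEngine {
    in_flight: usize,
}

impl EngineCore for LocalGpuEngine {
    fn name(&self) -> &str {
        "local-gpu"
    }
    fn in_flight(&self) -> usize {
        self.in_flight
    }
    fn execute(&mut self, prompt: &str) -> String {
        format!("gpu:{}", prompt)
    }
}

impl EngineCore for RemoteHttpEngine {
    fn name(&self) -> &str {
        "remote-http"
    }
    fn in_flight(&self) -> usize {
        self.in_flight
    }
    fn execute(&mut self, prompt: &str) -> String {
        format!("remote:{}", prompt)
    }
}

/// Stand-in routing rule: the engine with the fewest in-flight requests wins
/// (ties go to the first engine listed).
fn pick_engine(engines: &mut [Box<dyn EngineCore>]) -> Option<&mut Box<dyn EngineCore>> {
    engines.iter_mut().min_by_key(|e| e.in_flight())
}

fn main() {
    let mut engines: Vec<Box<dyn EngineCore>> = vec![
        Box::new(LocalGpuEngine { in_flight: 4 }),
        Box::new(RemoteHttpEngine { in_flight: 1 }),
    ];
    if let Some(engine) = pick_engine(&mut engines) {
        let reply = engine.execute("hi");
        println!("{} -> {}", engine.name(), reply); // prints "remote-http -> remote:hi"
    }
}
```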
Two-tier supervision — adopted, not reinvented
```toml
[dependencies]
atomr-infer-runtime = { workspace = true, features = ["local-gpu"] }
```
With the `local-gpu` feature, `WorkerActor::supervisor_strategy()` returns `atomr_accel_cuda::error::device_supervisor_strategy()` verbatim: three retries inside a 60-second window, using the upstream `ContextPoisoned` / `OutOfMemory` / `Unrecoverable` decider. When `ModelRunner::execute` returns `InferenceError::CudaContextPoisoned`, the `ContextActor` panics with the `atomr_accel_cuda::error::CONTEXT_POISONED_TAG` marker, so the upstream supervisor routes the failure to `Restart`.
Without the feature, the same shape is preserved by an in-crate fallback strategy. That is useful when you embed the runtime-agnostic actors in a remote-only build that should keep `atomr-accel` out of its dependency graph.
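The restart rule described above can be approximated with the standard library alone. This sketch assumes the 60-second window slides, and it uses hypothetical stand-ins (`RestartBudget`, `Directive`, `decide`, and a local `CONTEXT_POISONED_TAG` string) rather than the real `atomr_accel_cuda::error` items. It models only the poisoned-context path, so every other failure maps to `Stop` here.

```rust
use std::any::Any;
use std::panic;
use std::time::Duration;

// Hypothetical stand-in for atomr_accel_cuda::error::CONTEXT_POISONED_TAG.
const CONTEXT_POISONED_TAG: &str = "[context-poisoned]";

#[derive(Debug, Clone, Copy, PartialEq)]
enum Directive {
    Restart,
    Stop,
}

/// At most `max_retries` restarts within any sliding `window`.
struct RestartBudget {
    max_retries: usize,
    window: Duration,
    restarts: Vec<Duration>, // when past restarts happened, as time since start
}

impl RestartBudget {
    fn new(max_retries: usize, window: Duration) -> Self {
        RestartBudget { max_retries, window, restarts: Vec::new() }
    }

    /// `now` is passed in explicitly so the logic stays deterministic.
    fn allow(&mut self, now: Duration) -> bool {
        let window = self.window;
        // Forget restarts that have aged out of the window.
        self.restarts.retain(|t| now.saturating_sub(*t) < window);
        if self.restarts.len() < self.max_retries {
            self.restarts.push(now);
            true
        } else {
            false
        }
    }
}

/// Decider: a panic whose message carries the tag becomes `Restart` while the
/// budget lasts; anything else (or an exhausted budget) becomes `Stop`.
fn decide(payload: &(dyn Any + Send), budget: &mut RestartBudget, now: Duration) -> Directive {
    let msg = payload
        .downcast_ref::<String>()
        .map(String::as_str)
        .or_else(|| payload.downcast_ref::<&str>().copied())
        .unwrap_or("");
    if msg.contains(CONTEXT_POISONED_TAG) && budget.allow(now) {
        Directive::Restart
    } else {
        Directive::Stop
    }
}

/// Stand-in for a ContextActor reacting to a poisoned CUDA context.
fn context_actor_step() {
    panic!("{} CUDA context lost", CONTEXT_POISONED_TAG);
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // suppress the default panic message for the demo
    let mut budget = RestartBudget::new(3, Duration::from_secs(60));
    // prints Restart for t=0, 10, 20 and Stop for t=30
    for secs in [0u64, 10, 20, 30] {
        if let Err(payload) = panic::catch_unwind(context_actor_step) {
            let directive = decide(&*payload, &mut budget, Duration::from_secs(secs));
            println!("t={}s -> {:?}", secs, directive);
        }
    }
}
```

Passing `now` in instead of reading `Instant::now()` inside `allow` keeps the window arithmetic testable; a real supervisor would consult its own clock.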
Feature flags
| Feature | Adds | When to enable |
|---|---|---|
| (default) | Runtime-agnostic actors only | Remote-only deployments |
| `local-gpu` | `atomr-accel` dependency; upstream supervisor strategy | Any deployment with local GPU runtimes |
A canonical wiring
use ;
use ;
use Config;
# async