atomr-infer-runtime 0.4.0

Runtime-agnostic actors on top of rakka-core. Gateway, per-request lifecycle, coordinator, deployment manager, two-tier supervision — none of which knows or cares whether the underlying backend is a GPU or a remote network call.

Actors

| Actor | Doc § | Purpose |
|---|---|---|
| ApiGatewayActor | §4, §6.1 | OpenAI-compatible HTTP endpoint; spawns a RequestActor per request. |
| RequestActor | §6.1 | Per-request lifecycle; aggregates TokenChunks into Tokens. |
| DpCoordinatorActor | §4 | Cluster-singleton routing CRDT; picks an engine for a deployment. |
| EngineCoreActor (local) | §5.1 | Per-replica local-GPU orchestrator; owns a Box<dyn ModelRunner>. |
| WorkerActor + ContextActor | §5.3, §5.11 | Two-tier supervision; restarts on ContextPoisoned. |
| DeploymentPlacementActor | §7.2 | Picks nodes for new deployments; delegates GPU choice to atomr_accel::cuda::placement::PlacementActor. |
| DeploymentManagerActor | §4 | Cluster-singleton catalog of deployments. |
| MetricsActor | §7.7, §12.4 | Per-deployment counters and budget tracking. |

Remote-network engine cores live in atomr-infer-remote-core — same actor shapes, different internals (HTTP/2 worker pool instead of CUDA streams).
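The RequestActor's job above — folding streamed TokenChunks into a final Tokens result — can be illustrated with a standalone sketch. The struct shapes and field names here are assumptions for illustration, not the crate's actual message types:

```rust
// Illustrative sketch of the RequestActor's aggregation step: streamed
// TokenChunk messages are folded, in arrival order, into one Tokens result.
// All names and fields here are hypothetical, not the crate's real types.

/// One streamed chunk from an engine core (hypothetical shape).
struct TokenChunk {
    token_ids: Vec<u32>,
    finished: bool,
}

/// Final aggregated output for the request (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Tokens {
    token_ids: Vec<u32>,
}

/// Fold chunks until one marks the stream finished.
fn aggregate(chunks: impl IntoIterator<Item = TokenChunk>) -> Tokens {
    let mut out = Vec::new();
    for chunk in chunks {
        out.extend(chunk.token_ids);
        if chunk.finished {
            break;
        }
    }
    Tokens { token_ids: out }
}

fn main() {
    let chunks = vec![
        TokenChunk { token_ids: vec![1, 2], finished: false },
        TokenChunk { token_ids: vec![3], finished: true },
    ];
    assert_eq!(aggregate(chunks).token_ids, vec![1, 2, 3]);
}
```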

Two-tier supervision — adopted, not reinvented

[dependencies]
atomr-infer-runtime = { workspace = true, features = ["local-gpu"] }

With the local-gpu feature, WorkerActor::supervisor_strategy() returns atomr_accel::cuda::error::device_supervisor_strategy() verbatim — three retries inside a 60-second window with the upstream ContextPoisoned / OutOfMemory / Unrecoverable decider. When a ModelRunner::execute returns InferenceError::CudaContextPoisoned, the ContextActor panics with the atomr_accel::cuda::error::CONTEXT_POISONED_TAG marker so the upstream supervisor routes the failure to Restart.

Without the feature, the same shape is preserved with an in-crate fallback strategy — useful when you embed the runtime-agnostic actors into a remote-only build that doesn't want atomr-accel in its dependency graph.
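The decider shape described above — bounded restarts inside a sliding window, with an escalation path for unrecoverable failures — can be sketched in isolation. The names here (InferenceError variants aside, which the text mentions; Directive, RetryWindow) are hypothetical; the real strategy is supplied by atomr_accel::cuda::error::device_supervisor_strategy():

```rust
use std::time::{Duration, Instant};

// Sketch of a three-retries-in-60-seconds restart decider, the shape the
// upstream supervisor strategy is described as having. Directive and
// RetryWindow are hypothetical names for illustration only.

#[derive(Debug)]
enum InferenceError {
    CudaContextPoisoned,
    OutOfMemory,
    Unrecoverable,
}

#[derive(Debug, PartialEq)]
enum Directive {
    Restart,
    Stop,
    Escalate,
}

/// Counts failures inside a sliding window; gives up once the budget is spent.
struct RetryWindow {
    window: Duration,
    max_retries: u32,
    failures: Vec<Instant>,
}

impl RetryWindow {
    fn new(window: Duration, max_retries: u32) -> Self {
        Self { window, max_retries, failures: Vec::new() }
    }

    fn decide(&mut self, err: &InferenceError, now: Instant) -> Directive {
        // Drop failures that have aged out of the window.
        self.failures.retain(|t| now.duration_since(*t) < self.window);
        match err {
            // A poisoned context or OOM is worth a bounded restart.
            InferenceError::CudaContextPoisoned | InferenceError::OutOfMemory => {
                self.failures.push(now);
                if self.failures.len() as u32 <= self.max_retries {
                    Directive::Restart
                } else {
                    Directive::Stop
                }
            }
            // Anything unrecoverable goes straight up the supervision tree.
            InferenceError::Unrecoverable => Directive::Escalate,
        }
    }
}

fn main() {
    let mut w = RetryWindow::new(Duration::from_secs(60), 3);
    let t0 = Instant::now();
    assert_eq!(w.decide(&InferenceError::CudaContextPoisoned, t0), Directive::Restart);
    assert_eq!(w.decide(&InferenceError::Unrecoverable, t0), Directive::Escalate);
}
```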

Feature flags

| Feature | Adds | When to enable |
|---|---|---|
| (default) | runtime-agnostic actors only | Remote-only deployments |
| local-gpu | atomr-accel dep; upstream supervisor strategy | Any deployment with local GPU runtimes |
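The compile-time selection behind the local-gpu feature can be sketched as two cfg-gated functions with an identical return shape. The function and struct names are hypothetical; in the crate this choice sits inside WorkerActor::supervisor_strategy():

```rust
// Sketch of feature-gated strategy selection. Both paths return the same
// shape, which is the point: the fallback preserves the supervision contract
// without pulling atomr-accel into the dependency graph. Names are
// hypothetical, for illustration only.

#[derive(Debug, PartialEq)]
struct SupervisorStrategy {
    max_retries: u32,
    window_secs: u64,
}

#[cfg(feature = "local-gpu")]
fn supervisor_strategy() -> SupervisorStrategy {
    // Would delegate to atomr_accel::cuda::error::device_supervisor_strategy().
    SupervisorStrategy { max_retries: 3, window_secs: 60 }
}

#[cfg(not(feature = "local-gpu"))]
fn supervisor_strategy() -> SupervisorStrategy {
    // In-crate fallback: same retry budget and window, no GPU dependency.
    SupervisorStrategy { max_retries: 3, window_secs: 60 }
}

fn main() {
    let s = supervisor_strategy();
    assert_eq!(s.max_retries, 3);
    assert_eq!(s.window_secs, 60);
}
```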

A canonical wiring

```rust
use atomr_infer_runtime::{
    ApiGatewayActor, DeploymentManagerActor, DpCoordinatorActor, GatewayConfig,
    MetricsActor, spawn_gateway,
};
use rakka_core::actor::{ActorSystem, Props};
use rakka_config::Config;

# async fn run() -> anyhow::Result<()> {
let sys = ActorSystem::create("inference", Config::reference()).await?;

let dp = sys.actor_of(Props::create(|| DpCoordinatorActor::new()), "dp")?;
let _mgr = sys.actor_of(Props::create(|| DeploymentManagerActor::new()), "mgr")?;
let _metrics = sys.actor_of(Props::create(|| MetricsActor::new()), "metrics")?;
let _gateway = spawn_gateway(&sys, GatewayConfig::default(), dp)?;
# Ok(())
# }
```