inference-runtime
Runtime-agnostic actor implementations on top of rakka-core.
Per architecture doc §4 these are the actors whose logic doesn’t
depend on whether the underlying backend is local-GPU or
remote-network: the gateway, the per-request lifecycle actor, the
coordinator, the deployment manager, placement, and metrics. Local-GPU
specifics (WorkerActor and its ContextActor, arranged in two-tier
supervision) also live here, because the shape of two-tier supervision
is shared infrastructure even though the per-runtime rebuild logic is
contributed by the per-runtime crates.
Remote-network engine cores live in inference-remote-core.
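The split described above (shared supervision shape here, rebuild logic supplied by each runtime crate) can be illustrated with a small standalone sketch. It assumes that "two-tier" means a worker supervises an inner context, retries a rebuild through a runtime-provided hook, and escalates to its own supervisor when rebuilding keeps failing. The names `RebuildContext`, `Worker`, `Outcome`, and `FlakyGpu` are hypothetical and are not the rakka-core or inference-runtime API; real actors would exchange messages instead of calling methods directly.

```rust
/// Hypothetical hook a per-runtime crate would implement to rebuild the
/// inner context after it fails. Not the actual inference-runtime trait.
trait RebuildContext {
    type Context;
    fn rebuild(&mut self, attempt: u32) -> Result<Self::Context, String>;
}

/// What the worker (outer tier) decides after the context (inner tier) dies.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The context was rebuilt locally after `attempts` tries.
    Rebuilt { attempts: u32 },
    /// Rebuilding failed `attempts` times; escalate to the worker's supervisor.
    Escalate { attempts: u32 },
}

/// Shared supervision shape: generic over the runtime-specific rebuild logic.
struct Worker<R: RebuildContext> {
    rebuilder: R,
    context: Option<R::Context>,
    max_rebuilds: u32,
}

impl<R: RebuildContext> Worker<R> {
    fn new(rebuilder: R, max_rebuilds: u32) -> Self {
        Worker { rebuilder, context: None, max_rebuilds }
    }

    /// Called when the inner context reports a failure: drop it, try to
    /// rebuild up to `max_rebuilds` times, otherwise escalate.
    fn on_context_failed(&mut self) -> Outcome {
        self.context = None;
        for attempt in 1..=self.max_rebuilds {
            if let Ok(ctx) = self.rebuilder.rebuild(attempt) {
                self.context = Some(ctx);
                return Outcome::Rebuilt { attempts: attempt };
            }
        }
        Outcome::Escalate { attempts: self.max_rebuilds }
    }
}

/// Hypothetical runtime whose rebuild fails for the first `fail_first` attempts.
struct FlakyGpu {
    fail_first: u32,
}

impl RebuildContext for FlakyGpu {
    type Context = String;
    fn rebuild(&mut self, attempt: u32) -> Result<String, String> {
        if attempt <= self.fail_first {
            Err(format!("attempt {attempt} failed"))
        } else {
            Ok(format!("ctx#{attempt}"))
        }
    }
}
```

With `FlakyGpu { fail_first: 2 }` and a budget of three rebuilds, the worker recovers on the third attempt; with `fail_first: 5` it returns `Escalate`. Keeping the retry/escalate policy generic while the trait carries the runtime-specific part mirrors the division of responsibility the paragraph describes.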
Re-exports
pub use deployment_manager::DeploymentManagerActor;
pub use deployment_manager::DeploymentManagerMsg;
pub use deployment_manager::DeploymentRecord;
pub use deployment_manager::DeploymentState;
pub use dp_coordinator::DpCoordinatorActor;
pub use dp_coordinator::DpCoordinatorMsg;
pub use dp_coordinator::RouteTarget;
pub use engine_core::AddRequest;
pub use engine_core::EngineCoreActor;
pub use engine_core::EngineCoreMsg;
pub use engine_core::LocalEngineConfig;
pub use gateway::spawn_gateway;
pub use gateway::ApiGatewayActor;
pub use gateway::ApiGatewayMsg;
pub use gateway::GatewayConfig;
pub use metrics::DeploymentMetrics;
pub use metrics::FailureKind;
pub use metrics::MetricsActor;
pub use metrics::MetricsMsg;
pub use metrics::MetricsSnapshot;
pub use placement::DeploymentPlacementActor;
pub use placement::NodeAssignment;
pub use placement::PlacementConstraints;
pub use placement::PlacementError;
pub use placement::PlacementMsg;
pub use placement::PlacementResult;
pub use request::RequestActor;
pub use request::RequestMsg;
pub use request::Route;
pub use request::StreamingResponse;
pub use worker::ContextActor;
pub use worker::ContextMsg;
pub use worker::WorkerActor;
pub use worker::WorkerMsg;
pub use worker::WorkerSlot;
Modules
- deployment_manager: DeploymentManagerActor, the cluster-singleton owner of the deployment catalog. Doc §4. Manages create/update/delete and surfaces the current set to the gateway and DpCoordinatorActor.
- dp_coordinator: DpCoordinatorActor, one cluster-singleton per model. Doc §4, §6.1.
- engine_core: EngineCoreActor, the local-GPU per-replica orchestrator. Doc §4, §5.1.
- gateway: ApiGatewayActor, the HTTP gateway. Doc §4, §6.
- metrics: MetricsActor, in-process aggregation of per-deployment counters. Doc §7.7, §12.4.
- placement: DeploymentPlacementActor, picks nodes for new deployments. Doc §7.2.
- request: RequestActor, one per active client request. Doc §6.1, §6.2.
- worker: Local-GPU worker; two-tier supervision adapter (doc §4, §5.3).
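The deployment_manager entry describes a single owner of the deployment catalog that handles create/update/delete and hands the current set to other actors. A minimal sketch of that message-driven ownership pattern follows, using only `std::sync::mpsc` and a thread in place of an actor framework. The message enum `CatalogMsg`, the `spawn_catalog` function, and the "name to replica count" record are hypothetical simplifications; they are not the shape of `DeploymentManagerMsg` or `DeploymentRecord`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical catalog messages; a real deployment record carries far more state.
enum CatalogMsg {
    Create { name: String, replicas: u32 },
    Update { name: String, replicas: u32 },
    Delete { name: String },
    /// Ask for the current set; the reply goes back on the supplied channel.
    Snapshot { reply: Sender<Vec<(String, u32)>> },
}

/// Spawn the single owner of the catalog. Only this thread touches the map,
/// so no locking is needed; every change arrives as a message, in order.
fn spawn_catalog() -> (Sender<CatalogMsg>, thread::JoinHandle<()>) {
    let (tx, rx) = channel::<CatalogMsg>();
    let handle = thread::spawn(move || {
        let mut catalog: HashMap<String, u32> = HashMap::new();
        // The loop ends once every Sender for this channel has been dropped.
        for msg in rx {
            match msg {
                CatalogMsg::Create { name, replicas } => {
                    // Creating an existing deployment is ignored in this sketch.
                    catalog.entry(name).or_insert(replicas);
                }
                CatalogMsg::Update { name, replicas } => {
                    if let Some(r) = catalog.get_mut(&name) {
                        *r = replicas;
                    }
                }
                CatalogMsg::Delete { name } => {
                    catalog.remove(&name);
                }
                CatalogMsg::Snapshot { reply } => {
                    let mut set: Vec<(String, u32)> =
                        catalog.iter().map(|(k, v)| (k.clone(), *v)).collect();
                    set.sort();
                    // The requester may have gone away; ignore send errors.
                    let _ = reply.send(set);
                }
            }
        }
    });
    (tx, handle)
}
```

A consumer such as a gateway would send `Snapshot` with its own reply sender and read the sorted set from the matching receiver. Because one owner processes messages sequentially, a snapshot always reflects every create/update/delete sent before it on the same channel.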