§inference-remote-core
Shared remote-runtime infrastructure (doc §5, §10.3, §12). Provides
the HTTP-shaped analog of the local-GPU WorkerActor /
EngineCoreActor pair, plus the cross-cutting concerns that the
GPU side doesn’t need (rate limiting, circuit breaking, credential
refresh, SSE parsing, retry/backoff, error classification, cost
aggregation).
Per-provider crates (inference-runtime-openai, -anthropic,
-gemini, -litellm) depend on this crate and contribute one
ModelRunner impl plus a RuntimeConfig shape.
Re-exports§
pub use backoff::compute_backoff;
pub use backoff::BackoffPolicy;
pub use circuit_breaker::CircuitBreakerActor;
pub use circuit_breaker::CircuitBreakerHandle;
pub use circuit_breaker::CircuitState;
pub use classify::classify_http_status;
pub use classify::parse_retry_after;
pub use engine::AddRequest;
pub use engine::EngineMetrics;
pub use engine::EngineMsg;
pub use engine::RemoteEngineConfig;
pub use engine::RemoteEngineCoreActor;
pub use http::build_client;
pub use http::HttpClient;
pub use queue::Priority;
pub use queue::PriorityRequest;
pub use queue::RequestQueue;
pub use rate_limit::AcquirePermit;
pub use rate_limit::Permit;
pub use rate_limit::RateLimiterActor;
pub use rate_limit::RateLimiterHandle;
pub use rate_limit::StrictRateLimiterActor;
pub use retry::Attempt;
pub use retry::RetryDecision;
pub use retry::RetryEngine;
pub use session::CredentialProvider;
pub use session::RemoteSessionActor;
pub use session::SessionConfig;
pub use session::SessionRebuildRequest;
pub use session::SessionSnapshot;
pub use session::StaticApiKey;
pub use sse::decode_sse_stream;
pub use sse::SseChunk;
pub use worker::RemoteWorkerActor;
pub use worker::WorkerMsg;
pub use worker::WorkerSlot;
Modules§
- backoff: Exponential backoff + jitter, mirroring atomr’s pattern::backoff::BackoffOptions shape but specialised for the per-request retry loop inside RemoteWorkerActor.
- circuit_breaker: Circuit-breaker actor (doc §3.5, §12.2). One per (provider, endpoint).
- classify: HTTP-status → typed InferenceError classification.
- engine: RemoteEngineCoreActor, the per-replica HTTP orchestrator. Doc §5.1.
- http: HTTP/2 client construction and shared types. Doc §3.5, §5.8.
- queue: Bounded priority queue for RemoteEngineCoreActor. Per doc §5.2 the queue is a module, not an actor, because every per-message hop would add mailbox latency for no architectural payoff.
- rate_limit: Rate-limiter actors. Doc §3.5, §12.1.
- retry: Per-request retry decision logic. Doc §3.5 (Backoff on 429), §12.3.
- session: RemoteSessionActor, the analog of the local ContextActor (CUDA §5.11).
- sse: Provider-agnostic SSE chunk parsing. Every provider’s stream is framed identically (lines beginning data: <json> separated by blank lines, terminated by data: [DONE]); only the inner JSON shape differs. Per-provider crates layer concrete types on top.
- worker: RemoteWorkerActor, one per concurrent slot. Doc §5.1, §5.8.
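The backoff module is described as exponential backoff plus jitter. A minimal sketch of that idea follows. `SketchBackoffPolicy`, its fields, and `sketch_backoff` are hypothetical names invented for illustration; they are not the crate's `BackoffPolicy` or `compute_backoff`, whose actual signatures are not shown on this page. The random source is passed in as a plain number so the sketch stays dependency-free and deterministic.

```rust
use std::time::Duration;

/// Hypothetical policy shape (an assumption); the crate's real
/// `BackoffPolicy` may carry different fields.
#[derive(Debug, Clone, Copy)]
pub struct SketchBackoffPolicy {
    /// Delay before the first retry.
    pub base: Duration,
    /// Upper bound applied to the exponential term, before jitter.
    pub max: Duration,
    /// Multiplier applied once per attempt (2.0 doubles the delay).
    pub factor: f64,
    /// Jitter spread as a fraction of the capped delay, in 0.0..=1.0.
    pub jitter: f64,
}

/// Computes `min(base * factor^attempt, max)` and then scales it by a
/// factor in `[1 - jitter, 1 + jitter]` chosen by `unit_random`.
///
/// `unit_random` is expected to be a uniform sample in `[0, 1]` supplied by
/// the caller. Inputs are assumed to be finite numbers.
pub fn sketch_backoff(policy: &SketchBackoffPolicy, attempt: u32, unit_random: f64) -> Duration {
    // Clamp the exponent so the cast to i32 cannot wrap.
    let exponent = attempt.min(64) as i32;
    let raw = policy.base.as_secs_f64() * policy.factor.powi(exponent);
    // Cap the exponential growth; an infinite `raw` also collapses to `max`.
    let capped = raw.min(policy.max.as_secs_f64());

    let u = unit_random.clamp(0.0, 1.0);
    let spread = policy.jitter.clamp(0.0, 1.0);
    // u = 0.5 leaves the delay unchanged; u = 0 and u = 1 hit the extremes.
    let scaled = capped * (1.0 + spread * (2.0 * u - 1.0));

    Duration::from_secs_f64(scaled.max(0.0))
}
```

With `base = 100ms`, `factor = 2.0`, and `unit_random = 0.5`, attempt 3 yields roughly 800 ms. Because the cap is applied before jitter in this sketch, the result can exceed `max` by up to the jitter fraction; capping after jitter is an equally reasonable choice.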
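The classify module maps HTTP statuses to typed errors, and the re-exports include a `parse_retry_after` helper. The sketch below illustrates that kind of mapping. `SketchErrorClass`, `sketch_classify_status`, `sketch_is_retryable`, and `sketch_parse_retry_after` are hypothetical; the variants of the real `InferenceError` and the exact status grouping used by the crate are not documented here, so the grouping below is an assumption. Only the delta-seconds form of `Retry-After` is handled; the HTTP-date form is left out to avoid a date-parsing dependency.

```rust
use std::time::Duration;

/// Hypothetical error classes (an assumption, not the crate's `InferenceError`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchErrorClass {
    /// 429: the provider asked us to slow down.
    RateLimited,
    /// 401 / 403: credentials are missing, expired, or insufficient.
    Auth,
    /// Any other 4xx: the request itself is at fault.
    BadRequest,
    /// 5xx: the provider failed; usually worth retrying.
    ServerError,
    /// Anything else that is not a 2xx success.
    Other,
}

/// Returns `None` for 2xx responses and an error class otherwise.
pub fn sketch_classify_status(status: u16) -> Option<SketchErrorClass> {
    match status {
        200..=299 => None,
        401 | 403 => Some(SketchErrorClass::Auth),
        429 => Some(SketchErrorClass::RateLimited),
        400..=499 => Some(SketchErrorClass::BadRequest),
        500..=599 => Some(SketchErrorClass::ServerError),
        _ => Some(SketchErrorClass::Other),
    }
}

/// In this sketch only rate limits and server errors are retried.
pub fn sketch_is_retryable(class: SketchErrorClass) -> bool {
    matches!(
        class,
        SketchErrorClass::RateLimited | SketchErrorClass::ServerError
    )
}

/// Parses the delta-seconds form of a `Retry-After` header value.
/// The HTTP-date form is not handled and yields `None`.
pub fn sketch_parse_retry_after(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}
```

A retry loop could combine the two: classify the status, and when the class is retryable, prefer the server-provided `Retry-After` delay over the locally computed backoff.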
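The queue module is described as a bounded priority queue implemented as a plain module rather than an actor. One way such a structure could look is sketched below using `std::collections::BinaryHeap`. `SketchPriority`, `SketchQueue`, and their methods are hypothetical and do not reflect the crate's `Priority`, `PriorityRequest`, or `RequestQueue`. Rejecting pushes when full and FIFO ordering within a priority level are assumptions of this sketch, not documented behaviour.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical priority levels; later variants compare as greater.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SketchPriority {
    Low,
    Normal,
    High,
}

/// Heap entry ordered by priority, then by insertion order (oldest first).
struct Entry<T> {
    priority: SketchPriority,
    seq: u64,
    item: T,
}

impl<T> PartialEq for Entry<T> {
    fn eq(&self, other: &Self) -> bool {
        self.priority == other.priority && self.seq == other.seq
    }
}
impl<T> Eq for Entry<T> {}
impl<T> PartialOrd for Entry<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl<T> Ord for Entry<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap: higher priority wins, and within one
        // priority the smaller sequence number must compare as greater.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

/// Bounded priority queue; a full queue rejects new items (assumption).
pub struct SketchQueue<T> {
    heap: BinaryHeap<Entry<T>>,
    capacity: usize,
    next_seq: u64,
}

impl<T> SketchQueue<T> {
    pub fn new(capacity: usize) -> Self {
        Self { heap: BinaryHeap::new(), capacity, next_seq: 0 }
    }

    /// Enqueues `item`, or hands it back in `Err` when the queue is full.
    pub fn push(&mut self, priority: SketchPriority, item: T) -> Result<(), T> {
        if self.heap.len() >= self.capacity {
            return Err(item);
        }
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(Entry { priority, seq, item });
        Ok(())
    }

    /// Removes the highest-priority, oldest item.
    pub fn pop(&mut self) -> Option<T> {
        self.heap.pop().map(|entry| entry.item)
    }

    pub fn len(&self) -> usize {
        self.heap.len()
    }
}
```

Because the queue is an ordinary value owned by whoever drives it, `push` and `pop` are direct method calls with no mailbox hop, which matches the rationale given for keeping it a module.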
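The sse module description gives the framing rule: `data: <json>` lines separated by blank lines, terminated by `data: [DONE]`. The sketch below applies that rule to a fully buffered response body. `SketchSseEvent` and `sketch_parse_sse` are hypothetical names; the crate's `decode_sse_stream` and `SseChunk` presumably operate on an incremental byte stream, which this synchronous, whole-buffer version does not attempt to reproduce. The payload is kept as an opaque string, since the inner JSON shape is provider-specific.

```rust
/// Hypothetical event type (an assumption, not the crate's `SseChunk`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SketchSseEvent {
    /// Raw payload of one event, typically a JSON document.
    Data(String),
    /// The `data: [DONE]` terminator.
    Done,
}

/// Parses a complete SSE body into events, stopping at `[DONE]`.
/// Fields other than `data:` (such as `event:` or `id:`) and comment
/// lines starting with `:` are ignored in this sketch.
pub fn sketch_parse_sse(body: &str) -> Vec<SketchSseEvent> {
    let mut events = Vec::new();
    let mut pending: Vec<&str> = Vec::new();

    // `str::lines` splits on "\n" and also strips a trailing "\r".
    for line in body.lines() {
        if line.is_empty() {
            // A blank line closes the current event.
            if flush(&mut pending, &mut events) {
                return events;
            }
            continue;
        }
        if let Some(rest) = line.strip_prefix("data:") {
            // A single optional space follows the colon.
            pending.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
    }

    // Tolerate a final event that lacks its trailing blank line.
    flush(&mut pending, &mut events);
    events
}

/// Emits the buffered event, if any. Returns true when it was `[DONE]`.
fn flush(pending: &mut Vec<&str>, events: &mut Vec<SketchSseEvent>) -> bool {
    if pending.is_empty() {
        return false;
    }
    // Multiple `data:` lines in one event are joined with newlines.
    let payload = pending.join("\n");
    pending.clear();
    if payload == "[DONE]" {
        events.push(SketchSseEvent::Done);
        true
    } else {
        events.push(SketchSseEvent::Data(payload));
        false
    }
}
```

A per-provider crate would then deserialize each `Data` payload into its own concrete chunk type, which is the layering the module description refers to.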