Crate atomr_infer_remote_core


§inference-remote-core

Shared remote-runtime infrastructure (doc §5, §10.3, §12). Provides the HTTP-shaped analog of the local-GPU WorkerActor / EngineCoreActor pair, plus the cross-cutting concerns that the GPU side doesn’t need (rate limiting, circuit breaking, credential refresh, SSE parsing, retry/backoff, error classification, cost aggregation).

Per-provider crates (inference-runtime-openai, -anthropic, -gemini, -litellm) depend on this crate and contribute one ModelRunner impl plus a RuntimeConfig shape.
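
As an illustration of the retry/backoff concern listed above, the following is a minimal sketch of exponential backoff with jitter. It is not this crate's implementation: `SketchPolicy` and `sketch_backoff` are hypothetical names, and the real `BackoffPolicy` and `compute_backoff` may have different fields and signatures. The jitter sample is passed in by the caller so the function stays deterministic and needs only the standard library.

```rust
use std::time::Duration;

/// Hypothetical policy shape for this sketch only; the crate's
/// `BackoffPolicy` may carry different fields.
#[derive(Debug, Clone, Copy)]
pub struct SketchPolicy {
    /// Delay used for the first retry (attempt 0).
    pub base: Duration,
    /// Upper bound applied before jitter.
    pub max: Duration,
    /// Multiplier applied once per attempt (2.0 doubles the delay).
    pub factor: f64,
}

/// Delay to wait before retry number `attempt` (0-based).
///
/// `jitter` is a caller-supplied sample in [0.0, 1.0]. The capped delay is
/// scaled by it, so 1.0 yields the full delay and 0.0 yields no delay
/// (the "full jitter" variant). Values outside the range are clamped.
pub fn sketch_backoff(policy: SketchPolicy, attempt: u32, jitter: f64) -> Duration {
    // Bound the exponent so the multiplication cannot overflow to infinity.
    let exponent = attempt.min(32) as i32;
    let uncapped = policy.base.as_secs_f64() * policy.factor.powi(exponent);
    let capped = uncapped.min(policy.max.as_secs_f64());
    Duration::from_secs_f64(capped * jitter.clamp(0.0, 1.0))
}
```

In a retry loop the caller would draw `jitter` from a random source on each attempt. Capping before applying jitter keeps every delay at or below `max`, which is one common choice among several.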

Re-exports§

pub use backoff::compute_backoff;
pub use backoff::BackoffPolicy;
pub use circuit_breaker::CircuitBreakerActor;
pub use circuit_breaker::CircuitBreakerHandle;
pub use circuit_breaker::CircuitState;
pub use classify::classify_http_status;
pub use classify::parse_retry_after;
pub use engine::AddRequest;
pub use engine::EngineMetrics;
pub use engine::EngineMsg;
pub use engine::RemoteEngineConfig;
pub use engine::RemoteEngineCoreActor;
pub use http::build_client;
pub use http::HttpClient;
pub use queue::Priority;
pub use queue::PriorityRequest;
pub use queue::RequestQueue;
pub use rate_limit::AcquirePermit;
pub use rate_limit::Permit;
pub use rate_limit::RateLimiterActor;
pub use rate_limit::RateLimiterHandle;
pub use rate_limit::StrictRateLimiterActor;
pub use retry::Attempt;
pub use retry::RetryDecision;
pub use retry::RetryEngine;
pub use session::CredentialProvider;
pub use session::RemoteSessionActor;
pub use session::SessionConfig;
pub use session::SessionRebuildRequest;
pub use session::SessionSnapshot;
pub use session::StaticApiKey;
pub use sse::decode_sse_stream;
pub use sse::SseChunk;
pub use worker::RemoteWorkerActor;
pub use worker::WorkerMsg;
pub use worker::WorkerSlot;

Modules§

backoff
Exponential backoff + jitter, mirroring atomr’s pattern::backoff::BackoffOptions shape but specialised for the per-request retry loop inside RemoteWorkerActor.
circuit_breaker
Circuit-breaker actor (doc §3.5, §12.2). One per (provider, endpoint).
classify
HTTP-status → typed InferenceError classification.
engine
RemoteEngineCoreActor — per-replica HTTP orchestrator. Doc §5.1.
http
HTTP/2 client construction and shared types. Doc §3.5, §5.8.
queue
Bounded priority queue for RemoteEngineCoreActor. Per doc §5.2 the queue is a module, not an actor — every per-message hop would add mailbox latency for no architectural payoff.
rate_limit
Rate-limiter actors. Doc §3.5, §12.1.
retry
Per-request retry decision logic. Doc §3.5 (Backoff on 429), §12.3.
session
RemoteSessionActor — analog of the local ContextActor (CUDA §5.11).
sse
Provider-agnostic SSE chunk parsing. Every provider’s stream is framed identically: each event is a line of the form data: <json>, events are separated by blank lines, and the stream is terminated by data: [DONE]. Only the inner JSON shape differs, so per-provider crates layer concrete types on top.
worker
RemoteWorkerActor — one per concurrent slot. Doc §5.1, §5.8.
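
The SSE framing described for the sse module can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: `SketchEvent` and `sketch_parse_sse` are hypothetical names, the input is a fully buffered body rather than the incremental stream that `decode_sse_stream` presumably consumes, and the JSON payload is returned as an unparsed string.

```rust
/// Hypothetical event type for this sketch; the crate's `SseChunk`
/// may be shaped differently.
#[derive(Debug, PartialEq)]
pub enum SketchEvent {
    /// Raw payload of one event, with the `data:` prefix removed.
    Data(String),
    /// The `data: [DONE]` terminator.
    Done,
}

/// Split a fully buffered SSE body into events.
///
/// Consecutive `data:` lines are joined with '\n', a blank line ends the
/// current event, and parsing stops at the `[DONE]` sentinel.
pub fn sketch_parse_sse(body: &str) -> Vec<SketchEvent> {
    let mut events = Vec::new();
    let mut payload: Vec<&str> = Vec::new();

    // Chain one empty line so a final event without a trailing blank
    // line is still flushed.
    for line in body.lines().chain(std::iter::once("")) {
        if line.is_empty() {
            if payload.is_empty() {
                continue; // stray blank line between events
            }
            let joined = payload.join("\n");
            payload.clear();
            if joined == "[DONE]" {
                events.push(SketchEvent::Done);
                break;
            }
            events.push(SketchEvent::Data(joined));
        } else if let Some(rest) = line.strip_prefix("data:") {
            // A single space after the colon is optional and is dropped.
            payload.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Other lines (comments, `event:`, `id:`) are ignored here.
    }
    events
}
```

A per-provider layer would then deserialize each `Data` payload into its own response type. Anything after the terminator is discarded in this sketch.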