Crate atomr_infer_core

§inference-core

Foundation types for the atomr-infer workspace. Per architecture doc v4 §10.4, this crate has no actor-system dependencies: it depends only on serde, thiserror, bytes, and secrecy, plus the documented async-trait exception for the ModelRunner trait.

Everything in here is consumed by inference-runtime (actor implementations) and the per-runtime crates. Authors of new runtime backends only need to depend on this crate to satisfy the ModelRunner contract.
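
To illustrate what "satisfying a runner contract" looks like from a backend author's side, here is a minimal, std-only sketch. It does not reproduce the real ModelRunner trait, whose methods are not listed on this page and which is async (via async-trait). `SketchRunner`, `EchoRunner`, and `run_with` are hypothetical names invented for this example.

```rust
// Hypothetical stand-in for a runner contract. The real `ModelRunner`
// trait is async and its method set is not shown on this page.
trait SketchRunner {
    /// Execute one prompt and return the produced tokens, or an error message.
    fn execute(&mut self, prompt: &str) -> Result<Vec<String>, String>;
}

/// A toy backend that "generates" by echoing the prompt's words back.
struct EchoRunner;

impl SketchRunner for EchoRunner {
    fn execute(&mut self, prompt: &str) -> Result<Vec<String>, String> {
        if prompt.is_empty() {
            return Err("empty prompt".to_string());
        }
        Ok(prompt.split_whitespace().map(str::to_string).collect())
    }
}

/// Caller-side code depends only on the trait, never on a concrete backend.
fn run_with(runner: &mut dyn SketchRunner, prompt: &str) -> Result<String, String> {
    Ok(runner.execute(prompt)?.join(" "))
}

fn main() {
    let mut runner = EchoRunner;
    match run_with(&mut runner, "hello from a toy backend") {
        Ok(text) => println!("output: {text}"),
        Err(err) => eprintln!("runner failed: {err}"),
    }
}
```

The point of the sketch is the dependency direction: the calling code sees only the trait, so a new backend can be added without touching the actor layer.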

Re-exports§

pub use batch::ExecuteBatch;
pub use batch::Message;
pub use batch::MessageContent;
pub use batch::Role;
pub use batch::SamplingParams;
pub use cost::CostEstimate;
pub use cost::EstimateCost;
pub use deployment::Budget;
pub use deployment::BudgetAction;
pub use deployment::CapacityPolicy;
pub use deployment::Deployment;
pub use deployment::RateLimits;
pub use deployment::Replica;
pub use deployment::RetryPolicy;
pub use deployment::Serving;
pub use deployment::Timeouts;
pub use error::InferenceError;
pub use error::InferenceResult;
pub use registry::infer_runtime;
pub use runner::ModelRunner;
pub use runner::RunHandle;
pub use runner::SessionRebuildCause;
pub use runner::WeightSource;
pub use runtime::CircuitBreakerConfig;
pub use runtime::JitterKind;
pub use runtime::ProviderKind;
pub use runtime::RuntimeConfig;
pub use runtime::RuntimeKind;
pub use runtime::TransportKind;
pub use tokens::FinishReason;
pub use tokens::TokenChunk;
pub use tokens::TokenUsage;
pub use tokens::Tokens;

Modules§

batch: Request batch — what the runtime executes.
cost: Cost-estimation primitives. Used by inference-pipeline’s TieredRouter and by MetricsActor for budget enforcement (doc §9.2, §12.4).
deployment: Deployment value object — the shared declarative surface for every local-GPU and remote-network backend (doc §11.1, §11.3).
error: InferenceError — the typed error surface that flows up to the RequestActor regardless of whether the bottleneck was GPU memory, GIL contention, or remote provider quota (doc §6.2).
registry: Default runtime selection — Deployment::infer_runtime() (doc §3.2).
runner: ModelRunner — the trait every runtime backend implements.
runtime: Runtime / transport / provider taxonomy and per-runtime configuration.
tokens: Output side: the streaming token chunks runners emit and the RequestActor accumulates.
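
The tokens module describes runners emitting streaming chunks that are accumulated downstream. The following std-only sketch shows one way such accumulation could work. The field layout of the real TokenChunk and FinishReason types is not shown on this page, so `SketchChunk`, `SketchFinish`, and `Accumulator` are hypothetical stand-ins, and stopping at the first chunk that carries a finish reason is an assumption of this example.

```rust
/// Hypothetical stand-in for a finish reason; variants are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum SketchFinish {
    Stop,
    Length,
}

/// Hypothetical stand-in for one streamed chunk of output.
struct SketchChunk {
    text: String,
    /// Set only on the final chunk of a stream.
    finish: Option<SketchFinish>,
}

/// Collects chunks into the full response text.
#[derive(Default)]
struct Accumulator {
    text: String,
    chunks: usize,
    finish: Option<SketchFinish>,
}

impl Accumulator {
    /// Append one chunk; returns true once a finish reason has been seen.
    fn push(&mut self, chunk: SketchChunk) -> bool {
        self.text.push_str(&chunk.text);
        self.chunks += 1;
        if chunk.finish.is_some() {
            self.finish = chunk.finish;
        }
        self.finish.is_some()
    }
}

/// Drain a stream of chunks, stopping at the first one that finishes it.
fn accumulate(chunks: impl IntoIterator<Item = SketchChunk>) -> Accumulator {
    let mut acc = Accumulator::default();
    for chunk in chunks {
        if acc.push(chunk) {
            break;
        }
    }
    acc
}

fn main() {
    let stream = vec![
        SketchChunk { text: "Hel".to_string(), finish: None },
        SketchChunk { text: "lo".to_string(), finish: Some(SketchFinish::Length) },
    ];
    let acc = accumulate(stream);
    println!("{} chunks, finish = {:?}, text = {}", acc.chunks, acc.finish, acc.text);
}
```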

Structs§

SecretBox: Wrapper type for values that contain secrets (e.g. passwords, cryptographic keys, access tokens, or other credentials). It attempts to limit accidental exposure and to ensure secrets are wiped from memory when dropped.

Traits§

ExposeSecret: Expose a reference to an inner secret

Type Aliases§

SecretString: Re-export of secrecy::SecretString so consumer crates do not need to take a direct dependency on secrecy. Architecturally significant: credentials are part of the type system from the bottom up (doc §12.5).
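
The idea of making credentials "part of the type system" can be sketched without any dependencies. The block below is a std-only stand-in, not the secrecy implementation: `RedactedString` and `ProviderCredentials` are hypothetical names, and the stand-in does not wipe memory on drop the way SecretBox is documented to. It only shows the two properties that matter to callers: debug output is redacted, and reading the value requires an explicit call.

```rust
use std::fmt;

/// Hypothetical stand-in for a secret string type. Unlike the real
/// `secrecy` types, this does NOT zero its memory on drop.
struct RedactedString(String);

impl RedactedString {
    fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }

    /// Access is explicit, so every read of the secret is easy to audit.
    fn expose(&self) -> &str {
        &self.0
    }
}

/// Debug never prints the inner value, so logging a struct that holds
/// a secret cannot leak it by accident.
impl fmt::Debug for RedactedString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("RedactedString([REDACTED])")
    }
}

/// Hypothetical credentials struct for a remote provider.
#[derive(Debug)]
struct ProviderCredentials {
    account: String,
    api_key: RedactedString,
}

fn main() {
    let creds = ProviderCredentials {
        account: "acme".to_string(),
        api_key: RedactedString::new("sk-demo-123"),
    };
    // Safe to log: the key is redacted in the derived Debug output.
    println!("{creds:?}");
    // The raw value is only reachable through the explicit accessor.
    println!("key length: {}", creds.api_key.expose().len());
}
```

With the re-exported SecretString, consumer crates get the same shape of API (an explicit ExposeSecret call to read the value) without depending on secrecy directly.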
