§inference-core
Foundation types for the atomr-infer workspace. Per architecture
doc v4 §10.4 this crate has no actor-system dependencies — only
serde / thiserror / bytes / secrecy (plus the documented async-trait
exception for the ModelRunner trait).
Everything in here is consumed by inference-runtime (actor
implementations) and the per-runtime crates. Authors of new runtime
backends only need to depend on this crate to satisfy the
ModelRunner contract.
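The dependency boundary described above can be illustrated with a minimal sketch in which a backend satisfies a runner trait using only types from a foundation module. Everything here is an assumption made for illustration: `core_sketch`, `Runner`, `Batch`, `Chunk`, `RunError`, `EchoBackend` and `run_with` are hypothetical stand-ins, not the crate's API. The trait is synchronous for brevity, whereas the real ModelRunner is async (via async-trait) and its actual signature is not shown on this page.

```rust
// Hypothetical stand-in for the foundation crate: plain data types plus one
// trait, with no actor-system dependencies.
pub mod core_sketch {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Batch {
        pub prompts: Vec<String>,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct Chunk {
        pub text: String,
    }

    #[derive(Debug, PartialEq)]
    pub enum RunError {
        EmptyBatch,
    }

    // Simplified, synchronous stand-in for the runner contract (assumption).
    pub trait Runner {
        fn execute(&mut self, batch: &Batch) -> Result<Vec<Chunk>, RunError>;
    }
}

// Hypothetical backend crate: it imports only from the foundation module.
pub mod echo_backend {
    use super::core_sketch::{Batch, Chunk, RunError, Runner};

    pub struct EchoBackend;

    impl Runner for EchoBackend {
        fn execute(&mut self, batch: &Batch) -> Result<Vec<Chunk>, RunError> {
            if batch.prompts.is_empty() {
                return Err(RunError::EmptyBatch);
            }
            // Produce one upper-cased chunk per prompt.
            Ok(batch
                .prompts
                .iter()
                .map(|p| Chunk { text: p.to_uppercase() })
                .collect())
        }
    }
}

// Caller code (standing in for the runtime layer) sees only the trait object.
pub fn run_with(
    runner: &mut dyn core_sketch::Runner,
    prompts: &[&str],
) -> Result<Vec<String>, core_sketch::RunError> {
    let batch = core_sketch::Batch {
        prompts: prompts.iter().map(|s| s.to_string()).collect(),
    };
    Ok(runner.execute(&batch)?.into_iter().map(|c| c.text).collect())
}
```

In this arrangement the caller never names `EchoBackend` in its signatures, so a new backend can be added without touching the layer that drives it.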
Re-exports§
pub use batch::ExecuteBatch;
pub use batch::Message;
pub use batch::MessageContent;
pub use batch::Role;
pub use batch::SamplingParams;
pub use cost::CostEstimate;
pub use cost::EstimateCost;
pub use deployment::Budget;
pub use deployment::BudgetAction;
pub use deployment::CapacityPolicy;
pub use deployment::Deployment;
pub use deployment::RateLimits;
pub use deployment::Replica;
pub use deployment::RetryPolicy;
pub use deployment::Serving;
pub use deployment::Timeouts;
pub use error::InferenceError;
pub use error::InferenceResult;
pub use registry::infer_runtime;
pub use runner::ModelRunner;
pub use runner::RunHandle;
pub use runner::SessionRebuildCause;
pub use runner::WeightSource;
pub use runtime::CircuitBreakerConfig;
pub use runtime::JitterKind;
pub use runtime::ProviderKind;
pub use runtime::RuntimeConfig;
pub use runtime::RuntimeKind;
pub use runtime::TransportKind;
pub use tokens::FinishReason;
pub use tokens::TokenChunk;
pub use tokens::TokenUsage;
pub use tokens::Tokens;
Modules§
- batch: The request batch, i.e. what the runtime executes.
- cost: Cost-estimation primitives. Used by inference-pipeline's TieredRouter and by MetricsActor for budget enforcement (doc §9.2, §12.4).
- deployment: Deployment value object, the shared declarative surface for every local-GPU and remote-network backend (doc §11.1, §11.3).
- error: InferenceError, the typed error surface that flows up to the RequestActor regardless of whether the bottleneck was GPU memory, GIL contention, or remote provider quota (doc §6.2).
- registry: Default runtime selection: Deployment::infer_runtime() (doc §3.2).
- runner: ModelRunner, the trait every runtime backend implements.
- runtime: Runtime / transport / provider taxonomy and per-runtime configuration.
- tokens: Output side: the streaming token chunks that runners emit and the RequestActor accumulates.
Structs§
- SecretBox: Wrapper type for values that contain secrets (e.g. passwords, cryptographic keys, access tokens or other credentials). It attempts to limit accidental exposure and ensures secrets are wiped from memory when dropped.
Traits§
- ExposeSecret: Expose a reference to an inner secret.
Type Aliases§
- SecretString: Re-export of secrecy::SecretString so consumer crates do not need to take a direct dependency on secrecy. Architecturally significant: credentials are part of the type system from the bottom up (doc §12.5).
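The idea that credentials are part of the type system can be sketched without the secrecy crate itself. The following is a std-only stand-in, not secrecy's implementation: `RedactedString`, `ApiCredentials` and `auth_header` are hypothetical names, and unlike the real SecretBox this sketch does not wipe memory on drop. It only shows the two properties a caller relies on: debug output never prints the value, and reading it requires an explicit call.

```rust
use std::fmt;

// Hypothetical stand-in for a secret string type (assumption, not secrecy's API).
pub struct RedactedString(String);

impl RedactedString {
    pub fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }

    // Access is explicit, so every read of the secret is visible at the call site.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

// Manual Debug impl: the inner value is never formatted.
impl fmt::Debug for RedactedString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("RedactedString([REDACTED])")
    }
}

// A config struct can derive Debug safely because the secret field redacts itself.
#[derive(Debug)]
pub struct ApiCredentials {
    pub provider: String,
    pub api_key: RedactedString,
}

// The only place the raw key is read is where it is actually needed.
pub fn auth_header(creds: &ApiCredentials) -> String {
    format!("Bearer {}", creds.api_key.expose())
}
```

Because the redaction lives in the field's type rather than in each logging call, any struct that embeds the secret inherits the protection when it derives `Debug`.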