Crate atomr_infer_core


inference-core

Foundation types for the atomr-infer workspace. Per architecture doc v4 §10.4 this crate has no actor-system dependencies — only serde / thiserror / bytes / secrecy (plus the documented async-trait exception for the ModelRunner trait).

Everything in here is consumed by inference-runtime (actor implementations) and the per-runtime crates. Authors of new runtime backends only need to depend on this crate to satisfy the ModelRunner contract.

Re-exports

pub use batch::ExecuteBatch;
pub use batch::Message;
pub use batch::MessageContent;
pub use batch::Role;
pub use batch::SamplingParams;
pub use cost::CostEstimate;
pub use cost::EstimateCost;
pub use deployment::Budget;
pub use deployment::BudgetAction;
pub use deployment::CapacityPolicy;
pub use deployment::Deployment;
pub use deployment::RateLimits;
pub use deployment::Replica;
pub use deployment::RetryPolicy;
pub use deployment::Serving;
pub use deployment::Timeouts;
pub use error::InferenceError;
pub use error::InferenceResult;
pub use registry::infer_runtime;
pub use runner::ModelRunner;
pub use runner::RunHandle;
pub use runner::SessionRebuildCause;
pub use runner::WeightSource;
pub use runtime::CircuitBreakerConfig;
pub use runtime::JitterKind;
pub use runtime::ProviderKind;
pub use runtime::RuntimeConfig;
pub use runtime::RuntimeKind;
pub use runtime::TransportKind;
pub use tokens::FinishReason;
pub use tokens::TokenChunk;
pub use tokens::TokenUsage;
pub use tokens::Tokens;

Modules

batch
Request batch — what the runtime executes.
cost
Cost-estimation primitives. Used by inference-pipeline’s TieredRouter and by MetricsActor for budget enforcement (doc §9.2, §12.4).
deployment
Deployment value object — the shared declarative surface for every local-GPU and remote-network backend (doc §11.1, §11.3).
error
InferenceError — the typed error surface that flows up to the RequestActor regardless of whether the bottleneck was GPU memory, GIL contention, or remote provider quota (doc §6.2).
registry
Default runtime selection — Deployment::infer_runtime() (doc §3.2).
runner
ModelRunner — the trait every runtime backend implements.
runtime
Runtime / transport / provider taxonomy and per-runtime configuration.
tokens
Output side: the streaming token chunks runners emit and the RequestActor accumulates.

Structs

SecretBox
Wrapper type for values that contain secrets (e.g. passwords, cryptographic keys, access tokens, or other credentials). It attempts to limit accidental exposure and ensures secrets are wiped from memory when dropped.

Traits

ExposeSecret
Expose a reference to an inner secret.

Type Aliases

SecretString
Re-export of secrecy::SecretString so consumer crates do not need to take a direct dependency on secrecy. Architecturally significant: credentials are part of the type system from the bottom up (doc §12.5).
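The idea behind treating credentials as part of the type system can be illustrated with a small std-only sketch. It is not the secrecy implementation and not this crate's API: `Redacted`, `Expose`, `ProviderAuth`, and `debug_view` are hypothetical names invented for the illustration, and unlike SecretBox the stand-in does not wipe memory on drop. It only shows the two properties the descriptions above rely on: the Debug output never contains the secret, and reading the secret requires an explicit call.

```rust
use std::fmt;

/// Hypothetical stand-in for a secret wrapper (NOT `secrecy::SecretBox`).
/// Assumption for illustration only: it does not zero memory on drop.
struct Redacted(String);

/// Hypothetical analogue of an explicit-access trait such as `ExposeSecret`.
trait Expose {
    fn expose(&self) -> &str;
}

impl Expose for Redacted {
    fn expose(&self) -> &str {
        &self.0
    }
}

// The Debug impl prints a fixed placeholder instead of the inner value,
// so logging a struct that holds a `Redacted` cannot leak the credential.
impl fmt::Debug for Redacted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Redacted([REDACTED])")
    }
}

/// Hypothetical provider configuration that carries a credential.
#[derive(Debug)]
struct ProviderAuth {
    endpoint: String,
    api_key: Redacted,
}

/// Formats the whole config the way a log line would.
fn debug_view(auth: &ProviderAuth) -> String {
    format!("{:?}", auth)
}

fn main() {
    let auth = ProviderAuth {
        endpoint: String::from("https://example.invalid"),
        api_key: Redacted(String::from("sk-demo")),
    };

    // Safe to log: the key is replaced by the placeholder.
    println!("{}", debug_view(&auth));

    // Reading the secret is an explicit, greppable call.
    println!("endpoint {} uses a key of length {}", auth.endpoint, auth.api_key.expose().len());
}
```

In real consumer code the same shape would presumably use the re-exported SecretString and ExposeSecret from this crate rather than a hand-written wrapper, which is the point of the re-export: downstream crates get the type without declaring secrecy themselves.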