pub struct Deployment {
pub name: String,
pub model: String,
pub runtime: Option<RuntimeKind>,
pub runtime_config: Option<RuntimeConfig>,
pub gpus: Option<u32>,
pub replicas: u32,
pub serving: Serving,
pub budget: Option<Budget>,
pub idempotent: bool,
}
A model deployment. The runtime field selects the backend; every
other field has a runtime-agnostic interpretation. Local deployments
fill gpus; remote deployments leave it None and use serving’s
max_concurrent instead.
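The local/remote split can be made concrete with a small helper that reports which sizing knob applies. This is a minimal sketch under stated assumptions, not part of the documented API: Sizing and sizing() are hypothetical, Serving is reduced to a single u32 max_concurrent field (its real type is not shown on this page), and a set gpus is taken to mean a local deployment.

```rust
/// Hypothetical summary of how a deployment is sized.
#[derive(Debug, PartialEq, Eq)]
pub enum Sizing {
    /// Local runtimes: GPUs per replica, taken from `gpus`.
    GpusPerReplica(u32),
    /// Remote runtimes: request concurrency, taken from `serving.max_concurrent`.
    MaxConcurrent(u32),
}

/// Stand-in for Serving; only `max_concurrent` is modeled (type assumed).
pub struct Serving {
    pub max_concurrent: u32,
}

/// Trimmed-down Deployment carrying only the fields this sketch reads.
pub struct Deployment {
    pub gpus: Option<u32>,
    pub serving: Serving,
}

impl Deployment {
    /// Hypothetical helper: local deployments fill `gpus`, remote ones leave it
    /// `None` and fall back to `serving.max_concurrent`.
    pub fn sizing(&self) -> Sizing {
        match self.gpus {
            Some(n) => Sizing::GpusPerReplica(n),
            None => Sizing::MaxConcurrent(self.serving.max_concurrent),
        }
    }
}
```

Keying on gpus rather than on the runtime kind is a simplification; a real planner would more likely branch on effective_runtime().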
Fields
name: String
model: String
runtime: Option<RuntimeKind>
Optional explicit runtime. When omitted, infer_runtime picks one based on the model name (doc §3.2).
runtime_config: Option<RuntimeConfig>
Backend-specific configuration. When omitted, defaults are used.
gpus: Option<u32>
Local-only: number of GPUs per replica.
replicas: u32
Number of replicas (local: HA and scale-out; remote: independent worker pools, possibly with different API keys).
serving: Serving
budget: Option<Budget>
idempotent: bool
True for normal LLM inference; false to disable retries on non-idempotent stateful APIs (doc §12.3).
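The idempotent flag is easiest to read as a gate on the retry loop. The sketch below assumes a hypothetical call_with_retries helper and a caller-supplied max_attempts; neither appears in this API, and the real retry policy (doc §12.3) may differ in backoff and error classification.

```rust
/// Trimmed-down Deployment with only the field this sketch reads.
pub struct Deployment {
    pub idempotent: bool,
}

/// Hypothetical retry helper: runs `op` up to `max_attempts` times when the
/// deployment is idempotent, and exactly once otherwise, so a non-idempotent
/// stateful call is never replayed.
pub fn call_with_retries<T, E, F>(dep: &Deployment, max_attempts: u32, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    // Always make at least one attempt, even if max_attempts is 0.
    let attempts = if dep.idempotent { max_attempts.max(1) } else { 1 };
    let mut last_err = None;
    for _ in 0..attempts {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => last_err = Some(e),
        }
    }
    // attempts >= 1 and every failure stored its error, so this is Some.
    Err(last_err.expect("at least one attempt was made"))
}
```

No backoff or error classification is modeled here; the point is only that idempotent = false collapses the attempt budget to one.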
Implementations
impl Deployment
pub fn effective_runtime(&self) -> RuntimeKind
Effective runtime kind: explicit override wins, otherwise infer from the model name (doc §3.2).
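A minimal sketch of the override-then-infer rule, assuming simplified stand-ins: RuntimeKind is reduced to two hypothetical variants (Local, Remote) and derives Copy, and infer_runtime uses a placeholder "api:" prefix test because the actual doc §3.2 rule is not shown on this page.

```rust
/// Simplified stand-in for RuntimeKind (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuntimeKind {
    Local,
    Remote,
}

/// Placeholder for the doc §3.2 inference rule (hypothetical): a model name
/// with an "api:" prefix is treated as remote, anything else as local.
pub fn infer_runtime(model: &str) -> RuntimeKind {
    if model.starts_with("api:") {
        RuntimeKind::Remote
    } else {
        RuntimeKind::Local
    }
}

/// Trimmed-down Deployment carrying only the fields this method reads.
pub struct Deployment {
    pub model: String,
    pub runtime: Option<RuntimeKind>,
}

impl Deployment {
    /// Explicit override wins; otherwise infer from the model name.
    pub fn effective_runtime(&self) -> RuntimeKind {
        // Option<RuntimeKind> is Copy here, so this does not move out of self.
        self.runtime.unwrap_or_else(|| infer_runtime(&self.model))
    }
}
```

If the real RuntimeKind is not Copy, the same shape works with self.runtime.clone().unwrap_or_else(...).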
pub fn validate(&self) -> Result<(), DeploymentValidationError>
Cheap structural validation done at deploy time. Heavier checks
(provider tier limits, network egress) live in inference-runtime
where we can perform IO.
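The kind of IO-free check validate performs can be sketched from the field docs above. Everything here is an assumption: the error variants are hypothetical, RuntimeKind and Serving are simplified stand-ins, runtime is taken as already resolved (e.g. via effective_runtime()), and treating Some(0) GPUs as invalid for a local deployment is an illustrative rule, not a documented one.

```rust
/// Simplified stand-in for RuntimeKind (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuntimeKind {
    Local,
    Remote,
}

/// Stand-in for Serving; only max_concurrent is modeled (type assumed).
pub struct Serving {
    pub max_concurrent: u32,
}

/// Trimmed-down Deployment; `runtime` is assumed to be already resolved.
pub struct Deployment {
    pub name: String,
    pub runtime: RuntimeKind,
    pub gpus: Option<u32>,
    pub replicas: u32,
    pub serving: Serving,
}

/// Hypothetical error variants; the real DeploymentValidationError may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum DeploymentValidationError {
    EmptyName,
    ZeroReplicas,
    LocalWithoutGpus,
    RemoteWithGpus,
    ZeroMaxConcurrent,
}

impl Deployment {
    /// Pure structural checks: nothing here touches the network or a provider.
    pub fn validate(&self) -> Result<(), DeploymentValidationError> {
        if self.name.trim().is_empty() {
            return Err(DeploymentValidationError::EmptyName);
        }
        if self.replicas == 0 {
            return Err(DeploymentValidationError::ZeroReplicas);
        }
        match (self.runtime, self.gpus) {
            // Assumption: a local deployment needs at least one GPU per replica.
            (RuntimeKind::Local, None) | (RuntimeKind::Local, Some(0)) => {
                Err(DeploymentValidationError::LocalWithoutGpus)
            }
            // Remote deployments leave gpus unset.
            (RuntimeKind::Remote, Some(_)) => Err(DeploymentValidationError::RemoteWithGpus),
            // Remote deployments are sized by serving.max_concurrent instead.
            (RuntimeKind::Remote, None) if self.serving.max_concurrent == 0 => {
                Err(DeploymentValidationError::ZeroMaxConcurrent)
            }
            _ => Ok(()),
        }
    }
}
```

Checks that need IO, such as provider tier limits or network egress, deliberately have no counterpart here.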
Trait Implementations
impl Clone for Deployment
fn clone(&self) -> Deployment
Returns a copy of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.