pub struct RoutingConfig {
pub mode: String,
pub confidence_threshold: f64,
pub local_first: bool,
pub cost_aware: bool,
pub estimated_output_tokens: u32,
pub accuracy_floor: f64,
pub accuracy_min_obs: usize,
pub cost_weight: Option<f64>,
pub canary_model: Option<String>,
pub canary_fraction: f64,
pub blocked_models: Vec<String>,
pub per_provider_timeout_seconds: u64,
pub max_total_inference_seconds: u64,
pub max_fallback_attempts: usize,
}
Fields
mode: String
confidence_threshold: f64
local_first: bool
cost_aware: bool
estimated_output_tokens: u32
accuracy_floor: f64
Minimum observed quality score (0.0–1.0) for a model to be considered
during metascore routing. Models with fewer than accuracy_min_obs
observations are exempt (insufficient data). Set to 0.0 to disable.
accuracy_min_obs: usize
Minimum observations before the accuracy floor applies to a model.
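The interaction between accuracy_floor and accuracy_min_obs can be sketched as below. This is only an illustration under stated assumptions, not the crate's implementation: the ModelStats type and the passes_accuracy_floor function are hypothetical names, and treating a score exactly at the floor as passing is an assumption.

```rust
/// Hypothetical per-model statistics; not part of the documented API.
struct ModelStats {
    /// Number of quality observations recorded for the model.
    observations: usize,
    /// Observed quality score in 0.0..=1.0.
    quality_score: f64,
}

/// Returns true if the model stays eligible for metascore routing.
fn passes_accuracy_floor(
    stats: &ModelStats,
    accuracy_floor: f64,
    accuracy_min_obs: usize,
) -> bool {
    // A floor of 0.0 disables the check entirely.
    if accuracy_floor <= 0.0 {
        return true;
    }
    // Too few observations: the model is exempt (insufficient data).
    if stats.observations < accuracy_min_obs {
        return true;
    }
    // Assumption: a score exactly at the floor still qualifies.
    stats.quality_score >= accuracy_floor
}
```

Under this reading, a newly added model with only a handful of observations is never filtered out by the floor, however low its early scores are.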
cost_weight: Option<f64>
Custom cost weight for metascore [0.0–1.0]. When set, replaces the
binary cost_aware toggle with a continuous dial: 0.0 = ignore cost,
1.0 = maximize savings. Efficacy weight adjusts inversely.
When None, falls back to cost_aware boolean behavior.
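One way to read the relationship between cost_weight and cost_aware is sketched below. The function resolve_weights and both constants are hypothetical. The documentation does not state which weights the boolean fallback uses, so the constants are placeholders. Clamping out-of-range values and computing the efficacy weight as 1.0 minus the cost weight are also assumptions.

```rust
/// Placeholder weights for the boolean fallback. The real values are not
/// stated in the field documentation.
const ASSUMED_COST_AWARE_WEIGHT: f64 = 0.5;
const ASSUMED_COST_UNAWARE_WEIGHT: f64 = 0.0;

/// Returns (cost_weight, efficacy_weight) for metascore computation.
fn resolve_weights(cost_weight: Option<f64>, cost_aware: bool) -> (f64, f64) {
    let cost = match cost_weight {
        // An explicit dial takes precedence over the boolean toggle.
        // Assumption: out-of-range values are clamped into [0.0, 1.0].
        Some(w) => w.clamp(0.0, 1.0),
        // No dial set: fall back to the boolean behavior.
        None if cost_aware => ASSUMED_COST_AWARE_WEIGHT,
        None => ASSUMED_COST_UNAWARE_WEIGHT,
    };
    // Assumption: "adjusts inversely" means the two weights sum to 1.0.
    (cost, 1.0 - cost)
}
```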
canary_model: Option<String>
Canary model to route a fraction of traffic through for A/B validation.
When set, canary_fraction of requests are routed to this model instead
of the metascore winner. Set to None to disable canary routing.
canary_fraction: f64
Fraction of requests routed to the canary model [0.0–1.0].
Only effective when canary_model is set. Default: 0.0 (disabled).
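A minimal sketch of how canary_model and canary_fraction might combine for a single request follows. The pick_model function is hypothetical, and so is the idea that the caller supplies a uniform sample (roll) in [0.0, 1.0). How the crate actually samples is not documented here.

```rust
/// Picks the model for one request.
///
/// `roll` is a uniform sample in [0.0, 1.0) supplied by the caller
/// (an assumption; the crate's own sampling is not documented).
fn pick_model<'a>(
    metascore_winner: &'a str,
    canary_model: Option<&'a str>,
    canary_fraction: f64,
    roll: f64,
) -> &'a str {
    match canary_model {
        // Route to the canary only for the configured share of requests.
        Some(canary) if roll < canary_fraction => canary,
        // No canary configured, or this request fell outside the fraction.
        _ => metascore_winner,
    }
}
```

With canary_fraction at its default of 0.0, the guard `roll < canary_fraction` is never true, so every request goes to the metascore winner even when canary_model is set.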
blocked_models: Vec<String>
Static model blocklist: models listed here are unconditionally excluded from all routing paths (override, metascore, fallback). Useful as an instant kill-switch without restarting the server.
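The blocklist can be pictured as a filter applied to the candidate list before any routing path runs. The filter_blocked function below is a hypothetical helper, and matching on exact model name is an assumption.

```rust
/// Removes blocked models from a candidate list.
/// Assumption: a model is blocked only on an exact name match.
fn filter_blocked(candidates: Vec<String>, blocked_models: &[String]) -> Vec<String> {
    candidates
        .into_iter()
        .filter(|model| !blocked_models.contains(model))
        .collect()
}
```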
per_provider_timeout_seconds: u64
Per-provider timeout in seconds for interactive inference. If a single model doesn’t respond within this window, the fallback chain advances. Increase for slow local models (e.g. large quantized models on CPU/GPU).
max_total_inference_seconds: u64
Total wall-clock budget in seconds for the entire inference fallback chain (all attempts combined). Increase if you have many fallback candidates or slow providers.
max_fallback_attempts: usize
Maximum number of fallback attempts before giving up.
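The three limits above can be read as nested bounds on one loop. The sketch below is an illustration only: run_fallback_chain and the call closure are hypothetical, the closure is expected to honor the timeout it is given, and treating max_fallback_attempts as a cap on the total number of attempts (including the first) is an assumption.

```rust
use std::time::{Duration, Instant};

/// Tries candidates in order until one replies or a limit is reached.
///
/// `call` is a hypothetical provider call: it receives the timeout it must
/// honor and returns None on timeout or error.
fn run_fallback_chain<F>(
    candidates: &[&str],
    per_provider_timeout_seconds: u64,
    max_total_inference_seconds: u64,
    max_fallback_attempts: usize,
    mut call: F,
) -> Option<String>
where
    F: FnMut(&str, Duration) -> Option<String>,
{
    let started = Instant::now();
    let total_budget = Duration::from_secs(max_total_inference_seconds);
    let per_provider = Duration::from_secs(per_provider_timeout_seconds);

    // Assumption: the attempt cap counts every attempt, including the first.
    for &model in candidates.iter().take(max_fallback_attempts) {
        // Stop once the wall-clock budget for the whole chain is spent.
        let remaining = total_budget.checked_sub(started.elapsed())?;
        if remaining.is_zero() {
            return None;
        }
        // Never give one attempt more time than the chain has left.
        let timeout = per_provider.min(remaining);
        if let Some(reply) = call(model, timeout) {
            return Some(reply);
        }
        // No reply within the window: advance to the next candidate.
    }
    None
}
```

Under this reading, raising per_provider_timeout_seconds alone may not help a slow model late in the chain, because each attempt is also capped by whatever remains of max_total_inference_seconds.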
Trait Implementations
impl Clone for RoutingConfig
fn clone(&self) -> RoutingConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.