pub struct LlmBasedRouter { /* private fields */ }
LLM-powered router that uses a model to make routing decisions
Uses the configured tier to analyze requests and choose the optimal target. Provides an intelligent fallback when rule-based routing is ambiguous.
§Construction-Time Validation
Uses TierSelector to validate that the specified tier has available endpoints.
The tier is chosen via config.routing.router_tier at construction time.
Implementations§
impl LlmBasedRouter
pub fn new(
    selector: Arc<ModelSelector>,
    tier: TargetModel,
    router_timeout_secs: u64,
    metrics: Arc<Metrics>,
) -> AppResult<Self>
Create a new LLM-based router using the specified tier
Returns an error if no endpoints are configured for the specified tier.
§Arguments
- selector - The underlying ModelSelector
- tier - Which tier (Fast, Balanced, Deep) to use for routing decisions
- router_timeout_secs - Timeout for router queries in seconds
- metrics - Metrics collector for observability
§Tier Selection
- Fast: Lowest latency (~50-200ms) but may misroute complex requests
- Balanced: Recommended default (~100-500ms) with good accuracy
- Deep: Highest accuracy (~2-5s) but rarely worth the latency overhead
§Construction-Time Validation
The TierSelector validates tier availability at construction, ensuring
at least one endpoint exists for the specified tier.
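The fail-fast pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the crate's actual code: `ModelSelector`, `TargetModel`, and the `endpoints_for` helper are simplified stand-ins for the real types, and a plain `Result<_, String>` stands in for `AppResult`.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real TargetModel / ModelSelector types,
// reduced to just enough structure to show construction-time validation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TargetModel { Fast, Balanced, Deep }

struct ModelSelector {
    fast_endpoints: Vec<String>,
    balanced_endpoints: Vec<String>,
    deep_endpoints: Vec<String>,
}

impl ModelSelector {
    fn endpoints_for(&self, tier: TargetModel) -> &[String] {
        match tier {
            TargetModel::Fast => &self.fast_endpoints,
            TargetModel::Balanced => &self.balanced_endpoints,
            TargetModel::Deep => &self.deep_endpoints,
        }
    }
}

struct LlmBasedRouter {
    selector: Arc<ModelSelector>,
    tier: TargetModel,
}

impl LlmBasedRouter {
    // Fail fast: refuse construction when the chosen tier has no endpoints,
    // so misconfiguration surfaces at startup rather than on the first request.
    fn new(selector: Arc<ModelSelector>, tier: TargetModel) -> Result<Self, String> {
        if selector.endpoints_for(tier).is_empty() {
            return Err(format!("no endpoints configured for tier {:?}", tier));
        }
        Ok(Self { selector, tier })
    }

    fn tier(&self) -> TargetModel { self.tier }
}
```

Validating at construction time means a misconfigured `router_tier` is caught once, at startup, instead of failing every routing call.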
pub fn tier(&self) -> TargetModel
Returns the configured router tier
pub async fn route(
    &self,
    user_prompt: &str,
    meta: &RouteMetadata,
) -> AppResult<RoutingDecision>
Route request using LLM analysis
§Async Behavior
This method is async because it:
- Waits for LLM inference: ~100-500ms for 30B model routing decision (dominant latency)
- Makes HTTP requests to LLM endpoints (network I/O, ~10-100ms connection overhead)
- Awaits endpoint selection from ModelSelector (async lock acquisition, <1ms)
- Performs health tracking mark_success/mark_failure (async lock, <1ms)
Total typical latency: ~110-600ms (dominated by LLM inference)
§Retry Logic & Failure Tracking (Dual-Level)
Implements retry with two failure-tracking mechanisms:
- Request-Scoped Exclusion (failed_endpoints): Prevents retrying the same endpoint within THIS request. Clears when the function returns.
- Global Health Tracking: Marks endpoints unhealthy after 3 consecutive failures across ALL requests. Persists via ModelSelector's health_checker.
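The interaction of the two tracking levels can be sketched as below. This is an illustrative sketch, not the actual implementation: `HealthTracker`, `next_candidate`, and the threshold constant are hypothetical names; the real logic lives behind ModelSelector's health_checker.

```rust
use std::collections::{HashMap, HashSet};

// Assumed threshold matching the documented "3 consecutive failures".
const UNHEALTHY_THRESHOLD: u32 = 3;

// Global level: consecutive-failure counts that persist across requests.
#[derive(Default)]
struct HealthTracker {
    consecutive_failures: HashMap<String, u32>,
}

impl HealthTracker {
    fn mark_failure(&mut self, endpoint: &str) {
        *self.consecutive_failures.entry(endpoint.to_string()).or_insert(0) += 1;
    }
    fn mark_success(&mut self, endpoint: &str) {
        // Any success resets the failure streak for that endpoint.
        self.consecutive_failures.remove(endpoint);
    }
    fn is_unhealthy(&self, endpoint: &str) -> bool {
        self.consecutive_failures.get(endpoint).copied().unwrap_or(0) >= UNHEALTHY_THRESHOLD
    }
}

// Pick the next endpoint to try, skipping both request-scoped failures
// (the HashSet dropped when this request ends) and globally unhealthy endpoints.
fn next_candidate<'a>(
    endpoints: &'a [String],
    failed_endpoints: &HashSet<String>,
    health: &HealthTracker,
) -> Option<&'a String> {
    endpoints
        .iter()
        .find(|e| !failed_endpoints.contains(*e) && !health.is_unhealthy(e.as_str()))
}
```

The request-scoped set guarantees forward progress within one request (no endpoint is retried twice), while the global tracker protects all future requests from a persistently failing endpoint.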
§Cancellation Safety
If the returned Future is dropped (cancelled), any in-flight LLM query is aborted, but endpoint health state remains consistent: mark_success/mark_failure are only called after a query completes.
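The ordering guarantee behind this can be illustrated with a synchronous sketch (the real code is async; `query_then_track` and its parameters are hypothetical names introduced only for this illustration). Because the health bookkeeping runs strictly after the query returns, a future dropped mid-query never records a spurious outcome.

```rust
// Sketch of the "track only after completion" ordering: if execution is
// abandoned during `query()`, neither mark closure ever runs, so no health
// state is mutated for a query whose outcome was never observed.
fn query_then_track<T, E>(
    query: impl FnOnce() -> Result<T, E>,
    mark_success: impl FnOnce(),
    mark_failure: impl FnOnce(),
) -> Result<T, E> {
    let outcome = query(); // cancellation before this point touches no health state
    match &outcome {
        Ok(_) => mark_success(),
        Err(_) => mark_failure(),
    }
    outcome
}
```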
Trait Implementations§
impl LlmRouter for LlmBasedRouter
Implementation of the LlmRouter trait for LlmBasedRouter
This allows LlmBasedRouter to be used as a trait object for dependency injection in tests.
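The dependency-injection pattern this enables can be sketched as follows. The trait body, `StubRouter`, and `dispatch` are simplified, hypothetical stand-ins (the real `route` is async and takes metadata); the point is only that callers depend on the trait object, so tests can substitute a stub.

```rust
// Simplified, synchronous stand-in for the real LlmRouter trait.
trait LlmRouter {
    fn route(&self, user_prompt: &str) -> Result<String, String>;
}

// Test double: always returns a fixed routing decision, no LLM involved.
struct StubRouter {
    decision: String,
}

impl LlmRouter for StubRouter {
    fn route(&self, _user_prompt: &str) -> Result<String, String> {
        Ok(self.decision.clone())
    }
}

// Caller code is written against the trait object, not the concrete router,
// so production can pass an LlmBasedRouter and tests can pass a StubRouter.
fn dispatch(router: &dyn LlmRouter, prompt: &str) -> Result<String, String> {
    router.route(prompt)
}
```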