LlmBasedRouter

Struct LlmBasedRouter 

pub struct LlmBasedRouter { /* private fields */ }
LLM-powered router that uses a model to make routing decisions

Uses the model at the configured tier to analyze each request and choose the optimal target. Provides an intelligent fallback when rule-based routing is ambiguous.

§Construction-Time Validation

Uses TierSelector to validate that the specified tier has available endpoints. The tier is chosen via config.routing.router_tier at construction time.

Implementations§

Source§

impl LlmBasedRouter

pub fn new( selector: Arc<ModelSelector>, tier: TargetModel, router_timeout_secs: u64, metrics: Arc<Metrics>, ) -> AppResult<Self>

Create a new LLM-based router using the specified tier

Returns an error if no endpoints are configured for the specified tier.

§Arguments
  • selector - The underlying ModelSelector
  • tier - Which tier (Fast, Balanced, Deep) to use for routing decisions
  • router_timeout_secs - Timeout for router queries in seconds
  • metrics - Metrics collector for observability
§Tier Selection
  • Fast: Lowest latency (~50-200ms) but may misroute complex requests
  • Balanced: Recommended default (~100-500ms) with good accuracy
  • Deep: Highest accuracy (~2-5s) but rarely worth the latency overhead
§Construction-Time Validation

The TierSelector validates tier availability at construction, ensuring at least one endpoint exists for the specified tier.
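The fail-fast behavior described above can be illustrated with a minimal sketch. It does not use the crate's real types: `Tier`, `RouterError`, `SketchRouter`, and the endpoint map are hypothetical stand-ins for `TargetModel`, the error inside `AppResult`, `LlmBasedRouter`, and the `ModelSelector` configuration. It only shows the idea of rejecting a tier with no endpoints at construction time rather than on the first request.

```rust
use std::collections::HashMap;

/// Stand-in for `TargetModel`; variant names follow the tiers listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Tier {
    Fast,
    Balanced,
    Deep,
}

/// Hypothetical error type; the real constructor returns `AppResult<Self>`.
#[derive(Debug, PartialEq, Eq)]
pub enum RouterError {
    NoEndpoints(Tier),
}

/// Hypothetical router holding only what this sketch needs.
#[derive(Debug)]
pub struct SketchRouter {
    tier: Tier,
    timeout_secs: u64,
}

impl SketchRouter {
    /// Fails at construction when the chosen tier has no configured endpoints.
    pub fn new(
        endpoints: &HashMap<Tier, Vec<String>>,
        tier: Tier,
        timeout_secs: u64,
    ) -> Result<Self, RouterError> {
        // A missing entry and an empty list are treated the same way.
        let available = endpoints.get(&tier).map_or(0, |urls| urls.len());
        if available == 0 {
            return Err(RouterError::NoEndpoints(tier));
        }
        Ok(Self { tier, timeout_secs })
    }

    /// Returns the tier chosen at construction.
    pub fn tier(&self) -> Tier {
        self.tier
    }

    /// Returns the configured timeout in seconds.
    pub fn timeout_secs(&self) -> u64 {
        self.timeout_secs
    }
}
```

With this shape, a misconfigured tier surfaces as an `Err` when the router is built, so callers can abort startup instead of discovering the problem on a live request.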

pub fn tier(&self) -> TargetModel

Returns the configured router tier

pub async fn route( &self, user_prompt: &str, meta: &RouteMetadata, ) -> AppResult<RoutingDecision>

Route request using LLM analysis

§Async Behavior

This method is async because it:

  • Waits for LLM inference: ~100-500ms for a routing decision from a 30B model (dominant latency)
  • Makes HTTP requests to LLM endpoints (network I/O, ~10-100ms connection overhead)
  • Awaits endpoint selection from ModelSelector (async lock acquisition, <1ms)
  • Performs health tracking via mark_success/mark_failure (async lock, <1ms)

Total typical latency: ~110-600ms (dominated by LLM inference)

§Retry Logic & Failure Tracking (Dual-Level)

Retries rely on two separate failure-tracking mechanisms:

  1. Request-Scoped Exclusion (failed_endpoints): Prevents retrying the same endpoint within the current request. Cleared when the function returns.
  2. Global Health Tracking: Marks an endpoint unhealthy after 3 consecutive failures across all requests. Persists via ModelSelector’s health_checker.
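The interaction between the two mechanisms can be sketched as follows. This is a simplified, synchronous illustration and not the crate's implementation: `HealthTracker`, `route_with_retry`, and the `query` callback are hypothetical names, and the real code is async and goes through `ModelSelector`. Only the threshold of 3 consecutive failures and the `failed_endpoints` name come from the description above.

```rust
use std::collections::{HashMap, HashSet};

/// Threshold taken from the description above: 3 consecutive failures.
const UNHEALTHY_AFTER: u32 = 3;

/// Hypothetical stand-in for the global health state kept by `ModelSelector`.
#[derive(Default)]
pub struct HealthTracker {
    consecutive_failures: HashMap<String, u32>,
}

impl HealthTracker {
    /// A success resets the consecutive-failure count for the endpoint.
    pub fn mark_success(&mut self, endpoint: &str) {
        self.consecutive_failures.remove(endpoint);
    }

    /// A failure increments the consecutive-failure count.
    pub fn mark_failure(&mut self, endpoint: &str) {
        *self
            .consecutive_failures
            .entry(endpoint.to_string())
            .or_insert(0) += 1;
    }

    /// An endpoint is healthy while it is below the failure threshold.
    pub fn is_healthy(&self, endpoint: &str) -> bool {
        self.consecutive_failures.get(endpoint).copied().unwrap_or(0) < UNHEALTHY_AFTER
    }
}

/// Tries each healthy endpoint at most once per call.
/// `query` is a hypothetical callback standing in for the LLM request.
pub fn route_with_retry<F>(
    endpoints: &[&str],
    health: &mut HealthTracker,
    mut query: F,
) -> Option<String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    // Request-scoped exclusion: this set is dropped when the function returns.
    let mut failed_endpoints: HashSet<String> = HashSet::new();

    loop {
        // Pick the first endpoint that is globally healthy and has not
        // already failed within this request; give up when none is left.
        let candidate = endpoints
            .iter()
            .copied()
            .find(|e| health.is_healthy(e) && !failed_endpoints.contains(*e))?;

        match query(candidate) {
            Ok(decision) => {
                health.mark_success(candidate);
                return Some(decision);
            }
            Err(_) => {
                // Record the failure at both levels, then try the next endpoint.
                health.mark_failure(candidate);
                failed_endpoints.insert(candidate.to_string());
            }
        }
    }
}
```

In this sketch an endpoint that reaches the threshold stays excluded, because no recovery path is modeled; how the real health checker restores endpoints is not covered here.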
§Cancellation Safety

If the returned Future is dropped (cancelled), in-flight LLM queries are aborted, but endpoint health state remains consistent because mark_success/mark_failure are only called after a query completes.

Trait Implementations§

Source§

impl LlmRouter for LlmBasedRouter

Implementation of LlmRouter trait for LlmBasedRouter

This allows LlmBasedRouter to be used as a trait object for dependency injection in tests.
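The dependency-injection pattern mentioned here can be sketched with a simplified trait. Everything below is an assumption made for illustration: `Router`, `Decision`, `FixedRouter`, `Handler`, and `block_on` are hypothetical, the trait omits `RouteMetadata`, and errors are plain `String`s instead of `AppResult`. The boxed-future return type mirrors the shape that `async_trait` generates, written by hand so the sketch needs only the standard library.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical simplified decision type; the real one is `RoutingDecision`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Decision {
    pub target: String,
}

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Simplified stand-in for the `LlmRouter` trait.
pub trait Router: Send + Sync {
    fn route<'a>(&'a self, user_prompt: &'a str) -> BoxFuture<'a, Result<Decision, String>>;
}

/// Test double that always returns a fixed target without calling any model.
pub struct FixedRouter {
    pub target: String,
}

impl Router for FixedRouter {
    fn route<'a>(&'a self, _user_prompt: &'a str) -> BoxFuture<'a, Result<Decision, String>> {
        Box::pin(async move {
            Ok(Decision {
                target: self.target.clone(),
            })
        })
    }
}

/// Code under test depends on the trait object, not on a concrete router.
pub struct Handler {
    pub router: Arc<dyn Router>,
}

impl Handler {
    pub async fn handle(&self, prompt: &str) -> Result<String, String> {
        let decision = self.router.route(prompt).await?;
        Ok(format!("routed to {}", decision.target))
    }
}

/// Waker that does nothing; sufficient because the futures here never wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor for this sketch only; a real test would
/// normally use an async runtime's test harness instead.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

A test can then build `Handler` with `Arc::new(FixedRouter { .. })` and assert on the result of `block_on(handler.handle(..))`, while production code would supply the real router behind the same trait object.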

Source§

fn route<'life0, 'life1, 'life2, 'async_trait>( &'life0 self, user_prompt: &'life1 str, meta: &'life2 RouteMetadata, ) -> Pin<Box<dyn Future<Output = AppResult<RoutingDecision>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait,

Route a request based on LLM analysis

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.