§llmux v2
Hook-driven LLM model multiplexer with pluggable switch policy.
All model lifecycle management (start, stop, health check) is delegated to user-provided scripts. llmux handles request routing, in-flight tracking, draining, and policy-driven switching.
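In-flight tracking and draining can be illustrated with a small RAII guard. The following is a minimal, std-only sketch and not llmux's actual implementation: the names `InFlight`, `Guard`, `acquire`, `drain`, and `current` are hypothetical, and the real crate runs on an async stack rather than blocking on a `Condvar`.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical shared counter of in-flight requests (not llmux's type).
#[derive(Default)]
struct InFlight {
    count: Mutex<usize>,
    drained: Condvar,
}

/// RAII guard: while one exists, a request counts as in flight.
struct Guard(Arc<InFlight>);

impl InFlight {
    /// Increment the count and hand back a guard that undoes it on drop.
    fn acquire(this: &Arc<Self>) -> Guard {
        *this.count.lock().unwrap() += 1;
        Guard(Arc::clone(this))
    }

    /// Block the calling thread until no requests are in flight.
    fn drain(&self) {
        let mut n = self.count.lock().unwrap();
        while *n > 0 {
            n = self.drained.wait(n).unwrap();
        }
    }

    /// Current number of in-flight requests.
    fn current(&self) -> usize {
        *self.count.lock().unwrap()
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        let mut n = self.0.count.lock().unwrap();
        *n -= 1;
        // Wake any drain waiter once the last request has finished.
        if *n == 0 {
            self.0.drained.notify_all();
        }
    }
}
```

Tying the decrement to `Drop` means the count is released on every exit path, including early returns and panics that unwind, which is why a guard is a natural fit for streaming responses that finish long after the handler returns.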
§Architecture
┌─────────────────────────────────────────────────────────┐
│                          llmux                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Middleware (Tower Layer)                          │  │
│  │ - Extracts model from request                     │  │
│  │ - Ensures model ready (triggers switch if needed) │  │
│  │ - Acquires in-flight guard                        │  │
│  │ - Wraps response in GuardedBody for streaming     │  │
│  └───────────────────────────────────────────────────┘  │
│                            │                            │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Switcher                                          │  │
│  │ - Drain → sleep hook → wake hook                  │  │
│  │ - Policy decides when to switch                   │  │
│  └───────────────────────────────────────────────────┘  │
│                            │                            │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Reverse Proxy                                     │  │
│  │ - Forwards to localhost:model_port                │  │
│  └───────────────────────────────────────────────────┘  │
│                            │                            │
│        ┌───────────────────┼───────────────────┐        │
│        ▼                   ▼                   ▼        │
│ [backend:8001]      [backend:8002]      [backend:8003]  │
│              wake.sh / sleep.sh / alive.sh              │
└─────────────────────────────────────────────────────────┘
Re-exports§
pub use policy::FifoPolicy;
pub use policy::PolicyContext;
pub use policy::PolicyDecision;
pub use policy::ScheduleContext;
pub use policy::SwitchPolicy;
Modules§
- policy - Switch policies for model switching decisions.
Structs§
- Config - Top-level configuration.
- HookRunner - Runs lifecycle scripts for models.
- InFlightGuard - Guard that tracks in-flight requests. When dropped, decrements the count and notifies the drain waiter.
- ModelConfig - Configuration for a single model.
- ModelSwitcher - The model switcher coordinates wake/sleep transitions.
- ModelSwitcherLayer - Layer that adds model switching to a service.
- ModelSwitcherService - Service that wraps requests with model switching.
- PolicyConfig - Policy configuration.
- ProxyState - Shared state for the proxy handler.
- SwitchCostTracker - Tracks empirical switch costs using exponential moving average.
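As an illustration of tracking switch costs with an exponential moving average, here is a minimal sketch. It is not the crate's `SwitchCostTracker`: the type `EmaCost`, its methods, and the smoothing factor `alpha` are assumptions made for this example. Each new observation `x` updates the estimate as `alpha * x + (1 - alpha) * previous`, and the first observation seeds the estimate directly.

```rust
use std::time::Duration;

/// Hypothetical EMA tracker for switch durations (illustrative only).
struct EmaCost {
    /// Smoothing factor in [0, 1]; higher values weight recent switches more.
    alpha: f64,
    /// Current estimate in seconds, or None before the first observation.
    estimate: Option<f64>,
}

impl EmaCost {
    fn new(alpha: f64) -> Self {
        assert!((0.0..=1.0).contains(&alpha), "alpha must be in [0, 1]");
        Self { alpha, estimate: None }
    }

    /// Fold one observed switch duration into the running estimate.
    fn record(&mut self, observed: Duration) {
        let x = observed.as_secs_f64();
        self.estimate = Some(match self.estimate {
            None => x, // first sample seeds the average
            Some(prev) => self.alpha * x + (1.0 - self.alpha) * prev,
        });
    }

    /// Current estimate in seconds, if any switch has been observed.
    fn estimate_secs(&self) -> Option<f64> {
        self.estimate
    }
}
```

With `alpha = 0.5`, recording a 10 s switch followed by a 20 s switch yields an estimate of 15 s. A policy could compare such an estimate against queue wait times when deciding whether a switch is worth its cost.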
Enums§
- SwitchError - Errors from the switcher.
- SwitcherState - State of the model switcher.
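The drain → sleep hook → wake hook sequence from the architecture diagram can be sketched as a small state machine. Everything below is hypothetical: the `State` variants, the `Hooks` trait, and `switch_to` are invented for illustration and do not reflect the actual `SwitcherState` or `SwitchError` definitions, and the error-recovery choices (stay on the old model if sleep fails, go idle if wake fails) are assumptions.

```rust
/// Hypothetical switcher states (the real SwitcherState may differ).
#[derive(Debug, Clone, PartialEq, Eq)]
enum State {
    Idle,
    Active(String),
    Switching { from: Option<String>, to: String },
}

/// Hypothetical stand-in for the user-provided lifecycle scripts.
trait Hooks {
    fn sleep(&mut self, model: &str) -> Result<(), String>;
    fn wake(&mut self, model: &str) -> Result<(), String>;
}

/// Drain in-flight requests, put the old model to sleep, wake the target.
fn switch_to<H: Hooks>(
    state: &mut State,
    hooks: &mut H,
    target: &str,
    drain: impl FnOnce(),
) -> Result<(), String> {
    let from = match &*state {
        State::Active(m) if m.as_str() == target => return Ok(()), // already ready
        State::Active(m) => Some(m.clone()),
        State::Idle => None,
        State::Switching { .. } => return Err("switch already in progress".into()),
    };
    *state = State::Switching { from: from.clone(), to: target.to_string() };

    // 1. Wait for in-flight requests to finish.
    drain();

    // 2. Sleep the currently active model, if any.
    if let Some(old) = &from {
        if let Err(e) = hooks.sleep(old) {
            *state = State::Active(old.clone()); // assumed: old model stays active
            return Err(e);
        }
    }

    // 3. Wake the target model.
    match hooks.wake(target) {
        Ok(()) => {
            *state = State::Active(target.to_string());
            Ok(())
        }
        Err(e) => {
            *state = State::Idle; // assumed: nothing is running after a failed wake
            Err(e)
        }
    }
}
```

Passing the drain step in as a closure keeps the sketch independent of how in-flight requests are counted.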
Functions§
- build_app - Build the complete llmux stack.
- proxy_handler - Axum fallback handler that forwards requests to model backends.