
Crate llmux


§llmux v2

Hook-driven LLM model multiplexer with pluggable switch policy.

All model lifecycle management (start, stop, health check) is delegated to user-provided scripts. llmux handles request routing, in-flight tracking, draining, and policy-driven switching.
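As an illustration of what delegating lifecycle management to scripts can look like, the following is a minimal sketch using only the standard library. It is not the crate's `HookRunner`: the function name `run_hook`, the use of `sh -c`, and the `LLMUX_MODEL` environment variable are all assumptions made for the example.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper (not part of llmux): run a lifecycle script through
/// `sh -c` and report whether it exited with status 0.
fn run_hook(script: &str, model: &str) -> io::Result<bool> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(script)
        // Assumed variable name; the real crate may pass the model differently.
        .env("LLMUX_MODEL", model)
        .status()?; // blocks until the script finishes
    Ok(status.success())
}
```

Under these assumptions, a health check such as `run_hook("./alive.sh", "my-model")` yields `Ok(true)` only when the script exits successfully, so the caller can treat any non-zero exit as "not ready".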

§Architecture

┌─────────────────────────────────────────────────────────┐
│                        llmux                            │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Middleware (Tower Layer)                           │  │
│  │ - Extracts model from request                     │  │
│  │ - Ensures model ready (triggers switch if needed) │  │
│  │ - Acquires in-flight guard                        │  │
│  │ - Wraps response in GuardedBody for streaming     │  │
│  └───────────────────────────────────────────────────┘  │
│                          │                              │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Switcher                                          │  │
│  │ - Drain → sleep hook → wake hook                  │  │
│  │ - Policy decides when to switch                   │  │
│  └───────────────────────────────────────────────────┘  │
│                          │                              │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Reverse Proxy                                     │  │
│  │ - Forwards to localhost:model_port                │  │
│  └───────────────────────────────────────────────────┘  │
│                          │                              │
│      ┌───────────────────┼───────────────────┐         │
│      ▼                   ▼                   ▼         │
│  [backend:8001]     [backend:8002]      [backend:8003] │
│   wake.sh / sleep.sh / alive.sh                        │
└─────────────────────────────────────────────────────────┘
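The in-flight guard and drain steps in the diagram above can be sketched with a counter, a condition variable, and an RAII guard. This is a simplified, synchronous illustration under assumed names (`InFlight`, `SketchGuard`, `acquire`, `drain`); the crate's own `InFlightGuard` is presumably async and may be structured differently.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical tracker: counts requests currently being served.
#[derive(Default)]
struct InFlight {
    count: Mutex<usize>,
    drained: Condvar,
}

/// Hypothetical RAII guard: one exists per in-flight request.
struct SketchGuard(Arc<InFlight>);

/// Register a request and return the guard that will unregister it.
fn acquire(tracker: &Arc<InFlight>) -> SketchGuard {
    *tracker.count.lock().unwrap() += 1;
    SketchGuard(Arc::clone(tracker))
}

impl Drop for SketchGuard {
    fn drop(&mut self) {
        let mut count = self.0.count.lock().unwrap();
        *count -= 1;
        if *count == 0 {
            // Wake any thread blocked in `drain`.
            self.0.drained.notify_all();
        }
    }
}

/// Block until no requests are in flight.
fn drain(tracker: &InFlight) {
    let mut count = tracker.count.lock().unwrap();
    while *count > 0 {
        count = tracker.drained.wait(count).unwrap();
    }
}

/// Current number of in-flight requests.
fn in_flight(tracker: &InFlight) -> usize {
    *tracker.count.lock().unwrap()
}
```

Tying the decrement to `Drop` means the count is released even if a request handler returns early or a streaming response body is dropped midway, which is the property a drain step relies on before running a sleep hook.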

Re-exports§

pub use policy::FifoPolicy;
pub use policy::PolicyContext;
pub use policy::PolicyDecision;
pub use policy::ScheduleContext;
pub use policy::SwitchPolicy;

Modules§

policy
Policies that decide when to switch between models.

Structs§

Config
Top-level configuration.
HookRunner
Runs lifecycle scripts for models.
InFlightGuard
Guard that tracks in-flight requests. When dropped, decrements the count and notifies the drain waiter.
ModelConfig
Configuration for a single model.
ModelSwitcher
The model switcher coordinates wake/sleep transitions.
ModelSwitcherLayer
Layer that adds model switching to a service.
ModelSwitcherService
Service that wraps requests with model switching.
PolicyConfig
Policy configuration.
ProxyState
Shared state for the proxy handler.
SwitchCostTracker
Tracks empirical switch costs using exponential moving average.
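To make the exponential moving average mentioned for `SwitchCostTracker` concrete, here is a small sketch of the general technique. The type `EmaCost`, its methods, and the smoothing factor `alpha` are hypothetical and do not reflect the crate's actual fields or API.

```rust
/// Hypothetical EMA tracker for observed switch durations, in seconds.
struct EmaCost {
    /// Smoothing factor in (0, 1]; higher values weight recent samples more.
    alpha: f64,
    /// Current estimate; `None` until the first sample is recorded.
    estimate: Option<f64>,
}

impl EmaCost {
    fn new(alpha: f64) -> Self {
        Self { alpha, estimate: None }
    }

    /// Fold a new observation into the running estimate:
    /// estimate = alpha * sample + (1 - alpha) * previous.
    fn record(&mut self, seconds: f64) {
        self.estimate = Some(match self.estimate {
            None => seconds, // the first sample seeds the average
            Some(prev) => self.alpha * seconds + (1.0 - self.alpha) * prev,
        });
    }

    fn estimate(&self) -> Option<f64> {
        self.estimate
    }
}
```

With `alpha = 0.5`, recording 10.0 and then 20.0 gives an estimate of 15.0. A policy could compare such an estimate against the expected benefit of switching, though how llmux uses the tracked cost is not specified here.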

Enums§

SwitchError
Errors from the switcher.
SwitcherState
State of the model switcher.

Functions§

build_app
Build the complete llmux stack.
proxy_handler
Axum fallback handler that forwards requests to model backends.