opensourcellmrouter
A fast, local-first LLM router. Drop it in front of any OpenAI- or Anthropic-compatible client and route requests across a configurable pipeline of local and cloud providers — with classifiers, cost/latency/random routing rules, plugins, a live dashboard, and a built-in TUI.
cargo install opensourcellmrouter
What it does
Every request arrives on an OpenAI-compatible (/v1/chat/completions) or Anthropic-compatible (/v1/messages) endpoint and flows through:
request → classifiers → plugins → router → provider → plugins → response classifiers → response
- Classifiers tag the request (e.g.
vision,code,nsfw) based on content. - Router rules pick a provider based on tags, cost, latency, throughput, model name prefix, discovered Ollama models, or at random. Rules are evaluated in order; first match wins.
- Plugins can mutate the request/response or force a specific provider.
- The chosen provider receives the (possibly rewritten) request and returns a response.
- Response classifiers tag the response after it comes back (e.g.
refusal), surfaced via theX-Router-Response-Tagsheader (request-side tags get their ownX-Router-Request-Tagsheader) — the OpenAI/Anthropic response body is never modified. - Every exchange is logged as JSONL and broadcast live to the dashboard.
Quick start
# Copy and edit the example config
# Add API keys for any cloud providers you want
# Build and run
Or use the included demo script (starts llama-server if needed, opens the TUI):
Point any OpenAI-compatible client at http://localhost:8090/v1.
Providers
Each [[providers]] entry is an upstream backend. Three wire formats are supported:
format |
Speaks | Example |
|---|---|---|
openai |
OpenAI chat completions API | OpenAI, llama-server, vLLM, Cloudflare Workers AI, xAI (Grok) |
anthropic |
Anthropic Messages API | Anthropic Claude |
ollama |
Ollama native API (/api/chat) |
Local Ollama instance |
# Local llama.cpp server (OpenAI-compatible)
[[]]
= "local"
= "openai"
= "http://localhost:8080/v1"
= 0.0
= 60
= 900
= 20
# Ollama (native API, no /v1 suffix)
[[]]
= "ollama"
= "ollama"
= "http://localhost:11434"
= 75
# OpenAI (key read from $OPENAI_API_KEY)
[[]]
= "openai"
= "openai"
= "https://api.openai.com/v1"
= "OPENAI_API_KEY"
= 5.0
= 90
# Anthropic (key read from $ANTHROPIC_API_KEY)
[[]]
= "anthropic"
= "anthropic"
= "https://api.anthropic.com"
= "ANTHROPIC_API_KEY"
= 15.0
= 95
# Cloudflare Workers AI
[[]]
= "cloudflare"
= "openai"
= "https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1"
= "CLOUDFLARE_API_TOKEN"
= 0.2
= 80
# xAI (Grok) — OpenAI-compatible (key read from $XAI_API_KEY)
[[]]
= "xai"
= "openai"
= "https://api.x.ai/v1"
= "XAI_API_KEY"
= 5.0
= 88
API keys go in .env (gitignored) and are sourced automatically by demo.sh, or export them in your shell before running. A provider with a missing key is skipped automatically at startup (logged as a warning) — it's never selected by any router rule.
Router rules
Rules live in [[routers]] and are evaluated top-to-bottom; first match wins.
prefix — route by model name prefix
[[]]
= "prefix"
= "local/"
= "local"
= "llama3.2-3b"
tag — route by classifier tag
[[]]
= "tag"
= "code"
= "ollama"
= "deepseek-r1:latest"
discover — route to Ollama if it has the model
Queries GET /api/tags at startup and routes requests whose model name appears in the result.
[[]]
= "discover"
= "ollama"
price — pick the cheapest provider
[[]]
= "price"
= ["local", "openai"] # omit for all providers
= 5.0 # optional ceiling
latency — pick the fastest provider
[[]]
= "latency"
= 500
throughput — pick the highest-throughput provider
[[]]
= "throughput"
= 30
fallback — score-based catch-all
Ranks by quality_bias * quality - (1 - quality_bias) * cost. Good chain terminator.
[[]]
= "fallback"
= 0.7 # 0 = cheapest, 1 = highest quality
random — pick at random
# Random provider from all configured providers:
[[]]
= "random"
# Or pick from explicit (provider, model) pairs:
[[]]
= "random"
= [
{ = "local", = "llama3.2-3b" },
{ = "ollama", = "deepseek-r1:latest" },
{ = "cloudflare", = "@cf/meta/llama-3.1-8b-instruct" },
{ = "openai", = "gpt-4o-mini" },
]
See docs/examples.md for full end-to-end recipes.
Classifiers
Classifiers run on every request before routing and attach tags to it. The only built-in classifier is keyword, which matches words in the prompt:
[]
= true
[]
= ["image", "photo", "screenshot", "diagram"]
= ["function", "class", "import", "def ", "fn "]
= ["nsfw", "adult", "explicit"]
Tags are available to tag router rules and appear in logs and the dashboard.
There's also a response-side counterpart: [response_classifiers.<id>] tags
the response after the provider replies, e.g. to flag a refusal:
[]
= true
Every response carries six X-Router-* headers without ever touching the
OpenAI/Anthropic response body: X-Router-Request-Tags (e.g. vision),
X-Router-Response-Tags (e.g. refusal), X-Router-Provider (which
provider handled it), X-Router-Model (the model actually sent), and
X-Router-Input-Tokens/X-Router-Output-Tokens (token usage). The same data
shows up in logs/dashboard. See docs/classifiers.md.
Dashboard and TUI
Browser dashboard
GET /dashboard streams a live feed of every request via SSE. Enable it in config:
[]
= true
= 8090
The feed emits four event types per request — start, classified, routed, and complete — so you see classifier tags and routing decisions appear in real time, before the response arrives.
Terminal TUI
Three-pane UI: live pipeline feed (top), running stats by provider/tag (bottom-left), built-in chat client (bottom-right). Keys: Tab/i to focus chat, ↑↓ to scroll feed, q to quit.
Watch mode
Prints each request's classifier tags, routing decision, and response to stdout as they happen.
Logging
[]
= true
= "logs/requests.jsonl"
Every completed request is appended as a line of JSON including provider, requested model, sent model, tags, plugins, messages, response, and duration.
Plugins
Plugins hook into the pipeline at four stages: on_start, pre_request, post_response, and on_end. Two are built in:
| Plugin | What it does |
|---|---|
response-healing |
Repairs truncated or malformed JSON in responses |
pareto-router |
Forces requests to a tier (low/medium/high) based on config or per-request override |
[]
= true
Configuration reference
| Section | Docs |
|---|---|
| Providers | docs/providers.md |
| Routers | docs/routers.md |
| Classifiers | docs/classifiers.md |
| Plugins | docs/plugins.md |
| Pipeline overview | docs/README.md |
| Examples | docs/examples.md |
| Coding agents (Claude Code, Copilot CLI, Codex) | docs/coding-agents.md |
Security (host, api_key_env) |
docs/security.md |
Building from source
Requires Rust 1.85+ (edition 2024).
License
MIT — see LICENSE.