opensourcellmrouter 0.5.1

A fast, local-first LLM router — proxy any OpenAI/Anthropic/Ollama client to your own provider pipeline with classifiers, cost/latency/random routing rules, plugins, a live dashboard, and a TUI.
opensourcellmrouter-0.5.1 is not a library.

opensourcellmrouter

A fast, local-first LLM router. Drop it in front of any OpenAI- or Anthropic-compatible client and route requests across a configurable pipeline of local and cloud providers — with classifiers, cost/latency/random routing rules, plugins, a live dashboard, and a built-in TUI.

cargo install opensourcellmrouter

What it does

Every request arrives on an OpenAI-compatible (/v1/chat/completions) or Anthropic-compatible (/v1/messages) endpoint and flows through:

request → classifiers → plugins → router → provider → plugins → response classifiers → response
  • Classifiers tag the request (e.g. vision, code, nsfw) based on content.
  • Router rules pick a provider based on tags, cost, latency, throughput, model name prefix, discovered Ollama models, or at random. Rules are evaluated in order; first match wins.
  • Plugins can mutate the request/response or force a specific provider.
  • The chosen provider receives the (possibly rewritten) request and returns a response.
  • Response classifiers tag the response after it comes back (e.g. refusal), surfaced via the X-Router-Response-Tags header (request-side tags get their own X-Router-Request-Tags header) — the OpenAI/Anthropic response body is never modified.
  • Every exchange is logged as JSONL and broadcast live to the dashboard.

Quick start

git clone https://github.com/CarterCole/opensourcellmrouter
cd opensourcellmrouter

# Copy and edit the example config
cp config.example.toml config.toml

# Add API keys for any cloud providers you want
echo 'OPENAI_API_KEY=sk-...' >> .env
echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env
echo 'CLOUDFLARE_API_TOKEN=cfat_...' >> .env

# Build and run
cargo run -- my-config.toml

Or use the included demo script (starts llama-server if needed, opens the TUI):

./demo.sh

Point any OpenAI-compatible client at http://localhost:8090/v1.


Providers

Each [[providers]] entry is an upstream backend. Three wire formats are supported:

format Speaks Example
openai OpenAI chat completions API OpenAI, llama-server, vLLM, Cloudflare Workers AI
anthropic Anthropic Messages API Anthropic Claude
ollama Ollama native API (/api/chat) Local Ollama instance
# Local llama.cpp server (OpenAI-compatible)
[[providers]]
name                      = "local"
format                    = "openai"
base_url                  = "http://localhost:8080/v1"
cost_per_1m_tokens        = 0.0
quality                   = 60
latency_ms                = 900
throughput_tokens_per_sec = 20

# Ollama (native API, no /v1 suffix)
[[providers]]
name     = "ollama"
format   = "ollama"
base_url = "http://localhost:11434"
quality  = 75

# OpenAI (key read from $OPENAI_API_KEY)
[[providers]]
name               = "openai"
format             = "openai"
base_url           = "https://api.openai.com/v1"
api_key_env        = "OPENAI_API_KEY"
cost_per_1m_tokens = 5.0
quality            = 90

# Anthropic (key read from $ANTHROPIC_API_KEY)
[[providers]]
name               = "anthropic"
format             = "anthropic"
base_url           = "https://api.anthropic.com"
api_key_env        = "ANTHROPIC_API_KEY"
cost_per_1m_tokens = 15.0
quality            = 95

# Cloudflare Workers AI
[[providers]]
name               = "cloudflare"
format             = "openai"
base_url           = "https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1"
api_key_env        = "CLOUDFLARE_API_TOKEN"
cost_per_1m_tokens = 0.2
quality            = 80

API keys go in .env (gitignored) and are sourced automatically by demo.sh, or export them in your shell before running. A missing key logs a warning and keeps the provider available — the first request that hits it will fail. Set strict = true on a provider to skip it at startup instead:

[[providers]]
name        = "openai"
strict      = true   # skipped at startup if OPENAI_API_KEY is unset
api_key_env = "OPENAI_API_KEY"

Router rules

Rules live in [[routers]] and are evaluated top-to-bottom; first match wins.

prefix — route by model name prefix

[[routers]]
type          = "prefix"
model_prefix  = "local/"
provider      = "local"
rewrite_model = "llama3.2-3b"

tag — route by classifier tag

[[routers]]
type          = "tag"
tag           = "code"
provider      = "ollama"
rewrite_model = "deepseek-r1:latest"

discover — route to Ollama if it has the model

Queries GET /api/tags at startup and routes requests whose model name appears in the result.

[[routers]]
type     = "discover"
provider = "ollama"

price — pick the cheapest provider

[[routers]]
type                    = "price"
providers               = ["local", "openai"]   # omit for all providers
max_cost_per_1m_tokens  = 5.0                   # optional ceiling

latency — pick the fastest provider

[[routers]]
type           = "latency"
max_latency_ms = 500

throughput — pick the highest-throughput provider

[[routers]]
type               = "throughput"
min_tokens_per_sec = 30

fallback — score-based catch-all

Ranks by quality_bias * quality - (1 - quality_bias) * cost. Good chain terminator.

[[routers]]
type         = "fallback"
quality_bias = 0.7        # 0 = cheapest, 1 = highest quality

random — pick at random

# Random provider from all configured providers:
[[routers]]
type = "random"

# Or pick from explicit (provider, model) pairs:
[[routers]]
type = "random"
candidates = [
  { provider = "local",      model = "llama3.2-3b"              },
  { provider = "ollama",     model = "deepseek-r1:latest"       },
  { provider = "cloudflare", model = "@cf/meta/llama-3.1-8b-instruct" },
  { provider = "openai",     model = "gpt-4o-mini"              },
]

See docs/examples.md for full end-to-end recipes.


Classifiers

Classifiers run on every request before routing and attach tags to it. The only built-in classifier is keyword, which matches words in the prompt:

[classifiers.keyword]
enabled = true

[classifiers.keyword.tags]
vision = ["image", "photo", "screenshot", "diagram"]
code   = ["function", "class", "import", "def ", "fn "]
nsfw   = ["nsfw", "adult", "explicit"]

Tags are available to tag router rules and appear in logs and the dashboard.

There's also a response-side counterpart: [response_classifiers.<id>] tags the response after the provider replies, e.g. to flag a refusal:

[response_classifiers.refusal]
enabled = true

Every response carries six X-Router-* headers without ever touching the OpenAI/Anthropic response body: X-Router-Request-Tags (e.g. vision), X-Router-Response-Tags (e.g. refusal), X-Router-Provider (which provider handled it), X-Router-Model (the model actually sent), and X-Router-Input-Tokens/X-Router-Output-Tokens (token usage). The same data shows up in logs/dashboard. See docs/classifiers.md.


Dashboard and TUI

Browser dashboard

GET /dashboard streams a live feed of every request via SSE. Enable it in config:

[server]
dashboard = true
port      = 8090

The feed emits four event types per request — start, classified, routed, and complete — so you see classifier tags and routing decisions appear in real time, before the response arrives.

Terminal TUI

opensourcellmrouter tui http://localhost:8090

Three-pane UI: live pipeline feed (top), running stats by provider/tag (bottom-left), built-in chat client (bottom-right). Keys: Tab/i to focus chat, ↑↓ to scroll feed, q to quit.

Watch mode

opensourcellmrouter watch http://localhost:8090

Prints each request's classifier tags, routing decision, and response to stdout as they happen.


Logging

[logging]
enabled = true
path    = "logs/requests.jsonl"

Every completed request is appended as a line of JSON including provider, requested model, sent model, tags, plugins, messages, response, and duration.


Plugins

Plugins hook into the pipeline at four stages: on_start, pre_request, post_response, and on_end. Two are built in:

Plugin What it does
response-healing Repairs truncated or malformed JSON in responses
pareto-router Forces requests to a tier (low/medium/high) based on config or per-request override
[plugins.response-healing]
enabled = true

Configuration reference

Section Docs
Providers docs/providers.md
Routers docs/routers.md
Classifiers docs/classifiers.md
Plugins docs/plugins.md
Pipeline overview docs/README.md
Examples docs/examples.md

Building from source

cargo build --release
./target/release/opensourcellmrouter config.toml

Requires Rust 1.85+ (edition 2024).


License

MIT — see LICENSE.