nexus-orchestrator 0.4.0

Distributed LLM serving orchestrator - a unified API gateway for heterogeneous inference backends

Nexus


One API endpoint. Any backend. Zero configuration.

Nexus is a distributed LLM orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway. Local-first, cloud when needed.

Features

  • πŸ” Auto-Discovery β€” Finds LLM backends on your network via mDNS
  • 🎯 Intelligent Routing β€” Routes by model capabilities, load, and latency
  • πŸ”„ Transparent Failover β€” Retries with fallback backends automatically
  • πŸ”Œ OpenAI-Compatible β€” Works with any OpenAI API client
  • ⚑ Zero Config β€” Just run it; works out of the box with Ollama
  • πŸ”’ Privacy Zones β€” Structural enforcement prevents data from reaching cloud backends
  • πŸ’° Budget Management β€” Token-aware cost tracking with automatic spend limits
  • πŸ“Š Real-time Dashboard β€” Monitor backends, models, and requests in your browser
  • 🧠 Quality Tracking β€” Profiles backend response quality to inform routing decisions
  • πŸ“ Embeddings API β€” OpenAI-compatible /v1/embeddings with capability-aware routing
  • πŸ“‹ Request Queuing β€” Holds requests when backends are busy, with priority support
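
To give a feel for the routing decision, here is a minimal sketch of scoring backends by capability match, load, and latency. This is purely illustrative and not Nexus's actual code; the struct fields, weights, and names are hypothetical.

```rust
// Hypothetical sketch of capability/load/latency routing; not Nexus's real implementation.
struct Backend {
    name: &'static str,
    serves_model: bool,   // does this backend host the requested model?
    active_requests: u32, // current load
    avg_latency_ms: u32,  // rolling average response latency
}

/// Lower score is better; backends that cannot serve the model are skipped.
fn score(b: &Backend) -> Option<u64> {
    if !b.serves_model {
        return None;
    }
    // Illustrative weighting: load dominates, latency breaks ties.
    Some(b.active_requests as u64 * 1_000 + b.avg_latency_ms as u64)
}

fn pick_backend<'a>(backends: &'a [Backend]) -> Option<&'a Backend> {
    backends
        .iter()
        .filter_map(|b| score(b).map(|s| (s, b)))
        .min_by_key(|(s, _)| *s)
        .map(|(_, b)| b)
}

fn main() {
    let backends = [
        Backend { name: "ollama-7b", serves_model: true, active_requests: 3, avg_latency_ms: 120 },
        Backend { name: "vllm-70b", serves_model: true, active_requests: 1, avg_latency_ms: 450 },
        Backend { name: "exo-32b", serves_model: false, active_requests: 0, avg_latency_ms: 200 },
    ];
    let best = pick_backend(&backends).unwrap();
    println!("{}", best.name); // vllm-70b: lowest combined load/latency score
}
```

In the real orchestrator, quality tracking and privacy zones also feed into this choice; the sketch shows only the load/latency dimension.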

Supported Backends

Backend     Status        Discovery
Ollama      βœ… Supported   mDNS (auto)
LM Studio   βœ… Supported   Static config
vLLM        βœ… Supported   Static config
llama.cpp   βœ… Supported   Static config
exo         βœ… Supported   mDNS (auto)
OpenAI      βœ… Supported   Static config
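
Backends marked "Static config" are registered in a configuration file rather than discovered. As an illustration only, such an entry could look like the fragment below; the key names here are hypothetical, and the actual schema is documented in the setup guide.

```toml
# Illustrative sketch only — see the Getting Started guide for the real config format.
[[backends]]
name = "workstation-vllm"
kind = "vllm"
url = "http://192.168.1.50:8000"

[[backends]]
name = "openai-cloud"
kind = "openai"
api_key_env = "OPENAI_API_KEY"
```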

Quick Start

# Install from source
cargo install --path .

# Start with auto-discovery (zero config)
nexus serve

# Or with Docker
docker run -d -p 8000:8000 leocamello/nexus

Once running, send your first request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:70b", "messages": [{"role": "user", "content": "Hello!"}]}'

Point any OpenAI-compatible client to http://localhost:8000/v1 β€” Claude Code, Continue.dev, OpenAI SDK, or plain curl.

β†’ Full setup guide β€” installation, configuration, CLI reference, and more.

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Nexus Orchestrator                 β”‚
β”‚  - Discovers backends via mDNS                  β”‚
β”‚  - Tracks model capabilities & quality          β”‚
β”‚  - Routes to best available backend             β”‚
β”‚  - Queues requests when backends are busy       β”‚
β”‚  - OpenAI-compatible API + Embeddings           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚           β”‚           β”‚           β”‚
        β–Ό           β–Ό           β–Ό           β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚ Ollama β”‚  β”‚  vLLM  β”‚  β”‚  exo   β”‚  β”‚ OpenAI β”‚
   β”‚  7B    β”‚  β”‚  70B   β”‚  β”‚  32B   β”‚  β”‚ cloud  β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
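
Transparent failover, in outline: try the preferred backend and, on error, fall through to the next candidate. The sketch below illustrates the idea with hypothetical types; it is not Nexus's actual retry implementation.

```rust
// Hypothetical failover sketch; not Nexus's actual retry code.
struct Backend {
    name: &'static str,
    healthy: bool,
}

impl Backend {
    /// Stand-in for a real inference call over HTTP.
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.healthy {
            Ok(format!("{}: reply to '{}'", self.name, prompt))
        } else {
            Err(format!("{} unavailable", self.name))
        }
    }
}

/// Try each candidate in preference order until one succeeds.
fn complete_with_failover(candidates: &[Backend], prompt: &str) -> Result<String, String> {
    let mut last_err = String::from("no backends available");
    for backend in candidates {
        match backend.complete(prompt) {
            Ok(reply) => return Ok(reply),
            Err(e) => last_err = e, // record the error and fall through to the next backend
        }
    }
    Err(last_err)
}

fn main() {
    let candidates = [
        Backend { name: "vllm-70b", healthy: false },
        Backend { name: "ollama-7b", healthy: true },
    ];
    let reply = complete_with_failover(&candidates, "Hello!").unwrap();
    println!("{reply}"); // ollama-7b answers after vllm-70b fails
}
```

From the client's point of view this is invisible: the request succeeds as long as any candidate backend can serve the model.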

Documentation

Document               What you'll find
πŸš€ Getting Started     Installation, configuration, CLI, environment variables
πŸ“– REST API            HTTP endpoints, X-Nexus-* headers, error responses
πŸ”Œ WebSocket API       Real-time dashboard protocol
πŸ—οΈ Architecture        System design, module structure, data flows
πŸ—ΊοΈ Roadmap             Feature index (F01–F23), version history, future plans
πŸ”§ Troubleshooting     Common errors, debugging tips
❓ FAQ                 What Nexus is (and isn't), common questions
🀝 Contributing        Dev workflow, coding standards, PR guidelines
πŸ“‹ Changelog           Release history
πŸ”’ Security            Vulnerability reporting

License

Apache License 2.0 β€” see LICENSE for details.

Related Projects

  • exo β€” Distributed AI inference
  • LM Studio β€” Desktop app for local LLMs
  • Ollama β€” Easy local LLM serving
  • vLLM β€” High-throughput LLM serving
  • LiteLLM β€” Cloud LLM API router