Nexus
One API endpoint. Any backend. Zero configuration.
Nexus is a distributed LLM orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway. Local first, cloud when needed.
Features
- π Auto-Discovery β Finds LLM backends on your network via mDNS
- π― Intelligent Routing β Routes by model capabilities, load, and latency
- π Transparent Failover β Retries with fallback backends automatically
- π OpenAI-Compatible β Works with any OpenAI API client
- β‘ Zero Config β Just run it β works out of the box with Ollama
- π Privacy Zones β Structural enforcement prevents data from reaching cloud backends
- π° Budget Management β Token-aware cost tracking with automatic spend limits
- π Real-time Dashboard β Monitor backends, models, and requests in your browser
- π§ Quality Tracking β Profiles backend response quality to inform routing decisions
- π Embeddings API β OpenAI-compatible
/v1/embeddingswith capability-aware routing - π Request Queuing β Holds requests when backends are busy, with priority support
Supported Backends
| Backend | Status | Discovery |
|---|---|---|
| Ollama | β Supported | mDNS (auto) |
| LM Studio | β Supported | Static config |
| vLLM | β Supported | Static config |
| llama.cpp | β Supported | Static config |
| exo | β Supported | mDNS (auto) |
| OpenAI | β Supported | Static config |
Quick Start
# Install from source
# Start with auto-discovery (zero config)
# Or with Docker
Once running, send your first request:
Point any OpenAI-compatible client to http://localhost:8000/v1 β Claude Code, Continue.dev, OpenAI SDK, or plain curl.
β Full setup guide β installation, configuration, CLI reference, and more.
Architecture
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Nexus Orchestrator β
β - Discovers backends via mDNS β
β - Tracks model capabilities & quality β
β - Routes to best available backend β
β - Queues requests when backends are busy β
β - OpenAI-compatible API + Embeddings β
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β β β
βΌ βΌ βΌ βΌ
ββββββββββ ββββββββββ ββββββββββ ββββββββββ
β Ollama β β vLLM β β exo β β OpenAI β
β 7B β β 70B β β 32B β β cloud β
ββββββββββ ββββββββββ ββββββββββ ββββββββββ
Documentation
| Document | What you'll find | |
|---|---|---|
| π | Getting Started | Installation, configuration, CLI, environment variables |
| π | REST API | HTTP endpoints, X-Nexus-* headers, error responses |
| π | WebSocket API | Real-time dashboard protocol |
| ποΈ | Architecture | System design, module structure, data flows |
| πΊοΈ | Roadmap | Feature index (F01βF23), version history, future plans |
| π§ | Troubleshooting | Common errors, debugging tips |
| β | FAQ | What Nexus is (and isn't), common questions |
| π€ | Contributing | Dev workflow, coding standards, PR guidelines |
| π | Changelog | Release history |
| π | Security | Vulnerability reporting |
License
Apache License 2.0 β see LICENSE for details.