# Nexus

One API endpoint. Any backend. Zero configuration.

Nexus is a distributed LLM serving orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway.
## Features

- Auto-Discovery: automatically finds LLM backends on your network via mDNS
- Intelligent Routing: routes requests based on model capabilities and load
- Transparent Failover: automatically retries with fallback backends
- OpenAI-Compatible: works with any OpenAI API client
- Zero Config: just run it; works out of the box with Ollama
- Structured Logging: queryable JSON logs for every request, with correlation IDs
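The transparent failover above can be sketched as a simple retry loop. This is an illustrative Python sketch, not Nexus's actual implementation; the `FlakyBackend` stub and the error-handling details are assumptions:

```python
class FlakyBackend:
    """Stand-in for an inference backend that may be unreachable."""

    def __init__(self, name, fail=False):
        self.name, self.fail = name, fail

    def complete(self, request):
        if self.fail:
            raise ConnectionError(f"{self.name} unreachable")
        return f"{self.name}: response to {request}"


def complete_with_failover(backends, request):
    """Try each backend in order, falling back to the next on failure."""
    errors = []
    for backend in backends:
        try:
            return backend.complete(request)
        except ConnectionError as exc:
            errors.append(exc)  # remember the failure, try the next backend
    raise RuntimeError(f"all backends failed: {errors}")


print(complete_with_failover(
    [FlakyBackend("ollama", fail=True), FlakyBackend("vllm")], "hello"))
# vllm: response to hello
```

The request succeeds as long as at least one backend in the chain is reachable; only when every backend fails does the caller see an error.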
## Supported Backends

| Backend | Status | Notes |
|---|---|---|
| Ollama | ✅ Supported | Auto-discovery via mDNS |
| LM Studio | ✅ Supported | OpenAI-compatible API |
| vLLM | ✅ Supported | Static configuration |
| llama.cpp server | ✅ Supported | Static configuration |
| exo | ✅ Supported | Auto-discovery via mDNS |
| OpenAI | ✅ Supported | Cloud fallback |
| LocalAI | 🚧 Planned | |
## Quick Start

### From Source

```sh
# Subcommand and flag names below are illustrative; run `nexus --help` for the real ones.

# Install
cargo install --path .

# Generate a configuration file
nexus init > nexus.toml

# Run with auto-discovery
nexus

# Or with a custom config file
nexus --config nexus.toml
```
### Docker

```sh
# <image> and container paths are placeholders; substitute the published ones.

# Run with default settings
docker run -p 8000:8000 <image>

# Run with custom config
docker run -p 8000:8000 -v $(pwd)/nexus.toml:/nexus.toml <image>

# Run with host network (for mDNS discovery)
docker run --network host <image>
```
### From GitHub Releases

Download pre-built binaries from Releases.
## CLI Commands

```sh
# Subcommand names below are illustrative; run `nexus --help` for the real ones.

# Start the server
nexus serve

# List backends
nexus backend list

# Add a backend manually (auto-detects type)
nexus backend add http://localhost:11434

# Remove a backend
nexus backend remove local-ollama

# List available models
nexus models

# Show system health
nexus health

# Generate config file
nexus init > nexus.toml

# Generate shell completions
nexus completions bash
```
## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `NEXUS_CONFIG` | Config file path | `nexus.toml` |
| `NEXUS_PORT` | Listen port | `8000` |
| `NEXUS_HOST` | Listen address | `0.0.0.0` |
| `NEXUS_LOG_LEVEL` | Log level (`trace`/`debug`/`info`/`warn`/`error`) | `info` |
| `NEXUS_LOG_FORMAT` | Log format (`pretty`/`json`) | `pretty` |
| `NEXUS_DISCOVERY` | Enable mDNS discovery | `true` |
| `NEXUS_HEALTH_CHECK` | Enable health checking | `true` |
Precedence: CLI args > Environment variables > Config file > Defaults
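That precedence chain amounts to a layered merge where later layers override earlier ones. A minimal Python sketch (key names and the flat-dict model are assumptions for illustration, not Nexus's internals):

```python
def resolve_settings(defaults, config_file, env, cli_args):
    """Merge settings layers; later (higher-precedence) layers win."""
    merged = {}
    # Apply lowest precedence first so each later layer overwrites it.
    for layer in (defaults, config_file, env, cli_args):
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


settings = resolve_settings(
    defaults={"port": 8000, "host": "0.0.0.0", "log_level": "info"},
    config_file={"port": 9000},      # nexus.toml sets a custom port
    env={"log_level": "debug"},      # NEXUS_LOG_LEVEL=debug
    cli_args={"port": 8080},         # --port 8080 wins over everything
)
print(settings)  # {'port': 8080, 'host': '0.0.0.0', 'log_level': 'debug'}
```

Each source only overrides the keys it actually sets, so the CLI's `port` wins while the environment's `log_level` still applies.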
## API Usage

Once running, Nexus exposes an OpenAI-compatible API:

```sh
# Health check (path assumed)
curl http://localhost:8000/health

# List available models
curl http://localhost:8000/v1/models

# Chat completion (non-streaming; model name depends on your backends)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'

# Chat completion (streaming)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
```
## Web Dashboard

Nexus includes a web dashboard for real-time monitoring and observability. Access it at http://localhost:8000/ in your browser.

Features:

- Real-time backend health monitoring with status indicators
- Model availability matrix showing which models are available on which backends
- Request history with the last 100 requests, durations, and error details
- WebSocket-based live updates (with HTTP polling fallback)
- Fully responsive: works on desktop, tablet, and mobile
- Dark mode support (follows system preference)
- Works without JavaScript (graceful degradation with auto-refresh)

The dashboard provides a visual overview of your Nexus cluster, making it easy to monitor backend health, track model availability, and debug request issues in real time.
### With Claude Code / Continue.dev

Point your AI coding assistant to http://localhost:8000 as the API endpoint.
### With OpenAI SDK

```python
from openai import OpenAI

# Point the client at Nexus; the API key is unused but must be non-empty.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="nexus")

# Model name depends on what your backends serve.
response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
## Observability

Nexus exposes metrics for monitoring and debugging:

```sh
# Prometheus metrics (for Grafana, Prometheus, etc.)
curl http://localhost:8000/metrics

# JSON stats (for dashboards and debugging; path assumed)
curl http://localhost:8000/stats
```

Prometheus metrics include request counters, duration histograms, error rates, backend latency, token usage, and fleet state gauges. Configure your Prometheus scraper to target http://<nexus-host>:8000/metrics.

JSON stats provide an at-a-glance view with uptime, per-backend request counts, latency, and pending request depth.
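To scrape the metrics endpoint, a minimal Prometheus job might look like this (the job name and target are placeholders for your deployment):

```yaml
scrape_configs:
  - job_name: "nexus"
    static_configs:
      - targets: ["localhost:8000"]
    # metrics_path defaults to /metrics, which is what Nexus serves
```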
## Configuration

```toml
# nexus.toml
# Section and key names below are reconstructed for illustration;
# check a generated config file for the authoritative names.

[server]
host = "0.0.0.0"
port = 8000

[discovery]
enabled = true

[[backends]]
name = "local-ollama"
url = "http://localhost:11434"
type = "ollama"
priority = 1

[[backends]]
name = "gpu-server"
url = "http://192.168.1.100:8000"
type = "vllm"
priority = 2
```
## Architecture

```
┌─────────────────────────────────────────────┐
│             Nexus Orchestrator              │
│  - Discovers backends via mDNS              │
│  - Tracks model capabilities                │
│  - Routes to best available backend        │
│  - OpenAI-compatible API                    │
└─────────────────────────────────────────────┘
        │             │             │
        ▼             ▼             ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │ Ollama  │   │  vLLM   │   │   exo   │
   │   7B    │   │   70B   │   │   32B   │
   └─────────┘   └─────────┘   └─────────┘
```
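The routing step in the diagram can be sketched as picking the healthiest, best-priority backend that serves the requested model. This is an illustrative Python sketch; the field names and the lower-number-wins priority rule are assumptions based on the configuration example, not Nexus's actual algorithm:

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    priority: int                  # lower value = preferred (assumed)
    healthy: bool = True
    models: set = field(default_factory=set)


def route(backends, model):
    """Pick the healthy backend serving `model` with the best priority."""
    candidates = [b for b in backends if b.healthy and model in b.models]
    if not candidates:
        raise LookupError(f"no healthy backend serves {model!r}")
    return min(candidates, key=lambda b: b.priority)


fleet = [
    Backend("local-ollama", priority=1, models={"llama3:7b"}),
    Backend("gpu-server", priority=2, models={"llama3:7b", "llama3:70b"}),
]
print(route(fleet, "llama3:70b").name)  # gpu-server
print(route(fleet, "llama3:7b").name)   # local-ollama
```

Only the GPU server can serve the 70B model, so it is chosen despite its lower priority; for the 7B model both qualify and the preferred backend wins.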
## Development

```sh
# Build
cargo build

# Run tests
cargo test

# Run with logging
RUST_LOG=debug cargo run

# Check formatting
cargo fmt --check

# Lint
cargo clippy
```
## License

Apache License 2.0 - see LICENSE for details.