nexus-orchestrator 0.4.0

Distributed LLM serving orchestrator - a unified API gateway for heterogeneous inference backends

Nexus


One API endpoint. Any backend. Zero configuration.

Nexus is a distributed LLM orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway. Local-first, cloud when needed.

Features

  • πŸ” Auto-Discovery β€” Finds LLM backends on your network via mDNS
  • 🎯 Intelligent Routing β€” Routes by model capabilities, load, and latency
  • πŸ”„ Transparent Failover β€” Retries with fallback backends automatically
  • πŸ”Œ OpenAI-Compatible β€” Works with any OpenAI API client
  • ⚑ Zero Config β€” Just run it; works out of the box with Ollama
  • πŸ”’ Privacy Zones β€” Structural enforcement prevents data from reaching cloud backends
  • πŸ’° Budget Management β€” Token-aware cost tracking with automatic spend limits
  • πŸ“Š Real-time Dashboard β€” Monitor backends, models, and requests in your browser
  • 🧠 Quality Tracking β€” Profiles backend response quality to inform routing decisions
  • πŸ“ Embeddings API β€” OpenAI-compatible /v1/embeddings with capability-aware routing
  • πŸ“‹ Request Queuing β€” Holds requests when backends are busy, with priority support
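
To give a feel for the routing decision, here is a minimal sketch of scoring backends by capability match, load, and latency. This is purely illustrative and not Nexus's actual code; the struct fields, weights, and names are hypothetical.

```rust
// Hypothetical sketch of capability/load/latency routing; not Nexus's real implementation.
struct Backend {
    name: &'static str,
    serves_model: bool,   // does this backend host the requested model?
    active_requests: u32, // current load
    avg_latency_ms: u32,  // rolling average response latency
}

/// Lower score is better; backends that cannot serve the model are skipped.
fn score(b: &Backend) -> Option<u64> {
    if !b.serves_model {
        return None;
    }
    // Illustrative weighting: load dominates, latency breaks ties.
    Some(b.active_requests as u64 * 1_000 + b.avg_latency_ms as u64)
}

fn pick_backend<'a>(backends: &'a [Backend]) -> Option<&'a Backend> {
    backends
        .iter()
        .filter_map(|b| score(b).map(|s| (s, b)))
        .min_by_key(|(s, _)| *s)
        .map(|(_, b)| b)
}

fn main() {
    let backends = [
        Backend { name: "ollama-7b", serves_model: true, active_requests: 3, avg_latency_ms: 120 },
        Backend { name: "vllm-70b", serves_model: true, active_requests: 1, avg_latency_ms: 450 },
        Backend { name: "exo-32b", serves_model: false, active_requests: 0, avg_latency_ms: 200 },
    ];
    let best = pick_backend(&backends).unwrap();
    println!("{}", best.name); // vllm-70b: lowest combined load/latency score
}
```

In the real orchestrator, quality tracking and privacy zones also feed into this choice; the sketch shows only the load/latency dimension.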

Supported Backends

Backend     Status        Discovery
Ollama      βœ… Supported   mDNS (auto)
LM Studio   βœ… Supported   Static config
vLLM        βœ… Supported   Static config
llama.cpp   βœ… Supported   Static config
exo         βœ… Supported   mDNS (auto)
OpenAI      βœ… Supported   Static config
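
Backends marked "Static config" are registered in a configuration file rather than discovered. As an illustration only, such an entry could look like the fragment below; the key names here are hypothetical, and the actual schema is documented in the setup guide.

```toml
# Illustrative sketch only — see the Getting Started guide for the real config format.
[[backends]]
name = "workstation-vllm"
kind = "vllm"
url = "http://192.168.1.50:8000"

[[backends]]
name = "openai-cloud"
kind = "openai"
api_key_env = "OPENAI_API_KEY"
```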

Quick Start

# Install from source
cargo install --path .

# Start with auto-discovery (zero config)
nexus serve

# Or with Docker
docker run -d -p 8000:8000 leocamello/nexus

Once running, send your first request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:70b", "messages": [{"role": "user", "content": "Hello!"}]}'

Point any OpenAI-compatible client to http://localhost:8000/v1 β€” Claude Code, Continue.dev, OpenAI SDK, or plain curl.

β†’ Full setup guide β€” installation, configuration, CLI reference, and more.

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Nexus Orchestrator                 β”‚
β”‚  - Discovers backends via mDNS                  β”‚
β”‚  - Tracks model capabilities & quality          β”‚
β”‚  - Routes to best available backend             β”‚
β”‚  - Queues requests when backends are busy       β”‚
β”‚  - OpenAI-compatible API + Embeddings           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚           β”‚           β”‚           β”‚
        β–Ό           β–Ό           β–Ό           β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚ Ollama β”‚  β”‚  vLLM  β”‚  β”‚  exo   β”‚  β”‚ OpenAI β”‚
   β”‚  7B    β”‚  β”‚  70B   β”‚  β”‚  32B   β”‚  β”‚ cloud  β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
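
Transparent failover, in outline: try the preferred backend and, on error, fall through to the next candidate. The sketch below illustrates the idea with hypothetical types; it is not Nexus's actual retry implementation.

```rust
// Hypothetical failover sketch; not Nexus's actual retry code.
struct Backend {
    name: &'static str,
    healthy: bool,
}

impl Backend {
    /// Stand-in for a real inference call over HTTP.
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.healthy {
            Ok(format!("{}: reply to '{}'", self.name, prompt))
        } else {
            Err(format!("{} unavailable", self.name))
        }
    }
}

/// Try each candidate in preference order until one succeeds.
fn complete_with_failover(candidates: &[Backend], prompt: &str) -> Result<String, String> {
    let mut last_err = String::from("no backends available");
    for backend in candidates {
        match backend.complete(prompt) {
            Ok(reply) => return Ok(reply),
            Err(e) => last_err = e, // record the error and fall through to the next backend
        }
    }
    Err(last_err)
}

fn main() {
    let candidates = [
        Backend { name: "vllm-70b", healthy: false },
        Backend { name: "ollama-7b", healthy: true },
    ];
    let reply = complete_with_failover(&candidates, "Hello!").unwrap();
    println!("{reply}"); // ollama-7b answers after vllm-70b fails
}
```

From the client's point of view this is invisible: the request succeeds as long as any candidate backend can serve the model.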

Documentation

Document               What you'll find
πŸš€ Getting Started     Installation, configuration, CLI, environment variables
πŸ“– REST API            HTTP endpoints, X-Nexus-* headers, error responses
πŸ”Œ WebSocket API       Real-time dashboard protocol
πŸ—οΈ Architecture        System design, module structure, data flows
πŸ—ΊοΈ Roadmap             Feature index (F01–F23), version history, future plans
πŸ”§ Troubleshooting     Common errors, debugging tips
❓ FAQ                 What Nexus is (and isn't), common questions
🀝 Contributing        Dev workflow, coding standards, PR guidelines
πŸ“‹ Changelog           Release history
πŸ”’ Security            Vulnerability reporting

License

Apache License 2.0 β€” see LICENSE for details.

Related Projects

  • exo β€” Distributed AI inference
  • LM Studio β€” Desktop app for local LLMs
  • Ollama β€” Easy local LLM serving
  • vLLM β€” High-throughput LLM serving
  • LiteLLM β€” Cloud LLM API router