ModelMux - Vertex AI to OpenAI Proxy (Rust)
```
 |   |   |   |   |
  \  |   |   |  /
   \ |   |   | /
    \|   |   |/
     +----------->
```
ModelMux is a production-ready, async Rust proxy that acts as a drop-in replacement for the OpenAI API. It translates OpenAI-compatible requests into Google Vertex AI (Anthropic Claude) calls while preserving streaming, tool/function calling, and error semantics. Designed for performance, safety, and clean architecture, ModelMux is ideal for teams standardizing on OpenAI APIs while running on Vertex AI infrastructure.
🎉 New in v1.0.0: Production Ready
ModelMux v1.0.0 adds service management and Linux packaging:
- 🍺 Brew services: `brew services start modelmux` — run as a background service (macOS)
- 🐧 systemd daemon: Linux system and user service units — see `packaging/systemd/`
- 📦 .deb packages: install on Ubuntu/Debian with `dpkg -i modelmux_*.deb`
- 🏗️ Multi-layered configuration: CLI args > env vars > user config > system config > defaults
- 📝 TOML configuration: human-readable config files; `modelmux config init` for quick setup
Quick setup: `modelmux config init` creates your configuration interactively!
"The internet is like a vast electronic library. But someone has scattered all the books on the floor." — Lao Tzu
What is ModelMux?
ModelMux is a high-performance Rust proxy server that seamlessly converts OpenAI-compatible API requests to Vertex AI (Anthropic Claude) format. Built with Rust Edition 2024 for maximum performance and type safety.
- 🔁 Drop-in OpenAI replacement — zero client changes
- ⚡ High performance — async Rust with Tokio
- 🧠 Full tool/function calling support
- 📡 Streaming (SSE) compatible
- 🛡 Strong typing & clean architecture
- ☁️ Built for Vertex AI (Claude)
Use ModelMux to standardize on the OpenAI API while keeping full control over your AI backend.
Stop rewriting API glue code. Start muxing.
Features
- 🔌 OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
- 🛠️ Tool/Function Calling: Full support for OpenAI tool calling format
- 📡 Smart Streaming: Server-Sent Events (SSE) with intelligent client detection
- 🎯 Client Detection: Automatically adjusts behavior for IDEs, browsers, and CLI tools
- ⚡ High Performance: Async Rust with Tokio for maximum concurrency
- 🔒 Type Safety: Leverages Rust's type system for compile-time guarantees
- 🔄 Retry Logic: Configurable retry mechanisms with exponential backoff
- 📊 Observability: Structured logging and health monitoring
- 🧩 Clean Architecture: SOLID principles with modular design
- ⚙️ Professional Config: Multi-layered configuration with CLI management tools
Installation
Homebrew (macOS)
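Assuming the formula is published under a SkyCorp Homebrew tap (the exact tap name is an assumption):

```shell
# Tap name is an assumption; substitute the project's actual tap.
brew tap skycorp/modelmux
brew install modelmux
```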
Cargo
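Assuming the crate is published on crates.io:

```shell
# Installs the modelmux binary from crates.io
cargo install modelmux
```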
From Source
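The repository URL is left as a placeholder; substitute the project's actual location:

```shell
git clone <repo-url> modelmux
cd modelmux
cargo build --release
# The binary lands at target/release/modelmux
```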
As a Library
Add to your Cargo.toml:
```toml
[dependencies]
modelmux = "1.0"
```
Quick Start
1. Set up configuration
Use the interactive configuration wizard:
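The wizard is launched with:

```shell
modelmux config init
```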
Or create a configuration file manually. On macOS: ~/Library/Application Support/com.SkyCorp.modelmux/config.toml (or ~/.config/modelmux/config.toml on Linux):
```toml
[server]
port = 3000
log_level = "info"
# Retry key names below are reconstructed and may differ; verify with `modelmux config show`.
retry_enabled = true
max_retries = 3

[auth]
# Path to Google Cloud service account JSON file
service_account_file = "~/Library/Application Support/com.SkyCorp.modelmux/service-account.json"
# Or inline JSON for containers:
# service_account_json = '{"type": "service_account", ...}'

# Section and key names reconstructed from the VERTEX_* environment variables.
[vertex]
# Vertex AI provider - set these OR use env vars (.env supported)
project = "{your-project}"
region = "{your-region}"
location = "{your-region}"
publisher = "anthropic"
model_id = "{your-model}"

[streaming]
mode = "auto" # auto, never, standard, buffered, always
# Buffer/timing key names are reconstructed as well.
buffer_size = 65536
flush_interval_ms = 5000
```
Note: You can also use a .env file or environment variables (VERTEX_PROJECT, VERTEX_REGION, etc.) for provider config.
2. Run ModelMux
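Start the proxy with the installed binary, or straight from a source checkout (commands assumed from a standard Cargo setup):

```shell
modelmux
# or, from a source checkout:
cargo run --release
```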
Homebrew (macOS): Run as a background service with brew services start modelmux (start/stop/restart like PostgreSQL or Redis).
Linux (systemd): Run as a daemon with systemd — see packaging/systemd/README.md.
3. Validate and start
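A sketch of the validate-then-start flow (`config validate` is an assumed subcommand name):

```shell
# Validate your configuration
modelmux config validate
# Start the server
modelmux
```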
4. Send OpenAI-compatible requests
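For example, a standard OpenAI-style request against the local proxy (the model name in the payload is illustrative; ModelMux routes to the configured Vertex AI model):

```shell
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```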
That's it! Your OpenAI code now talks to Vertex AI.
Configuration
ModelMux uses a modern, professional configuration system with multiple sources:
Configuration File (Recommended)
Create ~/.config/modelmux/config.toml:
```toml
# ModelMux Configuration
# Platform-specific locations:
#   Linux:   ~/.config/modelmux/config.toml
#   macOS:   ~/Library/Application Support/modelmux/config.toml
#   Windows: %APPDATA%/modelmux/config.toml

[server]
port = 3000
log_level = "info" # trace, debug, info, warn, error
# Retry key names below are reconstructed and may differ; verify with `modelmux config show`.
retry_enabled = true
max_retries = 3

[auth]
# Recommended: use a service account file
service_account_file = "~/.config/modelmux/service-account.json"
# Alternative: inline JSON (for containers)
# service_account_json = '{"type": "service_account", ...}'

# Section and key names reconstructed from the VERTEX_* environment variables.
[vertex]
# Vertex AI provider (config file OR env vars / .env)
project = "{your-project}"
region = "{your-region}"
location = "{your-region}"
publisher = "{your-publisher}"
model_id = "{your-model}"

[streaming]
mode = "auto" # auto, never, standard, buffered, always
# Buffer/timing key names are reconstructed as well.
buffer_size = 65536
flush_interval_ms = 5000
```
CLI Configuration Commands
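A summary of the configuration subcommands (`validate` is an assumed name; `init`, `show`, and `edit` appear in the release notes):

```shell
modelmux config init       # interactive setup wizard
modelmux config show       # display current configuration
modelmux config validate   # validate configuration
modelmux config edit       # edit configuration file
```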
Environment Variables and .env
Supported for backward compatibility. Place a .env file in your project directory or current working directory:
```shell
# Provider configuration
LLM_PROVIDER=vertex
VERTEX_PROJECT=my-gcp-project
VERTEX_REGION=europe-west1
VERTEX_LOCATION=europe-west1
VERTEX_PUBLISHER=anthropic
VERTEX_MODEL_ID=claude-3-5-sonnet@20241022

# Configuration overrides (use MODELMUX_ prefix)
MODELMUX_SERVER_PORT=3000
MODELMUX_SERVER_LOG_LEVEL=info
MODELMUX_AUTH_SERVICE_ACCOUNT_FILE=/path/to/key.json
```
The .env file is loaded automatically when modelmux starts (from the current working directory).
Streaming Modes
ModelMux intelligently adapts its streaming behavior based on the client:
- `auto` (default): automatically detects client capabilities and chooses the best streaming mode:
  - forces non-streaming for IDEs (RustRover, IntelliJ, VS Code) and CLI tools (goose, curl)
  - uses buffered streaming for web browsers
  - uses standard streaming for API clients
- `never`: forces complete JSON responses for all clients
- `standard`: word-by-word streaming as received from Vertex AI
- `buffered`: accumulates chunks for better client compatibility
Client Detection
ModelMux automatically detects problematic clients:
Non-streaming clients:
- JetBrains IDEs (RustRover, IntelliJ, PyCharm, etc.)
- CLI tools (goose, curl, wget, httpie)
- API testing tools (Postman, Insomnia, Thunder Client)
- Clients that don't accept `text/event-stream`
Buffered streaming clients:
- Web browsers (Chrome, Firefox, Safari, Edge)
- VS Code and similar editors
API Endpoints
Chat Completions
POST /v1/chat/completions
OpenAI-compatible chat completions with full tool calling support.
Models
GET /v1/models
List available models in OpenAI format.
Health Check
GET /health
Service health and metrics endpoint.
Library Usage
Use ModelMux programmatically in your Rust applications:
A minimal sketch (`Config` and `Server` are hypothetical names; consult the crate documentation for the actual API):

```rust
// Hypothetical API sketch; the real modelmux crate items may be named differently.
use modelmux::{Config, Server};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::load()?;       // load file + env configuration
    Server::new(config).run().await     // start the proxy
}
```
Architecture
```
OpenAI Client ──► ModelMux ──► Vertex AI (Claude)
      │              │                │
  OpenAI API ──► Translation ──► Anthropic API
    Format         Layer           Format
```
Core Components:
- `config` — configuration management and environment handling
- `auth` — Google Cloud authentication for Vertex AI
- `server` — HTTP server with intelligent routing
- `converter` — bidirectional format translation
- `error` — comprehensive error types and handling
Project Structure
modelmux/
├── Cargo.toml # Dependencies and metadata
├── README.md # This file
├── LICENSE-MIT # MIT license
├── LICENSE-APACHE # Apache 2.0 license
├── docs/
└── src/
├── main.rs # Application entry point
├── lib.rs # Library interface
├── config.rs # Configuration management
├── auth.rs # Google Cloud authentication
├── error.rs # Error types
├── server.rs # HTTP server and routes
└── converter/ # Format conversion modules
├── mod.rs
├── openai_to_anthropic.rs
└── anthropic_to_openai.rs
Examples
Tool/Function Calling
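A sketch of a tool-calling request in OpenAI format; the `get_weather` function and its schema are purely illustrative:

```shell
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

ModelMux translates the `tools` array into Anthropic's tool format and maps the resulting tool-use blocks back into OpenAI `tool_calls`.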
Streaming Response
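A hedged streaming example; note that in `auto` mode curl is detected as a non-streaming client, so you may need `mode = "standard"` in your streaming config to observe raw SSE output:

```shell
# -N disables curl's buffering so SSE chunks print as they arrive
curl -N http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "claude-3-5-sonnet",
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a story"}]
  }'
```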
Performance
ModelMux is built for production workloads:
- Zero-copy JSON parsing where possible
- Async/await throughout for maximum concurrency
- Connection pooling for upstream requests
- Intelligent buffering for streaming responses
- Memory efficient request/response handling
Comparison with Node.js Version
| Feature | Node.js | ModelMux (Rust) |
|---|---|---|
| Performance | Good | Excellent |
| Memory Usage | Higher | Lower |
| Type Safety | Runtime | Compile-time |
| Error Handling | Try/catch | Result types |
| Concurrency | Event loop | Async/await |
| Startup Time | Fast | Very Fast |
| Binary Size | Large | Small |
Observability
Health Endpoint
Returns service metrics:
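For example (the exact metric fields are not documented here):

```shell
curl http://localhost:3000/health
```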
Logging
Configure log levels via environment:
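For example, using the `MODELMUX_SERVER_LOG_LEVEL` override for a single run:

```shell
MODELMUX_SERVER_LOG_LEVEL=debug modelmux
```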
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributing
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Development
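The standard Cargo workflow applies (assuming no project-specific tooling):

```shell
cargo build                   # debug build
cargo test                    # run the test suite
cargo clippy -- -D warnings   # lint
cargo fmt                     # format
```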
Roadmap
See ROADMAP.md for detailed future plans.
✅ Completed in v1.0.0:
- ✅ Brew services and systemd daemon support
- ✅ .deb packages for Ubuntu/Debian (amd64, arm64)
- ✅ Professional configuration system with TOML files
- ✅ CLI configuration management (`modelmux config init/show/edit`)
Near term:
- Docker container images
- Enhanced metrics and monitoring (Prometheus, OpenTelemetry)
Future:
- Multiple provider support (OpenAI, Anthropic, Cohere, etc.)
- Intelligent request routing and load balancing
- Request/response caching layer
- Web UI for configuration and monitoring
- Advanced analytics and usage insights