ModelMux - Vertex AI to OpenAI Proxy (Rust)
ModelMux is a production-ready, async Rust proxy that acts as a drop-in replacement for the OpenAI API. It translates OpenAI-compatible requests into Google Vertex AI (Anthropic Claude) calls while preserving streaming, tool/function calling, and error semantics. Designed for performance, safety, and clean architecture, ModelMux is ideal for teams standardizing on OpenAI APIs while running on Vertex AI infrastructure.
🎉 New in v0.6.0: Professional Configuration System
ModelMux now features a professional, industry-standard configuration system:
- 🏗️ Multi-layered configuration: CLI args > env vars > user config > system config > defaults
- 📁 Platform-native directories: XDG-compliant paths on Linux, standard locations on macOS/Windows
- 📝 TOML configuration: Human-readable config files instead of complex environment variables
- 🔒 Secure credential storage: File-based service account storage with proper permissions
- ⚙️ CLI management: modelmux config init, validate, show, and edit commands
- 🔄 Backward compatible: Existing .env configurations continue to work
Quick setup: modelmux config init creates your configuration interactively!
"The internet is like a vast electronic library. But someone has scattered all the books on the floor." — Lao Tzu
What is ModelMux?
ModelMux is a high-performance Rust proxy server that seamlessly converts OpenAI-compatible API requests to Vertex AI (Anthropic Claude) format. Built with Rust Edition 2024 for maximum performance and type safety.
- 🔁 Drop-in OpenAI replacement — zero client changes
- ⚡ High performance — async Rust with Tokio
- 🧠 Full tool/function calling support
- 📡 Streaming (SSE) compatible
- 🛡 Strong typing & clean architecture
- ☁️ Built for Vertex AI (Claude)
Use ModelMux to standardize on the OpenAI API while keeping full control over your AI backend.
Stop rewriting API glue code. Start muxing.
Features
- 🔌 OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
- 🛠️ Tool/Function Calling: Full support for OpenAI tool calling format
- 📡 Smart Streaming: Server-Sent Events (SSE) with intelligent client detection
- 🎯 Client Detection: Automatically adjusts behavior for IDEs, browsers, and CLI tools
- ⚡ High Performance: Async Rust with Tokio for maximum concurrency
- 🔒 Type Safety: Leverages Rust's type system for compile-time guarantees
- 🔄 Retry Logic: Configurable retry mechanisms with exponential backoff
- 📊 Observability: Structured logging and health monitoring
- 🧩 Clean Architecture: SOLID principles with modular design
- ⚙️ Professional Config: Multi-layered configuration with CLI management tools
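The retry behavior listed above can be sketched as exponential backoff. The base delay and doubling schedule below are illustrative, not ModelMux's actual parameters:

```rust
// Illustrative exponential backoff: the delay doubles on each attempt
// from a configurable base (100 ms here is an assumed default).
fn backoff_delay_ms(attempt: u32, base_ms: u64) -> u64 {
    base_ms * 2u64.pow(attempt)
}

fn main() {
    let delays: Vec<u64> = (0..3).map(|a| backoff_delay_ms(a, 100)).collect();
    assert_eq!(delays, vec![100, 200, 400]);
    println!("{delays:?}");
}
```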
Installation
Homebrew (macOS)
Cargo
cargo install modelmux
From Source
As a Library
Add to your Cargo.toml:
[dependencies]
modelmux = "0.6"
Quick Start
1. Set up configuration
Use the interactive configuration wizard:
modelmux config init
Or create a configuration file manually at ~/.config/modelmux/config.toml:
[server]
port = 3000
log_level = "info"
retry_enabled = true
max_retries = 3

[auth]
# Path to Google Cloud service account JSON file
service_account_file = "~/.config/modelmux/service-account.json"
# Or inline JSON for containers:
# service_account_json = '{"type": "service_account", ...}'

[streaming]
mode = "auto" # auto, never, standard, buffered, always
buffer_size = 65536
timeout_ms = 5000
Note: Provider configuration (LLM_PROVIDER, VERTEX_* variables) is still handled via environment variables for backward compatibility.
2. Run ModelMux
modelmux
# or, from a source checkout
cargo run --release
3. Validate and start
# Validate your configuration
modelmux config validate
# Start the server
modelmux
4. Send OpenAI-compatible requests
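A request body for the proxy can be built like any standard OpenAI chat completion. This sketch is illustrative; the model name "claude-sonnet-4" is an assumption, so query GET /v1/models for the IDs your deployment actually exposes:

```rust
// Sketch of the JSON body for POST /v1/chat/completions.
// (The model name is an assumption, not a ModelMux guarantee.)
fn chat_request_body(model: &str, prompt: &str) -> String {
    format!(
        "{{\"model\":\"{model}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{prompt}\"}}],\"stream\":false}}"
    )
}

fn main() {
    let body = chat_request_body("claude-sonnet-4", "Hello!");
    // POST this to http://localhost:3000/v1/chat/completions with
    // Content-Type: application/json, or point any OpenAI SDK at the
    // base URL http://localhost:3000/v1.
    println!("{body}");
}
```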
That's it! Your OpenAI code now talks to Vertex AI.
Configuration
ModelMux uses a modern, professional configuration system with multiple sources:
Configuration File (Recommended)
Create ~/.config/modelmux/config.toml:
# ModelMux Configuration
# Platform-specific locations:
# Linux: ~/.config/modelmux/config.toml
# macOS: ~/Library/Application Support/modelmux/config.toml
# Windows: %APPDATA%/modelmux/config.toml
[server]
port = 3000
log_level = "info" # trace, debug, info, warn, error
retry_enabled = true
max_retries = 3

[auth]
# Recommended: Use service account file
service_account_file = "~/.config/modelmux/service-account.json"
# Alternative: Inline JSON (for containers)
# service_account_json = '{"type": "service_account", ...}'

[streaming]
mode = "auto" # auto, never, standard, buffered, always
buffer_size = 65536
timeout_ms = 5000
CLI Configuration Commands
# Interactive setup wizard
modelmux config init

# Display current configuration
modelmux config show

# Validate configuration
modelmux config validate

# Edit configuration file
modelmux config edit
Environment Variables (Legacy)
Still supported for backward compatibility:
- Provider configuration (LLM_PROVIDER and the VERTEX_* variables) is still required and is read from the environment or a .env file.
- Any config.toml setting can be overridden by an environment variable with the MODELMUX_ prefix.
Streaming Modes
ModelMux intelligently adapts its streaming behavior based on the client:
- auto (default): Automatically detects client capabilities and chooses the best streaming mode
  - Forces non-streaming for IDEs (RustRover, IntelliJ, VS Code) and CLI tools (goose, curl)
  - Uses buffered streaming for web browsers
  - Uses standard streaming for API clients
- never: Forces complete JSON responses for all clients
- standard: Word-by-word streaming as received from Vertex AI
- buffered: Accumulates chunks for better client compatibility
Client Detection
ModelMux automatically detects problematic clients:
Non-streaming clients:
- JetBrains IDEs (RustRover, IntelliJ, PyCharm, etc.)
- CLI tools (goose, curl, wget, httpie)
- API testing tools (Postman, Insomnia, Thunder Client)
- Clients that don't accept text/event-stream
Buffered streaming clients:
- Web browsers (Chrome, Firefox, Safari, Edge)
- VS Code and similar editors
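The detection above amounts to inspecting request headers. Here is a simplified two-way sketch (ModelMux's real heuristics are richer and also pick the buffered mode; the User-Agent substrings are examples drawn from the lists above):

```rust
// Simplified client detection: known non-streaming clients get complete
// JSON responses; everyone else streams only if they accept SSE.
fn wants_sse(user_agent: &str, accept: &str) -> bool {
    let ua = user_agent.to_ascii_lowercase();
    let non_streaming = ["intellij", "rustrover", "goose", "curl", "wget", "postman"];
    if non_streaming.iter().any(|c| ua.contains(*c)) {
        return false;
    }
    accept.contains("text/event-stream")
}

fn main() {
    assert!(!wants_sse("curl/8.4.0", "*/*"));
    assert!(wants_sse("MyApp/1.0", "text/event-stream"));
    println!("client detection ok");
}
```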
API Endpoints
Chat Completions
POST /v1/chat/completions
OpenAI-compatible chat completions with full tool calling support.
Models
GET /v1/models
List available models in OpenAI format.
Health Check
GET /health
Service health and metrics endpoint.
Library Usage
Use ModelMux programmatically in your Rust applications:
// Illustrative sketch: the module names come from "Core Components" below,
// but the exact item names are assumptions, not the published API.
use modelmux::config::Config;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::load()?;          // load layered configuration
    modelmux::server::run(config).await?;  // start the proxy server
    Ok(())
}
Architecture
OpenAI Client ──► ModelMux ──► Vertex AI (Claude)
      │              │                │
 OpenAI API  ──► Translation ──► Anthropic API
   Format           Layer           Format
Core Components:
- config: Configuration management and environment handling
- auth: Google Cloud authentication for Vertex AI
- server: HTTP server with intelligent routing
- converter: Bidirectional format translation
- error: Comprehensive error types and handling
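As one concrete example of what the converter layer must handle (illustrative, not ModelMux's actual code): the Anthropic Messages API has no "system" message role, so OpenAI system messages have to move into a separate top-level field:

```rust
// Map an OpenAI message role onto the Anthropic Messages API.
// Some(role) means a normal message; None means the content belongs
// in Anthropic's top-level `system` field instead.
fn map_role(openai_role: &str) -> Option<&'static str> {
    match openai_role {
        "user" => Some("user"),
        "assistant" => Some("assistant"),
        // OpenAI "system" messages become Anthropic's `system` prompt.
        "system" => None,
        _ => None,
    }
}

fn main() {
    assert_eq!(map_role("user"), Some("user"));
    assert_eq!(map_role("system"), None);
    println!("role mapping ok");
}
```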
Project Structure
modelmux/
├── Cargo.toml # Dependencies and metadata
├── README.md # This file
├── LICENSE-MIT # MIT license
├── LICENSE-APACHE # Apache 2.0 license
├── docs/
└── src/
├── main.rs # Application entry point
├── lib.rs # Library interface
├── config.rs # Configuration management
├── auth.rs # Google Cloud authentication
├── error.rs # Error types
├── server.rs # HTTP server and routes
└── converter/ # Format conversion modules
├── mod.rs
├── openai_to_anthropic.rs
└── anthropic_to_openai.rs
Examples
Tool/Function Calling
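Tool-calling requests use the standard OpenAI tools field, which ModelMux translates into Anthropic tool definitions. This payload is illustrative; the get_weather function and model name are examples, not part of ModelMux:

```rust
// Example OpenAI-format tool-calling request body for
// POST /v1/chat/completions (tool and model names are illustrative).
fn tool_call_body() -> &'static str {
    r#"{
  "model": "claude-sonnet-4",
  "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}"#
}

fn main() {
    println!("{}", tool_call_body());
}
```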
Streaming Response
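Streamed responses use OpenAI's SSE framing: `data: {chunk}` lines terminated by `data: [DONE]`. A minimal parser sketch for such a stream:

```rust
// Extract the JSON payloads from an OpenAI-style SSE stream, stopping
// at the `[DONE]` sentinel.
fn sse_data_lines(raw: &str) -> Vec<&str> {
    raw.lines()
        .filter_map(|line| line.strip_prefix("data: "))
        .take_while(|payload| *payload != "[DONE]")
        .collect()
}

fn main() {
    let raw = "data: {\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}\n\ndata: [DONE]\n";
    let chunks = sse_data_lines(raw);
    assert_eq!(chunks.len(), 1);
    println!("{}", chunks[0]);
}
```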
Performance
ModelMux is built for production workloads:
- Zero-copy JSON parsing where possible
- Async/await throughout for maximum concurrency
- Connection pooling for upstream requests
- Intelligent buffering for streaming responses
- Memory efficient request/response handling
Comparison with Node.js Version
| Feature | Node.js | ModelMux (Rust) |
|---|---|---|
| Performance | Good | Excellent |
| Memory Usage | Higher | Lower |
| Type Safety | Runtime | Compile-time |
| Error Handling | Try/catch | Result types |
| Concurrency | Event loop | Async/await |
| Startup Time | Fast | Very Fast |
| Binary Size | Large | Small |
Observability
Health Endpoint
GET /health returns service metrics as JSON.
Logging
Set log_level in config.toml (trace, debug, info, warn, error), or override it with a MODELMUX_-prefixed environment variable.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributing
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Development
Standard Cargo workflows apply: cargo build, cargo test, cargo fmt, and cargo clippy.
Roadmap
See ROADMAP.md for detailed future plans.
✅ Completed in v0.6.0:
- ✅ Professional configuration system with TOML files
- ✅ Configuration validation tools (modelmux config validate)
- ✅ CLI configuration management (modelmux config init/show/edit)
- ✅ Platform-native configuration directories
- ✅ Secure service account file handling
Near term:
- Docker container images
- Enhanced metrics and monitoring (Prometheus, OpenTelemetry)
Future:
- Multiple provider support (OpenAI, Anthropic, Cohere, etc.)
- Intelligent request routing and load balancing
- Request/response caching layer
- Web UI for configuration and monitoring
- Advanced analytics and usage insights