# ModelMux - Vertex AI to OpenAI Proxy (Rust)
ModelMux is a production-ready, async Rust proxy that acts as a drop-in replacement for the OpenAI API. It translates OpenAI-compatible requests into Google Vertex AI (Anthropic Claude) calls while preserving streaming, tool/function calling, and error semantics. Designed for performance, safety, and clean architecture, ModelMux is ideal for teams standardizing on OpenAI APIs while running on Vertex AI infrastructure.
"The internet is like a vast electronic library. But someone has scattered all the books on the floor." — Lao Tzu
## What is ModelMux?
ModelMux is a high-performance Rust proxy server that seamlessly converts OpenAI-compatible API requests to Vertex AI (Anthropic Claude) format. Built with Rust Edition 2024 for maximum performance and type safety.
- 🔁 Drop-in OpenAI replacement — zero client changes
- ⚡ High performance — async Rust with Tokio
- 🧠 Full tool/function calling support
- 📡 Streaming (SSE) compatible
- 🛡 Strong typing & clean architecture
- ☁️ Built for Vertex AI (Claude)
Use ModelMux to standardize on the OpenAI API while keeping full control over your AI backend.
Stop rewriting API glue code. Start muxing.
## Features
- 🔌 OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
- 🛠️ Tool/Function Calling: Full support for OpenAI tool calling format
- 📡 Smart Streaming: Server-Sent Events (SSE) with intelligent client detection
- 🎯 Client Detection: Automatically adjusts behavior for IDEs, browsers, and CLI tools
- ⚡ High Performance: Async Rust with Tokio for maximum concurrency
- 🔒 Type Safety: Leverages Rust's type system for compile-time guarantees
- 🔄 Retry Logic: Configurable retry mechanisms with exponential backoff
- 📊 Observability: Structured logging and health monitoring
- 🧩 Clean Architecture: SOLID principles with modular design
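To make the retry feature above concrete, here is a minimal sketch of an exponential-backoff schedule. The base delay and cap are assumptions for illustration, not ModelMux's actual values:

```rust
// Illustrative exponential-backoff schedule; base delay (250 ms) and
// cap (8 s) are assumed values, not taken from ModelMux's source.
use std::time::Duration;

fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 250;
    let cap_ms: u64 = 8_000;
    // Double the delay on each attempt, capping the shift to avoid overflow.
    let delay = base_ms.saturating_mul(1u64 << attempt.min(10));
    Duration::from_millis(delay.min(cap_ms))
}

fn main() {
    assert_eq!(backoff_delay(0), Duration::from_millis(250));
    assert_eq!(backoff_delay(1), Duration::from_millis(500));
    assert_eq!(backoff_delay(6), Duration::from_millis(8_000)); // capped
    println!("ok");
}
```

In practice, `MAX_RETRY_ATTEMPTS` (see Configuration) would bound how many times this schedule is consulted.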
## Installation
### Cargo
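Assuming the crate is published to crates.io under the name `modelmux` (an assumption, not stated in this README), installation is a single command:

```sh
cargo install modelmux
```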
### From Source
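A typical from-source build would look like the following; the repository URL is a placeholder, as this README does not give one:

```sh
git clone <repository-url> modelmux
cd modelmux
cargo build --release
# the binary is produced under target/release/
```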
### As a Library
Add to your `Cargo.toml` (crate name assumed to be `modelmux`):

```toml
[dependencies]
modelmux = "0.2"
```
## Quick Start
### 1. Set up your environment

Create a `.env` file:

```env
# Required: Base64-encoded Google Cloud service account key
GCP_SERVICE_ACCOUNT_KEY="your-base64-encoded-key-here"

# Required: Vertex AI configuration
LLM_URL="https://europe-west1-aiplatform.googleapis.com/v1/projects/<your_project>/locations/<your_location>/publishers/"
LLM_CHAT_ENDPOINT="<your_model>:streamRawPredict"
LLM_MODEL="claude-sonnet-4"

# Optional: Server configuration
PORT=3000
LOG_LEVEL=info        # trace, debug, info, warn, error

# Optional: Streaming configuration
STREAMING_MODE=auto   # auto, non-streaming, standard, buffered

# Optional: Retry configuration
ENABLE_RETRIES=true
MAX_RETRY_ATTEMPTS=3
```
### 2. Run ModelMux

```sh
cargo run --release
# or, assuming an installed `modelmux` binary:
modelmux
```
### 3. Send OpenAI-compatible requests
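For example, pointing a standard OpenAI-style request at a locally running instance (port 3000 as configured above; the payload shape follows the public OpenAI Chat Completions format):

```sh
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```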
That's it! Your OpenAI code now talks to Vertex AI.
## Configuration
### Environment Variables

All configuration is read from the environment (or a `.env` file):

```env
# Required Configuration
GCP_SERVICE_ACCOUNT_KEY=...   # Base64-encoded Google Cloud service account key
LLM_URL=...                   # Vertex AI publishers base URL
LLM_CHAT_ENDPOINT=...         # e.g. <your_model>:streamRawPredict
LLM_MODEL=...                 # e.g. claude-sonnet-4

# Optional Configuration
PORT=3000
LOG_LEVEL=info        # trace, debug, info, warn, error
STREAMING_MODE=auto   # auto, non-streaming, standard, buffered
ENABLE_RETRIES=true
MAX_RETRY_ATTEMPTS=3
```
### Streaming Modes

ModelMux intelligently adapts its streaming behavior based on the client:

- `auto` (default): Automatically detects client capabilities and chooses the best streaming mode:
  - Forces non-streaming for IDEs (RustRover, IntelliJ, VS Code) and CLI tools (goose, curl)
  - Uses buffered streaming for web browsers
  - Uses standard streaming for API clients
- `non-streaming`: Forces complete JSON responses for all clients
- `standard`: Word-by-word streaming as received from Vertex AI
- `buffered`: Accumulates chunks for better client compatibility
### Client Detection
ModelMux automatically detects problematic clients:
Non-streaming clients:
- JetBrains IDEs (RustRover, IntelliJ, PyCharm, etc.)
- CLI tools (goose, curl, wget, httpie)
- API testing tools (Postman, Insomnia, Thunder Client)
- Clients that don't accept `text/event-stream`
Buffered streaming clients:
- Web browsers (Chrome, Firefox, Safari, Edge)
- VS Code and similar editors
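The detection heuristic above can be sketched as a small classifier over the `User-Agent` and `Accept` headers. This is an illustrative approximation, not ModelMux's actual implementation; the function and type names are hypothetical:

```rust
// Illustrative client-detection sketch based on the lists above.
// Real detection in ModelMux may use different markers and precedence.

#[derive(Debug, PartialEq)]
enum StreamingMode {
    NonStreaming,
    Buffered,
    Standard,
}

fn detect_mode(user_agent: &str, accepts_event_stream: bool) -> StreamingMode {
    let ua = user_agent.to_ascii_lowercase();
    let non_streaming = [
        "intellij", "rustrover", "pycharm", "goose",
        "curl", "wget", "httpie", "postman", "insomnia",
    ];
    let browsers = ["mozilla", "chrome", "firefox", "safari", "edg"];

    if !accepts_event_stream || non_streaming.iter().any(|m| ua.contains(m)) {
        StreamingMode::NonStreaming
    } else if browsers.iter().any(|m| ua.contains(m)) {
        StreamingMode::Buffered
    } else {
        StreamingMode::Standard
    }
}

fn main() {
    assert_eq!(detect_mode("curl/8.4.0", true), StreamingMode::NonStreaming);
    assert_eq!(detect_mode("Mozilla/5.0 Chrome/120", true), StreamingMode::Buffered);
    assert_eq!(detect_mode("my-api-client/1.0", true), StreamingMode::Standard);
    println!("ok");
}
```

Unknown clients fall through to standard streaming, matching the `auto` mode's default for API clients.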
## API Endpoints
### Chat Completions

`POST /v1/chat/completions`
OpenAI-compatible chat completions with full tool calling support.
### Models

`GET /v1/models`
List available models in OpenAI format.
### Health Check

`GET /health`
Service health and metrics endpoint.
## Library Usage
Use ModelMux programmatically in your Rust applications:
```rust
// Illustrative only: the exact item names depend on modelmux's public API.
use modelmux::{Config, Server};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::from_env()?;
    Ok(Server::new(config).serve().await?)
}
```
## Architecture

```
OpenAI Client ──►   ModelMux    ──► Vertex AI (Claude)
      │                │                  │
 OpenAI API   ──►  Translation  ──►  Anthropic API
   Format             Layer             Format
```
Core Components:

- `config`: Configuration management and environment handling
- `auth`: Google Cloud authentication for Vertex AI
- `server`: HTTP server with intelligent routing
- `converter`: Bidirectional format translation
- `error`: Comprehensive error types and handling
## Project Structure

```
modelmux/
├── Cargo.toml           # Dependencies and metadata
├── README.md            # This file
├── LICENSE-MIT          # MIT license
├── LICENSE-APACHE       # Apache 2.0 license
├── docs/
└── src/
    ├── main.rs          # Application entry point
    ├── lib.rs           # Library interface
    ├── config.rs        # Configuration management
    ├── auth.rs          # Google Cloud authentication
    ├── error.rs         # Error types
    ├── server.rs        # HTTP server and routes
    └── converter/       # Format conversion modules
        ├── mod.rs
        ├── openai_to_anthropic.rs
        └── anthropic_to_openai.rs
```
## Examples
### Tool/Function Calling
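An illustrative tool-calling request against a local instance; the payload follows the public OpenAI tools format, and `get_weather` is a hypothetical tool, not part of ModelMux:

```sh
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

ModelMux translates the `tools` array into the Anthropic tool format before forwarding the request to Vertex AI.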
### Streaming Response
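An illustrative streaming request; `curl -N` disables output buffering, and the `Accept: text/event-stream` header opts into SSE (per the client-detection rules above, plain `curl` would otherwise be forced to non-streaming):

```sh
curl -N http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "claude-sonnet-4",
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a short story."}]
  }'
```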
## Performance
ModelMux is built for production workloads:
- Zero-copy JSON parsing where possible
- Async/await throughout for maximum concurrency
- Connection pooling for upstream requests
- Intelligent buffering for streaming responses
- Memory efficient request/response handling
## Comparison with Node.js Version
| Feature | Node.js | ModelMux (Rust) |
|---|---|---|
| Performance | Good | Excellent |
| Memory Usage | Higher | Lower |
| Type Safety | Runtime | Compile-time |
| Error Handling | Try/catch | Result types |
| Concurrency | Event loop | Async/await |
| Startup Time | Fast | Very Fast |
| Binary Size | Large | Small |
## Observability
### Health Endpoint
Returns service metrics:
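The exact payload depends on the build; an illustrative response (all field names here are assumptions, not ModelMux's actual schema) might look like:

```json
{
  "status": "ok",
  "uptime_seconds": 12345,
  "requests_total": 6789
}
```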
### Logging
Configure log levels via environment:
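For example, raising verbosity for a single run (binary name assumed to be `modelmux`):

```sh
LOG_LEVEL=debug modelmux
```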
## License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Contributing
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
## Development
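A conventional Cargo workflow applies; these are standard Cargo commands, not project-specific tooling:

```sh
cargo build                   # compile in debug mode
cargo test                    # run the test suite
cargo clippy -- -D warnings   # lint, treating warnings as errors
cargo fmt --check             # verify formatting
```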
## Roadmap
See ROADMAP.md for detailed future plans.
Near term:
- Docker container images
- Configuration validation tools
- Enhanced metrics and monitoring
Future:
- Multiple provider support (OpenAI, Anthropic, Cohere, etc.)
- Intelligent request routing and load balancing
- Request/response caching layer
- Web UI for configuration and monitoring
- Advanced analytics and usage insights