OpenModex Rust SDK

The official Rust SDK for the OpenModex AI Gateway API. Access 100+ LLMs from OpenAI, Anthropic, Google, DeepSeek, Mistral, and Qwen through a single unified API with intelligent routing, automatic fallbacks, and built-in cost tracking.

Features

  • Unified API -- One client for all major LLM providers
  • Smart Routing -- Automatic model selection optimized for cost, latency, or quality
  • Client-Side Fallbacks -- Automatic retry with backup models on failure
  • Streaming -- First-class SSE streaming via futures::Stream
  • Async/Await -- Built on tokio + reqwest for high-performance async I/O
  • Type Safe -- Strongly typed request/response structs with serde
  • Builder Pattern -- Ergonomic request construction
  • Automatic Retries -- Exponential backoff on 429/5xx errors
  • OpenAI Compatible -- Drop-in replacement by changing base_url

Requirements

  • Rust 1.70+ (edition 2021)

Installation

Add to your Cargo.toml:

[dependencies]
openmodex = "0.1"
tokio = { version = "1", features = ["full"] }
futures = "0.3"  # Only needed for streaming

Quick Start

use openmodex::{OpenModex, ChatCompletionRequest, ChatMessage};

#[tokio::main]
async fn main() -> Result<(), openmodex::Error> {
    let client = OpenModex::new("omx_sk_...")?;

    let response = client.chat().completions().create(
        ChatCompletionRequest::new("gpt-4o")
            .message(ChatMessage::user("What is OpenModex?"))
    ).await?;

    let content = response.choices[0]
        .message.as_ref()
        .and_then(|m| m.content.as_deref())
        .unwrap_or("");
    println!("{content}");

    Ok(())
}

Or use the OPENMODEX_API_KEY environment variable:

let client = OpenModex::from_env()?;

Usage

Chat Completions

let response = client.chat().completions().create(
    ChatCompletionRequest::new("claude-3-5-sonnet")
        .message(ChatMessage::system("You are a helpful assistant."))
        .message(ChatMessage::user("Explain quantum computing in simple terms."))
        .temperature(0.7)
        .max_tokens(1000)
).await?;

println!("{}", response.choices[0]
    .message.as_ref()
    .and_then(|m| m.content.as_deref())
    .unwrap_or(""));

Streaming

use futures::StreamExt;

let mut stream = client.chat().completions().create_stream(
    ChatCompletionRequest::new("gpt-4o")
        .message(ChatMessage::user("Write a short story."))
        .temperature(0.9)
        .max_tokens(512)
).await?;

while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(content) = chunk.choices.first()
        .and_then(|c| c.delta.content.as_ref())
    {
        print!("{content}");
    }
}
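
To keep the full text as well as the incremental output, accumulate the deltas as they arrive. A small sketch reusing the stream API above:

let mut full_text = String::new();
let mut stream = client.chat().completions().create_stream(
    ChatCompletionRequest::new("gpt-4o")
        .message(ChatMessage::user("Write a short story."))
).await?;

while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(content) = chunk.choices.first()
        .and_then(|c| c.delta.content.as_ref())
    {
        print!("{content}");          // stream to stdout as before
        full_text.push_str(content);  // and keep the complete text
    }
}
println!("\n--- {} characters total ---", full_text.len());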

Smart Routing (OpenModex Extension)

Let the gateway pick the best model or optimize for cost/latency:

use openmodex::RoutingConfig;

let response = client.chat().completions().create(
    ChatCompletionRequest::new("gpt-4o")
        .message(ChatMessage::user("Hello!"))
        .routing(RoutingConfig {
            strategy: Some("cost_optimized".into()),
            fallback: Some(vec!["claude-3-5-sonnet".into()]),
            allow_upgrade: Some(true),
        })
).await?;

OpenModex Metadata

Every response includes OpenModex-specific metadata:

let response = client.chat().completions().create(
    ChatCompletionRequest::new("gpt-4o")
        .message(ChatMessage::user("Hello!"))
).await?;

if let Some(meta) = &response.openmodex {
    println!("Provider: {}", meta.provider);
    println!("Model used: {}", meta.model_used);
    println!("Cache hit: {}", meta.cache_hit);
    println!("Routing: {}", meta.routing_strategy);
    println!("Latency: {}ms", meta.latency_ms);
    println!("Request ID: {}", meta.request_id);
}

Cache Control

use openmodex::CacheConfig;

let response = client.chat().completions().create(
    ChatCompletionRequest::new("gpt-4o")
        .message(ChatMessage::user("What is 2+2?"))
        .cache(CacheConfig {
            enabled: Some(true),
            ttl: Some(3600), // cache TTL, in seconds (one hour)
        })
).await?;
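
With caching enabled, a repeated identical request can be served from the gateway's cache, and the metadata from the previous section shows whether that happened. A sketch (whether the second call actually hits depends on the gateway's cache keying and TTL):

// First call warms the cache
client.chat().completions().create(
    ChatCompletionRequest::new("gpt-4o")
        .message(ChatMessage::user("What is 2+2?"))
        .cache(CacheConfig { enabled: Some(true), ttl: Some(3600) })
).await?;

// An identical request should now be answerable from cache
let response = client.chat().completions().create(
    ChatCompletionRequest::new("gpt-4o")
        .message(ChatMessage::user("What is 2+2?"))
        .cache(CacheConfig { enabled: Some(true), ttl: Some(3600) })
).await?;

if let Some(meta) = &response.openmodex {
    println!("Cache hit: {}", meta.cache_hit);
}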

Client-Side Fallbacks

Automatically retry with backup models on failure:

let client = OpenModex::builder()
    .api_key("omx_sk_...")
    .fallback_models(vec![
        "gpt-4o".into(),
        "claude-3-5-sonnet".into(),
        "gemini-1.5-pro".into(),
    ])
    .build()?;

// If gpt-4o fails (5xx/timeout), automatically tries the next model
let response = client.chat().completions().create(
    ChatCompletionRequest::new("gpt-4o")
        .message(ChatMessage::user("Hello!"))
).await?;
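
After a fallback, the metadata fields shown earlier reveal which model actually served the request:

if let Some(meta) = &response.openmodex {
    println!("Served by {} from provider {}", meta.model_used, meta.provider);
}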

Embeddings

use openmodex::EmbeddingRequest;

let response = client.embeddings().create(
    EmbeddingRequest::new(
        "text-embedding-3-small",
        "The quick brown fox jumps over the lazy dog.",
    )
).await?;

println!("Dimensions: {}", response.data[0].embedding.len());
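
Embedding vectors are typically compared with cosine similarity. A minimal sketch, assuming each embedding field is a Vec<f32> (as the .len() call above suggests):

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

let a = client.embeddings().create(
    EmbeddingRequest::new("text-embedding-3-small", "The cat sat on the mat.")
).await?;
let b = client.embeddings().create(
    EmbeddingRequest::new("text-embedding-3-small", "A feline rested on the rug.")
).await?;

println!("Similarity: {:.3}", cosine_similarity(
    &a.data[0].embedding,
    &b.data[0].embedding,
));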

Models

// List all available models
let models = client.models().list().await?;
for m in &models.data {
    println!("{} ({})", m.id, m.provider);
}

// Get a specific model
let model = client.models().get("openai/gpt-4o").await?;
println!("{}: {}", model.name, model.description);

// Compare models side by side
let comparison = client.models()
    .compare(&["openai/gpt-4o", "anthropic/claude-3-5-sonnet"])
    .await?;
if let Some(highlights) = &comparison.highlights {
    println!("Cheapest: {}", highlights.cheapest);
    println!("Best quality: {}", highlights.best_quality);
}

Legacy Completions

use openmodex::CompletionRequest;

let response = client.completions().create(
    CompletionRequest::new("gpt-3.5-turbo-instruct", "Once upon a time")
        .max_tokens(100)
).await?;

println!("{}", response.choices[0].text);

Error Handling

use openmodex::{Error, ApiError};

match client.chat().completions().create(req).await {
    Ok(response) => println!("{:?}", response),
    Err(Error::Api(e)) => {
        println!("API error: {} (status: {})", e.message, e.status_code);
        if e.is_rate_limited() {
            println!("Rate limited -- back off and retry");
        }
        if e.is_auth_error() {
            println!("Check your API key");
        }
    }
    Err(e) => println!("Other error: {e}"),
}
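
The client already retries 429/5xx responses automatically (see max_retries under Configuration below), but is_rate_limited() also makes an explicit backoff loop easy to write. A minimal sketch, assuming ChatCompletionRequest implements Clone:

use std::time::Duration;
use openmodex::Error;

let mut attempt: u32 = 0;
let response = loop {
    match client.chat().completions().create(req.clone()).await {
        Ok(resp) => break resp,
        // Back off only on rate limits, up to three extra attempts
        Err(Error::Api(e)) if e.is_rate_limited() && attempt < 3 => {
            // Exponential backoff: 1s, 2s, 4s
            tokio::time::sleep(Duration::from_secs(1u64 << attempt)).await;
            attempt += 1;
        }
        Err(e) => return Err(e),
    }
};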

Configuration

Method                       Description                              Default
.api_key(key)                Your OpenModex API key                   OPENMODEX_API_KEY env var
.base_url(url)               API base URL                             https://api.openmodex.com/v1
.timeout(duration)           Request timeout                          30s
.max_retries(n)              Max retry attempts on transient errors   2
.default_model(model)        Default model when none specified        None
.fallback_models(models)     Ordered fallback model chain             []
.default_headers(headers)    Headers sent with every request          {}

All options are set via ClientBuilder:

use std::time::Duration;

let client = OpenModex::builder()
    .api_key("omx_sk_...")
    .base_url("https://api.openmodex.com/v1")
    .timeout(Duration::from_secs(60))
    .max_retries(3)
    .default_model("gpt-4o")
    .fallback_models(vec!["claude-3-5-sonnet".into()])
    .build()?;

OpenAI SDK Compatibility

The OpenModex gateway exposes an OpenAI-compatible API. If you are already using an OpenAI-compatible Rust crate, you can route through OpenModex by changing just the base URL:

// With the `async-openai` crate
let config = async_openai::config::OpenAIConfig::new()
    .with_api_base("https://api.openmodex.com/v1")
    .with_api_key("omx_sk_live_...");

let client = async_openai::Client::with_config(config);

Examples

See the examples/ directory for runnable examples:

# Set your API key
export OPENMODEX_API_KEY="omx_sk_..."

# Run examples
cargo run --example quickstart
cargo run --example streaming
cargo run --example models
cargo run --example fallback

License

MIT