# OmniLLM

A production-grade Rust library for provider-neutral LLM access with multi-key load balancing, per-key rate limiting, protocol conversion, circuit breaking, and lock-free cost tracking.
## Documentation

The documentation site source lives in the GitHub repository.
## Features
- Canonical Responses + Capability Layer hybrid request/response model
- Additive multi-endpoint API layer with canonical request/response types for generation, embeddings, images, audio, and rerank
- Protocol-aware dispatch for OpenAI Responses, OpenAI Chat Completions, Claude Messages, and Gemini GenerateContent
- Raw JSON and typed transcoders between supported protocols and endpoint families
- Message-level `raw_message` preservation for higher-fidelity round trips
- Embedded provider support registry for OpenAI, Azure OpenAI, Anthropic, Gemini, Vertex AI, Bedrock, and OpenAI-compatible endpoints
- Replay fixture sanitization helpers for safe record/replay style testing
- Multi-key load balancing with per-key rate limiting and circuit breaking
- Lock-free budget tracking with pre-reserve + settle accounting
- Non-streaming `call` and canonical streaming `stream` APIs
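The pre-reserve + settle accounting mentioned above can be illustrated with a self-contained sketch; the `Budget` type, its field names, and the micro-USD unit are illustrative, not the crate's actual API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Illustrative lock-free budget: costs are tracked in micro-USD so a
/// single atomic integer can be used instead of a mutex.
pub struct Budget {
    limit_micro_usd: u64,
    reserved_micro_usd: AtomicU64,
}

impl Budget {
    pub fn new(limit_micro_usd: u64) -> Self {
        Self { limit_micro_usd, reserved_micro_usd: AtomicU64::new(0) }
    }

    /// Pre-reserve a worst-case cost before dispatch; fails fast if the
    /// reservation would push spending past the limit.
    pub fn reserve(&self, estimate: u64) -> bool {
        self.reserved_micro_usd
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |cur| {
                cur.checked_add(estimate).filter(|&n| n <= self.limit_micro_usd)
            })
            .is_ok()
    }

    /// Settle with the actual cost once usage is known, releasing the
    /// unused part of the reservation.
    pub fn settle(&self, estimate: u64, actual: u64) {
        let refund = estimate.saturating_sub(actual);
        self.reserved_micro_usd.fetch_sub(refund, Ordering::SeqCst);
    }

    pub fn spent(&self) -> u64 {
        self.reserved_micro_usd.load(Ordering::SeqCst)
    }
}

fn main() {
    let budget = Budget::new(1_000);
    assert!(budget.reserve(600));  // worst-case estimate accepted
    assert!(!budget.reserve(600)); // a second reservation would overshoot
    budget.settle(600, 250);       // actual cost came in lower
    assert_eq!(budget.spent(), 250);
    assert!(budget.reserve(600));  // the refund made room again
}
```

The point of the pattern is that concurrent callers can never collectively overspend: the reservation is the admission check, and settlement only ever shrinks the reserved amount.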
## Canonical Model
Generation stays centered on the existing Responses API semantic model:
- `LlmRequest` / `LlmResponse` are still the canonical generation types.
- `ApiRequest` / `ApiResponse` add separate canonical types for embeddings, image generations, audio transcriptions, audio speech, and rerank.
- `ConversionReport<T>` makes bridge semantics explicit with `bridged`, `lossy`, and `loss_reasons`.
This keeps generation normalized around "generate one response" while avoiding capability lock-in to any single wire protocol.
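As a rough illustration of that bridge-reporting shape, a `ConversionReport`-style wrapper might look like the following; the field names match the description above, but the constructors are hypothetical:

```rust
/// Illustrative shape of a ConversionReport-style wrapper; the real
/// crate type may differ in fields and methods.
#[derive(Debug)]
pub struct ConversionReport<T> {
    /// The converted value.
    pub value: T,
    /// True when the value had to be bridged across protocol families.
    pub bridged: bool,
    /// True when some source information could not be represented.
    pub lossy: bool,
    /// Notes on what was dropped; empty when the conversion was lossless.
    pub loss_reasons: Vec<String>,
}

impl<T> ConversionReport<T> {
    pub fn lossless(value: T) -> Self {
        Self { value, bridged: false, lossy: false, loss_reasons: Vec::new() }
    }

    pub fn with_loss(value: T, reason: impl Into<String>) -> Self {
        Self { value, bridged: true, lossy: true, loss_reasons: vec![reason.into()] }
    }
}

fn main() {
    let ok = ConversionReport::lossless("same protocol");
    assert!(!ok.lossy && ok.loss_reasons.is_empty());

    let bridged = ConversionReport::with_loss("narrower protocol", "dropped builtin tool");
    assert!(bridged.bridged && bridged.lossy);
    assert_eq!(bridged.loss_reasons, vec!["dropped builtin tool"]);
}
```

Carrying the flags alongside the value means callers can choose per call site whether a lossy bridge is acceptable, instead of the library deciding globally.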
## Endpoint Families
Current typed endpoint coverage:
| Endpoint | Canonical type | Implemented wire formats |
|---|---|---|
| Generation | `LlmRequest` / `LlmResponse` | `open_ai_responses`, `open_ai_chat_completions`, `anthropic_messages`, `gemini_generate_content` |
| Embeddings | `EmbeddingRequest` / `EmbeddingResponse` | `open_ai_embeddings` |
| Image generation | `ImageGenerationRequest` / `ImageGenerationResponse` | `open_ai_image_generations` |
| Audio transcription | `AudioTranscriptionRequest` / `AudioTranscriptionResponse` | `open_ai_audio_transcriptions` |
| Audio speech | `AudioSpeechRequest` / `AudioSpeechResponse` | `open_ai_audio_speech` |
| Rerank | `RerankRequest` / `RerankResponse` | `open_ai_rerank` |
Provider support is exposed through `embedded_provider_registry()`. The registry distinguishes:

- `native`: implemented with provider-native wire format
- `compatible`: OpenAI-compatible or wrapper-style support
- `planned`: listed in the matrix but not yet implemented as a codec/runtime adapter
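The three support tiers can be modeled as a small lookup. The sketch below is a toy registry; the enum, key shape, and entries are illustrative and not the output of `embedded_provider_registry()`:

```rust
use std::collections::HashMap;

/// Illustrative support tiers mirroring the registry's three levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Support {
    Native,
    Compatible,
    Planned,
}

/// Toy registry keyed by (provider, endpoint family); entries are examples.
pub fn toy_registry() -> HashMap<(&'static str, &'static str), Support> {
    HashMap::from([
        (("open_ai", "generation"), Support::Native),
        (("anthropic", "generation"), Support::Native),
        (("open_ai_compatible", "generation"), Support::Compatible),
        (("bedrock", "rerank"), Support::Planned),
    ])
}

fn main() {
    let registry = toy_registry();
    assert_eq!(registry[&("open_ai", "generation")], Support::Native);
    // Planned entries appear in the matrix but have no codec/runtime adapter yet.
    assert_eq!(registry[&("bedrock", "rerank")], Support::Planned);
}
```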
## Quick Start
```rust
// Sketch of a minimal call; module paths, the builder entry point, and
// response accessors are assumptions about the published crate API.
use omnillm::{GatewayBuilder, KeyConfig, LlmRequest};
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let gateway = GatewayBuilder::new()
        .add_key(KeyConfig::new("sk-..."))
        .build()?;

    // Non-streaming `call`; the CancellationToken allows cooperative cancellation.
    let request = LlmRequest::user("Hello!");
    let response = gateway.call(request, CancellationToken::new()).await?;
    println!("{response:?}");
    Ok(())
}
```
## Protocol Transcoding
```rust
// Sketch: the import path and transcode_request argument shape are assumptions.
use omnillm::transcode::{transcode_request, Protocol};

let raw_chat = r#"{
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32
}"#;

// Raw JSON transcoding: Chat Completions in, Responses out.
let raw_responses = transcode_request(
    raw_chat,
    Protocol::OpenAiChatCompletions,
    Protocol::OpenAiResponses,
)?;
```
Typed multi-endpoint transcoding keeps bridge metadata:
```rust
// Sketch: the import path, protocol identifiers, and report fields follow
// the names used elsewhere in this README; exact signatures are assumptions.
use omnillm::transcode::{transcode_api_request, Protocol};

let raw_chat = r#"{
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32
}"#;

let report = transcode_api_request(
    raw_chat,
    Protocol::OpenAiChatCompletions,
    Protocol::OpenAiResponses,
)?;
assert!(report.bridged);
assert!(report.loss_reasons.is_empty());
println!("lossy: {}", report.lossy);
```
If you bridge from the canonical Responses model to a narrower protocol, `loss_reasons` will tell you exactly what was dropped, such as unsupported builtin tools or provider-specific metadata.
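As a self-contained sketch of that behavior, a bridge that drops unsupported builtin tools might collect reasons like this; the function name and reason format are hypothetical:

```rust
/// Keep only tools the target protocol supports and record a reason for
/// each one that was dropped. Returns (kept tools, loss reasons).
pub fn bridge_tools(tools: &[&str], supported: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut kept = Vec::new();
    let mut loss_reasons = Vec::new();
    for tool in tools {
        if supported.contains(tool) {
            kept.push(tool.to_string());
        } else {
            loss_reasons.push(format!("unsupported builtin tool: {tool}"));
        }
    }
    (kept, loss_reasons)
}

fn main() {
    // The narrower protocol only understands plain function calls here.
    let (kept, reasons) =
        bridge_tools(&["function_call", "web_search"], &["function_call"]);
    assert_eq!(kept, vec!["function_call"]);
    assert_eq!(reasons, vec!["unsupported builtin tool: web_search"]);
}
```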
## Multi-Endpoint API
```rust
// Sketch: constructor shapes, the transport body variant, and the method
// field are assumptions; emit_transport_request and the types are from this README.
use omnillm::{emit_transport_request, ApiRequest, EmbeddingRequest, TransportBody};

let request = ApiRequest::Embeddings(EmbeddingRequest::new(
    "text-embedding-3-small",
    vec!["hello world".to_string()],
));

let transport = emit_transport_request(&request, "open_ai_embeddings")?;
assert_eq!(transport.value.method, "POST");
if let TransportBody::Json(body) = &transport.value.body {
    println!("wire body: {body}");
}
```
Local demo:
## Replay Sanitization
`ReplayFixture`, `sanitize_transport_request`, `sanitize_transport_response`, and `sanitize_json_value` are intended for record/replay tests. They redact common secrets by default:

- auth headers
- query tokens such as `ak`
- JSON fields such as `api_key`, `token`, `secret`
- large binary/base64 payload fields
```rust
// Sketch: TransportRequest field names and the redaction placeholder are assumptions.
use omnillm::{sanitize_transport_request, TransportRequest};
use serde_json::json;

let request = TransportRequest {
    url: "https://api.example.com/v1/chat?ak=secret-token".into(),
    headers: vec![("authorization".into(), "Bearer sk-live".into())],
    body: json!({ "api_key": "sk-live", "input": "hello" }),
};

let sanitized = sanitize_transport_request(&request);
assert_eq!(sanitized.headers[0].1, "[REDACTED]");
```
## Live Responses Demo
Optional live test:
The live demo and live tests read all endpoint configuration from environment variables or a local ignored .env file. See .env.example.
## Gateway Builder
```rust
// Sketch: the builder entry point, config types, and argument values are
// assumptions; the chained method names come from the crate's builder.
use std::time::Duration;
use omnillm::{GatewayBuilder, KeyConfig, PoolConfig};

let gateway = GatewayBuilder::new()
    .add_key(KeyConfig::new("sk-primary"))
    .add_key(KeyConfig::new("sk-secondary"))
    .budget_limit_usd(10.0)
    .pool_config(PoolConfig::default())
    .request_timeout(Duration::from_secs(30))
    .build()
    .expect("gateway configuration should be valid");
```
## Observability
```rust
// Sketch: per-key status field names are assumptions.
for status in gateway.pool_status() {
    println!(
        "key={} circuit={:?} in_flight={}",
        status.key_id, status.circuit_state, status.in_flight
    );
}
```