# litellm-rs

A high-performance Rust library and gateway for calling 100+ LLM APIs in an OpenAI-compatible format.
## Features
- 100+ AI Providers - OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more
- OpenAI-Compatible API - Drop-in replacement for OpenAI SDK
- High Performance - 10,000+ requests/second, <10ms routing overhead
- Intelligent Routing - Load balancing, failover, cost optimization
- Enterprise Ready - Auth, rate limiting, caching, observability
## Quick Start (5 Minutes, API-Only Recommended)

Most users consume this project as a unified API library rather than as a gateway server, so start with API-only mode:

```toml
[dependencies]
litellm-rs = { version = "0.4", default-features = false, features = ["lite"] }
```

When using the crate as a dependency, no `make` step is required.
## Usage

### As a Library (API Integration)

A minimal sketch — the import path, types, and function names below are illustrative reconstructions (the original snippet was lost in formatting), so consult the crate docs for the exact API:

```rust
use litellm::{completion, Message}; // illustrative imports

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Model name selects the provider; messages follow the OpenAI chat format.
    let messages = vec![Message::user("Hello!")];
    let response = completion("gpt-4o", &messages).await?;
    println!("{response:?}");
    Ok(())
}
```
### As a Gateway Server

Run it from the source repository, or install the binary and run it directly.

Notes:

- The `gateway` and `google-gateway` binaries require the `storage` feature at build time.
- Default features include `sqlite`, so a default `cargo run` / `cargo install` satisfies this requirement.
## Installation

```toml
# Full gateway with SQLite + Redis (default)
[dependencies]
litellm-rs = "0.4"

# API-only - lightweight, no actix-web/argon2/aes-gcm/clap
[dependencies]
litellm-rs = { version = "0.4", default-features = false }

# API-only with metrics
[dependencies]
litellm-rs = { version = "0.4", default-features = false, features = ["lite"] }

# Gateway modules in a library context (not the standalone gateway binary runtime)
[dependencies]
litellm-rs = { version = "0.4", default-features = false, features = ["gateway"] }
```
## Supported Providers
| Provider | Chat | Embeddings | Images | Audio |
|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | - | - | - |
| Google (Gemini) | ✅ | ✅ | ✅ | - |
| Azure OpenAI | ✅ | ✅ | ✅ | ✅ |
| AWS Bedrock | ✅ | ✅ | - | - |
| Google Vertex AI | ✅ | ✅ | ✅ | - |
| Groq | ✅ | - | - | ✅ |
| DeepSeek | ✅ | - | - | - |
| Kimi (Moonshot AI) | ✅ | - | - | - |
| GLM (Zhipu AI) | ✅ | - | - | - |
| MiniMax | ✅ | - | - | - |
| Mistral | ✅ | ✅ | - | - |
| Cohere | ✅ | ✅ | - | - |
| OpenRouter | ✅ | - | - | - |
| Together AI | ✅ | ✅ | - | - |
| Fireworks AI | ✅ | ✅ | - | - |
| Perplexity | ✅ | - | - | - |
| Replicate | ✅ | - | ✅ | - |
| Hugging Face | ✅ | ✅ | - | - |
| Ollama | ✅ | ✅ | - | - |
| ...and 80+ more | | | | |
## Environment Variables

```bash
# Provider API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
AZURE_OPENAI_API_KEY=...
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
GROQ_API_KEY=...
DEEPSEEK_API_KEY=...
MOONSHOT_API_KEY=...
ZHIPU_API_KEY=...
MINIMAX_API_KEY=...

# Optional
LITELLM_VERBOSE=true  # Enable verbose logging
```
## Examples

### Multi-Provider Routing

The router picks a provider from the model name. A hedged reconstruction of the original snippet (the exact function signature and model strings are illustrative):

```rust
use litellm::completion; // illustrative import

// Automatically routes to the right provider based on the model name
let openai = completion("gpt-4o", &messages).await?;
let anthropic = completion("claude-3-5-sonnet", &messages).await?;
let google = completion("gemini-1.5-pro", &messages).await?;
let bedrock = completion("bedrock/anthropic.claude-3-sonnet", &messages).await?;
```
### Embeddings

The function names and model string below are illustrative reconstructions:

```rust
use litellm::{embed_text, embedding}; // illustrative imports

// Single text
let embedding = embed_text("text-embedding-3-small", "Hello, world").await?;

// Batch
let embeddings = embedding("text-embedding-3-small", &texts).await?;
```
### Streaming

An illustrative reconstruction (the crate's exact stream API may differ):

```rust
use litellm::completion_stream; // illustrative import
use futures::StreamExt;

let mut stream = completion_stream("gpt-4o", &messages).await?;
while let Some(chunk) = stream.next().await {
    // Handle each streamed chunk as it arrives
    println!("{chunk:?}");
}
```
## Performance
- Throughput: 10,000+ requests/second
- Latency: <10ms routing overhead
- Memory: ~50MB base footprint
- Concurrency: Fully async with Tokio
## Troubleshooting

### Build/test uses too much CPU or memory

- Use the API-only defaults first:
  `cargo test --lib --tests --no-default-features --features "lite"`
- Limit local parallelism when needed:
  `CARGO_BUILD_JOBS=4 cargo test --lib --tests --no-default-features --features "lite" -- --test-threads=4`
- Avoid `--all-features` unless you are doing release/nightly validation.

### I only need provider API aggregation, not the gateway

- Prefer `default-features = false` with `features = ["lite"]`.
- Use the gateway runtime commands only when you need the HTTP server, auth, or storage middleware.
## Documentation

## Contributing

See CONTRIBUTING.md for development setup and guidelines.

## Security

See SECURITY.md for the security policy and vulnerability reporting.

## License

MIT License - see LICENSE for details.

## Acknowledgments

Inspired by LiteLLM (Python).