litellm-rs
A high-performance Rust library for unified LLM API access.
litellm-rs provides a simple, consistent interface to interact with multiple AI providers (OpenAI, Anthropic, Google, Azure, and more) through a single, unified API. Built with Rust's performance and safety guarantees, it simplifies multi-provider AI integration in production systems.
See the Usage section below for code examples.
Key Features
- Unified API - Single interface for OpenAI, Anthropic, Google, Azure, and 100+ other providers
- High Performance - Built in Rust with async/await for maximum throughput
- Production Ready - Automatic retries, comprehensive error handling, and provider failover
- Flexible Deployment - Use as a Rust library or deploy as a standalone HTTP gateway
- OpenAI Compatible - Works with existing OpenAI client libraries and tools
Installation
Add this to your Cargo.toml:

    [dependencies]
    litellm-rs = "0.1.0"
    tokio = { version = "1.0", features = ["full"] }
    serde_json = "1.0"  # exact crate name assumed for this entry
Or build from source:
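Building from source follows the usual Cargo workflow. The repository URL is a placeholder below, since it is not given here:

```bash
# Clone the repository (substitute the project's actual Git URL)
git clone <repository-url>
cd litellm-rs

# Build an optimized binary
cargo build --release
```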
Usage
As a Library
Basic Example
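A minimal sketch of a single-provider chat completion. The names LiteLLMClient, ChatRequest, Message, from_env, and chat_completion are illustrative assumptions, not the crate's confirmed interface; check the crate docs for the real types.

```rust
// Hypothetical API sketch -- type and method names are assumptions for
// illustration, not the crate's documented interface.
use litellm_rs::{ChatRequest, LiteLLMClient, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed to pick up OPENAI_API_KEY (and friends) from the environment
    let client = LiteLLMClient::from_env()?;

    let request = ChatRequest::new("gpt-4")
        .message(Message::user("Hello, how are you?"));

    let response = client.chat_completion(request).await?;
    // Response shape assumed to mirror the OpenAI chat-completion format
    println!("{}", response.choices[0].message.content);
    Ok(())
}
```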
Using Multiple Providers
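Because the interface is unified, switching providers should only mean changing the model string. The sketch below reuses the same assumed client API; the model identifiers are examples of what each provider typically accepts.

```rust
// Hypothetical sketch -- same assumed client API as above; only the model
// name changes between providers.
use litellm_rs::{ChatRequest, LiteLLMClient, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = LiteLLMClient::from_env()?;
    // Message is assumed to be cloneable so the same prompt can be reused
    let prompt = Message::user("Summarize the Rust ownership model in one sentence.");

    // Each call targets a different provider purely via the model name
    for model in ["gpt-4", "claude-3-opus-20240229", "gemini-pro"] {
        let response = client
            .chat_completion(ChatRequest::new(model).message(prompt.clone()))
            .await?;
        println!("{model}: {}", response.choices[0].message.content);
    }
    Ok(())
}
```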
Custom Endpoints
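For self-hosted or OpenAI-compatible endpoints, a builder that accepts a custom base URL is the typical pattern. The builder, base_url, and api_key names below are assumptions for illustration, not confirmed methods of this crate.

```rust
// Hypothetical sketch -- builder(), base_url(), and api_key() are assumed
// names for pointing the client at a self-hosted, OpenAI-compatible endpoint.
use litellm_rs::{ChatRequest, LiteLLMClient, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = LiteLLMClient::builder()
        .base_url("http://localhost:11434/v1") // e.g. a local inference server
        .api_key("not-needed-for-local")
        .build()?;

    let response = client
        .chat_completion(ChatRequest::new("llama3").message(Message::user("Hello!")))
        .await?;
    println!("{}", response.choices[0].message.content);
    Ok(())
}
```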
As a Gateway Server
Start the server:

    # Set your API keys
    export OPENAI_API_KEY="sk-..."
    export ANTHROPIC_API_KEY="sk-ant-..."
    # Start the proxy server (invocation assumed; use the binary your build produces)
    cargo run --release
    # Server starts on http://localhost:8000
Make requests:
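Because the gateway speaks the OpenAI wire format, plain curl works. The /v1/chat/completions path and request body below follow the standard OpenAI chat schema and are a sketch, not verified against this project's exact routes.

```bash
# OpenAI GPT-4
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'

# Anthropic Claude
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-opus-20240229",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'
```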
Response (OpenAI Format)
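A representative response in the OpenAI chat-completion format (field values are illustrative, not captured from a real run):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 10, "total_tokens": 19 }
}
```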
Call any model supported by a provider by setting model=<model_name>. See Supported Providers for a complete list.
Streaming (Docs)
litellm-rs supports streaming the model response back; pass stream=true to get a streaming response.
Streaming is supported for all models (OpenAI, Anthropic, Google, Azure, Groq, etc.).
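As a sketch against the gateway (using the same assumed OpenAI-compatible route as above), setting "stream": true returns incremental chunks, typically as server-sent events:

```bash
# Request a streamed response; chunks arrive incrementally instead of one JSON body
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Write a haiku about Rust"}],
    "stream": true
  }'
```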
Supported Providers
- OpenAI - GPT-4, GPT-3.5, DALL-E
- Anthropic - Claude 3 Opus, Sonnet, Haiku
- Google - Gemini Pro, Gemini Flash
- Azure OpenAI - Managed OpenAI deployments
- Groq - High-speed Llama inference
- AWS Bedrock - Claude, Llama, and more
- And 95+ more providers...
Features
- Unified Interface - Single API for 100+ providers
- OpenAI Compatible - Drop-in replacement for OpenAI client
- Streaming Support - Real-time response streaming
- Automatic Retries - Built-in exponential backoff
- Load Balancing - Distribute requests across providers
- Cost Tracking - Monitor spending per request/user
- Function Calling - Tool use across all capable models
- Vision Support - Multimodal inputs for capable models
- Custom Endpoints - Connect to self-hosted models
- Request Caching - Reduce costs with intelligent caching
- Rate Limiting - Protect against quota exhaustion
- Observability - OpenTelemetry tracing and metrics
Configuration
Create a config/gateway.yaml file:

    server:
      host: "0.0.0.0"
      port: 8000

    providers:
      openai:
        api_key: "${OPENAI_API_KEY}"
      anthropic:
        api_key: "${ANTHROPIC_API_KEY}"
      google:
        api_key: "${GOOGLE_API_KEY}"

    router:
      strategy: "round_robin"
      max_retries: 3
      timeout: 60
See config/gateway.yaml.example for a complete example.
Documentation
Performance
| Metric | Value | Notes |
|---|---|---|
| Throughput | 10,000+ req/s | On 8-core CPU |
| Latency | <10ms | Routing overhead |
| Memory | ~50MB | Base footprint |
| Startup | <100ms | Cold start time |
Deployment
Docker
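A minimal sketch, assuming a Dockerfile exists at the repository root; the image tag is a local placeholder, not a published image, and you may also need to mount your config/gateway.yaml depending on how the image is laid out.

```bash
# Build the image locally and run the gateway on port 8000
docker build -t litellm-rs .
docker run -p 8000:8000 \
  -e OPENAI_API_KEY="sk-..." \
  -e ANTHROPIC_API_KEY="sk-ant-..." \
  litellm-rs
```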
Kubernetes
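A minimal Deployment sketch; the image name and the llm-api-keys Secret are placeholders, not artifacts published by this project.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-rs
spec:
  replicas: 2
  selector:
    matchLabels:
      app: litellm-rs
  template:
    metadata:
      labels:
        app: litellm-rs
    spec:
      containers:
        - name: gateway
          image: litellm-rs:latest   # placeholder image
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-api-keys   # placeholder Secret
                  key: openai
```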
Binary
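Release asset names and URLs are not given here, so the commands below use placeholders; the binary name litellm-rs is assumed.

```bash
# Download the latest release (substitute the real release asset URL)
curl -L -o litellm-rs <release-binary-url>
chmod +x litellm-rs

# Run the gateway
./litellm-rs
```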
Contributing
Contributions are welcome! Please read our Contributing Guide for details.

    # Setup
    git clone <repository-url> && cd litellm-rs && cargo build
    # Test
    cargo test
    # Format
    cargo fmt
    # Lint
    cargo clippy
Roadmap
- Core OpenAI-compatible API
- 15+ provider integrations
- Streaming support
- Automatic retries and failover
- Response caching
- WebSocket support
- Plugin system
- Web dashboard
See GitHub Issues for the detailed roadmap.
License
Licensed under the MIT License. See LICENSE for details.
Acknowledgments
Special thanks to the Rust community and all contributors to this project.
Built with Rust for performance and reliability.