EdgeQuake LLM
A unified Rust library providing LLM and embedding provider abstraction with support for multiple backends, intelligent caching, rate limiting, and cost tracking.
Python users: see edgequake-litellm, a drop-in LiteLLM replacement backed by this Rust library.
Features
- 🤖 12 LLM Providers: OpenAI, Anthropic, Gemini, xAI, Mistral AI, OpenRouter, Ollama, LMStudio, HuggingFace, VSCode Copilot, Azure OpenAI, OpenAI Compatible
- 📦 Response Caching: Reduce costs with intelligent caching (memory + persistent)
- ⚡ Rate Limiting: Built-in API rate limit management with exponential backoff
- 💰 Cost Tracking: Session-level cost monitoring and metrics
- 🔄 Retry Logic: Automatic retry with configurable strategies
- 🎯 Reranking: BM25, RRF, and hybrid reranking strategies
- 📊 Observability: OpenTelemetry integration for metrics and tracing
- 🧪 Testing: Mock provider for unit tests
- 🐍 Python Bindings: edgequake-litellm, a LiteLLM-compatible Python package
Quick Start
Add to your Cargo.toml:
```toml
[dependencies]
edgequake-llm = "0.2"   # crate name assumed from the project name
tokio = { version = "1.0", features = ["full"] }
```
Basic Usage
```rust
// Type and method names below are illustrative of the crate's pattern.
use edgequake_llm::OpenAiProvider;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads OPENAI_API_KEY from the environment
    let provider = OpenAiProvider::from_env()?;
    let response = provider.complete("Say hello in one sentence.").await?;
    println!("{}", response.text);
    Ok(())
}
```
Supported Providers
| Provider | Models | Streaming | Embeddings | Tool Use |
|---|---|---|---|---|
| OpenAI | GPT-4, GPT-5 | ✅ | ✅ | ✅ |
| Azure OpenAI | Azure GPT | ✅ | ✅ | ✅ |
| Anthropic | Claude 3+, 4 | ✅ | ❌ | ✅ |
| Gemini | Gemini 2.0+, 3.0 | ✅ | ✅ | ✅ |
| xAI | Grok 2, 3, 4 | ✅ | ❌ | ✅ |
| Mistral AI | Mistral Small/Large, Codestral | ✅ | ✅ | ✅ |
| OpenRouter | 616+ models | ✅ | ❌ | ✅ |
| Ollama | Local models | ✅ | ✅ | ✅ |
| LMStudio | Local models | ✅ | ✅ | ✅ |
| HuggingFace | Open-source | ✅ | ✅ | ⚠️ |
| VSCode Copilot | GitHub models | ✅ | ✅ | ✅ |
| OpenAI Compatible | Custom | ✅ | ✅ | ✅ |
Examples
Multi-Provider Abstraction
```rust
// Trait and type names are illustrative.
use edgequake_llm::LlmProvider;

// Code written against the common trait runs unchanged on any backend.
async fn ask(provider: &dyn LlmProvider, prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
    Ok(provider.complete(prompt).await?.text)
}
```
Response Caching
```rust
// Type names are illustrative.
use edgequake_llm::{CacheConfig, CachedProvider, OpenAiProvider};

let provider = OpenAiProvider::from_env()?;
let cache_config = CacheConfig::default();
let cached = CachedProvider::new(provider, cache_config);
// Subsequent identical requests served from the cache
```
Cost Tracking
```rust
// Type names are illustrative.
use edgequake_llm::SessionCostTracker;

let tracker = SessionCostTracker::new();

// After each completion
tracker.add_completion(&response);

// Get summary
let summary = tracker.summary();
println!("{summary}");
```
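Under the hood, session cost tracking is just arithmetic: accumulate token counts and multiply by per-token prices. A minimal sketch (the prices below are placeholders, not real rates):

```python
from dataclasses import dataclass

@dataclass
class SessionCostTracker:
    # Placeholder prices in USD per 1M tokens; real rates vary by model.
    input_price: float = 2.50
    output_price: float = 10.00
    input_tokens: int = 0
    output_tokens: int = 0

    def add_completion(self, input_tokens: int, output_tokens: int) -> None:
        # Accumulate usage reported by each completion.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total_cost(self) -> float:
        # tokens * price-per-token, with prices quoted per 1M tokens
        return (self.input_tokens * self.input_price
                + self.output_tokens * self.output_price) / 1_000_000
```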
Rate Limiting
```rust
// Type names are illustrative.
use edgequake_llm::{RateLimitedProvider, RateLimiterConfig};

let config = RateLimiterConfig::default();
let limited = RateLimitedProvider::new(provider, config);
// Automatic rate limiting with exponential backoff
```
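The backoff policy itself is easy to sketch: grow the delay exponentially per attempt, cap it, and retry until the budget is exhausted. A minimal illustration (the `RuntimeError` stands in for a 429 rate-limit error; the `sleep` parameter is injectable for testing):

```python
import time

def backoff_delay(attempt: int, base: float = 1.0, factor: float = 2.0,
                  max_delay: float = 30.0) -> float:
    """Capped exponential backoff delay for a 0-based retry attempt."""
    return min(max_delay, base * factor ** attempt)

def call_with_retry(fn, retries: int = 5, sleep=time.sleep):
    """Call fn(), retrying on RuntimeError with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for a rate-limit (HTTP 429) error
            if attempt == retries - 1:
                raise
            sleep(backoff_delay(attempt))
```

Production limiters usually add jitter (a random fraction of the capped delay) so that many clients do not retry in lockstep.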
Provider Setup
OpenAI
```rust
// Type name illustrative; `new` takes an explicit API key.
let provider = OpenAiProvider::new(api_key);
// or read OPENAI_API_KEY from the environment
let provider = OpenAiProvider::from_env()?;
```
Anthropic
```rust
// Reads ANTHROPIC_API_KEY from the environment (type name illustrative)
let provider = AnthropicProvider::from_env()?;
```
Gemini
```rust
// Reads GEMINI_API_KEY from the environment (type name illustrative)
let provider = GeminiProvider::from_env()?;
```
OpenRouter
```rust
// Type name illustrative; `new` takes an OpenRouter API key.
let provider = OpenRouterProvider::new(api_key);
```
Local Providers
```rust
// Type names illustrative.
// Ollama (assumes running on localhost:11434)
let provider = OllamaProvider::new("http://localhost:11434");

// LMStudio (assumes running on localhost:1234)
let provider = LmStudioProvider::new("http://localhost:1234");
```
Advanced Features
OpenTelemetry Integration
Enable with otel feature:
```toml
[dependencies]
edgequake-llm = { version = "0.2", features = ["otel"] }
```

```rust
// Type names are illustrative.
use edgequake_llm::TracingProvider;

let provider = OpenAiProvider::from_env()?;
let traced = TracingProvider::new(provider);
// Automatic span creation following the GenAI semantic conventions
```
Reranking
```rust
// Type names are illustrative.
use edgequake_llm::{Reranker, RerankStrategy};

let reranker = Reranker::new(RerankStrategy::Rrf);
let results = reranker.rerank(&query, &documents).await?;
```
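Of these strategies, Reciprocal Rank Fusion is compact enough to show directly: each document's score is the sum of 1/(k + rank) over the input rankings, with k = 60 as the conventional constant. A minimal sketch, independent of the crate's API:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            # A document ranked highly in any list gets a large contribution.
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks matter, RRF is a natural way to combine heterogeneous retrievers (e.g. BM25 lexical scores with embedding similarity) without normalizing their score scales.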
Documentation
API Documentation
- Rust Docs - Auto-generated API reference
Guides
- Provider Families - Deep comparison of OpenAI vs Anthropic vs Gemini
- Providers Guide - Setup and configuration for all 12 providers
- Architecture - System design and patterns
- Examples - Runnable code examples
Features
- Caching - Response caching strategies
- Cost Tracking - Token usage and cost monitoring
- Rate Limiting - API rate limit handling
- Reranking - BM25, RRF, and hybrid strategies
- Observability - OpenTelemetry integration
Operations
- Performance Tuning - Latency, throughput, cost optimization
- Security - API keys, input validation, privacy best practices
Reference
- Testing - Testing strategies and mock provider
- Migration Guide - Upgrading between versions
- FAQ - Frequently asked questions and troubleshooting
Python Package: edgequake-litellm
edgequake-litellm is a drop-in replacement for LiteLLM backed by this Rust library. It exposes the same Python API as LiteLLM so existing code can migrate with a one-line import change.
Install
Pre-built wheels ship for:
| Platform | Architectures |
|---|---|
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl / Alpine) | x86_64, aarch64 |
| macOS | x86_64, arm64 (Apple Silicon) |
| Windows | x86_64 |
Drop-in migration
```python
# Before
import litellm

# After: one line change
import edgequake_litellm as litellm

# All calls stay identical
response = litellm.completion(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello!"}])
```
Async & streaming
```python
# Async completion
response = await litellm.acompletion(model="gpt-4o-mini", messages=messages)

# Streaming
stream = await litellm.acompletion(model="gpt-4o-mini", messages=messages, stream=True)
async for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
Embeddings
```python
response = litellm.embedding(model="text-embedding-3-small", input=["hello world"])
vector = response.data[0]["embedding"]
```
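Returned embedding vectors are typically compared with cosine similarity; a dependency-free sketch (the inputs below are toy vectors, not real embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```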
Compatibility surface
| LiteLLM feature | edgequake-litellm |
|---|---|
| `completion()` / `acompletion()` | ✅ |
| `embedding()` | ✅ |
| Streaming (`stream=True`) | ✅ |
| `response.choices[0].message.content` | ✅ |
| `response.to_dict()` | ✅ |
| `stream_chunk_builder(chunks)` | ✅ |
| `AuthenticationError`, `RateLimitError`, `NotFoundError` | ✅ |
| `set_verbose`, `drop_params` globals | ✅ |
| `max_completion_tokens`, `seed`, `user`, `timeout` params | ✅ |
Source
The Python package lives in edgequake-litellm/ and is published to PyPI via the python-publish workflow.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
Licensed under the Apache License, Version 2.0 (LICENSE-APACHE).
Credits
Extracted from the EdgeCode project, a Rust coding agent with OODA loop decision framework.