EdgeQuake LLM

A unified Rust library providing LLM and embedding provider abstraction with support for multiple backends, intelligent caching, rate limiting, and cost tracking.

Python users: see edgequake-litellm, a drop-in LiteLLM replacement backed by this Rust library.

Features

  • 🤖 12 LLM Providers: OpenAI, Anthropic, Gemini, xAI, Mistral AI, OpenRouter, Ollama, LMStudio, HuggingFace, VSCode Copilot, Azure OpenAI, OpenAI Compatible
  • 📦 Response Caching: Reduce costs with intelligent caching (memory + persistent)
  • ⚡ Rate Limiting: Built-in API rate limit management with exponential backoff
  • 💰 Cost Tracking: Session-level cost monitoring and metrics
  • 🔄 Retry Logic: Automatic retry with configurable strategies
  • 🎯 Reranking: BM25, RRF, and hybrid reranking strategies
  • 📊 Observability: OpenTelemetry integration for metrics and tracing
  • 🧪 Testing: Mock provider for unit tests
  • 🐍 Python Bindings: edgequake-litellm, a LiteLLM-compatible Python package

Quick Start

Add to your Cargo.toml:

[dependencies]
edgequake-llm = "0.2"
tokio = { version = "1.0", features = ["full"] }

Basic Usage

use edgequake_llm::{OpenAIProvider, LLMProvider, ChatMessage, ChatRole};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize provider
    let provider = OpenAIProvider::new("your-api-key", "gpt-4");

    // Create message
    let messages = vec![
        ChatMessage {
            role: ChatRole::User,
            content: "What is Rust?".to_string(),
            ..Default::default()
        }
    ];

    // Get completion
    let response = provider.complete(&messages, None).await?;
    println!("{}", response.content);

    Ok(())
}
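
Conversations are just longer message vectors passed to the same complete call. A minimal multi-turn sketch; the System and Assistant role variants are assumptions here (only User appears above), so check the API docs for the exact enum:

use edgequake_llm::{OpenAIProvider, LLMProvider, ChatMessage, ChatRole};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = OpenAIProvider::from_env();

    // ChatRole::System and ChatRole::Assistant are assumed variant names.
    let mut messages = vec![
        ChatMessage {
            role: ChatRole::System,
            content: "You are a terse Rust tutor.".to_string(),
            ..Default::default()
        },
        ChatMessage {
            role: ChatRole::User,
            content: "What is ownership?".to_string(),
            ..Default::default()
        },
    ];

    let reply = provider.complete(&messages, None).await?;

    // Push the assistant's reply back in to continue the conversation.
    messages.push(ChatMessage {
        role: ChatRole::Assistant,
        content: reply.content.clone(),
        ..Default::default()
    });
    messages.push(ChatMessage {
        role: ChatRole::User,
        content: "Now compare it to borrowing.".to_string(),
        ..Default::default()
    });

    println!("{}", provider.complete(&messages, None).await?.content);
    Ok(())
}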

Supported Providers

| Provider | Models | Streaming | Embeddings | Tool Use |
|----------|--------|-----------|------------|----------|
| OpenAI | GPT-4, GPT-5 | ✅ | ✅ | ✅ |
| Azure OpenAI | Azure GPT | ✅ | ✅ | ✅ |
| Anthropic | Claude 3+, 4 | ✅ | ❌ | ✅ |
| Gemini | Gemini 2.0+, 3.0 | ✅ | ✅ | ✅ |
| xAI | Grok 2, 3, 4 | ✅ | ❌ | ✅ |
| Mistral AI | Mistral Small/Large, Codestral | ✅ | ✅ | ✅ |
| OpenRouter | 616+ models | ✅ | ❌ | ✅ |
| Ollama | Local models | ✅ | ✅ | ✅ |
| LMStudio | Local models | ✅ | ✅ | ✅ |
| HuggingFace | Open-source | ✅ | ❌ | ⚠️ |
| VSCode Copilot | GitHub models | ✅ | ❌ | ✅ |
| OpenAI Compatible | Custom | ✅ | ✅ | ✅ |

Examples

Multi-Provider Abstraction

use edgequake_llm::{LLMProvider, OpenAIProvider, AnthropicProvider};

async fn try_providers() -> Result<(), Box<dyn std::error::Error>> {
    let providers: Vec<Box<dyn LLMProvider>> = vec![
        Box::new(OpenAIProvider::from_env()),
        Box::new(AnthropicProvider::from_env()),
    ];

    for provider in providers {
        println!("Testing: {}", provider.name());
        // Use provider...
    }

    Ok(())
}
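
Because every boxed provider implements the same trait, the loop body can issue the identical complete call from Basic Usage. A sketch (reusing ChatMessage and ChatRole from above):

use edgequake_llm::{ChatMessage, ChatRole, LLMProvider};

async fn probe(providers: Vec<Box<dyn LLMProvider>>) {
    let messages = vec![ChatMessage {
        role: ChatRole::User,
        content: "Reply with one word: pong".to_string(),
        ..Default::default()
    }];

    for provider in &providers {
        println!("Testing: {}", provider.name());
        // A failure on one backend doesn't stop the sweep.
        match provider.complete(&messages, None).await {
            Ok(response) => println!("  ok: {}", response.content),
            Err(e) => println!("  failed: {e}"),
        }
    }
}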

Response Caching

use edgequake_llm::{OpenAIProvider, CachedProvider, CacheConfig};

let provider = OpenAIProvider::from_env();
let cache_config = CacheConfig {
    ttl_seconds: 3600,  // 1 hour
    max_entries: 1000,
};

let cached = CachedProvider::new(provider, cache_config);
// Subsequent identical requests served from cache

Cost Tracking

use edgequake_llm::SessionCostTracker;

let tracker = SessionCostTracker::new();

// After each completion
tracker.add_completion(
    "openai",
    "gpt-4",
    prompt_tokens,
    completion_tokens,
);

// Get summary
let summary = tracker.summary();
println!("Total cost: ${:.4}", summary.total_cost);

Rate Limiting

use edgequake_llm::{RateLimitedProvider, RateLimiterConfig};

let config = RateLimiterConfig {
    max_requests_per_minute: 60,
    max_tokens_per_minute: 100_000,
};

let limited = RateLimitedProvider::new(provider, config);
// Automatic rate limiting with exponential backoff
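
The wrapper applies backoff for you; as an illustration of what exponential backoff means, here is a hand-rolled sketch built only on the complete call from Basic Usage (it assumes complete takes a message slice plus an options argument and returns a response with a content field; prefer the built-in wrapper in real code):

use std::time::Duration;
use edgequake_llm::{ChatMessage, LLMProvider};

// Retry complete() up to max_retries times, doubling the wait each attempt.
// Sketch only: production code should retry on rate-limit errors specifically.
async fn complete_with_backoff(
    provider: &dyn LLMProvider,
    messages: &[ChatMessage],
    max_retries: u32,
) -> Result<String, Box<dyn std::error::Error>> {
    let mut delay = Duration::from_millis(500);
    for attempt in 0..=max_retries {
        match provider.complete(messages, None).await {
            Ok(response) => return Ok(response.content),
            Err(e) if attempt < max_retries => {
                eprintln!("attempt {attempt} failed: {e}; retrying in {delay:?}");
                tokio::time::sleep(delay).await;
                delay *= 2; // 0.5s, 1s, 2s, ...
            }
            Err(e) => return Err(e.into()),
        }
    }
    unreachable!("the loop always returns before falling through")
}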

Provider Setup

OpenAI

export OPENAI_API_KEY=sk-...
let provider = OpenAIProvider::new("your-key", "gpt-4");
// or
let provider = OpenAIProvider::from_env();

Anthropic

export ANTHROPIC_API_KEY=sk-ant-...
let provider = AnthropicProvider::from_env();

Gemini

export GOOGLE_API_KEY=...
let provider = GeminiProvider::from_env();

OpenRouter

export OPENROUTER_API_KEY=sk-or-v1-...
let provider = OpenRouterProvider::new("your-key");

Local Providers

// Ollama (assumes running on localhost:11434)
let provider = OllamaProvider::new("http://localhost:11434");

// LMStudio (assumes running on localhost:1234)
let provider = LMStudioProvider::new("http://localhost:1234");

Advanced Features

OpenTelemetry Integration

Enable with the otel feature:

edgequake-llm = { version = "0.2", features = ["otel"] }

use edgequake_llm::TracingProvider;

let provider = OpenAIProvider::from_env();
let traced = TracingProvider::new(provider, "my-service");
// Automatic span creation and GenAI semantic conventions

Reranking

use edgequake_llm::{BM25Reranker, Reranker};

let reranker = BM25Reranker::new();
let results = reranker.rerank(query, documents, top_k).await?;
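
For context, RRF (reciprocal rank fusion, listed under Features) merges several ranked lists by scoring each document as the sum of 1/(k + rank) over every list it appears in; k = 60 is the conventional constant. A self-contained sketch of just that formula, independent of the crate's Reranker trait:

use std::collections::HashMap;

// Fuse multiple ranked lists of document ids with reciprocal rank fusion.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // enumerate() is 0-based; the classic formula uses 1-based ranks.
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest fused score first
    fused
}

fn main() {
    let bm25 = vec!["doc_a", "doc_b", "doc_c"];
    let vector = vec!["doc_b", "doc_a", "doc_d"];
    for (doc, score) in rrf_fuse(&[bm25, vector], 60.0) {
        println!("{doc}: {score:.4}");
    }
}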

Documentation

API Documentation

Reference

  • Testing - Testing strategies and mock provider
  • Migration Guide - Upgrading between versions
  • FAQ - Frequently asked questions and troubleshooting

Python Package: edgequake-litellm

edgequake-litellm is a drop-in replacement for LiteLLM backed by this Rust library. It exposes the same Python API as LiteLLM, so existing code can migrate with a one-line import change.

Install

pip install edgequake-litellm

Pre-built wheels ship for:

| Platform | Architectures |
|----------|---------------|
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl / Alpine) | x86_64, aarch64 |
| macOS | x86_64, arm64 (Apple Silicon) |
| Windows | x86_64 |

Drop-in migration

# Before
import litellm

# After: a one-line change
import edgequake_litellm as litellm

# All calls stay identical
response = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Async & streaming

import asyncio
import edgequake_litellm as litellm

async def main():
    # Async completion
    response = await litellm.acompletion(
        model="anthropic/claude-3-5-sonnet-20241022",
        messages=[{"role": "user", "content": "Explain Rust in one sentence."}],
    )
    print(response.choices[0].message.content)

    # Streaming
    stream = await litellm.acompletion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Count to 5"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())

Embeddings

import edgequake_litellm as litellm

response = litellm.embedding(
    model="openai/text-embedding-3-small",
    input=["Hello world", "Rust is fast"],
)
vector = response.data[0].embedding
print(f"Embedding dim: {len(vector)}")

Compatibility surface

| LiteLLM feature | edgequake-litellm |
|-----------------|-------------------|
| completion() / acompletion() | ✅ |
| embedding() | ✅ |
| Streaming (stream=True) | ✅ |
| response.choices[0].message.content | ✅ |
| response.to_dict() | ✅ |
| stream_chunk_builder(chunks) | ✅ |
| AuthenticationError, RateLimitError, NotFoundError | ✅ |
| set_verbose, drop_params globals | ✅ |
| max_completion_tokens, seed, user, timeout params | ✅ |

Source

The Python package lives in edgequake-litellm/ and is published to PyPI via the python-publish workflow.


Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

Licensed under the Apache License, Version 2.0 (LICENSE-APACHE).

Credits

Extracted from the EdgeCode project, a Rust coding agent with an OODA-loop decision framework.