EdgeQuake LLM - LLM and Embedding Provider Abstraction
§Implements
- FEAT0017: Multi-Provider LLM Support
- FEAT0018: Embedding Provider Abstraction
- FEAT0019: LLM Response Caching
- FEAT0020: API Rate Limiting
- FEAT0005: Embedding Generation (via providers)
§Enforces
- BR0301: LLM API rate limits (configurable per provider)
- BR0302: Document size limits (context window awareness)
- BR0303: Cost tracking per request
- BR0010: Embedding dimension validated (1536 default)
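As a sketch of how BR0010 could be enforced at a call site, the helper below validates an embedding's dimension before it is stored; the function and its error type are illustrative, not part of this crate's API:

```rust
/// Illustrative BR0010 check: reject vectors whose length differs from
/// the configured dimension (1536 by default for OpenAI-style models).
fn validate_embedding_dim(expected: usize, embedding: &[f32]) -> Result<(), String> {
    if embedding.len() != expected {
        return Err(format!(
            "embedding has {} dimensions, expected {}",
            embedding.len(),
            expected
        ));
    }
    Ok(())
}
```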
This crate provides traits and implementations for:
- Text completion (LLM providers)
- Text embedding (embedding providers)
- Token counting and management
- Rate limiting for API calls
- Response caching for cost reduction
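Rate limiting, for example, composes as a decorator around any provider. The sketch below assumes `RateLimitedProvider::new(inner, config)` and `RateLimiterConfig::default()` exist roughly as named; check the `rate_limiter` module for the actual constructors:

```rust
use edgequake_llm::{LLMProvider, MockProvider, RateLimitedProvider, RateLimiterConfig};

async fn rate_limited_demo() -> edgequake_llm::Result<()> {
    // Wrap any provider; calls wait (asynchronously) once the limit is hit.
    let limited = RateLimitedProvider::new(MockProvider::new(), RateLimiterConfig::default());
    let _response = limited.complete("hello").await?;
    Ok(())
}
```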
§Providers
| Provider | FEAT0017 | Chat | Embeddings | Notes |
|---|---|---|---|---|
| OpenAI | ✓ | ✓ | ✓ | Primary production provider |
| Azure OpenAI | ✓ | ✓ | ✓ | Enterprise deployments |
| Ollama | ✓ | ✓ | ✓ | Local/on-prem models |
| LM Studio | ✓ | ✓ | ✓ | Local OpenAI-compatible API |
| Gemini | ✓ | ✓ | ✓ | Google AI |
| Mock | ✓ | ✓ | ✓ | Testing (no API calls) |
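The Mock row matters for CI: tests can exercise the full `LLMProvider` surface with no network. A sketch, assuming a Tokio test runtime and a no-argument `MockProvider` constructor (see `providers::mock` for its real options):

```rust
use edgequake_llm::{LLMProvider, MockProvider};

#[tokio::test]
async fn completes_without_network() {
    let provider = MockProvider::new(); // assumed constructor; no API calls made
    let response = provider.complete("ping").await.expect("mock should not error");
    let _ = response; // assert on the mock's canned output here
}
```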
§Architecture
The crate uses trait-based abstraction to support multiple LLM backends:
- OpenAI (GPT-4, GPT-3.5)
- OpenAI-compatible APIs (Ollama, LM Studio, etc.)
- Anthropic (Claude 3.5, Claude 3)
- Additional backends (Mistral, OpenRouter, xAI, and more; see the providers module)
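In practice this means application code can hold a trait object and swap backends at runtime. The sketch assumes `LLMProvider` is object-safe (e.g. via `async_trait`), which is not confirmed here:

```rust
use edgequake_llm::{LLMProvider, LLMResponse};

// Accepts any backend behind the trait: OpenAI, Ollama, Anthropic, or the mock.
async fn ask(provider: &dyn LLMProvider, prompt: &str) -> edgequake_llm::Result<LLMResponse> {
    provider.complete(prompt).await
}
```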
§Example
```rust
use edgequake_llm::{LLMProvider, OpenAIProvider};

// Call from within an async context; `?` assumes a function returning
// edgequake_llm::Result.
let provider = OpenAIProvider::new("your-api-key");
let response = provider.complete("Hello, world!").await?;
```

§See Also
- crate::traits for provider trait definitions
- crate::providers for concrete implementations
- crate::cache for response caching
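Response caching (FEAT0019) is also decorator-shaped. This sketch assumes `CachedProvider::new(inner, config)` and `CacheConfig::default()`; treat the names as placeholders and consult the cache module for the real API:

```rust
use edgequake_llm::{CacheConfig, CachedProvider, LLMProvider, MockProvider};

async fn cached_demo() -> edgequake_llm::Result<()> {
    let cached = CachedProvider::new(MockProvider::new(), CacheConfig::default());
    let _first = cached.complete("same prompt").await?;  // hits the provider
    let _second = cached.complete("same prompt").await?; // served from cache
    Ok(())
}
```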
§Re-exports
pub use cache::CacheConfig;
pub use cache::CacheStats;
pub use cache::CachedProvider;
pub use cache::LLMCache;
pub use cache_prompt::apply_cache_control;
pub use cache_prompt::parse_cache_stats;
pub use cache_prompt::CachePromptConfig;
pub use cache_prompt::CacheStats as PromptCacheStats;
pub use cost_tracker::format_cost;
pub use cost_tracker::format_tokens;
pub use cost_tracker::CostEntry;
pub use cost_tracker::CostSummary;
pub use cost_tracker::ModelPricing;
pub use cost_tracker::SessionCostTracker;
pub use error::LlmError;
pub use error::Result;
pub use error::RetryStrategy;
pub use factory::ProviderFactory;
pub use factory::ProviderType;
pub use inference_metrics::InferenceMetrics;
pub use middleware::LLMMiddleware;
pub use middleware::LLMMiddlewareStack;
pub use middleware::LLMRequest;
pub use middleware::LogLevel;
pub use middleware::LoggingLLMMiddleware;
pub use middleware::MetricsLLMMiddleware;
pub use middleware::MetricsSummary;
pub use model_config::DefaultsConfig;
pub use model_config::ModelCapabilities;
pub use model_config::ModelCard;
pub use model_config::ModelConfigError;
pub use model_config::ModelCost;
pub use model_config::ModelType;
pub use model_config::ModelsConfig;
pub use model_config::ProviderConfig;
pub use model_config::ProviderType as ConfigProviderType;
pub use providers::azure_openai::AzureOpenAIProvider;
pub use providers::gemini::GeminiProvider;
pub use providers::jina::JinaProvider;
pub use providers::lmstudio::LMStudioProvider;
pub use providers::mock::MockProvider;
pub use providers::ollama::OllamaModelDetails;
pub use providers::ollama::OllamaModelInfo;
pub use providers::ollama::OllamaModelsResponse;
pub use providers::ollama::OllamaProvider;
pub use providers::openai::OpenAIProvider;
pub use providers::mistral::MistralProvider;
pub use providers::anthropic::AnthropicProvider;
pub use providers::openrouter::ModelArchitecture as OpenRouterModelArchitecture;
pub use providers::openrouter::ModelInfo as OpenRouterModelInfo;
pub use providers::openrouter::ModelPricing as OpenRouterModelPricing;
pub use providers::openrouter::ModelsResponse as OpenRouterModelsResponse;
pub use providers::openrouter::OpenRouterProvider;
pub use providers::openai_compatible::OpenAICompatibleProvider;
pub use providers::vscode::Model as CopilotModel;
pub use providers::vscode::ModelsResponse as CopilotModelsResponse;
pub use providers::vscode::VsCodeCopilotProvider;
pub use providers::xai::XAIProvider;
pub use rate_limiter::RateLimitedProvider;
pub use rate_limiter::RateLimiter;
pub use rate_limiter::RateLimiterConfig;
pub use registry::ProviderRegistry;
pub use reranker::BM25Reranker;
pub use reranker::HttpReranker;
pub use reranker::HybridReranker;
pub use reranker::MockReranker;
pub use reranker::RRFReranker;
pub use reranker::RerankConfig;
pub use reranker::RerankResult;
pub use reranker::Reranker;
pub use reranker::ScoreAggregation;
pub use reranker::TermOverlapReranker;
pub use retry::RetryExecutor;
pub use tokenizer::Tokenizer;
pub use traits::CacheControl;
pub use traits::ChatMessage;
pub use traits::ChatRole;
pub use traits::CompletionOptions;
pub use traits::EmbeddingProvider;
pub use traits::FunctionCall;
pub use traits::FunctionDefinition;
pub use traits::ImageData;
pub use traits::LLMProvider;
pub use traits::LLMResponse;
pub use traits::ToolCall;
pub use traits::ToolChoice;
pub use traits::ToolDefinition;
pub use traits::ToolResult;
§Modules
- cache - LLM response caching for reducing API costs and latency.
- cache_prompt - Prompt caching utilities for Anthropic Claude models.
- cost_tracker - Session Cost Tracker
- error - LLM error types with retry strategies.
- factory - LLM provider factory for environment-based selection.
- inference_metrics - Inference Metrics for Real-Time Streaming Display
- middleware - LLM Provider Middleware System (OODA-125)
- model_config - Model Configuration Module
- providers - LLM provider implementations.
- rate_limiter - Async-aware rate limiting for LLM API calls.
- registry - LLM Provider Registry - Pluggable Provider Management
- reranker - Reranking functionality for improved retrieval quality.
- retry - Retry executor for LLM operations with exponential backoff.
- tokenizer - Token counting utilities.
- traits - LLM provider traits for text completion and embedding.