
Crate edgequake_llm


EdgeQuake LLM - LLM and Embedding Provider Abstraction

§Implements

  • FEAT0017: Multi-Provider LLM Support
  • FEAT0018: Embedding Provider Abstraction
  • FEAT0019: LLM Response Caching
  • FEAT0020: API Rate Limiting
  • FEAT0005: Embedding Generation (via providers)

§Enforces

  • BR0301: LLM API rate limits (configurable per provider)
  • BR0302: Document size limits (context window awareness)
  • BR0303: Cost tracking per request
  • BR0010: Embedding dimension validated (1536 default)
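
BR0010 above says embedding dimensions are validated against a configured default of 1536. A minimal sketch of that check (illustrative only; `validate_embedding` is not the crate's actual API):

```rust
/// Default dimension per BR0010 (e.g. OpenAI text-embedding-ada-002).
const DEFAULT_EMBEDDING_DIM: usize = 1536;

/// Reject an embedding whose dimension does not match the configured value.
fn validate_embedding(embedding: &[f32], expected_dim: usize) -> Result<(), String> {
    if embedding.len() == expected_dim {
        Ok(())
    } else {
        Err(format!(
            "embedding dimension {} does not match expected {}",
            embedding.len(),
            expected_dim
        ))
    }
}
```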

This crate provides traits and implementations for:

  • Text completion (LLM providers)
  • Text embedding (embedding providers)
  • Token counting and management
  • Rate limiting for API calls
  • Response caching for cost reduction
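
The caching idea behind the last bullet can be sketched as a prompt-keyed lookup that serves repeated requests without another API call. This is a conceptual illustration, not the crate's real `LLMCache`/`CachedProvider` types:

```rust
use std::collections::HashMap;

/// Toy response cache keyed by the exact prompt string.
struct ResponseCache {
    entries: HashMap<String, String>,
    hits: u64,
    misses: u64,
}

impl ResponseCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached response for `prompt`, calling `fetch` (the real
    /// provider) only on a miss and storing its result.
    fn get_or_insert_with(&mut self, prompt: &str, fetch: impl FnOnce() -> String) -> String {
        if let Some(cached) = self.entries.get(prompt) {
            self.hits += 1;
            return cached.clone();
        }
        self.misses += 1;
        let response = fetch();
        self.entries.insert(prompt.to_string(), response.clone());
        response
    }
}
```

A second call with the same prompt hits the cache and skips `fetch` entirely, which is where the cost reduction comes from.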

§Providers

Provider       Notes
OpenAI         Primary production provider
Azure OpenAI   Enterprise deployments
Ollama         Local/on-prem models
LM Studio      Local OpenAI-compatible API
Gemini         Google AI
Mock           Testing (no API calls)

§Architecture

The crate uses trait-based abstraction to support multiple LLM backends:

  • OpenAI (GPT-4, GPT-3.5)
  • OpenAI-compatible APIs (Ollama, LM Studio, etc.)
  • Anthropic (Claude 3.5, Claude 3)
  • Mistral, Gemini, xAI, OpenRouter, and others (see the providers module)
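
The shape of that abstraction can be sketched with a simplified trait; the names and signatures here are illustrative, not the crate's exact `LLMProvider` API. Callers depend only on the trait, so backends are interchangeable:

```rust
/// Simplified stand-in for a provider trait: one method, one backend contract.
trait Completion {
    fn complete(&self, prompt: &str) -> String;
}

/// A test backend that never touches the network, in the spirit of MockProvider.
struct MockBackend;

impl Completion for MockBackend {
    fn complete(&self, prompt: &str) -> String {
        format!("mock reply to: {prompt}")
    }
}

/// Caller code is written against `dyn Completion`, not a concrete provider,
/// so swapping OpenAI for Ollama (or a mock) requires no changes here.
fn summarize(provider: &dyn Completion, text: &str) -> String {
    provider.complete(&format!("Summarize: {text}"))
}
```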

§Example

use edgequake_llm::{LLMProvider, OpenAIProvider, Result};

async fn example() -> Result<()> {
    let provider = OpenAIProvider::new("your-api-key");
    let _response = provider.complete("Hello, world!").await?;
    Ok(())
}

Re-exports§

pub use cache::CacheConfig;
pub use cache::CacheStats;
pub use cache::CachedProvider;
pub use cache::LLMCache;
pub use cache_prompt::apply_cache_control;
pub use cache_prompt::parse_cache_stats;
pub use cache_prompt::CachePromptConfig;
pub use cache_prompt::CacheStats as PromptCacheStats;
pub use cost_tracker::format_cost;
pub use cost_tracker::format_tokens;
pub use cost_tracker::CostEntry;
pub use cost_tracker::CostSummary;
pub use cost_tracker::ModelPricing;
pub use cost_tracker::SessionCostTracker;
pub use error::LlmError;
pub use error::Result;
pub use error::RetryStrategy;
pub use factory::ProviderFactory;
pub use factory::ProviderType;
pub use inference_metrics::InferenceMetrics;
pub use middleware::LLMMiddleware;
pub use middleware::LLMMiddlewareStack;
pub use middleware::LLMRequest;
pub use middleware::LogLevel;
pub use middleware::LoggingLLMMiddleware;
pub use middleware::MetricsLLMMiddleware;
pub use middleware::MetricsSummary;
pub use model_config::DefaultsConfig;
pub use model_config::ModelCapabilities;
pub use model_config::ModelCard;
pub use model_config::ModelConfigError;
pub use model_config::ModelCost;
pub use model_config::ModelType;
pub use model_config::ModelsConfig;
pub use model_config::ProviderConfig;
pub use model_config::ProviderType as ConfigProviderType;
pub use providers::azure_openai::AzureOpenAIProvider;
pub use providers::gemini::GeminiProvider;
pub use providers::jina::JinaProvider;
pub use providers::lmstudio::LMStudioProvider;
pub use providers::mock::MockProvider;
pub use providers::ollama::OllamaModelDetails;
pub use providers::ollama::OllamaModelInfo;
pub use providers::ollama::OllamaModelsResponse;
pub use providers::ollama::OllamaProvider;
pub use providers::openai::OpenAIProvider;
pub use providers::mistral::MistralProvider;
pub use providers::anthropic::AnthropicProvider;
pub use providers::openrouter::ModelArchitecture as OpenRouterModelArchitecture;
pub use providers::openrouter::ModelInfo as OpenRouterModelInfo;
pub use providers::openrouter::ModelPricing as OpenRouterModelPricing;
pub use providers::openrouter::ModelsResponse as OpenRouterModelsResponse;
pub use providers::openrouter::OpenRouterProvider;
pub use providers::openai_compatible::OpenAICompatibleProvider;
pub use providers::vscode::Model as CopilotModel;
pub use providers::vscode::ModelsResponse as CopilotModelsResponse;
pub use providers::vscode::VsCodeCopilotProvider;
pub use providers::xai::XAIProvider;
pub use rate_limiter::RateLimitedProvider;
pub use rate_limiter::RateLimiter;
pub use rate_limiter::RateLimiterConfig;
pub use registry::ProviderRegistry;
pub use reranker::BM25Reranker;
pub use reranker::HttpReranker;
pub use reranker::HybridReranker;
pub use reranker::MockReranker;
pub use reranker::RRFReranker;
pub use reranker::RerankConfig;
pub use reranker::RerankResult;
pub use reranker::Reranker;
pub use reranker::ScoreAggregation;
pub use reranker::TermOverlapReranker;
pub use retry::RetryExecutor;
pub use tokenizer::Tokenizer;
pub use traits::CacheControl;
pub use traits::ChatMessage;
pub use traits::ChatRole;
pub use traits::CompletionOptions;
pub use traits::EmbeddingProvider;
pub use traits::FunctionCall;
pub use traits::FunctionDefinition;
pub use traits::ImageData;
pub use traits::LLMProvider;
pub use traits::LLMResponse;
pub use traits::ToolCall;
pub use traits::ToolChoice;
pub use traits::ToolDefinition;
pub use traits::ToolResult;

Modules§

cache
LLM response caching for reducing API costs and latency.
cache_prompt
Prompt caching utilities for Anthropic Claude models.
cost_tracker
Session Cost Tracker
error
LLM error types with retry strategies.
factory
LLM provider factory for environment-based selection.
inference_metrics
Inference Metrics for Real-Time Streaming Display
middleware
LLM Provider Middleware System (OODA-125)
model_config
Model Configuration Module
providers
LLM provider implementations.
rate_limiter
Async-aware rate limiting for LLM API calls.
registry
LLM Provider Registry - Pluggable Provider Management
reranker
Reranking functionality for improved retrieval quality.
retry
Retry executor for LLM operations with exponential backoff.
tokenizer
Token counting utilities.
traits
LLM provider traits for text completion and embedding.
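
The retry module above pairs retries with exponential backoff. A self-contained sketch of such a delay schedule (the constants and function are assumptions, not the `RetryExecutor` API): each attempt doubles the base delay, capped at a maximum.

```rust
/// Compute an exponential-backoff delay schedule: base * 2^attempt, capped.
fn backoff_delays_ms(base_ms: u64, max_ms: u64, attempts: u32) -> Vec<u64> {
    (0..attempts)
        .map(|attempt| (base_ms.saturating_mul(1u64 << attempt)).min(max_ms))
        .collect()
}

// backoff_delays_ms(100, 2000, 5) → [100, 200, 400, 800, 1600]
```

In practice a jitter term is usually added to each delay so that retrying clients do not synchronize against a rate-limited API.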