llm-edge-agent
High-performance LLM intercepting proxy with intelligent caching, routing, and observability.
Overview
The llm-edge-agent binary is the main executable for the LLM Edge Agent system - an enterprise-grade intercepting proxy for Large Language Model APIs. It sits between your applications and LLM providers (OpenAI, Anthropic, etc.), providing intelligent request routing, multi-tier caching, cost optimization, and comprehensive observability.
Key Features:
- High Performance: 1000+ RPS throughput, <50ms proxy overhead
- Intelligent Caching: Multi-tier (L1 Moka + L2 Redis) with 70%+ hit rates
- Smart Routing: Model-based, cost-optimized, latency-optimized, and failover strategies
- Multi-Provider Support: OpenAI, Anthropic, with easy extensibility
- Enterprise Observability: Prometheus metrics, Grafana dashboards, Jaeger tracing
- Production Ready: Comprehensive testing, security hardening, chaos engineering validated
Installation
From Crates.io (when published)
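Once published to crates.io, installation should be a single Cargo command (a sketch; assumes the crate is published under the name llm-edge-agent):

```bash
# Install the binary from crates.io (published crate name assumed to be llm-edge-agent)
cargo install llm-edge-agent
```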
Building from Source
Clone the repository and build the release binary with Cargo; the executable will be produced at target/release/llm-edge-agent.
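A typical build sequence, assuming a standard Cargo layout (the repository URL is a placeholder):

```bash
# Clone the repository (substitute the real URL)
git clone <repository-url>
cd llm-edge-agent

# Build the optimized release binary
cargo build --release

# The binary will be at: target/release/llm-edge-agent
```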
Using Docker
Pull the published image once it is available, or build the image locally.
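A sketch of both options; the image name and tag are placeholders:

```bash
# Pull the image (when published; image name is a placeholder)
docker pull llm-edge-agent:latest

# Or build locally from the repository root
docker build -t llm-edge-agent:latest .
```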
Quick Start
Prerequisites
At least one LLM provider API key is required:
- OpenAI API key (for GPT models)
- Anthropic API key (for Claude models)
Optional infrastructure:
- Redis 7.0+ (for L2 distributed caching)
- Prometheus (for metrics collection)
- Grafana (for dashboards)
- Jaeger (for distributed tracing)
Configuration
Create a .env file or set environment variables:
# Required: At least one provider API key
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
# Optional: Server configuration
HOST=0.0.0.0
PORT=8080
METRICS_PORT=9090
# Optional: L2 Cache (Redis)
ENABLE_L2_CACHE=true
REDIS_URL=redis://localhost:6379
# Optional: Observability
ENABLE_TRACING=true
ENABLE_METRICS=true
RUST_LOG=info,llm_edge_agent=debug
Running the Binary
Standalone (L1 cache only):
Set at least one provider API key, then run the binary directly.
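A minimal sketch, assuming the release binary built above and an OpenAI key:

```bash
# Set API key
export OPENAI_API_KEY=sk-your-openai-key

# Run the binary
./target/release/llm-edge-agent
```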
With full infrastructure (recommended):
Start the complete stack with Docker Compose; the agent will automatically connect to Redis, Prometheus, and the other services.
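A sketch, assuming the docker-compose.production.yml file in the repository root (see Deployment below):

```bash
# Start the complete stack (agent, Redis, Prometheus, Grafana, Jaeger)
docker compose -f docker-compose.production.yml up -d
```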
Check health:
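With the default configuration, the health endpoint is served on port 8080:

```bash
curl http://localhost:8080/health
```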
Making Your First Request
The proxy exposes an OpenAI-compatible API:
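For example, a standard OpenAI-style chat completion request sent through the proxy (payload fields follow the OpenAI schema; adjust the model and messages as needed):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```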
The response follows the standard OpenAI completion format and includes additional proxy metadata.
Usage
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Server bind address |
| `PORT` | `8080` | HTTP server port |
| `METRICS_PORT` | `9090` | Prometheus metrics port |
| `OPENAI_API_KEY` | - | OpenAI API key (required if using OpenAI) |
| `ANTHROPIC_API_KEY` | - | Anthropic API key (required if using Anthropic) |
| `ENABLE_L2_CACHE` | `false` | Enable Redis L2 cache |
| `REDIS_URL` | - | Redis connection URL |
| `ENABLE_TRACING` | `true` | Enable distributed tracing |
| `ENABLE_METRICS` | `true` | Enable Prometheus metrics |
| `RUST_LOG` | `info` | Logging configuration |
API Endpoints
Main Proxy Endpoint:
- `POST /v1/chat/completions` - OpenAI-compatible chat completions
Health & Monitoring:
- `GET /health` - Detailed system health status
- `GET /health/ready` - Kubernetes readiness probe
- `GET /health/live` - Kubernetes liveness probe
- `GET /metrics` - Prometheus metrics
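The probe endpoints can be checked directly with curl (default port shown):

```bash
curl http://localhost:8080/health/ready   # readiness probe
curl http://localhost:8080/health/live    # liveness probe
```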
Supported Models
OpenAI:
- gpt-4, gpt-4-turbo, gpt-4o
- gpt-3.5-turbo
Anthropic:
- claude-3-opus, claude-3-sonnet, claude-3-haiku
- claude-2.1, claude-2.0
The proxy automatically routes requests to the appropriate provider based on the model name.
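For example, the same endpoint serves both providers; only the model field determines where the request is routed (a minimal sketch):

```bash
# Routed to OpenAI
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]}'

# Routed to Anthropic
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-3-haiku", "messages": [{"role": "user", "content": "Hi"}]}'
```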
Architecture
The binary integrates all LLM Edge Agent components:
llm-edge-agent (binary)
├── HTTP Server (Axum)
│ ├── Request validation
│ ├── Health check endpoints
│ └── Metrics endpoint
│
├── Cache Layer (llm-edge-cache)
│ ├── L1: In-memory (Moka)
│ └── L2: Distributed (Redis)
│
├── Routing Layer (llm-edge-routing)
│ ├── Model-based routing
│ ├── Cost optimization
│ ├── Latency optimization
│ └── Failover support
│
├── Provider Layer (llm-edge-providers)
│ ├── OpenAI adapter
│ └── Anthropic adapter
│
└── Observability (llm-edge-monitoring)
├── Prometheus metrics
├── Distributed tracing
└── Structured logging
Deployment
Docker
Build the image from the repository, then run the container with your provider keys and the proxy and metrics ports exposed.
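A sketch (the image tag is a placeholder; pass whichever provider keys you use):

```bash
# Build image
docker build -t llm-edge-agent:latest .

# Run container, exposing the proxy and metrics ports
docker run -d \
  -p 8080:8080 -p 9090:9090 \
  -e OPENAI_API_KEY=sk-your-openai-key \
  llm-edge-agent:latest
```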
Docker Compose
See docker-compose.production.yml in the repository root for a complete production-ready stack including:
- LLM Edge Agent (3 replicas)
- Redis cluster (3 nodes)
- Prometheus
- Grafana (with pre-built dashboards)
- Jaeger
Kubernetes
Create a namespace, add your provider API keys as a secret, and deploy the manifests.
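A hedged sketch; the namespace name, secret name, and manifest path are placeholders and should be adjusted to match the manifests shipped with the repository:

```bash
# Create namespace (name is a placeholder)
kubectl create namespace llm-edge

# Create secrets from your provider keys
kubectl create secret generic llm-edge-keys \
  --namespace llm-edge \
  --from-literal=OPENAI_API_KEY=sk-your-openai-key \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Deploy (manifest directory is a placeholder)
kubectl apply -f k8s/ --namespace llm-edge
```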
Features:
- HorizontalPodAutoscaler (3-10 replicas)
- Rolling updates (zero downtime)
- Resource limits and requests
- Liveness and readiness probes
Monitoring
Metrics
The binary exposes Prometheus metrics on the configured METRICS_PORT:
Request Metrics:
- `llm_edge_requests_total` - Total request count
- `llm_edge_request_duration_seconds` - Request latency histogram
- `llm_edge_request_errors_total` - Error count by type
Cache Metrics:
- `llm_edge_cache_hits_total{tier="l1|l2"}` - Cache hits
- `llm_edge_cache_misses_total` - Cache misses
- `llm_edge_cache_latency_seconds` - Cache operation latency
Provider Metrics:
- `llm_edge_provider_latency_seconds` - Provider response time
- `llm_edge_provider_errors_total` - Provider errors
- `llm_edge_cost_usd_total` - Cumulative cost tracking
Token Metrics:
- `llm_edge_tokens_used_total` - Token usage by provider/model
- `llm_edge_tokens_prompt_total` - Prompt tokens
- `llm_edge_tokens_completion_total` - Completion tokens
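To inspect any of these locally, scrape the metrics endpoint and filter on the llm_edge_ prefix (default metrics port 9090):

```bash
curl -s http://localhost:9090/metrics | grep '^llm_edge_'
```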
Health Checks
The `GET /health` endpoint returns the detailed system health status described under API Endpoints.
Performance
Benchmarks:
- Throughput: 1000+ requests/second
- Proxy Overhead: <50ms (P95)
- L1 Cache Hit: <100μs
- L2 Cache Hit: 1-2ms
- Memory Usage: <2GB under normal load
Cache Performance:
- Overall Hit Rate: >70%
- L1 Hit Rate: 60-70% (hot data)
- L2 Hit Rate: 10-15% (warm data)
- Cost Savings: 70%+ (cached responses are free)
Documentation
For comprehensive documentation, see the root README:
- Full architecture guide
- Testing documentation
- Infrastructure setup
- Deployment guides
- API reference
Development
This crate can be used both as a binary and as a library:
As a Binary:
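From a source checkout, running the binary is a single Cargo invocation (a sketch; set your provider key as in the Quick Start):

```bash
OPENAI_API_KEY=sk-your-openai-key cargo run --release
```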
As a Library:
Add `llm-edge-agent = "1.0.0"` under `[dependencies]` in your Cargo.toml, then import the crate and drive it from your own async code; see the API reference for the exported types.
Troubleshooting
Provider not available:
Error: No providers configured
Solution: Set at least one API key (OPENAI_API_KEY or ANTHROPIC_API_KEY)
Redis connection failed:
Warning: L2 cache enabled but connection failed
Solution: Verify Redis is running and REDIS_URL is correct. Agent will fall back to L1-only mode.
High latency:
- Check provider health: `curl http://localhost:8080/health`
- Monitor metrics: `curl http://localhost:9090/metrics`
- Review logs: set `RUST_LOG=debug`
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Contributing
See the Contributing Guide in the root repository.