anyllm_proxy
HTTP proxy that accepts Anthropic Messages API and OpenAI Chat Completions requests, translates between formats, and forwards to any supported backend (OpenAI, Anthropic, Azure, Vertex, Gemini, Bedrock, or any of the ~90 OpenAI-compatible providers in the catalog). Ships with an optional admin UI, virtual key management, response caching, cost tracking, and a Batch API.
This is the top-level binary crate in the anyllm-proxy workspace.
Where it fits
Five-crate workspace:
anyllm_translate- pure format mapping, no I/O.anyllm_providers- provider and model catalog (~90 providers).anyllm_client- async HTTP client wrapping translate + transport.anyllm_batch_engine- batch job queue and worker pool.anyllm_proxy(this crate) - the axum HTTP server that wires everything together.
If you only need format translation in your own Rust code, depend on anyllm_translate or anyllm_client directly. Use this crate when you want a standalone server with config files, key management, an admin UI, metrics, and a batch surface.
Run it
OPENAI_API_KEY=sk-...
# Listens on 0.0.0.0:3000, health at GET /health
With the admin UI on a separate port (3001 by default):
OPENAI_API_KEY=sk-...
Switch backends via BACKEND=:
GROQ_API_KEY=... BACKEND=groq
ANTHROPIC_API_KEY=... BACKEND=anthropic
Any provider id from anyllm_providers is accepted as a BACKEND value. Full env var reference lives in docs/ENV.md.
Cargo features
| Feature | What it adds |
|---|---|
redis |
Distributed rate limiting via Redis sorted sets. |
qdrant |
Optional semantic response cache backed by Qdrant. |
otel |
OpenTelemetry OTLP trace export. |
dangerous-builtin-tools |
Enables BashTool and ReadFileTool in the builtin tool registry. Do not enable in production without sandboxing. |
Modules
| Module | Purpose |
|---|---|
server |
Axum HTTP server: routes, middleware (auth, request ID, size/concurrency limits), SSE streaming. |
admin |
Localhost-only admin server: config management, request log, WebSocket live updates. |
backend |
Backend HTTP clients for OpenAI, Anthropic, Azure, Vertex, Gemini, Bedrock. |
config |
Env-based config, TLS client cert setup, URL validation. |
batch |
OpenAI-compatible Batch API surface (wraps anyllm_batch_engine). |
cache |
Response caching: in-memory (moka) plus optional Redis tier. |
callbacks |
Webhook callbacks for request completion. |
cost |
Per-request cost tracking and model pricing. |
env_parser |
Pure env-file parser used by startup bootstrap and the admin import endpoint. |
fallback |
Backend fallback chains for transparent failover. |
integrations |
Named integration registry (Langfuse, etc.). |
metrics |
Request count and success/error tracking, exposed via GET /metrics. |
ratelimit |
Distributed rate limiting (requires redis feature). |
tools |
Builtin server-side tool registry. |
otel |
OpenTelemetry OTLP export (requires otel feature). |
Install
Published to crates.io alongside the rest of the workspace:
Docker images are published as followthewhit3rabbit/anyllm-proxy. A Debian package can be built with cargo deb -p anyllm_proxy.
Library examples
anyllm_proxy is primarily a binary, but several modules are usable as a library when you are building your own server, tool, or admin surface and want to reuse the proxy's well-tested pieces instead of rewriting them. If you only need format mapping or a client, prefer anyllm_translate or anyllm_client directly.
1. Parse a .env file without touching the process environment
env_parser is pure: it does not read files, mutate std::env, or call set_var. It accepts a string and returns parsed pairs plus warnings, which is what you want for editor integrations, config previewers, or import endpoints.
use ;
let raw = "OPENAI_API_KEY=sk-...\nBACKEND=groq\n# comment\nGROQ_API_KEY=gsk_...";
let result = parse_env_content;
if !result.hard_errors.is_empty
for pair in &result.pairs
for warn in &result.warnings
// Round-trip safely back to env file syntax (handles quoting/escaping).
let line = format!;
hard_errors being non-empty means the caller must not apply pairs. Warnings are informational.
2. Compute USD cost from token usage
The cost module embeds the LiteLLM pricing table at compile time. Use it to attach cost numbers to your own request logs without re-hosting the JSON.
use ModelPricing;
let pricing = load;
if let Some = pricing.price_for_model
let total_usd = pricing.cost_for_usage;
println!;
To override at runtime (for example, custom enterprise pricing), point MODEL_PRICING_FILE at a JSON file with the same shape, then call ModelPricing::load_with_optional_override(Some(path)).
3. Reuse cache keys and fallback config parsing
Two small, stable pieces that are useful even outside the proxy:
use ;
use parse_fallback_config;
// Deterministic cache key for an OpenAI-shaped request body.
let body = json!;
let key = cache_key_for_request;
// Parse the proxy's fallback YAML format.
let yaml = r#"
fallback_chains:
default:
- name: azure
env_prefix: AZURE_FALLBACK_
- name: openai
env_prefix: OPENAI_FALLBACK_
"#;
let cfg = parse_fallback_config?;
Larger pieces (MemoryCache, RedisCache, SemanticCache, FallbackChain, Metrics) are public but are designed around the proxy's own state types. If you need them in a different server, expect to pull more of the surrounding wiring from crates/proxy/src/server/ rather than treating each as a standalone helper.
4. Embed the batch API surface
The proxy's batch module wraps anyllm_batch_engine with OpenAI-compatible routes. If you want the same /v1/batches and /v1/files HTTP shape inside your own server, mount the batch handlers and provide a BatchEngine in shared state. For a more decoupled integration, depend on anyllm_batch_engine directly (see that crate's README for examples).
5. Run as a sidecar instead of linking
If you do not need to embed proxy modules into your own binary, the simplest "library use" is to run anyllm_proxy as a sidecar on localhost:3000 and have your application speak Anthropic Messages or OpenAI Chat Completions to it. This is the supported path for non-Rust clients.
Documentation
- Full architecture:
docs/proxy-architecture.md. - Env vars:
docs/ENV.md. - Endpoint inventory:
docs/ENDPOINTS.md. - Config files:
docs/CONFIG.md.
Tests
The full workspace suite is roughly 1100 tests; ten are #[ignore] because they hit live APIs and need real keys.