anyllm_proxy

HTTP proxy that accepts Anthropic Messages API and OpenAI Chat Completions requests, translates between formats, and forwards to any supported backend (OpenAI, Anthropic, Azure, Vertex, Gemini, Bedrock, or any of the ~90 OpenAI-compatible providers in the catalog). Ships with an optional admin UI, virtual key management, response caching, cost tracking, and a Batch API.
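To make "translates between formats" concrete, here is a minimal, text-only sketch of one direction of the mapping. Anthropic's Messages API carries the system prompt as a top-level system field and requires max_tokens. OpenAI Chat Completions expects the system prompt as the first entry in messages. The struct and function names (ChatMessage, AnthropicRequest, OpenAiRequest, anthropic_to_openai) are hypothetical and are not the anyllm_translate API, which also handles tools, images, streaming, and model remapping.

```rust
/// Hypothetical, simplified text-only message shared by both shapes.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Anthropic Messages: system prompt is a top-level field, max_tokens is required.
#[derive(Debug, Clone)]
pub struct AnthropicRequest {
    pub model: String,
    pub system: Option<String>,
    pub messages: Vec<ChatMessage>,
    pub max_tokens: u32,
}

/// OpenAI Chat Completions: system prompt travels as the first message.
#[derive(Debug, Clone)]
pub struct OpenAiRequest {
    pub model: String,
    pub messages: Vec<ChatMessage>,
    pub max_tokens: Option<u32>,
}

/// Map an Anthropic-shaped request to an OpenAI-shaped one (sketch only).
pub fn anthropic_to_openai(req: &AnthropicRequest) -> OpenAiRequest {
    let mut messages = Vec::with_capacity(req.messages.len() + 1);
    // Hoist the top-level system prompt into a leading "system" message.
    if let Some(system) = &req.system {
        messages.push(ChatMessage {
            role: "system".into(),
            content: system.clone(),
        });
    }
    // user/assistant turns carry over unchanged in this text-only sketch.
    messages.extend(req.messages.iter().cloned());
    OpenAiRequest {
        // The model name is passed through as-is here; a real proxy may remap it.
        model: req.model.clone(),
        messages,
        max_tokens: Some(req.max_tokens),
    }
}
```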

This is the top-level binary crate in the anyllm-proxy workspace.

Where it fits

Five-crate workspace:

anyllm_translate - pure format mapping, no I/O.
anyllm_providers - provider and model catalog (~90 providers).
anyllm_client - async HTTP client wrapping translate + transport.
anyllm_batch_engine - batch job queue and worker pool.
anyllm_proxy (this crate) - the axum HTTP server that wires everything together.

If you only need format translation in your own Rust code, depend on anyllm_translate or anyllm_client directly. Use this crate when you want a standalone server with config files, key management, an admin UI, metrics, and a batch surface.

Run it

OPENAI_API_KEY=sk-... cargo run -p anyllm_proxy
# Listens on 0.0.0.0:3000, health at GET /health

With the admin UI on a separate port (3001 by default):

OPENAI_API_KEY=sk-... cargo run -p anyllm_proxy -- --webui

Switch backends via BACKEND=:

GROQ_API_KEY=... BACKEND=groq cargo run -p anyllm_proxy
ANTHROPIC_API_KEY=... BACKEND=anthropic cargo run -p anyllm_proxy

Any provider id from anyllm_providers is accepted as a BACKEND value. Full env var reference lives in docs/ENV.md.
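The examples above follow a visible pattern: BACKEND names a provider id, the key comes from a matching PROVIDER_API_KEY variable, and the first example (no BACKEND set) talks to OpenAI. The sketch below encodes that pattern. It assumes the pattern holds generally and that OpenAI is the default when BACKEND is unset. Both are inferences from the examples, some providers may use different variable names, and docs/ENV.md is authoritative. The helpers api_key_var and resolve_backend are hypothetical, and the environment lookup is injected so the logic stays testable without reading the process environment.

```rust
/// Hypothetical helper: derive the API-key variable name for a BACKEND value
/// following the pattern in the examples (groq -> GROQ_API_KEY).
pub fn api_key_var(backend: &str) -> String {
    let normalized: String = backend
        .trim()
        .chars()
        // Non-alphanumeric characters (e.g. '-') become '_' in env var names.
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
        .collect();
    format!("{normalized}_API_KEY")
}

/// Hypothetical resolver: read BACKEND through an injected lookup and fall back
/// to "openai" when it is unset or empty (an assumption based on the first example).
pub fn resolve_backend<F>(lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    lookup("BACKEND")
        .map(|b| b.trim().to_ascii_lowercase())
        .filter(|b| !b.is_empty())
        .unwrap_or_else(|| "openai".to_string())
}
```

In a real program the lookup would be something like |k| std::env::var(k).ok().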

Cargo features

Feature	What it adds
`redis`	Distributed rate limiting via Redis sorted sets.
`qdrant`	Optional semantic response cache backed by Qdrant.
`otel`	OpenTelemetry OTLP trace export.
`dangerous-builtin-tools`	Enables BashTool and ReadFileTool in the builtin tool registry. Do not enable in production without sandboxing.

Modules

Module	Purpose
`server`	Axum HTTP server: routes, middleware (auth, request ID, size/concurrency limits), SSE streaming.
`admin`	Localhost-only admin server: config management, request log, WebSocket live updates.
`backend`	Backend HTTP clients for OpenAI, Anthropic, Azure, Vertex, Gemini, Bedrock.
`config`	Env-based config, TLS client cert setup, URL validation.
`batch`	OpenAI-compatible Batch API surface (wraps `anyllm_batch_engine`).
`cache`	Response caching: in-memory (moka) plus optional Redis tier.
`callbacks`	Webhook callbacks for request completion.
`cost`	Per-request cost tracking and model pricing.
`env_parser`	Pure env-file parser used by startup bootstrap and the admin import endpoint.
`fallback`	Backend fallback chains for transparent failover.
`integrations`	Named integration registry (Langfuse, etc.).
`metrics`	Request count and success/error tracking, exposed via `GET /metrics`.
`ratelimit`	Distributed rate limiting (requires `redis` feature).
`tools`	Builtin server-side tool registry.
`otel`	OpenTelemetry OTLP export (requires `otel` feature).
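The cache row describes an in-memory tier with an optional Redis tier behind it. Here is a minimal read-through sketch of that idea, with both tiers modeled as HashMaps. The Tier trait, MapTier, and lookup are hypothetical names. Copying a shared-tier hit into memory is an assumed policy, not a description of the proxy's code.

```rust
use std::collections::HashMap;

/// Minimal cache interface for the sketch (hypothetical).
pub trait Tier {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: String);
}

/// Stand-in for either tier; the proxy itself uses moka in memory and Redis
/// for the optional shared tier.
#[derive(Default)]
pub struct MapTier(pub HashMap<String, String>);

impl Tier for MapTier {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
    fn put(&mut self, key: &str, value: String) {
        self.0.insert(key.to_string(), value);
    }
}

/// Read-through lookup: memory first, then the optional shared tier.
/// A shared-tier hit is copied into memory so the next lookup stays local.
pub fn lookup(memory: &mut dyn Tier, shared: Option<&dyn Tier>, key: &str) -> Option<String> {
    if let Some(hit) = memory.get(key) {
        return Some(hit);
    }
    let hit = shared?.get(key)?;
    memory.put(key, hit.clone());
    Some(hit)
}
```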

Install

Published to crates.io alongside the rest of the workspace:

cargo install anyllm_proxy

Docker images are published as followthewhit3rabbit/anyllm-proxy. A Debian package can be built with cargo deb -p anyllm_proxy (requires the cargo-deb subcommand).

Library examples

anyllm_proxy is primarily a binary, but several modules are usable as a library when you are building your own server, tool, or admin surface and want to reuse the proxy's well-tested pieces instead of rewriting them. If you only need format mapping or a client, prefer anyllm_translate or anyllm_client directly.

1. Parse a `.env` file without touching the process environment

env_parser is pure: it does not read files, touch std::env, or call std::env::set_var. It takes a string and returns the parsed pairs plus any warnings and hard errors, which is what you want for editor integrations, config previewers, or import endpoints.

use anyllm_proxy::env_parser::{parse_env_content, escape_for_env_file};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = "OPENAI_API_KEY=sk-...\nBACKEND=groq\n# comment\nGROQ_API_KEY=gsk_...";
    let result = parse_env_content(raw);

    // Any hard error means the file must not be applied at all.
    if !result.hard_errors.is_empty() {
        for err in &result.hard_errors {
            eprintln!("error: {err}");
        }
        return Err("aborting: env file rejected".into());
    }

    for pair in &result.pairs {
        println!("{}={}", pair.key, pair.value);
    }
    for warn in &result.warnings {
        eprintln!("warning at line {:?}: {}", warn.line, warn.message);
    }

    // Round-trip safely back to env file syntax (handles quoting/escaping).
    let line = format!("MY_VAR={}", escape_for_env_file("value with spaces"));
    println!("{line}");
    Ok(())
}

If hard_errors is non-empty, the caller must not apply any of the pairs. Warnings are informational only.
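To illustrate that contract (a string goes in; pairs, warnings, and hard errors come out; nothing is applied until the caller decides), here is a toy parser with hypothetical names (Parsed, parse_toy, apply). It understands only bare KEY=VALUE lines and comments, with no quoting or export handling. Its rules (a line without = is a hard error, a repeated key is a warning and the last value wins) are illustrative assumptions, not what parse_env_content does.

```rust
use std::collections::HashMap;

/// Toy result type mirroring the fields used above; the real types differ.
#[derive(Debug, Default)]
pub struct Parsed {
    pub pairs: Vec<(String, String)>,
    pub warnings: Vec<(usize, String)>, // (1-based line number, message)
    pub hard_errors: Vec<String>,
}

/// Pure: takes a &str and returns data; never touches std::env or the filesystem.
pub fn parse_toy(raw: &str) -> Parsed {
    let mut out = Parsed::default();
    for (idx, line) in raw.lines().enumerate() {
        let line_no = idx + 1;
        let trimmed = line.trim();
        // Blank lines and comments are skipped.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        match trimmed.split_once('=') {
            Some((key, value)) if !key.trim().is_empty() => {
                let key = key.trim().to_string();
                // Repeated keys are accepted (last one wins when applied) but flagged.
                if out.pairs.iter().any(|(k, _)| *k == key) {
                    out.warnings.push((line_no, format!("duplicate key {key}")));
                }
                out.pairs.push((key, value.trim().to_string()));
            }
            // No '=' or an empty key: the whole file becomes unsafe to apply.
            _ => out.hard_errors.push(format!("line {line_no}: expected KEY=VALUE")),
        }
    }
    out
}

/// Apply pairs to a map (not the process env) only when there are no hard errors.
pub fn apply(parsed: &Parsed, target: &mut HashMap<String, String>) -> bool {
    if !parsed.hard_errors.is_empty() {
        return false;
    }
    for (k, v) in &parsed.pairs {
        target.insert(k.clone(), v.clone());
    }
    true
}
```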

2. Compute USD cost from token usage

The cost module embeds the LiteLLM pricing table at compile time. Use it to attach cost numbers to your own request logs without re-hosting the JSON.

use anyllm_proxy::cost::ModelPricing;

let pricing = ModelPricing::load();
if let Some((input_per_m, output_per_m)) = pricing.price_for_model("gpt-4o-mini") {
    println!("input: ${input_per_m}/M tokens, output: ${output_per_m}/M tokens");
}

let total_usd = pricing.cost_for_usage("gpt-4o-mini", 12_345, 678);
println!("call cost: ${total_usd:.6}");

To override at runtime (for example, custom enterprise pricing), point MODEL_PRICING_FILE at a JSON file with the same shape, then call ModelPricing::load_with_optional_override(Some(path)).
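Assuming cost_for_usage applies the per-million prices linearly (no cached-token discounts or tiered rates), the arithmetic reduces to the sketch below. cost_usd is a hypothetical helper, and the prices in this worked example are illustrative rather than read from the embedded table. At $0.15/M input and $0.60/M output, 12,345 input and 678 output tokens cost 0.012345 × 0.15 + 0.000678 × 0.60 ≈ $0.00226.

```rust
/// Sketch of per-million-token pricing arithmetic (assumed linear, no discounts).
pub fn cost_usd(input_tokens: u64, output_tokens: u64, input_per_m: f64, output_per_m: f64) -> f64 {
    const PER_MILLION: f64 = 1_000_000.0;
    (input_tokens as f64 / PER_MILLION) * input_per_m
        + (output_tokens as f64 / PER_MILLION) * output_per_m
}
```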

3. Reuse cache keys and fallback config parsing

Two small, stable pieces that are useful even outside the proxy:

use anyllm_proxy::cache::{cache_key_for_request, CacheNamespace};
use anyllm_proxy::fallback::config::parse_fallback_config;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Deterministic cache key for an OpenAI-shaped request body
    // (json! comes from the serde_json crate).
    let body = serde_json::json!({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "hi"}],
    });
    let key = cache_key_for_request(&body, CacheNamespace::OpenAI);

    // Parse the proxy's fallback YAML format.
    let yaml = r#"
fallback_chains:
  default:
    - name: azure
      env_prefix: AZURE_FALLBACK_
    - name: openai
      env_prefix: OPENAI_FALLBACK_
"#;
    let cfg = parse_fallback_config(yaml)?;
    Ok(())
}
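Deterministic cache keys come down to two properties: the same request must always produce the same key regardless of field order, and different API shapes must not collide. Here is a std-only sketch of those properties, assuming a scheme of sorted fields plus a namespace prefix. cache_key and fnv1a64 are hypothetical, the request is flattened to string pairs for brevity, and cache_key_for_request may canonicalize and hash differently. FNV-1a is used here because std's DefaultHasher makes no promise that its output stays the same across Rust releases.

```rust
use std::collections::BTreeMap;

/// FNV-1a (64-bit): tiny, and stable across runs and Rust versions.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical key scheme: sort fields (BTreeMap) so order does not matter,
/// then prefix the hash with the namespace so request shapes never collide.
pub fn cache_key(namespace: &str, fields: &[(&str, &str)]) -> String {
    let sorted: BTreeMap<&str, &str> = fields.iter().copied().collect();
    let mut canonical = String::new();
    for (k, v) in &sorted {
        // Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        canonical.push_str(&format!("{}:{}={}:{};", k.len(), k, v.len(), v));
    }
    format!("{namespace}:{:016x}", fnv1a64(canonical.as_bytes()))
}
```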

Larger pieces (MemoryCache, RedisCache, SemanticCache, FallbackChain, Metrics) are public but are designed around the proxy's own state types. If you need them in a different server, expect to pull more of the surrounding wiring from crates/proxy/src/server/ rather than treating each as a standalone helper.
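The fallback YAML above lists backends in order. Conceptually, transparent failover tries each one until a call succeeds. Here is a minimal sketch of that loop with hypothetical names (Hop, with_failover). It treats every error as retryable. A production chain would likely refine that, for example by failing over on 5xx responses or timeouts but not on 400s. That refinement is an assumption and is not shown.

```rust
/// One hop in a chain, mirroring the YAML fields (name, env_prefix).
#[derive(Debug, Clone)]
pub struct Hop {
    pub name: String,
    pub env_prefix: String,
}

/// Try each backend in order; return the first success tagged with the backend
/// name, or every (backend, error) pair if all fail. `call` stands in for
/// "send the translated request to this backend".
pub fn with_failover<T, E, F>(chain: &[Hop], mut call: F) -> Result<(String, T), Vec<(String, E)>>
where
    F: FnMut(&Hop) -> Result<T, E>,
{
    let mut errors = Vec::new();
    for hop in chain {
        match call(hop) {
            Ok(value) => return Ok((hop.name.clone(), value)),
            Err(err) => errors.push((hop.name.clone(), err)),
        }
    }
    Err(errors)
}
```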

4. Embed the batch API surface

The proxy's batch module wraps anyllm_batch_engine with OpenAI-compatible routes. If you want the same /v1/batches and /v1/files HTTP shape inside your own server, mount the batch handlers and provide a BatchEngine in shared state. For a more decoupled integration, depend on anyllm_batch_engine directly (see that crate's README for examples).
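OpenAI's Batch API takes a JSONL input file in which each line is one request object with custom_id, method, url, and body fields. Assuming the proxy's /v1/files and /v1/batches surface accepts that same format, as "OpenAI-compatible" suggests, here is a std-only sketch of producing one line. batch_line and json_escape are hypothetical helpers. A real client would more likely build the JSON with serde_json.

```rust
/// Escape a string for a JSON string literal (quotes, backslashes, control chars),
/// which also guarantees the JSONL line contains no raw newline.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// One batch input line targeting chat completions, with a single user message.
pub fn batch_line(custom_id: &str, model: &str, user_prompt: &str) -> String {
    format!(
        r#"{{"custom_id":"{}","method":"POST","url":"/v1/chat/completions","body":{{"model":"{}","messages":[{{"role":"user","content":"{}"}}]}}}}"#,
        json_escape(custom_id),
        json_escape(model),
        json_escape(user_prompt)
    )
}
```

Joining several such lines with "\n" yields the JSONL file to upload.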

5. Run as a sidecar instead of linking

If you do not need to embed proxy modules into your own binary, the simplest "library use" is to run anyllm_proxy as a sidecar on localhost:3000 and have your application speak Anthropic Messages or OpenAI Chat Completions to it. This is the supported path for non-Rust clients.
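Any HTTP client works for the sidecar pattern. This std-only Rust sketch shows the wire shape of an Anthropic Messages call to the proxy on localhost:3000. The /v1/messages path and the x-api-key and anthropic-version headers come from Anthropic's public API. Whether the proxy requires or checks each of them (for example, with virtual keys) is an assumption. messages_request and send are hypothetical helpers.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a raw HTTP/1.1 request for an Anthropic Messages call to the sidecar.
pub fn messages_request(host: &str, api_key: &str, json_body: &str) -> String {
    format!(
        "POST /v1/messages HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/json\r\n\
         x-api-key: {api_key}\r\n\
         anthropic-version: 2023-06-01\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\
         \r\n\
         {json_body}",
        json_body.len() // byte length, as Content-Length requires
    )
}

/// Send the request over plain TCP and return the raw response text.
/// "Connection: close" lets read_to_string stop when the server hangs up.
pub fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the proxy running, send("127.0.0.1:3000", &messages_request("localhost:3000", key, body)) returns the status line, headers, and JSON body as one string.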

Documentation

Full architecture: docs/proxy-architecture.md.
Env vars: docs/ENV.md.
Endpoint inventory: docs/ENDPOINTS.md.
Config files: docs/CONFIG.md.

Tests

cargo test -p anyllm_proxy

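That command runs this crate's tests only. The full workspace suite (cargo test --workspace) is roughly 1,100 tests; ten are marked #[ignore] because they hit live APIs and need real keys, and they run only when requested with cargo test -- --ignored.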

anyllm_proxy 0.9.6