anyllm_proxy 0.9.2

HTTP proxy translating Anthropic Messages API to OpenAI Chat Completions

anyllm_proxy

HTTP proxy that accepts Anthropic Messages API and OpenAI Chat Completions requests, translates between formats, and forwards to any supported backend (OpenAI, Anthropic, Azure, Vertex, Gemini, Bedrock, or any of the ~90 OpenAI-compatible providers in the catalog). Ships with an optional admin UI, virtual key management, response caching, cost tracking, and a Batch API.

This is the top-level binary crate in the anyllm-proxy workspace.

Where it fits

Five-crate workspace:

  • anyllm_translate - pure format mapping, no I/O.
  • anyllm_providers - provider and model catalog (~90 providers).
  • anyllm_client - async HTTP client wrapping translate + transport.
  • anyllm_batch_engine - batch job queue and worker pool.
  • anyllm_proxy (this crate) - the axum HTTP server that wires everything together.

If you only need format translation in your own Rust code, depend on anyllm_translate or anyllm_client directly. Use this crate when you want a standalone server with config files, key management, an admin UI, metrics, and a batch surface.

Run it

OPENAI_API_KEY=sk-... cargo run -p anyllm_proxy
# Listens on 0.0.0.0:3000, health at GET /health

With the admin UI on a separate port (3001 by default):

OPENAI_API_KEY=sk-... cargo run -p anyllm_proxy -- --webui

Switch backends by setting the BACKEND environment variable:

GROQ_API_KEY=... BACKEND=groq cargo run -p anyllm_proxy
ANTHROPIC_API_KEY=... BACKEND=anthropic cargo run -p anyllm_proxy

Any provider id from anyllm_providers is accepted as a BACKEND value. The full environment variable reference lives in docs/ENV.md.
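
Once the server is running, the GET /health endpoint mentioned above can be probed from Rust with only the standard library. This is a minimal sketch, not part of the crate: it assumes the default listener on port 3000 and that a healthy proxy answers with HTTP 200 (the status code is an assumption, not something stated here). The helper names health_request, is_healthy, and probe are hypothetical.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Build a raw HTTP/1.1 GET request for the health endpoint.
fn health_request(host: &str) -> String {
    format!("GET /health HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n")
}

/// Assumption: a healthy proxy replies with status 200.
/// The status line looks like "HTTP/1.1 200 OK".
fn is_healthy(response: &str) -> bool {
    response.split_whitespace().nth(1) == Some("200")
}

/// Connect, send the request, and check the status line of the reply.
fn probe(addr: &str) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    stream.write_all(health_request(addr).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(is_healthy(&response))
}

fn main() {
    match probe("127.0.0.1:3000") {
        Ok(true) => println!("proxy is healthy"),
        Ok(false) => println!("proxy answered, but not with 200"),
        Err(err) => println!("proxy not reachable: {err}"),
    }
}
```

A probe like this is handy in startup scripts or integration tests that need to wait for the proxy before sending real traffic.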

Cargo features

| Feature | What it adds |
| --- | --- |
| redis | Distributed rate limiting via Redis sorted sets. |
| qdrant | Optional semantic response cache backed by Qdrant. |
| otel | OpenTelemetry OTLP trace export. |
| dangerous-builtin-tools | Enables BashTool and ReadFileTool in the builtin tool registry. Do not enable in production without sandboxing. |

Modules

| Module | Purpose |
| --- | --- |
| server | Axum HTTP server: routes, middleware (auth, request ID, size/concurrency limits), SSE streaming. |
| admin | Localhost-only admin server: config management, request log, WebSocket live updates. |
| backend | Backend HTTP clients for OpenAI, Anthropic, Azure, Vertex, Gemini, Bedrock. |
| config | Env-based config, TLS client cert setup, URL validation. |
| batch | OpenAI-compatible Batch API surface (wraps anyllm_batch_engine). |
| cache | Response caching: in-memory (moka) plus optional Redis tier. |
| callbacks | Webhook callbacks for request completion. |
| cost | Per-request cost tracking and model pricing. |
| env_parser | Pure env-file parser used by startup bootstrap and the admin import endpoint. |
| fallback | Backend fallback chains for transparent failover. |
| integrations | Named integration registry (Langfuse, etc.). |
| metrics | Request count and success/error tracking, exposed via GET /metrics. |
| ratelimit | Distributed rate limiting (requires redis feature). |
| tools | Builtin server-side tool registry. |
| otel | OpenTelemetry OTLP export (requires otel feature). |

Install

Published to crates.io alongside the rest of the workspace:

cargo install anyllm_proxy

Docker images are published as followthewhit3rabbit/anyllm-proxy. A Debian package can be built with cargo deb -p anyllm_proxy.

Library examples

anyllm_proxy is primarily a binary, but several modules are usable as a library when you are building your own server, tool, or admin surface and want to reuse the proxy's well-tested pieces instead of rewriting them. If you only need format mapping or a client, prefer anyllm_translate or anyllm_client directly.

1. Parse a .env file without touching the process environment

env_parser is pure: it does not read files, mutate std::env, or call set_var. It accepts a string and returns parsed pairs plus warnings, which is what you want for editor integrations, config previewers, or import endpoints.

use anyllm_proxy::env_parser::{parse_env_content, escape_for_env_file};

let raw = "OPENAI_API_KEY=sk-...\nBACKEND=groq\n# comment\nGROQ_API_KEY=gsk_...";
let result = parse_env_content(raw);

if !result.hard_errors.is_empty() {
    for err in &result.hard_errors {
        eprintln!("error: {err}");
    }
    return Err("aborting: env file rejected".into());
}

for pair in &result.pairs {
    println!("{}={}", pair.key, pair.value);
}
for warn in &result.warnings {
    eprintln!("warning at line {:?}: {}", warn.line, warn.message);
}

// Round-trip safely back to env file syntax (handles quoting/escaping).
let line = format!("MY_VAR={}", escape_for_env_file("value with spaces"));

If hard_errors is non-empty, the caller must discard pairs rather than apply them. Warnings are informational and do not block use of the parsed pairs.

2. Compute USD cost from token usage

The cost module embeds the LiteLLM pricing table at compile time. Use it to attach cost numbers to your own request logs without re-hosting the JSON.

use anyllm_proxy::cost::ModelPricing;

let pricing = ModelPricing::load();
if let Some((input_per_m, output_per_m)) = pricing.price_for_model("gpt-4o-mini") {
    println!("input: ${input_per_m}/M tokens, output: ${output_per_m}/M tokens");
}

let total_usd = pricing.cost_for_usage("gpt-4o-mini", 12_345, 678);
println!("call cost: ${total_usd:.6}");

To override at runtime (for example, custom enterprise pricing), point MODEL_PRICING_FILE at a JSON file with the same shape, then call ModelPricing::load_with_optional_override(Some(path)).
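
For intuition about the arithmetic behind a per-call cost, here is a standalone sketch that does not use the crate. It assumes the cost is a plain linear function of per-million-token prices, with no cached-token or tiered adjustments; whether cost_for_usage applies further rules is not stated here. The function name cost_usd and the prices are hypothetical placeholders, not values from the embedded pricing table.

```rust
/// USD cost of one call, given prices expressed per million tokens.
/// Hypothetical helper for illustration; not the crate's implementation.
fn cost_usd(input_per_m: f64, output_per_m: f64, input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 * input_per_m + output_tokens as f64 * output_per_m) / 1_000_000.0
}

fn main() {
    // Placeholder prices, chosen only to make the arithmetic concrete.
    let (input_per_m, output_per_m) = (0.15, 0.60);

    // 12_345 input tokens and 678 output tokens, as in the example above.
    let total = cost_usd(input_per_m, output_per_m, 12_345, 678);
    println!("call cost: ${total:.6}");
}
```

With these placeholder prices the input side contributes 12,345 × 0.15 / 1,000,000 and the output side 678 × 0.60 / 1,000,000, which sum to about $0.002259.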

3. Reuse cache keys and fallback config parsing

Two small, stable pieces that are useful even outside the proxy:

use anyllm_proxy::cache::{cache_key_for_request, CacheNamespace};
use anyllm_proxy::fallback::config::parse_fallback_config;

// Deterministic cache key for an OpenAI-shaped request body.
let body = serde_json::json!({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "hi"}],
});
let key = cache_key_for_request(&body, CacheNamespace::OpenAI);

// Parse the proxy's fallback YAML format.
let yaml = r#"
fallback_chains:
  default:
    - name: azure
      env_prefix: AZURE_FALLBACK_
    - name: openai
      env_prefix: OPENAI_FALLBACK_
"#;
let cfg = parse_fallback_config(yaml)?;

Larger pieces (MemoryCache, RedisCache, SemanticCache, FallbackChain, Metrics) are public but are designed around the proxy's own state types. If you need them in a different server, expect to pull more of the surrounding wiring from crates/proxy/src/server/ rather than treating each as a standalone helper.
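
To illustrate what a deterministic, namespaced cache key generally requires, the following sketch uses only the standard library. It is not the algorithm behind cache_key_for_request, which is not described here. It shows one common approach: canonicalize the fields in a fixed order, then hash them with a function whose output is fixed by specification. The names fnv1a and sketch_cache_key, and the flat string-to-string request model, are hypothetical simplifications.

```rust
use std::collections::BTreeMap;

/// 64-bit FNV-1a. Its output is fixed by the algorithm's definition, unlike
/// std's DefaultHasher, whose algorithm is unspecified and may change.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical stand-in for a namespaced cache key (not the crate's algorithm).
/// BTreeMap iterates in sorted key order, so insertion order cannot affect the key.
fn sketch_cache_key(namespace: &str, fields: &BTreeMap<&str, &str>) -> String {
    let mut canonical = String::from(namespace);
    for (name, value) in fields {
        // Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
        canonical.push_str(&format!("|{}:{}={}:{}", name.len(), name, value.len(), value));
    }
    format!("{namespace}:{:016x}", fnv1a(canonical.as_bytes()))
}

fn main() {
    let mut first = BTreeMap::new();
    first.insert("model", "gpt-4o-mini");
    first.insert("prompt", "hi");

    // Same fields, inserted in the opposite order.
    let mut second = BTreeMap::new();
    second.insert("prompt", "hi");
    second.insert("model", "gpt-4o-mini");

    println!("{}", sketch_cache_key("openai", &first));
    println!("{}", sketch_cache_key("openai", &second)); // identical to the line above
    println!("{}", sketch_cache_key("anthropic", &first)); // differs by namespace
}
```

The namespace keeps identical-looking bodies in different request formats from sharing a cache entry, which is presumably the role CacheNamespace plays in the real API.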

4. Embed the batch API surface

The proxy's batch module wraps anyllm_batch_engine with OpenAI-compatible routes. If you want the same /v1/batches and /v1/files HTTP shape inside your own server, mount the batch handlers and provide a BatchEngine in shared state. For a more decoupled integration, depend on anyllm_batch_engine directly (see that crate's README for examples).

5. Run as a sidecar instead of linking

If you do not need to embed proxy modules into your own binary, the simplest "library use" is to run anyllm_proxy as a sidecar on localhost:3000 and have your application speak Anthropic Messages or OpenAI Chat Completions to it. This is the supported path for non-Rust clients.
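
As a rough sketch of the sidecar pattern, the program below posts an Anthropic Messages-shaped body to a proxy on localhost:3000 using only the standard library. Several details are assumptions rather than facts from this README: the /v1/messages path mirrors Anthropic's own API path, the model name is a placeholder, and no auth header is sent (a proxy configured with keys would need one). build_post and send are hypothetical helper names. A real application would more likely use an HTTP client crate.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a raw HTTP/1.1 POST request with a JSON body.
fn build_post(host: &str, path: &str, json_body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\n\
         Content-Length: {len}\r\nConnection: close\r\n\r\n{json_body}",
        len = json_body.len()
    )
}

/// Send the request and return the raw response text (headers and body).
fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let addr = "127.0.0.1:3000";

    // Anthropic Messages shape; max_tokens is required by that format.
    // The model name is a placeholder, and the path is an assumption.
    let body = r#"{"model":"gpt-4o-mini","max_tokens":64,"messages":[{"role":"user","content":"hi"}]}"#;
    let request = build_post(addr, "/v1/messages", body);

    match send(addr, &request) {
        Ok(response) => println!("{response}"),
        Err(err) => eprintln!("proxy not reachable: {err}"),
    }
}
```

Because the application only speaks HTTP to localhost, the same approach works from any language, and the backend can be changed through the proxy's configuration without touching application code.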

Tests

cargo test -p anyllm_proxy

The full workspace suite is roughly 1100 tests; ten are #[ignore] because they hit live APIs and need real keys.