# ferryllm
Universal LLM protocol middleware for OpenAI, Anthropic, Claude Code, and OpenAI-compatible backends.
ferryllm is a Rust gateway that lets clients and providers speak different LLM protocols through one shared internal representation. Use it as a local Claude Code bridge, a private model gateway, or an embeddable adapter library.
## Highlights
- OpenAI-compatible entrypoint: `POST /v1/chat/completions`
- Anthropic-compatible entrypoint: `POST /v1/messages`
- OpenAI-compatible and Anthropic backend adapters
- Claude Code to OpenAI-compatible backend routing
- Model aliases, prefix routing, and model rewrite rules
- Streaming SSE translation with tool-call support
- Config-driven standalone server: `ferryllm serve --config ferryllm.toml`
- Request timeout, body limit, API-key auth, rate limits, concurrency caps, metrics, retry, fallback, and circuit breaker support
- Library-first architecture for adding new entry protocols and provider adapters
## Why
Most LLM gateways become an N x M matrix: every client protocol needs custom code for every provider protocol. ferryllm uses an N + M design instead.
```
Client protocol -> ferryllm IR -> provider protocol
```
That means a new backend adapter can immediately serve OpenAI-style clients, Anthropic-style clients, and Claude Code without rewriting every path.
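That shape is easy to see in code. A minimal Rust sketch of the idea (type and trait names here are illustrative, not ferryllm's actual API):

```rust
/// Illustrative unified request; ferryllm's real IR lives in src/ir.rs.
pub struct IrRequest {
    pub model: String,
    pub messages: Vec<(String, String)>, // (role, text) pairs
}

/// One implementation per client protocol (N total), e.g. OpenAI or Anthropic entries.
pub trait Entry {
    fn parse(&self, body: &str) -> IrRequest;
}

/// One implementation per backend protocol (M total).
pub trait Adapter {
    fn send(&self, req: &IrRequest) -> String;
}

/// Any entry pairs with any adapter, so N + M implementations cover all N x M paths.
pub fn handle(entry: &dyn Entry, adapter: &dyn Adapter, body: &str) -> String {
    adapter.send(&entry.parse(body))
}
```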
## Quick Start
Install from crates.io:
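```sh
# assumes the crate is published on crates.io as `ferryllm`
cargo install ferryllm
```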
Or run from source:
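```sh
# from a checkout of the repository
cargo run --release -- serve --config ferryllm.toml
```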
Use an OpenAI-compatible provider key and start the server:

```sh
export CODX_API_KEY=your-provider-key
RUST_LOG=info ferryllm serve --config ferryllm.toml
```
Smoke test the Anthropic-compatible endpoint:
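```sh
# `cc-gpt55` is the model alias defined in the sample config below
curl -s http://127.0.0.1:3000/v1/messages \
  -H 'content-type: application/json' \
  -H 'anthropic-version: 2023-06-01' \
  -d '{"model":"cc-gpt55","max_tokens":64,"messages":[{"role":"user","content":"Say hi"}]}'
```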
## Claude Code With GPT-5.5
Claude Code sends Anthropic-format requests. ferryllm can receive those requests, rewrite the model, and forward them to an OpenAI-compatible backend.
```
Claude Code
  -> POST /v1/messages, model = claude-*
  -> ferryllm Anthropic entry
  -> unified IR
  -> route match: claude-
  -> rewrite backend model: gpt-5.4
  -> OpenAI-compatible backend
```
Start ferryllm:

```sh
RUST_LOG=ferryllm=info,tower_http=info \
  ferryllm serve --config ferryllm.toml
```
Point Claude Code at ferryllm:

```sh
ANTHROPIC_API_KEY=dummy \
ANTHROPIC_BASE_URL=http://127.0.0.1:3000 \
  claude -p 'Reply with exactly: pong'   # prompt text is illustrative
```
Expected output:

```
pong
```
See docs/claude-code.md for persistent Claude Code and cc-switch setup.
## Configuration
ferryllm uses TOML configuration. Secrets stay in environment variables.
```toml
# Field names are illustrative; see the configuration doc for the exact schema.
[server]
listen = "0.0.0.0:3000"
request_timeout_secs = 120
max_concurrency = 32

[log]
level = "info"
format = "text"

[metrics]
enabled = true

[[providers]]
name = "codexapis"
kind = "openai"
base_url = "https://codexapis.com"
api_key_env = "CODX_API_KEY"

[[routes]]
model = "cc-gpt55"
match = "exact"
provider = "codexapis"
backend_model = "gpt-5.4"

[[routes]]
prefix = "claude-"
provider = "codexapis"
backend_model = "gpt-5.4"
```
Check a config without starting the server:
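```sh
# subcommand name is assumed here; check `ferryllm --help` for the real one
ferryllm check --config ferryllm.toml
```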
## API Surface
| Endpoint | Purpose |
|---|---|
| `POST /v1/chat/completions` | OpenAI-compatible chat completions |
| `POST /v1/messages` | Anthropic-compatible messages |
| `GET /health` | Simple health check |
| `GET /healthz` | Kubernetes-style liveness check |
| `GET /readyz` | Readiness check |
| `GET /metrics` | Prometheus-style metrics |
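For example, the OpenAI-compatible entrypoint accepts a standard chat completion payload (the `cc-gpt55` alias comes from the sample config above):

```sh
curl -s http://127.0.0.1:3000/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"cc-gpt55","messages":[{"role":"user","content":"Say hi"}]}'
```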
## Architecture
```
src/
  adapter.rs   Adapter trait
  ir.rs        Unified request, response, content, tool, and stream types
  router.rs    Exact and prefix model routing
  server.rs    Axum HTTP server
  config.rs    TOML config loader and validator
  entry/       Client protocol translators
  adapters/    Backend provider adapters
```
More detail: docs/architecture.md.
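The router is small enough to sketch. A toy version of exact-plus-prefix matching (illustrative only; the real behavior, including match precedence, is defined in src/router.rs):

```rust
// Illustrative sketch of exact-plus-prefix routing. Precedence here is an
// assumption: exact aliases are checked before prefix rules.
enum Matcher {
    Exact(&'static str),
    Prefix(&'static str),
}

struct Route {
    matcher: Matcher,
    provider: &'static str,
    backend_model: &'static str,
}

fn route<'a>(routes: &'a [Route], model: &str) -> Option<&'a Route> {
    routes
        .iter()
        .find(|r| matches!(&r.matcher, Matcher::Exact(m) if *m == model))
        .or_else(|| {
            routes
                .iter()
                .find(|r| matches!(&r.matcher, Matcher::Prefix(p) if model.starts_with(*p)))
        })
}

fn main() {
    let routes = [
        Route { matcher: Matcher::Exact("cc-gpt55"), provider: "codexapis", backend_model: "gpt-5.4" },
        Route { matcher: Matcher::Prefix("claude-"), provider: "codexapis", backend_model: "gpt-5.4" },
    ];
    let hit = route(&routes, "claude-sonnet-4").expect("no route matched");
    println!("{} -> {}", hit.provider, hit.backend_model);
}
```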
## Load Testing
ferryllm ships a benchmark-style load tester for local mock-upstream testing; the mock upstream's simulated response delay is controlled with the `MOCK_DELAY_MS` environment variable (for example, `MOCK_DELAY_MS=20`).
See docs/load-testing.md.
## Documentation
- Chinese README
- Architecture
- Claude Code setup
- Configuration
- Compatibility notes
- Deployment
- Load testing
- Prompt caching and token observability
## Roadmap
- More provider adapters, including Gemini
- Weighted and latency-aware provider pools
- Hot-reload configuration
- Richer Prometheus metrics labels
- Per-key quota and usage accounting hooks
- Packaged Docker images and deployment templates
## License
MIT. See LICENSE.