# ferryllm
Universal LLM protocol middleware for OpenAI, Anthropic, Claude Code, and OpenAI-compatible backends.
ferryllm is a Rust gateway that lets clients and providers speak different LLM protocols through one shared internal representation. Use it as a local Claude Code bridge, a private model gateway, or an embeddable adapter library.
## Highlights
- OpenAI-compatible entrypoint: `POST /v1/chat/completions`
- Anthropic-compatible entrypoint: `POST /v1/messages`
- OpenAI-compatible and Anthropic backend adapters
- Claude Code to OpenAI-compatible backend routing
- Model aliases, prefix routing, and model rewrite rules
- Streaming SSE translation with tool-call support
- Config-driven standalone server: `ferryllm serve --config ferryllm.toml`
- Request timeout, body limit, API-key auth, rate limits, concurrency caps, metrics, retry, fallback, and circuit breaker support
- Library-first architecture for adding new entry protocols and provider adapters
## Why
Most LLM gateways become an N x M matrix: every client protocol needs custom code for every provider protocol. ferryllm uses an N + M design instead.
```
Client protocol -> ferryllm IR -> provider protocol
```
That means a new backend adapter can immediately serve OpenAI-style clients, Anthropic-style clients, and Claude Code without rewriting every path.
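That shape is easy to see in code. A minimal Rust sketch of the idea (type and trait names here are illustrative, not ferryllm's actual API):

```rust
/// Illustrative unified request; ferryllm's real IR lives in src/ir.rs.
pub struct IrRequest {
    pub model: String,
    pub messages: Vec<(String, String)>, // (role, text) pairs
}

/// One implementation per client protocol (N total), e.g. OpenAI or Anthropic entries.
pub trait Entry {
    fn parse(&self, body: &str) -> IrRequest;
}

/// One implementation per backend protocol (M total).
pub trait Adapter {
    fn send(&self, req: &IrRequest) -> String;
}

/// Any entry pairs with any adapter, so N + M implementations cover all N x M paths.
pub fn handle(entry: &dyn Entry, adapter: &dyn Adapter, body: &str) -> String {
    adapter.send(&entry.parse(body))
}
```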
## Quick Start
Install from crates.io:
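```sh
# assumes the crate is published on crates.io as `ferryllm`
cargo install ferryllm
```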
Or run from source:
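```sh
# from a checkout of the repository
cargo run --release -- serve --config ferryllm.toml
```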
Use an OpenAI-compatible provider key and start the server:

```sh
export CODX_API_KEY=your-provider-key
RUST_LOG=info ferryllm serve --config ferryllm.toml
```
Smoke test the Anthropic-compatible endpoint:
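```sh
# `cc-gpt55` is the model alias defined in the sample config below
curl -s http://127.0.0.1:3000/v1/messages \
  -H 'content-type: application/json' \
  -H 'anthropic-version: 2023-06-01' \
  -d '{"model":"cc-gpt55","max_tokens":64,"messages":[{"role":"user","content":"Say hi"}]}'
```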
## Claude Code With GPT-5.5
Claude Code sends Anthropic-format requests. ferryllm can receive those requests, rewrite the model, and forward them to an OpenAI-compatible backend.
```
Claude Code
  -> POST /v1/messages, model = claude-*
  -> ferryllm Anthropic entry
  -> unified IR
  -> route match: claude-
  -> rewrite backend model: gpt-5.4
  -> OpenAI-compatible backend
```
Start ferryllm:

```sh
RUST_LOG=ferryllm=info,tower_http=info \
  ferryllm serve --config ferryllm.toml
```
Point Claude Code at ferryllm:

```sh
ANTHROPIC_API_KEY=dummy \
ANTHROPIC_BASE_URL=http://127.0.0.1:3000 \
  claude -p 'Reply with exactly: pong'   # prompt text is illustrative
```
Expected output:

```
pong
```
See docs/claude-code.md for persistent Claude Code and cc-switch setup.
## Configuration
ferryllm uses TOML configuration. Secrets stay in environment variables.
```toml
# Field names are illustrative; see the configuration doc for the exact schema.
[server]
listen = "0.0.0.0:3000"
request_timeout_secs = 120
max_concurrency = 32

[log]
level = "info"
format = "text"

[metrics]
enabled = true

[[providers]]
name = "codexapis"
kind = "openai"
base_url = "https://codexapis.com"
api_key_env = "CODX_API_KEY"

[[routes]]
model = "cc-gpt55"
match = "exact"
provider = "codexapis"
backend_model = "gpt-5.4"

[[routes]]
prefix = "claude-"
provider = "codexapis"
backend_model = "gpt-5.4"
```
Check a config without starting the server:
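```sh
# subcommand name is assumed here; check `ferryllm --help` for the real one
ferryllm check --config ferryllm.toml
```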
## API Surface
| Endpoint | Purpose |
|---|---|
| `POST /v1/chat/completions` | OpenAI-compatible chat completions |
| `POST /v1/messages` | Anthropic-compatible messages |
| `GET /health` | Simple health check |
| `GET /healthz` | Kubernetes-style liveness check |
| `GET /readyz` | Readiness check |
| `GET /metrics` | Prometheus-style metrics |
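For example, the OpenAI-compatible entrypoint accepts a standard chat completion payload (the `cc-gpt55` alias comes from the sample config above):

```sh
curl -s http://127.0.0.1:3000/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"cc-gpt55","messages":[{"role":"user","content":"Say hi"}]}'
```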
## Architecture
```
src/
  adapter.rs   Adapter trait
  ir.rs        Unified request, response, content, tool, and stream types
  router.rs    Exact and prefix model routing
  server.rs    Axum HTTP server
  config.rs    TOML config loader and validator
  entry/       Client protocol translators
  adapters/    Backend provider adapters
```
More detail: docs/architecture.md.
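The router is small enough to sketch. A toy version of exact-plus-prefix matching (illustrative only; the real behavior, including match precedence, is defined in src/router.rs):

```rust
// Illustrative sketch of exact-plus-prefix routing. Precedence here is an
// assumption: exact aliases are checked before prefix rules.
enum Matcher {
    Exact(&'static str),
    Prefix(&'static str),
}

struct Route {
    matcher: Matcher,
    provider: &'static str,
    backend_model: &'static str,
}

fn route<'a>(routes: &'a [Route], model: &str) -> Option<&'a Route> {
    routes
        .iter()
        .find(|r| matches!(&r.matcher, Matcher::Exact(m) if *m == model))
        .or_else(|| {
            routes
                .iter()
                .find(|r| matches!(&r.matcher, Matcher::Prefix(p) if model.starts_with(*p)))
        })
}

fn main() {
    let routes = [
        Route { matcher: Matcher::Exact("cc-gpt55"), provider: "codexapis", backend_model: "gpt-5.4" },
        Route { matcher: Matcher::Prefix("claude-"), provider: "codexapis", backend_model: "gpt-5.4" },
    ];
    let hit = route(&routes, "claude-sonnet-4").expect("no route matched");
    println!("{} -> {}", hit.provider, hit.backend_model);
}
```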
## Load Testing
ferryllm ships a benchmark-style load tester for local mock-upstream testing; the mock upstream's simulated response delay is controlled with the `MOCK_DELAY_MS` environment variable (for example, `MOCK_DELAY_MS=20`).
See docs/load-testing.md.
## Documentation
- Chinese README
- Architecture
- Claude Code setup
- Configuration
- Compatibility notes
- Deployment
- Load testing
- Prompt caching and token observability
## Roadmap
- More provider adapters, including Gemini
- Weighted and latency-aware provider pools
- Hot-reload configuration
- Richer Prometheus metrics labels
- Per-key quota and usage accounting hooks
- Packaged Docker images and deployment templates
## License
MIT. See LICENSE.