ferryllm 0.1.4 - Docs.rs

# ferryllm

A universal LLM protocol middleware written in Rust. Ferry your requests between OpenAI, Anthropic, and beyond — one unified internal representation, N+M adapters instead of an N×M matrix.

## Design goals

1. **Unified IR** — A self-built superset internal representation that captures the semantics of all major LLM protocols (text, images, tool calls, thinking blocks, streaming lifecycle), not tied to any single provider's format.

2. **N+M adapter architecture** — Entry adapters translate client protocols into IR; exit adapters translate IR into backend-native requests. Adding a new provider requires only one exit adapter, and all entry formats automatically gain access.

3. **Zero-copy, async, streaming-first** — Built on tokio + axum + reqwest. SSE streams are translated on the fly without buffering the entire response. Designed for <1ms proxy overhead.

4. **Dual entry points** — Exposes both `POST /v1/chat/completions` (OpenAI format) and `POST /v1/messages` (Anthropic format). Route by model name to the appropriate backend.

5. **Embeddable library + standalone binary** — Core types and adapters are a library (`lib.rs`); the HTTP server is behind a `http` feature flag. Usable as a Rust dependency or as a standalone proxy.

## Architecture

```
Client (OpenAI format) ──→ POST /v1/chat/completions ──→ entry/openai ──→ IR
Client (Anthropic format) ─→ POST /v1/messages ──────────→ entry/anthropic ─→ IR
                                                                                │
                                                                         router.resolve(model)
                                                                                │
                                   ┌────────────────────────────────────────────┤
                                   ▼                                            ▼
                            adapters/anthropic                           adapters/openai
                            IR → Anthropic native → HTTP                  IR → OpenAI native → HTTP
                                   │                                            │
                                   ▼                                            ▼
                              Anthropic API                              OpenAI API / vLLM
```

## Project structure

```
src/
├── ir.rs           # Unified IR: ChatRequest, ChatResponse, Message, ContentBlock,
│                   #   StreamEvent (8 event types), Tool, Usage
├── adapter.rs      # Adapter trait: chat() + chat_stream()
├── router.rs       # Model router: prefix-match + default fallback
├── server.rs       # axum HTTP server (behind "http" feature)
├── adapters/
│   ├── openai.rs   # Exit adapter: IR ↔ OpenAI wire format
│   └── anthropic.rs# Exit adapter: IR ↔ Anthropic wire format
└── entry/
    ├── openai.rs   # Entry translator: OpenAI client JSON ↔ IR
    └── anthropic.rs# Entry translator: Anthropic client JSON ↔ IR
```

## Roadmap

- [x] Unified IR types
- [x] Adapter trait
- [x] OpenAI adapter (backend)
- [x] Anthropic adapter (backend)
- [x] Entry translation (client-side)
- [x] Model router
- [x] HTTP server with SSE streaming
- [ ] Gemini adapter
- [x] Stateful tool-call JSON accumulation in streams
- [ ] Connection pooling & keep-alive
- [ ] Hot-reload configuration
- [x] Observability (tracing / metrics)
- [x] Rate limiting