# llmshim

A blazing-fast LLM API translation layer in pure Rust. One interface, every provider.
## Benchmarks
p50 latency over 20 runs. Same prompt, same models, same machine.
| Metric | llmshim | litellm | langchain |
|---|---|---|---|
| Anthropic (p50) | 1,234ms | 1,288ms | 1,363ms |
| OpenAI (p50) | 648ms | 1,180ms | 700ms |
| Streaming TTFT | 1,065ms | 1,658ms | 1,623ms |
| Cold start | 1,396ms | 2,382ms | 1,619ms |
| Memory (RSS) | 8 MB | 255 MB | 255 MB |
All three libraries hit the same APIs (Responses API for OpenAI, Messages API for Anthropic). llmshim adds ~2µs of translation overhead per request — the rest is network.
Run it yourself:
## What it does

Send requests through llmshim → it translates them to whichever provider you choose → translates the response back. Zero infrastructure, zero databases, ~5 MB binary.

Switch providers by changing the model string. Everything else stays the same.
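The model string embeds the provider, so routing amounts to a prefix lookup. A minimal, self-contained sketch of the idea (illustrative only; this is not llmshim's actual API):

```rust
// Resolve a "provider/model" string into its two parts.
// Illustrative only; llmshim's real router is more involved.
fn resolve(model: &str) -> Option<(&str, &str)> {
    model.split_once('/')
}

fn main() {
    // Switching providers is just a different model string.
    for m in ["anthropic/claude-sonnet-4-6", "openai/gpt-5.4"] {
        let (provider, model) = resolve(m).expect("expected provider/model");
        println!("{provider} -> {model}");
    }
}
```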
## Install
Or from source:
## Configure

API keys are set once and persisted to ~/.llmshim/config.toml. They can also be set interactively from the CLI with llmshim configure.
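A plausible shape for that file (the key names here are an assumption, not taken from llmshim's docs):

```toml
# ~/.llmshim/config.toml
[keys]
openai    = "sk-..."
anthropic = "sk-ant-..."
gemini    = "..."
xai       = "..."
```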
## Supported models

| Provider | Models | Reasoning visible |
|---|---|---|
| OpenAI | gpt-5.4 | Yes (summaries) |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | Yes (full thinking) |
| Google Gemini | gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview | Yes (thought summaries) |
| xAI | grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning | No (hidden) |
## Chat

Simple one-shot prompts and full message histories go through the same call.
## Streaming

Responses stream token by token; for reasoning models, thinking tokens arrive first.
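One way to picture a stream's events, sketched without the real crate (the variant names are assumptions, not llmshim's types):

```rust
// Illustrative stream events: thinking tokens, answer tokens, end of stream.
enum Event {
    Thinking(String),
    Token(String),
    Done,
}

fn main() {
    // A fake stream standing in for a live SSE connection.
    let stream = vec![
        Event::Thinking("considering...".into()),
        Event::Token("Hello".into()),
        Event::Done,
    ];
    for ev in stream {
        match ev {
            Event::Thinking(t) => eprintln!("[thinking] {t}"), // dim grey in the CLI
            Event::Token(t) => print!("{t}"),
            Event::Done => println!(),
        }
    }
}
```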
## Multi-model conversations

Switch models mid-conversation. History carries over.
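The mechanism is simple to picture: the conversation is one growing message list, and only the model string changes between turns. A self-contained sketch (the types here are illustrative, not llmshim's):

```rust
// A conversation is just an ordered list of role/content pairs.
struct Message {
    role: &'static str,
    content: String,
}

fn main() {
    let mut history = vec![Message { role: "user", content: "Name a prime number.".into() }];
    // Turn 1 answered by one provider...
    history.push(Message { role: "assistant", content: "7".into() });
    // ...then the model string changes, and the same history goes to the next provider.
    history.push(Message { role: "user", content: "Double it.".into() });
    let model = "openai/gpt-5.4"; // turn 1 used anthropic/claude-sonnet-4-6
    println!("sending {} messages to {}", history.len(), model);
}
```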
## Tool use
Tools are accepted in OpenAI Chat Completions format and auto-translated to each provider's native format.
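For reference, a tool definition in that format looks like this (get_weather is a made-up example tool):

```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": { "type": "string" }
      },
      "required": ["city"]
    }
  }
}
```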
## Reasoning / thinking

For providers that expose it, responses carry the thinking content alongside the final answer (see the table above for what each provider reveals).
## Fallback chains

Give a list of models; if one provider fails, llmshim retries down the chain with exponential backoff.
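The failover loop can be sketched in a few lines; this is an illustration of the idea, not llmshim's internals:

```rust
use std::{thread, time::Duration};

// Try each model in turn, sleeping with exponential backoff between failures.
fn call_with_fallback(
    chain: &[&str],
    mut attempt: impl FnMut(&str) -> Result<String, String>,
) -> Result<String, String> {
    let mut delay = Duration::from_millis(100);
    let mut last_err = String::from("empty chain");
    for &model in chain {
        match attempt(model) {
            Ok(resp) => return Ok(resp),
            Err(e) => {
                last_err = e;
                thread::sleep(delay);
                delay *= 2; // back off before the next candidate
            }
        }
    }
    Err(last_err)
}

fn main() {
    // Simulate: first provider down, second succeeds.
    let chain = ["anthropic/claude-sonnet-4-6", "openai/gpt-5.4"];
    let out = call_with_fallback(&chain, |model| {
        if model.starts_with("anthropic/") {
            Err("503".into())
        } else {
            Ok(format!("ok from {model}"))
        }
    });
    println!("{out:?}");
}
```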
## Proxy server

llmshim runs as an HTTP proxy with its own API spec, so any language can talk to it. By default it listens on http://localhost:3000.
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat | Chat completion (or streaming with stream: true) |
| POST | /v1/chat/stream | Always-streaming SSE with typed events |
| GET | /v1/models | List available models |
| GET | /health | Health check |
Full API spec: api/openapi.yaml
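As an illustration, a minimal /v1/chat request body might look like this (the field names follow the Chat Completions-style shape the proxy accepts for tools and messages; treat the exact schema as defined by api/openapi.yaml, not by this sketch):

```json
{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "stream": false
}
```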
## Docker

## CLI

## How it works
No canonical struct. Requests flow as serde_json::Value — each provider maps only what it understands. Adding a provider = implementing one trait with three methods.
```
llmshim::completion(router, request)
  → router.resolve("anthropic/claude-sonnet-4-6")
  → provider.transform_request(model, &value)
  → HTTP
  → provider.transform_response(model, body)
```
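Concretely, the trait might look roughly like this. The method names follow the flow above, but the signatures are assumptions; the real crate passes serde_json::Value, while String stands in here to keep the sketch dependency-free:

```rust
// Sketch of a three-method provider trait. Illustrative only.
trait Provider {
    /// Does this provider serve the given model id?
    fn supports(&self, model: &str) -> bool;
    /// Map the incoming request into this provider's wire format.
    fn transform_request(&self, model: &str, request: &str) -> String;
    /// Map the provider's response body back to the common shape.
    fn transform_response(&self, model: &str, body: &str) -> String;
}

// Adding a provider means implementing the trait. A toy pass-through:
struct Echo;

impl Provider for Echo {
    fn supports(&self, model: &str) -> bool {
        model.starts_with("echo/")
    }
    fn transform_request(&self, _model: &str, request: &str) -> String {
        request.to_string()
    }
    fn transform_response(&self, _model: &str, body: &str) -> String {
        body.to_string()
    }
}

fn main() {
    let p = Echo;
    assert!(p.supports("echo/test"));
    println!("{}", p.transform_request("echo/test", "{\"messages\":[]}"));
}
```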
## Key features
- Multi-model conversations — switch providers mid-chat, history carries over
- Reasoning/thinking — visible chain-of-thought from OpenAI, Anthropic, and Gemini
- Streaming — token-by-token with thinking in dim grey
- Tool use — Chat Completions format auto-translated to each provider
- Vision/images — send images in any format, auto-translated between providers
- Fallback chains — automatic failover across providers with exponential backoff
- Cross-provider translation — system messages, tool calls, and provider-specific fields all handled
## Build & test