# llmshim

A blazing-fast LLM API translation layer in pure Rust. One interface, every provider.
## What it does
Send requests through llmshim's API → it translates to whichever provider you choose → translates the response back. Zero infrastructure, zero databases, ~5MB binary.
Switch providers by changing the model string. Everything else stays the same.
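For example (the `messages` field here is illustrative, not taken from the spec):

```json
{ "model": "gpt-5.4", "messages": [{ "role": "user", "content": "Hi" }] }
```

Swap `"gpt-5.4"` for `"claude-sonnet-4-6"` or `"gemini-3-flash-preview"` and the identical request runs against Anthropic or Google instead.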
## Supported providers & models

| Provider | Models | Reasoning visible |
|---|---|---|
| OpenAI | `gpt-5.4` | Yes (summaries) |
| Anthropic | `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5-20251001` | Yes (full thinking) |
| Google Gemini | `gemini-3.1-pro-preview`, `gemini-3-flash-preview`, `gemini-3.1-flash-lite-preview` | Yes (thought summaries) |
| xAI | `grok-4-1-fast-reasoning`, `grok-4-1-fast-non-reasoning` | No (hidden) |
## Quick start

```sh
# Install
# Configure API keys (interactive, like aws configure)
# Or set keys individually
# Start the interactive chat
# Start the proxy server
```

Keys are stored in `~/.llmshim/config.toml`. You can also set environment variables directly. Precedence: env vars > config file.
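The precedence rule can be sketched in a few lines (illustrative only, not llmshim's actual lookup code; the key name is an example):

```rust
use std::collections::HashMap;
use std::env;

/// An environment variable, if present, wins over the value parsed
/// from ~/.llmshim/config.toml.
fn resolve_key(name: &str, config_file: &HashMap<String, String>) -> Option<String> {
    env::var(name).ok().or_else(|| config_file.get(name).cloned())
}

fn main() {
    let mut config_file = HashMap::new();

    // No such variable in the environment: the config-file value is used.
    config_file.insert("ANTHROPIC_API_KEY_DEMO_UNSET".into(), "key-from-config".into());
    assert_eq!(
        resolve_key("ANTHROPIC_API_KEY_DEMO_UNSET", &config_file).as_deref(),
        Some("key-from-config")
    );

    // PATH is set in virtually every environment, so it shadows a
    // config-file entry of the same name.
    config_file.insert("PATH".into(), "ignored".into());
    assert_ne!(resolve_key("PATH", &config_file).as_deref(), Some("ignored"));
    println!("precedence ok");
}
```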
### Docker

```sh
# Build the image
# Start the proxy (uses keys from ~/.llmshim/config.toml or env vars)
# Check status, view logs, stop
# Or run directly with docker
```
## Proxy Server

llmshim runs as an HTTP proxy with its own API spec (not OpenAI-compatible). Any language can talk to it.

```sh
# Listening on http://localhost:3000
```
### Endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat` | Chat completion (or streaming with `stream: true`) |
| POST | `/v1/chat/stream` | Always-streaming SSE with typed events |
| GET | `/v1/models` | List available models |
| GET | `/health` | Health check |
### Request format

`config` holds provider-agnostic settings. `provider_config` passes raw JSON to the underlying provider for features like Anthropic thinking or Gemini safety settings.
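A sketch of a request body (only `config` and `provider_config` are documented here; the other field names and the exact `provider_config` keys are illustrative):

```json
{
  "model": "claude-sonnet-4-6",
  "messages": [{ "role": "user", "content": "Summarize this repo" }],
  "config": {
    "max_tokens": 1024,
    "stream": false
  },
  "provider_config": {
    "thinking": { "type": "enabled", "budget_tokens": 2048 }
  }
}
```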
### Fallback chains

Add `fallback` to automatically try other models when the primary fails (429, 500, 502, 503):

Each model is retried up to 2 times with exponential backoff before moving to the next. Also available as a Rust API:
```rust
// Reconstructed sketch — the config type and argument shapes are
// placeholders; only `completion_with_fallback` is named in these docs.
let config = FallbackConfig::new(vec!["gpt-5.4", "claude-sonnet-4-6"]);
let resp = completion_with_fallback(&router, &config, request).await?;
```
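The retry-then-fallback behaviour described above can be sketched as follows (an illustration of the logic, not the crate's real `completion_with_fallback`):

```rust
/// Try each model up to `MAX_RETRIES + 1` times, then move down the chain.
/// `call` stands in for the HTTP request: (model, attempt) -> Ok(body) | Err(status).
fn complete_with_fallback<F>(models: &[&str], mut call: F) -> Result<String, String>
where
    F: FnMut(&str, u32) -> Result<String, u16>,
{
    const MAX_RETRIES: u32 = 2;
    for model in models {
        for attempt in 0..=MAX_RETRIES {
            match call(model, attempt) {
                Ok(body) => return Ok(body),
                // Retry only transient statuses; real code would also sleep
                // ~2^attempt * base_delay between attempts.
                Err(429) | Err(500) | Err(502) | Err(503) => continue,
                Err(status) => return Err(format!("{model}: unrecoverable {status}")),
            }
        }
    }
    Err("all models in the chain failed".to_string())
}

fn main() {
    // First model is always rate-limited; the second succeeds immediately.
    let result = complete_with_fallback(&["gpt-5.4", "claude-sonnet-4-6"], |model, _| {
        if model == "gpt-5.4" { Err(429) } else { Ok(format!("{model}: hi")) }
    });
    assert_eq!(result.unwrap(), "claude-sonnet-4-6: hi");
    println!("fallback ok");
}
```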
### Streaming events

The `/v1/chat/stream` endpoint emits typed SSE events:

```text
event: reasoning
data: {"type":"reasoning","text":"Let me think..."}

event: content
data: {"type":"content","text":"The answer is 42."}

event: usage
data: {"type":"usage","input_tokens":30,"output_tokens":50}

event: done
data: {"type":"done"}
```
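A client in any language just splits the stream into frames on blank lines. A minimal Rust sketch (a real client would also JSON-parse each `data` payload):

```rust
/// Split a raw SSE stream into (event, data) frames.
fn parse_sse(stream: &str) -> Vec<(String, String)> {
    let mut frames = Vec::new();
    let (mut event, mut data) = (String::new(), String::new());
    for line in stream.lines() {
        if let Some(e) = line.strip_prefix("event: ") {
            event = e.to_string();
        } else if let Some(d) = line.strip_prefix("data: ") {
            data = d.to_string();
        } else if line.is_empty() && !event.is_empty() {
            // Blank line terminates a frame.
            frames.push((std::mem::take(&mut event), std::mem::take(&mut data)));
        }
    }
    frames
}

fn main() {
    let raw = "event: content\ndata: {\"type\":\"content\",\"text\":\"42\"}\n\nevent: done\ndata: {\"type\":\"done\"}\n\n";
    let frames = parse_sse(raw);
    assert_eq!(frames.len(), 2);
    assert_eq!(frames[0].0, "content");
    assert_eq!(frames[1].0, "done");
    println!("sse ok");
}
```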
## Client Libraries

Generated from the OpenAPI spec. Install and go.

### Python

```python
# Chat
# Stream
# Multi-model conversation
```

See `clients/python/` for setup.
### TypeScript

```typescript
import { LlmShim } from "./src/index.ts";

const client = new LlmShim();

// Chat
const resp = await client.chat("claude-sonnet-4-6", "Hello!", { max_tokens: 500 });
console.log(resp.message.content);

// Stream
for await (const chunk of client.stream("gpt-5.4", "Write a poem")) {
  if (chunk.type === "content") process.stdout.write(chunk.text!);
}
```

See `clients/typescript/` for setup.
### curl
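A sketch against a locally running proxy (the `messages` field is an assumption; only `model`, `config`, `provider_config`, and `stream` are documented in this README):

```sh
# Assumes the proxy is listening on the default port shown above.
curl -s http://localhost:3000/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```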
## Interactive CLI

```text
$ llmshim
llmshim — multi-provider LLM chat

1. GPT-5.4
2. Claude Opus 4.6
3. Claude Sonnet 4.6
...

Select model [1-9]: 3

you: What is Rust?
Claude Sonnet 4.6: [thinking in dim grey...]
Rust is a systems programming language...
[2.1s · ↑ 30 · ↓ 150 tokens]

you: /model gpt
Switched to: GPT-5.4
you: Tell me more
GPT-5.4: [continues the conversation...]
```

Commands: `/model` (switch by name, number, or fuzzy match), `/clear`, `/history`, `/quit`
## Key features
- Multi-model conversations — switch providers mid-chat, history carries over
- Reasoning/thinking — visible chain-of-thought from OpenAI, Anthropic, and Gemini
- Streaming — token-by-token with thinking in dim grey, answers in default color
- Vision/images — send images in any format, auto-translated between providers
- Retry + fallback chains — automatic failover across providers with exponential backoff
- Cross-provider translation — tool calls, system messages, and provider-specific fields all handled automatically
- JSONL logging — `llmshim chat --log llmshim.log`
- OpenAPI spec — generate clients for any language from `api/openapi.yaml`
## Architecture

No canonical struct. Requests flow as `serde_json::Value` — each provider maps only what it understands. Adding a provider = implementing one trait with three methods.

```text
llmshim::completion(router, request)
  → router.resolve("anthropic/claude-sonnet-4-6")
  → provider.transform_request(model, &value)
  → HTTP
  → provider.transform_response(model, body)
```
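A hypothetical sketch of that trait: only `transform_request` and `transform_response` are named above, so the third method (`base_url`) and all signatures here are guesses for illustration, with a plain `String` standing in for `serde_json::Value` to keep the sketch self-contained:

```rust
// Stand-in for serde_json::Value so this compiles on its own.
type Value = String;

trait Provider {
    /// Where to send the request for a given model (guessed method).
    fn base_url(&self, model: &str) -> String;
    /// Map the provider-agnostic request into this provider's wire format.
    fn transform_request(&self, model: &str, request: &Value) -> Value;
    /// Map the provider's response back into llmshim's format.
    fn transform_response(&self, model: &str, body: Value) -> Value;
}

struct Anthropic;

impl Provider for Anthropic {
    fn base_url(&self, _model: &str) -> String {
        "https://api.anthropic.com/v1/messages".to_string()
    }
    // A real implementation reshapes the JSON; here we just tag it.
    fn transform_request(&self, model: &str, request: &Value) -> Value {
        format!("anthropic:{model}:{request}")
    }
    fn transform_response(&self, _model: &str, body: Value) -> Value {
        body
    }
}

fn main() {
    let p = Anthropic;
    let wire = p.transform_request("claude-sonnet-4-6", &"{\"messages\":[]}".to_string());
    assert!(wire.starts_with("anthropic:claude-sonnet-4-6"));
    println!("trait ok");
}
```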
## Build & Test