llmposter
Test your LLM apps without burning tokens, waiting on rate limits, or chasing flaky network errors.
llmposter is a drop-in mock server for OpenAI, Anthropic, and Gemini APIs. Point your existing client at it instead of the real API and get deterministic, repeatable responses for every test run. Built in Rust. Zero runtime dependencies.
What it does
Rust library or standalone CLI: Use it in-process with cargo add llmposter --dev for Rust tests, or run the llmposter CLI binary for language-agnostic testing, local development, and CI pipelines. Same engine, same fixtures, same behavior.
Speaks 4 real LLM API formats: OpenAI Chat Completions, Anthropic Messages, Gemini generateContent, and OpenAI Responses API. Your client code doesn't change; just swap the base URL.
Full streaming support: SSE for OpenAI/Anthropic/Responses, JSON-array + SSE modes for Gemini. Streaming tool calls included. Per-frame latency and chunk size configurable.
Fixture-driven: Define request → response pairs in YAML or with a fluent builder API. Substring, regex, model, and provider matching. First-match-wins ordering. Validates at load time so typos don't survive to runtime.
Tool calling: Mock tool-use responses with full type fidelity. Globally unique tool-call IDs across requests. Works with multi-turn agent flows.
Failure injection: Simulate real-world LLM pain: rate limits (429), server errors (5xx), latency, body corruption, mid-stream truncation, and genuine ConnectionReset transport disconnects. Test your retry logic, backoff, and error handling against realistic failure modes.
Streaming chaos: Seeded jitter (latency_jitter_ms), duplicated SSE frames, and probabilistic activation (probability, chaos_seed). Randomized but reproducible: same seed + same request order = bit-identical chaos, so jitter-flavored tests never go flaky.
Stateful multi-turn scenarios: Named state machines for tool-call loops, retry sequences, and conversation branching. A fixture can require a specific state to match and advance the state on match, which is ideal for agent testing.
Hot-reload fixtures: Edit a YAML file and the running server picks up changes automatically with --watch, or send kill -HUP <pid> like a traditional daemon. Invalid YAML leaves the previous fixtures serving, so partial edits never take down the server.
Response templating: Render fixture responses through a Jinja-style template (content_template) at request time with access to user_message, model, provider, and the full request JSON. Behind the optional templating feature.
Request capture & assertion: Every request is captured. Call server.get_requests() to verify what your client actually sent. Asserts that complement your response testing.
Authentication testing: Bearer token auth with use-count expiration. Full OAuth 2.0 mock server (PKCE, device flow, refresh, revocation, OIDC discovery) behind a feature flag. Provider-specific 401 error shapes.
HTTP status echo: GET /code/200, GET /code/429, etc. Mini-httpbin built in. Test client behavior against any HTTP status without writing a fixture.
Fast and deterministic: Fixed IDs, sequential counters, no randomness. Tests run the same way every time. Rust async throughout: each ServerBuilder::build() spawns a lightweight axum server on an OS-assigned port, so every #[tokio::test] gets its own isolated mock.
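To make the first-match-wins rule concrete, here is a minimal standalone sketch of ordered substring and model matching. It is not llmposter's implementation or API: the `Fixture` struct, its fields, `first_match`, and the sample replies are hypothetical names invented for illustration, and regex and provider matching are left out to keep the example dependency-free.

```rust
/// Hypothetical fixture: optional matchers plus a canned reply.
/// A `None` matcher means "match anything".
struct Fixture {
    user_contains: Option<&'static str>,
    model: Option<&'static str>,
    reply: &'static str,
}

impl Fixture {
    /// True when every matcher that is set accepts the request.
    fn matches(&self, user_message: &str, model: &str) -> bool {
        self.user_contains.map_or(true, |s| user_message.contains(s))
            && self.model.map_or(true, |m| m == model)
    }
}

/// Return the reply of the first fixture that matches, in declaration order.
fn first_match<'a>(fixtures: &'a [Fixture], user_message: &str, model: &str) -> Option<&'a str> {
    fixtures
        .iter()
        .find(|f| f.matches(user_message, model))
        .map(|f| f.reply)
}

/// Specific fixtures first, catch-all last (assumed sample data).
fn example_fixtures() -> Vec<Fixture> {
    vec![
        Fixture { user_contains: Some("weather"), model: Some("gpt-4o"), reply: "Sunny (gpt-4o)" },
        Fixture { user_contains: Some("weather"), model: None, reply: "Sunny" },
        Fixture { user_contains: None, model: None, reply: "Fallback" },
    ]
}

fn main() {
    let fixtures = example_fixtures();
    let requests = [
        ("weather in Oslo?", "gpt-4o"),
        ("weather in Oslo?", "claude-3"),
        ("hello", "gpt-4o"),
    ];
    for (msg, model) in requests {
        println!("{:?} on {}: {:?}", msg, model, first_match(&fixtures, msg, model));
    }
}
```

Because the first hit wins, ordering matters: a catch-all fixture placed first would shadow every fixture after it, so the most specific entries belong at the top.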
Quick Start (Library)
[]
= "0.4"
= { = "1", = ["macros", "rt-multi-thread"] }
= "0.13"
= "1"
use ;
async
Quick Start (CLI)
# Install via Homebrew
# Or install via Cargo
# Create fixtures
# Run server
# Point your app at http://127.0.0.1:8080
Supported Providers
| Route | Provider |
|---|---|
| POST /v1/chat/completions | OpenAI Chat Completions |
| POST /v1/messages | Anthropic Messages |
| POST /v1/responses | OpenAI Responses API |
| POST /v1beta/models/{model}:generateContent | Gemini |
| POST /v1beta/models/{model}:streamGenerateContent | Gemini (streaming) |
| GET /code/200 (any 100-599) | HTTP status echo (mini-httpbin) |
All providers support streaming and non-streaming. For OpenAI, Anthropic, and Responses API, just swap the base URL; the paths are identical to the real APIs. Gemini uses separate endpoints for streaming (streamGenerateContent) and non-streaming (generateContent).
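The routes can be captured in a small helper that turns a mock-server base URL into the full endpoint for each API. This is an illustrative sketch built only from the routes listed in this section; the `Api` enum, the `endpoint` function, and the `gemini-pro` model name are hypothetical and not part of llmposter.

```rust
/// The four API formats from the route table (hypothetical enum for illustration).
enum Api<'a> {
    OpenAiChat,
    AnthropicMessages,
    OpenAiResponses,
    /// Gemini carries the model in the path and uses a separate streaming route.
    Gemini { model: &'a str, stream: bool },
}

/// Build the full URL for `api` under `base_url` (e.g. "http://127.0.0.1:8080").
fn endpoint(base_url: &str, api: &Api) -> String {
    // Tolerate a trailing slash on the base URL.
    let base = base_url.trim_end_matches('/');
    match api {
        Api::OpenAiChat => format!("{}/v1/chat/completions", base),
        Api::AnthropicMessages => format!("{}/v1/messages", base),
        Api::OpenAiResponses => format!("{}/v1/responses", base),
        Api::Gemini { model, stream } => {
            let method = if *stream { "streamGenerateContent" } else { "generateContent" };
            format!("{}/v1beta/models/{}:{}", base, model, method)
        }
    }
}

fn main() {
    let base = "http://127.0.0.1:8080";
    let apis = [
        Api::OpenAiChat,
        Api::AnthropicMessages,
        Api::OpenAiResponses,
        Api::Gemini { model: "gemini-pro", stream: false },
        Api::Gemini { model: "gemini-pro", stream: true },
    ];
    for api in &apis {
        println!("POST {}", endpoint(base, api));
    }
}
```

Only Gemini needs the `stream` flag in the URL. For the other three formats the path is the same in both modes, so streaming is not part of the route.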
Authentication
Bearer token enforcement on LLM endpoints: off by default, fully backward compatible.
let server = ServerBuilder::new()
    .with_bearer_token("test-token-123") // valid forever
    .with_bearer_token_uses(...) // expires after 1 use
    .fixture(...)
    .build().await.unwrap();
// Requests must include: Authorization: Bearer test-token-123
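The difference between a permanent token and a use-limited one can be pictured with a small standalone model. This is only a sketch of the semantics described in this section, not llmposter's code: `TokenStore`, `with_token`, `authorize`, and the `one-shot-token` string are hypothetical, and it assumes that a use-limited token is simply rejected once its uses are spent.

```rust
use std::collections::HashMap;

/// Hypothetical model of bearer tokens:
/// `None` = valid forever, `Some(n)` = n uses remaining.
struct TokenStore {
    tokens: HashMap<String, Option<u32>>,
}

impl TokenStore {
    fn new() -> Self {
        TokenStore { tokens: HashMap::new() }
    }

    /// Register a token; `uses = None` never expires.
    fn with_token(mut self, token: &str, uses: Option<u32>) -> Self {
        self.tokens.insert(token.to_string(), uses);
        self
    }

    /// Check an `Authorization` header value; consumes one use on success.
    fn authorize(&mut self, header: &str) -> bool {
        let token = match header.strip_prefix("Bearer ") {
            Some(t) => t,
            None => return false, // missing or malformed scheme
        };
        match self.tokens.get_mut(token) {
            Some(None) => true,            // permanent token
            Some(Some(0)) | None => false, // spent or unknown token
            Some(Some(remaining)) => {
                *remaining -= 1;
                true
            }
        }
    }
}

fn main() {
    let mut store = TokenStore::new()
        .with_token("test-token-123", None)
        .with_token("one-shot-token", Some(1));

    for header in ["Bearer one-shot-token", "Bearer one-shot-token", "Bearer test-token-123"] {
        println!("{} -> authorized: {}", header, store.authorize(header));
    }
}
```

A client test built on this idea sends the same request twice with the use-limited token and checks that the second attempt takes the unauthorized path, which exercises token-refresh logic.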
OAuth 2.0 Mock Server
Full OAuth server via oauth-mock integration: PKCE, device code, token refresh, revocation.
let server = ServerBuilder::new()
    .with_oauth_defaults() // spawns OAuth server on separate port
    .fixture(...)
    .build().await.unwrap();
let oauth_url = server.oauth_url().unwrap(); // e.g. http://127.0.0.1:12345
// Point your client's token_url at oauth_url
// Tokens issued by the OAuth server are automatically valid on LLM endpoints
Documentation
- Getting Started: Installation, first fixture, first test
- Fixtures: YAML format, matching rules, tool calls
- Failure Simulation: Error codes, latency, truncation, disconnect
- CLI Reference: Flags, validate mode, verbose logging
- Library API: Rust ServerBuilder, programmatic fixtures
- Spec Deviations: Known gaps from real APIs
Provider Guides
- OpenAI Chat Completions: Fields, streaming, error shapes
- Anthropic Messages: Fields, streaming, error shapes
- Gemini generateContent: Fields, streaming, camelCase
- OpenAI Responses API: Fields, streaming events, envelopes
License
AGPL-3.0