# Siumai — Unified LLM Interface for Rust

Siumai (烧卖) is a type-safe Rust library that provides a single, consistent API over multiple LLM providers. It focuses on clear abstractions, predictable behavior, and practical extensibility.

This README keeps things straightforward: what you can do, how to customize it, and short examples.
## What It Provides

- Unified clients for multiple providers (OpenAI, Anthropic, Google Gemini, Ollama, Groq, xAI, and OpenAI-compatible vendors)
- Capability traits for chat, streaming, tools, vision, audio, files, embeddings, and rerank
- Streaming with start/delta/usage/end events and cancellation
- Tool calling and a lightweight orchestrator for multi-step workflows
- Structured outputs:
  - Provider-native structured outputs (OpenAI/Anthropic/Gemini, etc.)
  - Provider-agnostic decoding helpers with JSON repair and validation (via `siumai-extras`)
- HTTP interceptors, middleware, and a simple retry facade
- Optional extras for telemetry, OpenTelemetry, schema validation, and server adapters
## Install

```toml
[dependencies]
siumai = "0.11.0-beta.6"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
```
## Migration (beta.6)

Upgrading from 0.11.0-beta.4 (or earlier)?

- See `docs/migration/migration-0.11.0-beta.6.md`
- Note: legacy method-style entry points are treated as a compatibility surface; the explicit module is `siumai::compat`.
Feature flags (enable only what you need):

```toml
# One provider
siumai = { version = "0.11.0-beta.6", features = ["openai"] }

# Multiple providers
siumai = { version = "0.11.0-beta.6", features = ["openai", "anthropic", "google"] }

# All providers
siumai = { version = "0.11.0-beta.6", features = ["all-providers"] }
```

Note: `siumai` enables `openai` by default. Disable defaults via `default-features = false`.
Optional package for advanced utilities:

```toml
[dependencies]
siumai = "0.11.0-beta.6"
siumai-extras = { version = "0.11.0-beta.6", features = ["schema", "telemetry", "opentelemetry", "server", "mcp"] }
```
## Usage

### Construction order

For new code, prefer construction modes in this order:

1. `registry-first` for application code and cross-provider routing
2. `config-first` for provider-specific setup and tests
3. `builder convenience` for quick setup, migration, and side-by-side comparison

Rule of thumb:

- reach for `registry::global().language_model("provider:model")?` in app code
- reach for `*Client::from_config(*Config::new(...))` in provider-specific code
- treat `Siumai::builder()` and provider builders as convenience wrappers, not the architectural center
### Public surface map

Use public surfaces by intent, not by habit:

- App-level routing and default usage: `registry::global()` + the six family APIs `text::{generate, stream}`, `embedding::embed`, `image::generate`, `rerank::rerank`, `speech::synthesize`, and `transcription::transcribe`
- Provider-specific construction: `siumai::providers::<provider>::*Config` + `*Client::from_config(...)`
- Provider-specific typed escape hatches: `siumai::provider_ext::<provider>::options::*`, request ext traits, and typed response metadata helpers
- Migration and quick demos: `siumai::compat::Siumai` and `Provider::*()` builders
- Last-resort vendor knobs: raw `with_provider_option(...)` when a typed provider extension does not exist yet

Policy for new features:

- no new capability should be builder-only
- typed provider knobs should live under `provider_ext::<provider>` before recommending raw provider-option maps
- docs and examples should present `registry-first -> config-first -> builder convenience` in that order
- docs should treat the six family modules `text`, `embedding`, `image`, `rerank`, `speech`, and `transcription` as the primary public entry points
### Registry (recommended)

Use the registry to resolve models via `provider:model` and get a handle with a uniform API. A minimal sketch (the `prelude` glob import and the exact `text::generate` call shape are assumptions; see `siumai/examples/05-integrations/registry/` for runnable versions):

```rust
use siumai::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Resolve a handle by "provider:model" id.
    let model = registry::global().language_model("openai:gpt-4o-mini")?;
    // Use the family APIs (text, embedding, image, ...) on the handle.
    let resp = text::generate(&model, "Say hello in one sentence.").await?;
    println!("{}", resp.text());
    Ok(())
}
```

Note: OpenAI routing via the registry uses the Responses API by default. If you specifically need Chat Completions (`POST /chat/completions`) for a specific request, override it via provider options. A sketch (the exact options type is an assumption; `with_openai_options` is the documented hook):

```rust
use siumai::prelude::*;
use siumai::provider_ext::openai::options::*; // assumed import path

// Maps to providerOptions.openai.responsesApi.enabled = false for this request only.
let req = ChatRequest::new(messages)
    .with_openai_options(/* options value with the Responses API disabled; exact type assumed */);
```
Supported examples of `provider:model`:

- `openai:gpt-4o`, `openai:gpt-4o-mini`
- `anthropic:claude-3-5-sonnet-20240620`
- `anthropic-vertex:claude-3-5-sonnet-20240620`
- `gemini:gemini-2.0-flash-exp`
- `groq:llama-3.1-70b-versatile`
- `xai:grok-beta`
- `ollama:llama3.2`
- `minimaxi:minimax-text-01`

OpenAI-compatible vendors follow the same pattern (API keys are read as `{PROVIDER_ID}_API_KEY` when possible). See the docs for details.
### OpenAI-compatible vendors (config-first)

Typed vendor views such as `siumai::provider_ext::openrouter` and `siumai::provider_ext::perplexity` are helper layers over the same compat runtime; they do not imply that every preset should grow into a separate full provider package.

For OpenAI-compatible providers like Moonshot/OpenRouter/DeepSeek, you can use the built-in vendor registry. A sketch (`from_builtin_env` is the documented entry point; the `ask` call and argument shape are assumptions):

```rust
use siumai::prelude::*;
use siumai::providers::openai_compatible::OpenAiCompatibleClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads the API key from env (see the precedence notes below).
    let client = OpenAiCompatibleClient::from_builtin_env("deepseek")?;
    let resp = client.ask("Hello!").await?;
    println!("{resp}");
    Ok(())
}
```
Notes:

- `OpenAiCompatibleClient::from_builtin_env` reads API keys from env using this precedence:
  1. `ProviderConfig.api_key_env` (when present)
  2. `ProviderConfig.api_key_env_aliases` (fallbacks)
  3. `${PROVIDER_ID}_API_KEY` (uppercased, `-` replaced with `_`)
- To discover built-in OpenAI-compatible provider ids, call `siumai::providers::openai_compatible::list_provider_ids()`.
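The fallback key name in step 3 is derived mechanically from the provider id. A small standalone sketch of that derivation (illustrative only; not the library's actual code):

```rust
/// Derive the fallback env var name for a provider id:
/// uppercase it, replace `-` with `_`, then append `_API_KEY`.
fn fallback_api_key_env(provider_id: &str) -> String {
    let upper: String = provider_id
        .chars()
        .map(|c| if c == '-' { '_' } else { c.to_ascii_uppercase() })
        .collect();
    format!("{upper}_API_KEY")
}

fn main() {
    assert_eq!(fallback_api_key_env("deepseek"), "DEEPSEEK_API_KEY");
    assert_eq!(fallback_api_key_env("open-router"), "OPEN_ROUTER_API_KEY");
    println!("ok");
}
```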
### Provider clients (config-first)

The sketches below use placeholder values. The `*Config::new(...)` + `*Client::from_config(...)` pattern is the documented one; exact import paths and the `ask` call are assumptions.

Provider-specific client:

```rust
use siumai::prelude::*;
use siumai::providers::openai::{OpenAiClient, OpenAiConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = OpenAiConfig::new(std::env::var("OPENAI_API_KEY")?)
        .with_model("gpt-4o-mini"); // placeholder model id
    let client = OpenAiClient::from_config(cfg)?;
    let resp = client.ask("Hello!").await?;
    println!("{resp}");
    Ok(())
}
```

MiniMaxi (config-first), same pattern:

```rust
use siumai::prelude::*;
use siumai::providers::minimaxi::{models, MiniMaxiClient, MiniMaxiConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = MiniMaxiConfig::new(std::env::var("MINIMAXI_API_KEY")?)
        .with_model(models::MINIMAX_TEXT_01); // assumed constant for "minimax-text-01"
    let client = MiniMaxiClient::from_config(cfg)?;
    let resp = client.ask("Hello!").await?;
    println!("{resp}");
    Ok(())
}
```

OpenAI-compatible (custom base URL):

```rust
use siumai::prelude::*;
use siumai::providers::openai_compatible::{OpenAiCompatibleClient, OpenAiCompatibleConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = OpenAiCompatibleConfig::new("my-vendor", std::env::var("MY_VENDOR_API_KEY")?)
        .with_base_url("https://api.my-vendor.example/v1") // custom endpoint
        .with_model("my-model");
    let client = OpenAiCompatibleClient::from_config(cfg)?;
    Ok(())
}
```
### Rerank (registry-first)

A sketch (`rerank::rerank` is the documented family API; the resolver name and request shape are assumptions):

```rust
use siumai::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model = registry::global().rerank_model("cohere:rerank-english-v3.0")?; // assumed resolver
    let results = rerank::rerank(&model, "query", vec!["doc a", "doc b"]).await?;
    println!("{results:?}");
    Ok(())
}
```

Provider-specific rerank setup should still prefer config-first clients plus typed request extensions. Runnable references:

- `siumai/examples/05-integrations/registry/rerank.rs`
- `siumai/examples/04-provider-specific/cohere/rerank.rs`
- `siumai/examples/04-provider-specific/togetherai/rerank.rs`
- `siumai/examples/04-provider-specific/bedrock/rerank.rs`
### OpenAI endpoint routing (Responses vs Chat Completions)

Siumai supports both OpenAI chat endpoints:

- Responses API: `POST /responses` (default)
- Chat Completions: `POST /chat/completions` (override via `providerOptions.openai.responsesApi.enabled = false`)

If you need to override the default on a per-request basis, set `providerOptions.openai.responsesApi.enabled` explicitly on the `ChatRequest`.
### Builder convenience (compat)

Builder-style construction remains available as a temporary compatibility surface. It is useful for quick demos and migration, but it is not the recommended default for new code.

Recommended order:

- first: registry-first for app-level code
- second: config-first for provider-specific code
- third: builder convenience for quick setup and comparison

If you still want the builder style, prefer importing it explicitly from `siumai::compat` (argument values below are placeholders):

```rust
use siumai::compat::Siumai;

let client = Siumai::builder()
    .openai()
    .api_key("sk-...") // placeholder; or rely on OPENAI_API_KEY
    .model("gpt-4o-mini") // placeholder model id
    .build()
    .await?;
```

Compatibility note: the planned removal target is no earlier than 0.12.0. For details, see `docs/migration/migration-0.11.0-beta.6.md`.

Builder policy note: builders are expected to converge on the same config-first construction path. If a feature matters for real usage, it should also be reachable from provider config/client APIs and, when appropriate, typed provider extensions under `provider_ext`.
### Streaming

A minimal sketch using the registry handle and the `text::stream` family API (accessor names on stream events are assumptions):

```rust
use futures::StreamExt;
use siumai::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model = registry::global().language_model("openai:gpt-4o-mini")?;
    // Events arrive as start / delta / usage / end.
    let mut stream = text::stream(&model, "Tell me a short story.").await?;
    while let Some(event) = stream.next().await {
        let event = event?;
        // Print incremental text deltas as they arrive (accessor name assumed).
        if let Some(delta) = event.text_delta() {
            print!("{delta}");
        }
    }
    Ok(())
}
```

### Streaming cancellation

`chat_stream_with_cancel` returns a `ChatStreamHandle` with a first-class `CancelHandle`. Cancellation is wakeable: it can stop a pending `next().await` immediately (useful for both SSE and WebSocket streams). A sketch (handle accessors and the capability trait name are assumptions):

```rust
use futures::StreamExt;
use siumai::prelude::*;

async fn run(client: impl ChatCapability, request: ChatRequest) -> Result<(), Box<dyn std::error::Error>> {
    // ChatStreamHandle bundles the stream with a first-class CancelHandle.
    let handle = client.chat_stream_with_cancel(request).await?;
    let cancel = handle.cancel_handle(); // assumed accessor

    // Cancel from another task after one second; a pending next().await
    // wakes immediately instead of waiting for the next event.
    tokio::spawn(async move {
        tokio::time::sleep(std::time::Duration::from_secs(1)).await;
        cancel.cancel();
    });

    let mut stream = handle.into_stream(); // assumed accessor
    while let Some(event) = stream.next().await {
        let _ = event?;
    }
    Ok(())
}
```
### OpenAI WebSocket streaming (Responses API)

If you have many sequential streaming steps (e.g., tool loops), OpenAI's WebSocket mode can reduce TTFB by reusing a persistent connection. Enable the feature and inject the transport.

Note: `base_url` must use `http://` or `https://` (it is converted to `ws://` / `wss://` internally). WebSocket mode only applies to Responses streaming (`POST /responses`). It is not compatible with Chat Completions (`POST /chat/completions`).

```toml
# Cargo.toml
siumai = { version = "0.11.0-beta.6", features = ["openai-websocket"] }
```

A sketch (the config pattern is documented; the transport-injection API is elided here and assumed):

```rust
use siumai::prelude::*;
use siumai::providers::openai::{OpenAiClient, OpenAiConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // base_url stays http(s):// and is converted to ws(s):// internally.
    let cfg = OpenAiConfig::new(std::env::var("OPENAI_API_KEY")?)
        .with_model("gpt-4o-mini");
    // Build the client with the WebSocket transport injected
    // (exact injection API assumed; requires the `openai-websocket` feature).
    let client = OpenAiClient::from_config(cfg)?;
    // Subsequent Responses streaming reuses the persistent connection.
    Ok(())
}
```
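The http(s) → ws(s) scheme conversion described above can be sketched as a small standalone function (illustrative only; not the library's actual code; unsupported schemes are surfaced as configuration errors):

```rust
/// Map an http(s) base URL to its WebSocket counterpart:
/// http:// -> ws://, https:// -> wss://; anything else is rejected.
fn to_websocket_url(base_url: &str) -> Option<String> {
    if let Some(rest) = base_url.strip_prefix("https://") {
        Some(format!("wss://{rest}"))
    } else if let Some(rest) = base_url.strip_prefix("http://") {
        Some(format!("ws://{rest}"))
    } else {
        None // unsupported scheme: reported as a configuration error
    }
}

fn main() {
    assert_eq!(
        to_websocket_url("https://api.openai.com/v1"),
        Some("wss://api.openai.com/v1".to_string())
    );
    assert_eq!(to_websocket_url("ftp://example.com"), None);
    println!("ok");
}
```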
### OpenAI WebSocket session (warm-up + single connection)

For agentic workflows with many sequential streaming steps, prefer a single-connection session so `previous_response_id` continuation stays unambiguous.

This session also includes a conservative recovery strategy:

- if WebSocket setup fails (transient/connectivity), it falls back to HTTP (SSE) streaming for that request
- for some WebSocket-specific OpenAI errors, it may rebuild the connection and retry once

Note: configuration errors (e.g. an invalid `base_url` or an unsupported URL scheme) are surfaced directly and do not fall back to HTTP.

You can customize it, e.g. disable all recovery:

```rust
OpenAiWebSocketSession::from_config_default_http(cfg)?
    .with_recovery_config(OpenAiWebSocketRecoveryConfig {
        allow_http_fallback: false,
        max_ws_retries: 0,
    });
```

Important: recovery may rebuild the WebSocket connection (or fall back to HTTP), which resets connection-local continuation state (`previous_response_id`). If you strictly rely on continuation via a single warm connection, consider disabling recovery.

When recovery happens, the session also emits `ChatStreamEvent::Custom` with `event_type="openai:ws-recovery"`.

`OpenAiWebSocketSession` also attempts best-effort remote cancellation when using `chat_stream_with_cancel(...)` by calling `POST /responses/{id}/cancel` once the response id is observed. Disable via `session.with_remote_cancel(false)`.

A session sketch (the constructor is documented; config imports are assumptions):

```rust
use futures::StreamExt;
use siumai::prelude::*;
use siumai::providers::openai::OpenAiConfig;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = OpenAiConfig::new(std::env::var("OPENAI_API_KEY")?)
        .with_model("gpt-4o-mini");
    // One warm connection for the whole agentic loop; sequential streaming
    // steps reuse it, keeping previous_response_id continuation unambiguous.
    let session = OpenAiWebSocketSession::from_config_default_http(cfg)?;
    Ok(())
}
```
### Structured output

1) Provider-agnostic decoding (recommended for cross-provider flows)

Use `siumai-extras` to parse model text into typed JSON with optional schema validation and repair. A sketch (`generate_object` is the documented helper; its import path and call shape are assumptions):

```rust
use serde::Deserialize;
use siumai::prelude::*;
use siumai_extras::structured_output::generate_object;

#[derive(Debug, Deserialize)]
struct Person {
    name: String,
    age: u32,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model = registry::global().language_model("openai:gpt-4o-mini")?;
    let person: Person = generate_object(&model, "Describe a fictional person as JSON.").await?;
    println!("{person:?}");
    Ok(())
}
```

Under the hood this uses `siumai_extras::structured_output::OutputDecodeConfig` to:

- enforce shape hints (object/array/enum)
- optionally validate against a JSON Schema
- repair common issues (markdown fences, trailing commas, partial slices)
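One of the repairs listed above, stripping a surrounding markdown fence from model output, can be illustrated with a small standalone sketch (not the library's actual implementation):

```rust
/// Strip a surrounding code fence (with an optional language tag)
/// from model output, returning the inner JSON payload.
fn strip_markdown_fence(text: &str) -> &str {
    let t = text.trim();
    if let Some(rest) = t.strip_prefix("```") {
        // Drop the optional language tag on the opening fence line.
        let body = match rest.split_once('\n') {
            Some((_lang, body)) => body,
            None => rest,
        };
        // Remove the closing fence; if it is missing, keep the original text.
        body.strip_suffix("```").map(str::trim).unwrap_or(t)
    } else {
        t // no fence: return the trimmed text unchanged
    }
}

fn main() {
    let raw = "```json\n{\"name\": \"Ada\"}\n```";
    assert_eq!(strip_markdown_fence(raw), "{\"name\": \"Ada\"}");
    assert_eq!(strip_markdown_fence("{\"ok\": true}"), "{\"ok\": true}");
    println!("ok");
}
```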
2) Provider-native structured outputs (example: OpenAI Responses API)

For providers that expose native structured outputs, configure them via provider options. You can still combine them with the decoding helpers above if you want. A sketch (`with_openai_options` is the documented hook; the options type and the request-builder details are assumptions):

```rust
use siumai::prelude::*;
use siumai::provider_ext::openai::options::*; // assumed import path
use serde_json::json;

let schema = json!({
    "type": "object",
    "properties": { "name": { "type": "string" } },
    "required": ["name"]
});

let req = ChatRequest::new()
    .message("Describe a fictional person as JSON.")
    .build()
    .with_openai_options(/* native structured-output options built from `schema`; exact type assumed */);

let resp = client.generate(req).await?;
// Optionally: further validate/repair/deserialize using `siumai-extras` helpers.
```
### Retries

A sketch (`ask_with_retry` is the documented entry point; the retry-options argument and its type are assumptions):

```rust
use siumai::prelude::*;
use siumai::retry::RetryOptions; // assumed import path

let text = client
    .ask_with_retry("Hello!", RetryOptions::default())
    .await?;
```
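Retry facades like this typically pair a max-attempt count with exponential backoff. A standalone sketch of such a delay schedule (illustrative only; not siumai's actual retry code):

```rust
use std::time::Duration;

/// Exponential backoff: base * 2^attempt, capped at `max`.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt));
    exp.min(max)
}

fn main() {
    let base = Duration::from_millis(250);
    let max = Duration::from_secs(5);
    // attempts 0..=5 -> 250ms, 500ms, 1s, 2s, 4s, 5s (capped)
    let delays: Vec<u64> = (0..6)
        .map(|a| backoff_delay(base, max, a).as_millis() as u64)
        .collect();
    assert_eq!(delays, vec![250, 500, 1000, 2000, 4000, 5000]);
    println!("{delays:?}");
}
```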
## Customization

- HTTP client and headers
- Middleware chain (defaults, clamping, reasoning extraction)
- HTTP interceptors (request/response hooks, SSE observation)
- Retry options and backoff
### HTTP configuration

You have three practical ways to control HTTP behavior, from simple to advanced. The sketches below use placeholder values; import paths and constructor shapes not documented elsewhere in this README are assumptions.

- Provider config + `HttpConfig` (most common):

```rust
use siumai::prelude::*;
use siumai::providers::openai::{OpenAiClient, OpenAiConfig};

let http_cfg = HttpConfig::builder()
    .timeout(std::time::Duration::from_secs(60))
    .connect_timeout(std::time::Duration::from_secs(10))
    .user_agent("my-app/1.0")
    .header("x-request-source", "my-app")
    .stream_disable_compression(true) // keep SSE stable; the default can be controlled by env
    .build();

let cfg = OpenAiConfig::new(std::env::var("OPENAI_API_KEY")?)
    .with_model("gpt-4o-mini")
    .with_http_config(http_cfg);

let client = OpenAiClient::from_config(cfg)?;
```

- `HttpConfig` builder + shared client builder (centralized configuration):

```rust
use siumai::execution::http::build_http_client_from_config;
use siumai::prelude::*;
use siumai::providers::openai::{OpenAiClient, OpenAiConfig};

// Construct a reusable HTTP config.
let http_cfg = HttpConfig::builder()
    .timeout(std::time::Duration::from_secs(60))
    .connect_timeout(std::time::Duration::from_secs(10))
    .user_agent("my-app/1.0")
    .proxy("http://127.0.0.1:8080") // placeholder proxy
    .header("x-request-source", "my-app")
    .stream_disable_compression(true) // explicit SSE stability
    .build();

// Build the reqwest client using the shared helper.
let http = build_http_client_from_config(&http_cfg)?;

let cfg = OpenAiConfig::new(std::env::var("OPENAI_API_KEY")?)
    .with_model("gpt-4o-mini")
    .with_http_config(http_cfg);

let client = OpenAiClient::new(cfg, http);
```

- Fully custom reqwest client (maximum control):

```rust
use siumai::prelude::*;
use siumai::providers::openai::{OpenAiClient, OpenAiConfig};

let http = reqwest::Client::builder()
    .timeout(std::time::Duration::from_secs(60))
    // .danger_accept_invalid_certs(true) // if needed for dev
    .build()?;

let cfg = OpenAiConfig::new(std::env::var("OPENAI_API_KEY")?)
    .with_model("gpt-4o-mini");

let client = OpenAiClient::new(cfg, http);
```
Notes:

- Streaming stability: by default, `stream_disable_compression` is derived from `SIUMAI_STREAM_DISABLE_COMPRESSION` (true unless set to `false|0|off|no`). You can override it per client using `HttpConfig::builder().stream_disable_compression(...)`.
- Builder-style HTTP toggles remain available, but they are part of the builder compatibility surface. Prefer `HttpConfig` + registry/config-first clients for new code.
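The documented env semantics ("true unless set to `false|0|off|no`") can be sketched as a small standalone parser (illustrative only; not the library's code; case-insensitive matching is an assumption):

```rust
/// SIUMAI_STREAM_DISABLE_COMPRESSION semantics as documented:
/// defaults to true; only an explicit false|0|off|no turns it off.
/// (Case-insensitive matching is an assumption.)
fn stream_disable_compression(env_value: Option<&str>) -> bool {
    match env_value.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) if matches!(v.as_str(), "false" | "0" | "off" | "no") => false,
        _ => true,
    }
}

fn main() {
    assert!(stream_disable_compression(None));         // unset -> true
    assert!(stream_disable_compression(Some("1")));    // any other value -> true
    assert!(!stream_disable_compression(Some("off"))); // explicit off -> false
    println!("ok");
}
```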
Registry with custom middleware and interceptors, a sketch (`create_provider_registry`, `chain_default_and_clamp`, and `LoggingInterceptor` are the documented names; their import paths and the argument shapes are assumptions):

```rust
use siumai::prelude::*;
use siumai::middleware::chain_default_and_clamp; // assumed path
use siumai::interceptors::LoggingInterceptor;    // assumed path
use std::collections::HashMap;
use std::sync::Arc;

let reg = create_provider_registry(
    HashMap::new(),                                // per-provider overrides (placeholder)
    chain_default_and_clamp(),                     // middleware chain
    vec![Arc::new(LoggingInterceptor::default())], // interceptors
);
```
## Extras (siumai-extras)

- Telemetry subscribers and helpers
- OpenTelemetry middleware (W3C Trace Context)
- JSON schema validation
- Server adapters (Axum SSE)
- MCP utilities

See the `siumai-extras` crate for details and examples.
## Examples

Examples are under `siumai/examples/`:

- `01-quickstart` — basic chat, streaming, provider switching
- `02-core-api` — chat, streaming, tools, multimodal
- `03-advanced-features` — middleware, retry, orchestrator, error types
- `04-provider-specific` — provider-unique capabilities
- `05-integrations` — registry, MCP, telemetry
- `06-applications` — chatbot, code assistant, API server

Typical commands follow the usual Cargo pattern, e.g. `cargo run --example <name> --features openai` (exact example names and required features vary; see the directories above).
## Status and notes

- OpenAI Responses API `web_search` is wired through `hosted_tools::openai::web_search` and the OpenAI Responses pipeline, but it is still considered experimental and may change.
- Several modules were reorganized in 0.11: HTTP helpers live under `execution::http::*`, Vertex helpers under `auth::vertex`. See the CHANGELOG for migration notes.
API keys and environment variables:

- OpenAI: `.api_key(..)` or `OPENAI_API_KEY`
- Anthropic: `.api_key(..)` or `ANTHROPIC_API_KEY`
- Groq: `.api_key(..)` or `GROQ_API_KEY`
- Gemini: `.api_key(..)` or `GEMINI_API_KEY`
- Bedrock: prefer `BedrockConfig::with_region(...)` + caller-supplied SigV4 headers in `HttpConfig.headers`; `BEDROCK_API_KEY` is available only for Bearer/proxy compatibility
- xAI: `.api_key(..)` or `XAI_API_KEY`
- Ollama: no API key
- OpenAI-compatible via registry: reads `{PROVIDER_ID}_API_KEY` (e.g., `DEEPSEEK_API_KEY`)
- OpenAI-compatible via builder: `.api_key(..)` or `{PROVIDER_ID}_API_KEY`

For Bedrock-specific guidance, see `siumai/examples/04-provider-specific/bedrock/README.md`.

Compatibility note: OpenAI-compatible builder entry points remain available, but they belong to the builder compatibility surface and are not the recommended default for new code.
## Acknowledgements

This project draws inspiration from:

- Vercel AI SDK (adapter patterns)
- Cherry Studio (transformer design)
## Changelog and license

See CHANGELOG.md for detailed changes and migration tips.

Licensed under either of:

- Apache License, Version 2.0, or
- MIT license

at your option.