chat-rs
A multi-provider LLM framework for Rust. Build type-safe chat clients with tool calling, structured output, streaming, and embeddings — swap providers with a single line change.
Features
- Multi-provider: Gemini, Claude, OpenAI, DeepSeek, Ollama, Hugging Face, Cerebras, mistral.rs (local), generic OpenAI-compatible servers, generic Responses API servers, and a Router are supported today, with more planned (see Roadmap)
- Router: route requests across multiple providers with fallback and custom strategies (keyword, embedding, capability-based)
- Type-safe builder: compile-time enforcement of valid configurations via the type-state pattern
- Tool calling: define tools with #[tool] in Rust, or load @tool-decorated Python scripts at runtime; the framework handles the call loop automatically
- Structured output: deserialize model responses directly into your Rust types via schemars
- Streaming: real-time token-by-token output with tool call support
- Human in the loop: pause mid-turn on sensitive tool calls, let a human approve or reject, then resume the stream
- Embeddings: generate vector embeddings through the same unified API
- Retry & callbacks: configurable retry strategies with before/after hooks
- Native tools: provider-specific features such as Google Search, code execution, and web search
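The type-state pattern named in the feature list can be illustrated with a small standalone sketch. Everything below (ClientBuilder, NoModel, HasModel, Client) is hypothetical and is not the chat-rs API; it only shows how a builder can make `build` unavailable until a required setting has been supplied.

```rust
use std::marker::PhantomData;

/// Marker state: no model has been chosen yet.
pub struct NoModel;
/// Marker state: a model has been chosen.
pub struct HasModel;

/// Hypothetical builder used for illustration only (not a chat-rs type).
pub struct ClientBuilder<State> {
    model: Option<String>,
    api_key: Option<String>,
    _state: PhantomData<State>,
}

/// Hypothetical finished client.
#[derive(Debug, PartialEq)]
pub struct Client {
    pub model: String,
    pub api_key: Option<String>,
}

impl ClientBuilder<NoModel> {
    pub fn new() -> Self {
        ClientBuilder { model: None, api_key: None, _state: PhantomData }
    }

    /// Choosing a model moves the builder into the `HasModel` state.
    pub fn with_model(self, model: &str) -> ClientBuilder<HasModel> {
        ClientBuilder {
            model: Some(model.to_string()),
            api_key: self.api_key,
            _state: PhantomData,
        }
    }
}

impl<State> ClientBuilder<State> {
    /// Optional settings are available in every state.
    pub fn with_api_key(mut self, key: &str) -> Self {
        self.api_key = Some(key.to_string());
        self
    }
}

impl ClientBuilder<HasModel> {
    /// `build` exists only in the `HasModel` state, so calling it
    /// before `with_model` does not compile.
    pub fn build(self) -> Client {
        Client {
            model: self.model.expect("set by with_model"),
            api_key: self.api_key,
        }
    }
}
```

With this sketch, `ClientBuilder::new().with_model("m").build()` compiles, while `ClientBuilder::new().build()` is rejected by the compiler because `build` is not defined for `ClientBuilder<NoModel>`.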
Quick Start
Add to your Cargo.toml:
[dependencies]
chat-rs = { version = "0.3.1", features = ["openai"] }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
use ;
async
Set your API key via environment variable (OPENAI_API_KEY, GEMINI_API_KEY, or CLAUDE_API_KEY), or pass it explicitly with .with_api_key().
Providers
Enable providers via feature flags:
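# Pick one or more
chat-rs = { version = "0.3.1", features = ["gemini"] }
chat-rs = { version = "0.3.1", features = ["claude"] }
chat-rs = { version = "0.3.1", features = ["openai"] }
chat-rs = { version = "0.3.1", features = ["ollama"] }
chat-rs = { version = "0.3.1", features = ["huggingface"] }
chat-rs = { version = "0.3.1", features = ["cerebras"] }
chat-rs = { version = "0.3.1", features = ["completions"] }
chat-rs = { version = "0.3.1", features = ["router", "gemini", "claude"] }
chat-rs = { version = "0.3.1", features = ["gemini", "claude", "openai", "stream"] }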
| Provider | Feature | API Key Env Var | Builder |
|---|---|---|---|
| Google Gemini | gemini | GEMINI_API_KEY | GeminiBuilder |
| Anthropic Claude | claude | CLAUDE_API_KEY | ClaudeBuilder |
| OpenAI | openai | OPENAI_API_KEY | OpenAIBuilder |
| DeepSeek | deepseek | DEEPSEEK_API_KEY | DeepSeekBuilder |
| Ollama (local) | ollama | none (optional) | OllamaBuilder |
| Hugging Face Router | huggingface | HF_TOKEN | HuggingFaceBuilder |
| Cerebras | cerebras | CEREBRAS_API_KEY | CerebrasBuilder |
| mistral.rs (local in-process) | mistralrs | none | MistralRsBuilder |
| Generic Chat Completions | completions | depends on server | ChatCompletionsBuilder |
| Generic Responses API | responses | depends on server | ResponsesBuilder |
| Router | router | none | RouterBuilder |
The ollama, huggingface, cerebras, deepseek, and completions providers all share the same Chat Completions wire spec, factored into the chat-completions crate. The openai provider is a thin wrapper over chat-responses (the Responses API wire crate). Bring-your-own server: use ChatCompletionsBuilder for /v1/chat/completions servers (vLLM, llama.cpp, LiteLLM, etc.) or ResponsesBuilder for /responses servers.
For fully local in-process inference (no HTTP, no daemon), use the mistralrs provider — weights load into your process via mistral.rs.
Swapping providers is a one-line change — replace the builder, everything else stays the same:
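// Gemini
let client = GeminiBuilder::new()
    .with_model(/* model name */)
    .build();
// Claude
let client = ClaudeBuilder::new()
    .with_model(/* model name */)
    .build();
// OpenAI
let client = OpenAIBuilder::new()
    .with_model(/* model name */)
    .build();
// Ollama (local): pulls the model if missing, then builds
let client = OllamaBuilder::new()
    .with_model(/* model name */)
    .pull().await?
    .build();
// Hugging Face Inference Providers
let client = HuggingFaceBuilder::new()
    .with_model(/* model name */)
    .build();
// Cerebras
let client = CerebrasBuilder::new()
    .with_model(/* model name */)
    .build();
// DeepSeek
let client = DeepSeekBuilder::new()
    .with_model(/* model name */)
    .build();
// mistral.rs (local, in-process, no HTTP)
let client = MistralRsBuilder::new()
    .with_model(/* model name */)
    .with_gguf_file(/* GGUF file */)
    .build().await?;
// Bring-your-own Chat Completions server (vLLM, llama.cpp, LiteLLM, ...)
let client = ChatCompletionsBuilder::new()
    .with_base_url(/* server URL */)
    .with_model(/* model name */)
    .with_api_key(/* API key */)
    .build();
// Bring-your-own Responses API server
let client = ResponsesBuilder::new()
    .with_base_url(/* server URL */)
    .with_model(/* model name */)
    .with_api_key(/* API key */)
    .build();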
// Same from here on
let mut chat = new.with_model.build;
Tool Calling
Define tools with the #[tool] macro from tools-rs and register them with collect_tools(). The framework automatically loops through tool calls until the model is done.
use ;
use ;
/// Looks up the current weather for a given city.
async
async
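To make the automatic call loop concrete, here is a minimal standalone sketch of the idea: ask the model, run whichever tool it requests, feed the result back, and repeat until it returns a final answer. The types and functions below (ModelReply, Tool, run_tool_loop, mock_model) are assumptions for illustration and do not reflect the chat-rs or tools-rs internals; the model is a mock and the loop is synchronous for brevity.

```rust
use std::collections::HashMap;

/// What the (mock) model can return on each step.
pub enum ModelReply {
    ToolCall { name: String, argument: String },
    Final(String),
}

/// A tool is modeled as a plain function from a string argument to a string result.
pub type Tool = fn(&str) -> String;

/// Asks the model, runs any requested tool, appends the result to the
/// transcript, and stops at the first final answer or after `max_steps` rounds.
pub fn run_tool_loop<M>(
    mut model: M,
    tools: &HashMap<String, Tool>,
    prompt: &str,
    max_steps: usize,
) -> Result<String, String>
where
    M: FnMut(&[String]) -> ModelReply,
{
    let mut transcript = vec![format!("user: {prompt}")];
    for _ in 0..max_steps {
        match model(&transcript) {
            ModelReply::Final(text) => return Ok(text),
            ModelReply::ToolCall { name, argument } => {
                let tool = *tools
                    .get(&name)
                    .ok_or_else(|| format!("unknown tool: {name}"))?;
                let result = tool(argument.as_str());
                transcript.push(format!("tool {name}: {result}"));
            }
        }
    }
    Err("step limit reached".to_string())
}

/// Example tool with a canned answer.
pub fn weather(city: &str) -> String {
    format!("sunny in {city}")
}

/// Mock model: requests the weather tool once, then answers from its result.
pub fn mock_model(transcript: &[String]) -> ModelReply {
    match transcript.last().and_then(|line| line.strip_prefix("tool weather: ")) {
        Some(report) => ModelReply::Final(format!("It is {report}.")),
        None => ModelReply::ToolCall {
            name: "weather".to_string(),
            argument: "Paris".to_string(),
        },
    }
}
```

The `max_steps` bound is a design choice of this sketch: it guarantees termination even if a model keeps requesting tools.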
Python Tools
Load tools from Python scripts at runtime via the python feature (powered by tools-rs 0.3 + PyO3). Decorate functions with @tool() and point ToolsBuilder at a directory of .py files — they register alongside any native #[tool]s.
chat-rs = { version = "0.3.1", features = ["gemini", "python"] }
# scripts/weather.py
"""Get the current weather in a city.
Args:
city: The city to look up.
"""
return
use ;
let tools = new
.with_language
.from_path
.collect?;
let mut chat = new
.with_tools
.with_model
.build;
PyO3 builds against the system Python; if your interpreter is newer than PyO3's max supported version, set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 when building.
Structured Output
Deserialize model responses directly into typed Rust structs. Your type must derive JsonSchema and Deserialize.
use JsonSchema;
use Deserialize;
let mut chat = new
.
.with_model
.build;
let response = chat.complete.await?;
println!;
Streaming
Enable the stream feature flag:
chat-rs = { version = "0.3.1", features = ["gemini", "stream"] }
use StreamEvent;
use StreamExt;
let mut chat = new
.with_model
.build;
let mut stream = chat.stream.await?;
while let Some = stream.next.await
Human in the Loop
Mark tools that need human approval via #[tool] metadata and supply a strategy closure. When the model calls such a tool, chat.stream() yields StreamEvent::Paused(PauseReason) and terminates. Resolve the pending tools on messages (approve or reject), then call stream() again — the core loop picks up where it left off.
use ;
use ;
use Deserialize;
/// Sends an email.
async
let tools: = collect_tools?;
let scoped = new;
let mut chat = new
.with_model
.with_scoped_tools
.build;
let mut stream = chat.stream.await?;
while let Some = stream.next.await
// Call chat.stream(&mut messages) again to resume the same turn.
See examples/claude/hitl.rs, examples/openai/hitl.rs, and examples/gemini/hitl.rs for full interactive REPLs.
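The pause-and-resume flow described in this section can be sketched as a small state machine. The names below (Decision, PendingCall, Event, run_turn, requires_human) are hypothetical stand-ins, not the chat-rs types; the sketch is synchronous and uses a queue of tool calls in place of a real message history.

```rust
/// Approval state of a queued tool call.
#[derive(Debug, Clone, PartialEq)]
pub enum Decision {
    Pending,
    Approved,
    Rejected,
}

/// A tool call the model has requested and that has not run yet.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingCall {
    pub tool: String,
    pub decision: Decision,
}

/// Events produced by one pass over the queue.
#[derive(Debug, PartialEq)]
pub enum Event {
    Paused { tool: String },
    ToolResult(String),
    Done,
}

/// Example strategy: only `send_email` needs a human decision.
pub fn requires_human(tool: &str) -> bool {
    tool == "send_email"
}

/// Processes queued calls in order. An undecided call that needs approval
/// stays at the front of the queue and the pass ends with `Paused`; calling
/// `run_turn` again after a decision was recorded resumes from that call.
pub fn run_turn(
    calls: &mut Vec<PendingCall>,
    needs_approval: impl Fn(&str) -> bool,
) -> Vec<Event> {
    let mut events = Vec::new();
    while let Some(call) = calls.first().cloned() {
        match call.decision {
            Decision::Pending if needs_approval(&call.tool) => {
                events.push(Event::Paused { tool: call.tool });
                return events;
            }
            Decision::Rejected => {
                events.push(Event::ToolResult(format!("{}: rejected", call.tool)));
            }
            Decision::Pending | Decision::Approved => {
                events.push(Event::ToolResult(format!("{}: executed", call.tool)));
            }
        }
        calls.remove(0);
    }
    events.push(Event::Done);
    events
}
```

The key property is that the paused call is left in the queue, so the second pass needs no extra bookkeeping to know where to continue.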
Embeddings
let client = new
.with_model
.with_embeddings
.build;
let mut chat = new
.with_model
.with_embeddings
.build;
let response = chat.embed.await?;
println!;
Native Tools
Provider-specific capabilities beyond standard tool calling:
// Gemini: Google Search, Code Execution, Google Maps
let client = new
.with_model
.with_google_search
.with_code_execution
.build;
// OpenAI: Web Search
let client = new
.with_model
.with_web_search
.build;
// OpenAI: Image Generation
let client = new
.with_model
.with_image_generation
.build;
OpenAI-Compatible Endpoints
For any server speaking the OpenAI Chat Completions wire spec (vLLM, llama.cpp's llama-server, LiteLLM, etc.), use ChatCompletionsBuilder directly:
use ChatCompletionsBuilder;
let client = new
.with_base_url
.with_model
.with_api_key // optional — omit for servers that don't require auth
.build;
Dedicated wrappers preset URL/env-var/auth for popular targets:
- Ollama: OllamaBuilder defaults to http://localhost:11434/v1, honors OLLAMA_HOST, and supports .pull() to fetch a model via Ollama's native API.
- Hugging Face Router: HuggingFaceBuilder defaults to https://router.huggingface.co/v1 and reads HF_TOKEN.
- Cerebras: CerebrasBuilder defaults to https://api.cerebras.ai/v1 and reads CEREBRAS_API_KEY.
- DeepSeek: DeepSeekBuilder defaults to https://api.deepseek.com/v1 and reads DEEPSEEK_API_KEY.
For endpoints implementing the OpenAI Responses API (POST /responses, a different wire format from Chat Completions), use ResponsesBuilder from the chat-responses crate, or OpenAIBuilder::with_custom_url() if you want to keep the OpenAI-specific defaults and native tools.
Router
Route requests across multiple providers with automatic fallback on retryable errors. Add a custom RoutingStrategy to control provider selection based on keywords, embeddings, capabilities, or any logic you need.
use ;
let gemini = new
.with_model
.build;
let claude = new
.with_model
.build;
let router = new
.add_provider
.add_provider
// .with_strategy(my_strategy) // optional custom routing
// .circuit_breaker(CircuitBreakerConfig::default()) // optional circuit breaker
.build;
let mut chat = new.with_model.build;
let mut msgs = from_user;
let res = chat.complete.await?;
Without a custom strategy, the router tries providers in order and falls back on retryable errors (rate limits, network issues). Non-retryable errors are returned immediately.
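The default behavior described above (try providers in order, fall back only on retryable errors) can be sketched in a few lines. This is an illustration under assumptions, not the router's implementation: ProviderError, Provider, route, and the three mock providers are hypothetical names, and providers are modeled as plain synchronous functions.

```rust
/// Hypothetical error type that separates retryable from non-retryable failures.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    RateLimited,
    Network(String),
    InvalidRequest(String),
}

impl ProviderError {
    /// Rate limits and network issues are worth retrying on another provider.
    pub fn is_retryable(&self) -> bool {
        matches!(self, ProviderError::RateLimited | ProviderError::Network(_))
    }
}

/// A provider is modeled as a function from a prompt to a reply.
pub type Provider = fn(&str) -> Result<String, ProviderError>;

/// Tries providers in order. A retryable error moves on to the next provider;
/// a non-retryable error is returned immediately.
pub fn route(providers: &[Provider], prompt: &str) -> Result<String, ProviderError> {
    let mut last_retryable = None;
    for &provider in providers {
        match provider(prompt) {
            Ok(reply) => return Ok(reply),
            Err(err) if err.is_retryable() => last_retryable = Some(err),
            Err(err) => return Err(err),
        }
    }
    Err(last_retryable
        .unwrap_or_else(|| ProviderError::InvalidRequest("no providers configured".to_string())))
}

/// Mock provider that is always rate limited.
pub fn rate_limited(_prompt: &str) -> Result<String, ProviderError> {
    Err(ProviderError::RateLimited)
}

/// Mock provider that rejects every request as malformed.
pub fn invalid(_prompt: &str) -> Result<String, ProviderError> {
    Err(ProviderError::InvalidRequest("bad request".to_string()))
}

/// Mock provider that echoes the prompt.
pub fn echo(prompt: &str) -> Result<String, ProviderError> {
    Ok(format!("echo: {prompt}"))
}
```

With the chain `[rate_limited, echo]` the request falls through to `echo`; with `[invalid, echo]` the error from `invalid` is returned and `echo` is never called.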
Enable the optional circuit breaker to automatically skip providers that have failed repeatedly, and probe them again after a configurable recovery timeout:
use CircuitBreakerConfig;
let router = new
.add_provider
.add_provider
.circuit_breaker
.build;
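As a rough picture of what a circuit breaker with a recovery timeout does, consider the standalone sketch below. The struct, its fields, and its method names are assumptions made for illustration; they are not the fields of CircuitBreakerConfig, whose actual options are not shown in this README. Time is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-provider breaker (not the chat-rs type).
pub struct CircuitBreaker {
    failure_threshold: u32,
    recovery_timeout: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, recovery_timeout: Duration) -> Self {
        CircuitBreaker {
            failure_threshold,
            recovery_timeout,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Returns true if a request may be sent at `now`: either the breaker is
    /// closed, or it has been open long enough to allow a probe.
    pub fn allows(&self, now: Instant) -> bool {
        match self.opened_at {
            None => true,
            Some(opened) => now.duration_since(opened) >= self.recovery_timeout,
        }
    }

    /// A success closes the breaker and clears the failure count.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    /// A failure at or past the threshold opens (or re-opens) the breaker,
    /// restarting the recovery timeout from `now`.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

A router built on this idea would call `allows` before each attempt and skip providers whose breaker is open, then report the outcome through `record_success` or `record_failure`.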
Streaming is also supported via StreamRouterBuilder — enable the stream feature flag and use providers that implement ChatProvider.
Transport Layer
Providers are generic over a pluggable Transport trait. The default transport is ReqwestTransport (HTTP via reqwest) — it's used automatically when you call .build() on any builder.
To share an HTTP client across providers:
use ;
let http = from;
let client = new
.with_model
.with_transport // Clone shares the connection pool
.build;
To use a WebSocket transport (e.g. for OpenAI's Responses API over WS):
chat-rs = { version = "0.3.1", features = ["openai", "stream", "tokio-tungstenite"] }
use ;
let ws = new
.with_message_type; // OpenAI WS envelope
let client = new
.with_model
.with_transport
.build;
Two WebSocket transports are available, feature-gated:
| Transport | Feature | Crate | Notes |
|---|---|---|---|
| AsyncWsTransport | tokio-tungstenite | tokio-tungstenite | Fully async, recommended with tokio |
| WsTransport | tungstenite | tungstenite | Sync WS bridged via spawn_blocking |
To use a fully custom transport (tower, hyper, WASM, etc.):
use Transport;
let client = new
.with_model
.with_transport
.build;
Transport implementations live in core/src/transport/impls/. See core/AGENTS.md for the Transport trait definition.
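The general shape of a provider that is generic over its transport can be sketched as follows. This is a simplified, synchronous illustration: the trait below is not the Transport trait from core (see core/AGENTS.md for the real definition), and Provider, MockTransport, the `/responses` path usage, and the request format are all hypothetical.

```rust
use std::cell::RefCell;

/// Simplified stand-in for a transport: sends a body to a path and
/// returns the raw response text.
pub trait Transport {
    fn send(&self, path: &str, body: &str) -> Result<String, String>;
}

/// A provider that works with any transport implementation.
pub struct Provider<T: Transport> {
    pub transport: T,
    pub model: String,
}

impl<T: Transport> Provider<T> {
    pub fn new(transport: T, model: &str) -> Self {
        Provider { transport, model: model.to_string() }
    }

    /// Builds a placeholder request body and hands it to the transport.
    pub fn complete(&self, prompt: &str) -> Result<String, String> {
        let body = format!("{}: {}", self.model, prompt);
        self.transport.send("/responses", &body)
    }
}

/// In-memory transport for tests: records each request and returns a canned reply.
pub struct MockTransport {
    pub sent: RefCell<Vec<String>>,
    pub reply: String,
}

impl Transport for MockTransport {
    fn send(&self, path: &str, body: &str) -> Result<String, String> {
        self.sent.borrow_mut().push(format!("{path} {body}"));
        Ok(self.reply.clone())
    }
}
```

Because the provider only depends on the trait, swapping HTTP for WebSocket or an in-memory mock does not change any provider code, which is the property the section above relies on.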
Architecture
chat-rs (root) ← Re-exports + feature flags
├── core/ ← Traits, types, Chat engine, builder, Transport trait + impls
├── providers/
│ ├── completions/ ← Generic OpenAI Chat Completions wire (`ChatCompletionsBuilder`)
│ ├── responses/ ← Generic OpenAI Responses API wire (`ResponsesBuilder`)
│ ├── gemini/ ← Google Gemini provider
│ ├── claude/ ← Anthropic Claude provider
│ ├── openai/ ← OpenAI (thin wrapper over `chat-responses` + embeddings + native tools)
│ ├── ollama/ ← Ollama wrapper (local daemon, pull/ping)
│ ├── huggingface/ ← Hugging Face Inference Providers (Router)
│ ├── cerebras/ ← Cerebras Inference
│ ├── deepseek/ ← DeepSeek
│ ├── mistralrs/ ← Local in-process inference (mistral.rs)
│ └── router/ ← Multi-provider router
└── examples/
├── completions/ ← Generic OAI-compat examples
├── gemini/ ← Gemini examples
├── claude/ ← Claude examples
├── openai/ ← OpenAI examples
├── ollama/ ← Ollama examples
├── huggingface/ ← Hugging Face examples
├── cerebras/ ← Cerebras examples
├── deepseek/ ← DeepSeek examples
├── mistralrs/ ← mistral.rs (local) examples
└── router/ ← Router strategy examples
See core/AGENTS.md and providers/AGENTS.md for detailed architecture documentation.
Examples
Run examples with the appropriate feature flags:
# Gemini
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
# Claude
# OpenAI
# Router
# Ollama (local)
# Hugging Face
# Cerebras
# DeepSeek
# mistral.rs (local in-process)
# Generic OpenAI-compatible server
# Retry strategies
Minimum Supported Rust Version
Rust 1.94 or later (edition 2024).
License
MIT