# saorsa-ai
A unified, multi-provider LLM API for Rust with streaming, tool calling, and model metadata.
## Overview
saorsa-ai provides a single, vendor-agnostic API for interacting with large language models from multiple providers:
- Anthropic (Claude) - Messages API with native tool use
- OpenAI - Chat Completions API with function calling
- Google Gemini - GenerateContent API with function calling
- Ollama - Local inference via NDJSON chat API
- OpenAI-compatible - Azure OpenAI, Groq, Mistral, OpenRouter, xAI, Cerebras, and any other OpenAI-compatible endpoint
All providers share the same request/response types, streaming events, and tool calling interface. Switch providers by changing a config value - no code changes required.
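This works because every backend sits behind one shared trait. A stripped-down, synchronous illustration of the pattern (not the crate's actual `Provider` trait, which is async and uses the richer request/response types below):

```rust
// Simplified stand-in for the crate's Provider trait.
trait Provider {
    fn name(&self) -> &'static str;
    fn complete(&self, prompt: &str) -> String;
}

struct Anthropic;
struct Ollama;

impl Provider for Anthropic {
    fn name(&self) -> &'static str { "anthropic" }
    fn complete(&self, prompt: &str) -> String { format!("[claude] {prompt}") }
}

impl Provider for Ollama {
    fn name(&self) -> &'static str { "ollama" }
    fn complete(&self, prompt: &str) -> String { format!("[llama3] {prompt}") }
}

/// Pick a backend from a config value; callers only ever see `dyn Provider`.
fn from_config(kind: &str) -> Box<dyn Provider> {
    match kind {
        "ollama" => Box::new(Ollama),
        _ => Box::new(Anthropic),
    }
}

fn main() {
    let provider = from_config("ollama");
    println!("{}: {}", provider.name(), provider.complete("hi"));
}
```

The real trait adds async, streaming, and structured content blocks, but the dispatch shape is the same: a config value selects the backend, and calling code never changes.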
## Quick Start
Add saorsa-ai to your Cargo.toml:
```toml
[dependencies]
saorsa-ai = "0.1"
tokio = { version = "1", features = ["full"] }
```
### Non-Streaming Completion

A minimal request/response round trip. Type names follow the Core Types Reference below; the exact constructor signatures are illustrative.

```rust
use saorsa_ai::{CompletionRequest, Message, ProviderConfig, ProviderKind, ProviderRegistry};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Any ProviderKind works here; switching vendors is a config change.
    let config = ProviderConfig::new(ProviderKind::Anthropic, "claude-sonnet-4");
    let provider = ProviderRegistry::create(&config)?;

    let request = CompletionRequest::new(vec![Message::user("Hello!")]);
    let response = provider.complete(request).await?;
    println!("{:?}", response.content);
    Ok(())
}
```
### Streaming Completion

The same request, consumed as a stream of `StreamEvent` values (the event handling shown is illustrative):

```rust
use saorsa_ai::{CompletionRequest, Message, ProviderConfig, ProviderKind, ProviderRegistry, StreamEvent};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ProviderConfig::new(ProviderKind::OpenAi, "gpt-4o");
    let provider = ProviderRegistry::create(&config)?;

    let request = CompletionRequest::new(vec![Message::user("Tell me a story.")]);
    let mut rx = provider.stream(request).await?;

    while let Some(event) = rx.recv().await {
        match event {
            StreamEvent::ContentDelta(delta) => print!("{delta:?}"),
            StreamEvent::MessageStop => break,
            _ => {}
        }
    }
    Ok(())
}
```
### In-Process Local Inference (mistralrs / GGUF)
If you want to run fully in-process (single binary) without an external HTTP server, saorsa-ai
provides an optional mistralrs-backed provider behind a feature flag.
Add the feature and the mistralrs dependency:
```toml
[dependencies]
saorsa-ai = { version = "0.1", features = ["mistralrs"] }
mistralrs = "0.7"
tokio = { version = "1", features = ["full"] }
```
Default download/cache location for model files (Hugging Face hub cache):
- `$HF_HOME/hub` if `HF_HOME` is set
- otherwise `~/.cache/huggingface/hub`
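A small helper mirroring that lookup order (illustrative; `hf_hub_cache` is not a crate API):

```rust
use std::{env, path::PathBuf};

/// Resolve the Hugging Face hub cache directory:
/// `$HF_HOME/hub` if `HF_HOME` is set, else `~/.cache/huggingface/hub`.
fn hf_hub_cache() -> PathBuf {
    match env::var_os("HF_HOME") {
        Some(hf_home) => PathBuf::from(hf_home).join("hub"),
        None => {
            let home = env::var_os("HOME").unwrap_or_default();
            PathBuf::from(home).join(".cache").join("huggingface").join("hub")
        }
    }
}

fn main() {
    println!("{}", hf_hub_cache().display());
}
```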
A sketch of in-process use (the provider constructor shown is hypothetical; check the crate docs for the exact mistralrs-backed API):

```rust
use std::sync::Arc;
use saorsa_ai::{CompletionRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical constructor: load a GGUF model fully in-process.
    let provider = Arc::new(saorsa_ai::MistralRsProvider::from_gguf("path/to/model.gguf").await?);

    let request = CompletionRequest::new(vec![Message::user("Hello!")]);
    let response = provider.complete(request).await?;
    println!("{:?}", response.content);
    Ok(())
}
```
## Provider Catalog

### Anthropic (Claude)
| Detail | Value |
|---|---|
| Endpoint | https://api.anthropic.com/v1/messages |
| Auth | x-api-key header |
| Streaming | Server-Sent Events (SSE) |
| API version | 2023-06-01 |
Models:
| Model | Context | Tools | Vision |
|---|---|---|---|
| `claude-opus-4` | 200k | Yes | Yes |
| `claude-sonnet-4` | 200k | Yes | Yes |
| `claude-haiku-4` | 200k | Yes | Yes |
| `claude-3-5-sonnet` | 200k | Yes | Yes |
| `claude-3-5-haiku` | 200k | Yes | Yes |
| `claude-3-opus` | 200k | Yes | Yes |
```rust
// Constructor arguments are illustrative.
let config = ProviderConfig::new(ProviderKind::Anthropic, "claude-sonnet-4");
```
### OpenAI
| Detail | Value |
|---|---|
| Endpoint | https://api.openai.com/v1/chat/completions |
| Auth | Authorization: Bearer |
| Streaming | Server-Sent Events (SSE) |
Models:
| Model | Context | Tools | Vision |
|---|---|---|---|
| `gpt-4o` | 128k | Yes | Yes |
| `gpt-4o-mini` | 128k | Yes | Yes |
| `gpt-4-turbo` | 128k | Yes | Yes |
| `o1` | 200k | Yes | Yes |
| `o3-mini` | 200k | Yes | No |
```rust
// Constructor arguments are illustrative.
let config = ProviderConfig::new(ProviderKind::OpenAi, "gpt-4o");
```
### Google Gemini
| Detail | Value |
|---|---|
| Endpoint | https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |
| Auth | x-goog-api-key header |
| Streaming | SSE via streamGenerateContent?alt=sse |
Models:
| Model | Context | Tools | Vision |
|---|---|---|---|
| `gemini-2.0-flash` | 1M | Yes | Yes |
| `gemini-1.5-pro` | 2M | Yes | Yes |
| `gemini-1.5-flash` | 1M | Yes | Yes |
```rust
// Constructor arguments are illustrative.
let config = ProviderConfig::new(ProviderKind::Gemini, "gemini-2.0-flash");
```
### Ollama (Local)
| Detail | Value |
|---|---|
| Endpoint | http://localhost:11434/api/chat |
| Auth | Optional Bearer token |
| Streaming | Newline-delimited JSON (NDJSON) |
Models:
| Model | Context | Tools | Vision |
|---|---|---|---|
| `llama3` | 8k | No | No |
| `llama3.1` | 131k | Yes | No |
| `codellama` | 16k | No | No |
| `mistral` | 32k | Yes | No |
| `mixtral` | 32k | Yes | No |
| `llava` | 4k | No | Yes |
```rust
// Constructor arguments are illustrative; `with_base_url` overrides the default host.
let config = ProviderConfig::new(ProviderKind::Ollama, "llama3.1")
    .with_base_url("http://localhost:11434");
```
### OpenAI-Compatible Providers
Use this provider for any service that implements the OpenAI API format. Factory functions are provided for popular services:
```rust
use saorsa_ai::openai_compat;

// Factory arguments elided; each takes service-specific settings
// (endpoint, API key, model) - see the crate docs for exact signatures.

// Azure OpenAI
let provider = openai_compat::azure_openai(/* ... */)?;
// Groq
let provider = openai_compat::groq(/* ... */)?;
// Mistral
let provider = openai_compat::mistral(/* ... */)?;
// OpenRouter
let provider = openai_compat::openrouter(/* ... */)?;
// xAI (Grok)
let provider = openai_compat::xai(/* ... */)?;
// Cerebras
let provider = openai_compat::cerebras(/* ... */)?;
```
For custom endpoints, use the builder:
```rust
use saorsa_ai::OpenAiCompatProvider;

// Builder methods and values are partly illustrative.
let provider = OpenAiCompatProvider::builder()
    .base_url("https://llm.example.com")
    .url_path("/v1/chat/completions")   // Custom API path
    .auth_header("X-Api-Key")           // Custom auth header
    .extra_header("X-Org-Id", "my-org") // Extra static header
    .build()?;
```
## Streaming

All providers return a unified stream of `StreamEvent` values via a tokio `mpsc::Receiver`:

```rust
// Event variant handling is illustrative; see StreamEvent in the types table.
let mut rx = provider.stream(request).await?;

while let Some(event) = rx.recv().await {
    match event {
        StreamEvent::ContentDelta(delta) => print!("{delta:?}"),
        StreamEvent::MessageStop => break,
        _ => {}
    }
}
```
Each provider translates its native streaming format (SSE or NDJSON) into the same event sequence. A background tokio task handles the parsing.
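The NDJSON side reduces to line framing: a record can be split across network chunks, so the parser buffers input until a newline completes it. A std-only sketch of that framing (not the crate's actual parser):

```rust
/// Accumulates raw stream chunks and yields complete NDJSON lines,
/// handling records split across chunk boundaries.
struct LineFramer {
    buf: String,
}

impl LineFramer {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    /// Feed one network chunk; return every line it completed.
    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.find('\n') {
            // '\n' is ASCII, so pos is always a char boundary.
            let line: String = self.buf.drain(..=pos).collect();
            let line = line.trim_end().to_string();
            if !line.is_empty() {
                lines.push(line);
            }
        }
        lines
    }
}

fn main() {
    let mut framer = LineFramer::new();
    // A record split across two chunks is only emitted once complete.
    let first = framer.push("{\"message\":{\"content\":\"Hel");
    assert!(first.is_empty());
    let lines = framer.push("lo\"}}\n{\"done\":true}\n");
    println!("{lines:?}");
}
```

Each complete line would then be parsed as JSON and mapped onto the unified `StreamEvent` enum; SSE streams get the analogous treatment with `data:` frames instead of bare lines.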
## Tool Calling
Define tools using JSON Schema and handle tool use/result cycles:
```rust
use saorsa_ai::{CompletionRequest, Message, StopReason, ToolDefinition};
use serde_json::json;

// Constructor signatures are illustrative.

// 1. Define a tool
let tool = ToolDefinition::new(
    "get_weather",
    "Get the current weather for a city",
    json!({
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
    }),
);

// 2. Send request with tools
let request = CompletionRequest::new(vec![Message::user("What's the weather in Paris?")])
    .tools(vec![tool]);
let response = provider.complete(request).await?;

// 3. Handle tool use
if response.stop_reason == Some(StopReason::ToolUse) {
    // Run the requested tool, then continue the conversation
    // with a ToolResult content block.
}
```
Tool calling works identically across all providers - saorsa-ai handles the format translation between Anthropic's native tool blocks, OpenAI's function calling, Gemini's function declarations, and Ollama's format.
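On the host side, handling a tool call is ultimately dispatch on the tool name. A minimal sketch (the tool name and string-typed input are hypothetical; real input is JSON matching the tool's schema):

```rust
/// Hypothetical host-side dispatch for model-requested tool calls.
fn dispatch_tool(name: &str, input: &str) -> Result<String, String> {
    match name {
        "get_weather" => Ok(format!("22°C and sunny in {input}")),
        other => Err(format!("unknown tool: {other}")),
    }
}

fn main() {
    // The Ok value would go back to the model as a tool result;
    // the Err value is also worth reporting, so the model can recover.
    match dispatch_tool("get_weather", "Paris") {
        Ok(result) => println!("tool result: {result}"),
        Err(e) => eprintln!("{e}"),
    }
}
```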
## Model Registry
Look up model metadata at runtime:
```rust
use saorsa_ai::models::{
    get_context_window, lookup_model, lookup_model_by_prefix, supports_tools, supports_vision,
};

// Model IDs below are chosen to match the documented return values;
// exact signatures are illustrative.

// Exact match
if let Some(info) = lookup_model("gemini-1.5-pro") {
    println!("context window: {}", info.context_window);
}

// Prefix match (for versioned model IDs)
let info = lookup_model_by_prefix("claude-sonnet-4-20250514"); // Matches "claude-sonnet-4"

// Individual queries
let ctx = get_context_window("gemini-1.5-pro"); // Some(2_000_000)
let tools = supports_tools("llama3");           // Some(false)
let vision = supports_vision("gpt-4o");         // Some(true)
```
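The prefix rule is worth spelling out: a dated ID should resolve to its base entry, and when several entries prefix the query, the longest one should win. A standalone sketch of that lookup (illustrative, not the crate's registry):

```rust
/// A few (model id, context window) entries, mirroring the tables above.
const MODELS: &[(&str, u32)] = &[
    ("claude-sonnet-4", 200_000),
    ("gemini-1.5-pro", 2_000_000),
    ("llama3", 8_000),
    ("llama3.1", 131_000),
];

/// Exact-match lookup.
fn lookup_model(id: &str) -> Option<u32> {
    MODELS.iter().find(|(m, _)| *m == id).map(|&(_, ctx)| ctx)
}

/// Prefix lookup: the longest registered ID that prefixes the query wins,
/// so "llama3.1:latest" resolves to llama3.1, not llama3.
fn lookup_model_by_prefix(id: &str) -> Option<u32> {
    MODELS
        .iter()
        .filter(|(m, _)| id.starts_with(m))
        .max_by_key(|(m, _)| m.len())
        .map(|&(_, ctx)| ctx)
}

fn main() {
    println!("{:?}", lookup_model_by_prefix("claude-sonnet-4-20250514"));
}
```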
## Token Counting
Estimate token usage for context window management:
```rust
use saorsa_ai::tokens::{
    estimate_conversation_tokens, estimate_message_tokens, estimate_tokens, fits_in_context,
};

// Arguments are illustrative.

// Estimate tokens in text (~4 chars per token)
let count = estimate_tokens("How long is this prompt?");

// Estimate message tokens (includes per-message overhead)
let msg_tokens = estimate_message_tokens(&message);

// Estimate full conversation
let total = estimate_conversation_tokens(&messages);

// Check if the conversation fits within the model's context window
let fits = fits_in_context(&messages, "claude-sonnet-4");
```
Token counting is heuristic-based (~4 characters per token for English). For precise counts, use provider-specific tokenizers.
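The heuristic itself is small enough to sketch standalone (an illustration of the ~4-characters-per-token rule, not the crate's implementation; `reserve_for_output` is a hypothetical knob):

```rust
/// Rough estimate: ~4 characters per token for English text, rounded up.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count().div_ceil(4)
}

/// Would an estimated prompt fit, leaving headroom for the reply?
fn fits_in_context(prompt_tokens: usize, context_window: usize, reserve_for_output: usize) -> bool {
    prompt_tokens + reserve_for_output <= context_window
}

fn main() {
    let est = estimate_tokens("Explain the borrow checker in one paragraph.");
    println!("~{est} tokens; fits an 8k window: {}", fits_in_context(est, 8_000, 1_024));
}
```

Counting `chars()` rather than bytes keeps the estimate stable for non-ASCII text, though the 4:1 ratio itself is tuned for English.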
## Error Handling

All operations return `Result<T, SaorsaAiError>`.
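A sketch of what a typed error makes easy, using hypothetical stand-in variants (not the actual `SaorsaAiError` definition):

```rust
/// Hypothetical stand-in for the crate's error enum.
#[derive(Debug)]
enum SaorsaAiError {
    Http(String),
    RateLimited { retry_after_secs: u64 },
    InvalidResponse(String),
}

/// Transient failures are worth retrying; malformed responses are not.
fn should_retry(err: &SaorsaAiError) -> bool {
    matches!(
        err,
        SaorsaAiError::Http(_) | SaorsaAiError::RateLimited { .. }
    )
}

fn main() {
    let err = SaorsaAiError::RateLimited { retry_after_secs: 30 };
    println!("{err:?} retry? {}", should_retry(&err));
}
```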
## Core Types Reference
| Type | Description |
|---|---|
| `Provider` | Trait for non-streaming completions |
| `StreamingProvider` | Trait extending `Provider` with streaming |
| `ProviderConfig` | Configuration for creating a provider |
| `ProviderKind` | Enum of provider types (Anthropic, OpenAi, Gemini, Ollama, OpenAiCompatible) |
| `ProviderRegistry` | Factory for creating providers from config |
| `CompletionRequest` | Builder for completion requests |
| `CompletionResponse` | Parsed completion response |
| `Message` | Conversation message (user, assistant, tool result) |
| `Role` | Message role (User, Assistant) |
| `ContentBlock` | Message content (Text, ToolUse, ToolResult) |
| `ContentDelta` | Streaming delta (TextDelta, InputJsonDelta) |
| `StreamEvent` | Streaming event (message start/stop, content deltas, errors) |
| `StopReason` | Why generation stopped (EndTurn, MaxTokens, StopSequence, ToolUse) |
| `Usage` | Token usage (input_tokens, output_tokens) |
| `ToolDefinition` | Tool schema for function calling |
| `ModelInfo` | Model metadata (context window, capabilities) |
## Dependencies
| Crate | Purpose |
|---|---|
| `reqwest` | HTTP client (rustls-tls) |
| `reqwest-eventsource` | Server-Sent Events parsing |
| `tokio` | Async runtime |
| `futures` | Async stream utilities |
| `async-trait` | Async trait support |
| `serde` / `serde_json` | JSON serialization |
| `tracing` | Structured logging |
| `thiserror` | Error type derivation |
## Minimum Supported Rust Version
The MSRV is 1.88 (Rust Edition 2024). This is enforced in CI.
## License
Licensed under either of:
at your option.
## Contributing
Part of the saorsa-tui workspace. See the workspace root for contribution guidelines.