# saorsa-ai
A unified, multi-provider LLM API for Rust with streaming, tool calling, and model metadata.
## Overview
saorsa-ai provides a single, vendor-agnostic API for interacting with large language models from multiple providers:
- Anthropic (Claude) - Messages API with native tool use
- OpenAI - Chat Completions API with function calling
- Google Gemini - GenerateContent API with function calling
- Ollama - Local inference via NDJSON chat API
- OpenAI-compatible - Azure OpenAI, Groq, Mistral, OpenRouter, xAI, Cerebras, and any other OpenAI-compatible endpoint
All providers share the same request/response types, streaming events, and tool calling interface. Switch providers by changing a config value - no code changes required.
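This works because every backend sits behind one shared trait. A stripped-down, synchronous illustration of the pattern (not the crate's actual `Provider` trait, which is async and uses the richer request/response types below):

```rust
// Simplified stand-in for the crate's Provider trait.
trait Provider {
    fn name(&self) -> &'static str;
    fn complete(&self, prompt: &str) -> String;
}

struct Anthropic;
struct Ollama;

impl Provider for Anthropic {
    fn name(&self) -> &'static str { "anthropic" }
    fn complete(&self, prompt: &str) -> String { format!("[claude] {prompt}") }
}

impl Provider for Ollama {
    fn name(&self) -> &'static str { "ollama" }
    fn complete(&self, prompt: &str) -> String { format!("[llama3] {prompt}") }
}

/// Pick a backend from a config value; callers only ever see `dyn Provider`.
fn from_config(kind: &str) -> Box<dyn Provider> {
    match kind {
        "ollama" => Box::new(Ollama),
        _ => Box::new(Anthropic),
    }
}

fn main() {
    let provider = from_config("ollama");
    println!("{}: {}", provider.name(), provider.complete("hi"));
}
```

The real trait adds async, streaming, and structured content blocks, but the dispatch shape is the same: a config value selects the backend, and calling code never changes.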
## Quick Start
Add saorsa-ai to your Cargo.toml:
```toml
[dependencies]
saorsa-ai = "0.1"
tokio = { version = "1", features = ["full"] }
```
### Non-Streaming Completion

A minimal request/response round trip. Type names follow the Core Types Reference below; the exact constructor signatures are illustrative.

```rust
use saorsa_ai::{CompletionRequest, Message, ProviderConfig, ProviderKind, ProviderRegistry};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Any ProviderKind works here; switching vendors is a config change.
    let config = ProviderConfig::new(ProviderKind::Anthropic, "claude-sonnet-4");
    let provider = ProviderRegistry::create(&config)?;

    let request = CompletionRequest::new(vec![Message::user("Hello!")]);
    let response = provider.complete(request).await?;
    println!("{:?}", response.content);
    Ok(())
}
```
### Streaming Completion

The same request, consumed as a stream of `StreamEvent` values (the event handling shown is illustrative):

```rust
use saorsa_ai::{CompletionRequest, Message, ProviderConfig, ProviderKind, ProviderRegistry, StreamEvent};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ProviderConfig::new(ProviderKind::OpenAi, "gpt-4o");
    let provider = ProviderRegistry::create(&config)?;

    let request = CompletionRequest::new(vec![Message::user("Tell me a story.")]);
    let mut rx = provider.stream(request).await?;

    while let Some(event) = rx.recv().await {
        match event {
            StreamEvent::ContentDelta(delta) => print!("{delta:?}"),
            StreamEvent::MessageStop => break,
            _ => {}
        }
    }
    Ok(())
}
```
### In-Process Local Inference (mistralrs / GGUF)
If you want to run fully in-process (single binary) without an external HTTP server, saorsa-ai
provides an optional mistralrs-backed provider behind a feature flag.
Add the feature and the mistralrs dependency:
```toml
[dependencies]
saorsa-ai = { version = "0.1", features = ["mistralrs"] }
mistralrs = "0.7"
tokio = { version = "1", features = ["full"] }
```
Default download/cache location for model files (Hugging Face hub cache):
- `$HF_HOME/hub` if `HF_HOME` is set
- otherwise `~/.cache/huggingface/hub`
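A small helper mirroring that lookup order (illustrative; `hf_hub_cache` is not a crate API):

```rust
use std::{env, path::PathBuf};

/// Resolve the Hugging Face hub cache directory:
/// `$HF_HOME/hub` if `HF_HOME` is set, else `~/.cache/huggingface/hub`.
fn hf_hub_cache() -> PathBuf {
    match env::var_os("HF_HOME") {
        Some(hf_home) => PathBuf::from(hf_home).join("hub"),
        None => {
            let home = env::var_os("HOME").unwrap_or_default();
            PathBuf::from(home).join(".cache").join("huggingface").join("hub")
        }
    }
}

fn main() {
    println!("{}", hf_hub_cache().display());
}
```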
A sketch of in-process use (the provider constructor shown is hypothetical; check the crate docs for the exact mistralrs-backed API):

```rust
use std::sync::Arc;
use saorsa_ai::{CompletionRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical constructor: load a GGUF model fully in-process.
    let provider = Arc::new(saorsa_ai::MistralRsProvider::from_gguf("path/to/model.gguf").await?);

    let request = CompletionRequest::new(vec![Message::user("Hello!")]);
    let response = provider.complete(request).await?;
    println!("{:?}", response.content);
    Ok(())
}
```
## Provider Catalog

### Anthropic (Claude)
| Detail | Value |
|---|---|
| Endpoint | https://api.anthropic.com/v1/messages |
| Auth | x-api-key header |
| Streaming | Server-Sent Events (SSE) |
| API version | 2023-06-01 |
Models:
| Model | Context | Tools | Vision |
|---|---|---|---|
| `claude-opus-4` | 200k | Yes | Yes |
| `claude-sonnet-4` | 200k | Yes | Yes |
| `claude-haiku-4` | 200k | Yes | Yes |
| `claude-3-5-sonnet` | 200k | Yes | Yes |
| `claude-3-5-haiku` | 200k | Yes | Yes |
| `claude-3-opus` | 200k | Yes | Yes |
```rust
// Constructor arguments are illustrative.
let config = ProviderConfig::new(ProviderKind::Anthropic, "claude-sonnet-4");
```
### OpenAI
| Detail | Value |
|---|---|
| Endpoint | https://api.openai.com/v1/chat/completions |
| Auth | Authorization: Bearer |
| Streaming | Server-Sent Events (SSE) |
Models:
| Model | Context | Tools | Vision |
|---|---|---|---|
| `gpt-4o` | 128k | Yes | Yes |
| `gpt-4o-mini` | 128k | Yes | Yes |
| `gpt-4-turbo` | 128k | Yes | Yes |
| `o1` | 200k | Yes | Yes |
| `o3-mini` | 200k | Yes | No |
```rust
// Constructor arguments are illustrative.
let config = ProviderConfig::new(ProviderKind::OpenAi, "gpt-4o");
```
### Google Gemini
| Detail | Value |
|---|---|
| Endpoint | https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |
| Auth | x-goog-api-key header |
| Streaming | SSE via streamGenerateContent?alt=sse |
Models:
| Model | Context | Tools | Vision |
|---|---|---|---|
| `gemini-2.0-flash` | 1M | Yes | Yes |
| `gemini-1.5-pro` | 2M | Yes | Yes |
| `gemini-1.5-flash` | 1M | Yes | Yes |
```rust
// Constructor arguments are illustrative.
let config = ProviderConfig::new(ProviderKind::Gemini, "gemini-2.0-flash");
```
### Ollama (Local)
| Detail | Value |
|---|---|
| Endpoint | http://localhost:11434/api/chat |
| Auth | Optional Bearer token |
| Streaming | Newline-delimited JSON (NDJSON) |
Models:
| Model | Context | Tools | Vision |
|---|---|---|---|
| `llama3` | 8k | No | No |
| `llama3.1` | 131k | Yes | No |
| `codellama` | 16k | No | No |
| `mistral` | 32k | Yes | No |
| `mixtral` | 32k | Yes | No |
| `llava` | 4k | No | Yes |
```rust
// Constructor arguments are illustrative; `with_base_url` overrides the default host.
let config = ProviderConfig::new(ProviderKind::Ollama, "llama3.1")
    .with_base_url("http://localhost:11434");
```
### OpenAI-Compatible Providers
Use this provider for any service that implements the OpenAI API format. Factory functions are provided for popular services:
```rust
use saorsa_ai::openai_compat;

// Factory arguments elided; each takes service-specific settings
// (endpoint, API key, model) - see the crate docs for exact signatures.

// Azure OpenAI
let provider = openai_compat::azure_openai(/* ... */)?;
// Groq
let provider = openai_compat::groq(/* ... */)?;
// Mistral
let provider = openai_compat::mistral(/* ... */)?;
// OpenRouter
let provider = openai_compat::openrouter(/* ... */)?;
// xAI (Grok)
let provider = openai_compat::xai(/* ... */)?;
// Cerebras
let provider = openai_compat::cerebras(/* ... */)?;
```
For custom endpoints, use the builder:
```rust
use saorsa_ai::OpenAiCompatProvider;

// Builder methods and values are partly illustrative.
let provider = OpenAiCompatProvider::builder()
    .base_url("https://llm.example.com")
    .url_path("/v1/chat/completions")   // Custom API path
    .auth_header("X-Api-Key")           // Custom auth header
    .extra_header("X-Org-Id", "my-org") // Extra static header
    .build()?;
```
## Streaming

All providers return a unified stream of `StreamEvent` values via a tokio `mpsc::Receiver`:

```rust
// Event variant handling is illustrative; see StreamEvent in the types table.
let mut rx = provider.stream(request).await?;

while let Some(event) = rx.recv().await {
    match event {
        StreamEvent::ContentDelta(delta) => print!("{delta:?}"),
        StreamEvent::MessageStop => break,
        _ => {}
    }
}
```
Each provider translates its native streaming format (SSE or NDJSON) into the same event sequence. A background tokio task handles the parsing.
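The NDJSON side reduces to line framing: a record can be split across network chunks, so the parser buffers input until a newline completes it. A std-only sketch of that framing (not the crate's actual parser):

```rust
/// Accumulates raw stream chunks and yields complete NDJSON lines,
/// handling records split across chunk boundaries.
struct LineFramer {
    buf: String,
}

impl LineFramer {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    /// Feed one network chunk; return every line it completed.
    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.find('\n') {
            // '\n' is ASCII, so pos is always a char boundary.
            let line: String = self.buf.drain(..=pos).collect();
            let line = line.trim_end().to_string();
            if !line.is_empty() {
                lines.push(line);
            }
        }
        lines
    }
}

fn main() {
    let mut framer = LineFramer::new();
    // A record split across two chunks is only emitted once complete.
    let first = framer.push("{\"message\":{\"content\":\"Hel");
    assert!(first.is_empty());
    let lines = framer.push("lo\"}}\n{\"done\":true}\n");
    println!("{lines:?}");
}
```

Each complete line would then be parsed as JSON and mapped onto the unified `StreamEvent` enum; SSE streams get the analogous treatment with `data:` frames instead of bare lines.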
## Tool Calling
Define tools using JSON Schema and handle tool use/result cycles:
```rust
use saorsa_ai::{CompletionRequest, Message, StopReason, ToolDefinition};
use serde_json::json;

// Constructor signatures are illustrative.

// 1. Define a tool
let tool = ToolDefinition::new(
    "get_weather",
    "Get the current weather for a city",
    json!({
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
    }),
);

// 2. Send request with tools
let request = CompletionRequest::new(vec![Message::user("What's the weather in Paris?")])
    .tools(vec![tool]);
let response = provider.complete(request).await?;

// 3. Handle tool use
if response.stop_reason == Some(StopReason::ToolUse) {
    // Run the requested tool, then continue the conversation
    // with a ToolResult content block.
}
```
Tool calling works identically across all providers - saorsa-ai handles the format translation between Anthropic's native tool blocks, OpenAI's function calling, Gemini's function declarations, and Ollama's format.
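On the host side, handling a tool call is ultimately dispatch on the tool name. A minimal sketch (the tool name and string-typed input are hypothetical; real input is JSON matching the tool's schema):

```rust
/// Hypothetical host-side dispatch for model-requested tool calls.
fn dispatch_tool(name: &str, input: &str) -> Result<String, String> {
    match name {
        "get_weather" => Ok(format!("22°C and sunny in {input}")),
        other => Err(format!("unknown tool: {other}")),
    }
}

fn main() {
    // The Ok value would go back to the model as a tool result;
    // the Err value is also worth reporting, so the model can recover.
    match dispatch_tool("get_weather", "Paris") {
        Ok(result) => println!("tool result: {result}"),
        Err(e) => eprintln!("{e}"),
    }
}
```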
## Model Registry
Look up model metadata at runtime:
```rust
use saorsa_ai::models::{
    get_context_window, lookup_model, lookup_model_by_prefix, supports_tools, supports_vision,
};

// Model IDs below are chosen to match the documented return values;
// exact signatures are illustrative.

// Exact match
if let Some(info) = lookup_model("gemini-1.5-pro") {
    println!("context window: {}", info.context_window);
}

// Prefix match (for versioned model IDs)
let info = lookup_model_by_prefix("claude-sonnet-4-20250514"); // Matches "claude-sonnet-4"

// Individual queries
let ctx = get_context_window("gemini-1.5-pro"); // Some(2_000_000)
let tools = supports_tools("llama3");           // Some(false)
let vision = supports_vision("gpt-4o");         // Some(true)
```
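The prefix rule is worth spelling out: a dated ID should resolve to its base entry, and when several entries prefix the query, the longest one should win. A standalone sketch of that lookup (illustrative, not the crate's registry):

```rust
/// A few (model id, context window) entries, mirroring the tables above.
const MODELS: &[(&str, u32)] = &[
    ("claude-sonnet-4", 200_000),
    ("gemini-1.5-pro", 2_000_000),
    ("llama3", 8_000),
    ("llama3.1", 131_000),
];

/// Exact-match lookup.
fn lookup_model(id: &str) -> Option<u32> {
    MODELS.iter().find(|(m, _)| *m == id).map(|&(_, ctx)| ctx)
}

/// Prefix lookup: the longest registered ID that prefixes the query wins,
/// so "llama3.1:latest" resolves to llama3.1, not llama3.
fn lookup_model_by_prefix(id: &str) -> Option<u32> {
    MODELS
        .iter()
        .filter(|(m, _)| id.starts_with(m))
        .max_by_key(|(m, _)| m.len())
        .map(|&(_, ctx)| ctx)
}

fn main() {
    println!("{:?}", lookup_model_by_prefix("claude-sonnet-4-20250514"));
}
```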
## Token Counting
Estimate token usage for context window management:
```rust
use saorsa_ai::tokens::{
    estimate_conversation_tokens, estimate_message_tokens, estimate_tokens, fits_in_context,
};

// Arguments are illustrative.

// Estimate tokens in text (~4 chars per token)
let count = estimate_tokens("How long is this prompt?");

// Estimate message tokens (includes per-message overhead)
let msg_tokens = estimate_message_tokens(&message);

// Estimate full conversation
let total = estimate_conversation_tokens(&messages);

// Check if the conversation fits within the model's context window
let fits = fits_in_context(&messages, "claude-sonnet-4");
```
Token counting is heuristic-based (~4 characters per token for English). For precise counts, use provider-specific tokenizers.
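The heuristic itself is small enough to sketch standalone (an illustration of the ~4-characters-per-token rule, not the crate's implementation; `reserve_for_output` is a hypothetical knob):

```rust
/// Rough estimate: ~4 characters per token for English text, rounded up.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count().div_ceil(4)
}

/// Would an estimated prompt fit, leaving headroom for the reply?
fn fits_in_context(prompt_tokens: usize, context_window: usize, reserve_for_output: usize) -> bool {
    prompt_tokens + reserve_for_output <= context_window
}

fn main() {
    let est = estimate_tokens("Explain the borrow checker in one paragraph.");
    println!("~{est} tokens; fits an 8k window: {}", fits_in_context(est, 8_000, 1_024));
}
```

Counting `chars()` rather than bytes keeps the estimate stable for non-ASCII text, though the 4:1 ratio itself is tuned for English.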
## Error Handling

All operations return `Result<T, SaorsaAiError>`.
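A sketch of what a typed error makes easy, using hypothetical stand-in variants (not the actual `SaorsaAiError` definition):

```rust
/// Hypothetical stand-in for the crate's error enum.
#[derive(Debug)]
enum SaorsaAiError {
    Http(String),
    RateLimited { retry_after_secs: u64 },
    InvalidResponse(String),
}

/// Transient failures are worth retrying; malformed responses are not.
fn should_retry(err: &SaorsaAiError) -> bool {
    matches!(
        err,
        SaorsaAiError::Http(_) | SaorsaAiError::RateLimited { .. }
    )
}

fn main() {
    let err = SaorsaAiError::RateLimited { retry_after_secs: 30 };
    println!("{err:?} retry? {}", should_retry(&err));
}
```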
## Core Types Reference
| Type | Description |
|---|---|
| `Provider` | Trait for non-streaming completions |
| `StreamingProvider` | Trait extending `Provider` with streaming |
| `ProviderConfig` | Configuration for creating a provider |
| `ProviderKind` | Enum of provider types (Anthropic, OpenAi, Gemini, Ollama, OpenAiCompatible) |
| `ProviderRegistry` | Factory for creating providers from config |
| `CompletionRequest` | Builder for completion requests |
| `CompletionResponse` | Parsed completion response |
| `Message` | Conversation message (user, assistant, tool result) |
| `Role` | Message role (User, Assistant) |
| `ContentBlock` | Message content (Text, ToolUse, ToolResult) |
| `ContentDelta` | Streaming delta (TextDelta, InputJsonDelta) |
| `StreamEvent` | Streaming event (message start/stop, content deltas, errors) |
| `StopReason` | Why generation stopped (EndTurn, MaxTokens, StopSequence, ToolUse) |
| `Usage` | Token usage (input_tokens, output_tokens) |
| `ToolDefinition` | Tool schema for function calling |
| `ModelInfo` | Model metadata (context window, capabilities) |
## Dependencies
| Crate | Purpose |
|---|---|
| `reqwest` | HTTP client (rustls-tls) |
| `reqwest-eventsource` | Server-Sent Events parsing |
| `tokio` | Async runtime |
| `futures` | Async stream utilities |
| `async-trait` | Async trait support |
| `serde` / `serde_json` | JSON serialization |
| `tracing` | Structured logging |
| `thiserror` | Error type derivation |
## Minimum Supported Rust Version
The MSRV is 1.88 (Rust Edition 2024). This is enforced in CI.
## License
Licensed under either of:
at your option.
## Contributing
Part of the saorsa-tui workspace. See the workspace root for contribution guidelines.