Siumai - Unified LLM Interface Library for Rust
Siumai is a unified LLM interface library for Rust that provides a consistent API across multiple AI providers. It features capability-based trait separation, type-safe parameter handling, and comprehensive streaming support.
Two Ways to Use Siumai
Siumai offers two distinct approaches to fit your needs:
- `Provider` - Provider-specific clients with access to all of each provider's features
- `Siumai::builder()` - A unified interface for provider-agnostic code
Choose Provider when you need provider-specific features, or Siumai::builder() when you want maximum portability.
Features
- Multi-Provider Support: OpenAI, Anthropic Claude, Google Gemini, Ollama, and custom providers
- Capability-Based Design: Separate traits for chat, audio, vision, tools, and embeddings
- Builder Pattern: Fluent API with method chaining for easy configuration
- Streaming Support: Full streaming capabilities with event processing
- Type Safety: Leverages Rust's type system for compile-time safety
- Parameter Mapping: Automatic translation between common and provider-specific parameters
- HTTP Customization: Support for custom reqwest clients and HTTP configurations
- Multimodal: Support for text, images, and audio content
- Async/Await: Built on tokio for high-performance async operations
- Retry Mechanisms: Intelligent retry with exponential backoff and jitter
- Error Handling: Advanced error classification with recovery suggestions
- Parameter Validation: Cross-provider parameter validation and optimization
Quick Start
Add Siumai to your Cargo.toml:
```toml
[dependencies]
# By default, all providers are included
siumai = "0.10"
tokio = { version = "1.0", features = ["full"] }
```
Feature Selection
Siumai allows you to include only the providers you need, reducing compilation time and binary size:
```toml
[dependencies]
# Only OpenAI (disable default features to drop the other providers)
siumai = { version = "0.10", default-features = false, features = ["openai"] }

# Multiple specific providers
siumai = { version = "0.10", default-features = false, features = ["openai", "anthropic", "google"] }

# All providers (same as default)
siumai = { version = "0.10", features = ["all-providers"] }

# Only local AI (Ollama)
siumai = { version = "0.10", default-features = false, features = ["ollama"] }
```
Available Features
| Feature | Providers | Description |
|---|---|---|
| `openai` | OpenAI + compatible | OpenAI, DeepSeek, OpenRouter, SiliconFlow |
| `anthropic` | Anthropic | Claude models with thinking mode |
| `google` | Google | Gemini models with multimodal capabilities |
| `ollama` | Ollama | Local AI models |
| `xai` | xAI | Grok models with reasoning |
| `groq` | Groq | Ultra-fast inference |
| `all-providers` | All | Complete provider support (default) |
Provider-Specific Clients
Use Provider when you need access to provider-specific features:
```rust
// Cargo.toml: siumai = { version = "0.10", features = ["openai"] }
use siumai::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Model name and prompt are illustrative
    let client = Provider::openai()
        .api_key(std::env::var("OPENAI_API_KEY")?)
        .model("gpt-4o-mini")
        .temperature(0.7)
        .build()
        .await?;

    let response = client.chat(vec![user!("Hello, Siumai!")]).await?;
    println!("{}", response.content_text().unwrap_or_default());
    Ok(())
}
```
Unified Interface
Use Siumai::builder() when you want provider-agnostic code:
```rust
// Cargo.toml: siumai = { version = "0.10", features = ["anthropic"] }
use siumai::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Model name and prompt are illustrative
    let client = Siumai::builder()
        .anthropic()
        .api_key(std::env::var("ANTHROPIC_API_KEY")?)
        .model("claude-3-5-sonnet-20241022")
        .build()
        .await?;

    let response = client.chat(vec![user!("Hello, Siumai!")]).await?;
    println!("{}", response.content_text().unwrap_or_default());
    Ok(())
}
```
Feature Tip: When using specific providers, make sure to enable the corresponding feature in your Cargo.toml. If you try to use a provider without its feature enabled, you'll get a compile-time error with a helpful message.
Multimodal Messages
```rust
use siumai::prelude::*;

// Create a message with text and an image - use the builder for complex messages
// (the image URL and detail-level arguments are illustrative)
let message = ChatMessage::user("What is in this picture?")
    .with_image("https://example.com/photo.jpg".to_string(), Some("high".to_string()))
    .build();

// Assemble a request from the messages (the `ChatRequest` builder name is illustrative)
let request = ChatRequest::builder()
    .messages(vec![message])
    .build();
```
Streaming
```rust
use siumai::prelude::*;
use futures::StreamExt; // for manual `stream.next().await` processing, if preferred

// Create a streaming request (client and messages built as in the examples above)
let stream = client.chat_stream(messages).await?;

// Process stream events and collect them into a final response
// (`collect_stream_response` is the helper used by the original example)
let response = collect_stream_response(stream).await?;
println!("{}", response.content_text().unwrap_or_default());
```
Architecture
Siumai uses a capability-based architecture that separates different AI functionalities:
Core Traits
- `ChatCapability`: Basic chat functionality
- `AudioCapability`: Text-to-speech and speech-to-text
- `ImageGenerationCapability`: Image generation, editing, and variations
- `VisionCapability`: Image analysis and understanding
- `ToolCapability`: Function calling and tool usage
- `EmbeddingCapability`: Text embeddings
- `RerankCapability`: Document reranking and relevance scoring
Provider-Specific Traits
- `OpenAiCapability`: OpenAI-specific features (structured output, batch processing)
- `AnthropicCapability`: Anthropic-specific features (prompt caching, thinking mode)
- `GeminiCapability`: Google Gemini-specific features (search integration, code execution)
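Because every client implements the same core traits, provider-agnostic helpers can be written against a trait bound. A minimal sketch (it assumes `ChatCapability` exposes the async `chat` method used throughout this README, and that its error type converts into `Box<dyn std::error::Error>`):

```rust
use siumai::prelude::*;

// Works with any client that implements ChatCapability, regardless of provider.
// The `user!` message constructor and `content_text()` accessor follow the usage
// shown in the examples below.
async fn summarize<C: ChatCapability>(client: &C, text: &str) -> Result<String, Box<dyn std::error::Error>> {
    let response = client.chat(vec![user!(format!("Summarize in one sentence: {text}"))]).await?;
    Ok(response.content_text().unwrap_or_default())
}
```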
Examples
Different Providers
Provider-Specific Clients
```rust
use siumai::prelude::*;

// API keys, model names, and temperatures below are illustrative

// OpenAI - with provider-specific features
let openai_client = Provider::openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .model("gpt-4o-mini")
    .temperature(0.7)
    .build()
    .await?;

// Anthropic - with provider-specific features
let anthropic_client = Provider::anthropic()
    .api_key(std::env::var("ANTHROPIC_API_KEY")?)
    .model("claude-3-5-sonnet-20241022")
    .temperature(0.7)
    .build()
    .await?;

// Ollama - with provider-specific features
let ollama_client = Provider::ollama()
    .base_url("http://localhost:11434")
    .model("llama3.2")
    .temperature(0.7)
    .build()
    .await?;
```
Unified Interface
```rust
use siumai::prelude::*;

// API keys, model names, and temperatures below are illustrative

// OpenAI through the unified interface
let openai_unified = Siumai::builder()
    .openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .model("gpt-4o-mini")
    .temperature(0.7)
    .build()
    .await?;

// Anthropic through the unified interface
let anthropic_unified = Siumai::builder()
    .anthropic()
    .api_key(std::env::var("ANTHROPIC_API_KEY")?)
    .model("claude-3-5-sonnet-20241022")
    .temperature(0.7)
    .build()
    .await?;

// Ollama through the unified interface
let ollama_unified = Siumai::builder()
    .ollama()
    .base_url("http://localhost:11434")
    .model("llama3.2")
    .temperature(0.7)
    .build()
    .await?;
```
Custom HTTP Client
```rust
use siumai::prelude::*;
use std::time::Duration;

// Build a custom reqwest client (timeout and user agent are illustrative)
let custom_client = reqwest::Client::builder()
    .timeout(Duration::from_secs(30))
    .user_agent("my-app/1.0")
    .build()?;

// With a provider-specific client
let client = Provider::openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .model("gpt-4o-mini")
    .http_client(custom_client.clone())
    .build()
    .await?;

// With the unified interface
let unified_client = Siumai::builder()
    .openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .model("gpt-4o-mini")
    .http_client(custom_client)
    .build()
    .await?;
```
Concurrent Usage with Clone
All clients support Clone for concurrent usage scenarios:
```rust
use siumai::prelude::*;
use std::sync::Arc;
use tokio::task;

// Minimal sketch: share one client across Tokio tasks via Arc
// (`client` is any siumai client built as in the examples above; the prompt is illustrative)
let client = Arc::new(client);
let worker = {
    let client = Arc::clone(&client);
    task::spawn(async move { client.chat(vec![user!("Hello from a task")]).await })
};
let _response = worker.await??;
```
Direct Clone Usage
```rust
// Clone clients directly (a lightweight operation)
let client1 = Provider::openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .model("gpt-4o-mini")
    .build()
    .await?;

let client2 = client1.clone(); // Shares the HTTP client and configuration

// Both clients can be used independently (prompts are illustrative)
let response1 = client1.chat(vec![user!("Hello from client1")]).await?;
let response2 = client2.chat(vec![user!("Hello from client2")]).await?;
```
Provider-Specific Features
```rust
use siumai::prelude::*;

// Keys, model names, and option values below are illustrative; the builder
// method names follow this README, but argument types may differ by version.

// OpenAI with structured output (provider-specific client)
let openai_client = Provider::openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .model("gpt-4o")
    .response_format(serde_json::json!({ "type": "json_object" }))
    .frequency_penalty(0.5)
    .build()
    .await?;

// Anthropic with caching (provider-specific client)
let anthropic_client = Provider::anthropic()
    .api_key(std::env::var("ANTHROPIC_API_KEY")?)
    .model("claude-3-5-sonnet-20241022")
    .cache_control("ephemeral")
    .thinking_budget(8192)
    .build()
    .await?;

// Ollama with local model management (provider-specific client)
let ollama_client = Provider::ollama()
    .base_url("http://localhost:11434")
    .model("llama3.2")
    .keep_alive("5m")
    .num_ctx(4096)
    .num_gpu(1)
    .build()
    .await?;

// Unified interface with reasoning (works across all providers)
let unified_client = Siumai::builder()
    .anthropic() // or .openai(), .ollama(), etc.
    .api_key(std::env::var("ANTHROPIC_API_KEY")?)
    .model("claude-3-5-sonnet-20241022")
    .temperature(0.7)
    .max_tokens(4096)
    .reasoning(true)        // unified reasoning interface
    .reasoning_budget(8192) // works across all providers
    .build()
    .await?;
```
Clone Support & Concurrent Usage
All siumai clients implement Clone for easy concurrent usage. The clone operation is lightweight as it shares the underlying HTTP client and configuration:
Basic Clone Usage
```rust
use siumai::prelude::*;

// Minimal sketch: clone any client and use both copies independently
// (`client` is built as in the examples above; the prompt is illustrative)
let cloned = client.clone();
let response = cloned.chat(vec![user!("Hello")]).await?;
```
Concurrent Processing with Arc
```rust
use siumai::prelude::*;
use std::sync::Arc;
use tokio::task;

// Minimal sketch: fan requests out over several Tokio tasks (prompts are illustrative)
let client = Arc::new(client); // any siumai client built as in the examples above
let handles: Vec<_> = (0..4)
    .map(|i| {
        let client = Arc::clone(&client);
        task::spawn(async move { client.chat(vec![user!(format!("Question {i}"))]).await })
    })
    .collect();
for handle in handles {
    let _response = handle.await??;
}
```
Multi-Provider Concurrent Usage
```rust
use siumai::prelude::*;
use tokio::task;

// Minimal sketch: query two providers concurrently
// (`openai_client` and `anthropic_client` are built as in the examples above)
let openai_task = task::spawn({
    let client = openai_client.clone();
    async move { client.chat(vec![user!("Hello from OpenAI")]).await }
});
let anthropic_task = task::spawn({
    let client = anthropic_client.clone();
    async move { client.chat(vec![user!("Hello from Anthropic")]).await }
});
let (openai_response, anthropic_response) = (openai_task.await??, anthropic_task.await??);
```
Performance Note: Clone operations are lightweight because:
- HTTP clients use internal connection pooling (Arc-based)
- Configuration parameters are small and cheap to clone
- No duplicate network connections are created
Advanced Features
Parameter Validation and Optimization
```rust
use siumai::prelude::*;
use siumai::params::EnhancedParameterValidator; // module path is illustrative

// Field values are illustrative; the fields follow the Common Parameters list below
let params = CommonParams {
    model: "gpt-4o-mini".to_string(),
    temperature: Some(0.7),
    max_tokens: Some(1024),
    top_p: Some(0.9),
    stop_sequences: None,
    seed: Some(42),
};

// Validate parameters for a specific provider (the provider identifier form is illustrative)
let validation_result = EnhancedParameterValidator::validate_for_provider(&params, "openai")?;

// Optimize parameters for better performance
let mut optimized_params = params.clone();
let optimization_report = EnhancedParameterValidator::optimize_for_provider(&mut optimized_params, "openai");
```
Retry Mechanisms
```rust
use siumai::prelude::*;
use std::time::Duration;

// Values are illustrative; the policy/executor type names may differ from the crate's
let policy = RetryPolicy::new()
    .with_max_attempts(3)
    .with_initial_delay(Duration::from_millis(500))
    .with_backoff_multiplier(2.0);

let executor = RetryExecutor::new(policy);
let result = executor.execute(|| async { client.chat(messages.clone()).await }).await?;
```
Error Handling and Classification
```rust
use siumai::prelude::*;

// Sketch: classify and react to errors from a chat call (error-handling details are illustrative)
match client.chat_with_tools(messages, None).await {
    Ok(response) => println!("{}", response.content_text().unwrap_or_default()),
    Err(error) => {
        // Errors carry a classification and recovery suggestions
        eprintln!("Request failed: {error}");
    }
}
```
Configuration
Common Parameters
All providers support these common parameters:
- `model`: Model name
- `temperature`: Randomness (0.0-2.0)
- `max_tokens`: Maximum output tokens
- `top_p`: Nucleus sampling parameter
- `stop_sequences`: Stop generation sequences
- `seed`: Random seed for reproducibility
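These map directly onto builder methods on either interface. A minimal sketch (values are illustrative; the `top_p` and `seed` setters are assumed to mirror the parameter names above, while `temperature` and `max_tokens` appear elsewhere in this README):

```rust
use siumai::prelude::*;

let client = Siumai::builder()
    .openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .model("gpt-4o-mini")
    .temperature(0.7)
    .max_tokens(1024)
    .top_p(0.9) // assumed setter, mirroring the common parameter name
    .seed(42)   // assumed setter, mirroring the common parameter name
    .build()
    .await?;
```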
Provider-Specific Parameters
Each provider can have additional parameters:
OpenAI:
- `response_format`: Output format control
- `tool_choice`: Tool selection strategy
- `frequency_penalty`: Frequency penalty
- `presence_penalty`: Presence penalty
Anthropic:
- `cache_control`: Prompt caching settings
- `thinking_budget`: Thinking process budget
- `system`: System message handling
Ollama:
- `keep_alive`: Model memory duration
- `raw`: Bypass templating
- `format`: Output format (json, etc.)
- `numa`: NUMA support
- `num_ctx`: Context window size
- `num_gpu`: GPU layers to use
Ollama Local AI Examples
Basic Chat with Local Model
```rust
use siumai::prelude::*;

// Connect to a local Ollama instance (model name and prompt are illustrative)
let client = Provider::ollama()
    .base_url("http://localhost:11434")
    .model("llama3.2")
    .temperature(0.7)
    .build()
    .await?;

let messages = vec![user!("Why is the sky blue?")];
let response = client.chat_with_tools(messages, None).await?;
println!("{}", response.content_text().unwrap_or_default());
```
Advanced Ollama Configuration
```rust
use siumai::prelude::*;
use futures::StreamExt;

// Option values are illustrative; the config/client type names may differ from the crate's
let config = OllamaConfig::builder()
    .base_url("http://localhost:11434")
    .model("llama3.2")
    .keep_alive("10m")     // Keep the model in memory
    .num_ctx(8192)         // Context window
    .num_gpu(1)            // Use GPU acceleration
    .numa(true)            // Enable NUMA
    .think(true)           // Enable thinking mode for thinking models
    .option("top_k", "40") // Any additional Ollama option
    .build()?;

let client = OllamaClient::new_with_config(config);

// Generate text with streaming
let mut stream = client.generate_stream("Tell me a short story".to_string()).await?;
while let Some(chunk) = stream.next().await {
    // handle each streamed chunk here
}
```
Thinking Models with Ollama
```rust
use siumai::prelude::*;

// Use thinking models such as DeepSeek-R1 (model name and prompt are illustrative)
let client = Siumai::builder()
    .ollama()
    .base_url("http://localhost:11434")
    .model("deepseek-r1:8b")
    .reasoning(true) // Enable reasoning mode
    .temperature(0.7)
    .build()
    .await?;

let messages = vec![user!("How many 'r's are in the word 'strawberry'?")];
let response = client.chat(messages).await?;

// Access the model's thinking process
if let Some(thinking) = &response.thinking {
    println!("Thinking: {thinking}");
}

// Get the final answer
if let Some(text) = response.content_text() {
    println!("Answer: {text}");
}
```
OpenAI API Feature Examples
Responses API (OpenAI-Specific)
OpenAI's Responses API provides stateful conversations, background processing, and built-in tools:
```rust
use siumai::prelude::*;
use siumai::providers::openai::{OpenAiConfig, OpenAiBuiltInTool}; // module path is illustrative

// NOTE: type names, module paths, and argument shapes below are illustrative;
// the method names come from this README.

// Create a Responses API client with built-in tools
let config = OpenAiConfig::new(std::env::var("OPENAI_API_KEY")?)
    .with_model("gpt-4o")
    .with_responses_api(true)
    .with_built_in_tool(OpenAiBuiltInTool::WebSearch);

let client = OpenAiClient::new(config);

// Basic chat with built-in tools
let messages = vec![user!("What changed in the latest Rust release?")];
let response = client.chat_with_tools(messages, None).await?;
println!("{}", response.content_text().unwrap_or_default());

// Background processing for complex tasks
let complex_messages = vec![user!("Write a detailed comparison of Rust async runtimes")];
let background_response = client
    .create_response_background(complex_messages, None)
    .await?;

// Check whether the background task is ready
let is_ready = client.is_response_ready(&background_response.id).await?;
if is_ready {
    // fetch and use the finished response here
}
```
Text Embedding
```rust
use siumai::prelude::*;
use siumai::traits::EmbeddingExtensions; // module path is illustrative

// NOTE: model names, request-builder type names, and field shapes below are
// illustrative; the method names come from this README.

// Basic unified interface - works with any provider that supports embeddings
let client = Siumai::builder()
    .openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .model("text-embedding-3-small")
    .build()
    .await?;

let texts = vec!["Hello, world!".to_string(), "Rust is great".to_string()];
let response = client.embed(texts).await?;
println!("Got {} embeddings", response.embeddings.len());

// NEW: advanced unified interface with task types and configuration
let gemini_client = Siumai::builder()
    .gemini()
    .api_key(std::env::var("GEMINI_API_KEY")?)
    .model("text-embedding-004")
    .build()
    .await?;

// Use task-type optimization for better results
let query_request = EmbeddingRequest::query("What is machine learning?");
let query_response = gemini_client.embed_with_config(query_request).await?;

let doc_request = EmbeddingRequest::document("Machine learning is a subset of AI...");
let doc_response = gemini_client.embed_with_config(doc_request).await?;

// Custom configuration with task type and dimensions
let custom_request = EmbeddingRequest::new(vec!["Some text".to_string()])
    .with_task_type(EmbeddingTaskType::RetrievalDocument)
    .with_dimensions(768);
let custom_response = gemini_client.embed_with_config(custom_request).await?;

// Provider-specific interface for advanced features
let embeddings_client = Provider::openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .build()
    .await?;
let response = embeddings_client.embed(vec!["Hello".to_string()]).await?;
```
Text-to-Speech
```rust
use siumai::prelude::*;
use siumai::traits::AudioCapability; // module path is illustrative
use siumai::types::TtsRequest;       // module path is illustrative

// NOTE: field names and client construction are illustrative; the method names come from this README.
let config = OpenAiConfig::new(std::env::var("OPENAI_API_KEY")?);
let client = OpenAiClient::new(config);

let request = TtsRequest {
    text: "Hello, this is a test of text-to-speech".to_string(),
    voice: Some("alloy".to_string()),
    ..Default::default() // assuming the request type implements Default
};

let response = client.text_to_speech(request).await?;
std::fs::write("output.mp3", &response.audio_data)?;
```
Image Generation
Generate images using OpenAI DALL-E or SiliconFlow models:
```rust
use siumai::prelude::*;
use siumai::traits::ImageGenerationCapability; // module path is illustrative
use siumai::types::ImageGenerationRequest;     // module path is illustrative

// NOTE: field names and model identifiers below are illustrative;
// the method names come from this README.

// OpenAI DALL-E
let client = Siumai::builder()
    .openai()
    .api_key(std::env::var("OPENAI_API_KEY")?)
    .build()
    .await?;

let request = ImageGenerationRequest {
    prompt: "A futuristic city skyline at sunset".to_string(),
    size: Some("1024x1024".to_string()),
    count: 1,
    ..Default::default() // assuming the request type implements Default
};

let response = client.generate_images(request).await?;
for image in response.images {
    // each image carries a URL or base64 payload, depending on the provider
}

// SiliconFlow with advanced parameters
let siliconflow_client = Siumai::builder()
    .siliconflow()
    .api_key(std::env::var("SILICONFLOW_API_KEY")?)
    .build()
    .await?;

let sf_request = ImageGenerationRequest {
    prompt: "An ink-wash landscape painting".to_string(),
    model: Some("Kwai-Kolors/Kolors".to_string()),
    ..Default::default()
};
let sf_response = siliconflow_client.generate_images(sf_request).await?;
```
Provider Matrix (Features/Env Vars)
The table below summarizes feature flags, default base URLs, and environment variables. Capabilities depend on models and may vary; use examples and tests to verify.
| Provider | Feature flag | Default base URL | Env var |
|---|---|---|---|
| OpenAI | `openai` | https://api.openai.com/v1 | `OPENAI_API_KEY` |
| Anthropic | `anthropic` | https://api.anthropic.com | `ANTHROPIC_API_KEY` |
| Google (Gemini) | `google` | https://generativelanguage.googleapis.com | `GEMINI_API_KEY` |
| Groq | `groq` | https://api.groq.com/openai/v1 | `GROQ_API_KEY` |
| xAI | `xai` | https://api.x.ai/v1 | `XAI_API_KEY` |
| Ollama (local) | `ollama` | http://localhost:11434 | (none) |
| OpenAI-Compatible (DeepSeek/OpenRouter/SiliconFlow) | `openai` | provider-specific | varies (e.g., `DEEPSEEK_API_KEY`) |
Notes:
- Enable providers via Cargo features (selective compilation) or use the default `all-providers`.
- Capabilities (chat, streaming, embeddings, vision, images, tools, rerank) depend on the provider and model.
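For example, a client typically reads its key from the environment variable listed above (a minimal sketch; the model name is illustrative):

```rust
use siumai::prelude::*;

// Read the provider's API key from the environment variable listed in the table above
let api_key = std::env::var("OPENAI_API_KEY")?;
let client = Siumai::builder()
    .openai()
    .api_key(api_key)
    .model("gpt-4o-mini")
    .build()
    .await?;
```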
Testing
Unit and Mock Tests
Run the standard test suite with `cargo test` (no API keys required).
Integration Tests
Run the mock integration tests (under `tests/`) with `cargo test`; they also require no API keys.
Real LLM Integration Tests
⚠️ These tests use real API keys and make actual API calls!
Siumai includes comprehensive integration tests that verify functionality against real LLM providers. These tests are ignored by default to prevent accidental API usage.
Quick Setup
1. Set API keys (you only need keys for the providers you want to test):

   ```bash
   export OPENAI_API_KEY="sk-..."
   export ANTHROPIC_API_KEY="sk-ant-..."
   # ... other providers
   ```

2. Run the tests:

   ```bash
   # Test all available providers
   cargo test -- --ignored
   # Test a specific provider (filter by test name; the filter below is illustrative)
   cargo test openai -- --ignored
   ```
Using Helper Scripts
For easier setup, use the provided scripts that automatically load .env files:
1. Create a `.env` file from the template (optional).
2. Edit `.env` with your API keys.
3. Run the provided script for your platform (Linux/macOS or Windows).
Test Coverage
Each provider test includes:
- Non-streaming chat: Basic request/response
- Streaming chat: Real-time response streaming
- Embeddings: Text embedding generation (if supported)
- Reasoning: Advanced reasoning/thinking capabilities (if supported)
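The real-provider tests sit behind Rust's `#[ignore]` attribute, which is why plain `cargo test` skips them. A minimal sketch of the shape such a test takes (the test name, model, and assertions are illustrative, not the repository's actual tests):

```rust
use siumai::prelude::*;

#[tokio::test]
#[ignore] // only runs with `cargo test -- --ignored` and a real API key
async fn openai_non_streaming_chat() {
    let client = Provider::openai()
        .api_key(std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set"))
        .model("gpt-4o-mini")
        .build()
        .await
        .expect("failed to build client");

    let response = client.chat(vec![user!("Say hello")]).await.expect("chat failed");
    assert!(response.content_text().is_some());
}
```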
Supported Providers
| Provider | Chat | Streaming | Embeddings | Reasoning | Rerank | Images |
|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ (o1) | ❌ | ✅ |
| Anthropic | ✅ | ✅ | ❌ | ✅ (thinking) | ❌ | ❌ |
| Gemini | ✅ | ✅ | ✅ | ✅ (thinking) | ❌ | ❌ |
| DeepSeek | ✅ | ✅ | ❌ | ✅ (reasoner) | ❌ | ❌ |
| OpenRouter | ✅ | ✅ | ❌ | ✅ (o1 models) | ❌ | ❌ |
| SiliconFlow | ✅ | ✅ | ✅ | ✅ (reasoner) | ✅ | ✅ |
| Groq | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| xAI | ✅ | ✅ | ❌ | ✅ (Grok) | ❌ | ❌ |
See tests/README.md for detailed instructions.
Examples
Run the examples with `cargo run --example <name>` (see the `examples/` directory for the available names).
Documentation
Developer Documentation
- Adding OpenAI-Compatible Providers - Step-by-step guide for contributors
- OpenAI-Compatible Architecture - Architecture design and principles
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Acknowledgments
- Inspired by the need for a unified LLM interface in Rust
- Built with love for the Rust community
- Special thanks to all contributors
Made with ❤️ by the YumchaLabs team