Ollama local LLM runtime API client.
This crate provides HTTP client functionality for Ollama’s local LLM runtime API, following the “Thin Client, Rich API” governing principle.
§Governing Principle: “Thin Client, Rich API”
This library exposes all server-side functionality transparently while maintaining zero client-side intelligence or automatic behaviors. This ensures:
§1. API Transparency
- Every method directly corresponds to an Ollama API endpoint
- No hidden transformations or side effects
- Method names clearly indicate exact server calls
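To illustrate the one-method-per-endpoint rule, the sketch below maps a handful of client operations to the HTTP verb and path each would call. The `Endpoint` enum and the `route`/`url_for` helpers are hypothetical and not part of this crate. The paths follow Ollama's published REST API.

```rust
/// Hypothetical set of client operations; each maps to exactly one endpoint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endpoint {
    ListModels,
    ShowModel,
    Chat,
    Generate,
    PullModel,
    PushModel,
    CopyModel,
    DeleteModel,
}

/// Returns the HTTP method and path for an operation. The mapping is a pure
/// lookup, so it has no fallbacks and no request rewriting.
pub fn route(endpoint: Endpoint) -> (&'static str, &'static str) {
    match endpoint {
        Endpoint::ListModels => ("GET", "/api/tags"),
        Endpoint::ShowModel => ("POST", "/api/show"),
        Endpoint::Chat => ("POST", "/api/chat"),
        Endpoint::Generate => ("POST", "/api/generate"),
        Endpoint::PullModel => ("POST", "/api/pull"),
        Endpoint::PushModel => ("POST", "/api/push"),
        Endpoint::CopyModel => ("POST", "/api/copy"),
        Endpoint::DeleteModel => ("DELETE", "/api/delete"),
    }
}

/// Joins a base URL such as `http://localhost:11434` with the endpoint path.
pub fn url_for(base: &str, endpoint: Endpoint) -> String {
    let (_, path) = route(endpoint);
    format!("{}{}", base.trim_end_matches('/'), path)
}
```

Because the mapping is a plain lookup, a hidden fallback or silent rewrite has nowhere to live.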
§2. Zero Client Intelligence
- No automatic decision-making or behavior inference
- No automatic configuration-driven actions without explicit enabling
- All behaviors are explicitly requested by developers
§3. Explicit Control
- Developers have complete control over when and how API calls are made
- No background operations without explicit configuration
- Clear separation between information retrieval and action methods
§4. Information vs Action
- Information methods (like list_models()) only retrieve data
- Action methods (like chat()) only perform requested operations
- No methods that implicitly combine information gathering with actions
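The information/action split can be sketched with a hypothetical in-memory stand-in. `ModelCatalog` and `ensure_model` are illustrative only and are not this crate's API. Information methods take `&self`, action methods take `&mut self`, and the "pull if missing" decision is written by the caller instead of being hidden inside the client.

```rust
/// Hypothetical in-memory stand-in for a client. No HTTP is performed; it
/// only illustrates the information/action separation.
pub struct ModelCatalog {
    installed: Vec<String>,
}

impl ModelCatalog {
    pub fn new(installed: &[&str]) -> Self {
        Self { installed: installed.iter().map(|s| s.to_string()).collect() }
    }

    /// Information method: reads state and changes nothing (`&self`).
    pub fn list_models(&self) -> Vec<String> {
        self.installed.clone()
    }

    /// Action method: performs exactly the requested operation (`&mut self`).
    pub fn pull_model(&mut self, name: &str) {
        self.installed.push(name.to_string());
    }
}

/// Caller-side composition: the decision to pull lives in application code,
/// not in the client. Returns true if a pull was issued.
pub fn ensure_model(catalog: &mut ModelCatalog, name: &str) -> bool {
    if catalog.list_models().iter().any(|m| m == name) {
        false
    } else {
        catalog.pull_model(name);
        true
    }
}
```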
§Enterprise Reliability Features
The following enterprise reliability features are explicitly allowed when implemented with explicit configuration and transparent operation:
- Configurable Retry Logic: Exponential backoff with explicit configuration
- Circuit Breaker Pattern: Failure threshold management with transparent state
- Rate Limiting: Request throttling with explicit rate configuration
- Failover Support: Multi-endpoint configuration and automatic switching
- Health Checks: Periodic endpoint health verification and monitoring
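A minimal sketch of explicitly configured failover, assuming a hypothetical `Failover` type (this crate's `FailoverManager` may be shaped differently). The caller supplies the endpoint list, each call tries endpoints in order starting from the active one, and the active endpoint can always be inspected.

```rust
/// Hypothetical failover helper: endpoints are configured explicitly and the
/// currently active one is always visible to the caller.
pub struct Failover {
    endpoints: Vec<String>,
    active: usize,
}

impl Failover {
    pub fn new(endpoints: Vec<String>) -> Self {
        assert!(!endpoints.is_empty(), "at least one endpoint is required");
        Self { endpoints, active: 0 }
    }

    /// The endpoint the next call will try first.
    pub fn active(&self) -> &str {
        &self.endpoints[self.active]
    }

    /// Tries each endpoint once, starting from the active one. On the first
    /// success the active index moves to that endpoint; if every endpoint
    /// fails, the last error is returned and the active index is unchanged.
    pub fn call<T, E>(&mut self, mut op: impl FnMut(&str) -> Result<T, E>) -> Result<T, E> {
        let n = self.endpoints.len();
        let mut last_err = None;
        for offset in 0..n {
            let idx = (self.active + offset) % n;
            match op(&self.endpoints[idx]) {
                Ok(value) => {
                    self.active = idx;
                    return Ok(value);
                }
                Err(e) => last_err = Some(e),
            }
        }
        Err(last_err.expect("n >= 1, so at least one attempt was made"))
    }
}
```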
§State Management Policy
✅ ALLOWED: Runtime-Stateful, Process-Stateless
- Connection pools, circuit breaker state, rate limiting buckets
- Retry logic state, failover state, health check state
- Runtime state that dies with the process
- No persistent storage or cross-process state
❌ PROHIBITED: Process-Persistent State
- File storage, databases, configuration accumulation
- State that survives process restarts
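To illustrate state that is runtime-stateful but process-stateless, here is a hypothetical TTL cache held entirely in a `HashMap`. Nothing is written to disk, so the cache starts empty after every restart. The current time is passed in explicitly to keep expiry deterministic; this is a simplification and not this crate's `RequestCache`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical runtime-only cache: entries live in process memory and
/// disappear when the process exits.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (V, Instant)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Stores a value, stamping it with the time it was inserted.
    pub fn insert(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (value, now));
    }

    /// Returns a clone of the value if it is younger than the TTL.
    /// Expired entries are removed when they are accessed.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<V> {
        let expired = match self.entries.get(key) {
            Some((_, stored_at)) => now.duration_since(*stored_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            None
        } else {
            self.entries.get(key).map(|(v, _)| v.clone())
        }
    }
}
```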
Implementation Requirements:
- Feature gating behind cargo features (retry, circuit_breaker, rate_limiting, failover, health_checks)
- Explicit configuration required (no automatic enabling)
- Transparent method naming (e.g., execute_with_retries(), execute_with_circuit_breaker())
- Zero overhead when features disabled
This principle ensures predictable, explicit, and transparent behavior while supporting production-ready reliability features when explicitly requested.
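The explicit-configuration requirement can be sketched with a small circuit breaker. `Breaker`, `BreakerState`, and `CallError` are hypothetical names, not this crate's `CircuitBreaker` API. The failure threshold and open duration are constructor arguments, the state is derived from counters on demand (no background timers), and a rejected call is distinguishable from one that ran and failed.

```rust
use std::time::{Duration, Instant};

/// Breaker state, derived on demand so callers can always inspect it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

/// Outcome of a guarded call: rejected without running, or run and failed.
#[derive(Debug)]
pub enum CallError<E> {
    Rejected,
    Failed(E),
}

/// Hypothetical breaker; every threshold is an explicit constructor argument.
pub struct Breaker {
    failure_threshold: u32,
    open_for: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl Breaker {
    pub fn new(failure_threshold: u32, open_for: Duration) -> Self {
        Self { failure_threshold, open_for, consecutive_failures: 0, opened_at: None }
    }

    /// Closed until tripped, Open for `open_for`, then HalfOpen (trial call allowed).
    pub fn state(&self, now: Instant) -> BreakerState {
        match self.opened_at {
            None => BreakerState::Closed,
            Some(t) if now.duration_since(t) < self.open_for => BreakerState::Open,
            Some(_) => BreakerState::HalfOpen,
        }
    }

    /// Runs `op` unless the breaker is open, then records the outcome.
    pub fn call<T, E>(&mut self, now: Instant, op: impl FnOnce() -> Result<T, E>) -> Result<T, CallError<E>> {
        if self.state(now) == BreakerState::Open {
            return Err(CallError::Rejected);
        }
        match op() {
            Ok(value) => {
                // Any success closes the breaker and resets the counter.
                self.consecutive_failures = 0;
                self.opened_at = None;
                Ok(value)
            }
            Err(e) => {
                self.consecutive_failures += 1;
                // A failed half-open trial re-opens at once; otherwise trip at the threshold.
                if self.opened_at.is_some() || self.consecutive_failures >= self.failure_threshold {
                    self.opened_at = Some(now);
                }
                Err(CallError::Failed(e))
            }
        }
    }
}
```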
§api_ollama
Rust HTTP client for the Ollama local LLM runtime API.
§🎯 Architecture: Stateless HTTP Client
This API crate is designed as a stateless HTTP client with zero persistence requirements. It provides:
- Direct HTTP calls to the Ollama API
- In-memory operation state only (resets on restart)
- No external storage dependencies (databases, files, caches)
- No configuration persistence beyond environment variables
This ensures lightweight, containerized deployments and eliminates operational complexity.
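A sketch of configuration that lives only in environment variables, assuming the variable is `OLLAMA_HOST` (the name the Ollama server and CLI use; whether this crate reads it is an assumption). The value is read at call time and is never cached or written back. The resolution step is kept pure so it can be checked without touching the real environment.

```rust
use std::env;

/// Default address of a local Ollama server.
pub const DEFAULT_BASE_URL: &str = "http://localhost:11434";

/// Pure resolution step: a non-empty value wins, otherwise the default.
/// Values without a scheme (e.g. `127.0.0.1:11434`) get `http://` prepended.
pub fn resolve_base_url(from_env: Option<String>) -> String {
    match from_env {
        Some(raw) if !raw.trim().is_empty() => {
            let raw = raw.trim().trim_end_matches('/');
            if raw.contains("://") {
                raw.to_string()
            } else {
                format!("http://{raw}")
            }
        }
        _ => DEFAULT_BASE_URL.to_string(),
    }
}

/// Reads the (assumed) variable at call time; nothing is persisted.
pub fn base_url_from_env() -> String {
    resolve_base_url(env::var("OLLAMA_HOST").ok())
}
```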
§🏛️ Governing Principle: “Thin Client, Rich API”
Expose Ollama’s API directly without abstraction layers, enabling developers to access all capabilities with explicit control.
Key principles:
- API Transparency: Every method directly corresponds to an Ollama API endpoint
- Zero Client Intelligence: No automatic decision-making or behavior inference
- Explicit Control: Developers control when and how API calls are made
- Information vs Action: Clear separation between data retrieval and state changes
§Scope
§In Scope
- Chat completions (single and multi-turn)
- Text generation from prompts
- Model management (list, pull, push, copy, delete)
- Embeddings generation
- Streaming responses
- Tool/function calling
- Vision support (image inputs)
- Enterprise reliability (retry, circuit breaker, rate limiting, failover, health checks)
- Synchronous API wrappers
§Out of Scope
- Audio processing (Ollama API limitation)
- Content moderation (Ollama API limitation)
- High-level abstractions or unified interfaces
- Business logic or application features
§Features
Core Capabilities:
- Chat completions with configurable parameters
- Text generation from prompts
- Model listing and information
- Embeddings generation
- Real-time streaming responses
- Tool/function calling support
- Vision support for image inputs
- Builder patterns for request construction
Enterprise Reliability:
- Exponential backoff retry logic
- Circuit breaker pattern
- Token bucket rate limiting
- Automatic endpoint failover
- Health monitoring
- Response caching with TTL
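Token bucket rate limiting, sketched under stated assumptions: `TokenBucket` is a hypothetical type, each request costs one token, and tokens refill continuously at a fixed rate up to the bucket capacity. When the bucket is empty, `try_acquire` returns `false` and leaves the waiting policy to the caller.

```rust
use std::time::Instant;

/// Hypothetical token bucket holding at most `capacity` tokens, refilled
/// continuously at `refill_per_sec`.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, so an initial burst of up to `capacity` requests is allowed.
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
            last_refill: now,
        }
    }

    /// Adds the tokens earned since the last refill, never exceeding capacity.
    fn refill(&mut self, now: Instant) {
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
    }

    /// Consumes one token and returns true if one is available; otherwise
    /// returns false and the caller decides whether to wait, queue, or fail.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        self.refill(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```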
API Patterns:
- Async API (tokio-based)
- Sync API (blocking wrappers)
- Streaming control (pause/resume/cancel)
- Dynamic configuration
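Pause/resume/cancel can be modeled as a small state machine. `StreamController` and `Phase` are hypothetical stand-ins (the crate's `StreamControl` and `StreamState` may be shaped differently). Chunks that arrive while paused are buffered and handed back on resume, and cancel drops everything.

```rust
use std::collections::VecDeque;

/// Hypothetical stream phases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Running,
    Paused,
    Cancelled,
}

/// Hypothetical controller that sits between the network stream and the consumer.
pub struct StreamController {
    phase: Phase,
    buffered: VecDeque<String>,
}

impl StreamController {
    pub fn new() -> Self {
        Self { phase: Phase::Running, buffered: VecDeque::new() }
    }

    pub fn phase(&self) -> Phase {
        self.phase
    }

    /// Only a running stream can be paused.
    pub fn pause(&mut self) -> Result<(), &'static str> {
        match self.phase {
            Phase::Running => {
                self.phase = Phase::Paused;
                Ok(())
            }
            _ => Err("can only pause a running stream"),
        }
    }

    /// Resumes and returns the chunks held back while paused, in arrival order.
    pub fn resume(&mut self) -> Result<Vec<String>, &'static str> {
        match self.phase {
            Phase::Paused => {
                self.phase = Phase::Running;
                Ok(self.buffered.drain(..).collect())
            }
            _ => Err("can only resume a paused stream"),
        }
    }

    /// Cancellation is terminal and discards anything still buffered.
    pub fn cancel(&mut self) {
        self.phase = Phase::Cancelled;
        self.buffered.clear();
    }

    /// Feeds one incoming chunk; returns it if it should be delivered now.
    pub fn on_chunk(&mut self, chunk: String) -> Option<String> {
        match self.phase {
            Phase::Running => Some(chunk),
            Phase::Paused => {
                self.buffered.push_back(chunk);
                None
            }
            Phase::Cancelled => None,
        }
    }
}
```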
§Installation
[dependencies]
api_ollama = { version = "0.1.0", features = ["full"] }
§Quick Start
use api_ollama::{ OllamaClient, ChatRequest, ChatMessage, MessageRole };
#[tokio::main]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
let mut client = OllamaClient::new(
"http://localhost:11434".to_string(),
std::time::Duration::from_secs( 30 )
);
// Check availability
if !client.is_available().await
{
println!( "Ollama is not available" );
return Ok( () );
}
// List available models
let models = client.list_models().await?;
println!( "Available models: {:?}", models );
// Send chat request
let request = ChatRequest
{
model: "llama3.2".to_string(),
messages: vec![ ChatMessage
{
role: MessageRole::User,
content: "Hello!".to_string(),
images: None,
#[cfg( feature = "tool_calling" )]
tool_calls: None,
}],
stream: None,
options: None,
#[cfg( feature = "tool_calling" )]
tools: None,
#[cfg( feature = "tool_calling" )]
tool_messages: None,
};
let response = client.chat( request ).await?;
println!( "Response: {:?}", response );
Ok( () )
}
§Feature Flags
| Feature | Description |
|---|---|
| enabled | Master switch for basic functionality |
| streaming | Real-time streaming responses |
| embeddings | Text embedding generation |
| vision_support | Image inputs for vision models |
| tool_calling | Function/tool calling support |
| builder_patterns | Fluent builder APIs |
| retry | Exponential backoff retry |
| circuit_breaker | Circuit breaker pattern |
| rate_limiting | Token bucket rate limiting |
| failover | Automatic endpoint failover |
| health_checks | Endpoint health monitoring |
| request_caching | Response caching with TTL |
| sync_api | Synchronous blocking API |
| full | Enable all features |
§Testing
# Unit tests
cargo nextest run
# Integration tests (requires running Ollama)
cargo nextest run --features integration
# Full validation
w3 .test level::3
Testing Policy: Integration tests require a running Ollama instance. Tests fail clearly when Ollama is unavailable.
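One way to make integration tests fail clearly is an explicit precondition. The hypothetical `require_ollama` helper below only checks that something accepts TCP connections at the given address; it does not verify that the listener is actually Ollama.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical precondition for integration tests: returns a descriptive
/// error instead of letting a later request fail with a confusing message.
pub fn require_ollama(addr: SocketAddr, timeout: Duration) -> Result<(), String> {
    TcpStream::connect_timeout(&addr, timeout)
        .map(|_| ())
        .map_err(|e| {
            format!(
                "Ollama is not reachable at {addr}: {e}. \
                 Start it with `ollama serve` before running integration tests."
            )
        })
}
```

An integration test might begin with `require_ollama("127.0.0.1:11434".parse().unwrap(), Duration::from_secs(2)).unwrap();`, so a missing server produces a readable panic message at the top of the test.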
§Documentation
- Implementation Roadmap - Feature priorities and guidelines
- Examples - Runnable code examples
- Tests - Test documentation
- Specification - Technical specification
§Dependencies
- reqwest: HTTP client with async support
- tokio: Async runtime
- serde/serde_json: Serialization
- error_tools: Unified error handling
§License
MIT
§Re-exports
pub use crate::tokens::TokenCountRequest;
pub use crate::tokens::TokenCountResponse;
pub use crate::tokens::CostEstimation;
pub use crate::tokens::BatchTokenRequest;
pub use crate::tokens::BatchTokenResponse;
pub use crate::tokens::TokenValidationConfig;
pub use crate::tokens::ModelTokenCapabilities;
pub use crate::cached_content::CachedContentRequest;
pub use crate::cached_content::CachedContentResponse;
pub use crate::cached_content::ContentCacheConfig;
pub use crate::cached_content::CacheInvalidationRequest;
pub use crate::cached_content::CacheInvalidationResponse;
pub use crate::cached_content::CachePerformanceMetrics;
pub use crate::cached_content::IntelligentCacheManager;
pub use crate::batch_operations::BatchChatRequest;
pub use crate::batch_operations::BatchChatResponse;
pub use crate::batch_operations::BatchGenerateRequest;
pub use crate::batch_operations::BatchGenerateResponse;
pub use crate::batch_operations::BatchOperationConfig;
pub use crate::batch_operations::BatchResult;
pub use crate::batch_operations::BatchError;
pub use crate::safety_settings::SafetyConfiguration;
pub use crate::safety_settings::HarmPreventionLevel;
pub use crate::safety_settings::ContentType;
pub use crate::safety_settings::ComplianceMode;
pub use crate::safety_settings::ContentFilterRequest;
pub use crate::safety_settings::ContentFilterResponse;
pub use crate::safety_settings::FilterCategory;
pub use crate::safety_settings::SeverityLevel;
pub use crate::safety_settings::SafetyAction;
pub use crate::safety_settings::HarmClassificationRequest;
pub use crate::safety_settings::HarmClassificationResponse;
pub use crate::safety_settings::HarmType;
pub use crate::safety_settings::HarmCategory;
pub use crate::safety_settings::SafetyPolicyEnforcement;
pub use crate::safety_settings::EnforcementLevel;
pub use crate::safety_settings::EscalationRule;
pub use crate::safety_settings::EscalationTrigger;
pub use crate::safety_settings::EscalationAction;
pub use crate::safety_settings::ComplianceReporting;
pub use crate::safety_settings::ReportFrequency;
pub use crate::safety_settings::ComplianceAuditTrail;
pub use crate::safety_settings::SafetyAssessment;
pub use crate::safety_settings::ComplianceStatus;
pub use crate::safety_settings::ComplianceReportRequest;
pub use crate::safety_settings::ReportType;
pub use crate::safety_settings::ReportFormat;
pub use crate::safety_settings::ComplianceReportResponse;
pub use crate::safety_settings::SafetyStatus;
pub use crate::safety_settings::SafetyPerformanceMetrics;
pub use crate::safety_settings::validate_safety_configuration;
§Modules
- audio - Audio processing functionality for the Ollama API client
- auth - Authentication and secret management functionality for Ollama API client.
- batch_operations - Batch Operations functionality for the Ollama API client
- buffered_streaming - Buffered Streaming for Smoother UX
- builders - Request builder patterns for Ollama API.
- cached_content - Cached Content functionality for the Ollama API client
- chat - Chat completion types for Ollama API.
- circuit_breaker - Circuit breaker implementation for preventing cascading failures.
- client - Ollama HTTP client implementation.
- compression - HTTP Compression for Request/Response Optimization
- curl_diagnostics - CURL Diagnostics for Debugging
- diagnostics - General diagnostics types and implementation for tracking and analyzing requests.
- dynamic_config - Dynamic configuration management for runtime updates.
- embeddings - Embeddings generation types for Ollama API.
- enhanced_function_calling - Enhanced function calling with type-safe execution and validation.
- enhanced_retry - Enhanced retry logic for Ollama API client with exponential backoff and jitter.
- enterprise_quota - Enterprise Quota Management and Usage Tracking
- exposed - Exposed namespace of the module.
- failover - Failover implementation for handling multiple endpoints.
- generate - Text generation types for Ollama API.
- health_checks - Health check types and implementation for monitoring endpoint availability.
- helpers - Helper functions for creating tool definitions with type safety
- input_validation - Input validation framework for request types.
- logging - Structured logging support for Ollama API client.
- messages - Message types for chat conversations.
- model_comparison - Model Comparison for A/B Testing
- models_additional - Additional model information types for Ollama API.
- models_enhanced - Enhanced model details types for Ollama API.
- models_info - Model information types for Ollama API.
- models_operations - Model operations types for Ollama API.
- orchestration - Orchestration helpers for managing tool execution workflows
- orphan - Orphan namespace of the module.
- own - Own namespace of the module.
- prelude - Prelude to use essentials: use my_module::prelude::*.
- rate_limiter - Rate limiting implementation for controlling request rates.
- request_cache - Request caching implementation with TTL and LRU eviction.
- request_templates - Request Templates for Common Use Cases
- safety_settings - Safety Settings functionality for the Ollama API client
- stream_control - Stream control functionality for Ollama API.
- sync_api - Synchronous API wrapper for Ollama client.
- tokens - Token counting functionality for the Ollama API client
- tuning - Model tuning functionality for Ollama API client.
- websocket - WebSocket streaming functionality for Ollama API client.
- workspace - Workspace secret management for Ollama client.
§Structs
- AudioProcessingConfig - Configuration for audio processing operations
- AudioStreamChunk - Individual chunk in an audio stream
- AudioStreamReceiver - Stream receiver for audio processing operations
- AudioStreamRequest - Request structure for audio streaming operations
- AuthHelper - Authentication helper functions
- BenchmarkResults - Benchmark results for multiple tasks
- BenchmarkTaskResult - Benchmark task result
- CacheEntry - Cache entry with metadata
- CacheStats - Cache statistics for monitoring
- ChatMessage - Enhanced message with vision support
- ChatRequest - Chat completion request
- ChatRequestBuilder - Builder for ChatRequest with fluent API
- ChatResponse - Chat completion response
- CircuitBreaker - Circuit breaker implementation for preventing cascading failures
- CircuitBreakerConfig - Configuration for circuit breaker behavior
- ComprehensiveModelInfo - Comprehensive model information
- ComprehensiveReport - Comprehensive diagnostics report
- ConfigBackup - Configuration backup for rollback functionality
- ConfigDiff - Configuration difference calculation
- ConfigVersion - Versioned configuration entry
- ConnectionPool - WebSocket connection pool for connection reuse
- ControlledStream - Wrapper for streams with control capabilities
- DataPreprocessor - Data preprocessing utilities
- DataUploadResult - Training data upload result
- DeleteModelRequest - Request for deleting a model
- DiagnosticsCollector - Diagnostics collector for tracking and analyzing requests
- DiagnosticsConfig - Configuration for diagnostics collection
- DynamicConfig - Dynamic configuration management for runtime updates
- DynamicConfigManager - Dynamic configuration manager for runtime updates
- EmbeddingsRequest - Embeddings generation request
- EmbeddingsRequestBuilder - Builder for EmbeddingsRequest with fluent API
- EmbeddingsResponse - Embeddings generation response
- EndpointInfo - Endpoint information with health tracking
- EnhancedModelDetails - Enhanced model details with comprehensive metadata
- ErrorAnalysis - Error analysis data for diagnostics
- ErrorClassifier - Error classifier for determining retry eligibility
- FailoverManager - Failover manager for handling multiple endpoints
- FailoverStats - Failover statistics and monitoring data
- GenerateRequest - Text generation request
- GenerateRequestBuilder - Builder for GenerateRequest with fluent API
- GenerateResponse - Text generation response
- HealthCheckConfig - Configuration for health check behavior
- HealthMetrics - Health metrics for monitoring and reporting
- HealthStatus - Health status information for an endpoint
- HyperParameters - Hyperparameters for model training
- LocalModelStorageInfo - Local model storage information
- Message - Message in chat conversation
- MessageQueue - Message queue for reliable delivery
- ModelBenchmark - Model benchmark configuration
- ModelCheckpoint - Model checkpoint information
- ModelDetails - Model details
- ModelDiagnostics - Model diagnostics information
- ModelEntry - Model entry from tags/list endpoint
- ModelEvaluation - Model evaluation metrics
- ModelHealthCheck - Model health check result
- ModelInfo - Model information
- ModelLifecycleStatus - Model lifecycle status information
- ModelMetadata - Comprehensive model metadata
- ModelOperationHistoryEntry - Model operation history entry
- ModelPerformanceMetrics - Model performance metrics
- ModelProgressUpdate - Progress update for model operations
- ModelRecommendation - Model family recommendation
- ModelTuningConfig - Model tuning configuration
- ModelTuningConfigBuilder - Model tuning configuration builder
- ModelTuningJob - Model tuning job
- ModelVersion - Model version information
- OllamaClient - Ollama HTTP client
- OllamaServerConfig - Ollama server configuration within workspace
- PerformanceReport - Performance report data
- PoolStatistics - Pool statistics
- PooledConnection - WebSocket connection pool entry
- PullModelRequest - Request for pulling a model
- PushModelRequest - Request for pushing a model
- QueueInfo - Queue information for WebSocket connections
- QueuedMessage - Message queue entry for reliable delivery
- RateLimiter - Enhanced rate limiter implementation
- RateLimitingConfig - Configuration for enhanced rate limiting behavior
- RecoveryStatus - Recovery status after connection issues
- RequestCache - Request cache implementation with TTL and LRU eviction
- RequestCacheConfig - Configuration for request caching behavior
- RequestMetrics - Request metrics for diagnostics tracking
- ResourceUsage - Resource usage metrics during training
- RetryConfig - Configuration for retry behavior
- RetryMetrics - Metrics for retry operations
- RetryStats - Snapshot of retry statistics
- RetryableHttpClient - Wrapper for HTTP operations with retry logic
- SecretConfig - Configuration for secret management
- SecretStore - Secure secret storage for credentials
- ShowModelRequest - Request for showing detailed model information
- SpeechToTextRequest - Request structure for speech-to-text conversion
- SpeechToTextResponse - Response structure for speech-to-text conversion
- StreamBuffer - Buffer for managing streaming data during pause states
- StreamControl - Main stream control interface
- StreamMetrics - Metrics for stream control operations
- SyncApiConfig - Configuration for synchronous API operations
- SyncApiConfigBuilder - Builder for SyncApiConfig
- SyncOllamaClient - Synchronous Ollama client that wraps async operations
- SyncRuntimeManager - Runtime manager for sync operations
- TagsResponse - Response from tags endpoint listing available models
- TextToSpeechRequest - Request structure for text-to-speech generation
- TextToSpeechResponse - Response structure for text-to-speech generation
- ThroughputReport - Throughput analysis report
- ToolCall - Tool call information
- ToolDefinition - Tool definition for function calling
- ToolMessage - Tool message for function responses
- ToolRegistry - Registry for managing and executing tools
- TrainingData - Training data container
- TrainingProgress - Training progress information
- TrainingSample - Training data sample
- ValidationResult - Training data validation result
- VoiceChatRequest - Request structure for voice chat functionality
- VoiceChatResponse - Response structure for voice chat functionality
- WebSocketAuth - Authentication context for WebSocket connections
- WebSocketChatStream - WebSocket chat stream for real-time conversation
- WebSocketClient - WebSocket client implementation
- WebSocketConfig - WebSocket configuration
- WebSocketConnection - WebSocket connection wrapper
- WebSocketMetrics - WebSocket connection metrics
- WebSocketPool - WebSocket pool for managing multiple connections
- WebSocketPoolConfig - WebSocket pool configuration
- WindowMetrics - Metrics within a specific time window
- WindowedMetrics - Time-windowed metrics for throughput analysis
- WorkspaceConfig - Workspace configuration for Ollama client
- WorkspaceSecretStore - Secret store that integrates with workspace_tools
§Enums
- AudioFormat - Audio format enumeration supporting common audio file types
- AuthStatus - Authentication status
- CircuitBreakerState - Circuit breaker states
- ConnectionType - Connection type for fallback scenarios
- EndpointHealth - Endpoint health status
- ErrorClassification - Classification of errors for retry decisions
- FailoverPolicy - Failover policy enum defining how failover should be handled
- HealthCheckStrategy - Health check strategy options
- MessageRole - Message roles for vision-enabled chat
- ModelLifecycle - Model lifecycle status
- ModelOperation - Model operation types
- RateLimitingAlgorithm - Rate limiting algorithm enumeration
- StreamControlError - Errors related to stream control operations
- StreamState - Streaming control functionality for pause/resume/cancel operations
- TrainingObjective - Training objectives for different tasks
- TuningJobStatus - Status of a model tuning job
- TuningMethod - Fine-tuning methods available
- WebSocketAuthMethod - WebSocket authentication method
- WebSocketError - WebSocket-specific error types
- WebSocketErrorHandling - WebSocket error handling strategy
- WebSocketMessage - WebSocket message types
- WebSocketState - WebSocket connection state
§Traits
- ToolExecutor - Trait for executable tools with type-safe parameter handling
§Functions
- calculate_retry_delay - Calculate delay for retry attempt with exponential backoff and jitter
- execute_with_retries - Execute an operation with retry logic
- retry_operation - Convenience function to create a retry operation from a closure
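Exponential backoff with jitter under explicit configuration can be sketched as follows. `RetrySettings`, `backoff_delay`, and `run_with_retries` are hypothetical; their signatures are not those of this crate's `calculate_retry_delay` or `execute_with_retries`. The delay is `base_delay * 2^attempt`, capped at `max_delay`, then scaled into the 50% to 100% range by a caller-supplied jitter value (the "equal jitter" variant). Taking jitter as an input keeps the delay function pure and testable.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical explicit retry settings; nothing retries unless these are passed in.
#[derive(Debug, Clone, Copy)]
pub struct RetrySettings {
    pub max_attempts: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

/// Delay before the retry that follows failed attempt `attempt` (0-based):
/// base * 2^attempt, capped at `max_delay`, scaled by 0.5 + 0.5 * jitter,
/// where `jitter` in [0, 1] comes from the caller (e.g. a random source).
pub fn backoff_delay(s: &RetrySettings, attempt: u32, jitter: f64) -> Duration {
    let exp = s.base_delay.saturating_mul(2u32.saturating_pow(attempt));
    let capped = exp.min(s.max_delay);
    let factor = 0.5 + 0.5 * jitter.clamp(0.0, 1.0);
    // Scale in integer nanoseconds to avoid float-to-Duration rounding surprises.
    let nanos = capped.as_nanos() as f64 * factor;
    Duration::from_nanos(nanos.round() as u64)
}

/// Runs `op` (which receives the attempt index) until it succeeds or
/// `max_attempts` attempts have failed, sleeping between failures.
/// At least one attempt is always made.
pub fn run_with_retries<T, E>(
    s: &RetrySettings,
    mut jitter: impl FnMut() -> f64,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) if attempt + 1 >= s.max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(backoff_delay(s, attempt, jitter()));
                attempt += 1;
            }
        }
    }
}
```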
§Type Aliases
- ModelProgressStream - Stream of progress updates
- OllamaResult - Result type for Ollama API operations
- ToolResult - Result type for tool execution