Crate api_ollama


Ollama local LLM runtime API client.

This crate provides HTTP client functionality for Ollama’s local LLM runtime API, following the “Thin Client, Rich API” governing principle.

§Governing Principle: “Thin Client, Rich API”

This library exposes all server-side functionality transparently while maintaining zero client-side intelligence or automatic behaviors. This ensures:

§1. API Transparency

  • Every method directly corresponds to an Ollama API endpoint
  • No hidden transformations or side effects
  • Method names clearly indicate exact server calls

§2. Zero Client Intelligence

  • No automatic decision-making or behavior inference
  • No automatic configuration-driven actions without explicit enabling
  • All behaviors are explicitly requested by developers

§3. Explicit Control

  • Developers have complete control over when and how API calls are made
  • No background operations without explicit configuration
  • Clear separation between information retrieval and action methods

§4. Information vs Action

  • Information methods (like list_models()) only retrieve data
  • Action methods (like chat()) only perform requested operations
  • No methods that implicitly combine information gathering with actions

§Enterprise Reliability Features

The following enterprise reliability features are permitted, provided they are enabled through explicit configuration and operate transparently:

  • Configurable Retry Logic: Exponential backoff with explicit configuration
  • Circuit Breaker Pattern: Failure threshold management with transparent state
  • Rate Limiting: Request throttling with explicit rate configuration
  • Failover Support: Multi-endpoint configuration and automatic switching
  • Health Checks: Periodic endpoint health verification and monitoring
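The explicit-configuration style of the retry feature can be illustrated with the delay computation behind exponential backoff. The sketch below is a minimal std-only illustration; the crate's actual `calculate_retry_delay` signature and jitter strategy may differ.

```rust
use std::time::Duration;

/// Compute the delay before a retry attempt: base * 2^attempt,
/// capped at `max`, plus a caller-supplied jitter component.
/// Illustrative sketch, not the crate's exact implementation.
fn backoff_delay( base : Duration, max : Duration, attempt : u32, jitter_ms : u64 ) -> Duration
{
  let exp = base.checked_mul( 2u32.saturating_pow( attempt ) ).unwrap_or( max );
  let capped = exp.min( max );
  capped + Duration::from_millis( jitter_ms )
}

fn main()
{
  let base = Duration::from_millis( 100 );
  let max = Duration::from_secs( 10 );
  // attempt 0 -> 100ms, attempt 1 -> 200ms, attempt 2 -> 400ms (jitter 0 here)
  assert_eq!( backoff_delay( base, max, 2, 0 ), Duration::from_millis( 400 ) );
  // large attempt counts are capped at the configured maximum
  assert_eq!( backoff_delay( base, max, 20, 0 ), max );
  println!( "ok" );
}
```

Every parameter (base, cap, jitter) comes from the caller; nothing is enabled or tuned automatically, consistent with the principle above.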

§State Management Policy

✅ ALLOWED: Runtime-Stateful, Process-Stateless

  • Connection pools, circuit breaker state, rate limiting buckets
  • Retry logic state, failover state, health check state
  • Runtime state that dies with the process
  • No persistent storage or cross-process state

❌ PROHIBITED: Process-Persistent State

  • File storage, databases, configuration accumulation
  • State that survives process restarts
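The runtime-stateful, process-stateless rule can be pictured with a circuit breaker whose state lives entirely in memory. This is an illustrative sketch; the crate's `CircuitBreaker` type has more states and configuration.

```rust
/// Circuit breaker states, held only in memory.
#[derive( Debug, PartialEq )]
enum BreakerState { Closed, Open }

/// Minimal breaker: trips open after `threshold` consecutive failures.
/// No file or database backing; the state dies with the process.
struct Breaker { state : BreakerState, failures : u32, threshold : u32 }

impl Breaker
{
  fn new( threshold : u32 ) -> Self
  {
    Self { state : BreakerState::Closed, failures : 0, threshold }
  }

  fn record_failure( &mut self )
  {
    self.failures += 1;
    if self.failures >= self.threshold { self.state = BreakerState::Open; }
  }

  fn record_success( &mut self )
  {
    self.failures = 0;
    self.state = BreakerState::Closed;
  }

  fn allows_requests( &self ) -> bool { self.state == BreakerState::Closed }
}

fn main()
{
  let mut b = Breaker::new( 3 );
  b.record_failure();
  b.record_failure();
  assert!( b.allows_requests() );
  b.record_failure(); // third consecutive failure trips the breaker
  assert!( !b.allows_requests() );
  b.record_success(); // explicit recovery resets state
  assert!( b.allows_requests() );
  println!( "ok" );
}
```

Restarting the process discards the breaker's state, which is exactly what the policy requires: nothing survives the process.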

Implementation Requirements:

  • Feature gating behind cargo features (retry, circuit_breaker, rate_limiting, failover, health_checks)
  • Explicit configuration required (no automatic enabling)
  • Transparent method naming (e.g., execute_with_retries(), execute_with_circuit_breaker())
  • Zero overhead when features disabled
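In Cargo.toml terms, the feature gating described above might look like the fragment below. The feature names come from the requirement list; the exact dependency wiring within each feature is an assumption.

```toml
[features]
default = []
retry = []
circuit_breaker = []
rate_limiting = []
failover = []
health_checks = []
full = [ "retry", "circuit_breaker", "rate_limiting", "failover", "health_checks" ]
```

With empty defaults, none of the reliability machinery is compiled in unless a consumer opts in, giving the zero-overhead-when-disabled property.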

This principle ensures predictable, explicit, and transparent behavior while supporting production-ready reliability features when explicitly requested.

§api_ollama

stable

Rust HTTP client for the Ollama local LLM runtime API.

§🎯 Architecture: Stateless HTTP Client

This API crate is designed as a stateless HTTP client with zero persistence requirements. It provides:

  • Direct HTTP calls to the Ollama API
  • In-memory operation state only (resets on restart)
  • No external storage dependencies (databases, files, caches)
  • No configuration persistence beyond environment variables

This ensures lightweight, containerized deployments and eliminates operational complexity.

§🏛️ Governing Principle: “Thin Client, Rich API”

Expose Ollama’s API directly without abstraction layers, enabling developers to access all capabilities with explicit control.

Key principles:

  • API Transparency: Every method directly corresponds to an Ollama API endpoint
  • Zero Client Intelligence: No automatic decision-making or behavior inference
  • Explicit Control: Developers control when and how API calls are made
  • Information vs Action: Clear separation between data retrieval and state changes

§Scope

§In Scope

  • Chat completions (single and multi-turn)
  • Text generation from prompts
  • Model management (list, pull, push, copy, delete)
  • Embeddings generation
  • Streaming responses
  • Tool/function calling
  • Vision support (image inputs)
  • Enterprise reliability (retry, circuit breaker, rate limiting, failover, health checks)
  • Synchronous API wrappers

§Out of Scope

  • Audio processing (Ollama API limitation)
  • Content moderation (Ollama API limitation)
  • High-level abstractions or unified interfaces
  • Business logic or application features

§Features

Core Capabilities:

  • Chat completions with configurable parameters
  • Text generation from prompts
  • Model listing and information
  • Embeddings generation
  • Real-time streaming responses
  • Tool/function calling support
  • Vision support for image inputs
  • Builder patterns for request construction
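The builder-pattern capability can be sketched in isolation. The `Req` and `ReqBuilder` types below are hypothetical stand-ins; the crate's actual builders, such as `ChatRequestBuilder`, expose analogous fluent setters over the real request types.

```rust
/// Hypothetical request type used to illustrate the fluent builder style.
#[derive( Debug, PartialEq )]
struct Req { model : String, stream : bool }

#[derive( Default )]
struct ReqBuilder { model : Option< String >, stream : bool }

impl ReqBuilder
{
  fn model( mut self, m : &str ) -> Self { self.model = Some( m.to_string() ); self }
  fn stream( mut self, s : bool ) -> Self { self.stream = s; self }

  /// Building fails explicitly when a required field is missing,
  /// rather than guessing a default model.
  fn build( self ) -> Result< Req, &'static str >
  {
    Ok( Req { model : self.model.ok_or( "model is required" )?, stream : self.stream } )
  }
}

fn main()
{
  let req = ReqBuilder::default().model( "llama3.2" ).stream( true ).build().unwrap();
  assert_eq!( req.model, "llama3.2" );
  assert!( req.stream );
  // a missing required field is an explicit error, not a hidden default
  assert!( ReqBuilder::default().build().is_err() );
  println!( "ok" );
}
```

Failing loudly on missing fields keeps the builder consistent with the zero-client-intelligence principle: nothing is inferred on the caller's behalf.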

Enterprise Reliability:

  • Exponential backoff retry logic
  • Circuit breaker pattern
  • Token bucket rate limiting
  • Automatic endpoint failover
  • Health monitoring
  • Response caching with TTL
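The token-bucket idea behind the rate limiting feature can be sketched with std only. This is illustrative; the crate's `RateLimiter` and `RateLimitingConfig` carry more options.

```rust
use std::time::Instant;

/// Minimal token bucket: `capacity` tokens, refilled at `rate` tokens/sec.
struct TokenBucket { capacity : f64, tokens : f64, rate : f64, last : Instant }

impl TokenBucket
{
  fn new( capacity : f64, rate : f64 ) -> Self
  {
    Self { capacity, tokens : capacity, rate, last : Instant::now() }
  }

  /// Try to consume one token, refilling based on elapsed time first.
  fn try_acquire( &mut self ) -> bool
  {
    let now = Instant::now();
    let elapsed = now.duration_since( self.last ).as_secs_f64();
    self.last = now;
    self.tokens = ( self.tokens + elapsed * self.rate ).min( self.capacity );
    if self.tokens >= 1.0 { self.tokens -= 1.0; true } else { false }
  }
}

fn main()
{
  // 2-token burst capacity with a very slow refill rate
  let mut bucket = TokenBucket::new( 2.0, 0.001 );
  assert!( bucket.try_acquire() );
  assert!( bucket.try_acquire() );
  // a third immediate request exceeds the burst and is throttled
  assert!( !bucket.try_acquire() );
  println!( "ok" );
}
```

The bucket is plain in-memory state, so it also respects the runtime-stateful, process-stateless policy above.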

API Patterns:

  • Async API (tokio-based)
  • Sync API (blocking wrappers)
  • Streaming control (pause/resume/cancel)
  • Dynamic configuration
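The sync API pattern reduces to a simple shape: dispatch work off-thread and block the caller until the result arrives. The std-only sketch below illustrates that shape; the actual `SyncOllamaClient` blocks on a tokio runtime rather than a plain thread.

```rust
use std::sync::mpsc;
use std::thread;

/// Run `work` off-thread and block the caller until the result arrives.
/// Mirrors the shape of a sync wrapper over an async client:
/// the caller sees an ordinary blocking function.
fn call_blocking< T, F >( work : F ) -> T
where
  T : Send + 'static,
  F : FnOnce() -> T + Send + 'static,
{
  let ( tx, rx ) = mpsc::channel();
  thread::spawn( move ||
  {
    let _ = tx.send( work() );
  });
  rx.recv().expect( "worker dropped without sending" )
}

fn main()
{
  // Stand-in for an async API call.
  let answer = call_blocking( || 2 + 2 );
  assert_eq!( answer, 4 );
  println!( "ok" );
}
```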

§Installation

[dependencies]
api_ollama = { version = "0.1.0", features = ["full"] }

§Quick Start

use api_ollama::{ OllamaClient, ChatRequest, ChatMessage, MessageRole };

#[tokio::main]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let mut client = OllamaClient::new(
    "http://localhost:11434".to_string(),
    std::time::Duration::from_secs( 30 )
  );

  // Check availability
  if !client.is_available().await
  {
    println!( "Ollama is not available" );
    return Ok( () );
  }

  // List available models
  let models = client.list_models().await?;
  println!( "Available models: {:?}", models );

  // Send chat request
  let request = ChatRequest
  {
    model: "llama3.2".to_string(),
    messages: vec![ ChatMessage
    {
      role: MessageRole::User,
      content: "Hello!".to_string(),
      images: None,
      #[cfg( feature = "tool_calling" )]
      tool_calls: None,
    }],
    stream: None,
    options: None,
    #[cfg( feature = "tool_calling" )]
    tools: None,
    #[cfg( feature = "tool_calling" )]
    tool_messages: None,
  };

  let response = client.chat( request ).await?;
  println!( "Response: {:?}", response );

  Ok( () )
}

§Feature Flags

  • enabled: Master switch for basic functionality
  • streaming: Real-time streaming responses
  • embeddings: Text embedding generation
  • vision_support: Image inputs for vision models
  • tool_calling: Function/tool calling support
  • builder_patterns: Fluent builder APIs
  • retry: Exponential backoff retry
  • circuit_breaker: Circuit breaker pattern
  • rate_limiting: Token bucket rate limiting
  • failover: Automatic endpoint failover
  • health_checks: Endpoint health monitoring
  • request_caching: Response caching with TTL
  • sync_api: Synchronous blocking API
  • full: Enable all features
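Rather than pulling in full as shown in the Installation section, a consumer can select only the features it needs. The default-features setting below is an assumption about how the crate's defaults are wired.

```toml
[dependencies]
api_ollama = { version = "0.1.0", default-features = false, features = [ "enabled", "streaming", "retry" ] }
```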

§Testing

# Unit tests
cargo nextest run

# Integration tests (requires running Ollama)
cargo nextest run --features integration

# Full validation
w3 .test level::3

Testing Policy: Integration tests require a running Ollama instance. Tests fail clearly when Ollama is unavailable.

§Dependencies

  • reqwest: HTTP client with async support
  • tokio: Async runtime
  • serde/serde_json: Serialization
  • error_tools: Unified error handling

§License

MIT

Re-exports§

pub use crate::tokens::TokenCountRequest;
pub use crate::tokens::TokenCountResponse;
pub use crate::tokens::CostEstimation;
pub use crate::tokens::BatchTokenRequest;
pub use crate::tokens::BatchTokenResponse;
pub use crate::tokens::TokenValidationConfig;
pub use crate::tokens::ModelTokenCapabilities;
pub use crate::cached_content::CachedContentRequest;
pub use crate::cached_content::CachedContentResponse;
pub use crate::cached_content::ContentCacheConfig;
pub use crate::cached_content::CacheInvalidationRequest;
pub use crate::cached_content::CacheInvalidationResponse;
pub use crate::cached_content::CachePerformanceMetrics;
pub use crate::cached_content::IntelligentCacheManager;
pub use crate::batch_operations::BatchChatRequest;
pub use crate::batch_operations::BatchChatResponse;
pub use crate::batch_operations::BatchGenerateRequest;
pub use crate::batch_operations::BatchGenerateResponse;
pub use crate::batch_operations::BatchOperationConfig;
pub use crate::batch_operations::BatchResult;
pub use crate::batch_operations::BatchError;
pub use crate::safety_settings::SafetyConfiguration;
pub use crate::safety_settings::HarmPreventionLevel;
pub use crate::safety_settings::ContentType;
pub use crate::safety_settings::ComplianceMode;
pub use crate::safety_settings::ContentFilterRequest;
pub use crate::safety_settings::ContentFilterResponse;
pub use crate::safety_settings::FilterCategory;
pub use crate::safety_settings::SeverityLevel;
pub use crate::safety_settings::SafetyAction;
pub use crate::safety_settings::HarmClassificationRequest;
pub use crate::safety_settings::HarmClassificationResponse;
pub use crate::safety_settings::HarmType;
pub use crate::safety_settings::HarmCategory;
pub use crate::safety_settings::SafetyPolicyEnforcement;
pub use crate::safety_settings::EnforcementLevel;
pub use crate::safety_settings::EscalationRule;
pub use crate::safety_settings::EscalationTrigger;
pub use crate::safety_settings::EscalationAction;
pub use crate::safety_settings::ComplianceReporting;
pub use crate::safety_settings::ReportFrequency;
pub use crate::safety_settings::ComplianceAuditTrail;
pub use crate::safety_settings::SafetyAssessment;
pub use crate::safety_settings::ComplianceStatus;
pub use crate::safety_settings::ComplianceReportRequest;
pub use crate::safety_settings::ReportType;
pub use crate::safety_settings::ReportFormat;
pub use crate::safety_settings::ComplianceReportResponse;
pub use crate::safety_settings::SafetyStatus;
pub use crate::safety_settings::SafetyPerformanceMetrics;
pub use crate::safety_settings::validate_safety_configuration;

Modules§

audio
Audio processing functionality for the Ollama API client
auth
Authentication and secret management functionality for Ollama API client.
batch_operations
Batch Operations functionality for the Ollama API client
buffered_streaming
Buffered Streaming for Smoother UX
builders
Request builder patterns for Ollama API.
cached_content
Cached Content functionality for the Ollama API client
chat
Chat completion types for Ollama API.
circuit_breaker
Circuit breaker implementation for preventing cascading failures.
client
Ollama HTTP client implementation.
compression
HTTP Compression for Request/Response Optimization
curl_diagnostics
CURL Diagnostics for Debugging
diagnostics
General diagnostics types and implementation for tracking and analyzing requests.
dynamic_config
Dynamic configuration management for runtime updates.
embeddings
Embeddings generation types for Ollama API.
enhanced_function_calling
Enhanced function calling with type-safe execution and validation.
enhanced_retry
Enhanced retry logic for Ollama API client with exponential backoff and jitter.
enterprise_quota
Enterprise Quota Management and Usage Tracking
exposed
Exposed namespace of the module.
failover
Failover implementation for handling multiple endpoints.
generate
Text generation types for Ollama API.
health_checks
Health check types and implementation for monitoring endpoint availability.
helpers
Helper functions for creating tool definitions with type safety
input_validation
Input validation framework for request types.
logging
Structured logging support for Ollama API client.
messages
Message types for chat conversations.
model_comparison
Model Comparison for A/B Testing
models_additional
Additional model information types for Ollama API.
models_enhanced
Enhanced model details types for Ollama API.
models_info
Model information types for Ollama API.
models_operations
Model operations types for Ollama API.
orchestration
Orchestration helpers for managing tool execution workflows
orphan
Orphan namespace of the module.
own
Own namespace of the module.
prelude
Prelude to use essentials: use my_module::prelude::*.
rate_limiter
Rate limiting implementation for controlling request rates.
request_cache
Request caching implementation with TTL and LRU eviction.
request_templates
Request Templates for Common Use Cases
safety_settings
Safety Settings functionality for the Ollama API client
stream_control
Stream control functionality for Ollama API.
sync_api
Synchronous API wrapper for Ollama client.
tokens
Token counting functionality for the Ollama API client
tuning
Model tuning functionality for Ollama API client.
websocket
WebSocket streaming functionality for Ollama API client.
workspace
Workspace secret management for Ollama client.

Structs§

AudioProcessingConfig
Configuration for audio processing operations
AudioStreamChunk
Individual chunk in an audio stream
AudioStreamReceiver
Stream receiver for audio processing operations
AudioStreamRequest
Request structure for audio streaming operations
AuthHelper
Authentication helper functions
BenchmarkResults
Benchmark results for multiple tasks
BenchmarkTaskResult
Benchmark task result
CacheEntry
Cache entry with metadata
CacheStats
Cache statistics for monitoring
ChatMessage
Enhanced message with vision support
ChatRequest
Chat completion request
ChatRequestBuilder
Builder for ChatRequest with fluent API
ChatResponse
Chat completion response
CircuitBreaker
Circuit breaker implementation for preventing cascading failures
CircuitBreakerConfig
Configuration for circuit breaker behavior
ComprehensiveModelInfo
Comprehensive model information
ComprehensiveReport
Comprehensive diagnostics report
ConfigBackup
Configuration backup for rollback functionality
ConfigDiff
Configuration difference calculation
ConfigVersion
Versioned configuration entry
ConnectionPool
WebSocket connection pool for connection reuse
ControlledStream
Wrapper for streams with control capabilities
DataPreprocessor
Data preprocessing utilities
DataUploadResult
Training data upload result
DeleteModelRequest
Request for deleting a model
DiagnosticsCollector
Diagnostics collector for tracking and analyzing requests
DiagnosticsConfig
Configuration for diagnostics collection
DynamicConfig
Dynamic configuration management for runtime updates
DynamicConfigManager
Dynamic configuration manager for runtime updates
EmbeddingsRequest
Embeddings generation request
EmbeddingsRequestBuilder
Builder for EmbeddingsRequest with fluent API
EmbeddingsResponse
Embeddings generation response
EndpointInfo
Endpoint information with health tracking
EnhancedModelDetails
Enhanced model details with comprehensive metadata
ErrorAnalysis
Error analysis data for diagnostics
ErrorClassifier
Error classifier for determining retry eligibility
FailoverManager
Failover manager for handling multiple endpoints
FailoverStats
Failover statistics and monitoring data
GenerateRequest
Text generation request
GenerateRequestBuilder
Builder for GenerateRequest with fluent API
GenerateResponse
Text generation response
HealthCheckConfig
Configuration for health check behavior
HealthMetrics
Health metrics for monitoring and reporting
HealthStatus
Health status information for an endpoint
HyperParameters
Hyperparameters for model training
LocalModelStorageInfo
Local model storage information
Message
Message in chat conversation
MessageQueue
Message queue for reliable delivery
ModelBenchmark
Model benchmark configuration
ModelCheckpoint
Model checkpoint information
ModelDetails
Model details
ModelDiagnostics
Model diagnostics information
ModelEntry
Model entry from tags/list endpoint
ModelEvaluation
Model evaluation metrics
ModelHealthCheck
Model health check result
ModelInfo
Model information
ModelLifecycleStatus
Model lifecycle status information
ModelMetadata
Comprehensive model metadata
ModelOperationHistoryEntry
Model operation history entry
ModelPerformanceMetrics
Model performance metrics
ModelProgressUpdate
Progress update for model operations
ModelRecommendation
Model family recommendation
ModelTuningConfig
Model tuning configuration
ModelTuningConfigBuilder
Model tuning configuration builder
ModelTuningJob
Model tuning job
ModelVersion
Model version information
OllamaClient
Ollama HTTP client
OllamaServerConfig
Ollama server configuration within workspace
PerformanceReport
Performance report data
PoolStatistics
Pool statistics
PooledConnection
WebSocket connection pool entry
PullModelRequest
Request for pulling a model
PushModelRequest
Request for pushing a model
QueueInfo
Queue information for WebSocket connections
QueuedMessage
Message queue entry for reliable delivery
RateLimiter
Enhanced rate limiter implementation
RateLimitingConfig
Configuration for enhanced rate limiting behavior
RecoveryStatus
Recovery status after connection issues
RequestCache
Request cache implementation with TTL and LRU eviction
RequestCacheConfig
Configuration for request caching behavior
RequestMetrics
Request metrics for diagnostics tracking
ResourceUsage
Resource usage metrics during training
RetryConfig
Configuration for retry behavior
RetryMetrics
Metrics for retry operations
RetryStats
Snapshot of retry statistics
RetryableHttpClient
Wrapper for HTTP operations with retry logic
SecretConfig
Configuration for secret management
SecretStore
Secure secret storage for credentials
ShowModelRequest
Request for showing detailed model information
SpeechToTextRequest
Request structure for speech-to-text conversion
SpeechToTextResponse
Response structure for speech-to-text conversion
StreamBuffer
Buffer for managing streaming data during pause states
StreamControl
Main stream control interface
StreamMetrics
Metrics for stream control operations
SyncApiConfig
Configuration for synchronous API operations
SyncApiConfigBuilder
Builder for SyncApiConfig
SyncOllamaClient
Synchronous Ollama client that wraps async operations
SyncRuntimeManager
Runtime manager for sync operations
TagsResponse
Response from tags endpoint listing available models
TextToSpeechRequest
Request structure for text-to-speech generation
TextToSpeechResponse
Response structure for text-to-speech generation
ThroughputReport
Throughput analysis report
ToolCall
Tool call information
ToolDefinition
Tool definition for function calling
ToolMessage
Tool message for function responses
ToolRegistry
Registry for managing and executing tools
TrainingData
Training data container
TrainingProgress
Training progress information
TrainingSample
Training data sample
ValidationResult
Training data validation result
VoiceChatRequest
Request structure for voice chat functionality
VoiceChatResponse
Response structure for voice chat functionality
WebSocketAuth
Authentication context for WebSocket connections
WebSocketChatStream
WebSocket chat stream for real-time conversation
WebSocketClient
WebSocket client implementation
WebSocketConfig
WebSocket configuration
WebSocketConnection
WebSocket connection wrapper
WebSocketMetrics
WebSocket connection metrics
WebSocketPool
WebSocket pool for managing multiple connections
WebSocketPoolConfig
WebSocket pool configuration
WindowMetrics
Metrics within a specific time window
WindowedMetrics
Time-windowed metrics for throughput analysis
WorkspaceConfig
Workspace configuration for Ollama client
WorkspaceSecretStore
Secret store that integrates with workspace_tools

Enums§

AudioFormat
Audio format enumeration supporting common audio file types
AuthStatus
Authentication status
CircuitBreakerState
Circuit breaker states
ConnectionType
Connection type for fallback scenarios
EndpointHealth
Endpoint health status
ErrorClassification
Classification of errors for retry decisions
FailoverPolicy
Failover policy enum defining how failover should be handled
HealthCheckStrategy
Health check strategy options
MessageRole
Message roles for vision-enabled chat
ModelLifecycle
Model lifecycle status
ModelOperation
Model operation types
RateLimitingAlgorithm
Rate limiting algorithm enumeration
StreamControlError
Errors related to stream control operations
StreamState
Stream states for pause/resume/cancel control operations
TrainingObjective
Training objectives for different tasks
TuningJobStatus
Status of a model tuning job
TuningMethod
Fine-tuning methods available
WebSocketAuthMethod
WebSocket authentication method
WebSocketError
WebSocket-specific error types
WebSocketErrorHandling
WebSocket error handling strategy
WebSocketMessage
WebSocket message types
WebSocketState
WebSocket connection state

Traits§

ToolExecutor
Trait for executable tools with type-safe parameter handling

Functions§

calculate_retry_delay
Calculate delay for retry attempt with exponential backoff and jitter
execute_with_retries
Execute an operation with retry logic
retry_operation
Convenience function to create a retry operation from a closure

Type Aliases§

ModelProgressStream
Stream of progress updates
OllamaResult
Result type for Ollama API operations
ToolResult
Result type for tool execution