# api_ollama
Rust HTTP client for the Ollama local LLM runtime API.
## 🎯 Architecture: Stateless HTTP Client
This API crate is designed as a stateless HTTP client with zero persistence requirements. It provides:
- Direct HTTP calls to the Ollama API
- In-memory operation state only (resets on restart)
- No external storage dependencies (databases, files, caches)
- No configuration persistence beyond environment variables
This ensures lightweight, containerized deployments and eliminates operational complexity.
## 🏛️ Governing Principle: "Thin Client, Rich API"
Expose Ollama's API directly without abstraction layers, enabling developers to access all capabilities with explicit control.
Key principles:
- API Transparency: Every method directly corresponds to an Ollama API endpoint
- Zero Client Intelligence: No automatic decision-making or behavior inference
- Explicit Control: Developers control when and how API calls are made
- Information vs Action: Clear separation between data retrieval and state changes
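The one-method-per-endpoint principle can be sketched as an explicit mapping. The `Endpoint` enum and `path` function below are illustrative, not the crate's actual types; the paths are the public Ollama REST endpoints:

```rust
// Sketch of "API Transparency": each client method targets exactly one
// Ollama REST endpoint, with no hidden routing or behavior inference.
// `Endpoint` and `path` are hypothetical names for illustration only.
enum Endpoint {
    Chat,     // multi-turn chat completions
    Generate, // text generation from a prompt
    Tags,     // list locally available models
    Embed,    // embeddings generation
}

fn path(e: &Endpoint) -> &'static str {
    match e {
        Endpoint::Chat => "/api/chat",
        Endpoint::Generate => "/api/generate",
        Endpoint::Tags => "/api/tags",
        Endpoint::Embed => "/api/embed",
    }
}

fn main() {
    // The caller always picks the endpoint explicitly.
    assert_eq!(path(&Endpoint::Chat), "/api/chat");
    println!("{}", path(&Endpoint::Generate));
}
```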
## Scope

### In Scope
- Chat completions (single and multi-turn)
- Text generation from prompts
- Model management (list, pull, push, copy, delete)
- Embeddings generation
- Streaming responses
- Tool/function calling
- Vision support (image inputs)
- Enterprise reliability (retry, circuit breaker, rate limiting, failover, health checks)
- Synchronous API wrappers
### Out of Scope
- Audio processing (Ollama API limitation)
- Content moderation (Ollama API limitation)
- High-level abstractions or unified interfaces
- Business logic or application features
## Features

**Core Capabilities:**
- Chat completions with configurable parameters
- Text generation from prompts
- Model listing and information
- Embeddings generation
- Real-time streaming responses
- Tool/function calling support
- Vision support for image inputs
- Builder patterns for request construction
**Enterprise Reliability:**
- Exponential backoff retry logic
- Circuit breaker pattern
- Token bucket rate limiting
- Automatic endpoint failover
- Health monitoring
- Response caching with TTL
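The retry behavior above can be sketched with the standard library alone. The function name, attempt count, and base delay are illustrative; the crate's `retry` feature presumably wraps HTTP calls in a similar loop:

```rust
use std::{thread, time::Duration};

// Minimal exponential-backoff sketch (std-only, illustrative; not the
// crate's actual API). The delay doubles on each failed attempt:
// base, 2*base, 4*base, ...
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e); // budget exhausted: surface the last error
                }
                thread::sleep(base_delay * 2u32.pow(attempt - 1));
            }
        }
    }
}

fn main() {
    // Simulate an operation that fails twice, then succeeds.
    let mut calls = 0;
    let result: Result<&str, &str> =
        retry_with_backoff(4, Duration::from_millis(1), || {
            calls += 1;
            if calls < 3 { Err("transient failure") } else { Ok("ok") }
        });
    assert_eq!(result, Ok("ok"));
    assert_eq!(calls, 3);
}
```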
**API Patterns:**
- Async API (tokio-based)
- Sync API (blocking wrappers)
- Streaming control (pause/resume/cancel)
- Dynamic configuration
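Explicit stream cancellation can be sketched with a shared flag that the consumer checks between chunks. The crate's actual streaming controls are tokio-based; this std-only version, with hypothetical names, only illustrates the pattern:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative cancel-flag pattern (not the crate's actual API): the
// consumer stops pulling chunks as soon as the caller sets the flag.
fn consume_stream(chunks: &[&str], cancelled: &AtomicBool) -> String {
    let mut out = String::new();
    for (i, chunk) in chunks.iter().enumerate() {
        if cancelled.load(Ordering::Relaxed) {
            break; // caller requested cancellation
        }
        out.push_str(chunk);
        if i == 1 {
            // Simulate the caller cancelling mid-stream after two chunks.
            cancelled.store(true, Ordering::Relaxed);
        }
    }
    out
}

fn main() {
    let cancelled = AtomicBool::new(false);
    let text = consume_stream(&["Hel", "lo", ", world"], &cancelled);
    assert_eq!(text, "Hello"); // third chunk never consumed
}
```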
## Installation

```toml
[dependencies]
api_ollama = { version = "0.1.0", features = ["full"] }
```
## Quick Start

A minimal async example (type and method names are illustrative; see the crate documentation for the actual API):

```rust
use api_ollama::OllamaClient; // client type name is illustrative

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Defaults to the local Ollama endpoint (http://localhost:11434).
    let client = OllamaClient::default();
    let reply = client.chat("llama3", "Hello!").await?; // method signature illustrative
    println!("{reply}");
    Ok(())
}
```
## Feature Flags
| Feature | Description |
|---|---|
| `enabled` | Master switch for basic functionality |
| `streaming` | Real-time streaming responses |
| `embeddings` | Text embedding generation |
| `vision_support` | Image inputs for vision models |
| `tool_calling` | Function/tool calling support |
| `builder_patterns` | Fluent builder APIs |
| `retry` | Exponential backoff retry |
| `circuit_breaker` | Circuit breaker pattern |
| `rate_limiting` | Token bucket rate limiting |
| `failover` | Automatic endpoint failover |
| `health_checks` | Endpoint health monitoring |
| `request_caching` | Response caching with TTL |
| `sync_api` | Synchronous blocking API |
| `full` | Enable all features |
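Instead of `full`, a leaner build can enable only the flags it needs. The feature names come from the table above; the exact combination (and whether `default-features = false` is required) depends on the crate's defaults and is illustrative:

```toml
[dependencies]
api_ollama = { version = "0.1.0", default-features = false, features = [
    "enabled",
    "streaming",
    "retry",
] }
```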
## Testing

```sh
# Unit tests
cargo test

# Integration tests (requires running Ollama; invocation illustrative)
cargo test -- --ignored

# Full validation (invocation illustrative)
cargo test --all-features
```

**Testing Policy:** Integration tests require a running Ollama instance. Tests fail clearly when Ollama is unavailable.
## Documentation
- Implementation Roadmap - Feature priorities and guidelines
- Examples - Runnable code examples
- Tests - Test documentation
- Specification - Technical specification
## Dependencies

- `reqwest`: HTTP client with async support
- `tokio`: Async runtime
- `serde`/`serde_json`: Serialization
- `error_tools`: Unified error handling
## License
MIT