# api_ollama
[![stability-stable](https://img.shields.io/badge/stability-stable-green.svg)](https://github.com/emersion/stability-badges#stable)
Rust HTTP client for the Ollama local LLM runtime API.
## 🎯 Architecture: Stateless HTTP Client
**This API crate is designed as a stateless HTTP client with zero persistence requirements.** It provides:
- Direct HTTP calls to the Ollama API
- In-memory operation state only (resets on restart)
- No external storage dependencies (databases, files, caches)
- No configuration persistence beyond environment variables
This keeps deployments lightweight and container-friendly and minimizes operational complexity.
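Because no configuration persists beyond environment variables, the only wiring a deployment needs can come from the environment. A minimal sketch, assuming the base URL is read from Ollama's standard `OLLAMA_HOST` variable (the fallback URL and timeout are illustrative):

```rust,no_run
use api_ollama::OllamaClient;

fn make_client() -> OllamaClient
{
  // Base URL from the environment, falling back to Ollama's default port.
  let base_url = std::env::var( "OLLAMA_HOST" )
    .unwrap_or_else( | _ | "http://localhost:11434".to_string() );
  OllamaClient::new( base_url, std::time::Duration::from_secs( 30 ) )
}
```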
## 🏛️ Governing Principle: "Thin Client, Rich API"
**Expose Ollama's API directly without abstraction layers, enabling developers to access all capabilities with explicit control.**
Key principles:
- **API Transparency**: Every method directly corresponds to an Ollama API endpoint
- **Zero Client Intelligence**: No automatic decision-making or behavior inference
- **Explicit Control**: Developers control when and how API calls are made
- **Information vs Action**: Clear separation between data retrieval and state changes
## Scope
### In Scope
- Chat completions (single and multi-turn)
- Text generation from prompts
- Model management (list, pull, push, copy, delete)
- Embeddings generation
- Streaming responses
- Tool/function calling
- Vision support (image inputs)
- Enterprise reliability (retry, circuit breaker, rate limiting, failover, health checks)
- Synchronous API wrappers
### Out of Scope
- Audio processing (Ollama API limitation)
- Content moderation (Ollama API limitation)
- High-level abstractions or unified interfaces
- Business logic or application features
## Features
**Core Capabilities:**
- Chat completions with configurable parameters
- Text generation from prompts
- Model listing and information
- Embeddings generation
- Real-time streaming responses
- Tool/function calling support
- Vision support for image inputs
- Builder patterns for request construction
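With the `builder_patterns` feature, the struct-literal request shown in the Quick Start below could shrink to a fluent chain. The method names here are hypothetical illustrations, not the crate's confirmed API:

```rust,ignore
// Hypothetical builder methods for illustration only; consult the crate
// docs for the actual `builder_patterns` API.
let request = ChatRequest::builder()
  .model( "llama3.2" )
  .message( MessageRole::User, "Hello!" )
  .build();
```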
**Enterprise Reliability:**
- Exponential backoff retry logic
- Circuit breaker pattern
- Token bucket rate limiting
- Automatic endpoint failover
- Health monitoring
- Response caching with TTL
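Each reliability feature is opt-in via its feature flag. As an illustration of the retry strategy only (not the crate's configuration API), a generic exponential-backoff loop looks like this:

```rust,no_run
use std::time::Duration;

// Retry a fallible async operation, doubling the delay after each
// failure. This sketches the strategy behind the `retry` feature;
// the crate's own configuration surface may differ.
async fn with_backoff< T, E, F, Fut >( mut op : F, max_attempts : u32 ) -> Result< T, E >
where
  F : FnMut() -> Fut,
  Fut : std::future::Future< Output = Result< T, E > >,
{
  let mut delay = Duration::from_millis( 100 );
  let mut attempt = 0u32;
  loop
  {
    match op().await
    {
      Ok( value ) => return Ok( value ),
      Err( err ) if attempt + 1 >= max_attempts => return Err( err ),
      Err( _ ) =>
      {
        tokio::time::sleep( delay ).await;
        delay *= 2; // 100ms, 200ms, 400ms, ...
        attempt += 1;
      }
    }
  }
}
```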
**API Patterns:**
- Async API (tokio-based)
- Sync API (blocking wrappers)
- Streaming control (pause/resume/cancel)
- Dynamic configuration
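The sync API runs the async client on a blocking runtime, which is convenient outside of async contexts. A minimal sketch of the idea, assuming the `sync_api` feature (the crate's actual blocking types may differ):

```rust,no_run
use api_ollama::OllamaClient;

fn main() -> Result< (), Box< dyn std::error::Error > >
{
  // A private runtime drives the async calls to completion.
  let runtime = tokio::runtime::Runtime::new()?;
  let mut client = OllamaClient::new(
    "http://localhost:11434".to_string(),
    std::time::Duration::from_secs( 30 )
  );
  let models = runtime.block_on( client.list_models() )?;
  println!( "Available models: {:?}", models );
  Ok( () )
}
```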
## Installation
```toml
[dependencies]
api_ollama = { version = "0.1.0", features = ["full"] }
```
## Quick Start
```rust,no_run
use api_ollama::{ OllamaClient, ChatRequest, ChatMessage, MessageRole };
#[tokio::main]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
let mut client = OllamaClient::new(
"http://localhost:11434".to_string(),
std::time::Duration::from_secs( 30 )
);
// Check availability
if !client.is_available().await
{
println!( "Ollama is not available" );
return Ok( () );
}
// List available models
let models = client.list_models().await?;
println!( "Available models: {:?}", models );
// Send chat request
let request = ChatRequest
{
model: "llama3.2".to_string(),
messages: vec![ ChatMessage
{
role: MessageRole::User,
content: "Hello!".to_string(),
images: None,
#[cfg( feature = "tool_calling" )]
tool_calls: None,
}],
stream: None,
options: None,
#[cfg( feature = "tool_calling" )]
tools: None,
#[cfg( feature = "tool_calling" )]
tool_messages: None,
};
let response = client.chat( request ).await?;
println!( "Response: {:?}", response );
Ok( () )
}
```
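With the `streaming` feature enabled, responses arrive incrementally. The crate's streaming method names are not shown here; the sketch below goes straight at the underlying endpoint, which streams newline-delimited JSON, using raw `reqwest` (assumes reqwest's `stream` feature and the `futures-util` crate; line handling is deliberately naive):

```rust,no_run
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let client = reqwest::Client::new();
  let body = serde_json::json!(
  {
    "model" : "llama3.2",
    "messages" : [ { "role" : "user", "content" : "Hello!" } ],
    "stream" : true
  });
  let response = client
    .post( "http://localhost:11434/api/chat" )
    .json( &body )
    .send()
    .await?;
  let mut stream = response.bytes_stream();
  while let Some( chunk ) = stream.next().await
  {
    let chunk = chunk?;
    // Naive parsing: assumes each chunk holds complete JSON lines.
    // Production code would buffer across chunk boundaries.
    for line in std::str::from_utf8( &chunk )?.lines()
    {
      if let Ok( value ) = serde_json::from_str::< serde_json::Value >( line )
      {
        if let Some( content ) = value[ "message" ][ "content" ].as_str()
        {
          print!( "{}", content );
        }
      }
    }
  }
  Ok( () )
}
```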
## Feature Flags
| Flag | Description |
|------|-------------|
| `enabled` | Master switch for basic functionality |
| `streaming` | Real-time streaming responses |
| `embeddings` | Text embedding generation |
| `vision_support` | Image inputs for vision models |
| `tool_calling` | Function/tool calling support |
| `builder_patterns` | Fluent builder APIs |
| `retry` | Exponential backoff retry |
| `circuit_breaker` | Circuit breaker pattern |
| `rate_limiting` | Token bucket rate limiting |
| `failover` | Automatic endpoint failover |
| `health_checks` | Endpoint health monitoring |
| `request_caching` | Response caching with TTL |
| `sync_api` | Synchronous blocking API |
| `full` | Enable all features |
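Features compose, so a deployment that only needs a subset can enable exactly those flags instead of `full`. For example, streaming chat with retries:

```toml
[dependencies]
api_ollama = { version = "0.1.0", features = [ "enabled", "streaming", "retry" ] }
```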
## Testing
```bash
# Unit tests
cargo nextest run
# Integration tests (requires running Ollama)
cargo nextest run --features integration
# Full validation
w3 .test level::3
```
**Testing Policy**: Integration tests require a running Ollama instance. Tests fail clearly when Ollama is unavailable.
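A sketch of a test honoring that policy, assuming tests are gated behind the `integration` feature used in the command above (the assertion wording is illustrative):

```rust,ignore
#[ cfg( feature = "integration" ) ]
#[ tokio::test ]
async fn ollama_must_be_running()
{
  let mut client = api_ollama::OllamaClient::new(
    "http://localhost:11434".to_string(),
    std::time::Duration::from_secs( 5 )
  );
  assert!(
    client.is_available().await,
    "Integration tests require a running Ollama instance at http://localhost:11434"
  );
}
```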
## Documentation
- **[Implementation Roadmap](docs/implementation_roadmap.md)** - Feature priorities and guidelines
- **[Examples](https://github.com/Wandalen/api_llm/tree/master/api/ollama/examples)** - Runnable code examples
- **[Tests](tests/)** - Test documentation
- **[Specification](spec/)** - Technical specification
## Dependencies
- **reqwest**: HTTP client with async support
- **tokio**: Async runtime
- **serde/serde_json**: Serialization
- **error_tools**: Unified error handling
## License
MIT