### Part II: Internal Design (Design Recommendations)
*The following sections provide recommended approaches for implementing the public contract defined in Part I. These are flexible guidelines that can be adapted based on implementation needs and constraints.*
### 11. System Architecture
It is recommended that the `api_ollama` library be implemented using a layered architecture with clear separation of concerns and modular components. The following architecture provides a robust foundation while maintaining simplicity and performance.
### Recommended High-Level Architecture
```
┌─────────────────────────────────────────┐
│          Consumer Application           │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│            Public API Layer             │
│  ┌─────────────┐   ┌─────────────────┐  │
│  │ OllamaClient│   │ Builder Pattern │  │
│  └─────────────┘   └─────────────────┘  │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│       Request/Response Processing       │
│  ┌─────────────┐   ┌─────────────────┐  │
│  │ Serializers │   │ Request/Response│  │
│  │   (serde)   │   │     Models      │  │
│  └─────────────┘   └─────────────────┘  │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│          HTTP Transport Layer           │
│  ┌─────────────┐   ┌─────────────────┐  │
│  │   reqwest   │   │ Stream Handler  │  │
│  │   Client    │   │ (if streaming)  │  │
│  └─────────────┘   └─────────────────┘  │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│              Ollama Server              │
│          (External Dependency)          │
└─────────────────────────────────────────┘
```
### Core Component Design Recommendations
**1. Client Architecture**
It is recommended that the main `OllamaClient` struct follow the builder pattern for construction and maintain minimal state:
- Store only essential configuration (base_url, timeout, http_client)
- Implement `Clone` to enable sharing across async tasks
- Use composition rather than inheritance for extending functionality
- Separate client construction from request execution logic
**2. Request/Response Model Layer**
It is recommended that all API interactions use strongly-typed Rust structs:
- Define separate request/response types for each endpoint
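- Use serde attributes for precise JSON field mapping (Ollama's JSON uses snake_case names such as `prompt_eval_count`, which match Rust field names directly, so `#[serde(rename)]` is needed only for exceptions)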
- Implement `Default` for request types to simplify usage
- Use `Option<T>` for optional fields with `skip_serializing_if` attributes
- Validate critical fields during deserialization where possible
**3. HTTP Transport Abstraction**
It is recommended that HTTP concerns be encapsulated in a dedicated layer:
- Wrap `reqwest::Client` to provide Ollama-specific behavior
- Handle common concerns (timeouts, headers, URL construction) centrally
- Implement consistent error mapping from HTTP to domain errors
- Support both streaming and non-streaming response patterns
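Centralizing URL construction means a `base_url` configured with or without a trailing slash yields the same endpoint. The helper below, `endpoint_url`, is a hypothetical std-only sketch; the real transport would pass its result to `reqwest`.

```rust
/// Joins a configured base URL and an API path such as "/api/chat",
/// normalizing the single slash between them (hypothetical helper).
pub fn endpoint_url(base_url: &str, path: &str) -> String {
    let base = base_url.trim_end_matches('/');
    let path = path.trim_start_matches('/');
    format!("{}/{}", base, path)
}
```

Because every request goes through one function, a later change (for example adding a path prefix for a reverse proxy) touches a single place.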
**4. Error Handling Strategy**
It is recommended that error handling follow a layered approach:
- Define domain-specific error types at the library boundary
- Map underlying errors (reqwest, serde) to domain errors with context
- Preserve error chains for debugging while providing clear user messages
- Use `error_tools` patterns for consistency with wTools ecosystem
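A std-only sketch of a boundary error type that keeps a readable context message while preserving the underlying cause through `std::error::Error::source`. The variant names are illustrative (only `FeatureNotEnabled` appears elsewhere in this document), and the real crate is expected to build on `error_tools` rather than hand-written impls.

```rust
use std::error::Error;
use std::fmt;

/// Domain error at the library boundary (illustrative variants).
#[derive(Debug)]
pub enum OllamaError {
    /// Transport failure: human-readable context plus the underlying cause.
    Network { context: String, source: Box<dyn Error + Send + Sync> },
    /// Response body could not be parsed: context plus the parser's error.
    Parse { context: String, source: Box<dyn Error + Send + Sync> },
    /// Functionality compiled out by feature flags.
    FeatureNotEnabled(String),
}

impl fmt::Display for OllamaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OllamaError::Network { context, .. } => write!(f, "network error: {}", context),
            OllamaError::Parse { context, .. } => write!(f, "parse error: {}", context),
            OllamaError::FeatureNotEnabled(msg) => write!(f, "feature not enabled: {}", msg),
        }
    }
}

impl Error for OllamaError {
    /// Exposes the wrapped cause so callers can walk the full error chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            OllamaError::Network { source, .. } | OllamaError::Parse { source, .. } => {
                let inner: &(dyn Error + 'static) = &**source;
                Some(inner)
            }
            OllamaError::FeatureNotEnabled(_) => None,
        }
    }
}
```

The `Display` text stays short for end users, while logging code can iterate `source()` to recover the reqwest or serde error underneath.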
### Streaming Architecture Recommendations
For streaming functionality, it is recommended to implement a separate processing path:
- Use async streams (`futures_core::Stream`) for real-time data flow
- Process chunks incrementally to maintain constant memory usage
- Handle partial JSON objects and reconstruction across chunk boundaries
- Implement proper cleanup and cancellation for interrupted streams
- Use backpressure mechanisms to prevent memory exhaustion
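Assuming the server frames each streamed object as one JSON document per line (newline-delimited JSON, which is how Ollama's streaming endpoints respond), reassembly across chunk boundaries reduces to buffering bytes until a newline arrives. The sketch below uses only std; `LineDecoder`, `DecodeError`, and the `max_line` cap are hypothetical names, and the cap is one way to keep a pending record from growing without bound.

```rust
/// Reassembles newline-delimited records from arbitrary network chunks.
pub struct LineDecoder {
    buffer: Vec<u8>,
    max_line: usize,
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// The unterminated record grew past `max_line` bytes.
    LineTooLong,
    /// A complete record was not valid UTF-8.
    InvalidUtf8,
}

impl LineDecoder {
    pub fn new(max_line: usize) -> Self {
        Self { buffer: Vec::new(), max_line }
    }

    /// Feeds one chunk and returns every record it completed, in order.
    pub fn push(&mut self, chunk: &[u8]) -> Result<Vec<String>, DecodeError> {
        self.buffer.extend_from_slice(chunk);
        let mut records = Vec::new();
        while let Some(pos) = self.buffer.iter().position(|&b| b == b'\n') {
            // Remove the record and its newline; the remainder stays buffered.
            let line: Vec<u8> = self.buffer.drain(..=pos).collect();
            let text = std::str::from_utf8(&line[..line.len() - 1])
                .map_err(|_| DecodeError::InvalidUtf8)?;
            let text = text.trim();
            if !text.is_empty() {
                records.push(text.to_string());
            }
        }
        // Only the pending, unterminated record is subject to the cap.
        if self.buffer.len() > self.max_line {
            return Err(DecodeError::LineTooLong);
        }
        Ok(records)
    }

    /// Returns a trailing record that was not newline-terminated, if any.
    pub fn finish(&mut self) -> Result<Option<String>, DecodeError> {
        let rest = std::mem::take(&mut self.buffer);
        let text = String::from_utf8(rest).map_err(|_| DecodeError::InvalidUtf8)?;
        let text = text.trim();
        Ok(if text.is_empty() { None } else { Some(text.to_string()) })
    }
}
```

Each returned record would then be deserialized (for example with `serde_json::from_str`) into a `ChatResponse`, and the stream closed once a record reports `done: true`.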
### Implemented Module Organization
The library uses a highly modular structure with feature-gated components, which keeps lib.rs focused on orchestration and public exports (1,251 lines, under the 1,500-line limit):
```
src/
├── lib.rs # Public exports, feature gates, mod_interface orchestration (1,251 lines)
├── client.rs # OllamaClient struct and core implementation (3,117 lines)
│
├── Core API Types (feature-gated)
├── messages.rs # Message types for chat (enabled feature)
├── chat.rs # Chat request/response types (enabled feature)
├── generate.rs # Generation request/response types (enabled feature)
├── embeddings.rs # Embedding request/response types (embeddings feature)
├── models_info.rs # Model listing types (enabled feature)
├── models_operations.rs # Model operations (model_details feature)
├── models_enhanced.rs # Enhanced model metadata (model_details feature)
├── models_additional.rs # Comprehensive model info (model_details feature)
│
├── Advanced Features (feature-gated)
├── circuit_breaker.rs # Circuit breaker pattern (circuit_breaker feature) - 314 lines
├── rate_limiter.rs # Rate limiting (rate_limiting feature) - 434 lines
├── request_cache.rs # Request caching (request_caching feature) - 547 lines
├── failover.rs # Failover management (failover feature) - 346 lines
├── diagnostics.rs # Diagnostics collection (general_diagnostics feature) - 903 lines
├── health_checks.rs # Health monitoring (health_checks feature) - 629 lines
├── dynamic_config.rs # Dynamic configuration (dynamic_config feature) - 430 lines
├── sync_api.rs # Synchronous API wrapper (sync_api feature) - 421 lines
│
├── Specialized Features
├── audio.rs # Audio processing types (audio_processing feature)
├── tokens.rs # Token counting (count_tokens feature)
├── tuning.rs # Model tuning (model_tuning feature)
├── cached_content.rs # Intelligent caching (cached_content feature)
├── batch_operations.rs # Batch processing (batch_operations feature)
├── safety_settings.rs # Content safety (safety_settings feature)
├── websocket.rs # WebSocket streaming (websocket_streaming feature)
│
├── Supporting Modules
├── auth.rs # Authentication (secret_management, workspace features)
├── workspace.rs # Workspace configuration (workspace feature)
├── enhanced_retry.rs # Retry logic (retry feature)
├── streaming_control.rs # Stream control (streaming_control feature)
└── models.rs # Backward compatibility shim
```
### Module Extraction Pattern
All extracted modules follow the `mod_interface` pattern for controlled exports:
```rust
// Module structure (e.g., circuit_breaker.rs)
#[ cfg( feature = "circuit_breaker" ) ]
mod private
{
  // All implementation inside private module
  pub struct CircuitBreaker { /* ... */ }
  pub struct CircuitBreakerConfig { /* ... */ }
  impl CircuitBreaker { /* ... */ }
}

#[ cfg( feature = "circuit_breaker" ) ]
crate::mod_interface!
{
  exposed use private::CircuitBreaker;
  exposed use private::CircuitBreakerConfig;
  // Controlled public exports
}
```
This pattern ensures:
- Feature-gated compilation (only included when feature enabled)
- Controlled public API surface (via mod_interface macro)
- Clear separation between public and private implementation
- Consistent module organization across the codebase
### File Size Management
To prevent future violations of the 1,500-line limit for lib.rs:
1. **Core principle**: lib.rs should only contain:
   - Feature gate declarations
   - Module declarations
   - Public re-exports via mod_interface
   - Minimal orchestration code
2. **When to extract**: Extract impl blocks or types when:
   - lib.rs approaches 1,000 lines (SHOULD limit)
   - A feature-gated section exceeds 300 lines
   - A logical component can be independently tested
3. **Client extension pattern**: For extending OllamaClient from a feature module:
   ```rust
   #[ cfg( feature = "feature_name" ) ]
   use crate::client::OllamaClient;

   #[ cfg( feature = "feature_name" ) ]
   impl OllamaClient
   {
     // Feature-specific methods go here ("feature_name" is a placeholder)
   }
   ```
4. **Module extraction checklist**:
   - Create feature-gated module with `mod private {}`
   - Move types/impls to private module
   - Add `mod_interface!` exports
   - Add module declaration in lib.rs
   - Update imports in lib.rs private module
   - Remove original code from lib.rs
   - Verify compilation with all features
   - Run tests to ensure no breakage
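A minimal std-only sketch of the client extension pattern: the struct is defined in one module and a second module adds an inherent `impl` block to it, which Rust permits anywhere in the same crate. The `#[cfg]` attributes are omitted so the snippet compiles on its own, and `estimate_tokens` is a hypothetical method, not part of the library.

```rust
mod client {
    /// Stand-in for the real client struct defined in client.rs.
    #[derive(Debug, Clone)]
    pub struct OllamaClient {
        pub base_url: String,
    }
}

mod tokens {
    // In the real crate this module would be gated by its feature flag.
    use super::client::OllamaClient;

    impl OllamaClient {
        /// Hypothetical feature-specific method added from another module.
        pub fn estimate_tokens(&self, text: &str) -> usize {
            // Whitespace split as a placeholder, not a real tokenizer.
            text.split_whitespace().count()
        }
    }
}
```

Callers see a single `OllamaClient` type; whether `estimate_tokens` exists depends only on whether the `tokens` module was compiled.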
### Configuration and Extensibility
It is recommended that the library provide multiple configuration approaches:
- Default configuration for common use cases (localhost development)
- Builder pattern for custom configuration (production deployments)
- Environment variable support for containerized environments
- Custom HTTP client injection for advanced networking requirements
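One way to combine these approaches is a fixed precedence: an explicit builder value wins, then an environment variable, then the localhost default. In this sketch the variable name `OLLAMA_HOST` is borrowed from Ollama's own tooling and its use by this client is an assumption, as are the helper names; values without a scheme are assumed to be plain HTTP.

```rust
use std::env;

/// Default endpoint for a local Ollama server.
pub const DEFAULT_BASE_URL: &str = "http://localhost:11434";

/// Treats missing and blank values alike.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Precedence: explicit builder value, then environment value, then default.
/// The environment value is passed in so the rule is testable in isolation.
pub fn resolve_base_url(explicit: Option<&str>, env_value: Option<&str>) -> String {
    let chosen = non_empty(explicit)
        .or(non_empty(env_value))
        .unwrap_or(DEFAULT_BASE_URL);
    // Values like "127.0.0.1:11434" carry no scheme; assume plain HTTP.
    if chosen.starts_with("http://") || chosen.starts_with("https://") {
        chosen.to_string()
    } else {
        format!("http://{}", chosen)
    }
}

/// Reads the (assumed) OLLAMA_HOST variable from the process environment.
pub fn base_url_from_env(explicit: Option<&str>) -> String {
    let env_value = env::var("OLLAMA_HOST").ok();
    resolve_base_url(explicit, env_value.as_deref())
}
```

Keeping the environment lookup in a thin wrapper lets containerized deployments override the endpoint without code changes, while tests exercise `resolve_base_url` directly.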
### Performance Optimization Recommendations
To meet the performance requirements, it is recommended to:
- Reuse HTTP connections through connection pooling (reqwest default behavior)
- Use efficient JSON parsing with serde's zero-copy deserialization where possible
- Issue independent requests concurrently so the shared connection pool can serve them in parallel
- Cache serialized request bodies for repeated identical requests
- Use streaming processing to reduce memory footprint for large responses
### Testing Architecture Integration
The architecture should support comprehensive testing through:
- Dependency injection points for mock HTTP clients in unit tests
- Clear boundaries between components to enable isolated testing
- Integration test hooks that work with real Ollama server instances
- Feature-gated integration tests that don't impact production builds
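A dependency-injection seam for mock HTTP clients can be as small as a trait that the client is generic over. This synchronous std-only sketch is illustrative: `Transport`, `Client`, and `MockTransport` are hypothetical, and a real transport would be async and wrap `reqwest::Client`.

```rust
use std::cell::RefCell;

/// Seam between the client and the network.
pub trait Transport {
    fn post_json(&self, path: &str, body: &str) -> Result<String, String>;
}

/// Client generic over its transport so unit tests can inject a fake.
pub struct Client<T: Transport> {
    transport: T,
}

impl<T: Transport> Client<T> {
    pub fn new(transport: T) -> Self {
        Self { transport }
    }

    /// Sends a pre-serialized chat request body to /api/chat.
    pub fn chat_raw(&self, body: &str) -> Result<String, String> {
        self.transport.post_json("/api/chat", body)
    }
}

/// Test double that records every call and returns a canned response.
pub struct MockTransport {
    pub response: String,
    pub calls: RefCell<Vec<(String, String)>>,
}

impl MockTransport {
    pub fn new(response: &str) -> Self {
        Self { response: response.to_string(), calls: RefCell::new(Vec::new()) }
    }
}

impl Transport for MockTransport {
    fn post_json(&self, path: &str, body: &str) -> Result<String, String> {
        self.calls.borrow_mut().push((path.to_string(), body.to_string()));
        Ok(self.response.clone())
    }
}
```

Unit tests then assert on the recorded path and body without a running Ollama server; integration tests swap in the real transport.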
### 12. Rust Library Design: Granular Feature Gating
The `api_ollama` library **must** be designed with a granular feature-gating strategy to allow consumers to opt-in to only the functionality they need. This approach significantly benefits consumers by minimizing their dependency graph, reducing final binary sizes, and speeding up compile times.
### Principle: Disable-by-Default Architecture
The library **must** follow a disable-by-default philosophy where the minimal build contains only the absolute essential functionality. All optional capabilities **must** be gated behind explicit feature flags.
### Required Feature Flag Structure
**1. Core Control Features**
```toml
[features]
# Conservative defaults - users opt-in to only what they need
default = []
# Full feature set for comprehensive functionality
full = ["enabled", "streaming", "integration", "authentication", "advanced", "workspace"]
# Master switch for core library functionality
enabled = [
"dep:reqwest",
"dep:serde",
"dep:serde_json",
"dep:tokio",
"dep:futures-core",
"error_tools/enabled"
]
```
**2. Functional Features**
```toml
# Real-time streaming response support
streaming = [
"enabled",
"dep:futures-util"
]
# Integration testing with real Ollama servers
integration = [
"enabled"
]
# Development and debugging utilities
diagnostics = [
"enabled",
"dep:serde_yaml" # For request/response debugging output
]
# Authentication and security features
authentication = [
"enabled"
]
# Advanced production features
advanced = [
"enabled",
"dep:tokio-util" # For rate limiting and retry logic
]
# Workspace integration features
workspace = [
"enabled",
"dep:serde_yaml", # For workspace config file parsing
"dep:workspace_tools" # For workspace path resolution and secret management
]
```
### Implementation Requirements
**1. Minimal Core Build**
With no features enabled, the library **must** compile to a minimal no-op implementation:
- All public APIs present but return immediate errors indicating feature not enabled
- Zero runtime dependencies beyond Rust standard library
- Compile time under 5 seconds on modern hardware
- Binary size contribution under 50KB
**2. Conditional Compilation Strategy**
All feature-dependent code **must** use conditional compilation:
```rust
#[cfg(feature = "streaming")]
use futures_core::Stream;

// Core client always available but functionality gated
impl OllamaClient {
  #[cfg(feature = "enabled")]
  pub async fn chat(&self, request: ChatRequest) -> OllamaResult<ChatResponse> {
    // Implementation here
  }

  #[cfg(not(feature = "enabled"))]
  pub async fn chat(&self, _request: ChatRequest) -> OllamaResult<ChatResponse> {
    Err(OllamaError::FeatureNotEnabled("chat requires 'enabled' feature".to_string()))
  }
}

// Streaming methods only exist when feature is enabled
#[cfg(feature = "streaming")]
impl OllamaClient {
  pub async fn chat_stream(
    &self,
    request: ChatRequest,
  ) -> OllamaResult<impl Stream<Item = OllamaResult<ChatResponse>>> {
    // Streaming implementation
  }
}
```
**3. Dependency Management**
All external dependencies **must** be optional and activated only by relevant features:
```toml
[dependencies]
# All dependencies optional by default
reqwest = { workspace = true, features = ["json", "stream"], optional = true }
serde = { workspace = true, features = ["derive"], optional = true }
serde_json = { workspace = true, optional = true }
tokio = { workspace = true, features = ["time"], optional = true }
futures-core = { workspace = true, optional = true }
futures-util = { workspace = true, optional = true }
error_tools = { workspace = true, optional = true }
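workspace_tools = { workspace = true, features = ["secret_management"], optional = true }
serde_yaml = { workspace = true, optional = true }
tokio-util = { workspace = true, optional = true }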
[dev-dependencies]
# Development dependencies don't contribute to consumer builds
tokio-stream = { workspace = true }
```
### Feature Interaction Matrix
| Feature Set | Dependencies Pulled In | Capabilities |
|---|---|---|
| `[]` (none) | std only | Error stubs only |
| `["enabled"]` | reqwest, serde, tokio | Core HTTP client |
| `["enabled", "streaming"]` | + futures-util | + Real-time streams |
| `["enabled", "integration"]` | Same as enabled | + Integration test markers |
| `["full"]` | All dependencies | Complete functionality |
### Documentation Requirements
The library's documentation **must** clearly explain:
- Default feature set and how to customize it
- Each feature's purpose and dependency impact
- Examples showing minimal feature usage
- Migration path between feature configurations
**Example Documentation Section**:
```rust
//! # Feature Flags
//!
//! This crate uses feature flags to enable optional functionality. The
//! `default` feature set is empty, so every capability is opt-in:
//!
//! - `enabled`: Core HTTP client functionality
//! - `streaming`: Real-time streaming response support (implies `enabled`)
//! - `integration`: Markers for integration tests (no dependencies beyond `enabled`)
//! - `full`: All features enabled
//!
//! ## Minimal Usage
//! ```toml
//! [dependencies]
//! api_ollama = { version = "0.1", default-features = false, features = ["enabled"] }
//! ```
```
### Rationale and Benefits
This granular feature gating approach provides:
1. **Reduced Compilation Time**: Consumers building without streaming avoid futures-util compilation
2. **Smaller Binary Size**: Minimal builds exclude streaming infrastructure code
3. **Dependency Flexibility**: Consumers can avoid version conflicts with unused dependencies
4. **Testing Efficiency**: Integration tests don't force production dependency requirements
5. **Ecosystem Compatibility**: Follows established patterns from other wTools crates
The architecture ensures that simple use cases remain lightweight while complex applications can opt-in to full functionality as needed.
### 13. Infrastructure Support
This section compares deployment options for applications using the `api_ollama` library, ranging from minimal development setups to production-scale deployments.
### Deployment Option A: Budget Development Setup
**Target Scenario**: Local development, prototyping, and small-scale applications with minimal infrastructure requirements.
**Configuration Characteristics**:
- Single `OllamaClient` instance per application
- Default timeout settings (120 seconds)
- Basic error handling with standard logging
- Minimal feature set (`enabled` only, no streaming)
- Local Ollama server on same machine
**Resource Requirements**:
- CPU: Single core sufficient for client overhead
- Memory: <10MB additional memory usage for client
- Network: Local TCP connections (no external dependencies)
- Storage: No persistent storage requirements
**Cost Profile**: $0 recurring costs for HTTP client (infrastructure costs depend on Ollama server hosting)
**Limitations Inherited from This Setup**:
- Sequential or low-concurrency request processing (a handful of requests in flight at once)
- Basic error handling without advanced retry logic
- No streaming response support
- Limited to localhost Ollama server connectivity
### Deployment Option B: Production-Scale Setup
**Target Scenario**: Production applications with high availability, scalability, and advanced monitoring requirements.
**Configuration Characteristics**:
- Multiple `OllamaClient` instances with load balancing
- Custom timeout configuration per use case
- Comprehensive error handling with structured logging
- Full feature set including streaming support
- Remote Ollama server with network optimization
- Advanced HTTP client configuration (custom headers, proxy support)
**Resource Requirements**:
- CPU: Multi-core recommended for concurrent request handling
- Memory: ~50MB for connection pooling and stream buffers
- Network: Optimized TCP connections with keep-alive
- Monitoring: Integration with observability platforms
**Advanced Capabilities**:
- Support for 100+ concurrent requests per client instance
- Streaming response processing with backpressure handling
- Advanced error recovery and circuit breaker patterns (application-level)
- Performance monitoring and metrics collection
- Integration with production logging and tracing systems
**Scalability Characteristics**:
- Horizontal scaling through multiple client instances
- Connection pool optimization for reduced latency
- Stream processing with configurable buffer sizes
- Advanced timeout and retry strategies
### Deployment Comparison Matrix
| Aspect | Budget Development Setup | Production-Scale Setup |
|---|---|---|
| **Concurrent Requests** | 1-10 requests | 100+ concurrent |
| **Features** | Basic client only | Full feature set |
| **Error Handling** | Basic errors | Comprehensive with context |
| **Monitoring** | Console logging | Structured metrics |
| **Network Optimization** | Default settings | Custom configuration |
| **Scalability** | Single instance | Multiple instances |
| **Operational Complexity** | Minimal | Advanced configuration |
### Infrastructure Decision Framework
**Choose Budget Setup When**:
- Developing locally or prototyping
- Single-user applications or internal tools
- Minimal concurrent request requirements
- Simple error handling sufficient
**Choose Production Setup When**:
- Multi-user production applications
- High concurrent request volumes
- Advanced monitoring and observability required
- Streaming response processing needed
- Remote Ollama server deployment
This infrastructure analysis helps developers select the appropriate deployment approach based on their specific requirements and constraints.
### 14. Data Stores
This section identifies the data stores and access patterns for the `api_ollama` library. As an HTTP client library, the data storage requirements are minimal and focused on configuration and transient request/response data.
### Primary Data Access Patterns
The library operates primarily with transient data flows and requires no persistent storage mechanisms. The following data access patterns are recommended:
| Data | Storage | Access Pattern | Purpose |
|---|---|---|---|
| **Client Configuration** | In-Memory Struct | Read-Only | Store base URL, timeout, HTTP client settings |
| **Request Models** | In-Memory Struct | Write-Once/Read-Once | Serialize user input to JSON for API calls |
| **Response Models** | In-Memory Struct | Write-Once/Read-Many | Deserialize API responses for user consumption |
| **HTTP Connection Pool** | In-Memory (reqwest) | Read/Write | Reuse connections for performance optimization |
| **Stream Buffers** | In-Memory Queue | Read/Write/Stream | Buffer streaming response chunks during processing |
| **Error Context** | In-Memory Struct | Write-Once/Read-Once | Capture error details and context for debugging |
### Configuration Data Management
It is recommended that client configuration be managed through immutable structures:
**Configuration Storage Pattern**:
- Store configuration in `OllamaClient` struct fields
- Use `Arc<>` wrapping for shared configuration across async tasks
- Implement `Clone` for lightweight configuration sharing
- Validate configuration at construction time, not at request time
**Recommended Configuration Schema**:
```rust
use std::time::Duration;

#[derive(Debug, Clone)]
pub struct ClientConfig {
  pub base_url: String,             // Ollama server endpoint
  pub timeout: Duration,            // Request timeout duration
  pub http_client: reqwest::Client, // HTTP client instance (clones share one connection pool)
}
```
### Request/Response Data Flow
The library should implement a stateless data flow pattern for API interactions:
**Request Processing Flow**:
1. **Input Validation**: Validate user-provided request structures
2. **Serialization**: Convert Rust structs to JSON using serde
3. **HTTP Transmission**: Send JSON payload via reqwest
4. **Response Capture**: Receive HTTP response and extract body
5. **Deserialization**: Convert JSON response to Rust structs
6. **Error Mapping**: Transform HTTP/parsing errors to domain errors
**Recommended Data Lifetime Management**:
- Request data: Lives only for the duration of the API call
- Response data: Owned by caller, no internal caching or persistence
- Error data: Includes full context chain but does not persist beyond error handling
- Stream data: Processed incrementally with bounded memory usage
### Memory Management Strategy
It is recommended that the library implement the following memory management patterns:
**For Non-Streaming Operations**:
- Use owned data structures (`String`, `Vec<T>`) for simplicity and safety
- Avoid unnecessary cloning through strategic use of references
- Rely on Rust's automatic memory management for cleanup
- Size limits enforced at HTTP layer to prevent excessive allocations
**For Streaming Operations** (when streaming feature enabled):
- Maintain fixed-size buffers (recommended: 64KB chunks)
- Process data incrementally to avoid accumulating large responses in memory
- Use `futures::Stream` abstractions for backpressure management
- Implement proper cleanup for interrupted or failed streams
### Caching and Persistence Constraints
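The core library explicitly **does not** implement persistent data storage or caching; the opt-in `request_caching` and `workspace` features listed under Module Organization are deliberate exceptions to the rules below: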
**No Persistent Storage**:
- No response caching mechanisms
- No request deduplication or replay storage
- No session or authentication token persistence
- No configuration file reading or writing
**Rationale for Stateless Design**:
- Simplifies library architecture and reduces complexity
- Avoids file system permissions and storage location concerns
- Enables easy integration into various application architectures
- Reduces security surface area (no data persistence vulnerabilities)
- Maintains focus on core HTTP client functionality
### Thread Safety and Concurrent Access
It is recommended that data access be designed for concurrent usage:
**Thread-Safe Patterns**:
- Client configuration stored in `Arc<ClientConfig>` for shared access
- HTTP client (`reqwest::Client`) is inherently thread-safe and shareable
- Request/response data owned by individual async tasks (no sharing required)
- Error handling uses owned data structures to avoid lifetime complexities
**Concurrency Considerations**:
- Multiple async tasks can share a single `OllamaClient` instance safely
- Each API call operates on independent request/response data
- Stream processing maintains isolation between concurrent streams
- No global mutable state that would require synchronization primitives
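The sketch below shows why cloning the client is cheap and safe: every clone holds the same `Arc<ClientConfig>`, and each worker owns its own request data. It uses std threads in place of async tasks so it runs without a runtime; the struct layout and `run_workers` are assumptions, and a real client would also hold a `reqwest::Client`, which is itself cheap to clone.

```rust
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Immutable configuration shared by every clone of the client.
#[derive(Debug)]
pub struct ClientConfig {
    pub base_url: String,
    pub timeout: Duration,
}

/// Cloning copies only the Arc pointer, so all clones see one configuration.
#[derive(Debug, Clone)]
pub struct OllamaClient {
    config: Arc<ClientConfig>,
}

impl OllamaClient {
    pub fn new(base_url: &str, timeout: Duration) -> Self {
        Self { config: Arc::new(ClientConfig { base_url: base_url.to_string(), timeout }) }
    }

    pub fn base_url(&self) -> &str {
        &self.config.base_url
    }

    pub fn timeout(&self) -> Duration {
        self.config.timeout
    }

    /// Number of live handles sharing this configuration.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.config)
    }
}

/// Spawns one worker per task; each gets its own clone and builds its own data.
pub fn run_workers(client: &OllamaClient, tasks: usize) -> Vec<String> {
    let handles: Vec<_> = (0..tasks)
        .map(|i| {
            let client = client.clone();
            thread::spawn(move || format!("{}/api/chat#{}", client.base_url(), i))
        })
        .collect();
    handles.into_iter().map(|h| h.join().expect("worker panicked")).collect()
}
```

No locks are needed because nothing shared is mutated; once the workers finish, their clones are dropped and the reference count returns to one.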
### Data Validation and Integrity
The library should implement validation at appropriate boundaries:
**Input Validation**:
- Validate base URLs during client construction
- Validate required fields in request structures
- Enforce reasonable limits on request sizes and parameter values
- Provide clear error messages for validation failures
**Response Validation**:
- Verify JSON structure matches expected response schemas
- Handle missing or null fields gracefully using `Option<T>`
- Validate response completeness for streaming operations
- Map malformed responses to appropriate error types
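Input validation can run before any serialization or network I/O. This std-only sketch checks a simplified `ChatRequest` (serde derives omitted); the `max_messages` limit is a hypothetical policy knob, and the accepted role set is an assumption to verify against the Ollama API documentation.

```rust
/// Simplified request models for the sketch.
#[derive(Debug, Clone)]
pub struct Message {
    pub role: String,
    pub content: String,
}

#[derive(Debug, Default)]
pub struct ChatRequest {
    pub model: String,
    pub messages: Vec<Message>,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    EmptyModel,
    TooManyMessages { count: usize, max: usize },
    UnknownRole { index: usize, role: String },
}

/// Roles accepted by this sketch (assumed set).
const KNOWN_ROLES: [&str; 4] = ["system", "user", "assistant", "tool"];

/// Rejects requests that would certainly fail, reporting the first problem found.
pub fn validate_chat_request(req: &ChatRequest, max_messages: usize) -> Result<(), ValidationError> {
    if req.model.trim().is_empty() {
        return Err(ValidationError::EmptyModel);
    }
    if req.messages.len() > max_messages {
        return Err(ValidationError::TooManyMessages { count: req.messages.len(), max: max_messages });
    }
    for (index, msg) in req.messages.iter().enumerate() {
        if !KNOWN_ROLES.contains(&msg.role.as_str()) {
            return Err(ValidationError::UnknownRole { index, role: msg.role.clone() });
        }
    }
    Ok(())
}
```

Returning a structured error (including the offending index) keeps the failure message precise without a round trip to the server.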
This stateless, validation-focused approach ensures the library remains lightweight while providing robust error handling and thread-safe operation patterns.
### 15. Architectural & Flow Diagrams
This section provides visual representations of the library's architecture and key operational flows using Mermaid diagrams for version-controllable documentation.
### System Context Diagram
The following diagram shows how the `api_ollama` library fits within the broader application ecosystem:
```mermaid
graph TD
subgraph "Developer's Application Environment"
A[Rust Application Code]
B[api_ollama Library]
C[Application Logic]
end
subgraph "AI Infrastructure"
D[Ollama Server Instance]
E[LLM Models<br/>llama2, mistral, etc.]
F[Model Storage]
end
subgraph "Development Tools"
G[Cargo Build System]
H[IDE/Editor]
I[Testing Framework]
end
J[Developer] --> H
J --> A
A --> C
A --> B
B -->|HTTP API Calls| D
D --> E
D --> F
G --> B
I --> B
style B fill:#e1f5fe
style D fill:#fff3e0
style A fill:#e8f5e8
```
### Deployment Architecture Diagram
This diagram illustrates common deployment patterns for applications using the library:
```mermaid
graph TD
subgraph "Production Environment"
subgraph "Application Tier"
A[Web Application]
B[Background Services]
C[api_ollama Clients]
end
subgraph "AI Infrastructure Tier"
D[Ollama Server]
E[GPU Resources]
F[Model Storage]
end
subgraph "Monitoring"
G[Logging System]
H[Metrics Collection]
end
end
I[End Users] --> A
A --> C
B --> C
C -->|HTTP/HTTPS| D
D --> E
D --> F
C -.->|Logs| G
C -.->|Metrics| H
style C fill:#e1f5fe
style D fill:#fff3e0
```
### High-Level Architecture Diagram
The following diagram illustrates the overall system structure and component relationships:
```mermaid
graph TD
subgraph "Consumer Application"
A[Rust Application Code]
end
subgraph "api_ollama Library"
B[OllamaClient]
C[Builder Pattern]
D[Request Models]
E[Response Models]
F[HTTP Transport]
G[Error Handler]
H[Stream Processor]
end
subgraph "External Dependencies"
I[reqwest HTTP Client]
J[serde JSON Parser]
K[tokio Runtime]
end
subgraph "Ollama Server"
L["/api/chat"]
M["/api/generate"]
N["/api/tags"]
O["/api/show"]
end
A --> B
A --> C
C --> B
B --> D
B --> E
B --> F
B --> G
F --> H
F --> I
D --> J
E --> J
G --> J
H --> K
I --> L
I --> M
I --> N
I --> O
```
### Chat Completion Sequence Diagram
This diagram shows the complete flow for a chat completion request:
```mermaid
sequenceDiagram
participant App as Consumer App
participant Client as OllamaClient
participant HTTP as HTTP Transport
participant Server as Ollama Server
App->>Client: chat(ChatRequest)
activate Client
Client->>Client: Validate request
Client->>HTTP: POST /api/chat
activate HTTP
HTTP->>Server: HTTP Request + JSON
activate Server
Server-->>HTTP: HTTP Response + JSON
deactivate Server
HTTP->>HTTP: Parse JSON response
HTTP-->>Client: ChatResponse
deactivate HTTP
Client-->>App: Result<ChatResponse>
deactivate Client
```
### Streaming Response Flow Diagram
This diagram illustrates the streaming response processing pattern:
```mermaid
flowchart TD
A[Client receives stream request] --> B{Streaming feature enabled?}
B -->|No| C[Return FeatureNotEnabled error]
B -->|Yes| D[Set stream=true in request]
D --> E[Send HTTP request]
E --> F[Receive chunked response]
F --> G[Parse chunk as JSON]
G --> H{Valid JSON?}
H -->|No| I[Buffer partial data]
H -->|Yes| J[Yield parsed response]
J --> K{done: true?}
K -->|No| L[Continue to next chunk]
K -->|Yes| M[Close stream]
I --> N[Read next chunk]
N --> O[Combine with buffered data]
O --> G
L --> F
```
### Error Handling Flow Diagram
This diagram shows the error categorization and handling process:
```mermaid
graph TD
A[Operation starts] --> B[Execute HTTP request]
B --> C{HTTP Success?}
C -->|Yes| D[Parse JSON response]
D --> E{JSON Valid?}
E -->|Yes| F[Return success result]
E -->|No| G[ParseError]
C -->|No| H{Error Type?}
H -->|Timeout| I[NetworkError::Timeout]
H -->|Connection| J[NetworkError::Connection]
H -->|4xx Client| K[ApiError::ClientError]
H -->|5xx Server| L[ApiError::ServerError]
H -->|Other| M[NetworkError::Unknown]
G --> N[Log error context]
I --> N
J --> N
K --> N
L --> N
M --> N
N --> O[Return Error Result]
```
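The branching in the diagram above can be expressed as a single classification function. In this sketch `Outcome` is a hypothetical summary of what the transport observed; with reqwest it could be derived from `reqwest::Error::is_timeout`, `reqwest::Error::is_connect`, and the response status. JSON parse failures happen after a successful status and are handled separately.

```rust
/// What the transport observed for one request (hypothetical summary).
#[derive(Debug)]
pub enum Outcome {
    Status(u16),
    Timeout,
    ConnectFailed,
    Other(String),
}

#[derive(Debug, PartialEq)]
pub enum NetworkError {
    Timeout,
    Connection,
    Unknown(String),
}

#[derive(Debug, PartialEq)]
pub enum ApiError {
    ClientError(u16),
    ServerError(u16),
}

#[derive(Debug, PartialEq)]
pub enum Classified {
    Success,
    Network(NetworkError),
    Api(ApiError),
}

/// Maps a transport outcome onto the error categories of the flow diagram.
pub fn classify(outcome: Outcome) -> Classified {
    match outcome {
        Outcome::Status(200..=299) => Classified::Success,
        Outcome::Status(code @ 400..=499) => Classified::Api(ApiError::ClientError(code)),
        Outcome::Status(code @ 500..=599) => Classified::Api(ApiError::ServerError(code)),
        Outcome::Status(code) => {
            Classified::Network(NetworkError::Unknown(format!("unexpected status {}", code)))
        }
        Outcome::Timeout => Classified::Network(NetworkError::Timeout),
        Outcome::ConnectFailed => Classified::Network(NetworkError::Connection),
        Outcome::Other(msg) => Classified::Network(NetworkError::Unknown(msg)),
    }
}
```

Keeping classification in one pure function makes every branch of the diagram unit-testable without network access.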
### Feature Gating Architecture Diagram
This diagram illustrates how feature flags control compilation and functionality:
```mermaid
graph LR
subgraph "Feature Flags"
A[enabled]
B[streaming]
C[integration]
D[full]
end
subgraph "Core Components"
E[OllamaClient]
F[HTTP Transport]
G[Request/Response Models]
end
subgraph "Optional Components"
H[Stream Processor]
I[Integration Test Markers]
end
subgraph "Dependencies"
J[reqwest]
K[serde]
L[tokio]
M[futures-util]
end
A --> E
A --> F
A --> G
A --> J
A --> K
A --> L
B --> H
B --> M
C --> I
D --> A
D --> B
D --> C
style A fill:#e1f5fe
style B fill:#f3e5f5
style C fill:#e8f5e8
style D fill:#fff3e0
```
### Data Flow Architecture
This diagram shows how data moves through the library components:
```mermaid
flowchart LR
subgraph "Input Layer"
A[ChatRequest]
B[GenerateRequest]
end
subgraph "Processing Layer"
C[Validation]
D[JSON Serialization]
E[HTTP Client]
F[Response Processing]
G[JSON Deserialization]
end
subgraph "Output Layer"
H[ChatResponse]
I[GenerateResponse]
J[OllamaError]
end
A --> C
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
G --> I
C -->|Validation Failure| J
D -->|Serialization Error| J
E -->|Network Error| J
F -->|HTTP Error| J
G -->|Parse Error| J
style A fill:#e3f2fd
style B fill:#e3f2fd
style H fill:#e8f5e8
style I fill:#e8f5e8
style J fill:#ffebee
```
### Diagram Usage Guidelines
Each diagram serves a specific purpose in understanding the library architecture:
- **System Context**: Shows library's role in broader application ecosystem for stakeholders
- **Deployment Architecture**: Illustrates common production deployment patterns
- **High-Level Architecture**: Provides overview of system boundaries and major components
- **Chat Completion Sequence**: Details the most common usage pattern step-by-step
- **Streaming Response Flow**: Explains the complex streaming processing logic
- **Error Handling Flow**: Documents the comprehensive error categorization strategy
- **Feature Gating Architecture**: Shows how compilation features control functionality
- **Data Flow Architecture**: Illustrates request/response processing pipeline
These diagrams should be maintained as the implementation evolves to ensure documentation remains synchronized with the actual codebase structure.
### 16. Internal Data Models
This section defines the recommended internal data structures and their relationships for implementing the `api_ollama` library. These models serve as the foundation for type-safe API interactions and robust error handling.
### Core Client Data Models
**Client Configuration Model**
```rust
use std::time::Duration;

/// Internal configuration structure for OllamaClient
#[derive(Debug, Clone)]
pub struct ClientConfig {
  /// Base URL for Ollama server (e.g., "http://localhost:11434")
  pub base_url: String,
  /// Request timeout duration
  pub timeout: Duration,
  /// Shared HTTP client instance; `reqwest::Client` is already reference-counted
  /// internally, so cloning it shares a single connection pool without an extra `Arc`
  pub http_client: reqwest::Client,
}

impl Default for ClientConfig
{
  fn default() -> Self
  {
    Self
    {
      base_url: "http://localhost:11434".to_string(),
      timeout: Duration::from_secs( 120 ),
      http_client: reqwest::Client::new(),
    }
  }
}
```
**Client Builder Model**
```rust
use std::time::Duration;

/// Builder for constructing OllamaClient instances
#[derive(Debug)]
pub struct ClientBuilder {
  base_url: String,
  timeout: Duration,
  custom_client: Option<reqwest::Client>,
}
```
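A minimal sketch of the builder's consuming setters and its construction-time validation. The `custom_client` field is left out so the example depends only on std; `BuildError` and the specific checks (HTTP(S) scheme with a non-empty host, non-zero timeout) are assumptions rather than the library's defined behavior. Defaults mirror the `ClientConfig` default above.

```rust
use std::time::Duration;

/// Validated configuration produced by the builder.
#[derive(Debug, Clone, PartialEq)]
pub struct ClientConfig {
    pub base_url: String,
    pub timeout: Duration,
}

#[derive(Debug, PartialEq)]
pub enum BuildError {
    InvalidBaseUrl(String),
    ZeroTimeout,
}

#[derive(Debug)]
pub struct ClientBuilder {
    base_url: String,
    timeout: Duration,
}

impl Default for ClientBuilder {
    fn default() -> Self {
        Self { base_url: "http://localhost:11434".to_string(), timeout: Duration::from_secs(120) }
    }
}

impl ClientBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consuming setters allow chained calls.
    pub fn base_url(mut self, url: impl Into<String>) -> Self {
        self.base_url = url.into();
        self
    }

    pub fn timeout(mut self, timeout: Duration) -> Self {
        self.timeout = timeout;
        self
    }

    /// Validates once at construction time rather than on every request.
    pub fn build(self) -> Result<ClientConfig, BuildError> {
        let url = self.base_url.trim_end_matches('/').to_string();
        let host = url
            .strip_prefix("http://")
            .or_else(|| url.strip_prefix("https://"));
        if host.map_or(true, str::is_empty) {
            return Err(BuildError::InvalidBaseUrl(self.base_url));
        }
        if self.timeout.is_zero() {
            return Err(BuildError::ZeroTimeout);
        }
        Ok(ClientConfig { base_url: url, timeout: self.timeout })
    }
}
```

Because `build` consumes the builder, an invalid configuration can never reach request execution; callers handle `BuildError` once at startup.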
### Request/Response Data Models
**Common Message Model**
```rust
use serde::{Deserialize, Serialize};

/// A message in a chat conversation
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Message {
  /// Role of the message sender (e.g., "user", "assistant")
  pub role: String,
  /// Content of the message
  pub content: String,
}

impl Message
{
  /// Creates a message with the "user" role
  pub fn user( content: impl Into< String > ) -> Self
  {
    Self
    {
      role: "user".to_string(),
      content: content.into(),
    }
  }

  /// Creates a message with the "assistant" role
  pub fn assistant( content: impl Into< String > ) -> Self
  {
    Self
    {
      role: "assistant".to_string(),
      content: content.into(),
    }
  }
}
```
**Chat API Models**
```rust
use serde::{Deserialize, Serialize};

/// Request structure for chat completion (`Message` as defined above)
#[derive(Debug, Serialize, Default)]
pub struct ChatRequest {
  /// Model name to use for completion
  pub model: String,
  /// Messages in the conversation
  pub messages: Vec<Message>,
  /// Whether to stream the response
  #[serde(skip_serializing_if = "Option::is_none")]
  pub stream: Option<bool>,
  /// Additional model parameters
  #[serde(skip_serializing_if = "Option::is_none")]
  pub options: Option<serde_json::Value>,
}

/// Response structure for chat completion
#[derive(Debug, Deserialize)]
pub struct ChatResponse {
  /// Generated message from the model
  #[serde(default)]
  pub message: Option<Message>,
  /// Whether generation is complete
  #[serde(default)]
  pub done: bool,
  /// Total time taken for generation in nanoseconds
  #[serde(default)]
  pub total_duration: Option<u64>,
  /// Number of tokens in the prompt
  #[serde(default)]
  pub prompt_eval_count: Option<u32>,
  /// Number of tokens generated
  #[serde(default)]
  pub eval_count: Option<u32>,
}
```
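Because `ChatRequest` derives `Default`, callers can set only the required fields and fill in the rest with struct-update syntax; with `skip_serializing_if`, the unset options then stay out of the JSON body. The sketch below shows this on simplified copies of the models (serde derives and `options` removed) and adds two hypothetical accessors, `total_time` and `total_tokens`, which are not part of the models above.

```rust
use std::time::Duration;

// Simplified stand-ins for the chat models above (std-only).
#[ derive( Debug, Clone, Default ) ]
pub struct Message
{
  pub role: String,
  pub content: String,
}

#[ derive( Debug, Default ) ]
pub struct ChatRequest
{
  pub model: String,
  pub messages: Vec< Message >,
  pub stream: Option< bool >,
}

#[ derive( Debug, Default ) ]
pub struct ChatResponse
{
  pub message: Option< Message >,
  pub done: bool,
  pub total_duration: Option< u64 >,
  pub prompt_eval_count: Option< u32 >,
  pub eval_count: Option< u32 >,
}

impl ChatResponse
{
  /// Hypothetical accessor: `total_duration` is reported in nanoseconds,
  /// which `Duration::from_nanos` converts directly.
  pub fn total_time( &self ) -> Option< Duration >
  {
    self.total_duration.map( Duration::from_nanos )
  }

  /// Hypothetical accessor: prompt plus generated tokens, if both were reported.
  pub fn total_tokens( &self ) -> Option< u32 >
  {
    self.prompt_eval_count?.checked_add( self.eval_count? )
  }
}
```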
**Generation API Models**
```rust
/// Request structure for text generation
#[derive(Debug, Serialize, Default)]
pub struct GenerateRequest {
/// Model name to use for generation
pub model: String,
/// Text prompt for generation
pub prompt: String,
#[serde(skip_serializing_if = "Option::is_none")]
/// Whether to stream the response
pub stream: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
/// Additional model parameters
pub options: Option<serde_json::Value>,
}
/// Response structure for text generation
#[derive(Debug, Deserialize)]
pub struct GenerateResponse {
#[serde(default)]
/// Generated text response
pub response: String,
#[serde(default)]
/// Whether generation is complete
pub done: bool,
#[serde(default)]
/// Total time taken for generation in nanoseconds
pub total_duration: Option<u64>,
#[serde(default)]
/// Number of tokens in the prompt
pub prompt_eval_count: Option<u32>,
#[serde(default)]
/// Number of tokens generated
pub eval_count: Option<u32>,
}
```
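When `stream` is true, each `GenerateResponse` carries only the next fragment of text in `response`, and the final object (`done: true`) carries the statistics. A minimal sketch of folding such a sequence into the full text, assuming the chunks have already been deserialized; `collect_generation` is a hypothetical helper and the struct is trimmed to the fields it uses.

```rust
/// Trimmed copy of `GenerateResponse` (serde derives and timing fields omitted).
#[ derive( Debug, Default ) ]
pub struct GenerateResponse
{
  pub response: String,
  pub done: bool,
  pub eval_count: Option< u32 >,
}

/// Hypothetical helper: concatenates streamed fragments until the `done`
/// chunk arrives and returns the text with the final token count.
/// Returns `None` if the stream ends without a `done` chunk.
pub fn collect_generation< I >( chunks: I ) -> Option< ( String, Option< u32 > ) >
where
  I: IntoIterator< Item = GenerateResponse >,
{
  let mut text = String::new();
  for chunk in chunks
  {
    text.push_str( &chunk.response );
    if chunk.done
    {
      return Some( ( text, chunk.eval_count ) );
    }
  }
  None
}
```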
**Model Management Data Models**
```rust
/// Information about a specific model
#[derive(Debug, Deserialize)]
pub struct ModelInfo {
/// Name of the model
pub name: String,
/// Size of the model in bytes
pub size: u64,
/// SHA256 digest of the model
pub digest: String,
/// Timestamp when the model was last modified
pub modified_at: String,
}
/// Response from listing available models
#[derive(Debug, Deserialize)]
pub struct TagsResponse {
/// List of available models
pub models: Vec<ModelInfo>,
}
```
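A sketch of lookups a caller might want on `TagsResponse`, assuming Ollama's convention that a model name without a tag refers to the `:latest` tag. `find` and `total_size` are hypothetical helpers, and the serde derives are dropped so the block is self-contained.

```rust
#[ derive( Debug ) ]
pub struct ModelInfo
{
  pub name: String,
  pub size: u64,
  pub digest: String,
  pub modified_at: String,
}

#[ derive( Debug ) ]
pub struct TagsResponse
{
  pub models: Vec< ModelInfo >,
}

impl TagsResponse
{
  /// Hypothetical lookup: an untagged name such as "llama3" is treated as "llama3:latest".
  pub fn find( &self, name: &str ) -> Option< &ModelInfo >
  {
    // Look for a tag only after the last '/', so a registry prefix such as
    // "host:5000/model" is not mistaken for a tagged name.
    let has_tag = name.rsplit( '/' ).next().map_or( false, | last | last.contains( ':' ) );
    let wanted = if has_tag { name.to_string() } else { format!( "{}:latest", name ) };
    self.models.iter().find( | m | m.name == wanted )
  }

  /// Combined size of all listed models, in bytes.
  pub fn total_size( &self ) -> u64
  {
    self.models.iter().map( | m | m.size ).sum()
  }
}
```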
### Error Data Models
**Primary Error Enumeration**
```rust
/// Comprehensive error types for Ollama API operations
#[derive(Debug)]
pub enum OllamaError {
/// Network-related errors (connection, timeout, etc.)
NetworkError(NetworkErrorDetails),
/// JSON parsing or serialization errors
ParseError(ParseErrorDetails),
/// API-specific errors with HTTP status codes
ApiError(ApiErrorDetails),
/// Streaming-related errors
#[cfg(feature = "streaming")]
StreamError(StreamErrorDetails),
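/// Feature not enabled error
FeatureNotEnabled(String),
/// Invalid caller-supplied argument (e.g., a malformed base URL)
InvalidArgument(String),
}
/// Result alias used by all client operations
pub type OllamaResult<T> = Result<T, OllamaError>;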
#[derive(Debug)]
pub struct NetworkErrorDetails {
pub kind: NetworkErrorKind,
pub message: String,
pub source: Option<Box<dyn std::error::Error + Send + Sync>>,
}
#[derive(Debug)]
pub enum NetworkErrorKind {
Connection,
Timeout,
Dns,
Unknown,
}
#[derive(Debug)]
pub struct ApiErrorDetails {
pub status_code: u16,
pub message: String,
pub response_body: Option<String>,
}
#[derive(Debug)]
pub struct ParseErrorDetails {
pub context: String,
pub source: serde_json::Error,
}
#[cfg(feature = "streaming")]
#[derive(Debug)]
pub struct StreamErrorDetails {
pub message: String,
pub position: Option<usize>,
}
```
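The design principles call for `Display` implementations on error types. A minimal sketch of `Display` and `std::error::Error` for a reduced copy of `OllamaError` (the `ParseError` and streaming variants are omitted so no external crates are needed); the message wording is an assumption. Returning the boxed cause from `source()` is what keeps the error chain intact for callers.

```rust
use std::error::Error;
use std::fmt;

// Reduced mirror of the error model above (std-only).
#[ derive( Debug ) ]
pub enum NetworkErrorKind { Connection, Timeout, Dns, Unknown }

#[ derive( Debug ) ]
pub struct NetworkErrorDetails
{
  pub kind: NetworkErrorKind,
  pub message: String,
  pub source: Option< Box< dyn Error + Send + Sync > >,
}

#[ derive( Debug ) ]
pub struct ApiErrorDetails
{
  pub status_code: u16,
  pub message: String,
  pub response_body: Option< String >,
}

#[ derive( Debug ) ]
pub enum OllamaError
{
  NetworkError( NetworkErrorDetails ),
  ApiError( ApiErrorDetails ),
  FeatureNotEnabled( String ),
}

impl fmt::Display for OllamaError
{
  fn fmt( &self, f: &mut fmt::Formatter< '_ > ) -> fmt::Result
  {
    match self
    {
      OllamaError::NetworkError( d ) => write!( f, "network error ({:?}): {}", d.kind, d.message ),
      OllamaError::ApiError( d ) => write!( f, "API error {}: {}", d.status_code, d.message ),
      OllamaError::FeatureNotEnabled( what ) => write!( f, "feature not enabled: {}", what ),
    }
  }
}

impl Error for OllamaError
{
  // Exposes the wrapped cause so callers can walk the chain with `source()`.
  fn source( &self ) -> Option< &( dyn Error + 'static ) >
  {
    match self
    {
      OllamaError::NetworkError( d ) => d.source.as_deref().map( | e | e as &( dyn Error + 'static ) ),
      _ => None,
    }
  }
}
```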
### Streaming Data Models (Feature Gated)
**Stream Processing Models**
```rust
#[cfg(feature = "streaming")]
/// Internal buffer for streaming response processing
pub(crate) struct StreamBuffer {
/// Accumulated partial data
buffer: Vec<u8>,
/// Current buffer position
position: usize,
/// Maximum buffer size (64KB default)
max_size: usize,
}
#[ cfg( feature = "streaming" ) ]
impl StreamBuffer
{
pub fn new() -> Self
{
Self
{
buffer: Vec::with_capacity( 65536 ), // 64KB
position: 0,
max_size: 65536,
}
}
pub fn append( &mut self, data: &[ u8 ] ) -> Result< (), OllamaError >
{
if self.buffer.len() + data.len() > self.max_size
{
return Err
(
OllamaError::StreamError
(
StreamErrorDetails
{
message: "Stream buffer overflow".to_string(),
position: Some( self.position ),
}
)
);
}
self.buffer.extend_from_slice( data );
Ok( () )
}
}
```
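Ollama streams newline-delimited JSON, so a stream buffer mainly needs to hand out complete lines. A minimal sketch of that idea, assuming a hypothetical `LineBuffer` type that mirrors `StreamBuffer`'s size bound: because complete lines are drained as soon as they arrive, `max_size` limits only the unconsumed bytes rather than the total length of the stream.

```rust
/// Hypothetical bounded buffer that yields complete NDJSON lines.
pub struct LineBuffer
{
  buffer: Vec< u8 >,
  max_size: usize,
}

impl LineBuffer
{
  pub fn new( max_size: usize ) -> Self
  {
    Self { buffer: Vec::new(), max_size }
  }

  /// Appends raw bytes, failing instead of growing past `max_size`.
  pub fn append( &mut self, data: &[ u8 ] ) -> Result< (), String >
  {
    if self.buffer.len() + data.len() > self.max_size
    {
      return Err( format!( "stream buffer overflow at {} bytes", self.buffer.len() ) );
    }
    self.buffer.extend_from_slice( data );
    Ok( () )
  }

  /// Removes and returns the next complete line (without its '\n'), or `None`
  /// if only a partial line is buffered. Splitting happens on raw bytes, so a
  /// UTF-8 character cut across two chunks is reassembled before decoding.
  pub fn take_line( &mut self ) -> Option< Vec< u8 > >
  {
    let end = self.buffer.iter().position( | &b | b == b'\n' )?;
    let mut line: Vec< u8 > = self.buffer.drain( ..=end ).collect();
    line.pop(); // drop the trailing '\n'
    Some( line )
  }
}
```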
### Data Model Design Principles
**Serialization Strategy**:
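- Keep Rust field names identical to Ollama's `snake_case` JSON keys (e.g., `total_duration`, `prompt_eval_count`, `modified_at`), so no `#[serde(rename_all = ...)]` conversion is needed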
- Implement `Default` for request types to simplify API usage
- Use `Option<T>` with `skip_serializing_if` for optional fields
- Preserve all response fields for future compatibility
**Memory Management**:
- Use owned types (`String`, `Vec<T>`) for simplicity and safety
- Implement `Clone` for data structures that may be shared
- Use `Arc<T>` only for expensive-to-clone shared resources
- Bounded buffers for streaming to prevent memory exhaustion
**Error Context Preservation**:
- Maintain error chains through structured error details
- Include operation context in error messages
- Preserve original error sources for debugging
- Use domain-specific error variants for clear categorization
**Validation and Constraints**:
- Validate required fields during construction where possible
- Use type system to enforce invariants (non-empty strings, valid URLs)
- Implement `Display` for user-friendly error messages
- Provide builder methods for complex data structure construction
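To illustrate using the type system to enforce an invariant, here is a sketch of a hypothetical `ModelName` newtype that can only be constructed from a non-empty, whitespace-free string (the whitespace rule is an assumption for illustration). Code that accepts a `ModelName` no longer needs to re-check the name.

```rust
use std::fmt;

/// Hypothetical newtype: a model name validated once at construction.
#[ derive( Debug, Clone, PartialEq, Eq ) ]
pub struct ModelName( String );

impl ModelName
{
  pub fn new( name: impl Into< String > ) -> Result< Self, String >
  {
    let name = name.into();
    if name.is_empty()
    {
      return Err( "model name must not be empty".to_string() );
    }
    if name.chars().any( char::is_whitespace )
    {
      return Err( format!( "model name must not contain whitespace: {:?}", name ) );
    }
    Ok( Self( name ) )
  }

  pub fn as_str( &self ) -> &str
  {
    &self.0
  }
}

impl fmt::Display for ModelName
{
  fn fmt( &self, f: &mut fmt::Formatter< '_ > ) -> fmt::Result
  {
    f.write_str( &self.0 )
  }
}
```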
These internal data models provide a foundation for a type-safe, efficient, and maintainable implementation that follows Rust best practices and serde conventions.
### 17. Reference Implementation
This section provides high-level workflow guidance for implementing the `api_ollama` library, including recommended patterns for key operations and integration points.
### Client Construction Workflow
It is recommended that client construction follow this pattern:
```rust
use std::sync::Arc;

impl OllamaClient
{
  // Default construction workflow
  pub fn new() -> OllamaResult< Self >
  {
    let config = ClientConfig::default();
    Self::from_config( config )
  }
  // Builder pattern workflow
  pub fn builder() -> ClientBuilder
  {
    ClientBuilder::new()
  }
  // Internal construction from validated config
  fn from_config( config: ClientConfig ) -> OllamaResult< Self >
  {
    // Validate base URL format with the `url` crate before storing it
    url::Url::parse( &config.base_url )
      .map_err( | e | OllamaError::InvalidArgument( format!( "Invalid base URL: {}", e ) ) )?;
    Ok
    (
      OllamaClient
      {
        config: Arc::new( config ),
      }
    )
  }
}
```
### HTTP Request Processing Workflow
The following workflow is recommended for all HTTP operations:
**1. Request Preparation Phase**
```rust
async fn execute_request< T, R >( &self, endpoint: &str, request: &T ) -> OllamaResult< R >
where
T: Serialize,
R: for< 'de > Deserialize< 'de >,
{
// Build complete URL; trim a trailing '/' so the result never contains "//"
let url = format!( "{}/api/{}", self.config.base_url.trim_end_matches( '/' ), endpoint );
// Serialize request to JSON
let json_body = serde_json::to_string( request )
.map_err
(
| e | OllamaError::ParseError
(
ParseErrorDetails
{
context: "Failed to serialize request".to_string(),
source: e,
}
)
)?;
// Prepare HTTP request with headers
let http_request = self.config.http_client
.post( &url )
.header( "Content-Type", "application/json" )
.body( json_body )
.timeout( self.config.timeout );
```
**2. HTTP Execution Phase**
```rust
// Execute HTTP request with error mapping
let response = http_request.send().await
.map_err( | e | self.map_reqwest_error( e ) )?;
// Check HTTP status code; read it before `text()`, which consumes the response
let status = response.status();
if !status.is_success()
{
return Err
(
OllamaError::ApiError
(
ApiErrorDetails
{
status_code: status.as_u16(),
message: format!( "HTTP {}", status ),
response_body: response.text().await.ok(),
}
)
);
}
```
**3. Response Processing Phase**
```rust
// Extract response body
let response_text = response.text().await
.map_err
(
| e | OllamaError::NetworkError
(
NetworkErrorDetails
{
kind: NetworkErrorKind::Unknown,
message: "Failed to read response body".to_string(),
source: Some( Box::new( e ) ),
}
)
)?;
// Deserialize JSON response
let parsed_response: R = serde_json::from_str( &response_text )
.map_err
(
| e | OllamaError::ParseError
(
ParseErrorDetails
{
context: format!( "Failed to parse response from {}", endpoint ),
source: e,
}
)
)?;
Ok( parsed_response )
}
```
### Streaming Implementation Workflow
For streaming operations, the following pattern is recommended:
```rust
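use bytes::Bytes;
use futures::{ Stream, StreamExt };

#[ cfg( all( feature = "enabled", feature = "streaming" ) ) ]
pub async fn chat_stream( &self, mut request: ChatRequest ) -> OllamaResult< impl Stream< Item = OllamaResult< ChatResponse > > >
{
  // Force streaming mode
  request.stream = Some( true );
  // Execute streaming HTTP request
  let response = self.execute_streaming_request( "chat", &request ).await?;
  // Ollama streams newline-delimited JSON, but network chunks do not line up
  // with lines: one chunk may hold several objects, part of one, or half of a
  // multi-byte UTF-8 character. The buffer is moved into the closure (`map`
  // accepts `FnMut`), so partial data carries over from chunk to chunk.
  let mut buffer = StreamBuffer::new();
  let stream = response.bytes_stream()
    .map( move | chunk_result | Self::process_stream_chunk( &mut buffer, chunk_result ) )
    // Each chunk completes zero, one, or several responses
    .flat_map( futures::stream::iter );
  Ok( Box::pin( stream ) )
}
// Stream chunk processing with buffering
fn process_stream_chunk
(
  buffer: &mut StreamBuffer,
  chunk_result: Result< Bytes, reqwest::Error >,
) -> Vec< OllamaResult< ChatResponse > >
{
  let chunk = match chunk_result
  {
    Ok( chunk ) => chunk,
    Err( e ) =>
    {
      let details = StreamErrorDetails { message: format!( "Stream chunk error: {}", e ), position: None };
      return vec![ Err( OllamaError::StreamError( details ) ) ];
    }
  };
  if let Err( e ) = buffer.append( &chunk )
  {
    return vec![ Err( e ) ];
  }
  let mut responses = Vec::new();
  // `take_line` is an assumed `StreamBuffer` method that removes and returns the
  // next complete line (without its '\n') as bytes, or `None` while only a
  // partial line is buffered. UTF-8 is decoded per complete line, never per chunk.
  while let Some( line ) = buffer.take_line()
  {
    // Skip blank lines between objects
    if line.iter().all( u8::is_ascii_whitespace )
    {
      continue;
    }
    let parsed = std::str::from_utf8( &line )
      .map_err
      (
        | e | OllamaError::StreamError
        (
          StreamErrorDetails
          {
            message: format!( "UTF-8 decode error: {}", e ),
            position: Some( e.valid_up_to() ),
          }
        )
      )
      .and_then
      (
        | text | serde_json::from_str::< ChatResponse >( text ).map_err
        (
          | e | OllamaError::ParseError
          (
            ParseErrorDetails
            {
              context: "Failed to parse streamed chat response".to_string(),
              source: e,
            }
          )
        )
      );
    responses.push( parsed );
  }
  responses
}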
```
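Why decoding has to wait for complete lines can be shown with the standard library alone. In this sketch (both functions are hypothetical), the two-byte character `é` is split across two chunks: decoding each chunk separately fails, while decoding the reassembled bytes succeeds.

```rust
/// Decodes each network chunk on its own, as a naive stream handler would.
pub fn decode_per_chunk( chunks: &[ &[ u8 ] ] ) -> Result< String, std::str::Utf8Error >
{
  let mut out = String::new();
  for chunk in chunks
  {
    out.push_str( std::str::from_utf8( chunk )? );
  }
  Ok( out )
}

/// Joins all bytes first and decodes once, so characters split across
/// chunk boundaries are reassembled before decoding.
pub fn decode_buffered( chunks: &[ &[ u8 ] ] ) -> Result< String, std::str::Utf8Error >
{
  let bytes: Vec< u8 > = chunks.concat();
  std::str::from_utf8( &bytes ).map( str::to_owned )
}
```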
### Error Handling Implementation Pattern
It is recommended that error handling follow a consistent mapping pattern:
```rust
impl OllamaClient
{
fn map_reqwest_error( &self, error: reqwest::Error ) -> OllamaError
{
if error.is_timeout()
{
OllamaError::NetworkError
(
NetworkErrorDetails
{
kind: NetworkErrorKind::Timeout,
message: format!( "Request timed out after {}s", self.config.timeout.as_secs() ),
source: Some( Box::new( error ) ),
}
)
}
else if error.is_connect()
{
OllamaError::NetworkError
(
NetworkErrorDetails
{
kind: NetworkErrorKind::Connection,
message: format!( "Failed to connect to {}", self.config.base_url ),
source: Some( Box::new( error ) ),
}
)
}
else
{
OllamaError::NetworkError
(
NetworkErrorDetails
{
kind: NetworkErrorKind::Unknown,
message: error.to_string(),
source: Some( Box::new( error ) ),
}
)
}
}
}
```
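The `source` fields filled in by this mapping are only useful if something reads them. A minimal sketch of a hypothetical `error_chain` helper that walks `std::error::Error::source()` to collect every cause; it works with any error type, including `OllamaError` once that type implements `std::error::Error` and returns its boxed cause from `source()`.

```rust
use std::error::Error;

/// Hypothetical helper: the error's own message followed by each cause,
/// outermost first, for logging the full context in one place.
pub fn error_chain( err: &( dyn Error + 'static ) ) -> Vec< String >
{
  let mut messages = vec![ err.to_string() ];
  let mut current = err.source();
  while let Some( cause ) = current
  {
    messages.push( cause.to_string() );
    current = cause.source();
  }
  messages
}
```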
### Feature Gating Implementation Pattern
Conditional compilation should follow this pattern:
```rust
impl OllamaClient
{
// Core functionality, compiled in when the `enabled` feature is active
#[ cfg( feature = "enabled" ) ]
pub async fn chat( &self, request: ChatRequest ) -> OllamaResult< ChatResponse >
{
self.execute_request( "chat", &request ).await
}
// Stub implementation when feature disabled
#[ cfg( not( feature = "enabled" ) ) ]
pub async fn chat( &self, _request: ChatRequest ) -> OllamaResult< ChatResponse >
{
Err
(
OllamaError::FeatureNotEnabled
(
"Chat functionality requires 'enabled' feature".to_string()
)
)
}
// Streaming methods only exist when both features enabled
#[ cfg( all( feature = "enabled", feature = "streaming" ) ) ]
pub async fn chat_stream( &self, request: ChatRequest ) -> OllamaResult< impl Stream< Item = OllamaResult< ChatResponse > > >
{
// ... streaming implementation elided (see Streaming Implementation Workflow)
}
}
```
### Testing Integration Points
The implementation should provide hooks for testing:
```rust
impl OllamaClient
{
// Constructor for testing with custom HTTP client
#[ cfg( test ) ]
pub fn with_http_client( base_url: String, client: reqwest::Client ) -> OllamaResult< Self >
{
let config = ClientConfig
{
base_url,
timeout: Duration::from_secs( 30 ),
http_client: Arc::new( client ),
};
Self::from_config( config )
}
}
#[ cfg( all( test, feature = "integration" ) ) ]
mod integration_tests
{
use super::*;
#[ tokio::test ]
async fn test_real_ollama_connection()
{
let client = OllamaClient::new().expect( "Failed to create client" );
// Only run if Ollama server is available
if !client.is_available().await
{
eprintln!( "Skipping integration test: Ollama server not available" );
return;
}
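// Test actual API calls; a freshly installed server may have no models
// pulled yet, so only the success of the request is asserted
let models = client.list_models().await.expect( "Failed to list models" );
eprintln!( "Server reports {} model(s)", models.models.len() );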
}
}
```
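Unit tests that should not depend on a running Ollama server can point the client at a local stub instead. A minimal sketch using only `std::net`: the hypothetical `spawn_stub_server` serves one canned JSON response on an ephemeral port and returns its base URL, which a test could pass to a constructor such as `with_http_client`. The stub ignores the request path, so it suits single-endpoint tests only.

```rust
use std::io::{ Read, Write };
use std::net::{ TcpListener, TcpStream };
use std::thread;

/// Hypothetical test helper: answers exactly one HTTP request with `body`.
pub fn spawn_stub_server( body: &'static str ) -> std::io::Result< ( String, thread::JoinHandle< () > ) >
{
  // Port 0 asks the OS for any free port, so parallel tests do not collide.
  let listener = TcpListener::bind( "127.0.0.1:0" )?;
  let base_url = format!( "http://{}", listener.local_addr()? );
  let handle = thread::spawn( move ||
  {
    if let Ok( ( mut stream, _ ) ) = listener.accept()
    {
      // Consume the request head (up to the blank line) before replying.
      let mut request: Vec< u8 > = Vec::new();
      let mut chunk = [ 0u8; 512 ];
      while !request.windows( 4 ).any( | w | w == b"\r\n\r\n" )
      {
        match stream.read( &mut chunk )
        {
          Ok( 0 ) | Err( _ ) => break,
          Ok( n ) => request.extend_from_slice( &chunk[ ..n ] ),
        }
      }
      let response = format!
      (
        "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body,
      );
      let _ = stream.write_all( response.as_bytes() );
    }
  } );
  Ok( ( base_url, handle ) )
}
```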
### Performance Optimization Guidelines
The reference implementation should incorporate these optimizations:
**Connection Reuse**:
- Use a single `reqwest::Client` instance per `OllamaClient`
- Enable HTTP/2 connection multiplexing where supported
- Configure reasonable connection pool limits
**Memory Efficiency**:
- Use streaming deserialization for large responses where possible
- Implement bounded buffers for streaming operations
- Avoid unnecessary string cloning in hot paths
**Error Context Preservation**:
- Maintain error source chains for debugging
- Include operation context in all error messages
- Use structured error types over string-based errors
These patterns are intended as a starting point for meeting the functional and non-functional requirements in Part I while leaving room for future enhancements.