multi-llm 1.0.0

# multi-llm: Design Document

> **Version**: 1.1
> **Status**: Living Document
> **Last Updated**: 2025-11-27

## Table of Contents

1. [Overview & Philosophy](#1-overview--philosophy)
2. [Design Goals & Non-Goals](#2-design-goals--non-goals)
3. [Architecture Overview](#3-architecture-overview)
4. [Core Abstractions](#4-core-abstractions)
5. [Provider Integration Model](#5-provider-integration-model)
6. [Public API Design](#6-public-api-design)
7. [Error Handling Strategy](#7-error-handling-strategy)
8. [Events System](#8-events-system)
9. [Testing Strategy](#9-testing-strategy)
10. [Stability & Versioning](#10-stability--versioning)
11. [Future Directions](#11-future-directions)
12. [Appendices](#appendices)
    - [A: Architecture Decision Records](#appendix-a-architecture-decision-records)
    - [B: Glossary](#appendix-b-glossary)
    - [C: Contributing](#appendix-c-contributing)

---

## 1. Overview & Philosophy

### Purpose

The `multi-llm` library provides a unified, type-safe interface for interacting with multiple Large Language Model (LLM) providers through a single abstraction. It eliminates the need to learn and maintain separate client libraries for each LLM provider by offering:

- **Unified message format** that works across OpenAI, Anthropic, Ollama, and LM Studio
- **Multi-provider support** with concurrent connections to N providers (1 to many, configured via config)
- **Provider-agnostic API** with consistent error handling and response types
- **Native support** for advanced features like prompt caching and tool calling
- **Type safety** leveraging Rust's type system to prevent runtime errors

Applications can configure one provider for simple use cases, or multiple providers for redundancy, A/B testing, or provider-specific feature access - all through configuration without code changes.

**Multi-Instance Pattern**: The library supports multiple instances of the same provider type, even with identical configurations. This enables:
- **Different models**: Fast vs powerful models from the same provider
- **Different configurations**: Varied caching, temperature, or other parameters
- **Business tracking**: Identical configs with different labels to track usage patterns via events
- **A/B testing**: Compare identical setups with different API keys or labels
- **Complete flexibility**: Library users decide how and why to instantiate multiple providers

### Design Philosophy

1. **KISS Principle (Keep It Simple, Stupid)**: Favor simplicity over complexity. Simple solutions are maintainable solutions. Complexity is a cost that must be justified.
2. **Unified Abstraction**: Single message format across all providers - write once, run anywhere
3. **Multi-Provider by Design**: Support 1 to N concurrent provider connections configured via config, not code
4. **Provider Transparency**: Don't hide provider differences; expose them clearly through configuration
5. **Library-First**: Pure library with no application assumptions or business logic
6. **Minimal Dependencies**: Every dependency impacts downstream users - be selective
7. **Async-First**: Modern Rust async/await patterns throughout
8. **Error Transparency**: Rich error types expose provider-specific failures for informed handling
9. **Type Safety**: Leverage Rust's type system to catch errors at compile time

### Scope Boundaries

**In Scope**:
- Unified message format for LLM communication
- Multi-provider support (OpenAI, Anthropic, Ollama, LM Studio)
- Tool/function calling abstraction
- Prompt caching hints (Anthropic-style, extensible to other providers)
- Provider configuration management
- Async API for all I/O operations
- Rich error types with retry information
- Optional business event logging (feature-gated)

**Out of Scope**:
- Application-level concerns (sessions, user management, authentication beyond API keys)
- Business logic or domain-specific workflows
- Built-in rate limiting or quota management (users implement via tower middleware)
- Prompt engineering utilities or templates
- Vector databases or embeddings
- Model training or fine-tuning

---

## 2. Design Goals & Non-Goals

### Primary Goals

1. **Provider Agnostic**: Write application code once, switch providers with configuration change
2. **Type Safety**: Leverage Rust's type system to prevent errors at compile time
3. **Flexibility**: Support provider-specific features without compromising core abstraction
4. **Maintainability**: Simple, consistent patterns across all provider implementations
5. **Library-Grade Quality**: Pure library suitable for use as a dependency in any Rust project

### Explicit Non-Goals

1. **Universal API Coverage**: Not trying to support every feature of every LLM provider
2. **Feature Parity Enforcement**: Providers have different capabilities; we expose differences, not hide them
3. **Application Framework**: Not an opinionated framework, just a library
4. **Streaming (pre-1.0)**: Streaming support deferred to post-1.0 due to complexity
5. **Synchronous API**: Async-only by design; no blocking APIs

### Success Criteria

- **Developer Experience**: Switching from OpenAI to Anthropic requires only config change, not code rewrite
- **Type Safety**: Common mistakes (wrong message format, missing required config) caught at compile time
- **Performance**: Zero-copy conversions where possible, minimal allocation overhead
- **Stability**: Public API stable after 1.0, internal implementations can evolve

---

## 3. Architecture Overview

### System Architecture

```mermaid
graph TD
    App[Application Code] --> Client[UnifiedLLMClient]
    Client --> Provider[LlmProvider Trait]
    Provider --> OpenAI[OpenAI Provider]
    Provider --> Anthropic[Anthropic Provider]
    Provider --> Ollama[Ollama Provider]
    Provider --> LMStudio[LM Studio Provider]

    App --> Msg[Message Types]
    Msg --> Conv[Provider Conversions]
    Conv --> OpenAI
    Conv --> Anthropic
    Conv --> Ollama
    Conv --> LMStudio

    Provider --> Events[Event Handler]
    Events -.->|Optional| EventImpl[User Event Handler]
```

### Data Flow

```mermaid
sequenceDiagram
    participant App
    participant Client
    participant Provider
    participant LLM

    App->>Client: execute(messages, config)
    Client->>Client: validate request
    Client->>Provider: execute_request(request)
    Provider->>Provider: convert Message to provider format
    Provider->>LLM: HTTP request (provider API)
    LLM-->>Provider: HTTP response
    Provider->>Provider: convert to Response
    opt Events enabled
        Provider->>Events: emit business events
    end
    Provider-->>Client: Result<Response>
    Client-->>App: Result<Response>
```

### Module Organization

```
multi-llm/
├── src/
│   ├── lib.rs               # Public API re-exports (MINIMAL)
│   ├── messages.rs          # Message, MessageRole, MessageContent (PUBLIC)
│   ├── provider.rs          # LlmProvider trait (PUBLIC)
│   ├── response.rs          # Response types (PUBLIC)
│   ├── error.rs             # LlmError (PUBLIC)
│   ├── config.rs            # Provider configs (PUBLIC)
│   ├── providers/           # Provider implementations (INTERNAL)
│   │   ├── anthropic/       # Anthropic Claude
│   │   ├── openai/          # OpenAI GPT
│   │   ├── ollama/          # Ollama
│   │   └── lmstudio/        # LM Studio
│   └── internals/           # Internal utilities (NOT exported)
│       ├── retry.rs         # Retry logic
│       ├── tokens.rs        # Token counting
│       ├── response_parser.rs
│       └── events.rs        # Event types (feature-gated)
```

**Design Principle**: Clear separation between public API (stable, documented) and internal implementation (can change freely).

---

## 4. Core Abstractions

### 4.1 Message Types

**Purpose**: Provider-agnostic message format supporting all common LLM message patterns.

**Key Decision**: Single unified format vs per-provider types
- **Chosen**: Unified format with provider-specific attributes
- **Rationale**: Enables provider switching without code changes; complexity hidden in conversion layer
- **Trade-off**: Some provider features require "escape hatch" metadata attributes

**Core Types**:

```rust
pub enum MessageRole {
    System,
    User,
    Assistant,
    Tool,
}

pub enum MessageContent {
    Text(String),
    ToolCall(ToolCallContent),
    ToolResult(ToolResultContent),
}

pub struct MessageAttributes {
    pub cache_control: Option<CacheControl>,  // Anthropic prompt caching
    pub priority: i32,                        // Message ordering hint
    pub metadata: HashMap<String, Value>,     // Provider-specific extras
}

pub struct Message {
    pub role: MessageRole,
    pub content: MessageContent,
    pub attributes: MessageAttributes,
}
```

**Builder Pattern** (for ergonomics):
```rust
let msg = Message::user("What is the capital of France?")
    .cacheable()
    .with_priority(10)
    .build();
```

**See**: [ADR-001: Unified Message Architecture](./adr/001-unified-message-architecture.md)

### 4.2 LlmProvider Trait

**Purpose**: Define contract that all provider implementations must satisfy.

```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn execute_llm(
        &self,
        request: UnifiedLLMRequest,
        config: Option<RequestConfig>,
        label: Option<&str>,
    ) -> Result<Response, LlmError>;

    fn provider_name(&self) -> &'static str;

    fn supports_caching(&self) -> bool { false }
}
```

**Design Principles**:
- **Async-only**: All providers are async (no blocking APIs)
- **Send + Sync**: Enable multi-threaded runtime usage
- **Result-based**: No panics; all errors returned as `Result`
- **Unified request/response**: Providers convert to/from their native formats internally

**See**: [ADR-002: Provider Trait Design](./adr/002-provider-trait-design.md)

### 4.3 Tool Calling

**Purpose**: Unified abstraction for function/tool calling across providers.

```rust
pub struct Tool {
    pub name: String,
    pub description: String,
    pub parameters: Value,  // JSON Schema
}

pub enum ToolChoice {
    Auto,      // Let LLM decide
    None,      // Don't use tools
    Required,  // Must use a tool
    Specific(String),  // Use specific tool
}

pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: Value,
}

pub struct ToolResult {
    pub tool_call_id: String,
    pub content: String,
    pub is_error: bool,
}
```

**Validation**: Tools validated at config construction time:
- Unique tool names
- Valid JSON Schema in parameters
- Required fields present

### 4.4 Caching Hints

**Purpose**: Support Anthropic-style prompt caching without coupling to Anthropic.

**Design**:
```rust
pub struct CacheControl {
    pub cache_type: CacheType,
}

pub enum CacheType {
    Ephemeral,   // Anthropic: 5-minute cache
    Extended,    // Anthropic: 1-hour cache
    // Future: Persistent, Custom(Duration), etc.
}

// Attached to MessageAttributes
message.attributes.cache_control = Some(CacheControl::ephemeral());
message.attributes.cache_control = Some(CacheControl::extended());
```

**Rationale**:
- Anthropic has native caching with two tiers (5-minute ephemeral, 1-hour extended)
- Both cache types are exposed from the start (extended cache used in production)
- Implemented as optional attributes (ignored by providers that don't support it)
- Future-proof: new providers can adopt if they support caching

**See**: [ADR-003: Caching Hints Architecture](./adr/003-caching-hints.md)

### 4.5 Request and Response Types

**Request**:
```rust
pub struct Request {
    pub messages: Vec<Message>,
    pub config: Option<RequestConfig>,
}

pub struct RequestConfig {
    pub temperature: Option<f64>,
    pub max_tokens: Option<u32>,
    pub tools: Vec<Tool>,
    pub tool_choice: Option<ToolChoice>,
    pub response_format: Option<ResponseFormat>,
    // Provider-specific overrides in metadata
    pub metadata: HashMap<String, Value>,
}
```

**Response**:
```rust
pub struct Response {
    pub content: String,
    pub role: MessageRole,
    pub tool_calls: Vec<ToolCall>,
    pub usage: TokenUsage,
    pub finish_reason: FinishReason,
    #[cfg(feature = "events")]
    pub events: Vec<BusinessEvent>,
}

pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
    // Anthropic caching stats
    pub cache_creation_tokens: Option<u32>,
    pub cache_read_tokens: Option<u32>,
}
```

---

## 5. Provider Integration Model

### Adding a New Provider

**Steps**:
1. Create module in `src/providers/new_provider/`
2. Define provider-specific types (request/response formats)
3. Implement conversion: `Message` → provider format
4. Implement conversion: provider response → `Response`
5. Implement `LlmProvider` trait
6. Add configuration in `src/config.rs`
7. Add tests (unit tests for conversions, integration tests for end-to-end)

### Example Pattern

```rust
// src/providers/openai/mod.rs

pub struct OpenAIProvider {
    config: OpenAIConfig,
    client: reqwest::Client,
}

impl OpenAIProvider {
    pub fn new(config: OpenAIConfig) -> Result<Self, LlmError> {
        let client = reqwest::Client::builder()
            .timeout(Duration::from_secs(120))
            .build()?;
        Ok(Self { config, client })
    }
}

#[async_trait]
impl LlmProvider for OpenAIProvider {
    async fn execute(
        &self,
        request: Request,
        config: Option<RequestConfig>,
    ) -> Result<Response, LlmError> {
        // 1. Convert unified types to OpenAI format
        let openai_request = convert_request(&request, config)?;

        // 2. Make HTTP request
        let response = self.client
            .post(&self.config.endpoint)
            .header("Authorization", format!("Bearer {}", self.config.api_key))
            .json(&openai_request)
            .send()
            .await
            .map_err(LlmError::network_error)?;

        // 3. Parse response
        let openai_response = response.json::<OpenAIResponse>()
            .await
            .map_err(LlmError::response_parse_error)?;

        // 4. Convert to unified Response
        convert_response(openai_response)
    }

    fn provider_name(&self) -> &'static str {
        "openai"
    }
}
```

### Consistency Requirements

All providers must follow these patterns:

1. **Configuration**: Each provider has dedicated config struct implementing `ProviderConfig` trait
2. **Error Mapping**: All provider-specific errors mapped to `LlmError` variants
3. **Logging**: Use `log_debug!`, `log_info!`, `log_warn!`, `log_error!` macros at appropriate levels
   - These macros abstract the underlying logging implementation (currently `tracing`)
   - Allows future logging framework changes without code rewrites
4. **Testing**:
   - Unit tests for message conversions
   - Unit tests for error handling
   - Integration tests for full request/response cycle (can be `#[ignore]` if requires external service)
5. **No Panics**: Return `Result` everywhere; never `unwrap()`, `expect()`, or `unreachable!()`
6. **No println!**: Use internal `log_*!` macros (available via `crate::logging`)

### Provider-Specific Features

**Handling features unique to one provider**:

1. **Configuration**: Expose via provider-specific config struct (preferred)
   ```rust
   pub struct AnthropicConfig {
       pub api_key: String,
       pub cache_ttl: Option<String>,  // Anthropic-specific
   }
   ```

2. **Metadata escape hatch**: Use `RequestConfig.metadata` for provider-specific overrides
   ```rust
   let mut config = RequestConfig::default();
   config.metadata.insert("anthropic:cache_ttl".into(), json!("5m"));
   ```

   **Important**: Metadata is a **workaround** to prevent blocking users who need provider-specific features not yet in the library. If you find yourself using metadata:
   - **File an issue** requesting the feature be added to the library properly
   - **Submit a PR** implementing the feature in a provider-agnostic way
   - Metadata should be temporary - we want to reduce its usage over time by properly supporting features

3. **Response data**: Include optional fields that only some providers populate
   ```rust
   pub struct TokenUsage {
       // Universal
       pub total_tokens: u32,
       // Anthropic-specific (None for other providers)
       pub cache_read_tokens: Option<u32>,
   }
   ```

---

## 6. Public API Design

### Stability Tiers

| Tier | Stability | Examples | Breaking Changes |
|------|-----------|----------|------------------|
| **Public API** | Locked after 1.0 | `LlmProvider` trait, `UnifiedMessage`, `Response`, `LlmError` | Requires major version bump |
| **Public Config** | Stable | Provider config structs, `RequestConfig` | Minor version if additive only |
| **Internal** | Unstable | Provider implementations, conversions, retry logic | Can change freely in any version |

### Minimal Public API Surface

**Philosophy**: Only expose what users need directly. Keep internals private.

**Public exports from `lib.rs`** (~28 types):

```rust
// Client
pub use client::UnifiedLLMClient;

// Core message types
pub use core_types::{
    UnifiedMessage, MessageRole, MessageContent, MessageAttributes, MessageCategory,
};

// Request/Response types
pub use core_types::{
    UnifiedLLMRequest, RequestConfig, Response, TokenUsage,
};

// Tool types
pub use core_types::{
    Tool, ToolCall, ToolChoice, ToolResult, ResponseFormat,
};

// Provider trait
pub use core_types::LlmProvider;

// Error types
pub use error::{LlmError, LlmResult};

// Provider configs (for construction)
pub use config::{
    LLMConfig, OpenAIConfig, AnthropicConfig, OllamaConfig, LMStudioConfig,
    DefaultLLMParams, DualLLMConfig, LLMPath, ProviderConfig,
};

// Provider implementations (for construction only)
pub use providers::{
    OpenAIProvider, AnthropicProvider, OllamaProvider, LMStudioProvider,
};

// Token counting
pub use tokens::{
    TokenCounter, TokenCounterFactory, AnthropicTokenCounter, OpenAITokenCounter,
};

// Retry configuration
pub use retry::RetryPolicy;

// Events (feature-gated)
#[cfg(feature = "events")]
pub use core_types::{BusinessEvent, EventScope, LLMBusinessEvent, event_types};
```

**NOT exported** (internal implementation details):
- `logging` module - internal tracing macros
- `response_parser` module - internal parsing logic
- `retry` internals - `CircuitBreaker`, `CircuitState`, `RetryExecutor`
- Error classification - `ErrorCategory`, `ErrorSeverity`, `UserErrorCategory`
- Internal types - `ToolCallingRound`
- Provider conversion modules
- HTTP client utilities

### API Usage Examples

**Basic usage**:
```rust
use multi_llm::{Message, Request, OpenAIProvider, OpenAIConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = OpenAIConfig {
        api_key: std::env::var("OPENAI_API_KEY")?,
        ..Default::default()
    };

    let provider = OpenAIProvider::new(config)?;

    let request = Request {
        messages: vec![
            Message::user("What is the capital of France?"),
        ],
        config: None,
    };

    let response = provider.execute(request, None).await?;
    println!("Response: {}", response.content);

    Ok(())
}
```

**Provider switching**:
```rust
use multi_llm::{LlmProvider, AnthropicProvider, OpenAIProvider};

// Common interface
async fn ask_llm(
    provider: &dyn LlmProvider,
    question: &str
) -> Result<String, LlmError> {
    let request = Request {
        messages: vec![Message::user(question)],
        config: None,
    };

    let response = provider.execute(request, None).await?;
    Ok(response.content)
}

// Works with any provider
let openai = OpenAIProvider::new(openai_config)?;
let anthropic = AnthropicProvider::new(anthropic_config)?;

let answer1 = ask_llm(&openai, "What is 2+2?").await?;
let answer2 = ask_llm(&anthropic, "What is 2+2?").await?;
```

**Tool calling**:
```rust
use multi_llm::{Tool, ToolChoice, RequestConfig};

let tools = vec![
    Tool {
        name: "get_weather".to_string(),
        description: "Get current weather for a location".to_string(),
        parameters: json!({
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }),
    },
];

let config = RequestConfig {
    tools,
    tool_choice: Some(ToolChoice::Auto),
    ..Default::default()
};

let response = provider.execute(request, Some(config)).await?;

if !response.tool_calls.is_empty() {
    for tool_call in response.tool_calls {
        println!("Tool called: {}", tool_call.name);
        println!("Arguments: {}", tool_call.arguments);
    }
}
```

---

## 7. Error Handling Strategy

### Design Principles

1. **No Panics**: Library code never panics; all errors returned as `Result`
2. **Rich Context**: Errors include provider info, HTTP status, error messages, retry hints
3. **Provider Transparency**: Expose provider-specific errors clearly
4. **Actionable**: Users can distinguish retryable vs non-retryable errors

### Error Hierarchy

```rust
#[derive(Debug, thiserror::Error)]
pub enum LlmError {
    #[error("Configuration error: {0}")]
    Configuration(String),

    #[error("Network error: {0}")]
    Network(String),

    #[error("Provider {provider} error (status {status_code:?}): {message}")]
    Provider {
        provider: String,
        status_code: Option<u16>,
        message: String,
    },

    #[error("Validation error: {0}")]
    Validation(String),

    #[error("Response parse error: {0}")]
    ResponseParse(String),

    #[error("Rate limit exceeded")]
    RateLimit { retry_after: Option<Duration> },

    #[error("Authentication failed: {0}")]
    Authentication(String),

    #[error("Timeout: {0}")]
    Timeout(String),

    #[error("Circuit breaker open for provider: {0}")]
    CircuitBreakerOpen(String),
}

impl LlmError {
    pub fn is_retryable(&self) -> bool {
        matches!(self,
            LlmError::Network(_) |
            LlmError::RateLimit { .. } |
            LlmError::Timeout(_) |
            LlmError::Provider { status_code: Some(500..=599), .. }
        )
    }

    pub fn retry_after(&self) -> Option<Duration> {
        match self {
            LlmError::RateLimit { retry_after } => *retry_after,
            _ => None,
        }
    }
}
```

### Error Mapping from Providers

Each provider maps its errors to `LlmError`:

```rust
// OpenAI 429 rate limit
LlmError::RateLimit {
    retry_after: parse_retry_after_header(response),
}

// Anthropic 401 auth error
LlmError::Authentication(
    "Invalid API key".to_string()
)

// Generic 500 server error
LlmError::Provider {
    provider: "openai",
    status_code: Some(500),
    message: "Internal server error",
    retry_after: None,
}
```

**See**: [ADR-004: Error Handling Strategy](./adr/004-error-handling-strategy.md)

---

## 8. Events System

### Purpose

The events system provides **optional** observability into LLM operations for applications that need structured event logging (caching hits, token usage, provider selection, etc.).

### Design Decision

**Status**: Feature-gated (enabled via `features = ["events"]`)

**Rationale**:
- Many applications need observability beyond basic logging
- Business events capture structured data (tokens, cache hits, costs)
- Must remain **optional** - not all library users want/need events
- Enables downstream analytics, cost tracking, performance monitoring

### Architecture

**When feature is enabled**:
```rust
#[cfg(feature = "events")]
pub struct BusinessEvent {
    pub event_id: String,
    pub timestamp: DateTime<Utc>,
    pub event_type: EventType,
    pub scope: EventScope,
    pub metadata: HashMap<String, Value>,
}

#[cfg(feature = "events")]
pub enum EventType {
    CacheHit { tokens_saved: u32 },
    CacheWrite { tokens_written: u32 },
    TokenUsage { prompt: u32, completion: u32 },
    ProviderCall { provider: String, duration_ms: u64 },
    // ... extensible
}

#[cfg(feature = "events")]
pub enum EventScope {
    Request(String),     // Per-request ID
    Session(String),     // Per-session ID (if app provides)
    User(String),        // Per-user ID (if app provides)
}
```

**When feature is disabled**:
- Event types not compiled
- No runtime overhead
- Response struct doesn't include events field

### Usage Pattern

**Provider implementation**:
```rust
#[async_trait]
impl LlmProvider for AnthropicProvider {
    async fn execute(...) -> Result<Response, LlmError> {
        // ... make request ...

        let mut response = Response {
            content: anthropic_response.content,
            // ... other fields ...
            #[cfg(feature = "events")]
            events: Vec::new(),
        };

        #[cfg(feature = "events")]
        {
            if let Some(cache_stats) = anthropic_response.usage.cache_read_input_tokens {
                response.events.push(BusinessEvent::cache_hit(cache_stats));
            }
        }

        Ok(response)
    }
}
```

**Application usage**:
```rust
#[cfg(feature = "events")]
{
    for event in response.events {
        match event.event_type {
            EventType::CacheHit { tokens_saved } => {
                log_cost_savings(tokens_saved);
            }
            EventType::TokenUsage { prompt, completion } => {
                track_usage(prompt, completion);
            }
            _ => {}
        }
    }
}
```

### Why Feature-Gated?

1. **Dependency minimization**: Events require `uuid` and `chrono` - users without events don't pay this cost
2. **Performance**: No event allocation/collection overhead when disabled
3. **Simplicity**: Users who just want basic LLM calls don't see event machinery
4. **Library principle**: Optional features should be opt-in, not forced

**See**: [ADR-005: Events System Design](./adr/005-events-system.md)

---

## 9. Testing Strategy

### Test Organization

**Unit Tests** (`src/*/tests.rs` or `#[cfg(test)]` modules):
- **Purpose**: Test individual functions/methods in isolation
- **Focus**: Message conversions, validation logic, error mapping
- **Speed**: Fast (no network, no external dependencies)
- **Run**: `cargo test --lib`

**Integration Tests** (`tests/` directory):
- **Purpose**: Test public APIs, full request/response cycles
- **Focus**: Provider switching, error propagation, end-to-end flows
- **Speed**: Slower (some require external services)
- **Run**: `cargo test --tests`
- **Note**: Tests requiring external services marked `#[ignore]`

### Test Principles

1. **Independence**: Tests don't share state, can run in any order
2. **Clarity**: Descriptive names following pattern `test_<what>_<condition>_<expected>`
3. **AAA Pattern**: Arrange, Act, Assert
4. **Fast by Default**: Slow tests marked `#[ignore]`, run separately
5. **Realistic**: Test actual usage patterns, not implementation details

### Example Tests

**Unit test (conversion)**:
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_message_to_openai_conversion_user_text() {
        // Arrange
        let message = Message::user("Hello world");

        // Act
        let openai_msg = convert_to_openai_message(&message).unwrap();

        // Assert
        assert_eq!(openai_msg.role, "user");
        assert_eq!(openai_msg.content, "Hello world");
    }

    #[test]
    fn test_tool_validation_rejects_duplicate_names() {
        // Arrange
        let config = RequestConfig {
            tools: vec![
                Tool { name: "get_weather".into(), /* ... */ },
                Tool { name: "get_weather".into(), /* ... */ },
            ],
            ..Default::default()
        };

        // Act
        let result = config.validate();

        // Assert
        assert!(result.is_err());
        assert!(matches!(result.unwrap_err(), LlmError::Validation(_)));
    }
}
```

**Integration test**:
```rust
#[tokio::test]
#[ignore] // Requires OpenAI API key
async fn test_openai_provider_basic_request() {
    let config = OpenAIConfig {
        api_key: std::env::var("OPENAI_API_KEY").unwrap(),
        ..Default::default()
    };

    let provider = OpenAIProvider::new(config).unwrap();

    let request = Request {
        messages: vec![Message::user("Say 'test'")],
        config: None,
    };

    let response = provider.execute(request, None).await.unwrap();

    assert!(!response.content.is_empty());
    assert_eq!(response.role, MessageRole::Assistant);
}
```

### Testing Guidelines

- **Don't mock provider implementations** - test them against real APIs (ignored tests)
- **Do mock external dependencies** in unit tests (use `mockall` for traits)
- **Test error paths** - verify errors are mapped correctly
- **Test provider-specific features** - caching, tool calling, etc.
- **Test provider switching** - same code works with different providers

---

## 10. Stability & Versioning

### Semantic Versioning

Following [SemVer 2.0](https://semver.org/):

**Major (X.0.0)** - Breaking changes to public API:
- Changing `Message` structure
- Removing/renaming public methods
- Changing trait signatures
- Removing public types

**Minor (0.X.0)** - Additive changes:
- New providers
- New optional fields on existing types (with defaults)
- New public methods
- New traits (not affecting existing code)

**Patch (0.0.X)** - Bug fixes and internal changes:
- Provider conversion fixes
- Performance improvements
- Documentation updates
- Internal refactoring

### Stability Guarantees

**Pre-1.0 (Current)**:
- Public API can change with minor version bumps
- Breaking changes documented in CHANGELOG
- Goal: Stabilize API based on real-world usage feedback

**Post-1.0**:
- **Public API locked**: Breaking changes require 2.0
- **Internal implementations free**: Can evolve without version bump
- **Provider additions**: Minor version bumps
- **New features**: Minor version if additive, major if breaking

### API Evolution Strategy

**Current State (0.1.x)**:
- Public API stabilized with clean naming
- Events system feature-gated
- Core types finalized (`UnifiedMessage`, `Response`, `LlmError`)
- Ready for production use

**After 1.0**:
- Public types frozen (only additive changes)
- New features via new types/traits (not breaking existing)
- Deprecation warnings before removal (one major version notice)

### Deprecation Policy

**Post-1.0**:
1. Mark deprecated in version N.x.0 with `#[deprecated]` attribute
2. Document replacement in deprecation message
3. Keep deprecated items for one major version
4. Remove in version (N+1).0.0

```rust
#[deprecated(since = "1.5.0", note = "Use `execute` instead")]
pub async fn execute_llm(...) -> Result<Response, LlmError> {
    self.execute(...).await
}
```

---

## 11. Future Directions

### Post-1.0 Planned Features

#### Streaming Support
**Status**: Deferred to post-1.0 (complex, needs careful design)

**Rationale for deferral**:
- Streaming adds significant API surface (new trait methods, stream types)
- Error handling in streams is complex (partial responses, cancellation)
- Backpressure and buffering strategies need careful design
- Want to stabilize core request/response API first

**Future design considerations**:
```rust
#[async_trait]
pub trait LlmProvider {
    // Existing
    async fn execute(...) -> Result<Response, LlmError>;

    // Future: streaming variant
    async fn execute_stream(...)
        -> Result<impl Stream<Item = Result<StreamChunk>>, LlmError>;
}

pub enum StreamChunk {
    ContentDelta(String),
    ToolCallStart { id: String, name: String },
    ToolCallDelta(String),
    ToolCallEnd,
    Done(TokenUsage),
}
```

#### Additional Providers
- Google Gemini
- Cohere
- Mistral
- Groq
- Custom provider trait for user-defined providers

#### Enhanced Features
1. **Retry Policies**: Configurable retry with exponential backoff (currently internal)
2. **Circuit Breaker**: Automatic failover on provider degradation
3. **Token Estimation**: Pre-flight token counting across providers
4. **Response Caching**: Local caching of LLM responses (user-configurable)
5. **Batch Requests**: Send multiple requests in parallel with result aggregation

### Under Consideration

- **Telemetry**: Optional OpenTelemetry integration for distributed tracing
- **Provider Health Checks**: Automatic provider availability detection
- **Cost Tracking**: Built-in cost estimation based on token usage
- **Prompt Templates**: Simple templating for common patterns

### Explicitly Ruled Out

- **Synchronous API**: Async-only by design, no blocking wrappers
- **Built-in Application Logic**: Remains pure library (no sessions, auth, etc.)
- **Database Integration**: Out of scope for this library
- **Embeddings/Vector Search**: Different concern, separate library

---

## Appendices

### Appendix A: Architecture Decision Records

Detailed rationale for major architectural decisions:

- [ADR-001: Unified Message Architecture](./adr/001-unified-message-architecture.md)
- [ADR-002: Provider Trait Design](./adr/002-provider-trait-design.md)
- [ADR-003: Caching Hints Architecture](./adr/003-caching-hints.md)
- [ADR-004: Error Handling Strategy](./adr/004-error-handling-strategy.md)
- [ADR-005: Events System Design](./adr/005-events-system.md)
- [ADR-006: Public API Stability](./adr/006-public-api-stability.md)

### Appendix B: Glossary

- **Message**: Provider-agnostic representation of LLM conversation turn
- **Provider**: Implementation of LLM API client (OpenAI, Anthropic, etc.)
- **Tool**: Function/tool that LLM can call (function calling)
- **Caching**: Provider-specific optimization to reuse prompt processing
- **Request**: Unified request type containing messages and configuration
- **Response**: Unified response type containing LLM output
- **Events**: Optional structured logging of LLM operations

### Appendix C: Contributing

When contributing to this project:

1. **Read this design doc** to understand architectural principles
2. **Follow established patterns** when adding providers or features
3. **Update relevant ADRs** if making architectural changes
4. **Add tests** for all new functionality (unit + integration)
5. **Document public APIs** with rustdoc comments
6. **No panics** in library code (return `Result` everywhere)
7. **Use `log_*!` macros** for logging (not `println!` or direct `tracing` macros)

---

**Document Maintenance**: This document should be updated when making architectural changes. Create new ADRs for new major decisions. Keep examples up-to-date with actual API.

**Last Reviewed**: 2025-11-27