multi-llm 1.0.0

Unified multi-provider LLM client with support for OpenAI, Anthropic, Ollama, and LMStudio
# multi-llm: Design Document

> **Version**: 1.1
> **Status**: Living Document
> **Last Updated**: 2025-11-27

## Table of Contents

1. [Overview & Philosophy](#1-overview--philosophy)
2. [Design Goals & Non-Goals](#2-design-goals--non-goals)
3. [Architecture Overview](#3-architecture-overview)
4. [Core Abstractions](#4-core-abstractions)
5. [Provider Integration Model](#5-provider-integration-model)
6. [Public API Design](#6-public-api-design)
7. [Error Handling Strategy](#7-error-handling-strategy)
8. [Events System](#8-events-system)
9. [Testing Strategy](#9-testing-strategy)
10. [Stability & Versioning](#10-stability--versioning)
11. [Future Directions](#11-future-directions)
12. [Appendices](#appendices)
    - [A: Architecture Decision Records](#appendix-a-architecture-decision-records)
    - [B: Glossary](#appendix-b-glossary)
    - [C: Contributing](#appendix-c-contributing)

---

## 1. Overview & Philosophy

### Purpose

The `multi-llm` library provides a unified, type-safe interface for interacting with multiple Large Language Model (LLM) providers through a single abstraction. It eliminates the need to learn and maintain separate client libraries for each LLM provider by offering:

- **Unified message format** that works across OpenAI, Anthropic, Ollama, and LM Studio
- **Multi-provider support** with concurrent connections to N providers (one or many, set in configuration)
- **Provider-agnostic API** with consistent error handling and response types
- **Native support** for advanced features like prompt caching and tool calling
- **Type safety** leveraging Rust's type system to prevent runtime errors

Applications can configure one provider for simple use cases, or multiple providers for redundancy, A/B testing, or provider-specific feature access - all through configuration without code changes.

**Multi-Instance Pattern**: The library supports multiple instances of the same provider type, even with identical configurations. This enables:
- **Different models**: Fast vs powerful models from the same provider
- **Different configurations**: Varied caching, temperature, or other parameters
- **Business tracking**: Identical configs with different labels to track usage patterns via events
- **A/B testing**: Compare identical setups with different API keys or labels
- **Complete flexibility**: Library users decide how and why to instantiate multiple providers
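
One way to picture the multi-instance pattern is a registry keyed by a caller-chosen label. The sketch below is only an illustration under stated assumptions: `ProviderKind`, `InstanceConfig`, and `ProviderRegistry` are hypothetical names, not types exported by the library, and the model names in any usage are placeholders.

```rust
use std::collections::HashMap;

/// Hypothetical provider kinds; the real library has its own config types.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderKind {
    OpenAI,
    Anthropic,
    Ollama,
    LMStudio,
}

/// Hypothetical per-instance settings, used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceConfig {
    pub kind: ProviderKind,
    pub model: String,
    pub temperature: f64,
}

/// Registry of provider instances keyed by label. Two entries may hold
/// identical configs; only the label has to be unique.
#[derive(Debug, Default)]
pub struct ProviderRegistry {
    instances: HashMap<String, InstanceConfig>,
}

impl ProviderRegistry {
    /// Registers an instance; fails if the label is already taken.
    pub fn register(&mut self, label: &str, config: InstanceConfig) -> Result<(), String> {
        if self.instances.contains_key(label) {
            return Err(format!("label '{}' is already registered", label));
        }
        self.instances.insert(label.to_string(), config);
        Ok(())
    }

    /// Looks up an instance by label.
    pub fn get(&self, label: &str) -> Option<&InstanceConfig> {
        self.instances.get(label)
    }

    /// Number of registered instances.
    pub fn len(&self) -> usize {
        self.instances.len()
    }
}
```

Registering the same config under `"fast"` and `"experiment-a"` gives two independently addressable instances, which is what makes label-based usage tracking and A/B comparisons possible.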

### Design Philosophy

1. **KISS Principle (Keep It Simple, Stupid)**: Favor simplicity over complexity. Simple solutions are maintainable solutions. Complexity is a cost that must be justified.
2. **Unified Abstraction**: Single message format across all providers - write once, run anywhere
3. **Multi-Provider by Design**: Support 1 to N concurrent provider connections, set up in configuration rather than code
4. **Provider Transparency**: Don't hide provider differences; expose them clearly through configuration
5. **Library-First**: Pure library with no application assumptions or business logic
6. **Minimal Dependencies**: Every dependency impacts downstream users - be selective
7. **Async-First**: Modern Rust async/await patterns throughout
8. **Error Transparency**: Rich error types expose provider-specific failures for informed handling
9. **Type Safety**: Leverage Rust's type system to catch errors at compile time

### Scope Boundaries

**In Scope**:
- Unified message format for LLM communication
- Multi-provider support (OpenAI, Anthropic, Ollama, LM Studio)
- Tool/function calling abstraction
- Prompt caching hints (Anthropic-style, extensible to other providers)
- Provider configuration management
- Async API for all I/O operations
- Rich error types with retry information
- Optional business event logging (feature-gated)

**Out of Scope**:
- Application-level concerns (sessions, user management, authentication beyond API keys)
- Business logic or domain-specific workflows
- Built-in rate limiting or quota management (users implement via tower middleware)
- Prompt engineering utilities or templates
- Vector databases or embeddings
- Model training or fine-tuning

---

## 2. Design Goals & Non-Goals

### Primary Goals

1. **Provider Agnostic**: Write application code once, switch providers with configuration change
2. **Type Safety**: Leverage Rust's type system to prevent errors at compile time
3. **Flexibility**: Support provider-specific features without compromising core abstraction
4. **Maintainability**: Simple, consistent patterns across all provider implementations
5. **Library-Grade Quality**: Pure library suitable for use as a dependency in any Rust project

### Explicit Non-Goals

1. **Universal API Coverage**: Not trying to support every feature of every LLM provider
2. **Feature Parity Enforcement**: Providers have different capabilities; we expose differences, not hide them
3. **Application Framework**: Not an opinionated framework, just a library
4. **Streaming (pre-1.0)**: Streaming support deferred to post-1.0 due to complexity
5. **Synchronous API**: Async-only by design; no blocking APIs

### Success Criteria

- **Developer Experience**: Switching from OpenAI to Anthropic requires only config change, not code rewrite
- **Type Safety**: Common mistakes (wrong message format, missing required config) caught at compile time
- **Performance**: Zero-copy conversions where possible, minimal allocation overhead
- **Stability**: Public API stable after 1.0, internal implementations can evolve

---

## 3. Architecture Overview

### System Architecture

```mermaid
graph TD
    App[Application Code] --> Client[UnifiedLLMClient]
    Client --> Provider[LlmProvider Trait]
    Provider --> OpenAI[OpenAI Provider]
    Provider --> Anthropic[Anthropic Provider]
    Provider --> Ollama[Ollama Provider]
    Provider --> LMStudio[LM Studio Provider]

    App --> Msg[Message Types]
    Msg --> Conv[Provider Conversions]
    Conv --> OpenAI
    Conv --> Anthropic
    Conv --> Ollama
    Conv --> LMStudio

    Provider --> Events[Event Handler]
    Events -.->|Optional| EventImpl[User Event Handler]
```

### Data Flow

```mermaid
sequenceDiagram
    participant App
    participant Client
    participant Provider
    participant LLM

    App->>Client: execute(messages, config)
    Client->>Client: validate request
    Client->>Provider: execute_request(request)
    Provider->>Provider: convert Message to provider format
    Provider->>LLM: HTTP request (provider API)
    LLM-->>Provider: HTTP response
    Provider->>Provider: convert to Response
    opt Events enabled
        Provider->>Events: emit business events
    end
    Provider-->>Client: Result<Response>
    Client-->>App: Result<Response>
```

### Module Organization

```
multi-llm/
├── src/
│   ├── lib.rs               # Public API re-exports (MINIMAL)
│   ├── messages.rs          # Message, MessageRole, MessageContent (PUBLIC)
│   ├── provider.rs          # LlmProvider trait (PUBLIC)
│   ├── response.rs          # Response types (PUBLIC)
│   ├── error.rs             # LlmError (PUBLIC)
│   ├── config.rs            # Provider configs (PUBLIC)
│   ├── providers/           # Provider implementations (INTERNAL)
│   │   ├── anthropic/       # Anthropic Claude
│   │   ├── openai/          # OpenAI GPT
│   │   ├── ollama/          # Ollama
│   │   └── lmstudio/        # LM Studio
│   └── internals/           # Internal utilities (NOT exported)
│       ├── retry.rs         # Retry logic
│       ├── tokens.rs        # Token counting
│       ├── response_parser.rs
│       └── events.rs        # Event types (feature-gated)
```

**Design Principle**: Clear separation between public API (stable, documented) and internal implementation (can change freely).

---

## 4. Core Abstractions

### 4.1 Message Types

**Purpose**: Provider-agnostic message format supporting all common LLM message patterns.

**Key Decision**: Single unified format vs per-provider types
- **Chosen**: Unified format with provider-specific attributes
- **Rationale**: Enables provider switching without code changes; complexity hidden in conversion layer
- **Trade-off**: Some provider features require "escape hatch" metadata attributes

**Core Types**:

```rust
pub enum MessageRole {
    System,
    User,
    Assistant,
    Tool,
}

pub enum MessageContent {
    Text(String),
    ToolCall(ToolCallContent),
    ToolResult(ToolResultContent),
}

pub struct MessageAttributes {
    pub cache_control: Option<CacheControl>,  // Anthropic prompt caching
    pub priority: i32,                        // Message ordering hint
    pub metadata: HashMap<String, Value>,     // Provider-specific extras
}

pub struct Message {
    pub role: MessageRole,
    pub content: MessageContent,
    pub attributes: MessageAttributes,
}
```

**Builder Pattern** (for ergonomics):
```rust
let msg = Message::user("What is the capital of France?")
    .cacheable()
    .with_priority(10)
    .build();
```

**See**: [ADR-001: Unified Message Architecture](./adr/001-unified-message-architecture.md)
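
As a rough sketch of how such a builder could be put together, the block below uses simplified stand-ins for the types above (text-only content, no metadata map). `MessageBuilder` and the choice that `cacheable()` selects the ephemeral tier are assumptions made for illustration, not confirmed library behavior.

```rust
/// Simplified stand-in for `MessageRole`.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageRole {
    System,
    User,
    Assistant,
    Tool,
}

/// Simplified stand-in for the cache hint.
#[derive(Debug, Clone, PartialEq)]
pub enum CacheType {
    Ephemeral,
    Extended,
}

/// Simplified message: text content plus two attributes.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: MessageRole,
    pub text: String,
    pub cache: Option<CacheType>,
    pub priority: i32,
}

/// Hypothetical builder that collects optional attributes before
/// producing the final `Message`.
pub struct MessageBuilder {
    message: Message,
}

impl Message {
    /// Starts a user message with default attributes.
    pub fn user(text: impl Into<String>) -> MessageBuilder {
        MessageBuilder {
            message: Message {
                role: MessageRole::User,
                text: text.into(),
                cache: None,
                priority: 0,
            },
        }
    }
}

impl MessageBuilder {
    /// Marks the message as cacheable (assumed here to mean the ephemeral tier).
    pub fn cacheable(mut self) -> Self {
        self.message.cache = Some(CacheType::Ephemeral);
        self
    }

    /// Sets the ordering hint.
    pub fn with_priority(mut self, priority: i32) -> Self {
        self.message.priority = priority;
        self
    }

    /// Finishes the builder and returns the message.
    pub fn build(self) -> Message {
        self.message
    }
}
```

Each setter takes `self` by value and returns it, so calls chain without intermediate bindings and an unfinished builder cannot be mistaken for a `Message`.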

### 4.2 LlmProvider Trait

**Purpose**: Define contract that all provider implementations must satisfy.

```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn execute_llm(
        &self,
        request: UnifiedLLMRequest,
        config: Option<RequestConfig>,
        label: Option<&str>,
    ) -> Result<Response, LlmError>;

    fn provider_name(&self) -> &'static str;

    fn supports_caching(&self) -> bool { false }
}
```

**Design Principles**:
- **Async-only**: All providers are async (no blocking APIs)
- **Send + Sync**: Enable multi-threaded runtime usage
- **Result-based**: No panics; all errors returned as `Result`
- **Unified request/response**: Providers convert to/from their native formats internally

**See**: [ADR-002: Provider Trait Design](./adr/002-provider-trait-design.md)

### 4.3 Tool Calling

**Purpose**: Unified abstraction for function/tool calling across providers.

```rust
pub struct Tool {
    pub name: String,
    pub description: String,
    pub parameters: Value,  // JSON Schema
}

pub enum ToolChoice {
    Auto,      // Let LLM decide
    None,      // Don't use tools
    Required,  // Must use a tool
    Specific(String),  // Use specific tool
}

pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: Value,
}

pub struct ToolResult {
    pub tool_call_id: String,
    pub content: String,
    pub is_error: bool,
}
```

**Validation**: Tools validated at config construction time:
- Unique tool names
- Valid JSON Schema in parameters
- Required fields present
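
A minimal sketch of the first and third checks, assuming a simplified `ToolSpec` stand-in without the `parameters` field (validating JSON Schema would need a JSON library and is left out). `ToolSpec`, `ToolValidationError`, and `validate_tools` are hypothetical names.

```rust
use std::collections::HashSet;

/// Simplified stand-in for `Tool`; `parameters` is omitted in this sketch.
pub struct ToolSpec {
    pub name: String,
    pub description: String,
}

/// Hypothetical validation errors for this illustration.
#[derive(Debug, PartialEq)]
pub enum ToolValidationError {
    EmptyName { index: usize },
    EmptyDescription { name: String },
    DuplicateName(String),
}

/// Checks required fields and name uniqueness, stopping at the first problem.
pub fn validate_tools(tools: &[ToolSpec]) -> Result<(), ToolValidationError> {
    let mut seen: HashSet<&str> = HashSet::new();
    for (index, tool) in tools.iter().enumerate() {
        if tool.name.trim().is_empty() {
            return Err(ToolValidationError::EmptyName { index });
        }
        if tool.description.trim().is_empty() {
            return Err(ToolValidationError::EmptyDescription {
                name: tool.name.clone(),
            });
        }
        // `insert` returns false when the name was already present.
        if !seen.insert(tool.name.as_str()) {
            return Err(ToolValidationError::DuplicateName(tool.name.clone()));
        }
    }
    Ok(())
}
```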

### 4.4 Caching Hints

**Purpose**: Support Anthropic-style prompt caching without coupling to Anthropic.

**Design**:
```rust
pub struct CacheControl {
    pub cache_type: CacheType,
}

pub enum CacheType {
    Ephemeral,   // Anthropic: 5-minute cache
    Extended,    // Anthropic: 1-hour cache
    // Future: Persistent, Custom(Duration), etc.
}

// Attached to MessageAttributes; choose one tier per message
message.attributes.cache_control = Some(CacheControl::ephemeral());
// ...or, for the 1-hour tier:
message.attributes.cache_control = Some(CacheControl::extended());
```

**Rationale**:
- Anthropic has native caching with two tiers (5-minute ephemeral, 1-hour extended)
- Both cache types are exposed from the start (extended cache used in production)
- Implemented as optional attributes (ignored by providers that don't support it)
- Future-proof: new providers can adopt if they support caching

**See**: [ADR-003: Caching Hints Architecture](./adr/003-caching-hints.md)
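
The "ignored by providers that don't support it" rule can be sketched as a small helper inside the conversion layer. This is an illustration, not the library's code: `cache_ttl` and `effective_cache_ttl` are hypothetical names, and the `"5m"` / `"1h"` strings simply mirror the two tiers described above.

```rust
/// Stand-in for the `CacheType` enum shown above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CacheType {
    Ephemeral,
    Extended,
}

/// Hypothetical helper: the TTL each tier corresponds to, as a short string.
pub fn cache_ttl(cache_type: CacheType) -> &'static str {
    match cache_type {
        CacheType::Ephemeral => "5m",
        CacheType::Extended => "1h",
    }
}

/// Returns the TTL to send, or `None` when the hint should be dropped
/// (no hint on the message, or a provider without caching support).
pub fn effective_cache_ttl(
    supports_caching: bool,
    hint: Option<CacheType>,
) -> Option<&'static str> {
    if !supports_caching {
        return None;
    }
    hint.map(cache_ttl)
}
```

Because the hint is dropped rather than rejected, the same message list can be sent to a caching and a non-caching provider without modification.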

### 4.5 Request and Response Types

**Request**:
```rust
pub struct Request {
    pub messages: Vec<Message>,
    pub config: Option<RequestConfig>,
}

pub struct RequestConfig {
    pub temperature: Option<f64>,
    pub max_tokens: Option<u32>,
    pub tools: Vec<Tool>,
    pub tool_choice: Option<ToolChoice>,
    pub response_format: Option<ResponseFormat>,
    // Provider-specific overrides in metadata
    pub metadata: HashMap<String, Value>,
}
```

**Response**:
```rust
pub struct Response {
    pub content: String,
    pub role: MessageRole,
    pub tool_calls: Vec<ToolCall>,
    pub usage: TokenUsage,
    pub finish_reason: FinishReason,
    #[cfg(feature = "events")]
    pub events: Vec<BusinessEvent>,
}

pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
    // Anthropic caching stats
    pub cache_creation_tokens: Option<u32>,
    pub cache_read_tokens: Option<u32>,
}
```

---

## 5. Provider Integration Model

### Adding a New Provider

**Steps**:
1. Create module in `src/providers/new_provider/`
2. Define provider-specific types (request/response formats)
3. Implement conversion: `Message` → provider format
4. Implement conversion: provider response → `Response`
5. Implement `LlmProvider` trait
6. Add configuration in `src/config.rs`
7. Add tests (unit tests for conversions, integration tests for end-to-end)
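
Steps 2 through 4 can be sketched with plain structs. Everything here is a simplified assumption: the unified types are reduced to a few fields, and `WireMessage` / `WireResponse` stand for whatever request and response shapes the new provider actually uses.

```rust
/// Simplified stand-ins for the unified types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MessageRole {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: MessageRole,
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Response {
    pub content: String,
    pub total_tokens: u32,
}

/// Step 2: hypothetical wire formats of the new provider.
#[derive(Debug, Clone, PartialEq)]
pub struct WireMessage {
    pub role: &'static str,
    pub content: String,
}

pub struct WireResponse {
    pub text: Option<String>,
    pub input_tokens: u32,
    pub output_tokens: u32,
}

/// Step 3: unified message -> provider format.
pub fn to_wire(message: &Message) -> WireMessage {
    let role = match message.role {
        MessageRole::System => "system",
        MessageRole::User => "user",
        MessageRole::Assistant => "assistant",
        MessageRole::Tool => "tool",
    };
    WireMessage { role, content: message.text.clone() }
}

/// Step 4: provider response -> unified response. A missing field becomes
/// an error value instead of a panic.
pub fn from_wire(response: WireResponse) -> Result<Response, String> {
    let content = response
        .text
        .ok_or_else(|| "response contained no text".to_string())?;
    Ok(Response {
        content,
        total_tokens: response.input_tokens.saturating_add(response.output_tokens),
    })
}
```

Keeping the two conversions as free functions makes them easy to cover with the unit tests called for in step 7, without any network access.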

### Example Pattern

```rust
// src/providers/openai/mod.rs
use std::time::Duration;

use async_trait::async_trait;

pub struct OpenAIProvider {
    config: OpenAIConfig,
    client: reqwest::Client,
}

impl OpenAIProvider {
    pub fn new(config: OpenAIConfig) -> Result<Self, LlmError> {
        let client = reqwest::Client::builder()
            .timeout(Duration::from_secs(120))
            .build()?;
        Ok(Self { config, client })
    }
}

#[async_trait]
impl LlmProvider for OpenAIProvider {
    async fn execute(
        &self,
        request: Request,
        config: Option<RequestConfig>,
    ) -> Result<Response, LlmError> {
        // 1. Convert unified types to OpenAI format
        let openai_request = convert_request(&request, config)?;

        // 2. Make HTTP request
        let response = self.client
            .post(&self.config.endpoint)
            .header("Authorization", format!("Bearer {}", self.config.api_key))
            .json(&openai_request)
            .send()
            .await
            .map_err(LlmError::network_error)?;

        // 3. Parse response
        let openai_response = response.json::<OpenAIResponse>()
            .await
            .map_err(LlmError::response_parse_error)?;

        // 4. Convert to unified Response
        convert_response(openai_response)
    }

    fn provider_name(&self) -> &'static str {
        "openai"
    }
}
```

### Consistency Requirements

All providers must follow these patterns:

1. **Configuration**: Each provider has dedicated config struct implementing `ProviderConfig` trait
2. **Error Mapping**: All provider-specific errors mapped to `LlmError` variants
3. **Logging**: Use `log_debug!`, `log_info!`, `log_warn!`, `log_error!` macros at appropriate levels
   - These macros abstract the underlying logging implementation (currently `tracing`)
   - Allows future logging framework changes without code rewrites
4. **Testing**:
   - Unit tests for message conversions
   - Unit tests for error handling
   - Integration tests for full request/response cycle (can be `#[ignore]` if requires external service)
5. **No Panics**: Return `Result` everywhere in library code; never `unwrap()`, `expect()`, or `unreachable!()` outside tests
6. **No println!**: Use internal `log_*!` macros (available via `crate::logging`)

### Provider-Specific Features

**Handling features unique to one provider**:

1. **Configuration**: Expose via provider-specific config struct (preferred)
   ```rust
   pub struct AnthropicConfig {
       pub api_key: String,
       pub cache_ttl: Option<String>,  // Anthropic-specific
   }
   ```

2. **Metadata escape hatch**: Use `RequestConfig.metadata` for provider-specific overrides
   ```rust
   let mut config = RequestConfig::default();
   config.metadata.insert("anthropic:cache_ttl".into(), json!("5m"));
   ```

   **Important**: Metadata is a **workaround** to prevent blocking users who need provider-specific features not yet in the library. If you find yourself using metadata:
   - **File an issue** requesting the feature be added to the library properly
   - **Submit a PR** implementing the feature in a provider-agnostic way
   - Metadata should be temporary - we want to reduce its usage over time by properly supporting features

3. **Response data**: Include optional fields that only some providers populate
   ```rust
   pub struct TokenUsage {
       // Universal
       pub total_tokens: u32,
       // Anthropic-specific (None for other providers)
       pub cache_read_tokens: Option<u32>,
   }
   ```
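
On the provider side, reading a namespaced override such as `anthropic:cache_ttl` might look like the sketch below. It assumes string values for simplicity (the real `metadata` map holds JSON values), and `provider_override` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Looks up `"<provider>:<key>"` in the request metadata, so that one
/// provider never picks up an override meant for another.
pub fn provider_override<'a>(
    metadata: &'a HashMap<String, String>,
    provider: &str,
    key: &str,
) -> Option<&'a str> {
    let namespaced = format!("{}:{}", provider, key);
    metadata.get(&namespaced).map(String::as_str)
}
```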

---

## 6. Public API Design

### Stability Tiers

| Tier | Stability | Examples | Breaking Changes |
|------|-----------|----------|------------------|
| **Public API** | Locked after 1.0 | `LlmProvider` trait, `UnifiedMessage`, `Response`, `LlmError` | Requires major version bump |
| **Public Config** | Stable | Provider config structs, `RequestConfig` | Minor version if additive only |
| **Internal** | Unstable | Provider implementations, conversions, retry logic | Can change freely in any version |

### Minimal Public API Surface

**Philosophy**: Only expose what users need directly. Keep internals private.

**Public exports from `lib.rs`** (36 items, or 40 with the `events` feature enabled):

```rust
// Client
pub use client::UnifiedLLMClient;

// Core message types
pub use core_types::{
    UnifiedMessage, MessageRole, MessageContent, MessageAttributes, MessageCategory,
};

// Request/Response types
pub use core_types::{
    UnifiedLLMRequest, RequestConfig, Response, TokenUsage,
};

// Tool types
pub use core_types::{
    Tool, ToolCall, ToolChoice, ToolResult, ResponseFormat,
};

// Provider trait
pub use core_types::LlmProvider;

// Error types
pub use error::{LlmError, LlmResult};

// Provider configs (for construction)
pub use config::{
    LLMConfig, OpenAIConfig, AnthropicConfig, OllamaConfig, LMStudioConfig,
    DefaultLLMParams, DualLLMConfig, LLMPath, ProviderConfig,
};

// Provider implementations (for construction only)
pub use providers::{
    OpenAIProvider, AnthropicProvider, OllamaProvider, LMStudioProvider,
};

// Token counting
pub use tokens::{
    TokenCounter, TokenCounterFactory, AnthropicTokenCounter, OpenAITokenCounter,
};

// Retry configuration
pub use retry::RetryPolicy;

// Events (feature-gated)
#[cfg(feature = "events")]
pub use core_types::{BusinessEvent, EventScope, LLMBusinessEvent, event_types};
```

**NOT exported** (internal implementation details):
- `logging` module - internal tracing macros
- `response_parser` module - internal parsing logic
- `retry` internals - `CircuitBreaker`, `CircuitState`, `RetryExecutor`
- Error classification - `ErrorCategory`, `ErrorSeverity`, `UserErrorCategory`
- Internal types - `ToolCallingRound`
- Provider conversion modules
- HTTP client utilities

### API Usage Examples

**Basic usage**:
```rust
use multi_llm::{LlmProvider, Message, OpenAIConfig, OpenAIProvider, Request};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = OpenAIConfig {
        api_key: std::env::var("OPENAI_API_KEY")?,
        ..Default::default()
    };

    let provider = OpenAIProvider::new(config)?;

    let request = Request {
        messages: vec![
            Message::user("What is the capital of France?"),
        ],
        config: None,
    };

    let response = provider.execute(request, None).await?;
    println!("Response: {}", response.content);

    Ok(())
}
```

**Provider switching**:
```rust
use multi_llm::{AnthropicProvider, LlmError, LlmProvider, Message, OpenAIProvider, Request};

// Common interface
async fn ask_llm(
    provider: &dyn LlmProvider,
    question: &str
) -> Result<String, LlmError> {
    let request = Request {
        messages: vec![Message::user(question)],
        config: None,
    };

    let response = provider.execute(request, None).await?;
    Ok(response.content)
}

// Works with any provider (these calls belong inside an async function returning a `Result`)
let openai = OpenAIProvider::new(openai_config)?;
let anthropic = AnthropicProvider::new(anthropic_config)?;

let answer1 = ask_llm(&openai, "What is 2+2?").await?;
let answer2 = ask_llm(&anthropic, "What is 2+2?").await?;
```

**Tool calling**:
```rust
use multi_llm::{RequestConfig, Tool, ToolChoice};
use serde_json::json;

let tools = vec![
    Tool {
        name: "get_weather".to_string(),
        description: "Get current weather for a location".to_string(),
        parameters: json!({
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }),
    },
];

let config = RequestConfig {
    tools,
    tool_choice: Some(ToolChoice::Auto),
    ..Default::default()
};

let response = provider.execute(request, Some(config)).await?;

if !response.tool_calls.is_empty() {
    for tool_call in response.tool_calls {
        println!("Tool called: {}", tool_call.name);
        println!("Arguments: {}", tool_call.arguments);
    }
}
```

---

## 7. Error Handling Strategy

### Design Principles

1. **No Panics**: Library code never panics; all errors returned as `Result`
2. **Rich Context**: Errors include provider info, HTTP status, error messages, retry hints
3. **Provider Transparency**: Expose provider-specific errors clearly
4. **Actionable**: Users can distinguish retryable vs non-retryable errors

### Error Hierarchy

```rust
use std::time::Duration;

#[derive(Debug, thiserror::Error)]
pub enum LlmError {
    #[error("Configuration error: {0}")]
    Configuration(String),

    #[error("Network error: {0}")]
    Network(String),

    #[error("Provider {provider} error (status {status_code:?}): {message}")]
    Provider {
        provider: String,
        status_code: Option<u16>,
        message: String,
    },

    #[error("Validation error: {0}")]
    Validation(String),

    #[error("Response parse error: {0}")]
    ResponseParse(String),

    #[error("Rate limit exceeded")]
    RateLimit { retry_after: Option<Duration> },

    #[error("Authentication failed: {0}")]
    Authentication(String),

    #[error("Timeout: {0}")]
    Timeout(String),

    #[error("Circuit breaker open for provider: {0}")]
    CircuitBreakerOpen(String),
}

impl LlmError {
    pub fn is_retryable(&self) -> bool {
        matches!(self,
            LlmError::Network(_) |
            LlmError::RateLimit { .. } |
            LlmError::Timeout(_) |
            LlmError::Provider { status_code: Some(500..=599), .. }
        )
    }

    pub fn retry_after(&self) -> Option<Duration> {
        match self {
            LlmError::RateLimit { retry_after } => *retry_after,
            _ => None,
        }
    }
}
```
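
To show how a caller might act on `is_retryable()` and `retry_after()`, here is a small synchronous retry loop. It is a sketch under assumptions: the error enum is cut down to three variants, the sleep function is injected so the loop can be exercised without waiting, and the 100 ms base and 5 s cap are arbitrary example values. `backoff_delay` and `retry_with` are hypothetical names, not the library's retry internals.

```rust
use std::time::Duration;

/// Cut-down stand-in for `LlmError`.
#[derive(Debug, Clone, PartialEq)]
pub enum LlmError {
    Network(String),
    RateLimit { retry_after: Option<Duration> },
    Authentication(String),
}

impl LlmError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, LlmError::Network(_) | LlmError::RateLimit { .. })
    }

    pub fn retry_after(&self) -> Option<Duration> {
        match self {
            LlmError::RateLimit { retry_after } => *retry_after,
            _ => None,
        }
    }
}

/// Delay before the next try: the server's hint if present, otherwise
/// `base * 2^attempt`; both are capped at `max`.
pub fn backoff_delay(error: &LlmError, attempt: u32, base: Duration, max: Duration) -> Duration {
    let exponential = base.saturating_mul(2u32.saturating_pow(attempt));
    error.retry_after().unwrap_or(exponential).min(max)
}

/// Calls `op` (at least once) until it succeeds, fails with a
/// non-retryable error, or has been called `max_attempts` times.
pub fn retry_with<T>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, LlmError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, LlmError> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) if e.is_retryable() && attempt + 1 < max_attempts => {
                let base = Duration::from_millis(100);
                let max = Duration::from_secs(5);
                sleep(backoff_delay(&e, attempt, base, max));
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

In async application code the injected `sleep` would be replaced by the runtime's timer; the decision logic stays the same.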

### Error Mapping from Providers

Each provider maps its errors to `LlmError`:

```rust
// OpenAI 429 rate limit
LlmError::RateLimit {
    retry_after: parse_retry_after_header(response),
}

// Anthropic 401 auth error
LlmError::Authentication(
    "Invalid API key".to_string()
)

// Generic 500 server error
LlmError::Provider {
    provider: "openai".to_string(),
    status_code: Some(500),
    message: "Internal server error".to_string(),
}
```

**See**: [ADR-004: Error Handling Strategy](./adr/004-error-handling-strategy.md)
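
The mapping above could be centralized in one function per provider. The following is a sketch with a cut-down error enum; `map_http_error` and `parse_retry_after` are hypothetical helpers, and only the seconds form of the `Retry-After` header is handled (the HTTP-date form is ignored here).

```rust
use std::time::Duration;

/// Cut-down stand-in for `LlmError`.
#[derive(Debug, Clone, PartialEq)]
pub enum LlmError {
    Provider {
        provider: String,
        status_code: Option<u16>,
        message: String,
    },
    RateLimit { retry_after: Option<Duration> },
    Authentication(String),
}

/// Parses a `Retry-After` value given as whole seconds.
pub fn parse_retry_after(header: Option<&str>) -> Option<Duration> {
    header?.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Maps a non-success HTTP status to a unified error.
pub fn map_http_error(
    provider: &str,
    status: u16,
    body: &str,
    retry_after_header: Option<&str>,
) -> LlmError {
    match status {
        401 => LlmError::Authentication(body.to_string()),
        429 => LlmError::RateLimit {
            retry_after: parse_retry_after(retry_after_header),
        },
        // Everything else keeps the status code so callers can inspect it.
        _ => LlmError::Provider {
            provider: provider.to_string(),
            status_code: Some(status),
            message: body.to_string(),
        },
    }
}
```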

---

## 8. Events System

### Purpose

The events system provides **optional** observability into LLM operations for applications that need structured event logging (caching hits, token usage, provider selection, etc.).

### Design Decision

**Status**: Feature-gated (enabled via `features = ["events"]`)

**Rationale**:
- Many applications need observability beyond basic logging
- Business events capture structured data (tokens, cache hits, costs)
- Must remain **optional** - not all library users want/need events
- Enables downstream analytics, cost tracking, performance monitoring

### Architecture

**When feature is enabled**:
```rust
#[cfg(feature = "events")]
pub struct BusinessEvent {
    pub event_id: String,
    pub timestamp: DateTime<Utc>,
    pub event_type: EventType,
    pub scope: EventScope,
    pub metadata: HashMap<String, Value>,
}

#[cfg(feature = "events")]
pub enum EventType {
    CacheHit { tokens_saved: u32 },
    CacheWrite { tokens_written: u32 },
    TokenUsage { prompt: u32, completion: u32 },
    ProviderCall { provider: String, duration_ms: u64 },
    // ... extensible
}

#[cfg(feature = "events")]
pub enum EventScope {
    Request(String),     // Per-request ID
    Session(String),     // Per-session ID (if app provides)
    User(String),        // Per-user ID (if app provides)
}
```

**When feature is disabled**:
- Event types not compiled
- No runtime overhead
- Response struct doesn't include events field

### Usage Pattern

**Provider implementation**:
```rust
#[async_trait]
impl LlmProvider for AnthropicProvider {
    async fn execute(...) -> Result<Response, LlmError> {
        // ... make request ...

        let mut response = Response {
            content: anthropic_response.content,
            // ... other fields ...
            #[cfg(feature = "events")]
            events: Vec::new(),
        };

        #[cfg(feature = "events")]
        {
            if let Some(cache_stats) = anthropic_response.usage.cache_read_input_tokens {
                response.events.push(BusinessEvent::cache_hit(cache_stats));
            }
        }

        Ok(response)
    }
}
```

**Application usage**:
```rust
#[cfg(feature = "events")]
{
    for event in response.events {
        match event.event_type {
            EventType::CacheHit { tokens_saved } => {
                log_cost_savings(tokens_saved);
            }
            EventType::TokenUsage { prompt, completion } => {
                track_usage(prompt, completion);
            }
            _ => {}
        }
    }
}
```

### Why Feature-Gated?

1. **Dependency minimization**: Events require `uuid` and `chrono` - users without events don't pay this cost
2. **Performance**: No event allocation/collection overhead when disabled
3. **Simplicity**: Users who just want basic LLM calls don't see event machinery
4. **Library principle**: Optional features should be opt-in, not forced

**See**: [ADR-005: Events System Design](./adr/005-events-system.md)

---

## 9. Testing Strategy

### Test Organization

**Unit Tests** (`src/*/tests.rs` or `#[cfg(test)]` modules):
- **Purpose**: Test individual functions/methods in isolation
- **Focus**: Message conversions, validation logic, error mapping
- **Speed**: Fast (no network, no external dependencies)
- **Run**: `cargo test --lib`

**Integration Tests** (`tests/` directory):
- **Purpose**: Test public APIs, full request/response cycles
- **Focus**: Provider switching, error propagation, end-to-end flows
- **Speed**: Slower (some require external services)
- **Run**: `cargo test --tests`
- **Note**: Tests requiring external services marked `#[ignore]`

### Test Principles

1. **Independence**: Tests don't share state, can run in any order
2. **Clarity**: Descriptive names following pattern `test_<what>_<condition>_<expected>`
3. **AAA Pattern**: Arrange, Act, Assert
4. **Fast by Default**: Slow tests marked `#[ignore]`, run separately
5. **Realistic**: Test actual usage patterns, not implementation details

### Example Tests

**Unit test (conversion)**:
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_message_to_openai_conversion_user_text() {
        // Arrange
        let message = Message::user("Hello world");

        // Act
        let openai_msg = convert_to_openai_message(&message).unwrap();

        // Assert
        assert_eq!(openai_msg.role, "user");
        assert_eq!(openai_msg.content, "Hello world");
    }

    #[test]
    fn test_tool_validation_rejects_duplicate_names() {
        // Arrange
        let config = RequestConfig {
            tools: vec![
                Tool { name: "get_weather".into(), /* ... */ },
                Tool { name: "get_weather".into(), /* ... */ },
            ],
            ..Default::default()
        };

        // Act
        let result = config.validate();

        // Assert
        assert!(result.is_err());
        assert!(matches!(result.unwrap_err(), LlmError::Validation(_)));
    }
}
```

**Integration test**:
```rust
#[tokio::test]
#[ignore] // Requires OPENAI_API_KEY; run with `cargo test -- --ignored`
async fn test_openai_provider_basic_request() {
    let config = OpenAIConfig {
        api_key: std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY must be set"),
        ..Default::default()
    };

    let provider = OpenAIProvider::new(config).unwrap();

    let request = Request {
        messages: vec![Message::user("Say 'test'")],
        config: None,
    };

    let response = provider.execute(request, None).await.unwrap();

    assert!(!response.content.is_empty());
    assert_eq!(response.role, MessageRole::Assistant);
}
```

### Testing Guidelines

- **Don't mock provider implementations** - test them against real APIs in `#[ignore]`d integration tests
- **Do mock external dependencies** in unit tests (use `mockall` for traits)
- **Test error paths** - verify errors are mapped correctly
- **Test provider-specific features** - caching, tool calling, etc.
- **Test provider switching** - same code works with different providers
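
The "test error paths" guideline can be sketched as a small unit test over a status-to-error mapping. Everything named here is an assumption made for illustration: `map_http_status` is a hypothetical helper, and this `LlmError` is a simplified stand-in whose variants may not match the crate's real error type.

```rust
// Simplified stand-in; the crate's real `LlmError` may have different variants.
#[derive(Debug, PartialEq)]
pub enum LlmError {
    Authentication(String),
    RateLimited { retry_after_secs: Option<u64> },
    Provider(String),
}

/// Hypothetical helper: map a provider's HTTP status into a unified error.
pub fn map_http_status(status: u16, body: &str) -> LlmError {
    match status {
        401 | 403 => LlmError::Authentication(body.to_string()),
        429 => LlmError::RateLimited { retry_after_secs: None },
        _ => LlmError::Provider(format!("HTTP {status}: {body}")),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_map_http_status_401_returns_authentication() {
        // Arrange + Act
        let err = map_http_status(401, "invalid key");

        // Assert
        assert_eq!(err, LlmError::Authentication("invalid key".into()));
    }

    #[test]
    fn test_map_http_status_429_returns_rate_limited() {
        let err = map_http_status(429, "");
        assert!(matches!(err, LlmError::RateLimited { .. }));
    }
}
```

Because the mapping is a pure function, tests like these need no network access and can stay in the fast `cargo test --lib` run.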

---

## 10. Stability & Versioning

### Semantic Versioning

Following [SemVer 2.0](https://semver.org/):

**Major (X.0.0)** - Breaking changes to public API:
- Changing `Message` structure
- Removing/renaming public methods
- Changing trait signatures
- Removing public types

**Minor (0.X.0)** - Additive changes:
- New providers
- New optional fields on existing types (with defaults)
- New public methods
- New traits (not affecting existing code)

**Patch (0.0.X)** - Bug fixes and internal changes:
- Provider conversion fixes
- Performance improvements
- Documentation updates
- Internal refactoring
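
As a rough illustration of the "new optional fields (with defaults)" rule, the sketch below uses a hypothetical `RequestOptions` type (not part of the crate's API) to show why such a change can ship in a minor release: callers that fill unspecified fields from `Default` keep compiling after a field is added.

```rust
/// Hypothetical config type, shown only to illustrate the rule.
#[derive(Debug, Clone, PartialEq)]
pub struct RequestOptions {
    pub temperature: f32,
    pub max_tokens: Option<u32>,
    /// Imagined as added in a later minor release; `None` keeps the old behavior.
    pub top_p: Option<f32>,
}

impl Default for RequestOptions {
    fn default() -> Self {
        Self {
            temperature: 1.0,
            max_tokens: None,
            top_p: None,
        }
    }
}

/// Caller code written before `top_p` existed: it never names the new field,
/// so it compiles unchanged once the field is added.
pub fn caller_written_before_top_p() -> RequestOptions {
    RequestOptions {
        temperature: 0.2,
        ..Default::default()
    }
}
```

This only holds for callers that use struct update syntax. A caller that lists every field in a struct literal would stop compiling when a public field is added, so the documentation for such types should recommend `..Default::default()`.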

### Stability Guarantees

**Pre-1.0 (Current)**:
- Public API can change with minor version bumps
- Breaking changes documented in CHANGELOG
- Goal: Stabilize API based on real-world usage feedback

**Post-1.0**:
- **Public API locked**: Breaking changes require 2.0
- **Internal implementations free**: Can evolve without version bump
- **Provider additions**: Minor version bumps
- **New features**: Minor version if additive, major if breaking

### API Evolution Strategy

**Current State (0.1.x)**:
- Public API stabilized with clean naming
- Events system feature-gated
- Core types finalized (`UnifiedMessage`, `Response`, `LlmError`)
- Ready for production use

**After 1.0**:
- Public types frozen (only additive changes)
- New features via new types/traits (not breaking existing)
- Deprecation warnings before removal (one major version notice)

### Deprecation Policy

**Post-1.0**:
1. Mark deprecated in version N.x.0 with `#[deprecated]` attribute
2. Document replacement in deprecation message
3. Keep deprecated items for one major version
4. Remove in version (N+1).0.0

```rust
#[deprecated(since = "1.5.0", note = "Use `execute` instead")]
pub async fn execute_llm(...) -> Result<Response, LlmError> {
    self.execute(...).await
}
```

---

## 11. Future Directions

### Post-1.0 Planned Features

#### Streaming Support
**Status**: Deferred to post-1.0 (complex, needs careful design)

**Rationale for deferral**:
- Streaming adds significant API surface (new trait methods, stream types)
- Error handling in streams is complex (partial responses, cancellation)
- Backpressure and buffering strategies need careful design
- Want to stabilize core request/response API first

**Future design considerations**:
```rust
#[async_trait]
pub trait LlmProvider {
    // Existing
    async fn execute(...) -> Result<Response, LlmError>;

    // Future: streaming variant
    async fn execute_stream(...)
        -> Result<impl Stream<Item = Result<StreamChunk, LlmError>>, LlmError>;
}

pub enum StreamChunk {
    ContentDelta(String),
    ToolCallStart { id: String, name: String },
    ToolCallDelta(String),
    ToolCallEnd,
    Done(TokenUsage),
}
```
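
To make the partial-response concern concrete, here is a minimal sketch of how a caller might fold `StreamChunk` values into a final text. It is not a proposed API: it uses a plain iterator instead of an async `Stream` so that it needs no external crates, `collect_text` is a hypothetical helper, and the `TokenUsage` fields are assumed for illustration.

```rust
// Stand-in for the crate's usage type; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

// Mirrors the `StreamChunk` sketch above.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamChunk {
    ContentDelta(String),
    ToolCallStart { id: String, name: String },
    ToolCallDelta(String),
    ToolCallEnd,
    Done(TokenUsage),
}

/// Hypothetical helper: concatenate content deltas and capture final usage.
/// Usage is `None` if the chunks end without `Done` (for example, on cancellation).
pub fn collect_text<I>(chunks: I) -> (String, Option<TokenUsage>)
where
    I: IntoIterator<Item = StreamChunk>,
{
    let mut text = String::new();
    let mut usage = None;
    for chunk in chunks {
        match chunk {
            StreamChunk::ContentDelta(delta) => text.push_str(&delta),
            StreamChunk::Done(final_usage) => {
                usage = Some(final_usage);
                break; // Nothing meaningful follows `Done`.
            }
            // Tool-call chunks are ignored in this text-only sketch.
            StreamChunk::ToolCallStart { .. }
            | StreamChunk::ToolCallDelta(_)
            | StreamChunk::ToolCallEnd => {}
        }
    }
    (text, usage)
}
```

A missing `Done` chunk leaves the caller holding partial text with no usage data, which is one of the cases a real streaming API would need to define explicitly.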

#### Additional Providers
- Google Gemini
- Cohere
- Mistral
- Groq
- Custom provider trait for user-defined providers

#### Enhanced Features
1. **Retry Policies**: Configurable retry with exponential backoff (currently internal)
2. **Circuit Breaker**: Automatic failover on provider degradation
3. **Token Estimation**: Pre-flight token counting across providers
4. **Response Caching**: Local caching of LLM responses (user-configurable)
5. **Batch Requests**: Send multiple requests in parallel with result aggregation
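
As one possible shape for the configurable retry policy, the sketch below computes capped exponential backoff delays. `RetryPolicy` and its fields are hypothetical names chosen for illustration; the crate's internal retry logic may differ, and this version omits jitter for brevity.

```rust
use std::time::Duration;

/// Hypothetical policy type; names and fields are illustrative only.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based):
    /// `base_delay * 2^attempt`, capped at `max_delay`.
    pub fn delay_for(&self, attempt: u32) -> Duration {
        // Saturating arithmetic avoids overflow panics for large attempt counts.
        let factor = 2u32.saturating_pow(attempt);
        self.base_delay.saturating_mul(factor).min(self.max_delay)
    }
}
```

With a 100 ms base and a 2 s cap, the delays grow as 100 ms, 200 ms, 400 ms, 800 ms, 1.6 s and then stay at 2 s. A production policy would typically add random jitter so that many clients do not retry in lockstep.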

### Under Consideration

- **Telemetry**: Optional OpenTelemetry integration for distributed tracing
- **Provider Health Checks**: Automatic provider availability detection
- **Cost Tracking**: Built-in cost estimation based on token usage
- **Prompt Templates**: Simple templating for common patterns

### Explicitly Ruled Out

- **Synchronous API**: Async-only by design, no blocking wrappers
- **Built-in Application Logic**: Remains pure library (no sessions, auth, etc.)
- **Database Integration**: Out of scope for this library
- **Embeddings/Vector Search**: Different concern, separate library

---

## Appendices

### Appendix A: Architecture Decision Records

Detailed rationale for major architectural decisions:

- [ADR-001: Unified Message Architecture](./adr/001-unified-message-architecture.md)
- [ADR-002: Provider Trait Design](./adr/002-provider-trait-design.md)
- [ADR-003: Caching Hints Architecture](./adr/003-caching-hints.md)
- [ADR-004: Error Handling Strategy](./adr/004-error-handling-strategy.md)
- [ADR-005: Events System Design](./adr/005-events-system.md)
- [ADR-006: Public API Stability](./adr/006-public-api-stability.md)

### Appendix B: Glossary

- **Message**: Provider-agnostic representation of one turn in an LLM conversation
- **Provider**: Implementation of an LLM API client (OpenAI, Anthropic, etc.)
- **Tool**: Function that an LLM can call (function calling)
- **Caching**: Provider-specific optimization to reuse prompt processing
- **Request**: Unified request type containing messages and configuration
- **Response**: Unified response type containing LLM output
- **Events**: Optional structured logging of LLM operations

### Appendix C: Contributing

When contributing to this project:

1. **Read this design doc** to understand architectural principles
2. **Follow established patterns** when adding providers or features
3. **Update relevant ADRs** if making architectural changes
4. **Add tests** for all new functionality (unit + integration)
5. **Document public APIs** with rustdoc comments
6. **No panics** in library code (return `Result` everywhere)
7. **Use `log_*!` macros** for logging (not `println!` or direct `tracing` macros)

---

**Document Maintenance**: Update this document whenever an architectural change is made. Create a new ADR for each new major decision. Keep examples up to date with the actual API.

**Last Reviewed**: 2025-11-27