brainwires-provider 0.11.0

LLM provider implementations (Anthropic, OpenAI Chat + Responses, Google Gemini, Ollama, Bedrock, Vertex AI, local llama.cpp) for the Brainwires Agent Framework. Speech (TTS/STT) providers live in `brainwires-provider-speech`.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
# brainwires-provider

[![Crates.io](https://img.shields.io/crates/v/brainwires-provider.svg)](https://crates.io/crates/brainwires-provider)
[![Documentation](https://img.shields.io/docsrs/brainwires-provider)](https://docs.rs/brainwires-provider)
[![License](https://img.shields.io/crates/l/brainwires-provider.svg)](LICENSE)

AI provider implementations for the Brainwires Agent Framework.

## Overview

`brainwires-provider` provides concrete implementations of the `Provider` trait for multiple AI services: Anthropic (Claude), OpenAI (GPT), Google (Gemini), Ollama, and local LLM inference via llama.cpp. Every provider converts to and from the unified `brainwires-core` message types, so callers can swap backends without changing application code.

**Design principles:**

- **Unified interface** — all providers implement the same `Provider` trait from `brainwires-core` (chat, streaming, tool calling)
- **Feature-gated backends** — cloud providers compile under `native` (default); local LLM compiles always; llama.cpp is behind `llama-cpp-2`
- **Rate limiting built in** — token-bucket `RateLimiter` and `RateLimitedClient` available to any provider
- **Streaming-first** — every provider returns `BoxStream<Result<StreamChunk>>` via `async_stream`
- **Tool calling** — Anthropic, OpenAI, Google, and Ollama all support function calling mapped to/from `brainwires_core::Tool`
- **Local inference** — CPU-optimized GGUF model support with model registry, preset configs, and inference parameter tuning

```text
  ┌───────────────────────────────────────────────────────────────────────┐
  │                        brainwires-provider                           │
  │                                                                       │
  │  ┌─── Provider trait (brainwires-core) ────────────────────────────┐  │
  │  │  chat()        ──► ChatResponse                                 │  │
  │  │  stream_chat() ──► BoxStream<StreamChunk>                       │  │
  │  │  name()        ──► &str                                         │  │
  │  └─────────────────────────────────────────────────────────────────┘  │
  │           │                                                           │
  │           ▼                                                           │
  │  ┌─── Cloud Providers (feature = "native") ────────────────────────┐  │
  │  │                                                                  │  │
  │  │  AnthropicProvider ──► SSE streaming ──► api.anthropic.com      │  │
  │  │  OpenAIProvider    ──► JSON Lines    ──► api.openai.com         │  │
  │  │  GoogleProvider    ──► event-stream  ──► generativelanguage.…   │  │
  │  │  OllamaProvider    ──► line-delim JSON ► localhost:11434        │  │
  │  │           │                                                      │  │
  │  │           ▼                                                      │  │
  │  │  RateLimitedClient ──► RateLimiter (token-bucket)               │  │
  │  └──────────────────────────────────────────────────────────────────┘  │
  │                                                                       │
  │  ┌─── Local LLM (always compiled, llama.cpp optional) ─────────────┐  │
  │  │                                                                  │  │
  │  │  LocalLlmProvider ──► generate() / route() / process()         │  │
  │  │       │                                                          │  │
  │  │       ▼                                                          │  │
  │  │  LocalLlmConfig ◄── LocalModelRegistry ◄── scan_models_dir()   │  │
  │  │  LocalModelType  ◄── chat_template() / stop_tokens()           │  │
  │  │  LocalInferenceParams ◄── factual() / creative() / routing()   │  │
  │  │  LocalLlmPool    ──► round-robin multi-instance inference       │  │
  │  └──────────────────────────────────────────────────────────────────┘  │
  │                                                                       │
  │  ┌─── Shared Types ───────────────────────────────────────────────┐   │
  │  │  ProviderType (Anthropic | OpenAI | Google | Ollama | Custom)  │   │
  │  │  ProviderConfig (provider, model, api_key, base_url, options)  │   │
  │  └────────────────────────────────────────────────────────────────┘   │
  └───────────────────────────────────────────────────────────────────────┘
```

## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
brainwires-provider = "0.11"
```

Send a chat request with the Anthropic provider:

```rust
use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let provider = AnthropicProvider::new("sk-ant-...".into(), "claude-sonnet-4-20250514".into());

    let messages = vec![Message::user("Explain the borrow checker in one sentence.")];
    let options = ChatOptions::default();

    let response = provider.chat(&messages, None, &options).await?;
    println!("{}", response.message.text().unwrap_or_default());
    println!("Tokens: {} in, {} out", response.usage.input_tokens, response.usage.output_tokens);

    Ok(())
}
```

## Features

| Feature | Default | Description |
|---------|---------|-------------|
| `native` | Yes | Enables cloud providers (Anthropic, OpenAI, Google, Ollama), `RateLimiter`, `RateLimitedClient`, and their dependencies (`reqwest`, `tokio`, `async-stream`, `tracing`, `uuid`) |
| `llama-cpp-2` | No | Enables local LLM inference via llama.cpp bindings. Heavy dependency (~long compile). Adds `tracing` and `tokio` even without `native` |

```toml
# Default (cloud providers only)
brainwires-provider = "0.11"

# With local LLM support
brainwires-provider = { version = "0.11", features = ["llama-cpp-2"] }

# Local LLM only (no cloud providers)
brainwires-provider = { version = "0.11", default-features = false, features = ["llama-cpp-2"] }
```

## Architecture

### Provider Trait

All providers implement the `Provider` trait from `brainwires-core`. This is the unified interface that callers program against.

| Method | Description |
|--------|-------------|
| `name()` | Provider identifier string (e.g., `"anthropic"`, `"lfm2-350m"`) |
| `max_output_tokens()` | Optional maximum output token limit for the provider |
| `chat(messages, tools, options)` | Non-streaming chat completion returning `ChatResponse` |
| `stream_chat(messages, tools, options)` | Streaming chat returning `BoxStream<Result<StreamChunk>>` |

**`ChatOptions`** controls per-request behavior:

| Field | Type | Description |
|-------|------|-------------|
| `system` | `Option<String>` | System prompt |
| `temperature` | `Option<f32>` | Sampling temperature (0.0–2.0) |
| `max_tokens` | `Option<u32>` | Maximum tokens to generate |
| `stop` | `Option<Vec<String>>` | Stop sequences |

**`StreamChunk`** variants:

| Variant | Description |
|---------|-------------|
| `Text(String)` | Generated text token |
| `Usage(Usage)` | Token usage counts (input + output) |
| `Done` | Stream completion marker |

### ProviderType

Enum identifying the AI provider backend.

| Variant | `as_str()` | `default_model()` |
|---------|-----------|-------------------|
| `Anthropic` | `"anthropic"` | `claude-sonnet-4-20250514` |
| `OpenAI` | `"openai"` | `gpt-5-mini` |
| `Google` | `"google"` | `gemini-2.5-flash` |
| `Ollama` | `"ollama"` | `llama3.3` |
| `Custom` | `"custom"` | `claude-sonnet-4-20250514` |

`FromStr` also accepts aliases: `"gemini"` maps to `Google`, `"brainwires"` maps to `Custom`.

### ProviderConfig

Configuration struct for initializing a provider.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `provider` | `ProviderType` || Provider backend |
| `model` | `String` || Model name |
| `api_key` | `Option<String>` | `None` | API key (skipped in serialization if absent) |
| `base_url` | `Option<String>` | `None` | Custom endpoint URL |
| `options` | `HashMap<String, Value>` | `{}` | Additional provider-specific options (flattened in JSON) |

Builder methods: `new(provider, model)`, `with_api_key(key)`, `with_base_url(url)`.

### RateLimiter

Token-bucket rate limiter using atomic operations for lock-free reads.

| Field | Type | Description |
|-------|------|-------------|
| `tokens` | `AtomicU32` | Current available tokens |
| `max_tokens` | `u32` | Configured requests-per-minute limit |
| `refill_interval` | `Duration` | Time between token refills (`60s / rpm`) |
| `last_refill` | `Mutex<Instant>` | Timestamp of last refill |

| Method | Description |
|--------|-------------|
| `new(requests_per_minute)` | Create a limiter with the given RPM cap |
| `acquire()` | Async — consume one token, wait if depleted |
| `available_tokens()` | Current token count (diagnostic) |
| `max_requests_per_minute()` | Configured limit |

### RateLimitedClient

Wraps `reqwest::Client` with an optional `RateLimiter`. Every outgoing request waits for a token before sending.

| Method | Description |
|--------|-------------|
| `new()` | Create client with no rate limiting |
| `with_rate_limit(rpm)` | Create client with the given RPM limit |
| `from_client(client, rpm)` | Wrap an existing `reqwest::Client` |
| `get(url)` | Build a GET request (waits for token first) |
| `post(url)` | Build a POST request (waits for token first) |
| `inner()` | Access the underlying `reqwest::Client` |
| `available_tokens()` | Returns `Option<u32>``None` if no limiter |

### AnthropicProvider

Implements the `Provider` trait for the Anthropic Messages API (`https://api.anthropic.com/v1/messages`, version `2023-06-01`).

| Constructor | Description |
|-------------|-------------|
| `new(api_key, model)` | Create without rate limiting |
| `with_rate_limit(api_key, model, rpm)` | Create with rate limiting |

**Streaming:** Parses Server-Sent Events (SSE) with `data: ` prefix. Events include `message_start`, `content_block_delta`, `message_delta`, and `message_stop`.

**Internal types:** `AnthropicMessage`, `AnthropicContentBlock` (Text, ToolUse, ToolResult), `AnthropicTool`, `AnthropicResponse`, `AnthropicStreamEvent`, `AnthropicDelta`.

**Message conversion:** System messages are extracted from the message list and sent as a top-level `system` field. All other messages are converted to Anthropic's role/content-block format.

### OpenAIProvider

Implements the `Provider` trait for the OpenAI Chat Completions API (`https://api.openai.com/v1/chat/completions`).

| Constructor | Description |
|-------------|-------------|
| `new(api_key, model)` | Create without rate limiting |
| `with_rate_limit(api_key, model, rpm)` | Create with rate limiting |
| `with_organization(org_id)` | Set the `OpenAI-Organization` header |

**Streaming:** Parses newline-delimited JSON (JSON Lines). Each line is a `data: {json}` SSE chunk with `choices[0].delta`.

**O1/O3 model detection:** `is_o1_model()` detects reasoning models (o1, o3 prefixes) which do not support `temperature`, `max_tokens`, or system messages.

**Image support:** Converts `ContentBlock::Image` to base64-encoded `image_url` content parts.

**Internal types:** `OpenAIMessage`, `OpenAIContent` (Text or Array), `OpenAIContentPart` (Text, ImageUrl, ToolCall), `OpenAITool`, `OpenAIResponse`.

### GoogleProvider

Implements the `Provider` trait for the Gemini API (`https://generativelanguage.googleapis.com/v1beta`).

| Constructor | Description |
|-------------|-------------|
| `new(api_key, model)` | Create without rate limiting |
| `with_rate_limit(api_key, model, rpm)` | Create with rate limiting |

**Streaming:** Uses `text/event-stream` with custom Gemini event format.

**Message conversion:** System messages are filtered out and sent via `systemInstruction`. The assistant role maps to `"model"` in Gemini's API.

**Image support:** Converts images to `inlineData` parts with MIME type and base64 data.

**Internal types:** `GeminiMessage`, `GeminiPart` (Text, InlineData, FunctionCall, FunctionResult), `GeminiTool`, `GeminiResponse`.

### OllamaProvider

Implements the `Provider` trait for the Ollama REST API (default: `http://localhost:11434`).

| Constructor | Description |
|-------------|-------------|
| `new(model, base_url)` | Create with model name and optional custom URL |
| `with_rate_limit(model, base_url, rpm)` | Create with rate limiting |

**Streaming:** Line-delimited JSON where each line contains a `message` field and a `done` boolean.

**Content handling:** Multiple content blocks are flattened into a single concatenated text string, since Ollama's API expects plain text.

**Internal types:** `OllamaMessage`, `OllamaTool`, `OllamaResponse`.

### Local LLM Subsystem

Always compiled (no feature gate). The actual llama.cpp inference requires the `llama-cpp-2` feature.

#### LocalLlmConfig

Configuration for a local GGUF model.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `id` | `String` | `"local-model"` | Unique model identifier |
| `name` | `String` | `"Local Model"` | Human-readable name |
| `model_path` | `PathBuf` || Path to the `.gguf` file |
| `context_size` | `u32` | `4096` | Context window size in tokens |
| `num_threads` | `Option<u32>` | `None` (auto) | CPU threads for inference |
| `batch_size` | `u32` | `512` | Prompt processing batch size |
| `gpu_layers` | `u32` | `0` | GPU layers to offload (0 = CPU only) |
| `use_mmap` | `bool` | `true` | Memory-map model file for faster loading |
| `use_mlock` | `bool` | `false` | Lock model in RAM to prevent swapping |
| `max_tokens` | `u32` | `2048` | Maximum tokens per response |
| `model_type` | `LocalModelType` | `Lfm2` | Model family for prompt formatting |
| `system_template` | `Option<String>` | `None` | Custom system prompt template |
| `supports_tools` | `bool` | `false` | Whether the model handles tool/function calling |
| `estimated_ram_mb` | `Option<u32>` | `None` | Estimated RAM usage (display only) |

**Preset constructors:**

| Preset | Context | RAM | Tools | Description |
|--------|---------|-----|-------|-------------|
| `lfm2_350m(path)` | 32K | 220 MB | No | Fastest, routing and binary decisions |
| `lfm2_1_2b(path)` | 32K | 700 MB | No | Sweet spot for agentic logic |
| `lfm2_2_6b_exp(path)` | 32K | 1.5 GB | Yes | Complex reasoning and tool-calling |
| `granite_nano_350m(path)` | 8K | 250 MB | No | Sub-second CPU responses |
| `granite_nano_1_5b(path)` | 8K | 900 MB | No | Balanced performance |

**Validation:** `validate()` checks model path exists, context size > 0, batch size > 0.

#### LocalModelType

Model family enum that determines chat template formatting and stop tokens.

| Variant | Chat Template Style | Stop Tokens |
|---------|-------------------|-------------|
| `Lfm2` | `<\|system\|>...<\|end\|>` | `<\|end\|>`, `<\|user\|>` |
| `Lfm2Agentic` | Same as Lfm2 | Same as Lfm2 |
| `Granite` | `<\|system\|>...\n` | `<\|user\|>`, `<\|system\|>` |
| `Qwen` | `<\|im_start\|>...<\|im_end\|>` | `<\|im_end\|>`, `<\|im_start\|>` |
| `Llama` | `<\|begin_of_text\|>...<\|eot_id\|>` | `<\|eot_id\|>`, `<\|start_header_id\|>` |
| `Phi` | Same as Lfm2 | Same as Lfm2 |
| `Generic` | `### System:...\n### User:...` | `### User:`, `### System:` |

Methods: `chat_template()`, `stop_tokens()`.

#### LocalInferenceParams

Per-request sampling parameters.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `temperature` | `f32` | `0.7` | Sampling temperature (0.0 = deterministic) |
| `top_p` | `f32` | `0.9` | Nucleus sampling threshold |
| `top_k` | `u32` | `40` | Top-k sampling parameter |
| `repeat_penalty` | `f32` | `1.1` | Repetition penalty (1.0 = none) |
| `max_tokens` | `u32` | `2048` | Maximum tokens to generate |
| `stop_sequences` | `Vec<String>` | `[]` | Custom stop sequences |

**Presets:**

| Preset | Temperature | Top-k | Max Tokens | Use Case |
|--------|------------|-------|------------|----------|
| `factual()` | 0.1 | 20 | 1024 | Deterministic, factual responses |
| `creative()` | 0.9 | 50 | 2048 | Varied, creative output |
| `routing()` | 0.0 | 1 | 50 | Classification and routing |

#### LocalModelRegistry

Manages registered local models with persistence to `~/.config/brainwires/local_models.json`.

| Method | Description |
|--------|-------------|
| `new()` | Create an empty registry |
| `with_default_dir()` | Create with default models directory (`~/.local/share/brainwires/models/`) |
| `register(config)` | Add a model configuration |
| `get(id)` | Get model by ID |
| `get_default()` | Get the default model |
| `set_default(id)` | Set the default model (returns `false` if ID not found) |
| `remove(id)` | Remove a model (clears default if it was the removed model) |
| `list()` | List all registered models |
| `scan_models_dir()` | Auto-discover `.gguf` files and register them with detected model types |
| `load()` | Load registry from config file |
| `save()` | Save registry to config file |

**Auto-detection:** `scan_models_dir()` reads the models directory, infers `LocalModelType` from filenames (e.g., `lfm2` → `Lfm2`, `granite` → `Granite`), and estimates context size and RAM from model size indicators in the filename.

#### KnownModel

Pre-configured model definitions for easy discovery and downloading.

| Field | Description |
|-------|-------------|
| `id` | Model identifier (e.g., `"lfm2-1.2b"`) |
| `name` | Human-readable name |
| `huggingface_repo` | HuggingFace repository path |
| `filename` | Expected GGUF filename |
| `model_type` | `LocalModelType` variant |
| `context_size` | Context window size |
| `estimated_ram_mb` | RAM requirement |
| `supports_tools` | Tool-calling support |
| `description` | Short description |

Access via `known_models()` (full list) or `get_known_model(id)` (by ID).

#### LocalLlmProvider

Implements the `Provider` trait for local GGUF model inference. Lazy-loads the model on first use.

| Method | Description |
|--------|-------------|
| `new(config)` | Create provider (validates config, does not load model) |
| `lfm2_350m(path)` | Shorthand for LFM2 350M preset |
| `lfm2_1_2b(path)` | Shorthand for LFM2 1.2B preset |
| `config()` | Get the model configuration |
| `is_loaded()` | Check if model is in memory |
| `load()` | Load model into memory (initializes llama.cpp backend) |
| `unload()` | Release model from memory |
| `generate(prompt, params)` | Generate text with custom parameters |
| `route(prompt)` | Quick routing/classification (deterministic params) |
| `process(prompt)` | Summarization/processing (factual params) |

Without the `llama-cpp-2` feature, `load()` and `generate()` return an error directing the user to enable the feature.

#### LocalLlmPool

Round-robin pool of `LocalLlmProvider` instances for parallel inference.

| Method | Description |
|--------|-------------|
| `new(config, instances)` | Create pool with N identical provider instances |
| `next()` | Get the next provider (round-robin via `AtomicUsize`) |
| `load_all()` | Load all models in the pool |
| `unload_all()` | Unload all models |
| `size()` | Number of instances |
| `estimated_ram_mb()` | Total estimated RAM for the pool |

### LocalLlmConfigError

| Variant | Description |
|---------|-------------|
| `MissingModelPath` | Model path is empty |
| `ModelNotFound(PathBuf)` | File does not exist at the given path |
| `InvalidContextSize` | Context size is 0 |
| `InvalidBatchSize` | Batch size is 0 |
| `ModelLoadError(String)` | llama.cpp failed to load the model |
| `InferenceError(String)` | Error during token generation |

## Usage Examples

### Stream a response from OpenAI

```rust
use brainwires_providers::{OpenAIProvider, Provider, ChatOptions};
use brainwires_core::message::{Message, StreamChunk};
use futures::StreamExt;

let provider = OpenAIProvider::new("sk-...".into(), "gpt-5-mini".into());

let messages = vec![Message::user("Write a haiku about Rust.")];
let options = ChatOptions::default();

let mut stream = provider.stream_chat(&messages, None, &options);
while let Some(chunk) = stream.next().await {
    match chunk? {
        StreamChunk::Text(text) => print!("{}", text),
        StreamChunk::Usage(usage) => {
            println!("\n[{} in, {} out]", usage.input_tokens, usage.output_tokens);
        }
        StreamChunk::Done => break,
    }
}
```

### Use tools with the Anthropic provider

```rust
use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;
use brainwires_core::tool::Tool;

let provider = AnthropicProvider::new("sk-ant-...".into(), "claude-sonnet-4-20250514".into());

let tools = vec![Tool {
    name: "get_weather".into(),
    description: Some("Get current weather for a city".into()),
    input_schema: serde_json::json!({
        "type": "object",
        "properties": {
            "city": { "type": "string" }
        },
        "required": ["city"]
    }),
    ..Default::default()
}];

let messages = vec![Message::user("What's the weather in Seattle?")];
let options = ChatOptions::default();

let response = provider.chat(&messages, Some(&tools), &options).await?;
// response.message may contain tool_use content blocks
```

### Rate-limited HTTP requests

```rust
use brainwires_providers::{RateLimitedClient, RateLimiter};

// Standalone rate limiter
let limiter = RateLimiter::new(60); // 60 RPM
limiter.acquire().await; // blocks if depleted

// Rate-limited HTTP client
let client = RateLimitedClient::with_rate_limit(120); // 120 RPM
let response = client.post("https://api.example.com/v1/chat")
    .await
    .json(&body)
    .send()
    .await?;

println!("Tokens remaining: {:?}", client.available_tokens());
```

### Provider with rate limiting

```rust
use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;

// Create provider with 60 requests-per-minute limit
let provider = AnthropicProvider::with_rate_limit(
    "sk-ant-...".into(),
    "claude-sonnet-4-20250514".into(),
    60,
);

let messages = vec![Message::user("Hello!")];
let response = provider.chat(&messages, None, &ChatOptions::default()).await?;
```

### Configure a provider with ProviderConfig

```rust
use brainwires_providers::{ProviderType, ProviderConfig};

let config = ProviderConfig::new(ProviderType::OpenAI, "gpt-5-mini".into())
    .with_api_key("sk-...")
    .with_base_url("https://custom-openai-proxy.example.com/v1");

assert_eq!(config.provider.default_model(), "gpt-5-mini");
assert_eq!(config.provider.as_str(), "openai");

// Parse provider from string
let provider_type: ProviderType = "gemini".parse()?; // → Google
```

### Local LLM inference

```rust
use brainwires_providers::{LocalLlmProvider, LocalLlmConfig, LocalInferenceParams};
use std::path::PathBuf;

// Create provider from a preset
let provider = LocalLlmProvider::lfm2_1_2b(PathBuf::from("/models/lfm2-1.2b-q8_0.gguf"))?;

// Load model into memory
provider.load().await?;

// Quick routing (deterministic, max 50 tokens)
let route = provider.route("Classify: 'fix the login bug' → [code, question, chat]").await?;

// Full inference with custom params
let result = provider.generate(
    "Explain ownership in Rust briefly.",
    &LocalInferenceParams::factual(),
).await?;

// Or use via the Provider trait
use brainwires_providers::{Provider, ChatOptions};
use brainwires_core::message::Message;

let messages = vec![Message::user("Summarize this code.")];
let response = provider.chat(&messages, None, &ChatOptions::default()).await?;

// Unload when done
provider.unload().await;
```

### Model registry and auto-discovery

```rust
use brainwires_providers::{LocalModelRegistry, LocalLlmConfig, known_models, get_known_model};
use std::path::PathBuf;

// Load or create registry
let mut registry = LocalModelRegistry::load()?;

// Register a model manually
registry.register(LocalLlmConfig::lfm2_350m(PathBuf::from("/models/lfm2-350m.gguf")));
registry.set_default("lfm2-350m");

// Auto-discover GGUF files in the models directory
let discovered = registry.scan_models_dir()?;
for id in &discovered {
    println!("Found: {}", id);
}

// Browse known/recommended models
for model in known_models() {
    println!("{}: {} ({}MB RAM) — {}", model.id, model.name, model.estimated_ram_mb, model.description);
}

// Save registry
registry.save()?;
```

### Local LLM pool for parallel inference

```rust
use brainwires_providers::{LocalLlmPool, LocalLlmConfig};
use std::path::PathBuf;

let config = LocalLlmConfig::lfm2_350m(PathBuf::from("/models/lfm2-350m.gguf"));
let pool = LocalLlmPool::new(config, 4)?; // 4 instances

pool.load_all().await?;
println!("Pool RAM: ~{}MB", pool.estimated_ram_mb().unwrap_or(0));

// Round-robin across instances
let provider = pool.next();
let result = provider.route("classify this input").await?;

pool.unload_all().await;
```

## Integration

Use via the `brainwires` facade crate with the `providers` feature, or depend on `brainwires-provider` directly:

```toml
# Via facade
[dependencies]
brainwires = { version = "0.11", features = ["providers"] }

# Direct
[dependencies]
brainwires-provider = "0.11"
```

Re-exports at crate root for convenience:

```rust
use brainwires_providers::{
    // Trait + options (from brainwires-core)
    Provider, ChatOptions,
    // Cloud providers (native)
    AnthropicProvider, OpenAIProvider, GoogleProvider, OllamaProvider,
    // Rate limiting (native)
    RateLimiter, RateLimitedClient,
    // Shared types
    ProviderType, ProviderConfig,
    // Local LLM (always available)
    LocalLlmProvider, LocalLlmConfig, LocalModelType,
    LocalInferenceParams, LocalModelRegistry, LocalLlmPool,
    LocalLlmConfigError, KnownModel, known_models, get_known_model,
};
```

## License

Licensed under the MIT License. See [LICENSE](../../LICENSE) for details.