# brainwires-provider
[](https://crates.io/crates/brainwires-provider)
[](https://docs.rs/brainwires-provider)
[](LICENSE)
AI provider implementations for the Brainwires Agent Framework.
## Overview
`brainwires-provider` provides concrete implementations of the `Provider` trait for multiple AI services: Anthropic (Claude), OpenAI (GPT), Google (Gemini), Ollama, and local LLM inference via llama.cpp. Every provider converts to and from the unified `brainwires-core` message types, so callers can swap backends without changing application code.
**Design principles:**
- **Unified interface** — all providers implement the same `Provider` trait from `brainwires-core` (chat, streaming, tool calling)
- **Feature-gated backends** — cloud providers compile under `native` (default); local LLM compiles always; llama.cpp is behind `llama-cpp-2`
- **Rate limiting built in** — token-bucket `RateLimiter` and `RateLimitedClient` available to any provider
- **Streaming-first** — every provider returns `BoxStream<Result<StreamChunk>>` via `async_stream`
- **Tool calling** — Anthropic, OpenAI, Google, and Ollama all support function calling mapped to/from `brainwires_core::Tool`
- **Local inference** — CPU-optimized GGUF model support with model registry, preset configs, and inference parameter tuning
```text
┌───────────────────────────────────────────────────────────────────────┐
│ brainwires-provider │
│ │
│ ┌─── Provider trait (brainwires-core) ────────────────────────────┐ │
│ │ chat() ──► ChatResponse │ │
│ │ stream_chat() ──► BoxStream<StreamChunk> │ │
│ │ name() ──► &str │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─── Cloud Providers (feature = "native") ────────────────────────┐ │
│ │ │ │
│ │ AnthropicProvider ──► SSE streaming ──► api.anthropic.com │ │
│ │ OpenAIProvider ──► JSON Lines ──► api.openai.com │ │
│ │ GoogleProvider ──► event-stream ──► generativelanguage.… │ │
│ │ OllamaProvider ──► line-delim JSON ► localhost:11434 │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ RateLimitedClient ──► RateLimiter (token-bucket) │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─── Local LLM (always compiled, llama.cpp optional) ─────────────┐ │
│ │ │ │
│ │ LocalLlmProvider ──► generate() / route() / process() │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ LocalLlmConfig ◄── LocalModelRegistry ◄── scan_models_dir() │ │
│ │ LocalModelType ◄── chat_template() / stop_tokens() │ │
│ │ LocalInferenceParams ◄── factual() / creative() / routing() │ │
│ │ LocalLlmPool ──► round-robin multi-instance inference │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─── Shared Types ───────────────────────────────────────────────┐ │
│ └────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────┘
```
## Quick Start
Add to your `Cargo.toml`:
```toml
[dependencies]
brainwires-provider = "0.11"
```
Send a chat request with the Anthropic provider:
```rust
use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let provider = AnthropicProvider::new("sk-ant-...".into(), "claude-sonnet-4-20250514".into());
let messages = vec![Message::user("Explain the borrow checker in one sentence.")];
let options = ChatOptions::default();
let response = provider.chat(&messages, None, &options).await?;
println!("{}", response.message.text().unwrap_or_default());
println!("Tokens: {} in, {} out", response.usage.input_tokens, response.usage.output_tokens);
Ok(())
}
```
## Features
| `native` | Yes | Enables cloud providers (Anthropic, OpenAI, Google, Ollama), `RateLimiter`, `RateLimitedClient`, and their dependencies (`reqwest`, `tokio`, `async-stream`, `tracing`, `uuid`) |
| `llama-cpp-2` | No | Enables local LLM inference via llama.cpp bindings. Heavy dependency (~long compile). Adds `tracing` and `tokio` even without `native` |
```toml
# Default (cloud providers only)
brainwires-provider = "0.11"
# With local LLM support
brainwires-provider = { version = "0.11", features = ["llama-cpp-2"] }
# Local LLM only (no cloud providers)
brainwires-provider = { version = "0.11", default-features = false, features = ["llama-cpp-2"] }
```
## Architecture
### Provider Trait
All providers implement the `Provider` trait from `brainwires-core`. This is the unified interface that callers program against.
| `name()` | Provider identifier string (e.g., `"anthropic"`, `"lfm2-350m"`) |
| `max_output_tokens()` | Optional maximum output token limit for the provider |
| `chat(messages, tools, options)` | Non-streaming chat completion returning `ChatResponse` |
| `stream_chat(messages, tools, options)` | Streaming chat returning `BoxStream<Result<StreamChunk>>` |
**`ChatOptions`** controls per-request behavior:
| `system` | `Option<String>` | System prompt |
| `temperature` | `Option<f32>` | Sampling temperature (0.0–2.0) |
| `max_tokens` | `Option<u32>` | Maximum tokens to generate |
| `stop` | `Option<Vec<String>>` | Stop sequences |
**`StreamChunk`** variants:
| `Text(String)` | Generated text token |
| `Usage(Usage)` | Token usage counts (input + output) |
| `Done` | Stream completion marker |
### ProviderType
Enum identifying the AI provider backend.
| `Anthropic` | `"anthropic"` | `claude-sonnet-4-20250514` |
| `OpenAI` | `"openai"` | `gpt-5-mini` |
| `Google` | `"google"` | `gemini-2.5-flash` |
| `Ollama` | `"ollama"` | `llama3.3` |
| `Custom` | `"custom"` | `claude-sonnet-4-20250514` |
`FromStr` also accepts aliases: `"gemini"` maps to `Google`, `"brainwires"` maps to `Custom`.
### ProviderConfig
Configuration struct for initializing a provider.
| `provider` | `ProviderType` | — | Provider backend |
| `model` | `String` | — | Model name |
| `api_key` | `Option<String>` | `None` | API key (skipped in serialization if absent) |
| `base_url` | `Option<String>` | `None` | Custom endpoint URL |
| `options` | `HashMap<String, Value>` | `{}` | Additional provider-specific options (flattened in JSON) |
Builder methods: `new(provider, model)`, `with_api_key(key)`, `with_base_url(url)`.
### RateLimiter
Token-bucket rate limiter using atomic operations for lock-free reads.
| `tokens` | `AtomicU32` | Current available tokens |
| `max_tokens` | `u32` | Configured requests-per-minute limit |
| `refill_interval` | `Duration` | Time between token refills (`60s / rpm`) |
| `last_refill` | `Mutex<Instant>` | Timestamp of last refill |
| `new(requests_per_minute)` | Create a limiter with the given RPM cap |
| `acquire()` | Async — consume one token, wait if depleted |
| `available_tokens()` | Current token count (diagnostic) |
| `max_requests_per_minute()` | Configured limit |
### RateLimitedClient
Wraps `reqwest::Client` with an optional `RateLimiter`. Every outgoing request waits for a token before sending.
| `new()` | Create client with no rate limiting |
| `with_rate_limit(rpm)` | Create client with the given RPM limit |
| `from_client(client, rpm)` | Wrap an existing `reqwest::Client` |
| `get(url)` | Build a GET request (waits for token first) |
| `post(url)` | Build a POST request (waits for token first) |
| `inner()` | Access the underlying `reqwest::Client` |
| `available_tokens()` | Returns `Option<u32>` — `None` if no limiter |
### AnthropicProvider
Implements the `Provider` trait for the Anthropic Messages API (`https://api.anthropic.com/v1/messages`, version `2023-06-01`).
| `new(api_key, model)` | Create without rate limiting |
| `with_rate_limit(api_key, model, rpm)` | Create with rate limiting |
**Streaming:** Parses Server-Sent Events (SSE) with `data: ` prefix. Events include `message_start`, `content_block_delta`, `message_delta`, and `message_stop`.
**Internal types:** `AnthropicMessage`, `AnthropicContentBlock` (Text, ToolUse, ToolResult), `AnthropicTool`, `AnthropicResponse`, `AnthropicStreamEvent`, `AnthropicDelta`.
**Message conversion:** System messages are extracted from the message list and sent as a top-level `system` field. All other messages are converted to Anthropic's role/content-block format.
### OpenAIProvider
Implements the `Provider` trait for the OpenAI Chat Completions API (`https://api.openai.com/v1/chat/completions`).
| `new(api_key, model)` | Create without rate limiting |
| `with_rate_limit(api_key, model, rpm)` | Create with rate limiting |
| `with_organization(org_id)` | Set the `OpenAI-Organization` header |
**Streaming:** Parses newline-delimited JSON (JSON Lines). Each line is a `data: {json}` SSE chunk with `choices[0].delta`.
**O1/O3 model detection:** `is_o1_model()` detects reasoning models (o1, o3 prefixes) which do not support `temperature`, `max_tokens`, or system messages.
**Image support:** Converts `ContentBlock::Image` to base64-encoded `image_url` content parts.
**Internal types:** `OpenAIMessage`, `OpenAIContent` (Text or Array), `OpenAIContentPart` (Text, ImageUrl, ToolCall), `OpenAITool`, `OpenAIResponse`.
### GoogleProvider
Implements the `Provider` trait for the Gemini API (`https://generativelanguage.googleapis.com/v1beta`).
| `new(api_key, model)` | Create without rate limiting |
| `with_rate_limit(api_key, model, rpm)` | Create with rate limiting |
**Streaming:** Uses `text/event-stream` with custom Gemini event format.
**Message conversion:** System messages are filtered out and sent via `systemInstruction`. The assistant role maps to `"model"` in Gemini's API.
**Image support:** Converts images to `inlineData` parts with MIME type and base64 data.
**Internal types:** `GeminiMessage`, `GeminiPart` (Text, InlineData, FunctionCall, FunctionResult), `GeminiTool`, `GeminiResponse`.
### OllamaProvider
Implements the `Provider` trait for the Ollama REST API (default: `http://localhost:11434`).
| `new(model, base_url)` | Create with model name and optional custom URL |
| `with_rate_limit(model, base_url, rpm)` | Create with rate limiting |
**Streaming:** Line-delimited JSON where each line contains a `message` field and a `done` boolean.
**Content handling:** Multiple content blocks are flattened into a single concatenated text string, since Ollama's API expects plain text.
**Internal types:** `OllamaMessage`, `OllamaTool`, `OllamaResponse`.
### Local LLM Subsystem
Always compiled (no feature gate). The actual llama.cpp inference requires the `llama-cpp-2` feature.
#### LocalLlmConfig
Configuration for a local GGUF model.
| `id` | `String` | `"local-model"` | Unique model identifier |
| `name` | `String` | `"Local Model"` | Human-readable name |
| `model_path` | `PathBuf` | — | Path to the `.gguf` file |
| `context_size` | `u32` | `4096` | Context window size in tokens |
| `num_threads` | `Option<u32>` | `None` (auto) | CPU threads for inference |
| `batch_size` | `u32` | `512` | Prompt processing batch size |
| `gpu_layers` | `u32` | `0` | GPU layers to offload (0 = CPU only) |
| `use_mmap` | `bool` | `true` | Memory-map model file for faster loading |
| `use_mlock` | `bool` | `false` | Lock model in RAM to prevent swapping |
| `max_tokens` | `u32` | `2048` | Maximum tokens per response |
| `model_type` | `LocalModelType` | `Lfm2` | Model family for prompt formatting |
| `system_template` | `Option<String>` | `None` | Custom system prompt template |
| `supports_tools` | `bool` | `false` | Whether the model handles tool/function calling |
| `estimated_ram_mb` | `Option<u32>` | `None` | Estimated RAM usage (display only) |
**Preset constructors:**
| `lfm2_350m(path)` | 32K | 220 MB | No | Fastest, routing and binary decisions |
| `lfm2_1_2b(path)` | 32K | 700 MB | No | Sweet spot for agentic logic |
| `lfm2_2_6b_exp(path)` | 32K | 1.5 GB | Yes | Complex reasoning and tool-calling |
| `granite_nano_350m(path)` | 8K | 250 MB | No | Sub-second CPU responses |
| `granite_nano_1_5b(path)` | 8K | 900 MB | No | Balanced performance |
**Validation:** `validate()` checks model path exists, context size > 0, batch size > 0.
#### LocalModelType
Model family enum that determines chat template formatting and stop tokens.
| `Lfm2` | `<\|system\|>...<\|end\|>` | `<\|end\|>`, `<\|user\|>` |
| `Lfm2Agentic` | Same as Lfm2 | Same as Lfm2 |
| `Granite` | `<\|system\|>...\n` | `<\|user\|>`, `<\|system\|>` |
| `Qwen` | `<\|im_start\|>...<\|im_end\|>` | `<\|im_end\|>`, `<\|im_start\|>` |
| `Llama` | `<\|begin_of_text\|>...<\|eot_id\|>` | `<\|eot_id\|>`, `<\|start_header_id\|>` |
| `Phi` | Same as Lfm2 | Same as Lfm2 |
| `Generic` | `### System:...\n### User:...` | `### User:`, `### System:` |
Methods: `chat_template()`, `stop_tokens()`.
#### LocalInferenceParams
Per-request sampling parameters.
| `temperature` | `f32` | `0.7` | Sampling temperature (0.0 = deterministic) |
| `top_p` | `f32` | `0.9` | Nucleus sampling threshold |
| `top_k` | `u32` | `40` | Top-k sampling parameter |
| `repeat_penalty` | `f32` | `1.1` | Repetition penalty (1.0 = none) |
| `max_tokens` | `u32` | `2048` | Maximum tokens to generate |
| `stop_sequences` | `Vec<String>` | `[]` | Custom stop sequences |
**Presets:**
| `factual()` | 0.1 | 20 | 1024 | Deterministic, factual responses |
| `creative()` | 0.9 | 50 | 2048 | Varied, creative output |
| `routing()` | 0.0 | 1 | 50 | Classification and routing |
#### LocalModelRegistry
Manages registered local models with persistence to `~/.config/brainwires/local_models.json`.
| `new()` | Create an empty registry |
| `with_default_dir()` | Create with default models directory (`~/.local/share/brainwires/models/`) |
| `register(config)` | Add a model configuration |
| `get(id)` | Get model by ID |
| `get_default()` | Get the default model |
| `set_default(id)` | Set the default model (returns `false` if ID not found) |
| `remove(id)` | Remove a model (clears default if it was the removed model) |
| `list()` | List all registered models |
| `scan_models_dir()` | Auto-discover `.gguf` files and register them with detected model types |
| `load()` | Load registry from config file |
| `save()` | Save registry to config file |
**Auto-detection:** `scan_models_dir()` reads the models directory, infers `LocalModelType` from filenames (e.g., `lfm2` → `Lfm2`, `granite` → `Granite`), and estimates context size and RAM from model size indicators in the filename.
#### KnownModel
Pre-configured model definitions for easy discovery and downloading.
| `id` | Model identifier (e.g., `"lfm2-1.2b"`) |
| `name` | Human-readable name |
| `huggingface_repo` | HuggingFace repository path |
| `filename` | Expected GGUF filename |
| `model_type` | `LocalModelType` variant |
| `context_size` | Context window size |
| `estimated_ram_mb` | RAM requirement |
| `supports_tools` | Tool-calling support |
| `description` | Short description |
Access via `known_models()` (full list) or `get_known_model(id)` (by ID).
#### LocalLlmProvider
Implements the `Provider` trait for local GGUF model inference. Lazy-loads the model on first use.
| `new(config)` | Create provider (validates config, does not load model) |
| `lfm2_350m(path)` | Shorthand for LFM2 350M preset |
| `lfm2_1_2b(path)` | Shorthand for LFM2 1.2B preset |
| `config()` | Get the model configuration |
| `is_loaded()` | Check if model is in memory |
| `load()` | Load model into memory (initializes llama.cpp backend) |
| `unload()` | Release model from memory |
| `generate(prompt, params)` | Generate text with custom parameters |
| `route(prompt)` | Quick routing/classification (deterministic params) |
| `process(prompt)` | Summarization/processing (factual params) |
Without the `llama-cpp-2` feature, `load()` and `generate()` return an error directing the user to enable the feature.
#### LocalLlmPool
Round-robin pool of `LocalLlmProvider` instances for parallel inference.
| `new(config, instances)` | Create pool with N identical provider instances |
| `next()` | Get the next provider (round-robin via `AtomicUsize`) |
| `load_all()` | Load all models in the pool |
| `unload_all()` | Unload all models |
| `size()` | Number of instances |
| `estimated_ram_mb()` | Total estimated RAM for the pool |
### LocalLlmConfigError
| `MissingModelPath` | Model path is empty |
| `ModelNotFound(PathBuf)` | File does not exist at the given path |
| `InvalidContextSize` | Context size is 0 |
| `InvalidBatchSize` | Batch size is 0 |
| `ModelLoadError(String)` | llama.cpp failed to load the model |
| `InferenceError(String)` | Error during token generation |
## Usage Examples
### Stream a response from OpenAI
```rust
use brainwires_providers::{OpenAIProvider, Provider, ChatOptions};
use brainwires_core::message::{Message, StreamChunk};
use futures::StreamExt;
let provider = OpenAIProvider::new("sk-...".into(), "gpt-5-mini".into());
let messages = vec![Message::user("Write a haiku about Rust.")];
let options = ChatOptions::default();
let mut stream = provider.stream_chat(&messages, None, &options);
while let Some(chunk) = stream.next().await {
match chunk? {
StreamChunk::Text(text) => print!("{}", text),
StreamChunk::Usage(usage) => {
println!("\n[{} in, {} out]", usage.input_tokens, usage.output_tokens);
}
StreamChunk::Done => break,
}
}
```
### Use tools with the Anthropic provider
```rust
use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;
use brainwires_core::tool::Tool;
let provider = AnthropicProvider::new("sk-ant-...".into(), "claude-sonnet-4-20250514".into());
let tools = vec![Tool {
name: "get_weather".into(),
description: Some("Get current weather for a city".into()),
input_schema: serde_json::json!({
"type": "object",
"properties": {
"city": { "type": "string" }
},
"required": ["city"]
}),
..Default::default()
}];
let messages = vec![Message::user("What's the weather in Seattle?")];
let options = ChatOptions::default();
let response = provider.chat(&messages, Some(&tools), &options).await?;
// response.message may contain tool_use content blocks
```
### Rate-limited HTTP requests
```rust
use brainwires_providers::{RateLimitedClient, RateLimiter};
// Standalone rate limiter
let limiter = RateLimiter::new(60); // 60 RPM
limiter.acquire().await; // blocks if depleted
// Rate-limited HTTP client
let client = RateLimitedClient::with_rate_limit(120); // 120 RPM
let response = client.post("https://api.example.com/v1/chat")
.await
.json(&body)
.send()
.await?;
println!("Tokens remaining: {:?}", client.available_tokens());
```
### Provider with rate limiting
```rust
use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;
// Create provider with 60 requests-per-minute limit
let provider = AnthropicProvider::with_rate_limit(
"sk-ant-...".into(),
"claude-sonnet-4-20250514".into(),
60,
);
let messages = vec![Message::user("Hello!")];
let response = provider.chat(&messages, None, &ChatOptions::default()).await?;
```
### Configure a provider with ProviderConfig
```rust
use brainwires_providers::{ProviderType, ProviderConfig};
let config = ProviderConfig::new(ProviderType::OpenAI, "gpt-5-mini".into())
.with_api_key("sk-...")
.with_base_url("https://custom-openai-proxy.example.com/v1");
assert_eq!(config.provider.default_model(), "gpt-5-mini");
assert_eq!(config.provider.as_str(), "openai");
// Parse provider from string
let provider_type: ProviderType = "gemini".parse()?; // → Google
```
### Local LLM inference
```rust
use brainwires_providers::{LocalLlmProvider, LocalLlmConfig, LocalInferenceParams};
use std::path::PathBuf;
// Create provider from a preset
let provider = LocalLlmProvider::lfm2_1_2b(PathBuf::from("/models/lfm2-1.2b-q8_0.gguf"))?;
// Load model into memory
provider.load().await?;
// Quick routing (deterministic, max 50 tokens)
let route = provider.route("Classify: 'fix the login bug' → [code, question, chat]").await?;
// Full inference with custom params
let result = provider.generate(
"Explain ownership in Rust briefly.",
&LocalInferenceParams::factual(),
).await?;
// Or use via the Provider trait
use brainwires_providers::{Provider, ChatOptions};
use brainwires_core::message::Message;
let messages = vec![Message::user("Summarize this code.")];
let response = provider.chat(&messages, None, &ChatOptions::default()).await?;
// Unload when done
provider.unload().await;
```
### Model registry and auto-discovery
```rust
use brainwires_providers::{LocalModelRegistry, LocalLlmConfig, known_models, get_known_model};
use std::path::PathBuf;
// Load or create registry
let mut registry = LocalModelRegistry::load()?;
// Register a model manually
registry.register(LocalLlmConfig::lfm2_350m(PathBuf::from("/models/lfm2-350m.gguf")));
registry.set_default("lfm2-350m");
// Auto-discover GGUF files in the models directory
let discovered = registry.scan_models_dir()?;
for id in &discovered {
println!("Found: {}", id);
}
// Browse known/recommended models
for model in known_models() {
println!("{}: {} ({}MB RAM) — {}", model.id, model.name, model.estimated_ram_mb, model.description);
}
// Save registry
registry.save()?;
```
### Local LLM pool for parallel inference
```rust
use brainwires_providers::{LocalLlmPool, LocalLlmConfig};
use std::path::PathBuf;
let config = LocalLlmConfig::lfm2_350m(PathBuf::from("/models/lfm2-350m.gguf"));
let pool = LocalLlmPool::new(config, 4)?; // 4 instances
pool.load_all().await?;
println!("Pool RAM: ~{}MB", pool.estimated_ram_mb().unwrap_or(0));
// Round-robin across instances
let provider = pool.next();
let result = provider.route("classify this input").await?;
pool.unload_all().await;
```
## Integration
Use via the `brainwires` facade crate with the `providers` feature, or depend on `brainwires-provider` directly:
```toml
# Via facade
[dependencies]
brainwires = { version = "0.11", features = ["providers"] }
# Direct
[dependencies]
brainwires-provider = "0.11"
```
Re-exports at crate root for convenience:
```rust
use brainwires_providers::{
// Trait + options (from brainwires-core)
Provider, ChatOptions,
// Cloud providers (native)
AnthropicProvider, OpenAIProvider, GoogleProvider, OllamaProvider,
// Rate limiting (native)
RateLimiter, RateLimitedClient,
// Shared types
ProviderType, ProviderConfig,
// Local LLM (always available)
LocalLlmProvider, LocalLlmConfig, LocalModelType,
LocalInferenceParams, LocalModelRegistry, LocalLlmPool,
LocalLlmConfigError, KnownModel, known_models, get_known_model,
};
```
## License
Licensed under the MIT License. See [LICENSE](../../LICENSE) for details.