brainwires-providers

AI provider implementations for the Brainwires Agent Framework.

Overview

brainwires-providers provides concrete implementations of the Provider trait for multiple AI services: Anthropic (Claude), OpenAI (GPT), Google (Gemini), Ollama, and local LLM inference via llama.cpp. Every provider converts to and from the unified brainwires-core message types, so callers can swap backends without changing application code.

Design principles:

Unified interface — all providers implement the same Provider trait from brainwires-core (chat, streaming, tool calling)
Feature-gated backends — cloud providers compile under native (default); local LLM compiles always; llama.cpp is behind llama-cpp-2
Rate limiting built in — token-bucket RateLimiter and RateLimitedClient available to any provider
Streaming-first — every provider returns BoxStream<Result<StreamChunk>> via async_stream
Tool calling — Anthropic, OpenAI, Google, and Ollama all support function calling mapped to/from brainwires_core::Tool
Local inference — CPU-optimized GGUF model support with model registry, preset configs, and inference parameter tuning

  ┌───────────────────────────────────────────────────────────────────────┐
  │                        brainwires-providers                           │
  │                                                                       │
  │  ┌─── Provider trait (brainwires-core) ────────────────────────────┐  │
  │  │  chat()        ──► ChatResponse                                 │  │
  │  │  stream_chat() ──► BoxStream<StreamChunk>                       │  │
  │  │  name()        ──► &str                                         │  │
  │  └─────────────────────────────────────────────────────────────────┘  │
  │           │                                                           │
  │           ▼                                                           │
  │  ┌─── Cloud Providers (feature = "native") ────────────────────────┐  │
  │  │                                                                  │  │
  │  │  AnthropicProvider ──► SSE streaming ──► api.anthropic.com      │  │
  │  │  OpenAIProvider    ──► JSON Lines    ──► api.openai.com         │  │
  │  │  GoogleProvider    ──► event-stream  ──► generativelanguage.…   │  │
  │  │  OllamaProvider    ──► line-delim JSON ► localhost:11434        │  │
  │  │           │                                                      │  │
  │  │           ▼                                                      │  │
  │  │  RateLimitedClient ──► RateLimiter (token-bucket)               │  │
  │  └──────────────────────────────────────────────────────────────────┘  │
  │                                                                       │
  │  ┌─── Local LLM (always compiled, llama.cpp optional) ─────────────┐  │
  │  │                                                                  │  │
  │  │  LocalLlmProvider ──► generate() / route() / process()         │  │
  │  │       │                                                          │  │
  │  │       ▼                                                          │  │
  │  │  LocalLlmConfig ◄── LocalModelRegistry ◄── scan_models_dir()   │  │
  │  │  LocalModelType  ◄── chat_template() / stop_tokens()           │  │
  │  │  LocalInferenceParams ◄── factual() / creative() / routing()   │  │
  │  │  LocalLlmPool    ──► round-robin multi-instance inference       │  │
  │  └──────────────────────────────────────────────────────────────────┘  │
  │                                                                       │
  │  ┌─── Shared Types ───────────────────────────────────────────────┐   │
  │  │  ProviderType (Anthropic | OpenAI | Google | Ollama | Custom)  │   │
  │  │  ProviderConfig (provider, model, api_key, base_url, options)  │   │
  │  └────────────────────────────────────────────────────────────────┘   │
  └───────────────────────────────────────────────────────────────────────┘

Quick Start

Add to your Cargo.toml:

[dependencies]
brainwires-providers = "0.6"

Send a chat request with the Anthropic provider:

use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let provider = AnthropicProvider::new("sk-ant-...".into(), "claude-sonnet-4-20250514".into());

    let messages = vec![Message::user("Explain the borrow checker in one sentence.")];
    let options = ChatOptions::default();

    let response = provider.chat(&messages, None, &options).await?;
    println!("{}", response.message.text().unwrap_or_default());
    println!("Tokens: {} in, {} out", response.usage.input_tokens, response.usage.output_tokens);

    Ok(())
}

Features

Feature	Default	Description
`native`	Yes	Enables cloud providers (Anthropic, OpenAI, Google, Ollama), `RateLimiter`, `RateLimitedClient`, and their dependencies (`reqwest`, `tokio`, `async-stream`, `tracing`, `uuid`)
`llama-cpp-2`	No	Enables local LLM inference via llama.cpp bindings. Heavy dependency (~long compile). Adds `tracing` and `tokio` even without `native`

# Default (cloud providers only)
brainwires-providers = "0.6"

# With local LLM support
brainwires-providers = { version = "0.6", features = ["llama-cpp-2"] }

# Local LLM only (no cloud providers)
brainwires-providers = { version = "0.6", default-features = false, features = ["llama-cpp-2"] }

Architecture

Provider Trait

All providers implement the Provider trait from brainwires-core. This is the unified interface that callers program against.

Method	Description
`name()`	Provider identifier string (e.g., `"anthropic"`, `"lfm2-350m"`)
`max_output_tokens()`	Optional maximum output token limit for the provider
`chat(messages, tools, options)`	Non-streaming chat completion returning `ChatResponse`
`stream_chat(messages, tools, options)`	Streaming chat returning `BoxStream<Result<StreamChunk>>`

ChatOptions controls per-request behavior:

Field	Type	Description
`system`	`Option<String>`	System prompt
`temperature`	`Option<f32>`	Sampling temperature (0.0–2.0)
`max_tokens`	`Option<u32>`	Maximum tokens to generate
`stop`	`Option<Vec<String>>`	Stop sequences

StreamChunk variants:

Variant	Description
`Text(String)`	Generated text token
`Usage(Usage)`	Token usage counts (input + output)
`Done`	Stream completion marker

ProviderType

Enum identifying the AI provider backend.

Variant	`as_str()`	`default_model()`
`Anthropic`	`"anthropic"`	`claude-sonnet-4-20250514`
`OpenAI`	`"openai"`	`gpt-5-mini`
`Google`	`"google"`	`gemini-2.5-flash`
`Ollama`	`"ollama"`	`llama3.3`
`Custom`	`"custom"`	`claude-sonnet-4-20250514`

FromStr also accepts aliases: "gemini" maps to Google, "brainwires" maps to Custom.

ProviderConfig

Configuration struct for initializing a provider.

Field	Type	Default	Description
`provider`	`ProviderType`	—	Provider backend
`model`	`String`	—	Model name
`api_key`	`Option<String>`	`None`	API key (skipped in serialization if absent)
`base_url`	`Option<String>`	`None`	Custom endpoint URL
`options`	`HashMap<String, Value>`	`{}`	Additional provider-specific options (flattened in JSON)

Builder methods: new(provider, model), with_api_key(key), with_base_url(url).

RateLimiter

Token-bucket rate limiter using atomic operations for lock-free reads.

Field	Type	Description
`tokens`	`AtomicU32`	Current available tokens
`max_tokens`	`u32`	Configured requests-per-minute limit
`refill_interval`	`Duration`	Time between token refills (`60s / rpm`)
`last_refill`	`Mutex<Instant>`	Timestamp of last refill

Method	Description
`new(requests_per_minute)`	Create a limiter with the given RPM cap
`acquire()`	Async — consume one token, wait if depleted
`available_tokens()`	Current token count (diagnostic)
`max_requests_per_minute()`	Configured limit

RateLimitedClient

Wraps reqwest::Client with an optional RateLimiter. Every outgoing request waits for a token before sending.

Method	Description
`new()`	Create client with no rate limiting
`with_rate_limit(rpm)`	Create client with the given RPM limit
`from_client(client, rpm)`	Wrap an existing `reqwest::Client`
`get(url)`	Build a GET request (waits for token first)
`post(url)`	Build a POST request (waits for token first)
`inner()`	Access the underlying `reqwest::Client`
`available_tokens()`	Returns `Option<u32>` — `None` if no limiter

AnthropicProvider

Implements the Provider trait for the Anthropic Messages API (https://api.anthropic.com/v1/messages, version 2023-06-01).

Constructor	Description
`new(api_key, model)`	Create without rate limiting
`with_rate_limit(api_key, model, rpm)`	Create with rate limiting

Streaming: Parses Server-Sent Events (SSE) with data: prefix. Events include message_start, content_block_delta, message_delta, and message_stop.

Internal types: AnthropicMessage, AnthropicContentBlock (Text, ToolUse, ToolResult), AnthropicTool, AnthropicResponse, AnthropicStreamEvent, AnthropicDelta.

Message conversion: System messages are extracted from the message list and sent as a top-level system field. All other messages are converted to Anthropic's role/content-block format.

OpenAIProvider

Implements the Provider trait for the OpenAI Chat Completions API (https://api.openai.com/v1/chat/completions).

Constructor	Description
`new(api_key, model)`	Create without rate limiting
`with_rate_limit(api_key, model, rpm)`	Create with rate limiting
`with_organization(org_id)`	Set the `OpenAI-Organization` header

Streaming: Parses newline-delimited JSON (JSON Lines). Each line is a data: {json} SSE chunk with choices[0].delta.

O1/O3 model detection: is_o1_model() detects reasoning models (o1, o3 prefixes) which do not support temperature, max_tokens, or system messages.

Image support: Converts ContentBlock::Image to base64-encoded image_url content parts.

Internal types: OpenAIMessage, OpenAIContent (Text or Array), OpenAIContentPart (Text, ImageUrl, ToolCall), OpenAITool, OpenAIResponse.

GoogleProvider

Implements the Provider trait for the Gemini API (https://generativelanguage.googleapis.com/v1beta).

Constructor	Description
`new(api_key, model)`	Create without rate limiting
`with_rate_limit(api_key, model, rpm)`	Create with rate limiting

Streaming: Uses text/event-stream with custom Gemini event format.

Message conversion: System messages are filtered out and sent via systemInstruction. The assistant role maps to "model" in Gemini's API.

Image support: Converts images to inlineData parts with MIME type and base64 data.

Internal types: GeminiMessage, GeminiPart (Text, InlineData, FunctionCall, FunctionResult), GeminiTool, GeminiResponse.

OllamaProvider

Implements the Provider trait for the Ollama REST API (default: http://localhost:11434).

Constructor	Description
`new(model, base_url)`	Create with model name and optional custom URL
`with_rate_limit(model, base_url, rpm)`	Create with rate limiting

Streaming: Line-delimited JSON where each line contains a message field and a done boolean.

Content handling: Multiple content blocks are flattened into a single concatenated text string, since Ollama's API expects plain text.

Internal types: OllamaMessage, OllamaTool, OllamaResponse.

Local LLM Subsystem

Always compiled (no feature gate). The actual llama.cpp inference requires the llama-cpp-2 feature.

LocalLlmConfig

Configuration for a local GGUF model.

Field	Type	Default	Description
`id`	`String`	`"local-model"`	Unique model identifier
`name`	`String`	`"Local Model"`	Human-readable name
`model_path`	`PathBuf`	—	Path to the `.gguf` file
`context_size`	`u32`	`4096`	Context window size in tokens
`num_threads`	`Option<u32>`	`None` (auto)	CPU threads for inference
`batch_size`	`u32`	`512`	Prompt processing batch size
`gpu_layers`	`u32`	`0`	GPU layers to offload (0 = CPU only)
`use_mmap`	`bool`	`true`	Memory-map model file for faster loading
`use_mlock`	`bool`	`false`	Lock model in RAM to prevent swapping
`max_tokens`	`u32`	`2048`	Maximum tokens per response
`model_type`	`LocalModelType`	`Lfm2`	Model family for prompt formatting
`system_template`	`Option<String>`	`None`	Custom system prompt template
`supports_tools`	`bool`	`false`	Whether the model handles tool/function calling
`estimated_ram_mb`	`Option<u32>`	`None`	Estimated RAM usage (display only)

Preset constructors:

Preset	Context	RAM	Tools	Description
`lfm2_350m(path)`	32K	220 MB	No	Fastest, routing and binary decisions
`lfm2_1_2b(path)`	32K	700 MB	No	Sweet spot for agentic logic
`lfm2_2_6b_exp(path)`	32K	1.5 GB	Yes	Complex reasoning and tool-calling
`granite_nano_350m(path)`	8K	250 MB	No	Sub-second CPU responses
`granite_nano_1_5b(path)`	8K	900 MB	No	Balanced performance

Validation: validate() checks model path exists, context size > 0, batch size > 0.

LocalModelType

Model family enum that determines chat template formatting and stop tokens.

Variant	Chat Template Style	Stop Tokens
`Lfm2`	`<\|system\|>...<\|end\|>`	`<\|end\|>`, `<\|user\|>`
`Lfm2Agentic`	Same as Lfm2	Same as Lfm2
`Granite`	`<\|system\|>...\n`	`<\|user\|>`, `<\|system\|>`
`Qwen`	`<\|im_start\|>...<\|im_end\|>`	`<\|im_end\|>`, `<\|im_start\|>`
`Llama`	`<\|begin_of_text\|>...<\|eot_id\|>`	`<\|eot_id\|>`, `<\|start_header_id\|>`
`Phi`	Same as Lfm2	Same as Lfm2
`Generic`	`### System:...\n### User:...`	`### User:`, `### System:`

Methods: chat_template(), stop_tokens().

LocalInferenceParams

Per-request sampling parameters.

Field	Type	Default	Description
`temperature`	`f32`	`0.7`	Sampling temperature (0.0 = deterministic)
`top_p`	`f32`	`0.9`	Nucleus sampling threshold
`top_k`	`u32`	`40`	Top-k sampling parameter
`repeat_penalty`	`f32`	`1.1`	Repetition penalty (1.0 = none)
`max_tokens`	`u32`	`2048`	Maximum tokens to generate
`stop_sequences`	`Vec<String>`	`[]`	Custom stop sequences

Presets:

Preset	Temperature	Top-k	Max Tokens	Use Case
`factual()`	0.1	20	1024	Deterministic, factual responses
`creative()`	0.9	50	2048	Varied, creative output
`routing()`	0.0	1	50	Classification and routing

LocalModelRegistry

Manages registered local models with persistence to ~/.config/brainwires/local_models.json.

Method	Description
`new()`	Create an empty registry
`with_default_dir()`	Create with default models directory (`~/.local/share/brainwires/models/`)
`register(config)`	Add a model configuration
`get(id)`	Get model by ID
`get_default()`	Get the default model
`set_default(id)`	Set the default model (returns `false` if ID not found)
`remove(id)`	Remove a model (clears default if it was the removed model)
`list()`	List all registered models
`scan_models_dir()`	Auto-discover `.gguf` files and register them with detected model types
`load()`	Load registry from config file
`save()`	Save registry to config file

Auto-detection: scan_models_dir() reads the models directory, infers LocalModelType from filenames (e.g., lfm2 → Lfm2, granite → Granite), and estimates context size and RAM from model size indicators in the filename.

KnownModel

Pre-configured model definitions for easy discovery and downloading.

Field	Description
`id`	Model identifier (e.g., `"lfm2-1.2b"`)
`name`	Human-readable name
`huggingface_repo`	HuggingFace repository path
`filename`	Expected GGUF filename
`model_type`	`LocalModelType` variant
`context_size`	Context window size
`estimated_ram_mb`	RAM requirement
`supports_tools`	Tool-calling support
`description`	Short description

Access via known_models() (full list) or get_known_model(id) (by ID).

LocalLlmProvider

Implements the Provider trait for local GGUF model inference. Lazy-loads the model on first use.

Method	Description
`new(config)`	Create provider (validates config, does not load model)
`lfm2_350m(path)`	Shorthand for LFM2 350M preset
`lfm2_1_2b(path)`	Shorthand for LFM2 1.2B preset
`config()`	Get the model configuration
`is_loaded()`	Check if model is in memory
`load()`	Load model into memory (initializes llama.cpp backend)
`unload()`	Release model from memory
`generate(prompt, params)`	Generate text with custom parameters
`route(prompt)`	Quick routing/classification (deterministic params)
`process(prompt)`	Summarization/processing (factual params)

Without the llama-cpp-2 feature, load() and generate() return an error directing the user to enable the feature.

LocalLlmPool

Round-robin pool of LocalLlmProvider instances for parallel inference.

Method	Description
`new(config, instances)`	Create pool with N identical provider instances
`next()`	Get the next provider (round-robin via `AtomicUsize`)
`load_all()`	Load all models in the pool
`unload_all()`	Unload all models
`size()`	Number of instances
`estimated_ram_mb()`	Total estimated RAM for the pool

LocalLlmConfigError

Variant	Description
`MissingModelPath`	Model path is empty
`ModelNotFound(PathBuf)`	File does not exist at the given path
`InvalidContextSize`	Context size is 0
`InvalidBatchSize`	Batch size is 0
`ModelLoadError(String)`	llama.cpp failed to load the model
`InferenceError(String)`	Error during token generation

Usage Examples

Stream a response from OpenAI

use brainwires_providers::{OpenAIProvider, Provider, ChatOptions};
use brainwires_core::message::{Message, StreamChunk};
use futures::StreamExt;

let provider = OpenAIProvider::new("sk-...".into(), "gpt-5-mini".into());

let messages = vec![Message::user("Write a haiku about Rust.")];
let options = ChatOptions::default();

let mut stream = provider.stream_chat(&messages, None, &options);
while let Some(chunk) = stream.next().await {
    match chunk? {
        StreamChunk::Text(text) => print!("{}", text),
        StreamChunk::Usage(usage) => {
            println!("\n[{} in, {} out]", usage.input_tokens, usage.output_tokens);
        }
        StreamChunk::Done => break,
    }
}

Use tools with the Anthropic provider

use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;
use brainwires_core::tool::Tool;

let provider = AnthropicProvider::new("sk-ant-...".into(), "claude-sonnet-4-20250514".into());

let tools = vec![Tool {
    name: "get_weather".into(),
    description: Some("Get current weather for a city".into()),
    input_schema: serde_json::json!({
        "type": "object",
        "properties": {
            "city": { "type": "string" }
        },
        "required": ["city"]
    }),
    ..Default::default()
}];

let messages = vec![Message::user("What's the weather in Seattle?")];
let options = ChatOptions::default();

let response = provider.chat(&messages, Some(&tools), &options).await?;
// response.message may contain tool_use content blocks

Rate-limited HTTP requests

use brainwires_providers::{RateLimitedClient, RateLimiter};

// Standalone rate limiter
let limiter = RateLimiter::new(60); // 60 RPM
limiter.acquire().await; // blocks if depleted

// Rate-limited HTTP client
let client = RateLimitedClient::with_rate_limit(120); // 120 RPM
let response = client.post("https://api.example.com/v1/chat")
    .await
    .json(&body)
    .send()
    .await?;

println!("Tokens remaining: {:?}", client.available_tokens());

Provider with rate limiting

use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;

// Create provider with 60 requests-per-minute limit
let provider = AnthropicProvider::with_rate_limit(
    "sk-ant-...".into(),
    "claude-sonnet-4-20250514".into(),
    60,
);

let messages = vec![Message::user("Hello!")];
let response = provider.chat(&messages, None, &ChatOptions::default()).await?;

Configure a provider with ProviderConfig

use brainwires_providers::{ProviderType, ProviderConfig};

let config = ProviderConfig::new(ProviderType::OpenAI, "gpt-5-mini".into())
    .with_api_key("sk-...")
    .with_base_url("https://custom-openai-proxy.example.com/v1");

assert_eq!(config.provider.default_model(), "gpt-5-mini");
assert_eq!(config.provider.as_str(), "openai");

// Parse provider from string
let provider_type: ProviderType = "gemini".parse()?; // → Google

Local LLM inference

use brainwires_providers::{LocalLlmProvider, LocalLlmConfig, LocalInferenceParams};
use std::path::PathBuf;

// Create provider from a preset
let provider = LocalLlmProvider::lfm2_1_2b(PathBuf::from("/models/lfm2-1.2b-q8_0.gguf"))?;

// Load model into memory
provider.load().await?;

// Quick routing (deterministic, max 50 tokens)
let route = provider.route("Classify: 'fix the login bug' → [code, question, chat]").await?;

// Full inference with custom params
let result = provider.generate(
    "Explain ownership in Rust briefly.",
    &LocalInferenceParams::factual(),
).await?;

// Or use via the Provider trait
use brainwires_providers::{Provider, ChatOptions};
use brainwires_core::message::Message;

let messages = vec![Message::user("Summarize this code.")];
let response = provider.chat(&messages, None, &ChatOptions::default()).await?;

// Unload when done
provider.unload().await;

Model registry and auto-discovery

use brainwires_providers::{LocalModelRegistry, LocalLlmConfig, known_models, get_known_model};
use std::path::PathBuf;

// Load or create registry
let mut registry = LocalModelRegistry::load()?;

// Register a model manually
registry.register(LocalLlmConfig::lfm2_350m(PathBuf::from("/models/lfm2-350m.gguf")));
registry.set_default("lfm2-350m");

// Auto-discover GGUF files in the models directory
let discovered = registry.scan_models_dir()?;
for id in &discovered {
    println!("Found: {}", id);
}

// Browse known/recommended models
for model in known_models() {
    println!("{}: {} ({}MB RAM) — {}", model.id, model.name, model.estimated_ram_mb, model.description);
}

// Save registry
registry.save()?;

Local LLM pool for parallel inference

use brainwires_providers::{LocalLlmPool, LocalLlmConfig};
use std::path::PathBuf;

let config = LocalLlmConfig::lfm2_350m(PathBuf::from("/models/lfm2-350m.gguf"));
let pool = LocalLlmPool::new(config, 4)?; // 4 instances

pool.load_all().await?;
println!("Pool RAM: ~{}MB", pool.estimated_ram_mb().unwrap_or(0));

// Round-robin across instances
let provider = pool.next();
let result = provider.route("classify this input").await?;

pool.unload_all().await;

Integration

Use via the brainwires facade crate with the providers feature, or depend on brainwires-providers directly:

# Via facade
[dependencies]
brainwires = { version = "0.6", features = ["providers"] }

# Direct
[dependencies]
brainwires-providers = "0.6"

Re-exports at crate root for convenience:

use brainwires_providers::{
    // Trait + options (from brainwires-core)
    Provider, ChatOptions,
    // Cloud providers (native)
    AnthropicProvider, OpenAIProvider, GoogleProvider, OllamaProvider,
    // Rate limiting (native)
    RateLimiter, RateLimitedClient,
    // Shared types
    ProviderType, ProviderConfig,
    // Local LLM (always available)
    LocalLlmProvider, LocalLlmConfig, LocalModelType,
    LocalInferenceParams, LocalModelRegistry, LocalLlmPool,
    LocalLlmConfigError, KnownModel, known_models, get_known_model,
};

License

Licensed under the MIT License. See LICENSE for details.

brainwires-providers 0.6.0