brainwires-providers 0.6.0

AI provider implementations for the Brainwires Agent Framework
Documentation

brainwires-providers

Crates.io Documentation License

AI provider implementations for the Brainwires Agent Framework.

Overview

brainwires-providers provides concrete implementations of the Provider trait for multiple AI services: Anthropic (Claude), OpenAI (GPT), Google (Gemini), Ollama, and local LLM inference via llama.cpp. Every provider converts to and from the unified brainwires-core message types, so callers can swap backends without changing application code.

Design principles:

  • Unified interface — all providers implement the same Provider trait from brainwires-core (chat, streaming, tool calling)
  • Feature-gated backends — cloud providers compile under native (default); local LLM compiles always; llama.cpp is behind llama-cpp-2
  • Rate limiting built in — token-bucket RateLimiter and RateLimitedClient available to any provider
  • Streaming-first — every provider returns BoxStream<Result<StreamChunk>> via async_stream
  • Tool calling — Anthropic, OpenAI, Google, and Ollama all support function calling mapped to/from brainwires_core::Tool
  • Local inference — CPU-optimized GGUF model support with model registry, preset configs, and inference parameter tuning
  ┌───────────────────────────────────────────────────────────────────────┐
  │                        brainwires-providers                           │
  │                                                                       │
  │  ┌─── Provider trait (brainwires-core) ────────────────────────────┐  │
  │  │  chat()        ──► ChatResponse                                 │  │
  │  │  stream_chat() ──► BoxStream<StreamChunk>                       │  │
  │  │  name()        ──► &str                                         │  │
  │  └─────────────────────────────────────────────────────────────────┘  │
  │           │                                                           │
  │           ▼                                                           │
  │  ┌─── Cloud Providers (feature = "native") ────────────────────────┐  │
  │  │                                                                  │  │
  │  │  AnthropicProvider ──► SSE streaming ──► api.anthropic.com      │  │
  │  │  OpenAIProvider    ──► JSON Lines    ──► api.openai.com         │  │
  │  │  GoogleProvider    ──► event-stream  ──► generativelanguage.…   │  │
  │  │  OllamaProvider    ──► line-delim JSON ► localhost:11434        │  │
  │  │           │                                                      │  │
  │  │           ▼                                                      │  │
  │  │  RateLimitedClient ──► RateLimiter (token-bucket)               │  │
  │  └──────────────────────────────────────────────────────────────────┘  │
  │                                                                       │
  │  ┌─── Local LLM (always compiled, llama.cpp optional) ─────────────┐  │
  │  │                                                                  │  │
  │  │  LocalLlmProvider ──► generate() / route() / process()         │  │
  │  │       │                                                          │  │
  │  │       ▼                                                          │  │
  │  │  LocalLlmConfig ◄── LocalModelRegistry ◄── scan_models_dir()   │  │
  │  │  LocalModelType  ◄── chat_template() / stop_tokens()           │  │
  │  │  LocalInferenceParams ◄── factual() / creative() / routing()   │  │
  │  │  LocalLlmPool    ──► round-robin multi-instance inference       │  │
  │  └──────────────────────────────────────────────────────────────────┘  │
  │                                                                       │
  │  ┌─── Shared Types ───────────────────────────────────────────────┐   │
  │  │  ProviderType (Anthropic | OpenAI | Google | Ollama | Custom)  │   │
  │  │  ProviderConfig (provider, model, api_key, base_url, options)  │   │
  │  └────────────────────────────────────────────────────────────────┘   │
  └───────────────────────────────────────────────────────────────────────┘

Quick Start

Add to your Cargo.toml:

[dependencies]
brainwires-providers = "0.6"

Send a chat request with the Anthropic provider:

use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let provider = AnthropicProvider::new("sk-ant-...".into(), "claude-sonnet-4-20250514".into());

    let messages = vec![Message::user("Explain the borrow checker in one sentence.")];
    let options = ChatOptions::default();

    let response = provider.chat(&messages, None, &options).await?;
    println!("{}", response.message.text().unwrap_or_default());
    println!("Tokens: {} in, {} out", response.usage.input_tokens, response.usage.output_tokens);

    Ok(())
}

Features

Feature Default Description
native Yes Enables cloud providers (Anthropic, OpenAI, Google, Ollama), RateLimiter, RateLimitedClient, and their dependencies (reqwest, tokio, async-stream, tracing, uuid)
llama-cpp-2 No Enables local LLM inference via llama.cpp bindings. Heavy dependency (~long compile). Adds tracing and tokio even without native
# Default (cloud providers only)
brainwires-providers = "0.6"

# With local LLM support
brainwires-providers = { version = "0.6", features = ["llama-cpp-2"] }

# Local LLM only (no cloud providers)
brainwires-providers = { version = "0.6", default-features = false, features = ["llama-cpp-2"] }

Architecture

Provider Trait

All providers implement the Provider trait from brainwires-core. This is the unified interface that callers program against.

Method Description
name() Provider identifier string (e.g., "anthropic", "lfm2-350m")
max_output_tokens() Optional maximum output token limit for the provider
chat(messages, tools, options) Non-streaming chat completion returning ChatResponse
stream_chat(messages, tools, options) Streaming chat returning BoxStream<Result<StreamChunk>>

ChatOptions controls per-request behavior:

Field Type Description
system Option<String> System prompt
temperature Option<f32> Sampling temperature (0.0–2.0)
max_tokens Option<u32> Maximum tokens to generate
stop Option<Vec<String>> Stop sequences

StreamChunk variants:

Variant Description
Text(String) Generated text token
Usage(Usage) Token usage counts (input + output)
Done Stream completion marker

ProviderType

Enum identifying the AI provider backend.

Variant as_str() default_model()
Anthropic "anthropic" claude-sonnet-4-20250514
OpenAI "openai" gpt-5-mini
Google "google" gemini-2.5-flash
Ollama "ollama" llama3.3
Custom "custom" claude-sonnet-4-20250514

FromStr also accepts aliases: "gemini" maps to Google, "brainwires" maps to Custom.

ProviderConfig

Configuration struct for initializing a provider.

Field Type Default Description
provider ProviderType Provider backend
model String Model name
api_key Option<String> None API key (skipped in serialization if absent)
base_url Option<String> None Custom endpoint URL
options HashMap<String, Value> {} Additional provider-specific options (flattened in JSON)

Builder methods: new(provider, model), with_api_key(key), with_base_url(url).

RateLimiter

Token-bucket rate limiter using atomic operations for lock-free reads.

Field Type Description
tokens AtomicU32 Current available tokens
max_tokens u32 Configured requests-per-minute limit
refill_interval Duration Time between token refills (60s / rpm)
last_refill Mutex<Instant> Timestamp of last refill
Method Description
new(requests_per_minute) Create a limiter with the given RPM cap
acquire() Async — consume one token, wait if depleted
available_tokens() Current token count (diagnostic)
max_requests_per_minute() Configured limit

RateLimitedClient

Wraps reqwest::Client with an optional RateLimiter. Every outgoing request waits for a token before sending.

Method Description
new() Create client with no rate limiting
with_rate_limit(rpm) Create client with the given RPM limit
from_client(client, rpm) Wrap an existing reqwest::Client
get(url) Build a GET request (waits for token first)
post(url) Build a POST request (waits for token first)
inner() Access the underlying reqwest::Client
available_tokens() Returns Option<u32>None if no limiter

AnthropicProvider

Implements the Provider trait for the Anthropic Messages API (https://api.anthropic.com/v1/messages, version 2023-06-01).

Constructor Description
new(api_key, model) Create without rate limiting
with_rate_limit(api_key, model, rpm) Create with rate limiting

Streaming: Parses Server-Sent Events (SSE) with data: prefix. Events include message_start, content_block_delta, message_delta, and message_stop.

Internal types: AnthropicMessage, AnthropicContentBlock (Text, ToolUse, ToolResult), AnthropicTool, AnthropicResponse, AnthropicStreamEvent, AnthropicDelta.

Message conversion: System messages are extracted from the message list and sent as a top-level system field. All other messages are converted to Anthropic's role/content-block format.

OpenAIProvider

Implements the Provider trait for the OpenAI Chat Completions API (https://api.openai.com/v1/chat/completions).

Constructor Description
new(api_key, model) Create without rate limiting
with_rate_limit(api_key, model, rpm) Create with rate limiting
with_organization(org_id) Set the OpenAI-Organization header

Streaming: Parses newline-delimited JSON (JSON Lines). Each line is a data: {json} SSE chunk with choices[0].delta.

O1/O3 model detection: is_o1_model() detects reasoning models (o1, o3 prefixes) which do not support temperature, max_tokens, or system messages.

Image support: Converts ContentBlock::Image to base64-encoded image_url content parts.

Internal types: OpenAIMessage, OpenAIContent (Text or Array), OpenAIContentPart (Text, ImageUrl, ToolCall), OpenAITool, OpenAIResponse.

GoogleProvider

Implements the Provider trait for the Gemini API (https://generativelanguage.googleapis.com/v1beta).

Constructor Description
new(api_key, model) Create without rate limiting
with_rate_limit(api_key, model, rpm) Create with rate limiting

Streaming: Uses text/event-stream with custom Gemini event format.

Message conversion: System messages are filtered out and sent via systemInstruction. The assistant role maps to "model" in Gemini's API.

Image support: Converts images to inlineData parts with MIME type and base64 data.

Internal types: GeminiMessage, GeminiPart (Text, InlineData, FunctionCall, FunctionResult), GeminiTool, GeminiResponse.

OllamaProvider

Implements the Provider trait for the Ollama REST API (default: http://localhost:11434).

Constructor Description
new(model, base_url) Create with model name and optional custom URL
with_rate_limit(model, base_url, rpm) Create with rate limiting

Streaming: Line-delimited JSON where each line contains a message field and a done boolean.

Content handling: Multiple content blocks are flattened into a single concatenated text string, since Ollama's API expects plain text.

Internal types: OllamaMessage, OllamaTool, OllamaResponse.

Local LLM Subsystem

Always compiled (no feature gate). The actual llama.cpp inference requires the llama-cpp-2 feature.

LocalLlmConfig

Configuration for a local GGUF model.

Field Type Default Description
id String "local-model" Unique model identifier
name String "Local Model" Human-readable name
model_path PathBuf Path to the .gguf file
context_size u32 4096 Context window size in tokens
num_threads Option<u32> None (auto) CPU threads for inference
batch_size u32 512 Prompt processing batch size
gpu_layers u32 0 GPU layers to offload (0 = CPU only)
use_mmap bool true Memory-map model file for faster loading
use_mlock bool false Lock model in RAM to prevent swapping
max_tokens u32 2048 Maximum tokens per response
model_type LocalModelType Lfm2 Model family for prompt formatting
system_template Option<String> None Custom system prompt template
supports_tools bool false Whether the model handles tool/function calling
estimated_ram_mb Option<u32> None Estimated RAM usage (display only)

Preset constructors:

Preset Context RAM Tools Description
lfm2_350m(path) 32K 220 MB No Fastest, routing and binary decisions
lfm2_1_2b(path) 32K 700 MB No Sweet spot for agentic logic
lfm2_2_6b_exp(path) 32K 1.5 GB Yes Complex reasoning and tool-calling
granite_nano_350m(path) 8K 250 MB No Sub-second CPU responses
granite_nano_1_5b(path) 8K 900 MB No Balanced performance

Validation: validate() checks model path exists, context size > 0, batch size > 0.

LocalModelType

Model family enum that determines chat template formatting and stop tokens.

Variant Chat Template Style Stop Tokens
Lfm2 <|system|>...<|end|> <|end|>, <|user|>
Lfm2Agentic Same as Lfm2 Same as Lfm2
Granite <|system|>...\n <|user|>, <|system|>
Qwen <|im_start|>...<|im_end|> <|im_end|>, <|im_start|>
Llama <|begin_of_text|>...<|eot_id|> <|eot_id|>, <|start_header_id|>
Phi Same as Lfm2 Same as Lfm2
Generic ### System:...\n### User:... ### User:, ### System:

Methods: chat_template(), stop_tokens().

LocalInferenceParams

Per-request sampling parameters.

Field Type Default Description
temperature f32 0.7 Sampling temperature (0.0 = deterministic)
top_p f32 0.9 Nucleus sampling threshold
top_k u32 40 Top-k sampling parameter
repeat_penalty f32 1.1 Repetition penalty (1.0 = none)
max_tokens u32 2048 Maximum tokens to generate
stop_sequences Vec<String> [] Custom stop sequences

Presets:

Preset Temperature Top-k Max Tokens Use Case
factual() 0.1 20 1024 Deterministic, factual responses
creative() 0.9 50 2048 Varied, creative output
routing() 0.0 1 50 Classification and routing

LocalModelRegistry

Manages registered local models with persistence to ~/.config/brainwires/local_models.json.

Method Description
new() Create an empty registry
with_default_dir() Create with default models directory (~/.local/share/brainwires/models/)
register(config) Add a model configuration
get(id) Get model by ID
get_default() Get the default model
set_default(id) Set the default model (returns false if ID not found)
remove(id) Remove a model (clears default if it was the removed model)
list() List all registered models
scan_models_dir() Auto-discover .gguf files and register them with detected model types
load() Load registry from config file
save() Save registry to config file

Auto-detection: scan_models_dir() reads the models directory, infers LocalModelType from filenames (e.g., lfm2Lfm2, graniteGranite), and estimates context size and RAM from model size indicators in the filename.

KnownModel

Pre-configured model definitions for easy discovery and downloading.

Field Description
id Model identifier (e.g., "lfm2-1.2b")
name Human-readable name
huggingface_repo HuggingFace repository path
filename Expected GGUF filename
model_type LocalModelType variant
context_size Context window size
estimated_ram_mb RAM requirement
supports_tools Tool-calling support
description Short description

Access via known_models() (full list) or get_known_model(id) (by ID).

LocalLlmProvider

Implements the Provider trait for local GGUF model inference. Lazy-loads the model on first use.

Method Description
new(config) Create provider (validates config, does not load model)
lfm2_350m(path) Shorthand for LFM2 350M preset
lfm2_1_2b(path) Shorthand for LFM2 1.2B preset
config() Get the model configuration
is_loaded() Check if model is in memory
load() Load model into memory (initializes llama.cpp backend)
unload() Release model from memory
generate(prompt, params) Generate text with custom parameters
route(prompt) Quick routing/classification (deterministic params)
process(prompt) Summarization/processing (factual params)

Without the llama-cpp-2 feature, load() and generate() return an error directing the user to enable the feature.

LocalLlmPool

Round-robin pool of LocalLlmProvider instances for parallel inference.

Method Description
new(config, instances) Create pool with N identical provider instances
next() Get the next provider (round-robin via AtomicUsize)
load_all() Load all models in the pool
unload_all() Unload all models
size() Number of instances
estimated_ram_mb() Total estimated RAM for the pool

LocalLlmConfigError

Variant Description
MissingModelPath Model path is empty
ModelNotFound(PathBuf) File does not exist at the given path
InvalidContextSize Context size is 0
InvalidBatchSize Batch size is 0
ModelLoadError(String) llama.cpp failed to load the model
InferenceError(String) Error during token generation

Usage Examples

Stream a response from OpenAI

use brainwires_providers::{OpenAIProvider, Provider, ChatOptions};
use brainwires_core::message::{Message, StreamChunk};
use futures::StreamExt;

let provider = OpenAIProvider::new("sk-...".into(), "gpt-5-mini".into());

let messages = vec![Message::user("Write a haiku about Rust.")];
let options = ChatOptions::default();

let mut stream = provider.stream_chat(&messages, None, &options);
while let Some(chunk) = stream.next().await {
    match chunk? {
        StreamChunk::Text(text) => print!("{}", text),
        StreamChunk::Usage(usage) => {
            println!("\n[{} in, {} out]", usage.input_tokens, usage.output_tokens);
        }
        StreamChunk::Done => break,
    }
}

Use tools with the Anthropic provider

use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;
use brainwires_core::tool::Tool;

let provider = AnthropicProvider::new("sk-ant-...".into(), "claude-sonnet-4-20250514".into());

let tools = vec![Tool {
    name: "get_weather".into(),
    description: Some("Get current weather for a city".into()),
    input_schema: serde_json::json!({
        "type": "object",
        "properties": {
            "city": { "type": "string" }
        },
        "required": ["city"]
    }),
    ..Default::default()
}];

let messages = vec![Message::user("What's the weather in Seattle?")];
let options = ChatOptions::default();

let response = provider.chat(&messages, Some(&tools), &options).await?;
// response.message may contain tool_use content blocks

Rate-limited HTTP requests

use brainwires_providers::{RateLimitedClient, RateLimiter};

// Standalone rate limiter
let limiter = RateLimiter::new(60); // 60 RPM
limiter.acquire().await; // blocks if depleted

// Rate-limited HTTP client
let client = RateLimitedClient::with_rate_limit(120); // 120 RPM
let response = client.post("https://api.example.com/v1/chat")
    .await
    .json(&body)
    .send()
    .await?;

println!("Tokens remaining: {:?}", client.available_tokens());

Provider with rate limiting

use brainwires_providers::{AnthropicProvider, Provider, ChatOptions};
use brainwires_core::message::Message;

// Create provider with 60 requests-per-minute limit
let provider = AnthropicProvider::with_rate_limit(
    "sk-ant-...".into(),
    "claude-sonnet-4-20250514".into(),
    60,
);

let messages = vec![Message::user("Hello!")];
let response = provider.chat(&messages, None, &ChatOptions::default()).await?;

Configure a provider with ProviderConfig

use brainwires_providers::{ProviderType, ProviderConfig};

let config = ProviderConfig::new(ProviderType::OpenAI, "gpt-5-mini".into())
    .with_api_key("sk-...")
    .with_base_url("https://custom-openai-proxy.example.com/v1");

assert_eq!(config.provider.default_model(), "gpt-5-mini");
assert_eq!(config.provider.as_str(), "openai");

// Parse provider from string
let provider_type: ProviderType = "gemini".parse()?; // → Google

Local LLM inference

use brainwires_providers::{LocalLlmProvider, LocalLlmConfig, LocalInferenceParams};
use std::path::PathBuf;

// Create provider from a preset
let provider = LocalLlmProvider::lfm2_1_2b(PathBuf::from("/models/lfm2-1.2b-q8_0.gguf"))?;

// Load model into memory
provider.load().await?;

// Quick routing (deterministic, max 50 tokens)
let route = provider.route("Classify: 'fix the login bug' → [code, question, chat]").await?;

// Full inference with custom params
let result = provider.generate(
    "Explain ownership in Rust briefly.",
    &LocalInferenceParams::factual(),
).await?;

// Or use via the Provider trait
use brainwires_providers::{Provider, ChatOptions};
use brainwires_core::message::Message;

let messages = vec![Message::user("Summarize this code.")];
let response = provider.chat(&messages, None, &ChatOptions::default()).await?;

// Unload when done
provider.unload().await;

Model registry and auto-discovery

use brainwires_providers::{LocalModelRegistry, LocalLlmConfig, known_models, get_known_model};
use std::path::PathBuf;

// Load or create registry
let mut registry = LocalModelRegistry::load()?;

// Register a model manually
registry.register(LocalLlmConfig::lfm2_350m(PathBuf::from("/models/lfm2-350m.gguf")));
registry.set_default("lfm2-350m");

// Auto-discover GGUF files in the models directory
let discovered = registry.scan_models_dir()?;
for id in &discovered {
    println!("Found: {}", id);
}

// Browse known/recommended models
for model in known_models() {
    println!("{}: {} ({}MB RAM) — {}", model.id, model.name, model.estimated_ram_mb, model.description);
}

// Save registry
registry.save()?;

Local LLM pool for parallel inference

use brainwires_providers::{LocalLlmPool, LocalLlmConfig};
use std::path::PathBuf;

let config = LocalLlmConfig::lfm2_350m(PathBuf::from("/models/lfm2-350m.gguf"));
let pool = LocalLlmPool::new(config, 4)?; // 4 instances

pool.load_all().await?;
println!("Pool RAM: ~{}MB", pool.estimated_ram_mb().unwrap_or(0));

// Round-robin across instances
let provider = pool.next();
let result = provider.route("classify this input").await?;

pool.unload_all().await;

Integration

Use via the brainwires facade crate with the providers feature, or depend on brainwires-providers directly:

# Via facade
[dependencies]
brainwires = { version = "0.6", features = ["providers"] }

# Direct
[dependencies]
brainwires-providers = "0.6"

Re-exports at crate root for convenience:

use brainwires_providers::{
    // Trait + options (from brainwires-core)
    Provider, ChatOptions,
    // Cloud providers (native)
    AnthropicProvider, OpenAIProvider, GoogleProvider, OllamaProvider,
    // Rate limiting (native)
    RateLimiter, RateLimitedClient,
    // Shared types
    ProviderType, ProviderConfig,
    // Local LLM (always available)
    LocalLlmProvider, LocalLlmConfig, LocalModelType,
    LocalInferenceParams, LocalModelRegistry, LocalLlmPool,
    LocalLlmConfigError, KnownModel, known_models, get_known_model,
};

License

Licensed under the MIT License. See LICENSE for details.