Crate cognate_llm

§Cognate

Type-safe LLM framework for Rust with multi-provider support, compile-time validation, streaming, and tool calling, all built on zero-cost abstractions.

§Overview

Cognate provides production-grade abstractions for building LLM-powered applications in Rust. Unlike fragmented HTTP clients or Python-style libraries, Cognate offers:

Unified multi-provider interface (OpenAI and Anthropic built in, custom providers via the Provider trait; Groq and Ollama are listed on the roadmap)
Type-safe tool calling via #[derive(Tool)]
Compile-time prompt validation via #[derive(Prompt)]
Production middleware (retry, rate-limit, tracing)
Streaming-first design with proper error handling
Axum web server integration out of the box
RAG pipeline support with pluggable vector stores

§Quick Links

Documentation: https://docs.rs/cognate-core
Examples: See examples/
Contributing: CONTRIBUTING.md
Architecture: ARCHITECTURE.md
Getting Started: GETTING_STARTED.md

§Why Cognate?

§Feature Comparison

Feature	Cognate	async-openai	rig
Multi-provider	OpenAI, Anthropic (Groq and Ollama on the roadmap)	OpenAI only	OpenAI, Anthropic
Type-safe tools	#[derive(Tool)] with validation	Manual JSON	Runtime definition
Compile-time validation	Prompts checked at build time	No	No
Axum integration	Built-in extractors + middleware	No	No
Middleware system	Retry, rate-limit, tracing, observability	Basic retry	Limited
RAG support	Vector search + memory traits	No	No

§Performance

Cognate is designed for production workloads where latency and throughput matter.

Metric	Cognate	async-openai (Rust)	Python LangChain
P50 Latency	<1ms (overhead)	<1ms (overhead)	45ms
P99 Latency	<5ms (overhead)	<5ms (overhead)	150ms
Requests/sec	2500+	2800+	200-400
Memory (RSS)	12-15 MB	12-15 MB	120-150 MB
Compile time	8-12s (clean)	6-8s (clean)	N/A

See BENCHMARKS.md for detailed metrics and reproducible measurements.

§Installation

Add Cognate to your Cargo.toml:

[dependencies]
cognate-core = "0.1"
cognate-providers = "0.1"
cognate-tools = "0.1"
cognate-prompts = "0.1"
tokio = { version = "1.0", features = ["full"] }

§Quick Start

§Basic Chat

use cognate_core::{Provider, Request, Message};
use cognate_providers::OpenAiProvider;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = OpenAiProvider::new(
        std::env::var("OPENAI_API_KEY")?
    )?;
    
    let request = Request::new()
        .with_model("gpt-4o-mini")
        .with_messages(vec![
            Message::user("Explain Rust type safety in one sentence"),
        ]);

    let response = provider.complete(request).await?;
    println!("{}", response.content());
    
    Ok(())
}

§Type-Safe Tool Calling

use cognate_tools::Tool;
use cognate_core::{Provider, Request, Message};
use serde::{Deserialize, Serialize};
use schemars::JsonSchema;

#[derive(Tool, Serialize, Deserialize, JsonSchema)]
#[tool(description = "Add two numbers")]
struct Calculator {
    /// First number
    a: i32,
    /// Second number
    b: i32,
}

impl Calculator {
    async fn run(&self) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
        Ok(format!("{} + {} = {}", self.a, self.b, self.a + self.b))
    }
}

// Use in request
let request = Request::new()
    .with_model("gpt-4o")
    .with_messages(vec![
        Message::user("What is 15 + 23?"),
    ])
    .with_tool(Calculator);

§Streaming Responses

use cognate_core::Provider;
use futures::StreamExt;

let provider = /* ... */;

let mut stream = provider.stream(request).await?;

while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    print!("{}", chunk.text);
}
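
Provider streaming APIs such as OpenAI's deliver chunks as Server-Sent Events, and the module list on this page includes an `sse` parser. The following is a minimal, self-contained sketch of how such a parser might split a payload into events. The names `SseEvent` and `parse_sse` are hypothetical and are not taken from the crate; the sketch handles only the `event:` and `data:` fields and works on a complete string rather than an incremental byte stream.

```rust
/// One parsed Server-Sent Event (hypothetical type, not the crate's own).
#[derive(Debug, PartialEq)]
pub struct SseEvent {
    /// Value of the optional `event:` field.
    pub event: Option<String>,
    /// All `data:` lines of the event, joined with '\n'.
    pub data: String,
}

/// Parse a complete SSE payload into events.
/// Events are terminated by a blank line; an event without data is dropped.
pub fn parse_sse(input: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut event_name: Option<String> = None;
    let mut data_lines: Vec<String> = Vec::new();

    for line in input.lines() {
        if line.is_empty() {
            // Blank line: dispatch the buffered event, if it carries data.
            if data_lines.is_empty() {
                event_name = None;
            } else {
                events.push(SseEvent {
                    event: event_name.take(),
                    data: data_lines.join("\n"),
                });
                data_lines.clear();
            }
            continue;
        }
        if line.starts_with(':') {
            continue; // Comment line, often used as a keep-alive.
        }
        // Split "field: value"; a single leading space in the value is stripped.
        let (field, value) = match line.split_once(':') {
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "data" => data_lines.push(value.to_string()),
            "event" => event_name = Some(value.to_string()),
            _ => {} // `id`, `retry`, and unknown fields are ignored in this sketch.
        }
    }
    events
}
```

Calling `parse_sse` on a payload containing two `data:` lines followed by a blank line yields one event whose `data` holds both lines separated by a newline.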

§Architecture

Cognate is organized into specialized crates:

cognate-core: Provider trait, request/response types, middleware system
cognate-providers: OpenAI, Anthropic, retry, fallback implementations
cognate-tools: Tool dispatch, automatic execution loop, #[derive(Tool)]
cognate-prompts: Template system, compile-time validation, #[derive(Prompt)]
cognate-rag: Vector search traits, memory abstraction, embedding utilities
cognate-axum: Axum extractors, middleware layers, web integration
cognate-cli: CLI tools for development and testing

§Examples

All examples are in the respective crate directories:

Simple Chat - Basic API usage
Streaming Chat - Response streaming
Tool Usage - Tool calling and dispatch
Agent Loop - Multi-turn tool-using agent
RAG Pipeline - Search + generation
Axum ChatGPT Clone - Web server

Run an example:

cargo run --example simple_chat -p cognate-providers

§Configuration

§OpenAI Provider

use cognate_providers::{OpenAiProvider, RetryConfig};
use std::time::Duration;

let provider = OpenAiProvider::new(api_key)?
    .with_timeout(Duration::from_secs(30))
    .with_retry(RetryConfig {
        max_retries: 3,
        initial_backoff: Duration::from_millis(100),
        max_backoff: Duration::from_secs(10),
    });
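
To make the three retry settings concrete, here is a small sketch of how an exponential-backoff schedule could be derived from them. `BackoffConfig` and `backoff_delay` are hypothetical names that only mirror the fields shown above; whether Cognate doubles the delay on each attempt or adds jitter is an assumption, not something this page states.

```rust
use std::time::Duration;

/// Hypothetical mirror of the retry fields shown above (not the crate's type).
pub struct BackoffConfig {
    pub max_retries: u32,
    pub initial_backoff: Duration,
    pub max_backoff: Duration,
}

/// Delay to wait before retry number `attempt` (0-based).
/// Assumes doubling per attempt: initial_backoff * 2^attempt, capped at max_backoff.
/// Returns `None` once `max_retries` attempts have been used.
pub fn backoff_delay(cfg: &BackoffConfig, attempt: u32) -> Option<Duration> {
    if attempt >= cfg.max_retries {
        return None;
    }
    // Saturate instead of overflowing for very large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let delay = cfg
        .initial_backoff
        .checked_mul(factor)
        .unwrap_or(cfg.max_backoff);
    Some(delay.min(cfg.max_backoff))
}
```

With the values from the configuration above (3 retries, 100 ms initial, 10 s cap), this schedule waits 100 ms, 200 ms, and 400 ms before giving up.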

§Custom Providers

Implement the Provider trait:

use cognate_core::{Provider, Request, Response};
use async_trait::async_trait;

struct MyProvider;

#[async_trait]
impl Provider for MyProvider {
    async fn complete(&self, req: Request) -> cognate_core::Result<Response> {
        // Your implementation
        todo!()
    }
}

§Production Considerations

§Observability

Cognate integrates with standard Rust tracing:

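use tracing::{span, Instrument, Level};

let span = span!(Level::INFO, "llm_request", model = "gpt-4o");

// Attach the span to the future with `Instrument` rather than holding an
// `enter()` guard across `.await`, which can produce incorrect traces.
let response = provider.complete(request).instrument(span).await?;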

§Error Handling

Errors are returned as variants of cognate_core::Error, so callers can match on specific failures such as rate limiting:

use cognate_core::{Error, Result};

match provider.complete(request).await {
    Ok(response) => println!("{}", response.content()),
    Err(Error::RateLimited { retry_after }) => {
        println!("Rate limited, retry in {:?}", retry_after);
    }
    Err(e) => eprintln!("Error: {}", e),
}
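
Matching on error variants is also what makes fallback behaviour possible: this page describes a FallbackProvider that switches to a secondary provider on retryable errors. The sketch below illustrates that idea with simplified, synchronous stand-ins. `LlmError`, `Complete`, `Fallback`, `Failing`, and `Echo` are all hypothetical names, and the choice of which variants count as retryable is an assumption rather than the crate's actual policy.

```rust
use std::time::Duration;

/// Hypothetical error type; only `RateLimited { retry_after }` echoes the example above.
#[derive(Debug, PartialEq)]
pub enum LlmError {
    RateLimited { retry_after: Option<Duration> },
    Timeout,
    InvalidRequest(String),
}

impl LlmError {
    /// Assumed policy: transient failures are retryable, malformed requests are not.
    pub fn is_retryable(&self) -> bool {
        matches!(self, LlmError::RateLimited { .. } | LlmError::Timeout)
    }
}

/// Simplified synchronous stand-in for an async provider trait.
pub trait Complete {
    fn complete(&self, prompt: &str) -> Result<String, LlmError>;
}

/// Tries `primary` first and uses `secondary` only on retryable errors.
pub struct Fallback<A, B> {
    pub primary: A,
    pub secondary: B,
}

impl<A: Complete, B: Complete> Complete for Fallback<A, B> {
    fn complete(&self, prompt: &str) -> Result<String, LlmError> {
        match self.primary.complete(prompt) {
            Err(e) if e.is_retryable() => self.secondary.complete(prompt),
            other => other, // Success or a non-retryable error is returned as-is.
        }
    }
}

/// Test double that always fails, with a retryable or non-retryable error.
pub struct Failing {
    pub retryable: bool,
}

impl Complete for Failing {
    fn complete(&self, _prompt: &str) -> Result<String, LlmError> {
        if self.retryable {
            Err(LlmError::Timeout)
        } else {
            Err(LlmError::InvalidRequest("bad".to_string()))
        }
    }
}

/// Test double that echoes the prompt back.
pub struct Echo;

impl Complete for Echo {
    fn complete(&self, prompt: &str) -> Result<String, LlmError> {
        Ok(format!("echo: {}", prompt))
    }
}
```

Returning non-retryable errors unchanged keeps a malformed request from being silently re-sent to a second provider, where it would likely fail the same way.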

§Testing

Cognate includes a mock provider for testing:

use cognate_core::{MockProvider, Provider, Response};

let mock = MockProvider::new()
    .queue_response(Response::text("Hello, world!"));

let response = mock.complete(request).await?;
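
For readers curious how a queue-backed mock of this kind can work, here is a self-contained sketch. `QueueMock` is a hypothetical, synchronous stand-in and not the crate's MockProvider; it simply returns queued strings in first-in, first-out order and reports an error when the queue is empty.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical queue-backed mock: each call pops the next canned response.
pub struct QueueMock {
    // A Mutex lets `complete` take `&self`, as a shared provider would.
    responses: Mutex<VecDeque<String>>,
}

impl QueueMock {
    pub fn new() -> Self {
        QueueMock {
            responses: Mutex::new(VecDeque::new()),
        }
    }

    /// Builder-style method: enqueue a response and return the mock.
    pub fn queue_response(mut self, text: &str) -> Self {
        self.responses
            .get_mut()
            .expect("mock mutex poisoned")
            .push_back(text.to_string());
        self
    }

    /// Return the next queued response, or an error if none remain.
    pub fn complete(&self, _prompt: &str) -> Result<String, String> {
        self.responses
            .lock()
            .expect("mock mutex poisoned")
            .pop_front()
            .ok_or_else(|| "no queued responses".to_string())
    }
}
```

Failing loudly on an empty queue, rather than returning a default, makes a test that issues more requests than expected fail visibly.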

§Status

Cognate is in active development (v0.1.0). The API is stable and suitable for production use.

All 9 crates compile cleanly
17 unit tests + 7 doc tests passing
Compatible with Rust 1.70 and newer
Production middleware included
Streaming support verified

§License

Dual-licensed under MIT and Apache-2.0.

Choose whichever license works best for your project.

§Contributing

Contributions are welcome. Please read CONTRIBUTING.md first.

Development setup:

git clone https://github.com/YOUR_ORG/cognate
cd cognate
cargo test --workspace
cargo fmt
cargo clippy --workspace

§Support

Documentation: https://docs.rs/cognate-core
Examples: See examples/
Issues: https://github.com/YOUR_ORG/cognate/issues

§Roadmap

Vector store integrations (Qdrant, Pinecone, Weaviate)
Additional providers (Groq, Ollama embedded)
Streaming cost estimation
Advanced caching layer
Web dashboard for monitoring

§Cognate

A modular, extensible LLM framework for Rust with multi-provider support, type-safe tools, and RAG capabilities.

§Quick Start

use cognate::prelude::*;

#[tokio::main]
async fn main() {
    let client = cognate::providers::OpenAiProvider::new("sk-...".to_string());
    // Use the client...
}

§Features

providers - OpenAI and Anthropic provider support (default)
tools - Type-safe tool calling with derive macros (default)
prompts - Compile-time validated prompt templates (default)
rag - Retrieval-Augmented Generation support
axum - Axum web framework integration
full - All features

§Re-exports

pub use cognate_axum;
pub use cognate_tools_derive;
pub use cognate_prompts_derive;

§Modules

anthropic: Anthropic provider implementation.
error: Error types for Cognate.
middleware: Tower-inspired middleware system for Cognate providers.
openai: OpenAI provider implementation.
prelude: Prelude module for convenient imports
prompts: Prompt templating
providers: Provider implementations
ratelimit: Token bucket rate limiting implementation
retry: Retry logic with exponential backoff
sse: Server-Sent Events (SSE) streaming parser
tools: Tool calling and execution
types: Re-exports of core types for ergonomic imports.
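
The `ratelimit` module above is described as a token bucket. As a rough illustration of that technique, and not of the crate's actual code, the sketch below refills tokens in proportion to elapsed time and spends one token per request. `TokenBucket` and its methods are hypothetical names; the clock is passed in explicitly so the behaviour is deterministic.

```rust
use std::time::Instant;

/// Hypothetical token bucket: holds up to `capacity` tokens, refilled continuously.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Create a full bucket; `now` is the starting point for refill accounting.
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        TokenBucket {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
            last_refill: now,
        }
    }

    /// Try to take one token at time `now`; returns false if the bucket is empty.
    pub fn try_acquire_at(&mut self, now: Instant) -> bool {
        // Refill for the time elapsed since the last call, never above capacity.
        if now > self.last_refill {
            let elapsed = now.duration_since(self.last_refill).as_secs_f64();
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last_refill = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Convenience wrapper that uses the real clock.
    pub fn try_acquire(&mut self) -> bool {
        self.try_acquire_at(Instant::now())
    }
}
```

The capacity bounds the burst size, while the refill rate bounds the sustained request rate; a caller that receives `false` would typically wait or return a rate-limit error.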

§Structs

AnthropicProvider: Provider client for the Anthropic API (Claude models).
Document: A document stored in a vector store.
FallbackProvider: A provider that falls back to a secondary provider on retryable errors.
MemoryVectorStore: A simple in-memory VectorStore.
Message: A single message in a conversation.
OpenAiProvider: Provider client for the OpenAI API.
RagPipeline: A high-level RAG pipeline combining an embedding provider with a vector store.
Request: A completion request sent to a provider.
Response: A completed response from a provider.
RetryConfig: Configuration for exponential-backoff retry logic.
ToolExecutor: Executes a request with automatic tool-call dispatch.
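
To illustrate what a simple in-memory vector store does during retrieval, the sketch below ranks stored embeddings by cosine similarity to a query vector. `Doc`, `MemoryStore`, and `cosine_similarity` are hypothetical names; the crate's MemoryVectorStore and Document may use a different similarity measure, different fields, and an async interface.

```rust
/// Hypothetical stored document: an identifier plus its embedding.
pub struct Doc {
    pub id: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity of two vectors; returns 0.0 for mismatched or zero-length input.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical in-memory store that does a brute-force scan on every search.
pub struct MemoryStore {
    docs: Vec<Doc>,
}

impl MemoryStore {
    pub fn new() -> Self {
        MemoryStore { docs: Vec::new() }
    }

    pub fn add(&mut self, id: &str, embedding: Vec<f32>) {
        self.docs.push(Doc {
            id: id.to_string(),
            embedding,
        });
    }

    /// Return up to `top_k` (id, score) pairs, most similar first.
    pub fn search(&self, query: &[f32], top_k: usize) -> Vec<(&str, f32)> {
        let mut scored: Vec<(&str, f32)> = self
            .docs
            .iter()
            .map(|d| (d.id.as_str(), cosine_similarity(query, &d.embedding)))
            .collect();
        // Sort by descending score; total_cmp gives a total order on f32.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(top_k);
        scored
    }
}
```

A linear scan like this is adequate for small collections and tests; larger collections usually call for an approximate nearest-neighbour index in a dedicated vector database.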

§Enums

Error: The main error type for all Cognate operations.

§Traits

Layer: A factory that wraps a Provider to produce a new Provider.
Prompt: A type-safe, compile-time validated prompt template.
Provider: Core trait for all LLM providers.
Tool: The core trait for all callable tools.
VectorStore: A persistent or in-memory store of embedded documents.
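
The Layer trait is described above as a factory that wraps a Provider to produce a new Provider. The sketch below shows that pattern with simplified, synchronous stand-ins. `SimpleProvider`, `SimpleLayer`, `RetryLayer`, `Retry`, and `Flaky` are hypothetical names, and the real traits are presumably async with different signatures.

```rust
use std::cell::Cell;

/// Simplified synchronous stand-in for a provider trait.
pub trait SimpleProvider {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// A layer consumes an inner provider and returns a wrapped provider.
pub trait SimpleLayer<P: SimpleProvider> {
    type Wrapped: SimpleProvider;
    fn layer(&self, inner: P) -> Self::Wrapped;
}

/// Layer configuration: how many times to retry after the first failure.
pub struct RetryLayer {
    pub max_retries: u32,
}

/// The provider produced by `RetryLayer`.
pub struct Retry<P> {
    inner: P,
    max_retries: u32,
}

impl<P: SimpleProvider> SimpleLayer<P> for RetryLayer {
    type Wrapped = Retry<P>;
    fn layer(&self, inner: P) -> Retry<P> {
        Retry {
            inner,
            max_retries: self.max_retries,
        }
    }
}

impl<P: SimpleProvider> SimpleProvider for Retry<P> {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        let mut last = self.inner.complete(prompt);
        for _ in 0..self.max_retries {
            if last.is_ok() {
                break;
            }
            last = self.inner.complete(prompt); // Retry immediately; no backoff here.
        }
        last
    }
}

/// Test double that fails a fixed number of times before succeeding.
pub struct Flaky {
    failures_left: Cell<u32>,
}

impl Flaky {
    pub fn new(failures: u32) -> Self {
        Flaky {
            failures_left: Cell::new(failures),
        }
    }
}

impl SimpleProvider for Flaky {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        let left = self.failures_left.get();
        if left > 0 {
            self.failures_left.set(left - 1);
            Err("transient failure".to_string())
        } else {
            Ok(format!("ok: {}", prompt))
        }
    }
}
```

Because the wrapped type is itself a provider, layers compose: a rate-limit layer could wrap the output of a retry layer without either knowing about the other.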

§Derive Macros

DerivePromptMacro: Derive the cognate_prompts::Prompt trait for a struct.
DeriveToolMacro: Derive the cognate_tools::Tool trait for a struct.
Prompt: Derive the cognate_prompts::Prompt trait for a struct.
Tool: Derive the cognate_tools::Tool trait for a struct.