§Cognate
Type-safe LLM framework for Rust. Multi-provider support, compile-time validation, streaming, tool calling—all with zero-cost abstractions.
§Overview
Cognate provides production-grade abstractions for building LLM-powered applications in Rust. Unlike fragmented HTTP clients or Python-style libraries, Cognate offers:
- Unified multi-provider interface (OpenAI, Anthropic, Groq, Ollama, custom)
- Type-safe tool calling via #[derive(Tool)]
- Compile-time prompt validation via #[derive(Prompt)]
- Production middleware (retry, rate-limit, tracing)
- Streaming-first design with proper error handling
- Axum web server integration out of the box
- RAG pipeline support with pluggable vector stores
§Quick Links
- Documentation: https://docs.rs/cognate-core
- Examples: See examples/
- Contributing: CONTRIBUTING.md
- Architecture: ARCHITECTURE.md
- Getting Started: GETTING_STARTED.md
§Why Cognate?
§Feature Comparison
| Feature | Cognate | async-openai | rig |
|---|---|---|---|
| Multi-provider | OpenAI, Anthropic, Groq, Ollama | OpenAI only | OpenAI, Anthropic |
| Type-safe tools | #[derive(Tool)] with validation | Manual JSON | Runtime definition |
| Compile-time validation | Prompts checked at build time | No | No |
| Axum integration | Built-in extractors + middleware | No | No |
| Middleware system | Retry, rate-limit, tracing, observability | Basic retry | Limited |
| RAG support | Vector search + memory traits | No | No |
§Performance
Cognate is designed for production workloads where latency and throughput matter.
| Metric | Cognate | async-openai (Rust) | Python LangChain |
|---|---|---|---|
| P50 Latency | <1ms (overhead) | <1ms (overhead) | 45ms |
| P99 Latency | <5ms (overhead) | <5ms (overhead) | 150ms |
| Requests/sec | 2500+ | 2800+ | 200-400 |
| Memory (RSS) | 12-15 MB | 12-15 MB | 120-150 MB |
| Compile time | 8-12s (clean) | 6-8s (clean) | N/A |
See BENCHMARKS.md for detailed metrics and reproducible measurements.
§Installation
Add Cognate to your Cargo.toml:
[dependencies]
cognate-core = "0.1"
cognate-providers = "0.1"
cognate-tools = "0.1"
cognate-prompts = "0.1"
tokio = { version = "1.0", features = ["full"] }
§Quick Start
§Basic Chat
use cognate_core::{Provider, Request, Message};
use cognate_providers::OpenAiProvider;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = OpenAiProvider::new(
        std::env::var("OPENAI_API_KEY")?
    )?;
    let request = Request::new()
        .with_model("gpt-4o-mini")
        .with_messages(vec![
            Message::user("Explain Rust type safety in one sentence"),
        ]);
    let response = provider.complete(request).await?;
    println!("{}", response.content());
    Ok(())
}
§Type-Safe Tool Calling
use cognate_tools::Tool;
use cognate_core::{Provider, Request, Message};
use serde::{Deserialize, Serialize};
use schemars::JsonSchema;
#[derive(Tool, Serialize, Deserialize, JsonSchema)]
#[tool(description = "Add two numbers")]
struct Calculator {
    /// First number
    a: i32,
    /// Second number
    b: i32,
}
impl Calculator {
    async fn run(&self) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
        Ok(format!("{} + {} = {}", self.a, self.b, self.a + self.b))
    }
}
// Use in request
let request = Request::new()
    .with_model("gpt-4o")
    .with_messages(vec![
        Message::user("What is 15 + 23?"),
    ])
    .with_tool(Calculator);
§Streaming Responses
use cognate_core::Provider;
use futures::StreamExt;
let provider = /* ... */;
let mut stream = provider.stream(request).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    print!("{}", chunk.text);
}
§Architecture
Cognate is organized into specialized crates:
- cognate-core: Provider trait, request/response types, middleware system
- cognate-providers: OpenAI, Anthropic, retry, fallback implementations
- cognate-tools: Tool dispatch, automatic execution loop, #[derive(Tool)]
- cognate-prompts: Template system, compile-time validation, #[derive(Prompt)]
- cognate-rag: Vector search traits, memory abstraction, embedding utilities
- cognate-axum: Axum extractors, middleware layers, web integration
- cognate-cli: CLI tools for development and testing
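The split between a core provider trait and middleware that wraps it can be pictured with a small, self-contained sketch. It uses simplified synchronous traits. Cognate's real `Provider` is async, and the names `Echo`, `Counted`, and `CountLayer`, as well as the exact shape of `Layer`, are assumptions made for illustration rather than the crate's API.

```rust
// Simplified, synchronous stand-ins. Cognate's real `Provider` is async and
// its `Layer` trait may look different; every name here is hypothetical.
pub trait Provider {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// A layer is a factory: it takes an inner provider and returns a wrapped one.
pub trait Layer<P: Provider> {
    type Wrapped: Provider;
    fn layer(&self, inner: P) -> Self::Wrapped;
}

/// Toy provider that echoes its input.
pub struct Echo;

impl Provider for Echo {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Middleware that counts calls before delegating to the inner provider.
pub struct Counted<P> {
    inner: P,
    calls: std::cell::Cell<u32>,
}

impl<P> Counted<P> {
    pub fn calls(&self) -> u32 {
        self.calls.get()
    }
}

impl<P: Provider> Provider for Counted<P> {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        self.calls.set(self.calls.get() + 1);
        self.inner.complete(prompt)
    }
}

/// Layer that produces `Counted` wrappers.
pub struct CountLayer;

impl<P: Provider> Layer<P> for CountLayer {
    type Wrapped = Counted<P>;
    fn layer(&self, inner: P) -> Counted<P> {
        Counted { inner, calls: std::cell::Cell::new(0) }
    }
}
```

Calling `CountLayer.layer(Echo)` yields a value that is itself a `Provider`, so layers of this kind can be stacked (retry around rate limiting around a concrete client, for example) without the caller changing how it invokes `complete`.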
§Examples
All examples are in the respective crate directories:
- Simple Chat - Basic API usage
- Streaming Chat - Response streaming
- Tool Usage - Tool calling and dispatch
- Agent Loop - Multi-turn tool-using agent
- RAG Pipeline - Search + generation
- Axum ChatGPT Clone - Web server
Run an example:
cargo run --example simple_chat -p cognate-providers
§Configuration
§OpenAI Provider
use cognate_providers::{OpenAiProvider, RetryConfig};
use std::time::Duration;
let provider = OpenAiProvider::new(api_key)?
    .with_timeout(Duration::from_secs(30))
    .with_retry(RetryConfig {
        max_retries: 3,
        initial_backoff: Duration::from_millis(100),
        max_backoff: Duration::from_secs(10),
    });
§Custom Providers
Implement the Provider trait:
use cognate_core::{Provider, Request, Response};
use async_trait::async_trait;
struct MyProvider;
#[async_trait]
impl Provider for MyProvider {
    async fn complete(&self, req: Request) -> cognate_core::Result<Response> {
        // Your implementation
        todo!()
    }
}
§Production Considerations
§Observability
Cognate integrates with standard Rust tracing:
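use tracing::{span, Instrument, Level};
let span = span!(Level::INFO, "llm_request", model = "gpt-4o");
// Attach the span to the future instead of holding `span.enter()` across `.await`;
// tracing's docs warn that a guard held over an await point can record incorrect spans.
let response = provider.complete(request).instrument(span).await?;
§Error Handling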
Comprehensive error types:
use cognate_core::{Error, Result};
match provider.complete(request).await {
    Ok(response) => println!("{}", response.content()),
    Err(Error::RateLimited { retry_after }) => {
        println!("Rate limited, retry in {:?}", retry_after);
    }
    Err(e) => eprintln!("Error: {}", e),
}
§Testing
Cognate includes a mock provider for testing:
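// `Provider` brings `complete` into scope; `Response` builds the canned reply.
use cognate_core::{MockProvider, Provider, Response};
let mock = MockProvider::new()
    .queue_response(Response::text("Hello, world!"));
let response = mock.complete(request).await?;
§Status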
Cognate is in active development (v0.1.0). The API is stable and suitable for production use.
- All 9 crates compile cleanly
- 17 unit tests + 7 doc tests passing
- Compatible with Rust 1.70 and newer
- Production middleware included
- Streaming support verified
§License
Dual-licensed under MIT and Apache-2.0.
Choose whichever license works best for your project.
§Contributing
Contributions are welcome. Please read CONTRIBUTING.md first.
Development setup:
git clone https://github.com/YOUR_ORG/cognate
cd cognate
cargo test --workspace
cargo fmt
cargo clippy --workspace
§Support
- Documentation: https://docs.rs/cognate-core
- Examples: See examples/
- Issues: https://github.com/YOUR_ORG/cognate/issues
§Roadmap
- Vector store integrations (Qdrant, Pinecone, Weaviate)
- Additional providers (Groq, Ollama embedded)
- Streaming cost estimation
- Advanced caching layer
- Web dashboard for monitoring
§Cognate
A modular, extensible LLM framework for Rust with multi-provider support, type-safe tools, and RAG capabilities.
§Quick Start
use cognate::prelude::*;
#[tokio::main]
async fn main() {
    let client = cognate::providers::OpenAiProvider::new("sk-...".to_string());
    // Use the client...
}
§Features
- providers: OpenAI and Anthropic provider support (default)
- tools: Type-safe tool calling with derive macros (default)
- prompts: Compile-time validated prompt templates (default)
- rag: Retrieval-Augmented Generation support
- axum: Axum web framework integration
- full: All features
Re-exports§
pub use cognate_axum;
pub use cognate_tools_derive;
pub use cognate_prompts_derive;
Modules§
- anthropic: Anthropic provider implementation.
- error: Error types for Cognate.
- middleware: Tower-inspired middleware system for Cognate providers.
- openai: OpenAI provider implementation.
- prelude: Prelude module for convenient imports.
- prompts: Prompt templating.
- providers: Provider implementations.
- ratelimit: Token bucket rate limiting implementation.
- retry: Retry logic with exponential backoff.
- sse: Server-Sent Events (SSE) streaming parser.
- tools: Tool calling and execution.
- types: Re-exports of core types for ergonomic imports.
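The ratelimit module is described as a token bucket. For readers unfamiliar with the technique, the following is a minimal sketch of the general idea and not Cognate's implementation: the `TokenBucket` type, its fields, and the choice to pass the current time in explicitly (rather than reading a clock) are all assumptions made to keep the example deterministic.

```rust
use std::time::Duration;

/// Minimal token bucket. Time is passed in explicitly so the logic is easy to
/// test; this is a hypothetical sketch, not Cognate's `ratelimit` module.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    /// Time of the last refill, measured from an arbitrary starting point.
    last: Duration,
}

impl TokenBucket {
    /// Start with a full bucket of `capacity` tokens.
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
            last: Duration::ZERO,
        }
    }

    /// Refill according to elapsed time, then try to take one token.
    pub fn try_acquire(&mut self, now: Duration) -> bool {
        let elapsed = now.saturating_sub(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With a capacity of 2 and a refill rate of one token per second, two requests pass immediately, a third is rejected, and one more is admitted a second later. Bursts are allowed up to the capacity while the long-run rate stays bounded by the refill rate.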
Structs§
- AnthropicProvider: Provider client for the Anthropic API (Claude models).
- Document: A document stored in a vector store.
- FallbackProvider: A provider that falls back to a secondary provider on retryable errors.
- MemoryVectorStore: A simple in-memory VectorStore.
- Message: A single message in a conversation.
- OpenAiProvider: Provider client for the OpenAI API.
- RagPipeline: A high-level RAG pipeline combining an embedding provider with a vector store.
- Request: A completion request sent to a provider.
- Response: A completed response from a provider.
- RetryConfig: Configuration for exponential-backoff retry logic.
- ToolExecutor: Executes a request with automatic tool-call dispatch.
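RetryConfig is described as configuring exponential-backoff retries. One common reading of the three fields shown in the OpenAI provider configuration example (`max_retries`, `initial_backoff`, `max_backoff`) is a delay that doubles on each attempt and is capped at the maximum. The sketch below assumes that doubling schedule and no jitter; the `BackoffConfig` type and `backoff_delay` function are hypothetical, and Cognate's actual schedule may differ.

```rust
use std::time::Duration;

/// Mirrors the three fields shown in the configuration example. This is a
/// hypothetical stand-in; the real `RetryConfig` may carry more options.
pub struct BackoffConfig {
    pub max_retries: u32,
    pub initial_backoff: Duration,
    pub max_backoff: Duration,
}

/// Delay before retry number `attempt` (0-based), doubling each time and
/// capped at `max_backoff`. Returns `None` once the retry budget is spent.
pub fn backoff_delay(cfg: &BackoffConfig, attempt: u32) -> Option<Duration> {
    if attempt >= cfg.max_retries {
        return None;
    }
    // 2^attempt, saturating instead of overflowing for very large attempts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    Some(cfg.initial_backoff.saturating_mul(factor).min(cfg.max_backoff))
}
```

Under these assumptions, the values from the configuration example (3 retries, 100 ms initial, 10 s maximum) give delays of 100 ms, 200 ms, and 400 ms before the request is abandoned. Production implementations often add random jitter so that many clients do not retry in lockstep.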
Enums§
- Error: The main error type for all Cognate operations.
Traits§
- Layer: A factory that wraps a Provider to produce a new Provider.
- Prompt: A type-safe, compile-time validated prompt template.
- Provider: Core trait for all LLM providers.
- Tool: The core trait for all callable tools.
- VectorStore: A persistent or in-memory store of embedded documents.
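To make the idea of an in-memory store of embedded documents concrete, here is a brute-force similarity search over a slice of embeddings. It is a sketch under stated assumptions: `Doc`, `cosine`, and `top_k` are hypothetical names, cosine similarity is one common scoring choice among several, and nothing here reflects the actual `VectorStore` or `MemoryVectorStore` signatures.

```rust
// Hypothetical stand-in type; this is NOT Cognate's `Document`.
#[derive(Debug, Clone, PartialEq)]
pub struct Doc {
    pub id: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity between two vectors.
/// Returns 0.0 for mismatched lengths or zero-length vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force search: score every document, sort by descending similarity,
/// and keep the `k` best matches.
pub fn top_k<'a>(docs: &'a [Doc], query: &[f32], k: usize) -> Vec<(&'a Doc, f32)> {
    let mut scored: Vec<(&Doc, f32)> = docs
        .iter()
        .map(|d| (d, cosine(&d.embedding, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A linear scan like this is adequate for small collections and for tests. Dedicated vector databases replace the scan with an approximate nearest-neighbour index, which is the kind of backend a pluggable store trait allows swapping in.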
Derive Macros§
- DerivePromptMacro: Derive the cognate_prompts::Prompt trait for a struct.
- DeriveToolMacro: Derive the cognate_tools::Tool trait for a struct.
- Prompt: Derive the cognate_prompts::Prompt trait for a struct.
- Tool: Derive the cognate_tools::Tool trait for a struct.