cerebro-1.1.1 has been yanked.

Cerebro 🧠

Cerebro is a blazing-fast, universal, and storage-agnostic Memory Layer + Multi-Agent Swarm Engine for AI Agents and LLM Applications, written in pure Rust.

Why Cerebro?

While typical vector database wrappers just push raw vectors into a database, Cerebro functions as the Hippocampus for autonomous AI. It natively understands Agentic Memory structures:

Short-Term Episodic Memory (Conversations)
Working Memory (KV State)
Long-Term Semantic Memory (Vector Search with Temporal Decay)
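
This README does not spell out how temporal decay is computed. One common approach, shown here only as a sketch, multiplies the similarity score by an exponential factor so that a memory's weight halves after a fixed interval. The names `decayed_score`, `rank_with_decay`, and `half_life_secs` are hypothetical and are not part of Cerebro's API.

```rust
/// Hypothetical helper, not part of Cerebro's API: scale a similarity score
/// by an exponential decay so that the weight halves every `half_life_secs`.
fn decayed_score(similarity: f64, age_secs: f64, half_life_secs: f64) -> f64 {
    similarity * 0.5_f64.powf(age_secs / half_life_secs)
}

/// Rank `(id, similarity, age in seconds)` candidates by decayed score, best first.
fn rank_with_decay<'a>(
    candidates: &[(&'a str, f64, f64)],
    half_life_secs: f64,
) -> Vec<(&'a str, f64)> {
    let mut scored: Vec<(&str, f64)> = candidates
        .iter()
        .map(|&(id, sim, age)| (id, decayed_score(sim, age, half_life_secs)))
        .collect();
    // Sort descending by decayed score.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

With a one-hour half-life, a two-hour-old memory with similarity 0.9 scores 0.225 and ranks below a fresh memory with similarity 0.6.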

And now with SwarmForge, Cerebro provides a built-in multi-agent orchestration engine — enabling teams of specialized AI agents to collaborate through its three-tier memory system.

Key Features

🚀 Minimal Overhead: Powered by a lean async pipeline in Rust, designed for high-scale agentic workloads.
🔌 Universal Storage: Trait-based backends — swap between MemoryVectorStore, PgVectorStore, or Qdrant.
🧠 Pluggable Compute: Route embeddings through local models (Candle) or remote APIs (OpenAI, Anthropic).
🔄 Active Consolidation: Background "Sleep Cycle" worker for autonomous memory pruning and semantic organization.
🔍 Hybrid Search: Native RRF (Reciprocal Rank Fusion) that merges keyword and vector retrieval rankings into a single result list.
🐝 SwarmForge: Multi-agent swarming engine with sequential, parallel, and hierarchical orchestration patterns.
🤖 Universal LLM: Native support for Ollama, OpenAI, Gemini, Anthropic, and any OpenAI-compatible API.
🌐 MCP Ready: Native Model Context Protocol server (cerebro-mcp) for AI desktop apps.
🦀 Multi-Language: Native Python (PyO3) and WASM bindings.
📄 Complex Ingestion: PDF extraction and HTML-aware semantic chunking.
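
To make the Hybrid Search feature concrete, here is a generic sketch of Reciprocal Rank Fusion: each document's fused score is the sum of `1 / (k + rank)` over every ranked list it appears in, with 1-based ranks. This is the textbook formula, not Cerebro's internal code. The function name `reciprocal_rank_fusion` and the choice of `k` are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Each list is ordered best first; `k` dampens the influence of top ranks
/// (60 is the value commonly used in the literature).
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest fused score first; ties are broken by id for a stable order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Given a keyword ranking `["a", "b", "c"]` and a vector ranking `["b", "c", "d"]` with `k = 60.0`, the fused order is `b, c, a, d`: documents that appear in both lists outrank those that appear in only one.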

SwarmForge — Multi-Agent Swarming

Model swarming is an AI design pattern where multiple specialized agents collaborate — each with its own expertise, system prompt, and LLM — to solve tasks that a single model can't handle well alone. Think of it as a team of AI specialists instead of one generalist.

Unlike swarms that only pass messages between agents, SwarmForge agents share a three-tier memory system:

Memory Tier	How Agents Use It
Working Memory (KVStore)	Fast state — current task, step count, handoff targets
Episodic Memory (Conversations)	Full message history per agent within a run
Semantic Memory (VectorStore)	Agents commit outputs as Documents → other agents recall via vector search

This means the Security Agent's findings are semantically searchable by the Performance Agent that runs after it. Knowledge compounds across the swarm.
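
The commit-then-recall flow can be pictured with a toy in-memory model. This sketch is not Cerebro's implementation: `SharedMemory`, `Entry`, `commit`, and `recall` are hypothetical names, and embeddings are passed in by hand rather than produced by an embedder.

```rust
/// Toy stand-in for a shared semantic memory; all names here are hypothetical.
struct Entry {
    author: String,
    text: String,
    embedding: Vec<f32>,
}

#[derive(Default)]
struct SharedMemory {
    entries: Vec<Entry>,
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl SharedMemory {
    /// An agent commits its output together with a precomputed embedding.
    fn commit(&mut self, author: &str, text: &str, embedding: Vec<f32>) {
        self.entries.push(Entry {
            author: author.into(),
            text: text.into(),
            embedding,
        });
    }

    /// Any agent can recall the `top_k` entries closest to a query embedding.
    fn recall(&self, query: &[f32], top_k: usize) -> Vec<&Entry> {
        let mut hits: Vec<(&Entry, f32)> = self
            .entries
            .iter()
            .map(|e| (e, cosine(&e.embedding, query)))
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.into_iter().take(top_k).map(|(e, _)| e).collect()
    }
}
```

If a security agent commits a finding with embedding `[1.0, 0.0]` and a style agent commits one with `[0.0, 1.0]`, a later agent querying with `[0.9, 0.1]` gets the security finding back first.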

Orchestration Patterns

Sequential Pipeline — Each agent's output feeds the next:

[Security Agent] → [Performance Agent] → [Style Agent] → Final Report
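
Stripped of LLM calls and memory, the sequential pattern is a fold over the agents. In this sketch each agent is a plain function from input text to output text; the `Agent` alias and `run_sequential` are hypothetical names, not SwarmForge types.

```rust
/// Hypothetical stand-in for an agent: a function from input text to output text.
type Agent = fn(&str) -> String;

/// Run agents in order; each agent receives the previous agent's output.
fn run_sequential(agents: &[Agent], input: &str) -> String {
    let mut current = input.to_string();
    for agent in agents {
        current = agent(&current);
    }
    current
}
```

The result of the last agent is the final report. An empty agent list returns the input unchanged.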

Parallel Fan-Out / Fan-In — Multiple agents analyze simultaneously, a merger synthesizes:

             ┌→ [Security Agent]  ──┐
  Input ─────┼→ [Performance Agent] ┼→ [Synthesizer] → Output
             └→ [Style Agent]     ──┘
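
A minimal sketch of fan-out / fan-in using scoped threads from the standard library. Agents are again modeled as plain functions, and `run_parallel` and its `synthesize` parameter are hypothetical names. SwarmForge itself is async, so its internals will differ.

```rust
use std::thread;

/// Hypothetical stand-in for an agent: a function from input text to output text.
type Agent = fn(&str) -> String;

/// Run every agent on the same input concurrently, then merge the outputs.
fn run_parallel(agents: &[Agent], input: &str, synthesize: fn(&[String]) -> String) -> String {
    let outputs: Vec<String> = thread::scope(|scope| {
        // Fan out: spawn one scoped thread per agent.
        let handles: Vec<_> = agents
            .iter()
            .map(|agent| scope.spawn(move || agent(input)))
            .collect();
        // Fan in: join in spawn order so results line up with `agents`.
        handles
            .into_iter()
            .map(|h| h.join().expect("agent thread panicked"))
            .collect::<Vec<String>>()
    });
    synthesize(&outputs)
}
```

Collecting the handles before joining is what lets the agents run at the same time; joining inside the first `map` would serialize them.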

Hierarchical Supervisor — A supervisor decomposes, delegates, and synthesizes:

          [Supervisor Agent]
         /        |         \
  [Backend]   [Frontend]   [Testing]
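
The supervisor pattern has three steps: decompose, delegate, synthesize. The sketch below models each step as a plain function so the control flow is visible. In a real swarm the supervisor would be an LLM-backed agent. `run_hierarchical`, `Worker`, and the parameter names are hypothetical, not SwarmForge's API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a worker agent.
type Worker = fn(&str) -> String;

/// `decompose` splits the task into (worker id, subtask) pairs, each subtask is
/// delegated to the named worker, and `synthesize` merges the results.
fn run_hierarchical(
    task: &str,
    decompose: fn(&str) -> Vec<(String, String)>,
    workers: &HashMap<String, Worker>,
    synthesize: fn(&[(String, String)]) -> String,
) -> Result<String, String> {
    let mut results = Vec::new();
    for (worker_id, subtask) in decompose(task) {
        // Fail early if the supervisor names a worker that was never registered.
        let worker = workers
            .get(&worker_id)
            .ok_or_else(|| format!("no worker registered as '{}'", worker_id))?;
        let output = worker(&subtask);
        results.push((worker_id, output));
    }
    Ok(synthesize(&results))
}
```

Returning an error for an unknown worker id keeps a bad decomposition from being silently dropped.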

Supported LLM Providers

Each agent in the swarm can use a different LLM provider:

Provider	Config	Covers
Ollama	`LlmProvider::Ollama`	Any local model — Llama 3, Mistral, Phi, Gemma
OpenAI	`LlmProvider::OpenAI`	GPT-4o, GPT-4, o3, o4-mini
Gemini	`LlmProvider::Gemini`	Gemini Pro, Flash, Ultra
Anthropic	`LlmProvider::Anthropic`	Claude 4, Sonnet, Haiku, Opus
Any OpenAI-compatible	`LlmProvider::OpenAICompatible`	Groq, Together, Mistral, DeepSeek, LM Studio, vLLM

Getting Started

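[dependencies]
cerebro = "1.1.1"
# Required by the #[tokio::main] examples below
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }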

Basic Example

use cerebro::prelude::*;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    // Assemble the pipeline: a chunker, an embedder, and a vector store.
    let chunker = Arc::new(RecursiveCharacterChunker::new(512, 50));
    let embedder = Arc::new(MockEmbedder::new(1536)); // mock embedder for local testing
    let store = Arc::new(MemoryVectorStore::new());

    let engine = MemoryEngine::new(chunker, embedder, store);

    // Ingest a single document.
    let doc = Document::new("The Rust programming language ensures memory safety.");
    engine.ingest_document(doc).await.unwrap();

    // Query for up to 5 matches.
    let memories = engine.query("What language is safe?", 5).await.unwrap();

    // Each result pairs a memory node with its relevance score.
    for (node, score) in memories {
        println!("Match: {} (Score: {})", node.chunk.text, score);
    }
}

Swarm Example — Multi-Agent Code Review

use cerebro::prelude::*;
use cerebro::swarm::prelude::*;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    // Shared memory: a MemoryEngine for semantic recall plus a KV store for working state.
    let engine = Arc::new(MemoryEngine::new(
        Arc::new(RecursiveCharacterChunker::new(512, 50)),
        Arc::new(MockEmbedder::new(8)),
        Arc::new(MemoryVectorStore::new()),
    ));
    let memory = Arc::new(CerebroMemoryBus::new(engine, Arc::new(MemoryKVStore::new())));

    let mut orch = SwarmOrchestrator::new(memory);

    // First agent: security review on a local Ollama model.
    orch.register_agent(AgentConfig {
        id: "security".into(),
        name: "Security Reviewer".into(),
        system_prompt: "Analyze code for security vulnerabilities.".into(),
        model: LlmProvider::Ollama {
            model: "llama3".into(),
            base_url: "http://localhost:11434".into(),
        },
        tools: vec![],
        handoff_targets: vec![],
        max_steps: 10,
    });

    // Second agent: performance review on Anthropic. Replace the placeholder API key.
    orch.register_agent(AgentConfig {
        id: "perf".into(),
        name: "Performance Reviewer".into(),
        system_prompt: "Analyze code for performance issues.".into(),
        model: LlmProvider::Anthropic {
            model: "claude-sonnet-4-20250514".into(),
            api_key: "sk-...".into(),
            max_tokens: 4096,
        },
        tools: vec![],
        handoff_targets: vec![],
        max_steps: 10,
    });

    // Run the agents in order: "security" first, then "perf".
    let result = orch.execute(
        SwarmPattern::Sequential { agent_order: vec!["security".into(), "perf".into()] },
        "Review this function: fn process(input: &str) { unsafe { ... } }",
    ).await.unwrap();

    println!("{}", result.final_output);
}

Documentation

For in-depth guides and technical details:

USER_GUIDE.md: Implementation examples and usage guides.
ARCHITECTURE.md: Structural layout and data pipelines.
CHANGELOG.md: Release history and version updates.

Author: Suraj Kumar Nanda | www.surajkumarnanda.com
