Cerebro 🧠
Cerebro is a blazing-fast, universal, and storage-agnostic Memory Layer + Multi-Agent Swarm Engine for AI Agents and LLM Applications, written in pure Rust.
Why Cerebro?
While typical vector database wrappers just push raw vectors into a database, Cerebro functions as the Hippocampus for autonomous AI. It natively understands Agentic Memory structures:
- Short-Term Episodic Memory (Conversations)
- Working Memory (KV State)
- Long-Term Semantic Memory (Vector Search with Temporal Decay)
And now with SwarmForge, Cerebro provides a built-in multi-agent orchestration engine — enabling teams of specialized AI agents to collaborate through its three-tier memory system.
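To make "Vector Search with Temporal Decay" concrete, here is a minimal sketch of one common approach: multiply the raw similarity by an exponential decay factor so that older memories rank lower. The `MemoryHit` type, the function names, and the half-life parameter are all assumptions for illustration; they are not Cerebro's actual API, and Cerebro's decay function may differ.

```rust
/// Hypothetical record for a long-term memory hit; not Cerebro's actual type.
#[derive(Debug, Clone)]
pub struct MemoryHit {
    pub id: String,
    /// Raw vector similarity in [0, 1], e.g. cosine similarity.
    pub similarity: f64,
    /// Age of the memory in seconds at query time.
    pub age_secs: f64,
}

/// Exponential decay: the score halves every `half_life_secs`.
pub fn decayed_score(similarity: f64, age_secs: f64, half_life_secs: f64) -> f64 {
    similarity * 0.5_f64.powf(age_secs / half_life_secs)
}

/// Rank hits by decayed score, highest first.
pub fn rank_with_decay(mut hits: Vec<MemoryHit>, half_life_secs: f64) -> Vec<MemoryHit> {
    hits.sort_by(|a, b| {
        let sa = decayed_score(a.similarity, a.age_secs, half_life_secs);
        let sb = decayed_score(b.similarity, b.age_secs, half_life_secs);
        sb.total_cmp(&sa)
    });
    hits
}
```

With a one-hour half-life, a two-hour-old memory with similarity 0.9 scores 0.225 and ranks below a fresh memory with similarity 0.6.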
Key Features
- 🚀 Minimal Overhead: Powered by a lean async pipeline in Rust, designed for high-scale agentic workloads.
- 🔌 Universal Storage: Trait-based backends; swap between MemoryVectorStore, PgVectorStore, or Qdrant.
- 🧠 Pluggable Compute: Route embeddings through local models (Candle) or remote APIs (OpenAI, Anthropic).
- 🔄 Active Consolidation: Background "Sleep Cycle" worker for autonomous memory pruning and semantic organization.
- 🔍 Hybrid Search: Native RRF (Reciprocal Rank Fusion) that merges keyword and vector rankings to improve retrieval precision.
- 🐝 SwarmForge: Multi-agent swarming engine with sequential, parallel, and hierarchical orchestration patterns.
- 🤖 Universal LLM: Native support for Ollama, OpenAI, Gemini, Anthropic, and any OpenAI-compatible API.
- 🌐 MCP Ready: Native Model Context Protocol server (cerebro-mcp) for AI desktop apps.
- 🦀 Multi-Language: Native Python (PyO3) and WASM bindings.
- 📄 Complex Ingestion: PDF extraction and HTML-aware semantic chunking.
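The Hybrid Search feature names Reciprocal Rank Fusion. As a rough illustration of the general technique (not Cerebro's internal implementation), each document receives the sum of `1 / (k + rank)` over every ranked list it appears in. The function name `rrf_fuse` and the choice of `k = 60`, a value commonly used for RRF, are assumptions here.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Each document scores sum(1 / (k + rank)) over the lists it appears in,
/// with ranks starting at 1.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Given a keyword ranking `["a", "b", "c"]` and a vector ranking `["b", "d", "a"]`, document `b` comes out on top because it ranks highly in both lists, while `c` and `d` each appear in only one.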
SwarmForge — Multi-Agent Swarming
Model swarming is an AI design pattern where multiple specialized agents collaborate — each with its own expertise, system prompt, and LLM — to solve tasks that a single model can't handle well alone. Think of it as a team of AI specialists instead of one generalist.
Cerebro's SwarmForge is unique because agents don't just pass messages — they share a three-tier memory system:
| Memory Tier | How Agents Use It |
|---|---|
| Working Memory (KVStore) | Fast state — current task, step count, handoff targets |
| Episodic Memory (Conversations) | Full message history per agent within a run |
| Semantic Memory (VectorStore) | Agents commit outputs as Documents → other agents recall via vector search |
This means the Security Agent's findings are semantically searchable by the Performance Agent that runs after it. Knowledge compounds across the swarm.
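The sharing described above can be sketched with a toy in-memory store. Everything in this block (`SharedMemory`, `Document`, `commit`, `recall`, the hand-written embeddings) is a hypothetical stand-in rather than Cerebro's real types; it only shows how one agent's committed output becomes recallable by another through vector similarity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a document in semantic memory.
#[derive(Debug, Clone)]
pub struct Document {
    pub author: String,
    pub text: String,
    pub embedding: Vec<f32>,
}

/// Illustrative three-tier store; field and method names are assumptions.
#[derive(Default)]
pub struct SharedMemory {
    /// Working memory: fast key-value state.
    pub working: HashMap<String, String>,
    /// Episodic memory: message history per agent.
    pub episodic: HashMap<String, Vec<String>>,
    /// Semantic memory: documents recalled by vector similarity.
    pub semantic: Vec<Document>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl SharedMemory {
    /// An agent records its output in its own history and in semantic memory.
    pub fn commit(&mut self, author: &str, text: &str, embedding: Vec<f32>) {
        self.episodic.entry(author.to_string()).or_default().push(text.to_string());
        self.semantic.push(Document {
            author: author.to_string(),
            text: text.to_string(),
            embedding,
        });
    }

    /// Return the `top_k` documents most similar to `query`.
    pub fn recall(&self, query: &[f32], top_k: usize) -> Vec<&Document> {
        let mut scored: Vec<(f32, &Document)> = self
            .semantic
            .iter()
            .map(|d| (cosine(&d.embedding, query), d))
            .collect();
        scored.sort_by(|a, b| b.0.total_cmp(&a.0));
        scored.into_iter().take(top_k).map(|(_, d)| d).collect()
    }
}
```

In a real deployment the embeddings would come from an embedding model and the store from one of the pluggable backends; the brute-force cosine scan here is only for readability.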
Orchestration Patterns
Sequential Pipeline — Each agent's output feeds the next:
[Security Agent] → [Performance Agent] → [Style Agent] → Final Report
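A sequential pipeline reduces to a fold over the agent list. The sketch below uses a hypothetical `Agent` trait and a toy `NoteAgent` that appends a note instead of calling an LLM; neither name comes from Cerebro.

```rust
/// Hypothetical agent interface: takes the previous output, returns its own.
pub trait Agent {
    fn name(&self) -> &str;
    fn run(&self, input: &str) -> String;
}

/// Toy agent that appends a tagged note instead of calling an LLM.
pub struct NoteAgent {
    pub name: String,
    pub note: String,
}

impl Agent for NoteAgent {
    fn name(&self) -> &str {
        &self.name
    }

    fn run(&self, input: &str) -> String {
        format!("{input}\n[{}] {}", self.name, self.note)
    }
}

/// Run agents in order, feeding each output into the next agent.
pub fn run_sequential(agents: &[Box<dyn Agent>], input: &str) -> String {
    agents.iter().fold(input.to_string(), |acc, agent| agent.run(&acc))
}
```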
Parallel Fan-Out / Fan-In — Multiple agents analyze simultaneously, a merger synthesizes:
               ┌→ [Security Agent] ────┐
    Input ─────┼→ [Performance Agent] ─┼→ [Synthesizer] → Output
               └→ [Style Agent] ───────┘
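The fan-out / fan-in shape can be sketched with scoped threads from the standard library. Analyzers are plain function pointers in this illustration, and `fan_out_fan_in` is a made-up name; a real swarm would presumably run async LLM calls instead of threads.

```rust
use std::thread;

/// Run every analyzer on the same input concurrently, then merge the results.
/// Analyzers are plain functions here; in a real swarm each would call an LLM.
pub fn fan_out_fan_in(
    input: &str,
    analyzers: &[(&str, fn(&str) -> String)],
    synthesize: fn(&[(String, String)]) -> String,
) -> String {
    let results: Vec<(String, String)> = thread::scope(|s| {
        // Fan-out: one scoped thread per analyzer.
        let handles: Vec<_> = analyzers
            .iter()
            .map(|(name, f)| {
                let (name, f) = (*name, *f);
                s.spawn(move || (name.to_string(), f(input)))
            })
            .collect();
        // Fan-in: join in spawn order so the merged output is deterministic.
        handles
            .into_iter()
            .map(|h| h.join().expect("analyzer panicked"))
            .collect()
    });
    synthesize(&results)
}
```

Joining in spawn order keeps the synthesizer's input stable regardless of which analyzer finishes first.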
Hierarchical Supervisor — A supervisor decomposes, delegates, and synthesizes:
              [Supervisor Agent]
              /       |        \
      [Backend]  [Frontend]  [Testing]
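The decompose, delegate, synthesize loop can be sketched as follows. The `Supervisor` struct, the `Worker` alias, and the naive "one subtask per worker" decomposition are assumptions for illustration; a real supervisor would likely ask an LLM to plan the split.

```rust
use std::collections::BTreeMap;

/// A worker turns a subtask description into a result. Toy signature, not Cerebro's.
pub type Worker = fn(&str) -> String;

/// Hypothetical supervisor: decomposes a task, delegates, then synthesizes.
pub struct Supervisor {
    /// Workers keyed by role; BTreeMap keeps iteration order stable.
    pub workers: BTreeMap<String, Worker>,
}

impl Supervisor {
    /// Decompose: one subtask per registered worker.
    fn decompose(&self, task: &str) -> Vec<(String, String)> {
        self.workers
            .keys()
            .map(|role| (role.clone(), format!("{task} ({role} part)")))
            .collect()
    }

    pub fn run(&self, task: &str) -> String {
        let mut sections = Vec::new();
        for (role, subtask) in self.decompose(task) {
            // Delegate each subtask to the worker registered for that role.
            let worker = self.workers[&role];
            sections.push(format!("{role}: {}", worker(&subtask)));
        }
        // Synthesize: here, a simple join of the workers' sections.
        sections.join("\n")
    }
}
```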
Supported LLM Providers
Each agent in the swarm can use a different LLM provider:
| Provider | Config | Covers |
|---|---|---|
| Ollama | LlmProvider::Ollama | Any local model: Llama 3, Mistral, Phi, Gemma |
| OpenAI | LlmProvider::OpenAI | GPT-4o, GPT-4, o3, o4-mini |
| Gemini | LlmProvider::Gemini | Gemini Pro, Flash, Ultra |
| Anthropic | LlmProvider::Anthropic | Claude 4, Sonnet, Haiku, Opus |
| Any OpenAI-compatible | LlmProvider::OpenAICompatible | Groq, Together, Mistral, DeepSeek, LM Studio, vLLM |
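To illustrate mixing providers within one swarm, the sketch below defines a local stand-in enum that mirrors the variant names from the table. The `base_url` payload, the `AgentConfig` struct, its fields, and the model strings are all assumptions; consult the crate's own documentation for the real configuration types.

```rust
/// Local stand-in mirroring the variant names in the provider table.
/// The payload on `OpenAICompatible` is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum LlmProvider {
    Ollama,
    OpenAI,
    Gemini,
    Anthropic,
    OpenAICompatible { base_url: String },
}

/// Hypothetical per-agent configuration.
#[derive(Debug, Clone)]
pub struct AgentConfig {
    pub name: String,
    pub provider: LlmProvider,
    pub model: String,
    pub system_prompt: String,
}

impl AgentConfig {
    pub fn new(name: &str, provider: LlmProvider, model: &str, system_prompt: &str) -> Self {
        Self {
            name: name.to_string(),
            provider,
            model: model.to_string(),
            system_prompt: system_prompt.to_string(),
        }
    }

    /// True when the agent uses the Ollama variant, which the table
    /// describes as covering local models.
    pub fn is_local(&self) -> bool {
        self.provider == LlmProvider::Ollama
    }
}
```

A swarm could then pair a local Ollama model for a cheap first pass with a hosted model for the final synthesis, each agent carrying its own system prompt.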
Getting Started
Add the Cerebro crate to the dependencies in your Cargo.toml with the version requirement "1.1.1".
Basic Example
use *;
use Arc;
async
Swarm Example — Multi-Agent Code Review
use *;
use *;
use Arc;
async
Documentation
For in-depth guides and technical details:
- USER_GUIDE.md: Implementation examples and usage guides.
- ARCHITECTURE.md: Structural layout and data pipelines.
- CHANGELOG.md: Release history and version updates.
Author: Suraj Kumar Nanda | www.surajkumarnanda.com