cerebro 0.1.7

Blazing-fast, storage-agnostic semantic memory engine for AI agents, written in pure Rust.

# Cerebro Architecture

## The Overall Data Pipeline

```mermaid
graph TD
    A[Raw Document / URL / Event] --> B(Chunker)
    C[OpenAI / LLM App] -->|MCP / Rust API| API(MemoryEngine)
    
    B -->|Chunks| API
    API --> D{Router & Compute Engine}
    
    D <-->|Traits: Embedder| E1[Local Rust ML: Candle/ORT]
    D <-->|Traits: Embedder| E2[Remote APIs: OpenAI/Gemini]
    
    D -->|Semantic Vectors| F1[(PgVectorStore)]
    D -->|Working State| F2[(MemoryVectorStore)]
```

## Module Structure

Cerebro is organized as a single crate with clean module boundaries:

### `models` 
Core data structures: `Document`, `Chunk`, `Node`, `Metadata`. All fully serializable via Serde.

### `traits`
The shared traits that all backends implement, together with the crate-wide error type:
* `Chunker` — splits Documents into Chunks
* `Embedder` — converts text into vector embeddings
* `VectorStore` — persists and searches Nodes
* `KVStore` — fast key-value state for Working Memory
* `CerebroError` — unified error hierarchy
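
To illustrate how a trait layer like this keeps the engine storage-agnostic, here is a minimal sketch. The signatures are assumptions for illustration only (synchronous, with simplified types); the real `Chunker`, `Embedder`, `VectorStore`, and `CerebroError` definitions may differ, for example by being async. `SketchError`, `ingest`, `LineChunker`, `LengthEmbedder`, and `VecStore` are hypothetical names that do not come from the crate.

```rust
use std::fmt;

/// Hypothetical stand-in for `CerebroError`; the real variants may differ.
#[derive(Debug)]
pub enum SketchError {
    Embedding(String),
    Storage(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Embedding(msg) => write!(f, "embedding error: {}", msg),
            SketchError::Storage(msg) => write!(f, "storage error: {}", msg),
        }
    }
}

impl std::error::Error for SketchError {}

/// Assumed shape: splits text into chunks.
pub trait Chunker {
    fn chunk(&self, text: &str) -> Vec<String>;
}

/// Assumed shape: converts text into a vector embedding.
pub trait Embedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, SketchError>;
}

/// Assumed shape: persists embedded chunks.
pub trait VectorStore {
    fn insert(&mut self, text: String, vector: Vec<f32>) -> Result<(), SketchError>;
    fn count(&self) -> usize;
}

/// Backend-agnostic ingest: chunk, embed, store. Returns the number of chunks stored.
pub fn ingest<C: Chunker, E: Embedder, S: VectorStore>(
    chunker: &C,
    embedder: &E,
    store: &mut S,
    text: &str,
) -> Result<usize, SketchError> {
    let mut stored = 0;
    for chunk in chunker.chunk(text) {
        let vector = embedder.embed(&chunk)?;
        store.insert(chunk, vector)?;
        stored += 1;
    }
    Ok(stored)
}

/// Toy chunker: one chunk per non-empty line.
pub struct LineChunker;

impl Chunker for LineChunker {
    fn chunk(&self, text: &str) -> Vec<String> {
        text.lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect()
    }
}

/// Toy embedder: a one-dimensional "embedding" holding the byte length.
pub struct LengthEmbedder;

impl Embedder for LengthEmbedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, SketchError> {
        if text.is_empty() {
            return Err(SketchError::Embedding("empty input".to_string()));
        }
        Ok(vec![text.len() as f32])
    }
}

/// Toy store: an append-only vector of (text, embedding) rows.
#[derive(Default)]
pub struct VecStore {
    rows: Vec<(String, Vec<f32>)>,
}

impl VectorStore for VecStore {
    fn insert(&mut self, text: String, vector: Vec<f32>) -> Result<(), SketchError> {
        self.rows.push((text, vector));
        Ok(())
    }

    fn count(&self) -> usize {
        self.rows.len()
    }
}
```

Because `ingest` only depends on the traits, swapping `VecStore` for a database-backed store would not change the calling code in this sketch.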

### `chunker`
* `RecursiveCharacterChunker` — character-boundary-safe text splitter.
* `HtmlSemanticChunker` — layout-aware semantic splitter.
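
As a rough illustration of what "character-boundary-safe" means, the sketch below cuts text into fixed-size byte windows and backs each cut up to the nearest UTF-8 character boundary. This is a simplified technique, not the recursive separator logic of `RecursiveCharacterChunker`; the function name `split_char_safe` and its `max_bytes` parameter are hypothetical.

```rust
/// Splits `text` into slices of at most `max_bytes` bytes without ever
/// cutting through a multi-byte UTF-8 character.
pub fn split_char_safe(text: &str, max_bytes: usize) -> Vec<&str> {
    // The longest UTF-8 encoding is 4 bytes, so a window of at least 4 bytes
    // always contains one whole character and the loop always makes progress.
    assert!(max_bytes >= 4, "max_bytes must be at least 4");

    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = (start + max_bytes).min(text.len());
        // Back up until `end` lands on a character boundary.
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}
```

Slicing a `&str` at a non-boundary index panics in Rust, which is why the `is_char_boundary` check comes before every slice.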

### `compute`
Embedding providers:
- `MockEmbedder` — deterministic dummy embeddings for offline testing.
- `OpenAIEmbedder` — production OpenAI API integration.
- `LocalEmbedder` — native CPU inference using HuggingFace's `candle`.
- `AnthropicVoyageEmbedder` — Claude-aligned vectors through Voyage AI.
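
For offline tests, a deterministic embedder only needs to map equal inputs to equal vectors. The sketch below shows one way to do that with a hash-seeded generator. It is an assumption about how such a mock could work, not the actual `MockEmbedder` implementation; `HashEmbedder` and its `dim` field are hypothetical names.

```rust
/// Hypothetical deterministic embedder for tests: same text, same vector.
pub struct HashEmbedder {
    pub dim: usize,
}

impl HashEmbedder {
    pub fn embed(&self, text: &str) -> Vec<f32> {
        // FNV-1a over the input bytes produces the seed.
        let mut state: u64 = 0xcbf2_9ce4_8422_2325;
        for byte in text.bytes() {
            state ^= u64::from(byte);
            state = state.wrapping_mul(0x0000_0100_0000_01b3);
        }
        // xorshift64 must not start from zero.
        state |= 1;

        let mut vector = Vec::with_capacity(self.dim);
        for _ in 0..self.dim {
            // One xorshift64 step per component.
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            // Map the top 24 bits to a value in [-1, 1).
            let unit = (state >> 40) as f32 / (1u32 << 24) as f32;
            vector.push(unit * 2.0 - 1.0);
        }

        // L2-normalize so cosine similarity reduces to a dot product.
        let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut vector {
                *x /= norm;
            }
        }
        vector
    }
}
```

The vectors carry no semantic meaning; they are only useful for exercising storage and retrieval code paths without network access or model weights.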

### `storage`
Vector store backends:
- `MemoryVectorStore` — fast in-memory concurrent storage.
- `PgVectorStore` — PostgreSQL + pgvector (Hybrid search support).
- `QdrantVectorStore` — high-volume distributed Qdrant driver.
- `MemoryKVStore` — fast key-value state for working memory.
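
A minimal sketch of what an in-memory, concurrently accessible vector store might look like, assuming brute-force cosine similarity behind an `RwLock`. This is an illustration only, not the `MemoryVectorStore` implementation; `SketchStore`, `upsert`, and `search` are hypothetical names.

```rust
use std::sync::RwLock;

/// Hypothetical in-memory store: rows of (id, embedding) behind a read-write lock.
pub struct SketchStore {
    rows: RwLock<Vec<(String, Vec<f32>)>>,
}

impl SketchStore {
    pub fn new() -> Self {
        SketchStore { rows: RwLock::new(Vec::new()) }
    }

    /// Inserts a vector, or replaces the existing one with the same id.
    pub fn upsert(&self, id: &str, vector: Vec<f32>) {
        let mut rows = self.rows.write().unwrap();
        let position = rows.iter().position(|(existing, _)| existing.as_str() == id);
        match position {
            Some(index) => rows[index].1 = vector,
            None => rows.push((id.to_string(), vector)),
        }
    }

    /// Returns up to `k` ids with their cosine similarity, best match first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(String, f32)> {
        let rows = self.rows.read().unwrap();
        let mut scored: Vec<(String, f32)> = rows
            .iter()
            .map(|(id, vector)| (id.clone(), cosine(query, vector)))
            .collect();
        // Sort descending by score.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Taking `&self` in both methods lets many readers search in parallel while writers briefly take the exclusive lock. A linear scan like this is only reasonable for small working sets; larger collections are where backends such as pgvector or Qdrant come in.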

### `engine`
The core orchestration layer:
* `MemoryEngine` — coordinates the primary ingest/query flows.
* `ConsolidationWorker` — background Tokio task that prunes memory and optimizes semantic density.
* `GraphMemoryLayer` — bridge for Neo4j/Cypher entity persistence.

### `ingest`
Specialized document extractors:
- `PdfIngestor` — extracts text from raw PDF bytes.

### `ffi`
Cross-language interface bridge:
- `python` — PyO3 bindings.
- `wasm` — Wasm-bindgen targets.

---
*Author: Suraj Kumar Nanda* | [Surajkumarnanda.com](https://Surajkumarnanda.com)