MnemeFusion
Atomic memory engine for AI applications — one database per entity.
MnemeFusion gives each entity its own self-contained memory database. Five retrieval dimensions (semantic, keyword, temporal, causal, entity profile) are fused into a single ranked result, all in one portable .mfdb file with zero external dependencies.
Think SQLite for AI memory: one file per user, per contact, or per conversation — embedded in your application.
MnemeFusion was designed and directed by George Kanellopoulos, with implementation substantially assisted by Claude Code (Anthropic). The project grew out of an exploration into building a complex, multi-dimensional AI memory engine through human-AI collaboration — the commit history reflects the authentic development process.
Atomic Architecture
MnemeFusion follows an atomic design: each entity (a user, a contact, a conversation) maps to its own .mfdb database file. This 1:1 mapping is the core architectural principle.
Memory retrieval degrades when unrelated conversations share a database — relevant memories get buried by noise from other entities. By scoping each database to a single entity, all five retrieval dimensions stay focused and retrieval stays precise, even as conversation history grows to thousands of turns.
This mirrors how production AI systems work: a personal assistant remembers one user's conversations, a CRM agent tracks one contact's history, a therapy bot maintains one patient's sessions. Each gets its own .mfdb file.
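The 1:1 mapping can be sketched outside the engine entirely. The helper below is illustrative, not part of the MnemeFusion API: an application derives one database path per entity and hands that path to the library.

```python
from pathlib import Path

def entity_db_path(root: str, entity_id: str) -> Path:
    """One entity, one file: map an entity ID to its own .mfdb database.

    Hypothetical helper; MnemeFusion only ever sees the resulting path.
    """
    # Sanitize the ID so it is always safe to use as a filename
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in entity_id)
    return Path(root) / f"{safe}.mfdb"
```

Keeping this mapping in one place means retention policy is trivial too: deleting an entity's entire memory is a single file delete.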
Features
- Five Retrieval Pathways: Semantic vector search, BM25 keyword matching, temporal range queries, causal graph traversal, entity profile scoring
- Reciprocal Rank Fusion: Fuses all five dimensions into a single ranked result set
- Entity Profiles: LLM-powered entity extraction builds structured knowledge graphs from unstructured text
- Single File Storage: All data in one portable .mfdb file with ACID transactions (redb)
- Intent Classification: Automatic query routing (temporal, causal, entity, factual)
- Namespace Isolation: Multi-user memory separation
- Rust Core: Memory-safe, high-performance embedded library
- Python Bindings: First-class Python API via PyO3
- Optional GPU Acceleration: CUDA-accelerated entity extraction via llama-cpp
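The fusion step is simple enough to illustrate standalone. Below is a minimal, self-contained sketch of Reciprocal Rank Fusion; the constant k=60 is the value commonly used in the RRF literature, and MnemeFusion's internal parameters may differ.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into one ranked list.

    Each list contributes 1 / (k + rank) per item; highest total wins.
    """
    scores = {}
    for ranking in rankings:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three of the five dimensions, each returning its own ranking
semantic = ["m3", "m1", "m2"]
keyword = ["m1", "m3", "m4"]
temporal = ["m2", "m1"]

fused = reciprocal_rank_fusion([semantic, keyword, temporal])
# "m1" places near the top of all three lists, so it leads the fused ranking
```

Because RRF only consumes ranks, not raw scores, it fuses dimensions with incompatible score scales (cosine similarity, BM25, graph distance) without any normalization step.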
Benchmarks
Evaluated on two established conversational memory benchmarks (LoCoMo, LongMemEval) using standard protocols. The LongMemEval results validate the atomic architecture — per-entity databases maintain high accuracy where a shared database collapses:
| Benchmark | Mode | What it tests | Score |
|---|---|---|---|
| LoCoMo | Standard | Overall accuracy across 10 conversations (1,540 questions) | 69.9% ± 0.4% |
| LongMemEval | Oracle | Pipeline quality — extraction + RAG + scoring (500 questions) | 91.4% |
| LongMemEval | Per-entity | Production pattern — one DB per conversation, ~500 turns each (176 questions) | 67.6% |
| LongMemEval | Shared DB | All conversations in one DB — the anti-pattern (500 questions) | 37.2% |
Reading the numbers: The oracle result (91.4%) proves the pipeline works when given the right evidence. The per-entity result (67.6%) shows production performance with the recommended atomic architecture. The shared-DB result (37.2%) demonstrates why per-entity scoping matters: accuracy falls 54 points below the oracle pipeline, and 30 points below the per-entity setup, when unrelated conversations compete for retrieval slots.
See evals/ for full methodology, per-category breakdowns, datasets, and reproduction instructions.
Quick Start
For a complete runnable example, see examples/minimal.py — no GPU or GGUF model required. For an interactive demo, see the Chat Demo (Streamlit).
Python
```shell
# CPU-only (development / experimentation)
pip install mnemefusion

# GPU with CUDA (production — Linux x86_64, requires NVIDIA driver 525+):
# see the GPU note under "With LLM Entity Extraction"
```

```python
from mnemefusion import Memory  # import path assumed

# Open or create a database (768 = BGE-base embedding dimension)
mem = Memory("alice.mfdb", config={"embedding_dim": 768})  # config key name assumed

# Set embedding function for automatic vectorization
def embed_fn(text: str) -> list[float]:
    ...  # plug in your embedding model here (must return 768 floats)

mem.set_embedding_fn(embed_fn)

# Add memories
mem.add("Caroline adopted a golden retriever named Biscuit.")

# Multi-dimensional query — returns (intent, results, profile_context)
intent, results, profile_context = mem.query("What pets does Caroline have?")

# Profile context contains entity facts for RAG augmentation
```
With User Identity
```python
# Namespace isolation + first-person pronoun resolution
mem = Memory("memories.mfdb", user="alice")

# Memories are namespaced to "alice"
mem.add("I started learning pottery last month.")  # example content

# Map "I"/"me"/"my" → "alice" entity profile at query time
mem.set_user_entity("alice")

# "my hobbies" resolves to alice's profile
intent, results, profile_context = mem.query("What are my hobbies?")
```
With LLM Entity Extraction
Entity extraction uses a local GGUF model (no cloud API needed). Download a supported model:
```shell
# Recommended: Phi-4-mini (3.8B, ~2.3GB, best accuracy)*
# Requires Hugging Face authentication: huggingface-cli login
huggingface-cli download <phi-4-mini-gguf-repo> --local-dir models/

# Alternative (no auth required): Qwen2.5-3B (~2GB)
huggingface-cli download <qwen2.5-3b-gguf-repo> --local-dir models/
```
*MnemeFusion's extraction prompts have been tested and tuned with Phi-4-mini. Other models may work but with reduced extraction quality.
```python
mem = Memory("caroline.mfdb")
mem.enable_llm_entity_extraction("models/phi-4-mini.gguf")  # path to your GGUF model

# Entity extraction runs automatically on add()
mem.add("Caroline adopted a golden retriever named Biscuit.")

# Entity profiles are built incrementally
profile = mem.get_entity_profile("caroline")
# {'name': 'caroline', 'entity_type': 'person', 'facts': {...}, 'summary': '...'}
```
Requires a GPU with 4GB+ VRAM for reasonable speed. CPU-only works but is ~10x slower. For GPU acceleration, install the GPU package: pip install mnemefusion.
Rust
```toml
[dependencies]
mnemefusion = "0.1"
```

```rust
use mnemefusion::Config; // crate path assumed; see Configuration below
```
Architecture
Python API Reference
Core Operations
| Method | Description |
|---|---|
| `Memory(path, config=None, user=None)` | Open or create a database |
| `add(content, embedding=None, metadata=None, timestamp=None, source=None, namespace=None)` | Add a memory |
| `query(query_text, query_embedding=None, limit=10, namespace=None, filters=None)` | Multi-dimensional query returning (intent, results, profiles) |
| `search(query_embedding, top_k, namespace=None, filters=None)` | Pure semantic similarity search |
| `get(memory_id)` | Retrieve memory by ID |
| `delete(memory_id)` | Delete memory by ID |
| `close()` | Close database and save indexes |
Batch Operations
| Method | Description |
|---|---|
| `add_batch(memories, namespace=None)` | Bulk insert (10x+ faster) |
| `add_with_dedup(content, embedding, ...)` | Add with duplicate detection |
| `upsert(key, content, embedding, ...)` | Insert or update by logical key |
| `delete_batch(memory_ids)` | Bulk delete |
Entity & Profile Management
| Method | Description |
|---|---|
| `enable_llm_entity_extraction(model_path, tier="balanced", extraction_passes=1)` | Enable LLM extraction |
| `set_user_entity(name)` | Map first-person pronouns to user entity |
| `list_entity_profiles()` | List all entity profiles |
| `get_entity_profile(name)` | Get profile by name (case-insensitive) |
| `consolidate_profiles()` | Remove noise from profiles |
| `summarize_profiles()` | Generate profile summaries |
Diagnostics
| Method | Description |
|---|---|
| `last_query_trace()` | Step-by-step trace of the most recent `query()` call (requires `enable_trace=True` in config) |
Metadata Filtering
```python
# Filter by metadata key-value pairs (AND logic)
mem.add("Standup notes for the Apollo project.",  # example content
        metadata={"project": "apollo", "kind": "notes"})

intent, results, profile_context = mem.query(
    "latest standup notes",
    filters={"project": "apollo", "kind": "notes"},
)
```
Namespace System
```python
# Add to specific namespace
mem.add("Bob prefers tea over coffee.", namespace="bob")  # example content

# Query within namespace
intent, results, profile_context = mem.query("drink preferences", namespace="bob")

# Or use the user= constructor shortcut
mem = Memory("memories.mfdb", user="alice")
# All add/query calls default to the "alice" namespace
```
Configuration
```python
# Config keys shown here are illustrative assumptions
config = {"embedding_dim": 384, "entity_extraction": True}
mem = Memory("memories.mfdb", config=config)
```

```rust
use mnemefusion::Config;

let config = Config::new()
    .with_embedding_dim(384)
    .with_entity_extraction(true);
// `Engine` type name and `open` signature assumed
let engine = Engine::open("memories.mfdb", config)?;
```
Error Handling
All errors surface as standard Python exceptions — no custom exception types.
| Exception | When | Recoverable |
|---|---|---|
| `IOError` | Database open/close fails, disk full, file not found, concurrent open of same file | Usually yes (fix path, free disk, close other instance) |
| `ValueError` | Wrong embedding dimension, invalid memory ID, bad config | Yes (fix input) |
| `RuntimeError` | Calling methods after `close()` | Reopen with a new `Memory()` instance |
```python
mem = Memory("memories.mfdb")
mem.close()

# After close(), all operations raise RuntimeError
mem.add("hello")  # RuntimeError: "Database is closed"

# Each .mfdb file supports one open instance at a time
mem_a = Memory("memories.mfdb")
mem_b = Memory("memories.mfdb")  # Same file
# File lock error: raises IOError while mem_a is still open
```
Building from Source
Prerequisites
- Rust 1.75+
- Python 3.9+ (for Python bindings)
Build
```shell
# Build core library
cargo build --release

# Run tests (520+ tests)
cargo test

# Build Python bindings (assumes the maturin/PyO3 workflow)
maturin develop --release

# With CUDA GPU support (requires CUDA toolkit; feature name assumed)
maturin develop --release --features cuda
```
Testing
```shell
# All library unit tests
cargo test --lib

# With output
cargo test -- --nocapture

# Run specific test module
cargo test <module_name>
```
Language Support
MnemeFusion's core search works with any language via multilingual embeddings. Entity extraction and intent classification are currently English-optimized.
| Feature | Language Support |
|---|---|
| Vector search | All languages (use multilingual embeddings) |
| BM25 keyword search | English-optimized (Porter stemming) |
| Temporal indexing | All languages |
| Causal links | All languages |
| Entity extraction | English (optional, can be disabled) |
| Metadata filtering | All languages |
For non-English use, disable entity extraction:
```python
# Config key name assumed; see Configuration
config = {"entity_extraction": False}
mem = Memory("memories.mfdb", config=config)
```
API Stability
MnemeFusion is pre-1.0. The following APIs are considered stable and will not change without a version bump:
| API | Stable Since |
|---|---|
| `Memory(path, config, user)` | 0.1.0 |
| `add(content, embedding, metadata, timestamp)` | 0.1.0 |
| `query(query_text, query_embedding, limit, namespace, filters)` | 0.1.0 |
| `search(query_embedding, top_k, namespace, filters)` | 0.1.0 |
| `get(memory_id)` / `delete(memory_id)` | 0.1.0 |
| `close()` | 0.1.0 |
| `add_batch(memories, namespace)` | 0.1.0 |
| `set_embedding_fn(fn)` | 0.1.0 |
Everything else (entity extraction API, profile management, config keys) may change between minor versions. The .mfdb file format includes embedded version metadata — format-breaking changes will be documented in the CHANGELOG.
Performance Characteristics
| Operation | Complexity | Typical Latency |
|---|---|---|
| `add()` | O(log n) HNSW insertion + O(n) BM25 update | <5ms without entity extraction |
| `add()` with LLM extraction | Same + LLM inference | ~3-9s depending on GPU |
| `query()` | O(k·log n) across all dimensions + RRF fusion | ~50ms at 5K memories, ~200ms at 50K |
| `search()` | O(k·log n) vector-only | <10ms |
| `get()` / `delete()` | O(1) key lookup | <1ms |
| Storage overhead | ~1.5-2x raw content size (384-dim embeddings) | — |
Tested with up to 10K memories in a single .mfdb file. MnemeFusion is designed for per-entity databases — each user, contact, or conversation gets its own .mfdb file, typically containing 1K-10K memories. This atomic pattern keeps retrieval precise and scales horizontally.
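In practice the atomic pattern pairs well with a small handle cache, so hot entities keep their database open while cold ones are closed (respecting the one-open-instance-per-file rule). This is an application-level sketch; `opener` stands in for `Memory` and is an assumption, not part of the library.

```python
from collections import OrderedDict

class HandleCache:
    """Keep at most `capacity` per-entity databases open; close the LRU ones."""

    def __init__(self, opener, capacity=64):
        self.opener = opener          # e.g. lambda path: Memory(path)
        self.capacity = capacity
        self._open = OrderedDict()    # path -> open handle, LRU order

    def get(self, path):
        if path in self._open:
            self._open.move_to_end(path)   # mark as most recently used
            return self._open[path]
        if len(self._open) >= self.capacity:
            _, lru = self._open.popitem(last=False)
            lru.close()                    # one open instance per file: close before evicting
        handle = self._open[path] = self.opener(path)
        return handle
```

Eviction calls `close()` so indexes are persisted and the file lock is released before another process (or a later `get`) reopens the same `.mfdb`.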
Contributing
Contributions are welcome! See CONTRIBUTING.md for build instructions, test commands, and PR guidelines.
License
Licensed under either of:
at your option.
Acknowledgments
Built on excellent open-source libraries:
- redb — Embedded key-value store
- usearch — HNSW vector search
- petgraph — Graph algorithms
- llama-cpp-2 — Rust bindings for llama.cpp
- PyO3 — Rust-Python interop
- Claude Code — AI-assisted development
"SQLite for AI memory" — One entity, one file. Five dimensions. Zero complexity.