MnemeFusion
Atomic memory engine for AI applications — one database per entity.
MnemeFusion gives each entity its own self-contained memory database. Five retrieval dimensions (semantic, keyword, temporal, causal, entity profile) are fused into a single ranked result, all in one portable .mfdb file with zero external dependencies.
Think SQLite for AI memory: one file per user, per contact, or per conversation — embedded in your application.
MnemeFusion was designed and directed by George Kanellopoulos, with implementation substantially assisted by Claude Code (Anthropic). The project grew out of an exploration into building a complex, multi-dimensional AI memory engine through human-AI collaboration — the commit history reflects the authentic development process.
Atomic Architecture
MnemeFusion follows an atomic design: each entity (a user, a contact, a conversation) maps to its own .mfdb database file. This 1:1 mapping is the core architectural principle.
Memory retrieval degrades when unrelated conversations share a database — relevant memories get buried by noise from other entities. By scoping each database to a single entity, all five retrieval dimensions stay focused and retrieval stays precise, even as conversation history grows to thousands of turns.
This mirrors how production AI systems work: a personal assistant remembers one user's conversations, a CRM agent tracks one contact's history, a therapy bot maintains one patient's sessions. Each gets its own .mfdb file.
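The 1:1 mapping can be sketched outside the engine entirely. The helper below is illustrative, not part of the MnemeFusion API: an application derives one database path per entity and hands that path to the library.

```python
from pathlib import Path

def entity_db_path(root: str, entity_id: str) -> Path:
    """One entity, one file: map an entity ID to its own .mfdb database.

    Hypothetical helper; MnemeFusion only ever sees the resulting path.
    """
    # Sanitize the ID so it is always safe to use as a filename
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in entity_id)
    return Path(root) / f"{safe}.mfdb"
```

Keeping this mapping in one place means retention policy is trivial too: deleting an entity's entire memory is a single file delete.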
Features
- Five Retrieval Pathways: Semantic vector search, BM25 keyword matching, temporal range queries, causal graph traversal, entity profile scoring
- Reciprocal Rank Fusion: Fuses all five dimensions into a single ranked result set
- Entity Profiles: LLM-powered entity extraction builds structured knowledge graphs from unstructured text
- Single File Storage: All data in one portable .mfdb file with ACID transactions (redb)
- Intent Classification: Automatic query routing (temporal, causal, entity, factual)
- Namespace Isolation: Multi-user memory separation
- Rust Core: Memory-safe, high-performance embedded library
- Python Bindings: First-class Python API via PyO3
- Optional GPU Acceleration: CUDA-accelerated entity extraction via llama-cpp
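The fusion step is simple enough to illustrate standalone. Below is a minimal, self-contained sketch of Reciprocal Rank Fusion; the constant k=60 is the value commonly used in the RRF literature, and MnemeFusion's internal parameters may differ.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into one ranked list.

    Each list contributes 1 / (k + rank) per item; highest total wins.
    """
    scores = {}
    for ranking in rankings:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three of the five dimensions, each returning its own ranking
semantic = ["m3", "m1", "m2"]
keyword = ["m1", "m3", "m4"]
temporal = ["m2", "m1"]

fused = reciprocal_rank_fusion([semantic, keyword, temporal])
# "m1" places near the top of all three lists, so it leads the fused ranking
```

Because RRF only consumes ranks, not raw scores, it fuses dimensions with incompatible score scales (cosine similarity, BM25, graph distance) without any normalization step.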
Benchmarks
Evaluated on two established conversational memory benchmarks (LoCoMo, LongMemEval) using standard protocols. The LongMemEval results validate the atomic architecture — per-entity databases maintain high accuracy where a shared database collapses:
| Benchmark | Mode | What it tests | Score |
|---|---|---|---|
| LoCoMo | Standard | Overall accuracy across 10 conversations (1,540 questions) | 69.9% ± 0.4% |
| LongMemEval | Oracle | Pipeline quality — extraction + RAG + scoring (500 questions) | 91.4% |
| LongMemEval | Per-entity | Production pattern — one DB per conversation, ~500 turns each (176 questions) | 67.6% |
| LongMemEval | Shared DB | All conversations in one DB — the anti-pattern (500 questions) | 37.2% |
Reading the numbers: The oracle result (91.4%) proves the pipeline works when given the right evidence. The per-entity result (67.6%) shows production performance with the recommended atomic architecture. The shared-DB result (37.2%) demonstrates why per-entity scoping matters: accuracy falls 54 points below the oracle pipeline, and 30 points below the per-entity setup, when unrelated conversations compete for retrieval slots.
See evals/ for full methodology, per-category breakdowns, datasets, and reproduction instructions.
Quick Start
For a complete runnable example, see examples/minimal.py — no GPU or GGUF model required. For an interactive demo, see the Chat Demo (Streamlit).
Python
```shell
# CPU-only (development / experimentation)
pip install mnemefusion

# GPU with CUDA (production — Linux x86_64, requires NVIDIA driver 525+):
# see the GPU note under "With LLM Entity Extraction"
```

```python
from mnemefusion import Memory  # import path assumed

# Open or create a database (768 = BGE-base embedding dimension)
mem = Memory("alice.mfdb", config={"embedding_dim": 768})  # config key name assumed

# Set embedding function for automatic vectorization
def embed_fn(text: str) -> list[float]:
    ...  # plug in your embedding model here (must return 768 floats)

mem.set_embedding_fn(embed_fn)

# Add memories
mem.add("Caroline adopted a golden retriever named Biscuit.")

# Multi-dimensional query — returns (intent, results, profile_context)
intent, results, profile_context = mem.query("What pets does Caroline have?")

# Profile context contains entity facts for RAG augmentation
```
With User Identity
```python
# Namespace isolation + first-person pronoun resolution
mem = Memory("memories.mfdb", user="alice")

# Memories are namespaced to "alice"
mem.add("I started learning pottery last month.")  # example content

# Map "I"/"me"/"my" → "alice" entity profile at query time
mem.set_user_entity("alice")

# "my hobbies" resolves to alice's profile
intent, results, profile_context = mem.query("What are my hobbies?")
```
With LLM Entity Extraction
Entity extraction uses a local GGUF model (no cloud API needed). Download a supported model:
```shell
# Recommended: Phi-4-mini (3.8B, ~2.3GB, best accuracy)*
# Requires Hugging Face authentication: huggingface-cli login
huggingface-cli download <phi-4-mini-gguf-repo> --local-dir models/

# Alternative (no auth required): Qwen2.5-3B (~2GB)
huggingface-cli download <qwen2.5-3b-gguf-repo> --local-dir models/
```
*MnemeFusion's extraction prompts have been tested and tuned with Phi-4-mini. Other models may work but with reduced extraction quality.
```python
mem = Memory("caroline.mfdb")
mem.enable_llm_entity_extraction("models/phi-4-mini.gguf")  # path to your GGUF model

# Entity extraction runs automatically on add()
mem.add("Caroline adopted a golden retriever named Biscuit.")

# Entity profiles are built incrementally
profile = mem.get_entity_profile("caroline")
# {'name': 'caroline', 'entity_type': 'person', 'facts': {...}, 'summary': '...'}
```
Requires a GPU with 4GB+ VRAM for reasonable speed. CPU-only works but is ~10x slower. For GPU acceleration, install the GPU package: pip install mnemefusion.
Rust
```toml
[dependencies]
mnemefusion = "0.1"
```

```rust
use mnemefusion::Config; // crate path assumed; see Configuration below
```
Architecture
Python API Reference
Core Operations
| Method | Description |
|---|---|
| `Memory(path, config=None, user=None)` | Open or create a database |
| `add(content, embedding=None, metadata=None, timestamp=None, source=None, namespace=None)` | Add a memory |
| `query(query_text, query_embedding=None, limit=10, namespace=None, filters=None)` | Multi-dimensional query returning (intent, results, profiles) |
| `search(query_embedding, top_k, namespace=None, filters=None)` | Pure semantic similarity search |
| `get(memory_id)` | Retrieve memory by ID |
| `delete(memory_id)` | Delete memory by ID |
| `close()` | Close database and save indexes |
Batch Operations
| Method | Description |
|---|---|
| `add_batch(memories, namespace=None)` | Bulk insert (10x+ faster) |
| `add_with_dedup(content, embedding, ...)` | Add with duplicate detection |
| `upsert(key, content, embedding, ...)` | Insert or update by logical key |
| `delete_batch(memory_ids)` | Bulk delete |
Entity & Profile Management
| Method | Description |
|---|---|
| `enable_llm_entity_extraction(model_path, tier="balanced", extraction_passes=1)` | Enable LLM extraction |
| `set_user_entity(name)` | Map first-person pronouns to user entity |
| `list_entity_profiles()` | List all entity profiles |
| `get_entity_profile(name)` | Get profile by name (case-insensitive) |
| `consolidate_profiles()` | Remove noise from profiles |
| `summarize_profiles()` | Generate profile summaries |
Diagnostics
| Method | Description |
|---|---|
| `last_query_trace()` | Step-by-step trace of the most recent `query()` call (requires `enable_trace=True` in config) |
Metadata Filtering
```python
# Filter by metadata key-value pairs (AND logic)
mem.add("Standup notes for the Apollo project.",  # example content
        metadata={"project": "apollo", "kind": "notes"})

intent, results, profile_context = mem.query(
    "latest standup notes",
    filters={"project": "apollo", "kind": "notes"},
)
```
Namespace System
```python
# Add to specific namespace
mem.add("Bob prefers tea over coffee.", namespace="bob")  # example content

# Query within namespace
intent, results, profile_context = mem.query("drink preferences", namespace="bob")

# Or use the user= constructor shortcut
mem = Memory("memories.mfdb", user="alice")
# All add/query calls default to the "alice" namespace
```
Configuration
```python
# Config keys shown here are illustrative assumptions
config = {"embedding_dim": 384, "entity_extraction": True}
mem = Memory("memories.mfdb", config=config)
```

```rust
use mnemefusion::Config;

let config = Config::new()
    .with_embedding_dim(384)
    .with_entity_extraction(true);
// `Engine` type name and `open` signature assumed
let engine = Engine::open("memories.mfdb", config)?;
```
Error Handling
All errors surface as standard Python exceptions — no custom exception types.
| Exception | When | Recoverable |
|---|---|---|
| `IOError` | Database open/close fails, disk full, file not found, concurrent open of same file | Usually yes (fix path, free disk, close other instance) |
| `ValueError` | Wrong embedding dimension, invalid memory ID, bad config | Yes (fix input) |
| `RuntimeError` | Calling methods after `close()` | Reopen with a new `Memory()` instance |
```python
mem = Memory("memories.mfdb")
mem.close()

# After close(), all operations raise RuntimeError
mem.add("hello")  # RuntimeError: "Database is closed"

# Each .mfdb file supports one open instance at a time
mem_a = Memory("memories.mfdb")
mem_b = Memory("memories.mfdb")  # Same file
# File lock error: raises IOError while mem_a is still open
```
Building from Source
Prerequisites
- Rust 1.75+
- Python 3.9+ (for Python bindings)
Build
```shell
# Build core library
cargo build --release

# Run tests (520+ tests)
cargo test

# Build Python bindings (assumes the maturin/PyO3 workflow)
maturin develop --release

# With CUDA GPU support (requires CUDA toolkit; feature name assumed)
maturin develop --release --features cuda
```
Testing
```shell
# All library unit tests
cargo test --lib

# With output
cargo test -- --nocapture

# Run specific test module
cargo test <module_name>
```
Language Support
MnemeFusion's core search works with any language via multilingual embeddings. Entity extraction and intent classification are currently English-optimized.
| Feature | Language Support |
|---|---|
| Vector search | All languages (use multilingual embeddings) |
| BM25 keyword search | English-optimized (Porter stemming) |
| Temporal indexing | All languages |
| Causal links | All languages |
| Entity extraction | English (optional, can be disabled) |
| Metadata filtering | All languages |
For non-English use, disable entity extraction:
```python
# Config key name assumed; see Configuration
config = {"entity_extraction": False}
mem = Memory("memories.mfdb", config=config)
```
API Stability
MnemeFusion is pre-1.0. The following APIs are considered stable and will not change without a version bump:
| API | Stable Since |
|---|---|
| `Memory(path, config, user)` | 0.1.0 |
| `add(content, embedding, metadata, timestamp)` | 0.1.0 |
| `query(query_text, query_embedding, limit, namespace, filters)` | 0.1.0 |
| `search(query_embedding, top_k, namespace, filters)` | 0.1.0 |
| `get(memory_id)` / `delete(memory_id)` | 0.1.0 |
| `close()` | 0.1.0 |
| `add_batch(memories, namespace)` | 0.1.0 |
| `set_embedding_fn(fn)` | 0.1.0 |
Everything else (entity extraction API, profile management, config keys) may change between minor versions. The .mfdb file format includes embedded version metadata — format-breaking changes will be documented in the CHANGELOG.
Performance Characteristics
| Operation | Complexity | Typical Latency |
|---|---|---|
| `add()` | O(log n) HNSW insertion + O(n) BM25 update | <5ms without entity extraction |
| `add()` with LLM extraction | Same + LLM inference | ~3-9s depending on GPU |
| `query()` | O(k·log n) across all dimensions + RRF fusion | ~50ms at 5K memories, ~200ms at 50K |
| `search()` | O(k·log n) vector-only | <10ms |
| `get()` / `delete()` | O(1) key lookup | <1ms |
| Storage overhead | ~1.5-2x raw content size (384-dim embeddings) | — |
Tested with up to 10K memories in a single .mfdb file. MnemeFusion is designed for per-entity databases — each user, contact, or conversation gets its own .mfdb file, typically containing 1K-10K memories. This atomic pattern keeps retrieval precise and scales horizontally.
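In practice the atomic pattern pairs well with a small handle cache, so hot entities keep their database open while cold ones are closed (respecting the one-open-instance-per-file rule). This is an application-level sketch; `opener` stands in for `Memory` and is an assumption, not part of the library.

```python
from collections import OrderedDict

class HandleCache:
    """Keep at most `capacity` per-entity databases open; close the LRU ones."""

    def __init__(self, opener, capacity=64):
        self.opener = opener          # e.g. lambda path: Memory(path)
        self.capacity = capacity
        self._open = OrderedDict()    # path -> open handle, LRU order

    def get(self, path):
        if path in self._open:
            self._open.move_to_end(path)   # mark as most recently used
            return self._open[path]
        if len(self._open) >= self.capacity:
            _, lru = self._open.popitem(last=False)
            lru.close()                    # one open instance per file: close before evicting
        handle = self._open[path] = self.opener(path)
        return handle
```

Eviction calls `close()` so indexes are persisted and the file lock is released before another process (or a later `get`) reopens the same `.mfdb`.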
Contributing
Contributions are welcome! See CONTRIBUTING.md for build instructions, test commands, and PR guidelines.
License
Licensed under either of:
at your option.
Acknowledgments
Built on excellent open-source libraries:
- redb — Embedded key-value store
- usearch — HNSW vector search
- petgraph — Graph algorithms
- llama-cpp-2 — Rust bindings for llama.cpp
- PyO3 — Rust-Python interop
- Claude Code — AI-assisted development
"SQLite for AI memory" — One entity, one file. Five dimensions. Zero complexity.