# MnemeFusion
**Atomic memory engine for AI applications — one database per entity.**
MnemeFusion gives each entity its own self-contained memory database. Five retrieval dimensions (semantic, keyword, temporal, causal, entity profile) are fused into a single ranked result, all in one portable `.mfdb` file with zero external dependencies.
Think SQLite for AI memory: one file per user, per contact, or per conversation — embedded in your application.
[](https://github.com/gkanellopoulos/mnemefusion/actions/workflows/ci.yml)
[](https://crates.io/crates/mnemefusion-core)
[](https://pypi.org/project/mnemefusion-cpu/)
[](https://pypi.org/project/mnemefusion/)
[](https://docs.rs/mnemefusion-core)
[](LICENSE-MIT)
*MnemeFusion was designed and directed by [George Kanellopoulos](https://github.com/gkanellopoulos), with implementation substantially assisted by [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (Anthropic). The project grew out of an exploration into building a complex, multi-dimensional AI memory engine through human-AI collaboration — the commit history reflects the authentic development process.*
## Atomic Architecture
MnemeFusion follows an **atomic design**: each entity (a user, a contact, a conversation) maps to its own `.mfdb` database file. This 1:1 mapping is the core architectural principle.
Memory retrieval degrades when unrelated conversations share a database — relevant memories get buried by noise from other entities. By scoping each database to a single entity, all five retrieval dimensions stay focused and retrieval stays precise, even as conversation history grows to thousands of turns.
This mirrors how production AI systems work: a personal assistant remembers *one user's* conversations, a CRM agent tracks *one contact's* history, a therapy bot maintains *one patient's* sessions. Each gets its own `.mfdb` file.
## Features
- **Five Retrieval Pathways**: Semantic vector search, BM25 keyword matching, temporal range queries, causal graph traversal, entity profile scoring
- **Reciprocal Rank Fusion**: Fuses all five dimensions into a single ranked result set
- **Entity Profiles**: LLM-powered entity extraction builds structured knowledge graphs from unstructured text
- **Single File Storage**: All data in one portable `.mfdb` file with ACID transactions (redb)
- **Intent Classification**: Automatic query routing (temporal, causal, entity, factual)
- **Namespace Isolation**: Multi-user memory separation
- **Rust Core**: Memory-safe, high-performance embedded library
- **Python Bindings**: First-class Python API via PyO3
- **Optional GPU Acceleration**: CUDA-accelerated entity extraction via llama-cpp
## Benchmarks
Evaluated on two established conversational memory benchmarks ([LoCoMo](evals/locomo/), [LongMemEval](evals/longmemeval/)) using standard protocols. The LongMemEval results validate the atomic architecture — per-entity databases maintain high accuracy where a shared database collapses:
| [LoCoMo](evals/locomo/) | Standard | Overall accuracy across 10 conversations (1,540 questions) | **69.9% ± 0.4%** |
| [LongMemEval](evals/longmemeval/) | Oracle | Pipeline quality — extraction + RAG + scoring (500 questions) | **91.4%** |
| [LongMemEval](evals/longmemeval/) | Per-entity | Production pattern — one DB per conversation, ~500 turns each (176 questions) | **67.6%** |
| [LongMemEval](evals/longmemeval/) | Shared DB | All conversations in one DB — the anti-pattern (500 questions) | 37.2% |
**Reading the numbers:** The oracle result (91.4%) proves the pipeline works when given the right evidence. The per-entity result (67.6%) shows production performance with the recommended atomic architecture. The shared-DB result (37.2%) demonstrates why per-entity scoping matters — accuracy drops by 54 points when unrelated conversations compete for retrieval slots.
See [evals/](evals/) for full methodology, per-category breakdowns, datasets, and reproduction instructions.
## Quick Start
For a complete runnable example, see [`examples/minimal.py`](examples/minimal.py) — no GPU or GGUF model required. For an interactive demo, see the [Chat Demo](apps/) (Streamlit).
### Python
```bash
# CPU-only (development / experimentation)
pip install mnemefusion-cpu sentence-transformers
# GPU with CUDA (production — Linux x86_64, requires NVIDIA driver 525+)
pip install mnemefusion sentence-transformers
```
```python
import mnemefusion
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-base-en-v1.5")
# Open or create a database (768 = BGE-base embedding dimension)
mem = mnemefusion.Memory("./brain.mfdb", {"embedding_dim": 768})
# Set embedding function for automatic vectorization
mem.set_embedding_fn(lambda text: model.encode(text).tolist())
# Add memories
mem.add("Alice loves hiking in the mountains", metadata={"speaker": "narrator"})
mem.add("Bob started learning piano last month", metadata={"speaker": "narrator"})
# Multi-dimensional query — returns (intent, results, profile_context)
intent, results, profiles = mem.query("What are Alice's hobbies?", limit=10)
print(f"Intent: {intent['intent']} (confidence: {intent['confidence']:.2f})")
for memory_dict, scores_dict in results:
print(f" [{scores_dict['fused_score']:.3f}] {memory_dict['content']}")
# Profile context contains entity facts for RAG augmentation
for fact_str in profiles:
print(f" Profile: {fact_str}")
```
### With User Identity
```python
# Namespace isolation + first-person pronoun resolution
mem = mnemefusion.Memory("./brain.mfdb", {"embedding_dim": 768}, user="alice")
mem.set_embedding_fn(lambda text: model.encode(text).tolist())
# Memories are namespaced to "alice"
mem.add("I love hiking in the mountains")
# Map "I"/"me"/"my" → "alice" entity profile at query time
mem.set_user_entity("alice")
# "my hobbies" resolves to alice's profile
intent, results, profiles = mem.query("What are my hobbies?")
```
### With LLM Entity Extraction
Entity extraction uses a local GGUF model (no cloud API needed). Download a supported model:
```bash
pip install huggingface-hub
# Recommended: Phi-4-mini (3.8B, ~2.3GB, best accuracy)*
# Requires Hugging Face authentication: huggingface-cli login
huggingface-cli download microsoft/Phi-4-mini-instruct-gguf Phi-4-mini-instruct-Q4_K_M.gguf --local-dir models/
# Alternative (no auth required): Qwen2.5-3B (~2GB)
huggingface-cli download Qwen/Qwen2.5-3B-Instruct-GGUF qwen2.5-3b-instruct-q4_k_m.gguf --local-dir models/
```
*\*MnemeFusion's extraction prompts have been tested and tuned with Phi-4-mini. Other models may work but with reduced extraction quality.*
```python
mem = mnemefusion.Memory("./brain.mfdb", {"embedding_dim": 768})
mem.set_embedding_fn(lambda text: model.encode(text).tolist())
mem.enable_llm_entity_extraction("models/Phi-4-mini-instruct-Q4_K_M.gguf", tier="balanced")
# Entity extraction runs automatically on add()
mem.add("Caroline studies marine biology at Stanford")
# Entity profiles are built incrementally
profile = mem.get_entity_profile("caroline")
# {'name': 'caroline', 'entity_type': 'person', 'facts': {...}, 'summary': '...'}
```
Requires a GPU with 4GB+ VRAM for reasonable speed. CPU-only works but is ~10x slower. For GPU acceleration, install the GPU package: `pip install mnemefusion`.
### Rust
```toml
[dependencies]
mnemefusion-core = "0.1"
```
```rust
use mnemefusion_core::{MemoryEngine, Config};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let engine = MemoryEngine::open("./brain.mfdb", Config::default())?;
// Add a memory with embedding vector
let embedding = vec![0.1; 384]; // From your embedding model
engine.add(
"Project deadline moved to March 15th".to_string(),
embedding,
None, // metadata
None, // timestamp
None, // source
None, // namespace
)?;
// Query with multi-dimensional fusion
let query_embedding = vec![0.1; 384];
let (_intent, results, _profiles) = engine.query(
"When is the project deadline?",
query_embedding,
10, // limit
None, // namespace
None, // filters
)?;
for (memory, scores) in &results {
println!("[{:.3}] {}", scores.fused_score, memory.content);
}
engine.close()?;
Ok(())
}
```
## Architecture

## Python API Reference
### Core Operations
| `Memory(path, config=None, user=None)` | Open or create a database |
| `add(content, embedding=None, metadata=None, timestamp=None, source=None, namespace=None)` | Add a memory |
| `query(query_text, query_embedding=None, limit=10, namespace=None, filters=None)` | Multi-dimensional query returning `(intent, results, profiles)` |
| `search(query_embedding, top_k, namespace=None, filters=None)` | Pure semantic similarity search |
| `get(memory_id)` | Retrieve memory by ID |
| `delete(memory_id)` | Delete memory by ID |
| `close()` | Close database and save indexes |
### Batch Operations
| `add_batch(memories, namespace=None)` | Bulk insert (10x+ faster) |
| `add_with_dedup(content, embedding, ...)` | Add with duplicate detection |
| `upsert(key, content, embedding, ...)` | Insert or update by logical key |
| `delete_batch(memory_ids)` | Bulk delete |
### Entity & Profile Management
| `enable_llm_entity_extraction(model_path, tier="balanced", extraction_passes=1)` | Enable LLM extraction |
| `set_user_entity(name)` | Map first-person pronouns to user entity |
| `list_entity_profiles()` | List all entity profiles |
| `get_entity_profile(name)` | Get profile by name (case-insensitive) |
| `consolidate_profiles()` | Remove noise from profiles |
| `summarize_profiles()` | Generate profile summaries |
### Diagnostics
| `last_query_trace()` | Step-by-step trace of the most recent `query()` call (requires `enable_trace=True` in config) |
### Metadata Filtering
```python
# Filter by metadata key-value pairs (AND logic)
filters = [
{"metadata_key": "speaker", "metadata_value": "Alice"},
{"metadata_key": "session", "metadata_value": "2024-01-15"},
]
intent, results, profiles = mem.query("hiking plans", filters=filters)
```
### Namespace System
```python
# Add to specific namespace
mem.add("secret note", namespace="alice")
# Query within namespace
intent, results, profiles = mem.query("notes", namespace="alice")
# Or use the user= constructor shortcut
mem = mnemefusion.Memory("brain.mfdb", user="alice")
# All add/query calls default to the "alice" namespace
```
## Configuration
```python
config = {
"embedding_dim": 384, # Must match your embedding model
"entity_extraction_enabled": True, # Enable built-in entity extraction
"llm_model": "path/to/model.gguf", # Auto-enables LLM extraction
"extraction_passes": 3, # Multi-pass diverse extraction
"async_extraction_threshold": 500, # Defer extraction for large docs
"enable_trace": True, # Record step-by-step query traces
}
mem = mnemefusion.Memory("brain.mfdb", config=config)
```
```rust
use mnemefusion_core::Config;
let config = Config::new()
.with_embedding_dim(384)
.with_entity_extraction(true);
let engine = MemoryEngine::open("./brain.mfdb", config)?;
```
## Error Handling
All errors surface as standard Python exceptions — no custom exception types.
| `IOError` | Database open/close fails, disk full, file not found, concurrent open of same file | Usually yes (fix path, free disk, close other instance) |
| `ValueError` | Wrong embedding dimension, invalid memory ID, bad config | Yes (fix input) |
| `RuntimeError` | Calling methods after `close()` | Reopen with a new `Memory()` instance |
```python
import mnemefusion
mem = mnemefusion.Memory("brain.mfdb")
# After close(), all operations raise RuntimeError
mem.close()
try:
mem.add("text")
except RuntimeError as e:
print(e) # "Database is closed"
# Each .mfdb file supports one open instance at a time
mem1 = mnemefusion.Memory("brain.mfdb")
try:
mem2 = mnemefusion.Memory("brain.mfdb") # Same file
except IOError as e:
print(e) # File lock error
```
## Building from Source
### Prerequisites
- Rust 1.75+
- Python 3.9+ (for Python bindings)
### Build
```bash
git clone https://github.com/gkanellopoulos/mnemefusion.git
cd mnemefusion
# Build core library
cargo build --release
# Run tests (520+ tests)
cargo test -p mnemefusion-core --lib
# Build Python bindings
cd mnemefusion-python
maturin develop --release
# With CUDA GPU support (requires CUDA toolkit)
maturin develop --release --features entity-extraction-cuda
```
## Testing
```bash
# All library unit tests
cargo test -p mnemefusion-core --lib
# With output
cargo test -p mnemefusion-core --lib -- --nocapture
# Run specific test module
cargo test -p mnemefusion-core profile
```
## Language Support
MnemeFusion's core search works with any language via multilingual embeddings. Entity extraction and intent classification are currently English-optimized.
| Vector search | All languages (use multilingual embeddings) |
| BM25 keyword search | English-optimized (Porter stemming) |
| Temporal indexing | All languages |
| Causal links | All languages |
| Entity extraction | English (optional, can be disabled) |
| Metadata filtering | All languages |
For non-English use, disable entity extraction:
```python
config = {"entity_extraction_enabled": False, "embedding_dim": 768}
mem = mnemefusion.Memory("brain.mfdb", config=config)
```
## API Stability
MnemeFusion is pre-1.0. The following APIs are considered **stable** and will not change without a version bump:
| `Memory(path, config, user)` | 0.1.0 |
| `add(content, embedding, metadata, timestamp)` | 0.1.0 |
| `query(query_text, query_embedding, limit, namespace, filters)` | 0.1.0 |
| `search(query_embedding, top_k, namespace, filters)` | 0.1.0 |
| `get(memory_id)` / `delete(memory_id)` | 0.1.0 |
| `close()` | 0.1.0 |
| `add_batch(memories, namespace)` | 0.1.0 |
| `set_embedding_fn(fn)` | 0.1.0 |
Everything else (entity extraction API, profile management, config keys) may change between minor versions. The `.mfdb` file format includes embedded version metadata — format-breaking changes will be documented in the [CHANGELOG](CHANGELOG.md).
## Performance Characteristics
| `add()` | O(log n) HNSW insertion + O(n) BM25 update | <5ms without entity extraction |
| `add()` with LLM extraction | Same + LLM inference | ~3-9s depending on GPU |
| `query()` | O(k·log n) across all dimensions + RRF fusion | ~50ms at 5K memories, ~200ms at 50K |
| `search()` | O(k·log n) vector-only | <10ms |
| `get()` / `delete()` | O(1) key lookup | <1ms |
| Storage overhead | ~1.5-2x raw content size (384-dim embeddings) | — |
Tested with up to 10K memories in a single `.mfdb` file. MnemeFusion is designed for per-entity databases — each user, contact, or conversation gets its own `.mfdb` file, typically containing 1K-10K memories. This atomic pattern keeps retrieval precise and scales horizontally.
## Contributing
Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for build instructions, test commands, and PR guidelines.
## License
Licensed under either of:
- [Apache License, Version 2.0](LICENSE-APACHE)
- [MIT License](LICENSE-MIT)
at your option.
## Acknowledgments
Built on excellent open-source libraries:
- [redb](https://github.com/cberner/redb) — Embedded key-value store
- [usearch](https://github.com/unum-cloud/usearch) — HNSW vector search
- [petgraph](https://github.com/petgraph/petgraph) — Graph algorithms
- [llama-cpp-2](https://github.com/utilityai/llama-cpp-rs) — Rust bindings for llama.cpp
- [PyO3](https://github.com/PyO3/pyo3) — Rust-Python interop
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) — AI-assisted development
---
**"SQLite for AI memory"** — One entity, one file. Five dimensions. Zero complexity.