mcp-memory
A Model Context Protocol (MCP) server providing LLM agents with a persistent knowledge graph memory — entities, relations, and observations stored in a compact custom binary log with write-ahead durability.
Speaks MCP over stdio, TCP, and HTTP transports.
┌──────────────────────────────────────┐
│ mcp-memory server │
│ │
┌───────┐ │ ┌──────────┐ ┌────────────────┐ │
│Claude │──────│─>│ stdio / │───>│ KnowledgeGraph │ │
│Desktop│ │ │ TCP / │ │ • entities │ │
└───────┘ │ │ HTTP │ │ • relations │ │
│ └──────────┘ │ • observations │ │
│ │ └───────┬────────┘ │
│ │ │ │
│ v v │
│ ┌──────────────────────────────┐ │
│ │ Binary write-ahead log │ │
│ │ (append-only, fsynced) │ │
│ └──────────────────────────────┘ │
└──────────────────────────────────────┘
Installation
Quick start
The memory file path is resolved in order:
--memory-file/-fflagMEMORY_FILE_PATHenvironment variable- Default:
memory.mcpmemin the working directory
Transports
| Transport | Flag | Description |
|---|---|---|
| stdio | --transport stdio |
Newline-delimited JSON over stdin/stdout (default, for Claude Desktop / Claude Code) |
| tcp | --transport tcp --bind 0.0.0.0:8080 |
Newline-delimited JSON over TCP, concurrent connections |
| http | --transport http --bind 0.0.0.0:8080 |
MCP Streamable HTTP (POST/GET /mcp) |
Claude Desktop config
Data model
Entity(name, entityType, observations[])
| |
| ——— relationType ———→ |
v v
Entity(name, entityType, observations[])
- Entity — a named node with a type (e.g.
person,company,project) and free-form observation strings. - Relation — a directed edge
(from, to, relationType)between two entities. Relations are undirected in traversal (BFS follows both ways). - Observation — an unstructured fact attached to an entity (e.g.
"likes coffee","founded in 2021").
All strings are interned on write — repeated values share storage. The
search index tokenizes names, types, and observations (on whitespace) and
matches a query against tokens case-insensitively by prefix — e.g. "cof"
matches the token "coffee", but "ffee" does not. Results are ranked by a
simple relevance proxy: the number of token hits an entity accumulates for the
query (higher is better). This is a token-hit count, not BM25 — there is no
TF saturation, length normalization, or IDF weighting.
Data structures & performance
| Component | Implementation | Notes |
|---|---|---|
| String interning | Arena-backed StringInterner with capacity-graded growth |
O(1) intern/lookup via get_optional |
| Entity lookup | 4-shard open-addressing hash table with ctrl-byte probing (Swiss-table style) | L1-touch on probe; ~1/128 false-positive key compares |
| Search | Inverted token index; case-insensitive prefix match, token-hit-count ranking | Incremental insert on add; remove_entity is a single retain pass |
| Relation storage | Flat Vec<StoredRelation> (12 B/record) |
Bulk iteration is a single cache-friendly linear scan |
| Temporary maps/sets | ahash::AHashMap / AHashSet (not SipHash) |
2-5x faster hashing for BFS, adjacency, dedup |
| Concurrency | parking_lot::RwLock (no poisoning, fair queuing) |
~30% faster uncontended; readers never block readers |
| Persistence | Append-only binary WAL + compact (atomic rename) | compact rewrites state as minimal create-records |
Benchmarks
All microbenchmarks measure a single-threaded KnowledgeGraph with 40,000 entities and 120,000 relations (≈10 MB as JSONL). Write-tool benchmarks use a fresh 100-entity, 300-relation graph so the setup cost does not dominate the measurement.
Results are from a MacBook Pro (Apple M4 Pro, 24 GB). Run cargo bench on your target hardware.
Read operations (40K entities, 120K relations)
| Operation | Latency | Ops/sec | vs JS |
|---|---|---|---|
get_entity (Swiss-table lookup) |
200 ns | 5,000,000 | ~250,000× |
batch_get_entities (10 names) |
2.1 µs | 476,000 | — |
graph_stats (linear scan) |
37 µs | 27,000 | — |
describe_entity (entity + incident relations) |
60 µs | 16,700 | — |
entity_type_counts (linear scan + hash tally) |
89 µs | 11,200 | — |
search_relations (from/to filter, linear scan) |
40 µs | 25,000 | — |
relation_type_counts (linear scan + hash tally) |
257 µs | 3,900 | — |
neighbors depth=1 |
392 µs | 2,600 | — |
read_graph_filtered (type + pagination) |
479 µs | 2,100 | — |
open_nodes (10 names + incident relations) |
676 µs | 1,500 | — |
search_nodes_filtered (prefix index + filter) |
1.25 ms | 800 | — |
find_all_paths (DFS, maxDepth=4, maxPaths=10) |
1.68 ms | 595 | — |
find_path (BFS shortest path) |
2.5 ms | 400 | — |
extract_subgraph depth=1 (3 seeds) |
2.7 ms | 375 | — |
extract_subgraph depth=2 (3 seeds) |
2.8 ms | 350 | — |
neighbors depth=2 |
4.8 ms | 208 | — |
read_graph (full dump) |
10.6 ms | 94 | ~5× |
search_nodes (prefix index, broad query) |
12.2 ms | 82 | ~4× |
export_json |
20.3 ms | 49 | — |
export_dot |
20.0 ms | 50 | — |
export_mermaid |
22.3 ms | 45 | — |
Write operations (100 entities, 300 relations)
| Operation | Latency | Ops/sec | vs JS |
|---|---|---|---|
add_observations (10 new, single entity) |
24 µs | 41,000 | — |
merge_entities (source→target full merge) |
49 µs | 20,400 | — |
create_relations (100 new) |
65 µs | 15,400 | — |
delete_observations (per entity) |
17 µs | 58,000 | — |
delete_relations (100 tuples) |
161 µs | 6,200 | — |
delete_entities (100 names + cascade) |
177 µs | 5,600 | — |
create_entities (100 new) |
227 µs | 4,400 | — |
upsert_existing (100 merges) |
570 µs | 1,800 | — |
upsert_new (100 creates) |
740 µs | 1,400 | — |
compact (after 50 deletes) |
12.7 ms | 79 | — |
Why is Rust faster than the JS reference?
The official JS server-memory stores the graph as a JSONL file and loads/saves the entire file on every operation:
| Factor | mcp-memory (Rust) |
@modelcontextprotocol/server-memory (JS) |
|---|---|---|
| Data model | In-memory Swiss-table + flat vecs | JSONL file, full read/write per op |
| Entity lookup | O(1) hash table (~200 ns) | O(N) Array.find over full file |
| Search | Inverted token index, prefix match | O(N) Array.filter + String.includes |
| Writes | Append-only binary WAL (~µs) | Full JSONL rewrite (~10 MB, ~ms) |
| Concurrency | parking_lot::RwLock (parallel reads) |
Single-threaded, no parallelism |
| String interning | Arena-backed dedup (no alloc per str) | Full strings heap-allocated per lookup |
| Persistence | Custom binary format, 12 B/relation | JSONL, ~60 B/relation + whitespace |
For a graph of 40K/120K, every JS mutation reads ~10 MB from disk, parses 160K JSON lines, modifies an array, serialises back, and writes the file — each operation takes 50–180 ms. The Rust server keeps everything in cache-friendly flat arrays with an append-only WAL; read operations touch zero disk and writes log a few dozen bytes.
Write tools
create_entities
Create new entities. Already-existing names (exact match) are silently skipped.
Input:
Complexity: O(N × O) where N = entities, O = observations per entity. Dedup check via hash lookup before insert. Each entity writes one WAL record.
create_relations
Create directed relations between entities. Duplicates (same from/to/type) are silently skipped.
Input:
Complexity: O(R) with dedup via HashSet<(StrId,StrId,StrId)>.
add_observations
Append observations to existing entities. Duplicate observation contents are skipped per entity.
Input:
Complexity: O(M) per entity where M = new observations. Dedup via HashSet<StrId> of existing.
delete_entities
Delete entities by name. Relations touching any deleted entity (from or to) are cascaded away.
Input:
Complexity: O(E + R) — entity lookup is O(1) each, relation cascade is
retain over the flat relation vec.
delete_observations
Remove specific observations from an entity.
Input:
delete_relations
Remove exact (from, to, relationType) tuples.
Input:
Complexity: O(R) — retain with a HashSet of target triples.
upsert_entities
Create-or-merge entities idempotently. New entities are created; existing entities keep their type and gain any new (deduplicated) observations. Safe to re-assert facts without overwriting accumulated knowledge.
Input:
Returns: per-entity { created: bool, addedObservations: [...] }.
Complexity: O(O) for merge (dedup by observation HashSet), O(1) create.
merge_entities
Merge source entity into target. All observations from source are added
to target (deduplicated via add_observations), all relations involving source
are redirected to target (deduplicated via batch create_relations), and then
source is deleted. Self-loop relations (which become target→target after
redirect) are silently dropped.
Use this to collapse duplicate entities the LLM accidentally created.
Input:
Returns:
WAL: Three sub-operations — add_observations, create_relations, delete_entities — each log their own records. Replay from disk reconstructs the merge exactly.
Complexity: O(S + R) where S = source observations, R = incident relations.
compact
Rewrite the binary log from current in-memory state, discarding all deleted
entities, tombstoned observations, and rolled-back relations. Produces a fresh
minimal log containing only the current graph. Crash-safe: writes to a temp
file, then rename(2) atomically over the original. Also rebuilds the
in-memory interner (reclaiming freed arena space).
Input: (none)
Complexity: O(V + E) scan plus full rewrite to temp file.
Read tools
read_graph
Return all entities and relations (or a filtered, paginated slice). With no arguments, streams the full graph through a borrowing serialization view that avoids allocating intermediate strings.
When entityType is specified, relations are restricted to those whose both
endpoints are in the returned entity page (internally consistent slice).
Input:
Complexity: O(V + E) unfiltered; O(V) filter (one linear scan of slots).
search_nodes
Relevance-ranked search over entity names, types, and observation content (case-insensitive, per-token prefix match). Returns matching entities and relations connected to them. Results ordered by token-hit count descending.
Input:
Complexity: O(I) search index query + O(K) score-take for K results. Incremental index (no full rebuild on mutation).
open_nodes
Fetch specific entities by name, plus any relations connected to them (from either endpoint). Returns entities in input order; nonexistent names are silently omitted.
Input:
Complexity: O(N + R') where N = requested names, R' = incident relations.
get_entity
Fetch a single entity by exact name (entity + observations only, no relations).
Input:
Complexity: O(1) — Swiss-table lookup → entity output.
batch_get_entities
Fetch multiple entities by name in one call (order preserved, null for missing).
Cuts N serial get_entity round-trips (each with dispatch + JSON parse +
JSON serialize overhead) down to one.
Input:
Returns:
Complexity: O(N) — each lookup is O(1) name-table probe.
graph_stats
Aggregate statistics about the knowledge graph.
Input: (none)
Returns:
Complexity: O(V + E) — one linear scan.
search_relations
Filter relations by optional from, to, and/or relationType. Omitted or
empty fields match all values. Returns matching relation objects.
Input:
Complexity: O(R) — linear scan with optional filter on interned StrIds.
find_path
Single BFS shortest path (undirected) between two entities. Returns the sequence of entity names connecting them inclusive of both endpoints.
Input:
Returns: ["Alice", "Bob", "Charlie"]
Complexity: O(V + E) — builds adjacency map once, then BFS.
find_all_paths
DFS with backtracking enumerating all simple paths (not just the shortest) between two entities. Use this to discover alternative routes the LLM may not have considered — e.g., two people might be connected through work, through family, and through a shared hobby, each being a different path.
The maxPaths cap prevents combinatorial explosion in dense graphs.
Input:
Returns:
Complexity: O(b^d) worst-case where b = avg branching factor, d = maxDepth.
The maxPaths and maxDepth parameters bound actual runtime.
extract_subgraph
Extract a connected subgraph around one or more seed entity names, expanding
out to depth hops along all relations (undirected). Returns all reached
entities plus the relations among them (only those whose both endpoints are
in the reached set). Replaces N serial get_neighbors calls with a single
O(R + V) pass.
Input:
Complexity: O(R) to build adjacency map + O(V + E) BFS for the reached
subgraph. Uses ahash maps/sets for hashing.
describe_entity
One-shot context bundle for a single entity: the entity with its observations,
every incident relation (with direction), distinct neighbor names, and degree.
Replaces a get_entity plus two search_relations calls.
Input:
Returns:
Complexity: O(R') — one linear pass over relations touching the entity.
list_entity_types
List all distinct entity types with live-entity counts, ranked by count descending (ties by name). Use this to discover the schema of the memory.
Input: (none)
Returns:
Complexity: O(V) — one linear scan, count via AHmap.
list_relation_types
List all distinct relation types with counts, ranked by count descending.
Input: (none)
Returns:
Complexity: O(R) — one linear scan.
export_graph
Export the entire graph as json (entities + relations), mermaid (diagram),
or dot (Graphviz).
Input:
Format examples:
json — the full { entities: [...], relations: [...] } object.
mermaid — a flowchart:
graph LR
n0["Alice"]
n1["Bob"]
n0 -->|knows| n1
dot — a directed graph:
digraph G
Architecture
main.rs
│
├── MCPServer::run_stdio() — stdio transport (newline-delimited JSON-RPC)
├── MCPServer::run_tcp() — TCP transport (same framing, concurrent conns)
└── MCPServer::run_http() — MCP Streamable HTTP (axum, POST/GET /mcp)
│
└── process_request()
│
├── "initialize" → protocol version + capabilities
├── "tools/list" → cached from tools.json
├── "tools/call" → dispatches to handler by name
├── "ping" → null
└── "notifications/" → no reply
All three transports share process_value() / dispatch_line() / dispatch_http_body()
— the dispatch core is transport-agnostic. HTTP uses tokio::task::spawn_blocking
for graph access since the reads are synchronous.
Locking
The KnowledgeGraph is guarded by parking_lot::RwLock:
- Concurrent readers never block each other (proven by test).
- Writers wait for all readers to drain, then proceed exclusively.
- No poisoning — lock acquisition returns the guard directly, not a
Result.
Write-ahead log (WAL)
Every mutation goes through this sequence before touching in-memory state:
- Encode the mutation as a binary record
- Append to
BufWriter<File> - Update in-memory structures
- On flush:
BufWriter::flush+File::sync_all
This means a crash between steps 2 and 3 results in the record being replayed on next startup — the mutation is applied during replay.
The file begins with an 8-byte magic header (MCPMEMV1). Each record that
follows is:
| Field | Size | Description |
|---|---|---|
| Len | 4 B | Total record length (LE) = 1 (kind) + payload bytes; capped at 1 MiB |
| Kind | 1 B | Record variant (CreateEntity, CreateRelation, AddObservations, …) |
| Data | variable | Hand-rolled length-prefixed payload (see src/store.rs) |
Strings inside a payload are encoded as a u16 little-endian byte length
followed by the UTF-8 bytes (so any single string is at most 64 KiB).
No per-record checksum. Replay tolerates a torn trailing record (a partial write at the tail is dropped), but it does not detect a corrupted record in the middle of the log. Treat the file as trusted local storage.
compact rewrites the log to contain only the minimal CreateEntity /
CreateRelation records needed to reconstruct the current state.
Transactions. Multi-record operations (currently merge_entities) wrap
their records in a TxnBegin … TxnCommit pair. On replay, records inside a
transaction are buffered and applied only when the commit is seen; an unclosed
transaction (the process crashed mid-operation) is discarded wholesale, so the
graph can never observe a half-applied merge.
Development
The test suite includes:
- Unit tests — intern, search (incl. prefix-vs-substring behavior), protocol, store encode/decode, tools, server dispatch
- Integration tests — CRUD, persistence roundtrip, WAL-ordering and stale-temp
compaction regressions, transactional
merge_entitiescrash-atomicity, search, paths, export, concurrency, RwLock proofs, and all 24 tool handlers - Fuzzy tests — randomized CRUD sequences asserting graph invariants, Unicode/large-string stress, and concurrent mutation consistency
License
Licensed under the Apache License, Version 2.0.