mcp-memory
A Model Context Protocol (MCP) server that gives LLM agents a persistent knowledge graph memory — entities, relations, and observations stored in an embedded SQLite database with FTS5 full-text search.
The crate ships two binaries:
| Binary | What it adds | Tools |
|---|---|---|
| mcp-memory | The knowledge-graph server | 26 |
| mcp-memory-vec | Everything in mcp-memory plus vector embeddings and semantic / hybrid search (usearch HNSW) | 32 |
Both speak MCP over stdio, TCP, and HTTP (with optional bearer-token auth and TLS).
               ┌────────────────────────────────────────────────┐
               │          mcp-memory / mcp-memory-vec           │
               │                                                │
┌───────┐      │  ┌──────────┐   ┌─────────────────────────┐    │
│Claude │──────│─>│ stdio /  │──>│ GraphHandle             │    │
│ / LLM │      │  │ TCP /    │   │  ├ LRU entity cache     │    │
└───────┘      │  │ HTTP     │   │  ├ FxHashMap name→ID    │    │
               │  └────┬─────┘   │  └ FTS5 full-text index │    │
               │       │         └───────────┬─────────────┘    │
               │       │ (vec binary only)   │                  │
               │       v         ┌───────────┴─────────────┐    │
               │  ┌─────────┐    │ VectorStore             │    │
               │  │ dispatch│───>│  ├ usearch HNSW index   │    │
               │  └─────────┘    │  └ petgraph adjacency   │    │
               │       │         └───────────┬─────────────┘    │
               │       v                     v                  │
               │  ┌──────────────────────────────────────────┐  │
               │  │        SQLite (WAL, 16 KB pages)         │  │
               │  │  entity, observation, relation, *_fts,   │  │
               │  │  type_dict, vector_embedding             │  │
               │  └──────────────────────────────────────────┘  │
               └────────────────────────────────────────────────┘
Installation
This installs both mcp-memory and mcp-memory-vec.
Quick start
# Knowledge-graph server
# Knowledge-graph + vector search
The database path is resolved in order:
1. --memory-file / -f flag
2. MEMORY_FILE_PATH environment variable (mcp-memory only)
3. Default: memory.mcpmem in the working directory
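
The lookup order above can be sketched as a small pure function. This is an illustration only: resolve_db_path and its parameters are hypothetical names, not the crate's actual code, and the environment value is passed in explicitly so the logic is easy to test.

```rust
use std::path::PathBuf;

/// Default file name, taken from the resolution list above.
const DEFAULT_DB_FILE: &str = "memory.mcpmem";

/// Hypothetical helper: picks the database path from the CLI flag value,
/// then the environment value, then the default, in that order.
fn resolve_db_path(flag: Option<&str>, env_value: Option<&str>) -> PathBuf {
    flag.or(env_value)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_DB_FILE))
}
```

A caller would typically pass the parsed flag and `std::env::var("MEMORY_FILE_PATH").ok().as_deref()`; for mcp-memory-vec, which does not read the variable, the second argument would simply be `None`.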
Both binaries open the same SQLite file, so you can populate the graph with
mcp-memory and later serve it with mcp-memory-vec (or run a single
mcp-memory-vec for everything).
Transports
| Transport | Flag | Description |
|---|---|---|
| stdio | --transport stdio | Newline-delimited JSON over stdin/stdout (default, for Claude Desktop / Claude Code) |
| tcp | --transport tcp --bind 0.0.0.0:8080 | Newline-delimited JSON over TCP, concurrent connections |
| http | --transport http --bind 0.0.0.0:8080 | MCP Streamable HTTP (POST/GET /mcp, SSE) |
Claude Desktop / Claude Code config
Swap "command" for "mcp-memory-vec" (and add "args": ["--embedding-dims", "384"])
to enable vector search.
Authentication
The tcp and http transports accept an optional bearer token (stdio is never
authenticated, on either binary). Set it with --auth-token or --auth-token-file
(trimmed; an empty file is rejected). mcp-memory additionally falls back to the
MCP_MEMORY_AUTH_TOKEN environment variable.
On HTTP the token is sent as Authorization: Bearer <token>; on TCP it is the
first line of the connection. Comparison is constant-time. Binding a non-loopback
address without a token exposes the entire graph to the network.
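
To make "constant-time" concrete, here is a minimal sketch of a bearer-token check on the HTTP path. The function names are hypothetical and this is not the server's implementation; production code would more likely rely on a vetted crate such as subtle, since a hand-written loop gives no guarantee against compiler optimizations.

```rust
/// Compares two byte strings without stopping at the first mismatch.
/// Assumption: the token length itself is not treated as secret, so a
/// length mismatch is allowed to return early.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences; the loop always visits every byte.
        diff |= x ^ y;
    }
    diff == 0
}

/// Extracts the token from an `Authorization` header value of the form
/// `Bearer <token>`; any other scheme yields `None`.
fn bearer_token(header_value: &str) -> Option<&str> {
    header_value.strip_prefix("Bearer ")
}

/// Hypothetical gate: true only when a bearer token is present and matches.
fn is_authorized(header_value: Option<&str>, expected: &str) -> bool {
    match header_value.and_then(bearer_token) {
        Some(token) => constant_time_eq(token.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```

On the TCP transport the same comparison would apply to the first line of the connection instead of a header value.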
TLS (HTTPS)
The http transport can be served over TLS (rustls, ring provider). Provide a
PEM certificate chain and private key via --tls-cert / --tls-key; both must be
supplied together or startup is refused. mcp-memory also accepts the
MCP_TLS_CERT / MCP_TLS_KEY environment variables. When neither is set the
transport stays plaintext (the default).
Vector search (mcp-memory-vec)
mcp-memory-vec layers a vector store on top of the knowledge graph. Each
embedding is attached to an existing entity (by name), indexed in an in-memory
usearch HNSW index, and persisted as a
blob in the vector_embedding SQLite table. On startup the index is rebuilt from
those blobs.
- Bring your own embeddings. The server stores and searches vectors; it does not call an embedding model. Compute embeddings client-side (e.g. with an embedding API) and pass them in. All vectors must match --embedding-dims.
- Semantic search: vector_search_entities returns the nearest entities by cosine similarity (configurable), optionally filtered by entity type.
- Hybrid search: hybrid_search runs vector search and FTS5 text search in parallel and fuses the two rankings with Reciprocal Rank Fusion (RRF, constant 60), then optionally boosts results by graph centrality from an in-memory petgraph adjacency cache.
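
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formula: each result scores the sum of 1 / (k + rank) over the rankings it appears in, with k = 60 as stated above. The function name, the 1-based rank convention, and the name-based tie-break are assumptions for illustration; the centrality boost is not modeled.

```rust
use std::collections::HashMap;

/// RRF constant named in the hybrid-search description.
const RRF_K: f64 = 60.0;

/// Fuses several rankings (each ordered best-first) into one list,
/// sorted by descending RRF score. Ranks are 1-based (an assumption).
fn rrf_fuse(rankings: &[Vec<&str>]) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, name) in ranking.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry(*name).or_insert(0.0) += 1.0 / (RRF_K + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(name, score)| (name.to_string(), score))
        .collect();
    // Highest score first; ties broken by name for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

An entity that appears in both the vector ranking and the text ranking accumulates two terms, which is why results found by both searches tend to rise to the top even when neither search ranked them first.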
Vector configuration
The HNSW index is tunable from the command line:
| Flag | Default | Meaning |
|---|---|---|
| --embedding-dims | 384 | Vector dimension; all embeddings must match |
| --vec-metric | cos | Distance metric: cos, ip (dot product), or l2sq |
| --vec-quantization | f32 | Scalar storage: f32, f16, or i8 (lower = less memory) |
| --vec-connectivity | 16 | HNSW graph degree M (higher = better recall, more memory) |
| --vec-expansion-add | 200 | HNSW efConstruction (higher = better index quality, slower inserts) |
| --vec-expansion-search | 50 | HNSW efSearch (higher = better recall, slower queries) |
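
As a reference for what the three --vec-metric values measure, here is a plain-Rust sketch of the underlying quantities plus an exact (brute-force) top-K search by cosine similarity. It is illustrative only: the server delegates this work to usearch's approximate HNSW index, and the function names here are hypothetical.

```rust
/// Inner product (the quantity behind the `ip` metric).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance (the quantity behind `l2sq`).
fn l2sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Cosine similarity (the quantity behind `cos`); 0.0 for a zero vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    if denom == 0.0 { 0.0 } else { dot(a, b) / denom }
}

/// Exact top-K by cosine similarity. Vectors whose dimension differs from
/// the query are skipped, mirroring the rule that all embeddings must
/// share one dimension.
fn top_k_cosine(query: &[f32], items: &[(String, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = items
        .iter()
        .filter(|(_, v)| v.len() == query.len())
        .map(|(name, v)| (name.clone(), cosine_similarity(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

HNSW trades this exhaustive scan for a graph walk whose accuracy and cost are governed by the connectivity and expansion settings in the table.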
The petgraph adjacency cache used for the hybrid-search centrality boost is built
lazily; call vector_refresh_graph_cache after mutating relations to refresh it.
MCP compliance
Implements the Model Context Protocol revision
2025-11-25 over JSON-RPC 2.0, via stdio, TCP, or HTTP.
| Area | Support |
|---|---|
| Transports | stdio, TCP, Streamable HTTP (POST/GET /mcp, SSE) |
| Protocol version | 2025-11-25, negotiates down to 2025-06-18 / 2025-03-26 / 2024-11-05 |
| initialize | version negotiation + instructions |
| tools/list, tools/call | 26 tools (mcp-memory) / 32 tools (mcp-memory-vec) |
| CallToolResult | content[] + isError |
| Auth | optional bearer token on TCP/HTTP (constant-time) |
| Capabilities advertised | tools only |
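
Version negotiation during initialize can be sketched as follows, using the rule from the MCP lifecycle specification: echo the client's requested revision if the server supports it, otherwise answer with the newest supported one. The function name is hypothetical, and whether the server implements the rule exactly this way is an assumption.

```rust
/// Revisions listed in the compliance table, newest first.
const SUPPORTED_VERSIONS: [&str; 4] = ["2025-11-25", "2025-06-18", "2025-03-26", "2024-11-05"];

/// Returns the protocol version to put in the initialize response.
fn negotiate_version(requested: &str) -> &'static str {
    SUPPORTED_VERSIONS
        .iter()
        .copied()
        .find(|v| *v == requested)
        // Unknown revision: offer the newest one and let the client decide.
        .unwrap_or(SUPPORTED_VERSIONS[0])
}
```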
Tool failures are returned as CallToolResults with isError: true (not as
JSON-RPC protocol errors) so the model can self-correct.
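
The sketch below shows what such a response looks like on the wire: a normal JSON-RPC result whose payload is a CallToolResult with isError set. The helper names are hypothetical, and the JSON is assembled by hand only to keep the example dependency-free; real code would normally use a JSON library such as serde_json.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a JSON-RPC *success* envelope carrying a failed tool result,
/// so the failure reaches the model as content rather than as a protocol error.
fn tool_error_response(id: u64, message: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"result":{{"content":[{{"type":"text","text":"{}"}}],"isError":true}}}}"#,
        id,
        json_escape(message)
    )
}
```

Note that the envelope has a result member and no error member; the client treats the call as delivered, and the model reads the text to decide how to retry.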
Data model
Entity(name, entityType, observations[]) ──relationType──▶ Entity(...)
- Entity: a named node with a type (e.g. person, company, project) and free-form observation strings. Names are unique and case-sensitive.
- Relation: a directed edge (from, to, relationType). Traversal is undirected (BFS/DFS follow both directions).
- Observation: an unstructured fact attached to an entity.
- Embedding (vec binary): a fixed-dimension f32 vector attached to an entity, plus an optional model identifier.
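
A minimal in-memory sketch of this model, including the "directed edges, undirected traversal" rule, might look like the following. The struct layout and the reachable_undirected function are hypothetical illustrations, not the crate's types; the real server stores this data in SQLite.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Illustrative entity shape: unique name, type, free-form observations.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Entity {
    name: String,
    entity_type: String,
    observations: Vec<String>,
}

/// Illustrative directed edge between two entity names.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Relation {
    from: String,
    to: String,
    relation_type: String,
}

/// Names reachable from `start` within `max_depth` hops via BFS,
/// following each relation in both directions.
fn reachable_undirected(relations: &[Relation], start: &str, max_depth: usize) -> HashSet<String> {
    // Register every directed edge under both endpoints.
    let mut adjacency: HashMap<&str, Vec<&str>> = HashMap::new();
    for r in relations {
        adjacency.entry(r.from.as_str()).or_default().push(r.to.as_str());
        adjacency.entry(r.to.as_str()).or_default().push(r.from.as_str());
    }
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(start, 0)]);
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue;
        }
        if let Some(list) = adjacency.get(node) {
            for &next in list {
                if seen.insert(next) {
                    queue.push_back((next, depth + 1));
                }
            }
        }
    }
    seen.remove(start);
    seen.into_iter().map(String::from).collect()
}
```

With alice → acme and bob → acme stored as outgoing edges, a depth-2 traversal from alice still reaches bob, because the second hop walks bob's edge backwards.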
Search uses FTS5 full-text indexing with unicode61 remove_diacritics 2
tokenization. Names and observation bodies live in separate external-content FTS5
tables (name_fts, obs_fts).
Storage & performance
SQLite (WAL mode)
A single SQLite database in WAL mode:
| Table | Key | Purpose |
|---|---|---|
| entity | INTEGER PRIMARY KEY (rowid) | Primary entity storage; materialized obs_count, out_deg, in_deg; name_hash for O(1) routing |
| observation | entity_id (FK) + rowid | 1:N observations per entity |
| relation | composite indexes | Directed edges; covering indexes rel_out(from_id,type_id,to_id) and rel_in(to_id,type_id,from_id) for index-only scans |
| name_fts | content_rowid | External-content FTS5 over entity.name |
| obs_fts | content_rowid | External-content FTS5 over observation.body |
| type_dict | name | Interned entity/relation types with live counts (loaded into RAM) |
| graph_stat | key (singleton) | WITHOUT ROWID counters: entities, relations, observations, sequences |
| vector_embedding | entity_id | (vec binary) dims, blob (f32 vector), model, created_us |
Key pragmas: page_size=16384, journal_mode=WAL, synchronous=NORMAL,
cache_size=-50000 (~50 MB), mmap_size=256 MB, temp_store=MEMORY,
busy_timeout=5000.
In-memory caches
| Cache | Purpose |
|---|---|
| Entity LRU (10,000 entries) | Avoids deserializing hot entities; stores EntityMeta{id, type_id, obs_count, out_deg, in_deg} |
| Name-hash map | O(1) name-to-ID resolution via 64-bit hash |
| Prepared-statement cache | Reuses compiled SQLite queries |
| usearch HNSW index (vec) | In-memory ANN index, rebuilt from vector_embedding on startup |
| petgraph adjacency (vec) | Directed graph cache for the hybrid-search centrality boost |
Write batching
Every mutation goes through a layered write path that collapses transaction count
from O(N) to O(1) per create_entities / create_relations call:
- Batch existence checks in one read transaction
- Batch commit of all new entities/relations in one write transaction
- Batch FTS index updates in one write transaction
- Cache invalidation for affected names
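
The effect of these four steps on transaction count can be illustrated with a toy in-memory store. Everything here (ToyStore, its fields, the transaction counter) is a hypothetical stand-in for the real SQLite-backed path; the point is only that the number of transactions stays constant regardless of batch size.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for the storage layer; counts simulated transactions.
#[derive(Default)]
struct ToyStore {
    entities: HashMap<String, String>, // name -> entity type
    indexed: HashSet<String>,          // stand-in for the FTS index
    cache: HashMap<String, String>,    // stand-in for the entity cache
    transactions: usize,
}

impl ToyStore {
    /// Creates the entities in `batch` that do not exist yet and returns
    /// how many were created. Uses three transactions for any batch size.
    fn create_entities(&mut self, batch: &[(&str, &str)]) -> usize {
        // 1. One read transaction: batch existence checks (plus in-batch dedup).
        self.transactions += 1;
        let mut seen = HashSet::new();
        let fresh: Vec<(&str, &str)> = batch
            .iter()
            .copied()
            .filter(|(name, _)| !self.entities.contains_key(*name) && seen.insert(*name))
            .collect();
        // 2. One write transaction: commit all new entities.
        self.transactions += 1;
        for (name, ty) in &fresh {
            self.entities.insert(name.to_string(), ty.to_string());
        }
        // 3. One write transaction: batch index updates.
        self.transactions += 1;
        for (name, _) in &fresh {
            self.indexed.insert(name.to_string());
        }
        // 4. Cache invalidation for affected names (no transaction needed).
        for (name, _) in &fresh {
            self.cache.remove(*name);
        }
        fresh.len()
    }
}
```

A naive per-entity loop would instead open at least one transaction per item, which is the O(N) behavior the batching avoids.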
Durability
| Mode | Behavior | Data-loss window |
|---|---|---|
| async (default) | Flush to kernel page cache, background sync | Up to ~1 s on power failure |
| sync | fsync before every write | Zero |
Set on mcp-memory via MCP_MEMORY_DURABILITY=sync. (mcp-memory-vec runs in
async mode.)
Background maintenance
A background tokio task runs every 5 minutes: WAL checkpoint
(PRAGMA wal_checkpoint(TRUNCATE)), planner analysis (PRAGMA optimize), and FTS
optimization.
Benchmarks
Measured end-to-end via the bench binary, 1,000 entities (5 observations each) +
999 relations pre-populated, on a MacBook Pro (Apple M1 Pro, 32 GB). Numbers
are averages and will vary by hardware — run cargo run --release --bin bench on
your own target.
| Operation | Avg latency | Notes |
|---|---|---|
| degree (cache hit) | ~44 ns | Materialized column |
| relation_type_counts | ~2.3 µs | RAM-cached type dictionary |
| get_entity_count | ~3.0 µs | RAM counter |
| entity_type_counts | ~4.5 µs | RAM-cached type dictionary |
| get_entity (cache hit) | ~5.4 µs | LRU hit; no SQLite I/O |
| describe_entity | ~5.4 µs | Entity + incident relations |
| search_relations (from / from+type) | ~6.3 µs | Covering index scan |
| delete_observations (1) | ~11 µs | |
| find_all_paths (A→C, depth 5) | ~12 µs | Bounded DFS |
| upsert_entities (type change + obs) | ~27 µs | |
| entities_exist (10 names) | ~38 µs | Hash lookups |
| batch_get_entities (10) | ~42 µs | Batch fetch |
| neighbors (depth 1 / depth 2) | ~50 µs | Index-only covering scan |
| open_nodes (single / 5 names) | ~53–77 µs | LRU + SQLite |
| search_nodes (name match) | ~96 µs | FTS5 query + entity lookup |
| add_observations (2) | ~163 µs | Append + FTS index |
| search_nodes (obs match) | ~161 µs | FTS5 over observation bodies |
| find_path (BFS) | ~453 µs | Worst case: full BFS |
| search_nodes (filtered) | ~623 µs | FTS5 + type filter |
| export (JSON) | ~2.5 ms | Serialize all entities + relations |
| read_graph (all) | ~3.4 ms | Full dump |
| create_relations (999) | ~10 ms | Batch write + degree updates |
| create_entities (1000) | ~41 ms | Batch write + FTS index |
Tools
Knowledge-graph tools (both binaries)
Write: create_entities, create_relations, add_observations,
delete_entities, delete_observations, delete_relations, upsert_entities,
merge_entities, compact.
Read: read_graph, search_nodes, open_nodes, batch_get_entities,
get_entity, entity_exists, graph_stats, search_relations,
describe_entity, degree, find_path, find_all_paths, extract_subgraph,
get_neighbors, list_entity_types, list_relation_types, export_graph.
Vector tools (mcp-memory-vec only)
- vector_upsert_embedding: attach/replace an embedding on an existing entity
- vector_search_entities: top-K nearest entities by vector similarity (optional type filter)
- vector_delete_embedding: remove an entity's embedding (entity is kept)
- hybrid_search: vector + FTS5 fused by RRF, optional graph-centrality boost
- vector_refresh_graph_cache: rebuild the petgraph adjacency cache from relations
- vector_store_stats: embedding count, dimension, index/graph sizes
Architecture
main.rs / vec_main.rs
  ├── run_stdio(): newline-delimited JSON-RPC over stdio
  ├── run_tcp(): same framing, concurrent connections
  └── run_http(): MCP Streamable HTTP (axum, POST/GET /mcp)
        └── process_request()
              ├── "initialize" → protocol version + capabilities
              ├── "tools/list" → cached tool list
              ├── "tools/call" → dispatch to handler by name
              ├── "ping" → null
              └── "notifications/…" → no reply
All transports share the transport-agnostic dispatch core
(dispatch_line() / dispatch_http_body()).
Concurrency & locking
- GraphHandle uses parking_lot::Mutex for the writer connection and caches; a read-only connection pool serves concurrent reads under WAL.
- The VectorStore uses DashMap for name↔ID maps and an RwLock over the petgraph cache; the usearch index is internally synchronized.
- Heavy dispatch (graph lock + optional fsync) is offloaded to tokio::task::spawn_blocking to keep the reactor responsive.
- TCP connections are capped at 128 concurrent.
Request size limits
| Parameter | Limit |
|---|---|
| Max request body | 16 MB |
| Name max bytes | 1,024 |
| Observation max bytes | 65,536 |
| Max entities / relations / observations / names per request | 1,000 |
| Max search limit | 1,000 |
| Max neighbor depth | 16 |
| Max find_all_paths depth / results | 10 / 100 |
| Max embedding dimensions (vec) | 4,096 |
| Max topK (vec) | 100 |
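
A client can avoid round-trips by checking a request against these limits before sending it. The sketch below validates a create_entities payload; the constant and function names are hypothetical, the limits are byte lengths as in the table, and counting observations across the whole request (rather than per entity) is an assumption.

```rust
const MAX_NAME_BYTES: usize = 1_024;
const MAX_OBSERVATION_BYTES: usize = 65_536;
const MAX_ITEMS_PER_REQUEST: usize = 1_000;

/// Checks a batch of (name, observations) pairs against the documented limits.
fn validate_create_entities(entities: &[(&str, Vec<&str>)]) -> Result<(), String> {
    if entities.len() > MAX_ITEMS_PER_REQUEST {
        return Err(format!("too many entities: {}", entities.len()));
    }
    // Assumption: the observation cap applies to the request as a whole.
    let total_observations: usize = entities.iter().map(|(_, obs)| obs.len()).sum();
    if total_observations > MAX_ITEMS_PER_REQUEST {
        return Err(format!("too many observations: {}", total_observations));
    }
    for (name, observations) in entities {
        // `len()` on a string slice is its length in bytes, not characters.
        if name.len() > MAX_NAME_BYTES {
            return Err(format!("name exceeds {} bytes", MAX_NAME_BYTES));
        }
        if observations.iter().any(|o| o.len() > MAX_OBSERVATION_BYTES) {
            return Err(format!("observation exceeds {} bytes", MAX_OBSERVATION_BYTES));
        }
    }
    Ok(())
}
```

The server remains the source of truth; a pre-check like this only catches oversized requests earlier.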
Development
The test suite covers protocol handling, all tool handlers, CRUD/search/path persistence, concurrency, fuzzy invariant checks, and — for the vector server — end-to-end stdio tool flows, input validation, the tunable index config, and HTTP bearer-token authentication.
Versioning & compatibility
Follows Semantic Versioning. The current line is 3.x,
targeting MCP revision 2025-11-25.
| mcp-memory | MCP revision (default) | Negotiates |
|---|---|---|
| 3.x | 2025-11-25 | 2025-06-18, 2025-03-26, 2024-11-05 |
| 2.x | 2025-11-25 | 2025-06-18, 2025-03-26, 2024-11-05 |
| ≤ 1.x | 2024-11-05 | none |
License
Licensed under the Apache License, Version 2.0.