mcp-memory

A Model Context Protocol (MCP) server that gives LLM agents a persistent knowledge graph memory — entities, relations, and observations stored in an embedded SQLite database with FTS5 full-text search.

The crate ships two binaries:

Binary	What it adds	Tools
`mcp-memory`	The knowledge-graph server	26
`mcp-memory-vec`	Everything in `mcp-memory` plus vector embeddings and semantic / hybrid search (usearch HNSW)	32

Both speak MCP over stdio, TCP, and HTTP, with optional bearer-token auth on TCP and HTTP and optional TLS on HTTP.

                    ┌────────────────────────────────────────────────┐
                    │         mcp-memory / mcp-memory-vec            │
                    │                                                │
     ┌───────┐      │  ┌──────────┐   ┌─────────────────────────┐   │
     │Claude │──────│─>│  stdio / │──>│ GraphHandle             │   │
     │ / LLM │      │  │  TCP /   │   │  ├ LRU entity cache      │   │
     └───────┘      │  │  HTTP    │   │  ├ FxHashMap name→ID     │   │
                    │  └────┬─────┘   │  └ FTS5 full-text index  │   │
                    │       │         └───────────┬─────────────┘   │
                    │       │   (vec binary only) │                 │
                    │       v         ┌───────────┴─────────────┐   │
                    │  ┌─────────┐    │ VectorStore             │   │
                    │  │ dispatch│───>│  ├ usearch HNSW index    │   │
                    │  └─────────┘    │  └ petgraph adjacency    │   │
                    │       │         └───────────┬─────────────┘   │
                    │       v                     v                  │
                    │  ┌──────────────────────────────────────────┐ │
                    │  │ SQLite (WAL, 16 KB pages)                 │ │
                    │  │ entity, observation, relation, *_fts,     │ │
                    │  │ type_dict, vector_embedding               │ │
                    │  └──────────────────────────────────────────┘ │
                    └────────────────────────────────────────────────┘

Installation

cargo install mcp-memory

This installs both mcp-memory and mcp-memory-vec.

Quick start

# Knowledge-graph server
mcp-memory --transport stdio

# Knowledge-graph + vector search
mcp-memory-vec --transport stdio --embedding-dims 384

The database path is resolved in order:

--memory-file / -f flag
MEMORY_FILE_PATH environment variable (mcp-memory only)
Default: memory.mcpmem in the working directory
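
The resolution order above can be sketched as a small function. This is a minimal illustration, not the crate's actual code: `resolve_db_path` and its parameters are hypothetical names, and the environment value is passed in explicitly so the precedence is easy to check.

```rust
use std::path::PathBuf;

/// Hypothetical helper: pick the database path by precedence.
/// `flag` is the value of --memory-file / -f, if given;
/// `env_value` is the value of MEMORY_FILE_PATH, if set (mcp-memory only).
fn resolve_db_path(flag: Option<&str>, env_value: Option<&str>) -> PathBuf {
    flag.or(env_value)
        .map(PathBuf::from)
        // Fall back to the documented default in the working directory.
        .unwrap_or_else(|| PathBuf::from("memory.mcpmem"))
}

fn main() {
    let env_value = std::env::var("MEMORY_FILE_PATH").ok();
    // No flag is given here, so the environment variable (if any) beats the default.
    let path = resolve_db_path(None, env_value.as_deref());
    println!("database path: {}", path.display());
}
```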

Both binaries open the same SQLite file, so you can populate the graph with mcp-memory and later serve it with mcp-memory-vec (or run a single mcp-memory-vec for everything).

Transports

Transport	Flag	Description
stdio	`--transport stdio`	Newline-delimited JSON over stdin/stdout (default, for Claude Desktop / Claude Code)
tcp	`--transport tcp --bind 0.0.0.0:8080`	Newline-delimited JSON over TCP, concurrent connections
http	`--transport http --bind 0.0.0.0:8080`	MCP Streamable HTTP (POST/GET `/mcp`, SSE)
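
The stdio and tcp transports frame each JSON-RPC message as one line of JSON. As a rough sketch, a `tools/call` request line could be built as below. The envelope follows the MCP `tools/call` shape; the helper name `tools_call_line` and the `{"name":"alice"}` arguments for `get_entity` are assumptions made for illustration, since the tool's argument schema is not given in this README. A real client would use a JSON library rather than string formatting.

```rust
/// Hypothetical helper: frame one MCP `tools/call` request as a single line.
/// `arguments_json` must already be valid JSON; `tool` is not escaped here.
fn tools_call_line(id: u64, tool: &str, arguments_json: &str) -> String {
    let mut line = format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{}}}}}"#,
        id, tool, arguments_json
    );
    // Newline-delimited framing: exactly one message per line.
    line.push('\n');
    line
}

fn main() {
    // The argument object is an assumed shape, not the documented schema.
    print!("{}", tools_call_line(1, "get_entity", r#"{"name":"alice"}"#));
}
```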

Claude Desktop / Claude Code config

{
  "mcpServers": {
    "memory": {
      "command": "mcp-memory"
    }
  }
}

Set "command" to "mcp-memory-vec" (and add "args": ["--embedding-dims", "384"]) to enable vector search.

Authentication

The tcp and http transports accept an optional bearer token; stdio is never authenticated, on either binary. Set the token with --auth-token or --auth-token-file (the file contents are trimmed, and an empty file is rejected). mcp-memory additionally falls back to the MCP_MEMORY_AUTH_TOKEN environment variable.

mcp-memory      --transport http --bind 0.0.0.0:8080 --auth-token "s3cr3t"
mcp-memory-vec  --transport http --bind 0.0.0.0:8080 --auth-token "s3cr3t"

On HTTP the token is sent as Authorization: Bearer <token>; on TCP it is the first line of the connection. Comparison is constant-time. Binding a non-loopback address without a token exposes the entire graph to the network.
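
To illustrate what a constant-time comparison means, here is a minimal sketch using only the standard library. It is not the server's implementation: `constant_time_eq` and `bearer_token` are hypothetical helpers, and production code would normally rely on a vetted crate (for example `subtle`), because an optimizing compiler gives no timing guarantees for hand-written loops.

```rust
/// Compare two byte strings without returning early on the first mismatch.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // The length check does return early; this sketch treats length as non-secret.
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; the result is 0 only if all bytes match.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Extract the token from an `Authorization` header value of the form `Bearer <token>`.
fn bearer_token(header_value: &str) -> Option<&str> {
    header_value.strip_prefix("Bearer ").map(str::trim)
}

fn main() {
    let expected = "s3cr3t";
    let presented = bearer_token("Bearer s3cr3t").unwrap_or("");
    let ok = constant_time_eq(presented.as_bytes(), expected.as_bytes());
    println!("authorized: {}", ok);
}
```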

TLS (HTTPS)

The http transport can be served over TLS (rustls, ring provider). Provide a PEM certificate chain and private key via --tls-cert / --tls-key; both must be supplied together or startup is refused. mcp-memory also accepts the MCP_TLS_CERT / MCP_TLS_KEY environment variables. When neither is set the transport stays plaintext (the default).

mcp-memory-vec --transport http --bind 0.0.0.0:8080 \
  --tls-cert ./cert.pem --tls-key ./key.pem

Vector search (`mcp-memory-vec`)

mcp-memory-vec layers a vector store on top of the knowledge graph. Each embedding is attached to an existing entity (by name), indexed in an in-memory usearch HNSW index, and persisted as a blob in the vector_embedding SQLite table. On startup the index is rebuilt from those blobs.

Bring your own embeddings. The server stores and searches vectors; it does not call an embedding model. Compute embeddings client-side (e.g. with an embedding API) and pass them in. All vectors must match --embedding-dims.
Semantic search — vector_search_entities returns the nearest entities by cosine similarity (configurable), optionally filtered by entity type.
Hybrid search — hybrid_search runs vector search and FTS5 text search in parallel and fuses the two rankings with Reciprocal Rank Fusion (RRF, constant 60), then optionally boosts results by graph centrality from an in-memory petgraph adjacency cache.
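
Reciprocal Rank Fusion itself is simple enough to sketch. The function below is an illustration of the general technique with the constant 60 mentioned above, not the crate's code: `rrf_fuse` is a hypothetical name, ranks are assumed to start at 1, and the centrality boost is left out because its formula is not described here.

```rust
use std::collections::HashMap;

/// RRF smoothing constant, as stated above.
const RRF_K: f64 = 60.0;

/// Fuse several best-first rankings into one list sorted by descending RRF score.
/// Each item scores the sum of 1 / (RRF_K + rank) over the rankings it appears in,
/// with rank starting at 1 (an assumption; an implementation may count from 0).
fn rrf_fuse(rankings: &[Vec<&str>]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, name) in ranking.iter().enumerate() {
            *scores.entry(name.to_string()).or_insert(0.0) += 1.0 / (RRF_K + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by name so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let vector_hits = vec!["alice", "bob", "carol"];
    let text_hits = vec!["bob", "dave", "alice"];
    for (name, score) in rrf_fuse(&[vector_hits, text_hits]) {
        println!("{}: {:.5}", name, score);
    }
}
```

An entity that appears in both rankings accumulates two terms, which is why items found by both vector and text search tend to rise to the top.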

Vector configuration

The HNSW index is tunable from the command line:

Flag	Default	Meaning
`--embedding-dims`	`384`	Vector dimension; all embeddings must match
`--vec-metric`	`cos`	Distance metric: `cos`, `ip` (dot product), or `l2sq`
`--vec-quantization`	`f32`	Scalar storage: `f32`, `f16`, or `i8` (lower precision = less memory)
`--vec-connectivity`	`16`	HNSW graph degree `M` (higher = better recall, more memory)
`--vec-expansion-add`	`200`	HNSW `efConstruction` (higher = better index quality, slower inserts)
`--vec-expansion-search`	`50`	HNSW `efSearch` (higher = better recall, slower queries)
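
For readers unfamiliar with the three `--vec-metric` options, the quantities they are based on can be sketched in plain Rust. These are textbook definitions, not usearch code; the function names are hypothetical, and the library may report `cos` and `ip` as distances derived from these values rather than as raw similarities.

```rust
/// Dot product (the basis of the `ip` metric).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity (the basis of the `cos` metric); returns 0.0 for a zero vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norm = dot(a, a).sqrt() * dot(b, b).sqrt();
    if norm == 0.0 { 0.0 } else { dot(a, b) / norm }
}

/// Squared Euclidean distance (the `l2sq` metric).
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let a = [1.0, 2.0];
    let b = [3.0, 4.0];
    println!("ip basis: {}", dot(&a, &b));
    println!("cos basis: {:.4}", cosine_similarity(&a, &b));
    println!("l2sq: {}", l2_squared(&a, &b));
}
```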

mcp-memory-vec --transport http --bind 0.0.0.0:8080 \
  --embedding-dims 768 --vec-metric cos --vec-quantization f16 \
  --vec-connectivity 32 --vec-expansion-search 128
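
The memory effect of `--vec-quantization` follows from scalar width: 4 bytes for `f32`, 2 for `f16`, 1 for `i8`. The sketch below estimates only the raw vector payload held by the index; it deliberately ignores HNSW graph links and any SQLite storage, so treat the result as a lower bound. The function names are hypothetical.

```rust
/// Bytes used by one scalar for each --vec-quantization value.
fn bytes_per_scalar(quantization: &str) -> Option<usize> {
    match quantization {
        "f32" => Some(4),
        "f16" => Some(2),
        "i8" => Some(1),
        _ => None,
    }
}

/// Raw vector payload in bytes, excluding HNSW graph links and other overhead.
fn payload_bytes(dims: usize, quantization: &str, vectors: usize) -> Option<usize> {
    bytes_per_scalar(quantization).map(|b| dims * b * vectors)
}

fn main() {
    // Compare the three settings for 100,000 embeddings of 768 dimensions.
    for q in ["f32", "f16", "i8"] {
        let bytes = payload_bytes(768, q, 100_000).unwrap();
        println!("{}: {:.1} MiB", q, bytes as f64 / (1024.0 * 1024.0));
    }
}
```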

The petgraph adjacency cache used for the hybrid-search centrality boost is built lazily; call vector_refresh_graph_cache after mutating relations to refresh it.

MCP compliance

Implements the Model Context Protocol revision 2025-11-25 over JSON-RPC 2.0, via stdio, TCP, or HTTP.

Area	Support
Transports	stdio, TCP, Streamable HTTP (POST/GET `/mcp`, SSE)
Protocol version	`2025-11-25`, negotiates down to `2025-06-18` / `2025-03-26` / `2024-11-05`
`initialize`	version negotiation + `instructions`
`tools/list`, `tools/call`	26 tools (`mcp-memory`) / 32 tools (`mcp-memory-vec`)
`CallToolResult`	`content[]` + `isError`
Auth	optional bearer token on TCP/HTTP (constant-time)
Capabilities advertised	`tools` only

Tool failures are returned as CallToolResults with isError: true (not as JSON-RPC protocol errors) so the model can self-correct.
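
The version negotiation listed in the table can be pictured as follows. This is a sketch of the rule in the MCP specification (echo the client's version if supported, otherwise answer with a version the server supports, ideally its latest), not code taken from the crate; `negotiate` is a hypothetical name, and the fallback to the newest revision for unknown versions is an assumption about this server.

```rust
/// Revisions from the table above, newest first.
const SUPPORTED: [&str; 4] = ["2025-11-25", "2025-06-18", "2025-03-26", "2024-11-05"];

/// Pick the protocol version to return from `initialize`.
fn negotiate(requested: &str) -> &'static str {
    SUPPORTED
        .iter()
        .copied()
        // Echo the client's version when the server supports it.
        .find(|v| *v == requested)
        // Otherwise offer the newest supported revision (assumed behavior).
        .unwrap_or(SUPPORTED[0])
}

fn main() {
    for requested in ["2025-03-26", "2023-01-01"] {
        println!("{} -> {}", requested, negotiate(requested));
    }
}
```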

Data model

Entity(name, entityType, observations[])   ──relationType──▶   Entity(...)

Entity — a named node with a type (e.g. person, company, project) and free-form observation strings. Names are unique and case-sensitive.
Relation — a directed edge (from, to, relationType). Traversal is undirected (BFS/DFS follow both directions).
Observation — an unstructured fact attached to an entity.
Embedding (vec binary) — a fixed-dimension f32 vector attached to an entity, plus an optional model identifier.
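
The difference between directed storage and undirected traversal is easiest to see in code. Below is a minimal in-memory sketch of a breadth-first path search that follows each edge in both directions. It only illustrates the idea: the real server works against SQLite indexes, and this `find_path` function and its tuple-based edge list are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Shortest path by hop count, treating every directed edge (from, to) as undirected.
fn find_path(edges: &[(&str, &str)], start: &str, goal: &str) -> Option<Vec<String>> {
    // Register each edge under both endpoints so traversal ignores direction.
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        adj.entry(from).or_default().push(to);
        adj.entry(to).or_default().push(from);
    }
    let mut prev: HashMap<&str, &str> = HashMap::new();
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        if node == goal {
            // Walk the predecessor links back to the start, then reverse.
            let mut path = vec![node.to_string()];
            let mut cur = node;
            while let Some(&p) = prev.get(cur) {
                path.push(p.to_string());
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in adj.get(node).into_iter().flatten() {
            if seen.insert(next) {
                prev.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}

fn main() {
    // Both edges point at "acme"; reaching bob from alice needs a reverse hop.
    let edges = [("alice", "acme"), ("bob", "acme")];
    println!("{:?}", find_path(&edges, "alice", "bob"));
}
```

With strictly directed traversal there would be no path from alice to bob in this example, since neither edge leaves "acme".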

Search uses FTS5 full-text indexing with unicode61 remove_diacritics 2 tokenization. Names and observation bodies live in separate external-content FTS5 tables (name_fts, obs_fts).

Storage & performance

SQLite (WAL mode)

A single SQLite database in WAL mode:

Table	Key	Purpose
`entity`	`INTEGER PRIMARY KEY` (rowid)	Primary entity storage; materialized `obs_count`, `out_deg`, `in_deg`; `name_hash` for O(1) routing
`observation`	`entity_id` (FK) + rowid	1:N observations per entity
`relation`	composite indexes	Directed edges; covering indexes `rel_out(from_id,type_id,to_id)` and `rel_in(to_id,type_id,from_id)` for index-only scans
`name_fts`	`content_rowid`	External-content FTS5 over `entity.name`
`obs_fts`	`content_rowid`	External-content FTS5 over `observation.body`
`type_dict`	name	Interned entity/relation types with live counts (loaded into RAM)
`graph_stat`	key (singleton)	`WITHOUT ROWID` counters: entities, relations, observations, sequences
`vector_embedding`	`entity_id`	(vec binary) `dims`, `blob` (f32 vector), `model`, `created_us`

Key pragmas: page_size=16384, journal_mode=WAL, synchronous=NORMAL, cache_size=-50000 (~50 MB), mmap_size=256 MB, temp_store=MEMORY, busy_timeout=5000.

In-memory caches

Cache	Purpose
Entity LRU (10,000 entries)	Avoids deserializing hot entities; stores `EntityMeta{id, type_id, obs_count, out_deg, in_deg}`
Name-hash map	O(1) name-to-ID resolution via 64-bit hash
Prepared-statement cache	Reuses compiled SQLite queries
usearch HNSW index (vec)	In-memory ANN index, rebuilt from `vector_embedding` on startup
petgraph adjacency (vec)	Directed graph cache for the hybrid-search centrality boost

Write batching

Every mutation goes through a layered write path that collapses transaction count from O(N) to O(1) per create_entities / create_relations call:

Batch existence checks in one read transaction
Batch commit of all new entities/relations in one write transaction
Batch FTS index updates in one write transaction
Cache invalidation for affected names
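
The four steps above can be sketched with a toy store that counts transactions instead of talking to SQLite. Everything here is hypothetical (`MockStore`, its fields, and the method body); the point is only that the transaction count stays constant however many entities a call carries.

```rust
use std::collections::HashSet;

/// Toy stand-in for the database; it only tracks names and counts transactions.
#[derive(Default)]
struct MockStore {
    names: HashSet<String>,
    fts_indexed: HashSet<String>,
    cached: HashSet<String>,
    transactions: usize,
}

impl MockStore {
    /// Create a batch of entities in a fixed number of transactions.
    /// Returns how many entities were actually new.
    fn create_entities(&mut self, batch: &[&str]) -> usize {
        // 1. One read transaction: find which names do not exist yet.
        self.transactions += 1;
        let mut seen = HashSet::new();
        let fresh: Vec<&str> = batch
            .iter()
            .copied()
            .filter(|n| !self.names.contains(*n) && seen.insert(*n))
            .collect();
        // 2. One write transaction: commit every new entity together.
        self.transactions += 1;
        for n in &fresh {
            self.names.insert(n.to_string());
        }
        // 3. One write transaction: index the new names for full-text search.
        self.transactions += 1;
        for n in &fresh {
            self.fts_indexed.insert(n.to_string());
        }
        // 4. Drop any stale cache entries for the affected names.
        for n in &fresh {
            self.cached.remove(*n);
        }
        fresh.len()
    }
}

fn main() {
    let mut store = MockStore::default();
    let created = store.create_entities(&["alice", "bob", "alice"]);
    println!("created {} entities in {} transactions", created, store.transactions);
}
```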

Durability

Mode	Behavior	Data-loss window
`async` (default)	Flush to kernel page cache, background sync	Up to ~1 s on power failure
`sync`	fsync on every write	Zero

Select sync mode on mcp-memory by setting MCP_MEMORY_DURABILITY=sync. mcp-memory-vec runs in async mode only.

Background maintenance

A background tokio task runs every 5 minutes: WAL checkpoint (PRAGMA wal_checkpoint(TRUNCATE)), planner analysis (PRAGMA optimize), and FTS optimization.

Benchmarks

Measured end to end with the bench binary against a graph pre-populated with 1,000 entities (5 observations each) and 999 relations, on a MacBook Pro (Apple M1 Pro, 32 GB). Numbers are averages and will vary by hardware; run cargo run --release --bin bench on your own target.

Operation	Avg latency	Notes
`degree` (cache hit)	~44 ns	Materialized column
`relation_type_counts`	~2.3 µs	RAM-cached type dictionary
`get_entity_count`	~3.0 µs	RAM counter
`entity_type_counts`	~4.5 µs	RAM-cached type dictionary
`get_entity` (cache hit)	~5.4 µs	LRU hit; no SQLite I/O
`describe_entity`	~5.4 µs	Entity + incident relations
`search_relations` (from / from+type)	~6.3 µs	Covering index scan
`delete_observations` (1)	~11 µs
`find_all_paths` (A→C, depth 5)	~12 µs	Bounded DFS
`upsert_entities` (type change + obs)	~27 µs
`entities_exist` (10 names)	~38 µs	Hash lookups
`batch_get_entities` (10)	~42 µs	Batch fetch
`neighbors` (depth 1 / depth 2)	~50 µs	Index-only covering scan
`open_nodes` (single / 5 names)	~53–77 µs	LRU + SQLite
`search_nodes` (name match)	~96 µs	FTS5 query + entity lookup
`search_nodes` (obs match)	~161 µs	FTS5 over observation bodies
`add_observations` (2)	~163 µs	Append + FTS index
`find_path` (BFS)	~453 µs	Worst case: full BFS
`search_nodes` (filtered)	~623 µs	FTS5 + type filter
`export` (JSON)	~2.5 ms	Serialize all entities + relations
`read_graph` (all)	~3.4 ms	Full dump
`create_relations` (999)	~10 ms	Batch write + degree updates
`create_entities` (1000)	~41 ms	Batch write + FTS index

Tools

Knowledge-graph tools (both binaries)

Write: create_entities, create_relations, add_observations, delete_entities, delete_observations, delete_relations, upsert_entities, merge_entities, compact.

Read: read_graph, search_nodes, open_nodes, batch_get_entities, get_entity, entity_exists, graph_stats, search_relations, describe_entity, degree, find_path, find_all_paths, extract_subgraph, get_neighbors, list_entity_types, list_relation_types, export_graph.

Vector tools (`mcp-memory-vec` only)

vector_upsert_embedding — attach/replace an embedding on an existing entity
vector_search_entities — top-K nearest entities by vector similarity (optional type filter)
vector_delete_embedding — remove an entity's embedding (entity is kept)
hybrid_search — vector + FTS5 fused by RRF, optional graph-centrality boost
vector_refresh_graph_cache — rebuild the petgraph adjacency cache from relations
vector_store_stats — embedding count, dimension, index/graph sizes

Architecture

main.rs / vec_main.rs
  ├── run_stdio()  — newline-delimited JSON-RPC over stdio
  ├── run_tcp()    — same framing, concurrent connections
  └── run_http()   — MCP Streamable HTTP (axum, POST/GET /mcp)
        └── process_request()
              ├── "initialize"      → protocol version + capabilities
              ├── "tools/list"      → cached tool list
              ├── "tools/call"      → dispatch to handler by name
              ├── "ping"            → null
              └── "notifications/…" → no reply

All transports share the transport-agnostic dispatch core (dispatch_line() / dispatch_http_body()).

Concurrency & locking

GraphHandle uses parking_lot::Mutex for the writer connection and caches; a read-only connection pool serves concurrent reads under WAL.
The VectorStore uses DashMap for name↔ID maps and an RwLock over the petgraph cache; the usearch index is internally synchronized.
Heavy dispatch (graph lock + optional fsync) is offloaded to tokio::task::spawn_blocking to keep the reactor responsive.
Concurrent TCP connections are capped at 128.

Request size limits

Parameter	Limit
Max request body	16 MB
Name max bytes	1,024
Observation max bytes	65,536
Max entities / relations / observations / names per request	1,000
Max search limit	1,000
Max neighbor depth	16
Max `find_all_paths` depth / results	10 / 100
Max embedding dimensions (vec)	4,096
Max `topK` (vec)	100
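
A client can avoid rejected requests by checking a batch against these limits before sending it. The sketch below covers three of the rows (items per request, name bytes, observation bytes); the constant and function names are hypothetical, and it assumes the limits are inclusive, so a 1,024-byte name is accepted.

```rust
const MAX_NAME_BYTES: usize = 1_024;
const MAX_OBSERVATION_BYTES: usize = 65_536;
const MAX_ITEMS_PER_REQUEST: usize = 1_000;

/// Check a batch of (name, observations) pairs against the documented limits.
fn validate_entities(entities: &[(&str, Vec<&str>)]) -> Result<(), String> {
    if entities.len() > MAX_ITEMS_PER_REQUEST {
        return Err(format!("too many entities: {}", entities.len()));
    }
    for (name, observations) in entities {
        // str::len counts bytes, not characters, which matches a byte limit.
        if name.len() > MAX_NAME_BYTES {
            return Err(format!("name is {} bytes", name.len()));
        }
        for obs in observations {
            if obs.len() > MAX_OBSERVATION_BYTES {
                return Err(format!("observation is {} bytes", obs.len()));
            }
        }
    }
    Ok(())
}

fn main() {
    let batch = [("alice", vec!["likes tea", "works at acme"])];
    println!("{:?}", validate_entities(&batch));
}
```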

Development

cargo test                       # 100+ unit + integration tests
cargo clippy                     # lint (lib + binaries)
cargo build --release            # fat LTO, opt-level 3
cargo run --release --bin bench  # standalone benchmark

The test suite covers protocol handling, all tool handlers, CRUD/search/path persistence, concurrency, fuzzy invariant checks, and — for the vector server — end-to-end stdio tool flows, input validation, the tunable index config, and HTTP bearer-token authentication.

Versioning & compatibility

Follows Semantic Versioning. The current line is 3.x, targeting MCP revision 2025-11-25.

mcp-memory	MCP revision (default)	Negotiates
3.x	`2025-11-25`	`2025-06-18`, `2025-03-26`, `2024-11-05`
2.x	`2025-11-25`	`2025-06-18`, `2025-03-26`, `2024-11-05`
≤ 1.x	`2024-11-05`	—

License

Licensed under the Apache License, Version 2.0.

mcp-memory 3.2.0