mcp-memory 3.2.0

MCP server for knowledge graph memory — entities, relations, and observations in SQLite with FTS5 search, plus optional vector/semantic + hybrid search (usearch HNSW)

mcp-memory

A Model Context Protocol (MCP) server that gives LLM agents a persistent knowledge graph memory — entities, relations, and observations stored in an embedded SQLite database with FTS5 full-text search.

The crate ships two binaries:

Binary | What it adds | Tools
mcp-memory | The knowledge-graph server | 26
mcp-memory-vec | Everything in mcp-memory plus vector embeddings and semantic / hybrid search (usearch HNSW) | 32

Both speak MCP over stdio, TCP, and HTTP (with optional bearer-token auth and TLS).

                    ┌────────────────────────────────────────────────┐
                    │         mcp-memory / mcp-memory-vec            │
                    │                                                │
     ┌───────┐      │  ┌──────────┐   ┌─────────────────────────┐   │
     │Claude │──────│─>│  stdio / │──>│ GraphHandle             │   │
     │ / LLM │      │  │  TCP /   │   │  ├ LRU entity cache      │   │
     └───────┘      │  │  HTTP    │   │  ├ FxHashMap name→ID     │   │
                    │  └────┬─────┘   │  └ FTS5 full-text index  │   │
                    │       │         └───────────┬─────────────┘   │
                    │       │   (vec binary only) │                 │
                    │       v         ┌───────────┴─────────────┐   │
                    │  ┌─────────┐    │ VectorStore             │   │
                    │  │ dispatch│───>│  ├ usearch HNSW index    │   │
                    │  └─────────┘    │  └ petgraph adjacency    │   │
                    │       │         └───────────┬─────────────┘   │
                    │       v                     v                  │
                    │  ┌──────────────────────────────────────────┐ │
                    │  │ SQLite (WAL, 16 KB pages)                 │ │
                    │  │ entity, observation, relation, *_fts,     │ │
                    │  │ type_dict, vector_embedding               │ │
                    │  └──────────────────────────────────────────┘ │
                    └────────────────────────────────────────────────┘

Installation

cargo install mcp-memory

This installs both mcp-memory and mcp-memory-vec.

Quick start

# Knowledge-graph server
mcp-memory --transport stdio

# Knowledge-graph + vector search
mcp-memory-vec --transport stdio --embedding-dims 384

The database path is resolved in order:

  1. --memory-file / -f flag
  2. MEMORY_FILE_PATH environment variable (mcp-memory only)
  3. Default: memory.mcpmem in the working directory
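The resolution order above can be sketched as a small pure function. This is an illustration only: resolve_db_path and its two parameters are hypothetical names, not the crate's API. A real binary would take the first argument from its flag parser and the second from std::env::var("MEMORY_FILE_PATH"), passing None for the second on mcp-memory-vec.

```rust
use std::path::PathBuf;

/// Default database file name, taken from the list above.
const DEFAULT_DB_FILE: &str = "memory.mcpmem";

/// Resolve the database path: flag first, then environment value, then default.
/// Both parameters are hypothetical inputs for this sketch.
pub fn resolve_db_path(flag: Option<&str>, env_value: Option<&str>) -> PathBuf {
    flag.or(env_value)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_DB_FILE))
}
```

The default is a relative path, so it resolves against whatever directory the server is started from.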

Both binaries open the same SQLite file, so you can populate the graph with mcp-memory and later serve it with mcp-memory-vec (or run a single mcp-memory-vec for everything).

Transports

Transport | Flag | Description
stdio | --transport stdio | Newline-delimited JSON over stdin/stdout (default, for Claude Desktop / Claude Code)
tcp | --transport tcp --bind 0.0.0.0:8080 | Newline-delimited JSON over TCP, concurrent connections
http | --transport http --bind 0.0.0.0:8080 | MCP Streamable HTTP (POST/GET /mcp, SSE)

Claude Desktop / Claude Code config

{
  "mcpServers": {
    "memory": {
      "command": "mcp-memory"
    }
  }
}

Swap "command" for "mcp-memory-vec" (and add "args": ["--embedding-dims", "384"]) to enable vector search.

Authentication

The tcp and http transports accept an optional bearer token; stdio is never authenticated on either binary. Set the token with --auth-token or --auth-token-file. The file's contents are trimmed, and an empty file is rejected. mcp-memory additionally falls back to the MCP_MEMORY_AUTH_TOKEN environment variable.

mcp-memory      --transport http --bind 0.0.0.0:8080 --auth-token "s3cr3t"
mcp-memory-vec  --transport http --bind 0.0.0.0:8080 --auth-token "s3cr3t"

On HTTP the token is sent as Authorization: Bearer <token>; on TCP it is the first line of the connection. Comparison is constant-time. Binding a non-loopback address without a token exposes the entire graph to the network.
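To make the HTTP check concrete, here is a minimal sketch of parsing an Authorization header value and comparing the token without an early exit on the first mismatching byte. The function names are hypothetical and this is not the server's actual code. The length check does return early, which this sketch accepts; production code would normally rely on a vetted constant-time comparison crate rather than a hand-written loop.

```rust
/// Compare two byte strings without stopping at the first differing byte.
/// The length check leaks only whether the lengths match.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences instead of branching
    }
    diff == 0
}

/// Extract the token from a header value such as "Bearer s3cr3t".
pub fn bearer_token(header_value: &str) -> Option<&str> {
    header_value
        .strip_prefix("Bearer ")
        .map(str::trim)
        .filter(|t| !t.is_empty())
}

/// Return true only when a bearer token is present and matches `expected`.
pub fn is_authorized(header_value: Option<&str>, expected: &str) -> bool {
    match header_value.and_then(bearer_token) {
        Some(token) => constant_time_eq(token.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```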

TLS (HTTPS)

The http transport can be served over TLS (rustls, ring provider). Provide a PEM certificate chain and private key via --tls-cert / --tls-key; both must be supplied together or startup is refused. mcp-memory also accepts the MCP_TLS_CERT / MCP_TLS_KEY environment variables. When neither is set the transport stays plaintext (the default).

mcp-memory-vec --transport http --bind 0.0.0.0:8080 \
  --tls-cert ./cert.pem --tls-key ./key.pem

Vector search (mcp-memory-vec)

mcp-memory-vec layers a vector store on top of the knowledge graph. Each embedding is attached to an existing entity (by name), indexed in an in-memory usearch HNSW index, and persisted as a blob in the vector_embedding SQLite table. On startup the index is rebuilt from those blobs.

  • Bring your own embeddings. The server stores and searches vectors; it does not call an embedding model. Compute embeddings client-side (e.g. with an embedding API) and pass them in. All vectors must match --embedding-dims.
  • Semantic search. vector_search_entities returns the nearest entities by cosine similarity (configurable), optionally filtered by entity type.
  • Hybrid search. hybrid_search runs vector search and FTS5 text search in parallel and fuses the two rankings with Reciprocal Rank Fusion (RRF, constant 60), then optionally boosts results by graph centrality from an in-memory petgraph adjacency cache.
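As a rough illustration of the fusion step in hybrid_search, the sketch below applies Reciprocal Rank Fusion with the constant 60 to any number of ranked lists. The name rrf_fuse, the 1-based ranks, and the alphabetical tie-break are assumptions made for this example rather than details taken from the crate, and the optional centrality boost is left out.

```rust
use std::collections::HashMap;

/// RRF constant, as stated for hybrid_search above.
const RRF_K: f64 = 60.0;

/// Fuse ranked lists of entity names: each appearance of a name at
/// 1-based rank r contributes 1 / (RRF_K + r) to that name's score.
pub fn rrf_fuse(rankings: &[Vec<&str>]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, name) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*name).to_string()).or_insert(0.0) += 1.0 / (RRF_K + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by name so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With a vector ranking of alice, bob, carol and a text ranking of bob, dave, the fused order is bob, alice, dave, carol: bob scores 1/62 + 1/61 because it appears in both lists.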

Vector configuration

The HNSW index is tunable from the command line:

Flag | Default | Meaning
--embedding-dims | 384 | Vector dimension; all embeddings must match
--vec-metric | cos | Distance metric: cos, ip (dot product), or l2sq
--vec-quantization | f32 | Scalar storage: f32, f16, or i8 (lower = less memory)
--vec-connectivity | 16 | HNSW graph degree M (higher = better recall, more memory)
--vec-expansion-add | 200 | HNSW efConstruction (higher = better index quality, slower inserts)
--vec-expansion-search | 50 | HNSW efSearch (higher = better recall, slower queries)

mcp-memory-vec --transport http --bind 0.0.0.0:8080 \
  --embedding-dims 768 --vec-metric cos --vec-quantization f16 \
  --vec-connectivity 32 --vec-expansion-search 128
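For readers less familiar with the three --vec-metric values, the sketch below computes the quantities they are named after, along with the dimension check implied by --embedding-dims. These are textbook definitions written for illustration with hypothetical function names. How usearch turns them into distances internally (for example, one minus the similarity) is not covered here.

```rust
/// Dot product, the quantity behind the `ip` metric.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity, the quantity behind the `cos` metric.
/// Returns 0.0 when either vector has zero norm (a choice made for this sketch).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norms = dot(a, a).sqrt() * dot(b, b).sqrt();
    if norms == 0.0 { 0.0 } else { dot(a, b) / norms }
}

/// Squared Euclidean distance, the quantity behind the `l2sq` metric.
pub fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Reject a vector whose length differs from the configured dimension.
pub fn check_dims(v: &[f32], dims: usize) -> Result<(), String> {
    if v.len() == dims {
        Ok(())
    } else {
        Err(format!("expected {} dimensions, got {}", dims, v.len()))
    }
}
```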

The petgraph adjacency cache used for the hybrid-search centrality boost is built lazily; call vector_refresh_graph_cache after mutating relations to refresh it.

MCP compliance

Implements the Model Context Protocol revision 2025-11-25 over JSON-RPC 2.0, via stdio, TCP, or HTTP.

Area | Support
Transports | stdio, TCP, Streamable HTTP (POST/GET /mcp, SSE)
Protocol version | 2025-11-25, negotiates down to 2025-06-18 / 2025-03-26 / 2024-11-05
initialize | version negotiation + instructions
tools/list, tools/call | 26 tools (mcp-memory) / 32 tools (mcp-memory-vec)
CallToolResult | content[] + isError
Auth | optional bearer token on TCP/HTTP (constant-time)
Capabilities advertised | tools only

Tool failures are returned as CallToolResults with isError: true (not as JSON-RPC protocol errors) so the model can self-correct.
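The shape of such a response can be sketched as follows. The envelope is a normal JSON-RPC result whose payload carries content[] and isError: true. The function names are hypothetical, and the hand-rolled escaping is there only to keep the example free of dependencies; real code would normally use a JSON library.

```rust
/// Minimal JSON string escaping, sufficient for this sketch.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC *result* (not an error object) that reports a tool failure.
pub fn tool_error_response(id: u64, message: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"result":{{"content":[{{"type":"text","text":"{}"}}],"isError":true}}}}"#,
        id,
        json_escape(message)
    )
}
```

Because the failure arrives as ordinary tool output, the model sees the message text and can retry with corrected arguments.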

Data model

Entity(name, entityType, observations[])   ──relationType──▶   Entity(...)
  • Entity — a named node with a type (e.g. person, company, project) and free-form observation strings. Names are unique and case-sensitive.
  • Relation — a directed edge (from, to, relationType). Traversal is undirected (BFS/DFS follow both directions).
  • Observation — an unstructured fact attached to an entity.
  • Embedding (vec binary) — a fixed-dimension f32 vector attached to an entity, plus an optional model identifier.
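The combination of directed edges and undirected traversal can be illustrated with a small breadth-first search. This is a sketch under assumptions: the Relation struct and find_path function below are hypothetical stand-ins that operate on an in-memory slice, not the server's SQLite-backed implementation.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// A directed edge (from, to, relationType), mirroring the data model above.
pub struct Relation {
    pub from: String,
    pub to: String,
    pub relation_type: String,
}

/// Shortest path by BFS, following every edge in both directions.
pub fn find_path(relations: &[Relation], start: &str, goal: &str) -> Option<Vec<String>> {
    // Build an adjacency list that records each edge both ways.
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for r in relations {
        adj.entry(r.from.as_str()).or_default().push(r.to.as_str());
        adj.entry(r.to.as_str()).or_default().push(r.from.as_str());
    }
    let mut prev: HashMap<&str, &str> = HashMap::new();
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue: VecDeque<&str> = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        if node == goal {
            // Walk the predecessor links back to the start.
            let mut path = vec![node.to_string()];
            let mut cur = node;
            while let Some(&p) = prev.get(cur) {
                path.push(p.to_string());
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        if let Some(neighbors) = adj.get(node) {
            for &next in neighbors {
                if seen.insert(next) {
                    prev.insert(next, node);
                    queue.push_back(next);
                }
            }
        }
    }
    None
}
```

Given alice works_at acme and bob works_at acme, a search from alice to bob yields alice, acme, bob even though no edge points out of acme.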

Search uses FTS5 full-text indexing with unicode61 remove_diacritics 2 tokenization. Names and observation bodies live in separate external-content FTS5 tables (name_fts, obs_fts).

Storage & performance

SQLite (WAL mode)

A single SQLite database in WAL mode:

Table | Key | Purpose
entity | INTEGER PRIMARY KEY (rowid) | Primary entity storage; materialized obs_count, out_deg, in_deg; name_hash for O(1) routing
observation | entity_id (FK) + rowid | 1:N observations per entity
relation | composite indexes | Directed edges; covering indexes rel_out(from_id,type_id,to_id) and rel_in(to_id,type_id,from_id) for index-only scans
name_fts | content_rowid | External-content FTS5 over entity.name
obs_fts | content_rowid | External-content FTS5 over observation.body
type_dict | name | Interned entity/relation types with live counts (loaded into RAM)
graph_stat | key (singleton) | WITHOUT ROWID counters: entities, relations, observations, sequences
vector_embedding | entity_id | (vec binary) dims, blob (f32 vector), model, created_us

Key pragmas: page_size=16384, journal_mode=WAL, synchronous=NORMAL, cache_size=-50000 (~50 MB), mmap_size=256 MB, temp_store=MEMORY, busy_timeout=5000.

In-memory caches

Cache | Purpose
Entity LRU (10,000 entries) | Avoids deserializing hot entities; stores EntityMeta{id, type_id, obs_count, out_deg, in_deg}
Name-hash map | O(1) name-to-ID resolution via 64-bit hash
Prepared-statement cache | Reuses compiled SQLite queries
usearch HNSW index (vec) | In-memory ANN index, rebuilt from vector_embedding on startup
petgraph adjacency (vec) | Directed graph cache for the hybrid-search centrality boost

Write batching

Every mutation goes through a layered write path that collapses transaction count from O(N) to O(1) per create_entities / create_relations call:

  1. Batch existence checks in one read transaction
  2. Batch commit of all new entities/relations in one write transaction
  3. Batch FTS index updates in one write transaction
  4. Cache invalidation for affected names

Durability

Mode | Behavior | Data-loss window
async (default) | Flush to kernel page cache, background sync | Up to ~1 s on power failure
sync | fsync before every write | Zero

Select sync mode on mcp-memory by setting MCP_MEMORY_DURABILITY=sync. mcp-memory-vec runs in async mode.

Background maintenance

A background tokio task runs every 5 minutes: WAL checkpoint (PRAGMA wal_checkpoint(TRUNCATE)), planner analysis (PRAGMA optimize), and FTS optimization.

Benchmarks

Measured end-to-end via the bench binary, 1,000 entities (5 observations each) + 999 relations pre-populated, on a MacBook Pro (Apple M1 Pro, 32 GB). Numbers are averages and will vary by hardware — run cargo run --release --bin bench on your own target.

Operation | Avg latency | Notes
degree (cache hit) | ~44 ns | Materialized column
relation_type_counts | ~2.3 µs | RAM-cached type dictionary
get_entity_count | ~3.0 µs | RAM counter
entity_type_counts | ~4.5 µs | RAM-cached type dictionary
get_entity (cache hit) | ~5.4 µs | LRU hit; no SQLite I/O
describe_entity | ~5.4 µs | Entity + incident relations
search_relations (from / from+type) | ~6.3 µs | Covering index scan
delete_observations (1) | ~11 µs |
find_all_paths (A→C, depth 5) | ~12 µs | Bounded DFS
upsert_entities (type change + obs) | ~27 µs |
entities_exist (10 names) | ~38 µs | Hash lookups
batch_get_entities (10) | ~42 µs | Batch fetch
neighbors (depth 1 / depth 2) | ~50 µs | Index-only covering scan
open_nodes (single / 5 names) | ~53–77 µs | LRU + SQLite
search_nodes (name match) | ~96 µs | FTS5 query + entity lookup
add_observations (2) | ~163 µs | Append + FTS index
search_nodes (obs match) | ~161 µs | FTS5 over observation bodies
find_path (BFS) | ~453 µs | Worst case: full BFS
search_nodes (filtered) | ~623 µs | FTS5 + type filter
export (JSON) | ~2.5 ms | Serialize all entities + relations
read_graph (all) | ~3.4 ms | Full dump
create_relations (999) | ~10 ms | Batch write + degree updates
create_entities (1000) | ~41 ms | Batch write + FTS index

Tools

Knowledge-graph tools (both binaries)

Write: create_entities, create_relations, add_observations, delete_entities, delete_observations, delete_relations, upsert_entities, merge_entities, compact.

Read: read_graph, search_nodes, open_nodes, batch_get_entities, get_entity, entity_exists, graph_stats, search_relations, describe_entity, degree, find_path, find_all_paths, extract_subgraph, get_neighbors, list_entity_types, list_relation_types, export_graph.

Vector tools (mcp-memory-vec only)

  • vector_upsert_embedding — attach/replace an embedding on an existing entity
  • vector_search_entities — top-K nearest entities by vector similarity (optional type filter)
  • vector_delete_embedding — remove an entity's embedding (entity is kept)
  • hybrid_search — vector + FTS5 fused by RRF, optional graph-centrality boost
  • vector_refresh_graph_cache — rebuild the petgraph adjacency cache from relations
  • vector_store_stats — embedding count, dimension, index/graph sizes

Architecture

main.rs / vec_main.rs
  ├── run_stdio()  — newline-delimited JSON-RPC over stdio
  ├── run_tcp()    — same framing, concurrent connections
  └── run_http()   — MCP Streamable HTTP (axum, POST/GET /mcp)
        └── process_request()
              ├── "initialize"      → protocol version + capabilities
              ├── "tools/list"      → cached tool list
              ├── "tools/call"      → dispatch to handler by name
              ├── "ping"            → null
              └── "notifications/…" → no reply

All transports share the transport-agnostic dispatch core (dispatch_line() / dispatch_http_body()).

Concurrency & locking

  • GraphHandle uses parking_lot::Mutex for the writer connection and caches; a read-only connection pool serves concurrent reads under WAL.
  • The VectorStore uses DashMap for name↔ID maps and an RwLock over the petgraph cache; the usearch index is internally synchronized.
  • Heavy dispatch (graph lock + optional fsync) is offloaded to tokio::task::spawn_blocking to keep the reactor responsive.
  • TCP connections are capped at 128 concurrent.

Request size limits

Parameter | Limit
Max request body | 16 MB
Name max bytes | 1,024
Observation max bytes | 65,536
Max entities / relations / observations / names per request | 1,000
Max search limit | 1,000
Max neighbor depth | 16
Max find_all_paths depth / results | 10 / 100
Max embedding dimensions (vec) | 4,096
Max topK (vec) | 100
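A client can check a batch against these limits before sending it. The sketch below does so for a create_entities payload. The constants come from the table above, while the NewEntity struct, the function name, and the error strings are hypothetical. Whether the server counts observations per entity or per request is not stated here, so that check is omitted.

```rust
pub const MAX_NAME_BYTES: usize = 1_024;
pub const MAX_OBSERVATION_BYTES: usize = 65_536;
pub const MAX_ITEMS_PER_REQUEST: usize = 1_000;

/// Hypothetical client-side view of one entity in a create_entities call.
pub struct NewEntity {
    pub name: String,
    pub entity_type: String,
    pub observations: Vec<String>,
}

/// Check a batch against the documented byte and count limits.
pub fn validate_create_entities(entities: &[NewEntity]) -> Result<(), String> {
    if entities.len() > MAX_ITEMS_PER_REQUEST {
        return Err(format!("too many entities: {}", entities.len()));
    }
    for e in entities {
        // String::len() is a byte count, matching the byte-based limits.
        if e.name.len() > MAX_NAME_BYTES {
            return Err(format!("name exceeds {} bytes", MAX_NAME_BYTES));
        }
        for o in &e.observations {
            if o.len() > MAX_OBSERVATION_BYTES {
                return Err(format!("observation exceeds {} bytes", MAX_OBSERVATION_BYTES));
            }
        }
    }
    Ok(())
}
```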

Development

cargo test                       # 100+ unit + integration tests
cargo clippy                     # lint (lib + binaries)
cargo build --release            # fat LTO, opt-level 3
cargo run --release --bin bench  # standalone benchmark

The test suite covers protocol handling, all tool handlers, CRUD/search/path persistence, concurrency, fuzzy invariant checks, and — for the vector server — end-to-end stdio tool flows, input validation, the tunable index config, and HTTP bearer-token authentication.

Versioning & compatibility

Follows Semantic Versioning. The current line is 3.x, targeting MCP revision 2025-11-25.

mcp-memory | MCP revision (default) | Negotiates
3.x | 2025-11-25 | 2025-06-18, 2025-03-26, 2024-11-05
2.x | 2025-11-25 | 2025-06-18, 2025-03-26, 2024-11-05
≤ 1.x | 2024-11-05 |
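Version negotiation for the 3.x line can be pictured with the sketch below. It assumes the server echoes a requested revision when it supports it and otherwise answers with its default revision, which is the behaviour the MCP lifecycle rules describe. The function name and the fallback choice are assumptions, not confirmed details of this crate.

```rust
/// Revisions listed for the 3.x line, newest (the default) first.
pub const SUPPORTED_VERSIONS: [&str; 4] =
    ["2025-11-25", "2025-06-18", "2025-03-26", "2024-11-05"];

/// Pick the protocol version to return from `initialize`.
/// Assumption: unknown revisions fall back to the server default.
pub fn negotiate_version(requested: &str) -> &'static str {
    SUPPORTED_VERSIONS
        .iter()
        .copied()
        .find(|v| *v == requested)
        .unwrap_or(SUPPORTED_VERSIONS[0])
}
```

A client that receives a revision it cannot speak is expected to disconnect rather than continue.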

License

Licensed under the Apache License, Version 2.0.