mcp-memory 4.0.1

MCP server for knowledge graph memory — entities, relations, and observations in SQLite with FTS5 search, plus optional vector/semantic + hybrid search (usearch HNSW or IVF-Flat) with batch upsert, more-like-this, recommendations, and MMR diversification
# mcp-memory

A [Model Context Protocol](https://modelcontextprotocol.io) (MCP) server that gives
LLM agents a persistent **knowledge graph memory** — entities, relations, and
observations stored in an embedded SQLite database with FTS5 full-text search.

It is **one unified server** with an opt-in vector subsystem:

| Invocation | What you get | Tools |
|---|---|---|
| `mcp-memory` | The knowledge-graph server | 26 |
| `mcp-memory --vectors` | Everything above **plus** vector embeddings and semantic / hybrid / MMR search (usearch HNSW **or** IVF-Flat) | 38 |
| `mcp-memory-vec` | Backward-compatible alias for `mcp-memory --vectors` | 38 |

> **v4 note:** the former separate `mcp-memory-vec` server has been merged into
> `mcp-memory`. Vectors are now enabled with the `--vectors` flag; `mcp-memory-vec`
> remains as a thin alias that turns the flag on, so existing configs keep working.

It speaks MCP over **stdio, TCP, and HTTP** (with optional bearer-token auth and TLS).

```
                    ┌────────────────────────────────────────────────┐
                    │      mcp-memory  (+ --vectors / -vec alias)     │
                    │                                                │
     ┌───────┐      │  ┌──────────┐   ┌─────────────────────────┐   │
     │Claude │──────│─>│  stdio / │──>│ GraphHandle             │   │
     │ / LLM │      │  │  TCP /   │   │  ├ LRU entity cache      │   │
     └───────┘      │  │  HTTP    │   │  ├ FxHashMap name→ID     │   │
                    │  └────┬─────┘   │  └ FTS5 full-text index  │   │
                    │       │         └───────────┬─────────────┘   │
                    │       │     (--vectors only) │                 │
                    │       v         ┌───────────┴─────────────┐   │
                    │  ┌─────────┐    │ VectorStore             │   │
                    │  │ dispatch│───>│  ├ ANN: HNSW *or* IVF    │   │
                    │  └─────────┘    │  └ petgraph adjacency    │   │
                    │       │         └───────────┬─────────────┘   │
                    │       v                     v                  │
                    │  ┌──────────────────────────────────────────┐ │
                    │  │ SQLite (WAL, 4 KB pages, auto_vacuum)     │ │
                    │  │ entity, observation, relation, *_fts,     │ │
                    │  │ type_dict, vector_embedding               │ │
                    │  └──────────────────────────────────────────┘ │
                    └────────────────────────────────────────────────┘
```

## Installation

```sh
cargo install mcp-memory
```

This installs both `mcp-memory` and `mcp-memory-vec`.

## Quick start

```sh
# Knowledge-graph server
mcp-memory --transport stdio

# Knowledge-graph + vector search
mcp-memory --vectors --transport stdio --embedding-dims 384

# Equivalent backward-compatible alias
mcp-memory-vec --transport stdio --embedding-dims 384
```

The database path is resolved in order:

1. `--memory-file` / `-f` flag
2. `MEMORY_FILE_PATH` environment variable
3. Default: `memory.mcpmem` in the working directory
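
The lookup order above can be sketched as a small pure function. This is an illustration only: `resolve_db_path` is a hypothetical name, and treating an empty string as "unset" is an assumption rather than documented behavior.

```rust
use std::path::PathBuf;

/// Resolve the database path in the documented order:
/// CLI flag, then environment variable, then the default file name.
/// Treating empty strings as unset is an assumption of this sketch.
fn resolve_db_path(flag: Option<&str>, env_value: Option<&str>) -> PathBuf {
    flag.filter(|s| !s.is_empty())
        .or(env_value.filter(|s| !s.is_empty()))
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("memory.mcpmem"))
}
```

A caller would pass the parsed `--memory-file` value and `std::env::var("MEMORY_FILE_PATH").ok().as_deref()`.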

The same SQLite file works with or without `--vectors`, so you can populate the
graph without vectors and later serve it with vectors enabled. With `--vectors`
off, the vector tools are neither advertised in `tools/list` nor served.

### Transports

| Transport | Flag | Description |
|-----------|------|-------------|
| stdio | `--transport stdio` | Newline-delimited JSON over stdin/stdout (default, for Claude Desktop / Claude Code) |
| tcp | `--transport tcp --bind 0.0.0.0:8080` | Newline-delimited JSON over TCP, concurrent connections |
| http | `--transport http --bind 0.0.0.0:8080` | MCP Streamable HTTP (POST/GET `/mcp`, SSE) |

### Claude Desktop / Claude Code config

```json
{
  "mcpServers": {
    "memory": {
      "command": "mcp-memory"
    }
  }
}
```

Add `"args": ["--vectors", "--embedding-dims", "384"]` to enable vector search
(or use `"command": "mcp-memory-vec"`).

### Authentication

The `tcp` and `http` transports accept an optional bearer token (stdio is never
authenticated). Set it with `--auth-token`, with `--auth-token-file` (the file's
contents are trimmed; an empty file is rejected), or with the
`MCP_MEMORY_AUTH_TOKEN` environment variable.

```sh
mcp-memory --transport http --bind 0.0.0.0:8080 --auth-token "s3cr3t"
mcp-memory --vectors --transport http --bind 0.0.0.0:8080 --auth-token "s3cr3t"
```

On HTTP the token is sent as `Authorization: Bearer <token>`; on TCP it is the
first line of the connection. Comparison is constant-time. Binding a non-loopback
address **without** a token exposes the entire graph to the network.
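
To make "constant-time comparison" concrete, here is a minimal sketch of checking an HTTP `Authorization` header. The function names are hypothetical and this is not the server's code; production code would more likely rely on a vetted crate such as `subtle`, because a hand-written loop can be optimized in ways that reintroduce timing differences.

```rust
/// Compare two byte strings without returning at the first mismatch.
/// The length check exits early, so only the token length can leak.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; any difference leaves a set bit.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Check an `Authorization` header value against the expected token.
fn bearer_matches(header: Option<&str>, expected: &str) -> bool {
    match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) => constant_time_eq(token.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```

A missing header, a header without the `Bearer ` prefix, and a wrong token are all rejected the same way.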

### TLS (HTTPS)

The `http` transport can be served over TLS (rustls, `ring` provider). Provide a
PEM certificate chain and private key via `--tls-cert` / `--tls-key`; both must be
supplied together or startup is refused. The `MCP_TLS_CERT` / `MCP_TLS_KEY`
environment variables are accepted as fallbacks. When neither is set the transport
stays plaintext (the default).

```sh
mcp-memory --transport http --bind 0.0.0.0:8080 \
  --tls-cert ./cert.pem --tls-key ./key.pem
```

## Vector search (`--vectors`)

With `--vectors`, the server layers a vector store on top of the knowledge graph.
Each embedding is attached to an **existing** entity (by name), indexed in an
in-memory ANN index, and persisted as a blob in the `vector_embedding` SQLite
table. On startup the index is rebuilt from those blobs.

- **Bring your own embeddings.** The server stores and searches vectors; it does
  not call an embedding model. Compute embeddings client-side (e.g. with an
  embedding API) and pass them in. All vectors must match `--embedding-dims`.
- **Semantic search.** `vector_search_entities` returns the nearest entities by
  cosine similarity (configurable), optionally filtered by entity type.
- **More-like-this & recommendations.** `vector_search_by_entity` finds entities
  similar to a given entity's own embedding; `vector_recommend` builds a query from
  positive (minus negative) example entities.
- **MMR diversification.** `vector_mmr_search` returns results that balance
  relevance against novelty (Maximal Marginal Relevance), a common RAG
  context-selection step that suppresses near-duplicate hits.
- **Batch ingestion.** `vector_batch_upsert` upserts up to 1,024 embeddings per
  call, reporting per-item failures instead of aborting.
- **Hybrid search.** `hybrid_search` runs vector search and FTS5 text search in
  parallel and fuses the two rankings with Reciprocal Rank Fusion (RRF, constant
  60), then optionally boosts results by graph centrality from an in-memory
  petgraph adjacency cache.
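
As an illustration of the fusion step, the following is a minimal sketch of Reciprocal Rank Fusion, where each item scores the sum of `1 / (k + rank)` over the lists it appears in. `rrf_fuse` is a hypothetical helper, not the server's implementation, and it leaves out the graph-centrality boost.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists with Reciprocal Rank Fusion.
/// Each list is ordered best-first and ranks are 1-based.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, name) in ranking.iter().enumerate() {
            *scores.entry(*name).or_insert(0.0) += 1.0 / (k + (i as f64) + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> =
        scores.into_iter().map(|(n, s)| (n.to_string(), s)).collect();
    // Highest fused score first; ties are broken by name for a stable order.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With a vector ranking `["a", "b", "c"]`, a text ranking `["b", "d", "a"]`, and `k = 60`, the fused order is `b`, `a`, `d`, `c`: items found by both searches outrank items found by only one.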

### Index backends: HNSW vs IVF-Flat

Two ANN backends are available via `--vec-index`:

| Backend | When to use | Notes |
|---|---|---|
| `hnsw` *(default)* | Best recall/latency for most workloads | [usearch](https://github.com/unum-cloud/usearch) graph index; supports `f16`/`bf16`/`i8` quantization |
| `ivf` | Large, batch-ingested, periodically-rebuilt corpora | k-means partitioned (IVF-Flat); cheaper to build and lighter on memory. **Exact (brute-force) until trained**, so an untrained index still returns correct results |

The IVF index trains automatically when a populated database is opened. After a
large batch ingestion into a fresh database, call `vector_reindex` to (re)run
k-means and keep recall high (no-op for HNSW).
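
To show why an untrained IVF index is exact and why `--ivf-nprobe` trades recall for speed, here is a toy IVF-Flat search. It is a sketch under stated assumptions: the `IvfFlat` struct and its fields are hypothetical, it always uses squared Euclidean distance, and the k-means training that would produce the centroids is omitted.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Toy IVF-Flat index: `cells[i]` lists the vector ids assigned to `centroids[i]`.
/// An empty `centroids` means the index has not been trained yet.
struct IvfFlat {
    vectors: Vec<Vec<f32>>,
    centroids: Vec<Vec<f32>>,
    cells: Vec<Vec<usize>>,
}

impl IvfFlat {
    /// Return the ids of the `k` nearest vectors, scanning `nprobe` cells.
    fn search(&self, query: &[f32], k: usize, nprobe: usize) -> Vec<usize> {
        let candidates: Vec<usize> = if self.centroids.is_empty() {
            // Untrained: scan every vector, so the result is exact.
            (0..self.vectors.len()).collect()
        } else {
            // Trained: rank cells by centroid distance and scan only the closest.
            let mut order: Vec<usize> = (0..self.centroids.len()).collect();
            order.sort_by(|&a, &b| {
                l2sq(query, &self.centroids[a]).total_cmp(&l2sq(query, &self.centroids[b]))
            });
            order
                .into_iter()
                .take(nprobe)
                .flat_map(|c| self.cells[c].iter().copied())
                .collect()
        };
        let mut scored: Vec<(f32, usize)> = candidates
            .into_iter()
            .map(|id| (l2sq(query, &self.vectors[id]), id))
            .collect();
        scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
        scored.into_iter().take(k).map(|(_, id)| id).collect()
    }
}
```

A neighbor that lives in a cell outside the probed set is missed, which is why stale centroids after a large ingest reduce recall until the index is retrained.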

### Vector configuration

The index is tunable from the command line (all require `--vectors`):

| Flag | Default | Meaning |
|---|---|---|
| `--embedding-dims` | `384` | Vector dimension; all embeddings must match |
| `--vec-index` | `hnsw` | ANN backend: `hnsw` or `ivf` |
| `--vec-metric` | `cos` | Distance metric: `cos`, `ip` (dot product), or `l2sq` |
| `--vec-quantization` | `f32` | HNSW scalar storage: `f32`, `f16`, `bf16`, or `i8` (lower = less memory) |
| `--vec-connectivity` | `16` | HNSW graph degree `M` (higher = better recall, more memory) |
| `--vec-expansion-add` | `200` | HNSW `efConstruction` (higher = better index quality, slower inserts) |
| `--vec-expansion-search` | `50` | HNSW `efSearch` (higher = better recall, slower queries) |
| `--ivf-nlist` | `256` | IVF number of Voronoi cells / centroids |
| `--ivf-nprobe` | `8` | IVF cells probed per query (higher = better recall, slower) |

```sh
# HNSW with half-precision storage
mcp-memory --vectors --transport http --bind 0.0.0.0:8080 \
  --embedding-dims 768 --vec-metric cos --vec-quantization f16 \
  --vec-connectivity 32 --vec-expansion-search 128

# IVF-Flat for a large corpus
mcp-memory --vectors --embedding-dims 768 \
  --vec-index ivf --ivf-nlist 1024 --ivf-nprobe 16
```

The petgraph adjacency cache used for the hybrid-search centrality boost is built
lazily; call `vector_refresh_graph_cache` after mutating relations to refresh it.

## MCP compliance

Implements the [Model Context Protocol](https://modelcontextprotocol.io) revision
**`2025-11-25`** over JSON-RPC 2.0, via stdio, TCP, or HTTP.

| Area | Support |
|---|---|
| Transports | stdio, TCP, **Streamable HTTP** (POST/GET `/mcp`, SSE) |
| Protocol version | `2025-11-25`, negotiates down to `2025-06-18` / `2025-03-26` / `2024-11-05` |
| `initialize` | version negotiation + `instructions` |
| `tools/list`, `tools/call` | 26 tools (KG only) / 38 tools (with `--vectors`) |
| `CallToolResult` | `content[]` + `isError` |
| Auth | optional bearer token on TCP/HTTP (constant-time) |
| Capabilities advertised | `tools` only |
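
Version negotiation in the table above follows the usual MCP rule: echo the client's requested revision when it is supported, otherwise answer with the server's newest one. A minimal sketch, with `negotiate` as a hypothetical function name:

```rust
/// Revisions listed in the table above, newest first.
const SUPPORTED: [&str; 4] = ["2025-11-25", "2025-06-18", "2025-03-26", "2024-11-05"];

/// Echo the client's revision when supported, otherwise offer the newest one.
fn negotiate(requested: &str) -> &'static str {
    SUPPORTED
        .iter()
        .copied()
        .find(|v| *v == requested)
        .unwrap_or(SUPPORTED[0])
}
```

A client that receives a revision it cannot speak is expected to disconnect.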

Tool failures are returned as `CallToolResult`s with `isError: true` (not as
JSON-RPC protocol errors) so the model can self-correct.
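
The shape of such a response is sketched below: a successful JSON-RPC reply whose `result` carries the failure. The helper names are hypothetical, and the hand-rolled JSON is only there to keep the example free of dependencies; a real server would more likely serialize with `serde_json`.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC success response whose result is a failed tool call.
fn tool_error_result(id: u64, message: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"result":{{"content":[{{"type":"text","text":"{}"}}],"isError":true}}}}"#,
        id,
        json_escape(message)
    )
}
```

Because the error text arrives as ordinary tool output, the model can read it and retry with corrected arguments.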

## Data model

```
Entity(name, entityType, observations[])   ──relationType──▶   Entity(...)
```

- **Entity** — a named node with a type (e.g. `person`, `company`, `project`) and
  free-form observation strings. Names are unique and case-sensitive.
- **Relation** — a directed edge `(from, to, relationType)`. Traversal is
  undirected (BFS/DFS follow both directions).
- **Observation** — an unstructured fact attached to an entity.
- **Embedding** *(`--vectors`)* — a fixed-dimension `f32` vector attached to an
  entity, plus an optional model identifier.
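
To illustrate what undirected traversal over directed edges means, here is a minimal breadth-first path search over an in-memory relation list. The `Relation` struct and `find_path` function are hypothetical stand-ins for illustration and do not reflect the server's storage layer.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// A directed edge between two entity names.
#[allow(dead_code)]
struct Relation {
    from: String,
    to: String,
    relation_type: String,
}

/// Shortest path by hop count, following relations in both directions.
fn find_path(relations: &[Relation], start: &str, goal: &str) -> Option<Vec<String>> {
    // Build an undirected adjacency list from the directed edges.
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for r in relations {
        adj.entry(r.from.as_str()).or_default().push(r.to.as_str());
        adj.entry(r.to.as_str()).or_default().push(r.from.as_str());
    }
    let mut parent: HashMap<&str, &str> = HashMap::new();
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        if node == goal {
            // Walk the parent links back to the start, then reverse.
            let mut path = vec![node.to_string()];
            let mut cur = node;
            while let Some(&p) = parent.get(cur) {
                path.push(p.to_string());
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in adj.get(node).into_iter().flatten() {
            if seen.insert(next) {
                parent.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}
```

With `alice → acme` and `bob → acme` as the only relations, a path from `alice` to `bob` exists only because the second edge is followed against its direction.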

Search uses FTS5 full-text indexing with `unicode61 remove_diacritics 2`
tokenization. Names and observation bodies live in separate external-content FTS5
tables (`name_fts`, `obs_fts`).

## Storage & performance

### SQLite (WAL mode)

A single SQLite database in WAL mode:

| Table | Key | Purpose |
|---|---|---|
| `entity` | `INTEGER PRIMARY KEY` (rowid) | Primary entity storage; materialized `obs_count`, `out_deg`, `in_deg`; `name_hash` for O(1) routing |
| `observation` | `entity_id` (FK) + rowid | 1:N observations per entity |
| `relation` | composite indexes | Directed edges; covering indexes `rel_out(from_id,type_id,to_id)` and `rel_in(to_id,type_id,from_id)` for index-only scans |
| `name_fts` | `content_rowid` | External-content FTS5 over `entity.name` |
| `obs_fts` | `content_rowid` | External-content FTS5 over `observation.body` |
| `type_dict` | name | Interned entity/relation types with live counts (loaded into RAM) |
| `graph_stat` | key (singleton) | `WITHOUT ROWID` counters: entities, relations, observations, sequences |
| `vector_embedding` | `entity_id` | *(`--vectors`)* `dims`, `blob` (f32 vector), `model`, `created_us` |

Key pragmas (defaults, all tunable via flags): `page_size=4096`,
`journal_mode=WAL`, `auto_vacuum=INCREMENTAL`, `synchronous=NORMAL`,
`cache_size=-50000` (~50 MB, `--cache-size-mb`), `mmap_size=256 MB`
(`--mmap-size`), `temp_store=MEMORY`, `busy_timeout=5000` (`--busy-timeout-ms`).
A background `wal_checkpoint(PASSIVE)` runs every `--wal-flush-ms` (default 250 ms)
to bound the async durability window.
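
The negative `cache_size` deserves a note: SQLite reads a negative value as a memory budget in KiB and a positive value as a page count. The sketch below works through the arithmetic for the defaults above; `cache_pages` is a hypothetical helper.

```rust
/// Convert a `PRAGMA cache_size` value into a page count.
/// Negative values are a budget in KiB; positive values are already pages.
fn cache_pages(cache_size: i64, page_size: u64) -> u64 {
    if cache_size < 0 {
        cache_size.unsigned_abs() * 1024 / page_size
    } else {
        cache_size as u64
    }
}
```

With `cache_size=-50000` and 4 KB pages this gives 51,200,000 bytes, or 12,500 cached pages.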

### In-memory caches

| Cache | Purpose |
|---|---|
| Entity LRU (10,000 entries) | Avoids deserializing hot entities; stores `EntityMeta{id, type_id, obs_count, out_deg, in_deg}` |
| Name-hash map | O(1) name-to-ID resolution via 64-bit hash |
| Prepared-statement cache | Reuses compiled SQLite queries |
| ANN index *(`--vectors`)* | In-memory HNSW or IVF-Flat index, rebuilt from `vector_embedding` on startup |
| petgraph adjacency *(`--vectors`)* | Directed graph cache for the hybrid-search centrality boost |

### Write batching

Every mutation goes through a layered write path that collapses transaction count
from O(N) to O(1) per `create_entities` / `create_relations` call:

1. Batch existence checks in one read transaction
2. Batch commit of all new entities/relations in one write transaction
3. Batch FTS index updates in one write transaction
4. Cache invalidation for affected names
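
The four steps can be mimicked with an in-memory stand-in to show why the transaction count stays constant per call. Everything here is hypothetical: `Store`, its fields, and the `transactions` counter are assumptions for illustration, with hash sets playing the role of the SQLite tables, FTS index, and cache.

```rust
use std::collections::HashSet;

#[derive(Default)]
struct Store {
    entities: HashSet<String>, // stands in for the `entity` table
    indexed: HashSet<String>,  // stands in for the FTS index
    cache: HashSet<String>,    // stands in for cached names
    transactions: usize,       // counts simulated transactions
}

impl Store {
    /// Create the entities that do not exist yet and return their names.
    fn create_entities(&mut self, names: &[&str]) -> Vec<String> {
        // 1. One read transaction finds every name that is genuinely new.
        self.transactions += 1;
        let mut seen = HashSet::new();
        let new: Vec<String> = names
            .iter()
            .filter(|n| !self.entities.contains(**n) && seen.insert(**n))
            .map(|n| n.to_string())
            .collect();
        // 2. One write transaction commits all new entities.
        self.transactions += 1;
        self.entities.extend(new.iter().cloned());
        // 3. One write transaction updates the full-text index.
        self.transactions += 1;
        self.indexed.extend(new.iter().cloned());
        // 4. Drop any cached state for the affected names.
        for n in &new {
            self.cache.remove(n);
        }
        new
    }
}
```

Whether a call carries three names or a thousand, it costs the same three simulated transactions.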

### Durability

| Mode | Behavior | Data-loss window |
|---|---|---|
| `async` (default) | Flush to kernel page cache, background sync | Up to ~1 s on power failure |
| `sync` | fsync before every write | Zero |

Set via the `MCP_MEMORY_DURABILITY=sync` environment variable (applies whether or
not `--vectors` is on).

### Background maintenance

A background tokio task runs every 5 minutes: WAL checkpoint
(`PRAGMA wal_checkpoint(TRUNCATE)`), planner analysis (`PRAGMA optimize`), and FTS
optimization.

## Benchmarks

Measured end-to-end via the `bench` binary, 1,000 entities (5 observations each) +
999 relations pre-populated, on a **MacBook Pro (Apple M1 Pro, 32 GB)**. Numbers
are averages and will vary by hardware — run `cargo run --release --bin bench` on
your own target.

| Operation | Avg latency | Notes |
|---|---|---|
| `degree` (cache hit) | ~44 ns | Materialized column |
| `relation_type_counts` | ~2.3 µs | RAM-cached type dictionary |
| `get_entity_count` | ~3.0 µs | RAM counter |
| `entity_type_counts` | ~4.5 µs | RAM-cached type dictionary |
| `get_entity` (cache hit) | ~5.4 µs | LRU hit; no SQLite I/O |
| `describe_entity` | ~5.4 µs | Entity + incident relations |
| `search_relations` (from / from+type) | ~6.3 µs | Covering index scan |
| `delete_observations` (1) | ~11 µs | |
| `find_all_paths` (A→C, depth 5) | ~12 µs | Bounded DFS |
| `upsert_entities` (type change + obs) | ~27 µs | |
| `entities_exist` (10 names) | ~38 µs | Hash lookups |
| `batch_get_entities` (10) | ~42 µs | Batch fetch |
| `neighbors` (depth 1 / depth 2) | ~50 µs | Index-only covering scan |
| `open_nodes` (single / 5 names) | ~53–77 µs | LRU + SQLite |
| `search_nodes` (name match) | ~96 µs | FTS5 query + entity lookup |
| `add_observations` (2) | ~163 µs | Append + FTS index |
| `search_nodes` (obs match) | ~161 µs | FTS5 over observation bodies |
| `find_path` (BFS) | ~453 µs | Worst case: full BFS |
| `search_nodes` (filtered) | ~623 µs | FTS5 + type filter |
| `export` (JSON) | ~2.5 ms | Serialize all entities + relations |
| `read_graph` (all) | ~3.4 ms | Full dump |
| `create_relations` (999) | ~10 ms | Batch write + degree updates |
| `create_entities` (1000) | ~41 ms | Batch write + FTS index |

## Tools

### Knowledge-graph tools (always available)

**Write:** `create_entities`, `create_relations`, `add_observations`,
`delete_entities`, `delete_observations`, `delete_relations`, `upsert_entities`,
`merge_entities`, `compact`.

**Read:** `read_graph`, `search_nodes`, `open_nodes`, `batch_get_entities`,
`get_entity`, `entity_exists`, `graph_stats`, `search_relations`,
`describe_entity`, `degree`, `find_path`, `find_all_paths`, `extract_subgraph`,
`get_neighbors`, `list_entity_types`, `list_relation_types`, `export_graph`.

### Vector tools (`--vectors` only)

- `vector_upsert_embedding` — attach/replace an embedding on an existing entity
- `vector_batch_upsert` — bulk-upsert up to 1,024 embeddings; per-item error reporting
- `vector_get_embedding` — fetch the stored embedding (and model) for an entity
- `vector_search_entities` — top-K nearest entities by vector similarity (optional type filter)
- `vector_search_by_entity` — "more like this": nearest to an entity's own embedding
- `vector_recommend` — example-based recommendation from positive/negative entities
- `vector_mmr_search` — diversified retrieval via Maximal Marginal Relevance (`lambda`)
- `hybrid_search` — vector + FTS5 fused by RRF, optional graph-centrality boost
- `vector_delete_embedding` — remove an entity's embedding (entity is kept)
- `vector_reindex` — retrain the IVF index over current vectors (no-op for HNSW)
- `vector_refresh_graph_cache` — rebuild the petgraph adjacency cache from relations
- `vector_store_stats` — embedding count, dimension, backend kind, index/graph sizes
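
To clarify what the `lambda` parameter of `vector_mmr_search` controls, here is a minimal sketch of greedy Maximal Marginal Relevance. Each step picks the candidate that maximizes `lambda * relevance - (1 - lambda) * redundancy`. The `mmr` function is hypothetical and assumes cosine similarity; it is not the server's implementation.

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Greedy MMR: return the indices of up to `k` candidates in selection order.
/// `lambda = 1.0` is pure relevance; lower values favor diversity.
fn mmr(query: &[f32], candidates: &[Vec<f32>], k: usize, lambda: f32) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..candidates.len()).collect();
    while selected.len() < k && !remaining.is_empty() {
        let mut best_pos = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (pos, &i) in remaining.iter().enumerate() {
            let relevance = cosine(query, &candidates[i]);
            // Redundancy is the highest similarity to anything already chosen.
            let redundancy = if selected.is_empty() {
                0.0
            } else {
                selected
                    .iter()
                    .map(|&j| cosine(&candidates[i], &candidates[j]))
                    .fold(f32::NEG_INFINITY, f32::max)
            };
            let score = lambda * relevance - (1.0 - lambda) * redundancy;
            if score > best_score {
                best_score = score;
                best_pos = pos;
            }
        }
        selected.push(remaining.remove(best_pos));
    }
    selected
}
```

Given a query `[1.0, 1.0]` and candidates `[1.0, 0.9]`, `[1.0, 0.85]`, `[0.2, 1.0]`, `lambda = 1.0` selects the two near-duplicates (indices 0 and 1), while `lambda = 0.5` replaces the second with the more distinct index 2.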

## Architecture

```
main.rs / vec_main.rs → MCPServer { kg, vs: Option<VectorStore> }
  ├── run_stdio()  — newline-delimited JSON-RPC over stdio
  ├── run_tcp()    — same framing, concurrent connections
  └── run_http()   — MCP Streamable HTTP (axum, POST/GET /mcp)
        └── process_request()
              ├── "initialize"      → protocol version + capabilities
              ├── "tools/list"      → cached tool list
              ├── "tools/call"      → dispatch to handler by name
              ├── "ping"            → null
              └── "notifications/…" → no reply
```

All transports share the transport-agnostic dispatch core
(`dispatch_line()` / `dispatch_http_body()`).

### Concurrency & locking

- `GraphHandle` uses `parking_lot::Mutex` for the writer connection and caches; a
  read-only connection pool serves concurrent reads under WAL.
- The `VectorStore` uses `DashMap` for name↔ID maps and an `RwLock` over the
  petgraph cache; the HNSW index is internally synchronized, while the IVF index
  sits behind its own `RwLock`. Vector tools are gated behind `--vectors`; a
  pure-KG server carries no vector state.
- Heavy dispatch (graph lock + optional fsync) is offloaded to
  `tokio::task::spawn_blocking` to keep the reactor responsive.
- TCP connections are capped at 128 concurrent.

### Request size limits

| Parameter | Limit |
|---|---|
| Max request body | 16 MB |
| Name max bytes | 1,024 |
| Observation max bytes | 65,536 |
| Max entities / relations / observations / names per request | 1,000 |
| Max search limit | 1,000 |
| Max neighbor depth | 16 |
| Max `find_all_paths` depth / results | 10 / 100 |
| Max embedding dimensions *(`--vectors`)* | 4,096 |
| Max `topK` *(`--vectors`)* | 100 |
| Max items per `vector_batch_upsert` | 1,024 |
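
A client can check a batch against these limits before sending it. The sketch below covers three of them; `EntityInput` and `validate_entities` are hypothetical names, and the error messages are illustrative rather than the server's own.

```rust
const MAX_NAME_BYTES: usize = 1_024;
const MAX_OBSERVATION_BYTES: usize = 65_536;
const MAX_ITEMS_PER_REQUEST: usize = 1_000;

struct EntityInput<'a> {
    name: &'a str,
    observations: Vec<&'a str>,
}

/// Reject a batch that would exceed the documented request limits.
/// `str::len` counts bytes, which matches how the limits are stated.
fn validate_entities(batch: &[EntityInput<'_>]) -> Result<(), String> {
    if batch.len() > MAX_ITEMS_PER_REQUEST {
        return Err(format!("too many entities: {} > {}", batch.len(), MAX_ITEMS_PER_REQUEST));
    }
    for entity in batch {
        if entity.name.len() > MAX_NAME_BYTES {
            return Err(format!("name exceeds {} bytes", MAX_NAME_BYTES));
        }
        if entity.observations.iter().any(|o| o.len() > MAX_OBSERVATION_BYTES) {
            return Err(format!("observation exceeds {} bytes", MAX_OBSERVATION_BYTES));
        }
    }
    Ok(())
}
```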

## Development

```sh
cargo test                       # 100+ unit + integration tests
cargo clippy                     # lint (lib + binaries)
cargo build --release            # fat LTO, opt-level 3
cargo run --release --bin bench  # standalone benchmark
```

The test suite covers protocol handling, all tool handlers, CRUD/search/path
persistence, concurrency, fuzzy invariant checks, and — for the vector subsystem —
the IVF-Flat index (training, probe search, upsert/remove, metrics), both ANN
backends end-to-end, the modern retrieval tools (batch upsert, more-like-this,
recommend, MMR), vector gating when `--vectors` is off, input validation, the
tunable index config, and HTTP bearer-token authentication.

## Versioning & compatibility

Follows [Semantic Versioning](https://semver.org). The current line is **4.x**,
targeting MCP revision `2025-11-25`.

**4.0 breaking changes:** the separate `mcp-memory-vec` *server* is gone — vectors
are now an opt-in subsystem of `mcp-memory` behind `--vectors`. The
`mcp-memory-vec` binary remains as a thin alias (`= mcp-memory --vectors`), so
existing configs and the shared on-disk format are unaffected. Newly created
databases default to 4 KB SQLite pages (was 16 KB) and `auto_vacuum=INCREMENTAL`;
existing databases keep their original page size.

| mcp-memory | MCP revision (default) | Negotiates |
|---|---|---|
| 4.x | `2025-11-25` | `2025-06-18`, `2025-03-26`, `2024-11-05` |
| 3.x | `2025-11-25` | `2025-06-18`, `2025-03-26`, `2024-11-05` |
| 2.x | `2025-11-25` | `2025-06-18`, `2025-03-26`, `2024-11-05` |
| ≤ 1.x | `2024-11-05` | |

## License

Licensed under the [Apache License, Version 2.0](LICENSE).