# memory-mcp
A semantic memory server for AI coding agents. Memories are stored as markdown files in a git repository and indexed for semantic retrieval using local embeddings — no API keys, no cloud dependency for inference.
Built on the Model Context Protocol (MCP) so any compatible agent (Claude Code, Cursor, Windsurf, custom agents) can remember, recall, and sync knowledge across sessions and devices.
## Why
AI coding agents are stateless between sessions. They lose context about your preferences, your codebase's architecture, past decisions, and hard-won debugging knowledge. memory-mcp gives agents a persistent, searchable memory that:
- Survives across sessions — what an agent learned yesterday is available today
- Syncs across devices — git push/pull keeps memories consistent everywhere
- Stays private — embeddings run locally (no data leaves your machine), storage is a git repo you control
- Scales with you — semantic search finds relevant memories even as the collection grows into hundreds or thousands
## Quick start
### Install from crates.io
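Assuming the crate is published under the same name as the project:

```sh
cargo install memory-mcp
```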
### Or from source
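A sketch of a source build with Cargo (`<repository-url>` is a placeholder for the project's git URL):

```sh
git clone <repository-url>
cd memory-mcp
cargo install --path .
```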
### Run the server
On first run, the embedding model (~130MB) is downloaded from HuggingFace Hub. You can pre-download it with `memory-mcp warmup`.
```sh
# Starts on 127.0.0.1:8080 with a local git repo at ~/.memory-mcp
memory-mcp

# Or configure via environment variables
MEMORY_MCP_BIND=0.0.0.0:9090 \
MEMORY_MCP_REPO_PATH=/path/to/memories \
memory-mcp
```
### Connect your editor
memory-mcp uses Streamable HTTP transport. Most MCP clients support it natively.
Add to ~/.claude.json or your project's .mcp.json:
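A minimal entry might look like this (assuming the server is running on the default address):

```json
{
  "mcpServers": {
    "memory": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}
```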
Add to .cursor/mcp.json (project) or ~/.cursor/mcp.json (global):
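A minimal entry might look like:

```json
{
  "mcpServers": {
    "memory": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
```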
Add to .vscode/mcp.json in your workspace:
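One possible configuration (assuming the default server address):

```json
{
  "servers": {
    "memory": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}
```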
Note: VS Code uses "servers" as the root key, not "mcpServers".
Add to ~/.codeium/windsurf/mcp_config.json:
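One possible configuration:

```json
{
  "mcpServers": {
    "memory": {
      "serverUrl": "http://localhost:8080/mcp"
    }
  }
}
```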
Note: Windsurf uses "serverUrl", not "url".
Add to .continue/mcpServers/memory.yaml:
```yaml
mcpServers:
  - name: memory
    type: streamable-http
    url: http://localhost:8080/mcp
```
Add via Settings > Connectors > Add custom server with URL http://localhost:8080/mcp.
Alternatively, use mcp-remote as a stdio bridge in claude_desktop_config.json:
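A sketch of the bridged configuration (using `npx` to run `mcp-remote`):

```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:8080/mcp"]
    }
  }
}
```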
Zed does not yet support Streamable HTTP natively. Use mcp-remote as a stdio bridge in ~/.config/zed/settings.json:
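Zed's settings schema has changed across versions; one possible shape (check Zed's current documentation) is:

```json
{
  "context_servers": {
    "memory": {
      "command": {
        "path": "npx",
        "args": ["mcp-remote", "http://localhost:8080/mcp"]
      }
    }
  }
}
```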
The agent can now use the `remember`, `recall`, `read`, `edit`, `forget`, `list`, and `sync` tools.
## Tools
| Tool | Description |
|---|---|
| `remember` | Store a new memory with content, name, tags, and scope. Embeds and indexes it for semantic search. |
| `recall` | Search memories by natural-language query. Returns the top matches ranked by semantic similarity. |
| `read` | Fetch a specific memory by name with full content and metadata. |
| `edit` | Update an existing memory. Supports partial updates — omit fields to preserve them. |
| `forget` | Delete a memory by name. Removes from git and the search index. |
| `list` | Browse all memories, optionally filtered by scope. |
| `sync` | Push/pull the memory repo with a git remote. Handles conflicts via recency-based resolution. |
Example: agent remembers a debugging insight
Tool: `remember`

```json
{
  "name": "postgres/connection-pool-timeout",
  "content": "When the connection pool times out under load, the issue is usually...",
  "tags": ["postgres", "debugging", "performance"],
  "scope": "project:my-api"
}
```
Example: agent recalls relevant context
Tool: `recall`

```json
{
  "query": "database connection issues under high load",
  "scope": "project:my-api",
  "limit": 5
}
```
## How it works
```
Agent ──MCP──▶ memory-mcp ──▶ candle (local BERT embeddings)
                   │                        │
                   ▼                        ▼
               git repo            usearch HNSW index
           (markdown files)         (semantic search)
                   │
                   ▼
              git remote
       (sync across devices)
```
- Storage: memories are markdown files with YAML frontmatter (tags, scope, timestamps) committed to a local git repository
- Embeddings: content is embedded locally using candle with a BERT model — no external API calls
- Search: embeddings are indexed in an HNSW graph (usearch) for fast approximate nearest-neighbor search
- Sync: the git repo can push/pull to a remote (GitHub, GitLab, etc.) for cross-device sync with automatic conflict resolution
- Auth: GitHub tokens via OAuth device flow (`memory-mcp auth login`), stored in the system keyring or a Kubernetes Secret
## Memory format
```markdown
---
id: 550e8400-e29b-41d4-a716-446655440000
name: postgres/connection-pool-timeout
tags: [postgres, debugging, performance]
scope:
  type: Project
  name: my-api
created_at: 2026-03-18T12:00:00Z
updated_at: 2026-03-18T12:00:00Z
source: debugging-session
---

When the connection pool times out under load, the issue is usually...
```
## Scoping
Memories are scoped to control visibility:
- `global` — available to all projects (preferences, standards, general knowledge)
- `project:{name}` — scoped to a specific project (architecture decisions, debugging context, team conventions)
## Configuration
All options can be set via CLI flags or environment variables:
| Flag | Env var | Default | Description |
|---|---|---|---|
| `--bind` | `MEMORY_MCP_BIND` | `127.0.0.1:8080` | Address to bind the HTTP server |
| `--repo-path` | `MEMORY_MCP_REPO_PATH` | `~/.memory-mcp` | Path to the git-backed memory repository |
| `--mcp-path` | `MEMORY_MCP_PATH` | `/mcp` | URL path for the MCP endpoint |
| `--remote-url` | `MEMORY_MCP_REMOTE_URL` | (none) | Git remote URL. Omit for local-only mode. |
| `--branch` | `MEMORY_MCP_BRANCH` | `main` | Branch for push/pull operations |
## Authentication
For syncing with a private GitHub remote:
```sh
# Interactive OAuth device flow — opens browser, stores token in keyring
memory-mcp auth login

# Or specify storage explicitly
# Kubernetes deployments (requires --features k8s)
# Check current auth status
```
Token resolution order: `MEMORY_MCP_GITHUB_TOKEN` env var → `~/.config/memory-mcp/token` file → system keyring.
## Embedding model
Embeddings are computed locally using candle with BGE-small-en-v1.5 (384 dimensions). The model is downloaded from HuggingFace Hub on first run — no API keys required. Use `memory-mcp warmup` to pre-download.
## Deployment
### Container image
```sh
# Pull from GitHub Container Registry (<owner> is a placeholder for the repository owner)
docker pull ghcr.io/<owner>/memory-mcp:latest

# Or build locally
docker build -t memory-mcp .
```
The container image:
- Uses a multi-stage build (compile → model warmup → slim runtime)
- Ships with the embedding model pre-downloaded (no internet needed at startup)
- Runs as a non-root user (`memory-mcp`, uid 1000)
- Includes SLSA provenance and SBOM attestations
### Kubernetes
Manifests are provided in `deploy/k8s/`:
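For example, assuming plain (non-kustomize) manifests:

```sh
kubectl apply -f deploy/k8s/
```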
The deployment is hardened with:
- `readOnlyRootFilesystem`, `runAsNonRoot`, and `drop: [ALL]` capabilities
- Split ServiceAccounts (runtime vs bootstrap)
- Seccomp `RuntimeDefault` profile
See docs/deployment.md for the full guide.
## Architecture decisions
Significant design decisions are documented as Architecture Decision Records in docs/adr/. Each ADR captures the context, decision, and consequences of a choice — giving future contributors the "why" behind the codebase.
## Security
- Local inference: embeddings are computed on your machine. Memory content never leaves your network unless you push to a remote.
- Token handling: tokens are stored in the system keyring (or Kubernetes Secrets), never in CLI arguments or git history. Process umask is set to `0o077`.
- Input validation: memory names, content size, and nesting depth are validated. Path traversal and symlink attacks are blocked.
- Container hardening: non-root user, read-only filesystem, dropped capabilities, seccomp profile.
- Supply chain: CI pins all GitHub Actions to commit SHAs. Container images include SLSA provenance and SBOM attestations. Dependencies are audited with `cargo audit` on every build.
## Roadmap
The core memory engine is stable — store, search, sync, and authenticate all work today. Planned next:
- BM25 keyword search alongside semantic search (#55)
- Cross-platform vector index with brute-force fallback for Windows (#56)
- Deduplication on remember (semantic similarity threshold)
- Tag-based filtering in recall
- Richer observability with structured tracing across all subsystems (#52)
See TODO.md for the full plan and open issues for what's in flight.
## Development
```sh
# Run tests
cargo test

# With Kubernetes feature
cargo test --features k8s

# Lint
cargo clippy

# Audit dependencies
cargo audit
```
## License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.