memory-mcp 0.3.0

MCP server for semantic memory — pure-Rust embeddings, vector search, git-backed storage
Documentation
# memory-mcp

A semantic memory server for AI coding agents. Memories are stored as markdown files in a git repository and indexed for semantic retrieval using local embeddings — no API keys, no cloud dependency for inference.

Built on the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) so any compatible agent (Claude Code, Cursor, Windsurf, custom agents) can remember, recall, and sync knowledge across sessions and devices.

## Why

AI coding agents are stateless between sessions. They lose context about your preferences, your codebase's architecture, past decisions, and hard-won debugging knowledge. memory-mcp gives agents a persistent, searchable memory that:

- **Survives across sessions** — what an agent learned yesterday is available today
- **Syncs across devices** — git push/pull keeps memories consistent everywhere
- **Stays private** — embeddings run locally (no data leaves your machine), storage is a git repo you control
- **Scales with you** — semantic search finds relevant memories even as the collection grows into hundreds or thousands

## Quick start

### From source

```bash
git clone https://github.com/butterflyskies/memory-mcp.git
cd memory-mcp
cargo build --release
```

### Run the server

```bash
# Starts on 127.0.0.1:8080 with a local git repo at ~/.memory-mcp
./target/release/memory-mcp serve

# Or configure via environment variables
MEMORY_MCP_BIND=0.0.0.0:9090 \
MEMORY_MCP_REPO_PATH=/path/to/memories \
./target/release/memory-mcp serve
```

### Connect Claude Code

Add to `~/.claude.json` or your project's `.mcp.json`:

```json
{
  "mcpServers": {
    "memory": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}
```

The agent can now use `remember`, `recall`, `read`, `edit`, `forget`, `list`, and `sync` as tools.

## Tools

| Tool | Description |
|------|-------------|
| **remember** | Store a new memory with content, name, tags, and scope. Embeds and indexes it for semantic search. |
| **recall** | Search memories by natural-language query. Returns the top matches ranked by semantic similarity. |
| **read** | Fetch a specific memory by name with full content and metadata. |
| **edit** | Update an existing memory. Supports partial updates — omit fields to preserve them. |
| **forget** | Delete a memory by name. Removes from git and the search index. |
| **list** | Browse all memories, optionally filtered by scope. |
| **sync** | Push/pull the memory repo with a git remote. Handles conflicts via recency-based resolution. |

### Example: agent remembers a debugging insight

```
Tool: remember
{
  "name": "postgres/connection-pool-timeout",
  "content": "When the connection pool times out under load, the issue is usually...",
  "tags": ["postgres", "debugging", "performance"],
  "scope": "project:my-api"
}
```

### Example: agent recalls relevant context

```
Tool: recall
{
  "query": "database connection issues under high load",
  "scope": "project:my-api",
  "limit": 5
}
```

## How it works

```
Agent ──MCP──▶ memory-mcp ──▶ candle (local BERT embeddings)
                    │                    │
                    ▼                    ▼
              git repo            usearch HNSW index
            (markdown files)    (semantic search)
              git remote
            (sync across devices)
```

1. **Storage**: memories are markdown files with YAML frontmatter (tags, scope, timestamps) committed to a local git repository
2. **Embeddings**: content is embedded locally using [candle]https://github.com/huggingface/candle with a BERT model — no external API calls
3. **Search**: embeddings are indexed in an HNSW graph ([usearch]https://github.com/unum-cloud/usearch) for fast approximate nearest-neighbor search
4. **Sync**: the git repo can push/pull to a remote (GitHub, GitLab, etc.) for cross-device sync with automatic conflict resolution
5. **Auth**: GitHub tokens via OAuth device flow (`memory-mcp auth login`), stored in the system keyring or a Kubernetes Secret

### Memory format

```markdown
---
id: 550e8400-e29b-41d4-a716-446655440000
name: postgres/connection-pool-timeout
tags: [postgres, debugging, performance]
scope:
  type: Project
  name: my-api
created_at: 2026-03-18T12:00:00Z
updated_at: 2026-03-18T12:00:00Z
source: debugging-session
---

When the connection pool times out under load, the issue is usually...
```

### Scoping

Memories are scoped to control visibility:

- **`global`** — available to all projects (preferences, standards, general knowledge)
- **`project:{name}`** — scoped to a specific project (architecture decisions, debugging context, team conventions)

## Configuration

All options can be set via CLI flags or environment variables:

| Flag | Env var | Default | Description |
|------|---------|---------|-------------|
| `--bind` | `MEMORY_MCP_BIND` | `127.0.0.1:8080` | Address to bind the HTTP server |
| `--repo-path` | `MEMORY_MCP_REPO_PATH` | `~/.memory-mcp` | Path to the git-backed memory repository |
| `--mcp-path` | `MEMORY_MCP_PATH` | `/mcp` | URL path for the MCP endpoint |
| `--remote-url` | `MEMORY_MCP_REMOTE_URL` | *(none)* | Git remote URL. Omit for local-only mode. |
| `--branch` | `MEMORY_MCP_BRANCH` | `main` | Branch for push/pull operations |

## Authentication

For syncing with a private GitHub remote:

```bash
# Interactive OAuth device flow — opens browser, stores token in keyring
memory-mcp auth login

# Or specify storage explicitly
memory-mcp auth login --store keyring   # system keyring (default)
memory-mcp auth login --store file      # ~/.memory-mcp-token
memory-mcp auth login --store stdout    # print token, pipe to your own storage

# Kubernetes deployments (requires --features k8s)
memory-mcp auth login --store k8s-secret

# Check current auth status
memory-mcp auth status
```

Token resolution order: `MEMORY_MCP_GITHUB_TOKEN` env var → `~/.memory-mcp-token` file → system keyring.

## Embedding model

Embeddings are computed locally using [candle](https://github.com/huggingface/candle) with [BGE-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) (384 dimensions). The model is downloaded from HuggingFace Hub on first run — no API keys required. Use `memory-mcp warmup` to pre-download.

## Deployment

### Container image

```bash
# Pull from GitHub Container Registry
docker pull ghcr.io/butterflyskies/memory-mcp:latest

# Or build locally
docker build -t memory-mcp .
```

The container image:
- Uses a multi-stage build (compile → model warmup → slim runtime)
- Ships with the embedding model pre-downloaded (no internet needed at startup)
- Runs as a non-root user (`memory-mcp`, uid 1000)
- Includes SLSA provenance and SBOM attestations

### Kubernetes

Manifests are provided in `deploy/k8s/`:

```bash
kubectl apply -f deploy/k8s/namespace.yml
kubectl apply -f deploy/k8s/rbac.yml
kubectl apply -f deploy/k8s/pvc.yml
kubectl apply -f deploy/k8s/service.yml
kubectl apply -f deploy/k8s/deployment.yml
```

The deployment is hardened with:
- `readOnlyRootFilesystem`, `runAsNonRoot`, `drop: [ALL]` capabilities
- Split ServiceAccounts (runtime vs bootstrap)
- Seccomp `RuntimeDefault` profile

See [docs/deployment.md](docs/deployment.md) for the full guide.

## Architecture decisions

Significant design decisions are documented as Architecture Decision Records in [`docs/adr/`](docs/adr/). Each ADR captures the context, decision, and consequences of a choice — giving future contributors the "why" behind the codebase.

## Security

- **Local inference**: embeddings are computed on your machine. Memory content never leaves your network unless you push to a remote.
- **Token handling**: tokens are stored in the system keyring (or Kubernetes Secrets), never in CLI arguments or git history. Process umask is set to `0o077`.
- **Input validation**: memory names, content size, and nesting depth are validated. Path traversal and symlink attacks are blocked.
- **Container hardening**: non-root user, read-only filesystem, dropped capabilities, seccomp profile.
- **Supply chain**: CI pins all GitHub Actions to commit SHAs. Container images include SLSA provenance and SBOM attestations. Dependencies are audited with `cargo audit` on every build.

## Development

```bash
# Run tests
cargo nextest run --workspace --no-fail-fast

# With Kubernetes feature
cargo nextest run --workspace --no-fail-fast --features k8s

# Lint
cargo fmt --check
cargo clippy --workspace -- -D warnings

# Audit dependencies
cargo audit
```

## License

Licensed under either of

- Apache License, Version 2.0 ([LICENSE-APACHE]LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License ([LICENSE-MIT]LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.