memory-mcp 0.10.1

MCP server for semantic memory — pure-Rust embeddings, vector search, git-backed storage
# Deployment

## Quick start (any Kubernetes cluster)

The manifests in `deploy/k8s/` are generic and work on any cluster.

### 1. Build and push the image

```bash
docker build -t YOUR_REGISTRY/memory-mcp:latest .
docker push YOUR_REGISTRY/memory-mcp:latest
```

Or let the [GitHub Actions workflow](../.github/workflows/build.yml) do it on
push to `main`. The workflow pushes to `ghcr.io/butterflyskies/memory-mcp`
using the built-in `GITHUB_TOKEN` — no extra secrets needed.

### 2. Apply the manifests

```bash
# Create namespace, RBAC, PVC, and Service
kubectl apply -f deploy/k8s/namespace.yml
kubectl apply -f deploy/k8s/rbac.yml
kubectl apply -f deploy/k8s/pvc.yml
kubectl apply -f deploy/k8s/service.yml
```

### 3. Create the GitHub token Secret

Option A — use the built-in auth subcommand. Run from a host with kubeconfig
access, or as a one-off Kubernetes Job using the `memory-mcp-bootstrap`
ServiceAccount (see `deploy/k8s/rbac.yml`):

```bash
memory-mcp auth login \
  --store k8s-secret \
  --k8s-namespace memory-mcp \
  --k8s-secret-name memory-mcp-github-token
```

Option B — create the Secret directly with `kubectl` after filling in the token:

```bash
# Replace <TOKEN> with your GitHub personal access token (repo scope)
kubectl create secret generic memory-mcp-github-token \
  --namespace memory-mcp \
  --from-literal=token=<TOKEN>
```
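If you prefer declarative manifests over the imperative `kubectl create`, keep in mind that Kubernetes Secret `data` values must be base64-encoded. A sketch of generating the manifest by hand (the token value is a placeholder for illustration):

```shell
#!/bin/sh
# Placeholder token — substitute your real GitHub PAT (repo scope)
TOKEN="ghp_example"

# Encode without a trailing newline (echo would add one and corrupt the value)
ENCODED=$(printf '%s' "$TOKEN" | base64)

cat <<EOF > secret.yml
apiVersion: v1
kind: Secret
metadata:
  name: memory-mcp-github-token
  namespace: memory-mcp
type: Opaque
data:
  token: ${ENCODED}
EOF
```

Apply with `kubectl apply -f secret.yml`, and avoid committing the file anywhere.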

### 4. Deploy

Edit `deploy/k8s/deployment.yml`:
- Set `image:` to your registry/image:tag
- Uncomment and set `MEMORY_MCP_REMOTE_URL` to your private GitHub repository URL
- Uncomment the `MEMORY_MCP_GITHUB_TOKEN` secretKeyRef block (required when using a remote)
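After those edits, the relevant part of the container spec might look roughly like this (the image tag and repository URL are placeholders; check the actual manifest for the authoritative field layout):

```yaml
# Sketch of the edited Deployment fields — names here mirror the
# env vars and Secret described above, everything else is illustrative.
spec:
  template:
    spec:
      containers:
        - name: memory-mcp
          image: YOUR_REGISTRY/memory-mcp:latest
          env:
            - name: MEMORY_MCP_REMOTE_URL
              value: "https://github.com/YOUR_ORG/YOUR_MEMORIES_REPO.git"
            - name: MEMORY_MCP_GITHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: memory-mcp-github-token
                  key: token
```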

Then apply:

```bash
kubectl apply -f deploy/k8s/deployment.yml
```

### 5. Expose externally

The Service is ClusterIP only. Add an Ingress or Gateway API HTTPRoute to
expose the server outside the cluster. TLS should terminate at the gateway;
the binary serves plain HTTP on port 8080.

Example with a basic Ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: memory-mcp
  namespace: memory-mcp
spec:
  rules:
    - host: memories.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: memory-mcp
                port:
                  number: 8080
```
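If your cluster uses the Gateway API instead, the equivalent HTTPRoute might look like this (the Gateway name and namespace are placeholders — point `parentRefs` at whatever Gateway handles TLS for you):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: memory-mcp
  namespace: memory-mcp
spec:
  parentRefs:
    - name: my-gateway          # placeholder Gateway name
      namespace: gateway-system # adjust to where your Gateway lives
  hostnames:
    - memories.example.com
  rules:
    - backendRefs:
        - name: memory-mcp
          port: 8080
```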

### 6. Connect Claude Code

Add to `~/.claude.json` (or your project-local `.mcp.json`):

```json
{
  "mcpServers": {
    "memory": {
      "type": "http",
      "url": "https://memories.example.com/mcp"
    }
  }
}
```
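If you'd rather not hand-edit the JSON, the entry can be merged in non-destructively with `jq` (assuming `jq` is installed; the hostname is the example one from above):

```shell
#!/bin/sh
CONFIG="$HOME/.claude.json"

# Merge the server entry without disturbing existing keys; write to a
# temp file first so a failed jq run can't truncate the config.
jq '.mcpServers.memory = {"type": "http", "url": "https://memories.example.com/mcp"}' \
  "$CONFIG" > "$CONFIG.tmp" && mv "$CONFIG.tmp" "$CONFIG"
```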

---

## Operational notes

### Resource sizing

- **Memory**: ~300MB baseline (model weights + index). Grows with memory
  count — plan for a 512Mi request and 1Gi limit as a starting point.
- **CPU**: Embedding computation is CPU-bound. Single-query latency is
  ~50ms on modern hardware. Start with a 250m request and a 1-core limit.
- **Disk**: The git repo + HF model cache. The BGE-small-en-v1.5 model
  weighs ~130MB; each memory is a small markdown file (~1–5KB). 1Gi is
  comfortable for thousands of memories. Scale the PVC if your repo
  grows significantly or if you switch to a larger embedding model.
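As a container-spec fragment, the starting-point numbers above translate to:

```yaml
# Starting-point sizing from the notes above; tune after observing real usage.
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
```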

### Model weights mmap safety

The embedding model is memory-mapped from the HuggingFace cache. If the
cache is mutated while the server is running (e.g., by `huggingface-cli`),
the mmap'd region becomes undefined. The container image pre-warms the
model so this is only a concern if the cache volume is shared. Keep
`HF_HOME` on a volume dedicated to the pod.
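One way to keep the cache pod-local is to point `HF_HOME` at a path on the pod's own PVC. A sketch (the mount path and claim name here are illustrative — use whatever `deploy/k8s/pvc.yml` actually defines):

```yaml
# Illustrative fragment: HF cache lives on the pod's dedicated volume,
# so nothing outside the pod can mutate the mmap'd model weights.
containers:
  - name: memory-mcp
    env:
      - name: HF_HOME
        value: /data/hf-cache   # placeholder path on the pod's PVC
    volumeMounts:
      - name: data
        mountPath: /data
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: memory-mcp-data  # placeholder; see deploy/k8s/pvc.yml
```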

### Backup

The git repo IS the backup — it syncs to a GitHub remote. If the PVC is
lost, re-clone from the remote. The vector index is rebuilt from the repo
on startup.