mag-memory 0.1.0

MAG — Local MCP memory server with ONNX embeddings and semantic search

MAG — Persistent memory for your AI tools

One memory store. Works across every tool on your machine.

MAG stores the context your AI tools lose when you close a session. It runs as a local binary backed by a single SQLite file, with no external services required; API embedding models are optional. Every tool that supports MCP reads from the same memory: Cursor, Claude Code, Windsurf, Cline, and Claude Desktop. The retrieval pipeline, built in Rust, combines FTS5 full-text search, ONNX embeddings, and graph traversal, and scores 91.1% word-overlap on LoCoMo (AutoMem's published score is 90.5%).

Why MAG

  • One binary, one file. No Python runtime, no external service required. All data stays local by default.
  • Hybrid retrieval. FTS5, ONNX embeddings, graph traversal, and multi-phase advanced search in a single pipeline.
  • Works across tools. Cursor, Claude Code, Windsurf, Cline, Claude Desktop — same memory, same file.
  • Agent-oriented. Checkpoints, reminders, lessons, profile state, lifecycle tools, and MCP integration.
  • Portable. Additive migrations, JSON export/import, no standing service dependency.

Compared With AutoMem / OpenMemory MCP

MAG and AutoMem solve the same class of problem: durable memory for local agents. The differences are in the operating model.

                   MAG                                            AutoMem / OpenMemory MCP
Runtime            Rust binary, no Python                         Python + pip/uv
Storage            SQLite (single file)                           SQLite
Retrieval          Hybrid FTS5 + ONNX semantic + graph traversal  Vector search
Embedding          Local ONNX (bge-small, Apache 2.0)             Local or API
Migrations         Additive (no breaking schema changes)
License            MIT                                            Varies
External services  None required                                  None required

On the shared LoCoMo benchmark (word-overlap scoring), MAG scores 91.1% vs AutoMem's published 90.5%. See Benchmarks below.

Install

Shell (macOS / Linux)

curl -fsSL https://raw.githubusercontent.com/George-RD/mag/main/install.sh | sh

Homebrew

brew install George-RD/mag/mag

npm

npm install -g mag-memory

pip

pip install mag-memory

Cargo

cargo install mag-memory

From source

git clone https://github.com/George-RD/mag.git
cd mag
cargo build --release
# Binary: ./target/release/mag

GitHub Releases

Download prebuilt binaries for macOS (x64, ARM), Linux (x64, ARM), and Windows (x64) from Releases.

Configure Your MCP Client

MAG runs as an MCP server. Add it to your client's config to start using it.

Claude Desktop / Cursor / Windsurf

Add to your MCP config file:

{
  "mcpServers": {
    "mag": {
      "command": "mag",
      "args": ["serve"]
    }
  }
}

If mag is not on your PATH, use the full path:

{
  "mcpServers": {
    "mag": {
      "command": "/Users/you/.mag/bin/mag",
      "args": ["serve"]
    }
  }
}

Claude Code

claude mcp add mag -- mag serve

npx (no install)

Use MAG without installing anything:

{
  "mcpServers": {
    "mag": {
      "command": "npx",
      "args": ["-y", "mag-memory", "serve"]
    }
  }
}

CLI

mag ingest "The retry logic should use exponential backoff with jitter"
mag search "retry logic"
mag semantic-search "how should retries work?"
mag advanced-search "deployment rollback process"
mag recent --limit 5
mag --help

Models download automatically on first use. Data is stored in ~/.mag/. Run mag paths to see active locations.

Benchmarks

Current benchmark snapshots were captured on 2026-03-19 at commit 26e51cf3 on macOS aarch64. Word-overlap is the primary LoCoMo metric (comparable to AutoMem's published 90.5%).

LoCoMo (word-overlap scoring, 2 samples, bge-small-en-v1.5)

Category          Word-Overlap
Overall           91.1%
Evidence recall   90.2%
Single-hop        87.6%
Temporal          91.5%
Multi-hop         75.6%
Open-domain       94.0%
Adversarial       90.9%

788 memories, 304 questions across 2 samples. Avg query: 34 ms, avg embed: 7.5 ms, seed time: 9.7 s.
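
To make the metric concrete: word-overlap scores how much of the gold answer's vocabulary shows up in the retrieved answer. The exact scoring code used by the MAG and AutoMem harnesses is not reproduced here; the following is a plausible minimal sketch (case-insensitive, punctuation-stripped token sets), not MAG's actual implementation.

```rust
// Hypothetical word-overlap score: the fraction of gold-answer tokens
// that also appear in the predicted answer. Tokenization here
// (lowercase, split on non-alphanumerics) is an assumption.
use std::collections::HashSet;

fn word_overlap(predicted: &str, gold: &str) -> f64 {
    let norm = |s: &str| -> HashSet<String> {
        s.to_lowercase()
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(str::to_string)
            .collect()
    };
    let pred = norm(predicted);
    let gold_tokens = norm(gold);
    if gold_tokens.is_empty() {
        return 0.0;
    }
    let hits = gold_tokens.intersection(&pred).count();
    hits as f64 / gold_tokens.len() as f64
}

fn main() {
    // All three gold tokens appear in the prediction, so the score is 1.0.
    let score = word_overlap(
        "uses exponential backoff with jitter",
        "exponential backoff jitter",
    );
    println!("{score}");
}
```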

Embedding Model Comparison (LoCoMo word-overlap, 2 samples)

ONNX models use int8 quantization unless marked ¹ (no pre-built int8 is available); the voyage-4-nano 1024-dim row averages int8 and fp32 runs. API models are unquantized. Temporal Reasoning scores 91.5% for every model and is omitted from the table. Differences within ~1 pp are inside benchmark variance (304 questions, ~1.5% SE).

Model                             Params  Dim   WO%    EvRec%  1-Hop  Multi-Hop  Open   Adv    AvgEmb     File    RAM
granite-embedding-30m-english ¹   30M     384   90.5%  87.5%   88.9%  76.9%      91.7%  91.1%  3.8 ms     116 MB  350 MB
snowflake-arctic-embed-xs int8    22M     384   90.2%  88.7%   87.0%  76.9%      92.7%  89.5%  3.9 ms     22 MB   137 MB
e5-small-v2 int8                  33M     384   90.8%  88.6%   88.4%  73.1%      93.0%  91.1%  4.8 ms     32 MB   152 MB
all-MiniLM-L6-v2 int8             22M     384   91.3%  89.2%   88.5%  76.9%      93.1%  92.3%  7.4 ms     22 MB   95 MB
bge-small-en-v1.5 int8 (default)  33M     384   91.1%  90.2%   87.6%  75.6%      94.0%  90.9%  7.0 ms     32 MB   180 MB
snowflake-arctic-embed-s int8     33M     384   90.8%  87.8%   89.5%  73.1%      93.0%  90.8%  7.8 ms     32 MB   178 MB
bge-base-en-v1.5 int8             109M    768   91.8%  90.4%   87.1%  76.9%      94.9%  92.7%  10.5 ms    105 MB  265 MB
gte-small int8                    33M     384   90.9%  89.5%   86.2%  73.1%      94.0%  91.7%  11.7 ms    32 MB   162 MB
all-MiniLM-L12-v2 int8            33M     384   91.1%  90.4%   86.8%  75.6%      94.3%  90.9%  12.3 ms    32 MB   158 MB
nomic-embed-text-v1.5 int8        137M    768   90.0%  86.6%   88.4%  74.4%      90.8%  91.0%  42 ms      131 MB  351 MB
voyage-4-nano int8 512-dim                512   91.3%  91.6%   88.8%  75.6%      93.7%  91.6%  58 ms
voyage-4-nano int8/fp32 1024-dim          1024  91.8%  91.3%   93.5%  75.6%      93.3%  91.6%  82–172 ms
voyage-4-lite (API)                       1024  91.1%  91.0%   91.1%  73.1%      93.4%  90.2%  304 ms
voyage-4 (API)                            1024  92.0%  92.7%   92.3%  75.6%      94.8%  90.6%  297 ms
text-embedding-3-large (API)              3072  93.0%  93.4%   94.6%  74.4%      95.3%  93.1%  444 ms

¹ FP32 (no pre-built int8 ONNX available).

bge-small-en-v1.5 is the default (Apache 2.0, Xenova int8). It uses 32 MB on disk and 180 MB peak RSS, a 35% reduction versus the previous FP32 default (277 MB) with identical quality. all-MiniLM-L6-v2 int8 is the lightest option at 22 MB on disk / 95 MB RSS with equivalent quality. bge-base-en-v1.5 int8 is the strongest local ONNX model, gaining +0.7 pp for only 1.4× the latency. Switching embedding models requires re-embedding stored data (see issue #89). Multi-hop stays at 73–77% across all models; this architectural limitation is tracked in issue #84.

Other Benchmarks (earlier snapshot, 2026-03-12)

Benchmark                      Result                          Notes
Local LongMemEval-style set    98 / 100                        1538 ms seeding, 1013 ms querying, 335568 KB peak RSS
Scale benchmark                100% Recall@5 at 1K, 5K, 10K    19.61 ms mean, 42.56 ms p95, 51.94 ms p99 at 10K
omega-memory comparison        MAG 98 / 100 vs omega 90 / 100  omega seeded and queried faster on this local workload
Official LongMemEval_S sample  8 / 10                          external dataset fetch works; full 500-question publication is still pending

Full methodology, commands, and result tables are in docs/benchmarks.md. Historical runs are tracked in docs/benchmark_log.csv.

Benchmark Safety

Benchmark runs do not touch the normal MAG production database. The official LongMemEval harness uses a fresh in-memory SQLite database per question, and the LoCoMo harness uses a fresh in-memory SQLite database per sample. The main persistent side effect is dataset/model caching under the active MAG root.

Compatibility

Tool            Status              Last Verified
Claude Desktop  Supported           2026-03-20
Cursor          Supported           2026-03-20
Claude Code     Supported           2026-03-20
Windsurf        Community-reported  2026-03-20
Cline           Community-reported  2026-03-20

Windows support is untested. If you verify MAG on a tool or platform not listed here, open an issue to update the matrix.

Guarantee

MAG has no servers to shut down. Your memories live in a single SQLite file. Open it with any SQLite browser. Export everything with one command. The binary keeps working whether we maintain this project or not. MIT licensed, no tiers.

By default, the binary reads and writes a local SQLite file. API embedding models (optional) send query text to external services. See SECURITY.md for the full data flow audit.

Retrieval Model

MAG currently supports:

  • text search over FTS5
  • semantic search over ONNX embeddings
  • similar-memory lookup from a stored memory ID
  • graph traversal and version-chain lookup
  • advanced retrieval that fuses vector and lexical candidates
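
Semantic search ultimately reduces to comparing embedding vectors. As an illustration only (MAG's real scoring lives inside its ONNX pipeline and may pre-normalize vectors), cosine similarity over two embeddings looks like this:

```rust
// Illustrative cosine similarity between two embedding vectors, the
// kind of comparison semantic search performs. This is a sketch, not
// MAG's actual code; real pipelines often L2-normalize once at ingest
// so that a plain dot product suffices at query time.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Parallel vectors score 1.0; orthogonal vectors score 0.0.
    println!("{}", cosine(&[1.0, 0.0], &[1.0, 0.0]));
    println!("{}", cosine(&[1.0, 0.0], &[0.0, 1.0]));
}
```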

The advanced path combines vector similarity and FTS hits with reciprocal-rank fusion, then refines ranking with event type, time decay, word overlap, importance, priority, and feedback signals. Queries are classified by intent (Keyword / Factual / Conceptual / General) to weight retrieval modes appropriately. Entity extraction runs at ingest time for auto-tagging and graph-edge creation.
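
The fusion step described above can be sketched in a few lines. Reciprocal-rank fusion scores each candidate by summing 1 / (k + rank) across the ranked lists it appears in, so items ranked well by both the vector and FTS sides rise to the top. The constant k = 60 is the common default from the RRF literature; MAG's actual constants and its follow-up re-ranking signals (time decay, importance, feedback) are not shown here.

```rust
// Sketch of reciprocal-rank fusion (RRF) over ranked candidate lists,
// e.g. vector hits and FTS hits. Illustrative only; not MAG's code.
use std::collections::HashMap;

fn rrf(ranked_lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in ranked_lists {
        for (rank, id) in list.iter().enumerate() {
            // 1-based rank; earlier positions contribute more.
            *scores.entry((*id).to_string()).or_insert(0.0) +=
                1.0 / (k + (rank as f64 + 1.0));
        }
    }
    let mut out: Vec<(String, f64)> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let vector_hits = vec!["m42", "m7", "m13"];
    let fts_hits = vec!["m7", "m42", "m99"];
    // m42 and m7 appear in both lists, so they outrank m13 and m99.
    for (id, score) in rrf(&[vector_hits, fts_hits], 60.0) {
        println!("{id}: {score:.4}");
    }
}
```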

Architecture

mag
├── src/
│   ├── main.rs
│   ├── cli.rs
│   ├── mcp_server.rs
│   ├── app_paths.rs
│   ├── benchmarking.rs
│   └── memory_core/
│       └── storage/sqlite/
│           └── entities.rs
├── benches/
│   ├── longmemeval/
│   ├── locomo/
│   ├── onnx_profile.rs
│   └── scale_bench.rs
└── tests/

Development

Local Quality Gate

cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-features

Benchmark Commands

The recommended way to run the standard LoCoMo benchmark:

./scripts/bench.sh

For individual benchmarks or the omega-memory comparison, clone omega-memory locally first. The comparison script accepts either --omega-repo or the OMEGA_REPO environment variable.

cargo run --release --bin fetch_benchmark_data -- --dataset all
cargo run --release --bin longmemeval_bench -- --json
cargo run --release --bin longmemeval_bench -- --official --questions 10 --json
cargo run --release --bin locomo_bench -- --json
cargo run --release --bin scale_bench -- --max-scale 10000 --search-queries 50
OMEGA_REPO=/path/to/omega-memory uv run --project "$OMEGA_REPO" python benches/python_comparison.py

License

MIT