steel-memory 0.1.2

A spatial memory palace for AI agents with semantic vector search, knowledge graphs, and MCP tools

steel-memory

A Rust implementation of mempalace — a spatial memory palace for AI agents. Rewritten in Rust for size, speed, and efficiency.

Features

  • 22 MCP tools matching the mempalacejs API
  • Semantic vector search using FastEmbed (AllMiniLML6V2, ~90MB, downloaded on first use)
  • SQLite storage — no external database needed (~/.steel-memory/palace.sqlite3)
  • Knowledge graph — temporal RDF-style triples with invalidation (knowledge_graph.sqlite3)
  • Palace graph — BFS traversal across rooms/wings, tunnel detection
  • AAAK dialect — compressed memory format for efficient context priming
  • 4-layer memory stack — L0 identity, L1 AAAK story, L2 on-demand recall, L3 semantic search
  • Agent diary — per-agent timestamped journal
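
To make the palace graph feature more concrete, here is a minimal sketch of hop-limited BFS and tunnel detection. It is not the crate's actual code: the `RoomGraph` adjacency map, the `traverse` and `find_tunnels` functions, and the `(wing, room)` pair input are all hypothetical names chosen for illustration. Only the ideas (BFS from a start room with a hop limit, and rooms shared across wings) come from the feature list and the tool descriptions.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet, VecDeque};

/// Hypothetical adjacency map: room name -> directly connected rooms.
type RoomGraph = HashMap<String, Vec<String>>;

/// Breadth-first traversal from `start_room`, visiting rooms at most
/// `max_hops` edges away. Returns (room, hop distance) in visit order.
fn traverse(graph: &RoomGraph, start_room: &str, max_hops: usize) -> Vec<(String, usize)> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    let mut order = Vec::new();

    visited.insert(start_room.to_string());
    queue.push_back((start_room.to_string(), 0));

    while let Some((room, hops)) = queue.pop_front() {
        order.push((room.clone(), hops));
        if hops == max_hops {
            continue; // do not expand beyond the hop limit
        }
        if let Some(neighbors) = graph.get(&room) {
            for next in neighbors {
                // `insert` returns true only the first time a room is seen.
                if visited.insert(next.clone()) {
                    queue.push_back((next.clone(), hops + 1));
                }
            }
        }
    }
    order
}

/// Rooms that appear in more than one wing ("tunnels"), given (wing, room)
/// pairs. Results are sorted by room name.
fn find_tunnels(drawers: &[(&str, &str)]) -> Vec<String> {
    let mut wings_by_room: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(wing, room) in drawers {
        wings_by_room.entry(room).or_default().insert(wing);
    }
    wings_by_room
        .into_iter()
        .filter(|(_, wings)| wings.len() > 1)
        .map(|(room, _)| room.to_string())
        .collect()
}
```

With a graph `auth -> db -> cache`, `traverse(&graph, "auth", 1)` would visit `auth` and `db` but stop before `cache`.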

Installation

cargo build --release

Binary: target/release/steel-memory

MCP Server

Communicates over stdin/stdout using JSON-RPC 2.0, as defined by the Model Context Protocol (MCP).

Claude Desktop / MCP Config

{
  "mcpServers": {
    "steel-memory": {
      "command": "/path/to/steel-memory"
    }
  }
}
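
For clients that talk to the server directly rather than through Claude Desktop, the sketch below shows what a single request line might look like. It assumes the standard MCP `tools/call` method with `name` and `arguments` parameters, and uses the `query` and `limit` arguments listed for `mempalace_search` in the Tools table. The helper names `json_escape` and `search_request` are hypothetical, the escaping covers only the common cases, and a real client would first complete the MCP `initialize` handshake before sending this.

```rust
/// Escape a string for embedding in a JSON string literal (minimal subset).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSON-RPC 2.0 `tools/call` request for `mempalace_search`.
/// The result contains no newline, so it can be written to the server's
/// stdin as a single line followed by '\n'.
fn search_request(id: u64, query: &str, limit: u32) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"mempalace_search","arguments":{{"query":"{}","limit":{}}}}}}}"#,
        id,
        json_escape(query),
        limit
    )
}
```

Building the string by hand keeps the example dependency-free; in a real client a JSON library such as serde_json would be the more robust choice.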

Tools

Tool                       Description
mempalace_status           Total drawers and palace path
mempalace_list_wings       All wings with counts
mempalace_list_rooms       All rooms (optional wing filter)
mempalace_get_taxonomy     Full wing → room → count map
mempalace_search           Semantic search (query, limit?, wing?, room?)
mempalace_check_duplicate  Duplicate detection by similarity
mempalace_wake_up          L0 identity + L1 AAAK context (wing?)
mempalace_recall           L2 on-demand drawer list (wing?, room?, limit?)
mempalace_build_graph      Palace room/wing graph
mempalace_traverse_graph   BFS from a room (start_room, max_hops?)
mempalace_find_tunnels     Rooms shared across wings
mempalace_graph_stats      Graph topology statistics
mempalace_add_drawer       Add memory (wing, room, content)
mempalace_delete_drawer    Delete memory by ID
mempalace_kg_query         Query KG by entity (direction?)
mempalace_kg_add           Add triple (subject, predicate, object)
mempalace_kg_invalidate    Soft-delete a triple
mempalace_kg_timeline      Chronological triples for an entity
mempalace_kg_stats         Knowledge graph statistics
mempalace_diary_write      Write agent diary entry
mempalace_diary_read       Read agent diary entries
mempalace_get_aaak_spec    AAAK dialect specification
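
The knowledge graph tools (`mempalace_kg_add`, `mempalace_kg_invalidate`, `mempalace_kg_timeline`) describe temporal triples with soft deletion. The sketch below illustrates that idea in memory. It is an assumption-laden model, not the schema stored in knowledge_graph.sqlite3: the `Triple` and `KnowledgeGraph` types, the `valid_from` and `valid_to` fields, and the use of integer timestamps are all hypothetical.

```rust
/// Hypothetical in-memory model of a temporal triple.
#[derive(Debug, Clone, PartialEq)]
struct Triple {
    subject: String,
    predicate: String,
    object: String,
    valid_from: u64,       // assumed: timestamp when the fact was added
    valid_to: Option<u64>, // None while the fact is still current
}

struct KnowledgeGraph {
    triples: Vec<Triple>,
}

impl KnowledgeGraph {
    fn new() -> Self {
        KnowledgeGraph { triples: Vec::new() }
    }

    /// Record a new fact that is current from `now` onward.
    fn add(&mut self, subject: &str, predicate: &str, object: &str, now: u64) {
        self.triples.push(Triple {
            subject: subject.to_string(),
            predicate: predicate.to_string(),
            object: object.to_string(),
            valid_from: now,
            valid_to: None,
        });
    }

    /// Soft-delete: close matching current triples at `now` instead of
    /// removing them. Returns how many triples were closed.
    fn invalidate(&mut self, subject: &str, predicate: &str, object: &str, now: u64) -> usize {
        let mut changed = 0;
        for t in self.triples.iter_mut() {
            if t.valid_to.is_none()
                && t.subject == subject
                && t.predicate == predicate
                && t.object == object
            {
                t.valid_to = Some(now);
                changed += 1;
            }
        }
        changed
    }

    /// Facts mentioning `entity` that have not been invalidated.
    fn current(&self, entity: &str) -> Vec<&Triple> {
        self.triples
            .iter()
            .filter(|t| t.valid_to.is_none() && (t.subject == entity || t.object == entity))
            .collect()
    }

    /// All facts mentioning `entity`, oldest first, including invalidated ones.
    fn timeline(&self, entity: &str) -> Vec<&Triple> {
        let mut hits: Vec<&Triple> = self
            .triples
            .iter()
            .filter(|t| t.subject == entity || t.object == entity)
            .collect();
        hits.sort_by_key(|t| t.valid_from);
        hits
    }
}
```

Keeping invalidated triples around is what makes a timeline query possible: the history of an entity survives even after a fact stops being true.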

Configuration

Path                                     Description
~/.steel-memory/                         Palace root directory
~/.steel-memory/palace.sqlite3           Vector + drawer storage
~/.steel-memory/knowledge_graph.sqlite3  Knowledge graph
~/.steel-memory/identity.txt             L0 identity (create manually)

Set the STEEL_MEMORY_COLLECTION environment variable to change the collection name.
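
As a rough illustration of the layout above, the following sketch derives the documented paths from a home directory and reads the collection name from the environment. The `PalacePaths` struct and both function names are hypothetical, and the default collection name is not documented here, so the caller supplies a fallback.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical bundle of the documented file locations.
#[derive(Debug)]
struct PalacePaths {
    root: PathBuf,
    palace_db: PathBuf,
    knowledge_graph_db: PathBuf,
    identity: PathBuf,
}

/// Build the documented paths under `<home>/.steel-memory/`.
fn palace_paths(home: &Path) -> PalacePaths {
    let root = home.join(".steel-memory");
    PalacePaths {
        palace_db: root.join("palace.sqlite3"),
        knowledge_graph_db: root.join("knowledge_graph.sqlite3"),
        identity: root.join("identity.txt"),
        root,
    }
}

/// Read STEEL_MEMORY_COLLECTION, using `fallback` when it is unset.
/// The real default is an unknown here, hence the caller-supplied value.
fn collection_name(fallback: &str) -> String {
    env::var("STEEL_MEMORY_COLLECTION").unwrap_or_else(|_| fallback.to_string())
}
```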

First Run

On first run with semantic search enabled, the AllMiniLML6V2 model (~90MB) will be downloaded automatically from Hugging Face. Subsequent runs use the cached model.

Testing

cargo test

LongMemEval Benchmarking

You can benchmark steel-memory retrieval against LongMemEval with the included CLI.

The easiest way to download a dataset and run the benchmark is the helper script:

./scripts/run-longmemeval-benchmark.sh --dataset oracle

The script downloads the requested dataset into .benchmarks/longmemeval/ and writes per-question results to .benchmarks/longmemeval/results/.

Useful options:

  • --dataset oracle|s|m|all picks which LongMemEval file to download and run
  • --granularity session|turn controls whether the benchmark indexes whole sessions or user turns
  • --max-questions N limits the number of evaluation instances for a smaller test run
  • --runner bench|bin uses either cargo bench or cargo run --release
  • --download-only fetches datasets without starting the benchmark

If you want to run the benchmark manually with an existing file, use:

cargo bench --bench longmemeval -- \
  --data /absolute/path/to/longmemeval_oracle.json \
  --granularity session \
  --output /absolute/path/to/longmemeval-results.jsonl

If you prefer, the same benchmark can also be run with cargo run --bin longmemeval-benchmark -- ....

Options for the manual invocation:

  • --granularity session|turn controls whether the benchmark indexes whole sessions or user turns
  • --max-questions N limits the number of evaluation instances for a smaller test run
  • --output writes per-question retrieval results as JSONL while the summary is printed to stdout

The benchmark uses steel-memory's embedding and SQLite-backed search stack, and reports LongMemEval-style retrieval metrics such as recall_any@k, recall_all@k, and ndcg_any@k.
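
For readers unfamiliar with these metric names, here is a small sketch of how they are commonly computed for one question. It follows the usual reading of the names (a hit if any gold item is in the top k, a hit only if all gold items are in the top k, and NDCG with binary relevance) and may differ in detail from what the benchmark implements. The function names are hypothetical, and `ranked` is assumed to contain no duplicate IDs.

```rust
use std::collections::HashSet;

/// 1.0 if at least one gold item appears in the top `k` results, else 0.0.
fn recall_any_at_k(ranked: &[&str], gold: &HashSet<&str>, k: usize) -> f64 {
    let hit = ranked.iter().take(k).any(|id| gold.contains(id));
    if hit { 1.0 } else { 0.0 }
}

/// 1.0 if every gold item appears in the top `k` results, else 0.0.
/// An empty gold set is scored as 0.0 here (an arbitrary choice).
fn recall_all_at_k(ranked: &[&str], gold: &HashSet<&str>, k: usize) -> f64 {
    if gold.is_empty() {
        return 0.0;
    }
    let top: HashSet<&str> = ranked.iter().take(k).copied().collect();
    if gold.iter().all(|g| top.contains(g)) { 1.0 } else { 0.0 }
}

/// NDCG@k with binary relevance: a result counts as relevant if it is gold.
/// The discount for 0-based rank i is 1 / log2(i + 2).
fn ndcg_at_k(ranked: &[&str], gold: &HashSet<&str>, k: usize) -> f64 {
    let dcg: f64 = ranked
        .iter()
        .take(k)
        .enumerate()
        .filter(|(_, id)| gold.contains(*id))
        .map(|(i, _)| 1.0 / ((i as f64) + 2.0).log2())
        .sum();
    // Ideal ranking places all gold items (up to k of them) first.
    let ideal_hits = gold.len().min(k);
    let idcg: f64 = (0..ideal_hits)
        .map(|i| 1.0 / ((i as f64) + 2.0).log2())
        .sum();
    if idcg == 0.0 { 0.0 } else { dcg / idcg }
}
```

For example, with results `["s3", "s1", "s9"]` and gold items `{"s1", "s2"}`, recall_any@2 is 1.0 while recall_all@3 is 0.0, because `s2` is never retrieved.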

The benchmark uses the same FastEmbed AllMiniLML6V2 model as the MCP server. The first run may download roughly 90MB of model data before benchmarking starts.