recall-echo
Persistent four-layer memory system for pulse-null entities. Gives AI agents long-term recall across sessions — a knowledge graph with Bayesian confidence, curated facts, recent session context, and searchable conversation archives.
Why
LLM coding agents start every session from zero. Built-in memory is typically a single flat file with no session continuity, no short-term vs long-term distinction, and no searchable history. Memory management that depends on the agent remembering to save things is circular.
recall-echo makes the memory lifecycle mechanical. When running as a pulse-null plugin, archival and checkpointing happen automatically. The agent writes to MEMORY.md during sessions. Everything else is handled by the system.
Architecture
recall-echo provides a four-layer memory model:
┌──────────────────────────────────────────────────────────┐
│ MEMORY ARCHITECTURE │
│ │
│ Layer 0: KNOWLEDGE GRAPH (structured, semantic) │
│ ┌──────────────────────────────────────────────────┐ │
│ │ SurrealDB + FastEmbed │ │
│ │ Entities, relationships, episodes │ │
│ │ Bayesian confidence · Semantic search (HNSW) │ │
│ │ LLM-powered extraction + deduplication │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ Layer 1: CURATED (always in context) │
│ ┌───────────┐ │
│ │ MEMORY.md │ Facts, preferences, patterns │
│ └───────────┘ Distilled & maintained by the agent │
│ │
│ Layer 2: SHORT-TERM (FIFO rolling window) │
│ ┌───────────────┐ │
│ │ EPHEMERAL.md │ Last N session summaries │
│ └───────────────┘ Appended on archive, auto-trimmed │
│ │
│ Layer 3: LONG-TERM (searched on demand) │
│ ┌─────────────┐ ┌────────────────────────────┐ │
│ │ ARCHIVE.md │───→│ conversations/ │ │
│ └─────────────┘ │ conversation-001.md │ │
│ │ conversation-002.md │ │
│ │ ... │ │
│ └────────────────────────────┘ │
│ YAML frontmatter + markdown │
│ LLM-summarized or algorithmic │
└──────────────────────────────────────────────────────────┘
Knowledge Graph (Layer 0, default)
The knowledge graph is the structural foundation of recall-echo. It turns conversation archives into structured, searchable memory. Enabled by default via the graph feature.
What it does. When conversations are archived, recall-echo extracts entities (people, projects, tools, concepts) and the relationships between them, then stores them in an embedded SurrealDB graph database. Semantic search via fastembed embeddings lets agents find relevant memories by meaning, not just keywords: a search for "authentication" surfaces conversations about JWT, OAuth, and login flows even when the word "authentication" never appears in them.
Why Bayesian confidence. Traditional knowledge graphs store facts as absolutes — "Dani uses NeoVim" is either true or not. But memories aren't binary. Things change, context matters, and some things are more certain than others. recall-echo uses a Beta-Binomial Bayesian confidence model on every relationship edge:
- Each relationship starts with a confidence prior based on how it was established: authoritative (1.0), explicit (0.9), inferred (0.6), or speculative (0.3)
- When new evidence corroborates a relationship, confidence increases. When evidence contradicts it, confidence decreases
- Updates are gradual — it takes ~10 observations to overwhelm the prior, so a single contradictory mention doesn't erase established knowledge
- Multi-hop queries compound confidence along the path, naturally preferring shorter, higher-confidence routes
This means the graph handles contradictions, reinforces patterns over time, and lets uncertain or stale knowledge fade gracefully — instead of requiring manual cleanup or producing false-positive retrievals.
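The update rule described above can be sketched as a Beta distribution per edge. This is a minimal illustration, not recall-echo's actual implementation: the type `EdgeConfidence`, the constant `PRIOR_STRENGTH` (the prior treated as 10 pseudo-observations, matching the "~10 observations" figure), and the use of a plain product for multi-hop paths are all assumptions.

```rust
/// Beta-distributed confidence for one relationship edge (hypothetical type).
/// `alpha` accumulates supporting evidence, `beta` contradicting evidence.
#[derive(Debug, Clone, Copy)]
pub struct EdgeConfidence {
    pub alpha: f64,
    pub beta: f64,
}

/// Assumption: the prior weighs as much as 10 real observations.
pub const PRIOR_STRENGTH: f64 = 10.0;

impl EdgeConfidence {
    /// Build a prior whose mean equals `prior`
    /// (e.g. 0.9 for explicit, 0.6 for inferred, 0.3 for speculative).
    pub fn from_prior(prior: f64) -> Self {
        let p = prior.clamp(0.0, 1.0);
        Self {
            alpha: p * PRIOR_STRENGTH,
            beta: (1.0 - p) * PRIOR_STRENGTH,
        }
    }

    /// Posterior mean of the Beta distribution: alpha / (alpha + beta).
    pub fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }

    /// One corroborating observation.
    pub fn corroborate(&mut self) {
        self.alpha += 1.0;
    }

    /// One contradicting observation.
    pub fn contradict(&mut self) {
        self.beta += 1.0;
    }
}

/// Assumption: path confidence is the product of the edge means,
/// so longer or weaker paths score lower.
pub fn path_confidence(edges: &[EdgeConfidence]) -> f64 {
    edges.iter().map(EdgeConfidence::mean).product()
}
```

Under these assumptions a single contradiction moves an inferred edge from 0.60 to 6/11 (about 0.55), while ten later corroborations lift it to 16/21 (about 0.76).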
Entity types: person, project, tool, service, preference, decision, event, concept, case, pattern, thread, thought, question, observation, policy, measurement, outcome. Mutable types (person, project, tool, etc.) can be updated; immutable types (decision, event, case, etc.) are append-only.
Extraction pipeline: When conversations are archived, an LLM-powered pipeline chunks the text (~500 tokens), extracts entities and relationships in parallel (up to 10 concurrent), then deduplicates sequentially with LLM-assisted skip/create/merge decisions. Re-extracted relationships receive Bayesian corroboration updates, so knowledge confirmed across multiple conversations gains confidence automatically.
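The chunking step at the front of this pipeline might look roughly like the following. It is only a sketch: `chunk_text` is a hypothetical helper, and it approximates tokens with whitespace-separated words, which a real tokenizer would not do.

```rust
/// Split `text` into chunks of at most `max_tokens` words.
/// Assumption: one whitespace-separated word counts as one token.
pub fn chunk_text(text: &str, max_tokens: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_tokens.max(1)) // guard: slice::chunks panics on a size of 0
        .map(|chunk| chunk.join(" "))
        .collect()
}
```

Calling `chunk_text(archive_text, 500)` yields the pieces that would then be handed to the parallel extraction stage.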
Tiered content: Entities store content at three levels — L0 (abstract, used for embeddings and cheap traversal), L1 (overview, used for reranking), and L2 (full content, pulled on demand). This keeps graph traversal fast.
Graph commands:
# Core
# Search & traversal
# Data management
# Pipeline & integrations
All paths are relative to an entity root directory:
{entity_root}/memory/
├── MEMORY.md # Layer 1 — curated facts (≤200 lines)
├── EPHEMERAL.md # Layer 2 — rolling session window (default 5)
├── ARCHIVE.md # Layer 3 — conversation index
├── conversations/ # Layer 3 — full conversation archives
│ ├── conversation-001.md
│ ├── conversation-002.md
│ └── ...
├── graph/ # Layer 0 — knowledge graph
│ ├── surreal/ # SurrealDB embedded data
│ └── models/ # FastEmbed cached models
└── .recall-echo.toml # Optional configuration
How It Works
recall-echo operates in two modes:
As a pulse-null Plugin
recall-echo is a native pulse-null plugin implementing the Plugin trait from pulse-system-types. It fills the required Memory role (exactly one per entity).
- pulse-null calls archive::archive_session() at session end. This creates a conversation archive with an LLM-generated summary, updates the ARCHIVE.md index, and appends to EPHEMERAL.md.
- pulse-null calls checkpoint::create_checkpoint() before context compaction, preserving conversation state before details are lost.
- Health checks report memory directory state (Healthy / Degraded / Down).
- The setup wizard prompts for entity_root during pulse-null init.
use RecallEcho;
// pulse-null creates the plugin via factory:
let plugin = create.await?;
// plugin.role() == PluginRole::Memory
As a Standalone CLI
For administration and use outside pulse-null:
Installation
cargo install
Prebuilt binaries
Download from GitHub Releases for:
- x86_64-unknown-linux-gnu
- aarch64-unknown-linux-gnu
- x86_64-apple-darwin
- aarch64-apple-darwin (Apple Silicon)
From source
Commands
recall-echo init
Create the memory directory structure under entity_root. Creates memory/ with MEMORY.md, EPHEMERAL.md, ARCHIVE.md, and conversations/. Idempotent — never overwrites existing files.
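One way to get the "never overwrites" guarantee is to open each file with create_new, which fails if the file already exists. The sketch below uses only the Rust standard library; the function names `create_if_missing` and `init_memory_dir` are hypothetical, and starting each file empty is an assumption.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, ErrorKind, Write};
use std::path::Path;

/// Create `path` with `contents` only if it does not exist yet.
/// Returns Ok(true) when created, Ok(false) when an existing file was left alone.
pub fn create_if_missing(path: &Path, contents: &str) -> io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(contents.as_bytes())?;
            Ok(true)
        }
        Err(err) if err.kind() == ErrorKind::AlreadyExists => Ok(false),
        Err(err) => Err(err),
    }
}

/// Lay out memory/ under `entity_root`; safe to call repeatedly.
pub fn init_memory_dir(entity_root: &Path) -> io::Result<()> {
    let memory = entity_root.join("memory");
    // create_dir_all succeeds when the directories already exist.
    fs::create_dir_all(memory.join("conversations"))?;
    for name in ["MEMORY.md", "EPHEMERAL.md", "ARCHIVE.md"] {
        // Assumption: new files start empty.
        create_if_missing(&memory.join(name), "")?;
    }
    Ok(())
}
```

Because create_new checks for existence and creates the file in one step, a second run cannot clobber a MEMORY.md the agent has already edited.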
recall-echo status
Health check with a dashboard showing memory usage, ephemeral state, archive count, recent sessions, and health assessment. Color-coded bars show MEMORY.md capacity (green → yellow → red at 75% / 90%).
recall-echo — healthy
MEMORY.md: 142/200 lines (71%)
EPHEMERAL.md: 3 entries
Archives: 23 conversations
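The capacity bar thresholds can be illustrated with a small sketch. The names `BarColor`, `capacity_percent`, and `bar_color` are hypothetical, and treating the 75% and 90% boundaries as inclusive is an assumption.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BarColor {
    Green,
    Yellow,
    Red,
}

/// Whole-number percentage of the line budget in use (rounded down).
pub fn capacity_percent(lines: usize, max_lines: usize) -> usize {
    if max_lines == 0 {
        return 100; // treat a zero budget as full
    }
    lines * 100 / max_lines
}

/// Assumption: thresholds are inclusive, so exactly 75% is already yellow.
pub fn bar_color(percent: usize) -> BarColor {
    match percent {
        p if p >= 90 => BarColor::Red,
        p if p >= 75 => BarColor::Yellow,
        _ => BarColor::Green,
    }
}
```

With the sample output above, 142 of 200 lines gives 71%, which falls in the green band.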
recall-echo search
Search conversation archives.
Ranked search scores files by match count, word coverage, and recency.
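To make the three ranking signals concrete, here is one possible scoring function. The weights, the 30-day recency scale, and the name `score_archive` are illustrative assumptions; the source only states that match count, word coverage, and recency all contribute.

```rust
/// Score one archive for a query. Higher is better; 0.0 means no match.
/// `age_days` is how long ago the conversation was archived.
pub fn score_archive(content: &str, query: &str, age_days: f64) -> f64 {
    let haystack = content.to_lowercase();
    let terms: Vec<String> = query
        .split_whitespace()
        .map(|t| t.to_lowercase())
        .collect();
    if terms.is_empty() {
        return 0.0;
    }

    let mut match_count = 0usize; // total occurrences of any query word
    let mut terms_found = 0usize; // distinct query words that appear at all
    for term in &terms {
        let hits = haystack.matches(term.as_str()).count();
        match_count += hits;
        if hits > 0 {
            terms_found += 1;
        }
    }
    if match_count == 0 {
        return 0.0;
    }

    let coverage = terms_found as f64 / terms.len() as f64;
    // Assumed decay: a 30-day-old archive gets half the recency bonus.
    let recency = 1.0 / (1.0 + age_days.max(0.0) / 30.0);
    // Log-damped match count so one very repetitive file cannot dominate.
    (match_count as f64).ln_1p() + 2.0 * coverage + recency
}
```

Sorting archives by this score in descending order gives a ranked result list; files scoring 0.0 can be dropped.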
recall-echo distill
Analyze MEMORY.md and suggest cleanup. Identifies sections over 30 lines that could be extracted to topic files (e.g., memory/debugging.md) with references left in MEMORY.md.
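The section-size check can be sketched as a single pass over the file. `oversized_sections` is a hypothetical helper; it assumes a section starts at any line beginning with `#`, counts every following line (including blank ones) until the next heading, and does not special-case `#` lines inside code fences.

```rust
/// Return (heading, line_count) for every section longer than `max_lines`.
pub fn oversized_sections(markdown: &str, max_lines: usize) -> Vec<(String, usize)> {
    let mut sections: Vec<(String, usize)> = Vec::new();
    for line in markdown.lines() {
        if line.starts_with('#') {
            // New section: strip the leading '#' markers from the title.
            let title = line.trim_start_matches('#').trim().to_string();
            sections.push((title, 0));
        } else if let Some(current) = sections.last_mut() {
            // Body line of the most recent section.
            current.1 += 1;
        }
        // Lines before the first heading belong to no section and are ignored.
    }
    sections
        .into_iter()
        .filter(|(_, count)| *count > max_lines)
        .collect()
}
```

Calling `oversized_sections(&memory_md, 30)` returns the candidates that could be moved into topic files.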
recall-echo consume
Output EPHEMERAL.md content wrapped in memory markers. Used by hooks or scripts that need to inject recent session context into an agent's input.
recall-echo archive-session
Archive a Claude Code session from a JSONL transcript. Extracts messages, generates a summary (LLM-powered when available, algorithmic fallback), updates ARCHIVE.md, and appends to EPHEMERAL.md. Designed to run as a SessionEnd hook.
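The append-then-trim behaviour of EPHEMERAL.md amounts to a bounded FIFO queue. The sketch below models it in memory only; `EphemeralWindow` is a hypothetical type, and how recall-echo actually parses and rewrites the file is not shown.

```rust
use std::collections::VecDeque;

/// In-memory model of the rolling window of session summaries.
pub struct EphemeralWindow {
    max_entries: usize,
    entries: VecDeque<String>,
}

impl EphemeralWindow {
    /// `max_entries` corresponds to the rolling window size (default 5).
    pub fn new(max_entries: usize) -> Self {
        Self {
            max_entries: max_entries.max(1),
            entries: VecDeque::new(),
        }
    }

    /// Append the newest summary, then drop the oldest beyond the limit.
    pub fn push(&mut self, summary: impl Into<String>) {
        self.entries.push_back(summary.into());
        while self.entries.len() > self.max_entries {
            self.entries.pop_front();
        }
    }

    /// Summaries from oldest to newest.
    pub fn entries(&self) -> Vec<&str> {
        self.entries.iter().map(String::as_str).collect()
    }
}
```

After seven sessions with a window of five, only sessions 3 through 7 remain; the older two are still reachable through ARCHIVE.md and conversations/.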
recall-echo checkpoint
Save a checkpoint before context compaction. Creates a numbered checkpoint file so the agent can fill in summary details. Designed to run as a PreCompact hook.
recall-echo graph
Knowledge graph operations. See the Architecture section above for the full command list.
Search & traversal:
- graph search <query>: Semantic search across entities. Supports --limit, --type (filter by entity type), and --keyword (filter by name/abstract).
- graph query <query>: Hybrid query combining semantic search, confidence-weighted graph expansion, and optional episode retrieval. Supports --depth (expansion depth, default 1; 0 = semantic only), --episodes (include episode results), --limit, --type, and --keyword.
- graph traverse <entity>: DFS traversal from a named entity with cycle detection. Displays confidence percentages on edges (e.g. [85%]). Edges below 0.1 confidence are filtered out. Supports --depth (default 2) and --type-filter.
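The behaviour described for graph traverse (depth-first, cycle-safe, weak edges dropped, depth-limited) can be sketched over a plain adjacency list. This is an illustration under assumptions: the `Graph` alias, the `traverse` signature, and the in-memory representation are hypothetical and unrelated to how recall-echo queries SurrealDB.

```rust
use std::collections::{HashMap, HashSet};

/// Adjacency list: entity name -> outgoing (target, confidence) edges.
pub type Graph = HashMap<String, Vec<(String, f64)>>;

/// Edges below this confidence are skipped, per the 0.1 cutoff.
pub const MIN_CONFIDENCE: f64 = 0.1;

/// Depth-first walk from `start`; returns (depth, entity, edge confidence)
/// in visit order.
pub fn traverse(graph: &Graph, start: &str, max_depth: usize) -> Vec<(usize, String, f64)> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut out = Vec::new();
    visited.insert(start.to_string());
    visit(graph, start, 0, max_depth, &mut visited, &mut out);
    out
}

fn visit(
    graph: &Graph,
    node: &str,
    depth: usize,
    max_depth: usize,
    visited: &mut HashSet<String>,
    out: &mut Vec<(usize, String, f64)>,
) {
    if depth >= max_depth {
        return;
    }
    if let Some(edges) = graph.get(node) {
        for (target, confidence) in edges {
            // Skip weak edges, and skip anything already seen (cycle detection):
            // HashSet::insert returns false when the value was present.
            if *confidence < MIN_CONFIDENCE || !visited.insert(target.clone()) {
                continue;
            }
            out.push((depth + 1, target.clone(), *confidence));
            visit(graph, target, depth + 1, max_depth, visited, out);
        }
    }
}
```

A shared visited set means each entity is reported once, at the first path that reaches it; an implementation that wanted all paths would track the current path instead.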
Data management:
- graph add-entity: Manually add an entity. Requires --name, --type, --abstract. Supports --overview and --source.
- graph relate <from> --rel <type> --target <to>: Create a relationship between two entities. Supports --description and --source.
- graph ingest <archive>: Ingest a single archive file (creates episodes, no LLM required).
- graph ingest-all: Scan conversations/ and ingest all un-ingested archives.
- graph extract: LLM-powered entity extraction. Supports --log <N> (single archive), --all (all un-extracted), --dry-run, --model, --provider (anthropic or openai), --delay-ms.
Pipeline & integrations:
- graph pipeline sync: Sync pipeline documents (LEARNING.md, THOUGHTS.md, CURIOSITY.md, REFLECTIONS.md, PRAXIS.md) into the graph. Idempotent: diffs parsed entries against existing graph entities.
- graph pipeline status: Pipeline health with staleness tracking.
- graph pipeline flow <entity>: Trace an entity's lineage through the pipeline stages.
- graph pipeline stale: List stale pipeline entities. Supports --days (threshold, default 7).
- graph vigil-sync: Sync vigil-pulse metacognitive signals and caliber outcomes into the graph as Measurement and Outcome entities. Supports --signals-path and --outcomes-path.
Archive Format
Conversation archives use YAML frontmatter with markdown content:
---
log: 5
date: "2026-03-06T10:30:00Z"
session_id: "abc123"
message_count: 34
duration: "30m"
source: "session"
topics:
---
## Summary
Summary of the conversation with key outcomes.
**Decisions**: Chose JWT for authentication.
**Action Items**: Implement token refresh endpoint.
### User
(message content)
### Assistant
(message content)
## Tags
**Files**: src/auth.rs, src/middleware.rs
**Tools**: Read, Edit, Bash
Summaries are LLM-generated when a provider is available (via pulse-null), with silent fallback to algorithmic extraction.
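For scripts that read archives directly, the frontmatter can be separated from the markdown body with the standard library alone. The helpers `split_frontmatter` and `frontmatter_value` below are hypothetical and only handle flat `key: value` lines with Unix line endings; a real consumer would more likely use a YAML parser.

```rust
/// Split an archive into (frontmatter, body). Returns None when the text
/// does not start with a `---` delimited block.
pub fn split_frontmatter(archive: &str) -> Option<(&str, &str)> {
    let rest = archive.strip_prefix("---\n")?;
    let end = rest.find("\n---\n")?;
    // Skip the 5 bytes of the closing "\n---\n" delimiter.
    Some((&rest[..end], &rest[end + 5..]))
}

/// Look up a top-level scalar such as `log` or `session_id`.
/// Surrounding double quotes are stripped from the value.
pub fn frontmatter_value<'a>(frontmatter: &'a str, key: &str) -> Option<&'a str> {
    frontmatter.lines().find_map(|line| {
        // split_once cuts at the first ':', so timestamps keep their colons.
        let (k, v) = line.split_once(':')?;
        if k.trim() == key {
            Some(v.trim().trim_matches('"'))
        } else {
            None
        }
    })
}
```

With the example archive above, `frontmatter_value(front, "log")` yields `Some("5")` and the body begins at the `## Summary` heading.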
Configuration
Optional .recall-echo.toml in the memory directory:
[ephemeral]
max_entries = 5 # Rolling window size (1-50, default 5)
[llm]
provider = "claude-code" # LLM provider: "claude", "claude-code", or "ollama"
model = "" # Model name (provider default if empty)
api_base = "" # Custom API base URL (provider default if empty)
[pipeline]
docs_dir = "/path/to/journal" # Directory containing pipeline documents
auto_sync = true # Auto-sync pipeline docs to graph on archive
| Section | Key | Default | Description |
|---|---|---|---|
| ephemeral | max_entries | 5 | Rolling window size for session summaries (1-50) |
| llm | provider | claude | LLM backend for summarization (claude, claude-code, ollama) |
| llm | model | provider default | Model name |
| llm | api_base | provider default | Custom API base URL |
| pipeline | docs_dir | (none) | Path to pipeline documents (LEARNING.md, THOUGHTS.md, etc.) |
| pipeline | auto_sync | false | Sync pipeline documents to the knowledge graph on archive |
All settings have sensible defaults. A missing file or an invalid value falls back to the default silently.
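The silent-fallback rule could be implemented along these lines. The function names are hypothetical, and one detail is an assumption: an out-of-range max_entries is replaced by the default of 5 here, not clamped to the nearest bound.

```rust
pub const DEFAULT_MAX_ENTRIES: usize = 5;
pub const DEFAULT_PROVIDER: &str = "claude";

/// Resolve ephemeral.max_entries: accept values within 1..=50,
/// otherwise fall back to the default without reporting an error.
pub fn resolve_max_entries(raw: Option<i64>) -> usize {
    match raw {
        Some(n) if (1..=50).contains(&n) => n as usize,
        _ => DEFAULT_MAX_ENTRIES, // missing key or invalid value
    }
}

/// Resolve llm.provider: only the three documented backends are accepted.
pub fn resolve_provider(raw: Option<&str>) -> &'static str {
    match raw {
        Some("claude") => "claude",
        Some("claude-code") => "claude-code",
        Some("ollama") => "ollama",
        _ => DEFAULT_PROVIDER,
    }
}
```

Falling back silently keeps a typo in .recall-echo.toml from blocking archival, at the cost that a misconfigured value goes unnoticed.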
Contributing
See CONTRIBUTING.md for branch naming, commit conventions, and workflow.