---
aliases:
  - Memory System
  - Memory Pipeline
  - Semantic Memory
tags:
  - sdd
  - spec
  - memory
  - persistence
  - contract
created: 2026-04-08
status: approved
related:
  - "[[MOC-specs]]"
  - "[[001-system-invariants/spec#6. Memory Pipeline Contract]]"
  - "[[002-agent-loop/spec]]"
  - "[[004-6-graph-memory]]"
  - "[[004-16-shadow-memory-safety]]"
  - "[[004-17-implicit-conflict-detection]]"
  - "[[004-18-five-signal-retrieval]]"
  - "[[012-graph-memory/spec]]"
  - "[[031-database-abstraction/spec]]"
---

# Spec: Memory System (Parent Index)

> [!info]
> SQLite + Qdrant dual backend, semantic response cache, anchored summarization,
> compaction probe, importance scoring, admission control, and cost-sensitive routing.

## Overview

This is the **parent specification** for the memory subsystem. For detailed information on
specific areas, refer to the child specs below.

---

## Child Specifications

| Spec | Topic | Purpose |
|------|-------|---------|
| [[004-1-architecture]] | Core Pipeline | Conversation storage, message lifecycle, recall architecture |
| [[004-2-compaction]] | Deferred Summaries | Tool pair summarization, context pressure thresholds, compaction probe |
| [[004-3-admission-control]] | A-MAC & Filtering | Six-factor importance scoring, admission gates, noise filtering |
| [[004-4-embeddings]] | Embedding Generation | Batch strategies, backfill, concurrent workers, TUI integration |
| [[004-5-temporal-decay]] | Retention Scoring | Ebbinghaus forgetting curve, access frequency, decay-based eviction |
| [[004-6-graph-memory]] | Graph Memory | Entity graph, BFS recall, MAGMA typed edges, SYNAPSE spreading activation, A-MEM link weights |

---

## System Architecture

```
SemanticMemory (Arc)
├── SqliteStore         — conversation history, message metadata
├── QdrantStore         — vector embeddings for semantic search
├── GraphStore          — entity/edge graph, see [[004-6-graph-memory]]
└── ResponseCache       — deduplicated LLM response cache
```

---

## Key Contracts

### Message Storage
- Every user + assistant turn persisted to SQLite immediately
- Messages are never deleted — only marked with `compacted_at` or summarized
- `MessageMetadata`: `agent_visible`, `user_visible`, `focus_pinned` — all respected
- Conversation identified by `ConversationId`; one per agent session

### Admission Control
- Not all messages admitted to memory (noise filtering via A-MAC)
- Six-factor scoring: recency, relevance, tool-use, entity-density, length, frequency
- Frequency factor: entity mention count with temporal decay (ref. [[004-3-admission-control]])
- Threshold-based gate: score < threshold → rejected (returns None)
- Fail-open: admission error → admit message anyway
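
The gate and its fail-open behavior can be sketched as follows. This is a minimal illustration, not the `zeph-memory` implementation: the `Factors` struct, the `ScoreError` type, the unweighted mean, and the neutral score of 1.0 on error are all assumptions made for the example.

```rust
/// The six factors named above, each assumed to be normalized to [0.0, 1.0].
#[derive(Debug, Clone, Copy)]
pub struct Factors {
    pub recency: f32,
    pub relevance: f32,
    pub tool_use: f32,
    pub entity_density: f32,
    pub length: f32,
    pub frequency: f32,
}

/// Hypothetical scoring failure (for example, a non-finite factor value).
#[derive(Debug)]
pub struct ScoreError;

/// Unweighted mean of the six factors (assumption: real weights may differ).
pub fn score(f: &Factors) -> Result<f32, ScoreError> {
    let parts = [
        f.recency,
        f.relevance,
        f.tool_use,
        f.entity_density,
        f.length,
        f.frequency,
    ];
    if parts.iter().any(|p| !p.is_finite()) {
        return Err(ScoreError);
    }
    Ok(parts.iter().sum::<f32>() / parts.len() as f32)
}

/// Returns `Some(score)` when the message is admitted and `None` when rejected.
/// Fail-open: a scoring error admits the message with a neutral score of 1.0.
pub fn admit(f: &Factors, threshold: f32) -> Option<f32> {
    match score(f) {
        Ok(s) if s < threshold => None,
        Ok(s) => Some(s),
        Err(_) => Some(1.0),
    }
}
```

Returning `Option` keeps rejection distinct from failure: callers see `None` only for a genuine below-threshold score, never for an internal error.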

### Compaction & Eviction
- Soft threshold (~60%) marks tool pairs for summary
- Hard threshold (~90%) applies summaries before LLM call
- Eviction prioritizes low-retention-score messages (Ebbinghaus model)
- Original messages stored in SQLite even after compaction
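
A rough sketch of the two-threshold check, assuming pressure is measured as used tokens over the context budget. The `Pressure` enum, the constant names, and the exact cutoffs (the spec gives only approximate values) are hypothetical.

```rust
/// Context pressure levels implied by the soft and hard thresholds above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pressure {
    /// Below the soft threshold: nothing to do.
    Normal,
    /// At or above the soft threshold: mark tool pairs for summary.
    Soft,
    /// At or above the hard threshold: apply summaries before the LLM call.
    Hard,
}

// Assumed exact values for the approximate thresholds quoted in the spec.
pub const SOFT_THRESHOLD: f64 = 0.60;
pub const HARD_THRESHOLD: f64 = 0.90;

/// Classifies context pressure from token usage.
/// A zero budget is treated as maximum pressure (assumption).
pub fn classify_pressure(used_tokens: usize, budget_tokens: usize) -> Pressure {
    if budget_tokens == 0 {
        return Pressure::Hard;
    }
    let ratio = used_tokens as f64 / budget_tokens as f64;
    if ratio >= HARD_THRESHOLD {
        Pressure::Hard
    } else if ratio >= SOFT_THRESHOLD {
        Pressure::Soft
    } else {
        Pressure::Normal
    }
}
```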

### Embedding Pipeline
- All admitted messages queued for embedding (async)
- Batched embedding with configurable batch size and timeout
- Backfill at boot recovers unembedded messages
- TUI shows queue depth, batch status, backfill progress

### Retention Scoring
- Based on Ebbinghaus forgetting curve: `R(t) = e^(-t / halflife)`
- Boosted by access frequency (messages accessed more often decay slower)
- Scores lie in [0.0, 1.0]: 1.0 for fresh, frequently accessed messages; 0.0 for old, never-accessed ones
- Drives eviction and (optionally) admission decisions
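
The scoring rule can be illustrated with a short sketch. The decay term follows the formula above. The access boost (stretching the time constant by `1 + ln(1 + access_count)`) is an assumption chosen for illustration, since the spec does not state the exact form. Note that with `e^(-t / halflife)` as written, retention at `t = halflife` is 1/e (about 0.37) rather than 0.5, so the parameter acts as a time constant. A strict half-life would need an extra `ln 2` factor.

```rust
/// Retention score in [0.0, 1.0] for a message of the given age.
///
/// `halflife_secs` is used exactly as in `R(t) = e^(-t / halflife)`.
/// The logarithmic access boost is a hypothetical choice, not the spec's.
pub fn retention(age_secs: f64, halflife_secs: f64, access_count: u32) -> f64 {
    if halflife_secs <= 0.0 {
        // Degenerate configuration: treat everything as fully decayed.
        return 0.0;
    }
    // More accesses stretch the time constant, so the score decays slower.
    let boost = 1.0 + (1.0 + f64::from(access_count)).ln();
    let effective = halflife_secs * boost;
    let age = age_secs.max(0.0);
    (-(age / effective)).exp().clamp(0.0, 1.0)
}
```

A message accessed five times has its time constant stretched by roughly 2.8x under this assumption, so it scores higher than an unaccessed message of the same age.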

### Tier-Aware Retrieval Gating

Memory is organized in three tiers; retrieval strategy adapts to query complexity:

**Shallow queries** (simple entity lookup, single-turn context, dependency depth < 2 hops)
- Search: SQLite working + episodic stores only
- Outcome: fast, low token cost, lower p95 latency

**Medium queries** (multi-turn reasoning, tool-output dependencies)
- Search: episodic (SQLite) + semantic vectors (Qdrant with MMR reranking)
- Outcome: balanced accuracy and cost

**Deep queries** (complex reasoning, cross-session patterns, causal inference)
- Search: all three tiers (working + episodic + semantic), aggregate by relevance
- Outcome: highest recall, higher token cost

Complexity is classified via [[024-complexity-triage-routing]]; the same signal drives memory tier selection.
See [[004-14-memory-tiering-rfc-decision]] for design rationale.
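
The mapping from query complexity to searched stores can be sketched as a simple lookup. The `Complexity` and `Store` enums and the `stores_for` function are hypothetical names for this example. Reranking and relevance aggregation are omitted.

```rust
/// Query complexity classes described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Complexity {
    Shallow,
    Medium,
    Deep,
}

/// The three memory tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Store {
    /// SQLite working store.
    Working,
    /// SQLite episodic store.
    Episodic,
    /// Qdrant semantic vectors.
    Semantic,
}

/// Returns the stores to search for a query of the given complexity.
pub fn stores_for(complexity: Complexity) -> &'static [Store] {
    match complexity {
        Complexity::Shallow => &[Store::Working, Store::Episodic],
        Complexity::Medium => &[Store::Episodic, Store::Semantic],
        Complexity::Deep => &[Store::Working, Store::Episodic, Store::Semantic],
    }
}
```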

---

## Skill Promotion Pathways

Memory clusters and dense graph regions can be automatically promoted to the skills layer via multiple pathways (see [[004-15-memory-skill-coevolution-rfc-decision]]):

### HeLa-Mem Consolidation Path (Implemented)
Periodic daemon identifies high-weight clusters in the episodic memory graph:
- Query: `SELECT node_id, degree, AVG(weight) FROM ... GROUP BY node_id HAVING degree * AVG(weight) > consolidation_threshold`
- Collect neighboring node summaries; pass to `consolidate_provider` LLM
- LLM output → stored as `PersistentRule` or enqueued as skill draft
- See [[004-11-memory-hela-mem]] §3.4 for implementation details
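
The selection criterion in the query above can be mirrored in memory as a sketch. The `Edge` tuple layout and the function name are assumptions for illustration. The real daemon runs this filter in SQL.

```rust
use std::collections::HashMap;

/// Hypothetical edge record: (node_id, neighbor_id, weight).
pub type Edge = (u64, u64, f64);

/// Returns node ids whose `degree * AVG(weight)` exceeds the threshold,
/// sorted ascending for deterministic output.
pub fn consolidation_candidates(edges: &[Edge], consolidation_threshold: f64) -> Vec<u64> {
    // node_id -> (degree, sum of edge weights)
    let mut stats: HashMap<u64, (u32, f64)> = HashMap::new();
    for &(node, _neighbor, weight) in edges {
        let entry = stats.entry(node).or_insert((0, 0.0));
        entry.0 += 1;
        entry.1 += weight;
    }

    let mut candidates: Vec<u64> = stats
        .into_iter()
        .filter(|&(_, (degree, weight_sum))| {
            // Written as degree * average to mirror the query; this product
            // is equal to the plain sum of the node's edge weights.
            let average = weight_sum / f64::from(degree);
            f64::from(degree) * average > consolidation_threshold
        })
        .map(|(node, _)| node)
        .collect();
    candidates.sort_unstable();
    candidates
}
```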

### Cognitive Folding Path (RFC #4218, P2)
Idle-time memory reorganization via clustering on co-occurrence patterns:
- Triggers when the agent has been idle longer than `idle_window_ms` (default 5000 ms)
- Extract dense subgraphs via community detection (edge weight > `clustering_threshold`)
- Diversity check: skip homogeneous clusters (entropy threshold)
- Candidates → skill drafts or new episodic entities
- See [[004-11-memory-hela-mem]] §3.6 for details
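
One way to picture the diversity check is Shannon entropy over a categorical label attached to each cluster member. This is only a sketch: what is actually measured (entity type, source conversation, or something else) is not stated here, so the string labels and both function names are assumptions.

```rust
use std::collections::HashMap;

/// Shannon entropy (in bits) of the label distribution within a cluster.
pub fn shannon_entropy(labels: &[&str]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }
    let total = labels.len() as f64;
    counts
        .values()
        .map(|&count| {
            let p = count as f64 / total;
            -p * p.log2()
        })
        .sum()
}

/// A cluster is kept as a candidate only if its entropy reaches the threshold;
/// homogeneous clusters (entropy near zero) are skipped.
pub fn is_diverse(labels: &[&str], entropy_threshold: f64) -> bool {
    shannon_entropy(labels) >= entropy_threshold
}
```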

### Future: MemQ Value Learning Path (RFC #4218, P3)
When [[042-experiments]] framework matures:
- Track `(memory_fact_id, retrieval_context) → outcome` tuples
- Compute Q-values via temporal difference learning
- High-Q facts promoted based on retrieval-context utility
- Tracked in issue #4042
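
Since this path is not yet designed, the following is only a sketch of the simplest temporal-difference update it could use. It treats each retrieval as a terminal transition with no bootstrapped next-state value, so the update reduces to `Q <- Q + alpha * (outcome - Q)`. The `QTable` type, its method names, and the learning rate `alpha` are hypothetical.

```rust
use std::collections::HashMap;

/// Q-values keyed by (memory_fact_id, retrieval_context).
pub struct QTable {
    alpha: f64,
    values: HashMap<(u64, String), f64>,
}

impl QTable {
    /// `alpha` is the learning rate, expected to be in (0.0, 1.0].
    pub fn new(alpha: f64) -> Self {
        Self {
            alpha,
            values: HashMap::new(),
        }
    }

    /// Current estimate; unseen pairs default to 0.0.
    pub fn value(&self, fact_id: u64, context: &str) -> f64 {
        self.values
            .get(&(fact_id, context.to_owned()))
            .copied()
            .unwrap_or(0.0)
    }

    /// Moves the estimate toward the observed outcome and returns the new value.
    pub fn observe(&mut self, fact_id: u64, context: &str, outcome: f64) -> f64 {
        let alpha = self.alpha;
        let q = self
            .values
            .entry((fact_id, context.to_owned()))
            .or_insert(0.0);
        let updated = *q + alpha * (outcome - *q);
        *q = updated;
        updated
    }
}
```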

All promotion pathways terminate in the skill registry (see [[005-skills/spec]]) via the `draft_skill()` pathway. No new skill storage infrastructure is required.

---

## Experience Compression Spectrum

`[memory.compression_spectrum]` (disabled by default, #3305, #3350): introduces
`CompressionLevel` (Episodic / Procedural / Declarative) and a `RetrievalPolicy` that
skips episodic recall when the token budget is below configurable thresholds. A background
`PromotionEngine` scans recent episodic memory and promotes repeated patterns to SKILL.md
entries (off hot path, via `JoinSet`).
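
The budget-gated policy can be sketched as below. `CompressionLevel` and `RetrievalPolicy` are named in the spec, but the `min_episodic_budget` field, the `levels_for` method, and the single-threshold shape are assumptions made for illustration.

```rust
/// Compression levels introduced by the spectrum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionLevel {
    Episodic,
    Procedural,
    Declarative,
}

/// Hypothetical policy with a single configurable token threshold.
pub struct RetrievalPolicy {
    /// Below this remaining token budget, episodic recall is skipped.
    pub min_episodic_budget: usize,
}

impl RetrievalPolicy {
    /// Returns the levels to recall from for the remaining token budget.
    /// Episodic recall is included only when the budget allows it.
    pub fn levels_for(&self, remaining_tokens: usize) -> Vec<CompressionLevel> {
        let mut levels = vec![
            CompressionLevel::Declarative,
            CompressionLevel::Procedural,
        ];
        if remaining_tokens >= self.min_episodic_budget {
            levels.push(CompressionLevel::Episodic);
        }
        levels
    }
}
```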

`ExperienceStore` records tool outcomes fire-and-forget via `TaskClass::Telemetry`;
evolution sweep runs every N user turns; both gate on `memory.graph.experience.enabled`
with zero overhead when disabled (#3318, #3349).

### Key Invariants

- `PromotionEngine` runs off the hot path — NEVER on the agent turn thread
- `ExperienceStore` wiring must be guarded by `memory.graph.experience.enabled`
- `MemoryError::Promotion` is a distinct error variant in `zeph-memory` (thiserror, no anyhow)

## MemFlow Tiered Retrieval (#3791, arXiv:2605.03312)

Intent-driven tiered retrieval with three depth tiers, controlled by an LLM-based classifier and validator.

| Tier | Intent | Retrieval Scope |
|------|--------|----------------|
| `ProfileLookup` | Simple entity/fact lookup | SQLite working store only |
| `TargetedRetrieval` | Multi-turn reasoning | Episodic + Qdrant semantic |
| `DeepReasoning` | Complex cross-session inference | All tiers + graph traversal |

- Classifier LLM call determines the tier before retrieval; validator LLM call verifies the result post-retrieval
- Both calls route via configurable `*_provider` fields (multi-model pattern)
- Fail-open heuristic: on classifier error or timeout → default to `TargetedRetrieval`
- Disabled by default: `[memory.memflow] enabled = false`
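
The fail-open rule can be sketched in a few lines. The tier names come from the table above. `ClassifierError` and `resolve_tier` are hypothetical names, and the classifier call itself is left out.

```rust
/// Retrieval depth tiers from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetrievalTier {
    ProfileLookup,
    TargetedRetrieval,
    DeepReasoning,
}

/// Hypothetical classifier failure modes.
#[derive(Debug)]
pub enum ClassifierError {
    Timeout,
    Provider(String),
}

/// Fail-open: any classifier error falls back to `TargetedRetrieval`,
/// so a failing or slow classifier never blocks recall.
pub fn resolve_tier(result: Result<RetrievalTier, ClassifierError>) -> RetrievalTier {
    result.unwrap_or(RetrievalTier::TargetedRetrieval)
}
```

Defaulting to the middle tier bounds the cost of a misclassification in both directions: it avoids the token cost of `DeepReasoning` while still consulting the semantic store.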

### Config

```toml
[memory.memflow]
enabled = false
classifier_provider = ""   # [[llm.providers]] name; empty = primary
validator_provider  = ""
```

---

## ScrapMem Optical Forgetting (#3791, arXiv:2605.03804)

Progressive `ContentFidelity` decay for messages that have not been accessed recently,
combined with an Episodic Memory Graph (EM-Graph) for causal-temporal event linking.

| Fidelity Level | Storage | Description |
|----------------|---------|-------------|
| `Full` | Complete content | No decay applied |
| `Compressed` | Summarized form | Low-access messages; summary generated at decay point |
| `SummaryOnly` | Brief summary | Very low-access; original tokens freed |

- EM-Graph edges link events by causal and temporal proximity; used for context-aware decay decisions
- Decay is driven by a background loop (`optical_forgetting_loop`) that runs off the hot path
- Disabled by default: `[memory.scrap_mem] enabled = false`

### Key Invariants

- `optical_forgetting_loop` MUST NOT run on the agent turn thread
- Decay is irreversible within a session; original content is not restored on access
- EM-Graph edges persist in SQLite (episodic graph table), so decay state survives restarts
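
A single decay step that respects the irreversibility invariant might look like the sketch below. `ContentFidelity` and its variants come from the table above. The idle-time cutoffs, the parameter names, and driving decay purely from idle seconds (ignoring EM-Graph context) are simplifying assumptions.

```rust
/// Fidelity levels, ordered from least to most decayed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ContentFidelity {
    Full,
    Compressed,
    SummaryOnly,
}

/// Computes the next fidelity level for a message.
///
/// `compress_after_secs` and `summary_after_secs` are hypothetical idle-time
/// cutoffs. The result never moves back toward `Full`, which models the
/// rule that decay is not undone when a message is accessed again.
pub fn decay_step(
    current: ContentFidelity,
    idle_secs: u64,
    compress_after_secs: u64,
    summary_after_secs: u64,
) -> ContentFidelity {
    let target = if idle_secs >= summary_after_secs {
        ContentFidelity::SummaryOnly
    } else if idle_secs >= compress_after_secs {
        ContentFidelity::Compressed
    } else {
        ContentFidelity::Full
    };
    // Derived `Ord` follows declaration order, so `max` picks the more
    // decayed of the current and target levels.
    current.max(target)
}
```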

---

## Tiered Recall (`recall_tiered`) Wired to Agent Loop (#3968)

`recall_tiered` and `optical_forgetting_loop` are now wired into the production agent loop
(both were previously implemented but never called from `zeph-core`). When MemFlow is enabled,
`ContextAssembler::gather()` calls `recall_tiered` as the default semantic recall path.
When MemFlow is disabled, the prior `recall_semantic` path is used unchanged.

---

## Sub-Specifications

| Sub-spec | Feature |
|---|---|
| [[004-10-memory-memmachine-retrieval]] | MemMachine retrieval depth, query bias correction, episode preservation |
| [[004-11-memory-hela-mem]] | HeLa-Mem Hebbian edge weights, consolidation, spreading activation |
| [[004-12-memory-reasoning-bank]] | ReasoningBank distilled strategy memory, self-judge pipeline |
| [[004-16-shadow-memory-safety]] | Shadow Memory Safety — trajectory-level attack defense (MAGE, issue #3695) |
| [[004-17-implicit-conflict-detection]] | Implicit Conflict Detection — STALE/CUPMem fuzzy predicate matching and propagation-aware SYNAPSE recall (issue #3702) |
| [[004-18-five-signal-retrieval]] | Five-Signal Retrieval — access frequency, causal distance, novelty signals + async consolidation daemon (MemTier, issue #3703) |

## Integration Points

- [[002-agent-loop/spec]] — context assembly calls recall pipeline
- [[001-system-invariants/spec]] — memory pipeline contract
- [[012-graph-memory/spec]] — optional graph-based entity tracking
- [[031-database-abstraction/spec]] — SQLite persistence layer

---

## Sources

### External
- **A-MEM** (NeurIPS 2025) — agentic write-time memory linking: https://arxiv.org/abs/2502.12110
- **Zep: Temporal Knowledge Graph** (Jan 2025) — temporal edges, LongMemEval +18.5%: https://arxiv.org/abs/2501.13956
- **TA-Mem** (Mar 2026) — adaptive retrieval dispatch: https://arxiv.org/abs/2603.09297
- **Episodic-to-Semantic Memory Promotion** (Jan 2025): https://arxiv.org/pdf/2501.11739
- **MAGMA** (Jan 2026) — multi-graph agent memory: https://arxiv.org/abs/2601.03236
- **Context Engineering in Manus** (Oct 2025) — tool output reference pattern: https://rlancemartin.github.io/2025/10/15/manus/
- **Structured Anchored Summarization** (Factory.ai, 2025) — typed schemas: https://factory.ai/news/compressing-context

### Internal
| File | Contents |
|---|---|
| `crates/zeph-memory/src/semantic/mod.rs` | `SemanticMemory`, recall pipeline, compaction |
| `crates/zeph-memory/src/graph/mod.rs` | Graph memory integration |
| `crates/zeph-llm/src/provider.rs` | `MessagePart`, `MessageMetadata` definitions |
| `crates/zeph-core/src/agent/mod.rs` | `MemoryState`, deferred summary apply logic |

---

## See Also

- [[MOC-specs]] — Master index of all specifications
- [[001-system-invariants/spec]] — System-wide non-negotiable rules