ai-memory 0.6.1

# Developer Guide

## Architecture Overview

`ai-memory` is an AI-agnostic memory management system built as a single Rust binary that serves three roles:

1. **MCP tool server** -- stdio JSON-RPC server exposing 21 memory tools + 2 MCP prompts for any MCP-compatible AI client (Claude AI, OpenAI ChatGPT, xAI Grok, META Llama, and others)
2. **CLI tool** -- direct SQLite operations for store, recall, search, list, etc. (completely AI-agnostic)
3. **HTTP daemon** -- an Axum web server exposing the same operations as a REST API with 24 endpoints (completely AI-agnostic)

**Key architectural features:** Zero token cost (no context loaded until recall), TOON compact default response format (79% smaller than JSON), MCP prompts capability (`recall-first` behavioral rules + `memory-workflow` reference card), 4 feature tiers with optional local LLMs via Ollama, true dedup on title+namespace, 6-factor recall scoring with score field in responses.

All three interfaces share the same database layer (`db.rs`) and validation layer (`validate.rs`). The daemon adds automatic garbage collection (every 30 minutes) and graceful shutdown with WAL checkpointing.

```
main.rs          -- CLI parsing (clap), daemon setup (axum), command dispatch (26 commands)
models.rs        -- Data structures: Memory, MemoryLink, query types, constants
handlers.rs      -- HTTP request handlers (Axum extractors + JSON responses), error sanitization
db.rs            -- All SQLite operations: CRUD, FTS5, recall scoring, GC, migration, FTS query sanitization, transactional touch/consolidate
mcp.rs           -- MCP (Model Context Protocol) server over stdio JSON-RPC, 26 tools, notification handling
validate.rs      -- Input validation for all write paths
errors.rs        -- Structured error types (ApiError, MemoryError), error sanitization for HTTP responses
color.rs         -- ANSI color output for CLI (zero dependencies, auto-detects terminal)
config.rs        -- Tier configuration system (keyword, semantic, smart, autonomous), feature gating, TtlConfig, and archive_on_gc
embeddings.rs    -- Embedding pipeline: HuggingFace model loading, vector generation, cosine similarity
llm.rs           -- LLM integration via Ollama for query expansion, auto-tagging, contradiction detection
mine.rs          -- Retroactive conversation import from Claude, ChatGPT, and Slack exports
reranker.rs      -- Hybrid recall algorithm: blends semantic (embedding) and keyword (FTS5) scores
hnsw.rs          -- In-memory HNSW vector index for approximate nearest-neighbor search
```

### Embedding Pipeline (semantic tier and above)

When running at the `semantic` tier or higher, ai-memory loads a HuggingFace embedding model at startup and generates dense vector embeddings for each memory. The pipeline:

1. **Model loading** (`embeddings.rs`) -- downloads and caches a sentence-transformer model from HuggingFace on first run
2. **Embedding generation** -- new memories are embedded at insert time; existing memories are backfilled on first startup with embeddings enabled
3. **Storage** -- embeddings are stored as BLOB columns in the `memories` table (schema migration v3)
4. **Hybrid recall** (`reranker.rs`) -- at recall time, the query is embedded and compared against stored embeddings via cosine similarity, then blended with FTS5 keyword scores to produce a final ranking

**Embedding models:**
- `all-MiniLM-L6-v2` (384 dimensions, ~90 MB) -- used at the `semantic` tier
- `nomic-embed-text-v1.5` (768 dimensions, ~270 MB) -- used at the `smart` and `autonomous` tiers

## Code Structure

### `src/main.rs`

- `Cli` struct with `clap` derive -- defines all CLI commands and global flags (`--db`, `--json`)
- `Command` enum -- `Serve`, `Mcp`, `Store`, `Update`, `Recall`, `Search`, `Get`, `List`, `Delete`, `Promote`, `Forget`, `Link`, `Consolidate`, `Resolve`, `Shell`, `Sync`, `AutoConsolidate`, `Gc`, `Stats`, `Namespaces`, `Export`, `Import`, `Completions`, `Man`, `Mine`, `Archive` (26 commands)
- `StoreArgs` includes `--expires-at` and `--ttl-secs` flags for custom expiration
- `UpdateArgs` includes `--expires-at` flag for setting expiration on existing memories
- `ListArgs` includes `--offset` flag for pagination
- `auto_namespace()` -- detects namespace from git remote URL or directory name
- `human_age()` -- formats ISO timestamps as "2h ago", "3d ago" for CLI output
- `serve()` -- starts the Axum server with all routes (24 endpoints including `POST /memories/{id}/promote` and 4 archive endpoints), spawns GC task, handles graceful shutdown via SIGINT with WAL checkpoint
- `cmd_*()` functions -- one per CLI command, each opens the DB directly

### `src/models.rs`

- `Tier` enum (`Short`, `Mid`, `Long`) with TTL defaults: 6h, 7d, none
- `Memory` struct -- the core data type with 15 fields (includes extensible `metadata` JSON column)
- `MemoryLink` struct -- typed directional links between memories
- Request types: `CreateMemory`, `UpdateMemory`, `SearchQuery`, `ListQuery`, `RecallQuery`, `RecallBody`, `LinkBody`, `ForgetQuery`, `ConsolidateBody`, `ImportBody`
- Response types: `Stats`, `TierCount`, `NamespaceCount`
- `TtlConfig` struct -- per-tier TTL overrides loaded from `config.toml` (`short_ttl_secs`, `mid_ttl_secs`, `long_ttl_secs`, `short_extend_secs`, `mid_extend_secs`)
- `ResolvedTtl` struct -- resolved TTL values after merging config defaults with per-tier overrides
- Constants: `MAX_CONTENT_SIZE` (65536), `PROMOTION_THRESHOLD` (5), `SHORT_TTL_EXTEND_SECS` (3600), `MID_TTL_EXTEND_SECS` (86400)

### `src/mcp.rs`

The MCP (Model Context Protocol) server implementation. MCP is an open standard -- this server works with any MCP-compatible AI client. Runs over stdio, processing one JSON-RPC message per line. Exposes **26 tools**.

- `RpcRequest` / `RpcResponse` / `RpcError` -- JSON-RPC 2.0 types
- `tool_definitions()` -- returns the 23 tool schemas for `tools/list` (includes `memory_capabilities`, `memory_expand_query`, `memory_auto_tag`, `memory_detect_contradiction`, `memory_archive_list`, `memory_archive_restore`, `memory_archive_purge`, `memory_archive_stats`)
  - `memory_recall` schema includes `until` parameter and `format` parameter (enum: `"json"`, `"toon"`, `"toon_compact"`, default: `"toon_compact"`)
  - `memory_search` schema includes `format` parameter (enum: `"json"`, `"toon"`, `"toon_compact"`, default: `"toon_compact"`) and enforces `maximum: 200` on limit
  - `memory_list` schema includes `format` parameter (enum: `"json"`, `"toon"`, `"toon_compact"`, default: `"toon_compact"`) and enforces `maximum: 200` on limit
  - `memory_consolidate` schema enforces `minItems: 2, maxItems: 100` on IDs
  - `memory_update` schema includes `expires_at` parameter
- `handle_store()`, `handle_recall()`, `handle_search()`, `handle_list()`, `handle_delete()`, `handle_promote()`, `handle_forget()`, `handle_stats()`, `handle_update()`, `handle_get()`, `handle_link()`, `handle_get_links()`, `handle_consolidate()`, `handle_archive_list()`, `handle_archive_restore()`, `handle_archive_purge()`, `handle_archive_stats()` -- one handler per tool
- `handle_request()` -- routes JSON-RPC methods: `initialize`, `notifications/initialized`, `tools/list`, `tools/call`, `ping`
- Notification handling: all JSON-RPC notifications (requests without an `id` field) are correctly skipped without sending a response, per the JSON-RPC 2.0 specification
- `run_mcp_server()` -- main loop: reads lines from stdin, parses JSON-RPC, dispatches, writes responses to stdout

Protocol version: `2024-11-05`. All tool responses are wrapped in MCP content blocks (`{"content": [{"type": "text", "text": "..."}]}`). The protocol is AI-agnostic -- any MCP client can connect.

**MCP Prompts:** The server exposes 2 prompts via `prompts/list`:
- **recall-first** -- System prompt with 8 behavioral rules for proactive memory use. Supports an optional `namespace` argument for scoped recall.
- **memory-workflow** -- Quick reference card for all 26 tool usage patterns.

**MCP Error Codes:** The server uses standard JSON-RPC 2.0 error codes:
- `-32700` -- Parse error (malformed JSON)
- `-32600` -- Invalid request (missing required fields)
- `-32601` -- Method not found (unknown JSON-RPC method)
- `-32602` -- Invalid params (bad tool arguments)
- Application-level errors are returned as text in the MCP content block with `"isError": true`, not as JSON-RPC error codes.

### `src/validate.rs`

Input validation for every write path. Called by CLI, HTTP handlers, and MCP handlers.

| Function | Validates |
|----------|-----------|
| `validate_title()` | Non-empty, max 512 bytes, no null bytes |
| `validate_content()` | Non-empty, max 64KB, no null bytes |
| `validate_namespace()` | Non-empty, max 128 bytes, no slashes/spaces/nulls |
| `validate_source()` | Must be one of: user, claude, hook, api, cli, import, consolidation, system |
| `validate_tags()` | Max 50 tags, each max 128 bytes, no empty strings |
| `validate_id()` | Non-empty, max 128 bytes, no null bytes |
| `validate_expires_at()` | Valid RFC3339, not in the past |
| `validate_ttl_secs()` | Positive, max 1 year |
| `validate_relation()` | Must be one of: related_to, supersedes, contradicts, derived_from |
| `validate_confidence()` | Finite number, 0.0 to 1.0 |
| `validate_priority()` | Integer, 1 to 10 |
| `validate_create()` | Full validation for CreateMemory |
| `validate_memory()` | Full validation for Memory (import) |
| `validate_update()` | Validates only present fields |
| `validate_link()` | Validates both IDs, relation, and rejects self-links |
| `validate_consolidate()` | 2-100 IDs, validates title, summary, namespace |

### `src/color.rs`

ANSI color output for CLI -- zero external dependencies. Auto-detects terminal via `std::io::IsTerminal`.

- `init()` -- sets global color flag based on terminal detection
- `short()`, `mid()`, `long()` -- tier-specific colors (red, yellow, green)
- `dim()`, `bold()`, `cyan()` -- semantic colors
- `tier_color()` -- dispatches to tier color by string name
- `priority_bar()` -- renders a 10-character bar (`█████░░░░░`) colored by priority level (green for 8+, yellow for 5-7, red for 1-4)

Colors are suppressed when stdout is not a terminal (e.g., piping to file). The `--json` flag bypasses color output entirely.

### `src/errors.rs`

Structured error types for the HTTP API:

- `ApiError` -- serializable error with `code` and `message` fields
- `MemoryError` enum -- `NotFound`, `ValidationFailed`, `DatabaseError`, `Conflict`
- Implements `IntoResponse` for Axum, mapping to appropriate HTTP status codes
- Implements `From<anyhow::Error>` and `From<rusqlite::Error>`
- **Error sanitization**: `DatabaseError` responses return a generic `"Internal server error"` message to clients, never leaking internal database error details. Detailed errors are logged server-side.

### `src/handlers.rs`

All HTTP handlers for the 24-endpoint REST API. State is `Arc<Mutex<(Connection, PathBuf)>>`. Each handler acquires the lock, validates input, performs DB operations, returns JSON.

Key handlers:
- `create_memory` / `bulk_create` -- memory creation with deduplication (bulk limited to 1,000 items)
- `get_memory` / `list_memories` / `update_memory` / `delete_memory` -- standard CRUD
- `promote_memory` -- `POST /memories/{id}/promote` endpoint for promoting to long-term
- `search` / `recall` -- FTS-powered search with sanitized queries
- `forget` / `consolidate` -- bulk operations
- `import_memories` -- import with 1,000 item limit
- `archive_list` / `archive_restore` / `archive_purge` / `archive_stats` -- archive management endpoints
- All ID path parameters are validated before database access

> **Note:** HTTP handlers are tested via integration tests (`tests/integration.rs`), not unit tests.

### `src/db.rs`

The database layer. Key functions:

| Function | Description |
|----------|-------------|
| `open()` | Opens DB, sets WAL mode, creates schema, runs migrations |
| `insert()` | Upsert on `(title, namespace)` -- never downgrades tier, keeps max priority |
| `get()` | Fetch by ID |
| `touch()` | Bump access count, extend TTL, auto-promote mid->long at 5 accesses, reinforce priority every 10 accesses. **Uses BEGIN IMMEDIATE/COMMIT transaction** for atomicity. |
| `update()` | Partial update of any fields |
| `delete()` | Delete by ID (links cascade) |
| `forget()` | Bulk delete by namespace + FTS pattern + tier |
| `list()` | List with filters: namespace, tier, priority, date range, tags, offset |
| `search()` | FTS5 AND search with 6-factor composite scoring |
| `recall()` | FTS5 OR search + touch + auto-promote + TTL extension |
| `find_contradictions()` | Find memories in same namespace with similar titles |
| `consolidate()` | Merge multiple memories, delete originals, aggregate tags and max priority. **Uses BEGIN IMMEDIATE/COMMIT transaction** for atomicity. |
| `sanitize_fts_query()` | Strips special characters and quotes tokens to prevent FTS injection |
| `create_link()` / `get_links()` / `delete_link()` | Memory linking (ON DELETE CASCADE) |
| `gc()` | Delete expired memories |
| `stats()` | Aggregate statistics (totals, by tier, by namespace, expiring soon, links, DB size) |
| `list_namespaces()` | List namespaces with memory counts |
| `export_all()` / `export_links()` | Full data export |
| `checkpoint()` | WAL checkpoint (TRUNCATE) for clean shutdown |
| `archive_memory()` | Move a memory to the archive table |
| `list_archived()` | List all archived memories |
| `restore_archived()` | Restore an archived memory to the active table |
| `purge_archive()` | Permanently delete all archived memories |
| `archive_stats()` | Archive statistics (count, size, date range) |
| `health_check()` | Verifies DB accessibility and FTS5 integrity |

**Transaction safety**: `touch()` and `consolidate()` use `BEGIN IMMEDIATE` to acquire a write lock upfront, preventing deadlocks and ensuring the entire read-modify-write cycle is atomic. This is critical for `touch()` because it reads the current access count, computes promotion/reinforcement logic, and writes back -- all of which must be atomic under concurrent access.

**FTS query sanitization**: The `sanitize_fts_query()` function strips all FTS5 special characters (`*`, `"`, `(`, `)`, `:`, `+`, `-`, `~`, `^`, `{`, `}`, `[`, `]`, `|`, `\`) from user input and wraps each remaining token in double quotes. This prevents injection of FTS query syntax that could cause unexpected results or errors.

**Migration error handling**: The migration logic only ignores "duplicate column" errors (indicating the migration already ran). All other errors are propagated, ensuring real failures are caught early.

### `src/hnsw.rs`

In-memory HNSW (Hierarchical Navigable Small World) vector index for approximate nearest-neighbor search. The `VectorIndex` struct provides `insert`, `search`, and `remove` operations on dense embeddings. When the index is small (below the HNSW threshold), it falls back to linear scan. The index has no persistence -- it is rebuilt from the database on startup. This keeps the on-disk format simple (embeddings stored as BLOBs in SQLite) while providing fast in-memory ANN search during runtime.

### `src/toon.rs`

TOON (Token-Oriented Object Notation) serializer. Converts JSON recall/search/list responses into the compact TOON wire format. The format spec is documented in the [TOON Format Specification](#toon-format-specification) section below.

## Database Schema

### `memories` table

```sql
CREATE TABLE memories (
    id               TEXT PRIMARY KEY,
    tier             TEXT NOT NULL,           -- 'short', 'mid', 'long'
    namespace        TEXT NOT NULL DEFAULT 'global',
    title            TEXT NOT NULL,
    content          TEXT NOT NULL,
    tags             TEXT NOT NULL DEFAULT '[]',  -- JSON array
    priority         INTEGER NOT NULL DEFAULT 5,  -- 1-10
    confidence       REAL NOT NULL DEFAULT 1.0,   -- 0.0-1.0
    source           TEXT NOT NULL DEFAULT 'api', -- 'user', 'claude', 'hook', 'api', 'cli', etc.
    access_count     INTEGER NOT NULL DEFAULT 0,
    created_at       TEXT NOT NULL,           -- ISO 8601 / RFC3339
    updated_at       TEXT NOT NULL,
    last_accessed_at TEXT,
    expires_at       TEXT,                    -- NULL for long-term
    embedding        BLOB                     -- dense vector (v3 migration, NULL if keyword tier)
);

-- Indexes
CREATE INDEX idx_memories_tier ON memories(tier);
CREATE INDEX idx_memories_namespace ON memories(namespace);
CREATE INDEX idx_memories_priority ON memories(priority DESC);
CREATE INDEX idx_memories_expires ON memories(expires_at);

-- Unique constraint enables upsert/deduplication behavior
CREATE UNIQUE INDEX idx_memories_title_ns ON memories(title, namespace);
```

### `memories_fts` virtual table

```sql
CREATE VIRTUAL TABLE memories_fts USING fts5(
    title, content, tags,
    content=memories, content_rowid=rowid
);
```

Kept in sync via `AFTER INSERT`, `AFTER DELETE`, and `AFTER UPDATE` triggers on `memories`.

### `memory_links` table

```sql
CREATE TABLE memory_links (
    source_id   TEXT NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
    target_id   TEXT NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
    relation    TEXT NOT NULL DEFAULT 'related_to',
    created_at  TEXT NOT NULL,
    PRIMARY KEY (source_id, target_id, relation)
);
```

Relation types: `related_to`, `supersedes`, `contradicts`, `derived_from`.

### `archived_memories` table

```sql
CREATE TABLE archived_memories (
    id               TEXT PRIMARY KEY,
    tier             TEXT NOT NULL,
    namespace        TEXT NOT NULL DEFAULT 'global',
    title            TEXT NOT NULL,
    content          TEXT NOT NULL,
    tags             TEXT NOT NULL DEFAULT '[]',
    priority         INTEGER NOT NULL DEFAULT 5,
    confidence       REAL NOT NULL DEFAULT 1.0,
    source           TEXT NOT NULL DEFAULT 'api',
    access_count     INTEGER NOT NULL DEFAULT 0,
    created_at       TEXT NOT NULL,
    updated_at       TEXT NOT NULL,
    last_accessed_at TEXT,
    expires_at       TEXT,
    archived_at      TEXT NOT NULL,
    archive_reason   TEXT NOT NULL DEFAULT 'gc'
);

-- Indexes
CREATE INDEX idx_archived_memories_namespace ON archived_memories(namespace);
CREATE INDEX idx_archived_memories_archived_at ON archived_memories(archived_at);
```

Added in schema migration v3 -> v4. Stores memories archived by GC before deletion. The 16 columns mirror the `memories` table with two additions: `archived_at` (timestamp of archival) and `archive_reason` (why the memory was archived, e.g., `'gc'`).

### `schema_version` table

Tracks migration state. Current version: 4.

## Recall Scoring Formula

The recall function uses a 6-factor composite score to rank results:

```
score = (fts_rank * -1)                                              -- FTS5 relevance (negated: lower = better in SQLite)
      + (priority * 0.5)                                             -- Priority weight (1-10 -> 0.5-5.0)
      + (MIN(access_count, 50) * 0.1)                                         -- Frequency bonus
      + (confidence * 2.0)                                           -- Certainty weight (0.0-1.0 -> 0.0-2.0)
      + tier_boost                                                   -- long=3.0, mid=1.0, short=0.0
      + (1.0 / (1.0 + (julianday('now') - julianday(updated_at)) * 0.1))  -- Recency decay
```

The `search` function uses the same formula minus the tier boost.

### Hybrid Recall Algorithm (semantic tier and above)

At the `semantic` tier and above, the `reranker.rs` module blends two scoring signals:

1. **Semantic score** -- cosine similarity between the query embedding and each memory's stored embedding (0.0 to 1.0)
2. **Keyword score** -- the existing 6-factor FTS5 composite score, normalized to 0.0-1.0

The final score is a weighted blend: `final = (semantic_weight * semantic_score) + (keyword_weight * keyword_score)`. The default weights are 0.6 semantic / 0.4 keyword. Results from both pipelines are merged, deduplicated by memory ID, and sorted by the blended score.

### Tier Configuration System

The `config.rs` module defines 4 feature tiers that gate functionality:

| Tier | Embeddings | LLM | Tools Available |
|------|-----------|-----|-----------------|
| `keyword` | No | No | 13 base tools + `memory_capabilities` + 4 archive tools |
| `semantic` | Yes | No | 14 base tools + `memory_capabilities` + 4 archive tools |
| `smart` | Yes | Yes | All 26 tools |
| `autonomous` | Yes | Yes | All 26 tools + autonomous behaviors |

The tier is set at startup via `ai-memory mcp --tier <tier>` and cannot be changed at runtime. The `memory_capabilities` tool reports the active tier and which features are available, allowing AI clients to adapt their behavior.

> **Note:** Configuration is loaded once at process startup. Changes to `config.toml` require restarting the ai-memory process (MCP server, HTTP daemon, or CLI) to take effect.

The recency decay factor ensures that recent memories rank higher when other factors are similar. A memory updated today gets a boost of ~1.0, a memory from 10 days ago gets ~0.5, and a memory from 100 days ago gets ~0.09.

### TOON Format Specification

TOON (Token-Oriented Object Notation) is a token-efficient serialization format designed for LLM communication. It replaces JSON for recall, search, and list responses, reducing output size by 40-60% by declaring field names once as a header and listing values row by row with pipe delimiters.

The implementation is in `src/toon.rs`.

#### Structure Overview

A TOON response consists of three parts in order:

1. **Metadata line** (optional) -- key:value pairs for scalar fields
2. **Header line** -- declares field names once
3. **Data rows** -- one per object, values matching header column order

#### Metadata Line Syntax

Scalar (non-array) response fields are serialized as pipe-delimited `key:value` pairs on the first line:

```
count:3|mode:hybrid
```

If there are no metadata fields, this line is omitted entirely.

#### Header Line Syntax

The header declares the array name followed by field names in square brackets, pipe-delimited, ending with a colon:

```
memories[id|title|tier|namespace|priority|confidence|score|access_count|tags|source|created_at|updated_at]:
```

Field names appear exactly once in the entire output regardless of how many data rows follow. This is the primary source of token savings over JSON.

#### Data Row Syntax

Each data row contains values pipe-delimited in the same order as the header fields:

```
abc-123|PostgreSQL 16 config|long|infra|9|1.0|0.763|2|postgres,database|claude|2026-04-03T15:00:00+00:00|2026-04-03T15:00:00+00:00
```

- **Strings** are output as-is (unless they require escaping)
- **Numbers** (integers and floats) are output as their string representation
- **Booleans** are output as `1` (true) or `0` (false)
- **Arrays** (e.g., tags) are joined with commas: `postgres,database`
- **Objects** are output as the literal `[object]`
- **Null/missing values** are represented as an empty string (zero characters between the delimiters), e.g., `abc||mid` means the second field is null

#### Escaping Rules

Two characters require escaping in TOON values:

| Character | Escaped As | Reason |
|-----------|-----------|--------|
| `\|` (pipe) | `\\|` | Pipe is the field delimiter |
| `\n` (newline) | `\\n` | Newline is the row delimiter |

Escaping is only applied when the value actually contains a pipe or newline character. Values without these characters are output verbatim with no additional escaping.

Example: a title containing a pipe like `A|B` is serialized as `A\|B` in the data row.

#### Compact vs Full Mode

TOON supports two modes that differ only in which fields are included:

**Full mode** (12 fields):
```
memories[id|title|tier|namespace|priority|confidence|score|access_count|tags|source|created_at|updated_at]:
```

**Compact mode** (7 fields) -- omits timestamps, confidence, access_count, and source for tighter output:
```
memories[id|title|tier|namespace|priority|score|tags]:
```

The MCP server defaults to compact mode (`toon_compact`). Clients can request `"toon"` for full mode or `"json"` for standard JSON via the `format` parameter on recall, search, and list tools.

#### Search Response Normalization

Search responses use a `"results"` key instead of `"memories"`. The TOON serializer normalizes this internally -- the output always uses the `memories[...]` header regardless of the source key.

#### Complete Parsing Example

Given this JSON response:

```json
{
  "memories": [
    {"id": "abc-123", "title": "PostgreSQL config", "tier": "long", "namespace": "infra", "priority": 9, "score": 0.763, "tags": ["postgres", "db"]},
    {"id": "def-456", "title": "Redis cache", "tier": "long", "namespace": "infra", "priority": 8, "score": 0.541, "tags": ["redis"]},
    {"id": "ghi-789", "title": "Deploy notes", "tier": "mid", "namespace": "infra", "priority": 5, "score": 0.320, "tags": []}
  ],
  "count": 3,
  "mode": "hybrid"
}
```

TOON compact output:

```
count:3|mode:hybrid
memories[id|title|tier|namespace|priority|score|tags]:
abc-123|PostgreSQL config|long|infra|9|0.763|postgres,db
def-456|Redis cache|long|infra|8|0.541|redis
ghi-789|Deploy notes|mid|infra|5|0.32|
```

To parse TOON:

1. Read the first line. If it does not start with a bracket-containing identifier (e.g., `memories[`), parse it as metadata: split on `|`, then split each segment on `:` to get key-value pairs.
2. Read the header line. Extract the array name and field list: strip the trailing `:`, extract the portion inside `[...]`, and split on `|` to get the ordered field names.
3. Read each subsequent non-empty line as a data row. Split on `|` (respecting `\|` escapes), mapping each positional value to the corresponding header field name.
4. Unescape `\|` to `|` and `\n` to newline in each value. Empty values represent null/missing fields.

## API Reference

Base URL: `http://127.0.0.1:9077/api/v1`

All responses are JSON. Error responses include `{"error": "message"}`. Database errors are sanitized -- clients receive `"Internal server error"` instead of raw SQLite error details.

The HTTP API exposes **24 endpoints**.

### Health Check

```
GET /health
```

Deep health check: verifies DB is readable and FTS5 integrity-check passes.

Response (200): `{"status": "ok", "service": "ai-memory"}`
Response (503): `{"status": "error", "service": "ai-memory"}`

### Create Memory

```
POST /memories
Content-Type: application/json

{
  "title": "Project uses Axum",
  "content": "The HTTP server is built with Axum 0.8.",
  "tier": "mid",
  "namespace": "ai-memory",
  "tags": ["rust", "web"],
  "priority": 6,
  "confidence": 1.0,
  "source": "api",
  "expires_at": "2026-04-06T00:00:00Z",
  "ttl_secs": 86400
}
```

Response (201):
```json
{
  "id": "a1b2c3d4-...",
  "tier": "mid",
  "namespace": "ai-memory",
  "title": "Project uses Axum",
  "potential_contradictions": ["id1", "id2"]
}
```

Defaults: `tier=mid`, `namespace=global`, `priority=5`, `confidence=1.0`, `source=api`.

Optional: `expires_at` (RFC3339), `ttl_secs` (overrides tier default). Deduplicates on title+namespace (upsert).

### Bulk Create

```
POST /memories/bulk
Content-Type: application/json

[
  {"title": "Memory 1", "content": "..."},
  {"title": "Memory 2", "content": "..."}
]
```

Response: `{"created": 2, "errors": []}`

Limited to **1,000 items per request**.

### Get Memory

```
GET /memories/{id}
```

Response:
```json
{
  "memory": { ... },
  "links": [ ... ]
}
```

### Update Memory

```
PUT /memories/{id}
Content-Type: application/json

{
  "content": "Updated content",
  "priority": 8,
  "expires_at": "2026-06-01T00:00:00Z"
}
```

All fields are optional. Only provided fields are updated. Validated before write.

### Delete Memory

```
DELETE /memories/{id}
```

Response: `{"deleted": true}`. Links are cascade-deleted.

### Promote Memory

```
POST /memories/{id}/promote
```

Promotes a memory to long-term tier and clears its expiry.

Response: `{"promoted": true}`

### List Memories

```
GET /memories?namespace=my-app&tier=long&limit=20&offset=0&min_priority=5&since=2026-01-01T00:00:00Z&until=2026-12-31T23:59:59Z&tags=rust
```

All query parameters are optional. Max limit is 200.

Response: `{"memories": [...], "count": 5}`

### Search (AND semantics)

```
GET /search?q=database+migration&namespace=my-app&tier=mid&limit=10&since=...&until=...&tags=...
```

Response: `{"results": [...], "count": 3, "query": "database migration"}`

Uses 6-factor scoring (without tier boost). Queries are sanitized to prevent FTS injection.

### Recall (OR semantics + touch)

```
GET /recall?context=auth+flow+jwt&namespace=my-app&limit=10&tags=auth&since=2026-01-01T00:00:00Z&until=2026-12-31T23:59:59Z
```

Or via POST:

```
POST /recall
Content-Type: application/json

{"context": "auth flow jwt", "namespace": "my-app", "limit": 10}
```

Response: `{"memories": [...], "count": 5}`

Recall automatically: bumps `access_count`, extends TTL, and auto-promotes mid-tier memories with 5+ accesses to long-term. The touch operation is transactional.

### Forget (Bulk Delete)

```
POST /forget
Content-Type: application/json

{"namespace": "my-app", "pattern": "deprecated API", "tier": "short"}
```

At least one field is required. Pattern uses FTS matching (sanitized). Response: `{"deleted": 3}`

### Consolidate

```
POST /consolidate
Content-Type: application/json

{
  "ids": ["id1", "id2", "id3"],
  "title": "Auth system summary",
  "summary": "JWT with refresh tokens, RBAC middleware, Redis sessions.",
  "namespace": "my-app",
  "tier": "long"
}
```

Requires 2-100 IDs. Deletes source memories, creates new with aggregated tags and max priority. The entire operation is transactional. Response (201): `{"id": "new-id", "consolidated": 3}`

### Create Link

```
POST /links
Content-Type: application/json

{"source_id": "id1", "target_id": "id2", "relation": "related_to"}
```

Relations: `related_to`, `supersedes`, `contradicts`, `derived_from`. Self-links rejected. Response (201): `{"linked": true}`

### Get Links

```
GET /links/{id}
```

Response: `{"links": [{"source_id": "...", "target_id": "...", "relation": "...", "created_at": "..."}]}`

### Namespaces

```
GET /namespaces
```

Response: `{"namespaces": [{"namespace": "my-app", "count": 42}]}`

### Stats

```
GET /stats
```

Response:
```json
{
  "total": 150,
  "by_tier": [{"tier": "long", "count": 80}, ...],
  "by_namespace": [{"namespace": "my-app", "count": 42}, ...],
  "expiring_soon": 5,
  "links_count": 12,
  "db_size_bytes": 524288
}
```

### Garbage Collection

```
POST /gc
```

Response: `{"expired_deleted": 3}`

### Export

```
GET /export
```

Response: full JSON dump of all memories and links with `exported_at` timestamp.

### Import

```
POST /import
Content-Type: application/json

{"memories": [...], "links": [...]}
```

Validates each memory before import. Limited to **1,000 memories per request**. Response: `{"imported": 50, "errors": []}`

## Error Code Reference

Structured error codes returned by the HTTP API and MCP server:

| Code | HTTP Status | Description |
|------|-------------|-------------|
| `NOT_FOUND` | 404 | Memory or resource not found |
| `VALIDATION_FAILED` | 400 | Invalid input parameters |
| `DATABASE_ERROR` | 500 | SQLite or internal error |
| `CONFLICT` | 409 | Duplicate or conflicting operation |

Error responses are JSON: `{"code": "NOT_FOUND", "message": "Memory not found"}`. `DATABASE_ERROR` responses are sanitized -- clients receive a generic `"Internal server error"` message; detailed errors are logged server-side only.

## CLI Reference

Global flags:
- `--db <path>` -- database path (default: `ai-memory.db`, env: `AI_MEMORY_DB`)
- `--json` -- output as machine-parseable JSON

### `serve`

Start the HTTP daemon (24 endpoints).

```bash
ai-memory serve --host 127.0.0.1 --port 9077
```

### `mcp`

Run as an MCP tool server over stdio. This is the primary integration path for any MCP-compatible AI client. Exposes 26 tools.

```bash
ai-memory mcp
ai-memory mcp --tier semantic   # default
ai-memory mcp --tier smart      # enables LLM-powered tools (requires Ollama)
```

Reads JSON-RPC from stdin, writes responses to stdout. Logs to stderr. Correctly handles notifications (no response sent). Works with any MCP-compatible client (Claude AI, OpenAI ChatGPT, xAI Grok, META Llama, etc.).

### `store`

```bash
ai-memory store \
  -T "Title" \
  -c "Content" \
  --tier mid \
  --namespace my-app \
  --tags "tag1,tag2" \
  --priority 7 \
  --confidence 0.9 \
  --source claude \
  --expires-at "2026-04-15T00:00:00Z" \
  --ttl-secs 86400
```

Use `-c -` to read content from stdin. Validates all fields before writing. `--expires-at` sets an explicit expiration timestamp (RFC3339). `--ttl-secs` sets a TTL in seconds (overrides tier default).

### `update`

```bash
ai-memory update <id> -T "New title" -c "New content" --priority 8 --expires-at "2026-06-01T00:00:00Z"
```

The `--expires-at` flag sets or changes the expiration on an existing memory.

### `recall`

```bash
ai-memory recall "search context" --namespace my-app --limit 10 --tags auth --since 2026-01-01T00:00:00Z
```

### `search`

```bash
ai-memory search "exact terms" --namespace my-app --tier long --limit 20 --since 2026-01-01 --until 2026-12-31 --tags rust
```

### `get`

```bash
ai-memory get <id>
```

Shows the memory plus all its links.

### `list`

```bash
ai-memory list --namespace my-app --tier mid --limit 50 --offset 0 --since 2026-01-01 --until 2026-12-31 --tags devops
```

The `--offset` flag enables pagination. Use with `--limit` to page through results.

### `delete`

```bash
ai-memory delete <id>
```

### `promote`

```bash
ai-memory promote <id>
```

Promotes to long-term and clears expiry.

### `forget`

```bash
ai-memory forget --namespace my-app --pattern "old stuff" --tier short
```

At least one filter is required.

### `link`

```bash
ai-memory link <source-id> <target-id> --relation supersedes
```

Relation types: `related_to` (default), `supersedes`, `contradicts`, `derived_from`. Self-links rejected.

### `consolidate`

```bash
ai-memory consolidate "id1,id2,id3" -T "Summary title" -s "Consolidated content" --namespace my-app
```

### `gc`

```bash
ai-memory gc
```

### `stats`

```bash
ai-memory stats
```

### `namespaces`

```bash
ai-memory namespaces
```

### `export` / `import`

```bash
ai-memory export > backup.json
ai-memory import < backup.json
```

Export includes memories and links. Import validates each memory and skips invalid ones.

### `resolve`

Resolve a contradiction by marking one memory as superseding another.

```bash
ai-memory resolve <winner_id> <loser_id>
```

Creates a "supersedes" link from winner to loser. Demotes the loser (priority=1, confidence=0.1). Touches the winner (bumps access count).

### `shell`

Interactive REPL for browsing and managing memories.

```bash
ai-memory shell
```

REPL commands: `recall <ctx>`, `search <q>`, `list [ns]`, `get <id>`, `stats`, `namespaces`, `delete <id>`, `help`, `quit`. Color output with tier labels and priority bars.

### `sync`

Sync memories between two database files.

```bash
ai-memory sync <remote.db> --direction pull|push|merge
```

- `pull` -- import all memories from remote into local
- `push` -- export all local memories to remote
- `merge` -- bidirectional sync (both databases get all memories)

Uses dedup-safe upsert (title+namespace). Links are synced alongside memories.

### `auto-consolidate`

Automatically group and consolidate memories.

```bash
ai-memory auto-consolidate [--namespace <ns>] [--short-only] [--min-count 3] [--dry-run]
```

Groups memories by namespace+primary tag. Groups with >= min_count members are consolidated into one long-term memory. Use `--dry-run` to preview.

### `mine`

Import memories from historical conversations (Claude, ChatGPT, Slack exports).

```bash
ai-memory mine --format claude <path-to-export>
ai-memory mine --format chatgpt <path-to-export>
ai-memory mine --format slack <path-to-export>
```

Takes `--format` to specify the input file format (`claude`, `chatgpt`, `slack`) and a path to the export file or directory.

### `man`

Generate roff man page to stdout.

```bash
ai-memory man           # print roff to stdout
ai-memory man | man -l -  # view immediately
```

### `completions`

```bash
ai-memory completions bash
ai-memory completions zsh
ai-memory completions fish
```

## Adding New Features

1. **Add the model** in `models.rs` -- new struct or new fields on existing structs
2. **Add validation** in `validate.rs` -- new validation function
3. **Add the DB function** in `db.rs` -- SQL operations
4. **Add the HTTP handler** in `handlers.rs` -- Axum handler function
5. **Add the route** in `main.rs` inside the `Router::new()` chain
6. **Add the CLI command** in `main.rs` -- new variant in `Command` enum, new `Args` struct, new `cmd_*()` function
7. **Add the MCP tool** in `mcp.rs` -- tool definition in `tool_definitions()`, handler function, route in `handle_request()`
8. **Add tests** in `tests/integration.rs`

## Testing

The project has **191 tests** total: 140 unit tests across all 15 modules + 51 integration tests in `tests/integration.rs`. **15/15 modules** have unit tests — 95%+ coverage.

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run a specific test
cargo test test_name

# Check formatting
cargo fmt --check

# Run clippy
cargo clippy -- -D warnings
```

Integration tests run through the CLI binary, creating temporary databases for isolation.

## Benchmarks

### Criterion (microbenchmarks)

Criterion benchmarks are in `benches/recall.rs`. They test insert, recall, and search performance at 1,000 memories scale.

```bash
cargo bench
# recall/short_query, recall/medium_query, recall/long_query
# search/simple_search, search/filtered_search
# insert/store_memory
```

### LongMemEval (end-to-end accuracy)

The `benchmarks/longmemeval/` directory evaluates recall accuracy against the [LongMemEval](https://github.com/xiaowu0162/LongMemEval) dataset (ICLR 2025). Four harnesses are available:

| Harness | Strategy | R@5 | Speed |
|---------|----------|-----|-------|
| `harness_99.py --no-expand` | Parallel FTS5, 10 cores | **97.0%** | 232 q/s (2.2s) |
| `harness_99.py` | LLM expansion + parallel FTS5 | **97.8%** | 142 q/s (3.5s) |
| `harness_fast.py` | Single-process native SQLite | 96.2% | 57 q/s (8.8s) |
| `harness.py` | CLI subprocess per operation | 96.2% | 1.2 q/s (414s) |

Best result: **97.8% R@5 (489/500), 99.0% R@10, 99.8% R@20** -- 499/500 at R@20.

```bash
# Quick run (keyword, ~2s)
python3 benchmarks/longmemeval/harness_99.py \
  --dataset-path /tmp/LongMemEval --variant S --no-expand --workers 10

# Full run with LLM expansion (requires Ollama + gemma3:4b)
python3 benchmarks/longmemeval/harness_99.py \
  --dataset-path /tmp/LongMemEval --variant S --workers 10
```

See `benchmarks/longmemeval/README.md` for full replication instructions.

## CI/CD Pipeline

GitHub Actions CI runs on every push to `main` and every pull request:

1. **Check formatting** -- `cargo fmt --check`
2. **Clippy** -- `cargo clippy -- -D warnings`
3. **Run tests** -- `cargo test`
4. **Build release** -- `cargo build --release`

Runs on both `ubuntu-latest` and `macos-latest`.

### Release Pipeline

On tag push (e.g., `v0.2.0`):

1. Builds release binaries for `x86_64-unknown-linux-gnu` and `aarch64-apple-darwin`
2. Packages as `.tar.gz`
3. Creates a GitHub Release with the artifacts

## Building from Source

```bash
git clone https://github.com/alphaonedev/ai-memory-mcp.git
cd ai-memory

# Debug build
cargo build

# Release build (optimized, stripped)
cargo build --release

# The binary is at target/release/ai-memory
```

### New Dependencies (v0.4.0)

- `candle-core`, `candle-nn`, `candle-transformers` -- HuggingFace Candle for local embedding model inference
- `hf-hub` -- HuggingFace Hub client for downloading embedding models
- `tokenizers` -- HuggingFace tokenizers for text preprocessing
- `reqwest` -- HTTP client for Ollama API communication (LLM inference)

All dependencies are always compiled; tier selection controls which features are activated at runtime.

Release profile settings (from `Cargo.toml`):
- `opt-level = 3`
- `strip = true` (removes debug symbols)
- `lto = "thin"` (link-time optimization)