# Admin Guide
`ai-memory` is an AI-agnostic memory management system. It works with **any MCP-compatible AI client** -- including Claude AI, OpenAI ChatGPT, xAI Grok, Meta Llama, and others. The HTTP API and CLI are completely platform-independent.
**Key features for admins:**
- Zero token cost until recall (replaces built-in auto-memory)
- TOON compact default response format (79% smaller than JSON)
- MCP prompts for proactive AI behavior (`recall-first`, `memory-workflow`)
- 4 feature tiers (keyword → autonomous with local LLMs via Ollama)
- 191 tests with 95%+ coverage across 15/15 modules
## Deployment Options
### MCP Server (Recommended)
The simplest deployment is as an MCP tool server. No daemon process to manage -- your AI client spawns the process on demand. MCP (Model Context Protocol) is an open standard supported by multiple AI platforms.
Below is an example for **Claude Code** (user scope: merge `mcpServers` into `~/.claude.json`; or project scope: `.mcp.json` in project root). Other MCP-compatible clients have their own configuration locations — consult your platform's documentation.
```json
{
"mcpServers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "~/.claude/ai-memory.db", "mcp", "--tier", "semantic"]
}
}
}
```
> **Claude Code note:** MCP server configuration does **not** go in `settings.json` or `settings.local.json` -- those files do not support `mcpServers`.
The MCP server:
- Starts when your AI client opens a session
- Communicates over stdio (JSON-RPC) -- the standard MCP transport
- Stops when the session ends
- Uses the same SQLite database as the CLI and HTTP daemon
- Processes JSON-RPC notifications without sending a response, as JSON-RPC 2.0 requires
- Works with any MCP-compatible client, not just Claude Code
### Standalone (Development)
Run the HTTP daemon directly in the foreground:
```bash
ai-memory --db /path/to/ai-memory.db serve
```
The daemon listens on `127.0.0.1:9077` by default and exposes 24 HTTP endpoints.
### Systemd (Production HTTP Daemon)
```bash
sudo tee /etc/systemd/system/ai-memory.service > /dev/null << 'EOF'
[Unit]
Description=AI Memory Daemon
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db serve
Restart=on-failure
RestartSec=5
Environment=RUST_LOG=ai_memory=info,tower_http=info
# Graceful shutdown: checkpoints WAL before exit
KillSignal=SIGINT
TimeoutStopSec=10
[Install]
WantedBy=multi-user.target
EOF
sudo mkdir -p /var/lib/ai-memory
sudo systemctl daemon-reload
sudo systemctl enable --now ai-memory
```
**Production Hardening:** Add security directives to the `[Service]` section to restrict the daemon's privileges. The account named in `User=` must already exist and must be able to write to `/var/lib/ai-memory`:
```ini
[Service]
User=ai-memory
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
NoNewPrivileges=yes
ReadWritePaths=/var/lib/ai-memory
```
Check status:
```bash
sudo systemctl status ai-memory
sudo journalctl -u ai-memory -f
```
### Docker
Example Dockerfile:
```dockerfile
FROM rust:1.75-slim AS builder
WORKDIR /src
COPY . .
RUN cargo build --release
FROM debian:bookworm-slim
COPY --from=builder /src/target/release/ai-memory /usr/local/bin/
VOLUME /data
EXPOSE 9077
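# Bind to all interfaces inside the container: the default 127.0.0.1 is not
# reachable through a published port. Host exposure is still limited by `docker run -p`.
CMD ["ai-memory", "--db", "/data/ai-memory.db", "serve", "--host", "0.0.0.0"]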
```
Build and run:
```bash
docker build -t ai-memory .
docker run -d -p 127.0.0.1:9077:9077 -v ai-memory-data:/data ai-memory
```
## Configuration
### CLI Flags
| Flag | Default | Description |
|------|---------|-------------|
| `--db <path>` | `ai-memory.db` | Path to SQLite database |
| `--host <addr>` | `127.0.0.1` | Bind address (serve only) |
| `--port <port>` | `9077` | Bind port (serve only) |
| `--json` | `false` | JSON output for CLI commands |
| `--tier <tier>` | `semantic` | Feature tier: `keyword`, `semantic`, `smart`, `autonomous` (mcp/serve only) |
### Feature Tiers
The `--tier` flag controls which features are enabled. Each tier builds on the previous one:
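| Tier | Tools | Embedding model | LLM | Memory |
|------|-------|-----------------|-----|--------|
| `keyword` | 21 | No | No | Minimal |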
| `semantic` (default) | 21 | Yes (HuggingFace) | No | ~256 MB |
| `smart` | 21 | Yes | Yes (Ollama) | ~1 GB |
| `autonomous` | 21 | Yes | Yes (Ollama) | ~4 GB |
Set the tier when starting the MCP server or HTTP daemon:
```bash
ai-memory mcp --tier semantic # default
ai-memory mcp --tier smart # enables LLM-powered tools
ai-memory serve --tier autonomous # full feature set
```
### Ollama Setup (Smart & Autonomous Tiers)
The `smart` and `autonomous` tiers require a running [Ollama](https://ollama.com) instance for LLM inference (Gemma 4 models).
#### macOS
```bash
brew install ollama
# Or download from https://ollama.com/download/mac
ollama serve &
ollama pull gemma4:e2b # Smart tier (~1GB)
ollama pull gemma4:e4b # Autonomous tier (~2.3GB)
```
#### Linux
```bash
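# Install via the official script (skip if Ollama is already installed)
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl start ollama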
ollama pull gemma4:e2b # Smart tier (~1GB)
ollama pull gemma4:e4b # Autonomous tier (~2.3GB)
```
#### Windows
```powershell
# Download from https://ollama.com/download/windows, or:
winget install Ollama.Ollama
ollama pull gemma4:e2b # Smart tier (~1GB)
ollama pull gemma4:e4b # Autonomous tier (~2.3GB)
```
#### Verify
```bash
curl http://localhost:11434/api/tags
ollama run gemma4:e2b "Hello, world"
```
ai-memory connects to Ollama at `http://localhost:11434` by default. Set `OLLAMA_HOST` to override. If Ollama is not running, ai-memory gracefully falls back to the semantic tier.
### Embedding Model (semantic tier and above)
At the `semantic` tier and above, ai-memory downloads a sentence-transformer model from HuggingFace on first startup. The model is cached in the HuggingFace cache directory (`~/.cache/huggingface/` by default).
- **First startup** may take 30-60 seconds while the model downloads (~100 MB)
- **Subsequent startups** load from cache (2-5 seconds)
- Set `HF_HOME` to override the cache directory
- No HuggingFace account or API key is required
### Memory Budget Guidance
| Tier | Approx. RAM | What is loaded |
|------|-------------|----------------|
| `keyword` | Minimal (~10 MB) | SQLite + FTS5 only |
| `semantic` | ~256 MB | Embedding model loaded in memory |
| `smart` | ~1 GB | Embedding model + Ollama with smaller LLM |
| `autonomous` | ~4 GB | Embedding model + Ollama with larger LLM |
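The budget figures above can also drive tier selection: the `max_memory_mb` setting picks the highest tier that fits. The following is a minimal Python sketch of that rule, not the project's `from_memory_budget()` implementation. The function name `tier_for_budget` is hypothetical, and treating the `keyword` tier as needing 0 MB is an assumption taken from the documented defaults.

```python
# Approximate budgets in MB, taken from the table above.
# Assumption: "keyword" is treated as 0 MB because its footprint is minimal.
TIER_BUDGETS_MB = [
    ("keyword", 0),
    ("semantic", 256),
    ("smart", 1024),
    ("autonomous", 4096),
]


def tier_for_budget(max_memory_mb: int) -> str:
    """Return the highest tier whose approximate budget fits in max_memory_mb."""
    chosen = "keyword"  # lowest tier is the floor, even for tiny budgets
    for tier, needed_mb in TIER_BUDGETS_MB:
        if max_memory_mb >= needed_mb:
            chosen = tier
    return chosen
```

For example, a 2048 MB budget is enough for `smart` but not for `autonomous`, so the sketch returns `smart`.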
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `AI_MEMORY_DB` | `ai-memory.db` | Database path (overridden by `--db`) |
| `AI_MEMORY_AGENT_ID` | (auto) | Default `agent_id` stamped on memories this process writes. Used when no `--agent-id` flag is passed. See §Agent Identity below. |
| `RUST_LOG` | (none) | Logging filter (e.g., `ai_memory=info,tower_http=debug`) |
| `AI_MEMORY_NO_CONFIG` | (none) | Set to `1` to skip config file loading (useful for testing) |
### Configuration File (config.toml)
`ai-memory` supports an optional configuration file at `~/.config/ai-memory/config.toml`.
> **Note:** Configuration is loaded once at process startup. Changes to `config.toml` require restarting the ai-memory process (MCP server, HTTP daemon, or CLI) to take effect.

The file supports the following keys:
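| Key | Type | Default | Valid values | Description |
|-----|------|---------|--------------|-------------|
| `tier` | String | `"semantic"` | `"keyword"`, `"semantic"`, `"smart"`, `"autonomous"` | Feature tier controlling which AI capabilities are active |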
| `db` | String | `"ai-memory.db"` | Any valid file path | Path to the SQLite database file |
| `ollama_url` | String | `"http://localhost:11434"` | Any URL | Ollama base URL for LLM generation (smart/autonomous tiers) |
| `embed_url` | String | Value of `ollama_url` | Any URL | Separate URL for the embedding service; falls back to `ollama_url` if unset |
| `embedding_model` | String | Tier-dependent | `"mini_lm_l6_v2"` (384-dim, ~90 MB), `"nomic_embed_v15"` (768-dim, ~270 MB) | HuggingFace sentence-transformer model for semantic search |
| `llm_model` | String | Tier-dependent | `"gemma4:e2b"` (~1 GB Q4), `"gemma4:e4b"` (~2.3 GB Q4) | Ollama LLM model tag for smart/autonomous features |
| `cross_encoder` | **Bool** | `false` (`true` for autonomous tier) | `true`, `false` | Enable neural cross-encoder reranking (not a string -- must be bare `true`/`false` without quotes) |
| `default_namespace` | String | `"global"` | Any valid namespace (max 128 bytes, no slashes/spaces/nulls) | Default namespace applied to new memories |
| `max_memory_mb` | Integer | Tier-dependent | Any positive integer | Maximum memory budget in MB; used for automatic tier selection via `from_memory_budget()` |
| `archive_on_gc` | Bool | `true` | `true`, `false` | Archive expired memories instead of permanently deleting them during GC |
| `[ttl]` | Section | -- | -- | Per-tier TTL overrides (all sub-fields are integers in seconds) |
| `ttl.short_ttl_secs` | Integer | `21600` (6 hours) | `0` = never expires, or positive integer | TTL for short-tier memories in seconds |
| `ttl.mid_ttl_secs` | Integer | `604800` (7 days) | `0` = never expires, or positive integer | TTL for mid-tier memories in seconds |
| `ttl.long_ttl_secs` | Integer | `0` (never expires) | `0` = never expires, or positive integer | TTL for long-tier memories in seconds |
| `ttl.short_extend_secs` | Integer | `3600` (1 hour) | Non-negative integer | TTL extension on access for short-tier memories |
| `ttl.mid_extend_secs` | Integer | `86400` (1 day) | Non-negative integer | TTL extension on access for mid-tier memories |
> **Note:** Set any TTL to `0` to disable expiry for that tier. Values are clamped to a 10-year maximum (315,360,000 seconds). Negative extension values are clamped to 0.
> **Note:** Restored memories have their `expires_at` cleared (set to NULL) and become permanent.
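To make the TTL rules above concrete, here is a small Python sketch of the clamping and expiry arithmetic. It is an illustration under stated assumptions, not the project's code: the helper names are hypothetical, rejecting negative TTLs is an assumption (the notes only specify clamping for negative extension values), and applying the 10-year ceiling to extensions as well is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_TTL_SECS = 315_360_000  # 10 years, per the note above


def clamp_ttl(ttl_secs: int) -> int:
    """Clamp a TTL to the 10-year ceiling; 0 means 'never expires'."""
    if ttl_secs < 0:
        # Assumption: the source does not say how negative TTLs are handled.
        raise ValueError("TTL must be 0 (never expires) or positive")
    return min(ttl_secs, MAX_TTL_SECS)


def clamp_extend(extend_secs: int) -> int:
    """Negative extensions are clamped to 0; the ceiling here is an assumption."""
    return max(0, min(extend_secs, MAX_TTL_SECS))


def expires_at(created: datetime, ttl_secs: int) -> Optional[datetime]:
    """Return the expiry timestamp, or None when the TTL disables expiry."""
    ttl = clamp_ttl(ttl_secs)
    return None if ttl == 0 else created + timedelta(seconds=ttl)
```

With the default `short_ttl_secs = 21600`, a memory created at midnight UTC expires at 06:00 UTC the same day.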
#### Complete Annotated config.toml
Below is a complete example showing every supported field with explanatory comments. Copy this to `~/.config/ai-memory/config.toml` and uncomment the lines you want to customize.
```toml
# =============================================================================
# ai-memory configuration
# Location: ~/.config/ai-memory/config.toml
# Docs: https://github.com/alphaonedev/ai-memory-mcp
#
# All fields are optional. CLI flags and MCP args override these values.
# Changes require restarting the ai-memory process to take effect.
# =============================================================================
# ---------------------------------------------------------------------------
# Feature tier (controls which AI capabilities are active)
# ---------------------------------------------------------------------------
# Valid values: "keyword", "semantic", "smart", "autonomous"
# keyword — FTS5 keyword search only, no models, minimal RAM
# semantic — adds embedding-based hybrid recall (~256 MB)
# smart — adds query expansion, auto-tagging, contradiction detection (~1 GB, requires Ollama)
# autonomous — full feature set with cross-encoder reranking (~4 GB, requires Ollama)
# Default: "semantic"
# tier = "semantic"
# ---------------------------------------------------------------------------
# Database path
# ---------------------------------------------------------------------------
# Path to the SQLite database file.
# Default: "ai-memory.db" (relative to working directory)
# db = "~/.claude/ai-memory.db"
# ---------------------------------------------------------------------------
# Ollama URLs (smart and autonomous tiers only)
# ---------------------------------------------------------------------------
# Base URL for Ollama LLM generation.
# Default: "http://localhost:11434"
# ollama_url = "http://localhost:11434"
# Separate URL for embedding requests. Falls back to ollama_url if unset.
# Default: same as ollama_url
# embed_url = "http://localhost:11434"
# ---------------------------------------------------------------------------
# Model selection
# ---------------------------------------------------------------------------
# Embedding model for semantic search (semantic tier and above).
# Valid values:
# "mini_lm_l6_v2" — sentence-transformers/all-MiniLM-L6-v2, 384-dim, ~90 MB
# "nomic_embed_v15" — nomic-ai/nomic-embed-text-v1.5, 768-dim, ~270 MB
# Default: tier-dependent (mini_lm_l6_v2 for semantic, nomic_embed_v15 for smart/autonomous)
# embedding_model = "mini_lm_l6_v2"
# LLM model served via Ollama (smart and autonomous tiers).
# Valid values:
# "gemma4:e2b" — Google Gemma 4 Effective 2B, ~1 GB Q4 (smart tier default)
# "gemma4:e4b" — Google Gemma 4 Effective 4B, ~2.3 GB Q4 (autonomous tier default)
# Default: tier-dependent (gemma4:e2b for smart, gemma4:e4b for autonomous)
# llm_model = "gemma4:e2b"
# ---------------------------------------------------------------------------
# Cross-encoder reranking
# ---------------------------------------------------------------------------
# Enable neural cross-encoder reranking for improved recall precision.
# NOTE: This is a boolean, NOT a string. Use bare true/false without quotes.
# Default: false (true for autonomous tier)
# cross_encoder = true
# ---------------------------------------------------------------------------
# Namespace and memory limits
# ---------------------------------------------------------------------------
# Default namespace applied to new memories when none is specified.
# Default: "global"
# default_namespace = "global"
# Maximum memory budget in MB. Used for automatic tier selection when tier
# is not explicitly set — the highest tier that fits within this budget is chosen.
# Default: tier-dependent (0/256/1024/4096 for keyword/semantic/smart/autonomous)
# max_memory_mb = 4096
# ---------------------------------------------------------------------------
# Garbage collection
# ---------------------------------------------------------------------------
# Archive expired memories before GC permanently deletes them.
# When true, expired memories are moved to the archive table and can be
# restored later. When false, GC permanently deletes expired memories.
# Default: true
# archive_on_gc = true
# ---------------------------------------------------------------------------
# Per-tier TTL overrides
# ---------------------------------------------------------------------------
# Customize time-to-live and access-extension durations per memory tier.
# Set any TTL to 0 to disable expiry for that tier.
# Values are clamped to a 10-year maximum (315,360,000 seconds).
# Negative extension values are clamped to 0.
# [ttl]
# short_ttl_secs = 21600 # 6 hours (default)
# mid_ttl_secs = 604800 # 7 days (default)
# long_ttl_secs = 0 # 0 = never expires (default)
# short_extend_secs = 3600 # +1 hour on access (default)
# mid_extend_secs = 86400 # +1 day on access (default)
```
**Precedence:** CLI flags and MCP args take precedence over `config.toml` values. When the MCP server is launched by an AI client, the `--tier` flag in the MCP args is used, not the `config.toml` `tier` setting.
### Compile-Time Constants
These are set in the source code and require recompilation to change:
| Constant | Value | Source file |
|----------|-------|-------------|
| `DEFAULT_PORT` | 9077 | `main.rs` |
| `GC_INTERVAL_SECS` | 1800 (30 min) | `main.rs` |
| `MAX_CONTENT_SIZE` | 65536 (64 KB) | `models.rs` |
| `PROMOTION_THRESHOLD` | 5 accesses | `models.rs` |
| `SHORT_TTL_EXTEND_SECS` | 3600 (1 hour) | `models.rs` |
| `MID_TTL_EXTEND_SECS` | 86400 (1 day) | `models.rs` |
## Graceful Shutdown
The HTTP daemon handles SIGINT (Ctrl+C) gracefully:
1. Stops accepting new connections
2. Waits for in-flight requests to complete
3. Checkpoints the WAL (`PRAGMA wal_checkpoint(TRUNCATE)`)
4. Exits cleanly
Systemd sends SIGTERM by default, so the service file sets `KillSignal=SIGINT` to trigger this sequence and `TimeoutStopSec=10` to give the checkpoint time to complete.
The MCP server exits cleanly when stdin closes (AI client session ends).
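The checkpoint step in the shutdown sequence above can be reproduced with Python's standard `sqlite3` module. This is only a sketch of the technique: the function name `checkpoint_and_close` is hypothetical, and the daemon's own shutdown code is written in Rust and may differ.

```python
import sqlite3


def checkpoint_and_close(conn: sqlite3.Connection) -> bool:
    """Flush the WAL into the main database file, then close the connection."""
    conn.commit()  # make sure no write transaction is open before checkpointing
    busy, _log_frames, _checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    conn.close()
    # busy != 0 means another connection prevented a complete checkpoint
    return busy == 0
```

A process could call this from a SIGINT handler after it has stopped accepting work, which mirrors steps 3 and 4 of the list above.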
## Database Management
### SQLite Settings
The database uses these pragmas (set automatically on open):
- **WAL mode** -- write-ahead logging for concurrent reads
- **busy_timeout = 5000** -- 5 second wait on lock contention
- **synchronous = NORMAL** -- balanced durability/performance
- **foreign_keys = ON** -- enforced referential integrity (links cascade on delete)
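If you open the database from your own tooling, applying the same pragmas keeps behavior consistent with the daemon. A minimal sketch using Python's standard `sqlite3` module (the helper name `open_db` is hypothetical):

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open a database file with the pragmas listed above."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")    # concurrent readers during writes
    conn.execute("PRAGMA busy_timeout = 5000")   # wait up to 5 s on lock contention
    conn.execute("PRAGMA synchronous = NORMAL")  # balanced durability/performance
    conn.execute("PRAGMA foreign_keys = ON")     # enforce cascades on delete
    return conn
```

Note that `foreign_keys` and `busy_timeout` are per-connection settings, so every new connection has to set them again, while WAL mode persists in the database file.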
### Backup
**Live backup (while daemon is running):**
```bash
sqlite3 /path/to/ai-memory.db ".backup /path/to/backup.db"
```
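The same online backup can be scripted without the `sqlite3` CLI. The sketch below uses `Connection.backup` from Python's standard library, which wraps SQLite's online backup API; the function name `live_backup` is hypothetical.

```python
import sqlite3


def live_backup(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # consistent snapshot, safe while other writers are active
    finally:
        dest.close()
        src.close()
```

Unlike a plain file copy, this does not require stopping the daemon or copying the `-wal` file separately.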
**JSON export (includes links):**
```bash
ai-memory --db /path/to/ai-memory.db export > backup.json
```
**File copy (daemon must be stopped or use WAL checkpoint first):**
```bash
sudo systemctl stop ai-memory
cp /path/to/ai-memory.db /path/to/backup.db
# Copy the WAL too if one is still present (e.g., after an unclean stop)
cp /path/to/ai-memory.db-wal /path/to/backup.db-wal 2>/dev/null
sudo systemctl start ai-memory
```
### Restore
**From JSON (preserves links):**
```bash
ai-memory --db /path/to/new.db import < backup.json
```
**From SQLite backup:**
```bash
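sudo systemctl stop ai-memory
# Remove stale WAL/SHM files so they are not applied to the restored database
sudo rm -f /var/lib/ai-memory/ai-memory.db-wal /var/lib/ai-memory/ai-memory.db-shm
sudo cp /path/to/backup.db /var/lib/ai-memory/ai-memory.db
sudo systemctl start ai-memory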
```
### Migration
The schema is auto-migrated on startup. The `schema_version` table tracks the current version (currently 4). Migrations are forward-only and non-destructive.
- v1 -> v2: Added `confidence` (REAL) and `source` (TEXT) columns
- v2 -> v3: Added `embedding` (BLOB) column for storing dense vector embeddings
- v3 -> v4: Added `archived_memories` table for GC archival
Migration error handling: only expected errors (e.g., "duplicate column" when re-running a migration) are silently ignored. Real failures are propagated and will prevent startup, ensuring data integrity.
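The migration behavior described above (forward-only steps, a tracked version, and tolerance for "duplicate column" errors only) can be sketched in Python as follows. This is an illustration, not the project's migration code: the `memories` and `archived_memories` column lists are simplified placeholders, and treating a database with no recorded version as v1 is an assumption.

```python
import sqlite3

# Forward-only steps keyed by target version (schemas simplified for illustration).
MIGRATIONS = {
    2: ["ALTER TABLE memories ADD COLUMN confidence REAL",
        "ALTER TABLE memories ADD COLUMN source TEXT"],
    3: ["ALTER TABLE memories ADD COLUMN embedding BLOB"],
    4: ["CREATE TABLE IF NOT EXISTS archived_memories (id TEXT PRIMARY KEY, title TEXT)"],
}


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order and return the resulting schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] if row[0] is not None else 1  # assumption: unversioned means v1
    for target in sorted(MIGRATIONS):
        if target <= current:
            continue
        for stmt in MIGRATIONS[target]:
            try:
                conn.execute(stmt)
            except sqlite3.OperationalError as exc:
                # Only the expected re-run error is ignored; anything else aborts.
                if "duplicate column" not in str(exc):
                    raise
        conn.execute("DELETE FROM schema_version")
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (target,))
        conn.commit()
        current = target
    return current
```

Running `migrate` a second time is a no-op, and a genuine failure such as a missing `memories` table propagates to the caller instead of being swallowed.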
### Upgrade Procedure
1. Stop the service: `sudo systemctl stop ai-memory`
2. Backup the database: `sqlite3 /var/lib/ai-memory/ai-memory.db ".backup /var/lib/ai-memory/ai-memory-backup.db"`
3. Install the new binary (e.g., `cargo install ai-memory` or replace the binary at `/usr/local/bin/ai-memory`)
4. Start the service: `sudo systemctl start ai-memory`
Schema migrations run automatically on startup. No manual migration steps are required.
### Database Maintenance
Manually trigger garbage collection:
```bash
# Via CLI
ai-memory gc
# Via API
curl -X POST http://127.0.0.1:9077/api/v1/gc
```
By default, GC archives expired memories before deleting them. To disable archiving and permanently delete instead, set `archive_on_gc = false` in `config.toml`. Archived memories are moved to a separate archive table and can be listed, restored, or purged:
```bash
# List archived memories
curl http://127.0.0.1:9077/api/v1/archive
# Restore an archived memory
curl -X POST http://127.0.0.1:9077/api/v1/archive/<id>/restore
# Purge all archived memories permanently (optional: ?older_than_days=N)
curl -X DELETE http://127.0.0.1:9077/api/v1/archive
# View archive statistics
curl http://127.0.0.1:9077/api/v1/archive/stats
```
**Disk space guidance:**
- Approximate database growth: ~2 KB per memory (keyword tier), ~3.5 KB per memory (semantic tier, 384-dim embeddings), ~5 KB per memory (768-dim embeddings).
- The WAL file may grow to ~50 MB during heavy write bursts; it is checkpointed on graceful shutdown.
- The archive table grows without bound, so purge it periodically with `ai-memory archive purge`.
Compact the database (reduces file size after many deletions):
```bash
sqlite3 /path/to/ai-memory.db "VACUUM"
```
Rebuild the FTS index (if it becomes corrupt):
```bash
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"
```
## Agent Identity (NHI)
Introduced in v0.6.0 via Task 1.2. Every memory carries `metadata.agent_id`, a
best-effort Non-Human Identity marker for the agent that stored it. Design
context and the threat model are tracked on issue [#148](https://github.com/alphaonedev/ai-memory-mcp/issues/148).
### Trust model
**`metadata.agent_id` is a *claimed* identity, not an *attested* one.** Any
caller able to invoke the CLI / MCP / HTTP API can set any well-formed
`agent_id`. Use it for provenance, audit, and filter scoping — **never as an
authorization gate on its own.** True attestation arrives with agent
registration (Task 1.3).
### Resolution precedence
**CLI and MCP (process-scoped):**
1. Explicit caller value (`--agent-id`, MCP `agent_id` tool param, or
`metadata.agent_id` embedded in an MCP store request)
2. `AI_MEMORY_AGENT_ID` environment variable
3. (MCP only) `initialize.clientInfo.name` → `ai:<client>@<hostname>:pid-<pid>`
4. `host:<hostname>:pid-<pid>-<uuid8>` (stable for the process's lifetime)
5. `anonymous:pid-<pid>-<uuid8>` (only when hostname is unavailable)
**HTTP daemon (request-scoped, no process-level default):**
1. `agent_id` field in `POST /api/v1/memories` body
2. `X-Agent-Id` request header
3. `anonymous:req-<uuid8>` (synthesized per-request, logged at WARN)
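The two precedence chains above can be summarized in a short Python sketch. The function and parameter names are hypothetical, and the behavior when an MCP client name is known but the hostname is not is an assumption (the sketch falls through to the next rule).

```python
from typing import Optional


def resolve_process_agent_id(*, explicit: Optional[str] = None,
                             env_value: Optional[str] = None,
                             mcp_client_name: Optional[str] = None,
                             hostname: Optional[str] = None,
                             pid: int, uuid8: str) -> str:
    """CLI/MCP resolution: first match wins, in the documented order."""
    if explicit:
        return explicit                                   # 1. caller-supplied value
    if env_value:
        return env_value                                  # 2. AI_MEMORY_AGENT_ID
    if mcp_client_name and hostname:
        return f"ai:{mcp_client_name}@{hostname}:pid-{pid}"  # 3. MCP clientInfo.name
    if hostname:
        return f"host:{hostname}:pid-{pid}-{uuid8}"       # 4. host fallback
    return f"anonymous:pid-{pid}-{uuid8}"                 # 5. no hostname available


def resolve_request_agent_id(body_agent_id: Optional[str],
                             header_agent_id: Optional[str],
                             uuid8: str) -> str:
    """HTTP resolution: body field, then X-Agent-Id header, then per-request fallback."""
    return body_agent_id or header_agent_id or f"anonymous:req-{uuid8}"
```

The HTTP variant has no process-level default, which is why a request with neither field ends up with a fresh `anonymous:req-...` identity each time.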
### Validation
Server-side validator:
`^[A-Za-z0-9_\-:@./]{1,128}$`
This admits prefixed forms (`ai:`, `host:`, `anonymous:`, `human:`, `system:`),
the `@` scope separator, `/` for future SPIFFE ids, and dots. It rejects whitespace,
null bytes, ASCII control chars, and shell metacharacters. Payloads attempting
SQL injection, JSON-path break-outs, or path traversal are either rejected by the
validator or neutralized by the sanitizer (Unicode homoglyphs are rejected outright).
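Client code can pre-check identities against the same pattern before sending them. A minimal Python sketch (the function name `is_valid_agent_id` is hypothetical; the server remains the authority on what is accepted):

```python
import re

# Same character class and length bounds as the server-side validator above.
AGENT_ID_RE = re.compile(r"[A-Za-z0-9_\-:@./]{1,128}")


def is_valid_agent_id(value: str) -> bool:
    """Return True when value consists only of allowed characters, 1-128 long."""
    # fullmatch avoids the trailing-newline quirk of `$` in Python regexes.
    return AGENT_ID_RE.fullmatch(value) is not None
```

Because the character class is ASCII-only, look-alike Unicode letters fail the check just as whitespace and shell metacharacters do.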
### Immutability guarantees
Once a memory is stored, `metadata.agent_id` is preserved across every mutation:
| Mutation path | How `agent_id` is preserved |
|---------------|-----------------------------|
| `db::insert` UPSERT (dedup) | SQL `CASE WHEN json_extract(...) IS NOT NULL THEN json_set(...) ELSE excluded.metadata END` |
| `db::insert_if_newer` (sync merge) | Same SQL CASE WHEN clause |
| `db::update` with caller-supplied metadata | Caller preserves via `identity::preserve_agent_id` (every caller does — MCP `handle_store` dedup, MCP `handle_update`, HTTP `update_memory`) |
| `db::consolidate` | Takes `consolidator_agent_id` parameter; original authors preserved in `metadata.consolidated_from_agents` |
Admins running audit queries can rely on `metadata.agent_id` never changing
post-write unless the memory is deleted and recreated.
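The preservation rule amounts to "the stored `agent_id` wins whenever one exists". The Python sketch below expresses that rule on plain dictionaries. It is a hypothetical stand-in for illustration, loosely modeled on the `identity::preserve_agent_id` helper named above, and does not reproduce its actual signature.

```python
def preserve_agent_id(existing_meta: dict, incoming_meta: dict) -> dict:
    """Merge metadata for an update while keeping the originally stored agent_id."""
    merged = dict(incoming_meta)  # start from the caller's new metadata
    original = existing_meta.get("agent_id")
    if original is not None:
        # Stored identity overrides whatever the update tried to set.
        merged["agent_id"] = original
    return merged
```

If the stored record has no `agent_id` at all, the incoming metadata is used unchanged, which matches the `ELSE excluded.metadata` branch of the SQL form.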
### Special metadata keys produced by the system
These are written by the server; treat as read-only in queries:
| Key | Written when | Value |
|-----|--------------|-------|
| `agent_id` | Every write | String matching validator regex |
| `imported_from_agent_id` | `ai-memory import` without `--trust-source`, when the incoming JSON's `agent_id` differed from the caller's | String |
| `consolidated_from_agents` | `memory_consolidate` / `auto-consolidate` merges N sources | Array of deduplicated strings |
| `mined_from` | `ai-memory mine` (Claude / ChatGPT / Slack export import) | String: `"claude"`, `"chatgpt"`, `"slack"` |
| `derived_from` | `memory_consolidate` — array of source memory ids | Array of UUID strings |
### Filtering by `agent_id`
`list` and `search` accept an `agent_id` filter (exact match via SQLite
`json_extract`):
- CLI: `ai-memory list --agent-id alice`, `ai-memory search "x" --agent-id alice`
- MCP: `agent_id` property on the `memory_list` / `memory_search` tool inputs
- HTTP: `GET /api/v1/memories?agent_id=alice`, `GET /api/v1/search?q=x&agent_id=alice`
`recall` does **not** accept the filter (by spec).
### Operational warnings
- **Default identities leak infrastructure.** When no explicit `agent_id` is
set, memories are stamped `host:<hostname>:pid-<pid>-<uuid8>`, exposing the
host's name and the running PID. For multi-tenant databases or any scenario
where the DB is shared outside its origin host, require callers to set
`AI_MEMORY_AGENT_ID` or `--agent-id` explicitly. See [#198](https://github.com/alphaonedev/ai-memory-mcp/issues/198) for tracked work
on a config-level opt-out.
- **HTTP per-request anonymous fallback** emits a WARN log line
(`HTTP memory write without agent_id body field or X-Agent-Id header;
assigned anonymous:req-<uuid8>`). Grep for this in production logs to spot
writes that carry no claimed identity.
- **Import provenance** is restamped to the current caller by default. If you
need to restore legacy `agent_id` values verbatim (e.g., migrating a backup),
pass `--trust-source` explicitly.
### Related tracked issues
- [#148](https://github.com/alphaonedev/ai-memory-mcp/issues/148) — Task 1.2 design & NHI assessment
- [#196](https://github.com/alphaonedev/ai-memory-mcp/issues/196) — Store responses don't echo resolved agent_id
- [#197](https://github.com/alphaonedev/ai-memory-mcp/issues/197) — Filter values should run through validator
- [#198](https://github.com/alphaonedev/ai-memory-mcp/issues/198) — Config-level opt-out for hostname/PID leak
## Security Hardening
### Transaction Safety
Critical operations use `BEGIN IMMEDIATE` / `COMMIT` transactions to prevent data corruption under concurrent access:
- **`touch()`** -- the read-modify-write cycle for access count, TTL extension, auto-promotion, and priority reinforcement is fully atomic
- **`consolidate()`** -- the multi-step merge (create new memory, delete originals, aggregate tags) is fully atomic
This prevents race conditions where two concurrent recalls could cause incorrect access counts or missed auto-promotions.
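The pattern behind `touch()` can be sketched with Python's `sqlite3` module: take the write lock up front with `BEGIN IMMEDIATE`, then read, modify, and write inside one transaction. The table layout here is a simplified assumption, promoting at `access_count >= 5` is an assumption based on the `PROMOTION_THRESHOLD` constant, and TTL extension and priority reinforcement are omitted.

```python
import sqlite3

PROMOTION_THRESHOLD = 5  # from the compile-time constants table


def touch(conn: sqlite3.Connection, memory_id: str):
    """Atomically bump access_count and promote mid -> long at the threshold.

    The connection must be opened with isolation_level=None so that this
    function controls the transaction boundaries itself.
    """
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading
    try:
        row = conn.execute(
            "SELECT access_count, tier FROM memories WHERE id = ?", (memory_id,)
        ).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return None
        count, tier = row[0] + 1, row[1]
        if tier == "mid" and count >= PROMOTION_THRESHOLD:
            tier = "long"
        conn.execute(
            "UPDATE memories SET access_count = ?, tier = ? WHERE id = ?",
            (count, tier, memory_id),
        )
        conn.execute("COMMIT")
        return count, tier
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

With a deferred transaction, two concurrent callers could both read the same count before either writes; taking the lock first serializes them.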
### FTS Query Injection Protection
All full-text search queries are sanitized before being passed to SQLite FTS5:
- Special characters (`*`, `"`, `(`, `)`, `:`, `+`, `-`, `^`, etc.) are stripped
- Remaining tokens are individually double-quoted (e.g., `auth flow` becomes `"auth" "flow"`)
- This prevents FTS query syntax injection that could cause errors or unexpected results
The sanitization is applied in `recall()`, `search()`, and `forget()` operations.
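A minimal Python sketch of the two sanitization steps above (strip special characters, then quote each token). The character set covers only the characters explicitly listed; the "etc." in the list is not enumerated in this guide, so the real sanitizer may strip more. The function name is hypothetical.

```python
# Only the characters named in the list above; the full set is an assumption.
FTS_SPECIAL = set('*"():+-^')


def sanitize_fts_query(raw: str) -> str:
    """Strip FTS5 syntax characters, then double-quote each remaining token."""
    cleaned = "".join(ch for ch in raw if ch not in FTS_SPECIAL)
    return " ".join(f'"{token}"' for token in cleaned.split())
```

Quoting every token forces FTS5 to treat it as a literal string, so leftover words such as `OR` or `NEAR` cannot act as operators. A query made up entirely of special characters yields an empty string, which callers would need to handle before querying.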
### Error Sanitization
The HTTP API never leaks internal database error details to clients. All `rusqlite::Error` and `anyhow::Error` responses are replaced with a generic `"Internal server error"` message. Detailed errors are logged server-side for debugging.
### Bulk Input Limits
To prevent memory exhaustion and abuse:
- **Bulk create** (`POST /memories/bulk`): Limited to 1,000 items per request
- **Import** (`POST /import`): Limited to 1,000 memories per request
Requests exceeding these limits receive a `400 Bad Request` response.
### Path Parameter Validation
All ID path parameters (e.g., `/memories/{id}`, `/links/{id}`) are validated before database queries are executed. Invalid IDs (empty, too long, containing null bytes) are rejected with a `400 Bad Request` response before any database access occurs.
### Input Validation
All write paths go through the validation layer (`validate.rs`):
- Title: max 512 bytes, no null bytes
- Content: max 64KB, no null bytes
- Namespace: max 128 bytes, no slashes/spaces/nulls
- Source: whitelist (user, claude, hook, api, cli, import, consolidation, system)
- Tags: max 50 tags, each max 128 bytes
- Priority: 1-10
- Confidence: 0.0-1.0, finite
- Relations: whitelist (related_to, supersedes, contradicts, derived_from)
- IDs: max 128 bytes, no null bytes
- Timestamps: valid RFC3339
- TTL: positive, max 1 year
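Several of these limits are easy to check client-side before sending a request. The Python sketch below covers a subset of the rules (title, content, namespace, tags, priority, confidence); it is an illustration with hypothetical names, not a port of `validate.rs`, and it reports violations rather than clamping values.

```python
import math


def _nbytes(text: str) -> int:
    """Length in UTF-8 bytes, since the limits above are expressed in bytes."""
    return len(text.encode("utf-8"))


def validate_memory(title, content, namespace="global", tags=(),
                    priority=5, confidence=1.0):
    """Return the names of fields that violate the documented limits."""
    errors = []
    if _nbytes(title) > 512 or "\x00" in title:
        errors.append("title")
    if _nbytes(content) > 64 * 1024 or "\x00" in content:
        errors.append("content")
    if _nbytes(namespace) > 128 or any(c in namespace for c in "/ \x00"):
        errors.append("namespace")
    if len(tags) > 50 or any(_nbytes(t) > 128 for t in tags):
        errors.append("tags")
    if not (isinstance(priority, int) and 1 <= priority <= 10):
        errors.append("priority")
    if not (math.isfinite(confidence) and 0.0 <= confidence <= 1.0):
        errors.append("confidence")
    return errors
```

An empty list means the input passed every check in the sketch; the server still performs its own validation.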
### Localhost Binding
By default, the HTTP daemon binds to `127.0.0.1` only. It is **not accessible from the network**. This is intentional -- `ai-memory` is a local-machine tool.
The MCP server communicates over stdio only -- no network exposure.
### CORS
The HTTP server uses `CorsLayer::permissive()` -- any origin can make requests. For production, use a reverse proxy with restrictive CORS headers.
### No Authentication
There is no authentication mechanism. This is by design -- the daemon is intended for localhost access only by your AI client (Claude AI, ChatGPT, Grok, Llama, or any other). If you expose it to a network, you are responsible for adding a reverse proxy with authentication.
### Multi-User Warning
ai-memory is a single-user tool. Namespaces do not provide access control. If multiple users share a database, any user can read/write any namespace.
### TLS / Reverse Proxy
ai-memory does not support TLS natively. For HTTPS, terminate TLS at a reverse proxy. Minimal nginx example:
```nginx
server {
listen 443 ssl;
server_name memory.example.com;
ssl_certificate /etc/ssl/certs/memory.pem;
ssl_certificate_key /etc/ssl/private/memory.key;
location / {
proxy_pass http://127.0.0.1:9077;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
### Data at Rest
The SQLite database is stored as a regular file. It is not encrypted. If you need encryption at rest, use filesystem-level encryption (LUKS, FileVault, BitLocker).
### MCP Notification Handling
The MCP server correctly handles all JSON-RPC notifications (requests without an `id` field). Notifications are processed but no response is sent, per the JSON-RPC 2.0 specification. This prevents protocol errors when any MCP client sends `notifications/initialized` or other notification messages.
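The rule can be sketched in a few lines of Python: a message without an `id` is handled but never answered. The `handle_message` and `dispatch` names are hypothetical, and error responses are left out for brevity.

```python
import json


def handle_message(raw_line: str, dispatch):
    """Process one JSON-RPC message; return a response string or None."""
    msg = json.loads(raw_line)
    if "id" not in msg:
        dispatch(msg)  # notification: process it, but send nothing back
        return None
    result = dispatch(msg)
    return json.dumps({"jsonrpc": "2.0", "id": msg["id"], "result": result})
```

A stdio server loop would write the returned string to stdout only when it is not `None`.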
### WAL Files
SQLite WAL mode creates two additional files alongside the database:
- `ai-memory.db-wal` -- write-ahead log
- `ai-memory.db-shm` -- shared memory file
Both are cleaned up on graceful shutdown (the daemon runs `PRAGMA wal_checkpoint(TRUNCATE)` on SIGINT). If the daemon crashes, these files persist but are automatically recovered on next open.
## HTTP API Endpoints
Maximum request body size: 50 MB.
The HTTP daemon exposes **24 endpoints** under `/api/v1`:
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Deep health check (DB + FTS integrity) |
| `POST` | `/memories` | Create a memory |
| `POST` | `/memories/bulk` | Bulk create (max 1,000) |
| `GET` | `/memories/{id}` | Get a memory by ID (includes links) |
| `PUT` | `/memories/{id}` | Update a memory |
| `DELETE` | `/memories/{id}` | Delete a memory |
| `POST` | `/memories/{id}/promote` | Promote a memory to long-term |
| `GET` | `/memories` | List memories with filters |
| `GET` | `/search` | AND search with 6-factor scoring |
| `GET` | `/recall` | OR recall with touch + auto-promote |
| `POST` | `/recall` | OR recall (POST body) |
| `POST` | `/forget` | Bulk delete by pattern/namespace/tier |
| `POST` | `/consolidate` | Consolidate 2-100 memories |
| `POST` | `/links` | Create a link between memories |
| `GET` | `/links/{id}` | Get links for a memory |
| `GET` | `/namespaces` | List namespaces with counts |
| `GET` | `/stats` | Aggregate statistics |
| `POST` | `/gc` | Trigger garbage collection |
| `GET` | `/export` | Export all memories and links |
| `POST` | `/import` | Import memories and links (max 1,000) |
| `GET` | `/archive` | List archived memories |
| `POST` | `/archive/{id}/restore` | Restore an archived memory |
| `DELETE` | `/archive` | Permanently delete archived memories (optional `?older_than_days=N`) |
| `GET` | `/archive/stats` | Archive statistics |
### HTTP API Request/Response Examples
Below are curl examples showing the exact JSON request bodies and response formats for the most important endpoints. The base URL is `http://127.0.0.1:9077/api/v1`.
#### POST /memories (Store)
Create a new memory. Only `title` and `content` are required; all other fields have defaults.
```bash
curl -X POST http://127.0.0.1:9077/api/v1/memories \
-H "Content-Type: application/json" \
-d '{
"title": "Project uses PostgreSQL 16",
"content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
"tier": "long",
"namespace": "infra",
"tags": ["postgres", "database"],
"priority": 9,
"confidence": 1.0,
"source": "user",
"ttl_secs": 604800
}'
```
**Required fields:**
| Field | Type | Description |
|-------|------|-------------|
| `title` | string | Memory title (max 512 bytes) |
| `content` | string | Memory content (max 64 KB) |
**Optional fields:**
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `tier` | string | `"mid"` | `"short"`, `"mid"`, or `"long"` |
| `namespace` | string | `"global"` | Namespace for grouping (max 128 bytes, no slashes/spaces) |
| `tags` | array | `[]` | String tags (max 50 tags, each max 128 bytes) |
| `priority` | integer | `5` | 1-10 (clamped) |
| `confidence` | float | `1.0` | 0.0-1.0 (clamped) |
| `source` | string | `"api"` | One of: `user`, `claude`, `hook`, `api`, `cli`, `import`, `consolidation`, `system` |
| `expires_at` | string | (none) | Explicit expiry timestamp (RFC3339) |
| `ttl_secs` | integer | (none) | TTL in seconds (overrides tier default) |
**Response (201 Created):**
```json
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16"
}
```
If potential contradictions are found (memories with similar titles in the same namespace), the response includes:
```json
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16",
"potential_contradictions": ["existing-id-1", "existing-id-2"]
}
```
Deduplication: if a memory with the same title+namespace already exists, it is upserted (tier never downgrades, priority keeps the maximum).
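The two merge rules named above (tier never downgrades, priority keeps the maximum) can be expressed as a small Python sketch. How the remaining fields are merged is not specified here, so the sketch simply takes them from the incoming write; that choice and the function name are assumptions.

```python
TIER_RANK = {"short": 0, "mid": 1, "long": 2}


def merge_on_upsert(existing: dict, incoming: dict) -> dict:
    """Combine a stored memory with a same-title, same-namespace write."""
    merged = dict(incoming)  # assumption: other fields come from the new write
    if TIER_RANK[existing["tier"]] > TIER_RANK[incoming["tier"]]:
        merged["tier"] = existing["tier"]  # never downgrade the tier
    merged["priority"] = max(existing["priority"], incoming["priority"])
    return merged
```

For example, re-storing a `long`, priority 9 memory as `mid`, priority 5 leaves it at `long` and 9.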
**Minimal example (defaults applied):**
```bash
curl -X POST http://127.0.0.1:9077/api/v1/memories \
-H "Content-Type: application/json" \
-d '{"title": "Quick note", "content": "Something to remember."}'
```
Response: `{"id": "...", "tier": "mid", "namespace": "global", "title": "Quick note"}`
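Scripts that store a memory usually need the returned `id` for later calls. The following is a minimal sketch, assuming the response has the compact shape shown above; `extract_id` and `store_and_get_id` are hypothetical helper names, and the `sed` expression is not a general JSON parser (use a real JSON tool for anything more complex).

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the "id" value out of a store response on stdin.
# Assumes the response shape shown above; not a general JSON parser.
extract_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: POST the JSON body in $1 and print only the new id.
store_and_get_id() {
  curl -s -X POST http://127.0.0.1:9077/api/v1/memories \
    -H "Content-Type: application/json" \
    -d "$1" | extract_id
}
```

Usage: `id=$(store_and_get_id '{"title": "Quick note", "content": "Something to remember."}')`.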
#### GET /memories/{id} (Get)
Retrieve a single memory by ID, including its links to other memories.
```bash
curl http://127.0.0.1:9077/api/v1/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890
```
**Response (200 OK):**
```json
{
"memory": {
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16",
"content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
"tags": ["postgres", "database"],
"priority": 9,
"confidence": 1.0,
"source": "user",
"access_count": 3,
"created_at": "2026-04-03T15:00:00+00:00",
"updated_at": "2026-04-03T15:00:00+00:00",
"last_accessed_at": "2026-04-10T09:30:00+00:00",
"expires_at": null
},
"links": [
{
"source_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"target_id": "f7e8d9c0-b1a2-3456-7890-abcdef123456",
"relation": "related_to",
"created_at": "2026-04-05T12:00:00+00:00"
}
]
}
```
**Response (404 Not Found):** `{"error": "not found"}`
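Note: `last_accessed_at` and `expires_at` are omitted from the JSON when null, so a memory with no expiry is returned without an `expires_at` key rather than with `"expires_at": null`.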
#### GET /recall?context=... (Recall)
Runs a fuzzy OR search and returns ranked results. Each recall also bumps the access count, extends the TTL, and promotes frequently accessed mid-tier memories to long-term.
```bash
curl "http://127.0.0.1:9077/api/v1/recall?context=database+migration+postgres&namespace=infra&limit=5"
```
**Query parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `context` | string | (required) | Search context / query text |
| `namespace` | string | (none) | Filter by namespace |
| `limit` | integer | `10` | Max results (capped at 50) |
| `tags` | string | (none) | Comma-separated tag filter |
| `since` | string | (none) | Only memories updated after this RFC3339 timestamp |
| `until` | string | (none) | Only memories updated before this RFC3339 timestamp |
**Response (200 OK):**
```json
{
"memories": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16",
"content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
"tags": ["postgres", "database"],
"priority": 9,
"confidence": 1.0,
"source": "user",
"access_count": 4,
"created_at": "2026-04-03T15:00:00+00:00",
"updated_at": "2026-04-03T15:00:00+00:00",
"last_accessed_at": "2026-04-12T10:00:00+00:00",
"score": 0.763
}
],
"count": 1
}
```
Each memory in the response includes a `score` field (float, rounded to 3 decimal places) representing the composite relevance score. Memories are returned sorted by score descending.
Recall is also available via POST for larger query bodies:
```bash
curl -X POST http://127.0.0.1:9077/api/v1/recall \
-H "Content-Type: application/json" \
-d '{
"context": "database migration postgres",
"namespace": "infra",
"limit": 5,
"tags": "postgres",
"since": "2026-01-01T00:00:00Z"
}'
```
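For the GET form, query text with spaces or reserved characters (such as `&`) must be URL-encoded. A sketch that lets curl do the encoding through `-G` and `--data-urlencode`; the `recall` wrapper name and its argument order are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around GET /recall.
# Usage: recall <context> [namespace] [limit]
recall() {
  local context="$1" namespace="${2:-}" limit="${3:-10}"
  # -G turns the --data-urlencode pairs into a URL-encoded query string.
  set -- -s -G "http://127.0.0.1:9077/api/v1/recall" \
    --data-urlencode "context=$context" \
    --data-urlencode "limit=$limit"
  # Only send a namespace filter when one was given.
  if [ -n "$namespace" ]; then
    set -- "$@" --data-urlencode "namespace=$namespace"
  fi
  curl "$@"
}
```

Usage: `recall "database migration & postgres" infra 5`.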
#### PUT /memories/{id} (Update)
Partial update -- only provided fields are modified. All fields are optional.
```bash
curl -X PUT http://127.0.0.1:9077/api/v1/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
-H "Content-Type: application/json" \
-d '{
"content": "PostgreSQL 16.2 with pgvector 0.7 for embeddings. Upgraded 2026-04-10.",
"priority": 10,
"tags": ["postgres", "database", "pgvector"]
}'
```
**Updatable fields:**
| Field | Type | Description |
|-------|------|-------------|
| `title` | string | New title |
| `content` | string | New content |
| `tier` | string | New tier (`"short"`, `"mid"`, `"long"`) |
| `namespace` | string | New namespace |
| `tags` | array | Replace tags entirely |
| `priority` | integer | New priority (1-10) |
| `confidence` | float | New confidence (0.0-1.0) |
| `expires_at` | string | New expiry (RFC3339) |
**Response (200 OK):** Returns the full updated memory object:
```json
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16",
"content": "PostgreSQL 16.2 with pgvector 0.7 for embeddings. Upgraded 2026-04-10.",
"tags": ["postgres", "database", "pgvector"],
"priority": 10,
"confidence": 1.0,
"source": "user",
"access_count": 4,
"created_at": "2026-04-03T15:00:00+00:00",
"updated_at": "2026-04-12T10:05:00+00:00"
}
```
**Response (404 Not Found):** `{"error": "not found"}`
**Response (409 Conflict):** `{"error": "title already exists in namespace ..."}` (if updating the title to one that already exists in the same namespace)
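When updating from a script, it helps to branch on the three documented status codes. A minimal sketch, assuming only the HTTP status matters to the caller; `update_memory` and its return codes are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: send a partial update and branch on the HTTP status.
# Usage: update_memory <id> <json-body>
update_memory() {
  local id="$1" body="$2" status
  # -o /dev/null discards the response body; -w prints only the status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' -X PUT \
    "http://127.0.0.1:9077/api/v1/memories/$id" \
    -H "Content-Type: application/json" \
    -d "$body")
  case "$status" in
    200) echo "updated $id" ;;
    404) echo "memory $id not found" >&2; return 1 ;;
    409) echo "title already exists in namespace" >&2; return 2 ;;
    *)   echo "unexpected status: $status" >&2; return 3 ;;
  esac
}
```

Usage: `update_memory a1b2c3d4-e5f6-7890-abcd-ef1234567890 '{"priority": 10}'`.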
#### GET /archive (List Archived)
List memories that were archived by garbage collection.
```bash
curl "http://127.0.0.1:9077/api/v1/archive?namespace=infra&limit=20&offset=0"
```
**Query parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `namespace` | string | (none) | Filter by namespace |
| `limit` | integer | `50` | Max results (capped at 1000) |
| `offset` | integer | `0` | Pagination offset |
**Response (200 OK):**
```json
{
"archived": [
{
"id": "expired-memory-id",
"tier": "short",
"namespace": "infra",
"title": "Temp debug session",
"content": "Debugging connection pooling issue...",
"tags": ["debug"],
"priority": 3,
"confidence": 1.0,
"source": "claude",
"access_count": 1,
"created_at": "2026-04-01T10:00:00+00:00",
"updated_at": "2026-04-01T10:00:00+00:00",
"expires_at": "2026-04-01T16:00:00+00:00",
"archived_at": "2026-04-02T00:30:00+00:00",
"archive_reason": "gc"
}
],
"count": 1
}
```
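The `limit` and `offset` parameters can be combined to walk the whole archive. The sketch below pages until a short page comes back. It counts entries by their `archived_at` key (one per entry in the example above) instead of relying on `count`, because the example does not say whether `count` is per page or total. `list_all_archived` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every page of GET /archive for one namespace.
# Usage: list_all_archived <namespace> [page-size]
list_all_archived() {
  local namespace="$1" limit="${2:-50}" offset=0 page n
  while :; do
    page=$(curl -s "http://127.0.0.1:9077/api/v1/archive?namespace=$namespace&limit=$limit&offset=$offset") || return 1
    printf '%s\n' "$page"
    # Count entries on this page: one "archived_at" key per entry.
    n=$(printf '%s\n' "$page" | grep -o '"archived_at"' | wc -l | tr -d ' ')
    # A short page means there is nothing left to fetch.
    [ "$n" -ge "$limit" ] || break
    offset=$((offset + limit))
  done
}
```

Keep the page size at or below 1000, since the server caps `limit` at that value and a larger request would look like a short page.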
#### POST /archive/{id}/restore (Restore)
Restore an archived memory back to the active memories table. The restored memory has its `expires_at` cleared (becomes permanent).
```bash
curl -X POST http://127.0.0.1:9077/api/v1/archive/expired-memory-id/restore
```
**Response (200 OK):**
```json
{
"restored": true,
"id": "expired-memory-id"
}
```
**Response (404 Not Found):** `{"error": "not found in archive"}`
## Monitoring
### Health Endpoint (Deep Check)
```bash
curl http://127.0.0.1:9077/api/v1/health
```
The health check performs a **deep verification**:
1. Database is readable (runs `SELECT COUNT(*) FROM memories`)
2. FTS5 index integrity check (`INSERT INTO memories_fts(memories_fts) VALUES('integrity-check')`)
Returns `200 OK` with `{"status": "ok", "service": "ai-memory"}` if healthy.
Returns `503 Service Unavailable` with `{"status": "error", "service": "ai-memory"}` if the database or FTS index is unhealthy.
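Because the endpoint signals health through the status code, a script can poll it after starting the daemon. A minimal sketch; `wait_healthy`, its attempt count, and its delay are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the health endpoint until it returns 200
# or the attempts run out.
# Usage: wait_healthy [attempts] [delay-seconds]
wait_healthy() {
  local attempts="${1:-10}" delay="${2:-1}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors such as 503.
    if curl -sf -o /dev/null http://127.0.0.1:9077/api/v1/health; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Usage: `wait_healthy 30 2 || echo "daemon did not become healthy" >&2`.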
### Stats Endpoint
```bash
curl http://127.0.0.1:9077/api/v1/stats
```
Returns:
- Total memory count
- Breakdown by tier
- Breakdown by namespace
- Memories expiring within 1 hour
- Total link count
- Database file size in bytes
### MCP Server Monitoring
The MCP server logs to stderr. Monitor via:
```bash
# If running via an AI client, check your client's MCP logs
# If running manually:
ai-memory mcp 2>mcp-server.log
```
Key log messages:
- `ai-memory MCP server started (stdio)` -- server is ready
- `ai-memory MCP server stopped` -- stdin closed (AI client session ended), server exiting
### Logs
The HTTP daemon logs via `tracing` with configurable levels:
```bash
# Info level (recommended default)
RUST_LOG=ai_memory=info,tower_http=info ai-memory serve
# Debug level (verbose, includes all HTTP requests)
RUST_LOG=ai_memory=debug,tower_http=debug ai-memory serve
# Trace level (extremely verbose)
RUST_LOG=ai_memory=trace ai-memory serve
```
With systemd, logs go to the journal:
```bash
sudo journalctl -u ai-memory -f
sudo journalctl -u ai-memory --since "1 hour ago"
```
### Monitoring Script Example
```bash
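#!/bin/bash
# Restart the daemon if the health endpoint is unreachable or returns an error status
if ! curl -sf http://127.0.0.1:9077/api/v1/health > /dev/null; then
  echo "ai-memory health check failed"
  systemctl restart ai-memory
fi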
```
## CI/CD Pipeline
The project uses GitHub Actions for continuous integration and release automation.
### CI (Every Push and PR)
Runs on `ubuntu-latest` and `macos-latest`:
1. **Formatting** -- `cargo fmt --check`
2. **Linting** -- `cargo clippy -- -D warnings`
3. **Tests** -- `cargo test` (191 tests: 140 unit + 51 integration, 15/15 modules)
4. **Build** -- `cargo build --release`
Uses `Swatinem/rust-cache@v2` for build caching.
### Release (On Tag Push)
Triggered by tags matching `v*` (e.g., `v0.1.0`):
1. Builds release binaries for:
   - `x86_64-unknown-linux-gnu` (Ubuntu)
   - `aarch64-apple-darwin` (macOS ARM)
2. Packages each as `ai-memory-<target>.tar.gz`
3. Creates a GitHub Release with the artifacts
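An install script can derive the artifact name from the host platform. The sketch below covers only the two targets listed above; `release_artifact` is a hypothetical helper, and the mapping from `uname` output to target triple is an assumption about the hosts you deploy to.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map `uname -s` / `uname -m` output to a release artifact name.
# Usage: release_artifact [os] [arch]   (defaults to the current host)
release_artifact() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Linux/x86_64) echo "ai-memory-x86_64-unknown-linux-gnu.tar.gz" ;;
    Darwin/arm64) echo "ai-memory-aarch64-apple-darwin.tar.gz" ;;
    *) echo "no prebuilt binary for $os/$arch" >&2; return 1 ;;
  esac
}
```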
### Running CI Locally
```bash
# Replicate the CI checks
cargo fmt --check
cargo clippy -- -D warnings
cargo test
cargo build --release
```
## Multi-Node Sync
For multi-machine deployments (e.g., laptop + server, or multiple workstations), use the `sync` command to keep databases in sync.
### Manual Sync
```bash
# Pull remote changes to local
ai-memory sync /mnt/shared/ai-memory.db --direction pull
# Push local changes to remote
ai-memory sync /mnt/shared/ai-memory.db --direction push
# Bidirectional merge (recommended)
ai-memory sync /mnt/shared/ai-memory.db --direction merge
```
### Automated Sync via Cron
```bash
# Sync every 15 minutes (bidirectional merge)
*/15 * * * * /usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db sync /mnt/shared/remote-memory.db --direction merge --json >> /var/log/ai-memory-sync.log 2>&1
```
Sync uses the same dedup-safe upsert as regular stores:
- Title+namespace conflicts are resolved by keeping the higher priority
- Tier never downgrades
- Links are synced alongside memories
- Safe to run concurrently from multiple machines (SQLite WAL mode handles locking)
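The tier and priority rules can be illustrated in a few lines of shell. This is a sketch of the rules as described above, not the tool's actual code; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Illustration of the merge rules described above; not the tool's implementation.
tier_rank() {
  case "$1" in
    short) echo 1 ;;
    mid)   echo 2 ;;
    long)  echo 3 ;;
    *) return 1 ;;
  esac
}

# Tier never downgrades: keep whichever tier ranks higher.
merge_tier() {
  local a b
  a=$(tier_rank "$1") || return 1
  b=$(tier_rank "$2") || return 1
  if [ "$a" -ge "$b" ]; then echo "$1"; else echo "$2"; fi
}

# Priority keeps the maximum of the two values.
merge_priority() {
  if [ "$1" -ge "$2" ]; then echo "$1"; else echo "$2"; fi
}
```

For example, merging a local `long` memory at priority 5 with a remote `short` copy at priority 9 leaves tier `long` and priority 9.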
### Sync via sshfs or rsync
If the remote database is on another machine, mount it or copy it first:
```bash
# Option 1: sshfs mount
mkdir -p /mnt/remote-memory
sshfs user@server:/var/lib/ai-memory /mnt/remote-memory
ai-memory sync /mnt/remote-memory/ai-memory.db --direction merge
# Option 2: rsync + sync + rsync
rsync -a server:/var/lib/ai-memory/ai-memory.db /tmp/remote.db
ai-memory sync /tmp/remote.db --direction merge
rsync -a /tmp/remote.db server:/var/lib/ai-memory/ai-memory.db
```
## Auto-Consolidation (Maintenance)
Auto-consolidation groups memories by namespace and primary tag, then merges groups with enough members into a single long-term summary. This reduces memory count and improves recall relevance.
### Manual Run
```bash
# Preview what would be consolidated
ai-memory auto-consolidate --dry-run
# Consolidate all namespaces (groups of 3+)
ai-memory auto-consolidate
# Only short-term memories, minimum 5 per group
ai-memory auto-consolidate --short-only --min-count 5
```
### Cron Schedule
```bash
# Run auto-consolidation daily at 3am, short-term memories only
0 3 * * * /usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db auto-consolidate --short-only --json >> /var/log/ai-memory-consolidate.log 2>&1
```
## Man Page
After the man page is installed, view the system-wide documentation with:
```bash
man ai-memory
```
## Scaling Considerations
`ai-memory` is designed for single-machine use. It is not a distributed system.
- **Concurrency**: The daemon uses `Arc<Mutex<Connection>>` -- one write at a time, but this is fine for a single-user tool. SQLite WAL mode allows concurrent reads.
- **MCP concurrency**: The MCP server is single-threaded (synchronous stdio loop), one request at a time. This is by design -- MCP clients typically send one request at a time.
- **Database size**: SQLite handles databases up to 281 TB. Practically, performance stays excellent up to millions of rows.
- **Memory usage**: Minimal. The daemon holds only the connection and a path in memory. All data is on disk.
- **Multiple instances**: You can run multiple daemons on different ports with different databases. Do not point two daemons at the same database file. The MCP server and CLI can share a database (both use WAL mode).
## Troubleshooting
### Daemon won't start
**Port already in use:**
```bash
ai-memory serve --port 9078
```
**Database locked:**
```bash
# Remove stale WAL files (only if no ai-memory process has the database open;
# any writes not yet checkpointed from the WAL into the main file are lost)
rm -f ai-memory.db-wal ai-memory.db-shm
```
**Permission denied:**
```bash
# Check file permissions
ls -la /path/to/ai-memory.db
# Ensure the user running the daemon has read/write access
```
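The directory matters as well as the file, because SQLite in WAL mode creates `-wal` and `-shm` files next to the database. A minimal pre-flight sketch to run as the daemon's user; `check_db_access` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: the database directory must exist and be
# writable, and the database file (if it already exists) must be readable
# and writable by the current user.
check_db_access() {
  local db="$1" dir
  dir=$(dirname "$db")
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "directory missing or not writable: $dir" >&2
    return 1
  fi
  if [ -e "$db" ] && { [ ! -r "$db" ] || [ ! -w "$db" ]; }; then
    echo "database not readable/writable: $db" >&2
    return 1
  fi
  echo "ok: $db"
}
```

Usage: `sudo -u <daemon-user> bash -c 'source check.sh; check_db_access /path/to/ai-memory.db'`, where `check.sh` is whatever file holds the function.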
### MCP server not connecting
**Binary not found:**
Check that the path in your MCP configuration (e.g., `~/.claude.json` for Claude Code user scope, or `.mcp.json` for project scope) is correct and the binary is executable.
**Database path issues:**
The MCP server opens the database at the path specified by `--db`. Ensure the directory exists and is writable.
**Protocol errors:**
Check stderr output. The MCP server logs parse errors and protocol issues to stderr.
### Slow queries
If recall or search is slow:
```bash
# Rebuild the FTS index
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"
# Compact the database
sqlite3 /path/to/ai-memory.db "VACUUM"
```
### FTS index corruption
Symptoms: search returns no results or errors.
```bash
# Check integrity
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('integrity-check')"
# Rebuild if corrupt
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"
```
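The two commands can be combined so the rebuild runs only when the check fails. A sketch that assumes the `sqlite3` CLI exits non-zero when the integrity check raises an error; `repair_fts` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: rebuild the FTS index only when the integrity check fails.
# Usage: repair_fts /path/to/ai-memory.db
repair_fts() {
  local db="$1"
  if sqlite3 "$db" "INSERT INTO memories_fts(memories_fts) VALUES('integrity-check')"; then
    echo "FTS index ok"
  else
    echo "FTS index failed integrity check, rebuilding" >&2
    sqlite3 "$db" "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"
  fi
}
```

Stop the daemon before running this against a production database so the rebuild does not compete with live writes.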
### Database is growing too large
```bash
# Check what's taking space
ai-memory stats
# Delete expired memories
ai-memory gc
# Delete all short-term memories in a namespace
ai-memory forget --tier short --namespace my-app
# Compact after deletion
sqlite3 /path/to/ai-memory.db "VACUUM"
```