anamnesis-embedder 0.1.0

Embedding providers (local + cloud) for Anamnesis. Local default: fastembed-rs.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
<p align="center">
  <img src="./banner.png" alt="Anamnesis banner" width="920">
</p>

<h1 align="center">Anamnesis</h1>

<p align="center">
  <strong>A local-first memory layer that imports, normalizes, indexes, and serves agent memory across tools.</strong>
</p>

<p align="center">
  <a href="https://github.com/Trapezohe/Anamnesis/releases/tag/v0.1.0"><img src="https://img.shields.io/badge/version-v0.1.0-0ea5e9?style=for-the-badge" alt="version"></a>
  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache--2.0-22c55e?style=for-the-badge" alt="license"></a>
  <img src="https://img.shields.io/badge/rust-%3E%3D1.85-f97316?style=for-the-badge&logo=rust&logoColor=white" alt="rust">
  <img src="https://img.shields.io/badge/MCP-stdio%20%2B%20SSE-8b5cf6?style=for-the-badge" alt="MCP">
  <img src="https://img.shields.io/badge/RAG-local%20hybrid-14b8a6?style=for-the-badge" alt="local rag">
  <a href="https://x.com/Ghast_AI"><img src="https://img.shields.io/badge/X-@Ghast__AI-000000?style=for-the-badge&logo=x&logoColor=white" alt="X"></a>
  <a href="https://discord.gg/ghastai"><img src="https://img.shields.io/badge/Discord-Join-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"></a>
</p>

<p align="center">
  <a href="#overview">Overview</a>
  · <a href="#supported-sources--agents">Supported Sources</a>
  · <a href="#architecture">Architecture</a>
  · <a href="#quick-start">Quick Start</a>
  · <a href="./README.zh-CN.md">简体中文</a>
  · <a href="./docs/BLUEPRINT.md">Blueprint</a>
  · <a href="https://discord.gg/ghastai">Discord</a>
</p>

---

## Overview

**Anamnesis** is an open-source memory infrastructure project for the agent era. It reads memories and sessions from **14 first-class adapters** — agent frameworks (Claude Code, Codex, Hermes, OpenClaw, ghast) and memory systems (mem0, Letta, TencentDB Agent Memory, OpenViking, MemPalace, Memori, MemOS, Memary) — plus any MCP-aware project through the Generic MCP adapter, then normalizes them into one local schema, one local database, and one Anamnesis-owned RAG stack. A two-stage **session extractor** distills raw `Episode` records into long-lived `Fact` / `Preference` / `Feedback` / `Skill` records, with every LLM call auditable, gated, and provenance-linked back to the source Episode.

It is not another chat interface. It is the memory layer underneath your tools:

- **User-sovereign**: your memory data stays local by default.
- **Cross-agent continuity**: what one agent learns about your preferences, projects, and workflows can be reused by other trusted agents.
- **Unified retrieval**: no delegated source-system search, no mixed embedding spaces, no opaque ranking across vendors.
- **Auditable provenance**: every record keeps `adapter / instance / native_id / native_path / raw_hash`.

> Status: `v0.1.0` — first stable adapter-contract release. All 14 first-class adapters pass the shared `MemoryAdapter` invariant suite; CLI, MCP server, search, and the Claude Desktop / Cursor / ghast on-ramp (`anamnesis mcp config`) are stable. Schema and `MemoryAdapter` trait are now considered stable across the 0.1.x line; semver applies.

## Technical Snapshot

| Area | Current implementation |
|---|---|
| Language | Rust 2021, MSRV `1.85` |
| Binaries | `anamnesis` CLI, `anamnesis-mcp` MCP server |
| Storage | SQLite + FTS5 + chunk-level tables; vector search currently uses BLOB-backed cosine fallback, with sqlite-vec as the target replacement layer |
| Retrieval | FTS5 BM25 + vector kNN + Reciprocal Rank Fusion + ContextPacker |
| Embeddings | Local `fastembed-rs` by default; curated model registry; Voyage cloud provider is explicit opt-in |
| Protocol | MCP stdio; `anamnesis-mcp --sse` supports loopback HTTP/SSE |
| Current adapters | 14 first-class: Claude Code, Codex, mem0, Letta, Hermes, OpenClaw, ghast, TencentDB Agent Memory, OpenViking, MemPalace, Memori, MemOS, Memary, Generic MCP |
| Session extractor | §-1.5 PR-6 two-stage (deterministic gate + LLM). Three providers: `mock` (offline, deterministic), `openai` (any Chat-Completions-compatible: OpenAI, Ollama, vLLM, OpenRouter, …), `anthropic` (native Messages API). Per-run audit log at `<data_dir>/audit/stage2.jsonl`; `anamnesis lineage <id>` walks `provenance.derived_from`. |
| Security posture | Local-first, source provenance, explicit cloud opt-in; MCP admin tool gating is the next P0 hardening step |

## Supported Sources & Agents

### Importable memory sources

#### §-2.2 Agents

| Source / Agent | Status | What is read today | Precision |
|---|---|---|---|
| Claude Code | Usable | `~/.claude/projects/*/memory/*.md`, project `*.jsonl` sessions | Medium-high for memory markdown; medium-low for sessions |
| Codex | Usable | `~/.codex/` session JSON/JSONL | Medium |
| Hermes (Nous Research) | Usable | `~/.hermes/MEMORY.md` + `USER.md` + SQLite session DBs | Medium-high |
| OpenClaw | Usable | `~/.openclaw/` workspace MD + `skills/`, `sessions/*.json[l]` | Medium-high |
| ghast | Usable | `~/Documents/ghast_desktop/prompts/`, bundled skills; detects encrypted profile DB | Medium-high |

#### §-2.3 Memory frameworks

| Source / Framework | Status | What is read today | License |
|---|---|---|---|
| mem0 | Usable | Self-hosted SQLite `memories` table | Apache-2.0 |
| Letta (formerly MemGPT) | Usable | SQLite `block` table (`~/.letta/letta.db`) | Apache-2.0 |
| TencentDB Agent Memory | Usable | `~/.openclaw/memory-tdai/` 4-tier (L0 refs, L1 JSONL facts, L2 scenarios, L3 persona) | MIT |
| OpenViking | Usable | VikingFS AGFS workspace (resources/user/agent/session × L0/L1/L2) | AGPLv3 (read-only, no link) |
| MemPalace | Usable | `~/.mempalace/identity.txt` + ChromaDB drawers/closets | AGPLv3 (read-only, no link) |
| Memori | Usable | SQLite — entity_facts, process_attrs, conversation messages + summaries, KG triples | Apache-2.0 |
| MemOS | Usable | MemCube dumps (`textual_memory.json`) per `memory_type` | Apache-2.0 |
| Memary | Usable | Local cache files (`memory_stream.json`, entity tally, past chat, personas) | MIT |

#### §-2.4 Long-tail (any MCP-aware source)

| Protocol | Status | What is read today |
|---|---|---|
| Generic MCP server | Usable | `resources/list` + `resources/read` — works for any project that exposes an MCP server (Cognee, Zep, etc.) |

### Consumers that can use Anamnesis

| Consumer | Integration | Status | Notes |
|---|---|---|---|
| ghast | MCP server config | Planned first consumer | Anamnesis remains an independent OSS project |
| Claude Desktop / Claude Code MCP clients | `anamnesis-mcp` stdio | Ready to wire | Suitable for local retrieval and provenance lookup |
| Codex / CLI agents | MCP stdio or CLI | Ready to wire | Can consume Anamnesis via MCP or shell commands |
| Cursor / Zed / MCP-aware tools | MCP stdio / SSE | Ready to wire | Depends on each client’s MCP support |
| Scripts and automation | CLI + JSON output | Ready | `search --json`, `export`, `status --json` |

### Planned support

| Source / Consumer | Type | Plan |
|---|---|---|
| Zep / Graphiti | Temporal knowledge graph | Bi-temporal facts push beyond the current `created_at/updated_at` schema; integration via `generic-mcp` until §-1.4 schema evolves |
| Cognee | DuckDB + Kuzu graph | Today: via Cognee's own MCP server through `generic-mcp`. Native adapter pending if/when a portable on-disk export lands |
| LangMem | LangChain SDK | Reads whichever backend LangGraph Store points at; case-by-case |
| OpenAI / Voyage / other cloud embeddings | Embedding provider | Explicit opt-in only; never called silently |
| Session extractor (§-1.5 PR-6) | Pipeline | Two-stage LLM-gated `Episode → Fact / Preference / Skill / Feedback` distillation |
| Agent Memory Interchange Format | Standardization | Future RFC for cross-agent memory exchange |

## Why Anamnesis

Each agent and memory framework stores memory differently:

- **Agents** — Claude Code keeps project JSONL sessions and markdown memory files; Codex has local session and rollout history; Hermes uses SQLite session DBs plus `MEMORY.md`/`USER.md`; OpenClaw and ghast each layer their own workspace conventions on top.
- **Memory frameworks** — mem0/Letta/Memori use SQLite; MemPalace uses ChromaDB; OpenViking and TencentDB Agent Memory use hierarchical filesystem layouts; MemOS dumps JSON "MemCubes"; Memary uses Neo4j with local-cache JSON; Cognee uses DuckDB+Kuzu; Zep/Graphiti use temporal graphs.

Without a neutral memory layer, users retrain every agent from scratch — and migrating from one memory framework to another means losing years of accumulated context. Anamnesis turns fragmented memory stores into one local, inspectable, searchable, and portable substrate — read-only against each upstream, with full provenance kept so original sources stay authoritative.

## Architecture

```mermaid
flowchart TB
  subgraph Consumers["Consumers"]
    Ghast["ghast"]
    ClaudeDesktop["Claude Desktop"]
    CodexClient["Codex / CLI Agent"]
    Cursor["Cursor / Zed"]
    Scripts["Custom scripts"]
  end

  subgraph Runtime["Anamnesis Runtime"]
    CLI["anamnesis CLI"]
    MCP["anamnesis-mcp<br/>stdio / SSE"]
    Search["search crate<br/>Hybrid RAG + ContextPacker"]
    Importer["importer crate<br/>scan -> normalize -> chunk -> upsert"]
    Store["store crate<br/>SQLite + FTS5 + embeddings"]
    Embedder["embedder crate<br/>fastembed / optional cloud"]
  end

  subgraph Sources["Memory Sources (14 first-class adapters)"]
    Agents["Agents: Claude Code · Codex<br/>Hermes · OpenClaw · ghast"]
    MemFW["Memory frameworks: mem0 · Letta<br/>TencentDB Agent Memory · OpenViking<br/>MemPalace · Memori · MemOS · Memary"]
    GenericMCP["Generic MCP<br/>(Cognee, Zep, any MCP-aware source)"]
  end

  Ghast --> MCP
  ClaudeDesktop --> MCP
  CodexClient --> MCP
  Cursor --> MCP
  Scripts --> CLI

  MCP --> Search
  CLI --> Search
  CLI --> Importer
  MCP -. "admin tools (planned gated)" .-> Importer

  Importer --> Store
  Search --> Store
  Store --> Embedder
  Embedder --> Store

  Agents --> Importer
  MemFW --> Importer
  GenericMCP --> Importer
```

## Import Pipeline

Anamnesis separates carrier reading from memory semantics. Adapters never write the database directly; persistence goes through store transactions.

```mermaid
flowchart LR
  A["Discovery<br/>paths / schema / counts only"] --> B["User Confirm / Source Registry"]
  B --> C["Adapter.scan()<br/>RawRecord stream"]
  C --> D["Parser<br/>Markdown / JSONL / SQLite / MCP resource"]
  D --> E["Normalizer<br/>AnamnesisRecord"]
  E --> F["Chunker<br/>record -> chunks"]
  F --> G["Store.upsert_transaction()<br/>records + raw_artifacts + chunks"]
  G --> H["FTS5 index<br/>chunks_fts"]
  G --> I["Embedding jobs<br/>content_hash + model_id"]
  I --> J["Embedding worker<br/>local fastembed"]
  J --> K["chunk_embeddings"]
```

### Adapter Precision Matrix

| Source | Current read path | Normalized result | Precision | Notes |
|---|---|---|---|---|
| Claude Code memory markdown | `~/.claude/projects/*/memory/*.md` | frontmatter type -> `Kind/Scope`, body -> `content` | Medium-high | Structured memory import is usable; frontmatter parser still needs hardening |
| Claude Code JSONL | project `*.jsonl` files | `Episode / Session` | Medium-low | This is history recall, not stable preference extraction |
| mem0 SQLite | read-only `memories` table | `memory` -> content, default `Fact / User` | Medium-high | SQLite mode is usable; API mode and source embedding provenance are pending |
| Codex | basic `.json/.jsonl` scan | `Episode / Session` | Low | Needs precise Codex session schema and path whitelist |
| Generic MCP | `resources/list` + `resources/read` | `Unknown / Ephemeral` | Low | Suitable for opaque resources until memory metadata conventions exist |

## RAG Retrieval Flow

Anamnesis owns the retrieval path. Source-system vectors, source search APIs, and source ranking logic do not enter cross-source retrieval.

```mermaid
flowchart LR
  Q["Query<br/>text + source/kind/scope/time filters"] --> Filter["SearchFilter<br/>planned store pushdown"]

  Filter --> FTS["FTS5 BM25<br/>record_chunks"]
  Filter --> QEmbed["embed_query()<br/>active model"]
  QEmbed --> Vec["Vector kNN<br/>chunk_embeddings"]

  FTS --> RRF["RRF merge<br/>K = 60"]
  Vec --> RRF
  RRF --> Agg["Aggregate chunks<br/>by record_id"]
  Agg --> Pack["ContextPacker<br/>budget + diversity + provenance"]
  Pack --> Resp["MCP / CLI response<br/>record + matched snippets"]
```

Retrieval principles:

- **Source embeddings are provenance only**: if a source has its own vector, it can be stored in `raw_artifacts`, but it never participates in cross-source search.
- **Index embeddings are unified**: every chunk is embedded by Anamnesis under the active model.
- **Chunks are retrieval units; records are semantic units**: long sessions can split into chunks while still aggregating back to records.
- **ContextPacker controls the final payload**: budget, provenance, source diversity, and matched snippets are handled before returning context to agents.

## Storage Model

```mermaid
erDiagram
  SOURCES ||--o{ RECORDS : registers
  RECORDS ||--o{ RECORD_CHUNKS : splits_into
  RECORDS ||--|| RAW_ARTIFACTS : preserves
  RECORD_CHUNKS ||--o{ CHUNK_EMBEDDINGS : indexed_by
  RECORD_CHUNKS ||--o{ EMBEDDING_JOBS : queues
  SOURCES ||--o{ IMPORT_ERRORS : reports

  SOURCES {
    text adapter
    text instance
    text location
    text config_json
    integer last_import_at
  }

  RECORDS {
    text id
    text adapter
    text instance
    text content
    text scope
    text kind
    text native_id
    text native_path
    text raw_hash
  }

  RECORD_CHUNKS {
    text id
    text record_id
    integer seq
    text content
    text content_hash
    integer token_estimate
  }

  CHUNK_EMBEDDINGS {
    text chunk_id
    text model_id
    text content_hash
    integer dim
    blob embedding
  }

  RAW_ARTIFACTS {
    text record_id
    text payload_json
    blob source_embedding
    text source_embedding_model
    integer captured_at
  }
```

## MCP Runtime

```mermaid
sequenceDiagram
  participant Agent as MCP Client / Agent
  participant Server as anamnesis-mcp
  participant Search as HybridSearcher
  participant Store as SQLite Store
  participant Embed as EmbeddingProvider

  Agent->>Server: tools/call search_memories
  Server->>Search: query + filters + mode
  Search->>Store: FTS5 chunk search
  Search->>Embed: embed_query (if vector/hybrid)
  Embed-->>Search: query vector
  Search->>Store: vector chunk search
  Search->>Search: RRF merge + pack
  Search-->>Server: packed records + snippets + provenance
  Server-->>Agent: MCP response
```

Current MCP surface:

| Type | Capabilities |
|---|---|
| Tools | `search_memories`, `get_record`, `list_sources`, `import_source`, `trace_provenance` |
| Resources | `anamnesis://record/{id}`, `anamnesis://source/{adapter}`, `anamnesis://timeline/{date}` |
| Prompts | `summarize_my_preferences`, `find_related` |

> Security note: `import_source` is an admin capability. During pre-release, use it only with trusted local clients. The next P0 hardening step is to gate MCP admin tools by default.

## Quick Start

### Install (one-liner)

POSIX one-liner — detects your platform, downloads the matching
release tarball from GitHub, verifies its SHA-256, and drops the two
binaries (`anamnesis` + `anamnesis-mcp`) into `~/.local/bin`:

```bash
curl -fsSL https://raw.githubusercontent.com/Trapezohe/Anamnesis/main/install.sh | sh
```

Pin a version or change the install prefix:

```bash
curl -fsSL https://raw.githubusercontent.com/Trapezohe/Anamnesis/main/install.sh \
  | ANAMNESIS_VERSION=v0.1.0 ANAMNESIS_PREFIX=/usr/local/bin sh
```

Supported platforms: **Linux x86_64**, **macOS x86_64**, **macOS
aarch64**. Linux aarch64 is parked (fastembed C-deps); Windows users
should grab the `.zip` from the [Releases page](https://github.com/Trapezohe/Anamnesis/releases) directly.

### Install via Homebrew (once the tap is published)

```bash
brew tap Trapezohe/anamnesis
brew install anamnesis
```

The formula template lives at [`packaging/homebrew/anamnesis.rb`](./packaging/homebrew/anamnesis.rb) — operators who maintain the tap refresh the four `sha256` lines after every release.

### Install from source

```bash
git clone https://github.com/Trapezohe/Anamnesis
cd Anamnesis

# CLI binary
cargo install --path crates/cli

# MCP server binary
cargo install --path crates/mcp-server
```

Or, once the crates are published to crates.io (planned for 0.2.0 —
the `local-fastembed` feature's ONNX dependencies need a publishing
story first):

```bash
cargo install --locked anamnesis-cli anamnesis-mcp-server
```

### Manual binary install

The GitHub release pages also host the raw tarball + `.sha256`
sidecar for every platform if you'd rather not pipe `install.sh`:

```bash
VERSION=0.1.0
TARGET=x86_64-unknown-linux-gnu
curl -L "https://github.com/Trapezohe/Anamnesis/releases/download/v${VERSION}/anamnesis-${VERSION}-${TARGET}.tar.gz" \
  | tar -xz
sudo install -m 755 "anamnesis-${VERSION}-${TARGET}"/anamnesis      /usr/local/bin/
sudo install -m 755 "anamnesis-${VERSION}-${TARGET}"/anamnesis-mcp  /usr/local/bin/
```

Verify the SHA-256 against the `.sha256` sidecar on the release page
before extracting if you're cautious about supply chain.

### Initialize and import

```bash
# Create the local database and set the default embedding model
anamnesis init

# Discover known local memory sources
anamnesis discover

# Register Claude Code as a source
anamnesis source add claude-code --path ~/.claude/projects

# Import and index
anamnesis import claude-code

# Search across imported memory
anamnesis search "how does the user prefer tests to be written?"

# Inspect runtime status
anamnesis status
```

### Run as an MCP server

```bash
# stdio mode for local MCP clients
anamnesis-mcp

# loopback HTTP/SSE mode
anamnesis-mcp --sse 8787
```

Example MCP client config:

```json
{
  "mcpServers": {
    "anamnesis": {
      "command": "anamnesis-mcp",
      "args": []
    }
  }
}
```

## CLI Reference

```bash
anamnesis init [--model KEY]
anamnesis discover
anamnesis source add/list/remove
anamnesis import <adapter>[:instance] [--full] [--dry-run] [--no-embed] [--path PATH]
anamnesis search <query> [--source X] [--kind K] [--scope S] [--limit N] [--mode hybrid|fulltext|vector] [--json]
anamnesis extract [--kind fact|preference|feedback|skill] [--no-dry-run]
                  [--provider mock|openai|anthropic] [--model NAME] [--api-base URL]
                  [--threshold 0.4] [--limit 25] [--max-llm-calls 100]
                  [--concurrency 1] [--max-retries 3] [--yes] [--explain] [--json]
anamnesis lineage <record-id> [--children] [--limit N] [--json]
anamnesis audit list [--limit N] [--json]
anamnesis audit show <line-no|last> [--json]
anamnesis doctor [--source X] [--instance Y] [--include-unregistered] [--strict]
                 [--since 7d] [--strict-staleness] [--json]
anamnesis export [--format jsonl|csv] [--out FILE] [--source X]
anamnesis verify [--repair]
anamnesis model list/use/install/rebuild
anamnesis serve
anamnesis migrate
```

## Session Extractor (§-1.5 PR-6)

Anamnesis distills raw conversation `Episode` records into long-lived
`Fact` / `Preference` / `Feedback` / `Skill` records via a two-stage,
auditable pipeline.

**Stage 1** is a deterministic gate (`anamnesis-extractor::gate`). It
scores every Episode in the store on three local signals — content
length (40-char floor, 600-char plateau), recency (30-day half-life),
and content density (letter-to-noise ratio) — weighted 0.45 / 0.20 /
0.35. Episodes below `--threshold` (default 0.4) are dropped before
any LLM sees them. Stage 1 is pure CPU and makes zero network calls.

**Stage 2** sends each surviving Episode through the configured
provider. Three are wired today:

| `--provider` | What it does                                                              |
|--------------|---------------------------------------------------------------------------|
| `mock`       | Deterministic, offline, zero network. Default — exercise the pipeline.    |
| `openai`     | Any OpenAI-compatible Chat-Completions API. Requires `OPENAI_API_KEY`. Set `--api-base http://localhost:11434/v1` for Ollama, `https://openrouter.ai/api/v1` for OpenRouter, etc. |
| `anthropic`  | Native Anthropic Messages API. Requires `ANTHROPIC_API_KEY`. Pinned to `anthropic-version: 2023-06-01`. |

### Safety posture (§-1.5 #6 + §-1.2 #5)

- **No silent LLM calls.** `extract` defaults to `--dry-run` (just
  shows candidates). `--no-dry-run` is required to actually invoke the
  provider — and even then, prints `Stage 2 plan: N candidate(s) →
  model X (~T input tokens)` **before** the first request and asks for
  `[y/N]` confirmation unless `--yes` is passed.
- **Safety cap.** `--max-llm-calls N` (default 100) refuses to run if
  Stage 1 surfaced more candidates than the cap, before constructing
  any HTTP client.
- **Audit trail.** Every `--no-dry-run` run appends one JSON line to
  `<data_dir>/audit/stage2.jsonl` with timestamps, provider+model,
  candidate counts, tokens, errors, and the full
  `derived_record_ids` / `source_record_ids` cross-reference. Browse
  with `anamnesis audit list` and `anamnesis audit show <#>`.
- **Provenance link.** Each derived record carries
  `provenance.derived_from = source_episode_id`. Walk the chain with
  `anamnesis lineage <record-id>` (or `--children` to find every
  record extracted from a given Episode).
- **Idempotent.** Derived `RecordId`s are deterministic in
  `(source_id, kind, item_index)`, so a second run replaces (not
  duplicates) prior output.

### Example flow

```bash
# Step 1: ingest a Claude Code session as Episode records.
anamnesis source add claude-code --path ~/.claude/projects
anamnesis import claude-code

# Step 2: see which Episodes the Stage-1 gate would surface.
anamnesis extract --kind preference --explain
# → ranks candidates, shows score rationale, no LLM call.

# Step 3: distill via Ollama locally — zero cloud calls.
export OPENAI_API_KEY=ollama-noop
anamnesis extract --kind preference --no-dry-run \
  --provider openai --api-base http://localhost:11434/v1 \
  --model llama3.2:3b

# Step 4: inspect the audit log + lineage of a derived record.
anamnesis audit list
anamnesis lineage <id-from-list> --children
```

### Feature flags

The CLI ships with `openai-provider` and `anthropic-provider` enabled
by default. To build a slimmer binary without `reqwest`:

```bash
cargo build --no-default-features --features local-fastembed,sse
```

(The `mock` provider always works.)

## Repository Layout

```text
anamnesis/
├── crates/
│   ├── core/                   # Domain types, traits, source discovery, chunker, contracts
│   ├── store/                  # SQLite schema, FTS5, embeddings, sources, typed API
│   ├── importer/               # Adapter scan -> normalize -> chunk -> transaction
│   ├── search/                 # Hybrid RAG, RRF, ContextPacker
│   ├── embedder/               # Local fastembed provider, Voyage provider, model registry, worker
│   ├── cli/                    # `anamnesis`
│   ├── mcp-server/             # `anamnesis-mcp`
│   ├── adapter-claude-code/    # Claude Code adapter
│   ├── adapter-codex/          # Codex adapter
│   ├── adapter-mem0/           # mem0 SQLite adapter
│   ├── adapter-letta/          # Letta (formerly MemGPT) SQLite adapter
│   ├── adapter-hermes/         # Hermes (Nous Research) adapter
│   ├── adapter-openclaw/       # OpenClaw adapter
│   ├── adapter-ghast/          # ghast adapter
│   ├── adapter-tdai/           # TencentDB Agent Memory adapter
│   ├── adapter-openviking/     # OpenViking VikingFS adapter
│   ├── adapter-mempalace/      # MemPalace ChromaDB adapter
│   ├── adapter-memori/         # Memori SQLite adapter
│   ├── adapter-memos/          # MemOS MemCube adapter
│   ├── adapter-memary/         # Memary local-cache adapter
│   └── adapter-generic-mcp/    # Generic MCP resource adapter (long-tail)
├── docs/
│   └── BLUEPRINT.md
├── logo.png
├── banner.png
├── CHANGELOG.md
├── CONTRIBUTING.md
└── README.md
```

## Development

```bash
cargo build --workspace
cargo test --workspace
cargo clippy --workspace -- -D warnings

# Faster iteration without fastembed / ONNX default features
cargo test --workspace --no-default-features
```

Notes:

- Default features include `local-fastembed`; first ONNX runtime builds can be slow.
- CI covers no-default-features, SSE transport, and default-feature builds.
- If full tests expose generic MCP loopback readiness flakiness, fix test readiness rather than treating the flake as a passing signal.

## Current Limitations

Anamnesis can already unify imports and retrieval, but it should not yet claim to precisely understand every agent’s memory semantics.

- Codex adapter is currently a basic episode importer.
- Generic MCP adapter currently imports opaque resources.
- `source add` and `import` still need a stricter canonical registry path.
- `--full / --since` and `ScanOpts` need to be wired through adapter scans.
- MCP admin tools need to be disabled by default.
- Session-to-stable-memory extraction is still a design task.

## Roadmap

| Phase | Status | Focus |
|---|---|---|
| Phase 0 | Complete | Rust workspace, Apache-2.0, CI (8-leg matrix), README/CONTRIBUTING, schema v1/v2 |
| Phase 1 | Complete | core/store/importer/search/embedder, 14 first-class adapters across §-2.2 + §-2.3, local hybrid RAG, §-1.5 PR-6 two-stage session extractor (Mock + OpenAI + Anthropic providers, audit log, lineage CLI) |
| Phase 2 | In progress | MCP admin gate, source registry import, filter pushdown, ScanOpts, streaming scan |
| Phase 3 | Planned | ghast integration, Homebrew/cargo release, real dogfood quality evaluation |
| Phase 4 | Planned | Memory MCP convention, Agent Memory Interchange Format, temporal-graph schema evolution (unlocks Zep / Cognee Kuzu native adapters) |

Recommended next PR slices:

1. 429 rate-limit retry/backoff on the OpenAI + Anthropic providers (they currently surface the error and skip the candidate)
2. §-1.4 schema evolution for temporal/graph edges (unlocks Zep/Graphiti, Cognee Kuzu native adapters)
3. §-2.5 adapter health-check tooling — `anamnesis doctor` per source
4. Homebrew/cargo release packaging
5. ghast first-consumer integration

## Contributing

The highest leverage contribution is a high-quality adapter. Every adapter should:

- keep discovery metadata-only;
- stream raw records instead of loading entire corpora into memory;
- keep normalization deterministic and pure;
- preserve provenance;
- pass the shared adapter contract tests.

See [CONTRIBUTING.md](./CONTRIBUTING.md).

## Community

- X: [@Ghast_AI](https://x.com/Ghast_AI)
- Discord: [discord.gg/ghastai](https://discord.gg/ghastai)

## License

[Apache License 2.0](./LICENSE)

Imported memory data is not covered by the project license. It remains yours.

## Star History

<a href="https://www.star-history.com/#Trapezohe/Anamnesis&Date">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Trapezohe/Anamnesis&type=Date&theme=dark" />
    <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Trapezohe/Anamnesis&type=Date" />
    <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Trapezohe/Anamnesis&type=Date" />
  </picture>
</a>