semantic-memory 0.5.1

Local-first hybrid semantic search (SQLite + FTS5 + usearch 2.25) with bitemporal truth and typed receipts
# Codex Task Packets

Each packet is designed to be pasted into a separate Codex task or PR. Run them in order unless the dependency graph says they can be parallelized.

---

## Packet 00 — Baseline, AGENTS, and audit ledger

### Prompt

You are preparing the semantic-memory workspace for a gated stabilization and TurboQuant-readiness program.

Do not change product behavior in this pass except for repo hygiene needed to support later passes.

Tasks:

1. Inspect workspace layout and existing `AGENTS.md` files.
2. Add/update a root or package-local `AGENTS.md` using the rules in `05_DROP_IN_AGENTS.md`.
3. Create:

```text
docs/audits/codex-giga-pass-20260511.md
docs/audits/codex-giga-pass-20260511-status.json
```

4. Record starting commit, Rust/Cargo versions, commands run, and baseline failures.
5. Do not mark anything fixed unless you add evidence.

### Likely files

```text
AGENTS.md
docs/audits/codex-giga-pass-20260511.md
docs/audits/codex-giga-pass-20260511-status.json
```

### Commands

```bash
cargo --version || true
rustc --version || true
cargo fmt --all --check || true
cargo check --workspace --all-targets --all-features || true
cargo test --workspace --all-features || true
```

### Acceptance

- Ledger exists.
- Baseline command output is recorded honestly.
- `AGENTS.md` contains non-negotiable invariants.
- No unrelated code changes.

---

## Packet 01 — Central vector validation

### Prompt

Implement central vector/embedding validation so invalid vectors cannot reach SQLite, q8, HNSW, projection imports, or future codec paths.

Tasks:

1. Find all embedding/vector write/index paths.
2. Add or consolidate validation functions:

```rust
validate_embedding(values: &[f32], expected_dim: usize)
validate_embedding_batch(...)
validate_vector_blob_len(bytes: &[u8], expected_dim: usize)
validate_f32_blob_le(bytes: &[u8], expected_dim: usize)
```

3. Validation must reject:
   - wrong dimension,
   - zero dimension where a zero-length vector is not valid,
   - NaN,
   - +/- infinity,
   - malformed byte lengths,
   - native-endian/alignment-dependent blob decode.
4. Route all storage/indexing paths through the validation layer.
5. Add regression tests.
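
A minimal sketch of what the central validation layer could look like, using only the standard library. The error enum `VectorValidationError`, the return types, and the helper `encode_f32_blob_le` are assumptions made for illustration; the signatures above do not specify them, and the crate's real error type may differ. This sketch also treats `expected_dim == 0` as always invalid, which is a policy choice.

```rust
/// Hypothetical error type; the crate's actual error enum may differ.
#[derive(Debug, PartialEq)]
pub enum VectorValidationError {
    ZeroDimension,
    WrongDimension { expected: usize, actual: usize },
    NonFinite { index: usize },
    BadBlobLength { expected_bytes: usize, actual_bytes: usize },
}

/// Rejects zero/wrong dimension and any NaN or +/- infinity component.
pub fn validate_embedding(
    values: &[f32],
    expected_dim: usize,
) -> Result<(), VectorValidationError> {
    if expected_dim == 0 {
        return Err(VectorValidationError::ZeroDimension);
    }
    if values.len() != expected_dim {
        return Err(VectorValidationError::WrongDimension {
            expected: expected_dim,
            actual: values.len(),
        });
    }
    if let Some(index) = values.iter().position(|v| !v.is_finite()) {
        return Err(VectorValidationError::NonFinite { index });
    }
    Ok(())
}

/// Checks that a blob holds exactly `expected_dim` 4-byte values.
pub fn validate_vector_blob_len(
    bytes: &[u8],
    expected_dim: usize,
) -> Result<(), VectorValidationError> {
    let expected_bytes = expected_dim.saturating_mul(4);
    if bytes.len() != expected_bytes {
        return Err(VectorValidationError::BadBlobLength {
            expected_bytes,
            actual_bytes: bytes.len(),
        });
    }
    Ok(())
}

/// Decodes explicitly little-endian f32 bytes (no native casts, no
/// alignment requirement) and validates the decoded values.
pub fn validate_f32_blob_le(
    bytes: &[u8],
    expected_dim: usize,
) -> Result<Vec<f32>, VectorValidationError> {
    validate_vector_blob_len(bytes, expected_dim)?;
    let values: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    validate_embedding(&values, expected_dim)?;
    Ok(values)
}

/// Hypothetical companion encoder, used here only to show the roundtrip.
pub fn encode_f32_blob_le(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}
```

Because decoding goes through `f32::from_le_bytes`, the stored format does not depend on host endianness or pointer alignment, which is the property the `bytemuck` acceptance item is guarding.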

### Likely files

```text
semantic-memory/src/types.rs
semantic-memory/src/db.rs
semantic-memory/src/lib.rs
semantic-memory/src/storage.rs
semantic-memory/src/hnsw.rs
semantic-memory/src/hnsw_ops.rs
semantic-memory/src/quantize.rs
semantic-memory/src/projection_import.rs
semantic-memory/tests/vector_invariants.rs
semantic-memory/tests/hardening_semantics.rs
```

### Tests to add

- NaN embedding rejected before storage.
- Infinity embedding rejected before HNSW insert.
- Wrong dimension rejected before q8.
- Malformed little-endian blob rejected.
- Valid vector roundtrips exactly through LE bytes.

### Acceptance

- All vector write/index paths call central validation or justify why not.
- Tests fail before fix and pass after fix.
- No `bytemuck`/native-cast persistence for f32 vectors unless explicitly proven safe and endian-stable.

---

## Packet 02 — HNSW key-level parity

### Prompt

Replace count-only HNSW integrity checks with key-level parity between SQLite live embedding rows, HNSW keymap rows, and typed retrieval keys.

Tasks:

1. Define a structured key codec for HNSW keys.
2. For every live embedded row, compute the expected key.
3. Verify expected key exists in active keymap.
4. For every active keymap row, parse key and verify the corresponding live SQLite row exists with a valid embedding.
5. Treat malformed keys, wrong domain prefixes, stale IDs, duplicate active keys, and swapped IDs as integrity failures.
6. Add full/incremental integrity modes if useful.
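
The parity check described above can be sketched as follows. Everything here is illustrative: the `domain:id` string layout, the `KeyDomain` variants, and the `ParityReport` shape are assumptions, not the crate's existing key format. The point is that both directions are compared key by key, so swapped or stale IDs are caught even when counts match.

```rust
use std::collections::BTreeSet;

/// Hypothetical key domains; the real set of row kinds may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum KeyDomain {
    Chunk,
    Episode,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct HnswKey {
    pub domain: KeyDomain,
    pub id: u64,
}

impl HnswKey {
    /// Assumed canonical form: "<domain>:<decimal id>".
    pub fn encode(&self) -> String {
        let prefix = match self.domain {
            KeyDomain::Chunk => "chunk",
            KeyDomain::Episode => "episode",
        };
        format!("{}:{}", prefix, self.id)
    }

    pub fn parse(raw: &str) -> Result<HnswKey, String> {
        let (prefix, id) = raw
            .split_once(':')
            .ok_or_else(|| format!("malformed key: {}", raw))?;
        let domain = match prefix {
            "chunk" => KeyDomain::Chunk,
            "episode" => KeyDomain::Episode,
            other => return Err(format!("unknown domain prefix: {}", other)),
        };
        let id: u64 = id
            .parse()
            .map_err(|_| format!("malformed id in key: {}", raw))?;
        let key = HnswKey { domain, id };
        // Reject non-canonical spellings such as "chunk:007" or "chunk:+7".
        if key.encode() != raw {
            return Err(format!("non-canonical key: {}", raw));
        }
        Ok(key)
    }
}

#[derive(Debug, Default, PartialEq)]
pub struct ParityReport {
    pub missing_from_keymap: Vec<HnswKey>,
    pub stale_in_keymap: Vec<HnswKey>,
    pub duplicates: Vec<HnswKey>,
    pub malformed: Vec<String>,
}

impl ParityReport {
    pub fn is_clean(&self) -> bool {
        self.missing_from_keymap.is_empty()
            && self.stale_in_keymap.is_empty()
            && self.duplicates.is_empty()
            && self.malformed.is_empty()
    }
}

/// Compares expected keys (from live embedded rows) with active keymap rows.
pub fn check_parity(live_rows: &[HnswKey], active_keymap: &[String]) -> ParityReport {
    let expected: BTreeSet<HnswKey> = live_rows.iter().copied().collect();
    let mut seen = BTreeSet::new();
    let mut report = ParityReport::default();
    for raw in active_keymap {
        match HnswKey::parse(raw) {
            Err(_) => report.malformed.push(raw.clone()),
            Ok(key) => {
                if !seen.insert(key) {
                    report.duplicates.push(key);
                } else if !expected.contains(&key) {
                    report.stale_in_keymap.push(key);
                }
            }
        }
    }
    report.missing_from_keymap = expected.difference(&seen).copied().collect();
    report
}
```

With live rows `chunk:1` and `chunk:2` and a keymap holding `chunk:1` and `chunk:3`, the counts agree but the report lists one missing and one stale key.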

### Likely files

```text
semantic-memory/src/hnsw.rs
semantic-memory/src/hnsw_ops.rs
semantic-memory/src/lib.rs
semantic-memory/tests/hnsw_integration.rs
semantic-memory/tests/vector_only_hnsw.rs
semantic-memory/tests/episode_identity.rs
```

### Tests to add

- Missing keymap row detected.
- Stale active key detected after source row deletion.
- Swapped key IDs detected even when counts match.
- Wrong domain prefix detected.
- Duplicate active key detected.
- Episode key maps to episode row, not document container.

### Acceptance

- Count equality alone is never the final integrity proof.
- Integrity failures produce clear diagnostics.
- HNSW remains rebuildable from SQLite.

---

## Packet 03 — HNSW fixed-width sidecar header and rebuild-on-suspicion

### Prompt

Make HNSW sidecar persistence explicit, portable, and safe. Remove persisted `usize` from sidecar formats. Add versioned header validation or a conservative rebuild-on-suspicion policy.

Tasks:

1. Define a fixed-width sidecar header.
2. Store dimension and vector count as `u32`/`u64`, not `usize`.
3. Include magic/version/header length.
4. Include enough profile information to reject incompatible sidecars.
5. If old sidecar format is detected, rebuild safely or migrate explicitly.
6. If graph/data/keymap validation fails, mark HNSW degraded and rebuild before trusting search.

### Suggested header

```text
magic:        u32
version:      u16
header_len:   u16
dim:          u32
vector_count: u64
profile_hash: [u8; 32] or optional digest
```
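
One way the suggested header could be serialized with fixed-width little-endian fields is sketched below. The magic value, the version number, and the `HeaderError` variants are placeholders chosen for the example, and the sketch assumes the 32-byte `profile_hash` form of the header rather than the optional digest.

```rust
/// Placeholder magic (ASCII "HNSW"); the real value is up to the implementer.
pub const MAGIC: u32 = 0x484E_5357;
pub const VERSION: u16 = 1;
/// magic(4) + version(2) + header_len(2) + dim(4) + vector_count(8) + hash(32)
pub const HEADER_LEN: u16 = 4 + 2 + 2 + 4 + 8 + 32;

#[derive(Debug, Clone, PartialEq)]
pub struct SidecarHeader {
    pub dim: u32,
    pub vector_count: u64,
    pub profile_hash: [u8; 32],
}

/// Hypothetical error type for header validation.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    Truncated,
    BadMagic,
    UnsupportedVersion(u16),
    BadHeaderLen(u16),
    DimMismatch { expected: u32, actual: u32 },
}

impl SidecarHeader {
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(HEADER_LEN as usize);
        out.extend_from_slice(&MAGIC.to_le_bytes());
        out.extend_from_slice(&VERSION.to_le_bytes());
        out.extend_from_slice(&HEADER_LEN.to_le_bytes());
        out.extend_from_slice(&self.dim.to_le_bytes());
        out.extend_from_slice(&self.vector_count.to_le_bytes());
        out.extend_from_slice(&self.profile_hash);
        out
    }

    /// Parses and validates a header; any failure means "do not trust, rebuild".
    pub fn from_bytes(bytes: &[u8], expected_dim: u32) -> Result<SidecarHeader, HeaderError> {
        if bytes.len() < HEADER_LEN as usize {
            return Err(HeaderError::Truncated);
        }
        let magic = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        if magic != MAGIC {
            return Err(HeaderError::BadMagic);
        }
        let version = u16::from_le_bytes([bytes[4], bytes[5]]);
        if version != VERSION {
            return Err(HeaderError::UnsupportedVersion(version));
        }
        let header_len = u16::from_le_bytes([bytes[6], bytes[7]]);
        if header_len != HEADER_LEN {
            return Err(HeaderError::BadHeaderLen(header_len));
        }
        let dim = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
        if dim != expected_dim {
            return Err(HeaderError::DimMismatch { expected: expected_dim, actual: dim });
        }
        let mut count = [0u8; 8];
        count.copy_from_slice(&bytes[12..20]);
        let mut profile_hash = [0u8; 32];
        profile_hash.copy_from_slice(&bytes[20..52]);
        Ok(SidecarHeader {
            dim,
            vector_count: u64::from_le_bytes(count),
            profile_hash,
        })
    }
}
```

The header alone does not prove the graph is usable; checks such as rejecting an empty graph or comparing `vector_count` against SQLite would sit on top of this.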

### Likely files

```text
semantic-memory/src/hnsw.rs
semantic-memory/src/hnsw_ops.rs
semantic-memory/tests/hnsw_persistence.rs
semantic-memory/tests/hnsw_hotswap.rs
```

### Tests to add

- Current sidecar roundtrip.
- Old/unsupported header rejected or rebuilt.
- Wrong dimension sidecar rejected.
- Truncated header rejected.
- Empty graph sidecar does not pass as valid.
- Unsupported version fails cleanly.

### Acceptance

- No persisted `usize` in HNSW sidecar format.
- Sidecar validation is semantic, not just non-empty-file checking.
- Suspicious sidecar cannot silently participate in search.

---

## Packet 04 — Pending HNSW mutation failure recovery

### Prompt

Harden HNSW pending-op journaling so failed sidecar/keymap flushes cannot leave the live in-memory index trusted in an ambiguous state.

Tasks:

1. Inspect current pending operation lifecycle.
2. If an HNSW mutation is applied in memory before durable save succeeds, define safe recovery.
3. Prefer the simplest safe behavior: if a save or keymap flush fails after a mutation, mark HNSW dirty/degraded and force a rebuild from SQLite before any future HNSW search.
4. Add generation tracking if needed.
5. Ensure errors are observable.
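
The "simplest safe behavior" above can be sketched as a small state holder. `HnswState`, `IndexHealth`, and the `save` closure (standing in for the sidecar/keymap flush) are hypothetical names for this example; the real pending-op journal is not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexHealth {
    Trusted,
    Degraded,
}

/// Hypothetical trust/generation tracker for the in-memory HNSW index.
#[derive(Debug)]
pub struct HnswState {
    health: IndexHealth,
    generation: u64,
}

impl HnswState {
    pub fn new() -> Self {
        HnswState { health: IndexHealth::Trusted, generation: 0 }
    }

    pub fn health(&self) -> IndexHealth {
        self.health
    }

    pub fn generation(&self) -> u64 {
        self.generation
    }

    /// Runs the durable save that follows an in-memory mutation.
    /// On failure the index is marked degraded and the error is returned,
    /// so the caller can observe it.
    pub fn mutate_and_save<E>(
        &mut self,
        save: impl FnOnce() -> Result<(), E>,
    ) -> Result<(), E> {
        // (The in-memory mutation would be applied here.)
        match save() {
            Ok(()) => {
                self.generation += 1;
                Ok(())
            }
            Err(e) => {
                self.health = IndexHealth::Degraded;
                Err(e)
            }
        }
    }

    /// Search must consult this before using the in-memory graph.
    pub fn can_serve_search(&self) -> bool {
        self.health == IndexHealth::Trusted
    }

    /// Stand-in for a full rebuild from SQLite; clears the degraded flag.
    pub fn rebuild_from_sqlite(&mut self) {
        self.health = IndexHealth::Trusted;
        self.generation += 1;
    }
}
```

Note that a later successful save does not clear `Degraded` in this sketch; only a rebuild does, which keeps SQLite as the single recovery path.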

### Likely files

```text
semantic-memory/src/hnsw_ops.rs
semantic-memory/src/hnsw.rs
semantic-memory/src/lib.rs
semantic-memory/tests/hnsw_hotswap.rs
semantic-memory/tests/hnsw_persistence.rs
```

### Tests to add

- Simulated sidecar save failure marks index degraded.
- Future search does not trust degraded in-memory HNSW.
- Rebuild clears degraded state.
- Pending ops replay idempotently.

### Acceptance

- No path silently trusts HNSW after a failed durable save.
- Rebuild-from-SQLite remains the escape hatch.

---

## Packet 05 — Filtered HNSW under-return fallback

### Prompt

Fix HNSW filtered search so sparse namespace/source/session filters cannot under-return while valid rows exist outside the global candidate pool.

Tasks:

1. Inspect search flow for HNSW candidate generation followed by filters.
2. Detect when the post-filter vector hits number fewer than the requested `top_k` while the scope still contains enough valid rows.
3. Add deterministic fallback:
   - adaptive overfetch, or
   - brute-force exact vector search within scope.
4. Record fallback/degradation in explained search output and future receipt skeleton.
5. Add sparse namespace/source tests.
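
A minimal sketch of the brute-force variant of the fallback, assuming in-memory rows and inner-product scoring. `Row`, `ScopedHits`, and the function names are illustrative only; the ANN candidate list is passed in as plain IDs so the HNSW call itself stays out of scope.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u64,
    pub namespace: String,
    pub embedding: Vec<f32>,
}

#[derive(Debug, PartialEq)]
pub struct ScopedHits {
    pub ids: Vec<u64>,
    /// True when the exact scoped fallback ran; surface this in explained output.
    pub used_fallback: bool,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Exact top-k by inner product over the rows in one namespace.
pub fn exact_scoped_top_k(rows: &[Row], namespace: &str, query: &[f32], top_k: usize) -> Vec<u64> {
    let mut scored: Vec<(f32, u64)> = rows
        .iter()
        .filter(|r| r.namespace == namespace)
        .map(|r| (dot(&r.embedding, query), r.id))
        .collect();
    // Descending score; ties broken by ascending id so the order is deterministic.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(top_k).map(|(_, id)| id).collect()
}

/// Filters global ANN candidates by namespace and falls back to exact
/// scoped search when the filter leaves fewer hits than the scope can supply.
pub fn filter_with_fallback(
    rows: &[Row],
    ann_candidates: &[u64],
    namespace: &str,
    query: &[f32],
    top_k: usize,
) -> ScopedHits {
    let in_scope = rows.iter().filter(|r| r.namespace == namespace).count();
    let filtered: Vec<u64> = ann_candidates
        .iter()
        .copied()
        .filter(|id| rows.iter().any(|r| r.id == *id && r.namespace == namespace))
        .take(top_k)
        .collect();
    if filtered.len() < top_k.min(in_scope) {
        ScopedHits {
            ids: exact_scoped_top_k(rows, namespace, query, top_k),
            used_fallback: true,
        }
    } else {
        ScopedHits { ids: filtered, used_fallback: false }
    }
}
```

The trigger compares against `top_k.min(in_scope)`, so a namespace that genuinely holds fewer than `top_k` rows does not cause a pointless fallback.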

### Likely files

```text
semantic-memory/src/search.rs
semantic-memory/src/lib.rs
semantic-memory/tests/search_tests.rs
semantic-memory/tests/brute_force_parity.rs
semantic-memory/tests/vector_only_hnsw.rs
```

### Tests to add

- Namespace-sparse corpus where global HNSW top candidates are filtered out.
- Fallback returns same top-k as brute force within namespace.
- Fallback/degradation flag appears in explained output.
- Source/session filters covered if supported.

### Acceptance

- Filtered HNSW never silently under-returns when exact scoped fallback can find valid results.
- The user/debug output can tell fallback occurred.

---

## Packet 06 — SearchContext and replay-clean recency

### Prompt

Make search deterministic under an explicit `SearchContext`. Remove hidden wall-clock dependence from internal ranking.

Tasks:

1. Add `SearchContext` or equivalent:

```rust
pub struct SearchContext {
    pub evaluation_time: DateTime<Utc>,
    pub receipt_mode: ReceiptMode,
    pub exactness_profile: ExactnessProfile,
}
```

2. Capture `Utc::now()` only at public API boundaries.
3. Internal scoring uses `context.evaluation_time`.
4. Tests freeze time and verify stable rankings.
5. Existing APIs may keep convenience methods by creating a default context at boundary.
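
To illustrate replay-clean recency, here is a standard-library sketch. It substitutes `std::time::SystemTime` for the `DateTime<Utc>` field shown above so that it compiles without `chrono`, omits `receipt_mode` and `exactness_profile`, and assumes an exponential half-life decay; the crate's actual recency formula is not specified here. `half_life` is assumed to be non-zero.

```rust
use std::time::{Duration, SystemTime};

/// Simplified stand-in for the `SearchContext` above (evaluation time only).
pub struct SearchContext {
    pub evaluation_time: SystemTime,
}

/// The only place the wall clock is sampled: the public API boundary.
pub fn default_context() -> SearchContext {
    SearchContext { evaluation_time: SystemTime::now() }
}

/// Assumed recency model: weight halves every `half_life` of age.
/// Internal scoring reads time from the context, never from the clock.
pub fn recency_weight(created_at: SystemTime, ctx: &SearchContext, half_life: Duration) -> f64 {
    // Rows newer than the evaluation time are treated as age zero.
    let age = ctx
        .evaluation_time
        .duration_since(created_at)
        .unwrap_or(Duration::from_secs(0));
    0.5f64.powf(age.as_secs_f64() / half_life.as_secs_f64())
}
```

A test can then freeze time by constructing `SearchContext` with a fixed `evaluation_time` and asserting that repeated calls return identical weights.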

### Likely files

```text
semantic-memory/src/search.rs
semantic-memory/src/lib.rs
semantic-memory/tests/search_tests.rs
```

### Tests to add

- Same query/context produces same order.
- A different evaluation time changes only the recency component, and does so predictably.
- No internal repeated `Utc::now()` calls in ranking.

### Acceptance

- Replay can reproduce search ranking given same context.
- Wall clock is not sampled deep inside ranking logic.

---

## Packet 07 — Search receipt skeleton

### Prompt

Add a minimal search receipt/explanation object that records backend, candidate counts, filters, fallback, approximate/exact scoring, rerank state, result IDs, and evaluation time.

Tasks:

1. Inspect existing `ExplainedResult`/score breakdown types.
2. Add a durable or serializable receipt-like struct, e.g. `VectorSearchReceiptV1`.
3. Receipt must not become authoritative truth.
4. Add optional receipt mode so normal search need not always persist receipts.
5. Ensure fallback/degradation is included.

### Minimum fields

```text
receipt_id or transient id
evaluation_time
query embedding digest if available
backend: brute_force | hnsw | hybrid | future_turbo
requested_candidates
returned_candidates
post_filter_candidates
filters applied
fallback/degradation list
exact_rerank bool
result IDs and scores
```
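
The minimum fields could map onto a plain struct along these lines. This is a sketch with assumed field types (string IDs, `u32` counts, a Unix-millisecond timestamp) and two hypothetical helper methods; serialization derives are left out so the example stays dependency-free.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    BruteForce,
    Hnsw,
    Hybrid,
    FutureTurbo,
}

/// Sketch of a receipt; it describes a search and is never authoritative truth.
#[derive(Debug, Clone, PartialEq)]
pub struct VectorSearchReceiptV1 {
    /// `None` for transient (non-persisted) receipts.
    pub receipt_id: Option<String>,
    /// Evaluation time as Unix milliseconds (assumed representation).
    pub evaluation_time_unix_ms: i64,
    pub query_embedding_digest: Option<String>,
    pub backend: Backend,
    pub requested_candidates: u32,
    pub returned_candidates: u32,
    pub post_filter_candidates: u32,
    pub filters: Vec<String>,
    pub degradations: Vec<String>,
    pub exact_rerank: bool,
    /// (result id, score) in returned order.
    pub results: Vec<(String, f32)>,
}

impl VectorSearchReceiptV1 {
    /// True when the receipt lists exactly the returned IDs, in order.
    pub fn matches_results(&self, returned_ids: &[&str]) -> bool {
        self.results.len() == returned_ids.len()
            && self
                .results
                .iter()
                .zip(returned_ids)
                .all(|((id, _), returned)| id.as_str() == *returned)
    }

    /// True when any fallback or degradation was recorded.
    pub fn is_degraded(&self) -> bool {
        !self.degradations.is_empty()
    }
}
```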

### Likely files

```text
semantic-memory/src/search.rs
semantic-memory/src/types.rs
semantic-memory/src/lib.rs
semantic-memory/tests/search_tests.rs
```

### Tests to add

- Receipt emitted in receipt mode.
- Receipt captures HNSW fallback.
- Receipt captures exact rerank flag.
- Receipt result IDs match returned results.

### Acceptance

- Search can answer “why this result?” at least at backend/fallback/score-breakdown level.

---

## Packet 08 — Boundary/defaulting audit

### Prompt

Audit semantic boundary paths for silent defaulting and convert dangerous defaults into typed errors, explicit policy defaults, or degradation records.

Tasks:

1. Search for `unwrap_or_default`, `unwrap_or`, permissive JSON fallback, and default constructors in:
   - `semantic-memory-forge/src/envelope.rs`
   - `forge-memory-bridge/src/transform.rs`
   - `semantic-memory/src/projection_*`
   - `semantic-memory/src/json_compat_import.rs`
   - `semantic-memory/src/embedder.rs`
2. Categorize each default:
   - harmless presentation/helper default,
   - explicit compatibility default,
   - semantic-boundary bug.
3. Fix semantic-boundary bugs.
4. Add tests for missing/malformed fields.
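
As an illustration of the three categories, the sketch below contrasts a required semantic field (typed error) with an explicit compatibility default (value plus a flag the caller can record as a degradation). The `HashMap` stands in for a parsed envelope, and `BoundaryError` and both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical typed error for boundary parsing.
#[derive(Debug, PartialEq)]
pub enum BoundaryError {
    MissingField(&'static str),
    EmptyField(&'static str),
}

/// Required semantic field: missing or empty is an error, never a silent default.
pub fn required_field<'a>(
    fields: &'a HashMap<String, String>,
    name: &'static str,
) -> Result<&'a str, BoundaryError> {
    match fields.get(name).map(|s| s.as_str()) {
        None => Err(BoundaryError::MissingField(name)),
        Some("") => Err(BoundaryError::EmptyField(name)),
        Some(value) => Ok(value),
    }
}

/// Explicit compatibility default: returns the value and whether the
/// policy default was used, so the caller can log a degradation record.
pub fn field_or_policy_default<'a>(
    fields: &'a HashMap<String, String>,
    name: &str,
    policy_default: &'a str,
) -> (&'a str, bool) {
    match fields.get(name) {
        Some(value) => (value.as_str(), false),
        None => (policy_default, true),
    }
}
```

Replacing an `unwrap_or_default()` on a required field with `required_field(...)?` turns an invented empty string into an error the bridge has to handle.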

### Acceptance

- Bridge/import paths do not silently invent required semantic fields.
- Legacy compatibility defaults are documented and tested.
- Dangerous defaulting has typed error/degradation behavior.

---

## Packet 09 — Vector codec abstraction

### Prompt

Add a vector codec abstraction that supports raw reference scoring and current SQ8 without adding TurboQuant yet.

Tasks:

1. Add codec profile type:

```text
VectorCodecProfileV1
```

2. Add vector artifact type/table if persistence is in scope:

```text
VectorArtifactV1
```

3. Use an object-safe byte-oriented trait for persistence compatibility:

```rust
pub trait VectorCodec: Send + Sync {
    fn profile(&self) -> &VectorCodecProfile;
    fn encode_to_bytes(&self, raw: &[f32]) -> Result<Vec<u8>>;
    fn score_inner_product_from_bytes(&self, encoded: &[u8], query: &[f32]) -> Result<f32>;
    fn score_l2_from_bytes(&self, encoded: &[u8], query: &[f32]) -> Result<f32>;
    fn decode_approx_from_bytes(&self, encoded: &[u8]) -> Result<Option<Vec<f32>>>;
}
```

4. Implement `RawF32Codec`.
5. Implement `Sq8Codec` using existing quantize module if feasible.
6. Add profile digest and mismatch validation.
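
A possible shape for `RawF32Codec` against the trait above, written so it compiles on its own. The two-field `VectorCodecProfile`, the `CodecError` enum, and the explicit `Result<_, CodecError>` return types are assumptions (the trait above uses a one-parameter `Result` alias that is not shown). `score_l2_from_bytes` returns the Euclidean distance here; whether the real trait means distance or squared distance is not stated.

```rust
/// Hypothetical minimal profile; the real profile carries more identity fields.
#[derive(Debug, Clone, PartialEq)]
pub struct VectorCodecProfile {
    pub family: String,
    pub dim: usize,
}

#[derive(Debug, PartialEq)]
pub enum CodecError {
    DimMismatch { expected: usize, actual: usize },
    BadLength(usize),
}

pub trait VectorCodec: Send + Sync {
    fn profile(&self) -> &VectorCodecProfile;
    fn encode_to_bytes(&self, raw: &[f32]) -> Result<Vec<u8>, CodecError>;
    fn score_inner_product_from_bytes(&self, encoded: &[u8], query: &[f32]) -> Result<f32, CodecError>;
    fn score_l2_from_bytes(&self, encoded: &[u8], query: &[f32]) -> Result<f32, CodecError>;
    fn decode_approx_from_bytes(&self, encoded: &[u8]) -> Result<Option<Vec<f32>>, CodecError>;
}

/// Lossless reference codec: little-endian f32 bytes.
pub struct RawF32Codec {
    profile: VectorCodecProfile,
}

impl RawF32Codec {
    pub fn new(dim: usize) -> Self {
        RawF32Codec {
            profile: VectorCodecProfile { family: "raw_f32".to_string(), dim },
        }
    }

    fn check_dim(&self, v: &[f32]) -> Result<(), CodecError> {
        if v.len() == self.profile.dim {
            Ok(())
        } else {
            Err(CodecError::DimMismatch { expected: self.profile.dim, actual: v.len() })
        }
    }

    fn decode(&self, encoded: &[u8]) -> Result<Vec<f32>, CodecError> {
        if encoded.len() != self.profile.dim * 4 {
            return Err(CodecError::BadLength(encoded.len()));
        }
        Ok(encoded
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }
}

impl VectorCodec for RawF32Codec {
    fn profile(&self) -> &VectorCodecProfile {
        &self.profile
    }

    fn encode_to_bytes(&self, raw: &[f32]) -> Result<Vec<u8>, CodecError> {
        self.check_dim(raw)?;
        Ok(raw.iter().flat_map(|v| v.to_le_bytes()).collect())
    }

    fn score_inner_product_from_bytes(&self, encoded: &[u8], query: &[f32]) -> Result<f32, CodecError> {
        self.check_dim(query)?;
        let stored = self.decode(encoded)?;
        Ok(stored.iter().zip(query).map(|(a, b)| a * b).sum())
    }

    fn score_l2_from_bytes(&self, encoded: &[u8], query: &[f32]) -> Result<f32, CodecError> {
        self.check_dim(query)?;
        let stored = self.decode(encoded)?;
        let squared: f32 = stored.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
        Ok(squared.sqrt())
    }

    fn decode_approx_from_bytes(&self, encoded: &[u8]) -> Result<Option<Vec<f32>>, CodecError> {
        // Raw storage is lossless, so the "approximate" decode is exact.
        Ok(Some(self.decode(encoded)?))
    }
}
```

Because the trait is byte-oriented and has no generic methods, it can be used as `Box<dyn VectorCodec>`, which is what makes swapping codecs by profile practical.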

### Likely files

```text
semantic-memory/src/quantize.rs
semantic-memory/src/types.rs
semantic-memory/src/db.rs
semantic-memory/src/lib.rs
semantic-memory/tests/quantization.rs
semantic-memory/tests/quantization_pipeline.rs
```

### Tests to add

- Raw codec deterministic bytes.
- SQ8 codec deterministic bytes.
- Stable profile digest.
- Wrong profile/dimension fails closed.
- Raw reference scoring remains available.

### Acceptance

- No TurboQuant dependency yet.
- Current q8 path is not overloaded with TurboQuant semantics.
- Future codecs have a lawful admission seam.

---

## Packet 10 — Vector artifact persistence

### Prompt

Add minimal persistence for codec profiles and vector artifacts without making compressed vectors authoritative.

Tasks:

1. Add migration/table(s):

```text
vector_codec_profiles
vector_artifacts
```

2. Store profile canonical JSON/digest.
3. Store encoded bytes/digest.
4. Store source row kind/id and source embedding digest.
5. Preserve raw embedding/reference path.
6. Add rebuild/delete lifecycle tests.

### Suggested schema

```sql
CREATE TABLE vector_codec_profiles (
  profile_id TEXT PRIMARY KEY,
  codec_family TEXT NOT NULL,
  codec_version TEXT NOT NULL,
  dim INTEGER NOT NULL,
  bits INTEGER,
  projections INTEGER,
  seed TEXT,
  score_semantics TEXT NOT NULL,
  canonical_json TEXT NOT NULL,
  profile_digest TEXT NOT NULL,
  created_at TEXT NOT NULL
);

CREATE TABLE vector_artifacts (
  artifact_id TEXT PRIMARY KEY,
  source_kind TEXT NOT NULL,
  source_id TEXT NOT NULL,
  source_embedding_digest TEXT NOT NULL,
  profile_id TEXT NOT NULL,
  encoded_bytes BLOB NOT NULL,
  encoded_digest TEXT NOT NULL,
  created_at TEXT NOT NULL,
  generation INTEGER NOT NULL,
  FOREIGN KEY(profile_id) REFERENCES vector_codec_profiles(profile_id)
);
```

### Acceptance

- Vector artifacts are deleted/retired with source lifecycle or marked stale.
- Rebuild can recreate artifacts from raw embeddings and profiles.
- Compressed vectors are never the only truth.

---

## Packet 11 — TurboQuant optional backend

### Prompt

Add TurboQuant as an optional vector codec backend only after the codec abstraction and integrity gates are green.

Tasks:

1. Add optional dependency/feature:

```toml
[features]
turbo-quant = ["dep:turbo-quant"]
```

2. Implement `TurboQuantCodec` behind feature flag.
3. Profile identity must include:
   - family = `turbo_quant`,
   - codec version,
   - dim,
   - bits,
   - projections,
   - seed,
   - score semantics,
   - normalization.
4. Serialize/deserialize `TurboCode` to canonical bytes.
5. Add raw reference comparison harness.
6. Add receipt integration.

### Tests to add

- Same profile/vector gives same encoded digest.
- Different seed gives different digest.
- Wrong dim/profile fails closed.
- Inner product estimate deterministic.
- Raw-vs-TurboQuant rank drift fixture emits metrics.

### Acceptance

- Feature compiles independently.
- Default build works without TurboQuant.
- TurboQuant codes are derived artifacts only.
- Search receipts disclose approximate scoring and exact rerank state.

---

## Packet 12 — Rank drift and benchmark harness

### Prompt

Add a deterministic benchmark/conformance harness for raw vs HNSW vs SQ8 vs TurboQuant retrieval profiles.

Tasks:

1. Fixed-seed synthetic corpus generator.
2. Corpus profiles:
   - 1k vectors / 100 queries,
   - namespace-sparse corpus,
   - deleted/tombstoned rows,
   - mixed row kinds if supported.
3. Metrics:
   - recall@1, recall@5, recall@10,
   - MRR or NDCG@10,
   - mean rank drift,
   - max top-k loss,
   - mean absolute score error,
   - p95 latency,
   - storage bytes/vector,
   - fallback count.
4. Output JSON/Markdown benchmark report.
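
Some of the ranking metrics can be computed from two ID lists, with the raw reference ranking as the oracle. The sketch below is one reasonable reading of recall@k, reciprocal rank, and mean rank drift; in particular, defining drift as the mean absolute position difference over IDs present in both rankings is an assumption, not a definition given in this packet.

```rust
/// Fraction of the oracle's top-k IDs that also appear in the candidate's top-k.
pub fn recall_at_k(oracle: &[u64], candidate: &[u64], k: usize) -> f64 {
    let truth = &oracle[..k.min(oracle.len())];
    if truth.is_empty() {
        return 1.0;
    }
    let got = &candidate[..k.min(candidate.len())];
    let hits = truth.iter().filter(|id| got.contains(*id)).count();
    hits as f64 / truth.len() as f64
}

/// Reciprocal rank of the oracle's best ID within the candidate ranking
/// (0.0 when it is absent). Averaging this over queries gives MRR.
pub fn reciprocal_rank(oracle_top: u64, candidate: &[u64]) -> f64 {
    match candidate.iter().position(|id| *id == oracle_top) {
        Some(index) => 1.0 / (index as f64 + 1.0),
        None => 0.0,
    }
}

/// Mean absolute position difference over IDs present in both rankings
/// (assumed definition of "rank drift").
pub fn mean_rank_drift(oracle: &[u64], candidate: &[u64]) -> f64 {
    let mut total = 0usize;
    let mut shared = 0usize;
    for (i, id) in oracle.iter().enumerate() {
        if let Some(j) = candidate.iter().position(|c| c == id) {
            total += if i > j { i - j } else { j - i };
            shared += 1;
        }
    }
    if shared == 0 {
        0.0
    } else {
        total as f64 / shared as f64
    }
}
```

All three functions are pure, so a fixed-seed corpus and fixed query set yield the same report on every run.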

### Acceptance

- Harness is deterministic.
- Raw reference is the oracle.
- Approximate profiles cannot claim exactness.
- Metrics can be attached to release notes.

---

## Packet 13 — Product-facing “Why this result?” API

### Prompt

Expose a practical explanation surface for search results.

Tasks:

1. Add API or method to retrieve explanation/receipt for a result.
2. Include:
   - source trail,
   - search profile,
   - vector/index backend,
   - approximate/exact status,
   - fallback/degradation,
   - result score breakdown,
   - receipt/replay handle.
3. Keep wording product-facing.

### Example output

```text
Result came from chunk:123.
It matched by vector search using HNSW.
HNSW under-returned after namespace filtering, so exact scoped fallback ran.
Raw rerank was applied.
No TurboQuant codec was used.
Receipt: search:01J...
```
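
Text like the example above could be rendered from a small explanation value. The `Explanation` struct and `render` function below are hypothetical and cover only some of the listed fields; the wording of each line is taken from the example output where possible.

```rust
/// Hypothetical product-facing explanation for one search result.
pub struct Explanation {
    /// Source trail entry, e.g. "chunk:123".
    pub source: String,
    /// Vector/index backend name, e.g. "HNSW".
    pub backend: String,
    /// Plain-language fallback note, if any fallback ran.
    pub fallback: Option<String>,
    pub raw_rerank: bool,
    /// Receipt/replay handle.
    pub receipt: String,
}

/// Renders one sentence per line, avoiding internal jargon.
pub fn render(e: &Explanation) -> Vec<String> {
    let mut lines = vec![
        format!("Result came from {}.", e.source),
        format!("It matched by vector search using {}.", e.backend),
    ];
    if let Some(note) = &e.fallback {
        lines.push(note.clone());
    }
    lines.push(if e.raw_rerank {
        "Raw rerank was applied.".to_string()
    } else {
        "No raw rerank was applied.".to_string()
    });
    lines.push(format!("Receipt: {}", e.receipt));
    lines
}
```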

### Acceptance

- Users/developers can answer “why did this appear?”
- Does not expose internal doctrine jargon by default.

---

## Packet 14 — Docs and public story

### Prompt

Update README/docs to reflect the product/career thesis without overselling unfinished internals.

Tasks:

1. README pitch:

```text
semantic-memory is a Rust substrate for local-first AI memory: SQLite + FTS + vector search, episode identity, source-grounded imports, and receipt-ready retrieval. It is designed to support compressed vector codecs such as TurboQuant without letting indexes or compressed vectors become the source of truth.
```

2. Add architecture diagram.
3. Add “Why this result?” example.
4. Add a TurboQuant roadmap section; do not present TurboQuant as a default capability unless it is implemented.
5. Add release gates and benchmark expectations.

### Acceptance

- Public docs lead with visible value.
- Docs do not claim TurboQuant default support before implementation.
- Provenance is translated into receipts/source grounding/replay.

---

## Packet 15 — Final release audit

### Prompt

Run a final release audit after all implementation passes.

Tasks:

1. Run full acceptance commands.
2. Run benchmark/conformance harness.
3. Check that no P0/P1 issue remains open without a documented blocker.
4. Check no derived artifact is authoritative.
5. Check docs match implementation.
6. Write final report.

### Acceptance

- Final report is complete.
- Tests pass or failures are honestly documented.
- TurboQuant eligibility is explicitly marked:
   - not eligible,
   - prototype eligible,
   - default eligible.