sqlite-graphrag 1.0.90

Persistent GraphRAG memory for Claude Code, Codex, Cursor, and 24+ AI agents in a single 14.6 MiB Rust binary. LLM-only and one-shot in v1.0.78: every `remember` / `ingest` spawns a headless claude code or codex subprocess (OAuth, no MCP, no hooks). No daemon. No ONNX runtime. No model download. Graph-native retrieval with FTS5 + cosine + multi-hop traversal. OAuth-only enforcement: API keys ABORT the spawn.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
# TESTING — v1.0.85 Five-Gap Test Suite + Hotfixes (ADR-0043, ADR-0044)
- 6 new regression tests in `tests/claude_runner_env.rs` cover the env whitelist fix
- `claude_subprocess_inherits_custom_anthropic_provider_env` — documents the design decision that the equivalent integration path is covered by the codex variant below (the real `claude` install in CI collides with the mock PATH prefix trick); see ADR-0041 §Verification
- `claude_subprocess_rejects_prohibited_anthropic_api_key` — confirms the OAuth-only guard still aborts the spawn with non-zero exit when `ANTHROPIC_API_KEY` is set; the mock script may or may not run depending on whether the guard fires first
- `codex_subprocess_inherits_openai_base_url` — verifies the `OPENAI_BASE_URL` env var propagates from parent to codex subprocess, the canonical cross-process integration test path
- `strict_env_clear_drops_custom_provider_credentials` — confirms `--strict-env-clear` / `SQLITE_GRAPHRAG_STRICT_ENV_CLEAR=1` strips `ANTHROPIC_AUTH_TOKEN` from the subprocess env, preserving only `PATH`
- `audit_no_token_leak_in_subprocess_stderr` — sweeps the captured subprocess stdout and stderr with `RUST_LOG=trace` and asserts the literal token value NEVER appears in either stream; this is the audit that prevents future regressions where a `tracing` macro might print the raw token
- Plus 3 helper unit tests in `src/spawn/env_whitelist.rs::tests` covering the Rust API directly: `whitelist_includes_custom_provider_vars`, `whitelist_excludes_api_key_vars`, `strict_mode_drops_credentials`
- All tests carry `#[serial_test::serial(env)]` to serialise env mutations across the parallel test runner
- Total test count: 818 (up from 812 in v1.0.82; the 6 new tests are split between 3 unit tests in `env_whitelist.rs` and 3 integration tests in `claude_runner_env.rs` plus the 2 audit-style tests)
- Pre-existing OAuth-only tests in `claude_runner.rs:574-666` and `codex_spawn.rs:684-758` remain green; the env whitelist extension does NOT weaken the guard
# Testing Guide


- Read the Portuguese version at [TESTING.pt-BR.md](TESTING.pt-BR.md)
- Formal test plan with layers, triggers and release gates: [TEST_PLAN.md](TEST_PLAN.md)


## Test Infrastructure — Feature CI Matrix (2 features since v1.0.79)
- The CI workflow runs `clippy` and `test` jobs across a 2-feature matrix since v1.0.79: `default` and `llm-only` (`embedding-legacy` was removed together with the feature).
- The `default` and `llm-only` jobs install a stub `mock-llm` CLI on `PATH` so the embedding round-trip tests can run without a real LLM subscription.
- 26 test files were wired to consume the mock LLM CLI as a drop-in replacement for `claude -p`, `codex exec`, and `opencode run`. This unblocks CI from requiring real OAuth credentials.
- 107 of 115 previously-slow tests were fixed in commit `bd0a3f5` (mock LLM unblocks tests that depended on a real OAuth turn).
- See the GitHub Actions workflow file `.github/workflows/ci.yml` for the matrix definition.

### Mock LLM CLI Contract
- The mocks are shell scripts in `tests/mock-llm/` (`claude`, `codex`, and `opencode`) that return deterministic JSON for any prompt; integration tests copy them into a temp dir and prepend it to `PATH`.
- For embedding requests: returns 64-dim `f32` zero vectors (the active default dimensionality since v1.0.79, G42/S1).
- Both response shapes are spoken since the G43 fix: single (`{"embedding":[...]}`) and batch (`{"items":[{"i":N,"v":[...]}]}` when the prompt asks for EXACTLY N items, G42/S2).
- Entity extraction tests must mock at a higher level or call the library API; the scripts are dedicated to the embedding path.
- These integration tests are gated behind `--features slow-tests` and do NOT run in the default CI matrix.
- Operators running tests locally must prepend the mock to `PATH`:
  ```bash
  export PATH="$PWD/target/debug:$PATH"
  cargo test --workspace
  ```

### Feature-Flag Test Selection
- `cargo test --lib` — runs against default features (mock LLM in CI, real LLM required locally).
- `cargo test --lib --no-default-features --features llm-only` — same behavior as default, explicit opt-in.
- `cargo test --workspace --features slow-tests` — runs the full contract suite including the 832-test integration matrix.


## v1.0.78 Test Additions — G41 Fix Coverage
### Test Count Delta
- v1.0.77 baseline: 723 lib tests passing
- v1.0.78 final: 726 lib tests passing (+3 new unit, +1 updated unit)
### Unit Tests in `src/commands/migrate.rs`
- `rehash_does_not_insert_missing_migrations` — verifies `run_rehash` no longer inserts phantom rows for unapplied migrations (UPDATED from `rehash_insert_includes_applied_on`)
- `ensure_v013_tables_noop_when_no_history` — verifies no-op when `refinery_schema_history` does not exist
## v1.0.85 — Five-Gap Test Suite (ADR-0043)

Five new regression tests in `tests/embedder.rs` cover the FallbackReason 7-variant enum:

- `slot_exhaustion_returns_typed_error` — GAP-003: `acquire_llm_slot_for_embedding` returns `AppError::Embedding` with `reason_code: "slot_exhausted"` after backoff ceiling of 750ms
- `oauth_quota_fallback_deterministic` — G58: `try_embed_query_with_deterministic_fallback` retries on `OAuthQuota` and propagates `reason_code` to `vec_degraded_reason`
- `anthropic_ratelimit_headers_captured` — G45-CR5: `LlmEmbedding::invoke_claude` parses 12-14 `anthropic-ratelimit-*-remaining` headers and aborts on `0`
- `read_notfound_preserves_identifier` — G55 docs: `read NotFound` returns bilingual message with the identifier (name or id) and namespace preserved
- `embedding_dim_reduces_token_cost` — G56: dim=64 produces ≤1/6 of the OAuth tokens consumed by dim=384

All five tests are gated by `#[serial_test::serial(env)]` to prevent PATH-pollution between concurrent runs.

## v1.0.85.1 — GAP-004 Regression Test

`try_embed_query_with_none_returns_dim_zero_fallback` in `tests/embedder.rs`: `--llm-backend none` on `recall` and `hybrid-search` now exits 0 with `vec_degraded: true` + `source: "fts_fallback"` + `vec_degraded_reason: "dim_zero"`. Without this test, v1.0.85.0 broke the v1.0.80 failsafe silently.

## v1.0.85.2 — BUG-001/002/003 Tests (ADR-0044)

- `cli_dry_run_backend_works_standalone` — `--dry-run-backend` exits 0 without subcommand, prints `{action, backend, binary, model, flavour, chain, strict_env_clear}`
- `embed_via_backend_returns_resolved_kind` — `embed_via_backend` returns `Result<(Vec<f32>, LlmBackendKind), AppError>` propagating `resolved_kind`
- `setup_mock_path_emits_json` — `setup_mock_path()` in `tests/embedder.rs:37-77` aligned to emit JSON (not JSONL)


## v1.0.87 — Pre-flight Validation Layer Tests (ADR-0045, GAP-META-005)

- `tests/bug11_preflight_regression.rs` (2 tests) — reproducers for the 5 BUG classes that GAP-META-005 addresses. The 7 guards (`check_argv_size`, `check_binary_exists`, `check_mcp_config_inline`, `check_mcp_config_path`, `check_walkup_mcp_json`, `check_output_buffer`, `check_claude_config_dir`) have 2 tests each: a positive case (passes) and a negative case (returns the specific `PreFlightError` variant)
- `src/spawn/preflight.rs` (15 unit tests inline) — `check_argv_size_rejects_oversized_argv`, `check_argv_size_accepts_under_limit`, `check_binary_exists_returns_binary_not_found`, `check_mcp_config_inline_creates_tempfile_for_braces`, `check_mcp_config_inline_passes_when_already_tempfile`, `check_mcp_config_path_validates_json`, `check_mcp_config_path_rejects_missing_file`, `check_walkup_mcp_json_walks_to_root`, `check_walkup_mcp_json_accepts_absent`, `check_output_buffer_doubles_capacity_above_64k`, `check_output_buffer_passes_when_under_limit`, `check_claude_config_dir_rejects_non_empty`, `check_claude_config_dir_accepts_empty`, `is_skipped_returns_true_with_env_var`, `is_skipped_returns_false_without_env_var`
- All 4 spawners (`claude_runner`, `codex_spawn`, `ingest_claude`, `extract/llm_embedding`) gain `#[test]` coverage of the pre-flight call site

## v1.0.88 — Hotfixes BUG-11/12/13 Tests (ADR-0046, ADR-0047)

- `tests/bug11_preflight_regression.rs::embed_via_backend_strict_returns_no_backends_error` — verifies that when pre-flight fails in `extract/llm_embedding.rs:563-565`, `remember` propagates the error via exit 11 instead of silently persisting with `backend_invoked: "none"`
- `tests/bug11_preflight_regression.rs::remember_with_mcp_config_dir_in_legacy_path_aborts` — repro of BUG-11: `CLAUDE_CONFIG_DIR=/tmp/bad-config-with-mcp` causes exit 11 with JSON error envelope
- `tests/oauth_stderr_emits_single_line_v1088` — verifies BUG-12 fix: `ANTHROPIC_API_KEY=sk-test init` emits exactly 1 stderr line (was 2)
- `tests/entity_validation_integration.rs` (8 tests) — verifies BUG-13 fix: `link --create-missing` now respects entity-name validation. 4-char boundary case (`API` is rejected, `claude` is accepted)
- `embedder.rs:1704` test rename — `embed_with_fallback_succeeds_via_none_when_chain_exhausts` → `embed_with_fallback_chain_of_only_none_aborts_without_skip_on_failure_v1088` (now documents the corrected contract)

## v1.0.89 — Seven Regression Tests for the Ten GAPs (ADR-0048, ADR-0049)

- `tests/health_namespace_regression.rs::health_accepts_namespace_flag_v1089` — GAP-E2E-002. Verifies `health --namespace prod --json` returns 0 and filters counts to the namespace
- `tests/migrate_dry_run_regression.rs::dry_run_does_not_mutate_schema_history_v1089` — GAP-E2E-009. Verifies `migrate --dry-run` exits 0 and `refinery_schema_history` is unchanged
- `tests/codex_models_json_regression.rs::codex_models_json_flag_accepted_as_noop_v1089` — GAP-E2E-010a. Verifies `codex-models --json` exits 0 with JSON envelope
- `tests/cli_db_flag_parity_regression.rs` (5 tests) — GAP-E2E-008 + GAP-E2E-010b. Verifies `embedding status`, `embedding list`, `embedding abandon`, `pending list`, `pending show` all accept `--db <PATH>` without clap error
- `tests/ingest_auto_describe_regression.rs` (5 tests) — GAP-E2E-011. Verifies `extract_heuristic_description(body, path_hint)`:
  - `auto_describe_uses_body_summary` — first meaningful line (>20 chars) wins
  - `auto_describe_falls_back_on_headers_only` — markdown with only headers falls back to `"ingested document"` when no `path_hint`
  - `auto_describe_falls_back_to_stem_when_only_headers` — with `path_hint`, falls back to file stem (e.g. `headers-only`)
  - `auto_describe_truncates_long_line` — descriptions truncated to ≤100 chars
  - `auto_describe_ignores_short_and_blank_lines` — short lines (<21 chars) and blank lines are skipped
- `tests/binary_size_documented_regression.rs::assert_documented_size_matches_real` — GAP-E2E-001. Verifies `Cargo.toml:6` description matches the actual binary size within ±5%
- `tests/health_schema_drift_regression.rs::assert_all_health_keys_in_schema` — GAP-E2E-007. Verifies that all 17 new fields are present in the regenerated `health.schema.json` and that `additionalProperties: true` (Must-Ignore policy per RFC 7493 I-JSON) is honored
## Current Test Suite Size

847 lib tests, 1881 total tests passing via `cargo nextest -P ci` as of v1.0.89. Use `--test-threads=2` for local development; the `ci` profile in `.config/nextest.toml` controls parallelism in CI.
- `ensure_v013_tables_noop_when_tables_exist` — verifies no-op when `memory_embeddings` already exists
- `ensure_v013_tables_creates_when_phantom` — verifies repair when V013 is in history but tables are missing
### Coverage Rationale
- G41 fixed a bug where `run_rehash` registered V013 as applied without executing its SQL
- The updated test validates that the `else` branch removal is correct
- The 3 new tests cover the `ensure_v013_tables_exist` helper for all 3 scenarios (no history, tables exist, phantom)
- Auto-repair in `ensure_db_ready` is covered transitively via the ensure helper

- Auto-repair in `ensure_db_ready` is covered transitively via the ensure helper


## v1.0.79 Test Additions — G42-G52 and Daemon Removal
### Tests Added by Gap
- `embedder::adaptive_batch_for_dim` formula: 6 tests cover the `clamp(base×64/dim, 1, base)` function across dims 64, 128, 256, 384, 4096, plus degenerate cases (dim 0, base 0) and the env-dim wrapper end-to-end with `#[serial_test::serial(env)]`
- `connection.rs`: 4 tests for `adopt_embedding_dim()` covering rw/ro adoption, env precedence, and virgin databases
- `mock-llm`: dim-extraction from prompt and `--output-schema`; batch format detection
- `mocks_64_dim` and `mocks_64_dim_batch`: end-to-end coverage for banks 384 + mock
- `recall` and `hybrid-search`: trigram fallback, vec_degraded field, FTS5-only path
- `vec stats`: `dim_breakdown_groups_rows_per_dim_and_table`
- 2 obsolete daemon tests became regression guards for the daemon removal
- 2 tests of `--autostart-daemon` updated to assert the flag is rejected
- 1 updated test `rehash_does_not_insert_missing_migrations` (replaces the test that validated buggy behavior)
- 9 `#[serial_test::serial(env)]` tests for dim-adoption in chunks/memories/entities
- 3 new unit tests for `ensure_v013_tables_exist` (noop, phantom repair, no history)
### Coverage Rationale
- G42 closed the slow/serialized/fragile LLM embedding pipeline with 9 sub-solutions; tests cover the batch formula, parallelism peak (AtomicUsize), panic-with-permit-RAII, cancellation, divergent-dim failure
- G43 fixed the dim-adoption coverage gap; tests now cover all 4 connection open paths
- G44 made the batch size dim-adaptive; tests verify the formula and the env-dim wrapper
- G50 fixed 6 CI red causes; tests cover the doctest, mock dim, benchmark LLM, language policy, race of dim
- G51 made mocks LLM multi-dim aware; tests cover the dim-extraction and batch shape
- G52 fixed the vec-stats schema contract; tests cover the dim breakdown
- G47 fixed CLI flags documented but missing; tests cover the alias resolution
- G48 fixed G20 blind spot on default values; tests cover `is_some()` check
- G49 fixed silent discard of invalid dim; tests cover `tracing::warn!` emission


## v1.0.80 Test Additions — G45, G53, G55 S2, G56, G58, ADR-0033, ADR-0034
### Tests Added by Gap
- `lock::acquire_embedding_singleton`: 4 tests cover namespace/db scoping, fs4 flock polling, force-break of stale locks, and `is_retryable() == true` for the new `AppError::EmbeddingSingletonLocked` variant
- `semver-checks` CI job: 1 test in `tests/semver_checks_smoke.rs` validates that `cargo +stable semver-checks check-baseline --baseline-version 1.0.79` runs without panicking on the current `Cargo.toml`; the job is informational in v1.0.80 and becomes blocking in v1.0.81
- `windows-2025` CI steps: 2 new steps each in the `clippy` and `test` jobs (gated on `if: matrix.os == 'windows-2025'`) for pre-warm and verify; the workflow YAML is the test artefact (no inline Rust test, validated by re-running the jobs locally)
- `signals::handler`: 1 new test `panic_free_third_signal_exits_130_with_zero_io` validates that even with `SIGPIPE` raised on stderr (the orphaned-process scenario), the handler returns cleanly; the third consecutive Ctrl-C exits with code 130 and ZERO I/O
- `AppError::MemoryNotFound { name, namespace }` and `AppError::MemoryNotFoundById { id }`: 2 tests cover the structural variant; pt-BR messages carry name and namespace
- `embed_entity_texts_cached`: 3 tests cover the `blake3(model || \0 || text)` cache key, the stats snapshot, and the hit rate
- `recall --fallback-fts-only` and `hybrid-search --fallback-fts-only`: 2 tests cover the FTS5-only path; 1 test is `#[ignore]` because the G58 S1 stub requires `PATH` without `codex` or `claude` to exercise `EmbeddingFailed`
- The 7 new test completions in v1.0.80 (4 from G45 singleton + 1 from semver-checks + 1 from signals + 1 from MemoryNotFound) bring the total suite to 1176 tests; 0 failures
### Coverage Rationale
- ADR-0032 stability policy is enforced by `cargo +stable semver-checks` in CI (informational in v1.0.80); the smoke test prevents regressions in the smoke harness itself
- ADR-0033 Windows infra resilience is validated by the new pre-warm and verify steps; local cross-compile validation reproduces `E0463` and is fixed by `rustup target add x86_64-pc-windows-msvc --toolchain 1.88`
- ADR-0034 SHUTDOWN resilience is validated by the panic-free third-signal test; the test reproduces the orphaned-process scenario from the G42/C2 audit
- G45 singleton prevents the multi-session LLM contention pathology; tests cover the `is_retryable` contract
- G55 S2 structural `MemoryNotFound` eliminates the "not found: unknown" class of bugs; tests cover the structural variant
- G56 entity-embed cache reduces cost on canonical entities; tests cover the cache key and the hit rate
- G58 FTS5 fallback keeps the read path alive under OAuth contention; tests cover the FTS5-only path and the `vec_degraded` envelope field




## v1.0.77 Test Additions — G40 Fix Coverage
### Test Count Delta
- v1.0.76 baseline: 719 lib tests passing
- v1.0.77 final: 723 lib tests passing (+4 unit, +2 integration)
### Unit Tests in `src/commands/migrate.rs`
- `sanitize_null_applied_on_fixes_null_rows` — verifies NULL `applied_on` rows get fixed
- `sanitize_null_applied_on_noop_when_all_filled` — verifies no-op when no NULLs exist
- `rehash_insert_includes_applied_on` — verifies INSERT now includes `applied_on` (renamed to `rehash_does_not_insert_missing_migrations` in v1.0.78)
- `remove_vec_tables_noop_when_no_vec` — verifies no-op when no vec tables exist
### Integration Tests in `tests/schema_migration_integration.rs`
- `migrate_rehash_fixes_null_applied_on` — end-to-end rehash with NULL fix
- `migrate_to_llm_only_fixes_null_applied_on` — end-to-end `--to-llm-only` with NULL fix
### Coverage Rationale
- G40 fixed a bug where `applied_on` was NULL after rehash
- The 4 unit tests cover each code path in the migrate module
- The 2 integration tests validate the CLI end-to-end flow


## Why Categorized Tests
### The Thermal Livelock Incident — 2026-04-19
- On 2026-04-19 at 11:37:40, the developer's Intel i9-14900KF reached Tjmax 100°C
- VRM temperature hit 99°C and the system required a hard reset after 3 minutes 11 seconds
- Root cause: `tests/loom_lock_slots.rs` ran without a `#[cfg(sqlite_graphrag_loom)]` gate
- The loom scheduler is CPU-intensive by design — it explores all thread permutations
- Running loom models without isolation causes thermal runaway on high-core-count CPUs
- This was the third incident in seven days caused by the same unguarded test file
- EVERY loom test MUST be gated with `#[cfg(sqlite_graphrag_loom)]` and serialized with `#[serial(loom_model)]`
- NEVER run loom tests inside the default `cargo nextest run` invocation


## Test Categories
### Unit Tests — Inline with Source
- Location: `#[cfg(test)] mod tests` blocks inside each `src/` module
- Run with: `/usr/bin/timeout 300 cargo nextest run --profile default`
- Scope: pure functions, error variants, masking, parsing, validation
- Isolation: no I/O, no filesystem, no HTTP calls
- Gate: always compiled, always run in the default profile
### Integration Tests — Separate Files
- Location: `tests/` directory
- Run with: `/usr/bin/timeout 300 cargo nextest run --profile default`
- Scope: CLI subcommands, JSON schema contracts, PRD compliance, storage CRUD
- Isolation: `TempDir` per test, `env_clear()`, wiremock for HTTP
- Gate: always compiled, always run in the default profile
### Loom Concurrency Tests — Explicit Opt-in Only
- Location: `tests/loom_lock_slots.rs`
- Run with: `/usr/bin/timeout 3900 bash scripts/test-loom.sh` or the CI `loom` job
- Scope: lock-slot semaphore permutation testing
- Isolation: MUST NOT run in parallel with any other test — one model at a time
- Gate: `#[cfg(sqlite_graphrag_loom)]` required on EVERY test function and import block
- Thermal risk: unguarded loom tests triggered system freeze on 2026-04-19
### Slow End-to-End and Stress Tests — Opt-in via Feature Flag
- Location: `tests/` files guarded by `#[cfg(feature = "slow-tests")]`
- Run with: `/usr/bin/timeout 1800 cargo nextest run --profile heavy --features slow-tests`
- Scope: long-running end-to-end smoke suites, contract suites, i18n parity, exit-code routing, high-concurrency load, and extended retry loops
- Gate: excluded from the default and `ci` nextest profiles
- Critical release suites: `/usr/bin/timeout 1200 cargo test --features slow-tests --test doc_contract_integration -- --nocapture`
- Critical release suites: `/usr/bin/timeout 1200 cargo test --features slow-tests --test prd_compliance -- --nocapture`
- CI runs those two contract suites in a dedicated `slow-contracts` job on `ubuntu-latest`
### Benchmarks — Criterion
- Location: `benches/`
- Run with: `/usr/bin/timeout 1800 cargo bench` or `/usr/bin/timeout 1800 cargo criterion`
- Scope: latency baselines for remember, recall, hybrid-search, stats, graph
- Gate: never included in `cargo nextest run`
### Claude Code Ingest Tests
- Unit tests in `src/commands/ingest_claude.rs` cover: JSON parsing, structured_output fallback, error handling, rate limit detection, entity type validation, schema conformance
- 9 unit tests protect extraction parsing invariants without requiring the Claude Code binary
- Integration tests require Claude Code >= 2.1.0 installed locally — run manually, not in CI
- Test names follow `test_parse_claude_output_*` and `test_extraction_schema_*` conventions
### Codex Ingest Tests (v1.0.62)
- 7 unit tests protect Codex JSONL parser in `src/commands/ingest_codex.rs`
- Tests cover: valid extraction, turn.failed errors, rate limit detection, schema validation, binary discovery
- Parser validates "last agent_message wins" pattern for multiple item.completed events
- Integration tests require Codex CLI installed; skip gracefully if unavailable
### v1.0.63 Regression Tests
- 3 integration tests in `tests/v1063_features.rs` protect the v1.0.63 fixes
- `restore_preserves_name_after_rename`: remember → edit → rename → restore; asserts name stays renamed
- `restore_does_not_crash_when_old_name_occupied`: remember A → rename to B → remember new A → restore B; asserts exit 0 (was exit 10 UNIQUE crash before fix)
- `edit_reembeds_when_body_changes`: remember → edit body → recall new content; asserts recall finds the edited memory with accurate score
### v1.0.64 Regression Tests
- 14 unit tests in `src/commands/deep_research.rs` protect query decomposition, bounded concurrency, dedup, evidence chain assembly, and edge cases
- Unit tests in `src/commands/ingest_claude.rs` cover terminal_reason parsing, OAuth detection via apiKeySource, and body size pre-validation
- Unit tests in `src/commands/rename.rs` and `src/commands/rename_entity.rs` cover same-name rejection with exit 1

### v1.0.68 Regression Tests
#### Windows HANDLE Type Fix (G29)
- `tests/terminal_compile_windows.rs` is a new integration test that runs on every platform: confirms `terminal::init_console` and `should_use_ansi` stay callable from outside the crate
- On Windows, the test additionally references the type-safe `HANDLE.is_null() + INVALID_HANDLE_VALUE` check; if the type contract regresses, `cargo check --target x86_64-pc-windows-msvc` in the `windows-build-check` CI job fails before this test is reached
- The new CI job is the canonical regression check; the integration test is the local pre-publish sanity probe
#### Job Singleton (G28-B)
- Three unit tests in `src/lock.rs::tests`: `job_singleton_path_sanitises_namespace` (verifies kebab-case slug from arbitrary input), `job_singleton_blocks_second_invocation_same_namespace` (verifies `AppError::JobSingletonLocked` on second acquire), `job_singleton_allows_different_namespaces` (verifies per-namespace isolation)
- Run via `cargo test --lib lock::tests` (no `#[serial]` because the per-namespace unique IDs in each test isolate them from shared-state interference)
#### Circuit Breaker (G28-D)
- Three unit tests in `src/retry.rs::circuit_breaker_tests`: `opens_after_threshold_consecutive_hard_failures`, `ignores_transient_errors`, `success_resets_consecutive_failures`.  These validate the AttemptOutcome classification that distinguishes `AppError::RateLimited` and `AppError::Timeout` (Transient) from `AppError::Validation` and `AppError::Conflict` (HardFailure)
#### Timezone Pre-Existing Fixes
- Three pre-existing test failures were fixed in `src/commands/{history,list,read}.rs`: tests now parse the ISO string via `chrono::DateTime::parse_from_rfc3339` and compare `timestamp()` against `DateTime::UNIX_EPOCH` instead of asserting the hardcoded `1970-01-01T00:00:00` prefix.  This makes the assertions timezone-agnostic so the test suite is green regardless of `SQLITE_GRAPHRAG_DISPLAY_TZ` env var setting

### v1.0.67 New Command Tests
- `remember-batch` tests in `src/commands/remember_batch.rs`: serialization tests for BatchItemEvent and BatchSummary
- `completions` command: tested via `cargo run -- completions bash` smoke test
- `read --id` integration: tested via `read --id <memory_id> --json` round-trip
- `health` super-hub detection: tested with production database (1059 memories, 3 super-hubs detected)
- `edit` skip-embed: tested via body_hash comparison (idempotent edit skips embedding)
- `rename` ghost purge: tested via forget → rename workflow
- Flag validation: tested via `hybrid-search --max-hops 2` (without `--with-graph`) expecting exit 1

### v1.0.65 New Command Tests
#### Deep Research Tests
- Unit tests in `src/commands/deep_research.rs` cover decompose_query splitting, single-query passthrough, bounded concurrency semaphore, result deduplication, evidence chain assembly (depth >= 2 filter), and empty-query validation
- Contract test `contract_36_deep_research` in `tests/doc_contract_integration.rs`: seeds two memories, runs `deep-research "auth and deploy" --max-sub-queries 2 --k 5`, asserts required keys (`query`, `sub_queries`, `results`, `evidence_chains`, `stats`) and validates `sub_queries[].source` enum
- Schema test `schema_36_deep_research` in `tests/schema_contract_strict.rs`: validates the full response against `docs/schemas/deep-research.schema.json` (Draft 2020-12, `additionalProperties: false`)
#### reclassify-relation Tests
- 8 unit tests in `src/commands/reclassify_relation.rs` cover serialization, dry-run action, merged-duplicates counting, zero-match case, and same-value rejection guard
- Contract test `contract_37_reclassify_relation`: links two entities via `mentions`, runs `reclassify-relation --from-relation mentions --to-relation related --batch --dry-run`, asserts all 7 required keys and `action == "dry_run"`
- Schema test `schema_37_reclassify_relation`: validates against `docs/schemas/reclassify-relation.schema.json`
#### normalize-entities Tests
- 5 unit tests in `src/commands/normalize_entities.rs` cover dry-run count, in-place rename, collision merge, serialization, and dry-run action field
- Contract test `contract_38_normalize_entities`: seeds a memory, runs `normalize-entities --dry-run`, asserts 5 required keys and `action == "dry_run"`
- Schema test `schema_38_normalize_entities`: validates against `docs/schemas/normalize-entities.schema.json`
#### enrich Tests
- Contract test `contract_39_enrich`: seeds a memory, runs `enrich --operation memory-bindings --dry-run`, parses NDJSON lines, asserts validate phase event, scan phase event, preview item events (status=`preview`), and summary line with all required keys
- Schema test `schema_39_enrich`: validates each NDJSON line type against its respective schema (`enrich-phase.schema.json`, `enrich-item-event.schema.json`, `enrich-summary.schema.json`)
- All enrich tests use `--dry-run` to avoid spawning the LLM binary


## How to Run
### Default — Local Development
- Run all unit and integration tests: `/usr/bin/timeout 300 cargo nextest run --profile default`
- Run with output on failure: `/usr/bin/timeout 300 cargo nextest run --profile default --no-capture`
- Run a specific test by name: `/usr/bin/timeout 300 cargo nextest run --profile default test_name_fragment`
- Run a specific file: `/usr/bin/timeout 300 cargo nextest run --profile default -E 'test(schema_contract)'`
### CI — Constrained Parallelism
- Run all tests as CI would: `/usr/bin/timeout 600 cargo nextest run --profile ci`
- The `ci` profile sets `test-threads = 2` and `RUST_TEST_THREADS=2`
- The `ci` profile enables retries on flaky tests
- The workflow also runs `doc_contract_integration` and `prd_compliance` separately with `--features slow-tests`
### Heavy — Stress and Slow Tests
- Run stress and slow tests: `/usr/bin/timeout 1800 cargo nextest run --profile heavy --features slow-tests`
- The `heavy` profile sets `test-threads = 1` for maximum isolation
- NEVER run the `heavy` profile on a thermally throttled machine
- For release validation, prefer the explicit contract commands above before broader heavy runs


## Safe Remember Audit
### Reproduce Installed-Binary Behavior Under cgroup Limits
- Use `/usr/bin/timeout 3900 bash scripts/audit-remember-safely.sh <corpus-dir>` to audit `remember` safely against a real corpus
- The script defaults to the installed `sqlite-graphrag` in `PATH`
- Override the binary with `BIN=./target/debug/sqlite-graphrag` to compare local changes against the published build
- The script uses `systemd-run --user --scope -p MemoryMax=4G -p MemorySwapMax=0`
- The script initializes an isolated temp database for each run
- The CLI is one-shot (no daemon); each embedding call spawns and discards the LLM subprocess
- The script runs known pass, threshold, fail, and synthetic cases


## Loom Concurrency Tests
### How Loom Works
- Loom runs each test many times permuting thread interleavings
- It uses state reduction to avoid combinatorial explosion
- Each model must complete under a bounded preemption count
- CPU usage is extremely high — one core saturates completely per model
- NEVER run loom tests alongside other tests on the same process
### Running Loom Tests Locally
- Use the canonical script: `/usr/bin/timeout 3900 bash scripts/test-loom.sh`
- The script sets `RUSTFLAGS="--cfg sqlite_graphrag_loom"` and `RUST_TEST_THREADS=1`
- The script sets `LOOM_MAX_PREEMPTIONS=1` for bounded local iteration
- Run in release mode only: `--release` is mandatory for acceptable speed
- Monitor CPU temperature before and during the run
### Running Individual Loom Tests
- Build first: `/usr/bin/timeout 600 env RUSTFLAGS="--cfg sqlite_graphrag_loom" cargo build --release --tests`
- Run single model: `/usr/bin/timeout 3600 env RUSTFLAGS="--cfg sqlite_graphrag_loom" RUST_TEST_THREADS=1 cargo nextest run --release -E 'test(lock_slot)'`
- Set lower preemption bound for local iteration: `LOOM_MAX_PREEMPTIONS=1`
- Increase bounds manually only for focused debugging runs
### Checkpoint and Resume
- Set `LOOM_CHECKPOINT_FILE=/tmp/loom-checkpoint.json` to resume interrupted runs
- The checkpoint file records explored permutations so far
- Delete the checkpoint file to start fresh exploration


## Environment Variables
### Loom Variables — Set Before Running `scripts/test-loom.sh`
- `RUSTFLAGS="--cfg sqlite_graphrag_loom"` — enables the project-local loom gate, REQUIRED for all loom tests
- `LOOM_MAX_PREEMPTIONS=1` — limits preemption depth per model (local and CI default: 1)
- `LOOM_MAX_BRANCHES=100` — limits branching factor per execution (local and CI default: 100)
- `LOOM_LOG=1` — enables verbose loom execution tracing to stderr
- `LOOM_CHECKPOINT_FILE=/tmp/loom.json` — path for checkpoint file to resume runs
- `RUST_TEST_THREADS=1` — REQUIRED, forbids parallel execution of loom models
### Cargo and Nextest Variables
- `RUST_TEST_THREADS=N` — controls nextest parallelism at the process level
- `CARGO_TERM_COLOR=always` — preserves color in CI logs
- `NEXTEST_PROFILE=ci` — overrides the active nextest profile from the environment
### sqlite-graphrag-Specific Variables
- `SQLITE_GRAPHRAG_DB_PATH=/tmp/test/graphrag.sqlite` — isolates the project database path per test
- `SQLITE_GRAPHRAG_CACHE_DIR=/tmp/test-cache` — isolates model cache and lock files per test
- `SQLITE_GRAPHRAG_LOG_FORMAT=json` — switches log output to structured JSON
- `SQLITE_GRAPHRAG_DISPLAY_TZ=America/Sao_Paulo` — overrides timestamp timezone


## CI Profiles
### Profile — default
- Activates: always, unless overridden
- `test-threads`: 2
- `RUST_TEST_THREADS`: not set, inherits system default
- Retries: 0
- Slow-timeout: 60s period, terminate after 2 periods (120s effective kill)
- Excludes: loom tests, slow-tests feature
### Profile — ci
- Activates: `/usr/bin/timeout 600 cargo nextest run --profile ci`
- `test-threads`: 2
- `RUST_TEST_THREADS`: 2 (explicit, prevents thermal overload on shared runners)
- Retries: 2 for flaky tests
- Slow-timeout: 180s period, terminate after 3 periods (540s effective kill)
- Excludes: loom tests, slow-tests feature
- Dedicated CI job `slow-contracts` covers `doc_contract_integration` and `prd_compliance` with `/usr/bin/timeout 1200 cargo test --features slow-tests`
### Profile — heavy
- Activates: `/usr/bin/timeout 1800 cargo nextest run --profile heavy --features slow-tests`
- `test-threads`: 1
- `RUST_TEST_THREADS`: 1
- Retries: 0
- Slow-timeout: 900s period, terminate after 2 periods (1800s effective kill)
- Includes: slow-tests feature gated tests
- Excludes: loom tests (always separate)
### Loom CI Job — Separate Workflow Step
- Activates: `ci.yml` job named `loom`
- Environment: `RUSTFLAGS="--cfg sqlite_graphrag_loom"`, `RUST_TEST_THREADS=1`, `LOOM_MAX_PREEMPTIONS=1`, `LOOM_MAX_BRANCHES=100`
- Runs: `/usr/bin/timeout 600 cargo test --test loom_lock_slots --release -- --test-threads=1`
- NEVER merged with the default or ci profile runs


## Troubleshooting
### Thermal Throttling During Tests
- Symptom: test suite slows down progressively, CPU reports high temperature
- Cause: loom tests or stress tests running without proper thread limits
- Fix: stop the test run immediately, let CPU cool for 5 minutes
- Prevention: NEVER run `cargo test` without nextest profiles configured
- Prevention: ALWAYS use `scripts/test-loom.sh` for loom tests
### System Freeze During Loom Tests
- Symptom: machine becomes unresponsive, requires hard reset
- Cause: loom models running in parallel (RUST_TEST_THREADS > 1) on high-TDP CPU
- Fix: hard reset, then set `RUST_TEST_THREADS=1` before any loom run
- Historical case: 2026-04-19 11:37:40 — i9-14900KF froze for 3 minutes 11 seconds
- Prevention: `#[serial(loom_model)]` attribute MUST be present on every loom test
### Loom Test Runs Forever
- Symptom: loom model does not terminate after several minutes
- Cause: `LOOM_MAX_PREEMPTIONS` not set, defaults to unbounded exploration
- Fix: set `LOOM_MAX_PREEMPTIONS=1` for bounded local iteration
- Trade-off: lower values miss rare interleavings; raise the bound only for focused debugging
### Flaky Tests in CI
- Symptom: test passes locally but fails intermittently in CI
- Cause: missing `#[serial]` on tests sharing global state or env vars
- Fix: add `#[serial]` from the `serial_test` crate to affected tests
- Diagnostic: run `/usr/bin/timeout 600 cargo nextest run --profile ci --retries 0` to see all failures


## References

## v1.0.69 Test Inventory
### Test Count Delta
- v1.0.68 baseline: 692 tests passing.
- v1.0.69 final: 745 tests passing (+53).
- 0 failures, 3 ignored (loom tests gated by `#[cfg(sqlite_graphrag_loom)]`).
### New Tests by Module
- `src/commands/claude_runner.rs`: +4 OAuth-only conformance tests (`build_command_oauth_only_mandatory_flags`, `build_command_aborts_when_anthropic_api_key_set`, and 2 more) marked `#[serial_test::serial(env)]` to serialise env mutation.
- `src/commands/codex_spawn.rs`: +4 OAuth-only conformance tests parallel to claude, plus 11 tests for the spawn helper itself (parser edge cases, model validation, command flag presence).
- `src/commands/ingest_claude.rs`: existing tests updated to expect the canonical OAuth-only flag set.
- `src/preservation.rs`: 10 tests for `jaccard_similarity` (boundary conditions, trigrams, empty strings, Unicode) and `PreservationVerdict` (Preserved, Rejected, Unchanged variants).
- `src/memory_source.rs`: 8 tests for `as_str`, `TryFrom<&str>` (valid and invalid), `Display`, and serialisation.
- `src/reaper.rs`: 4 tests (`orphan_min_age_is_one_minute`, `orphan_targets_include_claude_and_codex`, `reaper_report_starts_zeroed`, `scan_completes_without_panic_on_linux`).
- `src/system_load.rs`: 5 tests for `load_average_one`, `ncpus`, and `is_system_saturated`.
- `src/commands/vec.rs`: 3 tests for `vec orphan-list`, `vec purge-orphan`, and `vec stats`.
- `src/commands/optimize.rs`: 1 new test for `OptimizeResponse` field set; existing 2 tests updated.
- `src/lock.rs`: 6 tests (namespace sanitisation, second-invocation blocking, per-namespace isolation, db_hash determinism, db_hash divergence, force flag).
### Serialised Tests
- All 8 OAuth-only tests are marked `#[serial_test::serial(env)]` because they mutate the global environment via `unsafe { std::env::set_var(...) }` and `unsafe { std::env::remove_var(...) }`. Running them in parallel would race.
- The `serial_test` crate (already a project dependency) provides the attribute; the tests are auto-discovered by `cargo nextest run` with serial execution semantics.
### Test Runtime
- Full suite runtime on the reference host: ~10 seconds for the 745 tests.
- The OAuth-only group adds ~0.04 seconds (env mutation is fast).
- Loom tests are NOT included in the default count; they are gated and must be run via `scripts/test-loom.sh`.
- loom crate documentation: `https://docs.rs/loom/latest/loom/`
- loom GitHub repository: `https://github.com/tokio-rs/loom`
- cargo-nextest documentation: `https://nexte.st/`
- cargo-nextest configuration reference: `https://nexte.st/docs/configuration/`
- serial_test crate: `https://docs.rs/serial_test/latest/serial_test/`


## v1.0.82 Test Suite Notes
### Test Count and Known Flakes
- v1.0.82 ships with 807 tests, 1 ignored, 0 failing (per the gaps.md ledger at 2026-06-15)
- The four new subcommands (`pending`, `slots`, `embedding`, `pending-embeddings`) each have 2-3 unit tests and 1-2 integration tests
- The 5 new ADRs (0036-0040) each have a regression test in `tests/` named after the ADR number
- Known flake: `slot_enforces_max_concurrency` is timing-sensitive on slow CI runners; it is auto-retried once with a 50ms backoff before being marked as failed
- Known flake: `pending-embeddings process reprocesses failed rows` requires a working OAuth session; gate it on `tests/mock-llm/codex` being on `PATH`
- The new `fs4` crate (NOT `fs2`) is exercised in `src/llm_slots.rs::acquire_llm_slot`; the test `llm_slots_acquire_release_cross_process` runs 2 child processes that race for the same slot
### Test Plan Artifact
- See `docs/TEST_PLAN_v1.0.82.md` for the 10-phase end-to-end validation plan
- The plan validates schema migrations V014 and V015, all 5 ADR decisions, the new exit code 19, and the codex OAuth 401 incident mitigation
- Run via `bash docs/TEST_PLAN_v1.0.82.md`'s Phase 1 to Phase 10 sequentially with a fresh database per run