In-memory manifest tracking indexed files for online reconciliation.
Each entry stores cheap stat data, namely (mtime, size, inode) on Unix,
with inode = 0 on Windows and on filesystems where it is unavailable,
plus a blake3 content hash. Reconciliation runs on every search via
RipvecIndex::diff_against:
- Walk the corpus with the same WalkOptions used at index construction.
- For each walked file, compare its stat tuple to the manifest entry.
  Match → treated as unchanged; skip.
- For mismatches, read the file, blake3-hash it, and compare against the
  stored hash. Match → metadata-only change (vim save with no edits,
  build-tool touch); update the manifest’s stat tuple in place to
  short-circuit future diffs. Mismatch → record as dirty.
- Manifest entries not seen during the walk → deleted.
- Walked paths not in the manifest → new.
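The steps above can be sketched as a single pass over the walk output. This is not diff_against_walk itself: reconcile, Stat, Entry and the read callback are hypothetical, the walk is assumed to have already produced (path, stat) pairs, and std's DefaultHasher stands in for blake3 only so the example needs no external crates. DefaultHasher's algorithm is unspecified and may change between Rust releases, so it is not suitable for hashes that are persisted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Hypothetical stat tuple: (mtime in ns, size in bytes, inode or 0).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Stat {
    pub mtime_ns: u128,
    pub size: u64,
    pub inode: u64,
}

/// Hypothetical manifest entry: stat tuple plus content hash.
pub struct Entry {
    pub stat: Stat,
    pub hash: u64,
}

#[derive(Debug, Default, PartialEq)]
pub struct Diff {
    pub new: Vec<String>,
    pub dirty: Vec<String>,
    pub deleted: Vec<String>,
}

impl Diff {
    /// Empty diff: the existing index is up to date.
    pub fn is_empty(&self) -> bool {
        self.new.is_empty() && self.dirty.is_empty() && self.deleted.is_empty()
    }
}

/// Stand-in content hash. NOT blake3: DefaultHasher keeps the sketch crate-free.
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// `walked` holds (path, stat) pairs from walking the corpus; `read` loads a
/// file's bytes and is only called when the stat tuple mismatches.
pub fn reconcile(
    manifest: &mut HashMap<String, Entry>,
    walked: &[(String, Stat)],
    read: impl Fn(&str) -> Vec<u8>,
) -> Diff {
    let mut diff = Diff::default();
    let mut seen: HashSet<&str> = HashSet::new();
    for (path, stat) in walked {
        seen.insert(path.as_str());
        match manifest.get_mut(path) {
            // Not in the manifest: a new file.
            None => diff.new.push(path.clone()),
            // Fast path: identical stat tuple, skip without reading.
            Some(entry) if entry.stat == *stat => {}
            Some(entry) => {
                if content_hash(&read(path.as_str())) == entry.hash {
                    // Metadata-only change: refresh the stat tuple in place.
                    entry.stat = *stat;
                } else {
                    diff.dirty.push(path.clone());
                }
            }
        }
    }
    // Manifest entries the walk never produced have been deleted.
    let mut deleted: Vec<String> = manifest
        .keys()
        .filter(|p| !seen.contains(p.as_str()))
        .cloned()
        .collect();
    deleted.sort(); // HashMap iteration order is unspecified
    diff.deleted = deleted;
    diff
}
```

Because read is only invoked on a stat mismatch, an unchanged file costs one tuple comparison, and a touched-but-identical file is re-read at most once before its refreshed tuple lets later passes skip it.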
If the resulting Diff is empty, the existing index is up to date
and no work is needed. Otherwise the caller rebuilds.
§Why blake3 + the stat tuple
The stat tuple is the cheap pre-filter: warm stat() is ~1 µs per
file, so the whole tuple check on a 200-file repo is sub-millisecond.
Most files won’t have a stat change between queries; the cheap path
skips them entirely.
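The stat tuple itself can come from a single metadata call per file. A minimal sketch, assuming mtime is stored as nanoseconds since the Unix epoch and the digest as 32 raw bytes (blake3's output size); StatTuple, Entry, inode_of and stat_tuple are hypothetical names, not this module's API. On Unix the inode comes from std::os::unix::fs::MetadataExt::ino; elsewhere it is 0.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical stat tuple: (mtime in ns since the Unix epoch, size, inode).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StatTuple {
    pub mtime_ns: u128,
    pub size: u64,
    pub inode: u64,
}

/// Hypothetical entry: stat tuple plus a 32-byte digest (blake3-sized;
/// computing it is out of scope for this sketch).
#[derive(Clone, Debug)]
pub struct Entry {
    pub stat: StatTuple,
    pub hash: [u8; 32],
}

#[cfg(unix)]
fn inode_of(meta: &fs::Metadata) -> u64 {
    use std::os::unix::fs::MetadataExt;
    meta.ino()
}

#[cfg(not(unix))]
fn inode_of(_meta: &fs::Metadata) -> u64 {
    0 // no inode signal: comparison falls back to (mtime, size)
}

/// One metadata call per file; no file content is read.
pub fn stat_tuple(path: &Path) -> io::Result<StatTuple> {
    let meta = fs::metadata(path)?;
    let mtime_ns = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0); // pre-1970 mtimes collapse to 0
    Ok(StatTuple { mtime_ns, size: meta.len(), inode: inode_of(&meta) })
}
```

Comparing two StatTuple values with == is then the entire fast path.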
When the stat tuple does mismatch, the question is whether the content
actually changed. Reading and blake3-hashing a typical 1-30 KB source file
costs ~1-20 µs warm, roughly two orders of magnitude cheaper than the
~1-5 ms cost of re-chunking and re-embedding it. The break-even point:
blake3 is worth it when more than ~0.7% of stat changes are touches
rather than real edits. Real-world workflows have 5-50% touch rates
(vim :w with no edits, autoformatters whose output is byte-identical to
their input, build tools that touch sources for dependency tracking).
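One way to reconstruct the break-even figure, assuming it compares hashing every stat-mismatched file against re-embedding every one: with touch fraction t, hash cost h and re-embed cost E, hashing first costs h + (1 - t)E per mismatched file versus E without it, so hashing wins once t > h/E. With h = 20 µs (top of the hashing range) and E = 3 ms (middle of the embedding range), h/E ≈ 0.67%, which rounds to 0.7%; the inputs actually used are not stated, and the function names are hypothetical.

```rust
/// Break-even touch rate t* = h / E.
///
/// Per stat-mismatched file, with touch fraction t:
///   hash first: h + (1 - t) * E   (always hash, re-embed only real edits)
///   no hash:    E                 (re-embed every mismatch)
/// Hashing is cheaper exactly when t > h / E.
fn break_even_touch_rate(hash_us: f64, reembed_us: f64) -> f64 {
    hash_us / reembed_us
}

/// Expected cost per mismatched file in µs: (with hashing, without hashing).
fn expected_cost_us(touch_rate: f64, hash_us: f64, reembed_us: f64) -> (f64, f64) {
    let with_hash = hash_us + (1.0 - touch_rate) * reembed_us;
    let without_hash = reembed_us;
    (with_hash, without_hash)
}
```

At the 5% low end of the quoted touch rates, hashing first already costs about 2,870 µs against 3,000 µs per mismatched file under these assumed inputs.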
§Inode as a third dimension
(mtime, size) alone has a rare blind spot: content swaps that keep the
same byte count and leave mtime unchanged, for example two saves within
the filesystem’s mtime resolution. Atomic-rename saves (the default in
many modern editors) give the file a new inode, so adding inode to the
tuple catches those without a blake3 round-trip. Inode is best-effort:
it is 0 on Windows, where we fall back to (mtime, size). The blake3
verification path still guarantees correctness for every file whose
stat tuple mismatches, even when the inode signal is unavailable.
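A sketch of why an atomic-rename save is visible through the inode even when the size is unchanged. atomic_save and inode are hypothetical helpers imitating an editor's write-temp-then-rename save; real editors differ in temp-file naming and in which attributes they preserve. On non-Unix targets the helper returns 0, mirroring the fallback described above.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[cfg(unix)]
fn inode(path: &Path) -> io::Result<u64> {
    use std::os::unix::fs::MetadataExt;
    Ok(fs::metadata(path)?.ino())
}

#[cfg(not(unix))]
fn inode(_path: &Path) -> io::Result<u64> {
    Ok(0) // best-effort: no inode signal, fall back to (mtime, size)
}

/// Editor-style atomic save: write the new bytes to a sibling temp file,
/// then rename it over the target. Afterwards the target path refers to
/// the temp file's inode, not the original one.
fn atomic_save(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp-save");
    fs::write(&tmp, bytes)?;
    fs::rename(&tmp, path)
}
```

After a same-size atomic save, size matches but the inode does not, so the (mtime, size, inode) tuple mismatches and the file goes through the blake3 check.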
Structs§
- Diff - Categorized filesystem changes detected by diff_against_walk.
- FileEntry - One file’s tracked state in the manifest.
- Manifest - Per-root manifest of indexed files.
Functions§
- diff_against_walk - Compare the manifest to the current filesystem state and produce a Diff.