The ANNSEG body format: a storage-agnostic byte encoding of everything a
built AnnIndex holds EXCEPT the f32 vectors. The vectors are rehydrated
at load time from the table rows themselves - the rows are the source of
truth, and the rehydration scan doubles as the staleness proof (it computes
the content fingerprint the storage layer compares against its header).
Layout: a fixed sequence of REQUIRED sections, each
[tag u8][len u64 LE][payload][blake3(payload) 32B]. Per-section hashes
pin a corruption refusal to the section that failed; the storage layer additionally
hashes the whole body. All integers are little-endian. Any layout change bumps
the storage header’s format_version - this module never reads old
formats silently.
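As an illustration of the section framing described above, here is a minimal sketch of writing and reading one `[tag][len][payload][digest]` section. It is not this module's implementation: `write_section`, `read_section`, `SectionError`, and the `Digest` type are hypothetical names, and the digest is injected as a function so the sketch stays dependency-free. `toy_digest` is a stand-in that is NOT BLAKE3; real code would presumably pass a BLAKE3 function (for example one built on the `blake3` crate's `blake3::hash`).

```rust
use std::convert::TryFrom;

/// Digest injected by the caller. ASSUMPTION: real code would use BLAKE3 here.
type Digest = fn(&[u8]) -> [u8; 32];

#[derive(Debug, PartialEq)]
enum SectionError {
    Truncated,
    UnexpectedTag { expected: u8, found: u8 },
    HashMismatch { tag: u8 },
}

/// Append one section: [tag u8][len u64 LE][payload][digest(payload) 32B].
fn write_section(out: &mut Vec<u8>, tag: u8, payload: &[u8], digest: Digest) {
    out.push(tag);
    out.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&digest(payload));
}

/// Read the section expected at the front of `buf`.
/// Returns (payload, remaining bytes), or refuses with an error naming the problem.
fn read_section<'a>(
    buf: &'a [u8],
    expected_tag: u8,
    digest: Digest,
) -> Result<(&'a [u8], &'a [u8]), SectionError> {
    // 1 tag byte + 8 length bytes must be present before anything else.
    if buf.len() < 9 {
        return Err(SectionError::Truncated);
    }
    let found = buf[0];
    if found != expected_tag {
        return Err(SectionError::UnexpectedTag { expected: expected_tag, found });
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&buf[1..9]);
    let len = usize::try_from(u64::from_le_bytes(len_bytes))
        .map_err(|_| SectionError::Truncated)?;
    let rest = &buf[9..];
    // Payload plus the 32-byte digest must fit in what is left.
    let need = len.checked_add(32).ok_or(SectionError::Truncated)?;
    if rest.len() < need {
        return Err(SectionError::Truncated);
    }
    let (payload, rest) = rest.split_at(len);
    let (stored, rest) = rest.split_at(32);
    let computed = digest(payload);
    if computed.as_slice() != stored {
        return Err(SectionError::HashMismatch { tag: found });
    }
    Ok((payload, rest))
}

/// Toy stand-in digest for illustration only. NOT BLAKE3 and not cryptographic.
fn toy_digest(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, b) in data.iter().enumerate() {
        let slot = i % 32;
        out[slot] = out[slot].wrapping_mul(31).wrapping_add(*b).wrapping_add(i as u8);
    }
    out
}
```

Because the sequence of sections is fixed and every section is required, a reader under this sketch would call `read_section` once per expected tag, in order, and treat any error as a refusal of the whole body.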
PointStore.vectors order is PRISM-INTERNAL (cell-reordered): loaders must
place each scanned row’s vector at inverse(id_map)[row_id], never in scan
order - a scan-order fill silently corrupts every f32 rerank.
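The placement rule above can be sketched as follows. This is a hedged illustration, not the crate's loader: `invert_id_map` and `place_vectors` are hypothetical helpers, and the sketch ASSUMES that `id_map[slot]` holds the row id stored at internal slot `slot`, that row ids are `u64`, and that vectors live in one flat `Vec<f32>` of `slots * dim` values.

```rust
use std::collections::HashMap;

/// Build row_id -> internal slot.
/// ASSUMPTION: id_map[slot] is the row id stored at internal slot `slot`.
fn invert_id_map(id_map: &[u64]) -> HashMap<u64, usize> {
    id_map
        .iter()
        .enumerate()
        .map(|(slot, &row_id)| (row_id, slot))
        .collect()
}

/// Place each scanned (row_id, vector) at its internal slot, never in scan order.
/// Refuses unknown rows, wrong dimensions, duplicates, and missing rows.
fn place_vectors<I>(id_map: &[u64], dim: usize, scanned: I) -> Result<Vec<f32>, String>
where
    I: IntoIterator<Item = (u64, Vec<f32>)>,
{
    let inverse = invert_id_map(id_map);
    let mut flat = vec![0.0f32; id_map.len() * dim];
    let mut filled = vec![false; id_map.len()];

    for (row_id, vector) in scanned {
        let slot = *inverse
            .get(&row_id)
            .ok_or_else(|| format!("row {} is not in id_map", row_id))?;
        if vector.len() != dim {
            return Err(format!("row {} has dimension {}, expected {}", row_id, vector.len(), dim));
        }
        if filled[slot] {
            return Err(format!("row {} was scanned twice", row_id));
        }
        // The write position comes from the inverse map, not from the scan position.
        flat[slot * dim..(slot + 1) * dim].copy_from_slice(&vector);
        filled[slot] = true;
    }

    if filled.iter().any(|done| !done) {
        return Err("scan did not cover every row in id_map".to_string());
    }
    Ok(flat)
}
```

For example, with `id_map = [30, 10, 20]` and rows scanned in the order 10, 20, 30, the vector for row 30 lands in slot 0 even though it was scanned last.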
Structs§
- SegmentParts - Everything a segment carries; vectors arrive separately via
SegmentParts::into_index.
Enums§
Functions§
- decode - Decode a segment body. Every section’s BLAKE3 must verify; any mismatch is a corruption refusal, never a partial result.
- encode - Encode everything but the vectors. The output is the segment BODY; the storage layer wraps it in its header (fingerprint, config hash, counts).
- metric_tag
- prism_config_hash - BLAKE3 of the canonical little-endian encoding of EVERY
PrismConfig field, domain-separated. The storage header pins this; a binary whose active config differs must refuse the segment (the graph was built for a different search geometry).