Module segment

The ANNSEG body format: a storage-agnostic byte encoding of everything a built AnnIndex holds EXCEPT the f32 vectors. The vectors are rehydrated at load time from the table rows themselves: the rows are the source of truth, and the rehydration scan doubles as the staleness proof, because it computes the content fingerprint that the storage layer compares against its header.

Layout: a fixed sequence of REQUIRED sections, each [tag u8][len u64 LE][payload][blake3(payload) 32B]. Per-section hashes let the decoder refuse corruption at the section that broke; the storage layer additionally hashes the whole body. All integers are little-endian. Any layout change bumps the storage header’s format_version; this module never reads old formats silently.
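The section framing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not this module's implementation: `write_section`, `read_section`, `SectionError` and the tag values are hypothetical names, and `toy_hash` is a deliberately labeled stand-in (it is NOT BLAKE3) so the sketch builds with the standard library alone.

```rust
use std::convert::TryFrom;

// Hash function slot. Real code would plug in BLAKE3 here; see the note below.
type Hasher = fn(&[u8]) -> [u8; 32];

/// NOT BLAKE3: an FNV-1a-based filler so the sketch runs without extra crates.
fn toy_hash(payload: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    let mut state: u64 = 0xcbf2_9ce4_8422_2325;
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        state ^= i as u64;
        for &b in payload {
            state ^= b as u64;
            state = state.wrapping_mul(0x0000_0100_0000_01b3);
        }
        chunk.copy_from_slice(&state.to_le_bytes());
    }
    out
}

#[derive(Debug, PartialEq)]
enum SectionError {
    Truncated,
    WrongTag { expected: u8, found: u8 },
    HashMismatch { tag: u8 },
}

/// Appends [tag u8][len u64 LE][payload][hash(payload) 32B] to `out`.
fn write_section(out: &mut Vec<u8>, tag: u8, payload: &[u8], hash: Hasher) {
    out.push(tag);
    out.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&hash(payload));
}

/// Reads one required section from the front of `buf`.
/// Returns (payload, remaining bytes) or an error; never a partial payload.
fn read_section<'a>(
    buf: &'a [u8],
    expected_tag: u8,
    hash: Hasher,
) -> Result<(&'a [u8], &'a [u8]), SectionError> {
    if buf.len() < 9 {
        return Err(SectionError::Truncated);
    }
    let tag = buf[0];
    if tag != expected_tag {
        return Err(SectionError::WrongTag { expected: expected_tag, found: tag });
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&buf[1..9]);
    let len = usize::try_from(u64::from_le_bytes(len_bytes))
        .map_err(|_| SectionError::Truncated)?;
    let body = &buf[9..];
    // Payload plus the 32-byte hash must fit; checked_add guards hostile lengths.
    let end = len.checked_add(32).ok_or(SectionError::Truncated)?;
    if body.len() < end {
        return Err(SectionError::Truncated);
    }
    let (payload, tail) = body.split_at(len);
    let (stored, rest) = tail.split_at(32);
    if hash(payload)[..] != stored[..] {
        return Err(SectionError::HashMismatch { tag });
    }
    Ok((payload, rest))
}

fn main() {
    // Two sections in a fixed order (tags 1 and 2 are made up for the demo).
    let mut body = Vec::new();
    write_section(&mut body, 1, b"graph", toy_hash);
    write_section(&mut body, 2, b"id_map", toy_hash);

    let (p1, rest) = read_section(&body, 1, toy_hash).unwrap();
    let (p2, rest) = read_section(rest, 2, toy_hash).unwrap();
    assert_eq!(p1, &b"graph"[..]);
    assert_eq!(p2, &b"id_map"[..]);
    assert!(rest.is_empty());

    // Flip the first payload byte of section 2: the error names that section.
    let mut bad = body.clone();
    let second_payload_start = (1 + 8 + 5 + 32) + 9;
    bad[second_payload_start] ^= 0xff;
    let (_, rest) = read_section(&bad, 1, toy_hash).unwrap();
    assert_eq!(
        read_section(rest, 2, toy_hash),
        Err(SectionError::HashMismatch { tag: 2 })
    );
}
```

With the `blake3` crate as a dependency, the stand-in could be swapped for a wrapper such as `fn real(p: &[u8]) -> [u8; 32] { *blake3::hash(p).as_bytes() }`. Passing the expected tag to the reader is one way to enforce a fixed sequence of required sections: an out-of-order or missing section fails instead of being skipped.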

PointStore.vectors order is PRISM-INTERNAL (cell-reordered): loaders must place each scanned row’s vector at inverse(id_map)[row_id], never in scan order. A scan-order fill silently corrupts every f32 rerank.
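The placement rule can be illustrated with a small sketch. It assumes, hypothetically, that `id_map[slot]` holds the row id stored at internal position `slot` and that vectors live in one flat `n * dim` buffer; the real PointStore layout and id types may differ, and `invert` and `place_vectors` are illustrative names only.

```rust
use std::collections::HashMap;

/// Assumption: `id_map[slot]` is the row id stored at internal position `slot`.
/// The inverse maps a row id back to its internal slot.
fn invert(id_map: &[u64]) -> HashMap<u64, usize> {
    id_map
        .iter()
        .enumerate()
        .map(|(slot, &row_id)| (row_id, slot))
        .collect()
}

/// Fills a flat `n * dim` buffer in INTERNAL order from rows arriving in scan order.
/// Returns None if a row id is unknown or repeated, a vector has the wrong
/// dimension, or some slot is never filled.
fn place_vectors<I>(id_map: &[u64], dim: usize, scan: I) -> Option<Vec<f32>>
where
    I: IntoIterator<Item = (u64, Vec<f32>)>,
{
    let inverse = invert(id_map);
    if inverse.len() != id_map.len() {
        return None; // duplicate row ids in id_map
    }
    let mut flat = vec![0.0f32; id_map.len() * dim];
    let mut filled = vec![false; id_map.len()];
    for (row_id, v) in scan {
        let slot = *inverse.get(&row_id)?;
        if v.len() != dim || filled[slot] {
            return None;
        }
        flat[slot * dim..(slot + 1) * dim].copy_from_slice(&v);
        filled[slot] = true;
    }
    if filled.iter().all(|&f| f) {
        Some(flat)
    } else {
        None
    }
}

fn main() {
    // Internal order is rows [30, 10, 20]; the table scan yields rows by id.
    let id_map = [30u64, 10, 20];
    let scan: Vec<(u64, Vec<f32>)> = vec![
        (10, vec![1.0, 1.5]),
        (20, vec![2.0, 2.5]),
        (30, vec![3.0, 3.5]),
    ];
    let flat = place_vectors(&id_map, 2, scan).unwrap();
    // Row 30 lands in slot 0 even though it was scanned last.
    assert_eq!(flat, vec![3.0, 3.5, 1.0, 1.5, 2.0, 2.5]);
}
```

A scan-order fill of the same input would produce `[1.0, 1.5, 2.0, 2.5, 3.0, 3.5]`, which has the right length and type and therefore raises no error: each slot simply holds another row's vector.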

Structs

SegmentParts
Everything a segment carries; vectors arrive separately via SegmentParts::into_index.

Enums

SegmentError

Functions

decode
Decode a segment body. Every section’s BLAKE3 must verify; any mismatch is a corruption refusal, never a partial result.
encode
Encode everything but the vectors. The output is the segment BODY; the storage layer wraps it in its header (fingerprint, config hash, counts).
metric_tag
prism_config_hash
BLAKE3 of the canonical little-endian encoding of EVERY PrismConfig field, domain-separated. The storage header pins this; a binary whose active config differs must refuse the segment (the graph was built for a different search geometry). The domain string carries the search-geometry version: bump it whenever build or search semantics change shape.
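As an illustration of what a canonical, domain-separated encoding of every config field might look like, here is a minimal sketch. Everything in it is an assumption: `DemoConfig` and its fields are hypothetical stand-ins for PrismConfig, `DOMAIN` is a made-up domain string, and prefixing the length-tagged domain is only one possible separation scheme. The sketch stops at the byte preimage; the real function would feed such bytes to BLAKE3.

```rust
/// Hypothetical stand-in for PrismConfig; the real field set is not shown here.
struct DemoConfig {
    cells: u32,
    bits_per_dim: u8,
    rerank_factor: f32,
}

/// Hypothetical domain string; the trailing version is the part that gets
/// bumped when build or search semantics change shape.
const DOMAIN: &[u8] = b"demo.prism-config.v1";

/// Builds the canonical little-endian preimage that would be hashed.
fn canonical_preimage(cfg: &DemoConfig) -> Vec<u8> {
    // Exhaustive destructuring: adding a field to DemoConfig breaks the build
    // here until the new field is encoded, so no field is silently left out.
    let DemoConfig { cells, bits_per_dim, rerank_factor } = *cfg;

    let mut out = Vec::new();
    // Length-prefixed domain string, so it cannot run into the fields.
    out.extend_from_slice(&(DOMAIN.len() as u64).to_le_bytes());
    out.extend_from_slice(DOMAIN);
    // Fixed field order, fixed widths, little-endian.
    out.extend_from_slice(&cells.to_le_bytes());
    out.push(bits_per_dim);
    // Floats are encoded by bit pattern, so 0.0 and -0.0 stay distinct.
    out.extend_from_slice(&rerank_factor.to_bits().to_le_bytes());
    out
}

fn main() {
    let a = DemoConfig { cells: 256, bits_per_dim: 4, rerank_factor: 2.0 };
    let b = DemoConfig { cells: 256, bits_per_dim: 8, rerank_factor: 2.0 };

    let pa = canonical_preimage(&a);
    let pb = canonical_preimage(&b);

    // Same layout, different bytes: any field change alters the preimage,
    // and therefore the hash a loader would compare against the header.
    assert_eq!(pa.len(), pb.len());
    assert_ne!(pa, pb);
    // Encoding is deterministic for equal configs.
    assert_eq!(pa, canonical_preimage(&a));
}
```

With the `blake3` crate available, `blake3::hash(&preimage)` would turn these bytes into a 32-byte digest. The crate also offers a key-derivation mode that takes a context string, which is another way to achieve domain separation; which mechanism this module uses is not stated here.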