Module persist

Persistent structural index — serialize to .sxi, load via mmap.

The .sxi (SIMD XML Index) format stores the complete XmlIndex as flat arrays in a single file. On subsequent loads, the XML file is memory-mapped and the arrays are read directly from the .sxi file, so the parse pipeline is skipped entirely.

File Format

[Header: 64 bytes]
  magic:      [u8; 4]   = b"SXI\x01"
  version:    u32       = 1
  xml_hash:   [u8; 8]   = xxh3-64 of XML bytes
  tag_count:  u32
  text_count: u32
  name_count: u16
  flags:      u16       = bit 0: has_name_index, bits 1-15: reserved
  bloom:      [u8; 16]  = reserved for Phase 3 bloom filter
  padding:    [u8; 16]

[Offset table: N x u64]  byte offsets of each section

[Section 0..12]  structural arrays (tag_starts, tag_ends, ...)
[Section 13]     name index (name_ids, name_table, flattened posting lists)
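
The header layout above can be decoded with a few fixed-offset reads. The following is a minimal sketch, not the crate's own implementation: `SxiHeader`, `parse_header`, and `array` are hypothetical names, and it assumes little-endian integers and fields packed back to back in the listed order. Under that assumption the listed fields occupy bytes 0..60, so the last 4 bytes of the 64-byte header are treated here as extra padding.

```rust
/// Hypothetical in-memory view of the 64-byte .sxi header.
#[derive(Debug, PartialEq)]
struct SxiHeader {
    version: u32,
    xml_hash: [u8; 8],
    tag_count: u32,
    text_count: u32,
    name_count: u16,
    flags: u16,
    bloom: [u8; 16],
}

impl SxiHeader {
    /// Bit 0 of `flags` marks the presence of the name index.
    fn has_name_index(&self) -> bool {
        self.flags & 1 != 0
    }
}

const HEADER_LEN: usize = 64;

/// Copy `N` bytes starting at `at` into a fixed-size array.
/// Panics if the range is out of bounds; callers check the length first.
fn array<const N: usize>(buf: &[u8], at: usize) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(&buf[at..at + N]);
    out
}

/// Parse the header from the start of `buf`.
/// Returns `None` if the buffer is too short or the magic does not match.
/// ASSUMPTION: little-endian integers, no padding between fields.
fn parse_header(buf: &[u8]) -> Option<SxiHeader> {
    if buf.len() < HEADER_LEN || !buf.starts_with(b"SXI\x01") {
        return None;
    }
    Some(SxiHeader {
        version: u32::from_le_bytes(array(buf, 4)),
        xml_hash: array(buf, 8),
        tag_count: u32::from_le_bytes(array(buf, 16)),
        text_count: u32::from_le_bytes(array(buf, 20)),
        name_count: u16::from_le_bytes(array(buf, 24)),
        flags: u16::from_le_bytes(array(buf, 26)),
        bloom: array(buf, 28),
    })
}
```

A loader built this way would typically compare `xml_hash` against a fresh xxh3-64 of the XML bytes before trusting the stored arrays, and fall back to a full parse on mismatch.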

Structs

OwnedXmlIndex
A self-contained index that owns both the XML bytes and the structural index.

Functions

load_index
Load a .sxi index file and the corresponding XML file.
load_index_with_bytes
Load a .sxi index using XML bytes already in memory.
read_bloom
Read just the bloom filter from an .sxi file header.
serialize_index
Serialize an XmlIndex to a .sxi file.
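
Because the bloom field lives in the fixed-size header, it can be fetched without touching the rest of the file. The sketch below illustrates one way a header-only read could work; it is not the signature or implementation of `read_bloom`. The name `read_bloom_from` is hypothetical, and the offset of 28 assumes the header fields are packed in the listed order with no padding between them.

```rust
use std::io::{self, Read};

const HEADER_LEN: usize = 64;
// ASSUMPTION: magic(4) + version(4) + xml_hash(8) + tag_count(4)
// + text_count(4) + name_count(2) + flags(2) = 28 bytes before `bloom`.
const BLOOM_OFFSET: usize = 28;
const BLOOM_LEN: usize = 16;

/// Read exactly one header's worth of bytes and return the bloom field.
/// Fails if the input is shorter than the header or the magic is wrong.
fn read_bloom_from<R: Read>(mut reader: R) -> io::Result<[u8; BLOOM_LEN]> {
    let mut header = [0u8; HEADER_LEN];
    reader.read_exact(&mut header)?;
    if !header.starts_with(b"SXI\x01") {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad .sxi magic"));
    }
    let mut bloom = [0u8; BLOOM_LEN];
    bloom.copy_from_slice(&header[BLOOM_OFFSET..BLOOM_OFFSET + BLOOM_LEN]);
    Ok(bloom)
}
```

Any `Read` source works here, for example a `std::fs::File` opened on the .sxi path or an in-memory `std::io::Cursor`.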