Persistent structural index — serialize to .sxi, load via mmap.
The .sxi (SIMD XML Index) format stores the complete XmlIndex as flat
arrays in a single file. On subsequent loads, the XML is mmap’d and arrays
are read from the .sxi file, avoiding the entire parse pipeline.
§File Format
[Header: 64 bytes]
  magic:      [u8; 4]  = b"SXI\x01"
  version:    u32      = 1
  xml_hash:   [u8; 8]  = xxh3-64 of XML bytes
  tag_count:  u32
  text_count: u32
  name_count: u16
  flags:      u16      = bit 0: has_name_index, bits 1-15: reserved
  bloom:      [u8; 16] = reserved for Phase 3 bloom filter
  padding:    [u8; 16]
[Offset table: N x u64] byte offsets of each section
[Section 0..12] structural arrays (tag_starts, tag_ends, ...)
[Section 13] name index (name_ids, name_table, flattened posting lists)
Structs§
- OwnedXmlIndex - A self-contained index that owns both the XML bytes and the structural index.
Functions§
- load_index - Load a .sxi index file and the corresponding XML file.
- load_index_with_bytes - Load a .sxi index using XML bytes already in memory.
- read_bloom - Read just the bloom filter from an .sxi file header.
- serialize_index - Serialize an XmlIndex to a .sxi file.
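
The header layout listed under File Format can be illustrated with a small standalone parser. This is a minimal sketch, not the crate's own code: the names SxiHeader, HeaderError, parse_header and array are hypothetical, and it assumes the fields are packed in the listed order with no alignment gaps and stored little-endian, neither of which the format description states. Note that the listed fields add up to 60 bytes, so the sketch simply ignores everything after the bloom field up to the stated 64-byte header size.

```rust
// Hypothetical sketch of reading the 64-byte .sxi header.
// ASSUMPTIONS (not confirmed by the format description):
//   - fields are packed in the listed order, with no alignment gaps
//   - integers are little-endian

/// Header size stated by the format description.
const HEADER_LEN: usize = 64;
/// Magic bytes from the format description.
const MAGIC: [u8; 4] = *b"SXI\x01";

#[derive(Debug, PartialEq)]
struct SxiHeader {
    version: u32,
    xml_hash: [u8; 8],
    tag_count: u32,
    text_count: u32,
    name_count: u16,
    flags: u16,
    bloom: [u8; 16],
}

impl SxiHeader {
    /// Bit 0 of `flags` signals that a name index section is present.
    fn has_name_index(&self) -> bool {
        self.flags & 1 != 0
    }
}

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u32),
}

/// Copy `N` bytes starting at `at` into a fixed-size array.
/// The caller must have checked that `bytes` is long enough.
fn array<const N: usize>(bytes: &[u8], at: usize) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(&bytes[at..at + N]);
    out
}

fn parse_header(bytes: &[u8]) -> Result<SxiHeader, HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if array::<4>(bytes, 0) != MAGIC {
        return Err(HeaderError::BadMagic);
    }
    let version = u32::from_le_bytes(array(bytes, 4));
    if version != 1 {
        return Err(HeaderError::UnsupportedVersion(version));
    }
    Ok(SxiHeader {
        version,
        xml_hash: array(bytes, 8),                        // bytes 8..16
        tag_count: u32::from_le_bytes(array(bytes, 16)),  // bytes 16..20
        text_count: u32::from_le_bytes(array(bytes, 20)), // bytes 20..24
        name_count: u16::from_le_bytes(array(bytes, 24)), // bytes 24..26
        flags: u16::from_le_bytes(array(bytes, 26)),      // bytes 26..28
        bloom: array(bytes, 28),                          // bytes 28..44
        // bytes 44..64 are treated as padding and ignored
    })
}
```

Checking the magic and version before touching any other field lets a loader reject a foreign or outdated file cheaply. A real loader would presumably also compare xml_hash against a hash of the XML bytes before trusting the stored arrays.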