Serialization and deserialization support for Dictionary
This module provides efficient zero-copy serialization, using epserde for sux-rs types (BitFieldVec, etc.) and native pthash serialization for the minimal perfect hash functions (MPHFs).
§File Format
The serialization uses a two-file approach:
Main Index File (index.ssi):
- DictionarySerializationHeader (magic, version, k, m, canonical, num_mphf_partitions)
- SpectrumPreservingStringSet (epserde format)
- SparseAndSkewIndex (epserde format, excluding MPHF)
MPHF Container File (index.ssi.mphf):
MphfContainerHeader
├─ magic: "SSHIMH01"
├─ version_major: u32
├─ version_minor: u32
└─ num_partitions: u32
Offset Table ([num_partitions] entries):
├─ MphfPartitionEntry 0
│ ├─ partition_id: u32
│ ├─ byte_offset: u64
│ └─ byte_size: u64
├─ MphfPartitionEntry 1
└─ ...
Data Section (variable length):
├─ MPHF partition 0 (raw fmph::GOFunction serialization)
├─ MPHF partition 1
└─ ...
§Benefits of Single MPHF Container
- Scalability: Works with 1 or 1000 partitions equally well (single file)
- Random access: Offset table enables seeking to any partition
- Memory mappable: Entire container can be mmap’d
- Efficient: No per-file overhead, compact layout
- Clean separation: MPHF container is independent binary format
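The random-access property can be illustrated with a minimal sketch that reads only the header and offset table, then seeks straight to one partition. It is not the crate's `read_mphf_container`. The layout details it relies on are assumptions not stated above: little-endian integers, tightly packed fields (a 20-byte header and 20 bytes per table entry), and `byte_offset` measured from the start of the file. `PartitionEntry`, `read_offset_table`, `read_partition`, and `build_container` are hypothetical names, and the partition payloads are treated as opaque bytes.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Magic bytes taken from the container layout.
const MAGIC: &[u8; 8] = b"SSHIMH01";

/// Hypothetical mirror of one offset-table entry (field order as documented).
#[derive(Debug, Clone, PartialEq, Eq)]
struct PartitionEntry {
    partition_id: u32,
    byte_offset: u64, // assumed to be measured from the start of the file
    byte_size: u64,
}

fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf)) // little-endian is an assumption
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Reads the header and the offset table, leaving the data section untouched.
fn read_offset_table<R: Read>(r: &mut R) -> io::Result<Vec<PartitionEntry>> {
    let mut magic = [0u8; 8];
    r.read_exact(&mut magic)?;
    if &magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    let _version_major = read_u32(r)?;
    let _version_minor = read_u32(r)?;
    let num_partitions = read_u32(r)?;
    let mut entries = Vec::new();
    for _ in 0..num_partitions {
        let partition_id = read_u32(r)?;
        let byte_offset = read_u64(r)?;
        let byte_size = read_u64(r)?;
        entries.push(PartitionEntry { partition_id, byte_offset, byte_size });
    }
    Ok(entries)
}

/// Seeks directly to one partition and returns its raw bytes.
fn read_partition<R: Read + Seek>(r: &mut R, e: &PartitionEntry) -> io::Result<Vec<u8>> {
    r.seek(SeekFrom::Start(e.byte_offset))?;
    let mut data = vec![0u8; e.byte_size as usize];
    r.read_exact(&mut data)?;
    Ok(data)
}

/// Builds a container in memory from opaque payloads (helper for this sketch only).
fn build_container(parts: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&1u32.to_le_bytes()); // version_major
    out.extend_from_slice(&0u32.to_le_bytes()); // version_minor
    out.extend_from_slice(&(parts.len() as u32).to_le_bytes());
    // Data starts after the 20-byte header and 20 bytes per table entry.
    let mut offset = 20 + 20 * parts.len() as u64;
    for (id, p) in parts.iter().enumerate() {
        out.extend_from_slice(&(id as u32).to_le_bytes());
        out.extend_from_slice(&offset.to_le_bytes());
        out.extend_from_slice(&(p.len() as u64).to_le_bytes());
        offset += p.len() as u64;
    }
    for p in parts {
        out.extend_from_slice(p);
    }
    out
}
```

Because every entry records both an offset and a size, a reader can load partitions in any order, or load only the ones it needs, without scanning the data section. The same table would let a memory-mapped container be sliced per partition.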
§Zero-Copy Deserialization
When deserializing, sux-rs types are handled by epserde:
- BitFieldVec<Vec<usize>> deserializes as BitFieldVec<&[usize]> (ε-copy)
- The deserialized Dictionary can be memory-mapped for instant loading
Structs§
- DictionarySerializationHeader - Header for the serialized Dictionary
- MphfContainerHeader - Header for the MPHF container file
- MphfPartitionEntry - Entry in the MPHF container offset table
Enums§
- SerializationError - Serialization errors
Functions§
- index_file_path - Build the main index file path from a base path
- mphf_container_path - Build the MPHF container file path from a base path
- read_mphf_container - Read MPHFs from a container format
- write_mphf_container - Helper functions for MPHF container operations
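As a rough illustration of how the two path helpers might relate a base path to the file names shown under File Format (`index.ssi` and `index.ssi.mphf`), here is a small sketch. The signatures are not given on this page, so the functions below are hypothetical stand-ins with assumed `&Path -> PathBuf` signatures, and the rule of appending `.ssi` and `.ssi.mphf` to the base path is inferred from those example file names rather than confirmed.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Appends `suffix` to the whole path string, keeping any dots already in the name.
/// (`Path::with_extension` would instead replace an existing extension.)
fn append_suffix(base: &Path, suffix: &str) -> PathBuf {
    let mut s: OsString = base.as_os_str().to_os_string();
    s.push(suffix);
    PathBuf::from(s)
}

/// Hypothetical stand-in for `index_file_path`: `<base>.ssi` (assumed rule).
fn sketch_index_file_path(base: &Path) -> PathBuf {
    append_suffix(base, ".ssi")
}

/// Hypothetical stand-in for `mphf_container_path`: `<base>.ssi.mphf` (assumed rule).
fn sketch_mphf_container_path(base: &Path) -> PathBuf {
    append_suffix(base, ".ssi.mphf")
}
```

Under these assumptions, a base path of `out/index` maps to `out/index.ssi` for the main index and `out/index.ssi.mphf` for the MPHF container, so the two files always sit side by side.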
Type Aliases§
- SerializationResult - Result type for serialization operations