Module serialization

Serialization and deserialization support for Dictionary

This module combines zero-copy serialization via epserde for sux-rs types (BitFieldVec, etc.) with native pthash serialization for the minimal perfect hash functions (MPHFs).

§File Format

The serialization uses a two-file approach:

Main Index File (index.ssi):

  • DictionarySerializationHeader (magic, version, k, m, canonical, num_mphf_partitions)
  • SpectrumPreservingStringSet (epserde format)
  • SparseAndSkewIndex (epserde format, excluding the MPHFs, which are stored in the MPHF container file)
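The header fields listed above could be encoded as a fixed-size prefix that is validated before any epserde payload is read. The sketch below is hypothetical and is not the crate's DictionarySerializationHeader. The following are all assumptions: the field widths (u32 for k, m and num_mphf_partitions, one byte for canonical), little-endian byte order, the single version number, and the placeholder magic `EXAMPLE!`.

```rust
use std::io::{self, Read, Write};

// Placeholder magic and version: hypothetical, not the crate's real constants.
const EXAMPLE_MAGIC: [u8; 8] = *b"EXAMPLE!";
const EXAMPLE_VERSION: u32 = 1;

/// Hypothetical stand-in for DictionarySerializationHeader; field widths are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HeaderSketch {
    pub k: u32,
    pub m: u32,
    pub canonical: bool,
    pub num_mphf_partitions: u32,
}

impl HeaderSketch {
    /// Write magic, version, then each field in little-endian order (no padding).
    pub fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&EXAMPLE_MAGIC)?;
        w.write_all(&EXAMPLE_VERSION.to_le_bytes())?;
        w.write_all(&self.k.to_le_bytes())?;
        w.write_all(&self.m.to_le_bytes())?;
        w.write_all(&[self.canonical as u8])?;
        w.write_all(&self.num_mphf_partitions.to_le_bytes())
    }

    /// Read and validate the header, rejecting foreign or newer files
    /// before any epserde payload is touched.
    pub fn read_from<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut magic = [0u8; 8];
        r.read_exact(&mut magic)?;
        if magic != EXAMPLE_MAGIC {
            return Err(invalid("bad magic"));
        }
        if read_u32(r)? != EXAMPLE_VERSION {
            return Err(invalid("unsupported version"));
        }
        let k = read_u32(r)?;
        let m = read_u32(r)?;
        let mut flag = [0u8; 1];
        r.read_exact(&mut flag)?;
        let num_mphf_partitions = read_u32(r)?;
        Ok(HeaderSketch { k, m, canonical: flag[0] != 0, num_mphf_partitions })
    }
}

fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut b = [0u8; 4];
    r.read_exact(&mut b)?;
    Ok(u32::from_le_bytes(b))
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}
```

Checking the magic first means a wrong or truncated file fails with a clear error instead of producing garbage when the following epserde sections are decoded.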

MPHF Container File (index.ssi.mphf):

MphfContainerHeader
  ├─ magic: "SSHIMH01"
  ├─ version_major: u32
  ├─ version_minor: u32
  └─ num_partitions: u32
Offset Table ([num_partitions] entries):
  ├─ MphfPartitionEntry 0
  │  ├─ partition_id: u32
  │  ├─ byte_offset: u64
  │  └─ byte_size: u64
  ├─ MphfPartitionEntry 1
  └─ ...
Data Section (variable length):
  ├─ MPHF partition 0 (raw fmph::GOFunction serialization)
  ├─ MPHF partition 1
  └─ ...
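The container layout above can be sketched as a writer that takes already-serialized MPHF partitions as byte blobs. This is an illustration under assumptions the layout does not state: integers are little-endian with no padding, the magic is stored as its 8 ASCII bytes, and byte_offset is measured from the start of the file. `write_container_sketch` is a hypothetical name and is not the module's write_mphf_container.

```rust
// Layout constants derived from the field list, assuming no padding.
const MAGIC: [u8; 8] = *b"SSHIMH01";
const HEADER_LEN: usize = 8 + 4 + 4 + 4; // magic, version_major, version_minor, num_partitions
const ENTRY_LEN: usize = 4 + 8 + 8; // partition_id, byte_offset, byte_size

/// Hypothetical writer: lay out pre-encoded MPHF partitions in one container.
/// Assumes little-endian integers and byte_offset measured from the start of the file.
pub fn write_container_sketch(partitions: &[Vec<u8>], version_major: u32, version_minor: u32) -> Vec<u8> {
    let n = partitions.len();
    let data_start = HEADER_LEN + n * ENTRY_LEN;
    let data_len: usize = partitions.iter().map(|p| p.len()).sum();
    let mut out = Vec::with_capacity(data_start + data_len);

    // Header.
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&version_major.to_le_bytes());
    out.extend_from_slice(&version_minor.to_le_bytes());
    out.extend_from_slice(&(n as u32).to_le_bytes());

    // Offset table: one entry per partition; offsets accumulate over the data section.
    let mut offset = data_start as u64;
    for (id, p) in partitions.iter().enumerate() {
        out.extend_from_slice(&(id as u32).to_le_bytes());
        out.extend_from_slice(&offset.to_le_bytes());
        out.extend_from_slice(&(p.len() as u64).to_le_bytes());
        offset += p.len() as u64;
    }

    // Data section: partitions back to back, in table order.
    for p in partitions {
        out.extend_from_slice(p);
    }
    debug_assert_eq!(out.len(), data_start + data_len);
    out
}
```

Under these assumptions the header and each table entry are both 20 bytes, so partition data begins at byte 20 + 20 * num_partitions. With three partitions of 3, 0 and 2 bytes, the data section starts at byte 80 and the third partition's byte_offset is 83.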

§Benefits of Single MPHF Container

  • Scalability: One file serves 1 partition or 1000 equally well
  • Random access: The offset table allows seeking directly to any partition
  • Memory mappable: The entire container can be mmap’d
  • Efficient: No per-partition file overhead; compact layout
  • Clean separation: The MPHF container is an independent binary format
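Random access through the offset table can be sketched as a lookup over the container bytes, which could equally be a Vec<u8> or an mmap'd file. Only the header, one table entry and the requested partition are read. This is a sketch under assumptions the format description does not state: little-endian integers, no padding (a 20-byte header and 20-byte entries), and byte_offset measured from the start of the file. `partition_bytes` is a hypothetical helper, not part of this module.

```rust
const HEADER_LEN: usize = 20; // magic (8) + version_major, version_minor, num_partitions (4 each)
const ENTRY_LEN: usize = 20; // partition_id (4) + byte_offset (8) + byte_size (8)

/// Hypothetical lookup: return partition `i`'s serialized bytes without
/// parsing any other partition. Returns None on a bad magic or out-of-range data.
pub fn partition_bytes(container: &[u8], i: usize) -> Option<&[u8]> {
    if container.get(..8)? != b"SSHIMH01" {
        return None;
    }
    let n = read_u32(container, 16)? as usize;
    if i >= n {
        return None;
    }
    let entry = HEADER_LEN + i * ENTRY_LEN;
    // Assumes byte_offset is absolute (from the start of the file).
    let offset = read_u64(container, entry + 4)? as usize;
    let size = read_u64(container, entry + 12)? as usize;
    container.get(offset..offset.checked_add(size)?)
}

fn read_u32(buf: &[u8], at: usize) -> Option<u32> {
    let mut b = [0u8; 4];
    b.copy_from_slice(buf.get(at..at + 4)?);
    Some(u32::from_le_bytes(b))
}

fn read_u64(buf: &[u8], at: usize) -> Option<u64> {
    let mut b = [0u8; 8];
    b.copy_from_slice(buf.get(at..at + 8)?);
    Some(u64::from_le_bytes(b))
}
```

Because the returned slice borrows from the container, an mmap'd file lets a caller hand one partition to the MPHF deserializer without loading the others.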

§Zero-Copy Deserialization

When deserializing, sux-rs types are handled by epserde:

  • BitFieldVec<Vec<usize>> deserializes as BitFieldVec<&[usize]> (ε-copy)
  • The Dictionary can therefore be loaded by memory-mapping the index file, without reading and decoding the whole index first
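To illustrate why BitFieldVec<Vec<usize>> can come back as BitFieldVec<&[usize]>, here is a minimal sketch of a bit-field vector that is generic over its word storage. `MiniBitFieldVec` is hypothetical and is neither the sux-rs type nor its API. It only shows that the same read code works over an owned Vec<usize> and over a borrowed &[usize]. That is the property ε-copy deserialization relies on, since the borrowed slice can point straight into a loaded or mmap'd buffer.

```rust
const W: usize = std::mem::size_of::<usize>() * 8; // bits per storage word

/// Hypothetical bit-field vector generic over its storage `B`.
pub struct MiniBitFieldVec<B> {
    words: B,         // packed values, least-significant bits first
    bit_width: usize, // bits per value, 1..W
    len: usize,       // number of stored values
}

impl MiniBitFieldVec<Vec<usize>> {
    /// Pack `values` into owned storage (the form that gets serialized).
    pub fn from_values(values: &[usize], bit_width: usize) -> Self {
        assert!(bit_width > 0 && bit_width < W);
        let mut words = vec![0usize; (values.len() * bit_width + W - 1) / W];
        for (i, &v) in values.iter().enumerate() {
            assert!(v >> bit_width == 0, "value does not fit in bit_width");
            let (word, off) = ((i * bit_width) / W, (i * bit_width) % W);
            words[word] |= v << off;
            if off + bit_width > W {
                // The value straddles two words: its high bits go into the next one.
                words[word + 1] |= v >> (W - off);
            }
        }
        MiniBitFieldVec { words, bit_width, len: values.len() }
    }

    pub fn words(&self) -> &[usize] {
        &self.words
    }
}

impl<'a> MiniBitFieldVec<&'a [usize]> {
    /// Wrap words that already live in memory without copying them,
    /// the analogue of the ε-copy result.
    pub fn from_words(words: &'a [usize], bit_width: usize, len: usize) -> Self {
        assert!(bit_width > 0 && bit_width < W && words.len() * W >= len * bit_width);
        MiniBitFieldVec { words, bit_width, len }
    }
}

impl<B: AsRef<[usize]>> MiniBitFieldVec<B> {
    /// Read value `i`; identical code for owned and borrowed storage.
    pub fn get(&self, i: usize) -> usize {
        assert!(i < self.len);
        let words = self.words.as_ref();
        let (word, off) = ((i * self.bit_width) / W, (i * self.bit_width) % W);
        let mut v = words[word] >> off;
        if off + self.bit_width > W {
            v |= words[word + 1] << (W - off);
        }
        v & ((1usize << self.bit_width) - 1)
    }
}
```

Only the storage parameter changes between the two forms, so query code written against `B: AsRef<[usize]>` needs no changes after deserialization.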

Structs§

DictionarySerializationHeader
Header for the serialized Dictionary
MphfContainerHeader
Header for the MPHF container file
MphfPartitionEntry
Entry in the MPHF container offset table

Enums§

SerializationError
Serialization errors

Functions§

index_file_path
Build the main index file path from a base path
mphf_container_path
Build the MPHF container file path from a base path
read_mphf_container
Read MPHFs from a container format
write_mphf_container
Write MPHFs to a container format
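Following the file names in the File Format section (index.ssi and index.ssi.mphf), the two path builders might look like the sketch below. The function names ending in `_sketch` are hypothetical. The sketch also assumes that the base path carries no extension and that both files share it, which the docs do not confirm.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical: "out/index" -> "out/index.ssi".
pub fn index_file_path_sketch(base: &Path) -> PathBuf {
    with_suffix(base, ".ssi")
}

/// Hypothetical: "out/index" -> "out/index.ssi.mphf".
pub fn mphf_container_path_sketch(base: &Path) -> PathBuf {
    with_suffix(base, ".ssi.mphf")
}

/// Append a suffix to the whole path instead of replacing its extension,
/// so a base like "dict.v2" keeps its ".v2".
fn with_suffix(base: &Path, suffix: &str) -> PathBuf {
    let mut s: OsString = base.as_os_str().to_owned();
    s.push(suffix);
    PathBuf::from(s)
}
```

Appending with `OsString::push` rather than calling `Path::set_extension` avoids clobbering dots that are already in the base name, and it keeps non-UTF-8 paths intact.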

Type Aliases§

SerializationResult
Result type for serialization operations