Module fsst

FSST (Fast Static Symbol Table) codec for string/log columns.

Builds a lightweight dictionary of common substrings (1-8 bytes each) and encodes each string as a sequence of symbol-table indices. Unlike whole-string dictionary encoding, FSST exploits partial overlap: strings that share prefixes or suffixes compress well even if no two strings are identical.

Compression: 3-5x on string columns before any terminal compressor is applied. With lz4_flex as the terminal compressor: 8-15x total on structured log data.

Decompression: a simple table lookup, fast enough to query directly over encoded data.
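To make the table-lookup claim concrete, here is a minimal sketch of decoding a single encoded string. It assumes the symbol table has already been loaded into a `Vec<Vec<u8>>` and uses the escape byte 255 from the wire format notes. The name `decode_one` and its signature are hypothetical and are not this module's `decode` API.

```rust
/// Escape marker: the byte that follows it is a literal.
const ESCAPE: u8 = 255;

/// Decode one encoded string with a plain table lookup (illustrative helper).
/// `symbols[i]` holds the bytes that index `i` expands to.
/// Returns `None` on a dangling escape or an out-of-range index.
fn decode_one(symbols: &[Vec<u8>], encoded: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(encoded.len() * 2);
    let mut i = 0;
    while i < encoded.len() {
        let code = encoded[i];
        if code == ESCAPE {
            // Escape: copy the following byte verbatim.
            let literal = *encoded.get(i + 1)?;
            out.push(literal);
            i += 2;
        } else {
            // Regular code: append the bytes of the referenced symbol.
            let sym = symbols.get(code as usize)?;
            out.extend_from_slice(sym);
            i += 1;
        }
    }
    Some(out)
}
```

Each input byte costs one comparison and at most one slice copy, which is why scanning encoded data directly can stay cheap.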

Wire format:

[2 bytes] symbol count (LE u16, max 255)
[symbol_count × (1 + len) bytes] symbol table: (len: u8, bytes: [u8; len])
[4 bytes] total encoded length (LE u32)
[4 bytes] string count (LE u32)
[string_count × 4 bytes] encoded string offsets (cumulative LE u32)
[N bytes] encoded data (symbol indices interleaved with escape+literal)
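The layout above can be read field by field. The following sketch parses a buffer into its parts under that layout. The `Block` type and the helper names are hypothetical, introduced only for illustration. The offsets are returned uninterpreted, since the layout does not say whether they mark string starts or ends.

```rust
/// Parsed view of an encoded block (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
struct Block {
    symbols: Vec<Vec<u8>>,
    offsets: Vec<u32>,
    data: Vec<u8>,
}

/// Take `n` bytes from the front of `buf`, advancing it; `None` if too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let slice: &'a [u8] = *buf;
    if slice.len() < n {
        return None;
    }
    let (head, tail) = slice.split_at(n);
    *buf = tail;
    Some(head)
}

fn read_u16(buf: &mut &[u8]) -> Option<u16> {
    let b = take(buf, 2)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_u32(buf: &mut &[u8]) -> Option<u32> {
    let b = take(buf, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse the documented layout; returns `None` on truncated input.
fn parse_block(mut buf: &[u8]) -> Option<Block> {
    // [2 bytes] symbol count, at most 255.
    let symbol_count = read_u16(&mut buf)?;
    if symbol_count > 255 {
        return None;
    }
    // Symbol table: one length byte followed by that many bytes, per symbol.
    let mut symbols = Vec::with_capacity(symbol_count as usize);
    for _ in 0..symbol_count {
        let len = take(&mut buf, 1)?[0] as usize;
        symbols.push(take(&mut buf, len)?.to_vec());
    }
    // [4 bytes] total encoded length, then [4 bytes] string count.
    let encoded_len = read_u32(&mut buf)? as usize;
    let string_count = read_u32(&mut buf)? as usize;
    // One cumulative u32 offset per string.
    let mut offsets = Vec::new();
    for _ in 0..string_count {
        offsets.push(read_u32(&mut buf)?);
    }
    // Remaining payload: symbol indices and escape+literal pairs.
    let data = take(&mut buf, encoded_len)?.to_vec();
    Some(Block { symbols, offsets, data })
}
```

The sketch checks only the symbol count and buffer length; a production parser would likely also validate symbol lengths and offset bounds.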

Escape mechanism: the byte value 255 followed by a literal byte encodes a byte not covered by any symbol. Symbol indices are 0..=254.
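As an illustration of how symbol indices and escapes interleave, the sketch below encodes one string against a fixed symbol table using greedy longest-match. This is a simplification chosen for clarity: the module's actual matching strategy and table construction are not documented here, and `encode_one` is a hypothetical name.

```rust
/// Escape marker: emitted before a byte that no symbol covers.
const ESCAPE: u8 = 255;

/// Greedy longest-match encoder over a fixed symbol table (illustrative only).
fn encode_one(symbols: &[Vec<u8>], input: &[u8]) -> Vec<u8> {
    assert!(symbols.len() <= 255, "indices must fit in 0..=254");
    let mut out = Vec::with_capacity(input.len());
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        // Pick the longest symbol that is a prefix of the remaining input.
        let best = symbols
            .iter()
            .enumerate()
            .filter(|(_, s)| !s.is_empty() && rest.starts_with(s.as_slice()))
            .max_by_key(|(_, s)| s.len());
        match best {
            Some((idx, sym)) => {
                // One output byte replaces `sym.len()` input bytes.
                out.push(idx as u8);
                pos += sym.len();
            }
            None => {
                // No symbol matches: emit the escape marker plus the raw byte.
                out.push(ESCAPE);
                out.push(rest[0]);
                pos += 1;
            }
        }
    }
    out
}
```

An escaped byte costs two output bytes, so the compression ratio depends on how much of the input the symbol table covers.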

Functions§

decode
Decode FSST-compressed data back to strings.
decode_delimited
Convenience: decode and reassemble the strings, joined by a delimiter.
encode
Encode a batch of strings using FSST compression.
encode_delimited
Convenience: encode a single contiguous byte buffer that contains multiple strings separated by a delimiter (e.g., newlines for log data).
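The delimited variants presumably split a buffer into strings before encoding and rejoin them after decoding. Their signatures are not shown on this page, so the sketch below only illustrates the split and join steps with hypothetical helpers. It assumes a single-byte delimiter and keeps a trailing empty record so that the round trip is lossless; whether the crate does the same is not stated.

```rust
/// Split a contiguous buffer into records on `delim` (hypothetical helper).
/// A trailing delimiter yields a final empty record.
fn split_delimited(buf: &[u8], delim: u8) -> Vec<&[u8]> {
    buf.split(|&b| b == delim).collect()
}

/// Rejoin records with `delim` between them, inverting `split_delimited`.
fn join_delimited(records: &[&[u8]], delim: u8) -> Vec<u8> {
    records.join(&delim)
}
```

For newline-delimited logs, `split_delimited(log, b'\n')` would produce the batch of strings to encode, and `join_delimited` would restore the original buffer byte for byte.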