Vortex OnPair
A Vortex encoding for Binary and Utf8 data that uses the OnPair short-string compression algorithm. OnPair is a dictionary-based encoder that supports fast random access to individual rows.
The trainer and encoder live in the standalone onpair
crate; this crate wraps the resulting column as a Vortex array, with
cascading-compressor support on every integer child.
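To make "dictionary-based" concrete, here is a minimal sketch of how a column could be tokenized once a dictionary exists. It assumes greedy longest-prefix matching and a dictionary that contains every single byte value. The onpair crate's real parsing strategy, data structures, and function names may differ; encode_row and encode_column are hypothetical. The output has the same codes / codes_offsets shape described under Layout.

```rust
/// Greedily tokenizes one string: at each position, pick the longest
/// dictionary entry that is a prefix of the remaining input.
/// Assumption: every single byte has its own entry, so parsing never stalls.
fn encode_row(dict_bytes: &[u8], dict_offsets: &[u32], input: &[u8], codes: &mut Vec<u16>) {
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        let mut best: Option<(u16, usize)> = None; // (code, token length)
        // Token `code` spans dict_bytes[dict_offsets[code]..dict_offsets[code + 1]].
        for (code, w) in dict_offsets.windows(2).enumerate() {
            let token = &dict_bytes[w[0] as usize..w[1] as usize];
            let longer = best.map_or(true, |(_, len)| token.len() > len);
            if !token.is_empty() && longer && rest.starts_with(token) {
                best = Some((code as u16, token.len()));
            }
        }
        let (code, len) = best.expect("dictionary must cover every byte value");
        codes.push(code);
        pos += len;
    }
}

/// Tokenizes a whole column into one shared codes buffer;
/// codes_offsets[i]..codes_offsets[i + 1] delimits row i's tokens.
fn encode_column(dict_bytes: &[u8], dict_offsets: &[u32], rows: &[&[u8]]) -> (Vec<u16>, Vec<u32>) {
    let mut codes = Vec::new();
    let mut codes_offsets = vec![0u32];
    for row in rows {
        encode_row(dict_bytes, dict_offsets, row, &mut codes);
        codes_offsets.push(codes.len() as u32);
    }
    (codes, codes_offsets)
}
```

The linear scan over the dictionary keeps the sketch short; a practical encoder would more likely use a trie or hash-based lookup to find the longest match.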
Compute
Like the FSST encoding, this crate provides cast and filter
pushdown. Other operators fall back to ordinary decompression.
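Filter pushdown is cheap for this kind of layout because each row's tokens form a contiguous run in codes. The sketch below shows one way a filter could run without decoding any strings: the dictionary is reused unchanged, and only the selected rows' code ranges and lengths are copied. It assumes uncompressed_lengths holds u32 values and that validity is handled separately; filter_codes is a hypothetical helper, not the crate's actual kernel.

```rust
/// Filters an OnPair-style column in its compressed form.
/// Returns (codes, codes_offsets, uncompressed_lengths) for the kept rows.
fn filter_codes(
    codes: &[u16],
    codes_offsets: &[u32],
    uncompressed_lengths: &[u32], // assumed u32 for this sketch
    mask: &[bool],                // true = keep the row
) -> (Vec<u16>, Vec<u32>, Vec<u32>) {
    assert_eq!(mask.len() + 1, codes_offsets.len());
    let mut out_codes = Vec::new();
    let mut out_offsets = vec![0u32];
    let mut out_lengths = Vec::new();
    for (row, &keep) in mask.iter().enumerate() {
        if keep {
            // Copy the row's token run verbatim; no dictionary lookups needed.
            let start = codes_offsets[row] as usize;
            let end = codes_offsets[row + 1] as usize;
            out_codes.extend_from_slice(&codes[start..end]);
            out_offsets.push(out_codes.len() as u32);
            out_lengths.push(uncompressed_lengths[row]);
        }
    }
    (out_codes, out_offsets, out_lengths)
}
```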
Default Configuration
The default training preset is dict-12: 12 bits per token code, so the
dictionary is capped at 2^12 = 4 096 entries. Token codes are stored as a
PrimitiveArray<u16>; downstream, FastLanes::BitPacking losslessly
narrows that child to the preset's bit width on disk (12 bits per code for dict-12).
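The saving from narrowing u16 codes to 12 bits is easy to quantify: n codes take 2n bytes as u16 but only ⌈12n / 8⌉ bytes once packed. The sketch below uses a plain contiguous little-endian bitstream purely for illustration. FastLanes uses a different, SIMD-friendly layout, and pack / unpack are hypothetical helpers, not the FastLanes API.

```rust
/// Packs each code into exactly `bits` bits of a contiguous,
/// little-endian bitstream (illustrative layout, not FastLanes').
fn pack(codes: &[u16], bits: u32) -> Vec<u8> {
    assert!((1..=16).contains(&bits));
    let width = bits as usize;
    let mut out = vec![0u8; (codes.len() * width).div_ceil(8)];
    for (i, &code) in codes.iter().enumerate() {
        assert!((code as u32) < (1u32 << bits), "code does not fit in {bits} bits");
        for b in 0..width {
            if (code >> b) & 1 == 1 {
                let pos = i * width + b; // global bit position
                out[pos / 8] |= 1 << (pos % 8);
            }
        }
    }
    out
}

/// Inverse of `pack`: reads `n` codes of `bits` bits each.
fn unpack(packed: &[u8], bits: u32, n: usize) -> Vec<u16> {
    let width = bits as usize;
    (0..n)
        .map(|i| {
            let mut code = 0u16;
            for b in 0..width {
                let pos = i * width + b;
                if (packed[pos / 8] >> (pos % 8)) & 1 == 1 {
                    code |= 1 << b;
                }
            }
            code
        })
        .collect()
}
```

For 1 000 codes this comes to 1 500 bytes instead of 2 000, a 25% reduction. The range check in pack also shows why the dictionary cap matters: a code of 4 096 or more would not fit in 12 bits.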
Layout
- Buffer 0 (dict_bytes): dictionary blob built by the OnPair trainer, padded with MAX_TOKEN_SIZE trailing zero bytes so the over-copy decoder can read 16 bytes past the last token.
- Slot 0 (dict_offsets): PrimitiveArray<u32>, length dict_size + 1.
- Slot 1 (codes): PrimitiveArray<u16>, length total_tokens.
- Slot 2 (codes_offsets): PrimitiveArray<u32>, length num_rows + 1.
- Slot 3 (uncompressed_lengths): integer PrimitiveArray, length num_rows.
- Slot 4: optional validity child.
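The MAX_TOKEN_SIZE padding on dict_bytes exists to support an over-copy decoder. Here is a minimal sketch of decoding a single row from the layout above. It assumes MAX_TOKEN_SIZE is 16 (matching the 16-byte read mentioned for Buffer 0), that no token is longer than that, and that uncompressed_lengths holds u32 values; decode_row is hypothetical. Every token is copied as a fixed 16-byte chunk and the cursor advances by the token's real length, so each copy's surplus bytes are overwritten by the next token and the final surplus is truncated.

```rust
/// Assumed value: the README says the decoder reads 16 bytes past the last token.
const MAX_TOKEN_SIZE: usize = 16;

/// Decodes one row with fixed-size over-copies.
fn decode_row(
    dict_bytes: &[u8],            // dictionary blob, padded with MAX_TOKEN_SIZE zeros
    dict_offsets: &[u32],         // length dict_size + 1
    codes: &[u16],                // all rows' token codes
    codes_offsets: &[u32],        // length num_rows + 1
    uncompressed_lengths: &[u32], // assumed u32 for this sketch
    row: usize,
) -> Vec<u8> {
    let len = uncompressed_lengths[row] as usize;
    // Slack so the last chunk copy stays inside the output buffer.
    let mut out = vec![0u8; len + MAX_TOKEN_SIZE];
    let mut pos = 0;
    let tokens = &codes[codes_offsets[row] as usize..codes_offsets[row + 1] as usize];
    for &code in tokens {
        let start = dict_offsets[code as usize] as usize;
        let token_len = dict_offsets[code as usize + 1] as usize - start;
        // Always copy a full chunk; the padding keeps this read in bounds.
        out[pos..pos + MAX_TOKEN_SIZE]
            .copy_from_slice(&dict_bytes[start..start + MAX_TOKEN_SIZE]);
        pos += token_len;
    }
    debug_assert_eq!(pos, len);
    out.truncate(len);
    out
}
```

Fixed-size copies avoid a variable-length copy per token. The dictionary padding guarantees the read from the last token never runs past the end of dict_bytes, and the len + MAX_TOKEN_SIZE slack does the same for the output buffer. Random access follows from codes_offsets: decoding row i touches only that row's codes.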
All four integer slot children flow through the standard cascading compressor pipeline (FoR / BitPacking / RunEnd / etc.).
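As one example of what that pipeline can exploit, frame-of-reference (FoR) encoding stores an array's minimum once and each value as a small offset from it. This suits uncompressed_lengths when string lengths fall in a narrow range. The sketch below is a minimal illustration with a hypothetical frame_of_reference helper, assuming u32 lengths; it does not show how the cascading compressor actually chooses between schemes.

```rust
/// Frame-of-reference: returns (reference, deltas, bits needed per delta).
fn frame_of_reference(values: &[u32]) -> (u32, Vec<u32>, u32) {
    // The minimum becomes the shared reference value.
    let reference = values.iter().copied().min().unwrap_or(0);
    let deltas: Vec<u32> = values.iter().map(|&v| v - reference).collect();
    let max_delta = deltas.iter().copied().max().unwrap_or(0);
    // Bit width of the largest delta (0 when all values are equal).
    let bit_width = u32::BITS - max_delta.leading_zeros();
    (reference, deltas, bit_width)
}
```

Lengths of 100 to 110, for instance, need 4 bits per row after FoR instead of 32. A monotonically increasing array such as codes_offsets spans a much wider range, so FoR alone narrows it far less.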