Module ndjson


NDJSON columnar reorg: a lossless transform that reorders row-oriented NDJSON data into a column-oriented layout.

Two strategies:

Strategy 1 (uniform): All rows share the same schema (same keys, in the same order).

Row-oriented (before):
    {"ts":"2026-03-15T10:30:00.001Z","type":"page_view","user":"usr_a1b2c3d4"}
    {"ts":"2026-03-15T10:30:00.234Z","type":"api_call","user":"usr_a1b2c3d4"}

Column-oriented (after):
    [ts values]   "2026-03-15T10:30:00.001Z" \x01 "2026-03-15T10:30:00.234Z"
    \x00
    [type values] "page_view" \x01 "api_call"
    \x00
    [user values] "usr_a1b2c3d4" \x01 "usr_a1b2c3d4"
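The uniform layout can be sketched in a few lines of Rust. This is a minimal illustration, not the module's implementation: the helper names `to_columns` and `to_rows` are hypothetical, and the input is assumed to be already split into raw JSON value texts per row (key handling and metadata are left out).

```rust
const COL_SEP: u8 = 0x00; // between columns
const VAL_SEP: u8 = 0x01; // between values inside one column

/// Hypothetical helper: `rows[r][c]` is the raw JSON text of column `c` in row `r`.
/// Assumes every row has the same number of values (uniform schema);
/// a shorter row would cause an out-of-bounds panic.
fn to_columns(rows: &[Vec<&str>]) -> Vec<u8> {
    let mut out = Vec::new();
    let ncols = rows.first().map_or(0, |r| r.len());
    for c in 0..ncols {
        if c > 0 {
            out.push(COL_SEP);
        }
        for (r, row) in rows.iter().enumerate() {
            if r > 0 {
                out.push(VAL_SEP);
            }
            out.extend_from_slice(row[c].as_bytes());
        }
    }
    out
}

/// Hypothetical inverse of `to_columns`: split on the separators, then
/// transpose the columns back into rows.
fn to_rows(data: &[u8]) -> Vec<Vec<String>> {
    if data.is_empty() {
        return Vec::new();
    }
    let columns: Vec<Vec<String>> = data
        .split(|&b| b == COL_SEP)
        .map(|col| {
            col.split(|&b| b == VAL_SEP)
                .map(|v| String::from_utf8_lossy(v).into_owned())
                .collect()
        })
        .collect();
    let nrows = columns[0].len();
    (0..nrows)
        .map(|r| columns.iter().map(|col| col[r].clone()).collect())
        .collect()
}
```

Placing similar values next to each other (all timestamps, then all event types, then all user IDs) is what makes the layout friendlier to a downstream compressor than the interleaved row form.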

Strategy 2 (grouped): Rows have diverse schemas (e.g., GitHub Archive events). Rows are grouped by schema, Strategy 1 is applied to each group, and residual rows are stored raw.

A metadata version byte distinguishes the two strategies: 1 = uniform, 2 = grouped.
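The grouping step might look roughly like the sketch below. Everything here is an assumption made for illustration: the names `group_by_schema` and `version_byte` are hypothetical, the input is assumed to be a pre-extracted ordered key list per row, and the `min_rows` threshold for deciding which rows count as residual is invented, since the documentation does not say how residual rows are chosen.

```rust
use std::collections::HashMap;

/// Hypothetical helper: `schemas[i]` is the ordered key list of row `i`.
/// Returns `(groups, residual)`. Each group holds the indices of rows sharing
/// one schema, in first-seen order. Schemas seen fewer than `min_rows` times
/// are treated as residual (an assumed rule, not taken from the module).
fn group_by_schema(schemas: &[Vec<&str>], min_rows: usize) -> (Vec<Vec<usize>>, Vec<usize>) {
    let mut slot: HashMap<&[&str], usize> = HashMap::new();
    let mut buckets: Vec<Vec<usize>> = Vec::new();
    for (i, keys) in schemas.iter().enumerate() {
        // Assign each distinct key sequence a bucket the first time it is seen.
        let idx = *slot.entry(keys.as_slice()).or_insert_with(|| {
            buckets.push(Vec::new());
            buckets.len() - 1
        });
        buckets[idx].push(i);
    }
    let mut groups = Vec::new();
    let mut residual = Vec::new();
    for bucket in buckets {
        if bucket.len() >= min_rows {
            groups.push(bucket);
        } else {
            residual.extend(bucket);
        }
    }
    residual.sort_unstable(); // keep residual rows in original row order
    (groups, residual)
}

/// Version byte as described above: 1 = uniform, 2 = grouped.
/// Uniform means every row has the same keys in the same order.
fn version_byte(schemas: &[Vec<&str>]) -> u8 {
    if schemas.windows(2).all(|w| w[0] == w[1]) { 1 } else { 2 }
}
```

Keeping row indices per group (rather than the rows themselves) is one way a decoder could restore the original row order, which a lossless transform needs.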

Separators:
    \x00 = column separator (cannot appear in valid JSON text)
    \x01 = value separator within a column (cannot appear in valid JSON text)
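The separator choice relies on the JSON grammar: control characters U+0000 through U+001F must be escaped inside strings, and the only whitespace allowed between tokens is space, tab, line feed, and carriage return. A raw 0x00 or 0x01 byte therefore never occurs in valid JSON text. A defensive check before transforming untrusted input could look like this sketch (the function name `separators_are_free` is hypothetical and not part of the module):

```rust
/// Returns true if `input` contains neither separator byte, so 0x00 and 0x01
/// can be used as delimiters without any escaping.
fn separators_are_free(input: &[u8]) -> bool {
    !input.iter().any(|&b| b == 0x00 || b == 0x01)
}
```

Note that an escaped control character such as `\u0001` inside a JSON string is six ordinary ASCII bytes, so it passes the check; only a raw control byte, which would make the input invalid JSON anyway, fails it.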

Functions

preprocess
Forward transform: NDJSON columnar reorg.
reverse
Reverse transform: reconstruct NDJSON from columnar layout + metadata. Dispatches to the appropriate decoder based on metadata version byte.