pub fn target_data_page_bytes(level: Level) -> usize
Target data-page byte size per LSM level. Same hot→cold rationale
as target_row_group_bytes: small pages favor selective reads at L0,
large pages favor scan throughput at L2+.
§Per-column page-size rationale
Parquet-rs (v53) does NOT support per-column data_page_size_limit —
the setting is global across all columns within a file. To approximate
per-column tuning:
- L0 (rowstore): 8 KiB globally. All columns (_merutable_ikey, _merutable_value, typed cols) use small pages for fast point lookups. The _merutable_ikey and _merutable_value columns carry the KV store’s hot-path data; small pages minimize decompression per lookup.
- L1+ (columnstore): 128 KiB globally for analytical scan throughput on typed columns. The _merutable_ikey column uses PLAIN encoding (set by build_column_encoding_props) so its pages remain direct-access friendly even at the larger page size: PLAIN-encoded binary keys need no dictionary or RLE decoding step.
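The mapping in the list above can be sketched as follows. This is a minimal illustration, not the crate's actual implementation: the `Level` newtype here is a hypothetical stand-in for the real `Level` type, and the sketch follows the list's L0 / L1+ split (the summary's mention of L2+ suggests the real function may treat L1 differently, which is not spelled out here).

```rust
/// Hypothetical stand-in for the crate's `Level` type (an assumption):
/// 0 is the hot rowstore level, 1 and above are columnstore levels.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Level(pub u8);

const KIB: usize = 1024;

/// Sketch of the level-to-page-size mapping described in the list above.
/// The result applies to every column in the file, because the page-size
/// limit is a global writer setting.
pub fn target_data_page_bytes(level: Level) -> usize {
    match level.0 {
        // L0: small pages keep point lookups cheap.
        0 => 8 * KIB,
        // L1 and colder: large pages favor scan throughput.
        _ => 128 * KIB,
    }
}
```

With the parquet crate, a value like this would typically be passed once per file to `WriterProperties::builder().set_data_page_size_limit(..)`.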
The per-column ENCODING settings (build_column_encoding_props) provide
the differentiation that per-column page sizes cannot:
- _merutable_ikey: PLAIN (O(1) decode per key, ideal for lookups)
- _merutable_value (L0 only): PLAIN (opaque postcard blobs)
- Int32/Int64 typed cols: DELTA_BINARY_PACKED (optimal for sorted ints)
- Float/Double typed cols: BYTE_STREAM_SPLIT (IEEE 754 byte-transposition)
- ByteArray typed cols: RLE_DICTIONARY (high compression for strings)
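The per-column encoding choice listed above could look roughly like the sketch below. It is std-only and purely illustrative: `PhysicalType`, `Encoding`, and `encoding_for` are hypothetical names that mirror the Parquet concepts, not the real signature of build_column_encoding_props, which presumably works with the parquet crate's own types.

```rust
/// Hypothetical mirror of the Parquet physical types named in the list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PhysicalType {
    Int32,
    Int64,
    Float,
    Double,
    ByteArray,
}

/// Hypothetical mirror of the Parquet encodings named in the list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Encoding {
    Plain,
    DeltaBinaryPacked,
    ByteStreamSplit,
    RleDictionary,
}

/// Picks an encoding per column, following the list above.
pub fn encoding_for(column: &str, ty: PhysicalType) -> Encoding {
    // The reserved KV columns use PLAIN regardless of physical type.
    // (`_merutable_value` only exists at L0, per the list.)
    if column == "_merutable_ikey" || column == "_merutable_value" {
        return Encoding::Plain;
    }
    // Typed columns are differentiated by physical type.
    match ty {
        PhysicalType::Int32 | PhysicalType::Int64 => Encoding::DeltaBinaryPacked,
        PhysicalType::Float | PhysicalType::Double => Encoding::ByteStreamSplit,
        PhysicalType::ByteArray => Encoding::RleDictionary,
    }
}
```

Keying the reserved columns by name before looking at the physical type reflects the rule that the key column stays PLAIN even though it is a byte array, which would otherwise be dictionary-encoded.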