Function target_data_page_bytes 

pub fn target_data_page_bytes(level: Level) -> usize

Target data-page byte size per LSM level. Same hot→cold rationale as target_row_group_bytes: small pages favor selective reads at L0, large pages favor scan throughput at L2+.

Per-column page-size rationale

Parquet-rs (v53) does NOT support per-column data_page_size_limit — the setting is global across all columns within a file. To approximate per-column tuning:

  • L0 (rowstore): 8 KiB globally. All columns (_merutable_ikey, _merutable_value, typed cols) use small pages for fast point lookups. The _merutable_ikey and _merutable_value columns carry the KV store’s hot-path data; small pages minimize decompression per lookup.

  • L1+ (columnstore): 128 KiB globally for analytical scan throughput on typed columns. The _merutable_ikey column uses PLAIN encoding (set by build_column_encoding_props), so its pages remain direct-access friendly even at the larger page size: PLAIN-encoded binary keys decode with no dictionary lookup or RLE unpacking step.
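
The level-to-page-size mapping described in the two bullets above can be sketched as follows. This is only an illustration, not the crate's actual source: the `Level` newtype and the name `page_bytes_for_level` are hypothetical stand-ins, and the real `Level` type may be defined differently.

```rust
/// Hypothetical stand-in for the crate's `Level` type
/// (assumed here to be a newtype over the LSM level index).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Level(pub u8);

const KIB: usize = 1024;

/// Illustrative mapping from LSM level to the global data-page size limit.
/// L0 (rowstore) gets 8 KiB pages; L1 and deeper (columnstore) get 128 KiB.
pub fn page_bytes_for_level(level: Level) -> usize {
    match level.0 {
        0 => 8 * KIB,   // small pages: less data to decompress per point lookup
        _ => 128 * KIB, // large pages: better scan throughput on typed columns
    }
}
```

Because the limit applies to every column in the file, the returned value would be passed once per file to the writer configuration rather than once per column.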

The per-column ENCODING settings (build_column_encoding_props) provide the differentiation that per-column page sizes cannot:

  • _merutable_ikey: PLAIN — O(1) decode per key, ideal for lookups
  • _merutable_value (L0 only): PLAIN — opaque postcard blobs
  • Int32/Int64 typed cols: DELTA_BINARY_PACKED — optimal for sorted ints
  • Float/Double typed cols: BYTE_STREAM_SPLIT — IEEE 754 byte-transposition
  • ByteArray typed cols: RLE_DICTIONARY — high compression for strings
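
The encoding choices listed above amount to a small decision table keyed on column name and physical type. The sketch below restates that table in standalone Rust. The `PhysicalType` and `Encoding` enums and the `encoding_for` function are hypothetical and defined locally for illustration; they are not the parquet crate's types, and this is not the body of build_column_encoding_props.

```rust
/// Hypothetical, simplified physical types for typed columns.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PhysicalType {
    Int32,
    Int64,
    Float,
    Double,
    ByteArray,
}

/// Hypothetical mirror of the Parquet encodings named in the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Encoding {
    Plain,
    DeltaBinaryPacked,
    ByteStreamSplit,
    RleDictionary,
}

/// Illustrative per-column encoding choice.
/// The two internal columns are matched by name and always use PLAIN
/// (`_merutable_value` is only present at L0); every other column is
/// treated as a typed column and chosen by physical type.
pub fn encoding_for(column: &str, ty: PhysicalType) -> Encoding {
    match column {
        "_merutable_ikey" | "_merutable_value" => Encoding::Plain,
        _ => match ty {
            PhysicalType::Int32 | PhysicalType::Int64 => Encoding::DeltaBinaryPacked,
            PhysicalType::Float | PhysicalType::Double => Encoding::ByteStreamSplit,
            PhysicalType::ByteArray => Encoding::RleDictionary,
        },
    }
}
```

For example, `encoding_for("_merutable_ikey", PhysicalType::ByteArray)` yields `Encoding::Plain`, while a user-defined string column of the same physical type yields `Encoding::RleDictionary`. The column names other than the two internal ones are made up for the example.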