
grafeo_core/codec/block.rs

//! Block descriptors for columnar storage.
//!
//! A "block" is a logical chunk of a column's rows that can be skipped or
//! processed as a unit. Blocks bridge raw codec output (bit-packed
//! integers, dictionary-encoded strings, boolean bitmaps, etc.) and
//! higher-level scan operators that want to prune work using per-block
//! summary statistics.
//!
//! This module lives in `codec` rather than under any particular store
//! because both [`graph::compact::ColumnCodec`](crate::graph::compact::column::ColumnCodec)
//! (the read-only columnar base) and the LPG store's `PropertyColumn`
//! (the mutable in-memory store, modernized later in Phase 2) describe
//! their data using the same block descriptors.
//!
//! # Phasing
//!
//! Phase 2a introduces the descriptor type and the enumeration API on
//! `ColumnCodec`; every column today reports a single block covering
//! all rows. Phase 2b will introduce multi-block serialization, and
//! Phase 2c will add per-block statistics (min/max/null/row counts,
//! optional bloom) so range scans can skip blocks whose stats prove
//! no match. Phase 2d/2e extend the same descriptors to `LpgStore`.
//!
//! The split is deliberate: Phase 2a keeps the data layout untouched so
//! upstream readers can be extended block-aware before any on-disk
//! format changes.

/// Default number of rows per block when serializing v2 columns.
///
/// Chosen so a `BitPacked` column with `bits_per_value == 4` fills exactly
/// 64 `u64` words per block (4096 bits, i.e. 512 bytes); other widths land
/// within ~1 KiB of the same target. Per-block stats (Phase 2c) and
/// iterator early-stop (Phase 4) get coarse-but-useful skip granularity at
/// this size without blowing up the block-index footprint.
pub const DEFAULT_BLOCK_ROWS: u32 = 1024;

/// Descriptor for a logical block within a column.
///
/// `row_count` is the runtime view: how many logical rows the block
/// covers. Phase 2c will add optional `min`, `max`, `null_count`, and
/// `bloom` fields for per-block pruning. The on-disk block index used
/// by v2 column serialization carries additional `byte_offset` and
/// `byte_len` fields, but those are an internal serialization detail
/// and not exposed on this runtime descriptor.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockEntry {
    /// Number of logical rows (values) in this block.
    pub row_count: u32,
}

impl BlockEntry {
    /// Constructs a block entry with the given row count.
    #[must_use]
    pub const fn new(row_count: u32) -> Self {
        Self { row_count }
    }
}
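
The sizing argument behind `DEFAULT_BLOCK_ROWS` and the way a column's rows might be carved into `BlockEntry` descriptors can be checked with a small standalone sketch. The helpers `bit_packed_words` and `split_into_blocks` are hypothetical names introduced here for illustration; they are not part of this module. In Phase 2a every column still reports a single block, so the splitting logic is only an assumption about how multi-block enumeration could look. The constant and struct are redeclared so the snippet compiles on its own.

```rust
// Illustrative sketch only. `bit_packed_words` and `split_into_blocks`
// are hypothetical helpers, not functions from the crate.

/// Redeclared copy of the module's constant so the sketch is self-contained.
pub const DEFAULT_BLOCK_ROWS: u32 = 1024;

/// Redeclared copy of the module's runtime descriptor.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockEntry {
    /// Number of logical rows (values) in this block.
    pub row_count: u32,
}

impl BlockEntry {
    pub const fn new(row_count: u32) -> Self {
        Self { row_count }
    }
}

/// Number of `u64` words needed to bit-pack `rows` values at
/// `bits_per_value` bits each, rounded up to a whole word.
pub fn bit_packed_words(rows: u32, bits_per_value: u32) -> u64 {
    let bits = u64::from(rows) * u64::from(bits_per_value);
    (bits + 63) / 64
}

/// Splits `total_rows` into consecutive blocks of at most `block_rows`
/// rows each; the final block carries the remainder. An empty column
/// yields no blocks. Panics if `block_rows` is zero.
pub fn split_into_blocks(total_rows: u64, block_rows: u32) -> Vec<BlockEntry> {
    assert!(block_rows > 0, "block_rows must be non-zero");
    let mut blocks = Vec::new();
    let mut remaining = total_rows;
    while remaining > 0 {
        // `take` never exceeds `block_rows`, so the cast back to u32 is lossless.
        let take = remaining.min(u64::from(block_rows)) as u32;
        blocks.push(BlockEntry::new(take));
        remaining -= u64::from(take);
    }
    blocks
}
```

Under these assumptions, `bit_packed_words(DEFAULT_BLOCK_ROWS, 4)` evaluates to 64, matching the figure in the constant's documentation, and `split_into_blocks(2500, DEFAULT_BLOCK_ROWS)` yields three entries with row counts 1024, 1024, and 452.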