pub struct FactorizedChunk { /* private fields */ }Expand description
A chunk that supports factorized representation across multiple levels.
Columns are organized in groups by their factorization level:
- Level 0 (flat): Base columns, one value per logical row
- Level 1 (unflat): First expansion, grouped by level 0
- Level 2 (unflat): Second expansion, grouped by level 1
- And so on…
§State Management
The chunk maintains a ChunkState that provides:
- Cached multiplicities for O(1) aggregate access
- Selection vector support for lazy filtering
- Generation tracking for cache invalidation
Implementations§
Source§impl FactorizedChunk
impl FactorizedChunk
Sourcepub fn from_flat(chunk: &DataChunk, column_names: Vec<String>) -> Self
pub fn from_flat(chunk: &DataChunk, column_names: Vec<String>) -> Self
Creates a factorized chunk from a flat DataChunk.
The resulting chunk has a single level (level 0) with all columns flat.
Sourcepub fn with_flat_level(
columns: Vec<ValueVector>,
column_names: Vec<String>,
) -> Self
pub fn with_flat_level( columns: Vec<ValueVector>, column_names: Vec<String>, ) -> Self
Creates a factorized chunk with a single flat level.
Sourcepub fn level_count(&self) -> usize
pub fn level_count(&self) -> usize
Returns the number of factorization levels.
Sourcepub fn logical_row_count(&self) -> usize
pub fn logical_row_count(&self) -> usize
Returns the logical row count (full Cartesian product size).
Sourcepub fn physical_size(&self) -> usize
pub fn physical_size(&self) -> usize
Returns the physical storage size (actual values stored).
Sourcepub fn chunk_state(&self) -> &ChunkState
pub fn chunk_state(&self) -> &ChunkState
Returns the chunk state.
Sourcepub fn chunk_state_mut(&mut self) -> &mut ChunkState
pub fn chunk_state_mut(&mut self) -> &mut ChunkState
Returns mutable access to the chunk state.
Sourcepub fn path_multiplicities_cached(&mut self) -> Arc<[usize]>
pub fn path_multiplicities_cached(&mut self) -> Arc<[usize]>
Returns path multiplicities, computing once and caching.
This is the key optimization for aggregation: multiplicities are computed once and reused for all aggregates (COUNT, SUM, AVG, etc.).
§Example
let mults = chunk.path_multiplicities_cached();
let sum = chunk.sum_deepest_with_multiplicities(0, &mults);
let avg = chunk.avg_deepest_with_multiplicities(0, &mults);Sourcepub fn level(&self, index: usize) -> Option<&FactorizationLevel>
pub fn level(&self, index: usize) -> Option<&FactorizationLevel>
Returns a level by index.
Sourcepub fn level_mut(&mut self, index: usize) -> Option<&mut FactorizationLevel>
pub fn level_mut(&mut self, index: usize) -> Option<&mut FactorizationLevel>
Returns a mutable level by index.
Sourcepub fn add_level(
&mut self,
columns: Vec<ValueVector>,
column_names: Vec<String>,
offsets: &[u32],
)
pub fn add_level( &mut self, columns: Vec<ValueVector>, column_names: Vec<String>, offsets: &[u32], )
Adds a new factorization level for expansion results.
The new level’s multiplicities determine how many values each parent in the previous level expands to.
§Arguments
columns- Columns at the new levelcolumn_names- Names for the new columnsoffsets- Offset array whereoffsets[i]is the start index for parenti
Sourcepub fn add_factorized_level(&mut self, level: FactorizationLevel)
pub fn add_factorized_level(&mut self, level: FactorizationLevel)
Adds a level with pre-computed factorized vectors.
Sourcepub fn flatten(&self) -> DataChunk
pub fn flatten(&self) -> DataChunk
Flattens to a regular DataChunk (materializes the Cartesian product).
All levels are expanded into flat rows.
Sourcepub fn logical_row_iter(&self) -> FactorizedRowIterator<'_> ⓘ
pub fn logical_row_iter(&self) -> FactorizedRowIterator<'_> ⓘ
Returns an iterator over logical rows without materializing.
Each iteration yields a vector of physical indices, one per level.
Sourcepub fn total_column_count(&self) -> usize
pub fn total_column_count(&self) -> usize
Gets the total number of columns across all levels.
Sourcepub fn all_column_names(&self) -> Vec<String>
pub fn all_column_names(&self) -> Vec<String>
Gets all column names in order across all levels.
Sourcepub fn filter_deepest<F>(&self, column_idx: usize, predicate: F) -> Option<Self>
pub fn filter_deepest<F>(&self, column_idx: usize, predicate: F) -> Option<Self>
Filters the deepest level in-place using a predicate on column values.
This is the key optimization: instead of flattening and filtering all rows, we filter only at the deepest level and update parent multiplicities.
§Arguments
column_idx- Column index within the deepest level to filter onpredicate- Function that returns true for values to keep
§Returns
A new FactorizedChunk with filtered values, or None if all rows are filtered out.
Sourcepub fn filter_deepest_multi<F>(&self, predicate: F) -> Option<Self>
pub fn filter_deepest_multi<F>(&self, predicate: F) -> Option<Self>
Filters the deepest level using a multi-column predicate.
This allows filtering based on values from multiple columns in the deepest level.
§Arguments
predicate- Function that takes a slice of values (one per column) and returns true to keep
Sourcepub fn count_rows(&self) -> usize
pub fn count_rows(&self) -> usize
Computes COUNT(*) without flattening - returns the logical row count.
This is O(n) where n is the number of physical values, instead of O(m) where m is the number of logical rows (which can be exponentially larger).
§Example
For a 3-level chunk:
- Level 0: 100 sources
- Level 1: 10 neighbors each = 1,000 physical
- Level 2: 10 neighbors each = 10,000 physical
- Logical rows = 100 * 10 * 10 = 10,000
count_rows() returns 10,000 by computing from multiplicities, not by
iterating through all logical rows.
Sourcepub fn compute_path_multiplicities(&self) -> Vec<usize>
pub fn compute_path_multiplicities(&self) -> Vec<usize>
Computes the effective multiplicity for each value at the deepest level.
This is how many times each value would appear in the flattened result. For example, if a source has 3 first-hop neighbors and each has 2 second-hop neighbors, each first-hop value has multiplicity 2 (appearing in 2 paths).
§Returns
A vector where result[i] is the multiplicity of physical value i at the
deepest level. The sum of all multiplicities equals logical_row_count().
§Note
For repeated access (e.g., computing multiple aggregates), prefer using
path_multiplicities_cached which
caches the result and avoids O(levels) recomputation.
Sourcepub fn sum_deepest(&self, column_idx: usize) -> Option<f64>
pub fn sum_deepest(&self, column_idx: usize) -> Option<f64>
Computes SUM on a numeric column at the deepest level without flattening.
Each value is multiplied by its effective multiplicity (how many times it would appear in the flattened result).
§Arguments
column_idx- Column index within the deepest level
§Returns
The sum as f64, or None if the column doesn’t exist or contains non-numeric values.
Sourcepub fn avg_deepest(&self, column_idx: usize) -> Option<f64>
pub fn avg_deepest(&self, column_idx: usize) -> Option<f64>
Sourcepub fn min_deepest(&self, column_idx: usize) -> Option<Value>
pub fn min_deepest(&self, column_idx: usize) -> Option<Value>
Computes MIN on a column at the deepest level without flattening.
Unlike SUM/AVG, MIN doesn’t need multiplicities - we just find the minimum among all physical values.
§Arguments
column_idx- Column index within the deepest level
§Returns
The minimum value, or None if the column doesn’t exist or is empty.
Sourcepub fn max_deepest(&self, column_idx: usize) -> Option<Value>
pub fn max_deepest(&self, column_idx: usize) -> Option<Value>
Computes MAX on a column at the deepest level without flattening.
Unlike SUM/AVG, MAX doesn’t need multiplicities - we just find the maximum among all physical values.
§Arguments
column_idx- Column index within the deepest level
§Returns
The maximum value, or None if the column doesn’t exist or is empty.
Trait Implementations§
Source§impl Clone for FactorizedChunk
impl Clone for FactorizedChunk
Source§fn clone(&self) -> FactorizedChunk
fn clone(&self) -> FactorizedChunk
1.0.0 · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read more