Skip to main content

FactorizedChunk

Struct FactorizedChunk 

Source
pub struct FactorizedChunk { /* private fields */ }
Expand description

A chunk that supports factorized representation across multiple levels.

Columns are organized in groups by their factorization level:

  • Level 0 (flat): Base columns, one value per logical row
  • Level 1 (unflat): First expansion, grouped by level 0
  • Level 2 (unflat): Second expansion, grouped by level 1
  • And so on…

§State Management

The chunk maintains a ChunkState that provides:

  • Cached multiplicities for O(1) aggregate access
  • Selection vector support for lazy filtering
  • Generation tracking for cache invalidation

Implementations§

Source§

impl FactorizedChunk

Source

pub fn empty() -> Self

Creates an empty factorized chunk.

Source

pub fn from_flat(chunk: &DataChunk, column_names: Vec<String>) -> Self

Creates a factorized chunk from a flat DataChunk.

The resulting chunk has a single level (level 0) with all columns flat.

Source

pub fn with_flat_level( columns: Vec<ValueVector>, column_names: Vec<String>, ) -> Self

Creates a factorized chunk with a single flat level.

Source

pub fn level_count(&self) -> usize

Returns the number of factorization levels.

Source

pub fn logical_row_count(&self) -> usize

Returns the logical row count (full Cartesian product size).

Source

pub fn physical_size(&self) -> usize

Returns the physical storage size (actual values stored).

Source

pub fn chunk_state(&self) -> &ChunkState

Returns the chunk state.

Source

pub fn chunk_state_mut(&mut self) -> &mut ChunkState

Returns mutable access to the chunk state.

Source

pub fn path_multiplicities_cached(&mut self) -> Arc<[usize]>

Returns path multiplicities, computing once and caching.

This is the key optimization for aggregation: multiplicities are computed once and reused for all aggregates (COUNT, SUM, AVG, etc.).

§Example
let mults = chunk.path_multiplicities_cached();
let sum = chunk.sum_deepest_with_multiplicities(0, &mults);
let avg = chunk.avg_deepest_with_multiplicities(0, &mults);
Source

pub fn level(&self, index: usize) -> Option<&FactorizationLevel>

Returns a level by index.

Source

pub fn level_mut(&mut self, index: usize) -> Option<&mut FactorizationLevel>

Returns a mutable level by index.

Source

pub fn add_level( &mut self, columns: Vec<ValueVector>, column_names: Vec<String>, offsets: &[u32], )

Adds a new factorization level for expansion results.

The new level’s multiplicities determine how many values each parent in the previous level expands to.

§Arguments
  • columns - Columns at the new level
  • column_names - Names for the new columns
  • offsets - Offset array where offsets[i] is the start index for parent i
Source

pub fn add_factorized_level(&mut self, level: FactorizationLevel)

Adds a level with pre-computed factorized vectors.

Source

pub fn flatten(&self) -> DataChunk

Flattens to a regular DataChunk (materializes the Cartesian product).

All levels are expanded into flat rows.

Source

pub fn logical_row_iter(&self) -> FactorizedRowIterator<'_>

Returns an iterator over logical rows without materializing.

Each iteration yields a vector of physical indices, one per level.

Source

pub fn total_column_count(&self) -> usize

Gets the total number of columns across all levels.

Source

pub fn all_column_names(&self) -> Vec<String>

Gets all column names in order across all levels.

Source

pub fn filter_deepest<F>(&self, column_idx: usize, predicate: F) -> Option<Self>
where F: Fn(&Value) -> bool,

Filters the deepest level in-place using a predicate on column values.

This is the key optimization: instead of flattening and filtering all rows, we filter only at the deepest level and update parent multiplicities.

§Arguments
  • column_idx - Column index within the deepest level to filter on
  • predicate - Function that returns true for values to keep
§Returns

A new FactorizedChunk with filtered values, or None if all rows are filtered out.

Source

pub fn filter_deepest_multi<F>(&self, predicate: F) -> Option<Self>
where F: Fn(&[Value]) -> bool,

Filters the deepest level using a multi-column predicate.

This allows filtering based on values from multiple columns in the deepest level.

§Arguments
  • predicate - Function that takes a slice of values (one per column) and returns true to keep
Source

pub fn count_rows(&self) -> usize

Computes COUNT(*) without flattening - returns the logical row count.

This is O(n) where n is the number of physical values, instead of O(m) where m is the number of logical rows (which can be exponentially larger).

§Example

For a 3-level chunk:

  • Level 0: 100 sources
  • Level 1: 10 neighbors each = 1,000 physical
  • Level 2: 10 neighbors each = 10,000 physical
  • Logical rows = 100 * 10 * 10 = 10,000

count_rows() returns 10,000 by computing from multiplicities, not by iterating through all logical rows.

Source

pub fn compute_path_multiplicities(&self) -> Vec<usize>

Computes the effective multiplicity for each value at the deepest level.

This is how many times each value would appear in the flattened result. For example, if a source has 3 first-hop neighbors and each has 2 second-hop neighbors, each first-hop value has multiplicity 2 (appearing in 2 paths).

§Returns

A vector where result[i] is the multiplicity of physical value i at the deepest level. The sum of all multiplicities equals logical_row_count().

§Note

For repeated access (e.g., computing multiple aggregates), prefer using path_multiplicities_cached which caches the result and avoids O(levels) recomputation.

Source

pub fn sum_deepest(&self, column_idx: usize) -> Option<f64>

Computes SUM on a numeric column at the deepest level without flattening.

Each value is multiplied by its effective multiplicity (how many times it would appear in the flattened result).

§Arguments
  • column_idx - Column index within the deepest level
§Returns

The sum as f64, or None if the column doesn’t exist or contains non-numeric values.

Source

pub fn avg_deepest(&self, column_idx: usize) -> Option<f64>

Computes AVG on a numeric column at the deepest level without flattening.

This is equivalent to sum_deepest() / count_rows().

§Arguments
  • column_idx - Column index within the deepest level
§Returns

The average as f64, or None if the column doesn’t exist or the chunk is empty.

Source

pub fn min_deepest(&self, column_idx: usize) -> Option<Value>

Computes MIN on a column at the deepest level without flattening.

Unlike SUM/AVG, MIN doesn’t need multiplicities - we just find the minimum among all physical values.

§Arguments
  • column_idx - Column index within the deepest level
§Returns

The minimum value, or None if the column doesn’t exist or is empty.

Source

pub fn max_deepest(&self, column_idx: usize) -> Option<Value>

Computes MAX on a column at the deepest level without flattening.

Unlike SUM/AVG, MAX doesn’t need multiplicities - we just find the maximum among all physical values.

§Arguments
  • column_idx - Column index within the deepest level
§Returns

The maximum value, or None if the column doesn’t exist or is empty.

Source

pub fn project(&self, column_specs: &[(usize, usize, String)]) -> Self

Projects specific columns from the factorized chunk without flattening.

§Arguments
  • column_specs - List of (level_idx, column_idx, new_name) tuples
§Returns

A new FactorizedChunk with only the specified columns.

Trait Implementations§

Source§

impl Clone for FactorizedChunk

Source§

fn clone(&self) -> FactorizedChunk

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for FactorizedChunk

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl From<FactorizedChunk> for ChunkVariant

Source§

fn from(chunk: FactorizedChunk) -> Self

Converts to this type from the input type.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.