Skip to main content

FactorizedChunk

Struct FactorizedChunk 

Source
pub struct FactorizedChunk { /* private fields */ }
Expand description

A chunk that supports factorized representation across multiple levels.

Columns are organized in groups by their factorization level:

  • Level 0 (flat): Base columns, one value per logical row
  • Level 1 (unflat): First expansion, grouped by level 0
  • Level 2 (unflat): Second expansion, grouped by level 1
  • And so on…

§State Management

The chunk maintains a ChunkState that provides:

  • Cached multiplicities for O(1) aggregate access
  • Selection vector support for lazy filtering
  • Generation tracking for cache invalidation

Implementations§

Source§

impl FactorizedChunk

Source

pub fn empty() -> Self

Creates an empty factorized chunk.

Source

pub fn from_flat(chunk: &DataChunk, column_names: Vec<String>) -> Self

Creates a factorized chunk from a flat DataChunk.

The resulting chunk has a single level (level 0) with all columns flat.

Source

pub fn with_flat_level( columns: Vec<ValueVector>, column_names: Vec<String>, ) -> Self

Creates a factorized chunk with a single flat level.

Source

pub fn level_count(&self) -> usize

Returns the number of factorization levels.

Source

pub fn logical_row_count(&self) -> usize

Returns the logical row count (full Cartesian product size).

Source

pub fn physical_size(&self) -> usize

Returns the physical storage size (actual values stored).

Source

pub fn chunk_state(&self) -> &ChunkState

Returns the chunk state.

Source

pub fn chunk_state_mut(&mut self) -> &mut ChunkState

Returns mutable access to the chunk state.

Source

pub fn path_multiplicities_cached(&mut self) -> Arc<[usize]>

Returns path multiplicities, computing once and caching.

This is the key optimization for aggregation: multiplicities are computed once and reused for all aggregates (COUNT, SUM, AVG, etc.).

§Example
let mults = chunk.path_multiplicities_cached();
let sum = chunk.sum_deepest(0);
let avg = chunk.avg_deepest(0);
Source

pub fn level(&self, index: usize) -> Option<&FactorizationLevel>

Returns a level by index.

Source

pub fn level_mut(&mut self, index: usize) -> Option<&mut FactorizationLevel>

Returns a mutable level by index.

Source

pub fn add_level( &mut self, columns: Vec<ValueVector>, column_names: Vec<String>, offsets: &[u32], )

Adds a new factorization level for expansion results.

The new level’s multiplicities determine how many values each parent in the previous level expands to.

§Arguments
  • columns - Columns at the new level
  • column_names - Names for the new columns
  • offsets - Offset array where offsets[i] is the start index for parent i
Source

pub fn add_factorized_level(&mut self, level: FactorizationLevel)

Adds a level with pre-computed factorized vectors.

Source

pub fn flatten(&self) -> DataChunk

Flattens to a regular DataChunk (materializes the Cartesian product).

All levels are expanded into flat rows.

Source

pub fn logical_row_iter(&self) -> FactorizedRowIterator<'_>

Returns an iterator over logical rows without materializing.

Each iteration yields a vector of physical indices, one per level.

Source

pub fn total_column_count(&self) -> usize

Gets the total number of columns across all levels.

Source

pub fn all_column_names(&self) -> Vec<String>

Gets all column names in order across all levels.

Source

pub fn filter_deepest<F>(&self, column_idx: usize, predicate: F) -> Option<Self>
where F: Fn(&Value) -> bool,

Filters the deepest level in-place using a predicate on column values.

This is the key optimization: instead of flattening and filtering all rows, we filter only at the deepest level and update parent multiplicities.

§Arguments
  • column_idx - Column index within the deepest level to filter on
  • predicate - Function that returns true for values to keep
§Returns

A new FactorizedChunk with filtered values, or None if all rows are filtered out.

§Panics

Panics if column_idx refers to a non-existent column in the deepest level.

Source

pub fn filter_deepest_multi<F>(&self, predicate: F) -> Option<Self>
where F: Fn(&[Value]) -> bool,

Filters the deepest level using a multi-column predicate.

This allows filtering based on values from multiple columns in the deepest level.

§Arguments
  • predicate - Function that takes a slice of values (one per column) and returns true to keep
§Panics

Panics if the deepest level’s internal column storage is inconsistent (column count reports more columns than actually exist).

Source

pub fn count_rows(&self) -> usize

Computes COUNT(*) without flattening - returns the logical row count.

This is O(n) where n is the number of physical values, instead of O(m) where m is the number of logical rows (which can be exponentially larger).

§Example

For a 3-level chunk:

  • Level 0: 100 sources
  • Level 1: 10 neighbors each = 1,000 physical
  • Level 2: 10 neighbors each = 10,000 physical
  • Logical rows = 100 * 10 * 10 = 10,000

count_rows() returns 10,000 by computing from multiplicities, not by iterating through all logical rows.

Source

pub fn compute_path_multiplicities(&self) -> Vec<usize>

Computes the effective multiplicity for each value at the deepest level.

This is how many times each value would appear in the flattened result. For example, if a source has 3 first-hop neighbors and each has 2 second-hop neighbors, each first-hop value has multiplicity 2 (appearing in 2 paths).

§Returns

A vector where result[i] is the multiplicity of physical value i at the deepest level. The sum of all multiplicities equals logical_row_count().

§Note

For repeated access (e.g., computing multiple aggregates), prefer using path_multiplicities_cached which caches the result and avoids O(levels) recomputation.

Source

pub fn sum_deepest(&self, column_idx: usize) -> Option<f64>

Computes SUM on a numeric column at the deepest level without flattening.

Each value is multiplied by its effective multiplicity (how many times it would appear in the flattened result).

§Arguments
  • column_idx - Column index within the deepest level
§Returns

The sum as f64, or None if the column doesn’t exist or contains non-numeric values.

Source

pub fn avg_deepest(&self, column_idx: usize) -> Option<f64>

Computes AVG on a numeric column at the deepest level without flattening.

This is equivalent to sum_deepest() / count_rows().

§Arguments
  • column_idx - Column index within the deepest level
§Returns

The average as f64, or None if the column doesn’t exist or the chunk is empty.

Source

pub fn min_deepest(&self, column_idx: usize) -> Option<Value>

Computes MIN on a column at the deepest level without flattening.

Unlike SUM/AVG, MIN doesn’t need multiplicities - we just find the minimum among all physical values.

§Arguments
  • column_idx - Column index within the deepest level
§Returns

The minimum value, or None if the column doesn’t exist or is empty.

Source

pub fn max_deepest(&self, column_idx: usize) -> Option<Value>

Computes MAX on a column at the deepest level without flattening.

Unlike SUM/AVG, MAX doesn’t need multiplicities - we just find the maximum among all physical values.

§Arguments
  • column_idx - Column index within the deepest level
§Returns

The maximum value, or None if the column doesn’t exist or is empty.

Source

pub fn project(&self, column_specs: &[(usize, usize, String)]) -> Self

Projects specific columns from the factorized chunk without flattening.

§Arguments
  • column_specs - List of (level_idx, column_idx, new_name) tuples
§Returns

A new FactorizedChunk with only the specified columns.

Trait Implementations§

Source§

impl Clone for FactorizedChunk

Source§

fn clone(&self) -> FactorizedChunk

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for FactorizedChunk

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl From<FactorizedChunk> for ChunkVariant

Source§

fn from(chunk: FactorizedChunk) -> Self

Converts to this type from the input type.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.