Struct TidyView 

pub struct TidyView { /* private fields */ }

A lazy, zero-allocation view over a base DataFrame.

Holds:
• base – shared reference to the underlying columnar data
• mask – bitmask of which rows are visible
• proj – ordered list of visible column indices

No column buffers are copied until the view is materialized, for example by materialize() or to_tensor().

Implementations

impl TidyView

pub fn from_df(df: DataFrame) -> Self

Wrap a DataFrame as a full view (all rows, all columns).

pub fn from_rc(df: Rc<DataFrame>) -> Self

Wrap a shared Rc<DataFrame> as a full view.

pub fn nrows(&self) -> usize

Number of visible rows (set bits in mask).

pub fn ncols(&self) -> usize

Number of visible columns (length of projection).

pub fn column_names(&self) -> Vec<&str>

Names of projected columns in stable projection order.

pub fn filter(&self, predicate: &DExpr) -> Result<TidyView, TidyError>

Filter rows by a DExpr predicate.

Returns a new TidyView with a tighter bitmask (AND with existing mask). Does NOT copy any column buffers.

Edge cases:
• 0-row base → empty mask returned, no panic.
• Non-bool predicate → TidyError::PredicateNotBool.
• Float NaN comparisons → deterministic: NaN != NaN (IEEE 754).
• Chained filters compose masks with AND without materializing.
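
A minimal sketch of the mask composition, assuming a Vec<bool> stands in for BitMask and a single `value > threshold` comparison stands in for a DExpr predicate; filter_gt and nrows are hypothetical helpers, not part of this crate. Feeding one filter's output mask into the next gives the chained-AND behavior, and no column data is copied.

```rust
/// AND a `value > threshold` predicate on a Float column into an existing mask.
/// Rows already masked out stay out; any comparison with NaN is false
/// (IEEE 754), so NaN rows are dropped deterministically.
fn filter_gt(mask: &[bool], col: &[f64], threshold: f64) -> Vec<bool> {
    mask.iter()
        .zip(col)
        .map(|(&visible, &v)| visible && v > threshold)
        .collect()
}

/// Visible row count = number of set bits in the mask.
fn nrows(mask: &[bool]) -> usize {
    mask.iter().filter(|&&bit| bit).count()
}
```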

pub fn select(&self, cols: &[&str]) -> Result<TidyView, TidyError>

Project to a subset of named columns (in the given order).

Returns a new TidyView with an updated ProjectionMap. No column buffers are copied.

Edge cases:
• 0 columns selected → valid empty-column view (no error).
• Unknown column → TidyError::ColumnNotFound.
• Duplicate column name in cols → TidyError::DuplicateColumn.
• Column ordering is exactly as supplied.

pub fn mutate(&self, assignments: &[(&str, DExpr)]) -> Result<TidyFrame, TidyError>

Apply column-wise assignments and return a materialized TidyFrame.

assignments is an ordered list of (col_name, expr) pairs, evaluated left to right. Each assignment sees a snapshot of the columns taken on entry to mutate (snapshot semantics): new columns created by earlier assignments are NOT visible to later assignments within the same call.

Semantics decisions:
• Existing column → overwritten (copy-on-write safe).
• New column → appended after existing projected columns.
• Scalar broadcasting → a scalar expr is broadcast to all visible rows.
• Mask-awareness: only masked-in rows are computed; masked-out rows in the materialized output retain the base value (or zero for new cols).
• Type promotion: Int + Float → Float; Int overflow → wrapping.
• Multiple assignments with the same target name in one call → error.
• Mutate on masked view produces a materialized TidyFrame where only visible rows are present (mask applied during materialization).
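
The snapshot rule can be illustrated with a small evaluator. This is a sketch under stated assumptions: columns are (name, Vec<f64>) pairs, and the hypothetical Expr enum (column reference, scalar, addition) stands in for DExpr; masks and type promotion are omitted. Every expression reads the entry snapshot, while results go into a separate output.

```rust
use std::collections::HashSet;

/// Hypothetical column store: ordered (name, values) pairs.
type Cols = Vec<(String, Vec<f64>)>;

/// Hypothetical stand-in for DExpr.
enum Expr {
    Col(&'static str),
    Scalar(f64),
    Add(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr, snap: &Cols, nrows: usize) -> Result<Vec<f64>, String> {
    match e {
        Expr::Col(name) => snap
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, v)| v.clone())
            .ok_or_else(|| format!("column not found: {name}")),
        // A scalar is broadcast to every row.
        Expr::Scalar(x) => Ok(vec![*x; nrows]),
        Expr::Add(a, b) => {
            let (a, b) = (eval(a, snap, nrows)?, eval(b, snap, nrows)?);
            Ok(a.iter().zip(&b).map(|(x, y)| x + y).collect())
        }
    }
}

/// Every expression is evaluated against `cols` as it was on entry (the
/// snapshot); results are written into a separate output frame.
fn mutate(cols: &Cols, nrows: usize, assigns: &[(&str, Expr)]) -> Result<Cols, String> {
    let mut seen = HashSet::new();
    let mut out = cols.clone();
    for (name, expr) in assigns {
        if !seen.insert(*name) {
            return Err(format!("duplicate assignment target: {name}"));
        }
        let values = eval(expr, cols, nrows)?; // reads the snapshot, never `out`
        match out.iter_mut().find(|(n, _)| n == name) {
            Some((_, v)) => *v = values,                  // existing column: overwrite
            None => out.push((name.to_string(), values)), // new column: append
        }
    }
    Ok(out)
}
```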

pub fn materialize(&self) -> Result<DataFrame, TidyError>

Materialize the view into a new DataFrame (applies mask + projection).

Triggers exactly one allocation per visible column buffer. Rows are emitted in ascending index order (stable/deterministic).

Edge cases:
• Empty rows → 0-row DataFrame.
• Empty cols → 0-column DataFrame.
• Row-major iteration is stable.
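
A sketch of what materialization does with a mask and a projection, assuming a hypothetical Frame of f64 columns and a plain Vec<bool> mask; the real DataFrame, BitMask and ProjectionMap types are not modelled, and materialize here is a free function rather than the method above.

```rust
/// Hypothetical base frame: column names plus column-major f64 buffers.
struct Frame {
    names: Vec<String>,
    cols: Vec<Vec<f64>>,
}

/// Apply a row mask and a column projection. Each projected column gets one
/// freshly allocated output buffer (plus one scratch list of visible rows);
/// rows are emitted in ascending base-row order.
fn materialize(base: &Frame, mask: &[bool], proj: &[usize]) -> Frame {
    let visible: Vec<usize> = (0..mask.len()).filter(|&i| mask[i]).collect();
    let mut names = Vec::with_capacity(proj.len());
    let mut cols: Vec<Vec<f64>> = Vec::with_capacity(proj.len());
    for &c in proj {
        names.push(base.names[c].clone());
        cols.push(visible.iter().map(|&r| base.cols[c][r]).collect());
    }
    Frame { names, cols }
}
```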

pub fn to_tensor(&self, col_names: &[&str]) -> Result<Tensor, TidyError>

Convert the named visible numeric columns into a row-major tensor, using only visible rows.

Only Float and Int columns are supported.
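
Row-major layout means element (r, c) of the output sits at r * ncols + c. The sketch below assumes a hypothetical NumCol enum and widens Int to f64; the actual Tensor type and its element type are not shown in this documentation.

```rust
/// Hypothetical numeric column; this sketch widens Int to f64.
enum NumCol {
    Int(Vec<i64>),
    Float(Vec<f64>),
}

/// Flatten the selected columns into a row-major buffer of shape
/// [visible rows, columns]: element (r, c) lives at r * ncols + c.
fn to_row_major(cols: &[NumCol], visible: &[usize]) -> (Vec<f64>, [usize; 2]) {
    let ncols = cols.len();
    let mut data = Vec::with_capacity(visible.len() * ncols);
    for &r in visible {
        for col in cols {
            data.push(match col {
                NumCol::Int(v) => v[r] as f64,
                NumCol::Float(v) => v[r],
            });
        }
    }
    (data, [visible.len(), ncols])
}
```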

pub fn mask(&self) -> &BitMask

Access the underlying mask (for testing/inspection).

pub fn proj(&self) -> &ProjectionMap

Access the underlying projection (for testing/inspection).

pub fn base_column(&self, name: &str) -> Option<&Column>

Access a column from the underlying base DataFrame by name.

Returns the raw Column (full length, unmasked) – callers must apply the mask themselves if needed. Used by fct_summary_means and similar.

impl TidyView

pub fn group_by(&self, keys: &[&str]) -> Result<GroupedTidyView, TidyError>

Group the view by one or more column names.

Returns a GroupedTidyView. No column buffers are copied. Group order = first-occurrence order of (key_col1, key_col2, …) tuples among the currently visible rows (ascending base-row scan).

Edge cases:
• 0 rows → 0 groups, no error.
• 0 keys → all visible rows form a single group (equivalent to a global aggregate).
• Unknown key column → TidyError::ColumnNotFound.
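
First-occurrence group ordering can be sketched with a hash map from key tuple to group id, scanning visible rows in ascending base order. Keys are strings here for brevity and group_first_occurrence is hypothetical; the crate's GroupIndex may be built differently (group_by_fast, for instance, uses a BTree).

```rust
use std::collections::HashMap;

/// Returns (key tuple, member base rows) per group, in first-occurrence order.
fn group_first_occurrence(
    keys: &[Vec<&str>], // keys[k][row]: value of key column k at base row `row`
    visible: &[usize],  // ascending base-row indices that pass the mask
) -> Vec<(Vec<String>, Vec<usize>)> {
    let mut index: HashMap<Vec<String>, usize> = HashMap::new();
    let mut groups: Vec<(Vec<String>, Vec<usize>)> = Vec::new();
    for &row in visible {
        // With zero key columns every row gets the same empty key: one group.
        let key: Vec<String> = keys.iter().map(|col| col[row].to_string()).collect();
        let g = *index.entry(key.clone()).or_insert_with(|| {
            groups.push((key, Vec::new())); // first time this key is seen
            groups.len() - 1
        });
        groups[g].1.push(row);
    }
    groups
}
```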

pub fn arrange(&self, keys: &[ArrangeKey]) -> Result<TidyView, TidyError>

Sort visible rows by one or more ArrangeKeys.

Returns a new TidyView whose visible rows appear in sorted order.

Design: arrange materializes a RowIndexMap (a sorted permutation of the visible row indices), then re-encodes it into a new base DataFrame containing only those rows, in sorted order. A bitmask alone cannot encode row order, so this re-encoding is what lets all subsequent mask-based operations work correctly on the sorted data.

Semantics:
• Stable sort: equal-key rows keep their original relative order.
• NaN sorting: NaN values sort LAST (greater than any finite value).
• Multi-key: sort by key[0] first, then key[1], … (left-to-right).
• Unknown column → TidyError::ColumnNotFound.
• Non-numeric sort of Float col: allowed (NaN last).
• Mixed-type sort across columns is column-by-column (each col has one type).
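
Rust's slice::sort_by is stable, so a comparator that treats NaN as the largest value is enough to reproduce the documented ordering. A sketch that returns the sorted permutation (the RowIndexMap step), assuming ascending order only and Float keys; arrange and cmp_nan_last here are hypothetical.

```rust
use std::cmp::Ordering;

/// Ascending comparison where NaN is greater than every other value, so NaN
/// sorts last; two NaNs compare equal, which keeps their original order.
fn cmp_nan_last(a: f64, b: f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        (false, false) => a.partial_cmp(&b).unwrap(), // never None without NaN
    }
}

/// Sort visible base rows by keys[0], then keys[1], ... (stable).
fn arrange(keys: &[Vec<f64>], visible: &[usize]) -> Vec<usize> {
    let mut order = visible.to_vec();
    order.sort_by(|&i, &j| {
        keys.iter()
            .map(|k| cmp_nan_last(k[i], k[j]))
            .find(|o| *o != Ordering::Equal) // first key that breaks the tie
            .unwrap_or(Ordering::Equal)
    });
    order
}
```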

pub fn slice(&self, start: usize, end: usize) -> TidyView

Select rows by a half-open range [start, end) of visible-row positions.

Positions are 0-based and relative to the current visible rows. Out-of-bounds positions are clamped to [0, nrows].

pub fn slice_head(&self, n: usize) -> TidyView

Select the first n visible rows (clamped to nrows).

pub fn slice_tail(&self, n: usize) -> TidyView

Select the last n visible rows (clamped to nrows).
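
All three slice operations reduce to mapping visible positions onto base-row indices. In this sketch, visible holds the ascending base rows that pass the mask; the functions are hypothetical, and returning an empty slice when start > end is an assumption, since that case is not documented.

```rust
/// Half-open range [start, end) of visible positions, clamped to [0, nrows].
fn slice(visible: &[usize], start: usize, end: usize) -> Vec<usize> {
    let end = end.min(visible.len());
    let start = start.min(end); // assumption: start > end gives an empty slice
    visible[start..end].to_vec()
}

fn slice_head(visible: &[usize], n: usize) -> Vec<usize> {
    slice(visible, 0, n)
}

fn slice_tail(visible: &[usize], n: usize) -> Vec<usize> {
    let len = visible.len();
    slice(visible, len.saturating_sub(n), len)
}
```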

pub fn slice_sample(&self, n: usize, seed: u64) -> TidyView

Deterministic random sample of n visible rows, seeded by seed.

If n >= nrows, all visible rows are returned in their original order (no error). Sampling uses a Knuth shuffle variant driven by a linear congruential generator (LCG) seeded with seed, so the same seed always yields the same sample.
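
A sketch of deterministic sampling with a partial Fisher–Yates (Knuth) shuffle. The LCG multiplier and increment (Knuth's MMIX constants) and the order of the returned rows are assumptions; this documentation specifies neither, so outputs will not match the crate's.

```rust
/// Minimal 64-bit LCG; the constants are an assumption (Knuth's MMIX values).
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
    /// Value in 0..bound, using the high bits (modulo bias ignored for brevity).
    fn below(&mut self, bound: usize) -> usize {
        ((self.next_u64() >> 33) % bound as u64) as usize
    }
}

/// Partial Fisher–Yates: only the first n positions are shuffled into place.
fn slice_sample(visible: &[usize], n: usize, seed: u64) -> Vec<usize> {
    if n >= visible.len() {
        return visible.to_vec(); // all rows, original order
    }
    let mut rows = visible.to_vec();
    let mut rng = Lcg(seed);
    for i in 0..n {
        let j = i + rng.below(rows.len() - i);
        rows.swap(i, j);
    }
    rows.truncate(n);
    rows
}
```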

pub fn distinct(&self, cols: &[&str]) -> Result<TidyView, TidyError>

Return rows with unique combinations of the specified columns.

Output ordering: first-occurrence order (the first row with each distinct key combination is kept).

Edge cases:
• 0 key columns → keeps first row only (all rows equal on zero keys).
• Unknown column → TidyError::ColumnNotFound.
• After projection/mask: only visible columns/rows are considered.

pub fn inner_join(&self, right: &TidyView, on: &[(&str, &str)]) -> Result<TidyFrame, TidyError>

Inner join: rows where all on key columns match.

Output: left columns, then right columns (excluding the duplicate key columns). Row order: left rows in their original order (outer loop); for each, matching right rows in ascending order (inner loop). Produces a materialized TidyFrame (inner_join always materializes its output).

Edge cases:
• Unknown join key → TidyError::ColumnNotFound.
• on empty → cross join semantics (every left × every right).
• Duplicate keys on left or right → all matching pairs included.
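
The documented row order is exactly what a nested loop produces. A sketch over i64 key columns, where an empty key list makes every pair match (cross join); inner_join_pairs is hypothetical and returns (left_row, right_row) index pairs rather than a TidyFrame. A hash-based implementation can produce the same order more cheaply; this sketch favors clarity.

```rust
/// left[k][i] / right[k][j]: value of the k-th join key at left row i / right row j.
/// nl and nr are the row counts, needed because `left`/`right` may be empty.
fn inner_join_pairs(left: &[Vec<i64>], right: &[Vec<i64>], nl: usize, nr: usize) -> Vec<(usize, usize)> {
    assert_eq!(left.len(), right.len(), "each `on` pair needs a key on both sides");
    let mut out = Vec::new();
    for i in 0..nl {
        // outer loop: left order preserved
        for j in 0..nr {
            // inner loop: right matches ascending; all() is true for zero keys
            if left.iter().zip(right).all(|(l, r)| l[i] == r[j]) {
                out.push((i, j));
            }
        }
    }
    out
}
```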

pub fn left_join(&self, right: &TidyView, on: &[(&str, &str)]) -> Result<TidyFrame, TidyError>

Left join: every left row is kept, joined with its matching right rows; left rows with no match get null values in the right columns, represented as 0 / 0.0 / “” / false.

Row order: left outer loop order preserved, right matches ascending.

pub fn semi_join(&self, right: &TidyView, on: &[(&str, &str)]) -> Result<TidyView, TidyError>

Semi-join: rows in self that have at least one match in right.

Returns a TidyView (no right columns). Row order: stable left order.

pub fn anti_join(&self, right: &TidyView, on: &[(&str, &str)]) -> Result<TidyView, TidyError>

Anti-join: rows in self that have NO match in right.

Returns a TidyView (no right columns). Row order: stable left order.
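
Semi- and anti-joins only filter the left side, so a set of right keys suffices. A sketch on a single i64 key; filtering_join is hypothetical and returns left row indices (which a view could turn into a mask) rather than a TidyView.

```rust
use std::collections::HashSet;

/// keep = true: semi-join; keep = false: anti-join.
/// Left rows come back in their original order, each at most once,
/// however many right rows share its key.
fn filtering_join(left: &[i64], right: &[i64], keep: bool) -> Vec<usize> {
    let right_keys: HashSet<i64> = right.iter().copied().collect();
    (0..left.len())
        .filter(|&i| right_keys.contains(&left[i]) == keep)
        .collect()
}
```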

impl TidyView

pub fn pivot_longer(&self, value_cols: &[&str], names_to: &str, values_to: &str) -> Result<TidyFrame, TidyError>

Pivot selected columns from wide to long format.

• value_cols: columns to gather (must all have the same type).
• names_to: name of the output “variable name” column.
• values_to: name of the output “value” column.

Output schema: [id_cols…, names_to, values_to], where id_cols are the visible columns not listed in value_cols.
Row order: for each source row (in visible order), one output row per value column, in the order the columns appear in value_cols.

Edge cases:
• value_cols empty → TidyError::EmptySelection.
• Unknown column → TidyError::ColumnNotFound.
• Duplicate in value_cols → TidyError::DuplicateColumn.
• Mixed types in value_cols → TidyError::TypeMismatch.
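
A sketch of the wide-to-long row order, assuming a single string id column and f64 value columns (so the same-type rule holds trivially); pivot_longer here is a hypothetical free function returning the three output columns.

```rust
/// Returns (id column, names column, values column).
fn pivot_longer(
    ids: &[&str],
    value_cols: &[(&str, Vec<f64>)],
) -> Result<(Vec<String>, Vec<String>, Vec<f64>), String> {
    if value_cols.is_empty() {
        return Err("empty selection".to_string());
    }
    let (mut id_out, mut names, mut values) = (Vec::new(), Vec::new(), Vec::new());
    for (row, id) in ids.iter().enumerate() {
        // One output row per value column, in value_cols order.
        for (name, col) in value_cols {
            id_out.push(id.to_string());
            names.push(name.to_string());
            values.push(col[row]);
        }
    }
    Ok((id_out, names, values))
}
```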

pub fn pivot_wider(&self, id_cols: &[&str], names_from: &str, values_from: &str) -> Result<NullableFrame, TidyError>

Pivot long-format data to wide format.

• names_from: the column whose values become new column headers.
• values_from: the column whose values fill the new columns.
• id_cols: columns that identify each output row.

Output schema: [id_cols…, unique_key_values… (first-occurrence order)]
Row order: one row per unique combination of id_col values (first-occurrence order).

Edge cases:
• Duplicate (id_key, name_key) combo → TidyError::DuplicateKey.
• Missing combo → null fill via NullableFrame.
• Unknown column → TidyError::ColumnNotFound.
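
A sketch of the long-to-wide reshape with first-occurrence ordering for both rows and new columns. It assumes one string id column and f64 values, and uses Option<f64> in place of NullableFrame's null fill; Wide, first_occurrence and pivot_wider are hypothetical.

```rust
use std::collections::HashMap;

struct Wide<'a> {
    row_keys: Vec<&'a str>,
    col_keys: Vec<&'a str>,
    cells: Vec<Vec<Option<f64>>>, // cells[row][col]; None = missing combination
}

/// Distinct keys in first-occurrence order, plus key → position lookup.
fn first_occurrence<'a>(keys: &[&'a str]) -> (Vec<&'a str>, HashMap<&'a str, usize>) {
    let mut order = Vec::new();
    let mut pos = HashMap::new();
    for &k in keys {
        pos.entry(k).or_insert_with(|| {
            order.push(k);
            order.len() - 1
        });
    }
    (order, pos)
}

fn pivot_wider<'a>(ids: &[&'a str], names: &[&'a str], values: &[f64]) -> Result<Wide<'a>, String> {
    let (row_keys, row_of) = first_occurrence(ids);
    let (col_keys, col_of) = first_occurrence(names);
    let mut cells: Vec<Vec<Option<f64>>> = vec![vec![None; col_keys.len()]; row_keys.len()];
    for ((id, name), &v) in ids.iter().zip(names).zip(values) {
        let cell = &mut cells[row_of[id]][col_of[name]];
        if cell.is_some() {
            return Err(format!("duplicate key ({id}, {name})"));
        }
        *cell = Some(v);
    }
    Ok(Wide { row_keys, col_keys, cells })
}
```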

pub fn rename(&self, renames: &[(&str, &str)]) -> Result<TidyView, TidyError>

Rename columns: renames is a slice of (old_name, new_name).

Returns a new TidyView over a new base DataFrame with renamed columns.

Edge cases:
• Unknown old_name → TidyError::ColumnNotFound.
• new_name already exists (collision) → TidyError::DuplicateColumn.
• old_name == new_name → no-op for that pair.

pub fn relocate(&self, cols: &[&str], position: RelocatePos<'_>) -> Result<TidyView, TidyError>

Move cols to a new position: before or after an anchor column, or to the front or back of the projection.

• cols: columns to move.
• position: RelocatePos::Front, RelocatePos::Back, RelocatePos::Before(name), or RelocatePos::After(name).

Non-moved columns keep their relative order. Returns a new TidyView with updated projection.

Edge cases:
• Unknown column in cols → TidyError::ColumnNotFound.
• Unknown anchor column → TidyError::ColumnNotFound.
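
A sketch of the relocation rule over column names, with a hypothetical Pos enum mirroring the RelocatePos variants. Resolving the anchor among the non-moved columns (and so rejecting an anchor that is itself being moved) is an assumption of this sketch.

```rust
/// Hypothetical mirror of RelocatePos.
enum Pos<'a> {
    Front,
    Back,
    Before(&'a str),
    After(&'a str),
}

/// Move `cols` (in the order given) to `pos`; other columns keep their order.
fn relocate(proj: &[&str], cols: &[&str], pos: Pos<'_>) -> Result<Vec<String>, String> {
    for c in cols {
        if !proj.contains(c) {
            return Err(format!("column not found: {c}"));
        }
    }
    let rest: Vec<&str> = proj.iter().copied().filter(|c| !cols.contains(c)).collect();
    let anchor = |a: &str| {
        rest.iter()
            .position(|c| *c == a)
            .ok_or_else(|| format!("anchor column not found: {a}"))
    };
    let at = match pos {
        Pos::Front => 0,
        Pos::Back => rest.len(),
        Pos::Before(a) => anchor(a)?,
        Pos::After(a) => anchor(a)? + 1,
    };
    let mut out: Vec<String> = rest[..at].iter().map(|c| c.to_string()).collect();
    out.extend(cols.iter().map(|c| c.to_string()));
    out.extend(rest[at..].iter().map(|c| c.to_string()));
    Ok(out)
}
```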

pub fn drop_cols(&self, cols: &[&str]) -> Result<TidyView, TidyError>

Drop specified columns from the view (select-minus semantics).

Returns a new TidyView with those columns removed from the projection.

Edge cases:
• Unknown column → TidyError::ColumnNotFound.
• Dropping all columns → valid (0-col view).

pub fn bind_rows(&self, other: &TidyView) -> Result<TidyFrame, TidyError>

Concatenate rows from other onto self (strict schema match).

Both frames must have the same column names in the same order. Row order: self rows first, then other rows.

Edge cases:
• Column names differ → TidyError::Internal("schema mismatch: ...").
• other has zero rows → returns self’s rows (valid, no error).

pub fn bind_cols(&self, other: &TidyView) -> Result<TidyFrame, TidyError>

Concatenate columns from other onto self (strict row count match).

Both frames must have the same number of visible rows. Column order: self columns first, then other columns.

Edge cases:
• Row count mismatch → TidyError::LengthMismatch.
• Column name collision → TidyError::DuplicateColumn.

pub fn mutate_across(&self, specs: &[AcrossSpec]) -> Result<TidyFrame, TidyError>

Apply a transformation across multiple columns, adding/replacing each with a generated name {col}_{fn} (or a user-specified template).

Edge cases:
• Unknown column → TidyError::ColumnNotFound.
• Generated name collision → TidyError::DuplicateColumn.
• Empty cols list → no-op (returns materialized frame unchanged).

pub fn right_join(&self, right: &TidyView, on: &[(&str, &str)], suffix: &JoinSuffix) -> Result<NullableFrame, TidyError>

Right join: all rows from right, matched rows from self (left).

Output: left cols (nullable) + right cols. Row order: right outer loop order preserved. Unmatched right rows: left columns null-filled.

pub fn full_join(&self, right: &TidyView, on: &[(&str, &str)], suffix: &JoinSuffix) -> Result<NullableFrame, TidyError>

Full outer join: all rows from both sides; null-fill for unmatched.

Row order: left rows first (matched and unmatched), then unmatched right rows.
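
A sketch of the full-join row order on a single i64 key, returning (Option<left_row>, Option<right_row>) pairs where None marks a null-filled side; full_join_pairs is hypothetical. Emitting the unmatched right rows in ascending order is an assumption. Dropping the final loop gives the left-join order described for left_join.

```rust
fn full_join_pairs(left: &[i64], right: &[i64]) -> Vec<(Option<usize>, Option<usize>)> {
    let mut out = Vec::new();
    let mut right_matched = vec![false; right.len()];
    for (i, lk) in left.iter().enumerate() {
        let before = out.len();
        for (j, rk) in right.iter().enumerate() {
            if lk == rk {
                out.push((Some(i), Some(j)));
                right_matched[j] = true;
            }
        }
        if out.len() == before {
            out.push((Some(i), None)); // unmatched left row: right side null-filled
        }
    }
    for (j, &matched) in right_matched.iter().enumerate() {
        if !matched {
            out.push((None, Some(j))); // unmatched right row: left side null-filled
        }
    }
    out
}
```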

pub fn inner_join_typed(&self, right: &TidyView, on: &[(&str, &str)], suffix: &JoinSuffix) -> Result<TidyFrame, TidyError>

Inner join with type validation and collision suffix support.

Same semantics as inner_join, except that it:
• validates that join key types are compatible (Int/Float are widened; other types must match exactly).
• handles non-key column name collisions using suffix.

pub fn left_join_typed(&self, right: &TidyView, on: &[(&str, &str)], suffix: &JoinSuffix) -> Result<TidyFrame, TidyError>

Left join with type validation and collision suffix support.

impl TidyView

pub fn group_by_fast(&self, keys: &[&str]) -> Result<GroupedTidyView, TidyError>

Like group_by but uses the BTree-accelerated GroupIndex::build_fast.

Semantics and output are IDENTICAL to group_by; this is purely an internal performance upgrade. Tests should confirm identical output.

impl TidyView

pub fn fct_encode(&self, col: &str) -> Result<FctColumn, TidyError>

Encode a string column in this view into an FctColumn.

Only visible rows (per the mask) in the current projection are used. This is a materializing operation (it allocates a u16 code buffer), so it is NOT @nogc safe.

pub fn fct_summary_means(&self, fct: &FctColumn, numeric_col: &str) -> Result<Vec<f64>, TidyError>

Compute per-level mean of a numeric column for use with fct_reorder.

Returns a Vec of length fct.nlevels(), one mean per level. Levels with no matching rows get NaN.
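
A sketch of the per-level mean, assuming level codes are u16 indices in 0..nlevels (consistent with the u16 buffer fct_encode allocates) and that codes and values cover only the visible rows; level_means is hypothetical. An empty level yields 0.0 / 0.0, which is NaN under IEEE 754.

```rust
fn level_means(codes: &[u16], values: &[f64], nlevels: usize) -> Vec<f64> {
    let mut sums = vec![0.0; nlevels];
    let mut counts = vec![0usize; nlevels];
    for (&c, &v) in codes.iter().zip(values) {
        sums[c as usize] += v;
        counts[c as usize] += 1;
    }
    // 0.0 / 0.0 is NaN, so levels without rows report NaN.
    sums.iter().zip(&counts).map(|(s, &n)| s / n as f64).collect()
}
```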

Trait Implementations

impl Clone for TidyView

fn clone(&self) -> TidyView

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for TidyView

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.