pub struct TidyView { /* private fields */ }
A lazy, zero-allocation view over a base DataFrame.
Holds:
• base – shared reference to the underlying columnar data
• mask – bitmask of which rows are visible
• proj – ordered list of visible column indices
No column buffers are copied until materialize() / to_tensor() is called.
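The three-part layout described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: `Base`, `View`, the `Rc` sharing, and the `Vec<bool>` mask are stand-ins, not the crate's actual private fields.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for the base DataFrame: named f64 columns only.
struct Base {
    names: Vec<String>,
    cols: Vec<Vec<f64>>,
}

/// Hypothetical view: shared base, row mask, ordered column projection.
struct View {
    base: Rc<Base>,
    mask: Vec<bool>,  // one flag per base row (a real bitmask would pack these)
    proj: Vec<usize>, // indices into base.cols, in output order
}

impl View {
    /// Copy the visible rows of each projected column, in ascending row order.
    /// This is the only place where column data is duplicated.
    fn materialize(&self) -> Vec<(String, Vec<f64>)> {
        self.proj
            .iter()
            .map(|&c| {
                let data: Vec<f64> = self.base.cols[c]
                    .iter()
                    .zip(&self.mask)
                    .filter(|&(_, &keep)| keep)
                    .map(|(&v, _)| v)
                    .collect();
                (self.base.names[c].clone(), data)
            })
            .collect()
    }
}

/// Build a small view: rows 0 and 2 visible, columns projected as [b, a].
fn demo_view() -> View {
    let base = Rc::new(Base {
        names: vec!["a".to_string(), "b".to_string()],
        cols: vec![vec![1.0, 2.0, 3.0], vec![10.0, 20.0, 30.0]],
    });
    View { base, mask: vec![true, false, true], proj: vec![1, 0] }
}
```

Creating or narrowing such a view touches only `mask` and `proj`; the column buffers stay behind the shared pointer until `materialize` runs.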
Implementations
impl TidyView
pub fn column_names(&self) -> Vec<&str>
Names of projected columns in stable projection order.
pub fn filter(&self, predicate: &DExpr) -> Result<TidyView, TidyError>
Filter rows by a DExpr predicate.
Returns a new TidyView with a tighter bitmask (AND with existing mask).
Does NOT copy any column buffers.
Edge cases:
• 0-row base → empty mask returned, no panic.
• Non-bool predicate → TidyError::PredicateNotBool.
• Float NaN comparisons → deterministic: NaN != NaN (IEEE 754).
• Chained filters compose masks with AND without materializing.
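As a rough illustration of the mask behaviour listed above, the sketch below ANDs a predicate result into an existing mask. The helper names `tighten` and `gt` are hypothetical, and plain `Vec<bool>` stands in for the bitmask.

```rust
/// AND an existing row mask with a per-row predicate result.
fn tighten(mask: &[bool], pred: &[bool]) -> Vec<bool> {
    mask.iter().zip(pred).map(|(&m, &p)| m && p).collect()
}

/// Evaluate `col > threshold` per row. Any ordered comparison involving NaN
/// is false under IEEE 754, so NaN rows drop out deterministically.
fn gt(col: &[f64], threshold: f64) -> Vec<bool> {
    col.iter().map(|&x| x > threshold).collect()
}
```

Chaining two filters is then just two calls to `tighten`; no column data is read beyond evaluating the predicate.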
pub fn select(&self, cols: &[&str]) -> Result<TidyView, TidyError>
Project to a subset of named columns (in the given order).
Returns a new TidyView with an updated ProjectionMap.
No column buffers are copied.
Edge cases:
• 0 columns selected → valid empty-column view (no error).
• Unknown column → TidyError::ColumnNotFound.
• Duplicate column name in cols → TidyError::DuplicateColumn.
• Column ordering is exactly as supplied.
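The edge cases above could be checked along these lines. This is a sketch only: `SelectError` and `resolve_projection` are hypothetical names, and the order in which the two error checks run is an assumption, since the documentation does not say which wins when both apply.

```rust
#[derive(Debug, PartialEq)]
enum SelectError {
    ColumnNotFound(String),
    DuplicateColumn(String),
}

/// Resolve requested names to column indices, preserving the request order.
fn resolve_projection(schema: &[&str], cols: &[&str]) -> Result<Vec<usize>, SelectError> {
    let mut out = Vec::with_capacity(cols.len());
    for (i, name) in cols.iter().enumerate() {
        // A name already seen earlier in the request is a duplicate.
        if cols[..i].contains(name) {
            return Err(SelectError::DuplicateColumn(name.to_string()));
        }
        match schema.iter().position(|s| s == name) {
            Some(idx) => out.push(idx),
            None => return Err(SelectError::ColumnNotFound(name.to_string())),
        }
    }
    Ok(out)
}
```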
pub fn mutate(
    &self,
    assignments: &[(&str, DExpr)],
) -> Result<TidyFrame, TidyError>
Apply column-wise assignments and return a materialized TidyFrame.
assignments is an ordered list of (col_name, expr) pairs evaluated
left-to-right. Each assignment sees the snapshot of columns at entry
to the mutate call (snapshot semantics – new columns created in earlier
assignments are NOT visible to later assignments within the same call).
Semantics decisions:
• Existing column → overwritten (copy-on-write safe).
• New column → appended after existing projected columns.
• Scalar broadcasting → a scalar expr is broadcast to all visible rows.
• Mask-awareness: only masked-in rows are computed; masked-out rows in
the materialized output retain the base value (or zero for new cols).
• Type promotion: Int + Float → Float; Int overflow → wrapping.
• Multiple assignments with the same target name in one call → error.
• Mutate on masked view produces a materialized TidyFrame where
only visible rows are present (mask applied during materialization).
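Snapshot semantics are easy to misread, so here is a small model of them. It is a sketch under stated assumptions: `Cols` and `Expr` are hypothetical stand-ins for the frame and for `DExpr`, and masks, type promotion, and the duplicate-target error are left out.

```rust
/// Hypothetical column store: (name, values) pairs in projection order.
type Cols = Vec<(String, Vec<f64>)>;
/// Hypothetical stand-in for a DExpr: maps the snapshot to a new column.
type Expr = fn(&Cols) -> Vec<f64>;

/// Look up a column by name; panics if it is missing (sketch only).
fn col<'a>(cols: &'a Cols, name: &str) -> &'a [f64] {
    &cols.iter().find(|(n, _)| n == name).expect("unknown column").1
}

fn mutate(cols: &Cols, assignments: &[(&str, Expr)]) -> Cols {
    let mut out = cols.clone();
    for (name, expr) in assignments {
        // Every expression reads `cols` (the snapshot), never `out`.
        let values = expr(cols);
        match out.iter().position(|(n, _)| n == name) {
            Some(i) => out[i].1 = values,                 // overwrite existing
            None => out.push((name.to_string(), values)), // append new
        }
    }
    out
}

// Two example expressions used to show the snapshot rule.
fn double_a(c: &Cols) -> Vec<f64> {
    col(c, "a").iter().map(|x| x * 2.0).collect()
}
fn b_plus_one(c: &Cols) -> Vec<f64> {
    col(c, "b").iter().map(|x| x + 1.0).collect()
}
```

With `a = [1, 2]` and `b = [10, 20]`, assigning `b = double_a` and then `c = b_plus_one` in one call gives `c = [11, 21]`: the second expression still sees the `b` that existed on entry.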
pub fn materialize(&self) -> Result<DataFrame, TidyError>
Materialize the view into a new DataFrame (applies mask + projection).
Triggers exactly one allocation per visible column buffer.
Rows are emitted in ascending index order (stable/deterministic).
Edge cases:
• Empty rows → 0-row DataFrame.
• Empty cols → 0-column DataFrame.
• Row-major iteration is stable.
pub fn to_tensor(&self, col_names: &[&str]) -> Result<Tensor, TidyError>
Convert visible numeric columns to a tensor (row-major).
Only Float and Int columns are supported.
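"Row-major" here means that all values of row 0 come first, then row 1, and so on. A minimal sketch of that layout follows; `to_row_major` is a hypothetical helper, it handles only `f64` columns, and how the crate converts Int columns is not documented, so that step is omitted.

```rust
/// Flatten equally sized columns into a row-major buffer plus (rows, cols).
fn to_row_major(cols: &[Vec<f64>]) -> (Vec<f64>, (usize, usize)) {
    let ncols = cols.len();
    let nrows = cols.first().map_or(0, |c| c.len());
    let mut data = Vec::with_capacity(nrows * ncols);
    for r in 0..nrows {
        // Emit one full row before moving on to the next.
        for c in cols {
            data.push(c[r]);
        }
    }
    (data, (nrows, ncols))
}
```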
pub fn proj(&self) -> &ProjectionMap
Access the underlying projection (for testing/inspection).
pub fn base_column(&self, name: &str) -> Option<&Column>
Access a column from the underlying base DataFrame by name.
Returns the raw Column (full length, unmasked) – callers must apply
the mask themselves if needed. Used by fct_summary_means and similar.
impl TidyView
pub fn group_by(&self, keys: &[&str]) -> Result<GroupedTidyView, TidyError>
Group the view by one or more column names.
Returns a GroupedTidyView. No column buffers are copied.
Group order = first-occurrence order of (key_col1, key_col2, …) tuples
among the currently visible rows (ascending base-row scan).
Edge cases:
• 0 rows → 0 groups, no error.
• 0 keys → every visible row becomes one group (equivalent to a
global aggregate).
• Unknown key column → TidyError::ColumnNotFound.
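First-occurrence group order can be illustrated with a single integer key. This is a sketch, not the crate's GroupIndex: `group_rows` is a hypothetical helper, and the linear lookup is chosen for clarity rather than speed.

```rust
/// Returns (key, base-row indices) per group, groups in first-occurrence
/// order among the visible rows, scanning base rows in ascending order.
fn group_rows(keys: &[i64], mask: &[bool]) -> Vec<(i64, Vec<usize>)> {
    let mut groups: Vec<(i64, Vec<usize>)> = Vec::new();
    for (row, (&k, &visible)) in keys.iter().zip(mask).enumerate() {
        if !visible {
            continue; // masked-out rows never create or join a group
        }
        match groups.iter().position(|(gk, _)| *gk == k) {
            Some(g) => groups[g].1.push(row),
            None => groups.push((k, vec![row])),
        }
    }
    groups
}
```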
pub fn arrange(&self, keys: &[ArrangeKey]) -> Result<TidyView, TidyError>
Sort visible rows by one or more ArrangeKeys.
Returns a new TidyView whose visible rows appear in sorted order.
Design: arrange materializes a RowIndexMap (a sorted permutation of the
visible row indices), then re-encodes it into a new base DataFrame
containing only those rows, in sorted order. A bitmask alone cannot
express row order, so this re-encoding is what lets all subsequent
mask-based operations work correctly.
Semantics:
• Stable sort: equal-key rows keep their original relative order.
• NaN sorting: NaN values sort LAST (greater than any finite value).
• Multi-key: sort by key[0] first, then key[1], … (left-to-right).
• Unknown column → TidyError::ColumnNotFound.
• Non-numeric sort of Float col: allowed (NaN last).
• Mixed-type sort across columns is column-by-column (each col has one type).
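The stable, NaN-last, multi-key ordering above can be sketched with the standard library's stable sort. The names `cmp_nan_last` and `sorted_rows` are hypothetical, only ascending Float keys are shown, and the fields of ArrangeKey are not modelled.

```rust
use std::cmp::Ordering;

/// Total order on f64 for sorting: NaN is greater than every non-NaN value.
fn cmp_nan_last(a: f64, b: f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        (false, false) => a.partial_cmp(&b).unwrap(), // safe: neither is NaN
    }
}

/// Sorted permutation of the visible row indices by k0, then k1.
fn sorted_rows(visible: &[usize], k0: &[f64], k1: &[f64]) -> Vec<usize> {
    let mut rows = visible.to_vec();
    // slice::sort_by is stable, so full ties keep their original order.
    rows.sort_by(|&a, &b| {
        cmp_nan_last(k0[a], k0[b]).then_with(|| cmp_nan_last(k1[a], k1[b]))
    });
    rows
}
```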
pub fn slice(&self, start: usize, end: usize) -> TidyView
Select rows by a half-open range [start, end) of visible-row positions.
Positions are relative to the current visible rows (0-based).
Out-of-bounds: clamped to [0, nrows].
pub fn slice_head(&self, n: usize) -> TidyView
Select the first n visible rows (clamped to nrows).
pub fn slice_tail(&self, n: usize) -> TidyView
Select the last n visible rows (clamped to nrows).
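The clamping rules for slice, slice_head, and slice_tail can be modelled over a list of visible base-row indices. This is a sketch with hypothetical helper names; treating `start > end` as an empty slice is an assumption, since the documentation only states that bounds are clamped.

```rust
/// Half-open range over visible-row positions, clamped to [0, nrows].
fn slice_rows(visible: &[usize], start: usize, end: usize) -> &[usize] {
    let end = end.min(visible.len());
    let start = start.min(end); // assumption: start > end yields an empty slice
    &visible[start..end]
}

/// First n visible rows (clamped).
fn head(visible: &[usize], n: usize) -> &[usize] {
    slice_rows(visible, 0, n)
}

/// Last n visible rows (clamped).
fn tail(visible: &[usize], n: usize) -> &[usize] {
    slice_rows(visible, visible.len().saturating_sub(n), visible.len())
}
```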
pub fn slice_sample(&self, n: usize, seed: u64) -> TidyView
Deterministic random sample of n visible rows using an LCG with seed.
If n >= nrows, returns all visible rows in their original order (no error).
Sampling uses a Knuth shuffle variant seeded by seed (deterministic LCG).
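One way to read "Knuth shuffle variant seeded by an LCG" is a partial Fisher-Yates shuffle that stops after n swaps. The sketch below rests on assumptions: the LCG constants are Knuth's MMIX values picked for illustration (the crate's constants are not documented), the modulo step carries a small bias, and the order of the returned sample is unspecified in the documentation.

```rust
/// One LCG step (MMIX constants, assumed for illustration only).
/// Returns the high bits, which are better distributed than the low ones.
fn lcg_next(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Deterministic sample of n visible rows for a given seed.
fn sample_rows(visible: &[usize], n: usize, seed: u64) -> Vec<usize> {
    if n >= visible.len() {
        return visible.to_vec(); // all rows, original order
    }
    let mut rows = visible.to_vec();
    let mut state = seed;
    // Partial Fisher-Yates: fix positions 0..n one at a time.
    for i in 0..n {
        let j = i + (lcg_next(&mut state) as usize) % (rows.len() - i);
        rows.swap(i, j);
    }
    rows.truncate(n);
    rows
}
```

Because the generator state depends only on `seed`, the same seed and input always select the same rows.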
pub fn distinct(&self, cols: &[&str]) -> Result<TidyView, TidyError>
Return rows with unique combinations of the specified columns.
Output ordering: first-occurrence order (the first row with each distinct key combination is kept).
Edge cases:
• 0 key columns → keeps first row only (all rows equal on zero keys).
• Unknown column → TidyError::ColumnNotFound.
• After projection/mask: only visible columns/rows are considered.
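A minimal model of first-occurrence deduplication, assuming integer key columns and a hypothetical `distinct_rows` helper. It also shows why zero key columns keep only the first row: every row maps to the same empty key.

```rust
use std::collections::HashSet;

/// Keep the first visible row for each distinct key tuple.
fn distinct_rows(visible: &[usize], key_cols: &[&[i64]]) -> Vec<usize> {
    let mut seen: HashSet<Vec<i64>> = HashSet::new();
    visible
        .iter()
        .copied()
        // HashSet::insert returns false when the key tuple was already seen.
        .filter(|&row| seen.insert(key_cols.iter().map(|c| c[row]).collect()))
        .collect()
}
```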
pub fn inner_join(
    &self,
    right: &TidyView,
    on: &[(&str, &str)],
) -> Result<TidyFrame, TidyError>
Inner join: rows where all on key columns match.
Output: left columns then right columns (excluding duplicate key cols).
Row order: left outer loop (preserves left order), right inner ascending.
Produces a materialized TidyFrame (joins always materialize).
Edge cases:
• Unknown join key → TidyError::ColumnNotFound.
• on empty → cross join semantics (every left × every right).
• Duplicate keys on left or right → all matching pairs included.
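The documented row order (left outer loop, right inner ascending) is what a plain nested loop produces. The sketch below uses one for clarity with a single integer key; `inner_join_pairs` is a hypothetical helper, and the crate may well use an index internally while emitting the same order.

```rust
/// Matching (left_row, right_row) pairs for a single-key equi-join.
fn inner_join_pairs(left: &[i64], right: &[i64]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (l, lk) in left.iter().enumerate() {
        for (r, rk) in right.iter().enumerate() {
            if lk == rk {
                pairs.push((l, r)); // duplicates on either side all pair up
            }
        }
    }
    pairs
}
```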
pub fn left_join(
    &self,
    right: &TidyView,
    on: &[(&str, &str)],
) -> Result<TidyFrame, TidyError>
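Left join: keeps every left row, paired with its matching right rows; where there is no match, the right columns are filled with type defaults standing in for null (0, 0.0, "", false).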
Row order: left outer loop order preserved, right matches ascending.
impl TidyView
pub fn pivot_longer(
    &self,
    value_cols: &[&str],
    names_to: &str,
    values_to: &str,
) -> Result<TidyFrame, TidyError>
Pivot selected columns from wide to long format.
value_cols: columns to gather (must all have the same type).
names_to: name of the output “variable name” column.
values_to: name of the output “value” column.
Output schema: [id_cols…, names_to, values_to]
Row order: for each source row (in visible order), one output row per
value column (in the order they appear in value_cols).
Edge cases:
• value_cols empty → TidyError::EmptySelection.
• Unknown column → TidyError::ColumnNotFound.
• Duplicate in value_cols → TidyError::DuplicateColumn.
• Mixed types in value_cols → TidyError::TypeMismatch.
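The output row order is easiest to see on a tiny example. In this sketch a single integer `ids` column plays the role of the id columns, and the function name and tuple output are illustrative assumptions rather than the crate's types.

```rust
/// Gather value columns into (id, name, value) rows: for each source row,
/// one output row per value column, in the order the columns were given.
fn pivot_longer(ids: &[i64], value_cols: &[(&str, &[f64])]) -> Vec<(i64, String, f64)> {
    let mut out = Vec::with_capacity(ids.len() * value_cols.len());
    for (row, &id) in ids.iter().enumerate() {
        for (name, values) in value_cols {
            out.push((id, name.to_string(), values[row]));
        }
    }
    out
}
```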
pub fn pivot_wider(
    &self,
    id_cols: &[&str],
    names_from: &str,
    values_from: &str,
) -> Result<NullableFrame, TidyError>
Pivot long-format data to wide format.
names_from: the column whose values become new column headers.
values_from: the column whose values fill the new columns.
id_cols: columns that identify each output row.
Output schema: [id_cols…, unique_key_values… (first-occurrence order)]
Row order: one row per unique combination of id_col values (first-occurrence order).
Edge cases:
• Duplicate (id_key, name_key) combo → TidyError::DuplicateKey.
• Missing combo → null fill via NullableFrame.
• Unknown column → TidyError::ColumnNotFound.
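The inverse reshaping, including null fill and the duplicate-key error, might look like the following. It is a sketch under assumptions: one integer id column, `Option<f64>` standing in for NullableFrame cells, and hypothetical names `Wide`, `DuplicateKey`, and `pivot_wider`.

```rust
#[derive(Debug, PartialEq)]
struct DuplicateKey;

/// (headers, rows): each row is an id plus one nullable cell per header.
type Wide = (Vec<String>, Vec<(i64, Vec<Option<f64>>)>);

fn pivot_wider(rows: &[(i64, &str, f64)]) -> Result<Wide, DuplicateKey> {
    let mut headers: Vec<String> = Vec::new();
    let mut out: Vec<(i64, Vec<Option<f64>>)> = Vec::new();
    for &(id, name, value) in rows {
        // Column and row slots are created in first-occurrence order.
        let c = match headers.iter().position(|h| h == name) {
            Some(c) => c,
            None => {
                headers.push(name.to_string());
                headers.len() - 1
            }
        };
        let r = match out.iter().position(|(k, _)| *k == id) {
            Some(r) => r,
            None => {
                out.push((id, Vec::new()));
                out.len() - 1
            }
        };
        let cells = &mut out[r].1;
        if cells.len() <= c {
            cells.resize(c + 1, None);
        }
        if cells[c].is_some() {
            return Err(DuplicateKey); // same (id, name) seen twice
        }
        cells[c] = Some(value);
    }
    // Pad every row to full width so missing combinations read as None.
    let width = headers.len();
    for (_, cells) in &mut out {
        cells.resize(width, None);
    }
    Ok((headers, out))
}
```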
pub fn rename(&self, renames: &[(&str, &str)]) -> Result<TidyView, TidyError>
Rename columns: renames is a slice of (old_name, new_name).
Returns a new TidyView over a new base DataFrame with renamed columns.
Edge cases:
• Unknown old_name → TidyError::ColumnNotFound.
• new_name already exists (collision) → TidyError::DuplicateColumn.
• old_name == new_name → no-op for that pair.
pub fn relocate(
    &self,
    cols: &[&str],
    position: RelocatePos<'_>,
) -> Result<TidyView, TidyError>
Reorder columns so that cols appear at position before or after
another column, or at the front/back.
cols: columns to move.
position: RelocatePos::Front, Back, Before(name), After(name).
Non-moved columns keep their relative order.
Returns a new TidyView with updated projection.
Edge cases:
• Unknown column in cols → TidyError::ColumnNotFound.
• Unknown anchor column → TidyError::ColumnNotFound.
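The reordering rule can be sketched on plain name lists. `Pos` mirrors the four documented RelocatePos variants but is a local stand-in, `None` stands in for TidyError::ColumnNotFound, and rejecting an anchor that is itself being moved is an assumption the documentation does not address.

```rust
enum Pos<'a> {
    Front,
    Back,
    Before(&'a str),
    After(&'a str),
}

/// Move `cols` to `pos`; columns that are not moved keep their relative order.
fn relocate(schema: &[&str], cols: &[&str], pos: Pos<'_>) -> Option<Vec<String>> {
    if cols.iter().any(|c| !schema.contains(c)) {
        return None; // unknown column in `cols`
    }
    // Columns that stay put, in their original relative order.
    let rest: Vec<&str> = schema.iter().copied().filter(|s| !cols.contains(s)).collect();
    // Insertion point within `rest`; `?` returns None for an unknown anchor.
    let at = match pos {
        Pos::Front => 0,
        Pos::Back => rest.len(),
        Pos::Before(a) => rest.iter().position(|s| *s == a)?,
        Pos::After(a) => rest.iter().position(|s| *s == a)? + 1,
    };
    let mut out: Vec<&str> = rest[..at].to_vec();
    out.extend_from_slice(cols);
    out.extend_from_slice(&rest[at..]);
    Some(out.into_iter().map(String::from).collect())
}
```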
pub fn drop_cols(&self, cols: &[&str]) -> Result<TidyView, TidyError>
Drop specified columns from the view (select-minus semantics).
Returns a new TidyView with those columns removed from the projection.
Edge cases:
• Unknown column → TidyError::ColumnNotFound.
• Dropping all columns → valid (0-col view).
pub fn bind_rows(&self, other: &TidyView) -> Result<TidyFrame, TidyError>
Concatenate rows from other onto self (strict schema match).
Both frames must have the same column names in the same order.
Row order: self rows first, then other rows.
Edge cases:
• Column names differ → TidyError::Internal("schema mismatch: ...").
• other has zero rows → returns self’s rows (valid, no error).
pub fn bind_cols(&self, other: &TidyView) -> Result<TidyFrame, TidyError>
Concatenate columns from other onto self (strict row count match).
Both frames must have the same number of visible rows.
Column order: self columns first, then other columns.
Edge cases:
• Row count mismatch → TidyError::LengthMismatch.
• Column name collision → TidyError::DuplicateColumn.
pub fn mutate_across(
    &self,
    specs: &[AcrossSpec],
) -> Result<TidyFrame, TidyError>
Apply a transformation across multiple columns, adding/replacing each
with a generated name {col}_{fn} (or a user-specified template).
Edge cases:
• Unknown column → TidyError::ColumnNotFound.
• Generated name collision → TidyError::DuplicateColumn.
• Empty cols list → no-op (returns materialized frame unchanged).
pub fn right_join(
    &self,
    right: &TidyView,
    on: &[(&str, &str)],
    suffix: &JoinSuffix,
) -> Result<NullableFrame, TidyError>
Right join: all rows from right, matched rows from self (left).
Output: left cols (nullable) + right cols.
Row order: right outer loop order preserved.
Unmatched right rows: left columns null-filled.
pub fn full_join(
    &self,
    right: &TidyView,
    on: &[(&str, &str)],
    suffix: &JoinSuffix,
) -> Result<NullableFrame, TidyError>
Full outer join: all rows from both sides; null-fill for unmatched.
Row order: left rows first (matched and unmatched), then unmatched right rows.
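The full-join row order can be sketched as a left pass followed by a sweep over right rows that never matched. This is a single-key illustration with a hypothetical `full_join_pairs` helper; `None` marks the side that would be null-filled, and the order of matches within one left row is assumed to be right-ascending, as documented for the other joins.

```rust
/// (left_row, right_row) pairs; None marks the null-filled side.
fn full_join_pairs(left: &[i64], right: &[i64]) -> Vec<(Option<usize>, Option<usize>)> {
    let mut right_matched = vec![false; right.len()];
    let mut out = Vec::new();
    // Pass 1: every left row, matched or not, in left order.
    for (l, lk) in left.iter().enumerate() {
        let mut any = false;
        for (r, rk) in right.iter().enumerate() {
            if lk == rk {
                out.push((Some(l), Some(r)));
                right_matched[r] = true;
                any = true;
            }
        }
        if !any {
            out.push((Some(l), None));
        }
    }
    // Pass 2: right rows that matched nothing, in right order.
    for (r, matched) in right_matched.iter().enumerate() {
        if !matched {
            out.push((None, Some(r)));
        }
    }
    out
}
```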
pub fn inner_join_typed(
    &self,
    right: &TidyView,
    on: &[(&str, &str)],
    suffix: &JoinSuffix,
) -> Result<TidyFrame, TidyError>
Inner join with type validation and collision suffix support.
Same semantics as inner_join but:
• validates join key types are compatible (Int/Float widened, others exact).
• handles non-key column name collisions using suffix.
pub fn left_join_typed(
    &self,
    right: &TidyView,
    on: &[(&str, &str)],
    suffix: &JoinSuffix,
) -> Result<TidyFrame, TidyError>
Left join with type validation and collision suffix support.
impl TidyView
pub fn group_by_fast(&self, keys: &[&str]) -> Result<GroupedTidyView, TidyError>
Like group_by but uses the BTree-accelerated GroupIndex::build_fast.
Semantics and output are IDENTICAL to group_by; this is purely an
internal performance upgrade. Tests should confirm identical output.
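The kind of equivalence test suggested above could be modelled as follows. Both functions are hypothetical stand-ins: the internals of GroupIndex::build_fast are not documented, so the `BTreeMap` lookup here only illustrates how an ordered-map index can speed up group lookup while leaving first-occurrence output order untouched.

```rust
use std::collections::BTreeMap;

/// Reference grouping: linear search for the group of each visible row.
fn group_rows(keys: &[i64], mask: &[bool]) -> Vec<(i64, Vec<usize>)> {
    let mut groups: Vec<(i64, Vec<usize>)> = Vec::new();
    for (row, (&k, &visible)) in keys.iter().zip(mask).enumerate() {
        if !visible {
            continue;
        }
        match groups.iter().position(|(gk, _)| *gk == k) {
            Some(g) => groups[g].1.push(row),
            None => groups.push((k, vec![row])),
        }
    }
    groups
}

/// Same output, but the key -> group-slot lookup goes through a BTreeMap.
fn group_rows_fast(keys: &[i64], mask: &[bool]) -> Vec<(i64, Vec<usize>)> {
    let mut slot: BTreeMap<i64, usize> = BTreeMap::new();
    let mut groups: Vec<(i64, Vec<usize>)> = Vec::new();
    for (row, (&k, &visible)) in keys.iter().zip(mask).enumerate() {
        if !visible {
            continue;
        }
        // New keys get the next slot, so group order stays first-occurrence.
        let idx = *slot.entry(k).or_insert_with(|| {
            groups.push((k, Vec::new()));
            groups.len() - 1
        });
        groups[idx].1.push(row);
    }
    groups
}
```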
Trait Implementations
Auto Trait Implementations
impl Freeze for TidyView
impl RefUnwindSafe for TidyView
impl !Send for TidyView
impl !Sync for TidyView
impl Unpin for TidyView
impl UnsafeUnpin for TidyView
impl UnwindSafe for TidyView
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.