CJC Data DSL – Typed expression trees, logical plans, plan optimizer, and column-buffer kernel execution.
This implements the tidyverse-inspired data pipeline:
df |> filter(col("age") > 18) |> group_by("dept") |> summarize(avg_salary = mean(col("salary")))
Modules
- agg_kernels - Specialized aggregate kernels for grouped operations.
- column_meta - Column-level metadata for query optimization.
- dict_encoding - Stable dictionary encoding for string columns.
- lazy - Lazy evaluation IR for TidyView.
- tidy_dispatch - Shared tidy dispatch: maps CJC language method calls on TidyView / GroupedTidyView values to the concrete cjc_data API.
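For orientation, the pipeline in the crate description (filter on age, group by dept, mean salary per group) can be sketched with the standard library alone. This is a rough illustration of what such a pipeline computes, not the cjc_data API: the `Row` struct and the `avg_salary_by_dept` function are hypothetical names introduced only for this example, and the sorted group order is a choice of the sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical row type for illustration; not part of cjc_data.
#[derive(Debug, Clone)]
struct Row {
    age: u32,
    dept: String,
    salary: f64,
}

/// Mirrors filter(age > 18) |> group_by(dept) |> summarize(mean(salary)).
/// A BTreeMap keeps groups in sorted key order, so the output is deterministic.
fn avg_salary_by_dept(rows: &[Row]) -> BTreeMap<String, f64> {
    // Accumulate (sum, count) per department for rows that pass the filter.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for row in rows.iter().filter(|r| r.age > 18) {
        let entry = acc.entry(row.dept.clone()).or_insert((0.0, 0));
        entry.0 += row.salary;
        entry.1 += 1;
    }
    // Turn each (sum, count) pair into a mean.
    acc.into_iter()
        .map(|(dept, (sum, n))| (dept, sum / n as f64))
        .collect()
}
```

Calling `avg_salary_by_dept` on a slice of rows returns one mean per department, with rows aged 18 or under excluded before grouping.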
Structs
- AcrossSpec - An across() specification: select columns and apply one function.
- AcrossTransform - A named across transformation.
- ArrangeKey - One sorting key for arrange.
- BitMask - A compact, word-aligned bitmask over nrows rows.
- CsvConfig - Configuration for CsvReader.
- CsvReader - Zero-copy CSV parser for byte slices.
- DataFrame - A columnar DataFrame.
- FctColumn - A compact categorical column: stores u16 indices into a levels table.
- GroupIndex - A deterministic group index built from a TidyView.
- GroupMeta - Metadata for one group in a GroupIndex.
- GroupedTidyView - A grouped view produced by TidyView::group_by(...).
- JoinSuffix - Join suffix options for handling column name collisions in inner/left/right/full joins.
- NullableColumn - A nullable column: values buffer + validity bitmap.
- NullableFactor - A FctColumn with a validity bitmap. Null cells have validity=false. Null is NOT a level; data[i] for a null row is 0 (sentinel, must not be used).
- NullableFrame - A DataFrame-like frame that can hold nullable columns. Used as output of joins, pivots, and bind operations that may introduce nulls.
- Pipeline - Fluent builder for data pipelines.
- ProjectionMap - A stable ordered list of column indices into the base DataFrame.
- RowIndexMap - A permutation / selection vector over base-frame row indices.
- StreamingCsvProcessor - A streaming CSV processor that visits rows one at a time without materializing the full DataFrame.
- TidyFrame - A materialized, mutable DataFrame with copy-on-write alias safety.
- TidyView - A lazy, zero-allocation view over a base DataFrame.
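The BitMask and NullableColumn entries describe a word-aligned bitmask and a values buffer paired with a validity bitmap. A minimal sketch of that general technique follows, assuming 64-bit words and a simplified f64 column. `SketchBitMask` and `SketchNullableF64` are hypothetical names; the crate's actual types may differ in layout and API.

```rust
/// Hypothetical bitmask: one bit per row, packed into 64-bit words.
struct SketchBitMask {
    words: Vec<u64>,
    nrows: usize,
}

impl SketchBitMask {
    /// All bits start cleared; the word count is nrows / 64, rounded up.
    fn new(nrows: usize) -> Self {
        SketchBitMask { words: vec![0; (nrows + 63) / 64], nrows }
    }

    fn set(&mut self, i: usize) {
        assert!(i < self.nrows, "row index out of range");
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    fn get(&self, i: usize) -> bool {
        assert!(i < self.nrows, "row index out of range");
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Number of set bits across all words.
    fn count_ones(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}

/// Hypothetical nullable column: a dense values buffer plus a validity mask.
struct SketchNullableF64 {
    values: Vec<f64>,
    validity: SketchBitMask,
}

impl SketchNullableF64 {
    fn from_options(data: &[Option<f64>]) -> Self {
        let mut validity = SketchBitMask::new(data.len());
        let mut values = Vec::with_capacity(data.len());
        for (i, v) in data.iter().enumerate() {
            match v {
                Some(x) => {
                    values.push(*x);
                    validity.set(i);
                }
                // Null slots hold a placeholder that must not be read directly.
                None => values.push(0.0),
            }
        }
        SketchNullableF64 { values, validity }
    }

    /// Reads a cell, consulting the validity bit before the values buffer.
    fn get(&self, i: usize) -> Option<f64> {
        if self.validity.get(i) { Some(self.values[i]) } else { None }
    }
}
```

Keeping the values buffer dense means null rows still occupy a slot, which is why the NullableFactor entry warns that the stored value for a null row is a sentinel rather than real data.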
Enums
- AggFunc - Aggregation function for use in summarize expressions.
- Column - A single column in a DataFrame.
- DBinOp - Binary operator for Data DSL expressions.
- DExpr - An expression in the Data DSL.
- DataError - Errors from DataFrame operations (plan execution, joins, tensor conversion).
- LogicalPlan - A logical query plan node.
- NullCol - A typed nullable column variant stored in a DataFrame column slot.
- RelocatePos - Position specifier for TidyView::relocate.
- TidyAgg - An aggregator expression for use in summarise.
- TidyError - Errors produced by Phase 10 tidy operations.
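DExpr and DBinOp are listed as the expression type and its binary operators. The general idea of an expression tree evaluated against a row can be sketched as below. `Expr`, `BinOp`, `Value`, and `eval` are hypothetical stand-ins written for this example; the variants and evaluation strategy of the crate's own enums are not shown in this index and may differ.

```rust
use std::collections::HashMap;

/// Hypothetical scalar value; a real DSL would likely support more types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(f64),
    Bool(bool),
}

/// Hypothetical binary operators (only two, to keep the sketch short).
#[derive(Debug, Clone, Copy)]
enum BinOp {
    Add,
    Gt,
}

/// Hypothetical expression tree: column reference, literal, or binary node.
#[derive(Debug, Clone)]
enum Expr {
    Col(String),
    Lit(f64),
    Bin(Box<Expr>, BinOp, Box<Expr>),
}

/// Evaluates an expression against one row (column name -> numeric value).
/// Returns None for an unknown column or an operand type mismatch.
fn eval(expr: &Expr, row: &HashMap<String, f64>) -> Option<Value> {
    match expr {
        Expr::Col(name) => row.get(name).map(|v| Value::Num(*v)),
        Expr::Lit(x) => Some(Value::Num(*x)),
        Expr::Bin(lhs, op, rhs) => {
            // Evaluate both children first, then combine them.
            let l = eval(lhs, row)?;
            let r = eval(rhs, row)?;
            match (l, *op, r) {
                (Value::Num(a), BinOp::Add, Value::Num(b)) => Some(Value::Num(a + b)),
                (Value::Num(a), BinOp::Gt, Value::Num(b)) => Some(Value::Bool(a > b)),
                _ => None,
            }
        }
    }
}
```

In this sketch, a predicate such as col("age") > 18 would be the tree `Bin(Col("age"), Gt, Lit(18.0))`, which evaluates to a boolean for each row.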
Functions
- execute - Execute a logical plan against in-memory data.
- label_encode - Convert a string slice into a categorical encoding with sorted unique levels and integer codes.
- one_hot_encode - One-hot encode a categorical column into multiple boolean columns.
- optimize - Optimize a logical plan.
- ordinal_encode - Convert a string slice into a categorical encoding with a user-specified level order.
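The label_encode and one_hot_encode descriptions above can be illustrated with a standard-library sketch of the two encodings. `label_encode_sketch` and `one_hot_sketch` are hypothetical helpers written for this example; the signatures and return types of the crate's real functions are not shown in this index and may differ.

```rust
/// Returns (sorted unique levels, one integer code per input value).
fn label_encode_sketch(values: &[&str]) -> (Vec<String>, Vec<usize>) {
    // Sort and deduplicate to get a stable level table.
    let mut levels: Vec<&str> = values.to_vec();
    levels.sort_unstable();
    levels.dedup();
    // Each code is the position of the value in the sorted level table.
    let codes = values
        .iter()
        .map(|v| levels.binary_search(v).expect("every value is a level"))
        .collect();
    (levels.into_iter().map(String::from).collect(), codes)
}

/// Builds one boolean column per level; column k is true where code == k.
fn one_hot_sketch(codes: &[usize], n_levels: usize) -> Vec<Vec<bool>> {
    (0..n_levels)
        .map(|k| codes.iter().map(|&c| c == k).collect())
        .collect()
}
```

An ordinal encoding in the sense of ordinal_encode would differ only in where the level table comes from: the caller supplies the order instead of the sort step.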
Type Aliases
- AcrossFn - A scalar transformation function for across().