Crate cjc_data

CJC Data DSL – Typed expression trees, logical plans, plan optimizer, and column-buffer kernel execution.

This crate implements a tidyverse-inspired data pipeline, for example:

df |> filter(col("age") > 18) |> group_by("dept") |> summarize(avg_salary = mean(col("salary")))
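
To make the semantics of that pipeline concrete, the following is a minimal sketch in plain Rust (standard library only) of what filter, then group_by, then summarize computes. It does not use the cjc_data API: the `Row` type and `avg_salary_by_dept` function are hypothetical names introduced for illustration, and emitting groups in sorted key order is an assumption of the sketch, not a documented property of the crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical row type for illustration; not part of cjc_data.
#[derive(Debug, Clone)]
pub struct Row {
    pub age: i64,
    pub dept: String,
    pub salary: f64,
}

/// Row-at-a-time equivalent of:
/// filter(age > 18) |> group_by(dept) |> summarize(avg_salary = mean(salary))
pub fn avg_salary_by_dept(rows: &[Row]) -> Vec<(String, f64)> {
    // BTreeMap iterates in sorted key order, so the output is deterministic.
    // Each entry holds (running sum of salaries, number of rows in the group).
    let mut acc: BTreeMap<&str, (f64, usize)> = BTreeMap::new();
    for r in rows.iter().filter(|r| r.age > 18) {
        let entry = acc.entry(r.dept.as_str()).or_insert((0.0, 0));
        entry.0 += r.salary;
        entry.1 += 1;
    }
    // A group only exists if at least one row reached it, so n is never 0.
    acc.into_iter()
        .map(|(dept, (sum, n))| (dept.to_string(), sum / n as f64))
        .collect()
}
```

A columnar engine would evaluate the same plan over column buffers rather than row structs; the sketch only pins down the result each stage is expected to produce.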

Modules

agg_kernels
Specialized aggregate kernels for grouped operations.
column_meta
Column-level metadata for query optimization.
dict_encoding
Stable dictionary encoding for string columns.
lazy
Lazy evaluation IR for TidyView.
tidy_dispatch
Shared tidy dispatch: maps CJC language method calls on TidyView / GroupedTidyView values to the concrete cjc_data API.

Structs

AcrossSpec
An across() specification: select columns and apply one function.
AcrossTransform
A named across transformation.
ArrangeKey
One sorting key for arrange.
BitMask
A compact, word-aligned bitmask over nrows rows.
CsvConfig
Configuration for CsvReader.
CsvReader
Zero-copy CSV parser for byte slices.
DataFrame
A columnar DataFrame.
FctColumn
A compact categorical column: stores u16 indices into a levels table.
GroupIndex
A deterministic group index built from a TidyView.
GroupMeta
Metadata for one group in a GroupIndex.
GroupedTidyView
A grouped view produced by TidyView::group_by(...).
JoinSuffix
Join suffix options for handling column name collisions in inner/left/right/full joins.
NullableColumn
A nullable column: values buffer + validity bitmap.
NullableFactor
A FctColumn with a validity bitmap. Null cells have validity=false. Null is NOT a level; data[i] for a null row is 0 (sentinel, must not be used).
NullableFrame
A DataFrame-like frame that can hold nullable columns. Used as output of joins, pivots, and bind operations that may introduce nulls.
Pipeline
Fluent builder for data pipelines.
ProjectionMap
A stable ordered list of column indices into the base DataFrame.
RowIndexMap
A permutation / selection vector over base-frame row indices.
StreamingCsvProcessor
A streaming CSV processor that visits rows one at a time without materializing the full DataFrame.
TidyFrame
A materialized, mutable DataFrame with copy-on-write alias safety.
TidyView
A lazy, zero-allocation view over a base DataFrame.
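
Two of the layouts described above, a word-aligned bitmask over rows and a nullable column built from a values buffer plus a validity bitmap, can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `SketchMask` and `SketchNullable` are hypothetical names, the 64-bit word size is assumed, and storing `T::default()` in null slots is a choice made for the sketch.

```rust
/// Hypothetical stand-in for a word-aligned bitmask; not cjc_data's BitMask.
pub struct SketchMask {
    words: Vec<u64>, // one bit per row, packed 64 rows per word (assumed width)
    nrows: usize,
}

impl SketchMask {
    /// A mask of `nrows` bits, all cleared.
    pub fn all_false(nrows: usize) -> Self {
        SketchMask { words: vec![0; (nrows + 63) / 64], nrows }
    }

    /// Set or clear the bit for row `i`.
    pub fn set(&mut self, i: usize, v: bool) {
        assert!(i < self.nrows, "row index out of range");
        let (word, bit) = (i / 64, i % 64);
        if v {
            self.words[word] |= 1u64 << bit;
        } else {
            self.words[word] &= !(1u64 << bit);
        }
    }

    /// Read the bit for row `i`.
    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.nrows, "row index out of range");
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Number of set bits, computed one word at a time.
    pub fn count_ones(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}

/// Hypothetical nullable column: dense values plus a validity mask.
pub struct SketchNullable<T> {
    values: Vec<T>,
    validity: SketchMask,
}

impl<T: Default + Copy> SketchNullable<T> {
    /// Build from optional values; null slots hold a placeholder value
    /// that must never be read without checking validity first.
    pub fn from_options(items: &[Option<T>]) -> Self {
        let mut validity = SketchMask::all_false(items.len());
        let mut values = Vec::with_capacity(items.len());
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity.set(i, true);
                }
                None => values.push(T::default()),
            }
        }
        SketchNullable { values, validity }
    }

    /// Validity is consulted before the values buffer is touched.
    pub fn get(&self, i: usize) -> Option<T> {
        if self.validity.get(i) { Some(self.values[i]) } else { None }
    }

    pub fn null_count(&self) -> usize {
        self.values.len() - self.validity.count_ones()
    }
}
```

Keeping validity in a separate bitmap leaves the values buffer dense and uniformly typed, which is the same reason the NullableFactor description treats the index stored in a null row as a sentinel rather than as a level.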

Enums

AggFunc
Aggregation function for use in summarize expressions.
Column
A single column in a DataFrame.
DBinOp
Binary operator for Data DSL expressions.
DExpr
An expression in the Data DSL.
DataError
Errors from DataFrame operations (plan execution, joins, tensor conversion).
LogicalPlan
A logical query plan node.
NullCol
A typed nullable column variant stored in a DataFrame column slot.
RelocatePos
Position specifier for TidyView::relocate.
TidyAgg
An aggregator expression for use in summarise.
TidyError
Errors produced by Phase 10 tidy operations.
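
The general shape of an expression tree with binary operators, as described for DExpr and DBinOp, can be sketched as below. The variants and the evaluator are assumptions made for illustration: `SketchExpr`, `SketchOp`, and `eval_row` are hypothetical names, and the real enums may carry different variants and value types.

```rust
use std::collections::HashMap;

/// Hypothetical binary operators; not cjc_data's DBinOp.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SketchOp {
    Add,
    Mul,
    Gt,
}

/// Hypothetical expression tree; not cjc_data's DExpr.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchExpr {
    Col(String),
    Lit(f64),
    Bin(Box<SketchExpr>, SketchOp, Box<SketchExpr>),
}

/// Evaluate an expression against one row of named f64 values.
/// Returns None when a referenced column is missing.
/// In this sketch, comparisons yield 1.0 for true and 0.0 for false.
pub fn eval_row(expr: &SketchExpr, row: &HashMap<String, f64>) -> Option<f64> {
    match expr {
        SketchExpr::Col(name) => row.get(name).copied(),
        SketchExpr::Lit(v) => Some(*v),
        SketchExpr::Bin(lhs, op, rhs) => {
            let (a, b) = (eval_row(lhs, row)?, eval_row(rhs, row)?);
            Some(match op {
                SketchOp::Add => a + b,
                SketchOp::Mul => a * b,
                SketchOp::Gt => if a > b { 1.0 } else { 0.0 },
            })
        }
    }
}
```

Under these assumptions, a predicate such as `col("age") > 18` corresponds to `Bin(Col("age"), Gt, Lit(18.0))`. Representing expressions as data rather than closures is what lets a plan optimizer inspect and rewrite them before execution.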

Functions

execute
Execute a logical plan against in-memory data.
label_encode
Convert a string slice into a categorical encoding with sorted unique levels and integer codes.
one_hot_encode
One-hot encode a categorical column into multiple boolean columns.
optimize
Optimize a logical plan.
ordinal_encode
Convert a string slice into a categorical encoding with a user-specified level order.
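
The three encodings described above can be sketched in plain Rust as follows. The signatures of the crate's functions are not shown on this page, so the names, parameter types, and return types here (`label_encode_sketch`, `ordinal_encode_sketch`, `one_hot_encode_sketch`) are hypothetical, as is returning `None` for a value outside the user-specified levels.

```rust
/// Label encoding: sorted unique levels plus one integer code per value.
/// Hypothetical helper; not the signature of cjc_data::label_encode.
pub fn label_encode_sketch(values: &[&str]) -> (Vec<String>, Vec<usize>) {
    let mut levels: Vec<&str> = values.to_vec();
    levels.sort_unstable();
    levels.dedup();
    // Levels are sorted and unique, so binary search finds each code.
    let codes = values
        .iter()
        .map(|v| levels.binary_search(v).expect("every value is a level"))
        .collect();
    (levels.into_iter().map(String::from).collect(), codes)
}

/// Ordinal encoding: codes follow the caller's level order.
/// Returns None if any value is not among the given levels (sketch choice).
pub fn ordinal_encode_sketch(values: &[&str], levels: &[&str]) -> Option<Vec<usize>> {
    values
        .iter()
        .map(|v| levels.iter().position(|l| l == v))
        .collect()
}

/// One-hot encoding: one boolean column per level, true where the row's
/// code equals that level's index.
pub fn one_hot_encode_sketch(codes: &[usize], n_levels: usize) -> Vec<Vec<bool>> {
    (0..n_levels)
        .map(|level| codes.iter().map(|&c| c == level).collect())
        .collect()
}
```

For the input `["b", "a", "c", "a"]`, the label-encoding sketch yields levels `["a", "b", "c"]` and codes `[1, 0, 2, 0]`; the ordinal variant differs only in that the caller fixes the level order instead of sorting.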

Type Aliases

AcrossFn
A scalar transformation function for across().