Crate fp_columnar

Columnar storage layer for frankenpandas — provides the Column container that backs every DataFrame column and Series value buffer in fp-frame.

A column is a typed value buffer (DType) plus a separate ValidityMask tracking which cells are missing. This split mirrors Apache Arrow’s storage layout and lets the type system enforce correctness on the dense-value side while keeping pandas-style missing-value semantics (NullKind::Null, NullKind::NaN, NullKind::NaT) on the validity side.
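The value-buffer/validity-mask split can be pictured with a minimal sketch. It is not fp_columnar's implementation: `SketchColumn` and its methods are hypothetical names invented for illustration, the buffer is fixed to `i64`, and the mask is a plain `Vec<bool>` rather than a packed bitmap.

```rust
/// Illustrative only: a dense i64 buffer plus a separate validity mask.
/// `SketchColumn` is a hypothetical type, not part of fp_columnar.
struct SketchColumn {
    values: Vec<i64>, // dense buffer; the slot content is arbitrary where invalid
    valid: Vec<bool>, // true = present, false = missing
}

impl SketchColumn {
    /// Build the two buffers from optional cells.
    fn from_options(cells: &[Option<i64>]) -> Self {
        let values = cells.iter().map(|c| c.unwrap_or(0)).collect();
        let valid = cells.iter().map(|c| c.is_some()).collect();
        SketchColumn { values, valid }
    }

    /// Read one cell; out-of-range and missing cells both yield `None`.
    fn get(&self, i: usize) -> Option<i64> {
        if *self.valid.get(i)? {
            Some(self.values[i])
        } else {
            None
        }
    }

    /// Number of non-missing cells.
    fn count(&self) -> usize {
        self.valid.iter().filter(|v| **v).count()
    }

    /// Sum over non-missing cells only; the mask decides what is skipped.
    fn sum(&self) -> i64 {
        self.values
            .iter()
            .zip(&self.valid)
            .filter(|(_, ok)| **ok)
            .map(|(v, _)| *v)
            .sum()
    }
}
```

Because missingness lives only in the mask, the dense side stays a plain typed vector and reductions simply consult the mask to decide which slots to skip.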

§Public surface

  • Column: the public columnar container. Built from a DType + a Vec<Scalar>. Exposes value access (Column::value, Column::values), reductions (Column::sum, Column::mean, Column::count, the nan-aware aggregations from fp-types), and typed binary operations dispatched through ArithmeticOp / ComparisonOp.
  • ColumnData: the inner enum holding the dense buffer. Most callers go through Column rather than touching this directly.
  • SparseColumn: opt-in sparse encoding (a value buffer paired with the positions of the non-fill entries). Stored alongside the dense Column for backward compatibility with consumers that only need Column.
  • ValidityMask: per-cell missing-value bitmap. Stored on Column; exposed for users that want to compose masks directly (logical masking, conditional updates, etc.).
  • ArithmeticOp / ComparisonOp: enum tags for typed binary-op dispatch (used by fp-frame’s expression engine and Series arithmetic).
  • CrackIndex: an internal positional index used by the “cracking” optimisation for repeated boolean-mask filters.
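Dispatching a binary operation through an enum tag, as the ArithmeticOp / ComparisonOp entries describe, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `SketchArithOp` and `SketchCmpOp` enums, their variants, and the `apply_arith` / `apply_cmp` helpers are hypothetical and operate on plain `i64` slices rather than on Column.

```rust
/// Hypothetical arithmetic tag; the real ArithmeticOp variants may differ.
#[derive(Debug, Clone, Copy)]
enum SketchArithOp {
    Add,
    Sub,
    Mul,
}

/// Hypothetical comparison tag; the real ComparisonOp variants may differ.
#[derive(Debug, Clone, Copy)]
enum SketchCmpOp {
    Lt,
    Eq,
    Gt,
}

/// Element-wise arithmetic selected by the tag. Returns `None` on a
/// length mismatch. Wrapping arithmetic is a choice made for this sketch.
fn apply_arith(op: SketchArithOp, lhs: &[i64], rhs: &[i64]) -> Option<Vec<i64>> {
    if lhs.len() != rhs.len() {
        return None;
    }
    let out = lhs
        .iter()
        .zip(rhs)
        .map(|(&a, &b)| match op {
            SketchArithOp::Add => a.wrapping_add(b),
            SketchArithOp::Sub => a.wrapping_sub(b),
            SketchArithOp::Mul => a.wrapping_mul(b),
        })
        .collect();
    Some(out)
}

/// Element-wise comparison selected by the tag; the result is boolean.
fn apply_cmp(op: SketchCmpOp, lhs: &[i64], rhs: &[i64]) -> Option<Vec<bool>> {
    if lhs.len() != rhs.len() {
        return None;
    }
    let out = lhs
        .iter()
        .zip(rhs)
        .map(|(&a, &b)| match op {
            SketchCmpOp::Lt => a < b,
            SketchCmpOp::Eq => a == b,
            SketchCmpOp::Gt => a > b,
        })
        .collect();
    Some(out)
}
```

Passing the operation as a value lets a caller such as an expression engine choose it at run time while the per-element loop stays a single typed code path.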

§Error reporting

ColumnError enumerates the failure modes (length mismatch, dtype mismatch, missing-value-in-required-slot, etc.). All functions that mutate a Column return Result<_, ColumnError>, so callers get explicit error categories.
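The pattern of returning an explicit error category from a mutating function can be sketched as follows. The `SketchColumnError` enum, its two variants and their fields, and the `add_in_place` helper are hypothetical stand-ins; the actual ColumnError variants and signatures are not shown on this page.

```rust
use std::fmt;

/// Hypothetical error enum modelled on two of the failure modes named above.
#[derive(Debug, PartialEq)]
enum SketchColumnError {
    LengthMismatch { left: usize, right: usize },
    MissingValue { position: usize },
}

impl fmt::Display for SketchColumnError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchColumnError::LengthMismatch { left, right } => {
                write!(f, "length mismatch: {} vs {}", left, right)
            }
            SketchColumnError::MissingValue { position } => {
                write!(f, "missing value at position {}", position)
            }
        }
    }
}

impl std::error::Error for SketchColumnError {}

/// Add `rhs` into `dst` element-wise. All checks run before any write,
/// so `dst` is left untouched when an error is returned.
fn add_in_place(dst: &mut [i64], rhs: &[Option<i64>]) -> Result<(), SketchColumnError> {
    if dst.len() != rhs.len() {
        return Err(SketchColumnError::LengthMismatch {
            left: dst.len(),
            right: rhs.len(),
        });
    }
    if let Some(position) = rhs.iter().position(|c| c.is_none()) {
        return Err(SketchColumnError::MissingValue { position });
    }
    for (d, r) in dst.iter_mut().zip(rhs) {
        // Every cell was validated above, so the fallback is never used.
        *d = d.wrapping_add(r.unwrap_or(0));
    }
    Ok(())
}
```

A caller can then match on the variant to decide whether to realign inputs, fill missing cells, or propagate the error with `?`.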

§Relationship to other crates

  • fp-types supplies the DType / Scalar / NullKind / nan* reduction primitives this crate composes on top of.
  • fp-frame stores a Vec<Column> per DataFrame (one Column per data column) plus a separate Index from fp-index for the row labels.
  • fp-index uses Column internally for some MultiIndex level storage.

Structs§

Column
CrackIndex
Adaptive crack index for progressive column partitioning.
SparseColumn
ValidityMask

Enums§

ArithmeticOp
ColumnData
AG-10: Typed array representation for vectorized batch execution.
ColumnError
ComparisonOp
Element-wise comparison operations that produce Bool-typed columns.

Functions§

radix_argsort_i64
Stable LSD radix argsort of an i64 slice (br-frankenpandas-y5s15). Returns the permutation that orders the values ascending (or descending), with equal values keeping their original order. The result is bit-identical to a stable sort_by(i64::cmp) because i64_radix_key is order-preserving and the counting sort is stable. Descending order flips the key (!key), so equal values still keep their original order, matching a reversed comparator whose Equal arm doesn’t reorder. Reusable for any all-Int64 ordering (index labels, single columns).
radix_argsort_multi_u64
Stable LSD radix lexsort over several u64 key columns (br-frankenpandas-lnsu6). Returns the permutation that orders rows lexicographically by keys_by_col[0], then keys_by_col[1], and so on, with equal rows keeping their original order, exactly as a stable multi-key sort_by would. The least-significant digit overall is the low byte of the last column, so the columns are processed in reverse order. Each column takes eight stable counting-sort passes that thread the running permutation, which makes the first column the most significant. O(n·k) and comparison-free. All key vectors must have the same length; callers bake per-column ascending/descending order into the keys.
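The two descriptions above can be illustrated with a compact sketch of the technique. It is written under stated assumptions and is not the crate's code: `counting_pass`, `sketch_lexsort_u64`, `order_key`, and `sketch_argsort_i64` are hypothetical names, the signatures are guesses, and `order_key` (a sign-bit flip) merely stands in for the crate's i64_radix_key, whose definition is not shown here.

```rust
/// One stable counting-sort pass over the byte of `keys` selected by `shift`.
/// `perm` is the running permutation; `scratch` is reusable output space.
fn counting_pass(keys: &[u64], shift: u32, perm: &mut Vec<usize>, scratch: &mut Vec<usize>) {
    let mut counts = [0usize; 256];
    for &row in perm.iter() {
        counts[((keys[row] >> shift) & 0xFF) as usize] += 1;
    }
    // Turn the counts into the starting offset of each bucket.
    let mut offset = 0;
    for c in counts.iter_mut() {
        let n = *c;
        *c = offset;
        offset += n;
    }
    // Scatter in the current order so ties keep their relative position.
    for &row in perm.iter() {
        let digit = ((keys[row] >> shift) & 0xFF) as usize;
        scratch[counts[digit]] = row;
        counts[digit] += 1;
    }
    std::mem::swap(perm, scratch);
}

/// Lexicographic argsort: the last column is processed first, so the
/// first column ends up as the most significant key.
fn sketch_lexsort_u64(keys_by_col: &[Vec<u64>]) -> Vec<usize> {
    let n = keys_by_col.first().map_or(0, |c| c.len());
    assert!(
        keys_by_col.iter().all(|c| c.len() == n),
        "key columns must have equal length"
    );
    let mut perm: Vec<usize> = (0..n).collect();
    let mut scratch = vec![0usize; n];
    for keys in keys_by_col.iter().rev() {
        for pass in 0..8u32 {
            counting_pass(keys, pass * 8, &mut perm, &mut scratch);
        }
    }
    perm
}

/// Assumed order-preserving key: flipping the sign bit makes unsigned
/// order on the result agree with signed order on the input.
fn order_key(v: i64) -> u64 {
    (v as u64) ^ (1u64 << 63)
}

/// Single-column i64 argsort; descending order is baked in by negating the key.
fn sketch_argsort_i64(values: &[i64], ascending: bool) -> Vec<usize> {
    let keys: Vec<u64> = values
        .iter()
        .map(|&v| {
            let k = order_key(v);
            if ascending { k } else { !k }
        })
        .collect();
    sketch_lexsort_u64(&[keys])
}
```

For example, `sketch_argsort_i64(&[3, -1, 3], true)` yields `[1, 0, 2]`: the two equal values keep their input order, which is the stability property both descriptions rely on. Negating a key column before the call gives descending order for that column without disturbing tie order.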