trs-dataframe 0.11.1

Dataframe library for Teiresias

trs-dataframe

Column-oriented dataframe for the Teiresias stack. Lightweight, typed, and designed around per-row [DataValue] candidates rather than a full ndarray back-end.

  • Typed columns with nullable bitmap. Each column is a [TypedDataArray] wrapping a [TypedData] tagged union over native primitives (bool, u8/u32/u64, i32/i64, f32/f64, String, Vec<TypedData>), with a Generic fallback for heterogeneous data. Null positions are tracked in a packed Vec<u64> bitmap, one bit per element, so typed columns can represent missing values without falling back to DataValue::Null. Hot paths can borrow a zero-copy slice such as &[i32] (or the equivalent for any other primitive type) without going through DataValue.
  • Operational primitives. Append, push rows, extend, filter, join (incl. many-to-many by id and Cartesian product), sort + top-N, and column add/remove. See the inherent methods on [DataFrame] / [ColumnFrame].
  • Materialized views. select returns row-major Array2<DataValue>; select_view gives a (ncols, nrows) stacked array; select_vec_view hands back zero-copy &TypedData borrows; select_typed coerces to a uniform primitive type via [Extract].
  • Filter DSL. FilterRules::try_from("a >= 1f64 && (b <= 5 || c <= 8)") parses an expression over column values, with type-aware comparison and a small set of column functions (len, to_datetime_us, …).
  • Pluggable runtimes. Optional python (PyO3 + numpy bindings), polars-df (polars::DataFrame interop), jmalloc, and tracing features. The python feature is on by default.
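
To make the packed null bitmap from the first bullet concrete, here is a minimal std-only sketch of the idea. It is not the crate's implementation: the names NullBitmap and I32Column are hypothetical, and the convention that a set bit means "null" is an assumption (some formats use the opposite, with a set bit meaning "valid").

```rust
/// Hypothetical packed bitmap: one bit per element, 64 elements per word.
/// Assumption for this sketch: a set bit marks a null position.
struct NullBitmap {
    words: Vec<u64>,
    len: usize,
}

impl NullBitmap {
    /// Allocate enough words to hold `len` bits, all cleared (no nulls).
    fn with_len(len: usize) -> Self {
        Self { words: vec![0; (len + 63) / 64], len }
    }

    /// Mark element `i` as null.
    fn set_null(&mut self, i: usize) {
        assert!(i < self.len, "index out of bounds");
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Check whether element `i` is null.
    fn is_null(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Count nulls by summing the set bits of every word.
    fn null_count(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}

/// Hypothetical typed column: dense primitive storage plus the bitmap.
struct I32Column {
    values: Vec<i32>,
    nulls: NullBitmap,
}

impl I32Column {
    /// Build from optional values; null slots store a placeholder 0.
    fn from_options(items: &[Option<i32>]) -> Self {
        let mut nulls = NullBitmap::with_len(items.len());
        let mut values = Vec::with_capacity(items.len());
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => values.push(*v),
                None => {
                    values.push(0);
                    nulls.set_null(i);
                }
            }
        }
        Self { values, nulls }
    }

    /// Borrow the dense storage without copying; callers consult the
    /// bitmap to learn which positions are meaningful.
    fn as_slice(&self) -> &[i32] {
        &self.values
    }

    /// Null-aware element access.
    fn get(&self, i: usize) -> Option<i32> {
        if self.nulls.is_null(i) { None } else { Some(self.values[i]) }
    }
}

fn main() {
    let col = I32Column::from_options(&[Some(1), None, Some(3)]);
    println!("dense storage: {:?}", col.as_slice());
    println!("element 1: {:?}", col.get(1));
    println!("null count: {}", col.nulls.null_count());
}
```

The point of the design is that the dense Vec<i32> stays contiguous, so a &[i32] borrow is possible even when some positions are missing; the bitmap costs one bit per element on the side.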

Install

[dependencies]
trs-dataframe = { version = "0.11", default-features = false }
# or with Python bindings + numpy + messagepack:
# trs-dataframe = "0.11"

Quick start

use trs_dataframe::{df, DataFrame, DataType};

// Build a frame with the `df!` macro.
let mut frame: DataFrame = df! {
    "id"    => [1i32, 2, 3, 4],
    "score" => [10.5f64, 11.0, 9.5, 12.5],
    "label" => ["a", "b", "a", "c"],
};

assert_eq!(frame.n_rows(), 4);
assert_eq!(frame.n_columns(), 3);

// Materialize a row-major Array2 view.
let arr = frame.select(None).unwrap();
assert_eq!(arr.shape(), &[4, 3]);

// Zero-copy access requires the column to be in its native primitive
// representation. The `df!` macro starts every column as `Generic`, so
// promote the score column to `F64` first and then take a typed slice.
frame.dataframe.get_column_mut(&"score".into())
    .unwrap()
    .try_convert_to_dtype(DataType::F64)
    .unwrap();
let cols = frame.select_vec_view(Some(&["score".into()])).unwrap();
let scores: &[f64] = cols[0].as_ref().unwrap().as_slice_f64().unwrap();
assert_eq!(scores, &[10.5, 11.0, 9.5, 12.5]);

// Append more rows.
let extra: DataFrame = df! {
    "id"    => [5i32],
    "score" => [13.0f64],
    "label" => ["b"],
};
frame.extend(extra).unwrap();
assert_eq!(frame.n_rows(), 5);
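
The difference between the row-major result of select (shape [4, 3] above) and the (ncols, nrows) stacked layout described for select_view can be illustrated without the crate. The sketch below uses nested Vecs instead of Array2 and only the two numeric columns from the example; to_column_stacked is a hypothetical helper, not part of trs-dataframe.

```rust
/// Hypothetical helper: turn a row-major table (nrows x ncols) into a
/// column-stacked one (ncols x nrows), i.e. a transpose.
fn to_column_stacked(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    // Number of columns is taken from the first row; an empty table has none.
    let ncols = rows.first().map_or(0, |r| r.len());
    (0..ncols)
        .map(|c| rows.iter().map(|r| r[c]).collect())
        .collect()
}

fn main() {
    // Row-major: one inner Vec per row, holding (id, score).
    let rows = vec![
        vec![1.0, 10.5],
        vec![2.0, 11.0],
        vec![3.0, 9.5],
        vec![4.0, 12.5],
    ];
    let cols = to_column_stacked(&rows);
    println!("row-major shape: ({}, {})", rows.len(), rows[0].len());
    println!("stacked shape:   ({}, {})", cols.len(), cols[0].len());
    println!("score column:    {:?}", cols[1]);
}
```

In the stacked layout each column's values sit next to each other, which is usually the more convenient shape for per-column numeric work.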

Filtering

use trs_dataframe::{df, filter::FilterRules};

let frame = df! {
    "a" => [1i32, 2, 3, 4, 5],
    "b" => [10i32, 20, 30, 40, 50],
};

let rules = FilterRules::try_from("a >= 2i32 && b <= 40i32").unwrap();
let filtered = frame.filter(&rules).unwrap();
assert_eq!(filtered.n_rows(), 3);
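
To see why three rows survive, the rule above can be written out as an ordinary predicate. This is a hand-written, std-only equivalent for checking the expected result, not how the crate evaluates FilterRules; the Row struct and the helper functions are hypothetical.

```rust
/// Hypothetical row type standing in for the two-column frame above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Row {
    a: i32,
    b: i32,
}

/// Plain-Rust equivalent of the rule `a >= 2i32 && b <= 40i32`.
fn keep(row: &Row) -> bool {
    row.a >= 2 && row.b <= 40
}

/// Same data as the frame in the example: a = 1..=5, b = 10 * a.
fn sample_rows() -> Vec<Row> {
    (1..=5).map(|a| Row { a, b: a * 10 }).collect()
}

/// Keep only the rows for which the predicate holds.
fn filter_rows(rows: &[Row]) -> Vec<Row> {
    rows.iter().copied().filter(keep).collect()
}

fn main() {
    let kept = filter_rows(&sample_rows());
    for row in &kept {
        println!("{:?}", row);
    }
    println!("{} rows kept", kept.len());
}
```

The first row fails the condition on a (1 < 2) and the last fails the condition on b (50 > 40), which leaves the rows with a = 2, 3, 4.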

Sorting and top-N

use trs_dataframe::{df, dataframe::TopN};

let frame = df! { "x" => [3i32, 1, 4, 1, 5, 9, 2, 6] };
let sorted = frame.sorted(&"x".into()).unwrap();
let top3 = sorted.topn(TopN::First(3)).unwrap();
assert_eq!(top3.n_rows(), 3);
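
As a rough mental model of sort followed by top-N, the same steps on a plain Vec look like this. Two assumptions are baked in and are not confirmed by the text above: that sorted orders values ascending, and that TopN::First(n) keeps the leading n rows of the sorted frame. first_n_sorted is a hypothetical helper.

```rust
/// Hypothetical std-only stand-in for `sorted` followed by `TopN::First(n)`.
/// Assumes ascending order and "keep the first n rows" semantics.
fn first_n_sorted(values: &[i32], n: usize) -> Vec<i32> {
    let mut sorted = values.to_vec();
    sorted.sort(); // ascending, stable sort
    sorted.truncate(n); // no-op if n >= len
    sorted
}

fn main() {
    let x = [3, 1, 4, 1, 5, 9, 2, 6];
    let top3 = first_n_sorted(&x, 3);
    println!("{:?}", top3);
}
```

Under those assumptions the three retained values are the three smallest, duplicates included.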

Feature flags

Feature     Default   Purpose
python      yes       PyO3 bindings, numpy interop, messagepack round trip.
polars-df   no        From/Into conversions between polars::DataFrame and this crate's frame types.
jmalloc     no        Use jemalloc as the global allocator.
tracing     no        Pull in tracing-subscriber for runtime tracing setup.
utoipa      no        Derive OpenAPI schemas for serializable types.

Development

cargo test --lib            # unit tests in the library target
cargo test --doc            # doc examples
cargo bench                 # criterion benchmarks (see benches/)
cargo clippy --lib          # lints

The benchmark harness is in benches/bench_main.rs; sample data is fetched into benches/downloaded-data/ on first run.

License

Apache-2.0. See LICENSE.