trs-dataframe 0.11.1

Dataframe library for Teiresias
# trs-dataframe

Column-oriented dataframe for the Teiresias stack. Lightweight, typed, and
designed around per-row [`DataValue`] candidates rather than a full ndarray
back-end.

- **Typed columns with nullable bitmap.** Each column is a [`TypedDataArray`]
  wrapping a [`TypedData`] tagged union over native primitives (`bool`,
  `u8`/`u32`/`u64`, `i32`/`i64`, `f32`/`f64`, `String`, `Vec<TypedData>`)
  with a `Generic` fallback for heterogeneous data. Null positions are tracked
  via a packed `Vec<u64>` bitmap — one bit per element — so typed columns
  can represent missing values without falling back to `DataValue::Null`.
  Hot paths can take a zero-copy `&[i32]` slice (or any other primitive)
  without going through `DataValue`.
- **Operational primitives.** Append, push rows, extend, filter, join (incl.
  many-to-many by id and Cartesian product), sort + top-N, and column
  add/remove. See the inherent methods on [`DataFrame`] / [`ColumnFrame`].
- **Materialized views.** [`select`](DataFrame::select) returns row-major
  `Array2<DataValue>`; [`select_view`](DataFrame::select_view) gives a
  `(ncols, nrows)` stacked array; [`select_vec_view`](DataFrame::select_vec_view)
  hands back zero-copy `&TypedData` borrows; [`select_typed`](DataFrame::select_typed)
  coerces to a uniform primitive type via [`Extract`].
- **Filter DSL.** `FilterRules::try_from("a >= 1f64 && (b <= 5 || c <= 8)")`
  parses an expression over column values, with type-aware comparison and a
  small set of column functions (`len`, `to_datetime_us`, …).
- **Pluggable runtimes.** Optional `python` (PyO3 + numpy bindings),
  `polars-df` (`polars::DataFrame` interop), `jmalloc`, and `tracing`
  features. The `python` feature is on by default.
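
To make the packed null bitmap from the first bullet concrete, here is a minimal standalone sketch of one bit per element stored in a `Vec<u64>`. The `NullBitmap` type, its method names, and the "set bit means null" polarity are all assumptions made for illustration; they are not the crate's actual API or layout.

```rust
/// Hypothetical stand-in for a packed null bitmap; not the crate's own type.
#[derive(Debug, Default)]
pub struct NullBitmap {
    words: Vec<u64>, // 64 elements per word; a set bit marks a null (assumed polarity)
    len: usize,      // number of elements tracked
}

impl NullBitmap {
    /// Create a bitmap for `len` elements, all initially non-null.
    pub fn new(len: usize) -> Self {
        Self { words: vec![0; (len + 63) / 64], len }
    }

    /// Mark element `i` as null.
    pub fn set_null(&mut self, i: usize) {
        assert!(i < self.len, "index out of bounds");
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Return true if element `i` is null.
    pub fn is_null(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Count the null elements by summing the set bits of every word.
    pub fn null_count(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}
```

With this layout a column of 130 elements needs only three `u64` words of null tracking, and the value buffer itself stays a plain contiguous vector.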

## Install

```toml
[dependencies]
trs-dataframe = { version = "0.11", default-features = false }
# or with Python bindings + numpy + messagepack:
# trs-dataframe = "0.11"
```

## Quick start

```rust
use trs_dataframe::{df, DataFrame, DataType};

// Build a frame with the `df!` macro.
let mut frame: DataFrame = df! {
    "id"    => [1i32, 2, 3, 4],
    "score" => [10.5f64, 11.0, 9.5, 12.5],
    "label" => ["a", "b", "a", "c"],
};

assert_eq!(frame.n_rows(), 4);
assert_eq!(frame.n_columns(), 3);

// Materialize a row-major Array2 view.
let arr = frame.select(None).unwrap();
assert_eq!(arr.shape(), &[4, 3]);

// Zero-copy access requires the column to be in its native primitive
// representation. The `df!` macro starts every column as `Generic`, so
// promote the score column to `F64` first and then take a typed slice.
frame.dataframe.get_column_mut(&"score".into())
    .unwrap()
    .try_convert_to_dtype(DataType::F64)
    .unwrap();
let cols = frame.select_vec_view(Some(&["score".into()])).unwrap();
let scores: &[f64] = cols[0].as_ref().unwrap().as_slice_f64().unwrap();
assert_eq!(scores, &[10.5, 11.0, 9.5, 12.5]);

// Append more rows.
let extra: DataFrame = df! {
    "id"    => [5i32],
    "score" => [13.0f64],
    "label" => ["b"],
};
frame.extend(extra).unwrap();
assert_eq!(frame.n_rows(), 5);
```
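
The promotion step in the example above is easier to follow with a toy model of a tagged-union column. The sketch below is a simplified illustration, not the crate's implementation: `Value`, `Column`, and `try_promote_f64` are hypothetical names, and only `as_slice_f64` mirrors a method name used in the quick start. It shows why a typed slice can be borrowed only once the data sits in a native variant.

```rust
/// Hypothetical dynamically typed cell, standing in for a `DataValue`-like type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    F64(f64),
    Str(String),
}

/// Hypothetical column storage: a native variant or a `Generic` fallback.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    F64(Vec<f64>),
    Generic(Vec<Value>),
}

impl Column {
    /// Borrow the values as `&[f64]` without copying. Only the native `F64`
    /// variant has contiguous `f64` storage to borrow from.
    pub fn as_slice_f64(&self) -> Option<&[f64]> {
        match self {
            Column::F64(v) => Some(v.as_slice()),
            Column::Generic(_) => None,
        }
    }

    /// Promote a `Generic` column to `F64` in place. Fails, leaving the
    /// column unchanged, if any cell is not an `F64` value.
    pub fn try_promote_f64(&mut self) -> Result<(), String> {
        let promoted = match self {
            Column::F64(_) => return Ok(()), // already native, nothing to do
            Column::Generic(cells) => cells
                .iter()
                .map(|cell| match cell {
                    Value::F64(x) => Ok(*x),
                    other => Err(format!("cannot convert {other:?} to f64")),
                })
                .collect::<Result<Vec<f64>, String>>()?,
        };
        *self = Column::F64(promoted);
        Ok(())
    }
}
```

In this model the conversion copies the data once; every later `as_slice_f64` call is a plain borrow.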

## Filtering

```rust
use trs_dataframe::{df, filter::FilterRules};

let frame = df! {
    "a" => [1i32, 2, 3, 4, 5],
    "b" => [10i32, 20, 30, 40, 50],
};

let rules = FilterRules::try_from("a >= 2i32 && b <= 40i32").unwrap();
let filtered = frame.filter(&rules).unwrap();
assert_eq!(filtered.n_rows(), 3);
```
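
To see why three rows survive, the same predicate can be written out by hand over plain slices. This is only an illustration of the rule's meaning; `matching_rows` is a hypothetical helper and says nothing about how `FilterRules` evaluates expressions internally.

```rust
/// Return the indices of rows where `a >= 2 && b <= 40`, the same
/// predicate as the rule string above, written out by hand.
pub fn matching_rows(a: &[i32], b: &[i32]) -> Vec<usize> {
    // Only compare rows present in both columns.
    let n = a.len().min(b.len());
    (0..n).filter(|&i| a[i] >= 2 && b[i] <= 40).collect()
}
```

For the columns in the example, rows 1, 2, and 3 (zero-based) pass both conditions: row 0 fails `a >= 2` and row 4 fails `b <= 40`.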

## Sorting and top-N

```rust
use trs_dataframe::{df, dataframe::TopN};

let frame = df! { "x" => [3i32, 1, 4, 1, 5, 9, 2, 6] };
let sorted = frame.sorted(&"x".into()).unwrap();
let top3 = sorted.topn(TopN::First(3)).unwrap();
assert_eq!(top3.nrows(), 3);
```
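
As a rough mental model, sort followed by `TopN::First(3)` can be pictured as the plain-Rust helper below. It assumes an ascending sort and that `First(n)` keeps the first `n` rows of the sorted frame; neither is confirmed by the text above, and `first_n_sorted` is a hypothetical name.

```rust
/// Sort ascending and keep the first `n` values. Ascending order and
/// "first n after sorting" are assumptions about the example above.
pub fn first_n_sorted(values: &[i32], n: usize) -> Vec<i32> {
    let mut sorted = values.to_vec();
    sorted.sort(); // stable sort: equal values keep their input order
    sorted.truncate(n); // no-op when n exceeds the number of values
    sorted
}
```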

## Feature flags

| Feature     | Default | Purpose                                                 |
|-------------|---------|---------------------------------------------------------|
| `python`    | yes     | PyO3 bindings, numpy interop, messagepack roundtrip.    |
| `polars-df` | no      | `From`/`Into` between `polars::DataFrame` and types.    |
| `jmalloc`   | no      | Use jemalloc as the global allocator.                   |
| `tracing`   | no      | Pull in `tracing-subscriber` for runtime tracing setup. |
| `utoipa`    | no      | Derive OpenAPI schema for serializable types.           |
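
As an example of combining the flags in the table, a dependency entry that drops the default `python` feature and opts into `polars-df` and `tracing` might look like the fragment below. The version string is an assumption taken from the 0.11.1 release named at the top of this page; adjust it to the release you depend on.

```toml
[dependencies]
# Opt out of the default `python` feature and enable only what is needed.
trs-dataframe = { version = "0.11", default-features = false, features = ["polars-df", "tracing"] }
```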

## Development

```bash
cargo test --lib            # unit tests in the library target
cargo test --doc            # doc examples
cargo bench                 # criterion benchmarks (see benches/)
cargo clippy --lib          # lints
```

The benchmark harness is in `benches/bench_main.rs`; sample data is fetched
into `benches/downloaded-data/` on first run.

## License

Apache-2.0. See [LICENSE](LICENSE).