# trs-dataframe
Column-oriented dataframe for the Teiresias stack. Lightweight, typed, and
designed around per-row [`DataValue`] candidates rather than a full ndarray
back-end.
- **Typed columns with nullable bitmap.** Each column is a [`TypedDataArray`]
wrapping a [`TypedData`] tagged union over native primitives (`bool`,
`u8`/`u32`/`u64`, `i32`/`i64`, `f32`/`f64`, `String`, `Vec<TypedData>`)
with a `Generic` fallback for heterogeneous data. Null positions are tracked
via a packed `Vec<u64>` bitmap — one bit per element — so typed columns
can represent missing values without falling back to `DataValue::Null`.
Hot paths can take a zero-copy `&[i32]` slice (or any other primitive)
without going through `DataValue`.
- **Operational primitives.** Append, push rows, extend, filter, join (incl.
many-to-many by id and Cartesian product), sort + top-N, and column
add/remove. See the inherent methods on [`DataFrame`] / [`ColumnFrame`].
- **Materialized views.** [`select`](DataFrame::select) returns row-major
`Array2<DataValue>`; [`select_view`](DataFrame::select_view) gives a
`(ncols, nrows)` stacked array; [`select_vec_view`](DataFrame::select_vec_view)
hands back zero-copy `&TypedData` borrows; [`select_typed`](DataFrame::select_typed)
coerces to a uniform primitive type via [`Extract`].
- **Filter DSL.** `FilterRules::try_from("a >= 1f64 && (b <= 5 || c <= 8)")` —
parsed expressions over column values with type-aware comparison and a
small set of column functions (`len`, `to_datetime_us`, …).
- **Pluggable runtimes.** Optional `python` (PyO3 + numpy bindings),
`polars-df` (`polars::DataFrame` interop), `jmalloc`, `tracing`, and `utoipa`
features. Only the `python` feature is on by default.
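
To make the "one bit per element" null tracking concrete, here is a minimal std-only sketch of a packed bitmap sitting beside a dense value buffer. `NullBitmap` and `I32Column` are hypothetical names, not types exported by this crate, and the convention that a set bit means "null" is an assumption; the crate's actual layout may differ.

```rust
/// Illustrative packed bitmap: one bit per element, 64 elements per `u64` word.
/// Assumption: a set bit marks a null slot.
#[derive(Debug, Default, Clone)]
pub struct NullBitmap {
    words: Vec<u64>,
    len: usize,
}

impl NullBitmap {
    /// Append one element, recording whether it is null.
    pub fn push(&mut self, is_null: bool) {
        let (word, bit) = (self.len / 64, self.len % 64);
        if word == self.words.len() {
            self.words.push(0); // start a new 64-element word
        }
        if is_null {
            self.words[word] |= 1u64 << bit;
        }
        self.len += 1;
    }

    /// True if the element at `index` was pushed as null.
    /// Out-of-range indices report `false`.
    pub fn is_null(&self, index: usize) -> bool {
        index < self.len && (self.words[index / 64] >> (index % 64)) & 1 == 1
    }

    /// Number of null elements, counted one word at a time.
    pub fn null_count(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}

/// Hypothetical typed column: values stay dense and contiguous,
/// nullness lives in the side bitmap.
#[derive(Debug, Default, Clone)]
pub struct I32Column {
    values: Vec<i32>,
    nulls: NullBitmap,
}

impl I32Column {
    pub fn from_options(items: &[Option<i32>]) -> Self {
        let mut col = I32Column::default();
        for item in items {
            col.values.push(item.unwrap_or(0)); // placeholder under a null slot
            col.nulls.push(item.is_none());
        }
        col
    }

    /// Borrow the raw buffer without copying; null slots hold the placeholder.
    pub fn as_slice(&self) -> &[i32] {
        &self.values
    }

    /// Null-aware access to a single element.
    pub fn get(&self, index: usize) -> Option<i32> {
        if index >= self.values.len() || self.nulls.is_null(index) {
            None
        } else {
            Some(self.values[index])
        }
    }

    pub fn null_count(&self) -> usize {
        self.nulls.null_count()
    }
}
```

With this layout, `I32Column::from_options(&[Some(1), None, Some(3)])` exposes `&[1, 0, 3]` through `as_slice()` while `get(1)` still reports `None`, at a cost of one extra bit per element.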
## Install
```toml
[dependencies]
# Core crate only: turns off the default `python` feature.
trs-dataframe = { version = "0.10", default-features = false }
# Or keep the defaults (Python bindings + numpy + messagepack):
# trs-dataframe = "0.10"
```
## Quick start
```rust
use trs_dataframe::{df, DataFrame, DataType};
// Build a frame with the `df!` macro.
let mut frame: DataFrame = df! {
"id" => [1i32, 2, 3, 4],
"score" => [10.5f64, 11.0, 9.5, 12.5],
"label" => ["a", "b", "a", "c"],
};
assert_eq!(frame.n_rows(), 4);
assert_eq!(frame.n_columns(), 3);
// Materialize a row-major Array2 view.
let arr = frame.select(None).unwrap();
assert_eq!(arr.shape(), &[4, 3]);
// Zero-copy access requires the column to be in its native primitive
// representation. The `df!` macro starts every column as `Generic`, so
// promote the score column to `F64` first and then take a typed slice.
frame.dataframe.get_column_mut(&"score".into())
.unwrap()
.try_convert_to_dtype(DataType::F64)
.unwrap();
let cols = frame.select_vec_view(Some(&["score".into()])).unwrap();
let scores: &[f64] = cols[0].as_ref().unwrap().as_slice_f64().unwrap();
assert_eq!(scores, &[10.5, 11.0, 9.5, 12.5]);
// Append more rows.
let extra: DataFrame = df! {
"id" => [5i32],
"score" => [13.0f64],
"label" => ["b"],
};
frame.extend(extra).unwrap();
assert_eq!(frame.n_rows(), 5);
```
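The promotion step in the quick start is easier to follow with a toy model of the storage. The sketch below is std-only and uses hypothetical `Value` and `Column` types with far fewer variants than the crate's `DataValue` / `TypedData`. It only illustrates why a `Generic` column cannot hand out a `&[f64]` until it has been converted to native storage. It is not the crate's implementation.

```rust
/// Hypothetical per-row tagged value (stand-in for a `DataValue`-like type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    F64(f64),
    Str(String),
}

/// Hypothetical column storage with a heterogeneous fallback.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    /// One tagged value per row; rows may differ in type.
    Generic(Vec<Value>),
    /// Contiguous buffer of native primitives.
    F64(Vec<f64>),
}

impl Column {
    /// Borrow the buffer without copying. Only native storage has a
    /// contiguous `[f64]` to borrow, so `Generic` returns `None`.
    pub fn as_slice_f64(&self) -> Option<&[f64]> {
        match self {
            Column::F64(values) => Some(values.as_slice()),
            Column::Generic(_) => None,
        }
    }

    /// Promote a `Generic` column to `F64` in place.
    /// Fails, leaving the column untouched, if any row is not a float.
    pub fn try_convert_to_f64(&mut self) -> Result<(), String> {
        let converted = match self {
            Column::F64(_) => return Ok(()), // already native
            Column::Generic(values) => {
                let mut out = Vec::with_capacity(values.len());
                for (row, value) in values.iter().enumerate() {
                    match value {
                        Value::F64(x) => out.push(*x),
                        other => {
                            return Err(format!("row {}: {:?} is not f64", row, other))
                        }
                    }
                }
                out
            }
        };
        *self = Column::F64(converted);
        Ok(())
    }
}
```

In this model the conversion pays a one-time copy, after which every `as_slice_f64()` call is a plain borrow.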
## Filtering
```rust
use trs_dataframe::{df, filter::FilterRules};
let frame = df! {
"a" => [1i32, 2, 3, 4, 5],
"b" => [10i32, 20, 30, 40, 50],
};
let rules = FilterRules::try_from("a >= 2i32 && b <= 40i32").unwrap();
let filtered = frame.filter(&rules).unwrap();
assert_eq!(filtered.n_rows(), 3);
```
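As a rough mental model of what a parsed filter does, the std-only sketch below evaluates a small expression tree row by row. `Expr` and `count_matches` are hypothetical and are not the crate's `FilterRules` representation. The sketch also assumes every column is `f64`, whereas the crate's comparison is type-aware.

```rust
use std::collections::HashMap;

/// Hypothetical expression tree for comparisons joined by `&&` and `||`.
pub enum Expr {
    Ge(&'static str, f64),
    Le(&'static str, f64),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Evaluate against one row. A missing column makes the comparison false.
    pub fn eval(&self, row: &HashMap<&str, f64>) -> bool {
        match self {
            Expr::Ge(col, rhs) => row.get(*col).map_or(false, |v| v >= rhs),
            Expr::Le(col, rhs) => row.get(*col).map_or(false, |v| v <= rhs),
            Expr::And(lhs, rhs) => lhs.eval(row) && rhs.eval(row),
            Expr::Or(lhs, rhs) => lhs.eval(row) || rhs.eval(row),
        }
    }
}

/// Count the rows of a column-oriented table that satisfy `expr`.
/// All columns are assumed to have the same length.
pub fn count_matches(expr: &Expr, columns: &[(&'static str, Vec<f64>)]) -> usize {
    let n_rows = columns.first().map_or(0, |(_, values)| values.len());
    (0..n_rows)
        .filter(|&i| {
            // Gather row `i` across all columns, then test it.
            let row: HashMap<&str, f64> = columns
                .iter()
                .map(|(name, values)| (*name, values[i]))
                .collect();
            expr.eval(&row)
        })
        .count()
}
```

Building `a >= 2 && b <= 40` as `Expr::And(Box::new(Expr::Ge("a", 2.0)), Box::new(Expr::Le("b", 40.0)))` and running it over the two columns from the example keeps rows 2, 3, and 4 of `a`, matching the three rows asserted above.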
## Sorting and top-N
```rust
use trs_dataframe::{df, dataframe::TopN};
let frame = df! { "x" => [3i32, 1, 4, 1, 5, 9, 2, 6] };
let sorted = frame.sorted(&"x".into()).unwrap();
let top3 = sorted.topn(TopN::First(3)).unwrap();
assert_eq!(top3.n_rows(), 3);
```
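For intuition about sort followed by top-N, here are two std-only equivalents over a plain slice. Both assume an ascending sort, which the example above does not state, and neither is the crate's implementation. The second swaps in partial selection via the standard library's `select_nth_unstable`, a different technique that avoids sorting the whole input when only the first `n` values are needed.

```rust
/// Full sort, then keep the first `n` values (ascending order assumed).
pub fn first_n_sorted(values: &[i32], n: usize) -> Vec<i32> {
    let mut sorted = values.to_vec();
    sorted.sort(); // stable ascending sort
    sorted.truncate(n); // no-op when n >= len
    sorted
}

/// Partial selection: move the `n` smallest values to the front,
/// then sort only those.
pub fn first_n_partial(values: &[i32], n: usize) -> Vec<i32> {
    let mut buf = values.to_vec();
    if n == 0 {
        return Vec::new();
    }
    if n < buf.len() {
        // After this call, buf[..n] holds the n smallest values (unordered).
        buf.select_nth_unstable(n - 1);
        buf.truncate(n);
    }
    buf.sort_unstable();
    buf
}
```

For the input `[3, 1, 4, 1, 5, 9, 2, 6]` and `n = 3`, both functions return `[1, 1, 2]`.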
## Feature flags
| Feature | Default | Description |
|---|---|---|
| `python` | yes | PyO3 bindings, numpy interop, messagepack roundtrip. |
| `polars-df` | no | `From`/`Into` between `polars::DataFrame` and this crate's types. |
| `jmalloc` | no | Use jemalloc as the global allocator. |
| `tracing` | no | Pull in `tracing-subscriber` for runtime tracing setup. |
| `utoipa` | no | Derive OpenAPI schema for serializable types. |
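
As a sketch of how the flags combine, the fragment below turns off the default `python` feature and opts in to two of the others. The feature names and version come from this README; the particular combination is only an example.

```toml
[dependencies]
# Core crate plus two opt-in features; `python` stays off.
trs-dataframe = { version = "0.10", default-features = false, features = ["polars-df", "tracing"] }
```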
## Development
```bash
cargo test --lib # unit tests in the library target
cargo test --doc # doc examples
cargo bench # criterion benchmarks (see benches/)
cargo clippy --lib # lints
```
The benchmark harness is in `benches/bench_main.rs`; sample data is fetched
into `benches/downloaded-data/` on first run.
## License
Apache-2.0. See [LICENSE](LICENSE).