matten_data/lib.rs

//! `matten-data`: a tiny table-to-Tensor preparation companion for small PoC
//! datasets.
//!
//! # Status
//!
//! **Experimental.** This is a scope-locked companion (RFC-033) for the boring
//! step between table-like input and a numeric [`matten::Tensor`]. The API may
//! change before beta; pin the minor version. Under lock-step family versioning
//! (RFC-030) the crate shares the workspace family version; maturity is the Status
//! label, not the version number.
//!
//! # The workflow
//!
//! ```text
//! small CSV / table-like data
//!   -> inspect schema
//!   -> select columns by name
//!   -> clean missing values explicitly
//!   -> convert to numeric explicitly
//!   -> matten::Tensor
//! ```
//!
//! ```
//! # #[cfg(not(feature = "csv"))] fn main() {}
//! # #[cfg(feature = "csv")] fn main() -> Result<(), matten_data::MattenDataError> {
//! use matten_data::Table;
//!
//! let csv = "sales,cost,note\n10,2,a\n20,,b\n30,4,c";
//! let table = Table::from_csv_str(csv)?;
//!
//! // Inspect, select, clean, convert: every step explicit.
//! let tensor = table
//!     .select_columns(["sales", "cost"])?
//!     .fill_missing(0.0)?
//!     .try_numeric()?
//!     .to_tensor()?;
//!
//! assert_eq!(tensor.shape(), &[3, 2]);
//! assert_eq!(tensor.as_slice(), &[10.0, 2.0, 20.0, 0.0, 30.0, 4.0]);
//! # Ok(())
//! # }
//! ```
//!
//! # What it is not
//!
//! `matten-data` is **not a dataframe library**. It has no joins, group-by, pivot,
//! query DSL, lazy execution, indexing/`loc`/`iloc`, rolling/window operations,
//! datetime engine, categorical dtype system, or large-data streaming. For those
//! workloads use [Polars](https://pola.rs), [DataFusion](https://datafusion.apache.org),
//! Pandas, or another dataframe/query tool. It is a small conversion helper for
//! application-validated or trusted data, not a CSV firewall or input sandbox.
//!
//! # Relationship to core `dynamic`
//!
//! Core `matten`'s `dynamic` feature is *value-level* ingestion (mixed values
//! inside a `Tensor`, with explicit `try_numeric()`). `matten-data` is *table-level*
//! preparation (headers, named columns, schema summary, table-shaped missing-value
//! policy) whose end goal is a numeric `Tensor`. It does not expose a second
//! computation engine.
//!
//! # Conversion rules
//!
//! Numeric conversion is strict and explicit (`try_numeric` then `to_tensor`):
//! integers and floats become `f64`; booleans and non-numeric text are rejected;
//! a remaining missing cell is rejected (fill it first). Missing values never
//! silently become zero, and booleans never silently become `1`/`0`.
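//!
//! As a rough illustration of these rules, and not this crate's actual
//! implementation, the sketch below applies them to a stand-in cell type. The
//! `Cell` enum and the `to_f64_strict` and `column_to_f64` helpers are
//! hypothetical names invented for this example; they are not part of
//! `matten-data`.
//!
//! ```rust
//! /// Hypothetical cell type, for illustration only (not a `matten-data` type).
//! #[derive(Debug, Clone, PartialEq)]
//! enum Cell {
//!     Int(i64),
//!     Float(f64),
//!     Bool(bool),
//!     /// Assumed to hold text that was not recognised as a number.
//!     Text(String),
//!     Missing,
//! }
//!
//! /// Strict conversion of one cell: only integers and floats are accepted.
//! fn to_f64_strict(cell: &Cell) -> Result<f64, String> {
//!     match cell {
//!         Cell::Int(i) => Ok(*i as f64),
//!         Cell::Float(x) => Ok(*x),
//!         // Booleans are rejected rather than mapped to 1/0.
//!         Cell::Bool(b) => Err(format!("boolean `{b}` is not numeric")),
//!         Cell::Text(s) => Err(format!("text `{s}` is not numeric")),
//!         // Missing cells are rejected rather than mapped to zero.
//!         Cell::Missing => Err("missing cell: fill it first".to_string()),
//!     }
//! }
//!
//! /// Converts a whole column, stopping at the first rejected cell.
//! fn column_to_f64(cells: &[Cell]) -> Result<Vec<f64>, String> {
//!     cells.iter().map(to_f64_strict).collect()
//! }
//! ```
//!
//! Under these assumptions, `column_to_f64(&[Cell::Int(1), Cell::Float(2.5)])`
//! succeeds, while any column that still contains `Cell::Missing` or
//! `Cell::Bool(_)` returns an error instead of a silently coerced value.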

#![forbid(unsafe_code)]

#[cfg(feature = "csv")]
mod csv;
mod error;
mod numeric;
mod schema;
mod table;

#[cfg(all(test, feature = "csv"))]
mod tests;

pub use error::MattenDataError;
pub use numeric::NumericTable;
pub use schema::{ColumnKind, ColumnSummary, SchemaSummary};
pub use table::{CellValue, Table};