matten-mlprep
Beta (0.19.x family release). Small, transparent, deterministic preprocessing helpers for matten::Tensor. Not an ML framework. The API is intended to be mostly stable but is still pre-1.0; pin the minor version.
Part of the matten workspace — see it for the full family.
Overview
matten-mlprep provides a handful of plain functions for preparing numeric
tensors before handing them to an external tool. There is no model training, no
autograd, no optimizer, and no hidden randomness — every function is a pure,
deterministic transform you can read and reason about.
It depends only on core matten (no default features); it adds no
ndarray, candle, or rand dependency.
Why / when
Use it for the boring-but-necessary steps between "I have a numeric Tensor" and
"I can feed a model": scale features, add an intercept column, carve out a test
set. When you need anything stateful or model-shaped, reach for a real ML crate —
this one deliberately stops at preprocessing.
Quick start
use Tensor;
use ;
let x = new;
let z = standardize_columns?; // zero mean, unit std per column
let z = add_bias_column?; // prepend a 1.0 intercept column
let = train_test_split?;
# Ok::
Design notes
- Convention: rank-2 only, rows = samples, columns = features. No silent transposition; a non-2D input is an error.
- Population std. standardize_columns divides by n (like scikit-learn's StandardScaler).
- Constant columns error, not silently zero. A zero-variance / zero-range column returns MattenMlprepError::ZeroVariance { column } so you handle it deliberately.
- add_bias_column prepends the 1.0 column (intercept at index 0).
- train_test_split is ordered and deterministic: the first floor(n*ratio) rows are train, the rest are test. No shuffle. (A seeded variant is planned; see RFC-024 §6.)
- With the dynamic feature enabled, dynamic tensors are rejected with an error rather than a panic.
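To make the standardization and bias-column conventions concrete, here is a minimal sketch in plain Rust over Vec<Vec<f64>>. It is not the crate's implementation and does not use matten::Tensor; the names ZeroVariance, standardize_columns_sketch, and add_bias_column_sketch are hypothetical, and the input is assumed to be rectangular (every row has the same length).

```rust
// Plain-Rust illustration only; NOT the matten-mlprep API.
// All names below are hypothetical stand-ins for the behavior described above.

#[derive(Debug, PartialEq)]
struct ZeroVariance {
    column: usize,
}

/// Standardize each column to zero mean and unit population std (divide by n).
/// Assumes rectangular input: every row has the same number of columns.
fn standardize_columns_sketch(rows: &[Vec<f64>]) -> Result<Vec<Vec<f64>>, ZeroVariance> {
    let n = rows.len();
    let cols = rows.first().map_or(0, |r| r.len());
    let mut out = rows.to_vec();
    for c in 0..cols {
        let mean = rows.iter().map(|r| r[c]).sum::<f64>() / n as f64;
        // Population variance: divide by n, not n - 1.
        let var = rows.iter().map(|r| (r[c] - mean).powi(2)).sum::<f64>() / n as f64;
        if var == 0.0 {
            // A constant column is reported, not silently zeroed.
            return Err(ZeroVariance { column: c });
        }
        let std = var.sqrt();
        for r in out.iter_mut() {
            r[c] = (r[c] - mean) / std;
        }
    }
    Ok(out)
}

/// Prepend a 1.0 intercept column so that it sits at index 0 of every row.
fn add_bias_column_sketch(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    rows.iter()
        .map(|r| {
            let mut with_bias = Vec::with_capacity(r.len() + 1);
            with_bias.push(1.0);
            with_bias.extend_from_slice(r);
            with_bias
        })
        .collect()
}
```

With two samples [1.0, 2.0] and [3.0, 6.0], both columns standardize to -1.0 and 1.0; replacing the second column with a constant yields ZeroVariance { column: 1 } instead.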
Public API
The complete surface (the breaking-change baseline for this crate):
;
;
;
;
Limitations
- Rank-2 only. Inputs must be [rows = samples, columns = features]; other ranks are an error. No automatic reshaping or transposition.
- No data cleaning. NaN/Inf propagate to the output; clean your data first (e.g. via the core dynamic on-ramp) if it is not already numeric-clean.
- Population std. standardize_columns divides by n (not n-1).
- Ordered split only. train_test_split does not shuffle. A seeded shuffled variant is planned but not yet available (RFC-024 §6).
- Not for large/streaming data. These are eager, in-memory transforms.
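Two of these limitations are easy to picture in code. The sketch below, again in plain Rust rather than the crate's API, shows an ordered split that takes the first floor(n*ratio) rows as train, plus a pre-check for NaN/Inf values you might run before preprocessing. Both function names are hypothetical, and the split assumes a ratio between 0.0 and 1.0.

```rust
// Plain-Rust illustration only; NOT the matten-mlprep API.
// `ordered_split_sketch` and `first_non_finite` are hypothetical names.

/// Ordered, deterministic split: the first floor(n * ratio) rows are train,
/// the rest are test. No shuffling. Assumes 0.0 <= ratio <= 1.0.
fn ordered_split_sketch<T: Clone>(rows: &[T], ratio: f64) -> (Vec<T>, Vec<T>) {
    let n = rows.len();
    // `min(n)` keeps the split index in bounds if ratio is out of range.
    let n_train = ((n as f64 * ratio).floor() as usize).min(n);
    let (train, test) = rows.split_at(n_train);
    (train.to_vec(), test.to_vec())
}

/// Return the (row, column) position of the first NaN or infinite value, if any.
fn first_non_finite(rows: &[Vec<f64>]) -> Option<(usize, usize)> {
    rows.iter()
        .enumerate()
        .find_map(|(i, r)| r.iter().position(|v| !v.is_finite()).map(|j| (i, j)))
}
```

Because the split is ordered, data sorted by label or by time ends up with a test set drawn entirely from the tail; reorder the rows yourself beforehand if that matters for your use case.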
Compatibility
- SemVer: pre-1.0 (0.x). A 0.x minor bump may break and carries migration notes; patch releases are additive only. Pin the minor (matten-mlprep = "0.19").
- MSRV: Rust 1.85 (edition 2024).
- matten: shares the 0.19 family version (RFC-030).
- A 1.0 release requires explicit maintainer confirmation.
More detail
See the workspace ROADMAP.md and RFC-024 (scope) / RFC-028
(design) under rfcs/.
License
Apache-2.0 © nabbisen