matten-mlprep — small, transparent, deterministic preprocessing helpers for
matten::Tensor.
This companion crate (RFC-024, RFC-028) prepares numeric tensors for use with
external tools. It is not an ML framework: there is no model training, no
autograd, no optimizer, and no hidden randomness. Every function is a pure,
deterministic transform you can reason about. It depends only on core
matten (no default features) — no ndarray, no candle, no rand.
§Convention
All functions operate on rank-2 tensors with rows = samples and
columns = features. A non-2D tensor is rejected; there is no silent
transposition.
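The rows-as-samples convention can be illustrated without the crate. The sketch below is plain Rust, not the matten API: `samples_and_features` and `column` are hypothetical helpers, and the row-major buffer layout is an assumption made for illustration only.

```rust
/// Hypothetical helper (not part of matten-mlprep): accepts only rank-2
/// shapes and returns (rows, cols) = (samples, features).
fn samples_and_features(shape: &[usize]) -> Result<(usize, usize), String> {
    match shape {
        [rows, cols] => Ok((*rows, *cols)),
        other => Err(format!("expected a rank-2 shape, got rank {}", other.len())),
    }
}

/// Hypothetical helper: collects feature column `j` from a buffer of
/// `rows * cols` values. Row-major storage is assumed here.
fn column(data: &[f64], rows: usize, cols: usize, j: usize) -> Vec<f64> {
    assert_eq!(data.len(), rows * cols, "buffer length must match shape");
    assert!(j < cols, "column index out of range");
    // Element (i, j) of a row-major [rows, cols] buffer sits at i * cols + j.
    (0..rows).map(|i| data[i * cols + j]).collect()
}
```

Under these assumptions, a `[3, 2]` buffer holds three samples of two features each, and a rank-1 shape such as `[6]` is refused rather than reinterpreted.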
§Functions
- standardize_columns: per-column z-score (population std).
- minmax_scale_columns: per-column scaling to [0, 1].
- add_bias_column: prepend a constant 1.0 intercept column.
- train_test_split: ordered, deterministic row split.
use matten::Tensor;
use matten_mlprep::{add_bias_column, standardize_columns, train_test_split};
let x = Tensor::new(vec![1.0, 3.0, 5.0, 7.0], &[4, 1]);
let z = standardize_columns(&x).unwrap(); // zero mean, unit std
let z = add_bias_column(&z).unwrap(); // [4, 2], column 0 = 1.0
let (train, test) = train_test_split(&z, 0.75).unwrap();
assert_eq!(train.shape(), &[3, 2]);
assert_eq!(test.shape(), &[1, 2]);
§Status
Beta. The API may still change. Constant (zero-variance) columns are
rejected explicitly by the scalers rather than silently producing a zero
column — see MattenMlprepError::ZeroVariance. Dynamic tensors are
rejected at every public entry point unconditionally — the guard does not
depend on the companion dynamic feature (RFC-031).
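To see why a constant column has to be rejected, consider the population z-score for a single column: a zero standard deviation would mean dividing by zero. The following is a minimal sketch in plain Rust, not the crate's implementation. `SketchError` and `standardize` are hypothetical names, and the exact `== 0.0` comparison is an assumption; the crate may detect zero variance differently.

```rust
/// Hypothetical error type for this sketch only. The crate reports this
/// case as MattenMlprepError::ZeroVariance.
#[derive(Debug, PartialEq)]
enum SketchError {
    Empty,
    ZeroVariance,
}

/// Population z-score of one column: (x - mean) / std, where the variance
/// is divided by n (not n - 1).
fn standardize(column: &[f64]) -> Result<Vec<f64>, SketchError> {
    if column.is_empty() {
        return Err(SketchError::Empty);
    }
    let n = column.len() as f64;
    let mean = column.iter().sum::<f64>() / n;
    let variance = column.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let std = variance.sqrt();
    // Assumption: exact zero check. Dividing by a zero std would yield NaN
    // or infinities, so the column is refused instead.
    if std == 0.0 {
        return Err(SketchError::ZeroVariance);
    }
    Ok(column.iter().map(|x| (x - mean) / std).collect())
}
```

For the column `[1.0, 3.0, 5.0, 7.0]` the mean is 4 and the population variance is 5, so the first standardized value is -3/√5. A column such as `[2.0, 2.0, 2.0]` returns `Err(SketchError::ZeroVariance)` in this sketch.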
§Feature flags
- dynamic: Compatibility forwarding feature. No longer required for dynamic rejection as of v0.19.1. Dynamic tensors are rejected at companion boundaries regardless of whether this feature is enabled. Reconsider removal no earlier than v0.20.0.
Enums§
- MattenMlprepError: Errors produced by matten-mlprep preprocessing functions.
Functions§
- add_bias_column: Prepends a constant 1.0 bias column: [n, m] -> [n, m+1].
- minmax_scale_columns: Scales each column to the [0, 1] range: out[i,j] = (x[i,j] - min_j) / (max_j - min_j).
- standardize_columns: Standardizes each column to zero mean and unit (population) standard deviation: out[i,j] = (x[i,j] - mean_j) / std_j.
- train_test_split: Splits the rows of a 2D tensor into (train, test) by an ordered, deterministic partition, with no shuffling.
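The remaining formulas can be sketched in plain Rust on slices, independent of matten::Tensor. All three function names below are hypothetical. Two choices are assumptions rather than documented behaviour: the split index is taken as floor(n * ratio), and a constant column yields `None`.

```rust
/// Min-max scaling of one column: (x - min) / (max - min).
/// Returns None for an empty or constant column (assumption for this sketch).
fn minmax_scale(column: &[f64]) -> Option<Vec<f64>> {
    let min = column.iter().copied().fold(f64::INFINITY, f64::min);
    let max = column.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if column.is_empty() || max == min {
        return None;
    }
    Some(column.iter().map(|x| (x - min) / (max - min)).collect())
}

/// Prepends a constant 1.0 to one row of m values, giving m + 1 values.
fn prepend_bias(row: &[f64]) -> Vec<f64> {
    let mut out = Vec::with_capacity(row.len() + 1);
    out.push(1.0);
    out.extend_from_slice(row);
    out
}

/// Ordered split with no shuffling: the first `cut` rows go to train and
/// the rest to test. Rounding down is an assumption of this sketch.
fn ordered_split<T: Clone>(rows: &[T], train_ratio: f64) -> (Vec<T>, Vec<T>) {
    let cut = ((rows.len() as f64) * train_ratio).floor() as usize;
    let cut = cut.min(rows.len()); // guard against ratios above 1.0
    (rows[..cut].to_vec(), rows[cut..].to_vec())
}
```

With four rows and a ratio of 0.75 the cut lands at index 3, which matches the `[3, 2]` and `[1, 2]` shapes in the crate example. Because the input order is preserved, repeated calls on the same data always return the same partition.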