Crate matten_mlprep

matten-mlprep — small, transparent, deterministic preprocessing helpers for matten::Tensor.

This companion crate (RFC-024, RFC-028) prepares numeric tensors for use with external tools. It is not an ML framework: there is no model training, no autograd, no optimizer, and no hidden randomness. Every function is a pure, deterministic transform you can reason about. It depends only on core matten (no default features) — no ndarray, no candle, no rand.

§Convention

All functions operate on rank-2 tensors with rows = samples and columns = features. A non-2D tensor is rejected; there is no silent transposition.
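
To make the convention concrete, here is a minimal sketch in plain Rust that does not depend on matten. It assumes a row-major flat buffer paired with a `[rows, cols]` shape. `Matrix2`, `ShapeError`, and `from_flat` are hypothetical names used only for illustration; the crate's internal layout and error type may differ.

```rust
/// Hypothetical stand-in for a rank-2 tensor: rows = samples, columns = features.
/// ASSUMPTION: row-major storage. This is not matten's actual type.
#[derive(Debug, PartialEq)]
struct Matrix2 {
    data: Vec<f64>,
    rows: usize,
    cols: usize,
}

/// Hypothetical error for shapes that are not rank 2 or do not match the data length.
#[derive(Debug, PartialEq)]
enum ShapeError {
    NotRank2 { rank: usize },
    LengthMismatch { expected: usize, actual: usize },
}

impl Matrix2 {
    /// Accepts only shapes of the form [rows, cols]. Anything else is rejected
    /// rather than reshaped or transposed.
    fn from_flat(data: Vec<f64>, shape: &[usize]) -> Result<Self, ShapeError> {
        if shape.len() != 2 {
            return Err(ShapeError::NotRank2 { rank: shape.len() });
        }
        let (rows, cols) = (shape[0], shape[1]);
        if data.len() != rows * cols {
            return Err(ShapeError::LengthMismatch {
                expected: rows * cols,
                actual: data.len(),
            });
        }
        Ok(Self { data, rows, cols })
    }

    /// Value of feature `j` for sample `i`.
    fn get(&self, i: usize, j: usize) -> f64 {
        self.data[i * self.cols + j]
    }
}
```

With three samples of two features stored as `[1, 2, 3, 4, 5, 6]`, `get(2, 0)` reads the first feature of the third sample, while a rank-1 shape such as `[2]` yields an error instead of being promoted to a column or row.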

§Functions

standardize_columns — per-column z-score (population std).
minmax_scale_columns — per-column scaling to [0, 1].
add_bias_column — prepend a constant 1.0 intercept column.
train_test_split — ordered, deterministic row split.

use matten::Tensor;
use matten_mlprep::{add_bias_column, standardize_columns, train_test_split};

let x = Tensor::new(vec![1.0, 3.0, 5.0, 7.0], &[4, 1]);
let z = standardize_columns(&x).unwrap();        // zero mean, unit std
let z = add_bias_column(&z).unwrap();            // [4, 2], column 0 = 1.0
let (train, test) = train_test_split(&z, 0.75).unwrap();
assert_eq!(train.shape(), &[3, 2]);
assert_eq!(test.shape(), &[1, 2]);
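
The arithmetic behind the example can be checked by hand. The column [1, 3, 5, 7] has mean 4 and population variance (9 + 1 + 1 + 9) / 4 = 5, so each value is shifted by 4 and divided by sqrt(5). The sketch below reproduces that calculation on a plain slice; `zscore_population` is a hypothetical helper written for this illustration, not a function exported by matten-mlprep.

```rust
/// Population z-score of a single column (divides by n, not n - 1).
/// Hypothetical helper for checking the arithmetic; not part of matten-mlprep.
/// Assumes a non-empty, non-constant column.
fn zscore_population(col: &[f64]) -> Vec<f64> {
    let n = col.len() as f64;
    let mean = col.iter().sum::<f64>() / n;
    // Population variance: mean of squared deviations.
    let var = col.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    col.iter().map(|v| (v - mean) / std).collect()
}
```

For the input above this gives approximately [-1.342, -0.447, 0.447, 1.342], which sums to zero and has unit population standard deviation.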

§Status

Experimental (0.1.x). The API may change. The scalers (standardize_columns and minmax_scale_columns) reject constant (zero-variance) columns explicitly rather than silently producing a zero column; see MattenMlprepError::ZeroVariance.
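
As a rough illustration of why a constant column is an error rather than a zero column, the sketch below min-max scales a single column and refuses to divide by a zero range. `ScaleError` and `minmax_checked` are hypothetical names. The real variant is MattenMlprepError::ZeroVariance, whose fields are not shown in this page, and the exact-zero comparison is an assumption (the crate might use a tolerance).

```rust
/// Hypothetical error type for this sketch. The crate's own error is
/// MattenMlprepError; the `column` field here is an assumption.
#[derive(Debug, PartialEq)]
enum ScaleError {
    ZeroVariance { column: usize },
}

/// Min-max scales one column to [0, 1], rejecting a constant column
/// instead of dividing by zero.
fn minmax_checked(col: &[f64], column: usize) -> Result<Vec<f64>, ScaleError> {
    let min = col.iter().copied().fold(f64::INFINITY, f64::min);
    let max = col.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    // ASSUMPTION: an exact zero range marks a constant column.
    if range == 0.0 {
        return Err(ScaleError::ZeroVariance { column });
    }
    Ok(col.iter().map(|v| (v - min) / range).collect())
}
```

Returning an error forces the caller to decide what to do with an uninformative feature (drop it, keep it unscaled) instead of receiving NaN values or a silently zeroed column.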

Enums§

MattenMlprepError: Errors produced by matten-mlprep preprocessing functions.

Functions§

add_bias_column: Prepends a constant 1.0 bias column: [n, m] -> [n, m+1].
minmax_scale_columns: Scales each column to the [0, 1] range: out[i,j] = (x[i,j] - min_j) / (max_j - min_j).
standardize_columns: Standardizes each column to zero mean and unit (population) standard deviation: out[i,j] = (x[i,j] - mean_j) / std_j.
train_test_split: Splits the rows of a 2D tensor into (train, test) by an ordered, deterministic partition — no shuffling.
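
The bias-column and split behaviour described above can be sketched on plain nested vectors. `add_bias` and `ordered_split` are hypothetical stand-ins written for illustration, not the crate's functions. In particular, the rounding rule used to turn the fraction into a row count (floor, here) is an assumption; the page only shows that 0.75 of 4 rows yields 3 training rows.

```rust
/// Prepends a constant 1.0 to every row: [n, m] -> [n, m + 1].
fn add_bias(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    rows.iter()
        .map(|row| {
            let mut out = Vec::with_capacity(row.len() + 1);
            out.push(1.0); // intercept term goes in column 0
            out.extend_from_slice(row);
            out
        })
        .collect()
}

/// Ordered split: the first `k` rows go to train, the rest to test, no shuffling.
/// ASSUMPTION: k = floor(n * train_fraction); the crate's rounding rule is not
/// stated in the source and may differ.
fn ordered_split(rows: &[Vec<f64>], train_fraction: f64) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    let k = ((rows.len() as f64) * train_fraction).floor() as usize;
    let k = k.min(rows.len()); // guard against fractions above 1.0
    (rows[..k].to_vec(), rows[k..].to_vec())
}
```

Because the split preserves row order, the same input and fraction always produce the same partition. Any shuffling has to happen before the call, using a source of randomness the caller controls.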