Skip to main content

Crate matten_mlprep

Crate matten_mlprep 

Source
Expand description

matten-mlprep — small, transparent, deterministic preprocessing helpers for matten::Tensor.

This companion crate (RFC-024, RFC-028) prepares numeric tensors for use with external tools. It is not an ML framework: there is no model training, no autograd, no optimizer, and no hidden randomness. Every function is a pure, deterministic transform you can reason about. It depends only on core matten (no default features) — no ndarray, no candle, no rand.

§Convention

All functions operate on rank-2 tensors with rows = samples and columns = features. A non-2D tensor is rejected; there is no silent transposition.

§Functions

use matten::Tensor;
use matten_mlprep::{add_bias_column, standardize_columns, train_test_split};

let x = Tensor::new(vec![1.0, 3.0, 5.0, 7.0], &[4, 1]);
let z = standardize_columns(&x).unwrap();        // zero mean, unit std
let z = add_bias_column(&z).unwrap();            // [4, 2], column 0 = 1.0
let (train, test) = train_test_split(&z, 0.75).unwrap();
assert_eq!(train.shape(), &[3, 2]);
assert_eq!(test.shape(), &[1, 2]);

§Status

Beta. The API may still change. Constant (zero-variance) columns are rejected explicitly by the scalers rather than silently producing a zero column — see MattenMlprepError::ZeroVariance. Dynamic tensors are rejected at every public entry point unconditionally — the guard does not depend on the companion dynamic feature (RFC-031).

§Feature flags

  • dynamic — Compatibility forwarding feature. No longer required for dynamic rejection as of v0.19.1. Dynamic tensors are rejected at companion boundaries regardless of whether this feature is enabled. Reconsider removal no earlier than v0.20.0.

Enums§

MattenMlprepError
Errors produced by matten-mlprep preprocessing functions.

Functions§

add_bias_column
Prepends a constant 1.0 bias column: [n, m] -> [n, m+1].
minmax_scale_columns
Scales each column to the [0, 1] range: out[i,j] = (x[i,j] - min_j) / (max_j - min_j).
standardize_columns
Standardizes each column to zero mean and unit (population) standard deviation: out[i,j] = (x[i,j] - mean_j) / std_j.
train_test_split
Splits the rows of a 2D tensor into (train, test) by an ordered, deterministic partition — no shuffling.