//! Normalizer: scale each sample (row) to unit norm.
//!
//! Unlike column-wise scalers, the `Normalizer` operates row-wise: each
//! sample is scaled independently so that its chosen norm equals 1.
//!
//! Supported norms:
//! - **L1**: divide by the sum of absolute values
//! - **L2**: divide by the Euclidean norm (default)
//! - **Max**: divide by the maximum absolute value
//!
//! Samples whose norm is zero (for example, all-zero rows) are left unchanged.
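//!
//! The following std-only sketch illustrates the three norms and the
//! zero-norm rule on plain `Vec<f64>` rows. It is an illustration under stated
//! assumptions, not this module's implementation: `Norm`, `norm_of`, and
//! `normalize_rows` are hypothetical names, and the real code works on
//! `ndarray::Array2<F>` and validates its input first.
//!
//! ```rust
//! // Hypothetical stand-in for `NormType`.
//! #[derive(Clone, Copy)]
//! enum Norm {
//!     L1,
//!     L2,
//!     Max,
//! }
//!
//! // Norm of one row: L1 = sum |v|, L2 = sqrt(sum v^2), Max = max |v|.
//! fn norm_of(row: &[f64], norm: Norm) -> f64 {
//!     match norm {
//!         Norm::L1 => row.iter().map(|v| v.abs()).sum::<f64>(),
//!         Norm::L2 => row.iter().map(|v| v * v).sum::<f64>().sqrt(),
//!         Norm::Max => row
//!             .iter()
//!             .fold(0.0, |acc, v| if v.abs() > acc { v.abs() } else { acc }),
//!     }
//! }
//!
//! // Scale each row to unit norm; a row whose norm is zero is copied unchanged.
//! fn normalize_rows(x: &[Vec<f64>], norm: Norm) -> Vec<Vec<f64>> {
//!     x.iter()
//!         .map(|row| {
//!             let n = norm_of(row, norm);
//!             if n == 0.0 {
//!                 row.clone()
//!             } else {
//!                 row.iter().map(|v| v / n).collect::<Vec<f64>>()
//!             }
//!         })
//!         .collect()
//! }
//! ```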
//!
//! This transformer is **stateless** — no fitting is required. Call
//! [`Transform::transform`] directly. For scikit-learn API parity it ALSO
//! supports the stateful [`Fit`](ferrolearn_core::traits::Fit) →
//! [`FittedNormalizer`] path, which records `n_features_in_` and (like
//! sklearn) validates the input in `fit`. The fitted type's `transform`
//! delegates to the free [`normalize`] function, which performs the same
//! per-row operations as the stateless path, so both paths produce
//! bit-identical output.
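//!
//! The relationship between the two paths can be sketched without the crate.
//! This is a simplified model, not the real types: `stateless_l2`,
//! `FittedSketch`, and `fit_sketch` are hypothetical, only the L2 norm is
//! shown, and the REQ-2 input checks are omitted.
//!
//! ```rust
//! // Stateless path: L2-normalize each row (a zero norm divides by 1).
//! fn stateless_l2(x: &[Vec<f64>]) -> Vec<Vec<f64>> {
//!     x.iter()
//!         .map(|row| {
//!             let n = row.iter().map(|v| v * v).sum::<f64>().sqrt();
//!             let d = if n == 0.0 { 1.0 } else { n };
//!             row.iter().map(|v| v / d).collect::<Vec<f64>>()
//!         })
//!         .collect()
//! }
//!
//! // Fitted state: only the feature count, no learned statistics.
//! struct FittedSketch {
//!     n_features_in: usize,
//! }
//!
//! // `fit` records the column count (the real one also validates the input).
//! fn fit_sketch(x: &[Vec<f64>]) -> FittedSketch {
//!     FittedSketch { n_features_in: x.first().map_or(0, |r| r.len()) }
//! }
//!
//! impl FittedSketch {
//!     // Checks the width, then reuses the stateless computation unchanged.
//!     fn transform(&self, x: &[Vec<f64>]) -> Result<Vec<Vec<f64>>, String> {
//!         if x.iter().any(|row| row.len() != self.n_features_in) {
//!             return Err(format!("expected {} features", self.n_features_in));
//!         }
//!         Ok(stateless_l2(x))
//!     }
//! }
//! ```
//!
//! Because the fitted `transform` calls the same function, equal inputs give
//! equal outputs on both paths.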
//!
//! # `## REQ status`
//!
//! Binary (R-DEFER-2), translating `sklearn/preprocessing/_data.py` (`class Normalizer`
//! `:1980`, `normalize` `:1866`). Design doc: `.design/preprocess/normalizer.md`. Expected
//! values from the live sklearn 1.5.2 oracle (R-CHAR-3). Consumers: the in-file
//! `PipelineTransformer`/`FittedPipelineTransformer` impls (pipeline integration) + crate
//! re-export (`lib.rs:119`, grandfathered S5). No PyO3 binding.
//!
//! | REQ | Status | Evidence |
//! |---|---|---|
//! | REQ-1 (row-wise L1/L2/Max transform) | SHIPPED | `Transform::transform` divides each row by its norm (L1=Σ\|v\|, L2=√Σv², Max=max\|v\|; zero-norm row unchanged), default L2; mirrors sklearn dense `normalize` (`_data.py:1962-1969`, `_handle_zeros_in_scale` `:1968`). Critic-verified bit-identical to live oracle: `guard_l1/l2/max/zero_row/f32_matches_oracle` in `tests/divergence_normalizer.rs`. Consumers: `FittedPipelineTransformer::transform_pipeline` + crate re-export `lib.rs:119`. |
//! | REQ-2 (transform input validation per check_array) | SHIPPED | FIXED #1140. `transform` guards (sklearn order) zero-samples → `InsufficientSamples` (`validation.py:1084`), zero-features → `InvalidParameter` (`:1093`), non-finite NaN/±inf → `InvalidParameter` (`:1063`) — matching `Normalizer.transform` → `normalize` → `check_array` (`_data.py:1933-1940`). Mirrors converged `binarizer.rs`. Critic two-round CLEAN: 6 rejection pins + finite-not-over-rejected guards (zero-NORM-row/1e308/subnormal/-0.0); pipeline consumer inherits validation. |
//! | REQ-3 (validating fit + parameter constraints) | SHIPPED | FIXED #1141. `impl Fit<Array2<F>, ()> for Normalizer` (`fit`): runs the SAME `validate_normalize_input` guard as `Transform::transform`/`normalize` (REQ-2: zero-samples → `InsufficientSamples`, zero-features/non-finite NaN±inf → `InvalidParameter`, sklearn `_validate_data` default `force_all_finite=True` REJECTS NaN/inf — confirmed `Normalizer().fit([[nan]])`/`[[inf]]` raise ValueError, `:2082`,`utils/validation.py:1063/1084/1093`), records `n_features_in_ = x.ncols()`, returns `FittedNormalizer { norm, copy, n_features_in_ }` (no fitted statistics — Normalizer is stateless, sklearn fit "Only validates", `:2062-2083`). sklearn's `_parameter_constraints {norm:[StrOptions{l1,l2,max}]}` (`:2053-2055`) has NO ferrolearn analog: `NormType` is a closed Rust enum, so an out-of-domain norm is UNREPRESENTABLE rather than runtime-rejected — the type system satisfies the param-domain check. Live-oracle tests: `fit_l1/l2/max_matches_oracle_and_stateless`, `fit_rejects_nan/pos_inf/neg_inf`, `fit_zero_row_unchanged`, `fitted_transform_shape_mismatch`, `fit_path_equals_stateless_path` in `tests/divergence_normalizer.rs`. Consumers: `FittedNormalizer::transform` (the fitted path) + crate re-export `lib.rs:140`. |
//! | REQ-4 (normalize free fn: axis / return_norm) | SHIPPED | FIXED #1142. `pub fn normalize` + `pub fn normalize_with_norms` (free fns) mirror sklearn `normalize(X, norm, *, axis=1, copy=True, return_norm=False)` (`_data.py:1866`). Shared `row_norm` helper computes L1=Σ\|v\|, L2=√Σv², Max=max\|v\| (`:1962-1967`); `_handle_zeros_in_scale` zero→1 (`:1968`); `X /= norms` (`:1969`). `axis=1` row-normalizes; `axis=0` column-normalizes (sklearn transpose `:1926-1942`,`:1971-1972`); `axis ∉ {0,1}` → `InvalidParameter`. `normalize_with_norms` returns `(normalized, raw_norms)` (return_norm `:1974-1975`; raw, NOT zero-handled). Same validation as `Transform::transform` (REQ-2). Oracle-grounded tests in `#[cfg(test)]`: `normalize_l2/l1/max_axis1_matches_sklearn`, `normalize_l2_axis0_matches_sklearn`, `normalize_return_norm_l2_and_l1`, `normalize_invalid_axis_errors`. |
//! | REQ-5 (copy parameter) | SHIPPED | FIXED #1143. `Normalizer<F>` gains a `copy: bool` field (default `true`) + `#[must_use] with_copy` builder + `copy()` getter, threaded onto `FittedNormalizer`, mirroring sklearn `__init__(norm='l2', *, copy=True)` (`_data.py:2058-2060`, `_parameter_constraints {copy:["boolean"]}` `:2055`). ACCEPT-AND-DOCUMENT no-op: ferrolearn's `Transform` always returns a freshly allocated array (`to_owned()`), so `copy` has no observable effect — `copy=True`/`copy=False` produce identical output (sklearn's `copy=False` does in-place row normalization, an optimization Rust's ownership makes moot here). Live-oracle test `fit_copy_true_false_identical`. Consumers: `FittedNormalizer` carries the flag + crate re-export `lib.rs:140`. |
//! | REQ-6 (n_features_in_ / feature names) | PARTIAL | `n_features_in_` SHIPPED, `get_feature_names_out` NOT-STARTED. `FittedNormalizer<F>` records `n_features_in_ = x.ncols()` in `fit` and exposes `pub fn n_features_in(&self) -> usize`, mirroring sklearn's `_validate_data` setting `n_features_in_` (`:2082`); `FittedNormalizer::transform` validates the input column count against it (`ShapeMismatch`, sklearn `_validate_data(reset=False)` `:2104`). The `OneToOneFeatureMixin.get_feature_names_out` / `feature_names_in_` string-name plumbing is OUT OF SCOPE for this build (no string feature-name infrastructure in ferrolearn yet) — open prereq blocker #1144 for the feature-name half. Live-oracle test `fit_n_features_in_matches_ncols`. |
//! | REQ-7 (sparse support) | NOT-STARTED | open prereq blocker #1145. Dense-only; no CSR `inplace_csr_row_normalize_l1/l2` / `min_max_axis` Max (`:1944-1960`). |
//! | REQ-8 (PyO3 binding) | SHIPPED | FIXED #1146. `ferrolearn-python` surfaces `Normalizer` as `ferrolearn.Normalizer`: the hand-written `_RsNormalizer` `#[pyclass]` (`ferrolearn-python/src/extras.rs`, registered `lib.rs`) maps sklearn's `norm` STRING ('l1'/'l2'/'max') to the closed Rust `NormType` enum via `RsNormalizer::resolve_norm` — a bad string → `PyValueError` (sklearn `_parameter_constraints {norm: StrOptions({"l1","l2","max"})}`, `_data.py:2055`, `InvalidParameterError` ⊂ ValueError), builds `Normalizer::<f64>::new(normtype).with_copy(copy)`, runs the validating `Fit` (NaN/±inf → `PyValueError`, REQ-3) and delegates `transform` to `FittedNormalizer`. The non-test production consumer is `_extras.py::Normalizer(_TransformerWrapper)` with sklearn's `__init__(self, norm="l2", *, copy=True)` ABI (norm positional-or-keyword, copy keyword-only, `_data.py:2058`) + an overridden STATELESS `transform` (build-on-demand without fit, `_more_tags stateless=True` `_data.py:2110`, #2213) doing a FLOAT-ONLY dtype cast-back (float32→float32, float64→float64, int64→float64 UPCAST per `check_array(dtype=FLOAT_DTYPES)` `_data.py:2104`, #2214-analog — DIFFERS from Binarizer's number-preserving cast); re-exported in `__init__.py`. Verified vs the live sklearn 1.5.2 oracle: `tests/divergence_normalizer.py` (l1/l2/max values, default-l2, positional-norm, stateless, dtype, NaN/±inf, zero-norm, bad-norm, clone/get_params/set_params, copy no-op, pipeline). **Reduced-precision caveat (#2215, tracked):** sklearn `normalize` casts X to the INPUT float precision via `check_array(dtype=FLOAT_DTYPES)` (`_data.py:1933`) and computes the norm + division IN that precision (float16/float32), but the f64-only binding ABI (shared by EVERY `_Rs*` transformer) computes the norm in float64 then casts the result back — so float32 (~6e-8) and float16 (~5e-4) VALUES diverge slightly (dtype LABELS match; the float64 path is bit-exact, <1e-12). Same class as the generic-F precision caveats #2205/#2206; float16 is fundamentally unmatchable (the Rust core has no f16). Pinned `#[skip]` in `tests/divergence_normalizer_reduced_precision.py`. |
//! | REQ-9 (ferray substrate) | NOT-STARTED | open prereq blocker #1147. `ndarray::Array2` + `num_traits::Float`, not `ferray-core`/`ferray-ufunc` (R-SUBSTRATE-1/2). |
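//!
//! As the REQ-3 and REQ-8 rows describe, the closed `NormType` enum makes an
//! out-of-domain norm unrepresentable in Rust, so string validation only
//! happens at the Python boundary. A minimal sketch of such a mapping follows;
//! `Norm` and `resolve_norm_sketch` are hypothetical and are not the binding's
//! actual `RsNormalizer::resolve_norm`.
//!
//! ```rust
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Norm {
//!     L1,
//!     L2,
//!     Max,
//! }
//!
//! // Map sklearn's `norm` string onto the closed enum; any other string is an
//! // error, analogous to sklearn raising `InvalidParameterError` (a ValueError).
//! fn resolve_norm_sketch(s: &str) -> Result<Norm, String> {
//!     match s {
//!         "l1" => Ok(Norm::L1),
//!         "l2" => Ok(Norm::L2),
//!         "max" => Ok(Norm::Max),
//!         other => Err(format!("norm must be one of 'l1', 'l2', 'max'; got {other:?}")),
//!     }
//! }
//! ```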
use ferrolearn_core::error::FerroError;
use ferrolearn_core::pipeline::{FittedPipelineTransformer, PipelineTransformer};
use ferrolearn_core::traits::{Fit, Transform};
use ndarray::{Array1, Array2, ArrayView1};
use num_traits::Float;
// ---------------------------------------------------------------------------
// NormType
// ---------------------------------------------------------------------------
/// The norm used by [`Normalizer`] when scaling each sample.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum NormType {
/// L1 norm: sum of absolute values.
L1,
/// L2 norm: Euclidean norm (square root of sum of squares). This is the default.
#[default]
L2,
/// Max norm: maximum absolute value in the sample.
Max,
}
// ---------------------------------------------------------------------------
// Normalizer
// ---------------------------------------------------------------------------
/// A stateless row-wise normalizer.
///
/// Each sample (row) is independently scaled so that its chosen norm equals 1.
/// Samples with a zero norm are left unchanged.
///
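/// This transformer is stateless: no [`Fit`](ferrolearn_core::traits::Fit)
/// step is required, so [`Transform::transform`] can be called directly. The
/// optional validating `Fit` path returns a [`FittedNormalizer`] that produces
/// the same output.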
///
/// # Examples
///
/// ```
/// use ferrolearn_preprocess::normalizer::{Normalizer, NormType};
/// use ferrolearn_core::traits::Transform;
/// use ndarray::array;
///
/// let normalizer = Normalizer::<f64>::new(NormType::L2);
/// let x = array![[3.0, 4.0], [1.0, 0.0]];
/// let out = normalizer.transform(&x).unwrap();
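/// // Row 0: [3/5, 4/5]; row 1 already has unit norm and is unchanged.
/// assert!((out[[0, 0]] - 0.6).abs() < 1e-12);
/// assert!((out[[0, 1]] - 0.8).abs() < 1e-12);
/// assert_eq!(out[[1, 0]], 1.0);
/// assert_eq!(out[[1, 1]], 0.0);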
/// ```
#[derive(Debug, Clone)]
pub struct Normalizer<F> {
/// The norm to use for normalisation.
pub(crate) norm: NormType,
/// sklearn's `copy` constructor parameter (`__init__(norm='l2', *, copy=True)`,
/// `_data.py:2058-2060`; `_parameter_constraints {copy:["boolean"]}` `:2055`).
/// ACCEPT-AND-DOCUMENT no-op: ferrolearn's [`Transform`] always returns a
/// freshly allocated array, so `copy` has no observable effect. Retained for
/// API parity. Defaults to `true`.
pub(crate) copy: bool,
_marker: std::marker::PhantomData<F>,
}
impl<F: Float + Send + Sync + 'static> Normalizer<F> {
/// Create a new `Normalizer` with the specified norm type.
#[must_use]
pub fn new(norm: NormType) -> Self {
Self {
norm,
copy: true,
_marker: std::marker::PhantomData,
}
}
/// Create a new `Normalizer` using the default L2 norm.
#[must_use]
pub fn l2() -> Self {
Self::new(NormType::L2)
}
/// Create a new `Normalizer` using the L1 norm.
#[must_use]
pub fn l1() -> Self {
Self::new(NormType::L1)
}
/// Create a new `Normalizer` using the Max norm.
#[must_use]
pub fn max() -> Self {
Self::new(NormType::Max)
}
/// Return the configured norm type.
#[must_use]
pub fn norm(&self) -> NormType {
self.norm
}
/// Set the `copy` parameter (sklearn `Normalizer(copy=...)`,
/// `_data.py:2058`, `_parameter_constraints {copy:["boolean"]}` `:2055`).
///
/// This is an ACCEPT-AND-DOCUMENT no-op: ferrolearn's [`Transform`] always
/// returns a freshly allocated array, so `copy` has no observable effect on
/// the output. It is retained for API parity with scikit-learn.
#[must_use]
pub fn with_copy(mut self, copy: bool) -> Self {
self.copy = copy;
self
}
/// Return the configured `copy` flag (sklearn `Normalizer.copy`).
#[must_use]
pub fn copy(&self) -> bool {
self.copy
}
}
impl<F: Float + Send + Sync + 'static> Default for Normalizer<F> {
fn default() -> Self {
Self::new(NormType::L2)
}
}
// ---------------------------------------------------------------------------
// FittedNormalizer (sklearn stateful `fit` -> fitted estimator path)
// ---------------------------------------------------------------------------
/// A fitted [`Normalizer`].
///
/// `Normalizer` is stateless: its `fit` (sklearn `Normalizer.fit`,
/// `_data.py:2062-2083`, "Only validates estimator's parameters") learns NO
/// statistics; it merely validates the input and records `n_features_in_`. The
/// fitted type therefore carries only the configured `norm`, the `copy` flag,
/// and the recorded feature count. Its [`Transform::transform`] delegates to the
/// free [`normalize`] function, which performs the same per-row operations as
/// the stateless [`Normalizer`], so the two paths are bit-identical.
#[derive(Debug, Clone)]
pub struct FittedNormalizer<F> {
/// The norm to use for normalisation.
pub(crate) norm: NormType,
/// The `copy` flag carried from the unfitted [`Normalizer`] (no-op; see
/// [`Normalizer::with_copy`]).
pub(crate) copy: bool,
/// Number of features (columns) seen during [`Fit::fit`] — sklearn's
/// `n_features_in_` (`_data.py:2082`, set by `_validate_data`).
pub(crate) n_features_in_: usize,
_marker: std::marker::PhantomData<F>,
}
impl<F: Float + Send + Sync + 'static> FittedNormalizer<F> {
/// Return the number of features (columns) seen during [`Fit::fit`].
///
/// Mirrors scikit-learn's `Normalizer.n_features_in_` (`_data.py:2082`).
#[must_use]
pub fn n_features_in(&self) -> usize {
self.n_features_in_
}
/// Return the configured norm type.
#[must_use]
pub fn norm(&self) -> NormType {
self.norm
}
/// Return the configured `copy` flag (no-op; see [`Normalizer::with_copy`]).
#[must_use]
pub fn copy(&self) -> bool {
self.copy
}
}
impl<F: Float + Send + Sync + 'static> Fit<Array2<F>, ()> for Normalizer<F> {
type Fitted = FittedNormalizer<F>;
type Error = FerroError;
/// Validate the input and record `n_features_in_`, returning a
/// [`FittedNormalizer`].
///
/// `Normalizer` is stateless: like scikit-learn's `Normalizer.fit`
/// (`sklearn/preprocessing/_data.py:2062-2083`, "Only validates estimator's
/// parameters"), this learns NO statistics. It runs the same `check_array`
/// validation as [`Transform::transform`] and [`normalize`] (REQ-2), via the
/// `validate_normalize_input` helper it shares with [`normalize`], and records
/// `n_features_in_ = x.ncols()`. sklearn's `_validate_data` uses the default
/// `force_all_finite=True`, so NaN/±inf are REJECTED in `fit`
/// (`Normalizer().fit([[nan]])` / `[[inf]]` raise `ValueError`).
///
/// # Errors
///
/// Returns [`FerroError::InsufficientSamples`] for zero rows and
/// [`FerroError::InvalidParameter`] for zero features or any non-finite
/// value (NaN, +inf, -inf) — matching `check_array`
/// (`sklearn/utils/validation.py:1084`, `:1093`, `:1063`) as routed through
/// `Normalizer.fit` -> `_validate_data` (`_data.py:2082`).
fn fit(&self, x: &Array2<F>, _y: &()) -> Result<FittedNormalizer<F>, FerroError> {
validate_normalize_input(x)?;
Ok(FittedNormalizer {
norm: self.norm,
copy: self.copy,
n_features_in_: x.ncols(),
_marker: std::marker::PhantomData,
})
}
}
impl<F: Float + Send + Sync + 'static> Transform<Array2<F>> for FittedNormalizer<F> {
type Output = Array2<F>;
type Error = FerroError;
/// Normalize each row of `x` to unit norm, delegating to the SAME row-norm
/// logic as the stateless [`Normalizer`] / [`normalize`] path.
///
/// First applies the REQ-2 `check_array` guards and normalizes via the shared
/// [`normalize`] free function with `axis=1` (sklearn `Normalizer.transform` ->
/// `normalize(X, norm=self.norm, axis=1)`, `:2106`); only then does it compare
/// the column count against the one recorded during [`Fit::fit`] (sklearn
/// `_validate_data(reset=False)`, `sklearn/preprocessing/_data.py:2104`). This
/// ordering means a non-finite, zero-sample, or zero-feature input reports its
/// `check_array` error even when the column count is also wrong. The output is
/// bit-identical to `Normalizer::transform`.
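///
/// The error precedence in the body below can be modelled with a std-only
/// sketch. `SketchError` and `fitted_transform_sketch` are hypothetical
/// stand-ins for `FerroError` and this method; rows are assumed rectangular,
/// and only the error outcome is modelled, not the normalization itself.
///
/// ```rust
/// #[derive(Debug, PartialEq)]
/// enum SketchError {
///     InsufficientSamples,
///     InvalidParameter,
///     ShapeMismatch,
/// }
///
/// fn fitted_transform_sketch(x: &[Vec<f64>], n_features_in: usize) -> Result<(), SketchError> {
///     // 1. `check_array`-style guards run first.
///     if x.is_empty() {
///         return Err(SketchError::InsufficientSamples);
///     }
///     if x[0].is_empty() || x.iter().flatten().any(|v| !v.is_finite()) {
///         return Err(SketchError::InvalidParameter);
///     }
///     // 2. Only then is the column count compared with `n_features_in_`.
///     if x[0].len() != n_features_in {
///         return Err(SketchError::ShapeMismatch);
///     }
///     Ok(())
/// }
/// ```
///
/// With this ordering, a row like `[NaN, 1.0, 2.0]` passed to a transformer
/// fitted on two columns reports `InvalidParameter`, not `ShapeMismatch`.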
///
/// # Errors
///
/// Returns [`FerroError::ShapeMismatch`] if the column count differs from
/// `n_features_in_`. Returns [`FerroError::InsufficientSamples`] for zero
/// rows and [`FerroError::InvalidParameter`] for zero features or any
/// non-finite value (REQ-2, via [`normalize`]).
fn transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
// sklearn `_validate_data(reset=False)` runs `check_array` (finite /
// min-samples / min-features) BEFORE `_check_n_features` (`base.py:633`
// then `:654`, #2207). So validate + normalize FIRST (this is
// `check_array`'s job via the shared REQ-2 guard in `normalize`); a NaN /
// +-inf / zero-sample / zero-feature input must raise its check_array
// error EVEN when the column count is also wrong. Only after that does
// the n_features comparison fire.
let normalized = normalize(x, self.norm, 1)?;
if x.ncols() != self.n_features_in_ {
return Err(FerroError::ShapeMismatch {
expected: vec![x.nrows(), self.n_features_in_],
actual: vec![x.nrows(), x.ncols()],
context: "FittedNormalizer::transform".into(),
});
}
Ok(normalized)
}
}
// ---------------------------------------------------------------------------
// Trait implementations
// ---------------------------------------------------------------------------
impl<F: Float + Send + Sync + 'static> Transform<Array2<F>> for Normalizer<F> {
type Output = Array2<F>;
type Error = FerroError;
/// Normalize each row of `x` to unit norm.
///
/// Rows with a zero norm value are left unchanged.
///
/// # Errors
///
/// Returns [`FerroError::InsufficientSamples`] if `x` has zero rows. This
/// mirrors scikit-learn's `Normalizer.transform` ->
/// `normalize` -> `check_array` (`sklearn/preprocessing/_data.py:1933`),
/// whose min-samples check (`utils/validation.py:1084`,
/// `ensure_min_samples=1`) raises `ValueError: Found array with 0 sample(s)
/// ... while a minimum of 1 is required by Normalizer.`
///
/// Returns [`FerroError::InvalidParameter`] if `x` has zero features
/// (columns). This mirrors the same `check_array` min-features check
/// (`utils/validation.py:1093`, `ensure_min_features=1`) which raises
/// `ValueError: Found array with 0 feature(s) ... while a minimum of 1 is
/// required by Normalizer.`
///
/// Returns [`FerroError::InvalidParameter`] if `x` contains any non-finite
/// value (NaN, +inf, or -inf). This mirrors `check_array(force_all_finite=
/// True)` (`utils/validation.py:1063`), which raises `ValueError: Input X
/// contains NaN.` / `Input X contains infinity ...` before normalizing.
fn transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
if x.nrows() == 0 {
return Err(FerroError::InsufficientSamples {
required: 1,
actual: 0,
context: "Normalizer::transform".into(),
});
}
if x.ncols() == 0 {
return Err(FerroError::InvalidParameter {
name: "X".to_string(),
reason: "Found array with 0 feature(s); a minimum of 1 is required \
by Normalizer"
.to_string(),
});
}
if x.iter().any(|v| !v.is_finite()) {
return Err(FerroError::InvalidParameter {
name: "X".to_string(),
reason: "Input X contains non-finite values (NaN or infinity); \
Normalizer requires all-finite input"
.to_string(),
});
}
let mut out = x.to_owned();
for mut row in out.rows_mut() {
// Shared with `normalize`, so both paths compute identical norms.
let norm_val = row_norm(row.view(), self.norm);
if norm_val == F::zero() {
// Zero-norm row: leave unchanged.
continue;
}
for v in &mut row {
*v = *v / norm_val;
}
}
Ok(out)
}
}
// ---------------------------------------------------------------------------
// Standalone `normalize` free function (sklearn `normalize`, `_data.py:1866`)
// ---------------------------------------------------------------------------
/// Compute the `norm` of a single 1-D slice (one row or one column).
///
/// Mirrors sklearn's dense `normalize` per-vector norms (`_data.py:1962-1967`):
/// L1 = Σ|v|, L2 = √Σv², Max = max|v|.
fn row_norm<F: Float>(row: ArrayView1<F>, norm: NormType) -> F {
match norm {
NormType::L1 => row.iter().copied().fold(F::zero(), |acc, v| acc + v.abs()),
NormType::L2 => row
.iter()
.copied()
.fold(F::zero(), |acc, v| acc + v * v)
.sqrt(),
NormType::Max => row
.iter()
.copied()
.fold(F::zero(), |acc, v| if v.abs() > acc { v.abs() } else { acc }),
}
}
/// Run the shared `check_array` input validation (REQ-2) used by [`Normalizer`]'s
/// `fit` and by the free [`normalize`]/[`normalize_with_norms`] functions (and
/// hence by [`FittedNormalizer`]'s `transform`); `Normalizer::transform` performs
/// the same checks inline. The order follows sklearn's `check_array`
/// (zero-samples → zero-features → non-finite; `sklearn/utils/validation.py:1084`,
/// `:1093`, `:1063`).
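///
/// The check order, and the fact that finite edge values are not over-rejected,
/// can be illustrated with a std-only sketch. `Check` and `validate_sketch` are
/// hypothetical names; the real helper returns `FerroError` variants and works
/// on a rectangular `Array2<F>`.
///
/// ```rust
/// #[derive(Debug, PartialEq)]
/// enum Check {
///     ZeroSamples,
///     ZeroFeatures,
///     NonFinite,
/// }
///
/// // Same order as the helper below; the first failing check wins.
/// fn validate_sketch(x: &[Vec<f64>]) -> Result<(), Check> {
///     if x.is_empty() {
///         return Err(Check::ZeroSamples);
///     }
///     if x.iter().all(|row| row.is_empty()) {
///         return Err(Check::ZeroFeatures);
///     }
///     if x.iter().flatten().any(|v| !v.is_finite()) {
///         return Err(Check::NonFinite);
///     }
///     Ok(())
/// }
/// ```
///
/// Zero rows, `-0.0`, very large values such as `1e308`, and subnormals are all
/// finite, so they pass; only NaN and ±inf are rejected.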
fn validate_normalize_input<F: Float>(x: &Array2<F>) -> Result<(), FerroError> {
if x.nrows() == 0 {
return Err(FerroError::InsufficientSamples {
required: 1,
actual: 0,
context: "normalize".into(),
});
}
if x.ncols() == 0 {
return Err(FerroError::InvalidParameter {
name: "X".to_string(),
reason: "Found array with 0 feature(s); a minimum of 1 is required \
by the normalize function"
.to_string(),
});
}
if x.iter().any(|v| !v.is_finite()) {
return Err(FerroError::InvalidParameter {
name: "X".to_string(),
reason: "Input X contains non-finite values (NaN or infinity); \
the normalize function requires all-finite input"
.to_string(),
});
}
Ok(())
}
/// Shared core of [`normalize`] / [`normalize_with_norms`]: validate `axis` and
/// input, then return the normalized array plus the per-axis **raw** norm vector.
///
/// The returned `norms` are the actual computed norms (NOT zero-handled): a
/// zero-norm row/column appears as `0.0` even though the division used `1`
/// (`_handle_zeros_in_scale`, `_data.py:1968`) to leave it unchanged. This
/// matches sklearn's `normalize(..., return_norm=True)` (`:1974-1975`).
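///
/// Dividing a zero-norm row by `1` gives exactly the same bits as leaving it
/// alone (the approach `Normalizer::transform` takes), because IEEE 754
/// division by `1.0` is exact for every finite value, including `-0.0` and
/// subnormals. A std-only sketch of the divisor rule; `handle_zeros` and
/// `scale_row` are hypothetical names.
///
/// ```rust
/// // `_handle_zeros_in_scale`-style divisor: a zero norm becomes 1.
/// fn handle_zeros(n: f64) -> f64 {
///     if n == 0.0 { 1.0 } else { n }
/// }
///
/// // Divide by the zero-handled divisor but report the raw norm.
/// fn scale_row(row: &[f64], raw_norm: f64) -> (Vec<f64>, f64) {
///     let d = handle_zeros(raw_norm);
///     (row.iter().map(|v| v / d).collect::<Vec<f64>>(), raw_norm)
/// }
/// ```
///
/// For example, a row such as `[-0.0, 1e-200]` has an L2 norm that underflows
/// to `0.0` (each square is below the smallest subnormal), so it is treated as
/// zero-norm and passes through bit-for-bit while its raw norm is reported as
/// `0.0`.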
fn normalize_inner<F: Float>(
x: &Array2<F>,
norm: NormType,
axis: usize,
) -> Result<(Array2<F>, Array1<F>), FerroError> {
if axis != 0 && axis != 1 {
return Err(FerroError::InvalidParameter {
name: "axis".into(),
reason: "must be 0 or 1".into(),
});
}
validate_normalize_input(x)?;
let mut out = x.to_owned();
if axis == 1 {
// Row-normalize (sklearn default axis=1).
let mut norms = Array1::<F>::zeros(out.nrows());
for (i, mut row) in out.rows_mut().into_iter().enumerate() {
let n = row_norm(row.view(), norm);
norms[i] = n;
// _handle_zeros_in_scale: a zero norm divides by 1 (row unchanged).
let eff = if n == F::zero() { F::one() } else { n };
for v in &mut row {
*v = *v / eff;
}
}
Ok((out, norms))
} else {
// axis == 0: column-normalize. sklearn transposes, runs the axis=1
// path, then transposes back (`_data.py:1926-1942`, `:1971-1972`).
let mut norms = Array1::<F>::zeros(out.ncols());
for (j, mut col) in out.columns_mut().into_iter().enumerate() {
let n = row_norm(col.view(), norm);
norms[j] = n;
let eff = if n == F::zero() { F::one() } else { n };
for v in &mut col {
*v = *v / eff;
}
}
Ok((out, norms))
}
}
/// Scale input vectors individually to unit norm — the standalone, estimator-less
/// API mirroring scikit-learn's `normalize` free function
/// (`sklearn/preprocessing/_data.py:1866`).
///
/// With `axis == 1` (sklearn's default) each **row** (sample) is divided by its
/// `norm` (L1 = Σ|v|, L2 = √Σv², Max = max|v|); with `axis == 0` each **column**
/// (feature) is normalized instead (sklearn transposes, row-normalizes, and
/// transposes back — `:1926-1942`, `:1971-1972`). A row/column whose norm is zero
/// is left unchanged, matching `_handle_zeros_in_scale` (`:1968`).
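///
/// The `axis == 0` equivalence (transpose, row-normalize, transpose back) can
/// be checked with a std-only sketch. `l2_rows`, `transpose`, and `l2_columns`
/// are hypothetical helpers on `Vec<Vec<f64>>`; the function below instead
/// iterates `columns_mut()` directly, which visits each column's values in the
/// same order and so should yield the same result without materializing
/// transposes.
///
/// ```rust
/// // Row-normalize with the L2 norm (a zero-norm row is left unchanged).
/// fn l2_rows(x: &[Vec<f64>]) -> Vec<Vec<f64>> {
///     x.iter()
///         .map(|r| {
///             let n = r.iter().map(|v| v * v).sum::<f64>().sqrt();
///             let d = if n == 0.0 { 1.0 } else { n };
///             r.iter().map(|v| v / d).collect::<Vec<f64>>()
///         })
///         .collect()
/// }
///
/// // Transpose a rectangular, non-empty matrix.
/// fn transpose(x: &[Vec<f64>]) -> Vec<Vec<f64>> {
///     (0..x[0].len())
///         .map(|j| x.iter().map(|r| r[j]).collect::<Vec<f64>>())
///         .collect()
/// }
///
/// // axis = 0: transpose, row-normalize, transpose back.
/// fn l2_columns(x: &[Vec<f64>]) -> Vec<Vec<f64>> {
///     transpose(&l2_rows(&transpose(x)))
/// }
/// ```
///
/// With the matrix `[[1, 2, 2], [0, 3, 4]]` this reproduces, to about `1e-7`,
/// the sklearn values pinned in `normalize_l2_axis0_matches_sklearn`.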
///
/// # Errors
///
/// Returns [`FerroError::InvalidParameter`] if `axis` is not `0` or `1`. Also
/// applies the same `check_array` input validation as [`Normalizer`]'s
/// `transform` (REQ-2): [`FerroError::InsufficientSamples`] for zero rows, and
/// [`FerroError::InvalidParameter`] for zero features or any non-finite value
/// (`_data.py:1933-1940`).
#[must_use = "normalize returns a new array; the input is not modified"]
pub fn normalize<F: Float>(
x: &Array2<F>,
norm: NormType,
axis: usize,
) -> Result<Array2<F>, FerroError> {
let (out, _norms) = normalize_inner(x, norm, axis)?;
Ok(out)
}
/// Like [`normalize`] but also returns the per-axis norm vector — the
/// `return_norm=True` form of scikit-learn's `normalize`
/// (`sklearn/preprocessing/_data.py:1971-1975`).
///
/// Returns `(normalized, norms)` where `norms` is the per-row vector for
/// `axis == 1` (length = n_rows) or the per-column vector for `axis == 0`
/// (length = n_cols). The norms are the **raw** computed norms, NOT
/// zero-handled: a zero norm appears as `0.0` in the returned vector even though
/// the division used `1` to leave that row/column unchanged (sklearn returns the
/// raw `norms` array — `:1974-1975`).
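///
/// A std-only sketch of the `return_norm` contract for the L1 norm on rows,
/// using the hypothetical name `l1_rows_with_norms`: the division replaces a
/// zero norm by `1`, but the returned vector keeps the raw `0.0`.
///
/// ```rust
/// fn l1_rows_with_norms(x: &[Vec<f64>]) -> (Vec<Vec<f64>>, Vec<f64>) {
///     // Raw per-row L1 norms, returned as-is.
///     let norms: Vec<f64> = x
///         .iter()
///         .map(|r| r.iter().map(|v| v.abs()).sum::<f64>())
///         .collect();
///     let out: Vec<Vec<f64>> = x
///         .iter()
///         .zip(&norms)
///         .map(|(r, &n)| {
///             // Zero norms are replaced by 1 for the division only.
///             let d = if n == 0.0 { 1.0 } else { n };
///             r.iter().map(|v| v / d).collect::<Vec<f64>>()
///         })
///         .collect();
///     (out, norms)
/// }
/// ```
///
/// For `[[1, 2, 2], [0, 3, 4], [0, 0, 0]]` this returns norms `[5, 7, 0]`; the
/// first two match the L1 values checked in `normalize_return_norm_l2_and_l1`.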
///
/// # Errors
///
/// Same as [`normalize`].
#[must_use = "normalize_with_norms returns a new array and the norm vector"]
pub fn normalize_with_norms<F: Float>(
x: &Array2<F>,
norm: NormType,
axis: usize,
) -> Result<(Array2<F>, Array1<F>), FerroError> {
normalize_inner(x, norm, axis)
}
// ---------------------------------------------------------------------------
// Pipeline integration (generic)
// ---------------------------------------------------------------------------
impl<F: Float + Send + Sync + 'static> PipelineTransformer<F> for Normalizer<F> {
/// Fit the normalizer using the pipeline interface.
///
/// Because `Normalizer` is stateless, this simply boxes `self` as a
/// [`FittedPipelineTransformer`].
///
/// # Errors
///
/// This implementation never returns an error.
fn fit_pipeline(
&self,
_x: &Array2<F>,
_y: &Array1<F>,
) -> Result<Box<dyn FittedPipelineTransformer<F>>, FerroError> {
Ok(Box::new(self.clone()))
}
}
impl<F: Float + Send + Sync + 'static> FittedPipelineTransformer<F> for Normalizer<F> {
/// Transform data using the pipeline interface.
///
/// # Errors
///
/// Propagates errors from [`Transform::transform`].
fn transform_pipeline(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
self.transform(x)
}
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
#[cfg(test)]
mod tests {
use super::*;
use approx::assert_abs_diff_eq;
use ndarray::array;
#[test]
fn test_l2_norm_basic() {
let norm = Normalizer::<f64>::l2();
// Row [3, 4] has L2 norm 5.
let x = array![[3.0, 4.0]];
let out = norm.transform(&x).unwrap();
assert_abs_diff_eq!(out[[0, 0]], 0.6, epsilon = 1e-10);
assert_abs_diff_eq!(out[[0, 1]], 0.8, epsilon = 1e-10);
}
#[test]
fn test_l2_unit_norm_after_transform() {
let norm = Normalizer::<f64>::l2();
let x = array![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
let out = norm.transform(&x).unwrap();
for row in out.rows() {
let row_norm: f64 = row.iter().map(|v| v * v).sum::<f64>().sqrt();
assert_abs_diff_eq!(row_norm, 1.0, epsilon = 1e-10);
}
}
#[test]
fn test_l1_norm_basic() {
let norm = Normalizer::<f64>::l1();
// Row [1, 2, 3] has L1 norm 6.
let x = array![[1.0, 2.0, 3.0]];
let out = norm.transform(&x).unwrap();
assert_abs_diff_eq!(out[[0, 0]], 1.0 / 6.0, epsilon = 1e-10);
assert_abs_diff_eq!(out[[0, 1]], 2.0 / 6.0, epsilon = 1e-10);
assert_abs_diff_eq!(out[[0, 2]], 3.0 / 6.0, epsilon = 1e-10);
}
#[test]
fn test_l1_unit_norm_after_transform() {
let norm = Normalizer::<f64>::l1();
let x = array![[1.0, 2.0, 3.0], [-4.0, 5.0, 6.0]];
let out = norm.transform(&x).unwrap();
for row in out.rows() {
let row_norm: f64 = row.iter().map(|v| v.abs()).sum();
assert_abs_diff_eq!(row_norm, 1.0, epsilon = 1e-10);
}
}
#[test]
fn test_max_norm_basic() {
let norm = Normalizer::<f64>::max();
// Row [-5, 3, 1] has max norm 5.
let x = array![[-5.0, 3.0, 1.0]];
let out = norm.transform(&x).unwrap();
assert_abs_diff_eq!(out[[0, 0]], -1.0, epsilon = 1e-10);
assert_abs_diff_eq!(out[[0, 1]], 0.6, epsilon = 1e-10);
assert_abs_diff_eq!(out[[0, 2]], 0.2, epsilon = 1e-10);
}
#[test]
fn test_zero_row_unchanged() {
let norm = Normalizer::<f64>::l2();
let x = array![[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]];
let out = norm.transform(&x).unwrap();
// Zero row stays zero
assert_abs_diff_eq!(out[[0, 0]], 0.0, epsilon = 1e-15);
assert_abs_diff_eq!(out[[0, 1]], 0.0, epsilon = 1e-15);
assert_abs_diff_eq!(out[[0, 2]], 0.0, epsilon = 1e-15);
}
#[test]
fn test_negative_values_l2() {
let norm = Normalizer::<f64>::l2();
let x = array![[-3.0, -4.0]];
let out = norm.transform(&x).unwrap();
assert_abs_diff_eq!(out[[0, 0]], -0.6, epsilon = 1e-10);
assert_abs_diff_eq!(out[[0, 1]], -0.8, epsilon = 1e-10);
}
#[test]
fn test_default_is_l2() {
let norm = Normalizer::<f64>::default();
assert_eq!(norm.norm(), NormType::L2);
}
#[test]
fn test_multiple_rows_independent() {
let norm = Normalizer::<f64>::l2();
let x = array![[3.0, 4.0], [0.0, 5.0]];
let out = norm.transform(&x).unwrap();
// Row 0: L2 norm = 5
assert_abs_diff_eq!(out[[0, 0]], 0.6, epsilon = 1e-10);
assert_abs_diff_eq!(out[[0, 1]], 0.8, epsilon = 1e-10);
// Row 1: L2 norm = 5
assert_abs_diff_eq!(out[[1, 0]], 0.0, epsilon = 1e-10);
assert_abs_diff_eq!(out[[1, 1]], 1.0, epsilon = 1e-10);
}
#[test]
fn test_pipeline_integration() {
use ferrolearn_core::pipeline::PipelineTransformer;
let norm = Normalizer::<f64>::l2();
let x = array![[3.0, 4.0], [0.0, 2.0]];
let y = Array1::zeros(2);
let fitted = norm.fit_pipeline(&x, &y).unwrap();
let result = fitted.transform_pipeline(&x).unwrap();
assert_abs_diff_eq!(result[[0, 0]], 0.6, epsilon = 1e-10);
assert_abs_diff_eq!(result[[0, 1]], 0.8, epsilon = 1e-10);
}
#[test]
fn test_f32_normalizer() {
let norm = Normalizer::<f32>::l2();
let x: Array2<f32> = array![[3.0f32, 4.0]];
let out = norm.transform(&x).unwrap();
assert!((out[[0, 0]] - 0.6f32).abs() < 1e-6);
assert!((out[[0, 1]] - 0.8f32).abs() < 1e-6);
}
// -----------------------------------------------------------------------
// REQ-4 — standalone `normalize` / `normalize_with_norms` free functions.
// Oracle: live sklearn 1.5.2 (R-CHAR-3), X = [[1,2,2],[0,3,4]].
// normalize(X, l2, axis=1) -> [[.33333333,.66666667,.66666667],[0,.6,.8]]
// normalize(X, l1, axis=1) -> [[.2,.4,.4],[0,.42857143,.57142857]]
// normalize(X, max,axis=1) -> [[.5,1,1],[0,.75,1]]
// normalize(X, l2, axis=0) -> [[1,.5547002,.4472136],[0,.83205029,.89442719]]
// return_norm l2 axis=1 norms -> [3,5]; l1 axis=1 norms -> [5,7]
// -----------------------------------------------------------------------
#[test]
fn normalize_l2_axis1_matches_sklearn() -> Result<(), FerroError> {
let x = array![[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]];
let out = normalize(&x, NormType::L2, 1)?;
let expected = array![[0.33333333, 0.66666667, 0.66666667], [0.0, 0.6, 0.8]];
for (a, b) in out.iter().zip(expected.iter()) {
assert_abs_diff_eq!(a, b, epsilon = 1e-7);
}
Ok(())
}
#[test]
fn normalize_l1_axis1_matches_sklearn() -> Result<(), FerroError> {
let x = array![[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]];
let out = normalize(&x, NormType::L1, 1)?;
let expected = array![[0.2, 0.4, 0.4], [0.0, 0.42857143, 0.57142857]];
for (a, b) in out.iter().zip(expected.iter()) {
assert_abs_diff_eq!(a, b, epsilon = 1e-7);
}
Ok(())
}
#[test]
fn normalize_max_axis1_matches_sklearn() -> Result<(), FerroError> {
let x = array![[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]];
let out = normalize(&x, NormType::Max, 1)?;
let expected = array![[0.5, 1.0, 1.0], [0.0, 0.75, 1.0]];
for (a, b) in out.iter().zip(expected.iter()) {
assert_abs_diff_eq!(a, b, epsilon = 1e-7);
}
Ok(())
}
#[test]
fn normalize_l2_axis0_matches_sklearn() -> Result<(), FerroError> {
let x = array![[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]];
let out = normalize(&x, NormType::L2, 0)?;
let expected = array![[1.0, 0.5547002, 0.4472136], [0.0, 0.83205029, 0.89442719]];
for (a, b) in out.iter().zip(expected.iter()) {
assert_abs_diff_eq!(a, b, epsilon = 1e-7);
}
Ok(())
}
#[test]
fn normalize_return_norm_l2_and_l1() -> Result<(), FerroError> {
let x = array![[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]];
let (_out_l2, norms_l2) = normalize_with_norms(&x, NormType::L2, 1)?;
assert_abs_diff_eq!(norms_l2[0], 3.0, epsilon = 1e-9);
assert_abs_diff_eq!(norms_l2[1], 5.0, epsilon = 1e-9);
let (_out_l1, norms_l1) = normalize_with_norms(&x, NormType::L1, 1)?;
assert_abs_diff_eq!(norms_l1[0], 5.0, epsilon = 1e-9);
assert_abs_diff_eq!(norms_l1[1], 7.0, epsilon = 1e-9);
Ok(())
}
#[test]
fn normalize_invalid_axis_errors() {
let x = array![[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]];
let err = normalize(&x, NormType::L2, 2);
assert!(matches!(err, Err(FerroError::InvalidParameter { .. })));
}
}