Normalizer: scale each sample (row) to unit norm.
Unlike column-wise scalers, the Normalizer operates row-wise: each
sample is scaled independently so that its chosen norm equals 1.
Supported norms:
- L1: divide by the sum of absolute values
- L2: divide by the Euclidean norm (default)
- Max: divide by the maximum absolute value
Samples that already have a zero norm are left unchanged.
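The three norms and the zero-norm rule can be illustrated with a minimal, standard-library-only sketch. The names `NormSketch`, `row_norm_sketch`, and `normalize_rows_sketch` are hypothetical stand-ins invented for this example, not the crate's `NormType` or `Transform` API, and rows are plain `Vec<f64>` rather than `ndarray` arrays. The sketch treats only an exactly zero norm as "zero".

```rust
/// Hypothetical stand-in for a norm selector (NOT the crate's `NormType`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum NormSketch {
    L1,
    L2,
    Max,
}

/// Compute the chosen norm of a single row.
pub fn row_norm_sketch(row: &[f64], norm: NormSketch) -> f64 {
    match norm {
        // Sum of absolute values.
        NormSketch::L1 => row.iter().map(|v| v.abs()).sum::<f64>(),
        // Euclidean norm: square root of the sum of squares.
        NormSketch::L2 => row.iter().map(|v| v * v).sum::<f64>().sqrt(),
        // Largest absolute value.
        NormSketch::Max => row.iter().fold(0.0_f64, |m, v| m.max(v.abs())),
    }
}

/// Scale every row independently so that its chosen norm equals 1.
/// A row whose norm is exactly zero is divided by 1, i.e. left unchanged.
pub fn normalize_rows_sketch(rows: &[Vec<f64>], norm: NormSketch) -> Vec<Vec<f64>> {
    rows.iter()
        .map(|row| {
            let n = row_norm_sketch(row, norm);
            let divisor = if n == 0.0 { 1.0 } else { n };
            row.iter().map(|v| v / divisor).collect()
        })
        .collect()
}
```

For example, under these assumptions the row `[3.0, 4.0]` has L2 norm 5 and scales to approximately `[0.6, 0.8]`, while an all-zero row comes back unchanged.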
This transformer is stateless: no fitting is required, so you can call
Transform::transform directly. For scikit-learn API parity it also
supports the stateful Fit → FittedNormalizer path, which records
n_features_in_ and (like sklearn) validates the input in fit. The fitted
type’s transform reuses the same row-norm logic as the stateless path, so
both paths produce bit-identical output.
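The relationship between the two paths can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyNormalizer`, `ToyFitted`, and `l2_normalize_rows` are hypothetical names, not the crate's `Fit`/`Transform` traits or `FittedNormalizer`. The sketch handles the L2 norm only, uses `Vec<Vec<f64>>` instead of `Array2`, assumes rectangular input, and omits the input validation that the real `fit` performs.

```rust
/// Shared row-norm logic (L2 only, for brevity) used by BOTH paths.
fn l2_normalize_rows(x: &[Vec<f64>]) -> Vec<Vec<f64>> {
    x.iter()
        .map(|row| {
            let n = row.iter().map(|v| v * v).sum::<f64>().sqrt();
            let divisor = if n == 0.0 { 1.0 } else { n };
            row.iter().map(|v| v / divisor).collect()
        })
        .collect()
}

/// Hypothetical stateless normalizer.
pub struct ToyNormalizer;

/// Hypothetical fitted form: it stores no statistics, only the column count.
pub struct ToyFitted {
    n_features_in: usize,
}

impl ToyNormalizer {
    /// Stateless path: transform without fitting.
    pub fn transform(&self, x: &[Vec<f64>]) -> Vec<Vec<f64>> {
        l2_normalize_rows(x)
    }

    /// Stateful path: record the number of columns seen at fit time.
    pub fn fit(&self, x: &[Vec<f64>]) -> ToyFitted {
        ToyFitted {
            n_features_in: x.first().map_or(0, |row| row.len()),
        }
    }
}

impl ToyFitted {
    pub fn n_features_in(&self) -> usize {
        self.n_features_in
    }

    /// Fitted path: check the column count, then reuse the same helper.
    pub fn transform(&self, x: &[Vec<f64>]) -> Result<Vec<Vec<f64>>, String> {
        if x.iter().any(|row| row.len() != self.n_features_in) {
            return Err(format!("expected {} features", self.n_features_in));
        }
        Ok(l2_normalize_rows(x))
    }
}
```

Because both `transform` methods delegate to one helper, their outputs cannot drift apart; the fitted path only adds the column-count check.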
§REQ status
Binary (R-DEFER-2), translating sklearn/preprocessing/_data.py (class Normalizer
:1980, normalize :1866). Design doc: .design/preprocess/normalizer.md. Expected
values from the live sklearn 1.5.2 oracle (R-CHAR-3). Consumers: the in-file
PipelineTransformer/FittedPipelineTransformer impls (pipeline integration) + crate
re-export (lib.rs:119, grandfathered S5). PyO3 binding: see REQ-8.
| REQ | Status | Evidence |
|---|---|---|
| REQ-1 (row-wise L1/L2/Max transform) | SHIPPED | Transform::transform divides each row by its norm (L1=Σ\|v\|, L2=√Σv², Max=max\|v\|; zero-norm row unchanged), default L2; mirrors sklearn dense normalize (_data.py:1962-1969, _handle_zeros_in_scale :1968). Critic-verified bit-identical to live oracle: guard_l1/l2/max/zero_row/f32_matches_oracle in tests/divergence_normalizer.rs. Consumers: FittedPipelineTransformer::transform_pipeline + crate re-export lib.rs:119. |
| REQ-2 (transform input validation per check_array) | SHIPPED | FIXED #1140. transform guards (sklearn order) zero-samples → InsufficientSamples (validation.py:1084), zero-features → InvalidParameter (:1093), non-finite NaN/±inf → InvalidParameter (:1063) — matching Normalizer.transform → normalize → check_array (_data.py:1933-1940). Mirrors converged binarizer.rs. Critic two-round CLEAN: 6 rejection pins + finite-not-over-rejected guards (zero-NORM-row/1e308/subnormal/-0.0); pipeline consumer inherits validation. |
| REQ-3 (validating fit + parameter constraints) | SHIPPED | FIXED #1141. impl Fit<Array2<F>, ()> for Normalizer (fit): runs the SAME validate_normalize_input guard as Transform::transform/normalize (REQ-2: zero-samples → InsufficientSamples, zero-features/non-finite NaN±inf → InvalidParameter, sklearn _validate_data default force_all_finite=True REJECTS NaN/inf — confirmed Normalizer().fit([[nan]])/[[inf]] raise ValueError, :2082,utils/validation.py:1063/1084/1093), records n_features_in_ = x.ncols(), returns FittedNormalizer { norm, copy, n_features_in_ } (no fitted statistics — Normalizer is stateless, sklearn fit “Only validates”, :2062-2083). sklearn’s _parameter_constraints {norm:[StrOptions{l1,l2,max}]} (:2053-2055) has NO ferrolearn analog: NormType is a closed Rust enum, so an out-of-domain norm is UNREPRESENTABLE rather than runtime-rejected — the type system satisfies the param-domain check. Live-oracle tests: fit_l1/l2/max_matches_oracle_and_stateless, fit_rejects_nan/pos_inf/neg_inf, fit_zero_row_unchanged, fitted_transform_shape_mismatch, fit_path_equals_stateless_path in tests/divergence_normalizer.rs. Consumers: FittedNormalizer::transform (the fitted path) + crate re-export lib.rs:140. |
| REQ-4 (normalize free fn: axis / return_norm) | SHIPPED | FIXED #1142. pub fn normalize + pub fn normalize_with_norms (free fns) mirror sklearn normalize(X, norm, *, axis=1, copy=True, return_norm=False) (_data.py:1866). Shared row_norm helper computes L1=Σ\|v\|, L2=√Σv², Max=max\|v\| (:1962-1967); _handle_zeros_in_scale zero→1 (:1968); X /= norms (:1969). axis=1 row-normalizes; axis=0 column-normalizes (sklearn transpose :1926-1942,:1971-1972); axis ∉ {0,1} → InvalidParameter. normalize_with_norms returns (normalized, raw_norms) (return_norm :1974-1975; raw, NOT zero-handled). Same validation as Transform::transform (REQ-2). Oracle-grounded tests in #[cfg(test)]: normalize_l2/l1/max_axis1_matches_sklearn, normalize_l2_axis0_matches_sklearn, normalize_return_norm_l2_and_l1, normalize_invalid_axis_errors. |
| REQ-5 (copy parameter) | SHIPPED | FIXED #1143. Normalizer<F> gains a copy: bool field (default true) + #[must_use] with_copy builder + copy() getter, threaded onto FittedNormalizer, mirroring sklearn __init__(norm='l2', *, copy=True) (_data.py:2058-2060, _parameter_constraints {copy:["boolean"]} :2055). ACCEPT-AND-DOCUMENT no-op: ferrolearn’s Transform always returns a freshly allocated array (to_owned()), so copy has no observable effect — copy=True/copy=False produce identical output (sklearn’s copy=False does in-place row normalization, an optimization Rust’s ownership makes moot here). Live-oracle test fit_copy_true_false_identical. Consumers: FittedNormalizer carries the flag + crate re-export lib.rs:140. |
| REQ-6 (n_features_in_ / feature names) | PARTIAL | n_features_in_ SHIPPED, get_feature_names_out NOT-STARTED. FittedNormalizer<F> records n_features_in_ = x.ncols() in fit and exposes pub fn n_features_in(&self) -> usize, mirroring sklearn’s _validate_data setting n_features_in_ (:2082); FittedNormalizer::transform validates the input column count against it (ShapeMismatch, sklearn _validate_data(reset=False) :2104). The OneToOneFeatureMixin.get_feature_names_out / feature_names_in_ string-name plumbing is OUT OF SCOPE for this build (no string feature-name infrastructure in ferrolearn yet) — open prereq blocker #1144 for the feature-name half. Live-oracle test fit_n_features_in_matches_ncols. |
| REQ-7 (sparse support) | NOT-STARTED | open prereq blocker #1145. Dense-only; no CSR inplace_csr_row_normalize_l1/l2 / min_max_axis Max (:1944-1960). |
| REQ-8 (PyO3 binding) | SHIPPED | FIXED #1146. ferrolearn-python surfaces Normalizer as ferrolearn.Normalizer: the hand-written _RsNormalizer #[pyclass] (ferrolearn-python/src/extras.rs, registered lib.rs) maps sklearn’s norm STRING (‘l1’/‘l2’/‘max’) to the closed Rust NormType enum via RsNormalizer::resolve_norm (a bad string → PyValueError: sklearn _parameter_constraints {norm: StrOptions({"l1","l2","max"})}, _data.py:2055, InvalidParameterError ⊂ ValueError), builds Normalizer::<f64>::new(normtype).with_copy(copy), runs the validating Fit (NaN/±inf → PyValueError, REQ-3) and delegates transform to FittedNormalizer. The non-test production consumer is _extras.py::Normalizer(_TransformerWrapper) with sklearn’s __init__(self, norm="l2", *, copy=True) ABI (norm positional-or-keyword, copy keyword-only, _data.py:2058) + an overridden STATELESS transform (build-on-demand without fit, _more_tags stateless=True _data.py:2110, #2213) doing a FLOAT-ONLY dtype cast-back (float32→float32, float64→float64, int64→float64 UPCAST per check_array(dtype=FLOAT_DTYPES) _data.py:2104, #2214-analog; DIFFERS from Binarizer’s number-preserving cast); re-exported in __init__.py. Verified vs the live sklearn 1.5.2 oracle: tests/divergence_normalizer.py (l1/l2/max values, default-l2, positional-norm, stateless, dtype, NaN/±inf, zero-norm, bad-norm, clone/get_params/set_params, copy no-op, pipeline). Reduced-precision caveat (#2215, tracked): sklearn normalize casts X to the INPUT float precision via check_array(dtype=FLOAT_DTYPES) (_data.py:1933) and computes the norm + division IN that precision (float16/float32), but the f64-only binding ABI (shared by EVERY _Rs* transformer) computes the norm in float64 then casts the result back, so float32 (~6e-8) and float16 (~5e-4) VALUES diverge slightly (dtype LABELS match; the float64 path is bit-exact, <1e-12). Same class as the generic-F precision caveats #2205/#2206; float16 is fundamentally unmatchable (the Rust core has no f16). Pinned #[skip] in tests/divergence_normalizer_reduced_precision.py. |
| REQ-9 (ferray substrate) | NOT-STARTED | open prereq blocker #1147. ndarray::Array2 + num_traits::Float, not ferray-core/ferray-ufunc (R-SUBSTRATE-1/2). |
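The guard order listed in the REQ-2 row (zero samples, then zero features, then non-finite values) can be sketched as follows. This is a hedged illustration only: `SketchError` and `validate_input_sketch` are hypothetical names, not the crate's error type or its `validate_normalize_input` helper, and the matrix is passed as a shape plus a flat row-major slice rather than an `Array2`.

```rust
/// Hypothetical error type mirroring the two variants named in the table.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    InsufficientSamples,
    InvalidParameter(&'static str),
}

/// Validate an `n_rows x n_cols` matrix stored row-major in `data`.
/// The caller is assumed to pass `data.len() == n_rows * n_cols`.
pub fn validate_input_sketch(
    n_rows: usize,
    n_cols: usize,
    data: &[f64],
) -> Result<(), SketchError> {
    // 1. Zero samples is reported first, even if there are also zero features.
    if n_rows == 0 {
        return Err(SketchError::InsufficientSamples);
    }
    // 2. Zero features.
    if n_cols == 0 {
        return Err(SketchError::InvalidParameter("zero features"));
    }
    // 3. NaN and +/-inf are rejected; every other value passes, including
    //    zeros, negative zero, very large values, and subnormals.
    if data.iter().any(|v| !v.is_finite()) {
        return Err(SketchError::InvalidParameter("non-finite value"));
    }
    Ok(())
}
```

Note that a zero-norm row is still valid input under this check: only NaN and infinities are rejected, which matches the "finite-not-over-rejected" guards the table mentions.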
Structs§
- FittedNormalizer: A fitted Normalizer.
- Normalizer: A stateless row-wise normalizer.
Enums§
- NormType: The norm used by Normalizer when scaling each sample.
Functions§
- normalize: Scale input vectors individually to unit norm. This is the standalone, estimator-less API mirroring scikit-learn’s normalize free function (sklearn/preprocessing/_data.py:1866).
- normalize_with_norms: Like normalize but also returns the per-axis norm vector. This is the return_norm=True form of scikit-learn’s normalize (sklearn/preprocessing/_data.py:1971-1975).
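The axis and returned-norm behaviour described for these free functions can be sketched in plain Rust. This is an assumption-laden illustration, not the crate's signatures: `normalize_l2_with_norms_sketch` is a hypothetical name, it covers the L2 norm only, takes rectangular `Vec<Vec<f64>>` input, skips validation, and reports errors as `String`. As this page describes for normalize_with_norms, the returned norms are the raw values, so a zero norm is reported as 0 even though the division uses 1.

```rust
/// Normalize along `axis` (1 = rows, 0 = columns) with the L2 norm and
/// return both the scaled matrix and the raw per-axis norms.
pub fn normalize_l2_with_norms_sketch(
    x: &[Vec<f64>],
    axis: usize,
) -> Result<(Vec<Vec<f64>>, Vec<f64>), String> {
    let n_cols = x.first().map_or(0, |row| row.len());
    let norms: Vec<f64> = match axis {
        // One norm per row.
        1 => x
            .iter()
            .map(|row| row.iter().map(|v| v * v).sum::<f64>().sqrt())
            .collect(),
        // One norm per column.
        0 => (0..n_cols)
            .map(|j| x.iter().map(|row| row[j] * row[j]).sum::<f64>().sqrt())
            .collect(),
        _ => return Err(format!("axis must be 0 or 1, got {axis}")),
    };
    // Zero norms divide by 1 so the affected row or column is unchanged.
    let divisor = |n: f64| if n == 0.0 { 1.0 } else { n };
    let out: Vec<Vec<f64>> = x
        .iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, v)| {
                    let n = if axis == 1 { norms[i] } else { norms[j] };
                    v / divisor(n)
                })
                .collect()
        })
        .collect();
    Ok((out, norms))
}
```

With `axis = 1` the norm vector has one entry per row; with `axis = 0` it has one entry per column, and any other axis value yields an error.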