ferrolearn-preprocess 0.5.0

//! Standard scaler: zero-mean, unit-variance scaling.
//!
//! Each feature is transformed as `(x - mean) / std`. (Near-)constant columns —
//! detected by sklearn's variance-relative `_is_constant_feature`
//! (`_data.py:72-85`), NOT the `< 10*eps` default mask used by
//! Min/MaxAbs/Robust — use an effective scale of 1, so each fitted-constant
//! entry maps to `(x - mean) / 1 = 0`. NaN is tolerated (ignored in `fit`,
//! passed through `transform`/`inverse_transform`); +/-inf is rejected.
//!
//! # `## REQ status`
//!
//! Binary (R-DEFER-2), translating `sklearn/preprocessing/_data.py` (`class StandardScaler`
//! `:696`, `scale` `:133`). Design doc: `.design/preprocess/standard_scaler.md`. Expected values
//! from the live sklearn 1.5.2 oracle (R-CHAR-3). Consumers: PyO3 `_RsStandardScaler`
//! (`ferrolearn-python/src/transformers.rs:12`) + `PipelineTransformer` impl + crate re-export
//! (`lib.rs`, grandfathered S5). HONEST (R-HONEST-3): `with_mean`/`with_std` constructor params
//! gate conditional center/scale (`_data.py:1064-1067`); the per-column standardize VALUES match
//! sklearn on every flag configuration.
//!
//! | REQ | Status | Evidence |
//! |---|---|---|
//! | REQ-1 (per-column standardize value match, non-constant) | SHIPPED | `Fit::fit` learns per-column `mean`/`std=sqrt(population var ddof=0)`; `Transform::transform` = `(x-mean)/std`, mirroring sklearn `X -= mean_; X /= scale_` (`_data.py:1064-1067`). Critic-verified bit-identical to live oracle: `green_req1_value_match_non_constant` (`[[-1.2247..],[0],[1.2247..]]`), `green_req1_mean_and_scale_attributes` (mean()/std() == sklearn mean_/scale_ on non-constant), `green_req1_negative_decimal_fixture`. Consumers: PyO3 `_RsStandardScaler` + `FittedPipelineTransformer` + re-export. |
//! | REQ-2 (constant / (near-)zero-variance column → 0) | SHIPPED | FIXED #1191; constant detection UPGRADED to `_is_constant_feature` (#1194 build). The exact-zero `s == 0` mask is REPLACED by sklearn's variance-relative `_is_constant_feature(var, mean, n)` — a column is constant iff `var <= n*eps*var + (n*mean*eps)^2` (`_data.py:72-85`,`:1016-1018`), so a constant col gets stored `scale_ = 1` → `(x-mean)/1 = 0`, and now a NEAR-constant col (tiny var below the bound, e.g. `[1e8,1e8+1e-8]`) ALSO → `scale_=1` (was previously divided by a tiny std). `transform`/`inverse_transform` use the STORED `scale_` (no `s_eff` recompute), matching sklearn `_handle_zeros_in_scale(sqrt(var_), constant_mask=...)` (`:88`,`:1019-1021`). Live-oracle tests: `req7_near_constant_column_scale_one`, `req7_genuinely_constant_column_scale_one` (`tests/divergence_standard_scaler.rs`) + existing `divergence_constant_column_maps_to_zero`/`green_req2_*`. NON-constant normal-magnitude columns are UNCHANGED (regression-guarded by `green_req1_*`). |
//! | REQ-3 (inverse_transform round-trip) | SHIPPED | `inverse_transform` = `x*s_eff + mean`, mirroring sklearn `X *= scale_; X += mean_` (`_data.py:1106-1109`); `inverse_transform(transform(X))==X` (`green_req3_inverse_roundtrip`). Consumer: PyO3 `_RsStandardScaler::inverse_transform`. |
//! | REQ-4 (PyO3 binding) | SHIPPED | `_RsStandardScaler` (`transformers.rs:12`, registered `lib.rs:22`) marshals `fit`/`transform`/`inverse_transform`/`mean_` over `FittedStandardScaler<f64>` — a real CPython consumer. |
//! | REQ-5 (var_/scale_/n_samples_seen_ attrs) | SHIPPED | FIXED #1192; constant mask UPGRADED to `_is_constant_feature` (#1194 build). `FittedStandardScaler<F>` stores `var_` (population variance ddof=0 via the corrected Chan-Golub-LeVeque 2-pass reduction over FINITE values — `nuv = sum((x-mean)^2) - (sum(x-mean))^2/count`, `var = nuv/count`, matching sklearn `_incremental_mean_and_var` `extmath.py:1142-1178`), `scale_` (= `_handle_zeros_in_scale(sqrt(var_), constant_mask)`: `std` on non-constant cols, `1.0` where `_is_constant_feature` flags the col — VARIANCE-RELATIVE, not exact-zero), and `n_samples_seen_` (= n_rows), set in `Fit::fit`; getters `var()`/`scale()`/`n_samples_seen()` mirror sklearn `var_`/`scale_`/`n_samples_seen_` (`_data.py:1013-1023`). Live oracle `X=[[1,5],[2,5],[3,5]]` (`var_=[0.6666666666666666,0.0]`, `scale_=[0.816496580927726,1.0]`, `n_samples_seen_=3`): `standard_scaler_var_scale_nsamples_match_sklearn`, `standard_scaler_scale_differs_from_std_on_constant_col`, `standard_scaler_var_equals_std_squared_nonconstant`. Additive getters; `transform`/`inverse_transform` use stored `scale_`. |
//! | REQ-6 (with_mean/with_std/copy params) | SHIPPED | FIXED #1193. `StandardScaler<F>` now carries `with_mean`/`with_std`/`copy` (default `true`, `new()`/`Default`), mirroring sklearn `__init__(*, copy=True, with_mean=True, with_std=True)` + `_parameter_constraints` (`_data.py:829-838`), with `#[must_use]` builders `with_with_mean`/`with_with_std`/`with_copy`. `with_mean`/`with_std` thread into `FittedStandardScaler<F>` (set in `Fit::fit`); `transform` is conditional `if with_mean: X -= mean_; if with_std: X /= scale_` and `inverse_transform` mirrors `if with_std: X *= scale_; if with_mean: X += mean_` (`_data.py:1064-1067`,`:1106-1109`). `copy` is ABI-only (fit/transform operate on owned copies). Critic-grounded vs live oracle `X=[[1,10],[2,20],[3,30]]`: default (T,T) → `[[-1.2247..],[0],[1.2247..]]` (regression-guarded bit-identical to pre-existing default fit), with_std=false (T,F) → center-only `[[-1,-10],[0,0],[1,10]]`, with_mean=false (F,T) → scale-only `[[1.2247..],[2.4495..],[3.6742..]]`, both-false (F,F) → identity, and inverse round-trip for all 4 configs. R-DEV-4 DEVIATION: sklearn nulls `mean_`/`scale_`/`var_` when a flag is `False`; ferrolearn always materializes them (getters return `&Array1`, not `Option`) — the flags govern transform APPLICATION so OUTPUT matches sklearn exactly. |
//! | REQ-7 (NaN tolerance: allow-nan) | SHIPPED | FIXED #1194. `Fit::fit`'s per-column mean + population variance now compute over the FINITE values only (skip `is_nan()`), with the divisor = COUNT OF FINITE values (`new_sample_count = n_samples - nan_count`, `extmath.py:1132-1133`), mirroring sklearn `force_all_finite="allow-nan"` (`_data.py:918`) + `_incremental_mean_and_var` ("NaNs are ignored", `extmath.py:1100`). An ALL-NaN column → `mean`/`var_`/`scale_` = NaN (no panic, no division by zero). NaN inputs pass through `transform`/`inverse_transform` (`(nan-mean)/scale = nan`). inf-rejection (allow-nan REJECTS +/-inf, MinMaxScaler #2200 precedent): `fit`/`transform`/`inverse_transform` return `InvalidParameter` ("Input X contains infinity...") on any `is_infinite()` element. Live-oracle tests `req7_nan_fit_single_column_ignored` (mean_=2,var_=1,scale_=1,transform `[[-1],[nan],[1]]`), `req7_nan_fit_multi_column_scattered`, `req7_all_nan_column_yields_nan_no_panic`, `req7_nan_passthrough_inverse_transform`, `inf_rejected_fit`/`inf_rejected_transform`/`inf_rejected_inverse_transform`, `nan_only_still_fits` (`tests/divergence_standard_scaler.rs`). Consumers: PyO3 `_RsStandardScaler` + `FittedPipelineTransformer` + re-export. **f32 float64-accumulator parity (FIXED #2205):** sklearn ALWAYS computes mean/variance in float64 accumulators even for float32 input (`_data.py:81-82`,`:999`; `_incremental_mean_and_var` via `_safe_accumulator_op`, `extmath.py:1050-1051`), returning float64 `mean_`/`var_`/`scale_`. `Fit::fit` now accumulates the finite-value sum + the squared-deviation sum in `f64` (via `to_f64()`) and STORES `mean`/`std`/`var_`/`scale_` as `Array1<f64>` (the getters `mean()`/`std()`/`var()`/`scale()` return `&Array1<f64>` regardless of `F`, mirroring sklearn's float64 attribute dtype); `transform`/`inverse_transform` compute `(x_f64 ∓ mean_)/scale_` in f64 and downcast the per-element result to `F` (numpy applies the float64 stats to the f32 array then stores the f32 result). For F=f64 every cast is exact → bit-identical to the prior path (regression-guarded by the f64 oracle tests). For F=f32 a large-magnitude column near 2^24 now matches sklearn (the f32 sum no longer rounds): live-oracle `X=[[16777216],[16777216],[16777220]]` f32 → `mean_=16777217.333…`, transform `[-0.7071068,-0.7071068,1.4142137]`. Test `divergence_f32_uses_float64_accumulators` (un-`#[ignore]`-ed). Same class as the pairwise euclidean f32 fix #2298. |
//! | REQ-8 (scale free fn + axis) | NOT-STARTED | open prereq blocker #1195. No `scale` fn / axis=1 (`_data.py:133`). |
//! | REQ-9 (partial_fit / streaming) | NOT-STARTED | open prereq blocker #1196. Single-shot (`_data.py:880-1025`). |
//! | REQ-10 (sample_weight) | NOT-STARTED | open prereq blocker #1197. Unweighted (`_data.py:923-924`). |
//! | REQ-11 (sparse CSR/CSC) | NOT-STARTED | open prereq blocker #1198. Dense-only (`_data.py:940-983`). |
//! | REQ-12 (get_feature_names_out / n_features_in_) | NOT-STARTED | open prereq blocker #1199. None (OneToOneFeatureMixin). |
//! | REQ-13 (ferray substrate) | NOT-STARTED | open prereq blocker #1200. `ndarray`+`num_traits`, not `ferray-core` (R-SUBSTRATE-1/2). |

use ferrolearn_core::error::FerroError;
use ferrolearn_core::pipeline::{FittedPipelineTransformer, PipelineTransformer};
use ferrolearn_core::traits::{Fit, FitTransform, Transform};
use ndarray::{Array1, Array2};
use num_traits::Float;

// ---------------------------------------------------------------------------
// StandardScaler (unfitted)
// ---------------------------------------------------------------------------

/// An unfitted standard scaler.
///
/// Calling [`Fit::fit`] learns the per-column means and standard deviations
/// and returns a [`FittedStandardScaler`] that can transform new data.
///
/// Constant (zero-variance) columns use an effective scale of 1, so each entry
/// maps to `(x - mean) / 1 = 0` after transformation.
///
/// # Examples
///
/// ```
/// use ferrolearn_preprocess::StandardScaler;
/// use ferrolearn_core::traits::{Fit, Transform};
/// use ndarray::array;
///
/// let scaler = StandardScaler::<f64>::new();
/// let x = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
/// let fitted = scaler.fit(&x, &()).unwrap();
/// let scaled = fitted.transform(&x).unwrap();
/// // Mean of each column is now ~0, std ~1
/// ```
#[derive(Debug, Clone)]
pub struct StandardScaler<F> {
    /// Whether to center the data (subtract the per-column mean) on transform.
    ///
    /// Mirrors sklearn `StandardScaler(with_mean=True)` (`_data.py:835-838`):
    /// when `false`, [`Transform::transform`] does NOT subtract the mean
    /// (`if with_mean: X -= mean_`, `_data.py:1064-1065`). Default `true`.
    pub with_mean: bool,
    /// Whether to scale the data (divide by the per-column scale) on transform.
    ///
    /// Mirrors sklearn `StandardScaler(with_std=True)` (`_data.py:835-838`):
    /// when `false`, [`Transform::transform`] does NOT divide by the scale
    /// (`if with_std: X /= scale_`, `_data.py:1066-1067`). Default `true`.
    pub with_std: bool,
    /// Whether `fit`/`transform` should copy the input rather than mutate it.
    ///
    /// Mirrors sklearn `StandardScaler(copy=True)` (`_data.py:835-838`). In
    /// ferrolearn this is ABI-only: `fit`/`transform` never mutate the caller's
    /// array (they operate on owned copies), so this flag is stored for API
    /// parity but does not change observable behavior. Default `true`.
    pub copy: bool,
    _marker: std::marker::PhantomData<F>,
}

impl<F: Float + Send + Sync + 'static> StandardScaler<F> {
    /// Create a new `StandardScaler` with default configuration.
    ///
    /// Defaults mirror sklearn `StandardScaler(*, copy=True, with_mean=True,
    /// with_std=True)` (`_data.py:835-838`).
    #[must_use]
    pub fn new() -> Self {
        Self {
            with_mean: true,
            with_std: true,
            copy: true,
            _marker: std::marker::PhantomData,
        }
    }

    /// Set whether to center (subtract the mean) on transform.
    ///
    /// Mirrors sklearn's `with_mean` constructor parameter (`_data.py:835-838`):
    /// `false` disables centering (`if with_mean: X -= mean_`,
    /// `_data.py:1064-1065`).
    #[must_use]
    pub fn with_with_mean(mut self, with_mean: bool) -> Self {
        self.with_mean = with_mean;
        self
    }

    /// Set whether to scale (divide by the scale) on transform.
    ///
    /// Mirrors sklearn's `with_std` constructor parameter (`_data.py:835-838`):
    /// `false` disables scaling (`if with_std: X /= scale_`,
    /// `_data.py:1066-1067`).
    #[must_use]
    pub fn with_with_std(mut self, with_std: bool) -> Self {
        self.with_std = with_std;
        self
    }

    /// Set the `copy` flag (ABI-only in ferrolearn).
    ///
    /// Mirrors sklearn's `copy` constructor parameter (`_data.py:835-838`).
    /// ferrolearn always operates on owned copies, so this does not change
    /// observable behavior.
    #[must_use]
    pub fn with_copy(mut self, copy: bool) -> Self {
        self.copy = copy;
        self
    }
}

impl<F: Float + Send + Sync + 'static> Default for StandardScaler<F> {
    fn default() -> Self {
        Self::new()
    }
}

// ---------------------------------------------------------------------------
// FittedStandardScaler
// ---------------------------------------------------------------------------

/// A fitted standard scaler holding per-column means and standard deviations.
///
/// Created by calling [`Fit::fit`] on a [`StandardScaler`].
///
/// # Deviation (R-DEV-4): fitted attributes are always materialized
///
/// sklearn sets `mean_` / `scale_` / `var_` to `None` when the corresponding
/// flag (`with_mean` / `with_std`) is `False` (`_data.py:986-991`,`:1019-1021`).
/// ferrolearn always materializes `mean`/`var_`/`scale_`/`n_samples_seen_`
/// regardless of the flags, because the `&Array1<F>` getters cannot return
/// `None` without changing their return type. The `with_mean`/`with_std` flags
/// instead govern *application* in [`Transform::transform`] /
/// [`FittedStandardScaler::inverse_transform`] (`if with_mean: X -= mean_`,
/// `if with_std: X /= scale_`, `_data.py:1064-1067`), so the transform OUTPUT
/// matches sklearn exactly. The getters are intentionally NOT `Option`.
#[derive(Debug, Clone)]
pub struct FittedStandardScaler<F> {
    /// Per-column means learned during fitting.
    ///
    /// Stored as `f64` regardless of `F`, mirroring sklearn: `mean_` is dtype
    /// `float64` even for `float32` input (`_incremental_mean_and_var` upcasts
    /// the accumulators via `_safe_accumulator_op`, `extmath.py:1050-1051`;
    /// `_data.py:999`). The transform downcasts per-element to `F` on output
    /// (#2205).
    pub(crate) mean: Array1<f64>,
    /// Per-column standard deviations learned during fitting.
    ///
    /// `f64`-typed (= `sqrt(var_)`); see [`FittedStandardScaler::mean`] (#2205).
    pub(crate) std: Array1<f64>,
    /// Per-column population variance (ddof=0) learned during fitting.
    ///
    /// Mirrors sklearn `StandardScaler.var_` (`_data.py:1013-1023`), dtype
    /// `float64` even for `float32` input (#2205).
    pub(crate) var_: Array1<f64>,
    /// Per-column scale with zeros replaced by 1 (`_handle_zeros_in_scale`).
    ///
    /// Mirrors sklearn `StandardScaler.scale_` (`_data.py:1013-1023`), dtype
    /// `float64` even for `float32` input (#2205).
    pub(crate) scale_: Array1<f64>,
    /// Number of samples (rows) seen during fitting.
    ///
    /// Mirrors sklearn `StandardScaler.n_samples_seen_` (`_data.py:1013-1023`).
    pub(crate) n_samples_seen_: usize,
    /// Whether `transform`/`inverse_transform` center (subtract/add the mean).
    ///
    /// Copied from the unfitted [`StandardScaler::with_mean`] in [`Fit::fit`];
    /// governs `if with_mean: X -= mean_` (`_data.py:1064-1065`).
    pub(crate) with_mean: bool,
    /// Whether `transform`/`inverse_transform` scale (divide/multiply by scale).
    ///
    /// Copied from the unfitted [`StandardScaler::with_std`] in [`Fit::fit`];
    /// governs `if with_std: X /= scale_` (`_data.py:1066-1067`).
    pub(crate) with_std: bool,
    /// Carries the `F` type parameter (the fitted statistics are stored as
    /// `f64` to mirror sklearn's float64 attributes; `F` governs the
    /// transform input/output dtype).
    pub(crate) _marker: std::marker::PhantomData<F>,
}

impl<F: Float + Send + Sync + 'static> FittedStandardScaler<F> {
    /// Return the per-column means learned during fitting.
    ///
    /// Returns `f64` regardless of `F`, mirroring sklearn `mean_` (dtype
    /// `float64` even for `float32` input, #2205).
    #[must_use]
    pub fn mean(&self) -> &Array1<f64> {
        &self.mean
    }

    /// Return the per-column standard deviations learned during fitting.
    ///
    /// Returns `f64` regardless of `F` (= `sqrt(var_)`, #2205).
    #[must_use]
    pub fn std(&self) -> &Array1<f64> {
        &self.std
    }

    /// Return the per-column population variance (ddof=0) learned during fitting.
    ///
    /// Mirrors sklearn `StandardScaler.var_` (`_data.py:1013-1023`): the raw
    /// variance, which is `0.0` on a constant column. Returns `f64` regardless
    /// of `F` (dtype `float64` even for `float32` input, #2205).
    #[must_use]
    pub fn var(&self) -> &Array1<f64> {
        &self.var_
    }

    /// Return the per-column scale learned during fitting.
    ///
    /// Mirrors sklearn `StandardScaler.scale_` (`_data.py:1013-1023`,
    /// `_handle_zeros_in_scale`, `:88`): equals `std()` on non-constant columns
    /// and `1.0` on constant (zero-variance) columns. Returns `f64` regardless
    /// of `F` (dtype `float64` even for `float32` input, #2205).
    #[must_use]
    pub fn scale(&self) -> &Array1<f64> {
        &self.scale_
    }

    /// Return the number of samples (rows) seen during fitting.
    ///
    /// Mirrors sklearn `StandardScaler.n_samples_seen_` (`_data.py:1013-1023`).
    #[must_use]
    pub fn n_samples_seen(&self) -> usize {
        self.n_samples_seen_
    }

    /// Inverse-transform scaled data back to original space.
    ///
    /// Applies `x_orig = x_scaled * scale + mean` per column, where constant
    /// (zero-variance) columns use an effective scale of 1 (matching sklearn
    /// `_handle_zeros_in_scale`, `_data.py:88`, `:1106-1109`).
    ///
    /// # Errors
    ///
    /// Returns [`FerroError::ShapeMismatch`] if the number of columns does not
    /// match the number of features seen during fitting, or
    /// [`FerroError::InvalidParameter`] if any input element is +/-inf
    /// (sklearn `force_all_finite="allow-nan"` rejects infinity, `_data.py:1094`).
    pub fn inverse_transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
        let n_features = self.mean.len();
        if x.ncols() != n_features {
            return Err(FerroError::ShapeMismatch {
                expected: vec![x.nrows(), n_features],
                actual: vec![x.nrows(), x.ncols()],
                context: "FittedStandardScaler::inverse_transform".into(),
            });
        }
        // sklearn `inverse_transform` validates with `force_all_finite="allow-nan"`
        // (`_data.py:1094`): NaN passes through, +/-inf raises ValueError
        // (MinMaxScaler #2200/#2202 precedent).
        if x.iter().any(|v| v.is_infinite()) {
            return Err(FerroError::InvalidParameter {
                name: "X".into(),
                reason: "Input X contains infinity or a value too large for dtype.".into(),
            });
        }
        let mut out = x.to_owned();
        for (mut col, (&m, &s)) in out
            .columns_mut()
            .into_iter()
            .zip(self.mean.iter().zip(self.scale_.iter()))
        {
            // Conditional un-scale/un-center mirroring sklearn `inverse_transform`
            // (`_data.py:1106-1109`): `if with_std: X *= scale_` then
            // `if with_mean: X += mean_`. The stored `mean_`/`scale_` are f64
            // (sklearn float64 attributes, #2205); numpy applies them in float64
            // and downcasts to the input dtype, so the per-element result is
            // computed in f64 then cast to `F` (f64-input is exact → unchanged).
            // `scale_` has zero-variance columns replaced by 1
            // (`_handle_zeros_in_scale`, `_data.py:88`).
            //
            // Each in-place op (`X *= scale_`, `X += mean_`) is a SEPARATE numpy
            // store back into the `F`-typed array, so the per-element result of
            // EACH op rounds to `F`: two roundings in the REVERSE order of
            // `transform` (multiply-then-round, add-then-round). For F=f64
            // `F::from(f64)` is exact → identical to the fused expression (no
            // regression); for F=f32 each step rounds to f32, matching sklearn's
            // two float32 roundings (#2305). NaN propagates through both steps.
            for v in &mut col {
                if self.with_std {
                    // X *= scale_ (round to F)
                    *v = F::from(v.to_f64().unwrap_or(f64::NAN) * s).unwrap_or_else(F::nan);
                }
                if self.with_mean {
                    // X += mean_ (round to F)
                    *v = F::from(v.to_f64().unwrap_or(f64::NAN) + m).unwrap_or_else(F::nan);
                }
            }
        }
        Ok(out)
    }
}

// ---------------------------------------------------------------------------
// Trait implementations
// ---------------------------------------------------------------------------

impl<F: Float + Send + Sync + 'static> Fit<Array2<F>, ()> for StandardScaler<F> {
    type Fitted = FittedStandardScaler<F>;
    type Error = FerroError;

    /// Fit the scaler by computing per-column means and standard deviations.
    ///
    /// # Errors
    ///
    /// Returns [`FerroError::InsufficientSamples`] if the input has zero rows.
    fn fit(&self, x: &Array2<F>, _y: &()) -> Result<FittedStandardScaler<F>, FerroError> {
        let n_samples = x.nrows();
        if n_samples == 0 {
            return Err(FerroError::InsufficientSamples {
                required: 1,
                actual: 0,
                context: "StandardScaler::fit".into(),
            });
        }
        // sklearn validates X with `force_all_finite="allow-nan"`
        // (`_data.py:918`): NaN is permitted, but +/-inf raises ValueError
        // ("Input X contains infinity or a value too large for dtype('...')").
        // Mirrors the MinMaxScaler #2200 precedent (allow-nan rejects inf).
        if x.iter().any(|v| v.is_infinite()) {
            return Err(FerroError::InvalidParameter {
                name: "X".into(),
                reason: "Input X contains infinity or a value too large for dtype.".into(),
            });
        }

        let n_features = x.ncols();
        // The fitted statistics are stored as `f64` regardless of `F`, mirroring
        // sklearn's float64 `mean_`/`var_`/`scale_` attributes (#2205).
        let mut mean = Array1::<f64>::zeros(n_features);
        let mut std_arr = Array1::<f64>::zeros(n_features);
        let mut var_arr = Array1::<f64>::zeros(n_features);
        // Per-feature finite sample count (= n_samples - NaN count), reused for
        // the `_is_constant_feature` bound. sklearn's `n_samples_seen_` is the
        // finite count under `allow-nan` (`extmath.py:1132-1133`, `_data.py:996`).
        let mut finite_count = Array1::<usize>::zeros(n_features);

        for j in 0..n_features {
            let col = x.column(j);
            // NaN-ignoring mean + population variance (ddof=0) over the FINITE
            // values only, mirroring sklearn's `_incremental_mean_and_var` under
            // `force_all_finite="allow-nan"` (`_data.py:918`,`:999`;
            // `extmath.py:1100` "NaNs are ignored"). The divisor is the COUNT OF
            // FINITE values (`new_sample_count = n_samples - nan_count`,
            // `extmath.py:1132-1133`), NOT `n_samples`. An all-NaN column has
            // count 0 → mean/var = NaN (matching sklearn's `nan` there; no panic,
            // no division materialized).
            // The mean/variance reduction accumulates in FLOAT64 regardless of
            // `F`, mirroring sklearn's `_safe_accumulator_op`: numpy upcasts the
            // sum/variance to `dtype=np.float64` whenever the input float dtype
            // is < 8 bytes (`extmath.py:1050-1051`), so `_incremental_mean_and_var`
            // returns float64 `mean_`/`var_`/`scale_` even for float32 input
            // (`_data.py:999`,`:1016`). For F=f64 `to_f64()`/`F::from` are exact →
            // identity (no regression); for F=f32 a large-magnitude column near
            // 2^24 no longer f32-rounds the sum, matching sklearn (#2205, same
            // class as the pairwise euclidean f32 fix #2298).
            let mut count: usize = 0;
            let mut sum = 0.0f64;
            for v in col.iter().copied() {
                if v.is_nan() {
                    continue;
                }
                count += 1;
                sum += v.to_f64().unwrap_or(f64::NAN);
            }
            finite_count[j] = count;
            if count == 0 {
                // All-NaN column: mean/var/std = NaN (no division by zero).
                mean[j] = f64::NAN;
                var_arr[j] = f64::NAN;
                std_arr[j] = f64::NAN;
                continue;
            }
            let cnt = count as f64;
            let m = sum / cnt;
            // Corrected 2-pass variance (Chan/Golub/LeVeque), matching sklearn's
            // `_incremental_mean_and_var` first-batch reduction (`extmath.py:1142`
            // `T = new_sum/new_sample_count`, `:1155-1162`
            // `nuv = sum((x-T)^2) - (sum(x-T))^2/count`, `:1178`
            // `var = nuv/count`). The correction term is load-bearing for
            // near-constant columns (e.g. `[1e8, 1e8+1e-8]` → `5.55e-17`, where
            // the naive `sum((x-mean)^2)/count` would give `1.11e-16`); on
            // ordinary columns it is numerically negligible (byte-identical to
            // the prior `sum((x-mean)^2)/count` path on finite, normal-magnitude
            // data). Accumulated in f64 (see above).
            let mut sum_dev = 0.0f64;
            let mut sum_dev_sq = 0.0f64;
            for v in col.iter().copied() {
                if v.is_nan() {
                    continue;
                }
                let d = v.to_f64().unwrap_or(f64::NAN) - m;
                sum_dev += d;
                sum_dev_sq += d * d;
            }
            let variance = (sum_dev_sq - (sum_dev * sum_dev) / cnt) / cnt;
            // Store the f64 result directly (the fitted attributes are f64,
            // mirroring sklearn's float64 `mean_`/`var_`/`scale_`, #2205).
            mean[j] = m;
            var_arr[j] = variance;
            std_arr[j] = variance.sqrt();
        }

        // `scale_` mirrors sklearn `_handle_zeros_in_scale(sqrt(var_),
        // constant_mask=_is_constant_feature(var_, mean_, n_samples_seen_))`
        // (`_data.py:1016-1021`,`:88-120`). UNLIKE Min/MaxAbs/Robust (which use
        // the default `scale < 10*eps` mask, `:114`), StandardScaler passes an
        // explicit variance-relative constant mask computed on the RAW variances:
        // a column is constant iff `var <= n*eps*var + (n*mean*eps)^2`
        // (`_is_constant_feature`, `:72-85`). `eps` is ALWAYS `f64::EPSILON`
        // (sklearn: "variance is always computed using float64 accumulators",
        // `:81-82`) — even on the f32 path the bound is computed by upcasting to
        // f64, matching sklearn. A constant column gets `scale_ = 1.0` (NOT
        // sqrt(var)); a near-constant column (tiny var below the bound) is also
        // flagged and gets `scale_ = 1.0`. A NaN var (all-NaN column) is not
        // `<= bound` (NaN compares false) → constant_mask false → scale_ = NaN.
        let f64_eps = f64::EPSILON;
        let mut scale_ = std_arr.clone();
        for j in 0..n_features {
            // `var_`/`mean` are already f64 (sklearn's float64 accumulators,
            // #2205), so the constant-feature bound matches sklearn directly.
            let var_f64 = var_arr[j];
            let mean_f64 = mean[j];
            let n_f64 = finite_count[j] as f64;
            let upper_bound = n_f64 * f64_eps * var_f64 + (n_f64 * mean_f64 * f64_eps).powi(2);
            // `_is_constant_feature`: var <= upper_bound (`_data.py:85`).
            if var_f64 <= upper_bound {
                scale_[j] = 1.0;
            }
        }

        Ok(FittedStandardScaler {
            mean,
            std: std_arr,
            var_: var_arr,
            scale_,
            n_samples_seen_: n_samples,
            with_mean: self.with_mean,
            with_std: self.with_std,
            _marker: std::marker::PhantomData,
        })
    }
}

impl<F: Float + Send + Sync + 'static> Transform<Array2<F>> for FittedStandardScaler<F> {
    type Output = Array2<F>;
    type Error = FerroError;

    /// Transform data by subtracting the mean and dividing by the standard deviation.
    ///
    /// (Near-)constant columns use the precomputed `scale_ = 1` (sklearn
    /// `_is_constant_feature` + `_handle_zeros_in_scale`, `_data.py:72-85`,
    /// `:88-120`,`:1016-1021`), so a fitted-constant entry maps to
    /// `(x - mean) / 1 = 0`. NaN inputs pass through (`(nan - mean)/scale = nan`)
    /// and an all-NaN fitted column (whose `mean`/`scale_` are NaN) transforms to
    /// NaN, matching sklearn's `allow-nan` contract (`_data.py:1052`,`:1064-1067`).
    ///
    /// # Errors
    ///
    /// Returns [`FerroError::ShapeMismatch`] if the number of columns does not
    /// match the number of features seen during fitting, or
    /// [`FerroError::InvalidParameter`] if any input element is +/-inf
    /// (sklearn `force_all_finite="allow-nan"` rejects infinity, `_data.py:1052`).
    fn transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
        let n_features = self.mean.len();
        if x.ncols() != n_features {
            return Err(FerroError::ShapeMismatch {
                expected: vec![x.nrows(), n_features],
                actual: vec![x.nrows(), x.ncols()],
                context: "FittedStandardScaler::transform".into(),
            });
        }
        // sklearn `transform` validates with `force_all_finite="allow-nan"`
        // (`_data.py:1052`): NaN passes through, +/-inf raises ValueError
        // (MinMaxScaler #2200 precedent).
        if x.iter().any(|v| v.is_infinite()) {
            return Err(FerroError::InvalidParameter {
                name: "X".into(),
                reason: "Input X contains infinity or a value too large for dtype.".into(),
            });
        }
        let mut out = x.to_owned();
        for (mut col, (&m, &s)) in out
            .columns_mut()
            .into_iter()
            .zip(self.mean.iter().zip(self.scale_.iter()))
        {
            // Conditional center/scale mirroring sklearn `transform`
            // (`_data.py:1064-1067`): `if with_mean: X -= mean_` then
            // `if with_std: X /= scale_`. `mean_`/`scale_` are f64 (sklearn's
            // float64 attributes, #2205); numpy performs `X_f32 - mean_f64` in
            // float64 and assigns the result back to the f32 array, so the
            // center/scale is computed in f64 and the per-element result is
            // downcast to `F` (the f64-input case is exact → unchanged). `scale_`
            // already has zero-variance columns replaced by 1
            // (`_handle_zeros_in_scale`, `_data.py:88`), so a constant column maps
            // to `(x - mean) / 1 = 0`.
            //
            // Each in-place op (`X -= mean_`, `X /= scale_`) is a SEPARATE numpy
            // store back into the `F`-typed array, so the per-element result of
            // EACH op rounds to `F`: two roundings, not one fused rounding. For
            // F=f64 `F::from(f64)` is exact → identical to the fused expression
            // (no regression); for F=f32 each step rounds to f32, matching
            // sklearn's two float32 roundings (1 ULP near 2^24, #2305). NaN
            // propagates through both steps (`F::from(nan)` → nan).
            for v in &mut col {
                if self.with_mean {
                    // X -= mean_ (round to F)
                    *v = F::from(v.to_f64().unwrap_or(f64::NAN) - m).unwrap_or_else(F::nan);
                }
                if self.with_std {
                    // X /= scale_ (round to F)
                    *v = F::from(v.to_f64().unwrap_or(f64::NAN) / s).unwrap_or_else(F::nan);
                }
            }
        }
        Ok(out)
    }
}

/// Implement `Transform` on the unfitted scaler to satisfy the `FitTransform: Transform`
/// supertrait bound. Calling `transform` on an unfitted scaler always returns an error
/// because no statistics have been learned yet.
impl<F: Float + Send + Sync + 'static> Transform<Array2<F>> for StandardScaler<F> {
    type Output = Array2<F>;
    type Error = FerroError;

    /// Always returns an error — the scaler must be fitted first.
    ///
    /// Use [`Fit::fit`] to produce a [`FittedStandardScaler`], then call
    /// [`Transform::transform`] on that.
    fn transform(&self, _x: &Array2<F>) -> Result<Array2<F>, FerroError> {
        Err(FerroError::InvalidParameter {
            name: "StandardScaler".into(),
            reason: "scaler must be fitted before calling transform; use fit() first".into(),
        })
    }
}

impl<F: Float + Send + Sync + 'static> FitTransform<Array2<F>> for StandardScaler<F> {
    type FitError = FerroError;

    /// Fit the scaler on `x` and return the scaled output in one step.
    ///
    /// # Errors
    ///
    /// Returns an error if fitting fails (e.g., zero rows).
    fn fit_transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
        let fitted = self.fit(x, &())?;
        fitted.transform(x)
    }
}

// ---------------------------------------------------------------------------
// Pipeline integration
// ---------------------------------------------------------------------------

impl<F: Float + Send + Sync + 'static> PipelineTransformer<F> for StandardScaler<F> {
    /// Fit the scaler using the pipeline interface.
    ///
    /// The `y` argument is ignored; it exists only for API compatibility.
    ///
    /// # Errors
    ///
    /// Propagates errors from [`Fit::fit`].
    fn fit_pipeline(
        &self,
        x: &Array2<F>,
        _y: &Array1<F>,
    ) -> Result<Box<dyn FittedPipelineTransformer<F>>, FerroError> {
        let fitted = self.fit(x, &())?;
        Ok(Box::new(fitted))
    }
}

impl<F: Float + Send + Sync + 'static> FittedPipelineTransformer<F> for FittedStandardScaler<F> {
    /// Transform data using the pipeline interface.
    ///
    /// # Errors
    ///
    /// Propagates errors from [`Transform::transform`].
    fn transform_pipeline(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
        self.transform(x)
    }
}

// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------

#[cfg(test)]
mod tests {
    use super::*;
    use approx::assert_abs_diff_eq;
    use ndarray::array;

    #[test]
    fn test_standard_scaler_zero_mean_unit_variance() {
        let scaler = StandardScaler::<f64>::new();
        let x = array![[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]];
        let fitted = scaler.fit(&x, &()).unwrap();
        let scaled = fitted.transform(&x).unwrap();

        // Each column should have mean ~0
        for j in 0..scaled.ncols() {
            let col_mean: f64 = scaled.column(j).iter().sum::<f64>() / scaled.nrows() as f64;
            assert_abs_diff_eq!(col_mean, 0.0, epsilon = 1e-10);
        }

        // Each column should have population std ~1
        for j in 0..scaled.ncols() {
            let col_mean: f64 = scaled.column(j).iter().sum::<f64>() / scaled.nrows() as f64;
            let variance: f64 = scaled
                .column(j)
                .iter()
                .map(|&v| (v - col_mean).powi(2))
                .sum::<f64>()
                / scaled.nrows() as f64;
            assert_abs_diff_eq!(variance, 1.0, epsilon = 1e-10);
        }
    }

    #[test]
    fn test_inverse_transform_roundtrip() {
        let scaler = StandardScaler::<f64>::new();
        let x = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
        let fitted = scaler.fit(&x, &()).unwrap();
        let scaled = fitted.transform(&x).unwrap();
        let recovered = fitted.inverse_transform(&scaled).unwrap();

        for (a, b) in x.iter().zip(recovered.iter()) {
            assert_abs_diff_eq!(a, b, epsilon = 1e-10);
        }
    }

    #[test]
    fn test_constant_column_maps_to_zero() {
        let scaler = StandardScaler::<f64>::new();
        // Column 1 is constant: std = 0
        let x = array![[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]];
        let fitted = scaler.fit(&x, &()).unwrap();
        assert_abs_diff_eq!(fitted.std()[1], 0.0, epsilon = 1e-15);
        let scaled = fitted.transform(&x).unwrap();
        // Constant column maps to (x - mean)/1 = 0 (sklearn with_mean=True).
        for i in 0..3 {
            assert_abs_diff_eq!(scaled[[i, 1]], 0.0, epsilon = 1e-10);
        }
    }

    #[test]
    fn test_fit_transform_equivalence() {
        let scaler = StandardScaler::<f64>::new();
        let x = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
        let via_fit_transform = scaler.fit_transform(&x).unwrap();
        let fitted = scaler.fit(&x, &()).unwrap();
        let via_separate = fitted.transform(&x).unwrap();
        for (a, b) in via_fit_transform.iter().zip(via_separate.iter()) {
            assert_abs_diff_eq!(a, b, epsilon = 1e-15);
        }
    }

    #[test]
    fn test_shape_mismatch_error() {
        let scaler = StandardScaler::<f64>::new();
        let x_train = array![[1.0, 2.0], [3.0, 4.0]];
        let fitted = scaler.fit(&x_train, &()).unwrap();
        let x_bad = array![[1.0, 2.0, 3.0]];
        assert!(fitted.transform(&x_bad).is_err());
    }

    #[test]
    fn test_insufficient_samples_error() {
        let scaler = StandardScaler::<f64>::new();
        let x: Array2<f64> = Array2::zeros((0, 3));
        assert!(scaler.fit(&x, &()).is_err());
    }

    // sklearn 1.5.2 oracle (R-CHAR-3), X = [[1,5],[2,5],[3,5]] (col 1 constant):
    //   StandardScaler().fit(X) -> mean_ = [2.0, 5.0],
    //     var_   = [0.6666666666666666, 0.0],
    //     scale_ = [0.816496580927726, 1.0],
    //     n_samples_seen_ = 3.
    #[test]
    fn standard_scaler_var_scale_nsamples_match_sklearn() -> Result<(), FerroError> {
        let scaler = StandardScaler::<f64>::new();
        let x = array![[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]];
        let fitted = scaler.fit(&x, &())?;

        assert_abs_diff_eq!(fitted.var()[0], 0.6666666666666666, epsilon = 1e-9);
        assert_abs_diff_eq!(fitted.var()[1], 0.0, epsilon = 1e-9);

        assert_abs_diff_eq!(fitted.scale()[0], 0.816496580927726, epsilon = 1e-12);
        // scale_[1] is EXACTLY 1.0 on the constant column (_handle_zeros), not 0.0.
        assert_eq!(fitted.scale()[1], 1.0);

        assert_eq!(fitted.n_samples_seen(), 3);
        Ok(())
    }

    #[test]
    fn standard_scaler_scale_differs_from_std_on_constant_col() -> Result<(), FerroError> {
        let scaler = StandardScaler::<f64>::new();
        let x = array![[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]];
        let fitted = scaler.fit(&x, &())?;

        // Constant column: raw std is 0.0 but scale_ is 1.0 (_handle_zeros).
        assert_eq!(fitted.std()[1], 0.0);
        assert_eq!(fitted.scale()[1], 1.0);
        // Non-constant column: scale_ equals raw std unchanged.
        assert_eq!(fitted.scale()[0], fitted.std()[0]);
        Ok(())
    }

    #[test]
    fn standard_scaler_var_equals_std_squared_nonconstant() -> Result<(), FerroError> {
        let scaler = StandardScaler::<f64>::new();
        let x = array![[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]];
        let fitted = scaler.fit(&x, &())?;

        assert_abs_diff_eq!(
            fitted.var()[0],
            fitted.std()[0] * fitted.std()[0],
            epsilon = 1e-12
        );
        Ok(())
    }

    // sklearn 1.5.2 oracle (R-CHAR-3), X = [[1,10],[2,20],[3,30]]:
    //   StandardScaler(with_mean=True,  with_std=True ).fit_transform(X)
    //     -> [[-1.224744871391589,-1.224744871391589],[0,0],[1.224744871391589,1.224744871391589]]
    //   StandardScaler(with_mean=True,  with_std=False).fit_transform(X)
    //     -> [[-1,-10],[0,0],[1,10]]                       (center only)
    //   StandardScaler(with_mean=False, with_std=True ).fit_transform(X)
    //     -> [[1.224744871391589,1.224744871391589],
    //         [2.449489742783178,2.449489742783178],
    //         [3.6742346141747673,3.6742346141747673]]     (scale only)
    //   StandardScaler(with_mean=False, with_std=False).fit_transform(X)
    //     -> X                                              (identity)
    #[test]
    fn standard_scaler_with_mean_std_default_matches_sklearn() -> Result<(), FerroError> {
        let x = array![[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]];
        let scaler = StandardScaler::<f64>::new();
        let fitted = scaler.fit(&x, &())?;
        let scaled = fitted.transform(&x)?;

        let expected = array![
            [-1.224744871391589, -1.224744871391589],
            [0.0, 0.0],
            [1.224744871391589, 1.224744871391589]
        ];
        for (a, b) in scaled.iter().zip(expected.iter()) {
            assert_abs_diff_eq!(a, b, epsilon = 1e-7);
        }

        // Regression guard: default (with_mean=true, with_std=true) must be
        // BYTE-identical to a plain default fit produced the same way.
        let default_fitted = StandardScaler::<f64>::new().fit(&x, &())?;
        let default_scaled = default_fitted.transform(&x)?;
        for (a, b) in scaled.iter().zip(default_scaled.iter()) {
            assert_eq!(a.to_bits(), b.to_bits());
        }
        Ok(())
    }

    #[test]
    fn standard_scaler_with_std_false() -> Result<(), FerroError> {
        let x = array![[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]];
        let scaler = StandardScaler::<f64>::new().with_with_std(false);
        let fitted = scaler.fit(&x, &())?;
        let scaled = fitted.transform(&x)?;

        let expected = array![[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]];
        for (a, b) in scaled.iter().zip(expected.iter()) {
            assert_abs_diff_eq!(a, b, epsilon = 1e-7);
        }
        Ok(())
    }

    #[test]
    fn standard_scaler_with_mean_false() -> Result<(), FerroError> {
        let x = array![[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]];
        let scaler = StandardScaler::<f64>::new().with_with_mean(false);
        let fitted = scaler.fit(&x, &())?;
        let scaled = fitted.transform(&x)?;

        let expected = array![
            [1.224744871391589, 1.224744871391589],
            [2.449489742783178, 2.449489742783178],
            [3.6742346141747673, 3.6742346141747673]
        ];
        for (a, b) in scaled.iter().zip(expected.iter()) {
            assert_abs_diff_eq!(a, b, epsilon = 1e-7);
        }
        Ok(())
    }

    #[test]
    fn standard_scaler_both_false_identity() -> Result<(), FerroError> {
        let x = array![[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]];
        let scaler = StandardScaler::<f64>::new()
            .with_with_mean(false)
            .with_with_std(false);
        let fitted = scaler.fit(&x, &())?;
        let scaled = fitted.transform(&x)?;

        for (a, b) in scaled.iter().zip(x.iter()) {
            assert_abs_diff_eq!(a, b, epsilon = 1e-12);
        }
        Ok(())
    }

    #[test]
    fn standard_scaler_inverse_roundtrip_each_config() -> Result<(), FerroError> {
        let x = array![[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]];
        for &(with_mean, with_std) in &[(true, true), (true, false), (false, true), (false, false)]
        {
            let scaler = StandardScaler::<f64>::new()
                .with_with_mean(with_mean)
                .with_with_std(with_std);
            let fitted = scaler.fit(&x, &())?;
            let scaled = fitted.transform(&x)?;
            let recovered = fitted.inverse_transform(&scaled)?;
            for (a, b) in x.iter().zip(recovered.iter()) {
                assert_abs_diff_eq!(a, b, epsilon = 1e-9);
            }
        }
        Ok(())
    }

    #[test]
    fn test_f32_scaler() {
        let scaler = StandardScaler::<f32>::new();
        let x: Array2<f32> = array![[1.0f32, 2.0], [3.0, 4.0], [5.0, 6.0]];
        let fitted = scaler.fit(&x, &()).unwrap();
        let scaled = fitted.transform(&x).unwrap();
        let col0_mean: f32 = scaled.column(0).iter().sum::<f32>() / 3.0;
        assert!((col0_mean).abs() < 1e-6);
    }
}