//! Max-absolute scaler: scale each feature by its maximum absolute value.
//!
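//! Each feature is transformed as `x_scaled = x / max(|x|)`, where the maximum
//! is taken per column over the training data, so that training values fall
//! within `[-1, 1]`. The scaler does not shift the data (no centering), so
//! zeros stay zero; that makes the transform sparsity-preserving, although
//! this implementation currently accepts dense arrays only (see REQ-10).
//!
//! Columns where `max_abs` is zero (all-zero features), or below the
//! near-constant threshold `10 * epsilon`, are left unchanged.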
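//!
//! The rule above can be sketched for a single column with only the standard
//! library. This is an illustrative sketch, not this crate's implementation:
//! `scale_column` is a hypothetical helper, and the near-zero cutoff of
//! `10 * f64::EPSILON` is assumed from the threshold computed in `Fit::fit`.
//!
//! ```rust
//! // Scale one column by its maximum absolute value (hypothetical helper).
//! fn scale_column(col: &[f64]) -> Vec<f64> {
//!     // Largest magnitude in the column (0.0 for an empty or all-zero column).
//!     let max_abs = col.iter().fold(0.0_f64, |m, v| m.max(v.abs()));
//!     // A (near-)zero maximum is replaced by 1.0 so the column is left unchanged.
//!     let scale = if max_abs < 10.0 * f64::EPSILON { 1.0 } else { max_abs };
//!     col.iter().map(|v| v / scale).collect()
//! }
//! ```
//!
//! Under these assumptions, `scale_column(&[-3.0, 0.0, 2.0])` gives
//! `[-1.0, 0.0, 2.0 / 3.0]`, and an all-zero column comes back as all zeros.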
//!
//! # `## REQ status`
//!
//! Binary (R-DEFER-2), translating `sklearn/preprocessing/_data.py` (`class MaxAbsScaler`
//! `:1116`, `maxabs_scale` `:1351`). Design doc: `.design/preprocess/max_abs_scaler.md`. Expected
//! values from the live sklearn 1.5.2 oracle (R-CHAR-3). Consumers: PyO3 `_RsMaxAbsScaler`
//! (`ferrolearn-python/src/extras.rs:1156`) + `PipelineTransformer` impl + crate re-export (S5).
//! HONEST (R-HONEST-3): verify-and-document — the dense max-abs path matches sklearn including
//! the zero-max_abs edge (which, unlike Min/StandardScaler, does NOT diverge: a zero-max_abs
//! column is all-zero, and `x/scale_(1) = x` equals ferrolearn's leave-unchanged).
//!
//! | REQ | Status | Evidence |
//! |---|---|---|
//! | REQ-1 (per-column max-abs value match) | SHIPPED | `Fit::fit` learns per-column `max_abs=max(\|x\|)`; `Transform::transform` = `x/scale_`, mirroring sklearn `max_abs_=_nanmax(abs(X))` (`_data.py:1263`) / `scale_=_handle_zeros_in_scale(max_abs_)` (`:1272`) / `X/=scale_` (`:1305`). Critic-verified bit-identical to live oracle: `divergence_max_abs_scaler.rs` green guards (`[[-3,1],[0,-2],[2,4]]` → `[[-1,0.25],[0,-0.5],[0.6667,1]]`, all-negative, mixed, f32). Consumers: PyO3 `_RsMaxAbsScaler` + `FittedPipelineTransformer` + re-export. |
//! | REQ-2 (zero-max_abs column → identity, MATCHES sklearn) | SHIPPED | A zero-max_abs column is all-zero; sklearn `scale_=_handle_zeros_in_scale(0)=1` → `x/1=x` = ferrolearn's leave-unchanged for ANY input. Critic-verified MATCH (discriminating: `fit([[0],[0]]).transform([[5]])==[[5.0]]` in both). NOT a divergence (contrast Min/StandardScaler). |
//! | REQ-3 (inverse_transform round-trip) | SHIPPED | `inverse_transform` = `x*scale_` (a zero-max_abs column has `scale_=1`, so it is left unchanged), mirroring sklearn `X *= scale_` (`_data.py:1337`); `inverse_transform(transform(X))==X` (green guard). Consumer: re-export boundary (S5). |
//! | REQ-4 (PyO3 binding) | SHIPPED | `_RsMaxAbsScaler` (`extras.rs:1156`, registered `lib.rs:82`) marshals `fit`/`transform` over `FittedMaxAbsScaler<f64>` — a real CPython consumer; maturin smoke. |
//! | REQ-5 (NaN tolerance: allow-nan) | SHIPPED | FIXED #1202. `Fit::fit`'s per-column `max(\|x\|)` reduction now SKIPS NaN via an `Option<F>` accumulator (`continue` on `is_nan()`), mirroring sklearn `max_abs = _nanmax(abs(X), axis=0)` under `force_all_finite='allow-nan'` (`_data.py:1263`,`:1256`): a column with at least one finite value gets the finite max-abs; an ALL-NaN column gets `max_abs = F::nan()` (NaN passes `_handle_zeros_in_scale` since NaN != 0 → `scale_` NaN → transform/inverse NaN; no panic, no zero-substitution). `transform`/`inverse_transform` now divide/multiply by `scale_` (so a NaN-column maps to NaN, a zero-column to identity); NaN inputs pass through (`nan/scale = nan`, `nan*scale = nan`). inf-rejection (allow-nan REJECTS ±inf, MinMaxScaler #2200 precedent): `fit`/`transform`/`inverse_transform` return `InvalidParameter` ("Input X contains infinity...") on any `is_infinite()` element. Live-oracle tests `req5_nan_fit_single_column_ignored`, `req5_nan_fit_multi_column_scattered`, `req5_all_nan_column_yields_nan_no_panic`, `req5_nan_passthrough_inverse_transform`, `inf_rejected_fit`/`inf_rejected_transform`/`inf_rejected_inverse_transform`, `nan_only_still_fits` in `tests/divergence_max_abs_scaler.rs`. Consumers: PyO3 `_RsMaxAbsScaler` + `FittedPipelineTransformer` + re-export. |
//! | REQ-6 (scale_/n_samples_seen_ attrs) | SHIPPED | `FittedMaxAbsScaler<F>` stores `scale_ = max_abs.mapv(\|m\| if m < 10*eps {1} else {m})` (mirroring sklearn `scale_ = _handle_zeros_in_scale(max_abs_)` `_data.py:1272`,`:88`: `1.0` on all-zero and near-constant columns, NaN passed through) and `n_samples_seen_ = n_samples` (`:1266`), set in `Fit::fit`. Getters `scale()`/`n_samples_seen()` (`#[must_use]`). Oracle (`MaxAbsScaler().fit([[1,0],[-3,0],[2,0]])` → `max_abs_=[3,0]`, `scale_=[3,1]`, `n_samples_seen_=3`): tests `max_abs_scale_nsamples_match_sklearn`, `max_abs_scale_differs_from_max_abs_on_zero_col`. `transform`/`inverse_transform` divide/multiply by `scale_` (see REQ-5), which equals `max_abs` everywhere except on those columns. |
//! | REQ-7 (partial_fit / streaming) | NOT-STARTED | open prereq blocker #1204. Single-shot (`_data.py:1232-1273`). |
//! | REQ-8 (maxabs_scale free fn + axis) | NOT-STARTED | open prereq blocker #1205. No `maxabs_scale` / axis=1 (`_data.py:1351`). |
//! | REQ-9 (copy param + _parameter_constraints) | SHIPPED | FIXED #1206. `MaxAbsScaler<F>` gains a `copy: bool` field (default `true`) + `#[must_use] with_copy(self, bool) -> Self` builder + `copy()` getter, mirroring sklearn `__init__(*, copy=True)` (`_data.py:1190`) under `_parameter_constraints {copy:["boolean"]}` (`:1188`). ACCEPT-AND-DOCUMENT no-op: ferrolearn's [`Transform`] always returns a freshly allocated array (`to_owned()`), so `copy` has no observable effect (documented; behavior unchanged) — the direct analog of MinMaxScaler REQ-10's `copy` no-op. Live-oracle test `req9_copy_is_no_op_on_values`. Consumers: PyO3 `_RsMaxAbsScaler` + re-export. |
//! | REQ-10 (sparse CSR/CSC) | NOT-STARTED | open prereq blocker #1207. Dense-only; MaxAbsScaler is sklearn's flagship sparse-safe scaler (`_data.py:1260-1261`,`:1303`). |
//! | REQ-11 (get_feature_names_out / n_features_in_) | NOT-STARTED | open prereq blocker #1208. None (OneToOneFeatureMixin). |
//! | REQ-12 (ferray substrate) | NOT-STARTED | open prereq blocker #1209. `ndarray`+`num_traits`, not `ferray-core` (R-SUBSTRATE-1/2). |
use ferrolearn_core::error::FerroError;
use ferrolearn_core::pipeline::{FittedPipelineTransformer, PipelineTransformer};
use ferrolearn_core::traits::{Fit, FitTransform, Transform};
use ndarray::{Array1, Array2};
use num_traits::Float;
// ---------------------------------------------------------------------------
// MaxAbsScaler (unfitted)
// ---------------------------------------------------------------------------
/// An unfitted max-absolute scaler.
///
/// Calling [`Fit::fit`] learns the per-column maximum absolute values and
/// returns a [`FittedMaxAbsScaler`] that can transform new data.
///
/// Columns where the maximum absolute value is zero are left unchanged after
/// transformation.
///
/// # Examples
///
/// ```
/// use ferrolearn_preprocess::MaxAbsScaler;
/// use ferrolearn_core::traits::{Fit, Transform};
/// use ndarray::array;
///
/// let scaler = MaxAbsScaler::<f64>::new();
/// let x = array![[-3.0, 1.0], [0.0, -2.0], [2.0, 4.0]];
/// let fitted = scaler.fit(&x, &()).unwrap();
/// let scaled = fitted.transform(&x).unwrap();
/// // All values now lie in [-1, 1].
/// assert!(scaled.iter().all(|v| v.abs() <= 1.0));
/// ```
#[derive(Debug, Clone)]
pub struct MaxAbsScaler<F> {
/// sklearn's `copy` constructor parameter (`_data.py:1190`,
/// `_parameter_constraints {copy:["boolean"]}` `:1188`). ACCEPT-AND-DOCUMENT
/// no-op: ferrolearn's [`Transform`] always returns a freshly allocated
/// array, so `copy` has no observable effect here. Retained for API parity.
/// Defaults to `true`.
pub(crate) copy: bool,
_marker: std::marker::PhantomData<F>,
}
impl<F: Float + Send + Sync + 'static> MaxAbsScaler<F> {
/// Create a new `MaxAbsScaler`.
#[must_use]
pub fn new() -> Self {
Self {
copy: true,
_marker: std::marker::PhantomData,
}
}
/// Set sklearn's `copy` constructor parameter (`_data.py:1190`).
///
/// ACCEPT-AND-DOCUMENT no-op: ferrolearn's [`Transform`] contract always
/// returns a freshly allocated array, so `copy` has no observable effect.
/// The flag is retained only for API parity with scikit-learn
/// (`_parameter_constraints {copy:["boolean"]}`, `_data.py:1188`); toggling
/// it does not change behavior.
#[must_use]
pub fn with_copy(mut self, copy: bool) -> Self {
self.copy = copy;
self
}
/// Return the `copy` flag (accept-and-document no-op; see [`Self::with_copy`]).
#[must_use]
pub fn copy(&self) -> bool {
self.copy
}
}
impl<F: Float + Send + Sync + 'static> Default for MaxAbsScaler<F> {
fn default() -> Self {
Self::new()
}
}
// ---------------------------------------------------------------------------
// FittedMaxAbsScaler
// ---------------------------------------------------------------------------
/// A fitted max-absolute scaler holding per-column maximum absolute values.
///
/// Created by calling [`Fit::fit`] on a [`MaxAbsScaler`].
#[derive(Debug, Clone)]
pub struct FittedMaxAbsScaler<F> {
/// Per-column maximum absolute values learned during fitting.
pub(crate) max_abs: Array1<F>,
/// Per-column scaling factors: `max_abs` with zero (or near-zero, below
/// `10 * epsilon`) entries replaced by `1.0`.
pub(crate) scale_: Array1<F>,
/// Number of samples (rows) seen during fitting.
pub(crate) n_samples_seen_: usize,
}
impl<F: Float + Send + Sync + 'static> FittedMaxAbsScaler<F> {
/// Return the per-column maximum absolute values learned during fitting.
#[must_use]
pub fn max_abs(&self) -> &Array1<F> {
&self.max_abs
}
/// Return the per-column scaling factors used to divide each feature.
///
/// Mirrors sklearn `MaxAbsScaler.scale_ = _handle_zeros_in_scale(max_abs_)`
/// (`sklearn/preprocessing/_data.py:1272`): equal to `max_abs` on nonzero
/// columns and exactly `1.0` on all-zero columns, so dividing by `scale_`
/// leaves an all-zero column unchanged.
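///
/// A minimal std-only sketch of that zero-handling rule follows. It is
/// illustrative only: `handle_zeros` is a hypothetical stand-in for sklearn's
/// `_handle_zeros_in_scale`, and the `10 * f64::EPSILON` cutoff is assumed
/// from the threshold computed in `Fit::fit`.
///
/// ```rust
/// // Replace (near-)zero scales by 1.0. NaN compares false against the
/// // threshold, so a NaN entry is kept as NaN.
/// fn handle_zeros(max_abs: &[f64]) -> Vec<f64> {
///     let threshold = 10.0 * f64::EPSILON;
///     max_abs
///         .iter()
///         .map(|&m| if m < threshold { 1.0 } else { m })
///         .collect()
/// }
/// ```
///
/// With this sketch, `handle_zeros(&[3.0, 0.0])` yields `[3.0, 1.0]`, and a
/// NaN entry stays NaN.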
#[must_use]
pub fn scale(&self) -> &Array1<F> {
&self.scale_
}
/// Return the number of samples (rows) seen during fitting.
///
/// Mirrors sklearn `MaxAbsScaler.n_samples_seen_`
/// (`sklearn/preprocessing/_data.py:1266`).
#[must_use]
pub fn n_samples_seen(&self) -> usize {
self.n_samples_seen_
}
/// Inverse-transform scaled data back to the original space.
///
/// Applies `x_orig = x_scaled * scale_` per column, mirroring sklearn
/// `X *= self.scale_` (`_data.py:1337`) with
/// `scale_ = _handle_zeros_in_scale(max_abs_)` (`:1272`,`:88`): a
/// zero-`max_abs` (all-zero) column has `scale_ = 1` (`x * 1 = x`), an
/// all-NaN fitted column has `scale_ = NaN` (`x * NaN = NaN`). NaN inputs
/// pass through (`nan * scale = nan`), matching sklearn's `allow-nan`
/// contract (`_data.py:1331`).
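///
/// The round trip can be sketched on a single column with only the standard
/// library. This is an assumption-laden illustration rather than this
/// method's body: `scale_by` and `unscale_by` are hypothetical helpers that
/// take an already-computed `scale` value.
///
/// ```rust
/// // Forward step: divide every value by the column's scale.
/// fn scale_by(col: &[f64], scale: f64) -> Vec<f64> {
///     col.iter().map(|v| v / scale).collect()
/// }
///
/// // Inverse step: multiply every value by the same scale.
/// fn unscale_by(col: &[f64], scale: f64) -> Vec<f64> {
///     col.iter().map(|v| v * scale).collect()
/// }
/// ```
///
/// Applying `unscale_by(&scale_by(&x, s), s)` recovers `x` up to
/// floating-point rounding; with `s = 1.0` (the zero-column case) the values
/// are returned unchanged, and a NaN input stays NaN.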
///
/// # Errors
///
/// Returns [`FerroError::ShapeMismatch`] if the number of columns does not
/// match the number of features seen during fitting, or
/// [`FerroError::InvalidParameter`] if any input element is +/-inf
/// (sklearn `force_all_finite="allow-nan"` rejects infinity).
pub fn inverse_transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
let n_features = self.max_abs.len();
if x.ncols() != n_features {
return Err(FerroError::ShapeMismatch {
expected: vec![x.nrows(), n_features],
actual: vec![x.nrows(), x.ncols()],
context: "FittedMaxAbsScaler::inverse_transform".into(),
});
}
// sklearn `inverse_transform` validates with
// `force_all_finite="allow-nan"` (`_data.py:1331`): NaN passes through,
// +/-inf raises ValueError.
if x.iter().any(|v| v.is_infinite()) {
return Err(FerroError::InvalidParameter {
name: "X".into(),
reason: "Input X contains infinity or a value too large for dtype.".into(),
});
}
let mut out = x.to_owned();
for (j, mut col) in out.columns_mut().into_iter().enumerate() {
// Multiply by `scale_` (= `_handle_zeros_in_scale`): zero-`max_abs`
// column has `scale_ = 1` (`x * 1 = x`), all-NaN column has
// `scale_ = NaN` (`x * NaN = NaN`), otherwise `scale_ = max_abs`.
let scale = self.scale_[j];
for v in &mut col {
*v = *v * scale;
}
}
Ok(out)
}
}
// ---------------------------------------------------------------------------
// Trait implementations
// ---------------------------------------------------------------------------
impl<F: Float + Send + Sync + 'static> Fit<Array2<F>, ()> for MaxAbsScaler<F> {
type Fitted = FittedMaxAbsScaler<F>;
type Error = FerroError;
/// Fit the scaler by computing per-column maximum absolute values.
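///
/// The NaN-skipping reduction used for each column can be sketched with only
/// the standard library. This is an illustration under stated assumptions,
/// not a copy of the method body: `nan_max_abs` is a hypothetical helper that
/// works on an `f64` slice rather than an `ndarray` column.
///
/// ```rust
/// // Maximum of |x| over a column, ignoring NaN; NaN if every entry is NaN.
/// fn nan_max_abs(col: &[f64]) -> f64 {
///     let mut acc: Option<f64> = None;
///     for &v in col {
///         if v.is_nan() {
///             continue; // skip missing values
///         }
///         let a = v.abs();
///         acc = Some(match acc {
///             Some(m) if m >= a => m,
///             _ => a,
///         });
///     }
///     // `None` means no finite value was seen, so the result is NaN.
///     acc.unwrap_or(f64::NAN)
/// }
/// ```
///
/// Here `nan_max_abs(&[1.0, f64::NAN, -3.0])` is `3.0`, while an all-NaN
/// column gives NaN.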
///
/// # Errors
///
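/// Returns [`FerroError::InsufficientSamples`] if the input has zero rows, or
/// [`FerroError::InvalidParameter`] if any input element is +/-inf
/// (sklearn `force_all_finite="allow-nan"` rejects infinity; NaN is allowed).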
fn fit(&self, x: &Array2<F>, _y: &()) -> Result<FittedMaxAbsScaler<F>, FerroError> {
let n_samples = x.nrows();
if n_samples == 0 {
return Err(FerroError::InsufficientSamples {
required: 1,
actual: 0,
context: "MaxAbsScaler::fit".into(),
});
}
// sklearn validates X with `force_all_finite="allow-nan"`
// (`_data.py:1256`): NaN is permitted, but +/-inf raises ValueError
// ("Input X contains infinity or a value too large for dtype('...')").
// Mirrors the MinMaxScaler #2200 precedent (allow-nan rejects inf).
if x.iter().any(|v| v.is_infinite()) {
return Err(FerroError::InvalidParameter {
name: "X".into(),
reason: "Input X contains infinity or a value too large for dtype.".into(),
});
}
let n_features = x.ncols();
let mut max_abs = Array1::zeros(n_features);
for j in 0..n_features {
// NaN-ignoring per-column max(|x|), mirroring sklearn's
// `max_abs = _nanmax(abs(X), axis=0)` under
// `force_all_finite="allow-nan"` (`_data.py:1263`,`:1256`). NaN values
// are skipped (Option accumulator). If a column is ALL NaN (every
// entry skipped) the accumulator stays `None` and we emit NaN —
// matching `_nanmax` returning nan on an all-NaN slice (that column's
// scale_/transform become NaN via `_handle_zeros_in_scale`, which
// leaves NaN unchanged since NaN != 0; no panic, no zero substitution).
let mut acc: Option<F> = None;
for v in x.column(j).iter().copied() {
if v.is_nan() {
continue;
}
let a = v.abs();
acc = Some(match acc {
Some(m) if m >= a => m,
_ => a,
});
}
max_abs[j] = acc.unwrap_or_else(F::nan);
}
// sklearn: scale_ = _handle_zeros_in_scale(max_abs_) (`_data.py:1272`,
// `:114-119`): `constant_mask = scale < 10 * finfo(dtype).eps;
// scale[constant_mask] = 1.0`. A max_abs BELOW the near-constant
// threshold (NOT just exactly 0) becomes 1.0 so dividing leaves the
// column unchanged (#2203). A NaN max_abs (all-NaN column) is NOT
// `< threshold` (NaN compares false), so it passes through -> scale_ =
// NaN -> transform/inverse yield NaN (matching sklearn).
let ten_eps = F::epsilon() * F::from(10.0).unwrap_or_else(F::one);
let scale_ = max_abs.mapv(|m| if m < ten_eps { F::one() } else { m });
// sklearn: n_samples_seen_ = X.shape[0] (`_data.py:1266`).
let n_samples_seen_ = n_samples;
Ok(FittedMaxAbsScaler {
max_abs,
scale_,
n_samples_seen_,
})
}
}
impl<F: Float + Send + Sync + 'static> Transform<Array2<F>> for FittedMaxAbsScaler<F> {
type Output = Array2<F>;
type Error = FerroError;
/// Transform data by dividing each feature by its maximum absolute value.
///
/// Mirrors sklearn `X /= self.scale_` (`_data.py:1305`) with
/// `scale_ = _handle_zeros_in_scale(max_abs_)` (`:1272`,`:88`): a
/// zero-`max_abs` (all-zero) column has `scale_ = 1`, so `x / 1 = x` leaves
/// it unchanged; an all-NaN fitted column has `scale_ = NaN`, so `x / NaN =
/// NaN`. NaN inputs pass through (`nan / scale = nan`), matching sklearn's
/// `allow-nan` contract (`_data.py:1256`,`:1299`).
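///
/// The validate-then-divide order can be sketched for one column with only
/// the standard library. This is a hedged illustration, not this method's
/// body: `transform_column` is a hypothetical helper, and a plain `String`
/// stands in for [`FerroError`].
///
/// ```rust
/// // Reject +/-inf, then divide by the scale; NaN is allowed and propagates.
/// fn transform_column(col: &[f64], scale: f64) -> Result<Vec<f64>, String> {
///     if col.iter().any(|v| v.is_infinite()) {
///         return Err("Input X contains infinity or a value too large for dtype.".to_string());
///     }
///     Ok(col.iter().map(|v| v / scale).collect())
/// }
/// ```
///
/// In this sketch, `transform_column(&[-3.0, 1.5], 3.0)` returns
/// `Ok(vec![-1.0, 0.5])`, an infinite input returns `Err`, and a NaN input
/// comes back as NaN.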
///
/// # Errors
///
/// Returns [`FerroError::ShapeMismatch`] if the number of columns does not
/// match the number of features seen during fitting, or
/// [`FerroError::InvalidParameter`] if any input element is +/-inf
/// (sklearn `force_all_finite="allow-nan"` rejects infinity).
fn transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
let n_features = self.max_abs.len();
if x.ncols() != n_features {
return Err(FerroError::ShapeMismatch {
expected: vec![x.nrows(), n_features],
actual: vec![x.nrows(), x.ncols()],
context: "FittedMaxAbsScaler::transform".into(),
});
}
// sklearn `transform` validates with `force_all_finite="allow-nan"`
// (`_data.py:1299`): NaN passes through, +/-inf raises ValueError.
if x.iter().any(|v| v.is_infinite()) {
return Err(FerroError::InvalidParameter {
name: "X".into(),
reason: "Input X contains infinity or a value too large for dtype.".into(),
});
}
let mut out = x.to_owned();
for (j, mut col) in out.columns_mut().into_iter().enumerate() {
// Divide by the precomputed `scale_` (= `_handle_zeros_in_scale`):
// a zero-`max_abs` column has `scale_ = 1` (`x / 1 = x`), an all-NaN
// column has `scale_ = NaN` (`x / NaN = NaN`), otherwise `scale_ =
// max_abs`. NaN inputs pass through (`nan / scale = nan`).
let scale = self.scale_[j];
for v in &mut col {
*v = *v / scale;
}
}
Ok(out)
}
}
/// Implement `Transform` on the unfitted scaler to satisfy the `FitTransform: Transform`
/// supertrait bound. Calling `transform` on an unfitted scaler always returns an error.
impl<F: Float + Send + Sync + 'static> Transform<Array2<F>> for MaxAbsScaler<F> {
type Output = Array2<F>;
type Error = FerroError;
/// Always returns an error — the scaler must be fitted first.
///
/// Use [`Fit::fit`] to produce a [`FittedMaxAbsScaler`], then call
/// [`Transform::transform`] on that.
fn transform(&self, _x: &Array2<F>) -> Result<Array2<F>, FerroError> {
Err(FerroError::InvalidParameter {
name: "MaxAbsScaler".into(),
reason: "scaler must be fitted before calling transform; use fit() first".into(),
})
}
}
impl<F: Float + Send + Sync + 'static> FitTransform<Array2<F>> for MaxAbsScaler<F> {
type FitError = FerroError;
/// Fit the scaler on `x` and return the scaled output in one step.
///
/// # Errors
///
/// Returns [`FerroError::InsufficientSamples`] if `x` has zero rows, or
/// [`FerroError::InvalidParameter`] if `x` contains +/-inf.
fn fit_transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
let fitted = self.fit(x, &())?;
fitted.transform(x)
}
}
// ---------------------------------------------------------------------------
// Pipeline integration (generic)
// ---------------------------------------------------------------------------
impl<F: Float + Send + Sync + 'static> PipelineTransformer<F> for MaxAbsScaler<F> {
/// Fit the scaler using the pipeline interface.
///
/// The `y` argument is ignored; it exists only for API compatibility.
///
/// # Errors
///
/// Propagates errors from [`Fit::fit`].
fn fit_pipeline(
&self,
x: &Array2<F>,
_y: &Array1<F>,
) -> Result<Box<dyn FittedPipelineTransformer<F>>, FerroError> {
let fitted = self.fit(x, &())?;
Ok(Box::new(fitted))
}
}
impl<F: Float + Send + Sync + 'static> FittedPipelineTransformer<F> for FittedMaxAbsScaler<F> {
/// Transform data using the pipeline interface.
///
/// # Errors
///
/// Propagates errors from [`Transform::transform`].
fn transform_pipeline(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError> {
self.transform(x)
}
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
#[cfg(test)]
mod tests {
use super::*;
use approx::assert_abs_diff_eq;
use ndarray::array;
#[test]
fn test_max_abs_scaler_basic() {
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[-3.0, 1.0], [0.0, -2.0], [2.0, 4.0]];
let fitted = scaler.fit(&x, &()).unwrap();
// col0: max_abs = 3.0, col1: max_abs = 4.0
assert_abs_diff_eq!(fitted.max_abs()[0], 3.0, epsilon = 1e-10);
assert_abs_diff_eq!(fitted.max_abs()[1], 4.0, epsilon = 1e-10);
let scaled = fitted.transform(&x).unwrap();
assert_abs_diff_eq!(scaled[[0, 0]], -1.0, epsilon = 1e-10);
assert_abs_diff_eq!(scaled[[1, 0]], 0.0, epsilon = 1e-10);
assert_abs_diff_eq!(scaled[[2, 0]], 2.0 / 3.0, epsilon = 1e-10);
assert_abs_diff_eq!(scaled[[2, 1]], 1.0, epsilon = 1e-10);
}
#[test]
fn test_values_in_range() {
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[-10.0, 5.0], [3.0, -8.0], [7.0, 2.0]];
let fitted = scaler.fit(&x, &()).unwrap();
let scaled = fitted.transform(&x).unwrap();
for v in &scaled {
assert!(
*v >= -1.0 - 1e-10 && *v <= 1.0 + 1e-10,
"value {v} out of [-1, 1]"
);
}
}
#[test]
fn test_zero_column_unchanged() {
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]];
let fitted = scaler.fit(&x, &()).unwrap();
assert_abs_diff_eq!(fitted.max_abs()[0], 0.0, epsilon = 1e-15);
let scaled = fitted.transform(&x).unwrap();
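// All-zero column stays 0.0
for i in 0..3 {
assert_abs_diff_eq!(scaled[[i, 0]], 0.0, epsilon = 1e-10);
}
// A zero-max_abs column has scale_ = 1, so unseen nonzero values pass through unchanged.
let unseen = fitted.transform(&array![[5.0, 3.0]]).unwrap();
assert_abs_diff_eq!(unseen[[0, 0]], 5.0, epsilon = 1e-10);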
}
#[test]
fn test_inverse_transform_roundtrip() {
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[-3.0, 1.0], [0.0, -2.0], [2.0, 4.0]];
let fitted = scaler.fit(&x, &()).unwrap();
let scaled = fitted.transform(&x).unwrap();
let recovered = fitted.inverse_transform(&scaled).unwrap();
for (a, b) in x.iter().zip(recovered.iter()) {
assert_abs_diff_eq!(a, b, epsilon = 1e-10);
}
}
#[test]
fn test_fit_transform_equivalence() {
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
let via_fit_transform = scaler.fit_transform(&x).unwrap();
let fitted = scaler.fit(&x, &()).unwrap();
let via_separate = fitted.transform(&x).unwrap();
for (a, b) in via_fit_transform.iter().zip(via_separate.iter()) {
assert_abs_diff_eq!(a, b, epsilon = 1e-15);
}
}
#[test]
fn test_shape_mismatch_error() {
let scaler = MaxAbsScaler::<f64>::new();
let x_train = array![[1.0, 2.0], [3.0, 4.0]];
let fitted = scaler.fit(&x_train, &()).unwrap();
let x_bad = array![[1.0, 2.0, 3.0]];
assert!(fitted.transform(&x_bad).is_err());
}
#[test]
fn test_insufficient_samples_error() {
let scaler = MaxAbsScaler::<f64>::new();
let x: Array2<f64> = Array2::zeros((0, 3));
assert!(scaler.fit(&x, &()).is_err());
}
#[test]
fn test_unfitted_transform_error() {
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[1.0, 2.0]];
assert!(scaler.transform(&x).is_err());
}
#[test]
fn test_negative_values() {
let scaler = MaxAbsScaler::<f64>::new();
// All negative values
let x = array![[-5.0], [-3.0], [-1.0]];
let fitted = scaler.fit(&x, &()).unwrap();
assert_abs_diff_eq!(fitted.max_abs()[0], 5.0, epsilon = 1e-10);
let scaled = fitted.transform(&x).unwrap();
assert_abs_diff_eq!(scaled[[0, 0]], -1.0, epsilon = 1e-10);
assert_abs_diff_eq!(scaled[[1, 0]], -0.6, epsilon = 1e-10);
assert_abs_diff_eq!(scaled[[2, 0]], -0.2, epsilon = 1e-10);
}
#[test]
fn test_pipeline_integration() {
use ferrolearn_core::pipeline::PipelineTransformer;
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[2.0, 4.0], [1.0, -2.0]];
let y = Array1::zeros(2);
let fitted = scaler.fit_pipeline(&x, &y).unwrap();
let result = fitted.transform_pipeline(&x).unwrap();
assert_abs_diff_eq!(result[[0, 0]], 1.0, epsilon = 1e-10);
assert_abs_diff_eq!(result[[1, 1]], -0.5, epsilon = 1e-10);
}
#[test]
fn max_abs_scale_nsamples_match_sklearn() -> Result<(), FerroError> {
// Live sklearn 1.5.2 oracle (R-CHAR-3):
// MaxAbsScaler().fit([[1,0],[-3,0],[2,0]])
// -> max_abs_ = [3.0, 0.0], scale_ = [3.0, 1.0], n_samples_seen_ = 3
// column 1 is all-zero: scale_ = _handle_zeros_in_scale(0) = 1 (_data.py:1272,:88).
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[1.0, 0.0], [-3.0, 0.0], [2.0, 0.0]];
let fitted = scaler.fit(&x, &())?;
assert_abs_diff_eq!(fitted.scale()[0], 3.0, epsilon = 1e-12);
assert_abs_diff_eq!(fitted.scale()[1], 1.0, epsilon = 1e-12);
// Exactly 1.0 on the all-zero column (not merely close).
assert!(fitted.scale()[1] == 1.0);
assert_eq!(fitted.n_samples_seen(), 3);
Ok(())
}
#[test]
fn max_abs_scale_differs_from_max_abs_on_zero_col() -> Result<(), FerroError> {
// scale_ differs from max_abs_ exactly on all-zero columns: sklearn
// max_abs_ = [3.0, 0.0] but scale_ = [3.0, 1.0] (_data.py:1272,:88).
let scaler = MaxAbsScaler::<f64>::new();
let x = array![[1.0, 0.0], [-3.0, 0.0], [2.0, 0.0]];
let fitted = scaler.fit(&x, &())?;
// All-zero column: max_abs_ == 0.0 but scale_ == 1.0.
assert!(fitted.max_abs()[1] == 0.0);
assert!(fitted.scale()[1] == 1.0);
// Nonzero column: scale_ unchanged from max_abs_.
assert!(fitted.scale()[0] == fitted.max_abs()[0]);
Ok(())
}
#[test]
fn test_f32_scaler() {
let scaler = MaxAbsScaler::<f32>::new();
let x: Array2<f32> = array![[2.0f32, -4.0], [1.0, 3.0]];
let fitted = scaler.fit(&x, &()).unwrap();
let scaled = fitted.transform(&x).unwrap();
assert!((scaled[[0, 0]] - 1.0f32).abs() < 1e-6);
assert!((scaled[[0, 1]] - (-1.0f32)).abs() < 1e-6);
}
}