ferrolearn-cluster 0.5.0

//! OPTICS — Ordering Points To Identify the Clustering Structure.
//!
//! This module provides [`OPTICS`], a density-based algorithm that computes a
//! **reachability ordering** of the data.  Unlike DBSCAN, OPTICS does not
//! require a global density threshold; instead it produces a reachability plot
//! from which clusters at various density levels can be extracted.
//!
//! # Algorithm
//!
//! 1. For each unprocessed point `p`:
//!    - Compute its **core distance** — the distance to the `min_samples`-th
//!      nearest neighbour (within `max_eps`), or `∞` if there are fewer than
//!      `min_samples` neighbours.
//!    - If `p` is a core point, update the reachability distances of all
//!      unprocessed neighbours and add them to an ordered seed list.
//!    - Append `p` to the ordering with its final reachability distance.
//!
//! 2. **Cluster extraction** via the Xi method (see [`FittedOPTICS::extract_clusters`]):
//!    steep descents in the reachability plot define cluster boundaries.
//!
//! OPTICS does **not** implement [`Predict`](ferrolearn_core::Predict) — it
//! produces a reachability ordering and reachability distances from which
//! cluster memberships can be derived post-hoc.
//!
//! # Examples
//!
//! ```
//! use ferrolearn_cluster::OPTICS;
//! use ferrolearn_core::Fit;
//! use ndarray::Array2;
//!
//! let x = Array2::from_shape_vec((9, 2), vec![
//!     0.0, 0.0,  0.1, 0.1,  0.0, 0.1,
//!     5.0, 5.0,  5.1, 5.0,  5.0, 5.1,
//!     10.0, 0.0, 10.1, 0.0, 10.0, 0.1,
//! ]).unwrap();
//!
//! let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
//! assert_eq!(fitted.ordering().len(), 9);
//! ```
//!
//! # `## REQ status`
//!
//! Binary (R-DEFER-2), translating `sklearn/cluster/_optics.py` (`class OPTICS`,
//! `compute_optics_graph`, `cluster_optics_xi`, `cluster_optics_dbscan`). Design
//! doc: `.design/cluster/optics.md`. Cites use ferrolearn symbol anchors / sklearn
//! `file:line` (commit 156ef14); expected values from the live sklearn 1.5.2 oracle
//! (R-CHAR-3). OPTICS is deterministic (no RNG) → value-parity is genuine. No PyO3
//! binding — only consumer is the crate re-export. After fixing the traversal
//! (#1080), `core_distances_`/`ordering_`/`reachability_` VALUE-match sklearn, and
//! `predecessor_` (the `-1`-sentinel int array, REQ-4) now value-matches too. The
//! `cluster_method='dbscan'` + `eps` extraction (REQ-6) is now SHIPPED — `fn
//! cluster_optics_dbscan` value-matches the live oracle. The Xi `labels_`
//! (REQ-5), the in-Xi `min_cluster_size` criterion 3.a (REQ-7), and the
//! `cluster_hierarchy_` attribute (REQ-8) are now SHIPPED: `fn cluster_optics_xi`
//! / `fn xi_cluster` faithfully port sklearn's `cluster_optics_xi` / `_xi_cluster`
//! (incl. the paper-corrected steep inequalities, `predecessor_correction`, the
//! in-loop size cut, and `_extract_xi_labels`), bit-exact vs the live oracle. The
//! remaining parameter/attribute surface (REQ-9 toggle, REQ-10 param surface,
//! REQ-12 PyO3, REQ-13 ferray) stays divergent.
//!
//! | REQ | Status | Evidence |
//! |---|---|---|
//! | REQ-1 (`core_distances_` VALUE) | SHIPPED | impl `fn core_distance` (distance to the `(min_samples-1)`-th other point within `max_eps`) value-matches sklearn `_compute_core_distances_` = `kneighbors(X,min_samples)[0][:,-1]` (`_optics.py:405-438`) even on hard fixtures. Consumer: crate re-export `pub use optics::{FittedOPTICS, OPTICS}` (`lib.rs`). Guards: `green_core_distances_three_blobs`, `green_core_distances_small10_ms2/ms3` in `tests/divergence_optics.rs` (live-oracle). |
//! | REQ-2 (`ordering_` traversal value-parity) | SHIPPED | impl `Fit::fit` now selects the next seed by LINEAR ARGMIN over all unprocessed reachability with smallest-index tie-break (single pool, no heap), matching sklearn `compute_optics_graph` (`_optics.py:638-659`, `:53` "we do not employ a heap"). The prior `BinaryHeap` traversal diverged on tie-prone data. Guard: `divergence_ordering_small10` (sklearn `[0,1,3,9,5,6,7,8,2,4]`; was `[…5,7,8,6…]`) + `green_ordering_three_blobs`/`green_ordering_docstring`. Fixed #1080. |
//! | REQ-3 (`reachability_` VALUE) | SHIPPED | impl `fn update_seeds` computes `max(core_dist_p, dist)` then rounds via `fn round_to_precision` (`np.around(decimals=np.finfo(dtype).precision)`, round-ties-even = numpy rint, `_optics.py:711`); combined with the REQ-2 traversal the reported plot value-matches sklearn (the #1080 ordering parity on tie fixtures REQUIRES the matching rounded reachability at each argmin step). Guards: `green_reachability_docstring` + (noisy) the ordering parity which is driven by reachability. Fixed #1080 (bundled). |
//! | REQ-4 (`predecessor_` `-1`-sentinel int array) | SHIPPED | impl: `FittedOPTICS` stores `predecessor_: Array1<i64>` built in `Fit::fit` (`predecessors.iter().map(\|p\| p.map_or(-1, \|j\| j as i64))`), surfaced via accessor `fn predecessor() -> &Array1<i64>`. This is the `-1`-sentinel int array (shape `(n_samples,)`, indexed by original sample index) matching sklearn `predecessor_` "Seed points have a predecessor of -1" (`_optics.py:187-189`, `np.full(n_samples,-1,dtype=int)` `:604-605`, set on STRICT improvement `:712-714`). ferrolearn's `fn update_seeds` records the predecessor under the SAME strict `new_reach < reachability[q]` condition as sklearn's `improved = np.where(rdists < ...)` (`:712`), so the VALUES match too — verified element-wise (incl. the `-1` seed) vs the live oracle on `three_blobs` (`[-1,0,0,1,3,3,8,6,4]`), `small10` (tie-prone, `[-1,0,3,1,8,9,5,6,7,3]`), and the docstring fixture (`[-1,0,1,5,3,2]`). Internal `fn predecessors() -> &[Option<usize>]` kept (seed `None`) for the Xi extractor + back-compat. Consumer: crate re-export `pub use optics::{FittedOPTICS, OPTICS}` (`lib.rs`). Guards (live-oracle, R-CHAR-3): `green_predecessor_three_blobs`, `green_predecessor_small10`, `green_predecessor_docstring` in `tests/divergence_optics.rs`. **f32 caveat (#2195):** sklearn OPTICS ALWAYS computes the graph in float64 — `reachability_.dtype == float64` even for float32 input (it upcasts) — whereas ferrolearn computes generically in `F`, so on f32 a near-tie in the reachability plot can round to the opposite side, swapping `ordering_`/`reachability_` and hence `predecessor_` (e.g. `small10` f32 at indices 6/7/8). This is an OPTICS-WIDE f32 divergence (it equally affects the SHIPPED f64-only REQ-1/2/3), tracked as #2195 and pinned `#[ignore]` in `tests/divergence_optics_predecessor_f32.rs`; the f64 path is bit-exact. The future fix is to compute the OPTICS graph in f64 regardless of `F` (matching sklearn's upcast) — out of this single-file predecessor scope. |
//! | REQ-5 (`labels_` Xi VALUE parity) | SHIPPED | impl: `Fit::fit`'s `OpticsClusterMethod::Xi` arm calls `fn cluster_optics_xi` → `fn xi_cluster` → `fn extract_xi_labels`, a faithful port of sklearn `cluster_optics_xi`/`_xi_cluster`/`_extract_xi_labels` (`sklearn/cluster/_optics.py:810-1201`): the appended `+inf` sentinel (`:1070`), `xi_complement = 1-xi`, the IEEE-754 `ratio` (Rust `inf/inf=NaN`, `x/inf=0`, `x/0=+inf` reproduce numpy `errstate(invalid="ignore")` `:1082-1087`), the PAPER-CORRECTED steep inequalities (`steep_downward = ratio >= 1/xi_complement` `:1085`, `steep_upward = ratio <= xi_complement` `:1084`), the steep-down/steep-up branches with `fn extend_region`/`fn update_filter_sdas`/`fn correct_predecessor`, Definition-11 criteria 1-4 (incl. the `:1143` `r(x) > D_max` paper-correction), and `_extract_xi_labels`' leaf-only assign + `labels[ordering]=labels` scatter (`:1194-1200`). Consumer (R-DEFER-1, non-test): the `Xi` arm of `Fit::fit` populates `labels_`, reachable via the crate re-export `pub use optics::{FittedOPTICS, OPTICS}` (`lib.rs`); also surfaced through `FittedOPTICS::extract_clusters`. Guards (live-oracle, R-CHAR-3): `green_xi_labels_hierarchy_doctest` (the `_optics.py:891-896` doctest `[0,0,0,1,1,1]`), `green_xi_labels_hierarchy_three_blobs`, `green_xi_labels_hierarchy_small10`, `green_xi_min_cluster_size_inside_loop`, `green_xi_varied_threshold_small10` in `tests/divergence_optics.rs`. **f32 caveat:** the underlying graph is f32-divergent (#2195); these guards are f64. |
//! | REQ-6 (`cluster_method='dbscan'` + `eps` + `cluster_optics_dbscan`) | SHIPPED | impl: free fn `cluster_optics_dbscan(reachability, core_distances, ordering, eps) -> Array1<isize>` translates sklearn's two-step labelling EXACTLY (`sklearn/cluster/_optics.py:781-787`): step 1 walks `ordering` accumulating `cumsum(far_reach & near_core) - 1` with `far_reach = reachability > eps` (STRICT) and `near_core = core_distances <= eps` (INCLUSIVE); step 2 overwrites `labels[far_reach & !near_core] = -1`. Wired into `Fit::fit` via the new `cluster_method: OpticsClusterMethod` field (`enum {Xi (default), Dbscan}`, builder `fn with_cluster_method`) + `eps: Option<F>` field (builder `fn with_eps`): the `Dbscan` arm resolves `eps = self.eps.unwrap_or(self.max_eps)` and errors (`FerroError::InvalidParameter`) when `eps > self.max_eps`, mirroring sklearn's `eps` resolution + ValueError (`:375-383`). Consumer (R-DEFER-1, non-test): `Fit::fit`'s `OpticsClusterMethod::Dbscan` arm CALLS `cluster_optics_dbscan` to populate `labels_`; reachable publicly via `pub mod optics` (`ferrolearn_cluster::optics::{cluster_optics_dbscan, OpticsClusterMethod}`). Guards (live-oracle, R-CHAR-3): `green_dbscan_three_blobs_eps05/eps_none_all_noise/eps_large_merge`, `green_dbscan_small10_eps07_mixed/eps2_merge/eps05_noise`, `green_dbscan_eps_gt_max_eps_errs`, `green_xi_default_unchanged_three_blobs` in `tests/divergence_optics.rs`. (The Xi `labels_` REQ-5/7/8 + the PyO3 surface REQ-12 stay separate NOT-STARTED.) **eps-at-core-distance-boundary caveat (#2196):** when `eps` is set EXACTLY equal to one of ferrolearn's `core_distances_` values, the `<=` near-core cut can flip vs sklearn because ferrolearn's `core_distance` (`sqrt(sum-of-squares)`) differs from sklearn's `kneighbors`/`euclidean_distances` distance form by a sub-ULP (~4e-17), so `core <= eps` rounds to opposite sides at the exact boundary (`small10` at `eps == core_distances_[5]`). This is the SAME distance-form boundary class as DBSCAN #952 (REQ-1 `core_distances_` value-matches to ~1e-9, not bit-exact); for any `eps` not landing precisely on a differing core-distance the cut is value-exact. Tracked #2196, pinned `#[ignore]` in `tests/divergence_optics_dbscan_boundary.rs`. |
//! | REQ-7 (Xi `min_cluster_size` criterion 3.a) | SHIPPED | impl: `fn xi_cluster` applies the size cut INSIDE the steep-up loop — `if c_end + 1 - c_start < min_cluster_size { continue; }` — matching sklearn's `if c_end - c_start + 1 < min_cluster_size: continue` (`sklearn/cluster/_optics.py:1154-1156`), NOT a post-filter (the old `fn filter_small_clusters` post-pass was REMOVED). `min_cluster_size` resolves `None → min_samples` in `fn cluster_optics_xi` (`:902-903`). This is a DIFFERENT fixed point from a post-filter: dropping a cluster mid-loop changes which later intervals get labelled AND the hierarchy. Consumer: same `Fit::fit` Xi arm as REQ-5. Guard (live-oracle): `green_xi_min_cluster_size_inside_loop` — `blobs44` with `min_samples=3`: the 4-point small blob keeps a label at the default `min_cluster_size` but is filtered at `min_cluster_size=10`, changing BOTH `labels_` AND `cluster_hierarchy_` exactly as sklearn. |
//! | REQ-8 (`cluster_hierarchy_` attribute) | SHIPPED | impl: `FittedOPTICS` stores `cluster_hierarchy_: Array2<i64>` (`(n_clusters, 2)` `[start, end]` plot-order intervals, both inclusive), surfaced via `fn cluster_hierarchy() -> &Array2<i64>`. Built in `fn cluster_optics_xi` from the `Vec<(usize,usize)>` `fn xi_cluster` returns (`np.array(clusters)`, `sklearn/cluster/_optics.py:1172`), set on the `Xi` arm of `Fit::fit` (`self.cluster_hierarchy_ = clusters_`, `:373`); the `Dbscan` arm leaves it empty (shape `(0,2)`) — sklearn only sets it on the Xi branch. The order is `_xi_cluster`'s discovery order (U_clusters reversed), which equals the docstring's `(end, -start)` ascending on these fixtures (verified vs the oracle; `cluster_optics_xi` does NOT re-sort — `cluster_hierarchy_ = clusters_` directly). Consumer (R-DEFER-1, non-test): the `Xi` arm of `Fit::fit` populates the field; reachable via the crate re-export. Guards (live-oracle): `green_xi_labels_hierarchy_doctest` (the `_optics.py:893-896` doctest `[[0,2],[3,5],[0,5]]`, with `len(hierarchy)=3 > 2=n_unique_labels` asserting the nested-cluster-kept contract), `_three_blobs`, `_small10`, `green_xi_min_cluster_size_inside_loop`, `green_xi_varied_threshold_small10`, `green_dbscan_cluster_hierarchy_empty` in `tests/divergence_optics.rs`. |
//! | REQ-9 (`predecessor_correction` toggle) | SHIPPED | impl: `OPTICS<F>` carries `predecessor_correction: bool` (default `true` in `fn new`, builder `fn with_predecessor_correction`), threaded through `Fit::fit` → `fn cluster_optics_xi` → `fn xi_cluster`, which applies `fn correct_predecessor` ONLY when the flag is set — matching sklearn's `predecessor_correction=True` default (`sklearn/cluster/_optics.py:277`) and its conditional use (`if predecessor_correction:` `:1147-1150`). Consumer (R-DEFER-1, non-test): the `Xi` arm of `Fit::fit` reads `self.predecessor_correction`. Guard (live-oracle): `green_xi_predecessor_correction_toggle_small10` — `predecessor_correction=False` changes the corrected end (`[4,7]→[4,8]`) and labels point 2, bit-exact vs sklearn. (The remaining REQ-10 param surface stays NOT-STARTED.) |
//! | REQ-10 (param surface `metric`/`p`/`algorithm`/`leaf_size`/`n_jobs` + `min_samples=5` default + float fractions) | NOT-STARTED | open prereq blocker #1088. sklearn `__init__` (`:266-297`); ferrolearn `fn new(min_samples)` required, only `max_eps`/`xi`/`min_cluster_size` builders. |
//! | REQ-11 (validation accept/reject BOUNDARIES) | SHIPPED | impl: `Fit::fit`'s parameter-validation block now matches sklearn's `OPTICS._parameter_constraints` (`sklearn/cluster/_optics.py:242-264`) at the accept/reject BOUNDARY: `min_samples < 2` REJECTED (`Interval(Integral, 2, None, closed="left")` `:243-246` — `{0,1}` rejected; was permitting `min_samples=1`), `max_eps < 0` REJECTED but `max_eps == 0` ACCEPTED (`Interval(Real, 0, None, closed="both")` `:247` — was over-rejecting `max_eps==0`), `xi < 0 \|\| xi > 1` REJECTED but `xi == 0` AND `xi == 1` ACCEPTED (`Interval(Real, 0, 1, closed="both")` `:253` — was over-rejecting `xi∈{0,1}`). The `max_eps == 0` degenerate path needed NO algorithm change: with `max_eps==0` no point has a neighbour, so `fn core_distance` returns `∞` for all (the `if core_distances[point].is_finite()` guard in `Fit::fit` skips `fn update_seeds`), `reachability_` stays `∞`, `ordering_` is by index, and `fn xi_cluster_extraction` on the all-`∞` plot yields all-`-1` labels (the `ratio = inf/inf = NaN` comparisons are all false → no steep points → no clusters) — value-matching the live oracle WITHOUT divide-by-zero / OOB / unwrap-None (R-CODE-2). `xi == 1` likewise does not panic: ferrolearn's Xi ratios use Rust float arithmetic (`1.0/0.0 == inf`), unlike sklearn's Python `ZeroDivisionError`. **Error TYPE ABI:** the grandfathered crate `FerroError::InvalidParameter` is kept (NOT sklearn's `InvalidParameterError`); only the accept/reject BOUNDARY is matched (R-DEV-2 user-API ABI is the boundary, not the exception class name — a crate-wide grandfathered convention). Consumer (R-DEFER-1, non-test): the validated `Fit::fit` is reachable via the crate re-export `pub use optics::{FittedOPTICS, OPTICS}` (`lib.rs`). Guards (live-oracle, R-CHAR-3): `green_min_samples_below_2_rejected`, `green_min_samples_2_accepted`, `green_max_eps_zero_degenerate_matches_oracle` (element-wise `core_distances_`/`reachability_`/`ordering_`/`labels_` vs the oracle), `green_max_eps_half_unchanged`, `green_max_eps_negative_rejected`, `green_xi_zero_accepted`, `green_xi_one_accepted_no_panic`, `green_xi_negative_rejected`, `green_xi_above_one_rejected` in `tests/divergence_optics.rs`. The `min_samples` float-fraction branch (`Interval(RealNotInt, 0, 1, closed="both")` `:245`) + the remaining param surface (`metric`/`p`/`algorithm`/`leaf_size`/`n_jobs`, the `min_samples=5` default) stay NOT-STARTED (REQ-10, blocker #1088). |
//! | REQ-12 (PyO3 binding) | NOT-STARTED | open prereq blocker #1090. No `_RsOPTICS` (grep empty); `import ferrolearn` cannot reach OPTICS. |
//! | REQ-13 (ferray substrate) | NOT-STARTED | open prereq blocker #1091. `optics.rs` imports `ndarray`/`num-traits`, not `ferray-core` (R-SUBSTRATE-1/2). |

use ferrolearn_core::error::FerroError;
use ferrolearn_core::traits::Fit;
use ndarray::{Array1, Array2};
use num_traits::Float;
use std::cmp::Ordering;

// ─────────────────────────────────────────────────────────────────────────────
// Configuration struct
// ─────────────────────────────────────────────────────────────────────────────

/// Cluster-extraction method used by [`OPTICS`] to derive `labels_` from the
/// reachability plot.
///
/// Mirrors scikit-learn's `cluster_method` parameter, whose
/// `_parameter_constraints` is `StrOptions({"dbscan", "xi"})` with default
/// `"xi"` (`sklearn/cluster/_optics.py:251`, `:274`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OpticsClusterMethod {
    /// The automatic Xi-steep extraction technique (the default), proposed in
    /// the OPTICS paper. Uses the `xi`/`min_cluster_size` parameters.
    #[default]
    Xi,
    /// A DBSCAN-like extraction at a fixed `eps`, via
    /// [`cluster_optics_dbscan`]. Uses the `eps` parameter.
    Dbscan,
}

/// OPTICS clustering configuration (unfitted).
///
/// Holds hyperparameters.  Call [`Fit::fit`] to run the algorithm and produce
/// a [`FittedOPTICS`].
///
/// # Type Parameters
///
/// - `F`: The floating-point type (`f32` or `f64`).
#[derive(Debug, Clone)]
pub struct OPTICS<F> {
    /// Minimum number of points required to form a core point (including
    /// the point itself).
    pub min_samples: usize,
    /// Maximum radius considered for neighbourhood queries.  Points beyond
    /// this distance are not considered neighbours.  Defaults to `F::infinity()`.
    pub max_eps: F,
    /// Xi steep-point threshold used by [`FittedOPTICS::extract_clusters`].
    /// A value in `(0, 1)`.  Defaults to `0.05`.
    pub xi: F,
    /// Minimum number of points required for a cluster to be kept.
    /// Clusters smaller than this are relabelled as noise (`-1`).
    /// Defaults to `None`, meaning use `min_samples`.
    pub min_cluster_size: Option<usize>,
    /// The cluster-extraction method used to derive `labels_` from the
    /// reachability plot.  Defaults to [`OpticsClusterMethod::Xi`], matching
    /// scikit-learn's `cluster_method="xi"` (`sklearn/cluster/_optics.py:274`).
    pub cluster_method: OpticsClusterMethod,
    /// DBSCAN `eps` parameter, used only when
    /// `cluster_method == OpticsClusterMethod::Dbscan`.  When `None` (the
    /// default) it assumes the same value as `max_eps`, matching scikit-learn's
    /// `eps=None` (`sklearn/cluster/_optics.py:275`, resolved `:375-378`).
    pub eps: Option<F>,
    /// Correct clusters based on the calculated predecessors during Xi
    /// extraction.  Defaults to `true`, matching scikit-learn's
    /// `predecessor_correction=True` (`sklearn/cluster/_optics.py:277`).
    pub predecessor_correction: bool,
}

impl<F: Float> OPTICS<F> {
    /// Create a new `OPTICS` with the given `min_samples`.
    ///
    /// Defaults: `max_eps = F::infinity()`, `xi = 0.05`, `min_cluster_size = None`.
    #[must_use]
    pub fn new(min_samples: usize) -> Self {
        Self {
            min_samples,
            max_eps: F::infinity(),
            xi: F::from(0.05).unwrap_or_else(|| F::from(5e-2).unwrap()),
            min_cluster_size: None,
            cluster_method: OpticsClusterMethod::Xi,
            eps: None,
            predecessor_correction: true,
        }
    }

    /// Set the maximum neighbourhood radius.
    #[must_use]
    pub fn with_max_eps(mut self, max_eps: F) -> Self {
        self.max_eps = max_eps;
        self
    }

    /// Set the Xi steep-point threshold.
    ///
    /// Must be in `(0, 1)`.
    #[must_use]
    pub fn with_xi(mut self, xi: F) -> Self {
        self.xi = xi;
        self
    }

    /// Set the minimum cluster size.
    ///
    /// Clusters with fewer than `min_cluster_size` points are relabelled as
    /// noise (`-1`).  When `None` (the default), `min_samples` is used as
    /// the minimum cluster size, matching scikit-learn's default behaviour.
    #[must_use]
    pub fn with_min_cluster_size(mut self, size: usize) -> Self {
        self.min_cluster_size = Some(size);
        self
    }

    /// Set the cluster-extraction method (`Xi` or `Dbscan`).
    ///
    /// Mirrors scikit-learn's `cluster_method` parameter
    /// (`sklearn/cluster/_optics.py:274`). With [`OpticsClusterMethod::Dbscan`]
    /// the labels are derived from the reachability plot at a fixed `eps` via
    /// [`cluster_optics_dbscan`]; with [`OpticsClusterMethod::Xi`] (the default)
    /// the automatic Xi-steep extraction is used.
    #[must_use]
    pub fn with_cluster_method(mut self, method: OpticsClusterMethod) -> Self {
        self.cluster_method = method;
        self
    }

    /// Set the DBSCAN `eps` parameter (used only when
    /// `cluster_method == OpticsClusterMethod::Dbscan`).
    ///
    /// When unset (`None`, the default), `eps` assumes the value of `max_eps`,
    /// matching scikit-learn's `eps=None` resolution
    /// (`sklearn/cluster/_optics.py:375-378`). `eps` must not exceed `max_eps`,
    /// otherwise [`Fit::fit`] returns an error.
    #[must_use]
    pub fn with_eps(mut self, eps: F) -> Self {
        self.eps = Some(eps);
        self
    }

    /// Set whether to apply predecessor correction during Xi extraction.
    ///
    /// Mirrors scikit-learn's `predecessor_correction` parameter, a `bool`
    /// with default `True` (`sklearn/cluster/_optics.py:277`). When `true`
    /// (the default), each candidate Xi cluster is adjusted via
    /// `_correct_predecessor` (Algorithm 2 of Schubert & Gertz 2018) before
    /// being accepted.
    #[must_use]
    pub fn with_predecessor_correction(mut self, predecessor_correction: bool) -> Self {
        self.predecessor_correction = predecessor_correction;
        self
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Fitted struct
// ─────────────────────────────────────────────────────────────────────────────

/// Fitted OPTICS model.
///
/// Stores the reachability ordering, reachability distances, core distances,
/// predecessor tracking, and cluster labels (extracted via the Xi method).
///
/// OPTICS does **not** implement [`Predict`](ferrolearn_core::Predict).
#[derive(Debug, Clone)]
pub struct FittedOPTICS<F> {
    /// Indices of data points in the reachability order.
    ordering_: Vec<usize>,
    /// Reachability distance for each data point (indexed by original point
    /// index, not by ordering position).  The first point processed always
    /// has reachability distance `∞`.
    reachability_: Array1<F>,
    /// Core distance for each point (indexed by original point index).
    /// Equals `∞` for non-core points.
    core_distances_: Array1<F>,
    /// Cluster label for each training sample (0-indexed for clusters; `-1`
    /// for noise).  Extracted using the Xi method.
    labels_: Array1<isize>,
    /// Predecessor for each point in the OPTICS ordering.
    /// `predecessors_[i] = Some(j)` means point `i` was reached from point `j`.
    /// The first point in each connected component has `None`.
    predecessors_: Vec<Option<usize>>,
    /// sklearn-faithful `predecessor_`: a `-1`-sentinel integer array, indexed by
    /// original sample index. `predecessor_[i] = j` (`>= 0`) means point `i` was
    /// reached from point `j`; seed points have `-1`. Built from `predecessors_`
    /// at fit time (`Some(j) -> j as i64`, `None -> -1`), mirroring
    /// `predecessor_ = np.full(n_samples, -1, dtype=int)` then
    /// `predecessor_[unproc[improved]] = point_index` in `compute_optics_graph`
    /// (`sklearn/cluster/_optics.py:604-605`, `:714`).
    predecessor_: Array1<i64>,
    /// The `min_samples` value used during fitting (needed for Xi extraction).
    min_samples_: usize,
    /// The hierarchy of clusters discovered by the Xi method, as a
    /// `(n_clusters, 2)` integer array of `[start, end]` intervals (both
    /// indices inclusive, in reachability-plot / `ordering_` order).
    ///
    /// Mirrors scikit-learn's `OPTICS.cluster_hierarchy_` — "The list of
    /// clusters in the form of `[start, end]` in each row, with all indices
    /// inclusive. The clusters are ordered according to `(end, -start)`
    /// (ascending) so that larger clusters encompassing smaller clusters come
    /// after such nested smaller clusters" (`sklearn/cluster/_optics.py:191-200`,
    /// returned by `cluster_optics_xi` `:859-865` / `_xi_cluster` `:1166-1172`).
    /// Empty (shape `(0, 2)`) when `cluster_method == Dbscan` (sklearn only
    /// sets `cluster_hierarchy_` on the Xi branch, `:373`).
    cluster_hierarchy_: Array2<i64>,
}

impl<F: Float> FittedOPTICS<F> {
    /// Return the reachability ordering (indices into the training data).
    #[must_use]
    pub fn ordering(&self) -> &[usize] {
        &self.ordering_
    }

    /// Return the reachability distances, indexed by original point index.
    #[must_use]
    pub fn reachability(&self) -> &Array1<F> {
        &self.reachability_
    }

    /// Return the core distances, indexed by original point index.
    #[must_use]
    pub fn core_distances(&self) -> &Array1<F> {
        &self.core_distances_
    }

    /// Return the cluster labels (Xi-method extraction).
    ///
    /// Noise points have label `-1`.
    #[must_use]
    pub fn labels(&self) -> &Array1<isize> {
        &self.labels_
    }

    /// Return the predecessors for each point.
    ///
    /// `predecessors()[i] = Some(j)` means point `i` was reached from point `j`
    /// during the OPTICS ordering phase. The first point in each connected
    /// component has `None`.
    #[must_use]
    pub fn predecessors(&self) -> &[Option<usize>] {
        &self.predecessors_
    }

    /// Return the sklearn-faithful `predecessor_`: a `-1`-sentinel integer array
    /// of shape `(n_samples,)`, indexed by original sample index.
    ///
    /// `predecessor()[i] = j` (`>= 0`) means point `i` was reached from point
    /// `j` during the OPTICS ordering phase; seed points (the first point in
    /// each connected component) have `predecessor()[i] = -1`.
    ///
    /// This mirrors scikit-learn's `OPTICS.predecessor_` — "Point that a sample
    /// was reached from, indexed by object order. Seed points have a predecessor
    /// of -1." (`sklearn/cluster/_optics.py:187-189`, built as
    /// `np.full(n_samples, -1, dtype=int)` in `compute_optics_graph` `:604-605`,
    /// `:714`). It is the `-1`-sentinel `Array1<i64>` counterpart of the internal
    /// [`predecessors`](Self::predecessors) (`&[Option<usize>]`, seed `None`).
    #[must_use]
    pub fn predecessor(&self) -> &Array1<i64> {
        &self.predecessor_
    }

    /// Return the cluster hierarchy discovered by the Xi method.
    ///
    /// A `(n_clusters, 2)` integer array of `[start, end]` intervals (both
    /// indices inclusive, in reachability-plot / `ordering_` order), ordered so
    /// that larger clusters encompassing smaller ones come after the nested
    /// smaller clusters. Because `labels_` keeps only the leaf-level clusters,
    /// `cluster_hierarchy().nrows()` is usually greater than the number of
    /// distinct non-noise labels.
    ///
    /// Mirrors scikit-learn's `OPTICS.cluster_hierarchy_`
    /// (`sklearn/cluster/_optics.py:191-200`). Empty (shape `(0, 2)`) when the
    /// `Dbscan` cluster method was used (sklearn only sets it on the Xi branch).
    #[must_use]
    pub fn cluster_hierarchy(&self) -> &Array2<i64> {
        &self.cluster_hierarchy_
    }

    /// Return the number of clusters found (excluding noise).
    #[must_use]
    pub fn n_clusters(&self) -> usize {
        let max_label = self.labels_.iter().max().copied().unwrap_or(-1);
        if max_label < 0 {
            0
        } else {
            (max_label + 1) as usize
        }
    }

    /// Extract flat clusters from the reachability plot using the Xi method.
    ///
    /// The Xi method identifies *steep up* and *steep down* areas in the
    /// reachability plot.  Clusters are formed between matching steep-down /
    /// steep-up pairs.
    ///
    /// `xi` must be in `(0, 1)`.  Returns a vector of cluster labels
    /// (length == `n_samples`); noise has label `-1`.
    ///
    /// # Errors
    ///
    /// Returns [`FerroError::InvalidParameter`] if `xi` is outside `(0, 1)`.
    pub fn extract_clusters(&self, xi: F) -> Result<Array1<isize>, FerroError> {
        if xi <= F::zero() || xi >= F::one() {
            return Err(FerroError::InvalidParameter {
                name: "xi".into(),
                reason: "must be in (0, 1)".into(),
            });
        }
        // Re-run the Xi extraction with this `xi`, the fitted `min_samples`
        // (also used as the default `min_cluster_size`), and predecessor
        // correction enabled (the sklearn default). Discards the hierarchy.
        let (labels, _hierarchy) = cluster_optics_xi(
            &self.reachability_,
            &self.predecessor_,
            &self.ordering_,
            xi,
            self.min_samples_,
            None,
            true,
        );
        Ok(labels)
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Internal helpers
// ─────────────────────────────────────────────────────────────────────────────

/// Euclidean distance between two slices.
#[inline]
fn euclidean<F: Float>(a: &[F], b: &[F]) -> F {
    a.iter()
        .zip(b)
        .fold(F::zero(), |acc, (&ai, &bi)| acc + (ai - bi) * (ai - bi))
        .sqrt()
}

/// Return all neighbours of `idx` within distance `max_eps` (sorted by distance).
///
/// Returns `(neighbor_indices, distances)` in ascending distance order.
fn get_neighbors<F: Float>(x: &Array2<F>, idx: usize, max_eps: F) -> (Vec<usize>, Vec<F>) {
    let row = x.row(idx);
    let rs = row.as_slice().unwrap_or(&[]);
    let mut pairs: Vec<(F, usize)> = (0..x.nrows())
        .filter_map(|j| {
            let other = x.row(j);
            let os = other.as_slice().unwrap_or(&[]);
            let d = euclidean(rs, os);
            if d <= max_eps && j != idx {
                Some((d, j))
            } else {
                None
            }
        })
        .collect();
    pairs.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(Ordering::Equal));
    let indices = pairs.iter().map(|p| p.1).collect();
    let dists = pairs.iter().map(|p| p.0).collect();
    (indices, dists)
}

/// Compute core distance of `idx`: distance to the `min_samples`-th nearest
/// neighbour within `max_eps`.  Returns `F::infinity()` if fewer than
/// `min_samples` neighbours exist.
fn core_distance<F: Float>(x: &Array2<F>, idx: usize, max_eps: F, min_samples: usize) -> F {
    let row = x.row(idx);
    let rs = row.as_slice().unwrap_or(&[]);

    let mut dists: Vec<F> = (0..x.nrows())
        .filter_map(|j| {
            if j == idx {
                return None;
            }
            let other = x.row(j);
            let os = other.as_slice().unwrap_or(&[]);
            let d = euclidean(rs, os);
            if d <= max_eps { Some(d) } else { None }
        })
        .collect();

    if dists.len() < min_samples.saturating_sub(1) {
        // Not enough neighbours (need min_samples - 1 others).
        return F::infinity();
    }

    dists.sort_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal));
    // The core distance is the distance to the (min_samples-1)-th other point
    // (0-indexed), i.e., the min_samples-th point overall including self.
    let k = min_samples.saturating_sub(1);
    if k == 0 {
        F::zero()
    } else if k <= dists.len() {
        dists[k - 1]
    } else {
        F::infinity()
    }
}

/// Round a reachability value to the floating-point type's decimal precision,
/// matching sklearn's `np.around(rdists, decimals=np.finfo(dtype).precision)`
/// (`sklearn/cluster/_optics.py:711`).
///
/// `np.finfo` precision is 15 decimals for float64 and 6 for float32.  Rounding
/// collapses sub-ULP differences between candidate reachabilities into exact
/// ties, which is required for sklearn's smallest-index tie-break in the
/// traversal to reproduce its ordering.
#[inline]
fn round_to_precision<F: Float>(v: F) -> F {
    if !v.is_finite() {
        return v;
    }
    let decimals = if std::mem::size_of::<F>() == 4 { 6 } else { 15 };
    let factor = F::from(10f64.powi(decimals)).unwrap_or_else(F::one);
    // numpy's `np.around` uses round-half-to-even (`np.rint`); matching it is
    // load-bearing — half-away-from-zero would shift `538516480713450.5` to
    // `...451` instead of `...450`, breaking the tie collapse. f64/f32 both
    // round through f64 here, then convert back, mirroring numpy's multiply/
    // rint/divide.
    let scaled = (v * factor).to_f64().unwrap_or(0.0).round_ties_even();
    F::from(scaled).unwrap_or(v) / factor
}

/// Update reachability distances and predecessors for the unprocessed
/// neighbours of `current_point`.
///
/// For each unprocessed neighbour `q`, the new reachability distance is
/// `round(max(core_dist_p, dist(p, q)))` (rounded to the dtype precision,
/// matching `np.around` in `_set_reach_dist`, `sklearn/cluster/_optics.py:710-714`).
/// If this improves the current value, `reachability[q]` is lowered and
/// `current_point` is recorded as the predecessor of `q`.
fn update_seeds<F: Float>(
    core_dist_p: F,
    current_point: usize,
    neighbors: &[usize],
    neighbor_dists: &[F],
    processed: &[bool],
    reachability: &mut Array1<F>,
    predecessors: &mut [Option<usize>],
) {
    for (i, &q) in neighbors.iter().enumerate() {
        if processed[q] {
            continue;
        }
        let new_reach = if core_dist_p > neighbor_dists[i] {
            core_dist_p
        } else {
            neighbor_dists[i]
        };
        // np.around to the dtype precision (`_optics.py:711`) so that
        // sub-ULP differences become exact ties for the tie-break.
        let new_reach = round_to_precision(new_reach);
        if new_reach < reachability[q] {
            reachability[q] = new_reach;
            predecessors[q] = Some(current_point);
        }
    }
}

/// Perform DBSCAN extraction from an OPTICS reachability graph at a fixed `eps`.
///
/// This is the ferrolearn translation of scikit-learn's
/// `cluster_optics_dbscan(*, reachability, core_distances, ordering, eps)`
/// (`sklearn/cluster/_optics.py:726-788`). Given the OPTICS-computed
/// `reachability_`, `core_distances_`, and `ordering_`, it labels each sample as
/// a member of a DBSCAN-like cluster (`0, 1, 2, …`) or as noise (`-1`), running
/// in linear time over `ordering`.
///
/// The exact two-step labelling (`_optics.py:781-787`):
/// 1. `far_reach = reachability > eps` (STRICT `>`), `near_core =
///    core_distances <= eps` (INCLUSIVE `<=`). Walking the points in `ordering`
///    order, maintain a running cumulative sum of `far_reach[o] & near_core[o]`
///    and assign `labels[o] = cumsum_so_far - 1`. This assigns ALL points.
/// 2. For every point `i` with `far_reach[i] & !near_core[i]`, OVERWRITE
///    `labels[i] = -1` (the noise mask).
///
/// `reachability`, `core_distances` are indexed by ORIGINAL sample index;
/// `ordering` lists the original indices in OPTICS reachability order. The
/// returned labels are indexed by original sample index, shape `(n_samples,)`.
///
/// # Panics
///
/// Does not panic. Out-of-range entries in `ordering` (which OPTICS never
/// produces) are skipped rather than indexing out of bounds.
#[must_use]
pub fn cluster_optics_dbscan<F: Float>(
    reachability: &[F],
    core_distances: &[F],
    ordering: &[usize],
    eps: F,
) -> Array1<isize> {
    let n_samples = core_distances.len();
    // labels = np.zeros(n_samples, dtype=int)  (`_optics.py:782`).
    let mut labels = Array1::<isize>::zeros(n_samples);

    // Step 1: labels[ordering] = np.cumsum(far_reach[ordering] & near_core[ordering]) - 1
    // (`_optics.py:784-786`). `far_reach` uses STRICT `>`, `near_core` uses
    // INCLUSIVE `<=`. We walk `ordering` accumulating the boolean cumsum.
    let mut cumsum: isize = 0;
    for &o in ordering {
        if o >= n_samples {
            // OPTICS never emits an out-of-range index; guard defensively.
            continue;
        }
        let far_reach = reachability[o] > eps;
        let near_core = core_distances[o] <= eps;
        if far_reach && near_core {
            cumsum += 1;
        }
        labels[o] = cumsum - 1;
    }

    // Step 2: labels[far_reach & ~near_core] = -1  (`_optics.py:787`). This
    // OVERWRITES the cumsum value for the noise points, indexed by original
    // sample index over ALL samples (not just those in `ordering`).
    for i in 0..n_samples {
        let far_reach = reachability[i] > eps;
        let near_core = core_distances[i] <= eps;
        if far_reach && !near_core {
            labels[i] = -1;
        }
    }

    labels
}

// ─────────────────────────────────────────────────────────────────────────────
// Xi-method cluster extraction — faithful port of sklearn's
// `cluster_optics_xi` / `_xi_cluster` (`sklearn/cluster/_optics.py:810-1202`).
//
// The reachability plot, predecessor plot, and cluster intervals all index the
// reachability-PLOT (i.e. `ordering`) order, NOT the original sample order. The
// final `_extract_xi_labels` scatters back to original-sample order.
// ─────────────────────────────────────────────────────────────────────────────

/// A steep-down area (SDA) tracked during Xi extraction
/// (a `{"start", "end", "mib"}` dict in sklearn, `_optics.py:1104`).
#[derive(Debug, Clone)]
struct SteepDownArea {
    start: usize,
    end: usize,
    mib: f64,
}

/// Automatically extract clusters via the Xi-steep method.
///
/// Faithful translation of scikit-learn's `cluster_optics_xi`
/// (`sklearn/cluster/_optics.py:810-918`). Resolves `min_cluster_size`
/// (`None` → `min_samples`), runs [`xi_cluster`] over the reachability plot, and
/// derives leaf labels via [`extract_xi_labels`].
///
/// Returns `(labels, clusters)` where `labels` is the `(n_samples,)` cluster
/// assignment (`-1` for noise, indexed by ORIGINAL sample order) and `clusters`
/// is the `(n_clusters, 2)` `[start, end]` hierarchy (PLOT-order intervals,
/// both inclusive) — exactly `OPTICS.labels_` and `OPTICS.cluster_hierarchy_`.
///
/// `reachability` and `predecessor` are indexed by ORIGINAL sample order;
/// `ordering` lists original sample indices in reachability-plot order.
///
/// ferrolearn's `min_samples` / `min_cluster_size` are already `>= 2` integers
/// (the float-fraction `<= 1` resolution in `cluster_optics_xi` `:900-906` is
/// REQ-10, out of scope), so only the `None → min_samples` default is applied
/// here.
fn cluster_optics_xi<F: Float>(
    reachability: &Array1<F>,
    predecessor: &Array1<i64>,
    ordering: &[usize],
    xi: F,
    min_samples: usize,
    min_cluster_size: Option<usize>,
    predecessor_correction: bool,
) -> (Array1<isize>, Array2<i64>) {
    let n_total = reachability.len();
    // `if min_cluster_size is None: min_cluster_size = min_samples` (`:902-903`).
    let min_cluster_size = min_cluster_size.unwrap_or(min_samples);

    if ordering.is_empty() {
        return (
            Array1::from_elem(n_total, -1isize),
            Array2::<i64>::zeros((0, 2)),
        );
    }

    // reachability_plot = reachability[ordering] (`:909`), as f64 with non-finite
    // → +inf. _xi_cluster then hstacks an extra +inf sentinel (`:1070`).
    let mut r_plot: Vec<f64> = ordering
        .iter()
        .map(|&i| {
            let v = reachability[i].to_f64().unwrap_or(f64::INFINITY);
            if v.is_finite() { v } else { f64::INFINITY }
        })
        .collect();
    r_plot.push(f64::INFINITY);

    // predecessor_plot = predecessor[ordering] (`:910`), keeping sklearn's `-1`
    // sentinel for seed points.
    let pred_plot: Vec<i64> = ordering.iter().map(|&i| predecessor[i]).collect();

    let xi_f64 = xi.to_f64().unwrap_or(0.05);

    let clusters = xi_cluster(
        &r_plot,
        &pred_plot,
        ordering,
        xi_f64,
        min_samples,
        min_cluster_size,
        predecessor_correction,
    );

    let labels = extract_xi_labels(ordering, &clusters, n_total);

    // Build the `(n_clusters, 2)` hierarchy array (`np.array(clusters)`, `:1172`).
    let hierarchy = if clusters.is_empty() {
        Array2::<i64>::zeros((0, 2))
    } else {
        let mut flat: Vec<i64> = Vec::with_capacity(clusters.len() * 2);
        for &(s, e) in &clusters {
            flat.push(s as i64);
            flat.push(e as i64);
        }
        Array2::from_shape_vec((clusters.len(), 2), flat)
            .unwrap_or_else(|_| Array2::<i64>::zeros((0, 2)))
    };

    (labels, hierarchy)
}

/// Extend the steep area until it's maximal.
///
/// Faithful translation of `_extend_region` (`sklearn/cluster/_optics.py:921-981`).
/// The same function serves both directions: to extend an upward region pass
/// `steep_point=steep_upward, xward_point=downward`; to extend a downward region
/// pass `steep_point=steep_downward, xward_point=upward`. Up/down steep regions
/// can't have more than `min_samples` consecutive non-steep points going the
/// correct direction. The returned `end` is inclusive.
fn extend_region(
    steep_point: &[bool],
    xward_point: &[bool],
    start: usize,
    min_samples: usize,
) -> usize {
    let n_samples = steep_point.len();
    let mut non_xward_points = 0usize;
    let mut index = start;
    let mut end = start;
    // find a maximal area
    while index < n_samples {
        if steep_point[index] {
            non_xward_points = 0;
            end = index;
        } else if !xward_point[index] {
            // it's not a steep point, but still goes up.
            non_xward_points += 1;
            // region should include no more than min_samples consecutive
            // non steep xward points.
            if non_xward_points > min_samples {
                break;
            }
        } else {
            return end;
        }
        index += 1;
    }
    end
}

/// Update steep-down areas (SDAs) using the new maximum-in-between (`mib`) value
/// and `xi_complement = 1 - xi`.
///
/// Faithful translation of `_update_filter_sdas`
/// (`sklearn/cluster/_optics.py:984-995`): if `mib` is `inf` every SDA is
/// dropped; otherwise keep SDAs whose `r_plot[start] * xi_complement >= mib`
/// (sklearn `mib <= reachability_plot[sda["start"]] * xi_complement`) and update
/// each survivor's `mib = max(mib, sda.mib)`.
fn update_filter_sdas(
    sdas: Vec<SteepDownArea>,
    mib: f64,
    xi_complement: f64,
    r_plot: &[f64],
) -> Vec<SteepDownArea> {
    if mib.is_infinite() {
        return Vec::new();
    }
    let mut res: Vec<SteepDownArea> = sdas
        .into_iter()
        .filter(|sda| mib <= r_plot[sda.start] * xi_complement)
        .collect();
    for sda in &mut res {
        sda.mib = sda.mib.max(mib);
    }
    res
}

/// Correct for predecessors (Algorithm 2 of Schubert & Gertz 2018).
///
/// Faithful translation of `_correct_predecessor`
/// (`sklearn/cluster/_optics.py:998-1017`). Inputs are PLOT-ordered. Returns
/// `Some((s, e))` for the corrected interval, or `None` (sklearn `(None, None)`)
/// when the interval collapses (`s >= e`).
///
/// `predecessor_plot[e]` is an ORIGINAL sample index (or `-1` for a seed);
/// `ordering[i]` is also an original sample index, so the membership test
/// `p_e == ordering[i]` compares like-for-like. A `-1` predecessor never equals
/// any `ordering[i]` (all `>= 0`), matching sklearn where a seed's predecessor
/// (`-1`) is not in the ordering window.
fn correct_predecessor(
    r_plot: &[f64],
    predecessor_plot: &[i64],
    ordering: &[usize],
    s: usize,
    mut e: usize,
) -> Option<(usize, usize)> {
    let mut s = s;
    while s < e {
        if r_plot[s] > r_plot[e] {
            return Some((s, e));
        }
        let p_e = predecessor_plot[e];
        let mut found = false;
        for &item in ordering.iter().take(e).skip(s) {
            if p_e == item as i64 {
                found = true;
                break;
            }
        }
        if found {
            return Some((s, e));
        }
        e -= 1;
        // (s is never advanced by sklearn; kept mutable only to mirror the
        // signature — the `while s < e` guard handles the collapse.)
        let _ = &mut s;
    }
    None
}

/// Automatically extract clusters according to the Xi-steep method (Figure 19).
///
/// Faithful translation of `_xi_cluster` (`sklearn/cluster/_optics.py:1020-1172`).
///
/// `r_plot` already has the appended `+inf` sentinel (length `n_plot + 1`).
/// `predecessor_plot` is PLOT-ordered with the `-1` seed sentinel. Returns the
/// list of `(start, end)` cluster intervals (PLOT-order, both inclusive) in
/// discovery order, with each steep-up area's matches reversed (smaller clusters
/// first), exactly as sklearn returns `np.array(clusters)`.
///
/// The ratio arithmetic mirrors numpy under `errstate(invalid="ignore")`:
/// `inf/inf` and `0/0` give `NaN` (all comparisons false), `x/inf = 0`,
/// finite `x/0 = +inf`. Rust's IEEE-754 float division reproduces this exactly,
/// so no special-casing is needed (and no divide-by-zero panic, R-CODE-2).
#[allow(
    clippy::too_many_arguments,
    reason = "mirrors sklearn `_xi_cluster`'s 7-parameter signature (sklearn/cluster/_optics.py:1020-1028)"
)]
fn xi_cluster(
    r_plot: &[f64],
    predecessor_plot: &[i64],
    ordering: &[usize],
    xi: f64,
    min_samples: usize,
    min_cluster_size: usize,
    predecessor_correction: bool,
) -> Vec<(usize, usize)> {
    // xi_complement = 1 - xi (`:1072`).
    let xi_complement = 1.0 - xi;

    let n_plot = r_plot.len() - 1; // excluding the appended +inf sentinel

    // Definition 9 (paper-corrected): steep DOWN uses `>=`, steep UP uses `<=`.
    // ratio = r_plot[:-1] / r_plot[1:] (`:1083-1087`).
    let mut steep_upward = vec![false; n_plot];
    let mut steep_downward = vec![false; n_plot];
    let mut upward = vec![false; n_plot];
    let mut downward = vec![false; n_plot];
    let inv_xi_complement = 1.0 / xi_complement; // 1 / (1 - xi); +inf when xi == 1
    for i in 0..n_plot {
        let ratio = r_plot[i] / r_plot[i + 1]; // NaN for inf/inf or 0/0
        steep_upward[i] = ratio <= xi_complement;
        steep_downward[i] = ratio >= inv_xi_complement;
        downward[i] = ratio > 1.0;
        upward[i] = ratio < 1.0;
    }

    let mut sdas: Vec<SteepDownArea> = Vec::new();
    let mut clusters: Vec<(usize, usize)> = Vec::new();
    let mut index = 0usize;
    let mut mib = 0.0_f64; // maximum in between

    // for steep_index in np.flatnonzero(steep_upward | steep_downward) (`:1091`).
    let steep_indices: Vec<usize> = (0..n_plot)
        .filter(|&i| steep_upward[i] || steep_downward[i])
        .collect();

    for steep_index in steep_indices {
        // continue if steep_index is inside a discovered xward area (`:1094-1095`).
        if steep_index < index {
            continue;
        }

        // mib = max(mib, np.max(r_plot[index : steep_index + 1])) (`:1097`).
        for &item in r_plot.iter().take(steep_index + 1).skip(index) {
            if item > mib {
                mib = item;
            }
        }

        if steep_downward[steep_index] {
            // --- steep downward area (`:1100-1107`) ---
            sdas = update_filter_sdas(sdas, mib, xi_complement, r_plot);
            let d_start = steep_index;
            let d_end = extend_region(&steep_downward, &upward, d_start, min_samples);
            sdas.push(SteepDownArea {
                start: d_start,
                end: d_end,
                mib: 0.0,
            });
            index = d_end + 1;
            mib = r_plot[index];
        } else {
            // --- steep upward area (`:1110-1170`) ---
            sdas = update_filter_sdas(sdas, mib, xi_complement, r_plot);
            let u_start = steep_index;
            let u_end = extend_region(&steep_upward, &downward, u_start, min_samples);
            index = u_end + 1;
            mib = r_plot[index];

            let mut u_clusters: Vec<(usize, usize)> = Vec::new();
            for d in &sdas {
                let mut c_start = d.start;
                let mut c_end = u_end;

                // line (**), sc2*: r_plot[c_end + 1] * xi_complement < D["mib"]
                // → continue (`:1123-1124`). c_end+1 <= u_end+1 = index <= n_plot,
                // and r_plot has the appended sentinel, so the index is in range.
                if r_plot[c_end + 1] * xi_complement < d.mib {
                    continue;
                }

                // Definition 11: criterion 4 (`:1126-1144`).
                let d_max = r_plot[d.start];
                if d_max * xi_complement >= r_plot[c_end + 1] {
                    // Find the first index from the left almost at the end level.
                    // sklearn evaluates `r_plot[c_start+1] > r_plot[c_end+1]`
                    // FIRST then `c_start < D["end"]` (short-circuit `and`). We
                    // guard `c_start + 1 < r_plot.len()` defensively (always true:
                    // c_start < D["end"] <= n_plot-1 here), no OOB.
                    while c_start < d.end && r_plot[c_start + 1] > r_plot[c_end + 1] {
                        c_start += 1;
                    }
                } else if r_plot[c_end + 1] * xi_complement >= d_max {
                    // Find the first index from the right almost at the start
                    // level. Paper-correction: `r(x) > r(sD)` (not `<`). sklearn:
                    // `while r_plot[c_end - 1] > D_max and c_end > U_start`.
                    while c_end > u_start && r_plot[c_end - 1] > d_max {
                        c_end -= 1;
                    }
                }

                // predecessor correction (`:1146-1152`).
                if predecessor_correction {
                    match correct_predecessor(r_plot, predecessor_plot, ordering, c_start, c_end) {
                        Some((cs, ce)) => {
                            c_start = cs;
                            c_end = ce;
                        }
                        None => continue, // c_start is None → continue
                    }
                }

                // Definition 11: criterion 3.a (REQ-7 — INSIDE the loop):
                // c_end - c_start + 1 < min_cluster_size → continue (`:1154-1156`).
                if c_end + 1 - c_start < min_cluster_size {
                    continue;
                }

                // Definition 11: criterion 1: c_start > D["end"] → continue (`:1158-1160`).
                if c_start > d.end {
                    continue;
                }

                // Definition 11: criterion 2: c_end < U_start → continue (`:1162-1164`).
                if c_end < u_start {
                    continue;
                }

                u_clusters.push((c_start, c_end));
            }

            // add smaller clusters first (`:1168-1170`).
            u_clusters.reverse();
            clusters.extend(u_clusters);
        }
    }

    clusters
}

/// Extract labels from the clusters returned by [`xi_cluster`].
///
/// Faithful translation of `_extract_xi_labels`
/// (`sklearn/cluster/_optics.py:1175-1201`). Relies on `clusters` being stored
/// smaller-first: a cluster interval `[c0, c1]` (PLOT-order) is given the next
/// label only if NONE of its plot positions is already labelled, selecting the
/// leaf-level clusters. Then `labels[ordering] = labels.copy()` scatters from
/// plot order back to ORIGINAL sample order.
fn extract_xi_labels(
    ordering: &[usize],
    clusters: &[(usize, usize)],
    n_total: usize,
) -> Array1<isize> {
    let n = ordering.len();
    // labels = np.full(len(ordering), -1) (`:1194`). Length is n_ordered; the
    // plot positions index [0, n). The scatter restores original-sample shape.
    let mut plot_labels = vec![-1isize; n];
    let mut label = 0isize;
    for &(c0, c1) in clusters {
        // `if not np.any(labels[c0 : c1+1] != -1)` (`:1197`). Guard the slice
        // against any out-of-range interval (xi_cluster never emits one).
        let end = (c1 + 1).min(n);
        let start = c0.min(end);
        let any_assigned = plot_labels[start..end].iter().any(|&v| v != -1);
        if !any_assigned {
            for v in &mut plot_labels[start..end] {
                *v = label;
            }
            label += 1;
        }
    }

    // labels[ordering] = labels.copy() (`:1200`): scatter plot-order labels back
    // to original-sample order. `labels` (original order) starts all -1; any
    // sample not in `ordering` (OPTICS always covers all) stays -1.
    let mut labels = Array1::from_elem(n_total, -1isize);
    for (plot_pos, &sample) in ordering.iter().enumerate() {
        labels[sample] = plot_labels[plot_pos];
    }
    labels
}

// ─────────────────────────────────────────────────────────────────────────────
// Fit implementation
// ─────────────────────────────────────────────────────────────────────────────

impl<F: Float + Send + Sync + 'static> Fit<Array2<F>, ()> for OPTICS<F> {
    type Fitted = FittedOPTICS<F>;
    type Error = FerroError;

    /// Fit the OPTICS model to the data.
    ///
    /// Computes the reachability ordering and distances for all training points.
    /// Cluster labels are extracted using the Xi method with the configured `xi`
    /// parameter.
    ///
    /// # Errors
    ///
    /// - [`FerroError::InvalidParameter`] if `min_samples < 2`, `max_eps < 0`,
    ///   or `xi` is outside `[0, 1]`.
    /// - [`FerroError::InsufficientSamples`] if the dataset is empty.
    fn fit(&self, x: &Array2<F>, _y: &()) -> Result<FittedOPTICS<F>, FerroError> {
        let n_samples = x.nrows();

        // Validate parameters — boundaries match sklearn's
        // `OPTICS._parameter_constraints` (`sklearn/cluster/_optics.py:242-264`).
        // The error TYPE stays the grandfathered crate `FerroError` ABI (not
        // sklearn's `InvalidParameterError`); only the accept/reject BOUNDARY is
        // matched.
        //
        // `"min_samples": [Interval(Integral, 2, None, closed="left"), ...]`
        // (`:243-246`) — an int `min_samples` must be `>= 2`, so `{0, 1}` are
        // rejected. (ferrolearn's `min_samples` is `usize`; the float-fraction
        // branch is REQ-10, out of scope.)
        if self.min_samples < 2 {
            return Err(FerroError::InvalidParameter {
                name: "min_samples".into(),
                reason: "must be at least 2".into(),
            });
        }
        // `"max_eps": [Interval(Real, 0, None, closed="both")]` (`:247`) —
        // `max_eps >= 0`, with `max_eps == 0` ALLOWED (closed at 0). Reject only
        // `max_eps < 0` (for an `F` that can be negative). With `max_eps == 0`
        // the fit RUNS: no point has a neighbour within `max_eps`, so every core
        // distance and reachability is `∞`, ordering is by index, and the labels
        // are all `-1` (matching the live sklearn 1.5.2 oracle).
        // `NaN` is rejected (NaN ∉ [0, ∞]; sklearn's `Interval` rejects it as
        // `InvalidParameterError`). Note `NaN < 0` is `false`, so the NaN guard
        // is explicit (#2197).
        if self.max_eps < F::zero() || self.max_eps.is_nan() {
            return Err(FerroError::InvalidParameter {
                name: "max_eps".into(),
                reason: "must be non-negative".into(),
            });
        }
        // `"xi": [Interval(Real, 0, 1, closed="both")]` (`:253`) — `xi ∈ [0, 1]`,
        // BOTH endpoints allowed (`xi == 0` and `xi == 1` accepted by parameter
        // validation). Reject only `xi < 0 || xi > 1`. (sklearn's `_xi_cluster`
        // raises a runtime `ZeroDivisionError` for `xi == 1.0` because
        // `1 - xi == 0`; ferrolearn's Xi extraction computes the same ratios in
        // Rust float arithmetic — `1.0/0.0 == inf`, `inf/inf == NaN` — so it does
        // NOT panic (R-CODE-2), producing all-`-1` labels. Matching the Xi
        // `labels_` VALUE for these endpoints is REQ-5, NOT-STARTED; only the
        // accept/reject BOUNDARY is matched here.)
        if self.xi < F::zero() || self.xi > F::one() || self.xi.is_nan() {
            return Err(FerroError::InvalidParameter {
                name: "xi".into(),
                reason: "must be in [0, 1]".into(),
            });
        }
        // `xi == 1.0` passes sklearn's `_parameter_constraints` (closed at 1) but
        // makes `_xi_cluster` divide by `1 - xi == 0`, raising a runtime
        // `ZeroDivisionError` IN THE XI METHOD (`_optics.py` steep-area ratio);
        // the OBSERVABLE contract is therefore "fit raises". We mirror that by
        // erroring for `xi == 1` ONLY on the Xi path (the `'dbscan'` method
        // ignores `xi`, so sklearn does NOT raise there). #2197.
        if matches!(self.cluster_method, OpticsClusterMethod::Xi) && self.xi == F::one() {
            return Err(FerroError::InvalidParameter {
                name: "xi".into(),
                reason: "xi == 1 is not supported by the Xi method (1 - xi == 0)".into(),
            });
        }
        // `"eps": [Interval(Real, 0, None, closed="both"), None]` (`:252`) — when
        // set, `eps >= 0` (NaN rejected). sklearn validates this BEFORE the fit
        // body, so it applies regardless of `cluster_method` (the Xi path never
        // reads `eps`, but an invalid `eps` is still rejected). #2198.
        if self.eps.is_some_and(|eps| eps < F::zero() || eps.is_nan()) {
            return Err(FerroError::InvalidParameter {
                name: "eps".into(),
                reason: "must be non-negative".into(),
            });
        }

        if n_samples == 0 {
            return Err(FerroError::InsufficientSamples {
                required: 1,
                actual: 0,
                context: "OPTICS requires at least 1 sample".into(),
            });
        }
        // sklearn `_validate_size(min_samples, n_samples, "min_samples")`
        // (`_optics.py:393-400`, called `:597`): `min_samples` must be NO GREATER
        // than `n_samples` (else `ValueError`). ferrolearn previously fit silently
        // with all-`∞` core distances (#2197).
        if self.min_samples > n_samples {
            return Err(FerroError::InvalidParameter {
                name: "min_samples".into(),
                reason: format!(
                    "must be no greater than the number of samples ({n_samples}). Got {}",
                    self.min_samples
                ),
            });
        }

        // Initialise state.
        let mut reachability = Array1::from_elem(n_samples, F::infinity());
        let mut core_distances = Array1::from_elem(n_samples, F::infinity());
        let mut processed = vec![false; n_samples];
        let mut ordering: Vec<usize> = Vec::with_capacity(n_samples);
        let mut predecessors: Vec<Option<usize>> = vec![None; n_samples];

        // Pre-compute core distances.
        for i in 0..n_samples {
            core_distances[i] = core_distance(x, i, self.max_eps, self.min_samples);
        }

        // Main OPTICS loop (`compute_optics_graph`, `sklearn/cluster/_optics.py:638-659`).
        // sklearn does NOT use a heap (comment :52-53): each iteration picks the
        // next point by a linear argmin over ALL unprocessed points' reachability
        // (including those still at +inf), preferring the smallest index on ties
        // (:639-640 "prefer smaller ids on ties, possibly np.inf!"). The order
        // entries are written to `ordering` is significant (:632-633).
        for _ in 0..n_samples {
            // Choose the next point: smallest reachability over unprocessed, with
            // smallest-index tie-break (`index[np.argmin(reachability_[index])]`,
            // :641-642).
            let mut point: Option<usize> = None;
            let mut best = F::infinity();
            for j in 0..n_samples {
                if processed[j] {
                    continue;
                }
                // Strict `<` keeps the first (smallest) index on ties.
                if point.is_none() || reachability[j] < best {
                    best = reachability[j];
                    point = Some(j);
                }
            }
            let Some(point) = point else {
                break;
            };

            processed[point] = true;
            ordering.push(point);

            if core_distances[point].is_finite() {
                let (nbrs, nbr_dists) = get_neighbors(x, point, self.max_eps);
                update_seeds(
                    core_distances[point],
                    point,
                    &nbrs,
                    &nbr_dists,
                    &processed,
                    &mut reachability,
                    &mut predecessors,
                );
            }
        }

        // Build the sklearn-faithful `-1`-sentinel `predecessor_` from the
        // `Option<usize>` predecessors: `Some(j) -> j as i64`, `None -> -1`
        // (mirroring `np.full(n_samples, -1, dtype=int)` then in-place updates,
        // `sklearn/cluster/_optics.py:604-605`, `:714`). Built BEFORE the Xi
        // extraction because `cluster_optics_xi` consumes `predecessor[ordering]`.
        let predecessor_: Array1<i64> = predecessors
            .iter()
            .map(|p| p.map_or(-1i64, |j| j as i64))
            .collect();

        // Extract cluster labels.  sklearn dispatches on `cluster_method`
        // (`sklearn/cluster/_optics.py:363-390`): `"xi"` runs `cluster_optics_xi`
        // (the default), `"dbscan"` resolves `eps` then runs
        // `cluster_optics_dbscan`. The Xi branch additionally sets
        // `cluster_hierarchy_` (`:373`); the dbscan branch leaves it empty.
        let (labels, cluster_hierarchy_) = match self.cluster_method {
            OpticsClusterMethod::Xi => {
                // labels_, clusters_ = cluster_optics_xi(...) (`:364-372`);
                // self.cluster_hierarchy_ = clusters_ (`:373`).
                cluster_optics_xi(
                    &reachability,
                    &predecessor_,
                    &ordering,
                    self.xi,
                    self.min_samples,
                    self.min_cluster_size,
                    self.predecessor_correction,
                )
            }
            OpticsClusterMethod::Dbscan => {
                // Resolve `eps`: `eps = self.eps if self.eps is not None else
                // self.max_eps` (`sklearn/cluster/_optics.py:375-378`).
                let eps = self.eps.unwrap_or(self.max_eps);
                // `if eps > self.max_eps: raise ValueError("Specify an epsilon
                // smaller than %s. Got %s.")` (`:380-383`).
                if eps > self.max_eps {
                    return Err(FerroError::InvalidParameter {
                        name: "eps".into(),
                        reason: format!(
                            "Specify an epsilon smaller than {:?}. Got {:?}.",
                            self.max_eps.to_f64().unwrap_or(f64::INFINITY),
                            eps.to_f64().unwrap_or(f64::INFINITY)
                        ),
                    });
                }
                // labels_ = cluster_optics_dbscan(reachability_, core_distances_,
                // ordering_, eps)  (`:385-390`). sklearn does NOT set
                // `cluster_hierarchy_` on this branch, so it stays empty.
                let labels = cluster_optics_dbscan(
                    reachability.as_slice().unwrap_or(&[]),
                    core_distances.as_slice().unwrap_or(&[]),
                    &ordering,
                    eps,
                );
                (labels, Array2::<i64>::zeros((0, 2)))
            }
        };

        Ok(FittedOPTICS {
            ordering_: ordering,
            reachability_: reachability,
            core_distances_: core_distances,
            labels_: labels,
            predecessors_: predecessors,
            predecessor_,
            min_samples_: self.min_samples,
            cluster_hierarchy_,
        })
    }
}

impl<F: Float + Send + Sync + 'static> OPTICS<F> {
    /// Fit on `x` and return the cluster labels for those samples in one
    /// call. Equivalent to sklearn `ClusterMixin.fit_predict`. Noise
    /// samples are labeled as `-1`.
    ///
    /// # Errors
    ///
    /// Forwards any error from [`Fit::fit`].
    pub fn fit_predict(&self, x: &Array2<F>) -> Result<Array1<isize>, FerroError> {
        let fitted = self.fit(x, &())?;
        Ok(fitted.labels().clone())
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Tests
// ─────────────────────────────────────────────────────────────────────────────

#[cfg(test)]
mod tests {
    use super::*;

    /// Three tight 2-D clusters.
    fn three_blobs() -> Array2<f64> {
        Array2::from_shape_vec(
            (9, 2),
            vec![
                0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 5.0, 5.0, 5.1, 5.0, 5.0, 5.1, 10.0, 0.0, 10.1, 0.0,
                10.0, 0.1,
            ],
        )
        .unwrap()
    }

    #[test]
    fn test_ordering_covers_all_points() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();

        let mut sorted = fitted.ordering().to_vec();
        sorted.sort_unstable();
        assert_eq!(sorted, (0..9).collect::<Vec<_>>());
    }

    #[test]
    fn test_reachability_length() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        assert_eq!(fitted.reachability().len(), 9);
    }

    #[test]
    fn test_core_distances_length() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        assert_eq!(fitted.core_distances().len(), 9);
    }

    #[test]
    fn test_labels_length() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        assert_eq!(fitted.labels().len(), 9);
    }

    #[test]
    fn test_core_points_have_finite_core_distance() {
        let x = three_blobs();
        // With min_samples=2 all tight-cluster points should be core points.
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        // Points 0-8 each have at least one close neighbour.
        for i in 0..9 {
            // Core distance is finite because each point has neighbours.
            assert!(
                fitted.core_distances()[i].is_finite(),
                "expected finite core distance for point {i}"
            );
        }
    }

    #[test]
    fn test_isolated_point_infinite_core_distance() {
        // Add an isolated point far from the clusters.
        let mut data = three_blobs().into_raw_vec_and_offset().0;
        data.extend_from_slice(&[100.0, 100.0]);
        let x = Array2::from_shape_vec((10, 2), data).unwrap();

        // With max_eps=2.0, the isolated point has no neighbours, so its core
        // distance must be infinite regardless of min_samples.
        let fitted = OPTICS::<f64>::new(3)
            .with_max_eps(2.0)
            .fit(&x, &())
            .unwrap();
        assert!(
            fitted.core_distances()[9].is_infinite(),
            "isolated point should have infinite core distance"
        );
    }

    #[test]
    fn test_reachability_first_point_infinite() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        let first = fitted.ordering()[0];
        assert!(
            fitted.reachability()[first].is_infinite(),
            "first point in ordering should have infinite reachability"
        );
    }

    #[test]
    fn test_extract_clusters_valid_xi() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        let labels = fitted.extract_clusters(0.05).unwrap();
        assert_eq!(labels.len(), 9);
    }

    #[test]
    fn test_extract_clusters_invalid_xi_zero() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        assert!(fitted.extract_clusters(0.0).is_err());
    }

    #[test]
    fn test_extract_clusters_invalid_xi_one() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        assert!(fitted.extract_clusters(1.0).is_err());
    }

    #[test]
    fn test_invalid_min_samples_zero() {
        let x = three_blobs();
        let result = OPTICS::<f64>::new(0).fit(&x, &());
        assert!(result.is_err());
    }

    #[test]
    fn test_invalid_min_samples_one() {
        // min_samples=1 is now rejected (sklearn requires >= 2).
        let x = three_blobs();
        let result = OPTICS::<f64>::new(1).fit(&x, &());
        assert!(result.is_err());
    }

    #[test]
    fn test_max_eps_zero_accepted() {
        // sklearn accepts max_eps=0 (closed at 0); the fit runs and produces an
        // all-inf / all-noise degenerate result.
        let x = three_blobs();
        let result = OPTICS::<f64>::new(2).with_max_eps(0.0).fit(&x, &());
        assert!(result.is_ok(), "max_eps=0 should be accepted");
        if let Ok(fitted) = result {
            assert!(fitted.core_distances().iter().all(|c| c.is_infinite()));
            assert!(fitted.reachability().iter().all(|r| r.is_infinite()));
        }
    }

    #[test]
    fn test_invalid_max_eps_negative() {
        let x = three_blobs();
        let result = OPTICS::<f64>::new(2).with_max_eps(-1.0).fit(&x, &());
        assert!(result.is_err());
    }

    #[test]
    fn test_xi_zero_accepted() {
        // sklearn accepts xi=0 (closed at 0).
        let x = three_blobs();
        let result = OPTICS::<f64>::new(2).with_xi(0.0).fit(&x, &());
        assert!(result.is_ok());
    }

    #[test]
    fn test_xi_one_xi_method_errors() {
        // sklearn accepts xi=1 at validation but the Xi method raises a runtime
        // ZeroDivisionError (1-xi==0); the observable contract is "fit raises",
        // so ferrolearn errors on the Xi path (#2197), no panic.
        let x = three_blobs();
        let result = OPTICS::<f64>::new(2).with_xi(1.0).fit(&x, &());
        assert!(result.is_err());
    }

    #[test]
    fn test_invalid_xi_negative() {
        let x = three_blobs();
        let result = OPTICS::<f64>::new(2).with_xi(-0.1).fit(&x, &());
        assert!(result.is_err());
    }

    #[test]
    fn test_invalid_xi_above_one() {
        let x = three_blobs();
        let result = OPTICS::<f64>::new(2).with_xi(1.1).fit(&x, &());
        assert!(result.is_err());
    }

    #[test]
    fn test_empty_data_error() {
        let x = Array2::<f64>::zeros((0, 2));
        let result = OPTICS::<f64>::new(2).fit(&x, &());
        assert!(result.is_err());
    }

    #[test]
    fn test_single_sample_rejected() {
        // A single sample cannot be fit by OPTICS: sklearn requires
        // `2 <= min_samples <= n_samples`, but n_samples=1 leaves no valid
        // min_samples, so `_validate_size` raises (min_samples=2 > 1). ferrolearn
        // mirrors that with an Err (#2197).
        let x: Array2<f64> = ndarray::array![[5.0, 5.0]];
        let result = OPTICS::<f64>::new(2).fit(&x, &());
        assert!(
            result.is_err(),
            "single-sample fit must error (min_samples=2 > n_samples=1)"
        );
    }

    #[test]
    fn test_f32_support() {
        let x = Array2::from_shape_vec(
            (6, 2),
            vec![
                0.0f32, 0.0, 0.1, 0.0, 0.0, 0.1, 10.0, 10.0, 10.1, 10.0, 10.0, 10.1,
            ],
        )
        .unwrap();

        let fitted = OPTICS::<f32>::new(2).fit(&x, &()).unwrap();
        assert_eq!(fitted.ordering().len(), 6);
    }

    #[test]
    fn test_n_clusters_non_negative() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        // n_clusters() just counts distinct non-noise labels.
        let _ = fitted.n_clusters(); // Should not panic.
    }

    #[test]
    fn test_ordering_unique_indices() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        let ordering = fitted.ordering();
        let mut seen = std::collections::HashSet::new();
        for &idx in ordering {
            assert!(seen.insert(idx), "duplicate index {idx} in ordering");
        }
    }

    #[test]
    fn test_with_max_eps_limits_reachability() {
        let x = three_blobs();
        let max_eps = 0.5;
        let fitted = OPTICS::<f64>::new(2)
            .with_max_eps(max_eps)
            .fit(&x, &())
            .unwrap();
        // All finite reachability values must be <= max_eps.
        for &r in fitted.reachability() {
            if r.is_finite() {
                assert!(r <= max_eps + 1e-10);
            }
        }
    }

    #[test]
    fn test_predecessors_length() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        assert_eq!(fitted.predecessors().len(), 9);
    }

    #[test]
    fn test_first_point_has_no_predecessor() {
        let x = three_blobs();
        let fitted = OPTICS::<f64>::new(2).fit(&x, &()).unwrap();
        let first = fitted.ordering()[0];
        assert!(
            fitted.predecessors()[first].is_none(),
            "first point in ordering should have no predecessor"
        );
    }

    #[test]
    fn test_min_cluster_size_filters_small_clusters() {
        // Create data where one "cluster" is just 2 points and others are larger.
        let x = Array2::from_shape_vec(
            (8, 2),
            vec![
                0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.05, 0.05, // cluster of 4
                10.0, 10.0, 10.05, 10.0, // cluster of 2
                20.0, 20.0, 20.05, 20.0, // cluster of 2
            ],
        )
        .unwrap();

        let fitted = OPTICS::<f64>::new(2)
            .with_min_cluster_size(3)
            .fit(&x, &())
            .unwrap();

        // The small clusters (size 2) should be filtered out as noise.
        for &l in fitted.labels() {
            if l >= 0 {
                // Count how many points share this label.
                let count = fitted.labels().iter().filter(|&&c| c == l).count();
                assert!(
                    count >= 3,
                    "cluster with label {l} has only {count} points, expected >= 3"
                );
            }
        }
    }

    #[test]
    fn test_with_min_cluster_size_builder() {
        let optics = OPTICS::<f64>::new(5).with_min_cluster_size(10);
        assert_eq!(optics.min_cluster_size, Some(10));
    }
}