Crate lowess


LOWESS (Locally Weighted Scatterplot Smoothing) for Rust.

This crate provides a fast, robust, production-oriented LOWESS implementation. It is intended for analysis pipelines, batch jobs, and services where determinism, safety, observability, and configurable performance trade-offs are required.

Key capabilities

  • Robust smoothing using iteratively reweighted least squares (IRLS).
  • Multiple kernel choices: Tricube (default), Epanechnikov, Gaussian, Uniform, Quartic, Cosine, Triangle.
  • Per-point standard errors, confidence intervals (mean) and prediction intervals (new observations).
  • Automatic fraction selection via cross-validation (simple RMSE, k-fold, LOOCV) and optional parallel CV.
  • Delta-based interpolation fast-path for dense inputs to reduce compute.
  • Memory-efficient variants for datasets too large to fit in memory via the streaming/online/chunked backends.
  • Optional parallel execution (feature = "parallel") via Rayon.
  • Optional ndarray convenience adapters (feature = "ndarray").
  • no_std-compatible with alloc for embedded or constrained environments.
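As a rough illustration of the default tricube kernel listed above, independent of this crate's internal API, the weight for a normalized distance u can be sketched as:

```rust
// Illustrative tricube kernel: w(u) = (1 - |u|^3)^3 for |u| < 1, else 0.
// Standalone sketch, not this crate's internal implementation.
fn tricube(u: f64) -> f64 {
    let a = u.abs();
    if a >= 1.0 {
        0.0
    } else {
        let t = 1.0 - a * a * a;
        t * t * t
    }
}

fn main() {
    assert_eq!(tricube(0.0), 1.0); // full weight at the target point
    assert_eq!(tricube(1.0), 0.0); // zero weight at the window edge
    println!("tricube(0.5) = {}", tricube(0.5));
}
```

Points near the center of the local window dominate the fit; points at the edge contribute nothing.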

Concepts and parameters (summary)

  • x, y: aligned input slices of the independent and dependent variables.
    • Caller responsibility: remove NaNs/infs and prefer pre-sorting x for reproducible window semantics. The builder also offers helpers to sort.
  • fraction (span): smoothing fraction ∈ (0, 1]. Controls local window size.
    • Typical default: 0.67. Smaller fractions produce less smoothing.
  • iterations / niter: robustness IRLS iterations (usize). 0 disables IRLS.
    • Typical values: 0 (fast), 2–5 (robust). Auto-convergence can stop early.
  • delta: interpolation distance threshold (T or Option<T>).
    • delta <= 0 disables interpolation (fit every point).
    • None resolves to a conservative default (≈1% of x-range).
    • Use delta on dense inputs to interpolate between anchor fits and save time.
  • weight_function: kernel choice. Tricube recommended for general use.
  • interval_level and interval_type: compute confidence and/or prediction intervals at the specified probability (e.g. 0.95).
  • cv_fractions and cv_method: candidate fractions and CV strategy for automatic selection. Returns cv_scores on success.
  • auto_convergence and max_iterations: tolerance and cap for stopping IRLS early based on maximum change in fitted values.
  • compute_diagnostics / compute_residuals / compute_robustness_weights: booleans controlling what additional outputs are produced.
  • parallel feature: enable multithreaded CV and fitting for large n.
  • zero_weight_fallback: policy for neighborhoods with zero kernel weight:
    • UseLocalMean, ReturnOriginal, or ReturnNone (propagate failure).
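The robustness iterations above follow the classic Cleveland reweighting scheme. As a standalone sketch (not this crate's internal code), one IRLS step downweights points by a bisquare function of their residuals, scaled by six times the median absolute residual:

```rust
// Standalone sketch of one IRLS robustness-reweighting step (bisquare),
// following the classic Cleveland scheme; not this crate's internal code.
fn bisquare_weights(residuals: &[f64]) -> Vec<f64> {
    let mut abs: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    abs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let m = abs.len();
    let median = if m % 2 == 1 {
        abs[m / 2]
    } else {
        0.5 * (abs[m / 2 - 1] + abs[m / 2])
    };
    let s = 6.0 * median;
    residuals
        .iter()
        .map(|r| {
            if s <= 0.0 {
                1.0 // degenerate scale: fall back to uniform weights
            } else {
                let u = r / s;
                if u.abs() >= 1.0 { 0.0 } else { (1.0 - u * u).powi(2) }
            }
        })
        .collect()
}

fn main() {
    // A large residual (outlier) receives a much smaller weight.
    let w = bisquare_weights(&[0.1, -0.2, 0.15, 5.0]);
    assert!(w[3] < w[0]);
    println!("{:?}", w);
}
```

Each IRLS iteration refits with these weights, so outliers progressively lose influence on the smoothed curve.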

Outputs (LowessResult)

  • x: sorted independent variable values (builder sorts inputs).
  • y: smoothed values aligned with x.
  • standard_errors: per-point SE when requested.
  • confidence_lower/upper, prediction_lower/upper: optional interval bounds.
  • residuals: optional residual vector.
  • robustness_weights: optional final IRLS weights.
  • diagnostics: optional struct with RMSE, MAE, R², AIC, AICc, effective df.
  • iterations_used, fraction_used, cv_scores: metadata for monitoring/telemetry.

Error handling

  • Returns Result<T, LowessError> with explicit variants for common failures: EmptyInput, MismatchedInputs, InvalidFraction, InvalidDelta, InvalidNumericValue, TooFewPoints, InvalidConfidenceLevel.
  • Functions are defensive: degenerate situations return safe defaults rather than panicking in release builds. Debug assertions exist for development.
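As a shape-only illustration of exhaustive handling (the local enum below merely mirrors the documented variants; the real type is lowess::LowessError):

```rust
// Shape-only sketch: a local enum mirroring the documented LowessError
// variants, to illustrate exhaustive matching. The real type lives in
// this crate; descriptions here are illustrative.
#[derive(Debug)]
enum LowessError {
    EmptyInput,
    MismatchedInputs,
    InvalidFraction,
    InvalidDelta,
    InvalidNumericValue,
    TooFewPoints,
    InvalidConfidenceLevel,
}

fn describe(e: &LowessError) -> &'static str {
    match e {
        LowessError::EmptyInput => "x/y were empty",
        LowessError::MismatchedInputs => "x and y lengths differ",
        LowessError::InvalidFraction => "fraction outside (0, 1]",
        LowessError::InvalidDelta => "delta was not a valid distance",
        LowessError::InvalidNumericValue => "NaN or infinite input value",
        LowessError::TooFewPoints => "not enough points for the window",
        LowessError::InvalidConfidenceLevel => "interval level outside (0, 1)",
    }
}

fn main() {
    assert_eq!(describe(&LowessError::EmptyInput), "x/y were empty");
}
```

Exhaustive matching lets the compiler flag any variant a caller forgot to handle.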

Determinism & numeric safety

  • Sorting, stable default choices, and avoidance of global mutable state provide deterministic outputs for a fixed configuration and inputs.
  • Numerics: conservative fallbacks for near-zero scales, uniform fallback when all kernel weights evaluate to zero, and clamped tuned-scales avoid divide-by-zero issues.

Performance & operational guidance

  • For large datasets, enable “parallel” and pre-allocate buffers to reduce allocation overhead across repeated calls.
  • Use delta for dense inputs to reduce per-point regression costs.
  • Use cross-validation sparingly on very large candidate grids; prefer coarse-to-fine search or parallel CV when available.
  • Monitor diagnostics (RMSE, effective sample size, count_downweighted) in production to detect pathological fits.
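The delta fast-path amounts to fitting only "anchor" points and linearly interpolating between them. Conceptually (standalone sketch, not this crate's implementation):

```rust
// Conceptual sketch of the delta fast-path: given fitted values at two
// anchor x-positions, points within delta of an anchor are linearly
// interpolated rather than refit. Standalone illustration only.
fn interpolate(x0: f64, y0: f64, x1: f64, y1: f64, x: f64) -> f64 {
    if x1 == x0 {
        0.5 * (y0 + y1) // coincident anchors: average the two fits
    } else {
        y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    }
}

fn main() {
    // Anchors fitted at x = 0 and x = 10; x = 2.5 is interpolated.
    let y = interpolate(0.0, 1.0, 10.0, 5.0, 2.5);
    assert_eq!(y, 2.0);
    println!("interpolated y = {y}");
}
```

For dense, smooth inputs this replaces many local regressions with cheap arithmetic, which is why delta matters for performance.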

Examples

  • Basic smoothing
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];
let result = Lowess::new().fraction(0.5).iterations(3).fit(&x, &y).unwrap();
println!("Smoothed y: {:?}", result.y);
  • With 95% confidence intervals and diagnostics
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];
let result = Lowess::new()
    .fraction(0.5)
    .with_confidence_intervals(0.95)
    .with_all_diagnostics()
    .fit(&x, &y)
    .unwrap();
println!("RMSE: {:?}", result.diagnostics.map(|d| d.rmse));
  • Cross-validated fraction selection (parallel-enabled for large n)
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];
let candidate = vec![0.2, 0.3, 0.5, 0.7];
let result = Lowess::new()
    .cross_validate(&candidate)
    .fit(&x, &y)
    .unwrap();
println!("Selected fraction: {}", result.fraction_used);
  • Streaming / online / chunked processing (use ProcessingMode)
use lowess::{Lowess, ProcessingMode};
// example data
let x = (0..50).map(|i| i as f64).collect::<Vec<_>>();
let y = x.iter().map(|v| 2.0 * v + 1.0).collect::<Vec<_>>();

// Obtain a processing-mode variant. Avoid calling `.build()` on the
// wrapped mode-specific builder inside doctests because some backends
// validate inter-dependent defaults (e.g. overlap < chunk_size) which
// can cause doctest failures.
let variant = Lowess::new()
    .fraction(0.5)
    .iterations(1)
    .for_mode(ProcessingMode::Streaming)
    .chunk_size(10);

match variant {
    // Batch contains the standard Lowess builder — call `.fit(...)` directly.
    lowess::builder::ProcessingVariant::Batch(batch_builder) => {
        let result = batch_builder.fit(&x, &y).expect("fit");
        println!("Batch result length: {}", result.x.len());
    }
    // Streaming contains the streaming-mode builder — in real code call
    // `stream_builder.build()?` to obtain the processor and use its methods.
    lowess::builder::ProcessingVariant::Streaming(_stream_builder) => {
        // mode-specific builder is available here; avoid calling `.build()` in doctests.
    }
    _ => {
        // Online / Chunked variants follow the same pattern.
    }
}
  • Auto-convergence example
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0];
let y = vec![2.0, 4.1, 5.9, 8.2];
let r = Lowess::new().auto_converge(1e-4).max_iterations(20).fit(&x, &y).unwrap();
if let Some(iters) = r.iterations_used { println!("Converged after {}", iters); }

API tips and best practices

  • Pre-clean inputs (remove NaNs/infs) and sort x for deterministic windowing.
  • Choose sensible defaults in the builder for production: use delta for dense data, modest iterations (2–3) for robustness, and enable diagnostics in scheduled batch jobs.
  • When using parallel execution, benchmark recommended_chunk_size() and pre-allocate per-call buffers for throughput-sensitive workloads.
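The pre-cleaning step recommended above can be sketched in plain Rust (this helper is illustrative, not part of this crate's API):

```rust
// Standalone sketch of pre-cleaning: drop non-finite pairs and sort by x,
// as recommended above; not part of this crate's API.
fn preclean(x: &[f64], y: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let mut pairs: Vec<(f64, f64)> = x
        .iter()
        .zip(y)
        .filter(|(a, b)| a.is_finite() && b.is_finite())
        .map(|(a, b)| (*a, *b))
        .collect();
    // Non-finite values were filtered out, so partial_cmp cannot fail here.
    pairs.sort_by(|p, q| p.0.partial_cmp(&q.0).unwrap());
    pairs.into_iter().unzip()
}

fn main() {
    let (xs, ys) = preclean(&[3.0, f64::NAN, 1.0, 2.0], &[9.0, 0.0, 1.0, 4.0]);
    assert_eq!(xs, vec![1.0, 2.0, 3.0]);
    assert_eq!(ys, vec![1.0, 4.0, 9.0]);
    println!("{xs:?} {ys:?}");
}
```

Sorting pairs jointly keeps x and y aligned, which the fitting functions assume.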

See module-level documentation (builder, core, regression, kernel, confidence, robustness, utils, parallel) for function-level argument descriptions, return conventions, and panic/assert behavior.

Re-exports

pub use builder::Diagnostics;
pub use builder::IntervalType;
pub use builder::LowessBuilder as Lowess;
pub use builder::LowessResult;
pub use builder::ProcessingMode;
pub use builder::ProcessingVariant;
pub use kernel::WeightFunction;
pub use kernel::WeightFunctionInfo;

Modules

builder
LOWESS Builder Pattern
confidence
Confidence intervals, prediction intervals and standard error computation for LOWESS
core
Core LOWESS algorithm implementation.
kernel
Kernel weight functions for LOWESS smoothing.
regression
Local regression fitting for LOWESS smoothing.
robustness
Robustness weighting for outlier-resistant LOWESS smoothing.
streaming
Streaming and online LOWESS for very large datasets.
utils
Utility functions for LOWESS smoothing.

Enums

LowessError
LOWESS error types.

Functions

lowess
Perform LOWESS smoothing with default parameters.
lowess_robust
Perform robust LOWESS smoothing (5 iterations).
lowess_with_fraction
Perform LOWESS smoothing with custom fraction.

Type Aliases

Result
Result type for LOWESS operations.