lowess 0.1.0

LOWESS (Locally Weighted Scatterplot Smoothing) implementation in Rust

License: MIT

A production-grade Rust implementation of LOWESS (Locally Weighted Scatterplot Smoothing) with robust statistics, parallel execution, and streaming capabilities for large datasets.

What is LOWESS?

LOWESS is a non-parametric regression method that fits smooth curves through scatter plots using locally weighted polynomial regression. Originally developed by Cleveland (1979), it's widely used in:

  • 🧬 Genomics & Bioinformatics: DNA methylation analysis, ChIP-seq normalization, gene expression smoothing
  • 📊 Time Series Analysis: Trend extraction, seasonal adjustment, noise reduction
  • 🔬 Scientific Computing: Exploratory data analysis, visualization, preprocessing
  • 📈 Signal Processing: Baseline correction, peak detection, data denoising

Features

Core Capabilities

  • Robust Smoothing: Iteratively reweighted least squares (IRLS) with multiple kernel functions
  • 🎯 Confidence & Prediction Intervals: Per-point standard errors and statistical intervals
  • 🔧 Multiple Kernels: Tricube (default), Epanechnikov, Gaussian, Uniform, Quartic, Cosine, Triangle
  • 📏 Automatic Fraction Selection: Cross-validation (simple RMSE, k-fold, LOOCV)
  • Delta Optimization: Fast interpolation for dense data

Advanced Features

  • 🚀 Parallel Execution: Optional multi-threaded CV and fitting (feature = "parallel")
  • 💾 Streaming Processing: Memory-efficient variants for large datasets
  • 📊 Rich Diagnostics: RMSE, MAE, R², AIC, AICc, effective degrees of freedom
  • 🔢 ndarray Integration: Seamless conversion to/from ndarray (feature = "ndarray")
  • 🎛️ no_std Compatible: Works in embedded environments with alloc

Quality & Reliability

  • ✅ Type-safe builder pattern with compile-time validation
  • 🛡️ Comprehensive error handling (no panics in release builds)
  • 📐 Numerically stable with defensive fallbacks
  • 🔁 Deterministic outputs for reproducible research
  • 📚 Extensively documented with production guidance

Installation

Add to your Cargo.toml:

[dependencies]
lowess = "0.1"

# Or, with optional features:
[dependencies.lowess]
version = "0.1"
features = ["parallel", "ndarray"]

Feature Flags

Feature    Description                                     Dependencies
parallel   Enable parallel cross-validation and fitting    rayon
ndarray    Integration with ndarray arrays                 ndarray
std        Standard library support (enabled by default)   -

Quick Start

Basic Smoothing

use lowess::Lowess;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];

// Simple smoothing with defaults (fraction=0.67, no robustness iterations)
let result = Lowess::new()
    .fit(&x, &y)
    .unwrap();

println!("Smoothed values: {:?}", result.y);

Robust Smoothing with Outliers

use lowess::Lowess;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
let y = vec![2.0, 4.0, 6.0, 100.0, 10.0, 12.0]; // outlier at index 3

// Robust smoothing with 5 IRLS iterations
let result = Lowess::robust()
    .fit(&x, &y)
    .unwrap();

With Confidence Intervals

use lowess::Lowess;

let result = Lowess::new()
    .fraction(0.5)
    .iterations(3)
    .with_confidence_intervals(0.95)
    .with_all_diagnostics()
    .fit(&x, &y)
    .unwrap();

println!("RMSE: {:?}", result.diagnostics.unwrap().rmse);
println!("Lower CI: {:?}", result.confidence_lower);
println!("Upper CI: {:?}", result.confidence_upper);

Automatic Fraction Selection

use lowess::Lowess;

// Cross-validate to find optimal smoothing fraction
let candidates = vec![0.2, 0.3, 0.5, 0.7];
let result = Lowess::new()
    .cross_validate(&candidates)
    .fit(&x, &y)
    .unwrap();

println!("Selected fraction: {}", result.fraction_used);
println!("CV scores: {:?}", result.cv_scores);

Streaming for Large Datasets

use lowess::{Lowess, StreamingLowess};

// Process data in chunks to avoid memory issues
let base_config = Lowess::new().fraction(0.3).iterations(2);
let mut streaming = StreamingLowess::new(base_config, 1000); // chunk size

// Process chunks
let chunk1_result = streaming.process_chunk(&x_chunk1, &y_chunk1)?;
let chunk2_result = streaming.process_chunk(&x_chunk2, &y_chunk2)?;

Online Smoothing (Sliding Window)

use lowess::{Lowess, OnlineLowess};

// Real-time smoothing with sliding window
let base_config = Lowess::new().fraction(0.2);
let mut online = OnlineLowess::new(base_config, 100); // window size

for (&xi, &yi) in x.iter().zip(y.iter()) {
    let smoothed = online.update(xi, yi)?;
    println!("Current smooth value: {}", smoothed);
}

API Overview

Builder Pattern

Lowess::new()
    .fraction(0.5)              // Smoothing span (0, 1]
    .iterations(3)              // Robustness iterations
    .delta(Some(0.01))          // Interpolation threshold
    .kernel(WeightFunction::Epanechnikov)
    .with_confidence_intervals(0.95)
    .with_prediction_intervals(0.95)
    .with_all_diagnostics()
    .cross_validate(&[0.3, 0.5, 0.7])
    .auto_converge(1e-4)        // Early stopping
    .max_iterations(20)
    .fit(&x, &y)?

Result Structure

pub struct LowessResult<T> {
    pub x: Vec<T>,                          // Sorted x values
    pub y: Vec<T>,                          // Smoothed y values
    pub standard_errors: Option<Vec<T>>,    // Per-point SE
    pub confidence_lower: Option<Vec<T>>,   // CI lower bounds
    pub confidence_upper: Option<Vec<T>>,   // CI upper bounds
    pub prediction_lower: Option<Vec<T>>,   // PI lower bounds
    pub prediction_upper: Option<Vec<T>>,   // PI upper bounds
    pub residuals: Option<Vec<T>>,          // Residuals
    pub robustness_weights: Option<Vec<T>>, // Final IRLS weights
    pub diagnostics: Option<Diagnostics<T>>,// Fit statistics
    pub iterations_used: Option<usize>,     // Actual iterations
    pub fraction_used: T,                   // Final fraction
    pub cv_scores: Option<Vec<T>>,          // CV RMSE scores
}

Diagnostics

pub struct Diagnostics<T> {
    pub rmse: T,                    // Root mean squared error
    pub mae: T,                     // Mean absolute error
    pub r_squared: T,               // Coefficient of determination
    pub aic: T,                     // Akaike Information Criterion
    pub aicc: T,                    // Corrected AIC (small samples)
    pub effective_df: T,            // Effective degrees of freedom
    pub count_downweighted: usize,  // Points with low robustness weight
}

Algorithm Details

LOWESS Procedure

  1. Sorting: Data sorted by x-values for deterministic windowing
  2. Local Neighborhoods: For each point, select k nearest neighbors (k = fraction × n)
  3. Kernel Weighting: Apply distance-based kernel (default: tricube)
  4. Weighted Regression: Fit locally weighted least squares
  5. Robustness Iterations (optional): Reweight using bisquare function on residuals
  6. Delta Interpolation (optional): Linearly interpolate between anchor points
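
The core of steps 2-4 can be sketched as a single local fit at one point x0. This is a minimal, stand-alone illustration; the function and variable names are not the crate's API:

```rust
/// One pass of steps 2-4 at a single point x0: pick the k nearest
/// neighbors, weight them with the tricube kernel, and solve a weighted
/// linear least-squares fit y ≈ a + b·(x − x0); the smoothed value is a.
/// `xs` is assumed sorted and non-empty.
fn local_fit(xs: &[f64], ys: &[f64], x0: f64, k: usize) -> f64 {
    // Step 2: k nearest neighbors by distance to x0.
    let mut idx: Vec<usize> = (0..xs.len()).collect();
    idx.sort_by(|&a, &b| {
        (xs[a] - x0).abs().partial_cmp(&(xs[b] - x0).abs()).unwrap()
    });
    let nbrs = &idx[..k.min(xs.len())];
    // Bandwidth: distance to the furthest selected neighbor.
    let h = nbrs
        .iter()
        .map(|&i| (xs[i] - x0).abs())
        .fold(0.0_f64, f64::max)
        .max(f64::EPSILON);

    // Step 3: tricube weights on scaled distances.
    let w: Vec<f64> = nbrs
        .iter()
        .map(|&i| {
            let d = (xs[i] - x0).abs() / h;
            if d < 1.0 { (1.0 - d.powi(3)).powi(3) } else { 0.0 }
        })
        .collect();

    // Step 4: weighted least squares in the centered variable t = x - x0.
    let (mut sw, mut sx, mut sy, mut sxx, mut sxy) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for (&i, &wi) in nbrs.iter().zip(&w) {
        let t = xs[i] - x0;
        sw += wi;
        sx += wi * t;
        sy += wi * ys[i];
        sxx += wi * t * t;
        sxy += wi * t * ys[i];
    }
    let denom = sw * sxx - sx * sx;
    if denom.abs() < f64::EPSILON {
        sy / sw // degenerate neighborhood: fall back to the weighted mean
    } else {
        (sxx * sy - sx * sxy) / denom // intercept a = fitted value at x0
    }
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0, 5.0];
    let ys = [2.0, 4.0, 6.0, 8.0, 10.0];
    // Exactly linear data: the local fit recovers the line at x0 = 3.
    assert!((local_fit(&xs, &ys, 3.0, 4) - 6.0).abs() < 1e-9);
}
```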

Weight Functions

// Tricube (Cleveland's default)
w(d) = (1 - |d|³)³  for |d| < 1, else 0

// Epanechnikov
w(d) = 1 - d²  for |d| < 1, else 0

// Gaussian
w(d) = exp(-d²/2)
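
These kernels are simple to state in code; the following is a stand-alone sketch (the function names are illustrative, not the crate's exports):

```rust
/// Tricube: Cleveland's default, compactly supported on |d| < 1.
fn tricube(d: f64) -> f64 {
    let a = d.abs();
    if a < 1.0 { (1.0 - a.powi(3)).powi(3) } else { 0.0 }
}

/// Epanechnikov: parabolic weight on |d| < 1.
fn epanechnikov(d: f64) -> f64 {
    if d.abs() < 1.0 { 1.0 - d * d } else { 0.0 }
}

/// Gaussian: smooth, with unbounded support.
fn gaussian(d: f64) -> f64 {
    (-0.5 * d * d).exp()
}

fn main() {
    assert_eq!(tricube(0.0), 1.0);
    assert_eq!(tricube(1.0), 0.0);       // weight vanishes at the boundary
    assert_eq!(epanechnikov(0.5), 0.75);
    assert!((gaussian(0.0) - 1.0).abs() < 1e-12);
}
```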

Robustness Weighting (Bisquare)

// After initial fit, compute residuals r_i
// Estimate scale: s = MAD(residuals) × 1.4826
// Robustness weights:
w_i = (1 - (r_i / (6s))²)²  for |r_i| < 6s, else 0
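
A minimal sketch of this reweighting step, assuming non-empty residuals (the helper name is hypothetical, not the crate's internal API):

```rust
/// Bisquare robustness weights from residuals, following the scheme above:
/// s = MAD × 1.4826, then w_i = (1 − (r_i/(6s))²)² for |r_i| < 6s, else 0.
fn bisquare_weights(residuals: &[f64]) -> Vec<f64> {
    // Median absolute residual (a simple MAD for centered residuals).
    let mut abs_r: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    abs_r.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mad = abs_r[abs_r.len() / 2];
    // Clamp the scale away from zero to avoid division by zero.
    let s = (mad * 1.4826).max(f64::EPSILON);
    residuals
        .iter()
        .map(|r| {
            let u = r / (6.0 * s);
            if u.abs() < 1.0 { (1.0 - u * u).powi(2) } else { 0.0 }
        })
        .collect()
}

fn main() {
    let w = bisquare_weights(&[0.1, -0.1, 0.2, 10.0, -0.05]);
    assert_eq!(w[3], 0.0); // the large residual is fully downweighted
    assert!(w[0] > 0.9);   // small residuals keep near-full weight
}
```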

Performance

Complexity

  • Basic: O(n²) for n data points
  • With delta: Effectively O(n × k) where k ≪ n for dense data
  • Parallel CV: Linear speedup with available cores
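
The delta optimization fits exactly only at "anchor" points spaced at least delta apart in x and linearly interpolates between them. Anchor selection can be sketched as follows (illustrative helper, not the crate's API):

```rust
/// Select anchor indices spaced at least `delta` apart in x.
/// Assumes `xs` is sorted and non-empty; points between anchors receive
/// interpolated values instead of a full local fit.
fn select_anchors(xs: &[f64], delta: f64) -> Vec<usize> {
    let mut anchors = vec![0];
    for i in 1..xs.len() {
        if xs[i] - xs[*anchors.last().unwrap()] > delta {
            anchors.push(i);
        }
    }
    if *anchors.last().unwrap() != xs.len() - 1 {
        anchors.push(xs.len() - 1); // always keep the right endpoint
    }
    anchors
}

fn main() {
    let xs = [0.0, 0.1, 0.2, 1.0, 2.0];
    // Only points more than delta = 0.5 past the last anchor become anchors.
    assert_eq!(select_anchors(&xs, 0.5), vec![0, 3, 4]);
}
```

For dense data the anchor count is much smaller than n, which is where the effective O(n × k) behavior comes from.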

Numerical Stability

The implementation includes several safeguards:

  • Scale estimation fallbacks: MAD → mean absolute residual when MAD ≈ 0
  • Minimum tuned scales: Clamped to ε > 0 to avoid division by zero
  • Zero-weight neighborhoods: Configurable fallback policies
  • Uniform weight fallback: When all kernel weights evaluate to zero
  • Auto-convergence: Prevents excessive iterations in stable fits

Comparison with R

With matching parameters (fraction, iterations, delta), this implementation produces results equivalent to R's stats::lowess():

# R
result <- lowess(x, y, f = 2/3, iter = 3, delta = 0.01)

// Rust (equivalent)
let result = Lowess::new()
    .fraction(2.0/3.0)
    .iterations(3)
    .delta(Some(0.01))
    .fit(&x, &y)?;

Key differences:

  • ✨ More kernel options beyond tricube
  • 📊 Statistical intervals and diagnostics
  • 🔧 Cross-validation built-in
  • 🚀 Parallel execution support
  • 💾 Streaming variants for large data

Error Handling

use lowess::{Lowess, LowessError};

match Lowess::new().fit(&x, &y) {
    Ok(result) => println!("Success: {:?}", result.y),
    Err(LowessError::EmptyInput) => println!("Empty arrays"),
    Err(LowessError::MismatchedInputs { x_len, y_len }) => 
        println!("Length mismatch: x={}, y={}", x_len, y_len),
    Err(LowessError::InvalidFraction(f)) => 
        println!("Bad fraction: {}", f),
    Err(e) => println!("Other error: {}", e),
}

Testing

# Run all tests
cargo test

# Run with all features
cargo test --all-features

# Run benchmarks
cargo bench

# Check documentation
cargo doc --open

Production Usage Guidelines

Best Practices

  1. Pre-clean inputs: Remove NaNs/infs before calling fit()
  2. Sort data: Pre-sort x for reproducible window semantics
  3. Choose appropriate fraction:
    • 0.2-0.3 for very local features
    • 0.5-0.7 for general trends
    • Use cross-validation when uncertain
  4. Enable diagnostics: Monitor RMSE, effective_df in production
  5. Use delta: Enable for dense data (>1000 points)
  6. Tune robustness: 2-3 iterations sufficient for most data
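
Guidelines 1 and 2 can be combined into a small pre-processing helper (hypothetical, not provided by the crate):

```rust
/// Drop non-finite (x, y) pairs and sort by x before fitting.
fn preclean(x: &[f64], y: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let mut pairs: Vec<(f64, f64)> = x
        .iter()
        .zip(y)
        .filter(|(xi, yi)| xi.is_finite() && yi.is_finite())
        .map(|(&xi, &yi)| (xi, yi))
        .collect();
    pairs.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    pairs.into_iter().unzip()
}

fn main() {
    let (cx, cy) = preclean(&[3.0, f64::NAN, 1.0, 2.0], &[30.0, 0.0, 10.0, 20.0]);
    assert_eq!(cx, vec![1.0, 2.0, 3.0]); // NaN pair removed, x sorted
    assert_eq!(cy, vec![10.0, 20.0, 30.0]);
}
```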

Monitoring

let result = Lowess::new()
    .with_all_diagnostics()
    .fit(&x, &y)?;

if let Some(diag) = result.diagnostics {
    if diag.effective_df < 2.0 {
        log::warn!("Low effective degrees of freedom: {}", diag.effective_df);
    }
    if diag.count_downweighted > x.len() / 2 {
        log::warn!("Over 50% of points downweighted - check for systematic issues");
    }
}

Contributing

Contributions are welcome! Areas of interest:

  • Additional kernel functions
  • More cross-validation strategies
  • GPU acceleration
  • Python bindings (PyO3)
  • Additional examples and tutorials

Please see CONTRIBUTING.md for guidelines on reporting bugs, submitting pull requests, testing, and the development workflow.

License

This project is licensed under the MIT License - see the LICENSE file for details.

References

Academic Papers

  • Cleveland, W. S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". Journal of the American Statistical Association 74(368): 829-836. DOI: 10.2307/2286407

  • Cleveland, W. S. (1981). "LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression". The American Statistician 35(1): 54. DOI: 10.2307/2683591

Citation

If you use this crate in academic work, please cite:

@software{lowess_rust,
  author = {Valizadeh, Amir},
  title = {lowess: Production-grade LOWESS smoothing for Rust},
  year = {2025},
  url = {https://github.com/thisisamirv/lowess}
}

Author

Amir Valizadeh
📧 thisisamirv@gmail.com

Acknowledgments

  • Based on Cleveland's original LOWESS algorithm
  • Inspired by implementations in R and Python statsmodels
  • Built with the Rust scientific computing ecosystem