# lowess
A production-grade Rust implementation of LOWESS (Locally Weighted Scatterplot Smoothing) with robust statistics, parallel execution, and streaming capabilities for large datasets.
## What is LOWESS?
LOWESS is a non-parametric regression method that fits smooth curves through scatter plots using locally weighted polynomial regression. Originally developed by Cleveland (1979), it's widely used in:
- 🧬 Genomics & Bioinformatics: DNA methylation analysis, ChIP-seq normalization, gene expression smoothing
- 📊 Time Series Analysis: Trend extraction, seasonal adjustment, noise reduction
- 🔬 Scientific Computing: Exploratory data analysis, visualization, preprocessing
- 📈 Signal Processing: Baseline correction, peak detection, data denoising
## Features

### Core Capabilities
- ✨ Robust Smoothing: Iteratively reweighted least squares (IRLS) with multiple kernel functions
- 🎯 Confidence & Prediction Intervals: Per-point standard errors and statistical intervals
- 🔧 Multiple Kernels: Tricube (default), Epanechnikov, Gaussian, Uniform, Quartic, Cosine, Triangle
- 📏 Automatic Fraction Selection: Cross-validation (simple RMSE, k-fold, LOOCV)
- ⚡ Delta Optimization: Fast interpolation for dense data
### Advanced Features
- 🚀 Parallel Execution: Optional multi-threaded CV and fitting (feature = "parallel")
- 💾 Streaming Processing: Memory-efficient variants for large datasets
- 📊 Rich Diagnostics: RMSE, MAE, R², AIC, AICc, effective degrees of freedom
- 🔢 ndarray Integration: Seamless conversion to/from ndarray (feature = "ndarray")
- 🎛️ no_std Compatible: Works in embedded environments with alloc
### Quality & Reliability
- ✅ Type-safe builder pattern with compile-time validation
- 🛡️ Comprehensive error handling (no panics in release builds)
- 📐 Numerically stable with defensive fallbacks
- 🔁 Deterministic outputs for reproducible research
- 📚 Extensively documented with production guidance
## Installation

Add to your Cargo.toml:

```toml
[dependencies]
lowess = "0.1"
```

```toml
# Optional features
[dependencies.lowess]
version = "0.1"
features = ["parallel", "ndarray"]
```
### Feature Flags

| Feature | Description | Dependencies |
|---|---|---|
| `parallel` | Enable parallel cross-validation and fitting | `rayon` |
| `ndarray` | Integration with ndarray arrays | `ndarray` |
| `std` | Standard library support (enabled by default) | - |
## Quick Start

### Basic Smoothing

```rust
use lowess::Lowess;

// Illustrative data values.
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let y = vec![1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8];

// Simple smoothing with defaults (fraction=0.67, no robustness iterations)
let result = Lowess::new()
    .fit(&x, &y)
    .unwrap();

println!("{:?}", result);
```
### Robust Smoothing with Outliers

```rust
use lowess::Lowess;

// Illustrative data values.
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0];
let y = vec![1.0, 2.1, 2.9, 20.0, 5.1, 6.0, 7.2]; // outlier at index 3

// Robust smoothing with 5 IRLS iterations
let result = Lowess::new()
    .iterations(5)
    .fit(&x, &y)
    .unwrap();
```
### With Confidence Intervals

```rust
use lowess::Lowess;

let result = Lowess::new()
    .fraction(0.5)
    .iterations(3)
    .with_confidence_intervals(0.95)
    .with_all_diagnostics()
    .fit(&x, &y)
    .unwrap();

// Inspect the smoothed values, standard errors, and interval bounds
// (field names are illustrative; see the API docs for the exact result type).
println!("{:?}", result.y_smooth);
println!("{:?}", result.standard_errors);
println!("{:?}", result.confidence_intervals);
```
### Automatic Fraction Selection

```rust
use lowess::Lowess;

// Cross-validate to find optimal smoothing fraction
let candidates = vec![0.2, 0.3, 0.4, 0.5, 0.67];
let result = Lowess::new()
    .cross_validate(&candidates)
    .fit(&x, &y)
    .unwrap();

// Report the selected fraction and its CV score
// (field names are illustrative; see the API docs for the exact result type).
println!("{:?}", result.selected_fraction);
println!("{:?}", result.cv_scores);
```
### Streaming for Large Datasets

```rust
use lowess::Lowess; // streaming type name below is assumed

// Process data in chunks to avoid memory issues
let base_config = Lowess::new().fraction(0.3).iterations(2);
let mut streaming = StreamingLowess::new(base_config);

// Process chunks
let chunk1_result = streaming.process_chunk(&x1, &y1)?;
let chunk2_result = streaming.process_chunk(&x2, &y2)?;
```
### Online Smoothing (Sliding Window)

```rust
use lowess::Lowess; // online type name below is assumed

// Real-time smoothing with sliding window
let base_config = Lowess::new().fraction(0.3);
let mut online = OnlineLowess::new(base_config, 50); // window size
for (&xi, &yi) in x.iter().zip(y.iter()) {
    let smoothed = online.update(xi, yi);
}
```
## API Overview

### Builder Pattern

```rust
// All settings shown together; argument values are illustrative.
let result = Lowess::new()
    .fraction(0.5)                     // Smoothing span (0, 1]
    .iterations(3)                     // Robustness iterations
    .delta(0.01)                       // Interpolation threshold
    .kernel(Kernel::Tricube)
    .with_confidence_intervals(0.95)
    .with_prediction_intervals(0.95)
    .with_all_diagnostics()
    .cross_validate(&[0.3, 0.5, 0.7])
    .auto_converge(1e-4)               // Early stopping
    .max_iterations(20)
    .fit(&x, &y)?;
```
### Result Structure

The fit returns the smoothed values along with optional standard errors, intervals, and diagnostics (see the examples above).

### Diagnostics

Available diagnostics include RMSE, MAE, R², AIC, AICc, and effective degrees of freedom.
## Algorithm Details

### LOWESS Procedure

1. Sorting: Data sorted by x-values for deterministic windowing
2. Local Neighborhoods: For each point, select the k nearest neighbors (k ≈ fraction × n)
3. Kernel Weighting: Apply a distance-based kernel (default: tricube)
4. Weighted Regression: Fit locally weighted least squares
5. Robustness Iterations (optional): Reweight using the bisquare function on residuals
6. Delta Interpolation (optional): Linearly interpolate between anchor points
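The steps above can be sketched as a direct O(n²) single pass, without robustness iterations or delta interpolation. This is an illustrative simplification, not the crate's actual code; the function name and fallbacks are assumptions.

```rust
/// Minimal single-pass LOWESS sketch: local linear fit with tricube weights.
/// Assumes x is sorted and finite; illustrative only.
fn lowess_pass(x: &[f64], y: &[f64], fraction: f64) -> Vec<f64> {
    let n = x.len();
    let k = ((fraction * n as f64).ceil() as usize).clamp(2, n);
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Step 2: select the k nearest neighbors of x[i].
        let mut idx: Vec<usize> = (0..n).collect();
        idx.sort_by(|&a, &b| {
            (x[a] - x[i]).abs().partial_cmp(&(x[b] - x[i]).abs()).unwrap()
        });
        let neighbors = &idx[..k];
        let h = (x[neighbors[k - 1]] - x[i]).abs().max(f64::EPSILON);
        // Step 3: tricube kernel weights on distances scaled by the radius h.
        let (mut sw, mut swx, mut swy, mut swxx, mut swxy) = (0.0, 0.0, 0.0, 0.0, 0.0);
        for &j in neighbors {
            let d = ((x[j] - x[i]) / h).abs();
            let w = if d < 1.0 { (1.0 - d.powi(3)).powi(3) } else { 0.0 };
            sw += w; swx += w * x[j]; swy += w * y[j];
            swxx += w * x[j] * x[j]; swxy += w * x[j] * y[j];
        }
        // Step 4: weighted least-squares line; fall back to the weighted
        // mean when the neighborhood is degenerate.
        let denom = sw * swxx - swx * swx;
        out.push(if denom.abs() > 1e-12 {
            let b = (sw * swxy - swx * swy) / denom;
            let a = (swy - b * swx) / sw;
            a + b * x[i]
        } else {
            swy / sw
        });
    }
    out
}

fn main() {
    let x: Vec<f64> = (0..20).map(|i| i as f64).collect();
    let y: Vec<f64> = x.iter().map(|v| 2.0 * v + 1.0).collect();
    let s = lowess_pass(&x, &y, 0.5);
    // A local linear fit reproduces an exactly linear signal.
    assert!(s.iter().zip(y.iter()).all(|(a, b)| (a - b).abs() < 1e-6));
    println!("smoothed head: {:?}", &s[..3]);
}
```

The delta optimization replaces the per-point fit with anchor fits plus linear interpolation, which is what reduces the cost on dense data.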
### Weight Functions

```text
// Tricube (Cleveland's default)
w = (1 - |d|³)³   for |d| < 1, else 0

// Epanechnikov
w = 1 - d²        for |d| < 1, else 0

// Gaussian
w = exp(-d² / 2)
```
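As a sketch, these kernel formulas translate directly to Rust (unnormalized weights over the scaled distance d; function names are illustrative, not the crate's API):

```rust
/// Tricube: Cleveland's default kernel, compact support on |d| < 1.
fn tricube(d: f64) -> f64 {
    let d = d.abs();
    if d < 1.0 { (1.0 - d.powi(3)).powi(3) } else { 0.0 }
}

/// Epanechnikov (unnormalized), compact support on |d| < 1.
fn epanechnikov(d: f64) -> f64 {
    let d = d.abs();
    if d < 1.0 { 1.0 - d * d } else { 0.0 }
}

/// Gaussian: unbounded support, so every neighbor keeps some weight.
fn gaussian(d: f64) -> f64 {
    (-0.5 * d * d).exp()
}

fn main() {
    // Full weight at distance zero, decaying with |d|.
    assert!((tricube(0.0) - 1.0).abs() < 1e-12);
    assert_eq!(tricube(1.0), 0.0);
    assert!(epanechnikov(0.5) > epanechnikov(0.9));
    assert!(gaussian(2.0) < gaussian(1.0));
    println!("tricube(0.5) = {}", tricube(0.5));
}
```

Compact-support kernels (tricube, Epanechnikov) zero out the farthest neighbors, which is what makes the fit truly local.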
### Robustness Weighting (Bisquare)

```text
// After initial fit, compute residuals r_i
// Estimate scale: s = MAD(residuals) × 1.4826
// Robustness weights:
w_i = (1 - (r_i / (6s))²)²   for |r_i| < 6s, else 0
```
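A minimal sketch of this reweighting step under the scale estimate above (function name is illustrative; the crate's internals may differ):

```rust
/// Bisquare robustness weights from residuals, using the MAD scale
/// with the 1.4826 consistency factor, as described above.
fn bisquare_weights(residuals: &[f64]) -> Vec<f64> {
    let mut abs: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    abs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mad = abs[abs.len() / 2]; // median of |r_i|
    let s = (mad * 1.4826).max(f64::EPSILON); // guard against zero scale
    residuals
        .iter()
        .map(|r| {
            let u = r / (6.0 * s);
            if u.abs() < 1.0 { (1.0 - u * u).powi(2) } else { 0.0 }
        })
        .collect()
}

fn main() {
    let r = [0.1, -0.2, 0.05, 10.0, -0.1]; // one gross outlier
    let w = bisquare_weights(&r);
    assert_eq!(w[3], 0.0);             // outlier gets zero weight
    assert!(w[0] > 0.9 && w[2] > 0.9); // small residuals keep near-full weight
    println!("{:?}", w);
}
```

Each IRLS iteration multiplies the kernel weights by these robustness weights and refits, so gross outliers stop influencing the curve.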
## Performance

### Complexity

- Basic: O(n²) for n data points
- With delta: effectively O(n × k), where k ≪ n for dense data
- Parallel CV: near-linear speedup with the number of available cores
### Numerical Stability
The implementation includes several safeguards:
- Scale estimation fallbacks: MAD → mean absolute residual when MAD ≈ 0
- Minimum tuned scales: Clamped to ε > 0 to avoid division by zero
- Zero-weight neighborhoods: Configurable fallback policies
- Uniform weight fallback: When all kernel weights evaluate to zero
- Auto-convergence: Prevents excessive iterations in stable fits
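The scale-estimation fallback in particular can be sketched as follows (illustrative, not the crate's internals; the threshold value is an assumption):

```rust
/// Robust scale estimate with a defensive fallback: MAD first, then
/// the mean absolute residual when MAD collapses to ~0, clamped > 0.
fn robust_scale(residuals: &[f64]) -> f64 {
    let mut abs: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    abs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mad = abs[abs.len() / 2] * 1.4826;
    if mad > 1e-12 {
        mad
    } else {
        // MAD ≈ 0 (more than half the residuals are zero): fall back
        // to the mean absolute residual, clamped away from zero.
        let mean_abs = abs.iter().sum::<f64>() / abs.len() as f64;
        mean_abs.max(f64::EPSILON)
    }
}

fn main() {
    // More than half the residuals are exactly zero, so MAD is zero,
    // but the fallback still yields a usable positive scale.
    assert!(robust_scale(&[0.0, 0.0, 0.0, 1.0]) > 0.0);
    println!("{}", robust_scale(&[1.0, -1.0, 2.0]));
}
```

Without such a fallback, a near-zero scale would turn the bisquare weights into a divide-by-zero and zero out the entire fit.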
## Comparison with R

This implementation is designed to match R's stats::lowess():

```r
# R
result <- lowess(x, y, f = 0.5, iter = 3, delta = 0.01)
```

```rust
// Rust (equivalent)
let result = Lowess::new()
    .fraction(0.5)
    .iterations(3)
    .delta(0.01)
    .fit(&x, &y)?;
```
Key differences:
- ✨ More kernel options beyond tricube
- 📊 Statistical intervals and diagnostics
- 🔧 Cross-validation built-in
- 🚀 Parallel execution support
- 💾 Streaming variants for large data
## Error Handling

```rust
use lowess::Lowess;

match Lowess::new().fit(&x, &y) {
    Ok(result) => println!("smoothed {} points", x.len()),
    Err(e) => eprintln!("fit failed: {e}"),
}
```
## Testing

```sh
# Run all tests
cargo test

# Run with all features
cargo test --all-features

# Run benchmarks
cargo bench

# Check documentation
cargo doc --no-deps --open
```
## Production Usage Guidelines

### Best Practices

- Pre-clean inputs: Remove NaNs/infs before calling fit()
- Sort data: Pre-sort x for reproducible window semantics
- Choose an appropriate fraction:
  - 0.2-0.3 for very local features
  - 0.5-0.7 for general trends
  - use cross-validation when uncertain
- Enable diagnostics: Monitor RMSE and effective_df in production
- Use delta: Enable for dense data (>1000 points)
- Tune robustness: 2-3 iterations are sufficient for most data
### Monitoring

```rust
let result = Lowess::new()
    .with_all_diagnostics()
    .fit(&x, &y)?;

// Field names are illustrative; see the API docs for the exact type.
if let Some(diag) = result.diagnostics {
    println!("RMSE: {:.4}, effective df: {:.1}", diag.rmse, diag.effective_df);
}
```
## Contributing
Contributions are welcome! Areas of interest:
- Additional kernel functions
- More cross-validation strategies
- GPU acceleration
- Python bindings (PyO3)
- Additional examples and tutorials
Please see CONTRIBUTING.md for guidelines on reporting bugs, submitting pull requests, testing, and the development workflow.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## References

### Academic Papers
- Cleveland, W. S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". *Journal of the American Statistical Association* 74(368): 829-836. DOI: 10.2307/2286407
- Cleveland, W. S. (1981). "LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression". *The American Statistician* 35(1): 54. DOI: 10.2307/2683591
### Related Implementations
- R stats::lowess - Original R implementation
- Python statsmodels.lowess
## Citation
If you use this crate in academic work, please cite:
## Author
Amir Valizadeh
📧 thisisamirv@gmail.com
## Acknowledgments
- Based on Cleveland's original LOWESS algorithm
- Inspired by implementations in R and Python statsmodels
- Built with the Rust scientific computing ecosystem