lowess 0.2.0

LOWESS (Locally Weighted Scatterplot Smoothing) implementation in Rust

High-performance LOWESS (Locally Weighted Scatterplot Smoothing) for Rust — 40-500× faster than Python's statsmodels with robust statistics, confidence intervals, and parallel execution.

Why This Crate?

  • ⚡ Blazingly Fast: 40-500× faster than statsmodels, sub-millisecond smoothing for 1000 points
  • 🎯 Production-Ready: Comprehensive error handling, numerical stability, extensive testing
  • 📊 Feature-Rich: Confidence/prediction intervals, multiple kernels, cross-validation
  • 🚀 Scalable: Parallel execution, streaming mode, delta optimization
  • 🔬 Scientific: Validated against R and Python implementations
  • 🛠️ Flexible: no_std support, ndarray integration, multiple robustness methods

Quick Start

use lowess::Lowess;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];

// Basic smoothing
let result = Lowess::new()
    .fraction(0.5)
    .fit(&x, &y)
    .unwrap();

println!("Smoothed: {:?}", result.y);

Installation

[dependencies]
lowess = "0.2"

# Or, with optional features:
lowess = { version = "0.2", features = ["parallel", "ndarray"] }

Features at a Glance

| Feature | Description | Use Case |
|---|---|---|
| Robust Smoothing | IRLS with Bisquare/Huber/Talwar weights | Outlier-contaminated data |
| Confidence Intervals | Point-wise standard errors & bounds | Uncertainty quantification |
| Cross-Validation | Auto-select optimal fraction | Unknown smoothing parameter |
| Multiple Kernels | Tricube, Epanechnikov, Gaussian, etc. | Different smoothness profiles |
| Parallel Execution | Multi-threaded via Rayon | Large datasets (n > 1000) |
| Streaming Mode | Constant memory usage | Very large datasets |
| Delta Optimization | Skip dense regions | 10× speedup on dense data |

Common Use Cases

1. Robust Smoothing (Handle Outliers)

let result = Lowess::new()
    .fraction(0.3)
    .iterations(5)                // Robust iterations
    .with_robustness_weights()    // Return outlier weights
    .fit(&x, &y)?;

// Check which points were downweighted
if let Some(weights) = result.robustness_weights {
    for (i, &w) in weights.iter().enumerate() {
        if w < 0.1 {
            println!("Point {} is likely an outlier", i);
        }
    }
}

2. Uncertainty Quantification

let result = Lowess::new()
    .fraction(0.5)
    .with_confidence_intervals(0.95)
    .with_prediction_intervals(0.95)
    .fit(&x, &y)?;

// Plot confidence bands (bind the interval vectors once; calling
// .unwrap() on the Option inside the loop would move it on the first pass)
let lower = result.confidence_lower.as_ref().unwrap();
let upper = result.confidence_upper.as_ref().unwrap();
for i in 0..x.len() {
    println!("x={:.1}: y={:.2} CI=[{:.2}, {:.2}]",
        result.x[i],
        result.y[i],
        lower[i],
        upper[i]
    );
}

3. Automatic Parameter Selection

// Let cross-validation find the optimal smoothing fraction
let result = Lowess::new()
    .cross_validate(&[0.2, 0.3, 0.5, 0.7])
    .fit(&x, &y)?;

println!("Optimal fraction: {}", result.fraction_used);
println!("CV RMSE scores: {:?}", result.cv_scores);

4. Large Dataset Optimization

// Enable all performance optimizations
let result = Lowess::new()
    .fraction(0.3)
    .delta_auto()       // Skip dense regions
    .parallel()         // Multi-threaded (requires "parallel" feature)
    .fit(&large_x, &large_y)?;

5. Production Monitoring

let result = Lowess::new()
    .fraction(0.5)
    .iterations(3)
    .with_diagnostics()
    .fit(&x, &y)?;

if let Some(diag) = result.diagnostics {
    println!("RMSE: {:.4}", diag.rmse);
    println!("R²: {:.4}", diag.r_squared);

    // Quality checks (effective_df is optional, so avoid a bare unwrap)
    if let Some(edf) = diag.effective_df {
        println!("Effective DF: {:.2}", edf);
        if edf < 2.0 {
            eprintln!("Warning: Very low degrees of freedom");
        }
    }
}

Performance Benchmarks

Comparison against Python's statsmodels on typical workloads:

| Dataset Size | statsmodels | Rust (sequential) | Rust (parallel) | Sequential Speedup | Parallel Speedup |
|---|---|---|---|---|---|
| 100 points | 2.4 ms | 0.09 ms | 0.10 ms | 27× | 24× |
| 1,000 points | 32.5 ms | 0.80 ms | 0.81 ms | 41× | 40× |
| 5,000 points | 332 ms | 4.1 ms | 4.1 ms | 81× | 81× |
| 10,000 points | 1,073 ms | 8.2 ms | 8.2 ms | 131× | 245× |

Performance Summary

  • Sequential mode: 35-48× faster on average across all test scenarios
  • Parallel mode: 51-76× faster on average, with 1.5-2× additional speedup from parallelization
  • Pathological cases (clustered data, extreme outliers): 260-525× faster
  • Small fractions (0.1 span): 80-114× faster due to localized computation
  • Robustness iterations: 38-77× faster with consistent scaling across iteration counts

When Parallelization Helps Most

Parallel execution shows the greatest gains on:

  • Large datasets (>10,000 points): Up to 245× vs 131× sequential
  • Multiple robustness iterations: 70-77× speedup vs statsmodels
  • Small span values: 114× speedup for fraction=0.1
  • Cross-validation: Linear scaling with available CPU cores

For datasets under 1,000 points, sequential mode is typically sufficient, since parallelization overhead outweighs the benefits.
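That rule of thumb can be captured in a small helper when wiring up a pipeline (the 1,000-point cutoff is the heuristic above, not a crate constant):

```rust
/// Rule of thumb from the benchmarks above: thread overhead dominates
/// below ~1,000 points, so only opt into parallel execution beyond that.
fn worth_parallelizing(n_points: usize) -> bool {
    n_points > 1_000
}

fn main() {
    assert!(!worth_parallelizing(500));   // small data: stay sequential
    assert!(worth_parallelizing(50_000)); // large data: call .parallel()
}
```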

Benchmarks conducted on dual Intel Xeon Platinum 8562Y+ (64 cores total, 2×32 cores @ 4.1 GHz) running Red Hat Enterprise Linux 8.10. See validation/ directory for detailed methodology and reproducible test scripts.

API Overview

Builder Methods

Lowess::new()
    // Core parameters
    .fraction(0.5)                  // Smoothing span (0, 1], default: 0.67
    .iterations(3)                  // Robustness iterations, default: 0
    .delta(Some(0.01))              // Interpolation threshold
    .delta_auto()                   // Auto-calculate delta
    
    // Kernel selection
    .weight_function(WeightFunction::Tricube)  // Default
    
    // Robustness method
    .robustness_method(RobustnessMethod::Bisquare)  // Default
    
    // Intervals & diagnostics
    .with_confidence_intervals(0.95)
    .with_prediction_intervals(0.95)
    .with_both_intervals(0.95)
    .with_diagnostics()
    .with_robustness_weights()
    
    // Parameter selection
    .cross_validate(&[0.3, 0.5, 0.7])
    .cross_validate_kfold(&[0.3, 0.5, 0.7], 5)
    .cross_validate_loocv(&[0.3, 0.5, 0.7])
    
    // Convergence
    .auto_converge(1e-4)
    .max_iterations(20)
    
    // Performance
    .parallel()                     // Requires "parallel" feature
    
    .fit(&x, &y)?

Result Structure

pub struct LowessResult<T> {
    pub x: Vec<T>,                          // Sorted x values
    pub y: Vec<T>,                          // Smoothed y values
    pub standard_errors: Option<Vec<T>>,    // Point-wise SE
    pub confidence_lower: Option<Vec<T>>,   // 95% CI lower
    pub confidence_upper: Option<Vec<T>>,   // 95% CI upper
    pub prediction_lower: Option<Vec<T>>,   // 95% PI lower
    pub prediction_upper: Option<Vec<T>>,   // 95% PI upper
    pub residuals: Option<Vec<T>>,          // y - fitted
    pub robustness_weights: Option<Vec<T>>, // Final IRLS weights
    pub diagnostics: Option<Diagnostics<T>>,
    pub iterations_used: Option<usize>,     // Actual iterations
    pub fraction_used: T,                   // Selected fraction
    pub cv_scores: Option<Vec<T>>,          // CV RMSE per fraction
}

Advanced Features

Streaming Processing

For datasets too large to fit in memory:

use lowess::{Lowess, adapters::streaming::StreamingLowess};

let config = Lowess::new().fraction(0.3).iterations(2);
let mut streaming = StreamingLowess::new(config, 1000);  // Chunk size

for chunk in data_chunks {
    let result = streaming.process_chunk(&chunk.x, &chunk.y)?;
    // Process result...
}

Online/Incremental Updates

Real-time smoothing with sliding window:

use lowess::{Lowess, adapters::streaming::OnlineLowess};

let config = Lowess::new().fraction(0.2);
let mut online = OnlineLowess::new(config, 100);  // Window size

for (&xi, &yi) in x.iter().zip(y.iter()) {
    let smoothed = online.update(xi, yi)?;
    println!("Current value: {}", smoothed);
}

ndarray Integration

use lowess::Lowess;
use ndarray::Array1;

let x: Array1<f64> = Array1::linspace(0.0, 10.0, 100);
let y: Array1<f64> = x.mapv(|xi| xi.sin()) + 0.1;

let result = Lowess::new()
    .fraction(0.3)
    .fit(x.as_slice().unwrap(), y.as_slice().unwrap())?;

// Convert back to ndarray
let smoothed = Array1::from(result.y);

Parameter Selection Guide

Fraction (Smoothing Span)

  • 0.1-0.3: Local, captures rapid changes (wiggly)
  • 0.4-0.6: Balanced, general-purpose
  • 0.7-1.0: Global, smooth trends only
  • Default: 0.67 (2/3, Cleveland's choice)
  • Use CV when uncertain
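To build intuition for these ranges: the fraction determines roughly how many neighbors enter each local fit. A sketch of the usual q ≈ ⌈fraction · n⌉ convention (not necessarily the crate's exact windowing):

```rust
/// Approximate neighborhood size per local fit: q ≈ ⌈fraction · n⌉,
/// clamped so at least two points are always used.
fn neighborhood_size(fraction: f64, n: usize) -> usize {
    ((fraction * n as f64).ceil() as usize).max(2)
}

fn main() {
    // A span of 0.5 over 200 points fits each location on ~100 neighbors.
    assert_eq!(neighborhood_size(0.5, 200), 100);
    // A wiggly span of 0.25 over 100 points uses only ~25.
    assert_eq!(neighborhood_size(0.25, 100), 25);
}
```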

Robustness Iterations

  • 0: Clean data, speed critical
  • 1-2: Light contamination
  • 3: Default, good balance (recommended)
  • 4-5: Heavy outliers
  • >5: Diminishing returns
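Each robustness iteration reweights points by their residuals. A sketch of the Cleveland-style bisquare weight (residuals beyond six times the residual scale are zeroed out) shows why a few passes usually suffice; this illustrates the idea, not the crate's internals:

```rust
/// Bisquare robustness weight as in Cleveland-style IRLS:
/// w(r) = (1 − (r / 6s)²)² for |r| < 6s, else 0,
/// where s is a robust scale estimate (e.g. the median absolute residual).
fn bisquare_weight(residual: f64, scale: f64) -> f64 {
    let u = residual / (6.0 * scale);
    if u.abs() >= 1.0 { 0.0 } else { (1.0 - u * u).powi(2) }
}

fn main() {
    // A zero residual keeps full weight…
    assert!((bisquare_weight(0.0, 1.0) - 1.0).abs() < 1e-12);
    // …while a gross outlier (|r| ≥ 6·scale) is dropped entirely.
    assert_eq!(bisquare_weight(7.0, 1.0), 0.0);
}
```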

Kernel Function

  • Tricube (default): Best all-around, smooth, efficient
  • Epanechnikov: Theoretically optimal MSE
  • Gaussian: Very smooth, no compact support
  • Uniform: Fastest, least smooth (moving average)
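The default and the MSE-optimal options have simple closed forms. A sketch of the standard textbook definitions these WeightFunction variants refer to:

```rust
/// Tricube kernel (the default): w(u) = (1 − |u|³)³ on |u| < 1, else 0.
fn tricube(u: f64) -> f64 {
    let a = u.abs();
    if a >= 1.0 { 0.0 } else { (1.0 - a.powi(3)).powi(3) }
}

/// Epanechnikov kernel: w(u) = ¾(1 − u²) on |u| < 1, else 0.
fn epanechnikov(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { 0.75 * (1.0 - u * u) }
}

fn main() {
    // Both kernels peak at the target point…
    assert!((tricube(0.0) - 1.0).abs() < 1e-12);
    assert!((epanechnikov(0.0) - 0.75).abs() < 1e-12);
    // …and vanish at the edge of the local window.
    assert_eq!(tricube(1.0), 0.0);
    assert_eq!(epanechnikov(1.0), 0.0);
}
```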

Delta Optimization

  • None: Small datasets (n < 1000)
  • Auto: Let the algorithm decide (recommended)
  • Manual: ~0.01 × range(x) for dense data
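The manual rule of thumb translates directly into code (a sketch; the resulting value would then be passed to `.delta(Some(d))` on the builder):

```rust
/// Suggested manual delta for dense data: ~0.01 × range(x).
fn suggested_delta(x: &[f64]) -> f64 {
    let min = x.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    0.01 * (max - min)
}

fn main() {
    let x: Vec<f64> = (0..=100).map(|i| i as f64).collect();
    // range(x) = 100, so the suggested delta is 1.0.
    assert!((suggested_delta(&x) - 1.0).abs() < 1e-12);
}
```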

Error Handling

use lowess::{Lowess, LowessError};

match Lowess::new().fit(&x, &y) {
    Ok(result) => {
        println!("Success: {:?}", result.y);
    },
    Err(LowessError::EmptyInput) => {
        eprintln!("Empty input arrays");
    },
    Err(LowessError::MismatchedInputs { x_len, y_len }) => {
        eprintln!("Length mismatch: x={}, y={}", x_len, y_len);
    },
    Err(LowessError::InvalidFraction(f)) => {
        eprintln!("Invalid fraction: {} (must be in (0, 1])", f);
    },
    Err(e) => {
        eprintln!("Error: {}", e);
    }
}

Feature Flags

  • std (default): Standard library support
  • parallel: Enable Rayon-based parallelization (adds rayon dependency)
  • ndarray: Enable ndarray integration (adds ndarray dependency)
  • full: Enable all optional features

# Minimal (no_std with alloc)
lowess = { version = "0.2", default-features = false }

# All features
lowess = { version = "0.2", features = ["full"] }

Validation

This implementation has been extensively validated against:

  1. R's stats::lowess: Numerical agreement to machine precision
  2. Python's statsmodels: Validated on 44 test scenarios
  3. Cleveland's original paper: Reproduces published examples

See validation/ directory for cross-language comparison scripts.

MSRV (Minimum Supported Rust Version)

Rust 1.85.0 or later.

Contributing

Contributions welcome! See CONTRIBUTING.md for:

  • Bug reports and feature requests
  • Pull request guidelines
  • Development workflow
  • Testing requirements

License

MIT License - see LICENSE file.

References

Original papers:

  • Cleveland, W.S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". Journal of the American Statistical Association, 74(368): 829-836. DOI:10.2307/2286407

  • Cleveland, W.S. (1981). "LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression". The American Statistician, 35(1): 54.

Related implementations:

  • R: stats::lowess (base R)
  • Python: statsmodels.nonparametric.smoothers_lowess.lowess

Citation

@software{lowess_rust_2025,
  author = {Valizadeh, Amir},
  title = {lowess: High-performance LOWESS for Rust},
  year = {2025},
  url = {https://github.com/thisisamirv/lowess},
  version = {0.2.0}
}

Author

Amir Valizadeh
📧 thisisamirv@gmail.com
🔗 GitHub


Keywords: LOWESS, LOESS, local regression, nonparametric regression, smoothing, robust statistics, time series, bioinformatics, genomics, signal processing