# lowess
A production-grade Rust implementation of LOWESS (Locally Weighted Scatterplot Smoothing) with robust statistics, parallel execution, and streaming capabilities for large datasets.
## What is LOWESS?
LOWESS is a non-parametric regression method that fits smooth curves through scatter plots using locally weighted polynomial regression. Originally developed by Cleveland (1979), it's widely used in:
- 🧬 Genomics & Bioinformatics: DNA methylation analysis, ChIP-seq normalization, gene expression smoothing
- 📊 Time Series Analysis: Trend extraction, seasonal adjustment, noise reduction
- 🔬 Scientific Computing: Exploratory data analysis, visualization, preprocessing
- 📈 Signal Processing: Baseline correction, peak detection, data denoising
## Features

### Core Capabilities
- ✨ Robust Smoothing: Iteratively reweighted least squares (IRLS) with multiple kernel functions
- 🎯 Confidence & Prediction Intervals: Per-point standard errors and statistical intervals
- 🔧 Multiple Kernels: Tricube (default), Epanechnikov, Gaussian, Uniform, Quartic, Cosine, Triangle
- 📏 Automatic Fraction Selection: Cross-validation (simple RMSE, k-fold, LOOCV)
- ⚡ Delta Optimization: Fast interpolation for dense data
### Advanced Features
- 🚀 Parallel Execution: Optional multi-threaded CV and fitting (feature = "parallel")
- 💾 Streaming Processing: Memory-efficient variants for large datasets
- 📊 Rich Diagnostics: RMSE, MAE, R², AIC, AICc, effective degrees of freedom
- 🔢 ndarray Integration: Seamless conversion to/from ndarray (feature = "ndarray")
- 🎛️ no_std Compatible: Works in embedded environments with alloc
### Quality & Reliability
- ✅ Type-safe builder pattern with compile-time validation
- 🛡️ Comprehensive error handling (no panics in release builds)
- 📐 Numerically stable with defensive fallbacks
- 🔁 Deterministic outputs for reproducible research
- 📚 Extensively documented with production guidance
## Installation

Add to your Cargo.toml:

```toml
[dependencies]
lowess = "0.1"
```

```toml
# Optional features
[dependencies.lowess]
version = "0.1"
features = ["parallel", "ndarray"]
```
### Feature Flags

| Feature | Description | Dependencies |
|---|---|---|
| `parallel` | Enable parallel cross-validation and fitting | `rayon` |
| `ndarray` | Integration with ndarray arrays | `ndarray` |
| `std` | Standard library support (enabled by default) | - |
## Quick Start

### Basic Smoothing

```rust
use lowess::Lowess;

// Illustrative data values.
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let y = vec![1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8];

// Simple smoothing with defaults (fraction=0.67, no robustness iterations)
let result = Lowess::new()
    .fit(&x, &y)
    .unwrap();

println!("{:?}", result);
```
### Robust Smoothing with Outliers

```rust
use lowess::Lowess;

// Illustrative data values.
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0];
let y = vec![1.0, 2.1, 2.9, 20.0, 5.1, 6.0, 7.2]; // outlier at index 3

// Robust smoothing with 5 IRLS iterations
let result = Lowess::new()
    .iterations(5)
    .fit(&x, &y)
    .unwrap();
```
### With Confidence Intervals

```rust
use lowess::Lowess;

let result = Lowess::new()
    .fraction(0.5)
    .iterations(3)
    .with_confidence_intervals(0.95)
    .with_all_diagnostics()
    .fit(&x, &y)
    .unwrap();

// Inspect the smoothed values, standard errors, and interval bounds
// (field names are illustrative; see the API docs for the exact result type).
println!("{:?}", result.y_smooth);
println!("{:?}", result.standard_errors);
println!("{:?}", result.confidence_intervals);
```
### Automatic Fraction Selection

```rust
use lowess::Lowess;

// Cross-validate to find optimal smoothing fraction
let candidates = vec![0.2, 0.3, 0.4, 0.5, 0.67];
let result = Lowess::new()
    .cross_validate(&candidates)
    .fit(&x, &y)
    .unwrap();

// Report the selected fraction and its CV score
// (field names are illustrative; see the API docs for the exact result type).
println!("{:?}", result.selected_fraction);
println!("{:?}", result.cv_scores);
```
### Streaming for Large Datasets

```rust
use lowess::Lowess; // streaming type name below is assumed

// Process data in chunks to avoid memory issues
let base_config = Lowess::new().fraction(0.3).iterations(2);
let mut streaming = StreamingLowess::new(base_config);

// Process chunks
let chunk1_result = streaming.process_chunk(&x1, &y1)?;
let chunk2_result = streaming.process_chunk(&x2, &y2)?;
```
### Online Smoothing (Sliding Window)

```rust
use lowess::Lowess; // online type name below is assumed

// Real-time smoothing with sliding window
let base_config = Lowess::new().fraction(0.3);
let mut online = OnlineLowess::new(base_config, 50); // window size
for (&xi, &yi) in x.iter().zip(y.iter()) {
    let smoothed = online.update(xi, yi);
}
```
## API Overview

### Builder Pattern

```rust
// All settings shown together; argument values are illustrative.
let result = Lowess::new()
    .fraction(0.5)                     // Smoothing span (0, 1]
    .iterations(3)                     // Robustness iterations
    .delta(0.01)                       // Interpolation threshold
    .kernel(Kernel::Tricube)
    .with_confidence_intervals(0.95)
    .with_prediction_intervals(0.95)
    .with_all_diagnostics()
    .cross_validate(&[0.3, 0.5, 0.7])
    .auto_converge(1e-4)               // Early stopping
    .max_iterations(20)
    .fit(&x, &y)?;
```
### Result Structure

The fit returns the smoothed values along with optional standard errors, intervals, and diagnostics (see the examples above).

### Diagnostics

Available diagnostics include RMSE, MAE, R², AIC, AICc, and effective degrees of freedom.
## Algorithm Details

### LOWESS Procedure

1. Sorting: Data sorted by x-values for deterministic windowing
2. Local Neighborhoods: For each point, select the k nearest neighbors (k ≈ fraction × n)
3. Kernel Weighting: Apply a distance-based kernel (default: tricube)
4. Weighted Regression: Fit locally weighted least squares
5. Robustness Iterations (optional): Reweight using the bisquare function on residuals
6. Delta Interpolation (optional): Linearly interpolate between anchor points
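The steps above can be sketched as a direct O(n²) single pass, without robustness iterations or delta interpolation. This is an illustrative simplification, not the crate's actual code; the function name and fallbacks are assumptions.

```rust
/// Minimal single-pass LOWESS sketch: local linear fit with tricube weights.
/// Assumes x is sorted and finite; illustrative only.
fn lowess_pass(x: &[f64], y: &[f64], fraction: f64) -> Vec<f64> {
    let n = x.len();
    let k = ((fraction * n as f64).ceil() as usize).clamp(2, n);
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Step 2: select the k nearest neighbors of x[i].
        let mut idx: Vec<usize> = (0..n).collect();
        idx.sort_by(|&a, &b| {
            (x[a] - x[i]).abs().partial_cmp(&(x[b] - x[i]).abs()).unwrap()
        });
        let neighbors = &idx[..k];
        let h = (x[neighbors[k - 1]] - x[i]).abs().max(f64::EPSILON);
        // Step 3: tricube kernel weights on distances scaled by the radius h.
        let (mut sw, mut swx, mut swy, mut swxx, mut swxy) = (0.0, 0.0, 0.0, 0.0, 0.0);
        for &j in neighbors {
            let d = ((x[j] - x[i]) / h).abs();
            let w = if d < 1.0 { (1.0 - d.powi(3)).powi(3) } else { 0.0 };
            sw += w; swx += w * x[j]; swy += w * y[j];
            swxx += w * x[j] * x[j]; swxy += w * x[j] * y[j];
        }
        // Step 4: weighted least-squares line; fall back to the weighted
        // mean when the neighborhood is degenerate.
        let denom = sw * swxx - swx * swx;
        out.push(if denom.abs() > 1e-12 {
            let b = (sw * swxy - swx * swy) / denom;
            let a = (swy - b * swx) / sw;
            a + b * x[i]
        } else {
            swy / sw
        });
    }
    out
}

fn main() {
    let x: Vec<f64> = (0..20).map(|i| i as f64).collect();
    let y: Vec<f64> = x.iter().map(|v| 2.0 * v + 1.0).collect();
    let s = lowess_pass(&x, &y, 0.5);
    // A local linear fit reproduces an exactly linear signal.
    assert!(s.iter().zip(y.iter()).all(|(a, b)| (a - b).abs() < 1e-6));
    println!("smoothed head: {:?}", &s[..3]);
}
```

The delta optimization replaces the per-point fit with anchor fits plus linear interpolation, which is what reduces the cost on dense data.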
### Weight Functions

```text
// Tricube (Cleveland's default)
w = (1 - |d|³)³   for |d| < 1, else 0

// Epanechnikov
w = 1 - d²        for |d| < 1, else 0

// Gaussian
w = exp(-d² / 2)
```
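As a sketch, these kernel formulas translate directly to Rust (unnormalized weights over the scaled distance d; function names are illustrative, not the crate's API):

```rust
/// Tricube: Cleveland's default kernel, compact support on |d| < 1.
fn tricube(d: f64) -> f64 {
    let d = d.abs();
    if d < 1.0 { (1.0 - d.powi(3)).powi(3) } else { 0.0 }
}

/// Epanechnikov (unnormalized), compact support on |d| < 1.
fn epanechnikov(d: f64) -> f64 {
    let d = d.abs();
    if d < 1.0 { 1.0 - d * d } else { 0.0 }
}

/// Gaussian: unbounded support, so every neighbor keeps some weight.
fn gaussian(d: f64) -> f64 {
    (-0.5 * d * d).exp()
}

fn main() {
    // Full weight at distance zero, decaying with |d|.
    assert!((tricube(0.0) - 1.0).abs() < 1e-12);
    assert_eq!(tricube(1.0), 0.0);
    assert!(epanechnikov(0.5) > epanechnikov(0.9));
    assert!(gaussian(2.0) < gaussian(1.0));
    println!("tricube(0.5) = {}", tricube(0.5));
}
```

Compact-support kernels (tricube, Epanechnikov) zero out the farthest neighbors, which is what makes the fit truly local.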
### Robustness Weighting (Bisquare)

```text
// After initial fit, compute residuals r_i
// Estimate scale: s = MAD(residuals) × 1.4826
// Robustness weights:
w_i = (1 - (r_i / (6s))²)²   for |r_i| < 6s, else 0
```
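A minimal sketch of this reweighting step under the scale estimate above (function name is illustrative; the crate's internals may differ):

```rust
/// Bisquare robustness weights from residuals, using the MAD scale
/// with the 1.4826 consistency factor, as described above.
fn bisquare_weights(residuals: &[f64]) -> Vec<f64> {
    let mut abs: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    abs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mad = abs[abs.len() / 2]; // median of |r_i|
    let s = (mad * 1.4826).max(f64::EPSILON); // guard against zero scale
    residuals
        .iter()
        .map(|r| {
            let u = r / (6.0 * s);
            if u.abs() < 1.0 { (1.0 - u * u).powi(2) } else { 0.0 }
        })
        .collect()
}

fn main() {
    let r = [0.1, -0.2, 0.05, 10.0, -0.1]; // one gross outlier
    let w = bisquare_weights(&r);
    assert_eq!(w[3], 0.0);             // outlier gets zero weight
    assert!(w[0] > 0.9 && w[2] > 0.9); // small residuals keep near-full weight
    println!("{:?}", w);
}
```

Each IRLS iteration multiplies the kernel weights by these robustness weights and refits, so gross outliers stop influencing the curve.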
## Performance

### Complexity

- Basic: O(n²) for n data points
- With delta: effectively O(n × k), where k ≪ n for dense data
- Parallel CV: near-linear speedup with the number of available cores
### Numerical Stability
The implementation includes several safeguards:
- Scale estimation fallbacks: MAD → mean absolute residual when MAD ≈ 0
- Minimum tuned scales: Clamped to ε > 0 to avoid division by zero
- Zero-weight neighborhoods: Configurable fallback policies
- Uniform weight fallback: When all kernel weights evaluate to zero
- Auto-convergence: Prevents excessive iterations in stable fits
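The scale-estimation fallback in particular can be sketched as follows (illustrative, not the crate's internals; the threshold value is an assumption):

```rust
/// Robust scale estimate with a defensive fallback: MAD first, then
/// the mean absolute residual when MAD collapses to ~0, clamped > 0.
fn robust_scale(residuals: &[f64]) -> f64 {
    let mut abs: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    abs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mad = abs[abs.len() / 2] * 1.4826;
    if mad > 1e-12 {
        mad
    } else {
        // MAD ≈ 0 (more than half the residuals are zero): fall back
        // to the mean absolute residual, clamped away from zero.
        let mean_abs = abs.iter().sum::<f64>() / abs.len() as f64;
        mean_abs.max(f64::EPSILON)
    }
}

fn main() {
    // More than half the residuals are exactly zero, so MAD is zero,
    // but the fallback still yields a usable positive scale.
    assert!(robust_scale(&[0.0, 0.0, 0.0, 1.0]) > 0.0);
    println!("{}", robust_scale(&[1.0, -1.0, 2.0]));
}
```

Without such a fallback, a near-zero scale would turn the bisquare weights into a divide-by-zero and zero out the entire fit.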
## Comparison with R

This implementation is designed to match R's stats::lowess():

```r
# R
result <- lowess(x, y, f = 0.5, iter = 3, delta = 0.01)
```

```rust
// Rust (equivalent)
let result = Lowess::new()
    .fraction(0.5)
    .iterations(3)
    .delta(0.01)
    .fit(&x, &y)?;
```
Key differences:
- ✨ More kernel options beyond tricube
- 📊 Statistical intervals and diagnostics
- 🔧 Cross-validation built-in
- 🚀 Parallel execution support
- 💾 Streaming variants for large data
## Error Handling

```rust
use lowess::Lowess;

match Lowess::new().fit(&x, &y) {
    Ok(result) => println!("smoothed {} points", x.len()),
    Err(e) => eprintln!("fit failed: {e}"),
}
```
## Testing

```sh
# Run all tests
cargo test

# Run with all features
cargo test --all-features

# Run benchmarks
cargo bench

# Check documentation
cargo doc --no-deps --open
```
## Production Usage Guidelines

### Best Practices

- Pre-clean inputs: Remove NaNs/infs before calling fit()
- Sort data: Pre-sort x for reproducible window semantics
- Choose an appropriate fraction:
  - 0.2-0.3 for very local features
  - 0.5-0.7 for general trends
  - use cross-validation when uncertain
- Enable diagnostics: Monitor RMSE and effective_df in production
- Use delta: Enable for dense data (>1000 points)
- Tune robustness: 2-3 iterations are sufficient for most data
### Monitoring

```rust
let result = Lowess::new()
    .with_all_diagnostics()
    .fit(&x, &y)?;

// Field names are illustrative; see the API docs for the exact type.
if let Some(diag) = result.diagnostics {
    println!("RMSE: {:.4}, effective df: {:.1}", diag.rmse, diag.effective_df);
}
```
## Contributing
Contributions are welcome! Areas of interest:
- Additional kernel functions
- More cross-validation strategies
- GPU acceleration
- Python bindings (PyO3)
- Additional examples and tutorials
Please see CONTRIBUTING.md for guidelines on reporting bugs, submitting pull requests, testing, and the development workflow.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## References

### Academic Papers
- Cleveland, W. S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". *Journal of the American Statistical Association* 74(368): 829-836. DOI: 10.2307/2286407
- Cleveland, W. S. (1981). "LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression". *The American Statistician* 35(1): 54. DOI: 10.2307/2683591
### Related Implementations
- R stats::lowess - Original R implementation
- Python statsmodels.lowess
## Citation
If you use this crate in academic work, please cite:
## Author
Amir Valizadeh
📧 thisisamirv@gmail.com
## Acknowledgments
- Based on Cleveland's original LOWESS algorithm
- Inspired by implementations in R and Python statsmodels
- Built with the Rust scientific computing ecosystem