lowess
A high-performance implementation of LOWESS (Locally Weighted Scatterplot Smoothing) in Rust. This crate provides a robust, production-ready implementation with support for confidence intervals, multiple kernel functions, and optimized execution modes.
[!IMPORTANT] For parallelization or ndarray support, use fastLowess.
Features
- Robust Statistics: IRLS with Bisquare, Huber, or Talwar weighting for outlier handling.
- Uncertainty Quantification: Point-wise standard errors, confidence intervals, and prediction intervals.
- Optimized Performance: Delta optimization for skipping dense regions and streaming/online modes for large or real-time datasets.
- Parameter Selection: Built-in cross-validation for automatic smoothing fraction selection.
- Flexibility: Multiple weight kernels (Tricube, Epanechnikov, etc.) and no_std support (requires alloc).
- Validated: Numerical agreement with R's stats::lowess.
Robustness Advantages
This implementation is more robust than R's lowess due to the following design choices:
MAD-Based Scale Estimation
For robustness weight calculations, this crate uses Median Absolute Deviation (MAD) for scale estimation:
s = median(|r_i - median(r)|)
In contrast, R's lowess uses the median of the absolute residuals, without centering:
s = median(|r_i|)
Why MAD is more robust:
- MAD has the highest possible breakdown point (50%): the estimate stays bounded as long as fewer than half of the residuals are outliers.
- The median-centering step removes asymmetric bias from residual distributions.
- MAD provides consistent outlier detection regardless of whether residuals are centered around zero.
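The difference between the two scale formulas above can be seen on residuals that are not centered at zero. The following is a minimal sketch, not the crate's internal code; the function names `median`, `mad_scale`, and `median_abs_scale` are hypothetical and the input is assumed to be non-empty and free of NaN.

```rust
/// Median of a non-empty, NaN-free slice (copies and sorts the data).
fn median(values: &[f64]) -> f64 {
    assert!(!values.is_empty(), "median of an empty slice is undefined");
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).expect("NaN in input"));
    let n = v.len();
    if n % 2 == 1 {
        v[n / 2]
    } else {
        0.5 * (v[n / 2 - 1] + v[n / 2])
    }
}

/// Centered scale: s = median(|r_i - median(r)|).
fn mad_scale(residuals: &[f64]) -> f64 {
    let center = median(residuals);
    let deviations: Vec<f64> = residuals.iter().map(|r| (r - center).abs()).collect();
    median(&deviations)
}

/// Uncentered scale: s = median(|r_i|).
fn median_abs_scale(residuals: &[f64]) -> f64 {
    let magnitudes: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    median(&magnitudes)
}
```

For residuals such as `[2.0, 2.1, 1.9, 2.2, 1.8]`, which share a common offset of about 2, the centered estimate is about 0.1 while the uncentered one is 2.0, so the offset leaks into the uncentered scale.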
Boundary Padding
This crate applies boundary policies (Extend, Reflect, Zero) at dataset edges:
- Extend: Repeats edge values to maintain local neighborhood size.
- Reflect: Mirrors data symmetrically around boundaries.
- Zero: Pads with zeros (useful for signal processing).
- NoBoundary: Disables padding (original Cleveland behavior).
R's lowess does not apply boundary padding, which can lead to:
- Biased estimates near boundaries due to asymmetric local neighborhoods.
- Increased variance at the edges of the smoothed curve.
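To make the padding policies concrete, here is a small sketch that pads a series of y values by `k` samples on each side. It is an illustration under stated assumptions, not the crate's implementation: the `Padding` enum and `pad` function are hypothetical names, only y values are padded, and Reflect is assumed to mirror around the edge sample without repeating it.

```rust
/// Hypothetical padding policies mirroring the names used above.
#[derive(Clone, Copy)]
enum Padding {
    Extend,
    Reflect,
    Zero,
}

/// Pad `y` with `k` extra samples on each side according to `policy`.
/// Requires `y.len() > k` so that reflection has enough samples to mirror.
fn pad(y: &[f64], k: usize, policy: Padding) -> Vec<f64> {
    let n = y.len();
    assert!(n > k, "need more samples than the pad width");
    let mut out = Vec::with_capacity(n + 2 * k);

    // Left pad, emitted in left-to-right order.
    for i in (1..=k).rev() {
        out.push(match policy {
            Padding::Extend => y[0],
            Padding::Reflect => y[i],
            Padding::Zero => 0.0,
        });
    }

    out.extend_from_slice(y);

    // Right pad.
    for i in 1..=k {
        out.push(match policy {
            Padding::Extend => y[n - 1],
            Padding::Reflect => y[n - 1 - i],
            Padding::Zero => 0.0,
        });
    }
    out
}
```

With `y = [1, 2, 3, 4]` and `k = 2`, Extend yields `[1, 1, 1, 2, 3, 4, 4, 4]`, Reflect yields `[3, 2, 1, 2, 3, 4, 3, 2]`, and Zero yields `[0, 0, 1, 2, 3, 4, 0, 0]`.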
Gaussian Consistency Factor
For interval estimation (confidence/prediction), residual scale is computed using:
sigma = 1.4826 * MAD
The factor 1.4826 = 1/Phi^-1(3/4) ensures consistency with the standard deviation under Gaussian assumptions.
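The constant can be checked numerically. The sketch below solves Phi(z) = 3/4 by bisection, using Simpson integration of the standard normal density as a stand-in for a proper CDF routine, and returns 1/z. All function names are hypothetical and the code is meant only as a verification aid.

```rust
/// Standard normal density.
fn normal_pdf(z: f64) -> f64 {
    (-0.5 * z * z).exp() / (2.0 * std::f64::consts::PI).sqrt()
}

/// Standard normal CDF for z >= 0, using composite Simpson integration on [0, z].
fn normal_cdf(z: f64) -> f64 {
    let n = 1000; // number of subintervals, must be even
    let h = z / n as f64;
    let mut sum = normal_pdf(0.0) + normal_pdf(z);
    for i in 1..n {
        let w = if i % 2 == 1 { 4.0 } else { 2.0 };
        sum += w * normal_pdf(i as f64 * h);
    }
    0.5 + sum * h / 3.0
}

/// Solve Phi(z) = 0.75 by bisection on [0, 1] and return 1 / z.
fn gaussian_consistency_factor() -> f64 {
    let (mut lo, mut hi) = (0.0_f64, 1.0_f64);
    for _ in 0..60 {
        let mid = 0.5 * (lo + hi);
        if normal_cdf(mid) < 0.75 {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    1.0 / (0.5 * (lo + hi))
}

/// Robust sigma estimate from a MAD value: sigma = 1.4826 * MAD.
fn robust_sigma(mad: f64) -> f64 {
    1.4826 * mad
}
```

`gaussian_consistency_factor()` agrees with 1.4826 to the four decimal places quoted above.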
Performance Advantages
The Rust lowess crate demonstrates consistent performance improvements over R's stats::lowess across all tested scenarios. Median speedups range from 1.3x to 3.4x across different categories, with peak speedups reaching 4.7x in specific configurations. No regressions were observed; Rust was faster in all matched benchmarks.
Category Comparison
| Category | Matched | Median Speedup | Mean Speedup |
|---|---|---|---|
| Delta | 4 | 3.37x | 3.25x |
| Financial | 3 | 2.16x | 2.22x |
| Pathological | 4 | 2.01x | 1.87x |
| Scalability | 3 | 1.97x | 1.94x |
| Iterations | 6 | 1.94x | 2.02x |
| Fraction | 6 | 1.85x | 1.93x |
| Scientific | 3 | 1.84x | 1.80x |
| Genomic | 2 | 1.31x | 1.31x |
Top 10 Performance Wins
| Benchmark | Rust | R | Speedup |
|---|---|---|---|
| delta_medium | 0.18ms | 0.85ms | 4.72x |
| delta_large | 0.14ms | 0.54ms | 3.97x |
| delta_small | 0.34ms | 0.95ms | 2.76x |
| iterations_0 | 0.17ms | 0.43ms | 2.54x |
| financial_5000 | 0.36ms | 0.89ms | 2.46x |
| clustered | 0.82ms | 1.97ms | 2.39x |
| fraction_0.05 | 0.31ms | 0.72ms | 2.36x |
| iterations_2 | 0.64ms | 1.43ms | 2.23x |
| scale_10000 | 0.96ms | 2.08ms | 2.17x |
| constant_y | 0.65ms | 1.40ms | 2.17x |
Regressions: None identified. Rust outperforms R in all matched benchmarks. Check Benchmarks for detailed results and reproducible benchmarking code.
Validation
The Rust lowess crate is a numerical twin of R's lowess implementation:
| Aspect | Status | Details |
|---|---|---|
| Accuracy | ✅ EXACT MATCH | Max diff < 1e-12 across all scenarios |
| Consistency | ✅ PERFECT | 15/15 scenarios pass with strict tolerance |
| Robustness | ✅ VERIFIED | Robust smoothing matches R exactly |
Check Validation for detailed scenario results.
Installation
Add this to your Cargo.toml:
[dependencies]
lowess = "0.7"
For no_std environments:
[dependencies]
lowess = { version = "0.7", default-features = false }
Quick Start
use *;
Builder Methods
use *;
new
// Smoothing span (0, 1]
.fraction
// Robustness iterations
.iterations
// Interpolation threshold
.delta
// Kernel selection
.weight_function
// Robustness method
.robustness_method
// Zero-weight fallback behavior
.zero_weight_fallback
// Boundary handling (for edge effects)
.boundary_policy
// Confidence intervals
.confidence_intervals
// Prediction intervals
.prediction_intervals
// Diagnostics
.return_diagnostics
.return_residuals
.return_robustness_weights
// Cross-validation (for parameter selection)
.cross_validate
// Convergence
.auto_converge
// Execution mode
.adapter
// Build the model
.build?;
Result Structure
Streaming Processing
For datasets that don't fit in memory:
let mut processor = new
.fraction
.iterations
.adapter
.chunk_size
.overlap
.build?;
// Process data in chunks
for chunk in data_chunks
// Finalize processing
let final_result = processor.finalize?;
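The interplay between chunk size and overlap can be sketched independently of the crate. The helper below is hypothetical (`chunk_ranges` is not a crate function) and only shows how a dataset of `n` points might be split into half-open index ranges that share `overlap` points with their neighbor; how the crate merges the overlapping regions is not covered here.

```rust
/// Split `0..n` into half-open ranges of at most `chunk_size` points,
/// where consecutive ranges share `overlap` points.
fn chunk_ranges(n: usize, chunk_size: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    let step = chunk_size - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < n {
        let end = (start + chunk_size).min(n);
        ranges.push((start, end));
        if end == n {
            break; // the final chunk reached the end of the data
        }
        start += step;
    }
    ranges
}
```

For `n = 10`, `chunk_size = 4`, and `overlap = 1`, this produces `(0, 4)`, `(3, 7)`, and `(6, 10)`. The shared points give each chunk some context beyond its own edges, which is the usual motivation for overlapping chunks.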
Online Processing
For real-time data streams:
let mut processor = new
.fraction
.iterations
.adapter
.window_capacity
.build?;
// Process points as they arrive
for in data_stream
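The role of a window capacity in online processing can be illustrated with a fixed-size buffer that evicts the oldest point when a new one arrives. This is a minimal sketch with hypothetical names (`SlidingWindow`, `push`, `mean_y`); the windowed mean is only a stand-in for the local fit that a real online smoother would compute over the buffered points.

```rust
use std::collections::VecDeque;

/// Fixed-capacity window over the most recent (x, y) points.
struct SlidingWindow {
    capacity: usize,
    points: VecDeque<(f64, f64)>,
}

impl SlidingWindow {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self {
            capacity,
            points: VecDeque::with_capacity(capacity),
        }
    }

    /// Add a point, evicting the oldest one once the window is full.
    fn push(&mut self, x: f64, y: f64) {
        if self.points.len() == self.capacity {
            self.points.pop_front();
        }
        self.points.push_back((x, y));
    }

    /// Mean of y over the current window; `None` while the window is empty.
    fn mean_y(&self) -> Option<f64> {
        if self.points.is_empty() {
            return None;
        }
        let sum: f64 = self.points.iter().map(|&(_, y)| y).sum();
        Some(sum / self.points.len() as f64)
    }
}
```

Memory use stays bounded by the capacity regardless of how long the stream runs, at the cost of forgetting points older than the window.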
Parameter Selection Guide
Fraction (Smoothing Span)
- 0.1-0.3: Local, captures rapid changes (wiggly)
- 0.4-0.6: Balanced, general-purpose
- 0.7-1.0: Global, smooth trends only
- Default: 0.67 (2/3, Cleveland's choice)
- Use cross-validation (CV) when uncertain
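The fraction determines how many neighbors enter each local fit. The sketch below uses the rounding rule from the classic reference implementation of lowess (truncate `fraction * n`, then clamp to between 2 and `n`); whether this crate rounds in exactly the same way is an assumption, and `neighborhood_size` is a hypothetical name.

```rust
/// Number of points used in each local regression for a given fraction.
/// Assumes n >= 2; smaller inputs need special handling.
fn neighborhood_size(n: usize, fraction: f64) -> usize {
    assert!(
        fraction > 0.0 && fraction <= 1.0,
        "fraction must lie in (0, 1]"
    );
    // The small epsilon guards against floating-point error just below an integer.
    let k = (fraction * n as f64 + 1e-7) as usize;
    k.min(n).max(2)
}
```

With 100 points, a fraction of 0.1 uses 10 neighbors per fit, the default of 0.67 uses 67, and 1.0 uses every point.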
Robustness Method
- Bisquare (default): Best all-around, smooth, efficient
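- Huber: Downweights large residuals gradually without discarding them; milder than Bisquare
- Talwar: Hard threshold; residuals beyond the cutoff receive zero weight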
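The robustness methods differ in how they turn a scaled residual into a weight. The sketch below writes out the textbook forms of the Bisquare, Huber, and Talwar weight functions. The function names and the tuning constant `c` are hypothetical parameters, `scale` is assumed to be positive, and the constants the crate actually uses are not stated here.

```rust
/// Bisquare (Tukey biweight): (1 - u^2)^2 with u = r / (c * s), zero once |u| >= 1.
fn bisquare_weight(residual: f64, scale: f64, c: f64) -> f64 {
    let u = (residual / (c * scale)).abs();
    if u < 1.0 {
        let t = 1.0 - u * u;
        t * t
    } else {
        0.0
    }
}

/// Huber: full weight while |r / s| <= c, then decays as c / |r / s| (never zero).
fn huber_weight(residual: f64, scale: f64, c: f64) -> f64 {
    let u = (residual / scale).abs();
    if u <= c {
        1.0
    } else {
        c / u
    }
}

/// Talwar: hard threshold, weight 1 inside the cutoff and 0 outside.
fn talwar_weight(residual: f64, scale: f64, c: f64) -> f64 {
    if (residual / scale).abs() <= c {
        1.0
    } else {
        0.0
    }
}
```

Bisquare and Talwar reject extreme residuals entirely, while Huber keeps every point with a reduced weight.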
Robustness Iterations
- 0: Clean data, speed critical
- 1-2: Light contamination
- 3: Default, good balance (recommended)
- 4-5: Heavy outliers
- >5: Diminishing returns
Kernel Function
- Tricube (default): Best all-around, smooth, efficient
- Epanechnikov: Theoretically optimal MSE
- Gaussian: Very smooth, no compact support
- Uniform: Fastest, least smooth (moving average)
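The four kernels listed above can be compared directly as functions of the scaled distance `u` (distance to the target point divided by the local bandwidth). This is a sketch using the standard unnormalized forms; the `Kernel` enum and `kernel_weight` function are hypothetical names, and the Gaussian bandwidth scaling is an assumption. Normalizing constants are omitted because they cancel in a weighted regression.

```rust
/// Hypothetical kernel selector mirroring the names used above.
#[derive(Clone, Copy)]
enum Kernel {
    Tricube,
    Epanechnikov,
    Gaussian,
    Uniform,
}

/// Unnormalized kernel weight for a scaled distance `u`.
fn kernel_weight(kernel: Kernel, u: f64) -> f64 {
    let a = u.abs();
    match kernel {
        // (1 - |u|^3)^3 on |u| < 1: smooth at both the center and the edge.
        Kernel::Tricube => {
            if a < 1.0 {
                (1.0 - a.powi(3)).powi(3)
            } else {
                0.0
            }
        }
        // 1 - u^2 on |u| < 1.
        Kernel::Epanechnikov => {
            if a < 1.0 {
                1.0 - a * a
            } else {
                0.0
            }
        }
        // exp(-u^2 / 2): positive everywhere, so no compact support.
        Kernel::Gaussian => (-0.5 * a * a).exp(),
        // Constant on |u| < 1: equivalent to an unweighted local fit.
        Kernel::Uniform => {
            if a < 1.0 {
                1.0
            } else {
                0.0
            }
        }
    }
}
```

At `u = 0.5` the Tricube weight is 0.669921875 and the Epanechnikov weight is 0.75, while at `u = 2` only the Gaussian kernel still assigns a nonzero weight.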
Delta Optimization
- None: Small datasets (n < 1000)
- 0.01 × range(x): Good starting point for dense data
- Manual tuning: Adjust based on data density
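The idea behind delta is to run the full local fit only at anchor points spaced roughly `delta` apart and to fill the points in between by linear interpolation. The sketch below follows that idea in simplified form (it omits the tie handling of the reference algorithm) and is not the crate's code; `suggested_delta`, `anchor_indices`, and `fill_between` are hypothetical names, and `x` is assumed to be sorted in ascending order.

```rust
/// Starting point suggested above: 1% of the x range.
fn suggested_delta(x: &[f64]) -> f64 {
    match (x.first(), x.last()) {
        (Some(lo), Some(hi)) => 0.01 * (hi - lo),
        _ => 0.0,
    }
}

/// Indices at which a full local fit would be computed.
/// The first and last points are always anchors.
fn anchor_indices(x: &[f64], delta: f64) -> Vec<usize> {
    let n = x.len();
    let mut anchors = Vec::new();
    if n == 0 {
        return anchors;
    }
    let mut last = 0;
    anchors.push(last);
    while last + 1 < n {
        let cut = x[last] + delta;
        // Advance to the furthest point still within `delta` of the last anchor,
        // but always move forward by at least one point.
        let mut next = last + 1;
        while next + 1 < n && x[next + 1] <= cut {
            next += 1;
        }
        anchors.push(next);
        last = next;
    }
    anchors
}

/// Fill skipped points by linear interpolation between fitted anchor values.
/// `fitted[j]` is the smoothed value at `x[anchors[j]]`.
fn fill_between(x: &[f64], anchors: &[usize], fitted: &[f64]) -> Vec<f64> {
    assert_eq!(anchors.len(), fitted.len());
    let mut out = vec![0.0; x.len()];
    for (j, &a) in anchors.iter().enumerate() {
        out[a] = fitted[j];
    }
    for w in 0..anchors.len().saturating_sub(1) {
        let (a, b) = (anchors[w], anchors[w + 1]);
        let span = x[b] - x[a];
        for i in (a + 1)..b {
            let t = if span > 0.0 { (x[i] - x[a]) / span } else { 0.0 };
            out[i] = fitted[w] + t * (fitted[w + 1] - fitted[w]);
        }
    }
    out
}
```

With `delta = 0` every point is an anchor and nothing is interpolated; a larger delta trades accuracy in dense regions for fewer local fits.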
Examples
Check the examples directory for more complex scenarios.
MSRV
Rust 1.85.0 or later (2024 Edition).
Related Work
Contributing
Contributions are welcome! Please see the CONTRIBUTING.md file for more information.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
References
- Cleveland, W.S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". JASA.
- Cleveland, W.S. (1981). "LOWESS: A Program for Smoothing Scatterplots". The American Statistician.