loess-rs
A high-performance implementation of LOESS (Locally Estimated Scatterplot Smoothing) in Rust. This crate provides a robust, production-ready implementation with support for confidence intervals, multiple kernel functions, and optimized execution modes.
LOESS vs. LOWESS
| Feature | LOESS (This Crate) | LOWESS |
|---|---|---|
| Polynomial Degree | Linear, Quadratic, Cubic, Quartic | Linear (Degree 1) |
| Dimensions | Multivariate (n-D support) | Univariate (1-D only) |
| Flexibility | High (Distance metrics) | Standard |
| Complexity | Higher (Matrix inversion) | Lower (Weighted average/slope) |
> [!TIP]
> For a faster, simpler LOWESS implementation, use the `lowess` crate.
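For degree 1 (the LOWESS-style case in the table), each local fit has a closed form: a kernel-weighted mean plus a weighted slope, with no matrix to invert. The following is a minimal, self-contained sketch of that computation at a single point. It is not code from this crate, and the names `tricube` and `local_linear` and the explicit neighbourhood radius `h` are assumptions made for illustration.

```rust
// Illustrative sketch only; not the loess-rs API.

/// Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside.
fn tricube(u: f64) -> f64 {
    let a = u.abs();
    if a < 1.0 { (1.0 - a * a * a).powi(3) } else { 0.0 }
}

/// Degree-1 local fit at `x0` with hypothetical neighbourhood radius `h`.
/// Returns None when no point falls inside the window.
fn local_linear(xs: &[f64], ys: &[f64], x0: f64, h: f64) -> Option<f64> {
    let w: Vec<f64> = xs.iter().map(|&x| tricube((x - x0) / h)).collect();
    let sw: f64 = w.iter().sum();
    if sw == 0.0 {
        return None;
    }
    // Kernel-weighted means of x and y.
    let xbar = w.iter().zip(xs).map(|(wi, xi)| wi * xi).sum::<f64>() / sw;
    let ybar = w.iter().zip(ys).map(|(wi, yi)| wi * yi).sum::<f64>() / sw;
    // Weighted covariance over weighted variance gives the local slope.
    let (mut sxy, mut sxx) = (0.0, 0.0);
    for ((wi, xi), yi) in w.iter().zip(xs).zip(ys) {
        sxy += wi * (xi - xbar) * (yi - ybar);
        sxx += wi * (xi - xbar) * (xi - xbar);
    }
    let slope = if sxx > 0.0 { sxy / sxx } else { 0.0 };
    Some(ybar + slope * (x0 - xbar))
}

fn main() {
    let xs = [0.0, 1.0, 2.0, 3.0, 4.0];
    let ys = [1.2, 2.8, 5.1, 7.0, 8.9];
    println!("fit at 2.0 = {:?}", local_linear(&xs, &ys, 2.0, 3.0));
}
```

Higher polynomial degrees and several predictors turn this into a small linear system per point, which is where the extra cost of LOESS comes from.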
Features
- Robust Statistics: Iteratively reweighted least squares (IRLS) with Bisquare, Huber, or Talwar weighting for outlier handling.
- Multidimensional Smoothing: Support for n-D data with customizable distance metrics (Euclidean, Manhattan, etc.).
- Flexible Fitting: Linear, Quadratic, Cubic, and Quartic local polynomials.
- Uncertainty Quantification: Point-wise standard errors, confidence intervals, and prediction intervals.
- Optimized Performance: An interpolation surface built with tensor-product Hermite interpolation, plus streaming and online modes for large or real-time datasets.
- Parameter Selection: Built-in cross-validation for automatic smoothing fraction selection.
- Flexibility: Multiple weight kernels (Tricube, Epanechnikov, etc.) and `no_std` support (requires `alloc`).
- Validated: Numerical twin of R's `stats::loess`, matching it exactly (< 1e-12 difference).
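With more than one predictor, the distance metric decides which points count as neighbours. A minimal sketch of the two metrics named above follows; the function names are hypothetical, and the crate's own metric types are not shown here.

```rust
// Illustrative sketch only; not the loess-rs API.

/// Euclidean (L2) distance between two points of equal dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
}

/// Manhattan (L1) distance: sum of absolute coordinate differences.
fn manhattan(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn main() {
    let p = [0.0, 0.0];
    let q = [3.0, 4.0];
    println!("euclidean = {}", euclidean(&p, &q));
    println!("manhattan = {}", manhattan(&p, &q));
}
```

Both metrics add raw coordinate differences, so a predictor measured on a much larger scale can dominate the neighbourhood. R's `loess` normalizes multiple predictors to a common scale by default; whether and how this crate does the same is not stated here.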
Performance
Benchmarked against R's `loess`: 3.3× to 25× faster across all tested scenarios, with no regressions observed.
Summary
| Category | Matched | Median Speedup | Mean Speedup |
|---|---|---|---|
| Fraction | 6 | 6.03× | 9.30× |
| Iterations | 6 | 8.79× | 8.91× |
| Polynomial Degrees | 2 | 8.84× | 8.84× |
| Pathological | 4 | 6.88× | 7.58× |
| Financial | 3 | 4.30× | 4.36× |
| Scalability | 2 | 3.99× | 3.99× |
| Dimensions | 3 | 3.85× | 3.91× |
| Scientific | 3 | 3.75× | 3.70× |
| Genomic | 2 | 3.32× | 3.32× |
Top 10 Performance Wins
| Benchmark | Rust | R | Speedup |
|---|---|---|---|
| fraction_0.67 | 0.86ms | 21.63ms | 25.23× |
| fraction_0.5 | 1.14ms | 12.85ms | 11.25× |
| iterations_1 | 0.76ms | 8.44ms | 11.12× |
| high_noise | 1.50ms | 15.86ms | 10.55× |
| degree_quadratic | 0.79ms | 7.86ms | 9.91× |
| iterations_2 | 0.92ms | 8.95ms | 9.76× |
| iterations_3 | 1.08ms | 9.73ms | 9.01× |
| iterations_5 | 1.49ms | 12.73ms | 8.57× |
| degree_linear | 0.76ms | 5.86ms | 7.76× |
| iterations_0 | 0.75ms | 5.69ms | 7.56× |
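Each speedup in these tables is the R time divided by the Rust time, and the category summaries are consistent with taking the median and mean of per-benchmark speedups: for the two polynomial-degree rows, (9.91 + 7.76) / 2 = 8.835, which rounds to the reported 8.84×. The sketch below reproduces that aggregation; it is an assumption about how the summary was computed, not the project's benchmarking code. Ratios recomputed from the rounded millisecond columns differ slightly from the published ones, which were presumably computed from unrounded timings.

```rust
// Illustrative sketch only; the aggregation rule is an assumption.

/// Speedup of Rust over R for one benchmark: R time / Rust time.
fn speedup(rust_ms: f64, r_ms: f64) -> f64 {
    r_ms / rust_ms
}

/// Median of a non-empty slice; averages the two middle values for even lengths.
fn median(v: &[f64]) -> f64 {
    assert!(!v.is_empty(), "median of an empty slice");
    let mut s = v.to_vec();
    s.sort_by(f64::total_cmp);
    let n = s.len();
    if n % 2 == 1 { s[n / 2] } else { 0.5 * (s[n / 2 - 1] + s[n / 2]) }
}

fn mean(v: &[f64]) -> f64 {
    v.iter().sum::<f64>() / v.len() as f64
}

fn main() {
    // Published per-benchmark speedups of degree_quadratic and degree_linear.
    let degree = [9.91, 7.76];
    println!("median = {:.3}x, mean = {:.3}x", median(&degree), mean(&degree));
    // Recomputing from the rounded millisecond columns gives a slightly different ratio.
    println!("degree_quadratic from rounded times: {:.2}x", speedup(0.79, 7.86));
}
```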
Check Benchmarks for detailed results and reproducible benchmarking code.
Validation
The Rust loess-rs crate is a numerical twin of R's loess implementation:
| Aspect | Status | Details |
|---|---|---|
| Accuracy | ✅ EXACT MATCH | Max diff < 1e-12 across all scenarios |
| Consistency | ✅ PERFECT | 20/20 scenarios pass with strict tolerance |
| Robustness | ✅ VERIFIED | Robust smoothing matches R exactly |
Check Validation for detailed scenario results.
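A pass/fail check of the kind the table describes, where a scenario passes if the largest point-wise difference between this crate's fit and R's `stats::loess` output stays below 1e-12, can be sketched as follows. The helper names are hypothetical; the actual validation harness is part of the crate's Validation material.

```rust
// Illustrative sketch only; not the project's validation harness.

/// Largest absolute element-wise difference between two fitted curves.
fn max_abs_diff(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "curves must have the same length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f64::max)
}

/// True when every fitted value agrees within `tol`.
fn matches(a: &[f64], b: &[f64], tol: f64) -> bool {
    max_abs_diff(a, b) < tol
}

fn main() {
    let rust_fit = [2.0, 4.1, 5.9];
    let r_fit = [2.0, 4.1, 5.9 + 1e-14];
    println!("max diff = {:e}", max_abs_diff(&rust_fit, &r_fit));
    println!("pass = {}", matches(&rust_fit, &r_fit, 1e-12));
}
```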
Installation
Add this to your Cargo.toml:
```toml
[dependencies]
loess-rs = "0.1"
```

For `no_std` environments:

```toml
[dependencies]
loess-rs = { version = "0.1", default-features = false }
```
Quick Start
```rust
use *;
```

Output:

```text
Summary:
  Data points: 5
  Fraction: 0.5

Smoothed Data:
       X     Y_smooth
  --------------------
    1.00     2.00000
    2.00     4.10000
    3.00     5.90000
    4.00     8.20000
    5.00     9.80000
```
Builder Methods
All builder parameters have sensible defaults. You only need to specify what you want to change.
```rust
use *;

new
    // Smoothing span (0, 1] - default: 0.67
    .fraction
    // Polynomial degree - default: Linear
    .degree
    // Number of dimensions - default: 1
    .dimensions
    // Distance metric - default: Euclidean
    .distance_metric
    // Robustness iterations - default: 3
    .iterations
    // Kernel selection - default: Tricube
    .weight_function
    // Robustness method - default: Bisquare
    .robustness_method
    // Boundary handling - default: Extend
    .boundary_policy
    // Confidence intervals (Batch only)
    .confidence_intervals
    // Prediction intervals (Batch only)
    .prediction_intervals
    // Include diagnostics
    .return_diagnostics
    .return_residuals
    .return_robustness_weights
    // Cross-validation (Batch only)
    .cross_validate
    // Auto-convergence
    .auto_converge
    // Interpolation settings
    .surface_mode
    // Interpolation cell size - default: 0.2
    .cell
    // Execution mode
    .adapter
    // Build the model
    .build?;
```
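The `surface_mode` and `cell` options control the interpolation surface. In R's `loess`, which this crate aims to match, the local regression is computed directly only at the vertices of a k-d tree whose cells are split until none holds more than `floor(n * span * cell)` points; values in between are blended from the vertex fits and their slopes with cubic Hermite polynomials. The 1-D sketch below shows only that blending step. The function and its argument layout are hypothetical, and the crate's tensor-product version generalizes the idea to several dimensions.

```rust
// Illustrative sketch only; not the loess-rs API.

/// Cubic Hermite interpolation on [x0, x1] from endpoint values (y0, y1)
/// and endpoint slopes (m0, m1).
fn hermite(x: f64, x0: f64, x1: f64, y0: f64, y1: f64, m0: f64, m1: f64) -> f64 {
    let h = x1 - x0;
    let t = (x - x0) / h;
    let (t2, t3) = (t * t, t * t * t);
    // Hermite basis functions on t in [0, 1].
    let h00 = 2.0 * t3 - 3.0 * t2 + 1.0;
    let h10 = t3 - 2.0 * t2 + t;
    let h01 = -2.0 * t3 + 3.0 * t2;
    let h11 = t3 - t2;
    // Slopes are rescaled by the cell width h.
    h00 * y0 + h10 * h * m0 + h01 * y1 + h11 * h * m1
}

fn main() {
    // Vertex fits of f(x) = x^2 at x = 0 and x = 2, with exact slopes 0 and 4.
    for x in [0.0, 0.5, 1.0, 1.5, 2.0] {
        println!("x = {x}, interpolated = {}", hermite(x, 0.0, 2.0, 0.0, 4.0, 0.0, 4.0));
    }
}
```

Because cubic Hermite interpolation reproduces any cubic exactly, the sketch recovers x² between the two vertices; with real local fits the error depends on how finely `cell` subdivides the data.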
Execution Modes
| Adapter | Use Case | Features |
|---|---|---|
| Batch | Complete datasets in memory | All features supported |
| Streaming | Large datasets (>100K points) | Chunked processing, overlap |
| Online | Real-time data, sensor streams | Incremental updates |
Streaming Processing
For datasets that don't fit in memory:
```rust
let mut processor = new
    .fraction
    .iterations
    .adapter
    .chunk_size
    .overlap
    .build?;

// Process data in chunks
let result1 = processor.process_chunk?;
let result2 = processor.process_chunk?;

// Finalize to get remaining buffered data
let final_result = processor.finalize?;
```
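Overlap lets each chunk see neighbours on both sides of its boundaries, so points near a chunk edge are not smoothed with a one-sided window. A hypothetical helper that computes the index ranges for a given `chunk_size` and `overlap` is sketched below. How the crate merges the overlapping outputs (for example, which chunk's estimate it keeps for a shared point) is not stated here.

```rust
// Illustrative sketch only; `chunk_ranges` is a hypothetical helper.

/// Split `n` points into windows of `chunk_size` that share `overlap`
/// points with the previous window. Returns half-open ranges [start, end).
fn chunk_ranges(n: usize, chunk_size: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    let step = chunk_size - overlap;
    let mut out = Vec::new();
    let mut start = 0;
    while start < n {
        let end = (start + chunk_size).min(n);
        out.push((start, end));
        if end == n {
            break; // last chunk reached the end of the data
        }
        start += step;
    }
    out
}

fn main() {
    for (start, end) in chunk_ranges(10, 4, 1) {
        println!("chunk covers indices {start}..{end}");
    }
}
```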
Online Processing
For real-time data streams:
```rust
let mut processor = new
    .fraction
    .iterations
    .adapter
    .window_capacity
    .build?;

// Process points as they arrive
for i in 1..=10
```
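In online mode, a fixed-capacity window keeps only the most recent points, so each update costs the same no matter how long the stream runs. The sketch below assumes a `VecDeque` buffer, a bandwidth equal to the window's farthest distance, and a tricube-weighted local linear fit at the newest point. `Window` and `push` are hypothetical names, and the crate's own update rule (for example, how it applies robustness iterations) is not described here.

```rust
// Illustrative sketch only; not the loess-rs online adapter.
use std::collections::VecDeque;

fn tricube(u: f64) -> f64 {
    let a = u.abs();
    if a < 1.0 { (1.0 - a * a * a).powi(3) } else { 0.0 }
}

/// Hypothetical fixed-capacity sliding window over (x, y) pairs.
struct Window {
    cap: usize,
    buf: VecDeque<(f64, f64)>,
}

impl Window {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "window capacity must be positive");
        Window { cap, buf: VecDeque::with_capacity(cap) }
    }

    /// Add a point (evicting the oldest when full) and return a
    /// tricube-weighted local linear estimate at the new point's x.
    fn push(&mut self, x: f64, y: f64) -> f64 {
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back((x, y));
        // Bandwidth: farthest distance in the window, widened by 0.1% so the
        // farthest point keeps a small non-zero weight.
        let h = self.buf.iter().map(|&(xi, _)| (xi - x).abs()).fold(0.0, f64::max) * 1.001;
        if h == 0.0 {
            return y; // only one distinct x so far
        }
        let (mut sw, mut sx, mut sy) = (0.0, 0.0, 0.0);
        for &(xi, yi) in &self.buf {
            let w = tricube((xi - x) / h);
            sw += w;
            sx += w * xi;
            sy += w * yi;
        }
        let (xbar, ybar) = (sx / sw, sy / sw);
        let (mut sxy, mut sxx) = (0.0, 0.0);
        for &(xi, yi) in &self.buf {
            let w = tricube((xi - x) / h);
            sxy += w * (xi - xbar) * (yi - ybar);
            sxx += w * (xi - xbar) * (xi - xbar);
        }
        let slope = if sxx > 0.0 { sxy / sxx } else { 0.0 };
        ybar + slope * (x - xbar)
    }
}

fn main() {
    let mut win = Window::new(5);
    for i in 1..=10 {
        let x = i as f64;
        let smoothed = win.push(x, 3.0 * x);
        println!("x = {x:.0}, smoothed = {smoothed:.4}");
    }
}
```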
Parameter Selection Guide
Fraction (Smoothing Span)
- 0.1-0.3: Fine detail, may be noisy
- 0.3-0.5: Moderate smoothing (good for most cases)
- 0.5-0.7: Heavy smoothing, emphasizes trends
- 0.7-1.0: Very smooth, may over-smooth
- Default: 0.67 (Cleveland's choice)
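The fraction sets how many neighbours enter each local fit. A common convention, used by R's `loess` for spans below 1, is to take the q = floor(f · n) nearest points and set the bandwidth to the distance of the q-th nearest one. The sketch assumes that convention and a single predictor; the function name is hypothetical.

```rust
// Illustrative sketch only; assumes q = floor(fraction * n).

/// Bandwidth at `x0`: distance to the q-th nearest x.
fn bandwidth(xs: &[f64], x0: f64, fraction: f64) -> f64 {
    assert!(!xs.is_empty(), "need at least one point");
    assert!(fraction > 0.0 && fraction <= 1.0, "fraction must lie in (0, 1]");
    let n = xs.len();
    // Number of neighbours, kept within 1..=n.
    let q = ((fraction * n as f64).floor() as usize).clamp(1, n);
    let mut d: Vec<f64> = xs.iter().map(|&x| (x - x0).abs()).collect();
    d.sort_by(f64::total_cmp);
    d[q - 1]
}

fn main() {
    let xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0];
    for f in [0.5, 0.67, 1.0] {
        println!("fraction {f}: bandwidth at 2.0 = {}", bandwidth(&xs, 2.0, f));
    }
}
```

A larger fraction widens every neighbourhood, which is why the higher ranges above trade detail for smoothness.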
Robustness Iterations
- 0: No robustness (fastest, sensitive to outliers)
- 1-3: Light to moderate robustness (recommended)
- 4-6: Strong robustness (for contaminated data)
- 7+: Diminishing returns
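Each robustness iteration refits with weights computed from the previous fit's residuals. The sketch uses Cleveland's original bisquare update, scaling residuals by six times the median absolute residual. The Huber and Talwar functions take their tuning constant `c` from the caller, because the constants this crate uses are not given here. All names are illustrative, and treating a zero scale as "all weights 1" is a simplifying assumption.

```rust
// Illustrative sketch only; not the loess-rs robustness code.

/// Median of a non-empty slice (sorts it in place).
fn median(v: &mut [f64]) -> f64 {
    v.sort_by(f64::total_cmp);
    let n = v.len();
    if n % 2 == 1 { v[n / 2] } else { 0.5 * (v[n / 2 - 1] + v[n / 2]) }
}

/// Bisquare: (1 - u^2)^2 for |u| < 1, else 0.
fn bisquare(u: f64) -> f64 {
    if u.abs() < 1.0 { (1.0 - u * u).powi(2) } else { 0.0 }
}

/// Huber: full weight inside c, then decaying as c / |u|.
fn huber(u: f64, c: f64) -> f64 {
    if u.abs() <= c { 1.0 } else { c / u.abs() }
}

/// Talwar: hard rejection outside c.
fn talwar(u: f64, c: f64) -> f64 {
    if u.abs() <= c { 1.0 } else { 0.0 }
}

/// One robustness update: u = r / (6 * median|r|), weight = bisquare(u).
fn bisquare_weights(residuals: &[f64]) -> Vec<f64> {
    let mut abs_r: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    let s = median(&mut abs_r);
    if s == 0.0 {
        return vec![1.0; residuals.len()]; // simplifying assumption
    }
    residuals.iter().map(|&r| bisquare(r / (6.0 * s))).collect()
}

fn main() {
    let residuals = [0.3, -0.2, 0.1, -0.4, 8.0];
    println!("bisquare weights: {:?}", bisquare_weights(&residuals));
    println!("huber(2, 1) = {}, talwar(2, 1) = {}", huber(2.0, 1.0), talwar(2.0, 1.0));
}
```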
Polynomial Degree
- Constant: Local weighted mean (smoothing only)
- Linear (default): Standard LOESS, good bias-variance balance
- Quadratic: Better for peaks/valleys, higher variance
- Cubic/Quartic: Specialized high-order fitting
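Beyond degree 1, the closed-form slope gives way to a small linear system: the local polynomial's coefficients solve the weighted normal equations, which is the matrix cost mentioned in the comparison table. The sketch assumes precomputed kernel weights and uses Gaussian elimination with partial pivoting for brevity. A production implementation would more likely use a QR-type decomposition for numerical stability, and the crate's own solver is not described here. The function name is hypothetical.

```rust
// Illustrative sketch only; not the loess-rs solver.

/// Weighted least-squares polynomial of degree `d`, centred at `x0`.
/// Solves (X^T W X) b = X^T W y; b[0] is the fitted value at x0.
fn local_poly(xs: &[f64], ys: &[f64], w: &[f64], x0: f64, d: usize) -> Option<f64> {
    let p = d + 1;
    // Augmented matrix [X^T W X | X^T W y], p rows by p + 1 columns.
    let mut a = vec![vec![0.0_f64; p + 1]; p];
    for ((&xi, &yi), &wi) in xs.iter().zip(ys).zip(w) {
        // Powers of the centred predictor: 1, t, ..., t^d.
        let t = xi - x0;
        let pow: Vec<f64> = (0..p).map(|k| t.powi(k as i32)).collect();
        for r in 0..p {
            for c in 0..p {
                a[r][c] += wi * pow[r] * pow[c];
            }
            a[r][p] += wi * pow[r] * yi;
        }
    }
    // Gaussian elimination with partial pivoting.
    for col in 0..p {
        let piv = (col..p).max_by(|&i, &j| a[i][col].abs().total_cmp(&a[j][col].abs()))?;
        if a[piv][col].abs() < 1e-12 {
            return None; // (near-)singular: too few distinct points for degree d
        }
        a.swap(col, piv);
        for row in col + 1..p {
            let f = a[row][col] / a[col][col];
            for k in col..=p {
                let delta = f * a[col][k];
                a[row][k] -= delta;
            }
        }
    }
    // Back substitution.
    let mut b = vec![0.0; p];
    for r in (0..p).rev() {
        let s: f64 = (r + 1..p).map(|k| a[r][k] * b[k]).sum();
        b[r] = (a[r][p] - s) / a[r][r];
    }
    Some(b[0])
}

fn main() {
    let xs = [0.0, 1.0, 2.0, 3.0, 4.0];
    let ys: Vec<f64> = xs.iter().map(|x| 1.0 + 2.0 * x + 3.0 * x * x).collect();
    println!("quadratic fit at 2.0 = {:?}", local_poly(&xs, &ys, &[1.0; 5], 2.0, 2));
}
```

The singular case also illustrates why higher degrees need larger neighbourhoods: a degree-d fit needs at least d + 1 distinct x values with non-zero weight.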
Kernel Function
- Tricube (default): Best all-around, Cleveland's original choice
- Epanechnikov: Theoretically optimal MSE
- Gaussian: Maximum smoothness, no compact support
- Uniform: Fastest, least smooth (moving average)
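The four kernels map the scaled distance u = d / h to a weight. Only relative weights matter in a local fit, so normalizing constants can be dropped; the 0.75 on Epanechnikov is kept only to show the textbook form. The function names are illustrative, and how the crate truncates or scales the Gaussian, which has no natural cutoff, is not specified here.

```rust
// Illustrative sketch only; not the loess-rs kernel types.

/// Tricube: (1 - |u|^3)^3 for |u| < 1, else 0.
fn tricube(u: f64) -> f64 {
    let a = u.abs();
    if a < 1.0 { (1.0 - a.powi(3)).powi(3) } else { 0.0 }
}

/// Epanechnikov: 0.75 * (1 - u^2) for |u| < 1, else 0.
fn epanechnikov(u: f64) -> f64 {
    if u.abs() < 1.0 { 0.75 * (1.0 - u * u) } else { 0.0 }
}

/// Gaussian: positive for every u (no compact support).
fn gaussian(u: f64) -> f64 {
    (-0.5 * u * u).exp()
}

/// Uniform: equal weight inside the window, turning the local mean
/// into a moving average.
fn uniform(u: f64) -> f64 {
    if u.abs() < 1.0 { 1.0 } else { 0.0 }
}

fn main() {
    for u in [0.0, 0.5, 0.9, 1.5] {
        println!(
            "u = {u}: tricube {:.4}, epanechnikov {:.4}, gaussian {:.4}, uniform {}",
            tricube(u), epanechnikov(u), gaussian(u), uniform(u)
        );
    }
}
```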
Boundary Policy
- Extend (default): Pad with constant values
- Reflect: Mirror data at boundaries (for periodic/symmetric data)
- Zero: Pad with zeros (signal processing)
- NoBoundary: Original Cleveland behavior
Note: For n-D data, `Extend` defaults to `NoBoundary` to preserve regression accuracy.
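The padding policies differ mainly in which y values they attach to the points added beyond each edge. The sketch mirrors the x positions across the boundary (x' = 2 * x_edge - x_i) and then fills y by repeating the edge value (Extend), mirroring the data (Reflect), or using zeros (Zero). The mirroring of x, the pad width `k`, and the `Pad` enum are assumptions made for illustration; the crate's exact padding scheme is not documented here.

```rust
// Illustrative sketch only; `Pad` is not the crate's boundary type.

#[derive(Clone, Copy)]
enum Pad {
    Extend,
    Reflect,
    Zero,
}

/// Add `k` mirrored x positions on each side and fill their y values
/// according to `mode`. Requires k < xs.len().
fn pad(xs: &[f64], ys: &[f64], k: usize, mode: Pad) -> (Vec<f64>, Vec<f64>) {
    let n = xs.len();
    assert!(k < n, "pad width must be smaller than the series length");
    let (mut px, mut py) = (Vec::with_capacity(n + 2 * k), Vec::with_capacity(n + 2 * k));
    // Left side, emitted in increasing x order.
    for i in (1..=k).rev() {
        px.push(2.0 * xs[0] - xs[i]);
        py.push(match mode { Pad::Extend => ys[0], Pad::Reflect => ys[i], Pad::Zero => 0.0 });
    }
    px.extend_from_slice(xs);
    py.extend_from_slice(ys);
    // Right side.
    for i in 1..=k {
        px.push(2.0 * xs[n - 1] - xs[n - 1 - i]);
        py.push(match mode { Pad::Extend => ys[n - 1], Pad::Reflect => ys[n - 1 - i], Pad::Zero => 0.0 });
    }
    (px, py)
}

fn main() {
    let xs = [0.0, 1.0, 2.0, 3.0];
    let ys = [1.0, 2.0, 3.0, 4.0];
    for (name, mode) in [("extend", Pad::Extend), ("reflect", Pad::Reflect), ("zero", Pad::Zero)] {
        let (_, py) = pad(&xs, &ys, 2, mode);
        println!("{name}: {py:?}");
    }
}
```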
Examples
MSRV
Rust 1.85.0 or later (2024 Edition).
Robustness Advantages
This implementation uses the median absolute deviation (MAD) as the scale estimate in its robustness-weight calculations:
s = median(|r_i - median(r)|)
MAD has the maximal 50% breakdown point: it stays bounded even when up to half of the residuals are outliers. Because it centers the residuals on their median before taking absolute values, it is also not inflated when the residuals are systematically offset from zero, unlike the median of absolute residuals used by some other implementations.
The median absolute residual (MAR), Cleveland's original choice, is also available through the `scaling_method` parameter.
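The difference between the two scale estimates shows up when the residuals share a common offset. The sketch computes both on a made-up residual vector using the formula above for MAD and median(|r_i|) for MAR. The function names are illustrative, and no 1.4826 consistency factor is applied, matching the unscaled formula given here.

```rust
// Illustrative sketch only; not the loess-rs `scaling_method` code.

/// Median of a non-empty slice.
fn median(v: &[f64]) -> f64 {
    let mut s = v.to_vec();
    s.sort_by(f64::total_cmp);
    let n = s.len();
    if n % 2 == 1 { s[n / 2] } else { 0.5 * (s[n / 2 - 1] + s[n / 2]) }
}

/// Median absolute deviation: median(|r_i - median(r)|).
fn mad(r: &[f64]) -> f64 {
    let m = median(r);
    let dev: Vec<f64> = r.iter().map(|x| (x - m).abs()).collect();
    median(&dev)
}

/// Median absolute residual: median(|r_i|).
fn mar(r: &[f64]) -> f64 {
    let abs: Vec<f64> = r.iter().map(|x| x.abs()).collect();
    median(&abs)
}

fn main() {
    // Residuals offset by about +3, plus one gross outlier.
    let r = [2.0, 3.0, 4.0, 3.0, 100.0];
    println!("MAD = {}, MAR = {}", mad(&r), mar(&r));
}
```

Neither estimate is pulled toward the outlier at 100, but MAR absorbs the shared offset while MAD reflects only the spread around it.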
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
Dual-licensed under AGPL-3.0 (Open Source) or Commercial License.
Contact <thisisamirv@gmail.com> for commercial inquiries.
References
- Cleveland, W.S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". Journal of the American Statistical Association.
- Cleveland, W.S. & Devlin, S.J. (1988). "Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting". Journal of the American Statistical Association.