# lowess
[License: MIT](LICENSE)
A production-grade Rust implementation of LOWESS (Locally Weighted Scatterplot Smoothing) with robust statistics, parallel execution, and streaming capabilities for large datasets.
## What is LOWESS?
LOWESS is a non-parametric regression method that fits smooth curves through scatter plots using locally weighted polynomial regression. Originally developed by Cleveland (1979), it's widely used in:
- 🧬 **Genomics & Bioinformatics**: DNA methylation analysis, ChIP-seq normalization, gene expression smoothing
- 📊 **Time Series Analysis**: Trend extraction, seasonal adjustment, noise reduction
- 🔬 **Scientific Computing**: Exploratory data analysis, visualization, preprocessing
- 📈 **Signal Processing**: Baseline correction, peak detection, data denoising
## Features
### Core Capabilities
- ✨ **Robust Smoothing**: Iteratively reweighted least squares (IRLS) with multiple kernel functions
- 🎯 **Confidence & Prediction Intervals**: Per-point standard errors and statistical intervals
- 🔧 **Multiple Kernels**: Tricube (default), Epanechnikov, Gaussian, Uniform, Quartic, Cosine, Triangle
- 📏 **Automatic Fraction Selection**: Cross-validation (simple RMSE, k-fold, LOOCV)
- ⚡ **Delta Optimization**: Fast interpolation for dense data
### Advanced Features
- 🚀 **Parallel Execution**: Optional multi-threaded CV and fitting (feature = `"parallel"`)
- 💾 **Streaming Processing**: Memory-efficient variants for large datasets
- 📊 **Rich Diagnostics**: RMSE, MAE, R², AIC, AICc, effective degrees of freedom
- 🔢 **ndarray Integration**: Seamless conversion to/from ndarray (feature = `"ndarray"`)
- 🎛️ **no_std Compatible**: Works in embedded environments with `alloc`
### Quality & Reliability
- ✅ Type-safe builder pattern with compile-time validation
- 🛡️ Comprehensive error handling (no panics in release builds)
- 📐 Numerically stable with defensive fallbacks
- 🔁 Deterministic outputs for reproducible research
- 📚 Extensively documented with production guidance
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
lowess = "0.1"

# Or, to enable optional features:
# lowess = { version = "0.1", features = ["parallel", "ndarray"] }
```
### Feature Flags
| Feature | Description | Dependencies |
|------------|----------------------------------------------|----------|
| `parallel` | Enable parallel cross-validation and fitting | `rayon` |
| `ndarray` | Integration with ndarray arrays | `ndarray` |
| `std` | Standard library support (enabled by default) | - |
## Quick Start
### Basic Smoothing
```rust
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];
// Simple smoothing with defaults (fraction=0.67, no robustness iterations)
let result = Lowess::new()
    .fit(&x, &y)
    .unwrap();
println!("Smoothed values: {:?}", result.y);
```
### Robust Smoothing with Outliers
```rust
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
let y = vec![2.0, 4.0, 6.0, 100.0, 10.0, 12.0]; // outlier at index 3
// Robust smoothing with 5 IRLS iterations
let result = Lowess::robust()
    .fit(&x, &y)
    .unwrap();
```
### With Confidence Intervals
```rust
use lowess::Lowess;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];

let result = Lowess::new()
    .fraction(0.5)
    .iterations(3)
    .with_confidence_intervals(0.95)
    .with_all_diagnostics()
    .fit(&x, &y)
    .unwrap();

println!("RMSE: {:?}", result.diagnostics.unwrap().rmse);
println!("Lower CI: {:?}", result.confidence_lower);
println!("Upper CI: {:?}", result.confidence_upper);
```
### Automatic Fraction Selection
```rust
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];

// Cross-validate to find the optimal smoothing fraction
let candidates = vec![0.2, 0.3, 0.5, 0.7];
let result = Lowess::new()
    .cross_validate(&candidates)
    .fit(&x, &y)
    .unwrap();

println!("Selected fraction: {}", result.fraction_used);
println!("CV scores: {:?}", result.cv_scores);
```
### Streaming for Large Datasets
```rust
use lowess::{Lowess, StreamingLowess};
// Process data in chunks to avoid memory issues
let base_config = Lowess::new().fraction(0.3).iterations(2);
let mut streaming = StreamingLowess::new(base_config, 1000); // chunk size: 1000 points
// Process chunks
let chunk1_result = streaming.process_chunk(&x_chunk1, &y_chunk1)?;
let chunk2_result = streaming.process_chunk(&x_chunk2, &y_chunk2)?;
```
### Online Smoothing (Sliding Window)
```rust
use lowess::{Lowess, OnlineLowess};
// Real-time smoothing with sliding window
let base_config = Lowess::new().fraction(0.2);
let mut online = OnlineLowess::new(base_config, 100); // window size
for (&xi, &yi) in x.iter().zip(y.iter()) {
    let smoothed = online.update(xi, yi)?;
    println!("Current smooth value: {}", smoothed);
}
```
## API Overview
### Builder Pattern
```rust
Lowess::new()
    .fraction(0.5)                        // Smoothing span (0, 1]
    .iterations(3)                        // Robustness iterations
    .delta(Some(0.01))                    // Interpolation threshold
    .kernel(WeightFunction::Epanechnikov)
    .with_confidence_intervals(0.95)
    .with_prediction_intervals(0.95)
    .with_all_diagnostics()
    .cross_validate(&[0.3, 0.5, 0.7])
    .auto_converge(1e-4)                  // Early stopping
    .max_iterations(20)
    .fit(&x, &y)?
```
### Result Structure
```rust
pub struct LowessResult<T> {
    pub x: Vec<T>,                           // Sorted x values
    pub y: Vec<T>,                           // Smoothed y values
    pub standard_errors: Option<Vec<T>>,     // Per-point SE
    pub confidence_lower: Option<Vec<T>>,    // CI lower bounds
    pub confidence_upper: Option<Vec<T>>,    // CI upper bounds
    pub prediction_lower: Option<Vec<T>>,    // PI lower bounds
    pub prediction_upper: Option<Vec<T>>,    // PI upper bounds
    pub residuals: Option<Vec<T>>,           // Residuals
    pub robustness_weights: Option<Vec<T>>,  // Final IRLS weights
    pub diagnostics: Option<Diagnostics<T>>, // Fit statistics
    pub iterations_used: Option<usize>,      // Actual iterations
    pub fraction_used: T,                    // Final fraction
    pub cv_scores: Option<Vec<T>>,           // CV RMSE scores
}
```
### Diagnostics
```rust
pub struct Diagnostics<T> {
    pub rmse: T,                   // Root mean squared error
    pub mae: T,                    // Mean absolute error
    pub r_squared: T,              // Coefficient of determination
    pub aic: T,                    // Akaike Information Criterion
    pub aicc: T,                   // Corrected AIC (small samples)
    pub effective_df: T,           // Effective degrees of freedom
    pub count_downweighted: usize, // Points with low robustness weight
}
```
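For intuition, the first three diagnostics can be computed by hand from observed and fitted values. The `diagnostics` function below is an illustrative sketch, not the crate's implementation:

```rust
/// Compute (RMSE, MAE, R²) from observed values and fitted values.
/// Illustrative helper; not part of this crate's API.
fn diagnostics(y: &[f64], fitted: &[f64]) -> (f64, f64, f64) {
    let n = y.len() as f64;
    let mean = y.iter().sum::<f64>() / n;
    let (mut sse, mut sae, mut sst) = (0.0, 0.0, 0.0);
    for (yi, fi) in y.iter().zip(fitted) {
        let r = yi - fi;
        sse += r * r;                      // sum of squared residuals
        sae += r.abs();                    // sum of absolute residuals
        sst += (yi - mean) * (yi - mean);  // total sum of squares
    }
    let rmse = (sse / n).sqrt();
    let mae = sae / n;
    let r2 = 1.0 - sse / sst;
    (rmse, mae, r2)
}
```

A perfect fit yields RMSE = 0, MAE = 0, and R² = 1; AIC, AICc, and effective degrees of freedom additionally depend on the smoother matrix and are omitted from this sketch.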
## Algorithm Details
### LOWESS Procedure
1. **Sorting**: Data sorted by x-values for deterministic windowing
2. **Local Neighborhoods**: For each point, select the k nearest neighbors (k ≈ fraction × n)
3. **Kernel Weighting**: Apply distance-based kernel (default: tricube)
4. **Weighted Regression**: Fit locally weighted least squares
5. **Robustness Iterations** (optional): Reweight using bisquare function on residuals
6. **Delta Interpolation** (optional): Linearly interpolate between anchor points
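Steps 2–4 can be sketched as standalone functions. The helpers below (`tricube`, `local_fit`) are illustrative and are not this crate's internals:

```rust
/// Tricube kernel: w(d) = (1 - |d|³)³ for |d| < 1, else 0.
fn tricube(d: f64) -> f64 {
    let a = d.abs();
    if a < 1.0 { let t = 1.0 - a * a * a; t * t * t } else { 0.0 }
}

/// Fit a weighted least-squares line through the k nearest neighbors
/// of x0 and return the fitted value at x0.
fn local_fit(x: &[f64], y: &[f64], x0: f64, k: usize) -> f64 {
    // Step 2: select the k nearest neighbors of x0.
    let k = k.max(1).min(x.len());
    let mut idx: Vec<usize> = (0..x.len()).collect();
    idx.sort_by(|&a, &b| {
        (x[a] - x0).abs().partial_cmp(&(x[b] - x0).abs()).unwrap()
    });
    let neigh = &idx[..k];
    // Bandwidth = distance to the farthest neighbor (clamped away from 0).
    let h = neigh.iter().map(|&i| (x[i] - x0).abs()).fold(0.0, f64::max).max(1e-12);

    // Steps 3-4: kernel-weighted least squares, minimizing Σ wᵢ (yᵢ - a - b·xᵢ)².
    let (mut sw, mut swx, mut swy, mut swxx, mut swxy) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for &i in neigh {
        let w = tricube((x[i] - x0) / h);
        sw += w; swx += w * x[i]; swy += w * y[i];
        swxx += w * x[i] * x[i]; swxy += w * x[i] * y[i];
    }
    if sw <= 0.0 { return y[neigh[0]]; }          // all weights zero: nearest value
    let denom = sw * swxx - swx * swx;
    if denom.abs() < 1e-12 { return swy / sw; }   // degenerate: weighted mean
    let b = (sw * swxy - swx * swy) / denom;
    let a = (swy - b * swx) / sw;
    a + b * x0
}
```

On exactly linear data the local fit reproduces the line, which is a quick sanity check for any implementation of these steps.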
### Weight Functions
```rust
// Tricube (Cleveland's default)
w(d) = (1 - |d|³)³ for |d| < 1, else 0
// Epanechnikov
w(d) = 1 - d² for |d| < 1, else 0
// Gaussian
w(d) = exp(-d²/2)
```
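The formulas above translate directly into code. These are hypothetical standalone functions, not the crate's `WeightFunction` variants:

```rust
/// Tricube: w(d) = (1 - |d|³)³ for |d| < 1, else 0.
fn tricube(d: f64) -> f64 {
    let a = d.abs();
    if a < 1.0 { let t = 1.0 - a * a * a; t * t * t } else { 0.0 }
}

/// Epanechnikov: w(d) = 1 - d² for |d| < 1, else 0.
fn epanechnikov(d: f64) -> f64 {
    if d.abs() < 1.0 { 1.0 - d * d } else { 0.0 }
}

/// Gaussian: w(d) = exp(-d²/2); never exactly zero, so every
/// neighbor retains some influence.
fn gaussian(d: f64) -> f64 {
    (-d * d / 2.0).exp()
}
```

All three peak at w(0) = 1; the compactly supported kernels (tricube, Epanechnikov) drop to exactly zero at |d| = 1, while the Gaussian decays smoothly.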
### Robustness Weighting (Bisquare)
```rust
// After the initial fit, compute residuals r_i
// Estimate scale: s = MAD(residuals) × 1.4826
// Robustness weights (bisquare):
w(r) = (1 - (r / (6s))²)² for |r| < 6s, else 0
```
## Performance
### Complexity
- **Basic**: O(n²) for n data points
- **With delta**: Effectively O(n × k) where k ≪ n for dense data
- **Parallel CV**: Linear speedup with available cores
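The delta optimization keeps the cost low on dense data by fitting only at "anchor" points spaced at least `delta` apart and interpolating in between. A sketch of the anchor selection (the `select_anchors` helper is hypothetical, not the crate's API):

```rust
/// Return indices of anchor points: the first point, every point more
/// than `delta` beyond the previous anchor, and the last point.
/// Assumes `x` is sorted ascending.
fn select_anchors(x: &[f64], delta: f64) -> Vec<usize> {
    if x.is_empty() {
        return Vec::new();
    }
    let mut anchors = vec![0];
    for i in 1..x.len() {
        if x[i] - x[*anchors.last().unwrap()] > delta {
            anchors.push(i);
        }
    }
    if *anchors.last().unwrap() != x.len() - 1 {
        anchors.push(x.len() - 1); // always include the final point
    }
    anchors
}
```

With dense, evenly spaced data, the number of anchors (and therefore the number of full local fits) depends on the x-range and `delta` rather than on n, which is where the effective O(n × k) cost comes from.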
## Numerical Stability
The implementation includes several safeguards:
- **Scale estimation fallbacks**: MAD → mean absolute residual when MAD ≈ 0
- **Minimum tuned scales**: Clamped to ε > 0 to avoid division by zero
- **Zero-weight neighborhoods**: Configurable fallback policies
- **Uniform weight fallback**: When all kernel weights evaluate to zero
- **Auto-convergence**: Prevents excessive iterations in stable fits
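The first two safeguards can be sketched as a single scale estimator. The `robust_scale` function and the ε value below are illustrative assumptions, not the crate's code:

```rust
/// Robust scale of residuals: MAD × 1.4826, falling back to the mean
/// absolute residual when MAD ≈ 0, clamped away from zero.
/// Illustrative sketch; not this crate's internals.
fn robust_scale(residuals: &[f64]) -> f64 {
    const EPS: f64 = 1e-12;
    let mut abs: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    if abs.is_empty() {
        return EPS;
    }
    abs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mad = abs[abs.len() / 2]; // median of |r| (residuals centered near 0)
    let scale = mad * 1.4826;     // consistency factor for normal data
    if scale > EPS {
        scale
    } else {
        // Fallback: mean absolute residual, clamped to ε > 0
        // so later divisions by the scale stay finite.
        let mean_abs = abs.iter().sum::<f64>() / abs.len() as f64;
        mean_abs.max(EPS)
    }
}
```

The clamp matters when more than half the residuals are exactly zero (e.g. a locally perfect fit), where the raw MAD would otherwise make the robustness weights divide by zero.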
## Comparison with R
This implementation is equivalent to R's `stats::lowess()`:
```r
# R
result <- lowess(x, y, f = 2/3, iter = 3, delta = 0.01)
```
```rust
// Rust (equivalent)
let result = Lowess::new()
.fraction(2.0/3.0)
.iterations(3)
.delta(Some(0.01))
.fit(&x, &y)?;
```
Key differences:
- ✨ More kernel options beyond tricube
- 📊 Statistical intervals and diagnostics
- 🔧 Cross-validation built-in
- 🚀 Parallel execution support
- 💾 Streaming variants for large data
## Error Handling
```rust
use lowess::{Lowess, LowessError};
match Lowess::new().fit(&x, &y) {
    Ok(result) => println!("Success: {:?}", result.y),
    Err(LowessError::EmptyInput) => println!("Empty arrays"),
    Err(LowessError::MismatchedInputs { x_len, y_len }) =>
        println!("Length mismatch: x={}, y={}", x_len, y_len),
    Err(LowessError::InvalidFraction(f)) =>
        println!("Bad fraction: {}", f),
    Err(e) => println!("Other error: {}", e),
}
```
## Testing
```bash
# Run all tests
cargo test
# Run with all features
cargo test --all-features
# Run benchmarks
cargo bench
# Check documentation
cargo doc --open
```
## Production Usage Guidelines
### Best Practices
1. **Pre-clean inputs**: Remove NaNs/infs before calling `fit()`
2. **Sort data**: Pre-sort x for reproducible window semantics
3. **Choose appropriate fraction**:
- 0.2-0.3 for very local features
- 0.5-0.7 for general trends
- Use cross-validation when uncertain
4. **Enable diagnostics**: Monitor RMSE, effective_df in production
5. **Use delta**: Enable for dense data (>1000 points)
6. **Tune robustness**: 2-3 iterations sufficient for most data
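Practices 1 and 2 can be folded into one small pre-processing step. The `clean_and_sort` helper below is a sketch, not part of this crate:

```rust
/// Drop any (x, y) pair containing a NaN or infinity, then sort the
/// remaining pairs by x. Illustrative pre-processing helper.
fn clean_and_sort(x: &[f64], y: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let mut pairs: Vec<(f64, f64)> = x
        .iter()
        .zip(y)
        .filter(|(a, b)| a.is_finite() && b.is_finite())
        .map(|(&a, &b)| (a, b))
        .collect();
    // Sort by x for deterministic window semantics.
    pairs.sort_by(|p, q| p.0.partial_cmp(&q.0).unwrap());
    pairs.into_iter().unzip()
}
```

Running this before `fit()` keeps non-finite values from propagating through the weighted regressions and makes the neighborhood windows reproducible across runs.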
### Monitoring
```rust
let result = Lowess::new()
    .with_all_diagnostics()
    .fit(&x, &y)?;

if let Some(diag) = result.diagnostics {
    if diag.effective_df < 2.0 {
        log::warn!("Low effective degrees of freedom: {}", diag.effective_df);
    }
    if diag.count_downweighted > x.len() / 2 {
        log::warn!("Over 50% of points downweighted - check for systematic issues");
    }
}
```
## Contributing
Contributions are welcome! Areas of interest:
- Additional kernel functions
- More cross-validation strategies
- GPU acceleration
- Python bindings (PyO3)
- Additional examples and tutorials
Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on reporting bugs, submitting pull requests, testing, and the development workflow.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## References
### Academic Papers
- Cleveland, W. S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". *Journal of the American Statistical Association* 74(368): 829-836. [DOI: 10.2307/2286407](https://doi.org/10.2307/2286407)
- Cleveland, W. S. (1981). "LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression". *The American Statistician* 35(1): 54. [DOI: 10.2307/2683591](https://doi.org/10.2307/2683591)
### Related Implementations
- [R stats::lowess](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lowess.html) - Original R implementation
- [Python statsmodels.lowess](https://www.statsmodels.org/stable/generated/statsmodels.nonparametric.smoothers_lowess.lowess.html)
## Citation
If you use this crate in academic work, please cite:
```bibtex
@software{lowess_rust,
  author = {Valizadeh, Amir},
  title  = {lowess: Production-grade LOWESS smoothing for Rust},
  year   = {2025},
  url    = {https://github.com/thisisamirv/lowess}
}
```
## Author
**Amir Valizadeh**
📧 [thisisamirv@gmail.com](mailto:thisisamirv@gmail.com)
## Acknowledgments
- Based on Cleveland's original LOWESS algorithm
- Inspired by implementations in R and Python statsmodels
- Built with the Rust scientific computing ecosystem