fastLowess
High-performance parallel LOWESS (Locally Weighted Scatterplot Smoothing) for Rust. Built on top of the lowess crate with rayon-based parallelism and seamless ndarray integration.
Why This Crate?
- ⚡ Blazingly Fast: 18-352× faster than Python's statsmodels, with parallel execution
- 🚀 Parallel by Default: Automatic multi-core utilization via rayon
- 📊 ndarray Integration: First-class support for Array1<T> data types
- 🎯 Production-Ready: Comprehensive error handling, numerical stability, extensive testing
- 📈 Feature-Rich: Confidence/prediction intervals, multiple kernels, cross-validation
- 🔬 Scientific: Validated against R and Python implementations
- 🛠️ Flexible: Multiple robustness methods, streaming/online modes
Relationship with lowess Crate
fastLowess is a high-level wrapper around the core lowess crate that adds:
- Parallel execution via rayon for multi-core systems
- ndarray support for seamless integration with scientific computing workflows
- Extended API with parallel-specific configuration options
[!IMPORTANT]
Need no-std support? Use lowess (GitHub).
Need polars support? Use polars-lowess (GitHub).
Quick Start
use *;
let x = vec!;
let y = vec!;
// Basic smoothing (parallel by default)
let result = new
.fraction
.adapter
.build?
.fit?;
println!;
# Ok::
Installation
[dependencies]
fastLowess = "0.1"
Features at a Glance
| Feature | Description | Use Case |
|---|---|---|
| Parallel Execution | Multi-core processing via rayon | Large datasets, speed |
| ndarray Support | Native Array1<T> compatibility | Scientific computing |
| Robust Smoothing | IRLS with Bisquare/Huber/Talwar weights | Outlier-contaminated data |
| Confidence Intervals | Point-wise standard errors & bounds | Uncertainty quantification |
| Cross-Validation | Auto-select optimal fraction | Unknown smoothing parameter |
| Multiple Kernels | Tricube, Epanechnikov, Gaussian, etc. | Different smoothness profiles |
| Streaming Mode | Constant memory usage | Very large datasets |
| Delta Optimization | Skip dense regions | 10× speedup on dense data |
Common Use Cases
1. Parallel Processing (Default)
use *;
# let x = vec!;
# let y = vec!;
// Parallel execution is enabled by default
let result = new
.fraction
.iterations
.adapter
.build?
.fit?;
println!;
# Ok::
2. ndarray Integration
use *;
use Array1;
let x: = linspace;
let y: = x.mapv;
// Works directly with ndarray types
let result = new
.fraction
.adapter
.build?
.fit?;
# Ok::
3. Explicit Parallel Control
use *;
# let x = vec!;
# let y = vec!;
// Disable parallelism for small datasets or debugging
let result = new
.fraction
.adapter
.parallel(false) // Force sequential execution
.build?
.fit?;
# Ok::
4. Robust Smoothing (Handle Outliers)
use *;
# let x = vec!;
# let y = vec!;
let result = new
.fraction
.iterations // Robust iterations
.return_robustness_weights // Return outlier weights
.adapter
.build?
.fit?;
// Check which points were downweighted
if let Some = &result.robustness_weights
# Ok::
5. Uncertainty Quantification
use *;
# let x = vec!;
# let y = vec!;
let result = new
.fraction
.weight_function
.confidence_intervals
.prediction_intervals
.adapter
.build?
.fit?;
// Plot confidence bands
for i in 0..x.len
# Ok::
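The source does not show how the intervals are computed, so the following is only a sketch of the usual normal-approximation bands, not the crate's implementation. The function names and the `z` and `sigma` parameters are hypothetical; `z` would be about 1.96 for a 95% band.

```rust
/// Pointwise confidence band: fitted[i] ± z * se[i].
/// Hypothetical helper, not part of the fastLowess API.
fn confidence_band(fitted: &[f64], se: &[f64], z: f64) -> Vec<(f64, f64)> {
    fitted
        .iter()
        .zip(se)
        .map(|(&f, &s)| (f - z * s, f + z * s))
        .collect()
}

/// Pointwise prediction band: the half-width also includes the residual
/// standard deviation `sigma`, giving z * sqrt(se[i]^2 + sigma^2).
fn prediction_band(fitted: &[f64], se: &[f64], sigma: f64, z: f64) -> Vec<(f64, f64)> {
    fitted
        .iter()
        .zip(se)
        .map(|(&f, &s)| {
            let half = z * (s * s + sigma * sigma).sqrt();
            (f - half, f + half)
        })
        .collect()
}
```

A prediction band is always at least as wide as the confidence band at the same point, because it accounts for the noise in a new observation as well as the uncertainty in the fitted curve.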
6. Automatic Parameter Selection
use *;
# let x = vec!;
# let y = vec!;
// Let cross-validation find the optimal smoothing fraction
let result = new
.cross_validate
.adapter
.build?
.fit?;
println!;
println!;
# Ok::
7. Large Dataset Optimization
use *;
# let large_x: = .map.collect;
# let large_y: = large_x.iter.map.collect;
// Enable all performance optimizations
let result = new
.fraction
.delta // Skip dense regions
.adapter // Parallel by default
.build?
.fit?;
# Ok::
8. Production Monitoring
use *;
# let x = vec!;
# let y = vec!;
let result = new
.fraction
.iterations
.return_diagnostics
.adapter
.build?
.fit?;
if let Some = &result.diagnostics
# Ok::
9. Convenience Constructors
Pre-configured builders for common scenarios:
use *;
# let x = vec!;
# let y = vec!;
// For noisy data with outliers
let result = robust.adapter.build?.fit?;
// For speed on clean data
let result = quick.adapter.build?.fit?;
# Ok::
API Overview
Builder Methods
use *;
new
// Core parameters
.fraction // Smoothing span (0, 1], default: 0.67
.iterations // Robustness iterations, default: 3
.delta // Interpolation threshold
// Parallel execution
.parallel // Enable/disable parallelism (default: true)
// Kernel selection
.weight_function // Default
// Robustness method
.robustness_method // Default
// Intervals & diagnostics
.confidence_intervals
.prediction_intervals
.return_diagnostics
.return_residuals
.return_robustness_weights
// Parameter selection
.cross_validate
// Convergence
.auto_converge
.max_iterations
// Execution mode
.adapter // or Streaming, Online
.build?; // Build the model
Result Structure
Execution Modes
Choose the right execution mode based on your use case:
Batch Processing (Standard)
For complete datasets in memory with full feature support:
use *;
# let x = vec!;
# let y = vec!;
let model = new
.fraction
.confidence_intervals
.return_diagnostics
.adapter // Parallel by default
.build?;
let result = model.fit?;
# Ok::
Streaming Processing
For large datasets (>100K points) that don't fit in memory:
use *;
let mut processor = new
.fraction
.iterations
.adapter
.chunk_size(1000) // Process 1000 points at a time
.overlap(100) // 100 points overlap between chunks
.build?;
// Process data in chunks
for chunk in data_chunks
let final_result = processor.finalize?;
# Ok::
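To make the chunk size and overlap settings concrete, here is a standalone sketch of how overlapping chunk boundaries could be laid out. It assumes consecutive chunks simply share `overlap` points; how the crate merges the overlapping fits is not described here, and `chunk_ranges` is a hypothetical helper.

```rust
/// Half-open index ranges [start, end) that cover 0..n, where each chunk
/// holds at most `chunk_size` points and shares `overlap` points with the
/// previous chunk. Hypothetical helper, not part of the fastLowess API.
fn chunk_ranges(n: usize, chunk_size: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(chunk_size > overlap, "chunk_size must exceed overlap");
    let step = chunk_size - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < n {
        let end = (start + chunk_size).min(n);
        ranges.push((start, end));
        if end == n {
            break; // the last chunk reached the end of the data
        }
        start += step;
    }
    ranges
}
```

With 2,500 points, a chunk size of 1000, and an overlap of 100, this yields the ranges 0..1000, 900..1900, and 1800..2500.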
Online/Incremental Processing
For real-time data streams with sliding window:
use *;
let mut processor = new
.fraction
.iterations
.adapter
.window_capacity(100) // Keep last 100 points
.build?;
// Process points as they arrive
for in data_stream
# Ok::
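The sliding-window idea can be illustrated with a bounded queue from the standard library. This is a sketch of the concept only: `SlidingWindow` is a hypothetical type, and the crate's online processor may manage its buffer differently.

```rust
use std::collections::VecDeque;

/// Keeps only the most recent `capacity` points.
/// Hypothetical type, not part of the fastLowess API.
struct SlidingWindow {
    capacity: usize,
    points: VecDeque<(f64, f64)>,
}

impl SlidingWindow {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, points: VecDeque::with_capacity(capacity) }
    }

    /// Adds a point and returns the evicted oldest point, if the window was full.
    fn push(&mut self, x: f64, y: f64) -> Option<(f64, f64)> {
        let evicted = if self.points.len() == self.capacity {
            self.points.pop_front()
        } else {
            None
        };
        self.points.push_back((x, y));
        evicted
    }

    fn len(&self) -> usize {
        self.points.len()
    }
}
```

Memory stays bounded by `capacity` no matter how long the stream runs, which is what makes this mode suitable for real-time data.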
Parallel Execution
How It Works
fastLowess uses rayon for automatic parallelization across multiple CPU cores:
- Data partitioning: Input data is divided across available threads
- Independent fits: Each thread computes local polynomial fits for its partition
- Thread-safe collection: Results are collected without locks via rayon's parallel iterators
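The three steps above can be sketched with the standard library alone. This is not the crate's code: scoped threads stand in for rayon's parallel iterators, and `parallel_fit` and `fit_at` are hypothetical names, with `fit_at` playing the role of a local polynomial fit at one point.

```rust
use std::thread;

/// Evaluates `fit_at` for every index in 0..n, splitting the indices into
/// contiguous partitions with one worker thread per partition.
fn parallel_fit<F>(n: usize, workers: usize, fit_at: F) -> Vec<f64>
where
    F: Fn(usize) -> f64 + Sync,
{
    let mut out = vec![0.0; n];
    if n == 0 {
        return out;
    }
    let workers = workers.max(1);
    // Ceiling division so every index lands in exactly one partition.
    let part = (n + workers - 1) / workers;
    let fit_at = &fit_at;
    thread::scope(|s| {
        // chunks_mut gives each thread a disjoint output slice, so the
        // results are written without any locking.
        for (p, slot) in out.chunks_mut(part).enumerate() {
            s.spawn(move || {
                for (j, v) in slot.iter_mut().enumerate() {
                    *v = fit_at(p * part + j);
                }
            });
        }
    });
    out
}
```

Because each fitted value depends only on the input data and not on other fitted values, the partitions never need to communicate, which is why this workload parallelizes well.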
Performance Characteristics
| Dataset Size | Cores | Typical Speedup |
|---|---|---|
| 1,000 | 4 | 2-3× |
| 10,000 | 4 | 3-4× |
| 100,000 | 8 | 5-7× |
| 1,000,000 | 8 | 6-8× |
When to Disable Parallelism
# use *;
# let x = vec!;
# let y = vec!;
// Use .parallel(false) for:
let result = new
.fraction
.parallel(false) // Disable parallelism
.adapter
.build?
.fit?;
# Ok::
Consider disabling parallelism when:
- Small datasets (<500 points): Thread overhead exceeds benefits
- Debugging: Sequential execution is easier to trace
- Resource constraints: Limiting CPU usage on shared systems
- Single-core systems: No benefit from parallelism
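The first and last rules in this list could be encoded as a small helper that decides what to pass to `.parallel(...)`. This is a sketch: the 500-point threshold comes from the guidance above, and both function names are hypothetical.

```rust
use std::thread;

/// Number of cores reported by the OS, falling back to 1 if unknown.
fn available_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Rule of thumb from the list above: parallelism only pays off with
/// more than one core and at least ~500 points.
fn should_parallelize(n_points: usize, cores: usize) -> bool {
    n_points >= 500 && cores > 1
}
```

Calling `should_parallelize(x.len(), available_cores())` gives a boolean that can be forwarded to the builder.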
Parameter Selection Guide
Fraction (Smoothing Span)
- 0.1-0.3: Local, captures rapid changes (wiggly)
- 0.4-0.6: Balanced, general-purpose
- 0.7-1.0: Global, smooth trends only
- Default: 0.67 (2/3, Cleveland's choice)
- Use CV when uncertain
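To show what the fraction controls, here is a minimal sketch of a single local linear fit with tricube weights. It is not the crate's implementation: `local_linear_fit` is a hypothetical function, and the neighbourhood size q = ceil(fraction × n) is an assumed rounding rule.

```rust
/// Fits a tricube-weighted line around x[i] and returns its value at x[i].
/// `fraction` sets how many nearest neighbours fall inside the window.
fn local_linear_fit(x: &[f64], y: &[f64], i: usize, fraction: f64) -> f64 {
    let n = x.len();
    assert!(n >= 2 && y.len() == n && i < n);
    // Neighbourhood size (assumed rounding rule), kept within [2, n].
    let q = ((fraction * n as f64).ceil() as usize).clamp(2, n);
    // Bandwidth h: distance from x[i] to its q-th nearest neighbour.
    let mut dist: Vec<f64> = x.iter().map(|xi| (xi - x[i]).abs()).collect();
    dist.sort_by(|a, b| a.total_cmp(b));
    let h = dist[q - 1].max(f64::MIN_POSITIVE);
    // Weighted sums for the normal equations, centred on x[i].
    let (mut sw, mut swd, mut swdd, mut swy, mut swdy) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for (&xi, &yi) in x.iter().zip(y) {
        let d = xi - x[i];
        let u = (d / h).abs();
        let w = if u < 1.0 { (1.0 - u.powi(3)).powi(3) } else { 0.0 };
        sw += w;
        swd += w * d;
        swdd += w * d * d;
        swy += w * yi;
        swdy += w * d * yi;
    }
    let denom = sw * swdd - swd * swd;
    if denom <= 1e-12 * sw * swdd {
        return swy / sw; // degenerate window: fall back to the weighted mean
    }
    // Intercept of the centred line, i.e. the fitted value at x[i].
    (swdd * swy - swd * swdy) / denom
}
```

A small fraction means few neighbours and a narrow window, so the fit follows local wiggles; a fraction near 1.0 uses almost all points and returns something close to a global trend.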
Robustness Iterations
- 0: Clean data, speed critical
- 1-2: Light contamination
- 3: Default, good balance (recommended)
- 4-5: Heavy outliers
- >5: Diminishing returns
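Each robustness iteration recomputes weights from the current residuals. The sketch below follows the bisquare rule in Cleveland (1979); the function names are hypothetical, and returning all-ones weights when the median absolute residual is zero is an assumption made here, not documented crate behaviour.

```rust
/// Median of a slice (mean of the two middle values for even lengths).
fn median(values: &[f64]) -> f64 {
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let n = v.len();
    if n == 0 {
        return f64::NAN;
    }
    if n % 2 == 1 {
        v[n / 2]
    } else {
        0.5 * (v[n / 2 - 1] + v[n / 2])
    }
}

/// Bisquare robustness weights: w = (1 - (r / 6s)^2)^2 when |r| < 6s,
/// otherwise 0, where s is the median absolute residual.
fn bisquare_weights(residuals: &[f64]) -> Vec<f64> {
    let abs: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    let s = median(&abs);
    if s.is_nan() || s == 0.0 {
        // Assumption: no usable scale estimate, so leave all weights at 1.
        return vec![1.0; residuals.len()];
    }
    abs.iter()
        .map(|&a| {
            let u = a / (6.0 * s);
            if u < 1.0 { (1.0 - u * u).powi(2) } else { 0.0 }
        })
        .collect()
}
```

Points with large residuals get weights near zero and stop influencing the next fit, which is why a handful of iterations is usually enough.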
Kernel Function
- Tricube (default): Best all-around, smooth, efficient
- Epanechnikov: Theoretically optimal MSE
- Gaussian: Very smooth, no compact support
- Uniform: Fastest, least smooth (moving average)
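For reference, these four kernels have simple closed forms on the scaled distance u = d / h. The sketch below uses the textbook definitions without normalising constants; the `Kernel` enum and `kernel_weight` function are hypothetical, and the crate's own scaling may differ.

```rust
/// Textbook kernel shapes, hypothetical names.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kernel {
    Tricube,
    Epanechnikov,
    Gaussian,
    Uniform,
}

/// Unnormalised weight for scaled distance `u`. Normalising constants are
/// omitted because a local fit only depends on relative weights.
fn kernel_weight(kernel: Kernel, u: f64) -> f64 {
    let a = u.abs();
    match kernel {
        // No compact support: every point keeps a nonzero weight.
        Kernel::Gaussian => (-0.5 * a * a).exp(),
        // The remaining kernels vanish outside the window |u| < 1.
        _ if a >= 1.0 => 0.0,
        Kernel::Tricube => (1.0 - a.powi(3)).powi(3),
        Kernel::Epanechnikov => 1.0 - a * a,
        Kernel::Uniform => 1.0,
    }
}
```

Tricube falls smoothly to zero at the window edge, whereas Uniform drops abruptly, which is the source of the smoothness differences noted above.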
Delta Optimization
- None: Small datasets (n < 1000)
- 0.01 × range(x): Good starting point for dense data
- Manual tuning: Adjust based on data density
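The guidance above can be sketched in a few lines. Both helpers are hypothetical: `suggest_delta` applies the 1% rule, and `anchor_indices` shows a simplified version of the classic skipping rule, assuming sorted `x`. In Cleveland's scheme, points between anchors are filled in by linear interpolation rather than fitted; the crate's exact behaviour may differ.

```rust
/// Suggests delta = 0.01 * range(x), or None for small inputs (n < 1000).
fn suggest_delta(x: &[f64]) -> Option<f64> {
    if x.len() < 1000 {
        return None;
    }
    let min = x.iter().copied().fold(f64::INFINITY, f64::min);
    let max = x.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some(0.01 * (max - min))
}

/// Indices at which a full local fit would be computed, assuming `x` is
/// sorted ascending. From each anchor, jump to the furthest point that is
/// still within `delta`, or to the next point if none is.
fn anchor_indices(x: &[f64], delta: f64) -> Vec<usize> {
    let mut anchors = Vec::new();
    if x.is_empty() {
        return anchors;
    }
    anchors.push(0);
    let mut i = 0;
    while i + 1 < x.len() {
        let cut = x[i] + delta;
        let mut j = i + 1;
        while j + 1 < x.len() && x[j + 1] <= cut {
            j += 1;
        }
        anchors.push(j);
        i = j;
    }
    anchors
}
```

The denser the data relative to delta, the fewer anchors remain, which is where the speedup on dense data comes from.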
Error Handling
use *;
# let x = vec!;
# let y = vec!;
match new.adapter.build?.fit
# Ok::
Examples
Comprehensive examples are available in the examples/ directory:
- batch_smoothing.rs - Batch processing scenarios
  - Basic smoothing, robust outlier handling, uncertainty quantification
  - Cross-validation, diagnostics, kernel comparisons
  - Parallel vs sequential execution comparison
- online_smoothing.rs - Real-time processing
  - Basic streaming, sensor data simulation, outlier handling
  - Window size effects, memory-bounded processing, sliding window behavior
- streaming_smoothing.rs - Large dataset processing
  - Basic chunking, chunk size comparison, overlap strategies
  - Large dataset processing, outlier handling, file-based simulation
Run an example with cargo's standard example runner, for instance: cargo run --example batch_smoothing
Feature Flags
- default: Standard configuration with parallel support
- dev: Exposes internal modules for testing
Standard configuration
[dependencies]
fastLowess = "0.1"
Validation
This implementation has been extensively validated against:
- R's stats::lowess: Numerical agreement to machine precision
- Python's statsmodels: Validated on 44 test scenarios
- Cleveland's original paper: Reproduces published examples
MSRV (Minimum Supported Rust Version)
Rust 1.85.0 or later (requires Rust Edition 2024).
Contributing
Contributions welcome! See CONTRIBUTING.md for:
- Bug reports and feature requests
- Pull request guidelines
- Development workflow
- Testing requirements
License
This software is dual-licensed under:
- AGPL-3.0 — Free for open-source use with source disclosure requirements
- Commercial License — For proprietary/closed-source applications
For commercial licensing inquiries, contact: thisisamirv@gmail.com
See LICENSE for full details.
References
Original papers:
- Cleveland, W.S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". Journal of the American Statistical Association, 74(368): 829-836. DOI:10.2307/2286407
- Cleveland, W.S. (1981). "LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression". The American Statistician, 35(1): 54.
Related implementations:
- R stats::lowess
- Python statsmodels
- lowess crate - Core Rust implementation
- polars-lowess crate - Polars integration
Citation
Author
Amir Valizadeh
📧 thisisamirv@gmail.com
🔗 GitHub
Keywords: LOWESS, LOESS, local regression, nonparametric regression, smoothing, robust statistics, time series, bioinformatics, genomics, signal processing, parallel, ndarray, rayon