§LOWESS — Locally Weighted Scatterplot Smoothing for Rust
A production-ready, high-performance LOWESS implementation with comprehensive features for robust nonparametric regression and trend estimation.
§What is LOWESS?
LOWESS (Locally Weighted Scatterplot Smoothing) is a nonparametric regression method that fits smooth curves through scatter plots. At each point, it fits a weighted polynomial (typically linear) using nearby data points, with weights decreasing smoothly with distance. This creates flexible, data-adaptive curves without assuming a global functional form.
Key advantages:
- No parametric assumptions about the underlying relationship
- Automatic adaptation to local data structure
- Robust to outliers (with robustness iterations enabled)
- Provides uncertainty estimates via confidence/prediction intervals
- Handles irregular sampling and missing regions gracefully
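The local fit described above can be sketched in a few lines: tricube weights centered on a focal point, then a weighted linear least-squares solve evaluated at that point. This is an illustrative, self-contained sketch (the names `tricube`, `local_fit`, and `bandwidth` are chosen here for exposition), not the crate's internal implementation:

```rust
// Tricube kernel: full weight at distance 0, zero beyond the bandwidth.
fn tricube(u: f64) -> f64 {
    let a = u.abs();
    if a >= 1.0 { 0.0 } else { (1.0 - a.powi(3)).powi(3) }
}

/// Fit y ≈ a + b·x by weighted least squares and evaluate at x0.
fn local_fit(x: &[f64], y: &[f64], x0: f64, bandwidth: f64) -> f64 {
    let w: Vec<f64> = x.iter().map(|&xi| tricube((xi - x0) / bandwidth)).collect();
    // Weighted sums for the normal equations of a degree-1 fit.
    let sw: f64 = w.iter().sum();
    let sx: f64 = w.iter().zip(x).map(|(wi, xi)| wi * xi).sum();
    let sy: f64 = w.iter().zip(y).map(|(wi, yi)| wi * yi).sum();
    let sxx: f64 = w.iter().zip(x).map(|(wi, xi)| wi * xi * xi).sum();
    let sxy: f64 = w
        .iter()
        .zip(x.iter().zip(y))
        .map(|(wi, (xi, yi))| wi * xi * yi)
        .sum();
    let denom = sw * sxx - sx * sx;
    let b = (sw * sxy - sx * sy) / denom;
    let a = (sy - b * sx) / sw;
    a + b * x0
}
```

Repeating this fit at every x value (and re-weighting by residual size in the robustness iterations) yields the full LOWESS curve.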
Common applications:
- Exploratory data analysis and visualization
- Trend estimation in time series
- Baseline correction in spectroscopy and signal processing
- Quality control and process monitoring
- Genomic and epigenomic data smoothing
- Removing systematic effects in scientific measurements
§Quick Start
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];
// Basic smoothing with default parameters
let result = Lowess::new()
.fraction(0.5) // Use 50% of data for each local fit
.iterations(3) // 3 robustness iterations
.fit(&x, &y)
.unwrap();
println!("Smoothed values: {:?}", result.y);
§Core Features
§1. Robust Smoothing
Outlier handling via iteratively reweighted least squares (IRLS):
use lowess::{Lowess, RobustnessMethod};
let result = Lowess::new()
.fraction(0.5)
.iterations(5) // More iterations = stronger outlier rejection
.robustness_method(RobustnessMethod::Bisquare) // Default: Bisquare (Tukey)
.with_robustness_weights() // Include weights in output
.fit(&x, &y)
.unwrap();
// Check which points were downweighted
if let Some(weights) = &result.robustness_weights {
for (i, &w) in weights.iter().enumerate() {
if w < 0.5 {
println!("Point {} is likely an outlier (weight: {:.3})", i, w);
}
}
}
Available robustness methods:
- Bisquare (default): Smooth downweighting, Cleveland’s original choice
- Use for: General-purpose robust smoothing
- Good balance between outlier resistance and efficiency
- Huber: Less aggressive, linear downweighting beyond threshold
- Use for: Moderate outliers, when Bisquare is too aggressive
- Better for heavy-tailed but not contaminated data
- Talwar: Hard threshold rejection
- Use for: Extreme contamination, when you want complete outlier rejection
- Most aggressive, use with caution
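The three methods differ only in how they map a scaled residual u (Cleveland scales by six times the median absolute residual) to a weight in [0, 1]. A sketch of the standard textbook shapes, illustrative rather than the crate's exact internals:

```rust
// Bisquare (Tukey): smooth taper to zero at |u| = 1.
fn bisquare(u: f64) -> f64 {
    let a = u.abs();
    if a >= 1.0 { 0.0 } else { (1.0 - a * a).powi(2) }
}

// Huber: full weight inside the threshold c, then 1/|u| decay.
fn huber(u: f64, c: f64) -> f64 {
    let a = u.abs();
    if a <= c { 1.0 } else { c / a }
}

// Talwar: hard cutoff — a point is either kept or rejected outright.
fn talwar(u: f64, c: f64) -> f64 {
    if u.abs() < c { 1.0 } else { 0.0 }
}
```

Bisquare's smooth taper is why it balances outlier resistance and efficiency; Talwar's step function is why it should be reserved for severe contamination.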
§2. Kernel Functions
Different kernels control how neighboring points are weighted by distance:
use lowess::{Lowess, WeightFunction};
let result = Lowess::new()
.fraction(0.5)
.weight_function(WeightFunction::Tricube) // Default
.fit(&x, &y)
.unwrap();
Kernel selection guide:
| Kernel | Efficiency | Smoothness | Use Case |
|---|---|---|---|
| Tricube (default) | 0.998 | Very smooth | Best all-around choice |
| Epanechnikov | 1.000 | Smooth | Theoretically optimal MSE |
| Gaussian | 0.961 | Infinitely smooth | Very smooth data, no boundary effects |
| Biweight | 0.995 | Very smooth | Good alternative to Tricube |
| Cosine | 0.999 | Smooth | Alternative compact kernel |
| Triangle | 0.989 | Moderate | Simple, fast |
| Uniform | 0.943 | None | Fastest, moving average |
Efficiency = AMISE relative to Epanechnikov (1.0 = optimal)
When to choose each kernel:
- Tricube: Use by default. Smooth, efficient, well-tested.
- Epanechnikov: When theoretical optimality matters. Slightly less smooth.
- Gaussian: For very smooth underlying functions. Computationally expensive.
- Biweight/Cosine: Similar to Tricube, use if you prefer their properties.
- Triangle: When simplicity/speed is priority over smoothness.
- Uniform: Moving average, fastest but roughest. For quick exploration.
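For intuition, a few of the kernel shapes from the table can be written out directly (standard textbook definitions on the normalized distance u, shown here for illustration only):

```rust
// Epanechnikov: the MSE-optimal compact kernel.
fn epanechnikov(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { 0.75 * (1.0 - u * u) }
}

// Gaussian: infinitely smooth, never exactly zero (no compact support).
fn gaussian(u: f64) -> f64 {
    (-0.5 * u * u).exp() / (2.0 * std::f64::consts::PI).sqrt()
}

// Uniform: constant weight inside the window — a plain moving average.
fn uniform(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { 0.5 }
}
```

The compact kernels (Epanechnikov, Uniform, and the tricube family) drop to exactly zero outside the window, which is what makes nearest-neighbor windowing cheap; the Gaussian never does, which is part of its extra cost.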
§3. Uncertainty Quantification
Compute confidence intervals (for the mean) and prediction intervals (for new observations):
// Confidence intervals (uncertainty in the smoothed mean)
let result = Lowess::new()
.fraction(0.5)
.with_confidence_intervals(0.95) // 95% CI
.fit(&x, &y)
.unwrap();
for i in 0..x.len() {
println!(
"x={:.1}: y={:.2} [{:.2}, {:.2}]",
x[i],
result.y[i],
result.confidence_lower.as_ref().unwrap()[i],
result.confidence_upper.as_ref().unwrap()[i]
);
}
// Prediction intervals (where new observations will likely fall)
let result = Lowess::new()
.fraction(0.5)
.with_both_intervals(0.95) // Both CI and PI
.fit(&x, &y)
.unwrap();
// Prediction intervals are always wider than confidence intervals
assert!(result.has_confidence_intervals());
assert!(result.has_prediction_intervals());
Interval types:
- Confidence intervals: Uncertainty in the smoothed mean function
- Use for: Understanding precision of the trend estimate
- Narrower intervals = more confident about the mean
- Prediction intervals: Uncertainty for new individual observations
- Use for: Forecasting where new data points will fall
- Always wider (includes data scatter + estimation uncertainty)
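The "always wider" claim follows directly from the construction: a prediction interval adds the residual variance σ² to the squared standard error of the mean before taking the half-width. A hypothetical helper (not the crate's API) makes this concrete:

```rust
/// Given a smoothed value, its standard error, the residual SD sigma,
/// and a critical value z, return (confidence, prediction) intervals.
fn intervals(
    y_hat: f64,
    se: f64,
    sigma: f64,
    z: f64,
) -> ((f64, f64), (f64, f64)) {
    // CI half-width: estimation uncertainty only.
    let ci_half = z * se;
    // PI half-width: estimation uncertainty plus data scatter.
    let pi_half = z * (se * se + sigma * sigma).sqrt();
    (
        (y_hat - ci_half, y_hat + ci_half),
        (y_hat - pi_half, y_hat + pi_half),
    )
}
```

Because sqrt(se² + σ²) ≥ se for any σ ≥ 0, the prediction interval can never be narrower than the confidence interval at the same level.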
§4. Automatic Parameter Selection
Let cross-validation find the optimal smoothing fraction:
// Simple CV (fastest)
let result = Lowess::new()
.cross_validate(&[0.2, 0.3, 0.5, 0.7])
.fit(&x, &y)
.unwrap();
println!("Selected fraction: {}", result.fraction_used);
println!("CV scores: {:?}", result.cv_scores);
// K-fold CV (more robust, recommended)
let result = Lowess::new()
.cross_validate_kfold(&[0.2, 0.3, 0.5, 0.7], 5) // 5-fold
.fit(&x, &y)
.unwrap();
CV method comparison:
| Method | Speed | Robustness | When to Use |
|---|---|---|---|
| Simple | Fastest | Low | Quick exploration, large datasets |
| K-fold | Medium | Good | General use (k=5 or 10 recommended) |
| LOOCV | Slowest | Best | Small datasets (n<100), when accuracy critical |
Choosing fractions to test:
- Small (0.1-0.3): Local, wiggly fits. Good for capturing rapid changes.
- Medium (0.4-0.6): Balanced. Good general-purpose default range.
- Large (0.7-0.9): Global, smooth fits. Good for removing noise, finding trends.
§5. Diagnostic Statistics
Comprehensive fit quality metrics:
let result = Lowess::new()
.fraction(0.5)
.with_all_diagnostics()
.fit(&x, &y)
.unwrap();
if let Some(diag) = &result.diagnostics {
println!("{}", diag); // Pretty-printed diagnostics
// RMSE, MAE, R², AIC, AICc, effective DF, residual SD
}
// Health check for common issues
let issues = result.health_check();
if !issues.is_empty() {
println!("Potential problems:");
for issue in issues {
println!(" - {}", issue);
}
}
Available diagnostics:
- RMSE: Root mean squared error (lower is better)
- MAE: Mean absolute error (robust to outliers)
- R²: Proportion of variance explained (0-1, higher is better)
- AIC/AICc: Model selection criteria (lower is better)
- Effective DF: Smoothing degrees of freedom (higher = less smoothing)
- Residual SD: Scale of unexplained variation
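The first three diagnostics are simple functions of the residuals r = y − ŷ; a self-contained sketch of their standard definitions (illustrative, not the crate's internal code):

```rust
// Root mean squared error: penalizes large residuals quadratically.
fn rmse(r: &[f64]) -> f64 {
    (r.iter().map(|e| e * e).sum::<f64>() / r.len() as f64).sqrt()
}

// Mean absolute error: robust to a few large residuals.
fn mae(r: &[f64]) -> f64 {
    r.iter().map(|e| e.abs()).sum::<f64>() / r.len() as f64
}

// R²: 1 minus the ratio of residual to total sum of squares.
fn r_squared(y: &[f64], r: &[f64]) -> f64 {
    let mean = y.iter().sum::<f64>() / y.len() as f64;
    let ss_tot: f64 = y.iter().map(|yi| (yi - mean).powi(2)).sum();
    let ss_res: f64 = r.iter().map(|e| e * e).sum();
    1.0 - ss_res / ss_tot
}
```

AIC/AICc and the effective degrees of freedom additionally depend on the smoother matrix, which is why they are only available from the fitted result rather than from residuals alone.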
§6. Convergence Control
Automatic iteration stopping:
let result = Lowess::new()
.fraction(0.5)
.auto_converge(1e-4) // Stop when change < 0.0001
.max_iterations(20) // Safety limit
.fit(&x, &y)
.unwrap();
println!("Converged in {} iterations", result.iterations_used.unwrap());
When to use auto-convergence:
- Unknown optimal iteration count
- Want to ensure convergence for varying data quality
- Willing to trade some speed for guaranteed convergence
Tolerance selection:
- 1e-3: Loose convergence, faster
- 1e-4 to 1e-6: Standard (recommended)
- <1e-6: Very tight, slower, usually unnecessary
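The convergence test itself is simple: after each robustness iteration, compare the new smoothed values against the previous ones and stop when the largest change falls below the tolerance. A sketch of one plausible criterion (the crate may use a different norm internally):

```rust
/// True when the largest pointwise change between two iterations of
/// smoothed values is below tol.
fn converged(prev: &[f64], curr: &[f64], tol: f64) -> bool {
    prev.iter()
        .zip(curr)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0_f64, f64::max)
        < tol
}
```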
§Advanced Features
§Parallel Processing
With the "std" feature (enabled by default), parallel processing is automatic:
[dependencies]
lowess = "0.3" # std feature enabled by default
use lowess::Lowess;
// Automatically uses parallel processing for n > 1000
let result = Lowess::new()
.fraction(0.5)
.fit(&large_x, &large_y)
.unwrap();
// Or force sequential execution, e.g. for small datasets or fully deterministic runs
let result = Lowess::new()
.parallel(false) // Force sequential
.fit(&large_x, &large_y)
.unwrap();
Parallel speedups (typical):
- Sequential bottleneck: ~O(n²) for n points
- Parallel speedup: 2-4x on 4-8 cores
- Best for: n > 1000, especially with cross-validation
§Streaming/Online Processing
For datasets too large for memory or real-time applications:
use lowess::{Lowess, ProcessingMode};
// Streaming mode for large datasets
let builder = Lowess::new()
.fraction(0.5)
.for_mode(ProcessingMode::Streaming)
.chunk_size(1000)
.build().unwrap();
// Online mode for incremental updates
let builder = Lowess::new()
.fraction(0.5)
.for_mode(ProcessingMode::Online)
.window_size(500)
.build().unwrap();
§Delta Optimization
Speed up computation on dense data via interpolation:
let result = Lowess::new()
.fraction(0.5)
.delta(0.01) // Interpolate points within 0.01 distance
.fit(&x, &y)
.unwrap();
Delta guidelines:
- None (default): Auto-compute as 1% of x-range
- 0.0: No interpolation (slowest, most accurate)
- 0.001 to 0.01: Good balance for dense data
- >0.01: Aggressive interpolation (faster but less accurate)
When delta helps:
- Dense, evenly-spaced data (e.g., time series at 1Hz)
- Large datasets where speed matters
- Can reduce runtime by 50-90% with minimal accuracy loss
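The idea behind delta can be sketched in isolation: on sorted x values, choose "anchor" points at least delta apart, run the expensive local fit only at the anchors, and linearly interpolate in between. The helper below shows just the anchor selection step (illustrative names, not the crate's internals):

```rust
/// Indices of anchor points on sorted x, spaced at least delta apart.
/// The first and last points are always included so interpolation
/// covers the full range.
fn anchors(x: &[f64], delta: f64) -> Vec<usize> {
    let mut idx = vec![0];
    let mut last = x[0];
    for (i, &xi) in x.iter().enumerate().skip(1) {
        if xi - last > delta {
            idx.push(i);
            last = xi;
        }
    }
    if *idx.last().unwrap() != x.len() - 1 {
        idx.push(x.len() - 1);
    }
    idx
}
```

On dense, evenly spaced data the anchor set is much smaller than n, which is where the 50-90% runtime reduction comes from.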
§Practical Examples
§Example 1: Time Series Trend Extraction
// Extract smooth trend from noisy time series
let result = Lowess::new()
.fraction(0.3) // Local smoothing
.iterations(2) // Light robustness
.weight_function(lowess::WeightFunction::Tricube)
.with_confidence_intervals(0.95)
.fit(&time, &signal)
.unwrap();
// Detrend: subtract the smooth trend
let detrended: Vec<f64> = signal.iter()
.zip(&result.y)
.map(|(orig, smooth)| orig - smooth)
.collect();
§Example 2: Outlier Detection and Removal
// Identify outliers using robust smoothing
let result = Lowess::new()
.fraction(0.5)
.iterations(5) // Strong outlier rejection
.robustness_method(lowess::RobustnessMethod::Bisquare)
.with_robustness_weights()
.with_residuals()
.fit(&x, &y)
.unwrap();
// Flag outliers (low weight + large residual)
if let (Some(weights), Some(residuals)) =
(&result.robustness_weights, &result.residuals) {
for (i, (w, r)) in weights.iter().copied().zip(residuals.iter().copied()).enumerate() {
if w < 0.1f64 && r.abs() > 2.0f64 {
println!("Outlier at index {}: weight={:.3}, residual={:.3}", i, w, r);
}
}
}
§Example 3: Model Comparison via AIC
// Compare different smoothing levels
let fractions = [0.3, 0.5, 0.7];
let mut best_aic = f64::INFINITY;
let mut best_frac = 0.0;
for &frac in &fractions {
let result = Lowess::new()
.fraction(frac)
.with_diagnostics()
.fit(&x, &y)
.unwrap();
if let Some(diag) = &result.diagnostics {
if let Some(aic) = diag.aic {
if aic < best_aic {
best_aic = aic;
best_frac = frac;
}
}
}
}
println!("Best fraction: {} (AIC: {:.2})", best_frac, best_aic);
§Example 4: Genomic Data Smoothing
// Smooth methylation levels across genomic positions
let result = Lowess::new()
.fraction(0.2) // Local smoothing for position data
.iterations(3) // Robust to technical outliers
.weight_function(lowess::WeightFunction::Tricube)
.delta(100.0) // Interpolate nearby positions (bp)
.with_confidence_intervals(0.95)
.fit(&position, &methylation)
.unwrap();
§Parameter Selection Guide
§Fraction (Smoothing Span)
How to choose:
- Start with default (0.67) or use cross-validation
- Visualize with multiple fractions: try 0.2, 0.5, 0.8
- Consider your goal:
- Trend extraction: Larger (0.6-0.9)
- Noise reduction: Medium (0.4-0.6)
- Detail preservation: Smaller (0.2-0.4)
Diagnostics:
- Too small: Undersmoothing (follows noise, high variance)
- Too large: Oversmoothing (misses features, high bias)
- Check effective DF: Lower DF = more smoothing
§Robustness Iterations
How many iterations:
- 0: Clean data, speed critical
- 1-2: Light contamination, mild outliers
- 3: Default, good general purpose
- 4-5: Heavy contamination, strong outliers
- >5: Extreme cases, diminishing returns
Method selection:
- Start with Bisquare (default)
- Try Huber if Bisquare is too aggressive
- Use Talwar only for severe contamination
§Kernel Function
Default: Tricube - rarely needs changing
When to change:
- Gaussian: Underlying function is very smooth, no sharp features
- Epanechnikov: Want theoretical optimality, slightly faster
- Uniform: Quick exploration, don’t care about smoothness
Most users should stick with Tricube unless they have specific requirements.
§Performance Considerations
§Computational Complexity
- Sequential: O(n²) for n points
- With delta optimization: O(n × m) where m << n
- Parallel (with std): O(n²/p) for p cores
- Cross-validation: Multiply by number of fractions tested
§Memory Usage
- Basic fit: O(n) - stores smoothed values
- With intervals: O(n) - adds standard errors and bounds
- With diagnostics: O(n) - adds residuals, weights
- Streaming mode: O(chunk_size) - constant memory
§Speed Optimization Tips
- Use delta optimization for dense data (can give 10x speedup)
- Enable std feature for automatic parallelism on n > 1000
- Reduce robustness iterations if outliers aren’t a concern
- Use simpler kernels (Triangle/Uniform) for quick exploration
- Disable diagnostics if not needed
- Use Simple CV instead of K-fold for large datasets
§Common Pitfalls and Solutions
§Issue: Fit is too wiggly/noisy
Solution: Increase fraction (try 0.7-0.8) or use cross-validation
§Issue: Fit is too smooth/missing features
Solution: Decrease fraction (try 0.2-0.3) or check for oversmoothing
§Issue: Outliers not being removed
Solution: Increase iterations (try 5) or switch to Talwar robustness
§Issue: Too many points marked as outliers
Solution: Reduce iterations or switch to Huber robustness
§Issue: Slow performance on large datasets
Solutions:
- Enable delta optimization
- Ensure std feature is enabled (default)
- Consider streaming mode
- Reduce robustness iterations
§Issue: Confidence intervals are too narrow/wide
Check:
- Are you using the right interval type? (confidence vs prediction)
- Is the residual variance estimate reasonable?
- Try with different robustness settings
§Architecture
The crate is organized in a layered architecture:
Layer 5: builder, adapters/ (High-level API, execution modes)
↓
Layer 4: engine/ (Iteration & convergence logic)
↓
Layer 3: algorithms/ (Regression, robustness, confidence)
↓
Layer 2: primitives/ (Traits, types, math utilities)
↓
Layer 1: foundation/ (Errors, constants)
This structure eliminates circular dependencies and makes the codebase maintainable and extensible.
§Feature Flags
- std (default): Standard library support
- Enables parallel execution via Rayon
- Automatic dispatch for large inputs
- Parallel cross-validation
- 2-4x speedup on multi-core systems
- Disable for no_std embedded environments (requires alloc)
§Cargo Configuration
# Default: includes std and parallelism
[dependencies]
lowess = "0.3"
# No-std mode (e.g., for embedded systems)
[dependencies]
lowess = { version = "0.3", default-features = false }
§References and Further Reading
Original papers:
- Cleveland, W.S. (1979). “Robust Locally Weighted Regression and Smoothing Scatterplots”
- Cleveland, W.S. (1981). “LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression”
Theory and applications:
- Cleveland, W.S. & Devlin, S.J. (1988). “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting”
- Fan, J. & Gijbels, I. (1996). “Local Polynomial Modelling and Its Applications”
Kernel density estimation:
- Wand, M.P. & Jones, M.C. (1995). “Kernel Smoothing”
§no_std Support
This crate supports no_std environments with alloc:
[dependencies]
lowess = { version = "0.3", default-features = false }
Limitations without std:
- No parallel processing (requires Rayon which needs std)
- No streaming mode (requires threading)
- All other features available (batch processing, all algorithms)
§Convenience Constructors
Three pre-configured builders for common use cases:
use lowess::Lowess;
// For noisy data with outliers
let result = Lowess::robust().fit(&x, &y)?;
// For speed on clean data
let result = Lowess::quick().fit(&x, &y)?;
// For comprehensive analysis
let result = Lowess::detailed().fit(&x, &y)?;
§License and Contributing
See the repository for license information and contribution guidelines.
Re-exports§
pub use algorithms::BisquareRobustness;
pub use algorithms::LowessVarianceEstimator;
pub use algorithms::StandardFitter;
pub use algorithms::WeightFunction;
pub use builder::LowessBuilder as Lowess;
pub use builder::LowessResult;
pub use builder::ProcessingMode;
pub use builder::ProcessingVariant;
pub use builder::RobustnessMethod;
pub use foundation::LowessError;
pub use foundation::Result;
pub use primitives::traits::FitContext;
pub use primitives::traits::PointFitter;
pub use primitives::traits::RobustnessUpdater;
pub use primitives::traits::StandardErrorComputer;
pub use primitives::traits::Window;
pub use primitives::traits::ZeroWeightFallback;
Modules§
- adapters
- Adapters layer - execution modes and integration.
- algorithms
- Algorithms layer - regression, robustness, and confidence calculations.
- builder
- Builder module - high-level fluent API.
- engine
- Engine layer - iteration orchestration and convergence.
- foundation
- Foundation layer - error types, constants, result type.
- primitives
- Primitives layer - core traits, types, and utilities.