Crate lowess


§LOWESS — Locally Weighted Scatterplot Smoothing for Rust

A production-ready, high-performance LOWESS implementation with comprehensive features for robust nonparametric regression and trend estimation.

§What is LOWESS?

LOWESS (Locally Weighted Scatterplot Smoothing) is a nonparametric regression method that fits smooth curves through scatter plots. At each point, it fits a weighted polynomial (typically linear) using nearby data points, with weights decreasing smoothly with distance. This creates flexible, data-adaptive curves without assuming a global functional form.
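The core step described above can be sketched in a few lines of standalone Rust. This is an illustrative toy, not the crate's internals: one weighted linear fit at a single target point, with tricube weights over a neighborhood whose size is set by the smoothing fraction.

```rust
// One LOWESS building block: a weighted least-squares line fit at `x0`,
// with tricube weights over the k = ceil(frac * n) nearest neighbors.
fn tricube(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { (1.0 - u.abs().powi(3)).powi(3) }
}

/// Fit y ≈ a + b*x by weighted least squares and evaluate at `x0`.
fn local_fit(x: &[f64], y: &[f64], x0: f64, frac: f64) -> f64 {
    let n = x.len();
    let k = ((frac * n as f64).ceil() as usize).clamp(2, n);
    let d: Vec<f64> = x.iter().map(|xi| (xi - x0).abs()).collect();
    // Bandwidth = distance to the k-th nearest neighbor.
    let mut sorted = d.clone();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let h = sorted[k - 1].max(f64::EPSILON);
    let w: Vec<f64> = d.iter().map(|di| tricube(di / h)).collect();
    // Weighted least squares for intercept and slope.
    let sw: f64 = w.iter().sum();
    let mx: f64 = w.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() / sw;
    let my: f64 = w.iter().zip(y).map(|(wi, yi)| wi * yi).sum::<f64>() / sw;
    let sxx: f64 = w.iter().zip(x).map(|(wi, xi)| wi * (xi - mx).powi(2)).sum();
    let sxy: f64 = w.iter().zip(x).zip(y)
        .map(|((wi, xi), yi)| wi * (xi - mx) * (yi - my)).sum();
    let slope = if sxx > 0.0 { sxy / sxx } else { 0.0 };
    my + slope * (x0 - mx)
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0, 5.0];
    let y = [2.0, 4.1, 5.9, 8.2, 9.8];
    // Near-linear data: the local fit tracks y closely.
    println!("{:.3}", local_fit(&x, &y, 3.0, 0.8));
}
```

Repeating this fit at every x (plus the robustness reweighting described below) is, in essence, the whole algorithm.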

Key advantages:

  • No parametric assumptions about the underlying relationship
  • Automatic adaptation to local data structure
  • Robust to outliers (with robustness iterations enabled)
  • Provides uncertainty estimates via confidence/prediction intervals
  • Handles irregular sampling and missing regions gracefully

Common applications:

  • Exploratory data analysis and visualization
  • Trend estimation in time series
  • Baseline correction in spectroscopy and signal processing
  • Quality control and process monitoring
  • Genomic and epigenomic data smoothing
  • Removing systematic effects in scientific measurements

§Quick Start

use lowess::Lowess;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];

// Basic smoothing with default parameters
let result = Lowess::new()
    .fraction(0.5)      // Use 50% of data for each local fit
    .iterations(3)      // 3 robustness iterations
    .fit(&x, &y)
    .unwrap();

println!("Smoothed values: {:?}", result.y);

§Core Features

§1. Robust Smoothing

Outlier handling via iteratively reweighted least squares (IRLS):

use lowess::{Lowess, RobustnessMethod};

let result = Lowess::new()
    .fraction(0.5)
    .iterations(5)                                    // More iterations = stronger outlier rejection
    .robustness_method(RobustnessMethod::Bisquare)    // Default: Bisquare (Tukey)
    .with_robustness_weights()                        // Include weights in output
    .fit(&x, &y)
    .unwrap();

// Check which points were downweighted
if let Some(weights) = &result.robustness_weights {
    for (i, &w) in weights.iter().enumerate() {
        if w < 0.5 {
            println!("Point {} is likely an outlier (weight: {:.3})", i, w);
        }
    }
}

Available robustness methods:

  • Bisquare (default): Smooth downweighting, Cleveland’s original choice
    • Use for: General-purpose robust smoothing
    • Good balance between outlier resistance and efficiency
  • Huber: Less aggressive, linear downweighting beyond threshold
    • Use for: Moderate outliers, when Bisquare is too aggressive
    • Better for heavy-tailed but not contaminated data
  • Talwar: Hard threshold rejection
    • Use for: Extreme contamination, when you want complete outlier rejection
    • Most aggressive, use with caution
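The three methods differ only in how they map a scaled residual u (residual divided by a robust scale, e.g. 6 times the median absolute residual) to a weight. A standalone sketch of the textbook forms, with the threshold scaled to 1:

```rust
// Bisquare: smooth quartic taper to zero at the threshold.
fn bisquare(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { (1.0 - u * u).powi(2) }
}

// Huber: full weight inside, then 1/|u| decay (never reaches zero).
fn huber(u: f64) -> f64 {
    if u.abs() <= 1.0 { 1.0 } else { 1.0 / u.abs() }
}

// Talwar: hard rejection at the threshold.
fn talwar(u: f64) -> f64 {
    if u.abs() <= 1.0 { 1.0 } else { 0.0 }
}

fn main() {
    for &u in &[0.0, 0.5, 0.9, 1.5] {
        println!("u={u}: bisquare={:.3} huber={:.3} talwar={}",
                 bisquare(u), huber(u), talwar(u));
    }
}
```

The shapes make the trade-offs concrete: Bisquare tapers smoothly, Huber never fully rejects a point, and Talwar is all-or-nothing.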

§2. Kernel Functions

Different kernels control how neighboring points are weighted by distance:

use lowess::{Lowess, WeightFunction};

let result = Lowess::new()
    .fraction(0.5)
    .weight_function(WeightFunction::Tricube)  // Default
    .fit(&x, &y)
    .unwrap();

Kernel selection guide:

| Kernel            | Efficiency | Smoothness        | Use Case                               |
|-------------------|------------|-------------------|----------------------------------------|
| Tricube (default) | 0.998      | Very smooth       | Best all-around choice                 |
| Epanechnikov      | 1.000      | Smooth            | Theoretically optimal MSE              |
| Gaussian          | 0.961      | Infinitely smooth | Very smooth data, no boundary effects  |
| Biweight          | 0.995      | Very smooth       | Good alternative to Tricube            |
| Cosine            | 0.999      | Smooth            | Alternative compact kernel             |
| Triangle          | 0.989      | Moderate          | Simple, fast                           |
| Uniform           | 0.943      | None              | Fastest, moving average                |

Efficiency = AMISE relative to Epanechnikov (1.0 = optimal)

When to choose each kernel:

  • Tricube: Use by default. Smooth, efficient, well-tested.
  • Epanechnikov: When theoretical optimality matters. Slightly less smooth.
  • Gaussian: For very smooth underlying functions. Computationally expensive.
  • Biweight/Cosine: Similar to Tricube, use if you prefer their properties.
  • Triangle: When simplicity/speed is priority over smoothness.
  • Uniform: Moving average, fastest but roughest. For quick exploration.
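To make the differences tangible, here are the standard textbook forms of three kernels from the table, written as standalone weight functions on [-1, 1] (zero outside). Normalizing constants are omitted since LOWESS only uses relative weights within each local fit:

```rust
// Tricube: (1 - |u|^3)^3 — steep falloff near the boundary.
fn tricube(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { (1.0 - u.abs().powi(3)).powi(3) }
}

// Epanechnikov: 1 - u^2 — the MSE-optimal parabolic shape.
fn epanechnikov(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { 1.0 - u * u }
}

// Uniform: constant weight — equivalent to a moving average.
fn uniform(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { 1.0 }
}

fn main() {
    for &u in &[0.0, 0.5, 0.9] {
        println!("u={u}: tricube={:.3} epan={:.3} uniform={}",
                 tricube(u), epanechnikov(u), uniform(u));
    }
}
```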

§3. Uncertainty Quantification

Compute confidence intervals (for the mean) and prediction intervals (for new observations):

// Confidence intervals (uncertainty in the smoothed mean)
let result = Lowess::new()
    .fraction(0.5)
    .with_confidence_intervals(0.95)  // 95% CI
    .fit(&x, &y)
    .unwrap();

for i in 0..x.len() {
    println!(
        "x={:.1}: y={:.2} [{:.2}, {:.2}]",
        x[i],
        result.y[i],
        result.confidence_lower.as_ref().unwrap()[i],
        result.confidence_upper.as_ref().unwrap()[i]
    );
}
// Prediction intervals (where new observations will likely fall)
let result = Lowess::new()
    .fraction(0.5)
    .with_both_intervals(0.95)  // Both CI and PI
    .fit(&x, &y)
    .unwrap();

// The result now carries both interval types (PIs are always wider than CIs)
assert!(result.has_confidence_intervals());
assert!(result.has_prediction_intervals());

Interval types:

  • Confidence intervals: Uncertainty in the smoothed mean function
    • Use for: Understanding precision of the trend estimate
    • Narrower intervals = more confident about the mean
  • Prediction intervals: Uncertainty for new individual observations
    • Use for: Forecasting where new data points will fall
    • Always wider (includes data scatter + estimation uncertainty)
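The "always wider" claim follows directly from the interval formulas. As a sketch (illustrative names, not the crate's API): for a point with standard error `se` of the smoothed mean and residual scale `sigma`, the CI half-width is z·se while the PI half-width adds the data scatter, z·sqrt(se² + sigma²), where z is the normal quantile for the chosen level (about 1.96 for 95%):

```rust
// Confidence interval half-width: uncertainty in the mean only.
fn ci_half_width(z: f64, se: f64) -> f64 {
    z * se
}

// Prediction interval half-width: mean uncertainty plus data scatter.
fn pi_half_width(z: f64, se: f64, sigma: f64) -> f64 {
    z * (se * se + sigma * sigma).sqrt()
}

fn main() {
    let (z, se, sigma) = (1.96, 0.3, 1.0);
    // The PI is strictly wider whenever sigma > 0.
    println!("CI ±{:.3}, PI ±{:.3}",
             ci_half_width(z, se), pi_half_width(z, se, sigma));
}
```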

§4. Automatic Parameter Selection

Let cross-validation find the optimal smoothing fraction:

// Simple CV (fastest)
let result = Lowess::new()
    .cross_validate(&[0.2, 0.3, 0.5, 0.7])
    .fit(&x, &y)
    .unwrap();

println!("Selected fraction: {}", result.fraction_used);
println!("CV scores: {:?}", result.cv_scores);
// K-fold CV (more robust, recommended)
let result = Lowess::new()
    .cross_validate_kfold(&[0.2, 0.3, 0.5, 0.7], 5)  // 5-fold
    .fit(&x, &y)
    .unwrap();

CV method comparison:

| Method | Speed   | Robustness | When to Use                                    |
|--------|---------|------------|------------------------------------------------|
| Simple | Fastest | Low        | Quick exploration, large datasets              |
| K-fold | Medium  | Good       | General use (k=5 or 10 recommended)            |
| LOOCV  | Slowest | Best       | Small datasets (n<100), when accuracy critical |

Choosing fractions to test:

  • Small (0.1-0.3): Local, wiggly fits. Good for capturing rapid changes.
  • Medium (0.4-0.6): Balanced. Good general-purpose default range.
  • Large (0.7-0.9): Global, smooth fits. Good for removing noise, finding trends.
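The k-fold score itself is simple: hold out each fold in turn, fit on the rest, and average the squared prediction errors on the held-out points. A standalone sketch (here `predict` stands in for "fit LOWESS on the training indices, then predict y at held-out index i", and folds are assigned by i mod k; none of these names come from the crate):

```rust
// Mean squared cross-validation error over k folds for one candidate fraction.
fn kfold_mse(y: &[f64], k: usize, predict: impl Fn(&[usize], usize) -> f64) -> f64 {
    let n = y.len();
    let mut sse = 0.0;
    for fold in 0..k {
        // Everything outside this fold is training data.
        let train: Vec<usize> = (0..n).filter(|i| i % k != fold).collect();
        for i in (0..n).filter(|i| i % k == fold) {
            let err = y[i] - predict(&train, i);
            sse += err * err;
        }
    }
    sse / n as f64
}

fn main() {
    let y = [1.0, 2.0, 3.0, 4.0];
    // A trivial "predictor" that always returns 0 gives MSE = mean(y^2).
    let score = kfold_mse(&y, 2, |_train, _i| 0.0);
    println!("{score}"); // 7.5
}
```

Cross-validation then simply picks the fraction with the smallest score.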

§5. Diagnostic Statistics

Comprehensive fit quality metrics:

let result = Lowess::new()
    .fraction(0.5)
    .with_all_diagnostics()
    .fit(&x, &y)
    .unwrap();

if let Some(diag) = &result.diagnostics {
    println!("{}", diag);  // Pretty-printed diagnostics
    // RMSE, MAE, R², AIC, AICc, effective DF, residual SD
}

// Health check for common issues
let issues = result.health_check();
if !issues.is_empty() {
    println!("Potential problems:");
    for issue in issues {
        println!("  - {}", issue);
    }
}

Available diagnostics:

  • RMSE: Root mean squared error (lower is better)
  • MAE: Mean absolute error (robust to outliers)
  • R²: Proportion of variance explained (0-1, higher is better)
  • AIC/AICc: Model selection criteria (lower is better)
  • Effective DF: Smoothing degrees of freedom (higher = less smoothing)
  • Residual SD: Scale of unexplained variation
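For reference, the first three of these have simple textbook definitions, computed from observed values `y` and smoothed values `yhat` (a standalone sketch, not the crate's implementation):

```rust
// Root mean squared error.
fn rmse(y: &[f64], yhat: &[f64]) -> f64 {
    let n = y.len() as f64;
    (y.iter().zip(yhat).map(|(a, b)| (a - b).powi(2)).sum::<f64>() / n).sqrt()
}

// Mean absolute error: less sensitive to a few large residuals.
fn mae(y: &[f64], yhat: &[f64]) -> f64 {
    let n = y.len() as f64;
    y.iter().zip(yhat).map(|(a, b)| (a - b).abs()).sum::<f64>() / n
}

// R²: 1 - (residual sum of squares) / (total sum of squares).
fn r_squared(y: &[f64], yhat: &[f64]) -> f64 {
    let mean = y.iter().sum::<f64>() / y.len() as f64;
    let sse: f64 = y.iter().zip(yhat).map(|(a, b)| (a - b).powi(2)).sum();
    let sst: f64 = y.iter().map(|a| (a - mean).powi(2)).sum();
    1.0 - sse / sst
}

fn main() {
    let y = [1.0, 2.0, 3.0];
    let yhat = [1.0, 2.0, 4.0];
    println!("rmse={:.3} mae={:.3} r2={:.3}",
             rmse(&y, &yhat), mae(&y, &yhat), r_squared(&y, &yhat));
}
```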

§6. Convergence Control

Automatic iteration stopping:

let result = Lowess::new()
    .fraction(0.5)
    .auto_converge(1e-4)      // Stop when change < 0.0001
    .max_iterations(20)        // Safety limit
    .fit(&x, &y)
    .unwrap();

println!("Converged in {} iterations", result.iterations_used.unwrap());

When to use auto-convergence:

  • Unknown optimal iteration count
  • Want to ensure convergence for varying data quality
  • Willing to trade some speed for guaranteed convergence

Tolerance selection:

  • 1e-3: Loose convergence, faster
  • 1e-4 to 1e-6: Standard (recommended)
  • <1e-6: Very tight, slower, usually unnecessary
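The convergence test itself amounts to comparing consecutive smoothed vectors. A minimal sketch of the check (illustrative, not the crate's internal code): stop iterating when the largest per-point change falls below the tolerance.

```rust
// True when the max absolute change between two iterations is below `tol`.
fn converged(prev: &[f64], curr: &[f64], tol: f64) -> bool {
    prev.iter()
        .zip(curr)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0_f64, f64::max)
        < tol
}

fn main() {
    // Tiny changes: converged at tol = 1e-4.
    assert!(converged(&[1.0, 2.0], &[1.00005, 2.00002], 1e-4));
    // A large change in any single point blocks convergence.
    assert!(!converged(&[1.0, 2.0], &[1.1, 2.0], 1e-4));
    println!("ok");
}
```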

§Advanced Features

§Parallel Processing

With the "std" feature (enabled by default), parallel processing is automatic:

[dependencies]
lowess = "0.3"  # std feature enabled by default

use lowess::Lowess;

// Automatically uses parallel processing for n > 1000
let result = Lowess::new()
    .fraction(0.5)
    .fit(&large_x, &large_y)
    .unwrap();

// Or force sequential execution for small datasets or more deterministic results
let result = Lowess::new()
    .parallel(false)  // Force sequential
    .fit(&large_x, &large_y)
    .unwrap();

Parallel speedups (typical):

  • Sequential bottleneck: ~O(n²) for n points
  • Parallel speedup: 2-4x on 4-8 cores
  • Best for: n > 1000, especially with cross-validation

§Streaming/Online Processing

For datasets too large for memory or real-time applications:

use lowess::{Lowess, ProcessingMode};

// Streaming mode for large datasets
let builder = Lowess::new()
    .fraction(0.5)
    .for_mode(ProcessingMode::Streaming)
    .chunk_size(1000)
    .build().unwrap();

// Online mode for incremental updates
let builder = Lowess::new()
    .fraction(0.5)
    .for_mode(ProcessingMode::Online)
    .window_size(500)
    .build().unwrap();

§Delta Optimization

Speed up computation on dense data via interpolation:

let result = Lowess::new()
    .fraction(0.5)
    .delta(0.01)  // Interpolate points within 0.01 distance
    .fit(&x, &y)
    .unwrap();

Delta guidelines:

  • None (default): Auto-compute as 1% of x-range
  • 0.0: No interpolation (slowest, most accurate)
  • 0.001 - 0.01: Good balance for dense data
  • >0.01: Aggressive interpolation (faster but less accurate)

When delta helps:

  • Dense, evenly-spaced data (e.g., time series at 1Hz)
  • Large datasets where speed matters
  • Can reduce runtime by 50-90% with minimal accuracy loss
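The idea behind delta can be sketched in standalone Rust: run the full local regression only at "anchor" points spaced at least `delta` apart in x, then linearly interpolate the points in between. Here `fit_at` stands in for a full local fit at one x; this mirrors the concept, not the crate's internal code.

```rust
// Fit only at anchors >= `delta` apart; interpolate the skipped points.
fn smooth_with_delta(x: &[f64], delta: f64, fit_at: impl Fn(f64) -> f64) -> Vec<f64> {
    let mut out = vec![0.0; x.len()];
    out[0] = fit_at(x[0]);
    let mut last = 0; // index of the most recent anchor
    for i in 1..x.len() {
        if x[i] - x[last] >= delta || i == x.len() - 1 {
            out[i] = fit_at(x[i]);
            // Fill in the skipped points by linear interpolation.
            for j in last + 1..i {
                let t = (x[j] - x[last]) / (x[i] - x[last]);
                out[j] = out[last] + t * (out[i] - out[last]);
            }
            last = i;
        }
    }
    out
}

fn main() {
    let x = [0.0, 1.0, 2.0, 3.0, 4.0];
    // With a linear "fit", interpolation reproduces it exactly.
    let out = smooth_with_delta(&x, 2.0, |v| 2.0 * v);
    println!("{out:?}"); // [0.0, 2.0, 4.0, 6.0, 8.0]
}
```

Only three of the five points above get a full fit, which is where the speedup on dense data comes from.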

§Practical Examples

§Example 1: Time Series Trend Extraction

// Extract smooth trend from noisy time series
let result = Lowess::new()
    .fraction(0.3)                    // Local smoothing
    .iterations(2)                     // Light robustness
    .weight_function(lowess::WeightFunction::Tricube)
    .with_confidence_intervals(0.95)
    .fit(&time, &signal)
    .unwrap();

// Detrend: subtract the smooth trend
let detrended: Vec<f64> = signal.iter()
    .zip(&result.y)
    .map(|(orig, smooth)| orig - smooth)
    .collect();

§Example 2: Outlier Detection and Removal

// Identify outliers using robust smoothing
let result = Lowess::new()
    .fraction(0.5)
    .iterations(5)                           // Strong outlier rejection
    .robustness_method(lowess::RobustnessMethod::Bisquare)
    .with_robustness_weights()
    .with_residuals()
    .fit(&x, &y)
    .unwrap();

// Flag outliers (low weight + large residual)
if let (Some(weights), Some(residuals)) =
    (&result.robustness_weights, &result.residuals) {
    for (i, (w, r)) in weights.iter().copied().zip(residuals.iter().copied()).enumerate() {
        if w < 0.1f64 && r.abs() > 2.0f64 {
            println!("Outlier at index {}: weight={:.3}, residual={:.3}", i, w, r);
        }
    }
}

§Example 3: Model Comparison via AIC

// Compare different smoothing levels
let fractions = [0.3, 0.5, 0.7];
let mut best_aic = f64::INFINITY;
let mut best_frac = 0.0;

for &frac in &fractions {
    let result = Lowess::new()
        .fraction(frac)
        .with_diagnostics()
        .fit(&x, &y)
        .unwrap();
     
    if let Some(diag) = &result.diagnostics {
        if let Some(aic) = diag.aic {
            if aic < best_aic {
                best_aic = aic;
                best_frac = frac;
            }
        }
    }
}

println!("Best fraction: {} (AIC: {:.2})", best_frac, best_aic);

§Example 4: Genomic Data Smoothing

// Smooth methylation levels across genomic positions
let result = Lowess::new()
    .fraction(0.2)                          // Local smoothing for position data
    .iterations(3)                          // Robust to technical outliers
    .weight_function(lowess::WeightFunction::Tricube)
    .delta(100.0)                           // Interpolate nearby positions (bp)
    .with_confidence_intervals(0.95)
    .fit(&position, &methylation)
    .unwrap();

§Parameter Selection Guide

§Fraction (Smoothing Span)

How to choose:

  1. Start with default (0.67) or use cross-validation
  2. Visualize with multiple fractions: try 0.2, 0.5, 0.8
  3. Consider your goal:
    • Trend extraction: Larger (0.6-0.9)
    • Noise reduction: Medium (0.4-0.6)
    • Detail preservation: Smaller (0.2-0.4)

Diagnostics:

  • Too small: Undersmoothing (follows noise, high variance)
  • Too large: Oversmoothing (misses features, high bias)
  • Check effective DF: Lower DF = more smoothing

§Robustness Iterations

How many iterations:

  • 0: Clean data, speed critical
  • 1-2: Light contamination, mild outliers
  • 3: Default, good general purpose
  • 4-5: Heavy contamination, strong outliers
  • >5: Extreme cases, diminishing returns

Method selection:

  • Start with Bisquare (default)
  • Try Huber if Bisquare is too aggressive
  • Use Talwar only for severe contamination

§Kernel Function

Default: Tricube - rarely needs changing

When to change:

  • Gaussian: Underlying function is very smooth, no sharp features
  • Epanechnikov: Want theoretical optimality, slightly faster
  • Uniform: Quick exploration, don’t care about smoothness

Most users should stick with Tricube unless they have specific requirements.

§Performance Considerations

§Computational Complexity

  • Sequential: O(n²) for n points
  • With delta optimization: O(n × m) where m << n
  • Parallel (with std): O(n²/p) for p cores
  • Cross-validation: Multiply by number of fractions tested

§Memory Usage

  • Basic fit: O(n) - stores smoothed values
  • With intervals: O(n) - adds standard errors and bounds
  • With diagnostics: O(n) - adds residuals, weights
  • Streaming mode: O(chunk_size) - constant memory

§Speed Optimization Tips

  1. Use delta optimization for dense data (can give 10x speedup)
  2. Enable std feature for automatic parallelism on n > 1000
  3. Reduce robustness iterations if outliers aren’t a concern
  4. Use simpler kernels (Triangle/Uniform) for quick exploration
  5. Disable diagnostics if not needed
  6. Use Simple CV instead of K-fold for large datasets

§Common Pitfalls and Solutions

§Issue: Fit is too wiggly/noisy

Solution: Increase fraction (try 0.7-0.8) or use cross-validation

§Issue: Fit is too smooth/missing features

Solution: Decrease fraction (try 0.2-0.3) or check for oversmoothing

§Issue: Outliers not being removed

Solution: Increase iterations (try 5) or switch to Talwar robustness

§Issue: Too many points marked as outliers

Solution: Reduce iterations or switch to Huber robustness

§Issue: Slow performance on large datasets

Solutions:

  • Enable delta optimization
  • Ensure std feature is enabled (default)
  • Consider streaming mode
  • Reduce robustness iterations

§Issue: Confidence intervals are too narrow/wide

Check:

  • Are you using the right interval type? (confidence vs prediction)
  • Is the residual variance estimate reasonable?
  • Try with different robustness settings

§Architecture

The crate is organized in a layered architecture:

Layer 5: builder, adapters/        (High-level API, execution modes)
         ↓
Layer 4: engine/                   (Iteration & convergence logic)
         ↓
Layer 3: algorithms/               (Regression, robustness, confidence)
         ↓
Layer 2: primitives/               (Traits, types, math utilities)
         ↓
Layer 1: foundation/               (Errors, constants)

This structure eliminates circular dependencies and makes the codebase maintainable and extensible.

§Feature Flags

  • std (default): Standard library support
    • Enables parallel execution via Rayon
    • Automatic dispatch for large inputs
    • Parallel cross-validation
    • 2-4x speedup on multi-core systems
    • Disable for no_std embedded environments (requires alloc)

§Cargo Configuration

# Default: includes std and parallelism
[dependencies]
lowess = "0.3"

# No-std mode (e.g., for embedded systems)
[dependencies]
lowess = { version = "0.3", default-features = false }

§References and Further Reading

Original papers:

  • Cleveland, W.S. (1979). “Robust Locally Weighted Regression and Smoothing Scatterplots”
  • Cleveland, W.S. (1981). “LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression”

Theory and applications:

  • Cleveland, W.S. & Devlin, S.J. (1988). “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting”
  • Fan, J. & Gijbels, I. (1996). “Local Polynomial Modelling and Its Applications”

Kernel density estimation:

  • Wand, M.P. & Jones, M.C. (1995). “Kernel Smoothing”

§no_std Support

This crate supports no_std environments with alloc:

[dependencies]
lowess = { version = "0.3", default-features = false }

Limitations without std:

  • No parallel processing (requires Rayon which needs std)
  • No streaming mode (requires threading)
  • All other features available (batch processing, all algorithms)

§Convenience Constructors

Three pre-configured builders for common use cases:

use lowess::Lowess;

// For noisy data with outliers
let result = Lowess::robust().fit(&x, &y)?;

// For speed on clean data
let result = Lowess::quick().fit(&x, &y)?;

// For comprehensive analysis
let result = Lowess::detailed().fit(&x, &y)?;

§License and Contributing

See the repository for license information and contribution guidelines.

§Re-exports

pub use algorithms::BisquareRobustness;
pub use algorithms::LowessVarianceEstimator;
pub use algorithms::StandardFitter;
pub use algorithms::WeightFunction;
pub use builder::LowessBuilder as Lowess;
pub use builder::LowessResult;
pub use builder::ProcessingMode;
pub use builder::ProcessingVariant;
pub use builder::RobustnessMethod;
pub use foundation::LowessError;
pub use foundation::Result;
pub use primitives::traits::FitContext;
pub use primitives::traits::PointFitter;
pub use primitives::traits::RobustnessUpdater;
pub use primitives::traits::StandardErrorComputer;
pub use primitives::traits::Window;
pub use primitives::traits::ZeroWeightFallback;

§Modules

  • adapters: Adapters layer - execution modes and integration.
  • algorithms: Algorithms layer - regression, robustness, and confidence calculations.
  • builder: Builder module - high-level fluent API.
  • engine: Engine layer - iteration orchestration and convergence.
  • foundation: Foundation layer - error types, constants, result type.
  • primitives: Primitives layer - core traits, types, and utilities.