Crate lowess


§LOWESS — Locally Weighted Scatterplot Smoothing for Rust

A production-ready, high-performance LOWESS implementation with comprehensive features for robust nonparametric regression and trend estimation.

§What is LOWESS?

LOWESS (Locally Weighted Scatterplot Smoothing) is a nonparametric regression method that fits smooth curves through scatter plots. At each point, it fits a weighted polynomial (typically linear) using nearby data points, with weights decreasing smoothly with distance. This creates flexible, data-adaptive curves without assuming a global functional form.
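The core step described above can be sketched in a few lines of standalone Rust. This is an illustrative toy, not the crate's internals: one weighted linear fit at a single target point, with tricube weights over a neighborhood whose size is set by the smoothing fraction.

```rust
// One LOWESS building block: a weighted least-squares line fit at `x0`,
// with tricube weights over the k = ceil(frac * n) nearest neighbors.
fn tricube(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { (1.0 - u.abs().powi(3)).powi(3) }
}

/// Fit y ≈ a + b*x by weighted least squares and evaluate at `x0`.
fn local_fit(x: &[f64], y: &[f64], x0: f64, frac: f64) -> f64 {
    let n = x.len();
    let k = ((frac * n as f64).ceil() as usize).clamp(2, n);
    let d: Vec<f64> = x.iter().map(|xi| (xi - x0).abs()).collect();
    // Bandwidth = distance to the k-th nearest neighbor.
    let mut sorted = d.clone();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let h = sorted[k - 1].max(f64::EPSILON);
    let w: Vec<f64> = d.iter().map(|di| tricube(di / h)).collect();
    // Weighted least squares for intercept and slope.
    let sw: f64 = w.iter().sum();
    let mx: f64 = w.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() / sw;
    let my: f64 = w.iter().zip(y).map(|(wi, yi)| wi * yi).sum::<f64>() / sw;
    let sxx: f64 = w.iter().zip(x).map(|(wi, xi)| wi * (xi - mx).powi(2)).sum();
    let sxy: f64 = w.iter().zip(x).zip(y)
        .map(|((wi, xi), yi)| wi * (xi - mx) * (yi - my)).sum();
    let slope = if sxx > 0.0 { sxy / sxx } else { 0.0 };
    my + slope * (x0 - mx)
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0, 5.0];
    let y = [2.0, 4.1, 5.9, 8.2, 9.8];
    // Near-linear data: the local fit tracks y closely.
    println!("{:.3}", local_fit(&x, &y, 3.0, 0.8));
}
```

Repeating this fit at every x (plus the robustness reweighting described below) is, in essence, the whole algorithm.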

Key advantages:

  • No parametric assumptions about the underlying relationship
  • Automatic adaptation to local data structure
  • Robust to outliers (with robustness iterations enabled)
  • Provides uncertainty estimates via confidence/prediction intervals
  • Handles irregular sampling and missing regions gracefully

Common applications:

  • Exploratory data analysis and visualization
  • Trend estimation in time series
  • Baseline correction in spectroscopy and signal processing
  • Quality control and process monitoring
  • Genomic and epigenomic data smoothing
  • Removing systematic effects in scientific measurements

§Quick Start

use lowess::Lowess;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];

// Basic smoothing with default parameters
let result = Lowess::new()
    .fraction(0.5)      // Use 50% of data for each local fit
    .iterations(3)      // 3 robustness iterations
    .fit(&x, &y)
    .unwrap();

println!("Smoothed values: {:?}", result.y);

§Core Features

§1. Robust Smoothing

Outlier handling via iteratively reweighted least squares (IRLS):

use lowess::{Lowess, RobustnessMethod};

let result = Lowess::new()
    .fraction(0.5)
    .iterations(5)                                    // More iterations = stronger outlier rejection
    .robustness_method(RobustnessMethod::Bisquare)    // Default: Bisquare (Tukey)
    .with_robustness_weights()                        // Include weights in output
    .fit(&x, &y)
    .unwrap();

// Check which points were downweighted
if let Some(weights) = &result.robustness_weights {
    for (i, &w) in weights.iter().enumerate() {
        if w < 0.5 {
            println!("Point {} is likely an outlier (weight: {:.3})", i, w);
        }
    }
}

Available robustness methods:

  • Bisquare (default): Smooth downweighting, Cleveland’s original choice
    • Use for: General-purpose robust smoothing
    • Good balance between outlier resistance and efficiency
  • Huber: Less aggressive, linear downweighting beyond threshold
    • Use for: Moderate outliers, when Bisquare is too aggressive
    • Better for heavy-tailed but not contaminated data
  • Talwar: Hard threshold rejection
    • Use for: Extreme contamination, when you want complete outlier rejection
    • Most aggressive, use with caution
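The three methods differ only in how they map a scaled residual u (residual divided by a robust scale, e.g. 6 times the median absolute residual) to a weight. A standalone sketch of the textbook forms, with the threshold scaled to 1:

```rust
// Bisquare: smooth quartic taper to zero at the threshold.
fn bisquare(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { (1.0 - u * u).powi(2) }
}

// Huber: full weight inside, then 1/|u| decay (never reaches zero).
fn huber(u: f64) -> f64 {
    if u.abs() <= 1.0 { 1.0 } else { 1.0 / u.abs() }
}

// Talwar: hard rejection at the threshold.
fn talwar(u: f64) -> f64 {
    if u.abs() <= 1.0 { 1.0 } else { 0.0 }
}

fn main() {
    for &u in &[0.0, 0.5, 0.9, 1.5] {
        println!("u={u}: bisquare={:.3} huber={:.3} talwar={}",
                 bisquare(u), huber(u), talwar(u));
    }
}
```

The shapes make the trade-offs concrete: Bisquare tapers smoothly, Huber never fully rejects a point, and Talwar is all-or-nothing.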

§2. Kernel Functions

Different kernels control how neighboring points are weighted by distance:

use lowess::{Lowess, WeightFunction};

let result = Lowess::new()
    .fraction(0.5)
    .weight_function(WeightFunction::Tricube)  // Default
    .fit(&x, &y)
    .unwrap();

Kernel selection guide:

| Kernel            | Efficiency | Smoothness        | Use Case                               |
|-------------------|------------|-------------------|----------------------------------------|
| Tricube (default) | 0.998      | Very smooth       | Best all-around choice                 |
| Epanechnikov      | 1.000      | Smooth            | Theoretically optimal MSE              |
| Gaussian          | 0.961      | Infinitely smooth | Very smooth data, no boundary effects  |
| Biweight          | 0.995      | Very smooth       | Good alternative to Tricube            |
| Cosine            | 0.999      | Smooth            | Alternative compact kernel             |
| Triangle          | 0.989      | Moderate          | Simple, fast                           |
| Uniform           | 0.943      | None              | Fastest, moving average                |

Efficiency = AMISE relative to Epanechnikov (1.0 = optimal)

When to choose each kernel:

  • Tricube: Use by default. Smooth, efficient, well-tested.
  • Epanechnikov: When theoretical optimality matters. Slightly less smooth.
  • Gaussian: For very smooth underlying functions. Computationally expensive.
  • Biweight/Cosine: Similar to Tricube, use if you prefer their properties.
  • Triangle: When simplicity/speed is priority over smoothness.
  • Uniform: Moving average, fastest but roughest. For quick exploration.
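To make the differences tangible, here are the standard textbook forms of three kernels from the table, written as standalone weight functions on [-1, 1] (zero outside). Normalizing constants are omitted since LOWESS only uses relative weights within each local fit:

```rust
// Tricube: (1 - |u|^3)^3 — steep falloff near the boundary.
fn tricube(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { (1.0 - u.abs().powi(3)).powi(3) }
}

// Epanechnikov: 1 - u^2 — the MSE-optimal parabolic shape.
fn epanechnikov(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { 1.0 - u * u }
}

// Uniform: constant weight — equivalent to a moving average.
fn uniform(u: f64) -> f64 {
    if u.abs() >= 1.0 { 0.0 } else { 1.0 }
}

fn main() {
    for &u in &[0.0, 0.5, 0.9] {
        println!("u={u}: tricube={:.3} epan={:.3} uniform={}",
                 tricube(u), epanechnikov(u), uniform(u));
    }
}
```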

§3. Uncertainty Quantification

Compute confidence intervals (for the mean) and prediction intervals (for new observations):

// Confidence intervals (uncertainty in the smoothed mean)
let result = Lowess::new()
    .fraction(0.5)
    .with_confidence_intervals(0.95)  // 95% CI
    .fit(&x, &y)
    .unwrap();

for i in 0..x.len() {
    println!(
        "x={:.1}: y={:.2} [{:.2}, {:.2}]",
        x[i],
        result.y[i],
        result.confidence_lower.as_ref().unwrap()[i],
        result.confidence_upper.as_ref().unwrap()[i]
    );
}
// Prediction intervals (where new observations will likely fall)
let result = Lowess::new()
    .fraction(0.5)
    .with_both_intervals(0.95)  // Both CI and PI
    .fit(&x, &y)
    .unwrap();

// The result now carries both interval types (PIs are always wider than CIs)
assert!(result.has_confidence_intervals());
assert!(result.has_prediction_intervals());

Interval types:

  • Confidence intervals: Uncertainty in the smoothed mean function
    • Use for: Understanding precision of the trend estimate
    • Narrower intervals = more confident about the mean
  • Prediction intervals: Uncertainty for new individual observations
    • Use for: Forecasting where new data points will fall
    • Always wider (includes data scatter + estimation uncertainty)
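The "always wider" claim follows directly from the interval formulas. As a sketch (illustrative names, not the crate's API): for a point with standard error `se` of the smoothed mean and residual scale `sigma`, the CI half-width is z·se while the PI half-width adds the data scatter, z·sqrt(se² + sigma²), where z is the normal quantile for the chosen level (about 1.96 for 95%):

```rust
// Confidence interval half-width: uncertainty in the mean only.
fn ci_half_width(z: f64, se: f64) -> f64 {
    z * se
}

// Prediction interval half-width: mean uncertainty plus data scatter.
fn pi_half_width(z: f64, se: f64, sigma: f64) -> f64 {
    z * (se * se + sigma * sigma).sqrt()
}

fn main() {
    let (z, se, sigma) = (1.96, 0.3, 1.0);
    // The PI is strictly wider whenever sigma > 0.
    println!("CI ±{:.3}, PI ±{:.3}",
             ci_half_width(z, se), pi_half_width(z, se, sigma));
}
```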

§4. Automatic Parameter Selection

Let cross-validation find the optimal smoothing fraction:

// Simple CV (fastest)
let result = Lowess::new()
    .cross_validate(&[0.2, 0.3, 0.5, 0.7])
    .fit(&x, &y)
    .unwrap();

println!("Selected fraction: {}", result.fraction_used);
println!("CV scores: {:?}", result.cv_scores);
// K-fold CV (more robust, recommended)
let result = Lowess::new()
    .cross_validate_kfold(&[0.2, 0.3, 0.5, 0.7], 5)  // 5-fold
    .fit(&x, &y)
    .unwrap();

CV method comparison:

| Method | Speed   | Robustness | When to Use                                    |
|--------|---------|------------|------------------------------------------------|
| Simple | Fastest | Low        | Quick exploration, large datasets              |
| K-fold | Medium  | Good       | General use (k=5 or 10 recommended)            |
| LOOCV  | Slowest | Best       | Small datasets (n<100), when accuracy critical |

Choosing fractions to test:

  • Small (0.1-0.3): Local, wiggly fits. Good for capturing rapid changes.
  • Medium (0.4-0.6): Balanced. Good general-purpose default range.
  • Large (0.7-0.9): Global, smooth fits. Good for removing noise, finding trends.
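The k-fold score itself is simple: hold out each fold in turn, fit on the rest, and average the squared prediction errors on the held-out points. A standalone sketch (here `predict` stands in for "fit LOWESS on the training indices, then predict y at held-out index i", and folds are assigned by i mod k; none of these names come from the crate):

```rust
// Mean squared cross-validation error over k folds for one candidate fraction.
fn kfold_mse(y: &[f64], k: usize, predict: impl Fn(&[usize], usize) -> f64) -> f64 {
    let n = y.len();
    let mut sse = 0.0;
    for fold in 0..k {
        // Everything outside this fold is training data.
        let train: Vec<usize> = (0..n).filter(|i| i % k != fold).collect();
        for i in (0..n).filter(|i| i % k == fold) {
            let err = y[i] - predict(&train, i);
            sse += err * err;
        }
    }
    sse / n as f64
}

fn main() {
    let y = [1.0, 2.0, 3.0, 4.0];
    // A trivial "predictor" that always returns 0 gives MSE = mean(y^2).
    let score = kfold_mse(&y, 2, |_train, _i| 0.0);
    println!("{score}"); // 7.5
}
```

Cross-validation then simply picks the fraction with the smallest score.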

§5. Diagnostic Statistics

Comprehensive fit quality metrics:

let result = Lowess::new()
    .fraction(0.5)
    .with_all_diagnostics()
    .fit(&x, &y)
    .unwrap();

if let Some(diag) = &result.diagnostics {
    println!("{}", diag);  // Pretty-printed diagnostics
    // RMSE, MAE, R², AIC, AICc, effective DF, residual SD
}

// Health check for common issues
let issues = result.health_check();
if !issues.is_empty() {
    println!("Potential problems:");
    for issue in issues {
        println!("  - {}", issue);
    }
}

Available diagnostics:

  • RMSE: Root mean squared error (lower is better)
  • MAE: Mean absolute error (robust to outliers)
  • R²: Proportion of variance explained (0-1, higher is better)
  • AIC/AICc: Model selection criteria (lower is better)
  • Effective DF: Smoothing degrees of freedom (higher = less smoothing)
  • Residual SD: Scale of unexplained variation
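For reference, the first three of these have simple textbook definitions, computed from observed values `y` and smoothed values `yhat` (a standalone sketch, not the crate's implementation):

```rust
// Root mean squared error.
fn rmse(y: &[f64], yhat: &[f64]) -> f64 {
    let n = y.len() as f64;
    (y.iter().zip(yhat).map(|(a, b)| (a - b).powi(2)).sum::<f64>() / n).sqrt()
}

// Mean absolute error: less sensitive to a few large residuals.
fn mae(y: &[f64], yhat: &[f64]) -> f64 {
    let n = y.len() as f64;
    y.iter().zip(yhat).map(|(a, b)| (a - b).abs()).sum::<f64>() / n
}

// R²: 1 - (residual sum of squares) / (total sum of squares).
fn r_squared(y: &[f64], yhat: &[f64]) -> f64 {
    let mean = y.iter().sum::<f64>() / y.len() as f64;
    let sse: f64 = y.iter().zip(yhat).map(|(a, b)| (a - b).powi(2)).sum();
    let sst: f64 = y.iter().map(|a| (a - mean).powi(2)).sum();
    1.0 - sse / sst
}

fn main() {
    let y = [1.0, 2.0, 3.0];
    let yhat = [1.0, 2.0, 4.0];
    println!("rmse={:.3} mae={:.3} r2={:.3}",
             rmse(&y, &yhat), mae(&y, &yhat), r_squared(&y, &yhat));
}
```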

§6. Convergence Control

Automatic iteration stopping:

let result = Lowess::new()
    .fraction(0.5)
    .auto_converge(1e-4)      // Stop when change < 0.0001
    .max_iterations(20)        // Safety limit
    .fit(&x, &y)
    .unwrap();

println!("Converged in {} iterations", result.iterations_used.unwrap());

When to use auto-convergence:

  • Unknown optimal iteration count
  • Want to ensure convergence for varying data quality
  • Willing to trade some speed for guaranteed convergence

Tolerance selection:

  • 1e-3: Loose convergence, faster
  • 1e-4 to 1e-6: Standard (recommended)
  • <1e-6: Very tight, slower, usually unnecessary
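The convergence test itself amounts to comparing consecutive smoothed vectors. A minimal sketch of the check (illustrative, not the crate's internal code): stop iterating when the largest per-point change falls below the tolerance.

```rust
// True when the max absolute change between two iterations is below `tol`.
fn converged(prev: &[f64], curr: &[f64], tol: f64) -> bool {
    prev.iter()
        .zip(curr)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0_f64, f64::max)
        < tol
}

fn main() {
    // Tiny changes: converged at tol = 1e-4.
    assert!(converged(&[1.0, 2.0], &[1.00005, 2.00002], 1e-4));
    // A large change in any single point blocks convergence.
    assert!(!converged(&[1.0, 2.0], &[1.1, 2.0], 1e-4));
    println!("ok");
}
```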

§Advanced Features

§Parallel Processing

With the "std" feature (enabled by default), parallel processing is automatic:

[dependencies]
lowess = "0.3"  # std feature enabled by default

use lowess::Lowess;

// Automatically uses parallel processing for n > 1000
let result = Lowess::new()
    .fraction(0.5)
    .fit(&large_x, &large_y)
    .unwrap();

// Or force sequential execution for small datasets or more deterministic results
let result = Lowess::new()
    .parallel(false)  // Force sequential
    .fit(&large_x, &large_y)
    .unwrap();

Parallel speedups (typical):

  • Sequential bottleneck: ~O(n²) for n points
  • Parallel speedup: 2-4x on 4-8 cores
  • Best for: n > 1000, especially with cross-validation

§Streaming/Online Processing

For datasets too large for memory or real-time applications:

use lowess::{Lowess, ProcessingMode};

// Streaming mode for large datasets
let builder = Lowess::new()
    .fraction(0.5)
    .for_mode(ProcessingMode::Streaming)
    .chunk_size(1000)
    .build().unwrap();

// Online mode for incremental updates
let builder = Lowess::new()
    .fraction(0.5)
    .for_mode(ProcessingMode::Online)
    .window_size(500)
    .build().unwrap();

§Delta Optimization

Speed up computation on dense data via interpolation:

let result = Lowess::new()
    .fraction(0.5)
    .delta(0.01)  // Interpolate points within 0.01 distance
    .fit(&x, &y)
    .unwrap();

Delta guidelines:

  • None (default): Auto-compute as 1% of x-range
  • 0.0: No interpolation (slowest, most accurate)
  • 0.001 - 0.01: Good balance for dense data
  • >0.01: Aggressive interpolation (faster but less accurate)

When delta helps:

  • Dense, evenly-spaced data (e.g., time series at 1Hz)
  • Large datasets where speed matters
  • Can reduce runtime by 50-90% with minimal accuracy loss
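The idea behind delta can be sketched in standalone Rust: run the full local regression only at "anchor" points spaced at least `delta` apart in x, then linearly interpolate the points in between. Here `fit_at` stands in for a full local fit at one x; this mirrors the concept, not the crate's internal code.

```rust
// Fit only at anchors >= `delta` apart; interpolate the skipped points.
fn smooth_with_delta(x: &[f64], delta: f64, fit_at: impl Fn(f64) -> f64) -> Vec<f64> {
    let mut out = vec![0.0; x.len()];
    out[0] = fit_at(x[0]);
    let mut last = 0; // index of the most recent anchor
    for i in 1..x.len() {
        if x[i] - x[last] >= delta || i == x.len() - 1 {
            out[i] = fit_at(x[i]);
            // Fill in the skipped points by linear interpolation.
            for j in last + 1..i {
                let t = (x[j] - x[last]) / (x[i] - x[last]);
                out[j] = out[last] + t * (out[i] - out[last]);
            }
            last = i;
        }
    }
    out
}

fn main() {
    let x = [0.0, 1.0, 2.0, 3.0, 4.0];
    // With a linear "fit", interpolation reproduces it exactly.
    let out = smooth_with_delta(&x, 2.0, |v| 2.0 * v);
    println!("{out:?}"); // [0.0, 2.0, 4.0, 6.0, 8.0]
}
```

Only three of the five points above get a full fit, which is where the speedup on dense data comes from.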

§Practical Examples

§Example 1: Time Series Trend Extraction

// Extract smooth trend from noisy time series
let result = Lowess::new()
    .fraction(0.3)                    // Local smoothing
    .iterations(2)                     // Light robustness
    .weight_function(lowess::WeightFunction::Tricube)
    .with_confidence_intervals(0.95)
    .fit(&time, &signal)
    .unwrap();

// Detrend: subtract the smooth trend
let detrended: Vec<f64> = signal.iter()
    .zip(&result.y)
    .map(|(orig, smooth)| orig - smooth)
    .collect();

§Example 2: Outlier Detection and Removal

// Identify outliers using robust smoothing
let result = Lowess::new()
    .fraction(0.5)
    .iterations(5)                           // Strong outlier rejection
    .robustness_method(lowess::RobustnessMethod::Bisquare)
    .with_robustness_weights()
    .with_residuals()
    .fit(&x, &y)
    .unwrap();

// Flag outliers (low weight + large residual)
if let (Some(weights), Some(residuals)) =
    (&result.robustness_weights, &result.residuals) {
    for (i, (w, r)) in weights.iter().copied().zip(residuals.iter().copied()).enumerate() {
        if w < 0.1f64 && r.abs() > 2.0f64 {
            println!("Outlier at index {}: weight={:.3}, residual={:.3}", i, w, r);
        }
    }
}

§Example 3: Model Comparison via AIC

// Compare different smoothing levels
let fractions = [0.3, 0.5, 0.7];
let mut best_aic = f64::INFINITY;
let mut best_frac = 0.0;

for &frac in &fractions {
    let result = Lowess::new()
        .fraction(frac)
        .with_diagnostics()
        .fit(&x, &y)
        .unwrap();
     
    if let Some(diag) = &result.diagnostics {
        if let Some(aic) = diag.aic {
            if aic < best_aic {
                best_aic = aic;
                best_frac = frac;
            }
        }
    }
}

println!("Best fraction: {} (AIC: {:.2})", best_frac, best_aic);

§Example 4: Genomic Data Smoothing

// Smooth methylation levels across genomic positions
let result = Lowess::new()
    .fraction(0.2)                          // Local smoothing for position data
    .iterations(3)                          // Robust to technical outliers
    .weight_function(lowess::WeightFunction::Tricube)
    .delta(100.0)                           // Interpolate nearby positions (bp)
    .with_confidence_intervals(0.95)
    .fit(&position, &methylation)
    .unwrap();

§Parameter Selection Guide

§Fraction (Smoothing Span)

How to choose:

  1. Start with default (0.67) or use cross-validation
  2. Visualize with multiple fractions: try 0.2, 0.5, 0.8
  3. Consider your goal:
    • Trend extraction: Larger (0.6-0.9)
    • Noise reduction: Medium (0.4-0.6)
    • Detail preservation: Smaller (0.2-0.4)

Diagnostics:

  • Too small: Undersmoothing (follows noise, high variance)
  • Too large: Oversmoothing (misses features, high bias)
  • Check effective DF: Lower DF = more smoothing

§Robustness Iterations

How many iterations:

  • 0: Clean data, speed critical
  • 1-2: Light contamination, mild outliers
  • 3: Default, good general purpose
  • 4-5: Heavy contamination, strong outliers
  • >5: Extreme cases, diminishing returns

Method selection:

  • Start with Bisquare (default)
  • Try Huber if Bisquare is too aggressive
  • Use Talwar only for severe contamination

§Kernel Function

Default: Tricube - rarely needs changing

When to change:

  • Gaussian: Underlying function is very smooth, no sharp features
  • Epanechnikov: Want theoretical optimality, slightly faster
  • Uniform: Quick exploration, don’t care about smoothness

Most users should stick with Tricube unless they have specific requirements.

§Performance Considerations

§Computational Complexity

  • Sequential: O(n²) for n points
  • With delta optimization: O(n × m) where m << n
  • Parallel (with std): O(n²/p) for p cores
  • Cross-validation: Multiply by number of fractions tested

§Memory Usage

  • Basic fit: O(n) - stores smoothed values
  • With intervals: O(n) - adds standard errors and bounds
  • With diagnostics: O(n) - adds residuals, weights
  • Streaming mode: O(chunk_size) - constant memory

§Speed Optimization Tips

  1. Use delta optimization for dense data (can give 10x speedup)
  2. Enable std feature for automatic parallelism on n > 1000
  3. Reduce robustness iterations if outliers aren’t a concern
  4. Use simpler kernels (Triangle/Uniform) for quick exploration
  5. Disable diagnostics if not needed
  6. Use Simple CV instead of K-fold for large datasets

§Common Pitfalls and Solutions

§Issue: Fit is too wiggly/noisy

Solution: Increase fraction (try 0.7-0.8) or use cross-validation

§Issue: Fit is too smooth/missing features

Solution: Decrease fraction (try 0.2-0.3) or check for oversmoothing

§Issue: Outliers not being removed

Solution: Increase iterations (try 5) or switch to Talwar robustness

§Issue: Too many points marked as outliers

Solution: Reduce iterations or switch to Huber robustness

§Issue: Slow performance on large datasets

Solutions:

  • Enable delta optimization
  • Ensure std feature is enabled (default)
  • Consider streaming mode
  • Reduce robustness iterations

§Issue: Confidence intervals are too narrow/wide

Check:

  • Are you using the right interval type? (confidence vs prediction)
  • Is the residual variance estimate reasonable?
  • Try with different robustness settings

§Architecture

The crate is organized in a layered architecture:

Layer 5: builder, adapters/        (High-level API, execution modes)
         ↓
Layer 4: engine/                   (Iteration & convergence logic)
         ↓
Layer 3: algorithms/               (Regression, robustness, confidence)
         ↓
Layer 2: primitives/               (Traits, types, math utilities)
         ↓
Layer 1: foundation/               (Errors, constants)

This structure eliminates circular dependencies and makes the codebase maintainable and extensible.

§Feature Flags

  • std (default): Standard library support
    • Enables parallel execution via Rayon
    • Automatic dispatch for large inputs
    • Parallel cross-validation
    • 2-4x speedup on multi-core systems
    • Disable for no_std embedded environments (requires alloc)

§Cargo Configuration

# Default: includes std and parallelism
[dependencies]
lowess = "0.3"

# No-std mode (e.g., for embedded systems)
[dependencies]
lowess = { version = "0.3", default-features = false }

§References and Further Reading

Original papers:

  • Cleveland, W.S. (1979). “Robust Locally Weighted Regression and Smoothing Scatterplots”
  • Cleveland, W.S. (1981). “LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression”

Theory and applications:

  • Cleveland, W.S. & Devlin, S.J. (1988). “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting”
  • Fan, J. & Gijbels, I. (1996). “Local Polynomial Modelling and Its Applications”

Kernel density estimation:

  • Wand, M.P. & Jones, M.C. (1995). “Kernel Smoothing”

§no_std Support

This crate supports no_std environments with alloc:

[dependencies]
lowess = { version = "0.3", default-features = false }

Limitations without std:

  • No parallel processing (requires Rayon which needs std)
  • No streaming mode (requires threading)
  • All other features available (batch processing, all algorithms)

§Convenience Constructors

Three pre-configured builders for common use cases:

use lowess::Lowess;

// For noisy data with outliers
let result = Lowess::robust().fit(&x, &y)?;

// For speed on clean data
let result = Lowess::quick().fit(&x, &y)?;

// For comprehensive analysis
let result = Lowess::detailed().fit(&x, &y)?;

§License and Contributing

See the repository for license information and contribution guidelines.

§Re-exports

pub use algorithms::BisquareRobustness;
pub use algorithms::LowessVarianceEstimator;
pub use algorithms::StandardFitter;
pub use algorithms::WeightFunction;
pub use builder::LowessBuilder as Lowess;
pub use builder::LowessResult;
pub use builder::ProcessingMode;
pub use builder::ProcessingVariant;
pub use builder::RobustnessMethod;
pub use foundation::LowessError;
pub use foundation::Result;
pub use primitives::traits::FitContext;
pub use primitives::traits::PointFitter;
pub use primitives::traits::RobustnessUpdater;
pub use primitives::traits::StandardErrorComputer;
pub use primitives::traits::Window;
pub use primitives::traits::ZeroWeightFallback;

§Modules

  • adapters: Adapters layer - execution modes and integration.
  • algorithms: Algorithms layer - regression, robustness, and confidence calculations.
  • builder: Builder module - high-level fluent API.
  • engine: Engine layer - iteration orchestration and convergence.
  • foundation: Foundation layer - error types, constants, result type.
  • primitives: Primitives layer - core traits, types, and utilities.