# lowess

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

A production-grade Rust implementation of LOWESS (Locally Weighted Scatterplot Smoothing) with robust statistics, parallel execution, and streaming capabilities for large datasets.

## What is LOWESS?

LOWESS is a non-parametric regression method that fits smooth curves through scatter plots using locally weighted polynomial regression. Originally developed by Cleveland (1979), it's widely used in:

- 🧬 **Genomics & Bioinformatics**: DNA methylation analysis, ChIP-seq normalization, gene expression smoothing
- 📊 **Time Series Analysis**: Trend extraction, seasonal adjustment, noise reduction  
- 🔬 **Scientific Computing**: Exploratory data analysis, visualization, preprocessing
- 📈 **Signal Processing**: Baseline correction, peak detection, data denoising

## Features

### Core Capabilities
- **Robust Smoothing**: Iteratively reweighted least squares (IRLS) with multiple kernel functions
- 🎯 **Confidence & Prediction Intervals**: Per-point standard errors and statistical intervals
- 🔧 **Multiple Kernels**: Tricube (default), Epanechnikov, Gaussian, Uniform, Quartic, Cosine, Triangle
- 📏 **Automatic Fraction Selection**: Cross-validation (simple RMSE, k-fold, LOOCV)
- ⚡ **Delta Optimization**: Fast interpolation for dense data

### Advanced Features
- 🚀 **Parallel Execution**: Optional multi-threaded CV and fitting (feature = `"parallel"`)
- 💾 **Streaming Processing**: Memory-efficient variants for large datasets
- 📊 **Rich Diagnostics**: RMSE, MAE, R², AIC, AICc, effective degrees of freedom
- 🔢 **ndarray Integration**: Seamless conversion to/from ndarray (feature = `"ndarray"`)
- 🎛️ **no_std Compatible**: Works in embedded environments with `alloc`

### Quality & Reliability
- ✅ Type-safe builder pattern with compile-time validation
- 🛡️ Comprehensive error handling (no panics in release builds)
- 📐 Numerically stable with defensive fallbacks
- 🔁 Deterministic outputs for reproducible research
- 📚 Extensively documented with production guidance

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
lowess = "0.1"
```

Or, with optional features enabled:

```toml
[dependencies]
lowess = { version = "0.1", features = ["parallel", "ndarray"] }
```

### Feature Flags

| Feature | Description | Dependencies |
|---------|-------------|--------------|
| `parallel` | Enable parallel cross-validation and fitting | `rayon` |
| `ndarray` | Integration with ndarray arrays | `ndarray` |
| `std` | Standard library support (enabled by default) | - |
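
Features can also be enabled from the command line:

```bash
cargo add lowess --features parallel,ndarray
```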

## Quick Start

### Basic Smoothing

```rust
use lowess::Lowess;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];

// Simple smoothing with defaults (fraction=0.67, no robustness iterations)
let result = Lowess::new()
    .fit(&x, &y)
    .unwrap();

println!("Smoothed values: {:?}", result.y);
```

### Robust Smoothing with Outliers

```rust
use lowess::Lowess;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
let y = vec![2.0, 4.0, 6.0, 100.0, 10.0, 12.0]; // outlier at index 3

// Robust smoothing with 5 IRLS iterations
let result = Lowess::robust()
    .fit(&x, &y)
    .unwrap();
```

### With Confidence Intervals

```rust
use lowess::Lowess;

// x and y as in the previous examples
let result = Lowess::new()
    .fraction(0.5)
    .iterations(3)
    .with_confidence_intervals(0.95)
    .with_all_diagnostics()
    .fit(&x, &y)
    .unwrap();

println!("RMSE: {:?}", result.diagnostics.unwrap().rmse);
println!("Lower CI: {:?}", result.confidence_lower);
println!("Upper CI: {:?}", result.confidence_upper);
```

### Automatic Fraction Selection

```rust
use lowess::Lowess;

// Cross-validate to find the optimal smoothing fraction (x and y as above)
let candidates = vec![0.2, 0.3, 0.5, 0.7];
let result = Lowess::new()
    .cross_validate(&candidates)
    .fit(&x, &y)
    .unwrap();

println!("Selected fraction: {}", result.fraction_used);
println!("CV scores: {:?}", result.cv_scores);
```

### Streaming for Large Datasets

```rust
use lowess::{Lowess, StreamingLowess};

// Process data in chunks to avoid memory issues
let base_config = Lowess::new().fraction(0.3).iterations(2);
let mut streaming = StreamingLowess::new(base_config, 1000);

// Feed pre-split chunks (x_chunk1, y_chunk1, ... prepared by the caller)
let chunk1_result = streaming.process_chunk(&x_chunk1, &y_chunk1)?;
let chunk2_result = streaming.process_chunk(&x_chunk2, &y_chunk2)?;
```

### Online Smoothing (Sliding Window)

```rust
use lowess::{Lowess, OnlineLowess};

// Real-time smoothing with sliding window
let base_config = Lowess::new().fraction(0.2);
let mut online = OnlineLowess::new(base_config, 100); // window size

for (&xi, &yi) in x.iter().zip(y.iter()) {
    let smoothed = online.update(xi, yi)?;
    println!("Current smooth value: {}", smoothed);
}
```

## API Overview

### Builder Pattern

```rust
Lowess::new()
    .fraction(0.5)              // Smoothing span (0, 1]
    .iterations(3)              // Robustness iterations
    .delta(Some(0.01))          // Interpolation threshold
    .kernel(WeightFunction::Epanechnikov)
    .with_confidence_intervals(0.95)
    .with_prediction_intervals(0.95)
    .with_all_diagnostics()
    .cross_validate(&[0.3, 0.5, 0.7])
    .auto_converge(1e-4)        // Early stopping
    .max_iterations(20)
    .fit(&x, &y)?
```

### Result Structure

```rust
pub struct LowessResult<T> {
    pub x: Vec<T>,                          // Sorted x values
    pub y: Vec<T>,                          // Smoothed y values
    pub standard_errors: Option<Vec<T>>,    // Per-point SE
    pub confidence_lower: Option<Vec<T>>,   // CI lower bounds
    pub confidence_upper: Option<Vec<T>>,   // CI upper bounds
    pub prediction_lower: Option<Vec<T>>,   // PI lower bounds
    pub prediction_upper: Option<Vec<T>>,   // PI upper bounds
    pub residuals: Option<Vec<T>>,          // Residuals
    pub robustness_weights: Option<Vec<T>>, // Final IRLS weights
    pub diagnostics: Option<Diagnostics<T>>,// Fit statistics
    pub iterations_used: Option<usize>,     // Actual iterations
    pub fraction_used: T,                   // Final fraction
    pub cv_scores: Option<Vec<T>>,          // CV RMSE scores
}
```

### Diagnostics

```rust
pub struct Diagnostics<T> {
    pub rmse: T,                    // Root mean squared error
    pub mae: T,                     // Mean absolute error
    pub r_squared: T,               // Coefficient of determination
    pub aic: T,                     // Akaike Information Criterion
    pub aicc: T,                    // Corrected AIC (small samples)
    pub effective_df: T,            // Effective degrees of freedom
    pub count_downweighted: usize,  // Points with low robustness weight
}
```

## Algorithm Details

### LOWESS Procedure

1. **Sorting**: Data sorted by x-values for deterministic windowing
2. **Local Neighborhoods**: For each point, select k nearest neighbors (k = fraction × n)
3. **Kernel Weighting**: Apply distance-based kernel (default: tricube)
4. **Weighted Regression**: Fit a locally weighted least-squares line (see the sketch below)
5. **Robustness Iterations** (optional): Reweight using the bisquare function on residuals
6. **Delta Interpolation** (optional): Linearly interpolate between anchor points
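
To make steps 2-4 concrete, here is a minimal, self-contained sketch of a single local fit with the tricube kernel. It illustrates the procedure only; the crate's actual internals add delta interpolation, robustness weights, and the stability fallbacks described below.

```rust
// Illustrative sketch of steps 2-4 for one target point `x0`;
// not the crate's production code path.
fn local_fit(x: &[f64], y: &[f64], x0: f64, k: usize) -> f64 {
    // Step 2: indices of the k nearest neighbors of x0.
    let mut idx: Vec<usize> = (0..x.len()).collect();
    idx.sort_by(|&a, &b| {
        (x[a] - x0).abs().partial_cmp(&(x[b] - x0).abs()).unwrap()
    });
    let neighbors = &idx[..k.min(x.len())];
    // Bandwidth: distance to the farthest neighbor in the window.
    let h = (x[*neighbors.last().unwrap()] - x0).abs().max(f64::EPSILON);

    // Step 3: tricube weights on distances scaled by the bandwidth.
    let w: Vec<f64> = neighbors
        .iter()
        .map(|&i| {
            let d = ((x[i] - x0) / h).abs();
            if d < 1.0 { (1.0 - d.powi(3)).powi(3) } else { 0.0 }
        })
        .collect();

    // Step 4: weighted least-squares line, evaluated at x0.
    let sw: f64 = w.iter().sum();
    let mx = neighbors.iter().zip(&w).map(|(&i, wi)| wi * x[i]).sum::<f64>() / sw;
    let my = neighbors.iter().zip(&w).map(|(&i, wi)| wi * y[i]).sum::<f64>() / sw;
    let sxx: f64 = neighbors.iter().zip(&w).map(|(&i, wi)| wi * (x[i] - mx).powi(2)).sum();
    let sxy: f64 = neighbors.iter().zip(&w).map(|(&i, wi)| wi * (x[i] - mx) * (y[i] - my)).sum();
    if sxx > 0.0 { my + (sxy / sxx) * (x0 - mx) } else { my }
}
```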

### Weight Functions

```text
// Tricube (Cleveland's default)
w(d) = (1 - |d|³)³  for |d| < 1, else 0

// Epanechnikov  
w(d) = 1 - d²  for |d| < 1, else 0

// Gaussian
w(d) = exp(-d²/2)
```
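
As plain Rust, the unnormalized forms above transcribe directly (illustrative; the crate's internal kernel definitions may differ):

```rust
fn tricube(d: f64) -> f64 {
    let d = d.abs();
    if d < 1.0 { (1.0 - d.powi(3)).powi(3) } else { 0.0 }
}

fn epanechnikov(d: f64) -> f64 {
    if d.abs() < 1.0 { 1.0 - d * d } else { 0.0 }
}

fn gaussian(d: f64) -> f64 {
    (-0.5 * d * d).exp()
}
```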

### Robustness Weighting (Bisquare)

```text
// After initial fit, compute residuals r_i
// Estimate scale: s = MAD(residuals) × 1.4826
// Robustness weights:
w_i = (1 - (r_i / (6s))²)²  for |r_i| < 6s, else 0
```
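
A minimal sketch of one reweighting pass, assuming a plain median of absolute residuals as the MAD estimate (the crate's actual scale estimation adds the fallbacks listed under Numerical Stability):

```rust
fn bisquare_weights(residuals: &[f64]) -> Vec<f64> {
    // Scale s = MAD × 1.4826, clamped away from zero.
    let mut abs_r: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    abs_r.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let s = (abs_r[abs_r.len() / 2] * 1.4826).max(f64::EPSILON);

    // w_i = (1 - (r_i / 6s)²)² inside the 6s cutoff, 0 outside.
    residuals
        .iter()
        .map(|&r| {
            let u = r / (6.0 * s);
            if u.abs() < 1.0 { (1.0 - u * u).powi(2) } else { 0.0 }
        })
        .collect()
}
```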

## Performance

### Complexity

- **Basic**: O(n²) for n data points
- **With delta**: Effectively O(n × k) with k ≪ n for dense data (example below)
- **Parallel CV**: Near-linear speedup with available cores
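
For example, tying delta to the x-range (R's default is 1% of the range) keeps dense fits fast with little loss of accuracy:

```rust
use lowess::Lowess;

// Dense, sorted input: 10,000 points on a noisy sine curve.
let x: Vec<f64> = (0..10_000).map(|i| i as f64 / 100.0).collect();
let y: Vec<f64> = x.iter().map(|v| v.sin() + 0.1 * (v * 7.0).cos()).collect();

// Points within `delta` of the last anchor are interpolated, not refit.
let range = x.last().unwrap() - x.first().unwrap();
let result = Lowess::new()
    .fraction(0.3)
    .delta(Some(range * 0.01)) // ~1% of the x-range, mirroring R
    .fit(&x, &y)
    .unwrap();
```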

## Numerical Stability

The implementation includes several safeguards:

- **Scale estimation fallbacks**: MAD → mean absolute residual when MAD ≈ 0
- **Minimum tuned scales**: Clamped to ε > 0 to avoid division by zero
- **Zero-weight neighborhoods**: Configurable fallback policies
- **Uniform weight fallback**: When all kernel weights evaluate to zero
- **Auto-convergence**: Prevents excessive iterations in stable fits

## Comparison with R

The core algorithm matches R's `stats::lowess()`:

```r
# R
result <- lowess(x, y, f = 2/3, iter = 3, delta = 0.01)
```

```rust
// Rust (equivalent)
let result = Lowess::new()
    .fraction(2.0/3.0)
    .iterations(3)
    .delta(Some(0.01))
    .fit(&x, &y)?;
```

Beyond that baseline, this crate adds:
- ✨ More kernel options beyond tricube
- 📊 Statistical intervals and diagnostics
- 🔧 Cross-validation built-in
- 🚀 Parallel execution support
- 💾 Streaming variants for large data

## Error Handling

```rust
use lowess::{Lowess, LowessError};

match Lowess::new().fit(&x, &y) {
    Ok(result) => println!("Success: {:?}", result.y),
    Err(LowessError::EmptyInput) => println!("Empty arrays"),
    Err(LowessError::MismatchedInputs { x_len, y_len }) => 
        println!("Length mismatch: x={}, y={}", x_len, y_len),
    Err(LowessError::InvalidFraction(f)) => 
        println!("Bad fraction: {}", f),
    Err(e) => println!("Other error: {}", e),
}
```

## Testing

```bash
# Run all tests
cargo test

# Run with all features
cargo test --all-features

# Run benchmarks
cargo bench

# Check documentation
cargo doc --open
```

## Production Usage Guidelines

### Best Practices

1. **Pre-clean inputs**: Remove NaNs/infs before calling `fit()` (see the sketch after this list)
2. **Sort data**: Pre-sort by x for reproducible window semantics
3. **Choose appropriate fraction**: 
   - 0.2-0.3 for very local features
   - 0.5-0.7 for general trends
   - Use cross-validation when uncertain
4. **Enable diagnostics**: Monitor RMSE, effective_df in production
5. **Use delta**: Enable for dense data (>1000 points)
6. **Tune robustness**: 2-3 iterations sufficient for most data
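
For items 1 and 2, a minimal pre-cleaning pass in plain Rust might look like this:

```rust
// Hypothetical raw input containing a NaN and unsorted x values.
let x = vec![3.0, 1.0, f64::NAN, 2.0];
let y = vec![6.1, 2.0, 5.0, 4.2];

// Drop non-finite pairs, then sort by x.
let mut pairs: Vec<(f64, f64)> = x.into_iter()
    .zip(y)
    .filter(|(xi, yi)| xi.is_finite() && yi.is_finite())
    .collect();
pairs.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
let (clean_x, clean_y): (Vec<f64>, Vec<f64>) = pairs.into_iter().unzip();
```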

### Monitoring

```rust
let result = Lowess::new()
    .with_all_diagnostics()
    .fit(&x, &y)?;

if let Some(diag) = result.diagnostics {
    if diag.effective_df < 2.0 {
        log::warn!("Low effective degrees of freedom: {}", diag.effective_df);
    }
    if diag.count_downweighted > x.len() / 2 {
        log::warn!("Over 50% points downweighted - check for systematic issues");
    }
}
```

## Contributing

Contributions are welcome! Areas of interest:

- Additional kernel functions
- More cross-validation strategies  
- GPU acceleration
- Python bindings (PyO3)
- Additional examples and tutorials

Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on reporting bugs, submitting pull requests, testing, and the development workflow.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## References

### Academic Papers

- Cleveland, W. S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". *Journal of the American Statistical Association* 74(368): 829-836. [DOI: 10.2307/2286407](https://doi.org/10.2307/2286407)

- Cleveland, W. S. (1981). "LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression". *The American Statistician* 35(1): 54. [DOI: 10.2307/2683591](https://doi.org/10.2307/2683591)

### Related Implementations

- [R stats::lowess](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lowess.html) - Original R implementation
- [Python statsmodels.lowess](https://www.statsmodels.org/stable/generated/statsmodels.nonparametric.smoothers_lowess.lowess.html)

## Citation

If you use this crate in academic work, please cite:

```bibtex
@software{lowess_rust,
  author = {Valizadeh, Amir},
  title = {lowess: Production-grade LOWESS smoothing for Rust},
  year = {2025},
  url = {https://github.com/thisisamirv/lowess}
}
```

## Author

**Amir Valizadeh**  
📧 [thisisamirv@gmail.com](mailto:thisisamirv@gmail.com)  

## Acknowledgments

- Based on Cleveland's original LOWESS algorithm
- Inspired by implementations in R and Python statsmodels
- Built with the Rust scientific computing ecosystem

---