LOWESS (Locally Weighted Scatterplot Smoothing) for Rust.
This crate provides a fast, robust, and production-oriented LOWESS (locally weighted scatterplot smoothing) implementation. It is intended for analysis pipelines, batch jobs, and services where determinism, safety, observability, and configurable performance trade-offs are required.
Key capabilities
- Robust smoothing using iteratively reweighted least squares (IRLS).
- Multiple kernel choices (tricube default, epanechnikov, gaussian, uniform, quartic, cosine, triangle).
- Per-point standard errors, confidence intervals (mean) and prediction intervals (new observations).
- Automatic fraction selection via cross-validation (simple RMSE, k-fold, LOOCV) and optional parallel CV.
- Delta-based interpolation fast-path for dense inputs to reduce compute.
- Memory-efficient variants for datasets too large to fit in memory via the streaming/online/chunked backends.
- Optional parallel execution (feature = "parallel") via Rayon.
- Optional ndarray convenience adapters (feature = "ndarray").
- no_std-compatible with alloc for embedded or constrained environments.
Concepts and parameters (summary)
- x, y: aligned input slices of the independent and dependent variables.
- Caller responsibility: remove NaNs/infs and prefer pre-sorting x for reproducible window semantics. The builder also offers helpers to sort.
- fraction (span): smoothing fraction ∈ (0, 1]. Controls local window size.
- Typical default: 0.67. Smaller fractions produce less smoothing.
- iterations / niter: robustness IRLS iterations (usize). 0 disables IRLS.
- Typical values: 0 (fast), 2–5 (robust). Auto-convergence can stop early.
- delta: interpolation distance threshold (T or Option<T>).
- delta <= 0 disables interpolation (fit every point).
- None resolves to a conservative default (≈1% of x-range).
- Use delta on dense inputs to interpolate between anchor fits and save time.
- weight_function: kernel choice. Tricube recommended for general use.
- interval_level and interval_type: compute confidence and/or prediction intervals at the specified probability (e.g. 0.95).
- cv_fractions and cv_method: candidate fractions and CV strategy for automatic selection. Returns cv_scores on success.
- auto_convergence and max_iterations: tolerance and cap for stopping IRLS early based on maximum change in fitted values.
- compute_diagnostics / compute_residuals / compute_robustness_weights: booleans controlling what additional outputs are produced.
- parallel feature: enable multithreaded CV and fitting for large n.
- zero_weight_fallback: policy for neighborhoods with zero kernel weight:
- UseLocalMean, ReturnOriginal, or ReturnNone (propagate failure).
Outputs (LowessResult)
- x: sorted independent variable values (builder sorts inputs).
- y: smoothed values aligned with x.
- standard_errors: per-point SE when requested.
- confidence_lower/upper, prediction_lower/upper: optional interval bounds.
- residuals: optional residual vector.
- robustness_weights: optional final IRLS weights.
- diagnostics: optional struct with RMSE, MAE, R², AIC, AICc, effective df.
- iterations_used, fraction_used, cv_scores: metadata for monitoring/telemetry.
Error handling
- Returns Result<T, LowessError> with explicit variants for common failures: EmptyInput, MismatchedInputs, InvalidFraction, InvalidDelta, InvalidNumericValue, TooFewPoints, InvalidConfidenceLevel.
- Functions are defensive: degenerate situations return safe defaults rather than panicking in release builds. Debug assertions exist for development.
Determinism & numeric safety
- Sorting, stable default choices, and avoidance of global mutable state provide deterministic outputs for a fixed configuration and inputs.
- Numerics: conservative fallbacks for near-zero scales, uniform fallback when all kernel weights evaluate to zero, and clamped tuned-scales avoid divide-by-zero issues.
Performance & operational guidance
- For large datasets, enable “parallel” and pre-allocate buffers to reduce allocation overhead across repeated calls.
- Use delta for dense inputs to reduce per-point regression costs.
- Use cross-validation sparingly on very large candidate grids; prefer coarse-to-fine search or parallel CV when available.
- Monitor diagnostics (RMSE, effective sample size, count_downweighted) in production to detect pathological fits.
Examples
- Basic smoothing
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];
let result = Lowess::new().fraction(0.5).iterations(3).fit(&x, &y).unwrap();
println!("Smoothed y: {:?}", result.y);
- With 95% confidence intervals and diagnostics
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];
let result = Lowess::new()
.fraction(0.5)
.with_confidence_intervals(0.95)
.with_all_diagnostics()
.fit(&x, &y)
.unwrap();
println!("RMSE: {:?}", result.diagnostics.map(|d| d.rmse));
- Cross-validated fraction selection (parallel-enabled for large n)
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.1, 5.9, 8.2, 9.8];
let candidate = vec![0.2, 0.3, 0.5, 0.7];
let result = Lowess::new()
.cross_validate(&candidate)
.fit(&x, &y)
.unwrap();
println!("Selected fraction: {}", result.fraction_used);
- Streaming / online / chunked processing (use ProcessingMode)
use lowess::{Lowess, ProcessingMode};
// example data
let x = (0..50).map(|i| i as f64).collect::<Vec<_>>();
let y = x.iter().map(|v| 2.0 * v + 1.0).collect::<Vec<_>>();
// Obtain a processing-mode variant. Avoid calling `.build()` on the
// wrapped mode-specific builder inside doctests because some backends
// validate inter-dependent defaults (e.g. overlap < chunk_size) which
// can cause doctest failures.
let variant = Lowess::new()
.fraction(0.5)
.iterations(1)
.for_mode(ProcessingMode::Streaming)
.chunk_size(10);
match variant {
// Batch contains the standard Lowess builder — call `.fit(...)` directly.
lowess::builder::ProcessingVariant::Batch(batch_builder) => {
let result = batch_builder.fit(&x, &y).expect("fit");
println!("Batch result length: {}", result.x.len());
}
// Streaming contains the streaming-mode builder — in real code call
// `stream_builder.build()?` to obtain the processor and use its methods.
lowess::builder::ProcessingVariant::Streaming(_stream_builder) => {
// mode-specific builder is available here; avoid calling `.build()` in doctests.
}
_ => {
// Online / Chunked variants follow the same pattern.
}
}
- Auto-convergence example
use lowess::Lowess;
let x = vec![1.0, 2.0, 3.0, 4.0];
let y = vec![2.0, 4.1, 5.9, 8.2];
let r = Lowess::new().auto_converge(1e-4).max_iterations(20).fit(&x, &y).unwrap();
if let Some(iters) = r.iterations_used { println!("Converged after {}", iters); }
API tips and best practices
- Pre-clean inputs (remove NaNs/infs) and sort x for deterministic windowing.
- Choose sensible defaults in the builder for production: use delta for dense data, modest iterations (2–3) for robustness, and enable diagnostics in scheduled batch jobs.
- When using parallel execution, benchmark recommended_chunk_size() and pre-allocate per-call buffers for throughput-sensitive workloads.
See module-level documentation (builder, core, regression, kernel, confidence, robustness, utils, parallel) for function-level argument descriptions, return conventions, and panic/assert behavior.
Re-exports§
pub use builder::Diagnostics;
pub use builder::IntervalType;
pub use builder::LowessBuilder as Lowess;
pub use builder::LowessResult;
pub use builder::ProcessingMode;
pub use builder::ProcessingVariant;
pub use kernel::WeightFunction;
pub use kernel::WeightFunctionInfo;
Modules§
- builder
- LOWESS Builder Pattern
- confidence
- Confidence intervals, prediction intervals and standard error computation for LOWESS
- core
- Core LOWESS algorithm implementation.
- kernel
- Kernel weight functions for LOWESS smoothing.
- regression
- Local regression fitting for LOWESS smoothing.
- robustness
- Robustness weighting for outlier-resistant LOWESS smoothing.
- streaming
- Streaming and online LOWESS for very large datasets.
- utils
- Utility functions for LOWESS smoothing.
Enums§
- LowessError
- LOWESS error types.
Functions§
- lowess
- Perform LOWESS smoothing with default parameters.
- lowess_robust
- Perform robust LOWESS smoothing (5 iterations).
- lowess_with_fraction
- Perform LOWESS smoothing with custom fraction.
Type Aliases§
- Result
- Result type for LOWESS operations.