Crate anofox_forecast


§anofox-forecast

Time series forecasting library for Rust.

Provides 35+ forecasting models including ARIMA, ETS, Theta, and baseline methods, along with seasonality decomposition (STL/MSTL), changepoint detection, and outlier detection.

For comprehensive periodicity detection, see the fdars crate.

§Architecture Decisions

§Cross-Validation Split (ts_cv_split)

Time series cross-validation with data leakage prevention is implemented in the forecast-extension DuckDB extension rather than in this crate. This section documents the rationale.

§Why CV Split is Not Part of TimeSeries

The TimeSeries struct represents a single time series with its values, timestamps, and metadata. Cross-validation splitting was considered as a method on TimeSeries but was intentionally kept separate for these reasons:

  1. Cross-series coordination: Fold generation is a global operation across multiple series, not a per-series operation. CV requires consistent fold boundaries across all series in a dataset.

  2. External feature handling: Features whose future values are unknown at forecast time, such as stockout flags or segment_id changes, are external columns that don’t belong in the series data model. They require schema-aware handling at the data layer.

  3. Data manipulation efficiency: DuckDB’s vectorized execution is more efficient for the bulk data operations (filtering, joining, filling) that CV split requires.

  4. Schema flexibility: SQL macros can handle arbitrary column schemas without requiring Rust to know the schema at compile time.
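To illustrate the first point, the sketch below contrasts per-series cutoffs with a single global cutoff. The `Series` struct and both functions are hypothetical and not part of this crate or the extension; timestamps are plain integers for brevity.

```rust
/// Hypothetical minimal series: an id plus sorted integer timestamps.
struct Series {
    id: &'static str,
    timestamps: Vec<i64>,
}

/// Per-series cutoff: each series holds out its own last `horizon` steps.
fn per_series_cutoff(s: &Series, horizon: i64) -> Option<i64> {
    s.timestamps.last().map(|last| last - horizon)
}

/// Global cutoff: one boundary derived from the latest timestamp in the dataset.
fn global_cutoff(all: &[Series], horizon: i64) -> Option<i64> {
    all.iter()
        .filter_map(|s| s.timestamps.last().copied())
        .max()
        .map(|last| last - horizon)
}

fn main() {
    let data = vec![
        Series { id: "a", timestamps: (1..=100).collect() },
        Series { id: "b", timestamps: (1..=90).collect() },
    ];
    for s in &data {
        println!("{}: per-series cutoff {:?}", s.id, per_series_cutoff(s, 7));
    }
    println!("global cutoff {:?}", global_cutoff(&data, 7));
}
```

With per-series cutoffs, series `a` would train on timestamps 84 to 93 while the same timestamps are held out for series `b`. A single global cutoff avoids that overlap, which is why fold boundaries are computed once for the whole dataset.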

§Component Distribution

| Component | Location | Rationale |
|---|---|---|
| Fold generation | DuckDB extension | Cross-series coordination, global operation |
| Train/test assignment | SQL/DuckDB | Simple comparison, vectorized execution |
| Unknown feature filling | Rust UDF via DuckDB | Per-series state tracking |
| Orchestration | SQL macro | Flexible, schema-agnostic |
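As a rough illustration of why unknown feature filling needs per-series state, the sketch below overwrites a feature in test rows with the last value seen in training for the same series. The `Row` type, the `fill_unknown` function, and the last-known-value strategy are all assumptions made for this example; the fill strategy actually used by the extension is not documented here.

```rust
use std::collections::HashMap;

/// Hypothetical row: one observation of one series with a single feature.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    series_id: u32,
    ts: i64,
    stockout: f64,
}

/// Replaces the feature in test rows (ts > cutoff) with the last value
/// observed in the same series at or before the cutoff.
/// Assumes `rows` is sorted by timestamp within each series.
/// Test rows of a series with no training rows are left unchanged.
fn fill_unknown(rows: &mut [Row], cutoff: i64) {
    // Per-series state: the most recent training value for each series.
    let mut last_known: HashMap<u32, f64> = HashMap::new();
    for row in rows.iter_mut() {
        if row.ts <= cutoff {
            last_known.insert(row.series_id, row.stockout);
        } else if let Some(&v) = last_known.get(&row.series_id) {
            row.stockout = v;
        }
    }
}

fn main() {
    let mut rows = vec![
        Row { series_id: 1, ts: 1, stockout: 0.0 },
        Row { series_id: 1, ts: 2, stockout: 1.0 },
        Row { series_id: 1, ts: 3, stockout: 0.0 },
        Row { series_id: 2, ts: 1, stockout: 0.0 },
        Row { series_id: 2, ts: 2, stockout: 0.0 },
        Row { series_id: 2, ts: 3, stockout: 1.0 },
    ];
    fill_unknown(&mut rows, 2);
    for row in &rows {
        println!("{:?}", row);
    }
}
```

The train/test assignment itself is the single comparison `row.ts <= cutoff`, which matches the "simple comparison" rationale in the table. Only the filling step has to carry state from one row to the next.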

§Using CV Functionality

For time series cross-validation with data leakage prevention, use the ts_cv_split function from the forecast-extension:

-- Example: Generate CV folds with unknown feature handling
SELECT * FROM ts_cv_split(
    my_data,
    n_splits := 3,
    horizon := 7,
    unknown_features := ['stockout', 'segment_id']
);

See forecast-extension#54 for implementation details.

§Future Considerations

If per-series CV semantics become necessary in Rust (e.g., for standalone use without DuckDB), the fold generation logic could be extracted:

pub struct CvFoldGenerator {
    n_splits: usize,
    horizon: usize,
    gap: usize,
}

impl CvFoldGenerator {
    /// Returns the training end index for each fold.
    pub fn folds(&self, series_len: usize) -> Vec<usize> {
        // Body intentionally left open; this is a proposal, not an existing API.
        todo!()
    }
}

This would allow fold generation to be shared while keeping data manipulation in the appropriate layer (SQL for multi-series datasets, Rust for single-series use).
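One possible shape for that fold generation logic is sketched below. It assumes an expanding training window, test windows of `horizon` steps placed back to back at the end of the series, and a `gap` of skipped steps between training and test data. These semantics, and the choice to return exclusive end indices, are assumptions made for illustration; they are not taken from the extension.

```rust
pub struct CvFoldGenerator {
    pub n_splits: usize,
    pub horizon: usize,
    pub gap: usize,
}

impl CvFoldGenerator {
    /// Returns the exclusive training end index for each fold, earliest first.
    /// Folds whose training window would be empty are skipped.
    pub fn folds(&self, series_len: usize) -> Vec<usize> {
        (0..self.n_splits)
            .rev()
            .filter_map(|k| {
                // Fold k's test window ends k * horizon steps before the series end.
                let test_end = series_len.checked_sub(k * self.horizon)?;
                // Training stops `gap` steps before the test window starts.
                let train_end = test_end.checked_sub(self.horizon + self.gap)?;
                (train_end > 0).then_some(train_end)
            })
            .collect()
    }
}

fn main() {
    let generator = CvFoldGenerator { n_splits: 3, horizon: 7, gap: 0 };
    // prints [79, 86, 93]
    println!("{:?}", generator.folds(100));
}
```

For a series of length 100, the last fold trains on indices 0 to 92 and tests on 93 to 99. A series too short for all requested folds yields fewer entries instead of panicking, because the subtraction uses `checked_sub`.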

§Re-exports

pub use error::ForecastError;
pub use error::Result;

§Modules

changepoint
Changepoint detection algorithms.
core
Core data structures for time series forecasting.
detection
Detection utilities for time series analysis.
error
Error types for the anofox-forecast library.
features
Time series feature extraction.
models
Forecasting models.
postprocess
Probabilistic forecasting via postprocessing.
prelude
seasonality
Seasonality detection and decomposition.
simd
SIMD-accelerated primitives via Trueno (f32 internal).
transform
Data transformations for time series.
utils
Utility functions for forecasting models.
validation
Statistical validation tests for time series models.