§anofox-forecast
Time series forecasting library for Rust.
Provides 35+ forecasting models including ARIMA, ETS, Theta, and baseline methods, along with seasonality decomposition (STL/MSTL), changepoint detection, and outlier detection.
For comprehensive periodicity detection, see the fdars crate.
§Architecture Decisions
§Cross-Validation Split (ts_cv_split)
Time series cross-validation with data leakage prevention is implemented in the forecast-extension DuckDB extension rather than in this crate. This section documents the rationale.
§Why CV Split is Not Part of TimeSeries
The TimeSeries struct represents a single time series with
its values, timestamps, and metadata. Cross-validation splitting was considered as a
method on TimeSeries but was intentionally kept separate for these reasons:
- Cross-series coordination: Fold generation is a global operation across multiple series, not a per-series operation. CV requires consistent fold boundaries across all series in a dataset.
- External feature handling: Unknown future features like stockout flags or segment_id changes are external columns that don’t belong in the series data model. These require schema-aware handling at the data layer.
- Data manipulation efficiency: DuckDB’s vectorized execution is more efficient for the bulk data operations (filtering, joining, filling) that CV split requires.
- Schema flexibility: SQL macros can handle arbitrary column schemas without requiring Rust to know the schema at compile time.
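To illustrate why fold generation is a cross-series operation, the sketch below derives one shared set of cutoffs from the latest timestamp found in any series, so every series is split at the same points in time. It is a minimal illustration, not the extension's implementation: the function name `global_cutoffs`, the integer timestamps, and the convention that training data covers timestamps up to and including each cutoff are all assumptions made for this example.

```rust
/// Hypothetical helper: computes fold cutoffs shared by every series.
///
/// Assumptions (not taken from the extension): timestamps are integer time
/// steps, training data for a fold is everything with `ts <= cutoff`, and the
/// last fold's test window ends at the latest timestamp in the dataset.
pub fn global_cutoffs(series: &[Vec<i64>], n_splits: usize, horizon: i64) -> Vec<i64> {
    // The latest timestamp across *all* series anchors the fold boundaries.
    let Some(max_ts) = series.iter().flatten().copied().max() else {
        return Vec::new(); // no data, no folds
    };

    // Earliest fold first; each later fold moves the cutoff forward by `horizon`.
    (0..n_splits)
        .map(|k| max_ts - horizon * (n_splits - k) as i64)
        .collect()
}
```

Because the cutoffs depend on the whole dataset, a method on a single TimeSeries could not compute them without being handed information about the other series.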
§Component Distribution
| Component | Location | Rationale |
|---|---|---|
| Fold generation | DuckDB extension | Cross-series coordination, global operation |
| Train/test assignment | SQL/DuckDB | Simple comparison, vectorized execution |
| Unknown feature filling | Rust UDF via DuckDB | Per-series state tracking |
| Orchestration | SQL macro | Flexible, schema-agnostic |
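The "simple comparison" behind train/test assignment can be sketched as follows. In this project the step is performed in SQL; the Rust version here only shows the logic. The `Split` enum, the `assign_split` function, and the half-open index convention (training rows are `[0, train_end)`, test rows are `[train_end + gap, train_end + gap + horizon)`) are assumptions made for this example.

```rust
/// Hypothetical label for a row's role within one fold.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Split {
    Train,
    /// Rows between train and test that are dropped to limit leakage.
    Gap,
    Test,
    /// Rows after the test window, not used by this fold.
    Unused,
}

/// Assigns a row index to a split using plain comparisons.
/// Assumed convention: `train_end` is exclusive.
pub fn assign_split(index: usize, train_end: usize, gap: usize, horizon: usize) -> Split {
    let test_start = train_end + gap;
    let test_end = test_start + horizon;
    if index < train_end {
        Split::Train
    } else if index < test_start {
        Split::Gap
    } else if index < test_end {
        Split::Test
    } else {
        Split::Unused
    }
}
```

Under these assumptions, a fold with `train_end = 79`, `gap = 0`, and `horizon = 7` labels indices 0 through 78 as training rows and 79 through 85 as test rows.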
§Using CV Functionality
For time series cross-validation with data leakage prevention, use the ts_cv_split
function from the forecast-extension:
-- Example: Generate CV folds with unknown feature handling
SELECT * FROM ts_cv_split(
    my_data,
    n_splits := 3,
    horizon := 7,
    unknown_features := ['stockout', 'segment_id']
);
See forecast-extension#54 for implementation details.
§Future Considerations
If per-series CV semantics become necessary in Rust (e.g., for standalone use without DuckDB), the fold generation logic could be extracted:
pub struct CvFoldGenerator {
    n_splits: usize,
    horizon: usize,
    gap: usize,
}
impl CvFoldGenerator {
    pub fn folds(&self, series_len: usize) -> Vec<usize> {
        // Returns training end indices for each fold
        todo!()
    }
}
This would allow fold generation to be shared while keeping data manipulation in the appropriate layer (SQL for multi-series datasets, Rust for single-series use).
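As a rough illustration of what such a generator might compute, the sketch below fills in `folds` with an expanding-window scheme. The crate does not specify these semantics; the layout is an assumption made for this example: training data is `[0, end)`, the test window is `[end + gap, end + gap + horizon)`, the last fold's test window ends at `series_len`, and folds that would leave no training data are skipped.

```rust
/// Hypothetical fold generator with the same fields as the struct sketched above.
pub struct CvFoldGenerator {
    pub n_splits: usize,
    pub horizon: usize,
    pub gap: usize,
}

impl CvFoldGenerator {
    /// Returns the exclusive training end index for each fold, earliest first.
    ///
    /// Assumed layout (not defined by the crate): the final fold's test window
    /// ends exactly at `series_len`, and each earlier fold ends `horizon`
    /// steps sooner.
    pub fn folds(&self, series_len: usize) -> Vec<usize> {
        (0..self.n_splits)
            .filter_map(|k| {
                // Observations reserved after the training window for this fold.
                let reserved = self.gap + self.horizon * (self.n_splits - k);
                // Skip folds that do not fit or would have an empty training set.
                series_len.checked_sub(reserved).filter(|&end| end > 0)
            })
            .collect()
    }
}
```

With `n_splits = 3`, `horizon = 7`, `gap = 0`, and a series of 100 observations, this returns `[79, 86, 93]`. For a series too short to fit every fold, it returns only the folds that fit, which is one possible policy; returning an error would be another.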
Re-exports§
pub use error::ForecastError;
pub use error::Result;
Modules§
- changepoint: Changepoint detection algorithms.
- core: Core data structures for time series forecasting.
- detection: Detection utilities for time series analysis.
- error: Error types for the anofox-forecast library.
- features: Time series feature extraction.
- models: Forecasting models.
- postprocess: Probabilistic forecasting via postprocessing.
- prelude
- seasonality: Seasonality detection and decomposition.
- simd: SIMD-accelerated primitives via Trueno (f32 internal).
- transform: Data transformations for time series.
- utils: Utility functions for forecasting models.
- validation: Statistical validation tests for time series models.