Functions and tools for evaluating polynomial fits and scoring models.
This module provides functions and types to evaluate how well a polynomial model fits a dataset, and to score models when automatically selecting polynomial degrees.
§Model Fit / Regression Diagnostics
- r_squared: Proportion of variance explained by the model. Higher is better (0 to 1).
- adjusted_r_squared: R² adjusted for the number of predictors. Use it to compare models of different degrees.
- residual_variance: Unbiased estimate of the variance of the errors after fitting. Used for confidence intervals.
- residual_normality: Likelihood that the residuals are normally distributed. Results near 0 or 1 indicate non-normality; higher results do not guarantee normality.
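As a rough guide to what these diagnostics compute, the following sketch works through the textbook formulas for R², adjusted R², and the unbiased residual variance on made-up data, assuming a degree-1 fit; it is illustrative only, not the crate's implementation.

// Illustrative formulas only; the crate's r_squared, adjusted_r_squared and
// residual_variance may differ in signature and edge-case handling.
fn main() {
    let y = [1.0_f64, 2.0, 3.0, 4.2];
    let y_fit = [1.1_f64, 1.9, 3.05, 4.0];
    let n = y.len() as f64;
    let degree = 1.0_f64; // hypothetical: a straight-line (degree-1) fit

    let mean = y.iter().sum::<f64>() / n;
    // Residual sum of squares and total sum of squares.
    let rss: f64 = y.iter().zip(&y_fit).map(|(a, b)| (a - b).powi(2)).sum();
    let tss: f64 = y.iter().map(|a| (a - mean).powi(2)).sum();

    // R² = 1 - RSS / TSS
    let r2 = 1.0 - rss / tss;
    // Adjusted R² penalises the number of predictors (here, the degree).
    let adj_r2 = 1.0 - (1.0 - r2) * (n - 1.0) / (n - degree - 1.0);
    // Unbiased residual variance: RSS over the residual degrees of freedom
    // (n minus the number of fitted coefficients, i.e. degree + 1).
    let var = rss / (n - (degree + 1.0));

    println!("R² = {r2:.4}, adjusted R² = {adj_r2:.4}, residual variance = {var:.4}");
}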
§Confidence Intervals
- ConfidenceBand: Represents a confidence interval with lower and upper bounds, determined by a given probability.
- Confidence: Enum for common confidence levels (68%, 95%, 99%).
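For intuition, here is a minimal sketch of how a symmetric band follows from a central estimate, a standard deviation, and a Z-score; the band helper below is hypothetical and not the crate's ConfidenceBand API.

// Conceptual sketch: a symmetric confidence band from estimate ± z·σ.
// `band` is a hypothetical helper, not the crate's ConfidenceBand type.
fn band(estimate: f64, std_dev: f64, z: f64) -> (f64, f64, f64) {
    (estimate - z * std_dev, estimate, estimate + z * std_dev)
}

fn main() {
    // Common Z-scores: ~1.0 for 68%, ~1.96 for 95%, ~2.576 for 99%.
    for (level, z) in [("68%", 1.0), ("95%", 1.96), ("99%", 2.576)] {
        let (lo, mid, hi) = band(10.0, 0.5, z);
        println!("{level}: {lo:.3} .. {mid} .. {hi:.3}");
    }
}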
§Model Selection
- DegreeBound: Enum to specify constraints on the polynomial degree when selecting it automatically.
§Error Metrics
- mean_absolute_error: Average absolute difference between observed and predicted values. Lower is better.
- mean_squared_error: Average squared difference between observed and predicted values. Lower is better.
- root_mean_squared_error: Square root of the MSE, giving the error in the same units as the observed values. Lower is better.
- huber_log_likelihood: Robust error metric that is less sensitive to outliers. Higher is better.
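The sketch below spells out the textbook definitions of these metrics for a hypothetical residual set; the crate's functions may instead take iterators of observed and predicted values, as the r_squared example further down does.

// Textbook error metrics; not the crate's implementations.
fn huber(residual: f64, delta: f64) -> f64 {
    // Quadratic for small residuals, linear for large ones.
    let a = residual.abs();
    if a <= delta {
        0.5 * residual * residual
    } else {
        delta * (a - 0.5 * delta)
    }
}

fn main() {
    let y = [1.0_f64, 2.0, 3.0];
    let y_fit = [1.1_f64, 1.9, 3.05];
    let n = y.len() as f64;

    let residuals: Vec<f64> = y.iter().zip(&y_fit).map(|(a, b)| a - b).collect();
    let mae = residuals.iter().map(|r| r.abs()).sum::<f64>() / n;
    let mse = residuals.iter().map(|r| r * r).sum::<f64>() / n;
    let rmse = mse.sqrt();
    // 1.345 is the conventional Huber constant (cf. huber_const).
    let huber_sum: f64 = residuals.iter().map(|r| huber(*r, 1.345)).sum();

    println!("MAE = {mae:.4}, MSE = {mse:.4}, RMSE = {rmse:.4}, Huber sum = {huber_sum:.4}");
}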
§Descriptive Statistics
- mean: Arithmetic mean of a dataset.
- stddev_and_mean: Standard deviation and mean of a dataset.
- median_absolute_deviation: Average absolute deviation from the mean.
- spread: Difference between the maximum and minimum values in a dataset.
- skewness_and_kurtosis: Measures of the asymmetry and “tailedness” of the distribution.
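For reference, a compact sketch of these descriptive statistics computed from first principles; the conventions shown (sample standard deviation, moment-based skewness and excess kurtosis) are assumptions and may differ from the crate's choices.

// Descriptive statistics from first principles; not the crate's implementations.
fn main() {
    let xs = [2.0_f64, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
    let n = xs.len() as f64;

    let mean = xs.iter().sum::<f64>() / n;
    // Sample (n - 1) standard deviation.
    let stddev = (xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)).sqrt();

    // Spread: difference between the maximum and minimum values.
    let min = xs.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let spread = max - min;

    // Central moments, then moment-based skewness and excess kurtosis.
    let m = |p: i32| xs.iter().map(|x| (x - mean).powi(p)).sum::<f64>() / n;
    let skewness = m(3) / m(2).powf(1.5);
    let excess_kurtosis = m(4) / m(2).powi(2) - 3.0;

    println!("mean={mean:.3} stddev={stddev:.3} spread={spread} skew={skewness:.3} kurt={excess_kurtosis:.3}");
}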
§Model Fit vs Model Selection
- Model fit: How well does the model explain the data? Use r_squared or residual_variance.
  - Returns a value between 0 and 1.
  - 0 = the model explains none of the variance.
  - 1 = the model fits the data perfectly.
- Model selection: Choosing the best polynomial degree to avoid overfitting.
  - Use crate::score.
  - Options (a sketch of both criteria follows this list):
    - AIC: Akaike Information Criterion, more lenient penalty for complexity.
    - BIC: Bayesian Information Criterion, stricter penalty for complexity.
  - Lower scores are better, but they are not a measure of goodness-of-fit outside the context of model selection.
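A sketch of both criteria under a Gaussian-residual assumption, for intuition only; the crate's scorers in crate::score may be built on a different likelihood (for example the Huber log-likelihood) and are used through their own API, as in the example below.

// Textbook AIC/BIC for Gaussian residuals; not the crate's Aic/Bic scorers.
fn main() {
    let y = [1.0_f64, 2.0, 3.0, 4.1, 5.2];
    let y_fit = [0.9_f64, 2.1, 3.0, 4.0, 5.3];
    let n = y.len() as f64;
    let k = 2.0; // hypothetical: a degree-1 fit has two coefficients

    // Residual sum of squares.
    let rss: f64 = y.iter().zip(&y_fit).map(|(a, b)| (a - b).powi(2)).sum();
    let aic = n * (rss / n).ln() + 2.0 * k; // lenient complexity penalty
    let bic = n * (rss / n).ln() + k * n.ln(); // stricter penalty as n grows

    println!("AIC = {aic:.3}, BIC = {bic:.3} (lower is better)");
}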
§Examples
use polyfit::statistics::r_squared;
use polyfit::score::{Aic, ModelScoreProvider};
let y = vec![1.0, 2.0, 3.0];
let y_fit = vec![1.1, 1.9, 3.05];
// Goodness-of-fit
let r2 = r_squared(y.iter().copied(), y_fit.iter().copied());
println!("R² = {r2}");
// Model scoring
let score = Aic.score(y.into_iter(), y_fit.into_iter(), 3.0);
println!("AIC score = {score}");Structs§
- ConfidenceBand - Represents a predicted range for model outputs at a given confidence level. The band contains the central estimate (value) and the upper and lower bounds.
- DerivationError - Error information when a derivative check fails. See is_derivative.
- DomainNormalizer - Normalizes values from one range to another.
- UncertainValue - A value with an associated amount of uncertainty, represented by a mean and a standard deviation.
Enums§
- Confidence - Standard Z-score confidence levels for fitted models.
- CvStrategy - Strategy for selecting the number of folds (k) in k-fold cross-validation.
- DegreeBound - In order to find the best-fitting polynomial degree, we need to limit the maximum degree considered. The choice of degree bound can significantly impact the model’s performance and its ability to generalize.
- Tolerance - Specifies a tolerance level for numerical comparisons.
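To make the k-fold terminology concrete, here is a generic round-robin split into folds; fold_indices is a hypothetical helper that only illustrates what choosing k means, not the crate's cross_validation_split.

// Generic k-fold assignment of sample indices; not the crate's implementation.
fn fold_indices(len: usize, k: usize) -> Vec<Vec<usize>> {
    let mut folds = vec![Vec::new(); k];
    for i in 0..len {
        folds[i % k].push(i); // round-robin assignment
    }
    folds
}

fn main() {
    // Each fold is used once as the validation set; the rest is training data.
    for (f, held_out) in fold_indices(10, 3).iter().enumerate() {
        println!("fold {f}: validate on {held_out:?}, train on the rest");
    }
}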
Functions§
- adjusted_r_squared - Computes the adjusted R-squared value.
- bayes_factor - Computes the Bayes factor between two polynomial models.
- cross_validation_split - Splits the data into k folds for cross-validation based on the specified strategy.
- folded_rmse - Computes the Root Mean Square Error (RMSE) for the given data and model predictions by splitting the data into folds.
- huber_const - Returns the standard Huber constant (1.345).
- huber_log_likelihood - Computes the log-likelihood of the Huber loss for a set of data points.
- huber_loss - Computes the Huber loss for a single residual.
- is_derivative - Checks if f_prime is the derivative of polynomial f.
- mean - Computes the arithmetic mean of a sequence of values.
- mean_absolute_error - Computes the mean absolute error (MAE) between two sets of values.
- mean_squared_error - Computes the mean squared error (MSE) between two sets of values.
- median - Computes the median of a sequence of values.
- median_absolute_deviation - Computes the median absolute deviation (MAD) between two sets of values.
- median_squared_deviation - Computes the median squared deviation (MSD) between two sets of values.
- r_squared - Calculates the R-squared value for a set of data.
- residual_normality - Returns a score measuring whether the residuals can be normally distributed.
- residual_variance - Computes the residual variance of a model’s predictions.
- robust_r_squared - Uses Huber loss to compute a robust R-squared value.
- root_mean_squared_error - Computes the root mean squared error (RMSE) between two sets of values.
- skewness_and_kurtosis - Computes the skewness and excess kurtosis of a dataset.
- spread - Computes the range (spread) of a dataset.
- stddev_and_mean - Computes the standard deviation and mean of a sequence of values.
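To illustrate the mathematics behind a derivative check such as is_derivative, the sketch below differentiates a coefficient vector (ascending powers, a representation assumed here for illustration) and compares it with a candidate derivative within a tolerance; the crate's function operates on its own polynomial types.

// differentiate is a hypothetical helper: d/dx of c·xⁱ is (c·i)·xⁱ⁻¹.
fn differentiate(coeffs: &[f64]) -> Vec<f64> {
    coeffs
        .iter()
        .enumerate()
        .skip(1)
        .map(|(i, c)| c * i as f64)
        .collect()
}

fn main() {
    let f = [1.0, -2.0, 3.0]; // f(x) = 1 - 2x + 3x²
    let f_prime = [-2.0, 6.0]; // f'(x) = -2 + 6x
    let d = differentiate(&f);
    // Same length and term-by-term agreement within a small tolerance.
    let ok = d.len() == f_prime.len()
        && d.iter().zip(&f_prime).all(|(a, b)| (a - b).abs() < 1e-12);
    println!("f_prime is the derivative of f: {ok}");
}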