Crate normality


§normality

A Rust crate for assessing the normality of a data sample. It provides several common statistical tests for checking whether a sample is consistent with having been drawn from a normal distribution.

All test implementations are generic and can work with f32 or f64 data types. The implementations are ported from well-established algorithms found in popular R packages.
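
As a rough illustration of what being generic over f32 and f64 can look like, the sketch below defines a hypothetical `sample_mean` helper that accepts either type. It is not part of this crate: the `Copy + Into<f64>` bound is an assumption standing in for the crate's own `Float` trait, and only the standard library is used.

```rust
/// Hypothetical helper (not part of `normality`): mean of a sample whose
/// elements can be widened to f64, which covers both f32 and f64.
fn sample_mean<T>(data: &[T]) -> Option<f64>
where
    T: Copy + Into<f64>,
{
    if data.is_empty() {
        return None; // the mean of an empty sample is undefined
    }
    // Widen each element to f64 before accumulating.
    let sum: f64 = data.iter().map(|&x| Into::<f64>::into(x)).sum();
    Some(sum / data.len() as f64)
}

fn main() {
    let single: Vec<f32> = vec![1.0, 2.0, 3.0, 4.0];
    let double: Vec<f64> = vec![1.0, 2.0, 3.0, 4.0];

    // The same function body serves both element types.
    println!("f32 mean: {:?}", sample_mean(&single));
    println!("f64 mean: {:?}", sample_mean(&double));
}
```

Widening to f64 is only one possible design; a library may instead compute in the input type throughout, so this should be read as an illustration of the idea rather than a description of the crate's internals.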

§Implemented Tests

§Univariate Normality

§Multivariate Normality

§Installation

Either run cargo add normality or add the crate to your Cargo.toml:

[dependencies]
normality = "2"

# To enable parallel execution for faster performance on large data:
# normality = { version = "2", features = ["parallel"] }

§Example Usage

§Univariate

use normality::{shapiro_wilk, Error};

fn main() -> Result<(), Error> {
    // Sample data that is likely from a normal distribution
    let data = vec![-1.1, -0.8, -0.5, -0.2, 0.0, 0.2, 0.5, 0.8, 1.1, 1.3];

    // Perform the Shapiro-Wilk test
    let result = shapiro_wilk(data)?;

    println!("Shapiro-Wilk Test Results:");
    println!("  W-statistic: {:.4}", result.statistic);
    println!("  p-value: {:.4}", result.p_value);

    // Interpretation: a p-value above the chosen significance level (here 0.05)
    // means the test found no significant evidence against normality. It does
    // not prove that the data are normal.
    if result.p_value > 0.05 {
        println!("Conclusion: The sample is likely from a normal distribution.");
    } else {
        println!("Conclusion: The sample is not likely from a normal distribution.");
    }

    Ok(())
}
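
The 0.05 threshold used above is a conventional significance level rather than something fixed by the test. A minimal sketch of making that choice explicit is shown below; the `Decision` enum and `decide` function are hypothetical names introduced for illustration and are not part of this crate.

```rust
/// Hypothetical outcome type (not part of `normality`).
#[derive(Debug, PartialEq)]
enum Decision {
    /// The p-value is at or below alpha: evidence against normality.
    RejectNormality,
    /// The p-value is above alpha: no significant evidence against normality.
    FailToReject,
}

/// Compare a p-value against a caller-chosen significance level `alpha`.
/// Mirrors the `p_value > 0.05` check in the example above.
fn decide(p_value: f64, alpha: f64) -> Decision {
    if p_value > alpha {
        Decision::FailToReject
    } else {
        Decision::RejectNormality
    }
}

fn main() {
    // Illustrative p-values, not results produced by any real test run.
    for p in [0.42, 0.03] {
        println!("p = {p}: {:?}", decide(p, 0.05));
    }
}
```

Note that `FailToReject` only means the data are compatible with normality at the chosen level; with small samples, most tests have limited power to detect departures.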

§Multivariate

§Using vec!

use normality::multivariate::{henze_zirkler, HenzeZirklerMethod};
use normality::Error;

fn main() -> Result<(), Error> {
    // 3D data from a multivariate normal distribution
    let data = vec![
        vec![0.1, 0.2, 0.3],
        vec![0.5, 0.1, 0.4],
        vec![-0.2, 0.3, 0.1],
        vec![0.0, 0.0, 0.0],
        vec![0.8, -0.5, 0.2],
        vec![-0.1, -0.1, -0.1],
    ];

    // Perform the Henze-Zirkler test
    let result = henze_zirkler(data, false, HenzeZirklerMethod::LogNormal)?;

    println!("Henze-Zirkler Test Results:");
    println!("  HZ-statistic: {:.4}", result.statistic);
    println!("  p-value: {:.4}", result.p_value);

    if result.p_value > 0.05 {
        println!("Conclusion: The sample is likely from a multivariate normal distribution.");
    }

    Ok(())
}

§Using nalgebra::matrix!

use nalgebra::matrix;
use normality::multivariate::{henze_zirkler, HenzeZirklerMethod};
use normality::Error;

fn main() -> Result<(), Error> {
    // 3D data from a multivariate normal distribution
    let data = matrix![0.1, 0.2, 0.3;
        0.5, 0.1, 0.4;
        -0.2, 0.3, 0.1;
        0.0, 0.0, 0.0;
        0.8, -0.5, 0.2;
        -0.1, -0.1, -0.1];

    // Perform the Henze-Zirkler test
    let result = henze_zirkler(
        data.row_iter().map(|row| row.into_iter().copied()),
        false,
        HenzeZirklerMethod::LogNormal,
    )?;

    println!("Henze-Zirkler Test Results:");
    println!("  HZ-statistic: {:.4}", result.statistic);
    println!("  p-value: {:.4}", result.p_value);

    if result.p_value > 0.05 {
        println!("Conclusion: The sample is likely from a multivariate normal distribution.");
    }

    Ok(())
}
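
Both examples above pass the data as a sequence of rows, one row per observation and one column per variable. If the data instead lives in a flat, row-major buffer, it can be regrouped first. The `to_rows` helper below is a hypothetical, std-only sketch of that step and is not provided by this crate.

```rust
/// Hypothetical helper (not part of `normality`): split a flat, row-major
/// buffer into rows of `n_vars` values each. Returns `None` when the buffer
/// length is not a multiple of `n_vars` or when `n_vars` is zero.
fn to_rows(flat: &[f64], n_vars: usize) -> Option<Vec<Vec<f64>>> {
    if n_vars == 0 || flat.len() % n_vars != 0 {
        return None;
    }
    Some(flat.chunks_exact(n_vars).map(|row| row.to_vec()).collect())
}

fn main() {
    // Two observations of three variables, stored back to back.
    let flat = [0.1, 0.2, 0.3, 0.5, 0.1, 0.4];
    match to_rows(&flat, 3) {
        Some(rows) => println!("{} observations: {:?}", rows.len(), rows),
        None => println!("buffer length is not a multiple of the row width"),
    }
}
```

The resulting `Vec<Vec<f64>>` has the same shape as the `vec!` example above.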

§Parallelism

This crate supports optional parallelism via the rayon crate. This can significantly improve performance for large datasets by parallelizing sorting and statistical calculations.

To enable parallelism, add the parallel feature to your Cargo.toml:

[dependencies]
normality = { version = "2", features = ["parallel"] }

When enabled, functions will automatically use parallel iterators and parallel sorting algorithms. No changes to your code are required.
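
One common way to get this kind of transparent switch is conditional compilation on the feature flag. The sketch below shows the pattern with a hypothetical `sort_sample` function; it is an assumption about the general technique, not a copy of this crate's `sort_if_parallel` or `iter_if_parallel` macros. The rayon branch is only compiled when a `parallel` feature is enabled and rayon is a dependency.

```rust
// Compiled only when the (assumed) `parallel` feature is enabled.
#[cfg(feature = "parallel")]
fn sort_sample(values: &mut [f64]) {
    use rayon::prelude::*;
    // rayon's parallel counterpart of `sort_by`.
    values.par_sort_by(|a, b| a.total_cmp(b));
}

// Sequential fallback used when the feature is disabled.
#[cfg(not(feature = "parallel"))]
fn sort_sample(values: &mut [f64]) {
    // `total_cmp` gives a total order on f64, so NaN cannot cause a panic.
    values.sort_by(|a, b| a.total_cmp(b));
}

fn main() {
    let mut sample = vec![0.8, -1.1, 0.2, -0.5, 1.3];
    // Callers use one function name regardless of the feature setting.
    sort_sample(&mut sample);
    println!("{:?}", sample);
}
```

Because both variants share a signature, calling code stays identical whichever one is compiled in.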

§Accuracy

The accuracy of the implemented tests has been verified against their R equivalents. Running this crate’s integration tests requires a local installation of R, with the Rscript executable available on the system’s PATH.

§License

This project is licensed under the MIT License.

Modules§

multivariate
Multivariate normality tests.

Macros§

iter_if_parallel
sort_if_parallel

Structs§

Computation
A generic data structure to hold the results of a normality test.

Enums§

EnergyTestMethod
Specifies the method for p-value calculation in the Energy test.
Error
Represents errors that can occur during a normality test computation.

Traits§

Float
A convenience trait combining bounds frequently used for floating-point computations.

Functions§

anderson_darling
Performs the Anderson-Darling test for normality.
anscombe_glynn
Performs the Anscombe-Glynn kurtosis test for normality.
dagostino_k_squared
Performs D’Agostino’s K-squared test for skewness to assess normality.
energy_test
Performs the Energy test for univariate normality.
jarque_bera
Performs the Jarque-Bera test for normality.
lilliefors
Performs the Lilliefors (Kolmogorov-Smirnov) test for normality.
pearson_chi_squared
Performs the Pearson chi-squared test for normality.
shapiro_wilk
Performs the Shapiro-Wilk test for normality on a given sample of data.
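
To give a feel for what one of these tests computes, here is a std-only sketch of the textbook Jarque-Bera statistic, JB = n * (S^2 / 6 + (K - 3)^2 / 24), where S and K are the sample skewness and kurtosis based on central moments. The function name `jarque_bera_sketch` is hypothetical, and the code is not taken from this crate, whose `jarque_bera` may differ in details such as input types and error handling. The p-value uses the fact that the chi-squared distribution with 2 degrees of freedom has survival function exp(-x / 2).

```rust
/// Hypothetical, std-only sketch of the textbook Jarque-Bera test.
/// Returns `(statistic, p_value)`, or `None` for fewer than two points
/// or zero variance.
fn jarque_bera_sketch(data: &[f64]) -> Option<(f64, f64)> {
    if data.len() < 2 {
        return None;
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;

    // k-th central moment with the 1/n (biased) normalisation.
    let moment = |k: i32| data.iter().map(|x| (x - mean).powi(k)).sum::<f64>() / n;
    let (m2, m3, m4) = (moment(2), moment(3), moment(4));
    if m2 == 0.0 {
        return None; // constant sample: skewness and kurtosis are undefined
    }

    let skewness = m3 / m2.powf(1.5);
    let kurtosis = m4 / (m2 * m2);
    let statistic = n * (skewness * skewness / 6.0 + (kurtosis - 3.0).powi(2) / 24.0);

    // Asymptotic p-value: chi-squared survival function with 2 degrees of freedom.
    let p_value = (-statistic / 2.0).exp();
    Some((statistic, p_value))
}

fn main() {
    let data = [-1.1, -0.8, -0.5, -0.2, 0.0, 0.2, 0.5, 0.8, 1.1, 1.3];
    if let Some((jb, p)) = jarque_bera_sketch(&data) {
        println!("JB = {jb:.4}, p = {p:.4}");
    }
}
```

The chi-squared approximation is asymptotic, so p-values from this sketch are rough for small samples.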