§normality
A Rust crate for assessing the normality of a data sample. It provides several common statistical tests to determine if a set of data is likely drawn from a normal distribution.
All test implementations are generic and can work with f32 or f64 data types. The implementations are ported from well-established algorithms found in popular R packages.
§Implemented Tests
§Univariate Normality
- Shapiro-Wilk Test
- Lilliefors (Kolmogorov-Smirnov) Test
- Anderson-Darling Test
- Jarque-Bera Test
- Pearson Chi-squared Test
- D’Agostino’s K-squared Test
- Anscombe-Glynn Kurtosis Test
- Energy Test
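To make one of the listed tests concrete, the following is a minimal, standard-library-only sketch of the textbook Jarque-Bera statistic, JB = n/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis. It is not this crate's implementation: the function name `jarque_bera_sketch` is hypothetical, and the p-value uses the asymptotic chi-squared approximation with 2 degrees of freedom, whose survival function is exp(−x/2).

```rust
/// Hypothetical helper, not part of the `normality` crate.
/// Returns (JB statistic, asymptotic p-value), or None when the
/// sample has fewer than 2 observations or zero variance.
fn jarque_bera_sketch(data: &[f64]) -> Option<(f64, f64)> {
    if data.len() < 2 {
        return None;
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;

    // Central moments using the biased (divide-by-n) convention.
    let m2 = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let m3 = data.iter().map(|x| (x - mean).powi(3)).sum::<f64>() / n;
    let m4 = data.iter().map(|x| (x - mean).powi(4)).sum::<f64>() / n;
    if m2 == 0.0 {
        return None;
    }

    let skewness = m3 / m2.powf(1.5);
    let kurtosis = m4 / (m2 * m2);
    let jb = n / 6.0 * (skewness * skewness + (kurtosis - 3.0).powi(2) / 4.0);

    // Chi-squared with 2 degrees of freedom: P(X > x) = exp(-x / 2).
    let p_value = (-jb / 2.0).exp();
    Some((jb, p_value))
}

fn main() {
    let data = [-1.1, -0.8, -0.5, -0.2, 0.0, 0.2, 0.5, 0.8, 1.1, 1.3];
    if let Some((jb, p)) = jarque_bera_sketch(&data) {
        println!("JB = {:.4}, asymptotic p-value = {:.4}", jb, p);
    }
}
```

The asymptotic approximation is known to be rough for small samples, so this sketch is meant only to illustrate the shape of the computation.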
§Multivariate Normality
§Installation
Either run cargo add normality or add the crate to your Cargo.toml:
[dependencies]
normality = "2"
# To enable parallel execution for faster performance on large data:
# normality = { version = "2", features = ["parallel"] }
§Example Usage
§Univariate
use normality::{shapiro_wilk, Error};

fn main() -> Result<(), Error> {
    // Sample data that is likely from a normal distribution
    let data = vec![-1.1, -0.8, -0.5, -0.2, 0.0, 0.2, 0.5, 0.8, 1.1, 1.3];

    // Perform the Shapiro-Wilk test
    let result = shapiro_wilk(data)?;

    println!("Shapiro-Wilk Test Results:");
    println!(" W-statistic: {:.4}", result.statistic);
    println!(" p-value: {:.4}", result.p_value);

    // Interpretation: A high p-value (e.g., > 0.05) suggests that the data
    // does not significantly deviate from a normal distribution.
    if result.p_value > 0.05 {
        println!("Conclusion: The sample is likely from a normal distribution.");
    } else {
        println!("Conclusion: The sample is not likely from a normal distribution.");
    }

    Ok(())
}
§Multivariate
§Using vec!
use normality::multivariate::{henze_zirkler, HenzeZirklerMethod};
use normality::Error;

fn main() -> Result<(), Error> {
    // 3D data from a multivariate normal distribution
    let data = vec![
        vec![0.1, 0.2, 0.3],
        vec![0.5, 0.1, 0.4],
        vec![-0.2, 0.3, 0.1],
        vec![0.0, 0.0, 0.0],
        vec![0.8, -0.5, 0.2],
        vec![-0.1, -0.1, -0.1],
    ];

    // Perform the Henze-Zirkler test
    let result = henze_zirkler(data, false, HenzeZirklerMethod::LogNormal)?;

    println!("Henze-Zirkler Test Results:");
    println!(" HZ-statistic: {:.4}", result.statistic);
    println!(" p-value: {:.4}", result.p_value);

    if result.p_value > 0.05 {
        println!("Conclusion: The sample is likely from a multivariate normal distribution.");
    }

    Ok(())
}
§Using nalgebra::matrix!
use nalgebra::matrix;
use normality::multivariate::{henze_zirkler, HenzeZirklerMethod};
use normality::Error;

fn main() -> Result<(), Error> {
    // 3D data from a multivariate normal distribution
    let data = matrix![0.1, 0.2, 0.3;
                       0.5, 0.1, 0.4;
                       -0.2, 0.3, 0.1;
                       0.0, 0.0, 0.0;
                       0.8, -0.5, 0.2;
                       -0.1, -0.1, -0.1];

    // Perform the Henze-Zirkler test
    let result = henze_zirkler(
        data.row_iter().map(|row| row.into_iter().copied()),
        false,
        HenzeZirklerMethod::LogNormal,
    )?;

    println!("Henze-Zirkler Test Results:");
    println!(" HZ-statistic: {:.4}", result.statistic);
    println!(" p-value: {:.4}", result.p_value);

    if result.p_value > 0.05 {
        println!("Conclusion: The sample is likely from a multivariate normal distribution.");
    }

    Ok(())
}
§Parallelism
This crate supports optional parallelism via the rayon crate. This can significantly improve performance for large datasets by parallelizing sorting and statistical calculations.
To enable parallelism, add the parallel feature to your Cargo.toml:
[dependencies]
normality = { version = "2", features = ["parallel"] }
When enabled, functions will automatically use parallel iterators and parallel sorting algorithms. No changes to your code are required.
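As a rough illustration of how a Cargo feature can switch implementations without changing the calling code, the sketch below gates a sort behind `cfg(feature = "parallel")`. This is an assumption about the general technique, not a copy of this crate's internals: `sort_sample` is a hypothetical helper, and the parallel branch only compiles in a package that declares a `parallel` feature and depends on rayon. Built without that feature, only the sequential branch is compiled.

```rust
/// Hypothetical helper: parallel sort, compiled only when the enclosing
/// package enables a `parallel` feature and depends on rayon.
#[cfg(feature = "parallel")]
fn sort_sample(data: &mut [f64]) {
    use rayon::prelude::*;
    data.par_sort_unstable_by(|a, b| a.total_cmp(b));
}

/// Sequential fallback with the same signature, used when the
/// `parallel` feature is not enabled.
#[cfg(not(feature = "parallel"))]
fn sort_sample(data: &mut [f64]) {
    data.sort_unstable_by(|a, b| a.total_cmp(b));
}

fn main() {
    let mut sample = vec![0.5, -1.1, 1.3, 0.0, -0.2];
    // The call site is identical regardless of which branch was compiled.
    sort_sample(&mut sample);
    println!("{:?}", sample);
}
```

Because both variants share one signature, callers do not need their own `cfg` attributes, which matches the "no changes to your code" behaviour described above.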
§Accuracy
The accuracy of the implemented tests has been verified against their R equivalents. Running the integration tests for this crate requires a local installation of R, with the Rscript executable available on the system’s PATH.
§License
This project is licensed under the MIT License.
Modules§
- multivariate
- Multivariate normality tests.
Macros§
Structs§
- Computation
- A generic data structure to hold the results of a normality test.
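The usage examples read two fields from a test result, `statistic` and `p_value`. The sketch below mirrors that shape with a hypothetical stand-in type, `TestOutcome`, and adds a small helper that compares the p-value to a chosen significance level. The type, the helper name `fails_to_reject`, and the numbers in `main` are illustrative assumptions, not the crate's actual `Computation` definition or the output of any real test.

```rust
/// Hypothetical stand-in for a normality test result. The field names
/// follow the usage examples; the real type may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TestOutcome<T> {
    statistic: T,
    p_value: T,
}

impl TestOutcome<f64> {
    /// True when the test fails to reject normality at level `alpha`,
    /// i.e. the p-value is strictly greater than `alpha`.
    fn fails_to_reject(&self, alpha: f64) -> bool {
        self.p_value > alpha
    }
}

fn main() {
    // Arbitrary illustrative values, not produced by an actual test.
    let outcome = TestOutcome { statistic: 0.97, p_value: 0.42 };
    println!("statistic = {:.2}", outcome.statistic);
    println!("fails to reject at 0.05: {}", outcome.fails_to_reject(0.05));
}
```

Keeping the significance level as a parameter avoids hard-coding 0.05 at every call site.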
Enums§
- EnergyTestMethod
- Specifies the method for p-value calculation in the Energy test.
- Error
- Represents errors that can occur during a normality test computation.
Traits§
- Float
- A convenience trait combining bounds frequently used for floating-point computations.
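For readers unfamiliar with the pattern, a trait that bundles floating-point bounds lets a single generic function accept both f32 and f64. The following is a minimal standard-library sketch of that idea. `SketchFloat` and `mean` are hypothetical names introduced here, and the bounds chosen are an assumption; the crate's own `Float` trait may combine different ones.

```rust
use std::ops::{Add, Div};

/// Hypothetical bound bundle; not the crate's `Float` trait.
trait SketchFloat: Copy + PartialOrd + Add<Output = Self> + Div<Output = Self> {
    fn zero() -> Self;
    fn from_usize(n: usize) -> Self;
}

impl SketchFloat for f32 {
    fn zero() -> Self { 0.0 }
    fn from_usize(n: usize) -> Self { n as f32 }
}

impl SketchFloat for f64 {
    fn zero() -> Self { 0.0 }
    fn from_usize(n: usize) -> Self { n as f64 }
}

/// Arithmetic mean, generic over any type implementing `SketchFloat`.
/// Returns None for an empty slice.
fn mean<T: SketchFloat>(data: &[T]) -> Option<T> {
    if data.is_empty() {
        return None;
    }
    let sum = data.iter().copied().fold(T::zero(), |acc, x| acc + x);
    Some(sum / T::from_usize(data.len()))
}

fn main() {
    let single: [f32; 3] = [1.0, 2.0, 3.0];
    let double: [f64; 4] = [1.0, 2.0, 3.0, 4.0];
    println!("{:?} {:?}", mean(&single), mean(&double));
}
```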
Functions§
- anderson_darling
- Performs the Anderson-Darling test for normality.
- anscombe_glynn
- Performs the Anscombe-Glynn kurtosis test for normality.
- dagostino_k_squared
- Performs D’Agostino’s K-squared test for skewness to assess normality.
- energy_test
- Performs the Energy test for univariate normality.
- jarque_bera
- Performs the Jarque-Bera test for normality.
- lilliefors
- Performs the Lilliefors (Kolmogorov-Smirnov) test for normality.
- pearson_chi_squared
- Performs the Pearson chi-squared test for normality.
- shapiro_wilk
- Performs the Shapiro-Wilk test for normality on a given sample of data.