ndarray-glm 0.0.12

Performs regression for generalized linear models using IRLS on data stored in arrays.

ndarray-glm

Rust library for fitting linear, logistic, and other generalized linear models by iteratively reweighted least squares (IRLS), built on the ndarray-linalg crate.

Status

This package is in alpha and the interface could undergo changes. Even the return value of certain functions may change from one release to the next. Correctness is not guaranteed.

The regression algorithm uses iteratively reweighted least squares (IRLS). When a proposed update to the parameter estimates does not increase the likelihood, a step-halving procedure is applied to it.
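To make the idea concrete, here is a minimal, dependency-free sketch of IRLS with step halving for a logistic model with an intercept and a single feature. It is not this crate's implementation: the function names (`irls_proposal`, `fit`), the cap of 20 halvings, and the stopping rule based on the log-likelihood gain are all assumptions chosen for illustration.

```rust
/// Logistic (inverse logit) function.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Log-likelihood of a logistic model with intercept b[0] and slope b[1].
fn log_like(x: &[f64], y: &[f64], b: [f64; 2]) -> f64 {
    x.iter()
        .zip(y)
        .map(|(&xi, &yi)| {
            let eta = b[0] + b[1] * xi;
            yi * eta - (1.0 + eta.exp()).ln()
        })
        .sum()
}

/// One IRLS proposal: solve (X^T W X) b = X^T W z for the 2x2 case,
/// with weights w = mu (1 - mu) and working response z = eta + (y - mu) / w.
fn irls_proposal(x: &[f64], y: &[f64], b: [f64; 2]) -> [f64; 2] {
    let (mut s0, mut s1, mut s2, mut t0, mut t1) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for (&xi, &yi) in x.iter().zip(y) {
        let eta = b[0] + b[1] * xi;
        let mu = sigmoid(eta);
        let w = mu * (1.0 - mu);
        // w * z, written without dividing by w.
        let wz = w * eta + (yi - mu);
        s0 += w;
        s1 += w * xi;
        s2 += w * xi * xi;
        t0 += wz;
        t1 += wz * xi;
    }
    // Solve the 2x2 normal equations with Cramer's rule.
    let det = s0 * s2 - s1 * s1;
    [(t0 * s2 - t1 * s1) / det, (s0 * t1 - s1 * t0) / det]
}

/// Fit by IRLS; returns the estimates and the number of iterations used.
fn fit(x: &[f64], y: &[f64], max_iter: usize, tol: f64) -> ([f64; 2], usize) {
    let mut b = [0.0, 0.0];
    let mut ll = log_like(x, y, b);
    for iter in 1..=max_iter {
        let mut cand = irls_proposal(x, y, b);
        let mut cand_ll = log_like(x, y, cand);
        // Step halving: move the candidate halfway back toward the current
        // estimate for as long as it lowers the likelihood (at most 20 times).
        let mut halvings = 0;
        while (cand_ll < ll || cand_ll.is_nan()) && halvings < 20 {
            cand = [0.5 * (b[0] + cand[0]), 0.5 * (b[1] + cand[1])];
            cand_ll = log_like(x, y, cand);
            halvings += 1;
        }
        if cand_ll < ll || cand_ll.is_nan() {
            return (b, iter); // no improving step was found
        }
        let gain = cand_ll - ll;
        b = cand;
        ll = cand_ll;
        if gain < tol {
            return (b, iter);
        }
    }
    (b, max_iter)
}

fn main() {
    // Made-up, non-separable data.
    let x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0];
    let y = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0];
    let (b, iters) = fit(&x, &y, 50, 1e-10);
    println!("intercept = {}, slope = {}, iterations = {}", b[0], b[1], iters);
}
```

On well-behaved data the halving loop rarely runs; it matters mainly when a full IRLS step overshoots.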

Suggestions (via issues) and pull requests are welcome.

Prerequisites

The recommended approach is to use a system BLAS implementation. For instance, to install OpenBLAS on Debian/Ubuntu:

sudo apt update && sudo apt install -y libopenblas-dev

Then use this crate with the openblas-system feature.

To use an alternative backend or to build a static BLAS implementation, refer to the ndarray-linalg documentation. Use this crate with the appropriate feature flag and it will be forwarded to ndarray-linalg.

Example

To use this library in your crate, add the following to the [dependencies] section of your Cargo.toml:

ndarray = { version = "0.15", features = ["blas"]}
ndarray-glm = { version = "0.0.12", features = ["openblas-system"] }

An example for linear regression is shown below.

use ndarray_glm::{array, Linear, ModelBuilder, utility::standardize};

// The `?` operator needs a function that returns a Result.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define some test data.
    let data_y = array![0.3, 1.3, 0.7];
    let data_x = array![[0.1, 0.2], [-0.4, 0.1], [0.2, 0.4]];
    // The design matrix can optionally be standardized: the mean of each independent
    // variable is subtracted, and each is then divided by its standard deviation.
    let data_x = standardize(data_x);
    let model = ModelBuilder::<Linear>::data(&data_y, &data_x).build()?;
    // L2 (ridge) regularization can be applied with l2_reg().
    let fit = model.fit_options().l2_reg(1e-5).fit()?;
    // Currently the result is a simple array of the parameter estimates, including the intercept term.
    println!("Fit result: {}", fit.result);
    Ok(())
}
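The standardization step described in the comments of the example above can be sketched without any dependencies. This is an illustration only, not the crate's `standardize`: the helper name `standardize_columns` is hypothetical, rows are plain `Vec<f64>` rather than ndarray types, and it assumes the population standard deviation (dividing by n), which may differ from what the crate does.

```rust
/// Standardize each column of a row-major matrix to zero mean and unit
/// standard deviation. Hypothetical helper; uses the population standard
/// deviation (divides by n), which is an assumption.
fn standardize_columns(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = rows.len() as f64;
    let n_cols = rows.first().map_or(0, |r| r.len());
    let mut out = rows.to_vec();
    for j in 0..n_cols {
        let mean = rows.iter().map(|r| r[j]).sum::<f64>() / n;
        let var = rows.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / n;
        let sd = var.sqrt();
        for row in out.iter_mut() {
            // A constant column has zero spread, so it is only centered.
            row[j] = if sd > 0.0 { (row[j] - mean) / sd } else { row[j] - mean };
        }
    }
    out
}

fn main() {
    // Same design matrix as in the example above.
    let x = vec![vec![0.1, 0.2], vec![-0.4, 0.1], vec![0.2, 0.4]];
    for row in standardize_columns(&x) {
        println!("{:?}", row);
    }
}
```

Putting every feature on a comparable scale means a single regularization strength penalizes all coefficients evenly.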

Custom non-canonical link functions can be defined by the user, although the interface is currently not particularly ergonomic. See tests/custom_link.rs for examples.

Features

Linear regression
Logistic regression
Generalized linear model IRLS
Linear offsets
Generic over floating point type
Non-float domain types
Regularization
- L2 (ridge)
- L1 (lasso)
- Elastic Net
Other exponential family distributions
- Poisson
- Binomial
- Exponential
- Gamma
- Inverse Gaussian
Data standardization/normalization
- External utility function
- Automatic internal transformation
Weighted (and correlated?) regressions
Non-canonical link functions
Goodness-of-fit tests

Troubleshooting

Lasso/L1 regularization can converge slowly in some cases, particularly when the data is poorly behaved, separable, etc.

The following tips are worth trying for convergence issues in general, but are more likely to be necessary in an L1 regularization problem.

Standardize the feature data
Use f32 instead of f64
Increase the tolerance and/or the maximum number of iterations
Include a small L2 regularization as well
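The last tip can be illustrated with a small, dependency-free sketch. On perfectly separable data the unpenalized logistic likelihood has no finite maximizer, so the estimate keeps growing with every iteration, whereas a small L2 penalty gives a finite solution. The code below is not this crate's algorithm: it runs plain Newton iterations on a one-parameter model without an intercept, the names `penalized_score` and `newton` are hypothetical, and the penalty is assumed to have the form (lambda / 2) * beta^2, which may be scaled differently from the crate's `l2_reg`.

```rust
/// Logistic (inverse logit) function.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Gradient of the penalized log-likelihood l(beta) - (lambda / 2) * beta^2
/// for a one-parameter logistic model without an intercept.
fn penalized_score(x: &[f64], y: &[f64], beta: f64, lambda: f64) -> f64 {
    let score: f64 = x
        .iter()
        .zip(y)
        .map(|(&xi, &yi)| xi * (yi - sigmoid(beta * xi)))
        .sum();
    score - lambda * beta
}

/// Run a fixed number of Newton iterations starting from beta = 0.
fn newton(x: &[f64], y: &[f64], lambda: f64, iters: usize) -> f64 {
    let mut beta = 0.0_f64;
    for _ in 0..iters {
        // Negative second derivative of the unpenalized log-likelihood.
        let info: f64 = x
            .iter()
            .map(|&xi| {
                let mu = sigmoid(beta * xi);
                xi * xi * mu * (1.0 - mu)
            })
            .sum();
        beta += penalized_score(x, y, beta, lambda) / (info + lambda);
    }
    beta
}

fn main() {
    // Perfectly separable data: y is 1 exactly when x is positive.
    let x = [-2.0, -1.0, 1.0, 2.0];
    let y = [0.0, 0.0, 1.0, 1.0];
    let unpenalized = newton(&x, &y, 0.0, 25);
    let penalized = newton(&x, &y, 0.1, 25);
    println!("after 25 iterations, lambda = 0.0: beta = {}", unpenalized);
    println!("after 25 iterations, lambda = 0.1: beta = {}", penalized);
}
```

In this sketch the unpenalized estimate is still increasing after 25 iterations, while the penalized one settles at a point where the penalized score is zero.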

If you encounter problems that persist even after these techniques are applied, please file an issue so the algorithm can be improved.

References

notes on generalized linear models
Generalized Linear Models and Extensions by Hardin & Hilbe