# ndarray-glm
Rust library for solving linear, logistic, and other generalized linear models
(GLMs), using the [`ndarray-linalg`](https://docs.rs/crate/ndarray-linalg/)
module.
[crates.io](https://crates.io/crates/ndarray-glm)
[documentation](https://docs.rs/ndarray-glm)

## Status
Numerical accuracy cannot be fully guaranteed, but the tests do include several
checks against R's `glm` and `glmnet` packages. Some edge cases may be
excluded and others may involve inherent ambiguities or imprecisions.
The regression algorithm uses iteratively reweighted least squares (IRLS),
with a line search applied when a proposed step does not increase the
likelihood.
Suggestions (via issues) and pull requests are welcome.
## Prerequisites
The recommended approach is to use a system BLAS implementation. For instance, to install
OpenBLAS on Debian/Ubuntu:
```
sudo apt update && sudo apt install -y libopenblas-dev
```
or on Arch:
```
sudo pacman -Syu blas-openblas
```
(or perhaps just `openblas`, which is a dependency of `blas-openblas`).
Regardless of the installation method, these libraries permit use of this crate
with the `openblas-system` feature.
To use an alternative backend or to build a static BLAS implementation, refer to the
`ndarray-linalg`
[documentation](https://github.com/rust-ndarray/ndarray-linalg#backend-features). Use
this crate with the appropriate feature flag and it will be forwarded to
`ndarray-linalg`.
## Example
To use in your crate, add the following under `[dependencies]` in your
`Cargo.toml`:
```toml
ndarray = { version = "0.17", features = ["blas"] }
ndarray-glm = { version = "0.1", features = ["openblas-system"] }
```
An example of linear regression is shown below. The library is generic over
the floating-point type (`f32` or `f64`).
``` rust
use ndarray_glm::{array, Linear, ModelBuilder};
// define some test data
let data_y = array![0.3, 1.3, 0.7];
let data_x = array![[0.1, 0.2], [-0.4, 0.1], [0.2, 0.4]];
let model = ModelBuilder::<Linear>::data(&data_y, &data_x).build()?;
let fit = model.fit()?;
// Or instead, to e.g. apply L2 (ridge) regularization:
let fit = model.fit_options().l2_reg(1e-5).fit()?;
// The result is a simple array of the MLE estimators, including the intercept
// term in the 0th index.
println!("Fit result: {}", fit.result);
```
By default, the X data is standardized (mean-subtracted and scaled by the
standard deviation) for the internal calculations, but the regression results
are transformed back to the original scale for the user. This reduces the risk
of scale-dependent numerical issues and puts all features on the same footing
with regard to any regularization. Standardization can be disabled with
`no_standardize()` on the `ModelBuilder`, but it is designed to be hands-off
for the user, so keeping it enabled is recommended in most cases.
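As an illustration of what this standardization does (a self-contained sketch, not the crate's internal code), each feature column is mean-subtracted and scaled by its sample standard deviation:

``` rust
// Sketch of column standardization: subtract the mean, divide by the
// (sample) standard deviation. The result has mean 0 and std dev 1.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let var =
        xs.iter().map(|&x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0);
    var.sqrt()
}

fn standardize(xs: &[f64]) -> Vec<f64> {
    let (m, s) = (mean(xs), std_dev(xs));
    xs.iter().map(|&x| (x - m) / s).collect()
}

fn main() {
    // One feature column from the example data above.
    let x = [0.1, -0.4, 0.2];
    let z = standardize(&x);
    println!("mean ~ {:.3}, sd ~ {:.3}", mean(&z), std_dev(&z));
}
```

A slope fitted on the standardized scale maps back to the original scale by dividing by that standard deviation, which is the back-transformation the library applies before reporting results.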
Some common non-canonical link functions are available (see e.g. `exp_link` and
`logistic_link`), and additional custom ones can be defined by the user.
The `link` module documentation covers how to select a provided link, how to
implement a custom link (canonical or non-canonical), and how to use the
`TestLink` trait to run built-in consistency tests against a custom
implementation.
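As a generic illustration of what a link pair looks like (independent of this crate's link traits), the canonical link for logistic regression is the logit, whose inverse is the sigmoid; one basic consistency check for any link/inverse pair is that it round-trips:

``` rust
// The canonical logit link maps the mean mu in (0, 1) to the linear
// predictor; its inverse (the sigmoid) maps back. A consistent pair
// satisfies inv(func(mu)) == mu.
fn logit(mu: f64) -> f64 {
    (mu / (1.0 - mu)).ln()
}

fn sigmoid(eta: f64) -> f64 {
    1.0 / (1.0 + (-eta).exp())
}

fn main() {
    for mu in [0.1, 0.5, 0.9] {
        let round_trip = sigmoid(logit(mu));
        println!("mu = {mu}, inv(func(mu)) = {round_trip:.6}");
    }
}
```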
## Features
- [X] Exponential family distributions
- [X] Linear
- [X] Logistic
- [X] Poisson
- [X] Binomial
- [X] Exponential
- [X] Gamma
- [X] Inverse Gaussian
- [X] Linear offsets
- [X] Generic over floating point type
- [X] Regularization
- [X] L2 (ridge)
- [X] L1 (lasso) via ADMM
- [X] Elastic Net (L1 + L2)
- [X] Automatic internal data standardization (can be disabled)
- [X] Weighted regressions (frequency and/or variance weights)
- [X] Non-canonical link functions
- [X] Fit statistics (see documentation on `Fit` struct)
- [X] Residuals
- [X] Response, Pearson, deviance
- [X] Standardized and Studentized variants
- [X] Working/partial
- [X] Quantile (with `stats` feature)
- [X] Coefficient covariance matrix
- [X] Dispersion (for families with free dispersion)
- [X] Leave-one-out (LOO) computations
- [X] One-step approximations
- [X] Exact re-fits
- [X] Residuals
- [X] Coefficients
- [X] Information criteria
- [X] Model significance
- [X] Wald test
- [X] Score test
- [X] Likelihood ratio test
- [X] P-values for model and covariates (with the `stats` feature)
- [X] ... and others
## Troubleshooting
Lasso/L1 regularization can converge slowly in some cases, particularly when
the data is poorly behaved, separable, etc.
The following tips are worth trying for convergence issues in general, but
they are more likely to be necessary in an L1-regularized problem.
* Standardize the feature data
* Use f32 instead of f64
* Increase the tolerance and/or the maximum number of iterations
* Include a small L2 regularization as well
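To see why a small L2 penalty helps (a self-contained sketch, not this crate's solver): with perfectly collinear or separable features the normal equations `X^T X b = X^T y` become singular, but adding a small `lambda` to the diagonal makes the system solvable.

``` rust
// Ridge-regularized normal equations, (X^T X + lambda I) b = X^T y,
// solved here for a 2x2 system via Cramer's rule.
fn ridge_2x2(xtx: [[f64; 2]; 2], xty: [f64; 2], lambda: f64) -> [f64; 2] {
    let a = [
        [xtx[0][0] + lambda, xtx[0][1]],
        [xtx[1][0], xtx[1][1] + lambda],
    ];
    let det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
    [
        (xty[0] * a[1][1] - a[0][1] * xty[1]) / det,
        (a[0][0] * xty[1] - xty[0] * a[1][0]) / det,
    ]
}

fn main() {
    // Two identical feature columns [1, 2, 3] with y = [1, 2, 3]:
    // X^T X = [[14, 14], [14, 14]] is singular without the penalty.
    let b = ridge_2x2([[14.0, 14.0], [14.0, 14.0]], [14.0, 14.0], 1e-3);
    // The penalty splits the effect evenly between the collinear features.
    println!("b = [{:.4}, {:.4}]", b[0], b[1]);
}
```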
If you encounter problems that persist even after these techniques are applied,
please file an issue so the algorithm can be improved.
## Rendering equations in docs
To render the docs from source with equations displayed properly, the KaTeX
header must be included explicitly:
```
RUSTDOCFLAGS="--html-in-header katex-header.html" cargo doc --no-deps --open
```
## References
* [notes on generalized linear models](https://felix-clark.github.io/glm-math)
* Generalized Linear Models and Extensions by Hardin & Hilbe