ndarray-glm 0.0.2

Performs regression for generalized linear models on data stored in arrays using IRLS.

ndarray-glm

Rust library for fitting linear, logistic, and other generalized linear models by iteratively reweighted least squares (IRLS), using the ndarray-linalg crate.
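To make the method concrete, here is a minimal sketch of IRLS for logistic regression with an intercept and a single predictor. It uses only the standard library and is not this crate's implementation: the names `sigmoid` and `irls_logistic`, the weight floor of 1e-12, and the stopping rule are all assumptions made for illustration. Each pass builds the working weights and working response from the current estimate, then solves the resulting 2x2 weighted least-squares problem in closed form.

```rust
/// Logistic function, the inverse of the canonical logit link.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Fit y ~ Bernoulli(sigmoid(b0 + b1 * x)) by IRLS and return (b0, b1).
/// Hypothetical helper for illustration; not part of ndarray-glm.
fn irls_logistic(x: &[f64], y: &[f64], max_iter: usize, tol: f64) -> (f64, f64) {
    let (mut b0, mut b1) = (0.0_f64, 0.0_f64);
    for _ in 0..max_iter {
        // Accumulate the weighted normal equations (X^T W X) b = X^T W z.
        let (mut s_w, mut s_wx, mut s_wxx) = (0.0, 0.0, 0.0);
        let (mut s_wz, mut s_wxz) = (0.0, 0.0);
        for (&xi, &yi) in x.iter().zip(y) {
            let eta = b0 + b1 * xi; // linear predictor
            let mu = sigmoid(eta); // predicted mean
            let w = (mu * (1.0 - mu)).max(1e-12); // working weight (variance)
            let z = eta + (yi - mu) / w; // working response
            s_w += w;
            s_wx += w * xi;
            s_wxx += w * xi * xi;
            s_wz += w * z;
            s_wxz += w * xi * z;
        }
        // Solve the 2x2 system by Cramer's rule.
        let det = s_w * s_wxx - s_wx * s_wx;
        let new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det;
        let new_b1 = (s_w * s_wxz - s_wx * s_wz) / det;
        let change = (new_b0 - b0).abs() + (new_b1 - b1).abs();
        b0 = new_b0;
        b1 = new_b1;
        // Stop once the parameters no longer move appreciably.
        if change < tol {
            break;
        }
    }
    (b0, b1)
}
```

Calling `irls_logistic(&x, &y, 50, 1e-10)` on data that are not perfectly separable should converge in a handful of iterations. On perfectly separable data the maximum-likelihood estimate does not exist and the coefficients grow without bound, which is one reason a small L2 penalty can help.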


Status

This package is in early alpha and the interface is likely to undergo many changes.

Prerequisites

A Fortran compiler and BLAS must be installed:

sudo apt update && sudo apt install gfortran libblas-dev

To use the OpenBLAS backend, also install libopenblas-dev and enable this crate's "openblas-src" feature.

Example

use ndarray::array;
use ndarray_glm::{linear::Linear, model::ModelBuilder, standardize::standardize};

// define some test data
let data_y = array![0.3, 1.3, 0.7];
let data_x = array![[0.1, 0.2], [-0.4, 0.1], [0.2, 0.4]];
// The design matrix can optionally be standardized, where the mean of each independent
// variable is subtracted and each is then divided by the standard deviation of that variable.
let data_x = standardize(data_x);
// The model is general over floating point type.
// If the second type argument is left as "_", it will be inferred if possible.
// L2 regularization can be applied with l2_reg().
let model = ModelBuilder::<Linear, f32>::new(&data_y, &data_x).l2_reg(1e-5).build()?;
let fit = model.fit()?;
println!("Fit result: {}", fit.result);
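The standardization step described in the comments above can be sketched without any dependencies. This is an illustration only: `standardize_columns` is a hypothetical helper operating on a row-major `Vec<Vec<f64>>`, and it assumes the population standard deviation (dividing by n). The crate's own `standardize` may differ in such details.

```rust
/// Standardize each column of a row-major matrix in place: subtract the
/// column mean, then divide by the column standard deviation.
/// Hypothetical helper for illustration; assumes the population standard
/// deviation (divide by n), which may not match the crate's `standardize`.
fn standardize_columns(rows: &mut [Vec<f64>]) {
    if rows.is_empty() {
        return;
    }
    let n = rows.len() as f64;
    let n_cols = rows[0].len();
    for j in 0..n_cols {
        let mean = rows.iter().map(|r| r[j]).sum::<f64>() / n;
        let var = rows.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / n;
        let sd = var.sqrt();
        for r in rows.iter_mut() {
            r[j] -= mean;
            // Leave constant columns centered but unscaled to avoid dividing by zero.
            if sd > 0.0 {
                r[j] /= sd;
            }
        }
    }
}
```

After this transformation every non-constant column has mean zero and unit variance, so an L2 penalty acts on all coefficients on a comparable scale.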

Features

  • Linear regression
  • Logistic regression
  • Generalized linear model IRLS
  • Linear offsets
  • Allow non-float domain types
  • L2 (ridge) Regularization
  • L1 (lasso) Regularization
  • Generic over floating point type
  • Poisson
  • Exponential
  • Gamma (which effectively reduces to exponential with an arbitrary dispersion parameter)
  • Inverse Gaussian
  • Other exponential family distributions
  • Option for data standardization/normalization
  • Weighted regressions
    • Weight the covariance matrix with point-by-point error bars
    • Allow for off-diagonal correlations between points
    • Fix likelihood functions
    • Check the tolerance conditions for termination
  • Non-canonical link functions
  • Goodness-of-fit tests
    • Log-likelihood difference from saturated model
    • Akaike and Bayesian information criteria
    • Generalized R^2 (?)
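For reference, the two information criteria named in the list above have standard textbook definitions, sketched below. The functions `aic` and `bic` are hypothetical and do not reflect this crate's interface. They take the maximized log-likelihood, the number of fitted parameters k, and (for BIC) the number of observations n.

```rust
/// Akaike information criterion: AIC = 2k - 2 ln(L).
/// Hypothetical helper for illustration; not part of ndarray-glm.
fn aic(log_likelihood: f64, n_params: usize) -> f64 {
    2.0 * n_params as f64 - 2.0 * log_likelihood
}

/// Bayesian information criterion: BIC = k ln(n) - 2 ln(L).
/// Hypothetical helper for illustration; not part of ndarray-glm.
fn bic(log_likelihood: f64, n_params: usize, n_obs: usize) -> f64 {
    (n_obs as f64).ln() * n_params as f64 - 2.0 * log_likelihood
}
```

Lower values indicate a better trade-off between fit and complexity. BIC penalizes each extra parameter more heavily than AIC once n exceeds e^2, roughly 7.4 observations.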

TODO

  • Generalize GLM interface to allow multi-parameter fits like a gamma distribution.
  • Exact Z-scores by re-minimizing after fixing each parameter to zero (?)
  • Unit tests for correct convergence with linear offsets
  • Calculate/estimate dispersion parameter from the data

References