# ngboost-rs

A Rust implementation of NGBoost (Natural Gradient Boosting for Probabilistic Prediction).

NGBoost is a modular boosting algorithm that allows you to obtain full probability distributions for your predictions, not just point estimates. This enables uncertainty quantification, prediction intervals, and probabilistic forecasting.
## Features

- **Probabilistic Predictions**: Get full probability distributions, not just point estimates
- **Multiple Distributions**: Support for Normal, Poisson, Gamma, Exponential, Laplace, Weibull, and more
- **Classification Support**: Bernoulli and multi-class Categorical distributions
- **Flexible Scoring Rules**: LogScore and CRPScore implementations
- **Natural Gradient Boosting**: Uses the natural gradient for efficient optimization on probability distribution manifolds
- **Generic Design**: Easily extensible with custom distributions and base learners
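To give a feel for what the natural gradient does, here is a standalone sketch for a Normal distribution parameterized as (μ, log σ) under the negative log-likelihood. In this parameterization the Fisher information is diag(1/σ², 2), so the natural gradient just rescales the ordinary gradient per coordinate. This is an illustration of the idea, not this crate's internal API:

```rust
// Ordinary gradient of the negative log-likelihood of Normal(mu, sigma)
// at observation y, parameterized as (mu, log sigma):
// NLL = log(sigma) + (y - mu)^2 / (2 sigma^2) + const.
fn grad_normal(y: f64, mu: f64, log_sigma: f64) -> (f64, f64) {
    let sigma = log_sigma.exp();
    let z = (y - mu) / sigma;
    (-z / sigma, 1.0 - z * z) // (d/d mu, d/d log sigma)
}

// The Fisher information in this parameterization is diag(1/sigma^2, 2),
// so the natural gradient F^{-1} * grad rescales each coordinate.
fn natural_grad_normal(y: f64, mu: f64, log_sigma: f64) -> (f64, f64) {
    let sigma = log_sigma.exp();
    let (g_mu, g_ls) = grad_normal(y, mu, log_sigma);
    (sigma * sigma * g_mu, g_ls / 2.0)
}

fn main() {
    // With y = 2, mu = 0, sigma = 1 the natural gradient is (-2.0, -1.5).
    let (n_mu, n_ls) = natural_grad_normal(2.0, 0.0, 0.0);
    println!("natural gradient: ({n_mu}, {n_ls})");
}
```

Note how the μ step becomes invariant to the current scale: the raw gradient shrinks as σ grows, but the natural gradient steps by the full residual.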
## Installation

> ⚠️ **Important**: No BLAS backend is enabled by default.

This library relies on a BLAS/LAPACK backend for matrix operations. To ensure cross-platform compatibility (e.g., macOS vs. Windows), no backend is selected by default. You must explicitly enable one of the features below in your `Cargo.toml`, otherwise the project will fail to link.

Add this to your `Cargo.toml`:

```toml
[dependencies]
# Replace "openblas" with "accelerate" (macOS) or "intel-mkl" (Windows/Linux) as needed
ngboost-rs = { version = "0.1", features = ["openblas"] }
ndarray = "0.15"
```

### Platform-Specific BLAS Backend

This library requires a BLAS/LAPACK backend. Choose one based on your platform:
#### macOS (Accelerate, recommended)

```toml
[dependencies]
ngboost-rs = { version = "0.1", features = ["accelerate"] }
```
#### Linux (OpenBLAS)

First install OpenBLAS:

```sh
# Ubuntu/Debian
sudo apt-get install libopenblas-dev
# Fedora
sudo dnf install openblas-devel
# Arch
sudo pacman -S openblas
```

Then in `Cargo.toml`:

```toml
[dependencies]
ngboost-rs = { version = "0.1", features = ["openblas"] }
```
#### Windows (OpenBLAS or Intel MKL)

On Windows, you can use Intel MKL or build OpenBLAS from source.

Using Intel MKL (easier on Windows):

```toml
[dependencies]
ngboost-rs = { version = "0.1", features = ["intel-mkl"] }
```

#### Intel MKL (any platform, high performance)

```toml
[dependencies]
ngboost-rs = { version = "0.1", features = ["intel-mkl"] }
```

> Note: Intel MKL requires the MKL libraries to be installed.
## Quick Start

### Regression Example

```rust
// Module paths below are illustrative; consult the crate docs for the exact layout.
use ndarray::{Array1, Array2};
use ngboost_rs::dist::Normal;
use ngboost_rs::learner::StumpLearner;
use ngboost_rs::NGBoost;
use ngboost_rs::score::LogScore;
```
### Classification Example

```rust
// Module paths and signatures are illustrative; see the crate docs.
use ngboost_rs::dist::{Bernoulli, ClassificationDistn};

// Binary classification
let mut model: NGBoost<Bernoulli, StumpLearner, LogScore> =
    NGBoost::new(Default::default());
model.fit(&x_train, &y_train).expect("training failed");

// Get class predictions
let predictions = model.predict(&x_test);

// Get class probabilities
let pred_dist = model.pred_dist(&x_test);
let probabilities = pred_dist.class_probs(); // Shape: (n_samples, n_classes)
```
## Available Distributions

### Regression Distributions

| Distribution | Parameters | Use Case |
|---|---|---|
| `Normal` | `loc`, `scale` | General continuous data |
| `NormalFixedVar` | `loc` | When variance is known/fixed |
| `NormalFixedMean` | `scale` | When mean is known/fixed |
| `LogNormal` | `loc`, `scale` | Positive, right-skewed data |
| `Exponential` | `scale` | Waiting times, survival |
| `Gamma` | `shape`, `rate` | Positive continuous data |
| `Poisson` | `rate` | Count data |
| `Laplace` | `loc`, `scale` | Heavy-tailed data |
| `Weibull` | `shape`, `scale` | Survival analysis |
| `HalfNormal` | `scale` | Positive data near zero |
| `StudentT` | `loc`, `scale`, `df` | Heavy tails, robust |
| `TFixedDf` | `loc`, `scale` | T with fixed df=3 |
| `Cauchy` | `loc`, `scale` | Very heavy tails |
### Classification Distributions

| Distribution | Parameters | Use Case |
|---|---|---|
| `Bernoulli` | 1 logit | Binary classification |
| `Categorical<K>` | K-1 logits | K-class classification |
| `Categorical3` | 2 logits | 3-class classification |
| `Categorical10` | 9 logits | 10-class (e.g., digits) |
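The K-1 logit counts above reflect a common parameterization in which one reference class is pinned at logit zero; class probabilities then come from a softmax over the K implied logits. A standalone sketch of that mapping (not this crate's internals):

```rust
// Map K-1 free logits (reference class fixed at logit 0) to K class
// probabilities via a numerically stable softmax.
fn class_probs(logits: &[f64]) -> Vec<f64> {
    let mut full: Vec<f64> = std::iter::once(0.0)
        .chain(logits.iter().copied())
        .collect();
    let max = full.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mut sum = 0.0;
    for v in full.iter_mut() {
        *v = (*v - max).exp(); // subtract max to avoid overflow
        sum += *v;
    }
    full.iter().map(|v| v / sum).collect()
}

fn main() {
    // Two zero logits give three equal class probabilities.
    println!("{:?}", class_probs(&[0.0, 0.0]));
}
```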
### Multivariate Distributions

| Distribution | Parameters | Use Case |
|---|---|---|
| `MultivariateNormal<P>` | P*(P+3)/2 | Multi-output regression |
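The P*(P+3)/2 count follows from P mean parameters plus the P*(P+1)/2 entries of a triangular factor of the covariance matrix:

```rust
// Parameter count for a P-dimensional multivariate Normal:
// P means + P*(P+1)/2 covariance-factor entries = P*(P+3)/2.
fn mvn_param_count(p: usize) -> usize {
    p * (p + 3) / 2
}

fn main() {
    println!("{}", mvn_param_count(3)); // 3 means + 6 covariance entries = 9
}
```

Note that P = 1 gives 2 parameters, consistent with the univariate `Normal` (`loc`, `scale`).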
## Scoring Rules

NGBoost supports different scoring rules for training:

- **LogScore**: Negative log-likelihood (default, most common)
- **CRPScore**: Continuous Ranked Probability Score (a proper scoring rule)

```rust
// The scoring rule is selected via a type parameter (signatures illustrative):
use ngboost_rs::score::{CRPScore, LogScore};

// Using LogScore (default)
let model: NGBoost<Normal, StumpLearner, LogScore> = NGBoost::new(Default::default());

// Using CRPScore
let model: NGBoost<Normal, StumpLearner, CRPScore> = NGBoost::new(Default::default());
```
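The log score itself is just the negative log-likelihood of the observed value under the predicted distribution. A standalone example for a Normal prediction (illustrative, independent of the crate's API):

```rust
use std::f64::consts::PI;

// LogScore for a Normal(mu, sigma) prediction at observation y:
// -log p(y) = 0.5 * log(2*pi*sigma^2) + (y - mu)^2 / (2*sigma^2)
fn log_score_normal(y: f64, mu: f64, sigma: f64) -> f64 {
    0.5 * (2.0 * PI * sigma * sigma).ln() + (y - mu).powi(2) / (2.0 * sigma * sigma)
}

fn main() {
    // A prediction centered on the observation scores better (lower):
    println!("{:.4}", log_score_normal(1.0, 1.0, 1.0));
    println!("{:.4}", log_score_normal(1.0, 0.0, 1.0));
}
```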
## Uncertainty Quantification

One of the key advantages of NGBoost is uncertainty estimation:

```rust
let pred_dist = model.pred_dist(&x_test);

// For a Normal distribution (field names illustrative):
for i in 0..n_samples {
    let (mean, std) = (pred_dist.loc[i], pred_dist.scale[i]);
    println!("prediction {i}: {mean:.3} ± {:.3}", 1.96 * std); // 95% interval
}
```
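Given a Normal predictive distribution, a central 95% prediction interval is simply mean ± 1.96·scale. A self-contained sketch of that computation:

```rust
// Central prediction interval for a Normal(mu, sigma) predictive
// distribution; z = 1.96 gives approximately 95% coverage.
fn prediction_interval(mu: f64, sigma: f64, z: f64) -> (f64, f64) {
    (mu - z * sigma, mu + z * sigma)
}

fn main() {
    let (lo, hi) = prediction_interval(10.0, 2.0, 1.96);
    println!("95% interval: [{lo}, {hi}]"); // [6.08, 13.92]
}
```

Other `z` values give other coverage levels (e.g., 1.645 for ~90%).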
## Examples

Run the examples (via `cargo run --release --example <name>`) to see NGBoost in action. They cover:

- Basic regression
- Binary classification
- Comparing different distributions
- Uncertainty quantification
## API Reference

### `NGBoost`

### `Distribution` Trait
## Performance Tips

- **Learning Rate**: Start with 0.1 and decrease if overfitting
- **Number of Estimators**: More is usually better, but watch for overfitting
- **Distribution Choice**: Match the distribution to your data characteristics
- **Natural Gradient**: Enabled by default; provides faster convergence
- **Release Mode**: Always use `cargo build --release` for production; it's significantly faster
## Building for Performance

For best performance, always compile in release mode:

```sh
cargo build --release
```

The release profile includes:

- Full optimizations (`opt-level = 3`)
- Link-time optimization (`lto = "thin"`)

Debug builds are intentionally slower but compile faster during development.
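These settings correspond to a release profile along these lines in `Cargo.toml` (a sketch; the crate's actual profile may include additional keys):

```toml
[profile.release]
opt-level = 3  # full optimizations
lto = "thin"   # link-time optimization
```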
## Comparison with Python NGBoost
This Rust implementation aims to be compatible with the Python NGBoost library:
| Feature | Python | Rust |
|---|---|---|
| Core Algorithm | Yes | Yes |
| Natural Gradient | Yes | Yes |
| LogScore | Yes | Yes |
| CRPScore | Yes | Yes (Normal, Laplace) |
| Regression Distributions | 16 | 16 |
| Classification | Yes | Yes |
| Survival/Censoring | Yes | Not yet |
| Scikit-learn Integration | Yes | N/A |
### Distribution Parity

All Python distributions have been ported:
- Normal, NormalFixedVar, NormalFixedMean
- LogNormal, Exponential, Gamma, Poisson
- Laplace, Weibull, HalfNormal
- StudentT, TFixedDf, TFixedDfFixedVar
- Cauchy, CauchyFixedVar
- Bernoulli, Categorical (k-class)
- MultivariateNormal
## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## References
- NGBoost: Natural Gradient Boosting for Probabilistic Prediction
- Stanford ML Group - NGBoost
- Original Python Implementation
## Acknowledgments
This is a Rust port of the excellent NGBoost Python library developed by the Stanford ML Group.