nanogbm
A small gradient boosting library, in pure Rust, with a deliberately narrow scope: GBDT only, binary classification only, CPU only, dense numerical features. No DART/GOSS/RF, no multiclass, no ranking, no regression, no sparse inputs, no GPU, no FFI bindings.
What you get in return is a few thousand lines of code you can read end to end and actually follow — useful both as a learning artifact and as a no-FFI dependency in a Rust service.
[dependencies]
nanogbm = { git = "https://github.com/oginiaux/nanogbm" }
use ;
let cfg = Config ;
let train = from_rows?;
let model = new.fit?;
let probs = model.predict_proba;
Why does this exist?
LightGBM and XGBoost are excellent and you should reach for them whenever you can. They're also large C++ codebases with non-trivial build systems, and to actually understand what they do, you eventually have to sit down with a histogram-based learner small enough to fit in your head. That's the primary purpose of this code.
The secondary purpose is practical: when you want to train a model from
inside a Rust service, a pure-Rust crate is a much smaller commitment than
linking C++ through an FFI shim. cargo build and that's it.
What's actually in the box
- GBDT. Trees built one at a time, each one fitting the gradient of the loss so far.
- Binary logistic loss. Only. The objective is hardcoded on purpose.
- Histogram learner with sibling-by-subtraction. After a split, only the smaller child's histograms are built from scratch; the larger sibling is recovered by subtracting from the parent. This is the load-bearing perf trick; `CLAUDE.md` has the details.
- Missing values handled at the split. Bucket 0 is reserved for NaN, and the learner picks per-node which side missing values go to, by gain.
- Early stopping that actually truncates the model to the best iteration, so the model you save is the one that won, not whatever the loop happened to land on when it gave up.
- Determinism. Same `Config` + same data → byte-identical model. All randomness flows through a single `ChaCha8Rng` seeded from `Config::seed`.
- Bincode v2 serialization with serde derives. Stable across runs; re-check after layout changes to `Tree`, `SplitNode`, `BinMapper`, or `Model`.
- A feature-encoding helper layer (`nanogbm::feature`). You write one `encode_into` function that pushes `num`, `bool`, `cat`, `cat_hashed`, or `multi_hot` values into a sink, and run it twice: once with `DiscoverySink` to derive a `Schema`, then with `SliceSink` per row on the hot path. Worth being precise here: the schema knows which columns are categorical (the feature-importance printer uses it), but the learner does not do native categorical splits. `cat(v)` writes `v as f64`, `cat_hashed` writes a hash bucket index as `f64`, and the trees then split those columns numerically like any other feature. If you need true subset splits, expand to one-hot via `multi_hot` and let the learner work on that.
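
To make the sibling-by-subtraction bullet concrete, here is a minimal sketch of the idea for a single feature. The `Bin` struct and both function names are hypothetical and chosen for illustration; they are not nanogbm's actual types, and the real learner may lay out its histograms differently.

```rust
/// Gradient sum, hessian sum, and row count for one histogram bucket.
/// (Hypothetical type for illustration, not nanogbm's own.)
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Bin {
    pub grad: f64,
    pub hess: f64,
    pub count: u32,
}

/// Build one feature's histogram by scanning the rows that belong to a node.
/// `bins[r]` is the bucket id of row `r` for this feature.
pub fn build_histogram(
    bins: &[u16],
    grad: &[f64],
    hess: &[f64],
    rows: &[usize],
    num_bins: usize,
) -> Vec<Bin> {
    let mut hist = vec![Bin::default(); num_bins];
    for &r in rows {
        let b = &mut hist[bins[r] as usize];
        b.grad += grad[r];
        b.hess += hess[r];
        b.count += 1;
    }
    hist
}

/// Recover the larger child's histogram as `parent - smaller`, bucket by bucket.
/// This costs O(num_bins) instead of a scan over the larger child's rows.
pub fn subtract_histogram(parent: &[Bin], smaller: &[Bin]) -> Vec<Bin> {
    parent
        .iter()
        .zip(smaller)
        .map(|(p, s)| Bin {
            grad: p.grad - s.grad,
            hess: p.hess - s.hess,
            count: p.count - s.count,
        })
        .collect()
}
```

After a split, you would call `build_histogram` only on the child with fewer rows and pass its result, together with the parent's histogram, to `subtract_histogram` to get the other child. The row scan is the expensive part, so always scanning the smaller side bounds the per-split work at half the parent's rows.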
What's not in the box
| Thing | Status |
|---|---|
| Multiclass / regression / rank | No |
| Native categorical splits | No — categoricals encode to numeric, see feature |
| Sparse input | No |
| DART / GOSS / RF mode | No |
| GPU | No |
| Multithreading | No (single-threaded today, not a principle) |
| Python / C / WASM bindings | No |
The single-thread limitation is a current fact, not a design principle:
TimingBuckets uses Cell specifically because nothing runs in parallel
yet. Parallelism may come later, but it would be a deliberate change.
Examples
cargo run --release --example basic
cargo run --release --example early_stopping
cargo run --release --example missing_and_importance
cargo run --release --example save_load
Always run in --release; debug builds of the training loop are orders of
magnitude slower and will skew any timing observation. Set
Config::verbose = true to get per-iteration validation scores and an
end-of-fit timing dump (hist_build, hist_subtract, split_search,
partition, gradients, score updates) — useful when you want to see where
the time actually went.
Tests
cargo test --release
cargo test --release --test e2e
The integration suite (tests/e2e.rs) protects three things and you
should care about all of them:
- Convergence on a synthetic problem: if it can't fit easy data, it can't fit hard data.
- Bincode round-trip: save, load, predict, identical results.
- Bin-path vs raw-path prediction consistency: predicting from raw `f64` and predicting from a pre-bucketed `Dataset` must produce bit-identical outputs. If you touch binning, splits, missing-direction logic, or serialization, run this.
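
The bin-path vs raw-path property can be illustrated with a small sketch. It assumes a mapper that stores sorted upper bounds and reserves bucket 0 for NaN, as described above. The `BinMapper` and `Split` shown here, and the two `goes_left_*` functions, are hypothetical stand-ins rather than the crate's real definitions.

```rust
/// Hypothetical bin mapper: sorted upper bounds, bucket 0 reserved for NaN.
pub struct BinMapper {
    pub upper_bounds: Vec<f64>,
}

impl BinMapper {
    pub fn new(mut upper_bounds: Vec<f64>) -> Self {
        upper_bounds.sort_by(f64::total_cmp);
        Self { upper_bounds }
    }

    /// NaN maps to bucket 0; a finite value maps to 1 + the number of
    /// upper bounds strictly below it.
    pub fn bin(&self, v: f64) -> u16 {
        if v.is_nan() {
            return 0;
        }
        let below = self.upper_bounds.partition_point(|&ub| ub < v);
        (below + 1) as u16
    }
}

/// Hypothetical split: buckets 1..=threshold_bin go left, and missing values
/// follow `missing_left`. `threshold_bin` must be in 1..=upper_bounds.len().
pub struct Split {
    pub threshold_bin: u16,
    pub missing_left: bool,
}

/// Decision on a pre-bucketed value.
pub fn goes_left_binned(split: &Split, bin: u16) -> bool {
    if bin == 0 {
        split.missing_left
    } else {
        bin <= split.threshold_bin
    }
}

/// Decision on a raw value, using the upper bound of the threshold bucket.
pub fn goes_left_raw(split: &Split, mapper: &BinMapper, v: f64) -> bool {
    if v.is_nan() {
        split.missing_left
    } else {
        v <= mapper.upper_bounds[(split.threshold_bin - 1) as usize]
    }
}
```

The two decision functions agree for every input because `bin(v) <= t` holds exactly when `v <= upper_bounds[t - 1]`. An off-by-one in either the mapper or the threshold lookup breaks that equivalence, which is the kind of regression a consistency test like this is meant to catch.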
A reading order, if you're here to learn
1. `boosting/gbdt.rs`: the outer loop. Build N trees, each one fitting the gradient of the loss the previous trees haven't explained.
2. `tree/learner.rs`: the inner loop. Grow one tree leaf-wise until you hit `num_leaves` or no leaf has a profitable split left.
3. `tree/histogram.rs` + `tree/split.rs`: the part that's actually fast. Per-feature gradient/hessian histograms, regularized gain formula, missing-direction selection.
4. `dataset/bin_mapper.rs`: how a column of `f64` becomes a column of `u16` bucket ids, and why bucket 0 is special.
5. `predict.rs`: walk the trees, sum, sigmoid. The whole inference path.
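
As a companion to the split-search step in the reading order, the sketch below scans one feature's histogram and picks both a threshold and a missing-value direction by gain. It assumes the common XGBoost/LightGBM-style L2-regularized score `G^2 / (H + lambda)`; nanogbm's exact formula and constraints (minimum leaf size, minimum gain, and so on) may differ. All names here are hypothetical.

```rust
/// Gradient and hessian sums for one bucket (hypothetical type).
#[derive(Clone, Copy, Debug, Default)]
pub struct Bin {
    pub grad: f64,
    pub hess: f64,
}

/// Result of the scan: buckets 1..=threshold_bin go left.
#[derive(Debug, PartialEq)]
pub struct BestSplit {
    pub threshold_bin: usize,
    pub missing_left: bool,
    pub gain: f64,
}

/// Assumed leaf score: G^2 / (H + lambda).
fn score(g: f64, h: f64, lambda: f64) -> f64 {
    g * g / (h + lambda)
}

/// Scan one feature's histogram. Bucket 0 holds the missing (NaN) rows.
/// For each threshold, try sending missing rows left and right, keep the best.
pub fn best_split(hist: &[Bin], lambda: f64) -> Option<BestSplit> {
    let (miss, rest) = hist.split_first()?;
    let total_g: f64 = hist.iter().map(|b| b.grad).sum();
    let total_h: f64 = hist.iter().map(|b| b.hess).sum();
    let parent = score(total_g, total_h, lambda);

    let mut best: Option<BestSplit> = None;
    let (mut gl, mut hl) = (0.0, 0.0);
    // Skip the last bucket as a threshold: nothing non-missing would go right.
    for (i, b) in rest.iter().enumerate().take(rest.len().saturating_sub(1)) {
        gl += b.grad;
        hl += b.hess;
        for missing_left in [true, false] {
            let (g_l, h_l) = if missing_left {
                (gl + miss.grad, hl + miss.hess)
            } else {
                (gl, hl)
            };
            let gain = score(g_l, h_l, lambda)
                + score(total_g - g_l, total_h - h_l, lambda)
                - parent;
            // NaN gains (possible when lambda is 0) fail this comparison and are skipped.
            if gain > 0.0 && best.as_ref().map_or(true, |s| gain > s.gain) {
                best = Some(BestSplit { threshold_bin: i + 1, missing_left, gain });
            }
        }
    }
    best
}
```

Missing rows end up on whichever side has gradients that look like theirs, because that grouping maximizes the summed score. When bucket 0 is empty, both directions tie and this sketch keeps the first one tried (`missing_left = true`).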
License
MIT.