nanogbm 0.1.0

A small, pure-Rust gradient boosting library (GBDT, binary classification, CPU only).

nanogbm

A small gradient boosting library, in pure Rust, with a deliberately narrow scope: GBDT only, binary classification only, CPU only, dense numerical features. No DART/GOSS/RF, no multiclass, no ranking, no regression, no sparse inputs, no GPU, no FFI bindings.

What you get in return is a few thousand lines of code you can read end to end and actually follow — useful both as a learning artifact and as a no-FFI dependency in a Rust service.

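# Cargo.toml
[dependencies]
nanogbm = { git = "https://github.com/oginiaux/nanogbm" }

// Rust usage. The `?` operator needs an enclosing function that returns a Result;
// `features`, `labels`, `n_rows`, and `n_features` are supplied by the caller.
use nanogbm::{Config, DatasetBuilder, GbdtTrainer};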

let cfg = Config { num_iterations: 100, learning_rate: 0.1, num_leaves: 31, ..Config::default() };
let train = DatasetBuilder::from_rows(&features, n_rows, n_features, &labels, &cfg)?;
let model = GbdtTrainer::new(&cfg).fit(&train, None)?;
let probs = model.predict_proba(&features, n_rows);

Why does this exist?

LightGBM and XGBoost are excellent and you should reach for them whenever you can. They're also large C++ codebases with non-trivial build systems, and to actually understand what they do, you eventually have to sit down with a histogram-based learner small enough to fit in your head. That's the primary purpose of this code.

The secondary purpose is practical: when you want to train a model from inside a Rust service, a pure-Rust crate is a much smaller commitment than linking C++ through an FFI shim. cargo build and that's it.

What's actually in the box

  • GBDT. Trees built one at a time, each one fitting the gradient of the loss so far.
  • Binary logistic loss. Only. The objective is hardcoded on purpose.
  • Histogram learner with sibling-by-subtraction. After a split, only the smaller child's histograms are built from scratch; the larger sibling is recovered by subtracting from the parent. This is the load-bearing perf trick — CLAUDE.md has the details.
  • Missing values handled at the split. Bucket 0 is reserved for NaN, and the learner picks per-node which side missing values go to, by gain.
  • Early stopping that actually truncates the model to the best iteration, so the model you save is the one that won — not whatever the loop happened to land on when it gave up.
  • Determinism. Same Config + same data → byte-identical model. All randomness flows through a single ChaCha8Rng seeded from Config::seed.
  • Bincode v2 serialization with serde derives. The encoding is stable across runs; re-check compatibility after any layout change to Tree, SplitNode, BinMapper, or Model.
  • A feature-encoding helper layer (nanogbm::feature). You write one encode_into function that pushes num, bool, cat, cat_hashed, or multi_hot values into a sink, and run it twice — once with DiscoverySink to derive a Schema, then with SliceSink per row on the hot path. Worth being precise here: the schema knows which columns are categorical (the feature-importance printer uses it), but the learner does not do native categorical splits. cat(v) writes v as f64, cat_hashed writes a hash bucket index as f64, and the trees then split those columns numerically like any other feature. If you need true subset splits, expand to one-hot via multi_hot and let the learner work on that.
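
The sibling-by-subtraction trick from the list above can be sketched in a few lines. This is a standalone illustration, not the crate's code: the `Bin` struct and both function names are made up for the example, and it handles a single feature.

```rust
/// Per-bucket sums of gradients and hessians, plus a row count.
/// (Illustrative type; not the crate's own histogram layout.)
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Bin {
    grad: f64,
    hess: f64,
    count: u32,
}

/// Build one feature's histogram by scanning the rows that belong to a node.
/// `bins[row]` is the bucket id of that row for this feature.
fn build_histogram(
    bins: &[u16],
    grad: &[f64],
    hess: &[f64],
    rows: &[usize],
    n_bins: usize,
) -> Vec<Bin> {
    let mut hist = vec![Bin::default(); n_bins];
    for &r in rows {
        let b = &mut hist[bins[r] as usize];
        b.grad += grad[r];
        b.hess += hess[r];
        b.count += 1;
    }
    hist
}

/// Recover the larger child's histogram as (parent - smaller child),
/// without scanning any of the larger child's rows.
fn subtract_histogram(parent: &[Bin], small: &[Bin]) -> Vec<Bin> {
    parent
        .iter()
        .zip(small)
        .map(|(p, s)| Bin {
            grad: p.grad - s.grad,
            hess: p.hess - s.hess,
            count: p.count - s.count,
        })
        .collect()
}
```

The saving is that `build_histogram` costs time proportional to the number of rows, while `subtract_histogram` costs time proportional to the number of buckets. With arbitrary floating-point gradients the subtracted histogram can differ from a directly built one in the last bits, since float addition is not associative.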

What's not in the box

Thing                           Status
------------------------------  --------------------------------------------------------
Multiclass / regression / rank  No
Native categorical splits       No; categoricals encode to numeric, see nanogbm::feature
Sparse input                    No
DART / GOSS / RF mode           No
GPU                             No
Multithreading                  No (single-threaded today, not a principle)
Python / C / WASM bindings      No

The single-thread limitation is a current fact, not a design principle: TimingBuckets uses Cell specifically because nothing runs in parallel yet. Parallelism may come later, but it would be a deliberate change.
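
To see why Cell fits a single-threaded design, here is a minimal sketch of a Cell-based timing accumulator. The `Timings` struct, its two fields, and the `timed` helper are hypothetical stand-ins; the crate's actual TimingBuckets may look different.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a timing accumulator (field names are illustrative).
#[derive(Default)]
struct Timings {
    hist_build: Cell<Duration>,
    split_search: Cell<Duration>,
}

/// Run `f`, add its wall-clock time to `bucket`, and return its result.
/// `Cell` lets this update happen through a shared reference,
/// with no lock and no atomic.
fn timed<T>(bucket: &Cell<Duration>, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    bucket.set(bucket.get() + start.elapsed());
    out
}
```

Because `Cell<T>` is not `Sync`, a struct like this cannot be shared between threads by reference. Going parallel would mean switching to atomics, per-thread buckets, or a lock, which is one reason it would be a deliberate change.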

Examples

cargo run --release --example basic
cargo run --release --example early_stopping
cargo run --release --example missing_and_importance
cargo run --release --example save_load

Always run in --release; debug builds of the training loop are orders of magnitude slower and will skew any timing observation. Set Config::verbose = true to get per-iteration validation scores and an end-of-fit timing dump (hist_build, hist_subtract, split_search, partition, gradients, score updates) — useful when you want to see where the time actually went.

Tests

cargo test --release
cargo test --release --test e2e

The integration suite (tests/e2e.rs) protects three things and you should care about all of them:

  1. Convergence on a synthetic problem — if it can't fit easy data, it can't fit hard data.
  2. Bincode round-trip — save, load, predict, identical results.
  3. Bin-path vs raw-path prediction consistency — predicting from raw f64 and predicting from a pre-bucketed Dataset must produce bit-identical outputs. If you touch binning, splits, missing-direction logic, or serialization, run this.
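
The third invariant is easier to picture with a toy model. The sketch below assumes a split is stored both as a bucket threshold and as the matching raw threshold (the upper bound of that bucket); every type and function name here is invented for illustration and is not the crate's API.

```rust
/// Toy bin mapper. Bucket 0 is reserved for NaN; bucket k >= 1 holds the
/// values v with upper[k-2] < v <= upper[k-1]. `upper` is sorted ascending
/// and its last entry is +infinity.
struct ToyBinMapper {
    upper: Vec<f64>,
}

impl ToyBinMapper {
    fn bin(&self, v: f64) -> u16 {
        if v.is_nan() {
            return 0;
        }
        // Index of the first upper bound that is >= v.
        let idx = self.upper.partition_point(|&u| u < v);
        (idx + 1) as u16
    }
}

/// A split kept in both forms: bucket threshold and raw threshold.
struct ToySplit {
    bin_threshold: u16,
    raw_threshold: f64,
    missing_left: bool,
}

/// Derive the raw threshold from the bucket's upper bound.
fn split_at(m: &ToyBinMapper, bin_threshold: u16, missing_left: bool) -> ToySplit {
    ToySplit {
        bin_threshold,
        raw_threshold: m.upper[bin_threshold as usize - 1],
        missing_left,
    }
}

/// Decision on a pre-bucketed value.
fn goes_left_binned(s: &ToySplit, bin: u16) -> bool {
    if bin == 0 { s.missing_left } else { bin <= s.bin_threshold }
}

/// Decision on a raw f64 value.
fn goes_left_raw(s: &ToySplit, v: f64) -> bool {
    if v.is_nan() { s.missing_left } else { v <= s.raw_threshold }
}
```

As long as the raw threshold is exactly the upper bound of the threshold bucket, and NaN follows the same stored direction on both paths, the two decisions agree for every input. An off-by-one in either comparison, or a changed boundary convention in binning, is the kind of bug this test is there to catch.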

A reading order, if you're here to learn

  1. boosting/gbdt.rs — the outer loop. Build N trees, each one fitting the gradient of the loss the previous trees haven't explained.
  2. tree/learner.rs — the inner loop. Grow one tree leaf-wise until you hit num_leaves or no leaf has a profitable split left.
  3. tree/histogram.rs + tree/split.rs — the part that's actually fast. Per-feature gradient/hessian histograms, regularized gain formula, missing-direction selection.
  4. dataset/bin_mapper.rs — how a column of f64 becomes a column of u16 bucket ids, and why bucket 0 is special.
  5. predict.rs — walk the trees, sum, sigmoid. The whole inference path.
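
Before opening tree/split.rs, it may help to see the shape of a regularized gain scan with missing-direction selection. The sketch below uses the common second-order form G²/(H + λ) from XGBoost-style learners; whether nanogbm uses exactly this formula, or adds terms such as a minimum gain, is an assumption. `Stats`, `split_gain`, and `best_split` are illustrative names.

```rust
/// Sum of gradients and sum of hessians over a group of rows.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Stats {
    g: f64,
    h: f64,
}

impl Stats {
    const ZERO: Stats = Stats { g: 0.0, h: 0.0 };
    fn add(self, o: Stats) -> Stats {
        Stats { g: self.g + o.g, h: self.h + o.h }
    }
    fn sub(self, o: Stats) -> Stats {
        Stats { g: self.g - o.g, h: self.h - o.h }
    }
}

/// Leaf objective term with L2 regularization `lambda`.
fn leaf_score(s: Stats, lambda: f64) -> f64 {
    s.g * s.g / (s.h + lambda)
}

/// Gain of splitting a node into `left` and `right`.
fn split_gain(left: Stats, right: Stats, lambda: f64) -> f64 {
    leaf_score(left, lambda) + leaf_score(right, lambda)
        - leaf_score(left.add(right), lambda)
}

/// Scan one feature's histogram. `hist[0]` is the NaN bucket and must exist;
/// candidate splits are "bucket <= t" for t in 1..hist.len()-1. Each candidate
/// is scored twice, with missing rows sent left and then right.
/// Returns (gain, threshold bucket, missing_goes_left) for the best positive gain.
fn best_split(hist: &[Stats], lambda: f64) -> Option<(f64, usize, bool)> {
    let missing = hist[0];
    let total = hist[1..].iter().fold(Stats::ZERO, |a, &b| a.add(b));
    let mut best: Option<(f64, usize, bool)> = None;
    let mut left = Stats::ZERO;
    for t in 1..hist.len().saturating_sub(1) {
        left = left.add(hist[t]);
        let right = total.sub(left);
        for &missing_left in &[true, false] {
            let (l, r) = if missing_left {
                (left.add(missing), right)
            } else {
                (left, right.add(missing))
            };
            let gain = split_gain(l, r, lambda);
            if gain > 0.0 && best.map_or(true, |(g, _, _)| gain > g) {
                best = Some((gain, t, missing_left));
            }
        }
    }
    best
}
```

The running `left` sum means each threshold costs constant work, so the scan is linear in the number of buckets. Trying both missing directions only doubles that constant.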

License

MIT.