TreeBoost

Universal Tabular Learning Engine. Linear models, GBDTs, and Random Forests—unified.
At a Glance
- Hybrid Linear+Tree learner that extrapolates trends while capturing interactions
- AutoTuner and AutoML mode selection with conformal prediction built in
- GPU acceleration (WebGPU, CUDA) plus AVX-512/SVE2 CPU backends
- Zero-copy serialization and incremental TRB updates for production pipelines
- Rust crate, CLI, and optional PyO3 bindings in one codebase
Quick Install
cargo add treeboost
# Optional Python bindings (requires Rust toolchain + maturin)
pip install treeboost   # PyPI package name assumed to match the crate
See Installation for feature flags and build notes.
Project Links
- Docs: https://docs.rs/treeboost
- Crate: https://crates.io/crates/treeboost
- GitHub: https://github.com/ml-rust/treeboost
TreeBoost combines the extrapolation power of linear models, the interaction-capturing ability of gradient boosted trees, and the robustness of random forests—all in a single, zero-copy, production-ready Rust binary. GPU-accelerated out of the box.
Why TreeBoost?
Most tabular problems are solved by a linear model, a tree ensemble, or a combination of the two. Other libraries make you pick one. TreeBoost gives you all three through a single UniversalModel interface, plus automatic mode selection via the AutoTuner.
The Architecture:
┌───────────────────────────────────────────────────────────────┐
│                        UniversalModel                          │
├────────────────┬──────────────────────┬───────────────────────┤
│    PureTree    │    LinearThenTree    │     RandomForest      │
│     (GBDT)     │       (Hybrid)       │       (Bagging)       │
│                │                      │                       │
│ Best for:      │ Best for:            │ Best for:             │
│ - General      │ - Time-series        │ - Noisy data          │
│ - Categoricals │ - Trending data      │ - Variance reduction  │
│                │ - Extrapolation      │ - Avoiding overfit    │
└────────────────┴──────────────────────┴───────────────────────┘
Why Rust?
- Zero-copy, type-safe data handling
- Deploy without Python runtime
- Memory safety guarantees
- Single binary, no dependencies
What You Get:
- AutoML mode selection — instant data analysis picks PureTree, LinearThenTree, or RandomForest without expensive training trials.
- Hybrid Linear+Tree architecture — LinearThenTree mode captures global trends with linear models, then trees learn the residuals. Extrapolates beyond the training range.
- Built-in preprocessing pipeline — Scalers, encoders, imputers that serialize with the model. No train/test skew.
- Linear Trees — Decision trees with Ridge regression in leaves. 10-100x fewer trees for piecewise linear data.
- Automatic hyperparameter tuning — AutoTuner with Latin Hypercube Sampling, k-fold CV, parallel evaluation. Tries all three modes automatically.
- GPU acceleration — WGPU (all GPUs), CUDA (NVIDIA), with AVX-512/SVE2/scalar fallback
- Production features — conformal prediction intervals, entropy regularization, ordered target encoding, zero-copy serialization
Automatic Hyperparameter Optimization
TreeBoost includes a production-ready AutoTuner that finds optimal hyperparameters automatically, eliminating manual tuning; see the Automatic Hyperparameter Tuning section below.
See examples/autotuner.rs for comprehensive examples.
AutoML Mode Selection
TreeBoost can analyze your dataset and pick the best boosting mode without a full training sweep.
use treeboost::auto; // API sketch; exact path and signatures may differ

let model = auto(&dataset)?; // fast linear/tree probes, no full training sweep
println!("Selected mode: {:?}", model.selected_mode()); // accessor names illustrative
println!("{}", model.selection_report());
This analysis uses fast linear/tree probes and produces a full report you can log or inspect.
Multi-Seed Ensemble Training
Combine predictions from multiple models trained with different random seeds:
use treeboost::{BoostingConfig, BoostingMode, StackingStrategy, train}; // API sketch; names may differ
use treeboost::MseLoss;

// Train with 5 ensemble members, Ridge stacking
let config = BoostingConfig::new()
    .with_mode(BoostingMode::PureTree)
    .with_ensemble_seeds(5)                           // 5 members, different random seeds
    .with_stacking_strategy(StackingStrategy::Ridge);

let model = train(&dataset, &config, MseLoss)?;
let predictions = model.predict(&dataset);
Stacking strategies (a conceptual sketch follows the examples below):
- Ridge: Learns optimal weights via Ridge regression on out-of-fold predictions. Recommended for diverse ensembles.
- Average: Simple equal-weight averaging. Fast and effective for homogeneous ensembles.
// Simple averaging
let config = BoostingConfig::new()
    .with_mode(BoostingMode::PureTree)
    .with_ensemble_seeds(5)
    .with_stacking_strategy(StackingStrategy::Average);

let model = train(&dataset, &config, MseLoss)?;
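For intuition, here is a conceptual sketch of the two strategies above. This is illustrative logic only, not the TreeBoost API; the ridge weights would come from fitting on out-of-fold member predictions as described.
// Conceptual sketch: combining per-member prediction vectors row by row.
fn average_stack(member_preds: &[Vec<f64>]) -> Vec<f64> {
    let n = member_preds[0].len();
    (0..n)
        .map(|i| member_preds.iter().map(|p| p[i]).sum::<f64>() / member_preds.len() as f64)
        .collect()
}

// Ridge stacking applies weights learned by ridge regression of the target
// on out-of-fold member predictions.
fn ridge_stack(member_preds: &[Vec<f64>], weights: &[f64]) -> Vec<f64> {
    let n = member_preds[0].len();
    (0..n)
        .map(|i| member_preds.iter().zip(weights).map(|(p, w)| p[i] * w).sum::<f64>())
        .collect()
}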
Quick Start
Rust (Native)
use treeboost::{BoostingConfig, BoostingMode, train}; // API sketch; exact paths and names may differ
use treeboost::DatasetLoader;
use treeboost::MseLoss;

let loader = DatasetLoader::new();
let dataset = loader.load_parquet("train.parquet")?;

// Choose your mode based on your data
let config = BoostingConfig::new()            // config type name illustrative
    .with_mode(BoostingMode::LinearThenTree)  // Hybrid: linear trend + tree residuals
    .with_num_rounds(100)
    .with_linear_rounds(10)
    .with_learning_rate(0.1);

let model = train(&dataset, &config, MseLoss)?;
let predictions = model.predict(&dataset);
Quick mode selection:
| Your Data | Use This Mode |
|---|---|
| General tabular, categoricals | BoostingMode::PureTree |
| Time-series, trending, needs extrapolation | BoostingMode::LinearThenTree |
| Noisy data, want robustness | BoostingMode::RandomForest |
Python (via PyO3)
# API sketch; exact binding names may differ
import treeboost

loader = treeboost.DatasetLoader()
dataset = loader.load_parquet("train.parquet")

config = treeboost.BoostingConfig()
config.mode = "linear_then_tree"  # Hybrid mode
config.num_rounds = 100
config.linear_rounds = 10
config.learning_rate = 0.1

model = treeboost.train(dataset, config)
predictions = model.predict(dataset)
Architecture note:
UniversalModel wraps GBDTModel internally; PureTree mode delegates directly to it. You get GPU acceleration, conformal prediction, and all mature features through either API. GBDTModel is still available for direct use if you prefer.
How It Works: Automatic Backend Selection
flowchart TD
A{GPU Available?} -->|YES| B[WGPU Tensor-Tile<br/>Vulkan/Metal/DX12]
A -->|NO| C{CPU Architecture}
C -->|x86-64| D{AVX-512?}
C -->|ARM| E{SVE2?}
D -->|YES| F[AVX-512 Tensor-Tile<br/>vpconflictd parallel]
D -->|NO| G[Scalar Backend<br/>AVX2 loads]
E -->|YES| H[SVE2 Tensor-Tile<br/>HISTCNT direct]
E -->|NO| I[Scalar Backend<br/>NEON loads]
WebGPU backend: Works on all GPUs (NVIDIA, AMD, Intel, Apple) via Vulkan, Metal, or DX12. Designed for portability - no installation required beyond your system drivers. Uses Hybrid mode (GPU histogram + CPU tree growth) due to WebGPU's higher dispatch overhead.
CUDA backend: Enables Full GPU mode with custom kernels - 2x+ faster than WebGPU on NVIDIA hardware. Low dispatch latency allows the entire tree building pipeline to run on GPU (histogram, partition, level-wise growth). The speedup grows with larger datasets. Optional but recommended for NVIDIA users.
Coming soon: Native Metal and ROCm backends for Apple and AMD GPUs.
CPU backends: AVX-512 (3rd Gen Xeon+), SVE2 (ARM Neoverse), with optimized scalar fallback.
Explicit Backend Selection
By default, TreeBoost auto-detects the best backend. Specify backends explicitly to override:
Rust:
use treeboost::{BackendType, BoostingConfig, train}; // API sketch; exact paths and names may differ
use treeboost::MseLoss;

let config = BoostingConfig::new()
    .with_num_rounds(100)
    .with_max_depth(6)
    .with_backend(BackendType::Scalar); // Force CPU (AVX2/NEON)

let model = train(&dataset, &config, MseLoss)?;
Available backends:
Scalar // CPU: AVX2 (x86) or NEON (ARM) - no GPU overhead
Avx512 // CPU: AVX-512 tensor-tile (x86-64 only)
Sve2 // CPU: SVE2 tensor-tile (ARM only)
Wgpu // GPU: All GPUs via Vulkan/Metal/DX12 (portable)
Cuda // GPU: NVIDIA CUDA (2x+ faster than WGPU)
Auto // (Default) Auto-detect: CUDA > WGPU > AVX-512 > SVE2 > Scalar
Python:
# API sketch; exact binding names may differ
config = treeboost.BoostingConfig()
config.num_rounds = 100
config.max_depth = 6
config.backend = "scalar"  # Force CPU
model = treeboost.train(dataset, config)
Performance
Competitive Benchmarks
Inference: Optimized for CPU execution via Rayon parallelism. Fast inference on standard compute eliminates GPU deployment overhead—no need for expensive GPU VMs just to serve predictions.
Training: Automatic backend selection balances speed and cost. CPU training is already fast for datasets <100K rows; GPU acceleration (CUDA/WGPU) provides significant speedup for larger datasets (100K–1B+ rows) where the computational advantage justifies GPU deployment.
Compared to other pure-Rust GBDT implementations:
Inference (per-batch prediction):
| Dataset | TreeBoost | gbdt-rs | forust | Speedup |
|---|---|---|---|---|
| 100 samples | 47.4 µs | 135.5 µs | 92.9 µs | 2.9x vs gbdt-rs |
| 1K samples | 202 µs | 1.29 ms | 893 µs | 6.4x vs gbdt-rs |
| 10K samples | 539 µs | 11.7 ms | 8.9 ms | 21.7x vs gbdt-rs |
Training:
| Dataset | TreeBoost | gbdt-rs | forust | Speedup |
|---|---|---|---|---|
| 100K rows, 50 rounds | 263 ms | 3,389 ms | 581 ms | 12.9x vs gbdt-rs |
| 100K rows, 100 rounds (parallel) | 344 ms | 6,600 ms | 2,020 ms | 19.2x vs gbdt-rs |
Benchmarks: NVIDIA CUDA (Full GPU mode), raw float32 data, per-iteration time. See benches/competitors.rs for reproducible methodology.
Running Benchmarks:
# CPU-only comparison (fast, ~2 minutes)
cargo bench --bench competitors
# GPU-enabled comparison (with CUDA acceleration)
cargo bench --bench competitors --features cuda
# Python cross-library comparison
Core Features
Robustness
- Shannon Entropy regularization — Prevent drift across time windows
- Pseudo-Huber loss — Automatic outlier handling, smoother than MSE (see the sketch after this list)
- Split Conformal Prediction — Distribution-free uncertainty intervals on predictions
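For reference, a minimal sketch of the standard pseudo-Huber loss; delta is the transition scale, and the corresponding parameter name inside TreeBoost is not shown in this README:
// Standard pseudo-Huber: quadratic near zero (like MSE), linear in the tails,
// so large residuals (outliers) contribute bounded gradients.
fn pseudo_huber(residual: f64, delta: f64) -> f64 {
    delta * delta * ((1.0 + (residual / delta).powi(2)).sqrt() - 1.0)
}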
Data Handling
- Ordered Target Encoding — High-cardinality categoricals without target leakage
- Count-Min Sketch — Automatic rare category compression (memory efficient)
Model Control
- Monotonic/Interaction constraints — Enforce domain knowledge
- Feature importance — Understand model decisions
Production
- Zero-copy serialization — 100MB+ models load in milliseconds via rkyv
- Streaming inference — Predict on 1M rows in seconds
Incremental Learning
- TRB format — Custom journaled file format for incremental model updates
- Warm-start training — Add trees to existing models without full retraining
- O(1) appending — Updates append to file, no rewrite required
- Crash recovery — CRC32 checksums detect corruption, partial writes recovered
- Drift detection — Monitor distribution shifts between training batches
The Hybrid Architecture
How LinearThenTree Works
The LinearThenTree mode implements what's sometimes called "Residual Boosting" or "Linear-Forest":
Final Prediction = Linear(x) + Trees(x)
                      ↑            ↑
                      │            └── Captures non-linear patterns, interactions
                      └── Captures global trend (can extrapolate!)
- Phase 1: Train a Ridge/LASSO/ElasticNet model on all features
- Phase 2: Compute residuals: r = y - linear_prediction
- Phase 3: Train GBDT on residuals (the stuff linear couldn't explain)
This is powerful for data with underlying trends (time-series, pricing, growth curves). Pure trees can't extrapolate—they're bounded by training data. The linear component can.
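The decomposition can be sketched in a few lines. This is conceptual pseudologic, not the library API:
// Trees(x) is bounded by the leaf values learned during training, so it saturates
// outside the training range; the linear term keeps extrapolating the trend.
fn hybrid_predict(x: f64, slope: f64, intercept: f64, tree_residual: impl Fn(f64) -> f64) -> f64 {
    let trend = slope * x + intercept;  // Phase 1: global linear fit
    let correction = tree_residual(x);  // Phases 2-3: GBDT trained on r = y - trend
    trend + correction
}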
LinearTreeBooster (Different Thing!)
Don't confuse LinearThenTree mode with LinearTreeBooster. They solve different problems:
| | LinearThenTree (Mode) | LinearTreeBooster (Learner) |
|---|---|---|
| Structure | 1 global linear + many standard trees | Trees with linear models in each leaf |
| Best for | Global trends + local non-linearities | Piecewise-linear data (tax brackets, physics) |
| Trees needed | Normal (50-200) | Very few (5-20) |
Use LinearTreeBooster when your data looks like segments with different slopes—the tree finds the breakpoints, Ridge fits each segment.
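Conceptually, a linear tree routes each row to a leaf and applies that leaf's Ridge model. The sketch below is illustrative only, not the LinearTreeBooster API:
// Each leaf stores a small Ridge model instead of a constant, so a shallow tree whose
// splits land on the breakpoints can represent piecewise-linear data exactly.
struct LinearLeaf {
    weights: Vec<f64>,
    bias: f64,
}

fn predict_linear_tree(route_to_leaf: impl Fn(&[f64]) -> LinearLeaf, x: &[f64]) -> f64 {
    let leaf = route_to_leaf(x); // tree splits pick the segment (e.g. the tax bracket)
    leaf.bias + leaf.weights.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>()
}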
Preprocessing That Travels With Your Model
TreeBoost's preprocessing pipeline serializes with your model:
use treeboost::preprocessing::PipelineBuilder; // API sketch; exact paths and names may differ

let mut pipeline = PipelineBuilder::new()
    .add_standard_scaler()
    .add_simple_imputer()
    .add_frequency_encoder()
    .build();

// Fit on training data
pipeline.fit(&train_data)?;

// Transform both train and test identically
let train_transformed = pipeline.transform(&train_data)?;
let test_transformed = pipeline.transform(&test_data)?;

// Pipeline state saved with model - no train/test skew at inference
For Trees: Use FrequencyEncoder or LabelEncoder. OneHot creates sparse nightmares.
For Linear models: Use StandardScaler (essential!) and OneHotEncoder (linear needs binary indicators).
For Hybrid (LinearThenTree): The linear component gets internally standardized. You can still preprocess for the tree component.
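As a sketch of these recommendations, two separate pipelines could look like the following; builder methods beyond the ones shown above (such as add_one_hot_encoder) are assumptions:
// Trees: compact label/frequency encodings, no scaling required.
let tree_pipeline = PipelineBuilder::new()   // type name illustrative (as above)
    .add_simple_imputer()
    .add_frequency_encoder()
    .build();

// Linear models: scaling is essential; categoricals become binary indicators.
let linear_pipeline = PipelineBuilder::new()
    .add_simple_imputer()
    .add_standard_scaler()
    .add_one_hot_encoder()                   // assumed method name
    .build();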
Incremental Learning
TreeBoost supports incremental model updates via the TRB (TreeBoost) file format—a custom journaled format optimized for appending without rewriting the base model.
Why Incremental Learning?
- Avoid full retraining — Add trees to existing models with new data
- Real-time adaptation — Update models daily/hourly as data arrives
- Lower compute costs — Train on new data only, not entire history
Rust:
use treeboost::{AutoModel, UniversalModel}; // API sketch; exact paths and signatures may differ
use treeboost::DatasetLoader;
use treeboost::MseLoss;

// 1. Initial training via AutoModel (convenience wrapper)
let auto = AutoModel::train(&dataset, MseLoss)?;

// 2. Save UniversalModel to TRB format
auto.inner.save_trb("model.trb")?;

// 3. Later: Load and update with new data (uses UniversalModel directly)
let mut model = UniversalModel::load_trb("model.trb")?;
let loader = DatasetLoader::new();
let new_dataset = loader.load_parquet("new_data.parquet")?;
let report = model.update(&new_dataset, 10)?; // Add 10 trees
println!("{report:?}");

// 4. Append update to same file (O(1) append, no rewrite)
model.save_trb_update("model.trb")?;

// 5. Inference: Load and predict with BinnedDataset
let model = UniversalModel::load_trb("model.trb")?;
let predictions = model.predict(&new_dataset);
Note: The TRB format stores UniversalModel only. Use AutoModel for initial training convenience, then work with UniversalModel + BinnedDataset for incremental updates and inference.
The TRB Format:
┌────────────────────────────────────────────────────────┐
│ Header (magic, version, model type, created_at, ...)   │
├────────────────────────────────────────────────────────┤
│ Base Model Blob + CRC32                                │
├────────────────────────────────────────────────────────┤
│ Update 1: Header + Blob + CRC32 (appended)             │
├────────────────────────────────────────────────────────┤
│ Update 2: Header + Blob + CRC32 (appended)             │
└────────────────────────────────────────────────────────┘
- Journaled appends — Updates append to file end, base model untouched
- CRC32 per segment — Detect corruption at segment level
- Crash recovery — Truncated writes detected and skipped on load
- Forward compatible — Unknown JSON fields in headers ignored
Drift Detection:
Monitor distribution shifts between training batches:
use treeboost::DriftDetector; // type name illustrative

// Create detector from training data
let detector = DriftDetector::from_dataset(&train_dataset);

// Before updating, check for drift
let result = detector.check_update(&new_dataset);
if result.has_significant_drift {
    println!("Significant drift detected; review before appending this update");
}
Installation
Rust Library
cargo add treeboost
Python Package
# From PyPI (package name assumed to match the crate)
pip install treeboost
# From source (requires Rust toolchain; exact steps illustrative)
git clone https://github.com/ml-rust/treeboost && cd treeboost
maturin develop --release --features python
Feature Flags
| Feature | Description | Use Case |
|---|---|---|
| gpu | WGPU backend (Vulkan/Metal/DX12) | All GPUs, portable |
| cuda | NVIDIA CUDA backend | 2x+ faster than WGPU on NVIDIA |
| mmap | Memory-mapped TRB loading | Instant model load, zero-copy I/O |
| python | PyO3 bindings | Python interop |
Enable features:
# GPU acceleration
cargo add treeboost --features gpu
# CUDA (NVIDIA only, requires CUDA 12.x)
cargo add treeboost --features cuda
# Memory-mapped model loading (instant load for large models)
cargo add treeboost --features mmap
Memory-mapped loading (mmap feature):
For large models (100MB+), mmap provides true zero-copy I/O (a usage sketch follows the table):
| Reader | Load Time | Memory | Use Case |
|---|---|---|---|
| TrbReader | O(model_size) | O(model_size) | Default, works everywhere |
| MmapTrbReader | O(1) | O(1) initial | Large models, inference servers |
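A usage sketch, assuming the mmap feature is enabled. Only the reader type names come from the table above; the constructor and method names (open, load) are assumptions:
use treeboost::{MmapTrbReader, TrbReader}; // paths illustrative

// Default reader: deserializes the whole model (O(model_size) load time and memory)
let model = TrbReader::open("model.trb")?.load()?;

// Memory-mapped reader: maps the file and pages data in on demand (O(1) load)
let mmap_model = MmapTrbReader::open("model.trb")?.load()?;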
More Examples
Rust: Train, Save Config, and Save Model
use treeboost::{AutoModel, MseLoss}; // API sketch; exact paths and signatures may differ

// Train with AutoML (discovers best mode and hyperparameters)
let auto = AutoModel::train(&dataset, MseLoss)?;

// Save the discovered configuration to JSON (useful for inspection and reuse)
auto.save_config("config.json")?;

// Save the trained model for inference
auto.save("model.rkyv")?;

// Later: Load and predict (no need to retrain)
let loaded = AutoModel::load("model.rkyv")?;
let predictions = loaded.predict(&dataset);
let importances = loaded.feature_importance();
Export config to inspect discovered hyperparameters:
// After training with AutoML
let auto = AutoModel::train(&dataset, MseLoss)?;

// Export to JSON
let config_json = serde_json::to_string_pretty(&auto.config())?; // config accessor name illustrative
std::fs::write("discovered_config.json", config_json)?;

// Inspect the JSON to see what mode was chosen,
// learning rates, ensemble seeds, etc.
// Then manually adjust and retrain if needed
Python: Conformal Prediction
# API sketch; exact binding names may differ
import numpy as np
import treeboost

X = np.random.rand(1000, 10)
y = X[:, 0] + np.random.randn(1000) * 0.5

config = treeboost.BoostingConfig()
config.num_rounds = 100
config.max_depth = 6
config.calibration_ratio = 0.2   # Reserve 20% for uncertainty estimation
config.conformal_quantile = 0.9  # 90% prediction intervals

model = treeboost.train(X, y, config)
preds, lower, upper = model.predict_with_intervals(X)
# Now you have uncertainty bounds on every prediction
Python: Categorical Features
# API sketch; exact binding names may differ
import treeboost

dataset = treeboost.DatasetLoader().load_parquet("train.parquet")

# Target encoding for high-cardinality categorical
config = treeboost.BoostingConfig()
config.num_rounds = 100
config.use_target_encoding = True  # Ordered encoding, no leakage
config.cms_threshold = 100         # Rare categories → "Unknown"

model = treeboost.train(dataset, config)
predictions = model.predict(dataset)
Automatic Hyperparameter Tuning
Rust:
use treeboost::{AutoTuner, TunerConfig, GridStrategy, EvalStrategy, SearchSpace, MseLoss, train_binned}; // API sketch; names may differ

let tuner_config = TunerConfig::new()
    .with_iterations(50)
    .with_grid_strategy(GridStrategy::LatinHypercube)
    .with_eval_strategy(EvalStrategy::KFold(5)) // 5-fold CV
    .with_verbose(true);

let mut tuner = AutoTuner::new()
    .with_config(tuner_config)
    .with_space(SearchSpace::default())
    .with_callback(|trial| println!("{trial:?}"));

let result = tuner.tune(&dataset)?;
println!("Best score: {}", result.best_score);

// Train final model with best configuration
let final_model = train_binned(&dataset, &result.best_config, MseLoss)?;
Python:
# API sketch; exact binding names may differ
dataset = treeboost.DatasetLoader().load_parquet("train.parquet")
tuner = treeboost.AutoTuner(iterations=50, folds=5)
best_config, best_score = tuner.tune(dataset)

# Train final model
model = treeboost.train(dataset, best_config)
CLI Tool
If you're using the binary distribution:
# Train a model (rkyv format for static models)
# Make predictions
# Inspect the model
# Incremental updates (TRB format)
Incremental Learning via CLI:
# Inspect a TRB file (shows update history)
# Output:
# Format version: 1
# Created: 2024-01-15 10:30:00 UTC
# Update History:
# Update 1: 2024-02-01 09:00:00 UTC (500 rows, "February data")
# Update 2: 2024-03-01 09:00:00 UTC (450 rows, "March data")
# Current tree count: 120
# Update with new data
# Force load despite corrupted updates (loads base only)
Run treeboost <command> --help for all available options.
Configuration Reference
Core Hyperparameters
| Parameter | Default | Description |
|---|---|---|
| num_rounds | 100 | Number of boosting iterations |
| max_depth | 6 | Maximum tree depth (deeper = more expressive but slower) |
| learning_rate | 0.1 | Shrinkage per round (lower = more stable but slower training) |
| max_leaves | 31 | Maximum leaves per tree |
| lambda | 1.0 | L2 leaf regularization |
| loss | mse | mse or huber (huber for outliers) |
Advanced Features
| Parameter | Default | Description |
|---|---|---|
| entropy_weight | 0.0 | Shannon entropy penalty (prevents drift) |
| subsample | 1.0 | Row sampling ratio per round |
| colsample | 1.0 | Feature sampling ratio per tree |
| calibration_ratio | 0.0 | Fraction of data reserved for conformal calibration |
| conformal_quantile | 0.9 | Quantile for prediction intervals (0.9 = 90% coverage) |
| use_target_encoding | false | Enable ordered target encoding for categoricals |
| cms_threshold | 0 | Rare category threshold (0 = disabled) |
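A sketch of setting these from Rust, assuming the builder follows the same with_* naming as the documented calls (with_num_rounds, with_learning_rate, and so on); the config type name is illustrative, as in the Quick Start:
let config = BoostingConfig::new()    // config type name illustrative
    .with_num_rounds(200)
    .with_learning_rate(0.05)
    .with_subsample(0.8)              // sample 80% of rows per round
    .with_colsample(0.8)              // sample 80% of features per tree
    .with_entropy_weight(0.1)         // Shannon entropy penalty against drift
    .with_calibration_ratio(0.2);     // reserve 20% of data for conformal calibration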
Constraints
# API sketch; parameter names illustrative
config.monotone_constraints = [1, -1, 0]        # +1 increasing, -1 decreasing, 0 unconstrained
config.interaction_constraints = [[0, 1], [2]]  # groups of features allowed to interact
Troubleshooting
Check which backend is being used by enabling debug logging when you run your binary:
RUST_LOG=treeboost=debug ./your_app
GPU not detected:
- Verify your GPU drivers are installed (NVIDIA, AMD, Intel, or Apple)
- WGPU supports Vulkan (Linux), Metal (macOS), DX12 (Windows)
- For NVIDIA CUDA: Install CUDA 12.x separately
Out of memory during training:
- Lower max_depth, subsample, or colsample to shrink the working set
- If GPU memory is the limit, force a CPU backend with BackendType::Scalar
Model won't load:
- Ensure you're using the same TreeBoost version for save/load
- The .rkyv file is tied to the binary layout; recompiling TreeBoost may break compatibility
Acknowledgments
TreeBoost builds on the collective knowledge of the GBDT community. We acknowledge the following projects that shaped our design and implementation:
- XGBoost — Industry-standard GBDT with GPU support; inspired our histogram-based approach and Full GPU mode architecture.
- LightGBM — Leaf-wise growth strategy and histogram optimization techniques.
- CatBoost — Ordered target encoding for categorical features and conformal prediction intervals.
- Forust — Pure-Rust GBDT implementation; motivated our focus on Rust-first performance.
- WarpGBM — GPU-accelerated histogram building patterns.
License
Apache License 2.0