TreeBoost: Universal Tabular Learning Engine
Combines linear models, gradient boosted trees, and random forests in a single unified interface. Pick the right tool for your data—or let the AutoTuner figure it out.
§Architecture
┌─────────────────────────────────────────────────────────────┐
│ UniversalModel │
├──────────────┬──────────────────────┬───────────────────────┤
│ PureTree │ LinearThenTree │ RandomForest │
│ (GBDT) │ (Hybrid) │ (Bagging) │
└──────────────┴──────────────────────┴───────────────────────┘
§Quick Start (AutoML - Recommended)
```rust
use polars::prelude::*;
use treeboost::auto_train;

// Load data
let df = CsvReadOptions::default()
    .try_into_reader_with_file_path(Some("housing.csv".into()))?
    .finish()?;

// One-line training - analyzes data, selects mode, tunes params
let model = auto_train(&df, "price")?;

// Predict
let predictions = model.predict(&test_df)?;

// See what AutoML did
println!("{}", model.summary());
```
§Manual Configuration (Advanced)
```rust
use treeboost::{UniversalConfig, UniversalModel, BoostingMode};
use treeboost::dataset::DatasetLoader;
use treeboost::loss::MseLoss;

let loader = DatasetLoader::new(255);
let dataset = loader.load_parquet("data.parquet", "target", None)?;

let config = UniversalConfig::new()
    .with_mode(BoostingMode::LinearThenTree) // Hybrid mode
    .with_num_rounds(100)
    .with_linear_rounds(10);

let model = UniversalModel::train(&dataset, config, &MseLoss)?;
let predictions = model.predict(&dataset);
```
§Boosting Modes
| Mode | Best For |
|---|---|
| BoostingMode::PureTree | General tabular, categorical features |
| BoostingMode::LinearThenTree | Time-series, trending data, extrapolation |
| BoostingMode::RandomForest | Noisy data, variance reduction |
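The extrapolation entry in the table can be illustrated with a self-contained sketch (this is not crate code): LinearThenTree first removes a linear trend, leaving bounded residuals for the tree stage, so trees never have to extrapolate the trend themselves.

```rust
// Sketch of the LinearThenTree idea: remove a linear trend first, then
// let trees model the detrended residuals. Plain 1-D least squares for
// illustration only; not the crate's implementation.
fn fit_line(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    let var: f64 = x.iter().map(|a| (a - mx).powi(2)).sum();
    let slope = cov / var;
    (slope, my - slope * mx)
}

fn main() {
    // Trending target: y = 3x + 1, plus a local bump trees would capture
    let x: Vec<f64> = (0..10).map(|i| i as f64).collect();
    let y: Vec<f64> = x
        .iter()
        .map(|&v| 3.0 * v + 1.0 + if v > 5.0 { 2.0 } else { 0.0 })
        .collect();
    let (slope, intercept) = fit_line(&x, &y);
    // After the linear stage, residuals are small and trend-free, so a
    // subsequent tree stage only models local structure.
    let residuals: Vec<f64> = x
        .iter()
        .zip(&y)
        .map(|(&xv, &yv)| yv - (slope * xv + intercept))
        .collect();
    println!("slope={slope:.2} intercept={intercept:.2}");
    println!("residuals: {residuals:.2?}");
}
```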
§Weak Learners
- LinearBooster: Ridge/LASSO/ElasticNet via coordinate descent
- LinearTreeBooster: Decision trees with linear regression in leaves
- TreeBooster: Standard histogram-based GBDT trees
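As an illustration of the coordinate-descent technique named for LinearBooster, here is a minimal standalone LASSO update built on soft-thresholding. This is a sketch of the general algorithm, not the crate's kernel:

```rust
// Soft-thresholding operator: the closed-form solution of the 1-D
// LASSO subproblem used by coordinate descent.
fn soft_threshold(z: f64, lambda: f64) -> f64 {
    if z > lambda {
        z - lambda
    } else if z < -lambda {
        z + lambda
    } else {
        0.0
    }
}

// One full coordinate-descent pass for
//   min_w (1/2n) * ||y - Xw||^2 + lambda * ||w||_1
// Each weight is updated against the partial residual that excludes it.
fn cd_pass(x: &[Vec<f64>], y: &[f64], w: &mut [f64], lambda: f64) {
    let n = y.len() as f64;
    for j in 0..w.len() {
        let rho: f64 = (0..y.len())
            .map(|i| {
                let pred_minus_j: f64 = (0..w.len())
                    .filter(|&k| k != j)
                    .map(|k| x[i][k] * w[k])
                    .sum();
                x[i][j] * (y[i] - pred_minus_j)
            })
            .sum::<f64>()
            / n;
        let norm: f64 = x.iter().map(|row| row[j] * row[j]).sum::<f64>() / n;
        w[j] = soft_threshold(rho, lambda) / norm;
    }
}

fn main() {
    // Single feature, y = 2x: one pass with lambda = 0 recovers OLS.
    let x = vec![vec![1.0], vec![2.0], vec![3.0]];
    let y = [2.0, 4.0, 6.0];
    let mut w = vec![0.0];
    cd_pass(&x, &y, &mut w, 0.0);
    println!("w = {w:?}");
}
```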
§Preprocessing
The preprocessing module provides transforms that serialize with your model:
- Scalers: StandardScaler, MinMaxScaler, RobustScaler
- Encoders: FrequencyEncoder, LabelEncoder, OneHotEncoder
- Imputers: SimpleImputer, IndicatorImputer
- Time-series: LagGenerator, RollingGenerator, EwmaGenerator
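To make the fit/transform contract behind these transforms concrete, here is a minimal standalone standardizer in the spirit of StandardScaler; the crate's actual API and its model serialization are not shown here:

```rust
// Conceptual fit/transform scaler: learn mean and std on training data,
// then apply the *same* statistics at inference time. Illustrative only;
// the crate's StandardScaler API may differ.
struct Standardizer {
    mean: f64,
    std: f64,
}

impl Standardizer {
    // Learn the column statistics from training values.
    fn fit(values: &[f64]) -> Self {
        let n = values.len() as f64;
        let mean = values.iter().sum::<f64>() / n;
        let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
        // Guard against zero variance to avoid division by zero.
        Standardizer { mean, std: var.sqrt().max(f64::EPSILON) }
    }

    // Apply the learned statistics to a (possibly unseen) value.
    fn transform(&self, v: f64) -> f64 {
        (v - self.mean) / self.std
    }
}

fn main() {
    let scaler = Standardizer::fit(&[1.0, 2.0, 3.0]);
    println!("z(2.0) = {}", scaler.transform(2.0)); // mean maps to 0
}
```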
§Additional Features
- Histogram-based training: u8 bins for memory efficiency
- Shannon Entropy regularized splits: Drift-resilient objective
- Pseudo-Huber loss: Robust to outliers
- Split Conformal Prediction: Distribution-free prediction intervals
- Zero-copy serialization: Fast model loading via rkyv
- GPU acceleration: WGPU (all GPUs), CUDA (NVIDIA)
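Split conformal prediction, listed above, is simple enough to sketch end to end: calibrate on held-out absolute residuals and widen each point prediction by their (1 - alpha) empirical quantile. This standalone sketch is illustrative, not the crate's API:

```rust
// Split conformal prediction sketch: the half-width that makes
// [p - q, p + q] a distribution-free (1 - alpha) prediction interval,
// computed from a held-out calibration set of absolute residuals.
fn conformal_width(calib_residuals: &mut Vec<f64>, alpha: f64) -> f64 {
    calib_residuals.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = calib_residuals.len() as f64;
    // Finite-sample corrected rank: ceil((n + 1) * (1 - alpha))
    let rank = (((n + 1.0) * (1.0 - alpha)).ceil() as usize).min(calib_residuals.len());
    calib_residuals[rank - 1]
}

fn main() {
    // Toy calibration residuals 1..=99; alpha = 0.1 gives 90% intervals.
    let mut residuals: Vec<f64> = (1..=99).map(|i| i as f64).collect();
    let q = conformal_width(&mut residuals, 0.1);
    println!("half-width = {q}"); // interval for prediction p: [p - q, p + q]
}
```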
Re-exports§
pub use backend::scalar::kernel;
pub use backend::{BackendConfig, BackendPreset, BackendSelector, BackendType, GpuMode, HistogramBackend};
pub use booster::{GBDTConfig, GBDTModel, GbdtPreset};
pub use dataset::{BinnedDataset, FeatureInfo, FeatureType, QuantileBinner};
pub use ensemble::{EnsembleBuilder, MultiSeedConfig, SelectionConfig as EnsembleSelectionConfig, StackedEnsemble, StackingConfig};
pub use features::{FeatureGenerationConfig, FeatureGenerator, FeatureSelector, PolynomialGenerator, RatioGenerator, SelectionConfig, SmartFeatureConfig, SmartFeaturePreset};
pub use histogram::HistogramBuilder;
pub use inference::Prediction;
pub use learner::{Booster, LeafLinearModel, LinearBooster, LinearConfig, LinearPreset, LinearTreeBooster, LinearTreeConfig, TreeBooster, TreeConfig, TreePreset, WeakLearner};
pub use loss::{sigmoid, softmax, BinaryLogLoss, LossFunction, MseLoss, MultiClassLogLoss, PseudoHuberLoss};
pub use model::{AutoBuilder, AutoConfig, AutoEnsembleConfig, AutoEnsembleMethod, AutoModel, AutoModelUpdateReport, BoostingMode, BuildPhaseTimes, BuildResult, ConsoleProgress, IncrementalUpdateReport, ModeSelection, ProgressCallback, ProgressUpdate, QuietProgress, StackingStrategy, TrainingPhase, TreeTunerPreset, TuningLevel, UniversalConfig, UniversalModel, UniversalPreset};
pub use monitoring::{AlertLevel, CVHoldoutTracker, ShiftDetector, ShiftResult};
pub use analysis::{compute_correlation, compute_r2, compute_variance, AnalysisConfig, AnalysisPreset, AnalysisReport, Confidence, DatasetAnalysis, Recommendation};
pub use preprocessing::{EncodingMap, FrequencyEncoder, ImputeStrategy, IndicatorImputer, LabelEncoder, MinMaxScaler, OneHotEncoder, OrderedTargetEncoder, PipelineBuilder, Preprocessor, RobustScaler, Scaler, SimpleImputer, SmartPreprocessConfig, SmartPreprocessPreset, StandardScaler, UnknownStrategy, YeoJohnsonTransform};
pub use tree::{InteractionConstraints, MonotonicConstraint};
pub use tuner::{AutoTuner, EvalStrategy, GridStrategy, ModelFormat, ParameterSpace, SearchHistory, SpacePreset, TunerConfig, TunerPreset};
Modules§
- analysis
- Dataset Analysis and Intelligent Mode Selection
- backend
- Backend abstraction for histogram building.
- booster
- GBDT booster module
- dataset
- Dataset module: Data loading, binning, and columnar storage
- defaults
- encoding
- Production-grade categorical encoding for high-cardinality features
- ensemble
- Ensemble learning module
- features
- Automatic feature generation
- histogram
- Histogram construction for GBDT training
- inference
- Inference module
- learner
- Weak learner abstractions for gradient boosting
- loss
- Loss functions for GBDT training
- model
- High-level model abstractions
- monitoring
- Distribution shift detection and monitoring
- prelude
- preprocessing
- Preprocessing transformations for data preparation
- serialize
- Serialization module
- tree
- Decision tree structures and algorithms
- tuner
- AutoTuner for hyperparameter optimization
Enums§
- TreeBoostError - Library error type
Functions§
- auto_train - Train a model with automatic configuration (the simplest API)
- auto_train_csv - Train a model from a CSV file with automatic configuration
- auto_train_quick - Train quickly with minimal tuning (for fast experimentation)
- auto_train_thorough - Train thoroughly with extensive tuning (for best accuracy)
- auto_train_with_mode - Train with a specific boosting mode (bypass auto-selection)