Crate treeboost


TreeBoost: Universal Tabular Learning Engine

Combines linear models, gradient boosted trees, and random forests in a single unified interface. Pick the right tool for your data—or let the AutoTuner figure it out.

§Architecture

┌─────────────────────────────────────────────────────────────┐
│                      UniversalModel                         │
├──────────────┬──────────────────────┬───────────────────────┤
│   PureTree   │   LinearThenTree     │    RandomForest       │
│   (GBDT)     │   (Hybrid)           │    (Bagging)          │
└──────────────┴──────────────────────┴───────────────────────┘

§Quick Start

use polars::prelude::*;
use treeboost::auto_train;

// Load data
let df = CsvReadOptions::default()
    .try_into_reader_with_file_path(Some("housing.csv".into()))?
    .finish()?;

// One-line training - analyzes data, selects mode, tunes params
let model = auto_train(&df, "price")?;

// Predict
let predictions = model.predict(&test_df)?;

// See what AutoML did
println!("{}", model.summary());

§Manual Configuration (Advanced)

use treeboost::{UniversalConfig, UniversalModel, BoostingMode};
use treeboost::dataset::DatasetLoader;
use treeboost::loss::MseLoss;

let loader = DatasetLoader::new(255); // max 255 bins per feature (fits in a u8)
let dataset = loader.load_parquet("data.parquet", "target", None)?;

let config = UniversalConfig::new()
    .with_mode(BoostingMode::LinearThenTree)  // Hybrid mode
    .with_num_rounds(100)
    .with_linear_rounds(10);

let model = UniversalModel::train(&dataset, config, &MseLoss)?;
let predictions = model.predict(&dataset);

§Boosting Modes

Mode                            Best For
BoostingMode::PureTree          General tabular, categorical features
BoostingMode::LinearThenTree    Time-series, trending data, extrapolation
BoostingMode::RandomForest      Noisy data, variance reduction
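
To force a specific mode rather than rely on auto-selection, `auto_train_with_mode` (listed under Functions below) can be combined with `BoostingMode`. A sketch; the exact argument order here is an assumption based on the function's one-line description:

```rust
use polars::prelude::*;
use treeboost::{auto_train_with_mode, BoostingMode};

let df = CsvReadOptions::default()
    .try_into_reader_with_file_path(Some("sensor_log.csv".into()))?
    .finish()?;

// Trending time-series data: the hybrid mode fits a linear component first,
// then boosts trees on the residuals, which helps with extrapolation.
let model = auto_train_with_mode(&df, "target", BoostingMode::LinearThenTree)?;
```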

§Weak Learners

Three WeakLearner implementations are available (see the learner module): TreeBooster (decision trees), LinearBooster (linear models), and LinearTreeBooster (trees with linear models in the leaves; see LeafLinearModel).

§Preprocessing

The preprocessing module provides transforms that serialize with your model: imputers (SimpleImputer, IndicatorImputer), scalers (StandardScaler, MinMaxScaler, RobustScaler), categorical encoders (OneHotEncoder, LabelEncoder, FrequencyEncoder, OrderedTargetEncoder), and the YeoJohnsonTransform, composed via PipelineBuilder.
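
As an illustration of what a scaler in this family computes, here is a self-contained sketch of z-score scaling, the transform a StandardScaler applies. The helper names are local to this example, not the crate's API:

```rust
// Fit: compute the column mean and (population) standard deviation.
fn fit(values: &[f64]) -> (f64, f64) {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

// Transform: x' = (x - mean) / std, giving mean 0 and unit variance.
// (A real scaler must also guard against std == 0 for constant columns.)
fn transform(values: &[f64], mean: f64, std: f64) -> Vec<f64> {
    values.iter().map(|v| (v - mean) / std).collect()
}

fn main() {
    let raw = [10.0, 20.0, 30.0, 40.0];
    let (mean, std) = fit(&raw);
    let scaled = transform(&raw, mean, std);
    println!("{scaled:?}"); // symmetric around 0.0
}
```

Fitting on the training split and reusing the same (mean, std) at inference time is what "transforms that serialize with your model" buys you: the statistics travel with the model file.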

§Additional Features

  • Histogram-based training: u8 bins for memory efficiency
  • Shannon Entropy regularized splits: Drift-resilient objective
  • Pseudo-Huber loss: Robust to outliers
  • Split Conformal Prediction: Distribution-free prediction intervals
  • Zero-copy serialization: Fast model loading via rkyv
  • GPU acceleration: WGPU (all GPUs), CUDA (NVIDIA)
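
Of the items above, the Pseudo-Huber loss is easy to verify numerically: it behaves like squared error for small residuals but grows linearly for large ones, so the gradient magnitude is capped at delta and a single outlier cannot dominate training. A minimal self-contained sketch (local helper functions, not the crate's PseudoHuberLoss API):

```rust
// Pseudo-Huber loss: L(r) = delta^2 * (sqrt(1 + (r/delta)^2) - 1).
// Quadratic near r = 0, asymptotically linear for |r| >> delta.
fn pseudo_huber(residual: f64, delta: f64) -> f64 {
    delta * delta * ((1.0 + (residual / delta).powi(2)).sqrt() - 1.0)
}

// Gradient w.r.t. the prediction, with residual = target - prediction.
// Its magnitude saturates at delta instead of growing with the residual.
fn pseudo_huber_grad(residual: f64, delta: f64) -> f64 {
    -residual / (1.0 + (residual / delta).powi(2)).sqrt()
}

fn main() {
    let delta = 1.0;
    println!("{:.4}", pseudo_huber(0.1, delta));   // ≈ 0.0050 (like r^2/2)
    println!("{:.2}", pseudo_huber(100.0, delta)); // ≈ 99.00 (linear, not 5000)
    println!("{:.3}", pseudo_huber_grad(100.0, delta).abs()); // ≈ 1.000 (capped)
}
```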

Re-exports§

pub use backend::scalar::kernel;
pub use backend::BackendConfig;
pub use backend::BackendPreset;
pub use backend::BackendSelector;
pub use backend::BackendType;
pub use backend::GpuMode;
pub use backend::HistogramBackend;
pub use booster::GBDTConfig;
pub use booster::GBDTModel;
pub use booster::GbdtPreset;
pub use dataset::BinnedDataset;
pub use dataset::FeatureInfo;
pub use dataset::FeatureType;
pub use dataset::QuantileBinner;
pub use ensemble::EnsembleBuilder;
pub use ensemble::MultiSeedConfig;
pub use ensemble::SelectionConfig as EnsembleSelectionConfig;
pub use ensemble::StackedEnsemble;
pub use ensemble::StackingConfig;
pub use features::FeatureGenerationConfig;
pub use features::FeatureGenerator;
pub use features::FeatureSelector;
pub use features::PolynomialGenerator;
pub use features::RatioGenerator;
pub use features::SelectionConfig;
pub use features::SmartFeatureConfig;
pub use features::SmartFeaturePreset;
pub use histogram::HistogramBuilder;
pub use inference::Prediction;
pub use learner::Booster;
pub use learner::LeafLinearModel;
pub use learner::LinearBooster;
pub use learner::LinearConfig;
pub use learner::LinearPreset;
pub use learner::LinearTreeBooster;
pub use learner::LinearTreeConfig;
pub use learner::TreeBooster;
pub use learner::TreeConfig;
pub use learner::TreePreset;
pub use learner::WeakLearner;
pub use loss::sigmoid;
pub use loss::softmax;
pub use loss::BinaryLogLoss;
pub use loss::LossFunction;
pub use loss::MseLoss;
pub use loss::MultiClassLogLoss;
pub use loss::PseudoHuberLoss;
pub use model::AutoBuilder;
pub use model::AutoConfig;
pub use model::AutoEnsembleConfig;
pub use model::AutoEnsembleMethod;
pub use model::AutoModel;
pub use model::AutoModelUpdateReport;
pub use model::BoostingMode;
pub use model::BuildPhaseTimes;
pub use model::BuildResult;
pub use model::ConsoleProgress;
pub use model::IncrementalUpdateReport;
pub use model::ModeSelection;
pub use model::ProgressCallback;
pub use model::ProgressUpdate;
pub use model::QuietProgress;
pub use model::StackingStrategy;
pub use model::TrainingPhase;
pub use model::TreeTunerPreset;
pub use model::TuningLevel;
pub use model::UniversalConfig;
pub use model::UniversalModel;
pub use model::UniversalPreset;
pub use monitoring::AlertLevel;
pub use monitoring::CVHoldoutTracker;
pub use monitoring::ShiftDetector;
pub use monitoring::ShiftResult;
pub use analysis::compute_correlation;
pub use analysis::compute_r2;
pub use analysis::compute_variance;
pub use analysis::AnalysisConfig;
pub use analysis::AnalysisPreset;
pub use analysis::AnalysisReport;
pub use analysis::Confidence;
pub use analysis::DatasetAnalysis;
pub use analysis::Recommendation;
pub use preprocessing::EncodingMap;
pub use preprocessing::FrequencyEncoder;
pub use preprocessing::ImputeStrategy;
pub use preprocessing::IndicatorImputer;
pub use preprocessing::LabelEncoder;
pub use preprocessing::MinMaxScaler;
pub use preprocessing::OneHotEncoder;
pub use preprocessing::OrderedTargetEncoder;
pub use preprocessing::PipelineBuilder;
pub use preprocessing::Preprocessor;
pub use preprocessing::RobustScaler;
pub use preprocessing::Scaler;
pub use preprocessing::SimpleImputer;
pub use preprocessing::SmartPreprocessConfig;
pub use preprocessing::SmartPreprocessPreset;
pub use preprocessing::StandardScaler;
pub use preprocessing::UnknownStrategy;
pub use preprocessing::YeoJohnsonTransform;
pub use tree::InteractionConstraints;
pub use tree::MonotonicConstraint;
pub use tuner::AutoTuner;
pub use tuner::EvalStrategy;
pub use tuner::GridStrategy;
pub use tuner::ModelFormat;
pub use tuner::ParameterSpace;
pub use tuner::SearchHistory;
pub use tuner::SpacePreset;
pub use tuner::TunerConfig;
pub use tuner::TunerPreset;

Modules§

analysis
Dataset Analysis and Intelligent Mode Selection
backend
Backend abstraction for histogram building.
booster
GBDT booster module
dataset
Dataset module: Data loading, binning, and columnar storage
defaults
encoding
Production-grade categorical encoding for high-cardinality features
ensemble
Ensemble learning module
features
Automatic feature generation
histogram
Histogram construction for GBDT training
inference
Inference module
learner
Weak learner abstractions for gradient boosting
loss
Loss functions for GBDT training
model
High-level model abstractions
monitoring
Distribution shift detection and monitoring
prelude
preprocessing
Preprocessing transformations for data preparation
serialize
Serialization module
tree
Decision tree structures and algorithms
tuner
AutoTuner for hyperparameter optimization

Enums§

TreeBoostError
Library error type

Functions§

auto_train
Train a model with automatic configuration (the simplest API)
auto_train_csv
Train a model from a CSV file with automatic configuration
auto_train_quick
Train quickly with minimal tuning (for fast experimentation)
auto_train_thorough
Train thoroughly with extensive tuning (for best accuracy)
auto_train_with_mode
Train with a specific boosting mode (bypass auto-selection)

Type Aliases§

Result