Crate treeboost


TreeBoost: Universal Tabular Learning Engine

Combines linear models, gradient boosted trees, and random forests in a single unified interface. Pick the right tool for your data—or let the AutoTuner figure it out.

§Architecture

┌─────────────────────────────────────────────────────────────┐
│                      UniversalModel                         │
├──────────────┬──────────────────────┬───────────────────────┤
│   PureTree   │   LinearThenTree     │    RandomForest       │
│   (GBDT)     │   (Hybrid)           │    (Bagging)          │
└──────────────┴──────────────────────┴───────────────────────┘

§Quick Start

use polars::prelude::*;
use treeboost::auto_train;

// Load data
let df = CsvReadOptions::default()
    .try_into_reader_with_file_path(Some("housing.csv".into()))?
    .finish()?;

// One-line training - analyzes data, selects mode, tunes params
let model = auto_train(&df, "price")?;

// Predict
let predictions = model.predict(&test_df)?;

// See what AutoML did
println!("{}", model.summary());

§Manual Configuration (Advanced)

use treeboost::{UniversalConfig, UniversalModel, BoostingMode};
use treeboost::dataset::DatasetLoader;
use treeboost::loss::MseLoss;

let loader = DatasetLoader::new(255); // max 255 bins per feature (fits in a u8)
let dataset = loader.load_parquet("data.parquet", "target", None)?;

let config = UniversalConfig::new()
    .with_mode(BoostingMode::LinearThenTree)  // Hybrid mode
    .with_num_rounds(100)
    .with_linear_rounds(10);

let model = UniversalModel::train(&dataset, config, &MseLoss)?;
let predictions = model.predict(&dataset);

§Boosting Modes

Mode                            Best For
BoostingMode::PureTree          General tabular, categorical features
BoostingMode::LinearThenTree    Time-series, trending data, extrapolation
BoostingMode::RandomForest      Noisy data, variance reduction
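
To force a specific mode rather than rely on auto-selection, `auto_train_with_mode` (listed under Functions below) can be combined with `BoostingMode`. A sketch; the exact argument order here is an assumption based on the function's one-line description:

```rust
use polars::prelude::*;
use treeboost::{auto_train_with_mode, BoostingMode};

let df = CsvReadOptions::default()
    .try_into_reader_with_file_path(Some("sensor_log.csv".into()))?
    .finish()?;

// Trending time-series data: the hybrid mode fits a linear component first,
// then boosts trees on the residuals, which helps with extrapolation.
let model = auto_train_with_mode(&df, "target", BoostingMode::LinearThenTree)?;
```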

§Weak Learners

Three WeakLearner implementations are available (see the learner module): TreeBooster (decision trees), LinearBooster (linear models), and LinearTreeBooster (trees with linear models in the leaves; see LeafLinearModel).

§Preprocessing

The preprocessing module provides transforms that serialize with your model: imputers (SimpleImputer, IndicatorImputer), scalers (StandardScaler, MinMaxScaler, RobustScaler), categorical encoders (OneHotEncoder, LabelEncoder, FrequencyEncoder, OrderedTargetEncoder), and the YeoJohnsonTransform, composed via PipelineBuilder.
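
As an illustration of what a scaler in this family computes, here is a self-contained sketch of z-score scaling, the transform a StandardScaler applies. The helper names are local to this example, not the crate's API:

```rust
// Fit: compute the column mean and (population) standard deviation.
fn fit(values: &[f64]) -> (f64, f64) {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

// Transform: x' = (x - mean) / std, giving mean 0 and unit variance.
// (A real scaler must also guard against std == 0 for constant columns.)
fn transform(values: &[f64], mean: f64, std: f64) -> Vec<f64> {
    values.iter().map(|v| (v - mean) / std).collect()
}

fn main() {
    let raw = [10.0, 20.0, 30.0, 40.0];
    let (mean, std) = fit(&raw);
    let scaled = transform(&raw, mean, std);
    println!("{scaled:?}"); // symmetric around 0.0
}
```

Fitting on the training split and reusing the same (mean, std) at inference time is what "transforms that serialize with your model" buys you: the statistics travel with the model file.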

§Additional Features

  • Histogram-based training: u8 bins for memory efficiency
  • Shannon Entropy regularized splits: Drift-resilient objective
  • Pseudo-Huber loss: Robust to outliers
  • Split Conformal Prediction: Distribution-free prediction intervals
  • Zero-copy serialization: Fast model loading via rkyv
  • GPU acceleration: WGPU (all GPUs), CUDA (NVIDIA)
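
Of the items above, the Pseudo-Huber loss is easy to verify numerically: it behaves like squared error for small residuals but grows linearly for large ones, so the gradient magnitude is capped at delta and a single outlier cannot dominate training. A minimal self-contained sketch (local helper functions, not the crate's PseudoHuberLoss API):

```rust
// Pseudo-Huber loss: L(r) = delta^2 * (sqrt(1 + (r/delta)^2) - 1).
// Quadratic near r = 0, asymptotically linear for |r| >> delta.
fn pseudo_huber(residual: f64, delta: f64) -> f64 {
    delta * delta * ((1.0 + (residual / delta).powi(2)).sqrt() - 1.0)
}

// Gradient w.r.t. the prediction, with residual = target - prediction.
// Its magnitude saturates at delta instead of growing with the residual.
fn pseudo_huber_grad(residual: f64, delta: f64) -> f64 {
    -residual / (1.0 + (residual / delta).powi(2)).sqrt()
}

fn main() {
    let delta = 1.0;
    println!("{:.4}", pseudo_huber(0.1, delta));   // ≈ 0.0050 (like r^2/2)
    println!("{:.2}", pseudo_huber(100.0, delta)); // ≈ 99.00 (linear, not 5000)
    println!("{:.3}", pseudo_huber_grad(100.0, delta).abs()); // ≈ 1.000 (capped)
}
```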

Re-exports§

pub use backend::scalar::kernel;
pub use backend::BackendConfig;
pub use backend::BackendPreset;
pub use backend::BackendSelector;
pub use backend::BackendType;
pub use backend::GpuMode;
pub use backend::HistogramBackend;
pub use booster::GBDTConfig;
pub use booster::GBDTModel;
pub use booster::GbdtPreset;
pub use dataset::BinnedDataset;
pub use dataset::FeatureInfo;
pub use dataset::FeatureType;
pub use dataset::QuantileBinner;
pub use ensemble::EnsembleBuilder;
pub use ensemble::MultiSeedConfig;
pub use ensemble::SelectionConfig as EnsembleSelectionConfig;
pub use ensemble::StackedEnsemble;
pub use ensemble::StackingConfig;
pub use features::FeatureGenerationConfig;
pub use features::FeatureGenerator;
pub use features::FeatureSelector;
pub use features::PolynomialGenerator;
pub use features::RatioGenerator;
pub use features::SelectionConfig;
pub use features::SmartFeatureConfig;
pub use features::SmartFeaturePreset;
pub use histogram::HistogramBuilder;
pub use inference::Prediction;
pub use learner::Booster;
pub use learner::LeafLinearModel;
pub use learner::LinearBooster;
pub use learner::LinearConfig;
pub use learner::LinearPreset;
pub use learner::LinearTreeBooster;
pub use learner::LinearTreeConfig;
pub use learner::TreeBooster;
pub use learner::TreeConfig;
pub use learner::TreePreset;
pub use learner::WeakLearner;
pub use loss::sigmoid;
pub use loss::softmax;
pub use loss::BinaryLogLoss;
pub use loss::LossFunction;
pub use loss::MseLoss;
pub use loss::MultiClassLogLoss;
pub use loss::PseudoHuberLoss;
pub use model::AutoBuilder;
pub use model::AutoConfig;
pub use model::AutoEnsembleConfig;
pub use model::AutoEnsembleMethod;
pub use model::AutoModel;
pub use model::AutoModelUpdateReport;
pub use model::BoostingMode;
pub use model::BuildPhaseTimes;
pub use model::BuildResult;
pub use model::ConsoleProgress;
pub use model::IncrementalUpdateReport;
pub use model::ModeSelection;
pub use model::ProgressCallback;
pub use model::ProgressUpdate;
pub use model::QuietProgress;
pub use model::StackingStrategy;
pub use model::TrainingPhase;
pub use model::TreeTunerPreset;
pub use model::TuningLevel;
pub use model::UniversalConfig;
pub use model::UniversalModel;
pub use model::UniversalPreset;
pub use monitoring::AlertLevel;
pub use monitoring::CVHoldoutTracker;
pub use monitoring::ShiftDetector;
pub use monitoring::ShiftResult;
pub use analysis::compute_correlation;
pub use analysis::compute_r2;
pub use analysis::compute_variance;
pub use analysis::AnalysisConfig;
pub use analysis::AnalysisPreset;
pub use analysis::AnalysisReport;
pub use analysis::Confidence;
pub use analysis::DatasetAnalysis;
pub use analysis::Recommendation;
pub use preprocessing::EncodingMap;
pub use preprocessing::FrequencyEncoder;
pub use preprocessing::ImputeStrategy;
pub use preprocessing::IndicatorImputer;
pub use preprocessing::LabelEncoder;
pub use preprocessing::MinMaxScaler;
pub use preprocessing::OneHotEncoder;
pub use preprocessing::OrderedTargetEncoder;
pub use preprocessing::PipelineBuilder;
pub use preprocessing::Preprocessor;
pub use preprocessing::RobustScaler;
pub use preprocessing::Scaler;
pub use preprocessing::SimpleImputer;
pub use preprocessing::SmartPreprocessConfig;
pub use preprocessing::SmartPreprocessPreset;
pub use preprocessing::StandardScaler;
pub use preprocessing::UnknownStrategy;
pub use preprocessing::YeoJohnsonTransform;
pub use tree::InteractionConstraints;
pub use tree::MonotonicConstraint;
pub use tuner::AutoTuner;
pub use tuner::EvalStrategy;
pub use tuner::GridStrategy;
pub use tuner::ModelFormat;
pub use tuner::ParameterSpace;
pub use tuner::SearchHistory;
pub use tuner::SpacePreset;
pub use tuner::TunerConfig;
pub use tuner::TunerPreset;

Modules§

analysis
Dataset Analysis and Intelligent Mode Selection
backend
Backend abstraction for histogram building.
booster
GBDT booster module
dataset
Dataset module: Data loading, binning, and columnar storage
defaults
encoding
Production-grade categorical encoding for high-cardinality features
ensemble
Ensemble learning module
features
Automatic feature generation
histogram
Histogram construction for GBDT training
inference
Inference module
learner
Weak learner abstractions for gradient boosting
loss
Loss functions for GBDT training
model
High-level model abstractions
monitoring
Distribution shift detection and monitoring
prelude
preprocessing
Preprocessing transformations for data preparation
serialize
Serialization module
tree
Decision tree structures and algorithms
tuner
AutoTuner for hyperparameter optimization

Enums§

TreeBoostError
Library error type

Functions§

auto_train
Train a model with automatic configuration (the simplest API)
auto_train_csv
Train a model from a CSV file with automatic configuration
auto_train_quick
Train quickly with minimal tuning (for fast experimentation)
auto_train_thorough
Train thoroughly with extensive tuning (for best accuracy)
auto_train_with_mode
Train with a specific boosting mode (bypass auto-selection)

Type Aliases§

Result