pub struct SGBTConfigBuilder { /* private fields */ }
Available on crate feature alloc only.
Builder for SGBTConfig with validation on build().
§Example
use irithyll::ensemble::config::{SGBTConfig, DriftDetectorType};
use irithyll::ensemble::variants::SGBTVariant;
let config = SGBTConfig::builder()
.n_steps(200)
.learning_rate(0.05)
.drift_detector(DriftDetectorType::Adwin { delta: 0.01 })
.variant(SGBTVariant::Skip { k: 10 })
.build()
.expect("valid config");
Implementations§
impl SGBTConfigBuilder
pub fn n_steps(self, n: usize) -> Self
Set the number of boosting steps (trees in the ensemble).
pub fn learning_rate(self, lr: f64) -> Self
Set the learning rate (shrinkage factor).
pub fn feature_subsample_rate(self, rate: f64) -> Self
Set the fraction of features to subsample per tree.
pub fn grace_period(self, gp: usize) -> Self
Set the grace period (minimum samples before evaluating splits).
pub fn drift_detector(self, dt: DriftDetectorType) -> Self
Set the drift detector type for tree replacement.
pub fn variant(self, v: SGBTVariant) -> Self
Set the SGBT computational variant.
pub fn seed(self, seed: u64) -> Self
Set the random seed for deterministic reproducibility.
Controls feature subsampling and variant skip/MI stochastic decisions. Two models with the same seed and data sequence will produce identical results.
pub fn initial_target_count(self, count: usize) -> Self
Set the number of initial targets to collect before computing the base prediction.
The model collects this many target values before initializing the base
prediction (via loss.initial_prediction). Default: 50.
pub fn leaf_half_life(self, n: usize) -> Self
Set the half-life for exponential leaf decay (in samples per leaf).
After n samples, a leaf’s accumulated statistics have half the weight
of the most recent sample. Enables continuous adaptation to concept drift.
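To make the decay rate concrete: a half-life of n corresponds to a per-sample multiplicative factor of 0.5^(1/n). A standalone numerical sketch (decay_factor is an illustrative helper, not part of this crate's API):

```rust
/// Per-sample decay factor lambda = 0.5^(1 / half_life), chosen so that
/// after `half_life` samples a contribution retains exactly half its weight.
fn decay_factor(half_life: f64) -> f64 {
    0.5_f64.powf(1.0 / half_life)
}

fn main() {
    let half_life = 200.0;
    let lambda = decay_factor(half_life);
    // Weight carried by a sample observed `half_life` steps ago.
    let w = lambda.powf(half_life);
    println!("lambda = {lambda:.6}, weight after {half_life} samples = {w:.3}");
}
```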
pub fn max_tree_samples(self, n: u64) -> Self
Set the maximum samples a single tree processes before proactive replacement.
After n samples, the tree is replaced regardless of drift detector state.
pub fn split_reeval_interval(self, n: usize) -> Self
Set the split re-evaluation interval for max-depth leaves.
Every n samples per leaf, max-depth leaves re-evaluate whether a split
would improve them. Inspired by EFDT (Manapragada et al. 2018).
pub fn feature_names(self, names: Vec<String>) -> Self
Set human-readable feature names.
Enables named feature importances and named training input.
Names must be unique; validated at build().
pub fn feature_types(self, types: Vec<FeatureType>) -> Self
Set per-feature type declarations.
Declares which features are categorical vs continuous. Categorical features use one-bin-per-category binning and Fisher optimal binary partitioning. Supports up to 64 distinct category values per categorical feature.
pub fn gradient_clip_sigma(self, sigma: f64) -> Self
Set per-leaf gradient clipping threshold (in standard deviations).
Each leaf tracks an EWMA of the gradient mean and variance. Gradients
exceeding mean ± sigma × std are clamped. This prevents outlier labels
from corrupting streaming model stability.
Typical value: 3.0 (3-sigma clipping).
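A self-contained sketch of this clipping scheme (the GradClipper type, its field names, and its update order are illustrative assumptions, not the crate's internals):

```rust
/// Illustrative EWMA-based gradient clipper: clamp each incoming gradient
/// to mean ± sigma * std using the statistics accumulated so far, then
/// fold the clipped value into the running EWMA mean and variance.
struct GradClipper {
    alpha: f64, // EWMA smoothing factor
    mean: f64,  // running gradient mean
    var: f64,   // running gradient variance
    sigma: f64, // threshold in standard deviations, e.g. 3.0
}

impl GradClipper {
    fn clip(&mut self, g: f64) -> f64 {
        let std = self.var.sqrt();
        let clipped = g.clamp(self.mean - self.sigma * std, self.mean + self.sigma * std);
        // Standard EWMA mean/variance update on the clipped gradient.
        let d = clipped - self.mean;
        self.mean += self.alpha * d;
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * d * d);
        clipped
    }
}

fn main() {
    let mut c = GradClipper { alpha: 0.05, mean: 0.0, var: 0.01, sigma: 3.0 };
    for _ in 0..200 {
        c.clip(0.1); // warm up on well-behaved gradients
    }
    // An outlier label produces a huge gradient; it is clamped to the band.
    let clipped = c.clip(50.0);
    println!("outlier 50.0 clipped to {clipped:.3}");
}
```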
pub fn monotone_constraints(self, constraints: Vec<i8>) -> Self
Set per-feature monotonic constraints.
+1 = non-decreasing, -1 = non-increasing, 0 = unconstrained.
Candidate splits violating monotonicity are rejected during tree growth.
pub fn quality_prune_alpha(self, alpha: f64) -> Self
Enable quality-based tree pruning with the given EWMA smoothing factor.
Trees whose marginal contribution drops below the threshold for
patience consecutive samples are replaced with fresh trees.
Suggested alpha: 0.01.
pub fn quality_prune_threshold(self, threshold: f64) -> Self
Set the minimum contribution threshold for quality-based pruning.
Default: 1e-6. Only relevant when quality_prune_alpha is set.
pub fn quality_prune_patience(self, patience: u64) -> Self
Set the patience (consecutive low-contribution samples) before pruning.
Default: 500. Only relevant when quality_prune_alpha is set.
pub fn error_weight_alpha(self, alpha: f64) -> Self
Enable error-weighted sample importance with the given EWMA smoothing factor.
Samples that the model predicted poorly receive higher effective weight. Suggested alpha: 0.01.
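One plausible reading of this mechanism, as a standalone sketch (the EWMA-relative weighting formula and the ErrorWeighter type below are assumptions for illustration, not the crate's documented internals):

```rust
/// Illustrative error-weighted sample importance: maintain an EWMA of
/// absolute prediction error and weight each sample by its error relative
/// to that running average. The exact formula here is an assumption.
struct ErrorWeighter {
    alpha: f64,    // EWMA smoothing factor, e.g. 0.01
    ewma_err: f64, // running mean absolute error
}

impl ErrorWeighter {
    fn weight(&mut self, abs_err: f64) -> f64 {
        self.ewma_err = (1.0 - self.alpha) * self.ewma_err + self.alpha * abs_err;
        // Poorly predicted samples (error above the average) get weight > 1.
        (abs_err / self.ewma_err.max(1e-12)).min(4.0) // capped for stability
    }
}

fn main() {
    let mut w = ErrorWeighter { alpha: 0.01, ewma_err: 1.0 };
    let easy = w.weight(0.1); // well-predicted sample -> down-weighted
    let hard = w.weight(5.0); // badly predicted sample -> up-weighted
    println!("easy = {easy:.3}, hard = {hard:.3}");
}
```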
pub fn uncertainty_modulated_lr(self, enabled: bool) -> Self
Enable σ-modulated learning rate for distributional models.
Scales the location (μ) learning rate by current_sigma / rolling_sigma_mean,
so the model adapts faster during high-uncertainty regimes and conserves
during stable periods. Only affects DistributionalSGBT.
By default uses empirical σ (EWMA of squared errors). Set
scale_mode(ScaleMode::TreeChain) for feature-conditional σ.
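The scaling rule described above is a simple ratio; as a standalone sketch (the modulated_lr function name is illustrative, not part of this crate):

```rust
/// Sketch of sigma-modulated learning rate: scale the location (mu)
/// learning rate by current_sigma / rolling_sigma_mean, so high-uncertainty
/// regimes learn faster and stable regimes learn more conservatively.
fn modulated_lr(base_lr: f64, current_sigma: f64, rolling_sigma_mean: f64) -> f64 {
    base_lr * (current_sigma / rolling_sigma_mean)
}

fn main() {
    let base = 0.05;
    // High-uncertainty regime: sigma above its rolling mean -> learn faster.
    println!("{:.4}", modulated_lr(base, 2.0, 1.0));
    // Stable regime: sigma below its rolling mean -> conserve.
    println!("{:.4}", modulated_lr(base, 0.5, 1.0));
}
```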
pub fn scale_mode(self, mode: ScaleMode) -> Self
Set the scale estimation mode for DistributionalSGBT.
pub fn empirical_sigma_alpha(self, alpha: f64) -> Self
EWMA alpha for empirical σ. Controls adaptation speed. Default 0.01.
Only used when scale_mode is Empirical.
pub fn max_leaf_output(self, max: f64) -> Self
Set the maximum absolute leaf output value.
Clamps leaf predictions to [-max, max], breaking feedback loops
that cause prediction explosions.
pub fn adaptive_leaf_bound(self, k: f64) -> Self
Set per-leaf adaptive output bound (sigma multiplier).
Each leaf tracks EWMA of its own output weight and clamps to
|output_mean| + k * output_std. Self-calibrating per-leaf.
Recommended: use with leaf_half_life for streaming scenarios.
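As a standalone sketch of the clamping rule (function and parameter names here are assumed for illustration):

```rust
/// Sketch of the per-leaf adaptive bound: clamp a leaf's raw output to
/// +/- (|output_mean| + k * output_std), per the rule described above,
/// where mean and std come from the leaf's own EWMA of past outputs.
fn adaptive_bound(output: f64, output_mean: f64, output_std: f64, k: f64) -> f64 {
    let bound = output_mean.abs() + k * output_std;
    output.clamp(-bound, bound)
}

fn main() {
    // A leaf whose outputs have averaged 0.2 with std 0.05, k = 3.0:
    // bound = |0.2| + 3.0 * 0.05 = 0.35.
    println!("{}", adaptive_bound(0.3, 0.2, 0.05, 3.0)); // within the bound
    println!("{}", adaptive_bound(1.0, 0.2, 0.05, 3.0)); // clamped to the bound
}
```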
pub fn min_hessian_sum(self, min_h: f64) -> Self
Set the minimum hessian sum for leaf output.
Fresh leaves with hess_sum < min_h return 0.0, preventing
post-replacement spikes.
pub fn huber_k(self, k: f64) -> Self
Set the Huber loss delta multiplier for DistributionalSGBT.
When set, location gradients use Huber loss with adaptive
delta = k * empirical_sigma. Standard value: 1.345 (95% Gaussian efficiency).
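The adaptive-delta rule can be sketched as follows. This is the standard Huber pseudo-residual with delta = k * sigma; the function name is illustrative and the crate's exact loss plumbing may differ:

```rust
/// Standard Huber location gradient (pseudo-residual form) with an
/// adaptive delta = k * empirical_sigma: the gradient equals the residual
/// inside the delta band (quadratic region) and is capped at +/- delta
/// outside it (linear region), limiting the pull of outlier labels.
fn huber_gradient(residual: f64, k: f64, empirical_sigma: f64) -> f64 {
    let delta = k * empirical_sigma;
    if residual.abs() <= delta {
        residual
    } else {
        delta * residual.signum()
    }
}

fn main() {
    let (k, sigma) = (1.345, 1.0); // 95% Gaussian efficiency constant
    println!("{}", huber_gradient(0.5, k, sigma));  // inlier: quadratic region
    println!("{}", huber_gradient(10.0, k, sigma)); // outlier: capped at delta
}
```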
pub fn shadow_warmup(self, warmup: usize) -> Self
Enable graduated tree handoff with the given shadow warmup samples.
Spawns an always-on shadow tree that trains alongside the active tree.
After warmup samples, the shadow begins contributing to predictions
via graduated blending. Eliminates prediction dips during tree replacement.
pub fn leaf_model_type(self, lmt: LeafModelType) -> Self
Set the leaf prediction model type.
LeafModelType::Linear is recommended for low-depth configurations
(depth 2–4) where per-leaf linear models reduce approximation error.
LeafModelType::Adaptive automatically selects between closed-form and
a trainable model per leaf, using the Hoeffding bound for promotion.
pub fn packed_refresh_interval(self, interval: u64) -> Self
Set the packed cache refresh interval for distributional models.
When non-zero, DistributionalSGBT
maintains a packed f32 cache refreshed every interval training samples.
0 (default) disables the cache.
pub fn build(self) -> Result<SGBTConfig>
Validate and build the configuration.
§Errors
Returns InvalidConfig with a structured
ConfigError if any parameter is out of its valid range.
Trait Implementations§
impl Clone for SGBTConfigBuilder
fn clone(&self) -> SGBTConfigBuilder
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.