pub struct HoeffdingTree { /* private fields */ }
Available on crate feature alloc only.
A streaming decision tree that uses Hoeffding-bound split decisions.
The tree grows incrementally: each call to train_one
routes one sample to its leaf, updates histograms, and potentially triggers
a split when statistical evidence is sufficient.
§Feature subsampling
When config.feature_subsample_rate < 1.0, each split evaluation considers
only a random subset of features (selected via a deterministic xorshift64 RNG).
This adds diversity when the tree is used inside an ensemble.
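The deterministic subsampling described above can be sketched as follows. This is a minimal illustration, not the crate's actual implementation; the function names are hypothetical, and the xorshift64 step uses the standard Marsaglia shift constants:

```rust
/// Standard xorshift64 step (shift constants 13/7/17).
/// The state must be seeded nonzero, or the generator stays at zero forever.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Pick a deterministic pseudo-random subset of feature indices for one
/// split evaluation; `rate` plays the role of `config.feature_subsample_rate`.
fn subsample_features(n_features: usize, rate: f64, rng_state: &mut u64) -> Vec<usize> {
    if rate >= 1.0 {
        // No subsampling: every feature is considered.
        return (0..n_features).collect();
    }
    (0..n_features)
        // Keep each feature with probability `rate`, mapping the PRNG
        // output into [0, 1).
        .filter(|_| (xorshift64(rng_state) as f64 / u64::MAX as f64) < rate)
        .collect()
}
```

Because the RNG state is plain `u64` data, two trees seeded identically draw identical subsets, which is what makes the subsampling reproducible across serialization round-trips.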
Implementations§
impl HoeffdingTree
pub fn new(config: TreeConfig) -> Self
Create a new HoeffdingTree with the given configuration.
The tree starts with a single root leaf and no feature information; the number of features is inferred from the first training sample.
pub fn from_arena(
    config: TreeConfig,
    arena: TreeArena,
    n_features: Option<usize>,
    samples_seen: u64,
    rng_state: u64,
) -> Self
Reconstruct a HoeffdingTree from a pre-built arena.
Used during model deserialization. The tree is restored with node topology and leaf values intact, but histogram accumulators are empty (they will rebuild naturally from continued training).
The root is assumed to be NodeId(0). Leaf states are created empty
for all current leaf nodes in the arena.
pub fn tree_config(&self) -> &TreeConfig
Immutable access to the tree configuration.
pub fn n_features(&self) -> Option<usize>
Number of features (learned from the first sample, None before any training).
pub fn leaf_grad_hess(&self, node: NodeId) -> Option<(f64, f64)>
Read-only access to the gradient and hessian sums for a leaf node.
Returns Some((grad_sum, hess_sum)) if node is a leaf with an active
leaf state, or None if the node has no state (e.g. internal node
or freshly allocated).
These sums enable inverse-hessian confidence estimation:
confidence = 1.0 / (hess_sum + lambda). High hessian means the leaf
has seen consistent, informative data; low hessian means uncertainty.
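The confidence formula above is a one-liner; a sketch (hypothetical helper name, using the `lambda` regularizer from the formula):

```rust
/// Inverse-hessian confidence as described above: a large hessian sum
/// means the leaf has accumulated plenty of informative data, so the
/// uncertainty (this value) is small.
fn leaf_confidence(hess_sum: f64, lambda: f64) -> f64 {
    1.0 / (hess_sum + lambda)
}
```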
pub fn predict_smooth(&self, features: &[f64], bandwidth: f64) -> f64
Predict using sigmoid-blended soft routing for smooth interpolation.
Instead of hard left/right routing at each split node, uses sigmoid
blending: alpha = sigmoid((threshold - feature) / bandwidth). The
prediction is alpha * left_pred + (1 - alpha) * right_pred, computed
recursively from root to leaves.
The result is a continuous function that varies smoothly with every feature change — no bins, no boundaries, no jumps.
§Arguments
features - Input feature vector.
bandwidth - Controls transition sharpness. Smaller = sharper (closer to hard splits), larger = smoother.
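The blending rule can be sketched on a toy tree. The `Node` type here is hypothetical (the crate's arena layout is private), but the recursion follows the formula above exactly:

```rust
/// Toy tree node for illustration; the real tree uses a private arena.
enum Node {
    Leaf(f64),
    Split { feature: usize, threshold: f64, left: Box<Node>, right: Box<Node> },
}

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Soft routing: blend left and right subtree predictions at every split.
fn predict_smooth(node: &Node, features: &[f64], bandwidth: f64) -> f64 {
    match node {
        Node::Leaf(v) => *v,
        Node::Split { feature, threshold, left, right } => {
            // alpha -> 1 when the feature is well below the threshold
            // (the sample clearly belongs on the left).
            let alpha = sigmoid((threshold - features[*feature]) / bandwidth);
            alpha * predict_smooth(left, features, bandwidth)
                + (1.0 - alpha) * predict_smooth(right, features, bandwidth)
        }
    }
}
```

Exactly at the threshold, alpha = sigmoid(0) = 0.5, so the prediction is the midpoint of the two children; far from the threshold it converges to hard routing.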
pub fn predict_smooth_auto(&self, features: &[f64], bandwidths: &[f64]) -> f64
Predict using per-feature auto-calibrated bandwidths.
Each feature uses its own bandwidth derived from median split threshold
gaps. Features with f64::INFINITY bandwidth fall back to hard routing.
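One plausible reading of that calibration (an assumption; the crate's exact rule is not shown here) is the median gap between consecutive sorted thresholds, falling back to f64::INFINITY when a feature has fewer than two thresholds:

```rust
/// Derive a per-feature bandwidth from the median gap between sorted
/// split thresholds. With fewer than two thresholds there are no gaps,
/// so return f64::INFINITY, which is documented to mean hard routing.
fn auto_bandwidth(mut thresholds: Vec<f64>) -> f64 {
    thresholds.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mut gaps: Vec<f64> = thresholds.windows(2).map(|w| w[1] - w[0]).collect();
    if gaps.is_empty() {
        return f64::INFINITY;
    }
    gaps.sort_by(|a, b| a.partial_cmp(b).unwrap());
    gaps[gaps.len() / 2] // upper median for even gap counts
}
```

The thresholds themselves would come from collect_split_thresholds_per_feature (below), one call of auto_bandwidth per feature.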
pub fn predict_interpolated(&self, features: &[f64]) -> f64
Predict with parent-leaf linear interpolation.
Routes to the leaf but blends the leaf prediction with the parent node’s preserved prediction based on the leaf’s hessian sum. Fresh leaves (low hess_sum) smoothly transition from parent prediction to their own:
alpha = leaf_hess / (leaf_hess + lambda)
pred = alpha * leaf_pred + (1 - alpha) * parent_pred
This fixes static predictions from leaves that split but haven’t accumulated enough samples to outperform their parent.
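The blend above is a direct translation of the two formulas (hypothetical helper name; `lambda` is the same regularizer as in the confidence formula):

```rust
/// Parent-leaf interpolation: a fresh leaf (small hess_sum) leans on the
/// parent's preserved prediction; a mature leaf uses its own value.
fn interpolate_with_parent(leaf_pred: f64, parent_pred: f64, leaf_hess: f64, lambda: f64) -> f64 {
    let alpha = leaf_hess / (leaf_hess + lambda);
    alpha * leaf_pred + (1.0 - alpha) * parent_pred
}
```

At leaf_hess = 0 this returns the parent prediction exactly, and it approaches the leaf's own prediction as evidence accumulates.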
pub fn predict_sibling_interpolated(
    &self,
    features: &[f64],
    bandwidths: &[f64],
) -> f64
Predict with sibling-based interpolation for feature-continuous predictions.
At the leaf’s parent split, blends the leaf prediction with its sibling’s prediction based on the feature’s distance from the split threshold:
Within the margin m around the threshold:
t = (feature - threshold + m) / (2 * m) (0 at left edge, 1 at right edge)
pred = (1 - t) * left_pred + t * right_pred
Outside the margin, returns the routed child’s prediction directly.
The margin m is derived from auto-bandwidths if available, otherwise
defaults to feature_range / n_bins heuristic per feature.
This makes predictions vary continuously as features move near split boundaries, eliminating step-function artifacts.
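The margin blend at a single split can be sketched as follows (hypothetical helper name; the formula is the one given above):

```rust
/// Sibling interpolation near a split threshold. Outside the margin `m`
/// the routed child's prediction is returned directly; inside it, the
/// two children are blended linearly from left (t = 0) to right (t = 1).
fn sibling_blend(feature: f64, threshold: f64, m: f64, left_pred: f64, right_pred: f64) -> f64 {
    if feature <= threshold - m {
        left_pred
    } else if feature >= threshold + m {
        right_pred
    } else {
        let t = (feature - threshold + m) / (2.0 * m); // 0 at left edge, 1 at right edge
        (1.0 - t) * left_pred + t * right_pred
    }
}
```

At the threshold itself t = 0.5, so the prediction is the midpoint of the two siblings; the piecewise-linear ramp meets the flat regions continuously at both margin edges.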
pub fn collect_split_thresholds_per_feature(&self) -> Vec<Vec<f64>>
Collect all split thresholds per feature from the tree arena.
Returns a Vec<Vec<f64>> indexed by feature, containing all thresholds
used in continuous splits. Categorical splits are excluded.
Trait Implementations§
impl Clone for HoeffdingTree
impl StreamingTree for HoeffdingTree
fn train_one(&mut self, features: &[f64], gradient: f64, hessian: f64)
Train the tree on a single sample.
Routes the sample to its leaf, updates histogram accumulators, and attempts a split if the Hoeffding bound is satisfied.
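The Hoeffding bound referenced here has the standard form ε = sqrt(R² · ln(1/δ) / (2n)): with probability 1 − δ, the true mean of an n-sample average of a variable with range R lies within ε of the observed average. A split is typically accepted once the observed gain gap between the two best candidates exceeds ε (how the crate sets R and δ is a configuration detail, assumed here):

```rust
/// Standard Hoeffding bound epsilon for an n-sample mean of a random
/// variable with range `range`, at confidence 1 - delta.
fn hoeffding_bound(range: f64, delta: f64, n: u64) -> f64 {
    ((range * range * (1.0 / delta).ln()) / (2.0 * n as f64)).sqrt()
}
```

The bound shrinks as 1/sqrt(n), which is why a leaf that keeps seeing samples eventually either splits or establishes that its candidate splits are statistically indistinguishable.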
fn predict(&self, features: &[f64]) -> f64
Predict the leaf value for a feature vector.
Routes from the root to a leaf via threshold comparisons and returns the leaf’s current weight.
fn n_samples_seen(&self) -> u64
Total number of samples seen since creation.