Available on crate features `alloc` and `parallel` only.
Parallel SGBT training with delayed gradient updates.
Instead of sequential gradient propagation through boosting steps, this module uses the full ensemble prediction as the gradient target for all steps simultaneously. Each step trains independently on the same gradient, enabling rayon-based parallelism across steps.
§Algorithm
For each incoming sample `(x, y)`:
- Compute the full ensemble prediction: `F(x) = base + lr * sum_s tree_s(x)`
- Compute gradient `g = loss.gradient(y, F(x))` and hessian `h = loss.hessian(y, F(x))`
- Pre-compute `train_count` for each step (sequential, uses RNG state)
- Train ALL steps in parallel with the same `(x, g, h)` and per-step `train_count`
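A minimal sketch of this update, with hypothetical `Step`, `predict`, and `train` names standing in for the module's actual types, and squared error standing in for `loss.gradient`/`loss.hessian`:

```rust
use rayon::prelude::*;

/// One boosting step (hypothetical minimal interface, for illustration only).
struct Step { /* per-step trees, RNG-derived state, ... */ }

impl Step {
    /// Train this step `count` times on the shared (x, g, h) target.
    fn train(&mut self, x: &[f32], g: f32, h: f32, count: u32) {
        // ... fit this step's tree(s) against the gradient/hessian ...
        let _ = (x, g, h, count);
    }

    fn predict(&self, x: &[f32]) -> f32 {
        let _ = x;
        0.0
    }
}

/// Delayed-gradient update: one gradient, computed at the full ensemble
/// prediction, is shared by every step, so the steps can train in parallel.
fn update(
    steps: &mut [Step],
    base: f32,
    lr: f32,
    x: &[f32],
    y: f32,
    train_counts: &[u32], // pre-computed sequentially (uses RNG state)
) {
    debug_assert_eq!(steps.len(), train_counts.len());

    // 1. Full ensemble prediction: F(x) = base + lr * sum_s tree_s(x)
    let fx = base + lr * steps.iter().map(|s| s.predict(x)).sum::<f32>();

    // 2. Gradient and hessian of the loss at the *full* prediction
    //    (squared error shown as a stand-in for loss.gradient / loss.hessian).
    let g = fx - y;
    let h = 1.0;

    // 3. All steps train independently on the same (x, g, h).
    steps
        .par_iter_mut()
        .zip(train_counts.par_iter())
        .for_each(|(step, &count)| step.train(x, g, h, count));
}
```

The only sequential parts are the full-ensemble prediction and the `train_count` pre-computation; everything inside the `par_iter_mut` loop is independent per step.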
This is a “delayed gradient” approach: all steps see the same gradient computed from the full ensemble prediction, rather than the sequential rolling prediction used in standard SGBT. This trades a small amount of gradient freshness for parallelism across boosting steps.
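The difference in gradient targets can be made concrete with a small, self-contained sketch (squared-error loss is used purely as a stand-in for `loss.gradient`; the function names are illustrative, not part of the module's API):

```rust
/// Per-step gradients in standard SGBT: step s sees the rolling
/// prediction F_{s-1}(x) built from the steps before it.
fn rolling_gradients(step_preds: &[f32], base: f32, lr: f32, y: f32) -> Vec<f32> {
    let mut fx = base;
    step_preds
        .iter()
        .copied()
        .map(|p| {
            let g = fx - y; // gradient at the rolling prediction, before this step
            fx += lr * p;   // roll the prediction forward past this step
            g
        })
        .collect()
}

/// Per-step gradients with delayed updates: every step sees the same
/// gradient, taken at the full ensemble prediction F(x).
fn delayed_gradients(step_preds: &[f32], base: f32, lr: f32, y: f32) -> Vec<f32> {
    let fx = base + lr * step_preds.iter().sum::<f32>();
    vec![fx - y; step_preds.len()]
}
```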
Requires the `parallel` feature flag for rayon-based parallelism. Without the feature, the module still compiles and works correctly using sequential iteration (identical results, just no multi-core speedup).
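One common way to express this kind of optional rayon dependency is to gate only the iterator behind the feature flag so that the loop body is shared; a rough sketch of that pattern (not necessarily the module's actual source) could be:

```rust
/// Apply `f` to every step, in parallel when the `parallel` feature is on.
#[cfg(feature = "parallel")]
fn for_each_step<T, F>(steps: &mut [T], f: F)
where
    T: Send,
    F: Fn(&mut T) + Sync + Send,
{
    use rayon::prelude::*;
    steps.par_iter_mut().for_each(f);
}

/// Sequential fallback: same signature, same loop body, no rayon.
#[cfg(not(feature = "parallel"))]
fn for_each_step<T, F>(steps: &mut [T], f: F)
where
    F: Fn(&mut T),
{
    steps.iter_mut().for_each(f);
}
```

Because every step receives the same `(x, g, h)` regardless of iteration order, both paths produce the same ensemble, which is what allows the identical-results guarantee above.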
§Trade-offs
- Pro: Near-linear speedup with the number of cores for large ensembles.
- Con: Gradient staleness may slow convergence slightly; this is typically compensated by a slightly higher learning rate or more training samples.
Structs§
- ParallelSGBT
- Parallel SGBT ensemble with delayed gradient updates.