
Module parallel

Available on crate features alloc and parallel only.

Parallel SGBT training with delayed gradient updates.

Instead of sequential gradient propagation through boosting steps, this module uses the full ensemble prediction as the gradient target for all steps simultaneously. Each step trains independently on the same gradient, enabling rayon-based parallelism across steps.

§Algorithm

For each incoming sample (x, y):

  1. Compute the full ensemble prediction: F(x) = base + lr * sum_s tree_s(x)
  2. Compute gradient g = loss.gradient(y, F(x)) and hessian h = loss.hessian(y, F(x))
  3. Pre-compute train_count for each step (sequential, uses RNG state)
  4. Train ALL steps in parallel with the same (x, g, h) and per-step train_count
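The four steps above can be sketched as follows. This is a hedged illustration with invented names (`Step`, `update`) rather than the crate's API; squared-error loss is assumed (so g = F(x) - y and h = 1), and real trees plus the per-step RNG train counts are reduced to stubs.

```rust
struct Step {
    value: f64, // stand-in for one boosting step's tree output
}

fn update(steps: &mut [Step], base: f64, lr: f64, _x: f64, y: f64) {
    // 1. Full ensemble prediction: F(x) = base + lr * sum of step outputs.
    let fx: f64 = base + lr * steps.iter().map(|s| s.value).sum::<f64>();
    // 2. Gradient and hessian of the loss at the full prediction
    //    (squared-error loss assumed here).
    let g = fx - y;
    let h = 1.0_f64;
    // 3. Per-step train counts (sequential RNG draws) are omitted in this stub.
    // 4. Every step trains on the SAME (x, g, h); with rayon this loop
    //    would become `steps.par_iter_mut().for_each(...)`.
    for s in steps.iter_mut() {
        s.value -= lr * g / h; // Newton-style leaf-update stub
    }
}
```

Because each step sees the same gradient, the body of step 4 has no cross-step data dependency, which is exactly what makes it safe to run in parallel.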

This is a “delayed gradient” approach: all steps see the same gradient computed from the full ensemble prediction, rather than the sequential rolling prediction used in standard SGBT. This trades a small amount of gradient freshness for parallelism across boosting steps.
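The difference between the two gradient schemes can be made concrete with a small numeric sketch (illustrative only, not the crate's API), again assuming squared-error loss where the gradient is simply prediction minus target:

```rust
/// Returns (delayed gradient, per-step sequential gradients).
fn gradients(base: f64, lr: f64, y: f64, trees: &[f64]) -> (f64, Vec<f64>) {
    // Delayed: one gradient taken at the full ensemble prediction F(x).
    let full: f64 = base + lr * trees.iter().sum::<f64>();
    let delayed = full - y;

    // Standard sequential SGBT: step s gets a fresh gradient at the
    // rolling partial prediction F_{s-1}(x).
    let mut rolling = base;
    let mut per_step = Vec::with_capacity(trees.len());
    for t in trees {
        per_step.push(rolling - y);
        rolling += lr * t;
    }
    (delayed, per_step)
}
```

For per-step tree outputs [0.4, 0.8] with base 0, lr 0.5, and y = 1, the delayed gradient is -0.4 for every step, while the sequential scheme would give the steps -1.0 and -0.8 respectively; that gap is the "gradient freshness" being traded away.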

Requires the parallel feature flag for rayon-based parallelism. Without the feature, the module still compiles and works correctly using sequential iteration (identical results, just no multi-core speedup).
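A feature-gated loop of this kind might look like the following hedged sketch (illustrative, not the crate's code): rayon's parallel iterator when the parallel feature is enabled, a plain sequential iterator otherwise, with both branches applying an identical per-step update so the results match.

```rust
fn train_all_steps(step_values: &mut [f64], g: f64, h: f64) {
    #[cfg(feature = "parallel")]
    {
        // Parallel path: rayon distributes steps across a thread pool.
        use rayon::prelude::*;
        step_values.par_iter_mut().for_each(|v| *v -= g / h);
    }
    #[cfg(not(feature = "parallel"))]
    // Sequential fallback: same update, single-threaded.
    step_values.iter_mut().for_each(|v| *v -= g / h);
}
```

Because the per-element closure is identical in both branches and touches no shared state, the two paths are deterministic and produce bit-identical results.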

§Trade-offs

  • Pro: Near-linear speedup with number of cores for large ensembles.
  • Con: Gradient staleness may slow convergence slightly; this is typically compensated for by a slightly higher learning rate or more training samples.

Structs§

ParallelSGBT
Parallel SGBT ensemble with delayed gradient updates.