Block 9 Phase 2/3 — device kernels that consume the row-primary Hessian
cache (the per-row r × r blocks materialised by
[crate::bms::BernoulliMarginalSlopeFamily::build_row_primary_hessian_cache]
and stored in [crate::bms::RowPrimaryEvalCache])
and emit either: