1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
//! Forward pass execution for SDDP training.
//!
//! [`run_forward_pass`] simulates scenario trajectories via stage LPs with the
//! current Future Cost Function. Outputs [`TrajectoryRecord`]s and [`ForwardResult`]
//! for the backward pass and synchronisation step. Parallelised across workers
//! with deterministic scenario assignment.
//!
//! The per-scenario hot loop lives in
//! `forward_pass_state::ForwardPassState::run`. All scratch is owned by the
//! workspace and reused across scenarios; `run_forward_stage` reuses the
//! `ws.scratch.unscaled_primal` buffer via `mem::take`/restore (taken out before
//! the solve so it can be filled while the `view` borrow of `ws` is live, then
//! restored after the last read), the same pattern as `simulation/pipeline.rs`.
//!
//! ## Submodule layout
//!
//! - [`sync_forward`] (`stats_aggregation`) — cross-rank `allgatherv` plus the
//! canonical-order Welford summation for rank-count-invariant upper-bound stats.
//! - [`build_delta_cut_row_batch_into`] (`delta_cut_batch`) — delta-cut `RowBatch`
//! construction for baked-template appends.
//! - `write_capture_metadata` (`basis_capture`) — captured-basis metadata after a
//! stage solve.
//! - `run_forward_stage` (`stage_solve`) — the per-(scenario, stage) LP-solve
//! kernel carrying the hot-path scratch reuse.
//! - [`build_sampler_from_ctx`] (`sampler`) — `ForwardSampler` construction.
//!
//! This `mod.rs` owns the result structs ([`ForwardResult`], [`SyncResult`]), the
//! per-call parameter bundles ([`ForwardPassBatch`], `StageKey`), and the thin
//! [`run_forward_pass`] shim that delegates to `ForwardPassState::run`. The
//! `pub use` block below re-exports every submodule item so the
//! `crate::forward::`, `cobre_sddp::forward::`, and `training::forward::` paths
//! resolve verbatim.
use Sender;
use TrainingEvent;
use ;
use crate::;
pub use build_delta_cut_row_batch_into;
pub use build_sampler_from_ctx;
pub use sync_forward;
// `pub(crate)` re-exports match the items' own visibility: `write_capture_metadata`
// and `run_forward_stage` are crate-internal, reached through `crate::forward::`
// (and `super::`) by `training/backward.rs`, `training/forward_pass_state.rs`, and
// the in-file DCS tests. A `pub` re-export would be a private-in-public error.
pub use write_capture_metadata;
pub use run_forward_stage;
/// Local statistics from one rank's forward pass.
///
/// Carries the individual per-scenario trajectory costs in global scenario
/// index order (scenario 0 first, scenario N-1 last). The synchronisation
/// step gathers these costs from all ranks via `allgatherv` and performs
/// canonical-order summation to produce bit-identical statistics regardless
/// of the number of MPI ranks or intra-rank worker threads.
///
/// Does not contain lower bound estimate (evaluated separately after backward pass).
/// Global upper bound statistics from forward synchronisation step.
/// Bundled scalar parameters for one forward pass invocation.
///
/// Groups the per-iteration, per-rank scalar arguments that are forwarded
/// from [`crate::train`] into [`run_forward_pass`].
/// Per-stage solve context for one (stage, scenario) pair in the forward pass.
///
/// Passed to [`run_forward_stage`] to bundle scalar and slice parameters and
/// keep the argument count within the clippy `too_many_arguments` threshold.
pub
/// Execute the forward pass for one training iteration on this rank.
///
/// Simulates this rank's share of forward-pass scenarios through the full
/// stage horizon, solving the stage LP at each `(scenario, stage)` pair.
/// Pre-allocated [`TrajectoryRecord`]s in `records` are populated in-place.
///
/// ## Argument layout
///
/// - `workspaces` — one [`SolverWorkspace`] per worker thread. Scenarios are
/// statically partitioned across workspaces; each workspace owns its solver,
/// patch buffer, and current-state buffer exclusively.
/// - `basis_store` — per-scenario, per-stage basis store pre-allocated by the
/// caller. The store is split into disjoint sub-views before the parallel
/// region; each worker writes the optimal basis for its own scenarios.
/// - `ctx` — per-stage LP layout and noise scaling parameters bundled into a
/// single [`crate::context::StageContext`]. Contains: stage templates, base
/// row indices, noise scale factors, hydro and load-bus counts, load-balance
/// row starts, load-bus index mapping, and per-stage block counts.
/// - `baked` — the per-stage baked all-cuts [`StageTemplate`]s.
/// - `fcf` — Future Cost Function carrying the current Benders cut pools.
/// - `training_ctx` — study-level [`crate::context::TrainingContext`] bundle.
/// Carries `horizon` (stage count), `indexer` (LP column/row layout),
/// `initial_state` (starting state for every scenario, length `n_state`),
/// `stochastic` (pre-built pipeline: tree, seed, dim), and `inflow_method`
/// (inflow non-negativity treatment — whether slack columns absorb negative
/// inflow).
/// - `batch` — per-iteration [`ForwardPassBatch`] config. Carries
/// `local_forward_passes` (scenarios assigned to this rank; the caller splits
/// the user's total across MPI ranks), `total_forward_passes`, `iteration`
/// (0-based, for seed derivation), and `fwd_offset` (global index of this
/// rank's first forward pass; `global_scenario = fwd_offset + m`).
/// - `records` — pre-allocated output slice of length
/// `local_forward_passes * num_stages`.
///
/// ## Record layout
///
/// `records[scenario * num_stages + stage]` holds the LP solution for scenario
/// `scenario` at 0-based stage `stage`.
///
/// ## Error handling
///
/// On `SolverError::Infeasible`, returns `SddpError::Infeasible` with the
/// 0-based stage and local scenario indices. On any other `SolverError`,
/// returns `SddpError::Solver`. On error, `records` may be partially
/// populated.
///
/// # Errors
///
/// Returns `Err(SddpError::Infeasible { .. })` when a stage LP has no
/// feasible solution. Returns `Err(SddpError::Solver(_))` for all other
/// terminal LP solver failures.
///
/// # Panics (debug builds only)
///
/// Panics if a debug precondition is violated (the assertions fire inside
/// `ForwardPassState::run`):
///
/// - `records.len() != batch.local_forward_passes * num_stages`
/// - `training_ctx.initial_state.len() != training_ctx.indexer.n_state`
///
/// `run_forward_pass` is a thin shim: it constructs a temporary
/// `ForwardPassState` + `ForwardPassInputs` and calls `run` on them. Production
/// callers use `TrainingSession::run_forward_phase`, which drives
/// `ForwardPassState::run` directly and bypasses this shim.