1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
use crate;
use crate;
use crateSolver;
use crateBasicState;
use crateTerminationReason;
/// Levenberg-Marquardt solver for nonlinear least-squares problems
/// `min ½‖r(x)‖²`, with Marquardt diagonal scaling and the Nielsen
/// 1999 smooth μ-update.
///
/// Each iteration solves the damped normal equations
/// `(JᵀJ + μ·D) h = −Jᵀr` via Cholesky, then adapts the damping
/// parameter μ from the gain ratio
/// `ρ = (F(x) − F(x+h)) / (L(0) − L(h))` (Nielsen eq. 2.2). On a
/// successful step (ρ > 0) μ is reduced via the smooth cubic
/// `μ ← μ · max(1/3, 1 − (2ρ−1)³)`; on a failed step (ρ ≤ 0) μ grows
/// geometrically `μ ← μ·ν, ν ← 2ν` with ν initialized to 2 — Nielsen
/// shows this avoids the discontinuities of the classical
/// multiply/divide threshold rule and lands roughly 25 % fewer
/// iterations on average. See Nielsen, *Damping Parameter in
/// Marquardt's Method* (IMM-REP-1999-05) for the derivation and
/// Madsen, Nielsen, Tingleff (2004), *Methods for Non-Linear Least
/// Squares Problems*, §3.2.
///
/// **Marquardt scaling (`μ·D`, not `μI`).** The damping matrix is the
/// diagonal of the Gram, `D = diag(JᵀJ)` — the per-parameter curvature
/// — rather than the identity. This makes the trust region ellipsoidal
/// in the metric of the columns of `J`, so the algorithm is invariant
/// to diagonal rescaling of the parameters (Marquardt 1963; Moré 1978,
/// *The Levenberg-Marquardt Algorithm: Implementation and Theory*).
/// Isotropic `μI` damping over-damps well-scaled directions and
/// under-damps poorly-scaled ones when the columns of `J` have very
/// different norms (e.g. parameters in a mixed log/linear/angle
/// encoding), which biases the step and can pull the iterate into a
/// worse basin. `D` is maintained as a **monotone running max**
/// `D_k = max(D_{k−1}, diag(J(x_k)ᵀJ(x_k)))` so a column whose
/// curvature momentarily drops keeps the damping floor it earned
/// earlier (Moré 1978; the same safeguard MINPACK applies to its
/// column-norm scaling). Columns that are exactly zero at `x₀` (a
/// parameter with no first-order effect on any residual) would make
/// `μ·D` vanish there and the Gram singular; following MINPACK, their
/// scale is floored to `1` at `init` (see [`FloorZerosInPlace`]), so a
/// fully-insensitive parameter stays put rather than failing Cholesky.
///
/// Initial damping is `μ₀ = τ` — dimensionless, because the
/// per-parameter magnitude now lives in `D` (the initial per-column
/// damping is `τ·diag(J(x₀)ᵀJ(x₀))`). τ is the *relative* trust
/// parameter; use a smaller value (e.g. `1e-6`) when `x₀` is believed
/// close to the optimum, larger (e.g. `1.0`) when far. Default
/// `τ = 10⁻³` matches Nielsen's "moderate trust" recommendation.
///
/// **Cholesky-on-(JᵀJ + μ·D) vs QR-on-stacked-system.** The damping
/// makes the SPD path strictly better-conditioned than pure
/// Gauss-Newton's `JᵀJ` — `μ·D` regularizes the rank deficiency that
/// makes GN fail. We stay on the SPD path because that's the only one
/// the [`linalg`](crate::core::math) tier exposes today, and the
/// regularization is sufficient for unconstrained LM.
/// QR-on-stacked-system (`[J; √(μD)]`) is more robust to ill-conditioned
/// `J` near rank deficiency but adds a second factorization route to
/// the linalg surface; deferred until S6 (TRF), where rank-deficient
/// Jacobians and box constraints make QR materially better.
///
/// # Failure modes
///
/// - **Cholesky failure under bumped μ.** When the initial damping is
/// too small to make `JᵀJ + μ·D` SPD (effectively never, for any
/// sensible `JᵀJ` and finite μ), the inner damping loop bumps μ via
/// `μ := μ·ν, ν := 2ν` and retries. Default
/// [`max_inner_attempts`](Self::max_inner_attempts) is 50 — far more
/// than enough; in practice the first attempt succeeds. If the cap
/// is exhausted (μ overflowing to `inf`), the solver returns
/// [`TerminationReason::SolverFailed`]. Note that bumping μ cannot
/// rescue a coordinate whose `D` entry is zero (`μ·0 = 0`); the
/// `init` zero-column floor exists precisely to keep `D > 0`.
/// - **Divergence on highly nonlinear / poorly initialized problems.**
/// The damping itself prevents divergent steps (failed steps are
/// rejected via the gain-ratio test), so divergence manifests as
/// μ growing without bound. Catch this with
/// [`MaxIter`](crate::core::termination::MaxIter) on the executor.
///
/// # Termination
///
/// Beyond the framework criteria
/// ([`MaxIter`](crate::core::termination::MaxIter),
/// [`CostTolerance`](crate::core::termination::CostTolerance),
/// [`ParamTolerance`](crate::core::termination::ParamTolerance), …),
/// the solver emits [`TerminationReason::SolverConverged`] when any of
/// four MINPACK-style tests is satisfied — the same independent
/// `info`-code structure MINPACK uses, so converging on whichever fires
/// first:
///
/// - **`tol_grad`** — absolute first-order optimality (Madsen et al.
/// eq. 3.3a): `‖Jᵀr‖_∞ ≤ tol_grad`. Default `1e-8`; `0.0` disables.
/// - **`tol_grad_rel`** — relative first-order optimality, MINPACK
/// `gtol` (Moré 1978): the cosine of the angle between the residual
/// `r` and every column of `J`,
/// `max_j |gⱼ| / (‖J·,ⱼ‖ · ‖r‖) ≤ tol_grad_rel`. This measure is
/// dimensionless — invariant to scaling the residuals — so a single
/// tolerance is portable across problems whose residuals carry
/// different normalizations (where the absolute `‖Jᵀr‖_∞` is too
/// tight for some and too loose for others). Default `0.0`
/// (disabled); set e.g. `1e-8` for parity. The per-column norms
/// `‖J·,ⱼ‖ = √diag(JᵀJ)ⱼ` reuse the Marquardt scaling diagonal the
/// solver already forms.
/// - **`ftol`** — relative cost reduction, MINPACK `ftol` (Moré 1978):
/// `|actred| ≤ ftol·F ∧ prered ≤ ftol·F ∧ ρ ≤ 2`, with the actual
/// and *predicted* per-iteration reductions in `F = ½‖r‖²`. The
/// `prered` clause is what the framework's
/// [`RelativeCostTolerance`](crate::core::termination::RelativeCostTolerance)
/// cannot express — it gates on the LM model, so the solver iterates
/// through temporary settling points (small actual gain, large
/// predicted gain) instead of stopping short. Default `0.0`
/// (disabled). See [`ftol`](Self::ftol).
/// - **`xtol`** — relative step, MINPACK `xtol` (Moré 1978):
/// `‖h‖ ≤ xtol·‖x‖`. Default `0.0` (disabled). See [`xtol`](Self::xtol).
///
/// The two gradient tests run before the step is computed (a step at a
/// stationary point is wasted); `ftol`/`xtol` run after, since they need
/// the attempted step and its predicted/actual reduction.
///
/// LM deliberately leaves `state.gradient = None` — the framework's
/// [`GradientTolerance`](crate::core::termination::GradientTolerance)
/// uses the L2 squared norm and is the wrong metric for NLLS, where
/// the canonical first-order test is the ∞-norm of `Jᵀr`. Same choice
/// as [`GaussNewton`](super::GaussNewton).
///
/// # Backends
///
/// LA-heavy: nalgebra (`DVector<f64>` / `DMatrix<f64>`) and faer
/// (`Col<f64>` / `Mat<f64>`) at the dense tier; nalgebra-sparse
/// (`DVector<f64>` / `CscMatrix<f64>`) and faer-sparse (`Col<f64>` /
/// `SparseColMat<usize, f64>`) at the sparse tier. `Vec<f64>` and
/// `ndarray::Array1<f64>` produce a compile-time error per tenet 5.
/// The sparse damping path requires the diagonal of `JᵀJ` to be in the
/// CSC pattern (always true when `J` has no zero columns); see
/// [`AddDiagonalVectorInPlace`] and [`MatDiagonal`].
///
/// # State convention
///
/// `state.cost` carries the LM convention `½‖r‖²`, derived from the
/// residual the solver evaluates itself. The bound on `P` is
/// [`Residual`] + [`Jacobian`], not
/// [`CostFunction`](crate::core::problem::CostFunction); problems
/// whose user-facing `cost()` uses an unscaled `Σ rᵢ²` form will see
/// `state.cost()` differ from `problem.cost(state.param())` by a
/// factor of two. Both go to zero at the optimum, so cost-based
/// termination criteria are unaffected.
///
/// # Examples
///
/// Least-squares fit of an affine residual `r(x) = (x₀ − 1, x₁ − 2)` whose
/// minimum is `(1, 2)`. Levenberg–Marquardt binds on [`Residual`] +
/// [`Jacobian`] (not [`CostFunction`](crate::core::problem::CostFunction))
/// and runs on the matrix-capable backends:
///
/// ```
/// # #[cfg(feature = "nalgebra")] {
/// use basin::{BasicState, Executor, Jacobian, LevenbergMarquardt, Residual};
/// use nalgebra::{DMatrix, DVector};
///
/// struct Affine;
/// impl Residual for Affine {
/// type Param = DVector<f64>;
/// type Output = DVector<f64>;
/// type Error = std::convert::Infallible;
/// fn residual(&self, x: &DVector<f64>) -> Result<DVector<f64>, Self::Error> {
/// Ok(DVector::from_vec(vec![x[0] - 1.0, x[1] - 2.0]))
/// }
/// }
/// impl Jacobian for Affine {
/// type Jacobian = DMatrix<f64>;
/// fn jacobian(&self, _x: &DVector<f64>) -> Result<DMatrix<f64>, Self::Error> {
/// Ok(DMatrix::identity(2, 2))
/// }
/// }
///
/// let result = Executor::new(
/// Affine,
/// LevenbergMarquardt::new(),
/// BasicState::new(DVector::from_vec(vec![0.0, 0.0])),
/// )
/// .max_iter(50)
/// .run()
/// .unwrap();
/// assert!((result.param()[0] - 1.0).abs() < 1e-6);
/// assert!((result.param()[1] - 2.0).abs() < 1e-6);
/// # }
/// ```