// apex_solver/optimizer/levenberg_marquardt.rs
//! Levenberg-Marquardt algorithm implementation.
//!
//! The Levenberg-Marquardt (LM) method is a robust and widely-used algorithm for solving
//! nonlinear least squares problems of the form:
//!
//! ```text
//! min f(x) = ½||r(x)||² = ½Σᵢ rᵢ(x)²
//! ```
//!
//! where `r: ℝⁿ → ℝᵐ` is the residual vector function.
//!
//! # Algorithm Overview
//!
//! The Levenberg-Marquardt method solves the damped normal equations at each iteration:
//!
//! ```text
//! (J^T·J + λI)·h = -J^T·r
//! ```
//!
//! where:
//! - `J` is the Jacobian matrix (m × n)
//! - `r` is the residual vector (m × 1)
//! - `h` is the step vector (n × 1)
//! - `λ` is the adaptive damping parameter (scalar)
//! - `I` is the identity matrix (or diagonal scaling matrix)
//!
//! ## Damping Parameter Strategy
//!
//! The damping parameter λ adapts based on step quality:
//!
//! - **λ → 0** (small damping): Behaves like Gauss-Newton, with fast (near-quadratic) local convergence
//! - **λ → ∞** (large damping): Behaves like gradient descent, with a guaranteed descent direction
//!
//! This interpolation between Gauss-Newton and gradient descent provides excellent robustness
//! while maintaining fast convergence near the solution.
//!
//! ## Step Acceptance and Damping Update
//!
//! The algorithm evaluates each proposed step using the gain ratio:
//!
//! ```text
//! ρ = (actual reduction) / (predicted reduction)
//!   = [f(xₖ) - f(xₖ + h)] / [f(xₖ) - L(h)]
//! ```
//!
//! where `L(h) = f(xₖ) + h^T·g + ½h^T·H·h` is the local quadratic model.
//!
//! **Step acceptance:**
//! - If `ρ > 0`: Accept step (cost decreased), decrease λ to trust the model more
//! - If `ρ ≤ 0`: Reject step (cost increased), increase λ to be more conservative
//!
//! **Damping update** (Nielsen's formula):
//! ```text
//! λₖ₊₁ = λₖ · max(1/3, 1 - (2ρ - 1)³)
//! ```
//!
//! This provides smooth, data-driven adaptation of the damping parameter.
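//!
//! A few hand-worked values of Nielsen's factor (illustration only, not output from
//! this crate):
//!
//! ```text
//! ρ = 1.0 (perfect model):  max(1/3, 1 - (2·1.0 - 1)³) = max(1/3, 0)     = 1/3    → λ shrinks 3×
//! ρ = 0.9 (very good step): max(1/3, 1 - 0.8³)         = max(1/3, 0.488) = 0.488  → λ roughly halves
//! ρ = 0.5 (marginal step):  max(1/3, 1 - 0³)           = max(1/3, 1)     = 1      → λ unchanged
//! Rejected step (ρ ≤ 0):    λ ← λ·ν, with ν doubling after each consecutive rejection
//! ```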
//!
//! ## Convergence Properties
//!
//! - **Global convergence**: Under standard assumptions, converges to a stationary point even from poor starting guesses
//! - **Fast local convergence**: Near the solution, behaves like Gauss-Newton (quadratic for zero-residual problems)
//! - **Robust to poor initialization**: Adaptive damping prevents divergence
//! - **Handles ill-conditioning**: Large λ stabilizes a nearly singular Hessian
//!
//! ## When to Use
//!
//! Levenberg-Marquardt is the best general-purpose choice when:
//! - The initial parameter guess may be far from the optimum
//! - Problem conditioning is unknown
//! - Robustness is prioritized over raw speed
//! - You want reliable convergence across diverse problem types
//!
//! For problems with specific structure, consider:
//! - [`GaussNewton`](crate::GaussNewton) if well-conditioned with good initialization
//! - [`DogLeg`](crate::DogLeg) for explicit trust region control
//!
//! # Implementation Features
//!
//! - **Sparse matrix support**: Efficient handling of large-scale problems via the `faer` sparse library
//! - **Adaptive damping**: Nielsen's formula for smooth parameter adaptation
//! - **Robust linear solvers**: Cholesky (fast) or QR (stable) factorization
//! - **Jacobi scaling**: Optional diagonal preconditioning for mixed-scale problems
//! - **Covariance computation**: Optional uncertainty quantification after convergence
//! - **Manifold operations**: Native support for optimization on Lie groups (SE2, SE3, SO2, SO3)
//! - **Comprehensive diagnostics**: Detailed summaries of convergence and performance
//!
//! # Mathematical Background
//!
//! The augmented Hessian `J^T·J + λI` combines two beneficial properties:
//!
//! 1. **Positive definiteness**: Always solvable (for λ > 0) even when `J^T·J` is singular
//! 2. **Regularization**: Prevents taking steps in poorly-determined directions
//!
//! The trust region interpretation: λ controls an implicit spherical trust region where
//! larger λ restricts the step size, ensuring the linear model remains valid.
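//!
//! A nearly rank-deficient toy system makes the regularization concrete (hand-worked
//! illustration, not output from this crate):
//!
//! ```text
//! J^T·J = [1  0]    (ε ≈ 0: the second parameter is barely observed)
//!         [0  ε]
//!
//! J^T·J + λI = [1+λ   0 ]   well-conditioned for λ ≫ ε; the step in the weak
//!              [ 0   ε+λ]   direction, -g₂/(ε+λ), stays bounded instead of
//!                           blowing up to -g₂/ε as it would without damping.
//! ```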
//!
//! # Examples
//!
//! ## Basic usage
//!
//! ```no_run
//! use apex_solver::LevenbergMarquardt;
//! use apex_solver::core::problem::Problem;
//! use std::collections::HashMap;
//!
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let mut problem = Problem::new();
//! // ... add residual blocks (factors) to problem ...
//!
//! let initial_values = HashMap::new();
//! // ... initialize parameters ...
//!
//! let mut solver = LevenbergMarquardt::new();
//! let result = solver.optimize(&problem, &initial_values)?;
//!
//! # Ok(())
//! # }
//! ```
//!
//! ## Advanced configuration
//!
//! ```no_run
//! use apex_solver::optimizer::levenberg_marquardt::{LevenbergMarquardtConfig, LevenbergMarquardt};
//! use apex_solver::linalg::LinearSolverType;
//!
//! # fn main() {
//! let config = LevenbergMarquardtConfig::new()
//!     .with_max_iterations(100)
//!     .with_cost_tolerance(1e-6)
//!     .with_damping(1e-3)               // Initial damping
//!     .with_damping_bounds(1e-12, 1e12) // Min/max damping
//!     .with_jacobi_scaling(true);       // Improve conditioning
//!
//! let mut solver = LevenbergMarquardt::with_config(config);
//! # }
//! ```
//!
//! # References
//!
//! - Levenberg, K. (1944). "A Method for the Solution of Certain Non-Linear Problems in Least Squares". *Quarterly of Applied Mathematics*.
//! - Marquardt, D. W. (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters". *Journal of the Society for Industrial and Applied Mathematics*.
//! - Madsen, K., Nielsen, H. B., & Tingleff, O. (2004). *Methods for Non-Linear Least Squares Problems* (2nd ed.). Chapter 3.
//! - Nocedal, J. & Wright, S. (2006). *Numerical Optimization* (2nd ed.). Springer. Chapter 10.
//! - Nielsen, H. B. (1999). "Damping Parameter in Marquardt's Method". Technical Report IMM-REP-1999-05.

use crate::core::problem::{Problem, VariableEnum};
use crate::error;
use crate::linalg::{
    LinearSolverType, SchurPreconditioner, SchurSolverAdapter, SchurVariant, SparseCholeskySolver,
    SparseLinearSolver, SparseQRSolver,
};
use crate::optimizer::{
    ConvergenceParams, InitializedState, IterationStats, OptObserverVec, OptimizerError, Solver,
    SolverResult, apply_negative_parameter_step, apply_parameter_step, compute_cost,
};
use apex_manifolds::ManifoldType;

use faer::{Mat, sparse::SparseColMat};
use nalgebra::DVector;
use std::collections::HashMap;
use std::time::{Duration, Instant};
use tracing::debug;

/// Configuration parameters for the Levenberg-Marquardt optimizer.
///
/// Controls the adaptive damping strategy, convergence criteria, and numerical stability
/// enhancements for the Levenberg-Marquardt algorithm.
///
/// # Builder Pattern
///
/// All configuration options can be set using the builder pattern:
///
/// ```
/// use apex_solver::optimizer::levenberg_marquardt::LevenbergMarquardtConfig;
///
/// let config = LevenbergMarquardtConfig::new()
///     .with_max_iterations(100)
///     .with_damping(1e-3)
///     .with_damping_bounds(1e-12, 1e12)
///     .with_jacobi_scaling(true);
/// ```
///
/// # Damping Parameter Behavior
///
/// The damping parameter λ controls the trade-off between Gauss-Newton and gradient descent:
///
/// - **Initial damping** (`damping`): Starting value (default: 1e-3)
/// - **Damping bounds** (`damping_min`, `damping_max`): Valid range (default: 1e-12 to 1e12)
/// - **Adaptation**: Automatically adjusted based on step quality using Nielsen's formula
///
/// # Convergence Criteria
///
/// The optimizer terminates when ANY of the following conditions is met:
///
/// - **Cost tolerance**: `|cost_k - cost_{k-1}| < cost_tolerance`
/// - **Parameter tolerance**: `||step|| < parameter_tolerance`
/// - **Gradient tolerance**: `||J^T·r|| < gradient_tolerance`
/// - **Maximum iterations**: `iteration >= max_iterations`
/// - **Timeout**: `elapsed_time >= timeout`
///
/// # See Also
///
/// - [`LevenbergMarquardt`] - The solver that uses this configuration
/// - [`GaussNewtonConfig`](crate::GaussNewtonConfig) - Undamped variant
/// - [`DogLegConfig`](crate::DogLegConfig) - Trust region alternative
#[derive(Clone)]
pub struct LevenbergMarquardtConfig {
    /// Type of linear solver for the linear systems
    pub linear_solver_type: LinearSolverType,
    /// Maximum number of iterations
    pub max_iterations: usize,
    /// Convergence tolerance for cost function
    pub cost_tolerance: f64,
    /// Convergence tolerance for parameter updates
    pub parameter_tolerance: f64,
    /// Convergence tolerance for gradient norm
    pub gradient_tolerance: f64,
    /// Timeout duration
    pub timeout: Option<Duration>,
    /// Initial damping parameter
    pub damping: f64,
    /// Minimum damping parameter
    pub damping_min: f64,
    /// Maximum damping parameter
    pub damping_max: f64,
    /// Damping increase factor (when step rejected)
    pub damping_increase_factor: f64,
    /// Damping decrease factor (when step accepted)
    pub damping_decrease_factor: f64,
    /// Damping nu parameter (growth factor applied on consecutive rejected steps)
    pub damping_nu: f64,
    /// Trust region radius
    pub trust_region_radius: f64,
    /// Minimum step quality for acceptance
    pub min_step_quality: f64,
    /// Good step quality threshold
    pub good_step_quality: f64,
    /// Minimum diagonal value for regularization
    pub min_diagonal: f64,
    /// Maximum diagonal value for regularization
    pub max_diagonal: f64,
    /// Minimum objective function cutoff (optional early termination)
    ///
    /// If set, optimization terminates when cost falls below this threshold.
    /// Useful for early stopping when a "good enough" solution is acceptable.
    ///
    /// Default: None (disabled)
    pub min_cost_threshold: Option<f64>,
    /// Minimum trust region radius before termination
    ///
    /// When the trust region radius falls below this value, the optimizer
    /// terminates as it indicates the search has converged or the problem
    /// is ill-conditioned. Matches Ceres Solver's min_trust_region_radius.
    ///
    /// Default: 1e-32 (Ceres-compatible)
    pub min_trust_region_radius: f64,
    /// Maximum condition number for Jacobian matrix (optional check)
    ///
    /// If set, the optimizer checks if condition_number(J^T*J) exceeds this
    /// threshold and terminates with IllConditionedJacobian status.
    /// Note: Computing the condition number is expensive, so this is disabled by default.
    ///
    /// Default: None (disabled)
    pub max_condition_number: Option<f64>,
    /// Minimum relative cost decrease for step acceptance
    ///
    /// Used in computing step quality (rho = actual_reduction / predicted_reduction).
    /// Steps with rho < min_relative_decrease are rejected. Matches Ceres Solver's
    /// min_relative_decrease parameter.
    ///
    /// Default: 1e-3 (Ceres-compatible)
    pub min_relative_decrease: f64,
    /// Use Jacobi column scaling (preconditioning)
    ///
    /// When enabled, normalizes Jacobian columns by their L2 norm before solving.
    /// This can improve convergence for problems with mixed parameter scales
    /// (e.g., positions in meters + angles in radians) but adds ~5-10% overhead.
    ///
    /// Default: false (disabled to avoid the overhead on problems that do not need it)
    pub use_jacobi_scaling: bool,
    /// Compute per-variable covariance matrices (uncertainty estimation)
    ///
    /// When enabled, computes covariance by inverting the Hessian matrix after
    /// convergence. The full covariance matrix is extracted into per-variable
    /// blocks stored in both Variable structs and SolverResult.
    ///
    /// Default: false (to avoid performance overhead)
    pub compute_covariances: bool,
    /// Schur complement solver variant (for bundle adjustment problems)
    ///
    /// When using LinearSolverType::SparseSchurComplement, this determines which
    /// variant of the Schur complement method to use:
    /// - Sparse: Direct sparse Cholesky factorization (most accurate, moderate speed)
    /// - Iterative: Preconditioned Conjugate Gradients (memory efficient, good for large problems)
    /// - PowerSeries: Power series approximation (fastest, less accurate)
    ///
    /// Default: Sparse
    pub schur_variant: SchurVariant,
    /// Schur complement preconditioner type
    ///
    /// Determines the preconditioning strategy for iterative Schur methods:
    /// - Diagonal: Simple diagonal preconditioner (fast, less effective)
    /// - BlockDiagonal: Block-diagonal preconditioner (balanced)
    /// - IncompleteCholesky: Incomplete Cholesky factorization (slower, more effective)
    ///
    /// Default: Diagonal
    pub schur_preconditioner: SchurPreconditioner,
    // Note: Visualization is now handled via the observer pattern.
    // Use `solver.add_observer(RerunObserver::new(true)?)` to enable visualization.
    // This provides cleaner separation of concerns and allows multiple observers.
}

impl Default for LevenbergMarquardtConfig {
    fn default() -> Self {
        Self {
            linear_solver_type: LinearSolverType::default(),
            // Ceres Solver default: 50 (changed from 100 for compatibility)
            max_iterations: 50,
            // Ceres Solver default: 1e-6 (changed from 1e-8 for compatibility)
            cost_tolerance: 1e-6,
            // Ceres Solver default: 1e-8 (unchanged)
            parameter_tolerance: 1e-8,
            // Ceres Solver default: 1e-10 (changed from 1e-8 for compatibility)
            // Note: Typically should be 1e-4 * cost_tolerance per Ceres docs
            gradient_tolerance: 1e-10,
            timeout: None,
            damping: 1e-3, // Increased from 1e-4 for better initial convergence on BA
            damping_min: 1e-12,
            damping_max: 1e12,
            damping_increase_factor: 10.0,
            damping_decrease_factor: 0.3,
            damping_nu: 2.0,
            trust_region_radius: 1e4,
            min_step_quality: 0.0,
            good_step_quality: 0.75,
            min_diagonal: 1e-6,
            max_diagonal: 1e32,
            // New Ceres-compatible parameters
            min_cost_threshold: None,
            min_trust_region_radius: 1e-32,
            max_condition_number: None,
            min_relative_decrease: 1e-3,
            // Existing parameters
            // Jacobi scaling disabled by default for Schur solvers (incompatible with block structure)
            // Enable manually for Cholesky/QR solvers on mixed-scale problems
            use_jacobi_scaling: false,
            compute_covariances: false,
            // Schur complement parameters
            schur_variant: SchurVariant::default(),
            schur_preconditioner: SchurPreconditioner::default(),
        }
    }
}

impl LevenbergMarquardtConfig {
    /// Create a new Levenberg-Marquardt configuration with default values.
    pub fn new() -> Self {
        Self::default()
    }

    /// Set the linear solver type
    pub fn with_linear_solver_type(mut self, linear_solver_type: LinearSolverType) -> Self {
        self.linear_solver_type = linear_solver_type;
        self
    }

    /// Set the maximum number of iterations
    pub fn with_max_iterations(mut self, max_iterations: usize) -> Self {
        self.max_iterations = max_iterations;
        self
    }

    /// Set the cost tolerance
    pub fn with_cost_tolerance(mut self, cost_tolerance: f64) -> Self {
        self.cost_tolerance = cost_tolerance;
        self
    }

    /// Set the parameter tolerance
    pub fn with_parameter_tolerance(mut self, parameter_tolerance: f64) -> Self {
        self.parameter_tolerance = parameter_tolerance;
        self
    }

    /// Set the gradient tolerance
    pub fn with_gradient_tolerance(mut self, gradient_tolerance: f64) -> Self {
        self.gradient_tolerance = gradient_tolerance;
        self
    }

    /// Set the timeout duration
    pub fn with_timeout(mut self, timeout: Duration) -> Self {
        self.timeout = Some(timeout);
        self
    }

    /// Set the initial damping parameter.
    pub fn with_damping(mut self, damping: f64) -> Self {
        self.damping = damping;
        self
    }

    /// Set the damping parameter bounds.
    pub fn with_damping_bounds(mut self, min: f64, max: f64) -> Self {
        self.damping_min = min;
        self.damping_max = max;
        self
    }

    /// Set the damping adjustment factors.
    pub fn with_damping_factors(mut self, increase: f64, decrease: f64) -> Self {
        self.damping_increase_factor = increase;
        self.damping_decrease_factor = decrease;
        self
    }

    /// Set the trust region parameters.
    pub fn with_trust_region(mut self, radius: f64, min_quality: f64, good_quality: f64) -> Self {
        self.trust_region_radius = radius;
        self.min_step_quality = min_quality;
        self.good_step_quality = good_quality;
        self
    }

    /// Set minimum objective function cutoff for early termination.
    ///
    /// When set, optimization terminates with MinCostThresholdReached status
    /// if the cost falls below this threshold. Useful for early stopping when
    /// a "good enough" solution is acceptable.
    pub fn with_min_cost_threshold(mut self, min_cost: f64) -> Self {
        self.min_cost_threshold = Some(min_cost);
        self
    }

    /// Set minimum trust region radius before termination.
    ///
    /// When the trust region radius falls below this value, optimization
    /// terminates with TrustRegionRadiusTooSmall status.
    /// Default: 1e-32 (Ceres-compatible)
    pub fn with_min_trust_region_radius(mut self, min_radius: f64) -> Self {
        self.min_trust_region_radius = min_radius;
        self
    }

    /// Set maximum condition number for Jacobian matrix.
    ///
    /// If set, the optimizer checks if condition_number(J^T*J) exceeds this
    /// threshold and terminates with IllConditionedJacobian status.
    /// Note: Computing the condition number is expensive, so this check is disabled by default.
    pub fn with_max_condition_number(mut self, max_cond: f64) -> Self {
        self.max_condition_number = Some(max_cond);
        self
    }

    /// Set minimum relative cost decrease for step acceptance.
    ///
    /// Steps with rho = (actual_reduction / predicted_reduction) below this
    /// threshold are rejected. Default: 1e-3 (Ceres-compatible)
    pub fn with_min_relative_decrease(mut self, min_decrease: f64) -> Self {
        self.min_relative_decrease = min_decrease;
        self
    }

    /// Enable or disable Jacobi column scaling (preconditioning).
    ///
    /// When enabled, normalizes Jacobian columns by their L2 norm before solving.
    /// Can improve convergence for mixed-scale problems but adds ~5-10% overhead.
    pub fn with_jacobi_scaling(mut self, use_jacobi_scaling: bool) -> Self {
        self.use_jacobi_scaling = use_jacobi_scaling;
        self
    }

    /// Enable or disable covariance computation (uncertainty estimation).
    ///
    /// When enabled, computes the full covariance matrix by inverting the Hessian
    /// after convergence, then extracts per-variable covariance blocks.
    pub fn with_compute_covariances(mut self, compute_covariances: bool) -> Self {
        self.compute_covariances = compute_covariances;
        self
    }

    /// Set Schur complement solver variant
    pub fn with_schur_variant(mut self, variant: SchurVariant) -> Self {
        self.schur_variant = variant;
        self
    }

    /// Set Schur complement preconditioner
    pub fn with_schur_preconditioner(mut self, preconditioner: SchurPreconditioner) -> Self {
        self.schur_preconditioner = preconditioner;
        self
    }

    /// Configuration optimized for bundle adjustment problems.
    ///
    /// This preset uses settings tuned for large-scale bundle adjustment:
    /// - **Schur complement solver** with iterative PCG (memory efficient)
    /// - **Schur-Jacobi preconditioner** (Ceres-style, best PCG convergence)
    /// - **Moderate initial damping** (1e-3) - not too aggressive
    /// - **20 max iterations** - allows early stopping once reprojection error is small
    /// - **Ceres-compatible tolerances** for accurate reconstruction
    ///
    /// This configuration follows Ceres Solver's recommended BA settings and
    /// should achieve similar convergence quality.
    ///
    /// # Example
    ///
    /// ```
    /// use apex_solver::optimizer::levenberg_marquardt::LevenbergMarquardtConfig;
    ///
    /// let config = LevenbergMarquardtConfig::for_bundle_adjustment();
    /// ```
    pub fn for_bundle_adjustment() -> Self {
        Self::default()
            .with_linear_solver_type(LinearSolverType::SparseSchurComplement)
            .with_schur_variant(SchurVariant::Iterative)
            .with_schur_preconditioner(SchurPreconditioner::SchurJacobi)
            .with_damping(1e-3) // Moderate initial damping (Ceres default)
            .with_max_iterations(20) // Reduced for early stop when RMSE < 1px
            // Match Ceres tolerances for faster convergence
            .with_cost_tolerance(1e-6) // Ceres function_tolerance (was 1e-12)
            .with_parameter_tolerance(1e-8) // Ceres parameter_tolerance (was 1e-14)
            .with_gradient_tolerance(1e-10) // Relaxed (was 1e-16)
    }

    // Note: the former `with_visualization()` method has been removed.
    // Real-time visualization (cost/gradient/damping time series, Hessian heat maps,
    // manifold state updates) is now handled via the observer pattern:
    //     let mut solver = LevenbergMarquardt::with_config(config);
    //     solver.add_observer(RerunObserver::new(true)?);
    // This provides cleaner separation of concerns and allows multiple observers.
    // Requires the `visualization` feature in `Cargo.toml`; use `verbose` for
    // terminal logging.
    /// Print configuration parameters (verbose mode only)
    pub fn print_configuration(&self) {
        debug!(
            "Configuration:\n  Solver: Levenberg-Marquardt\n  Linear solver: {:?}\n  Convergence Criteria:\n    Max iterations: {}\n    Cost tolerance: {:.2e}\n    Parameter tolerance: {:.2e}\n    Gradient tolerance: {:.2e}\n    Timeout: {:?}\n  Damping Parameters:\n    Initial damping: {:.2e}\n    Damping range: [{:.2e}, {:.2e}]\n    Increase factor: {:.2}\n    Decrease factor: {:.2}\n  Trust Region:\n    Initial radius: {:.2e}\n    Min step quality: {:.2}\n    Good step quality: {:.2}\n  Numerical Settings:\n    Jacobi scaling: {}\n    Compute covariances: {}",
            self.linear_solver_type,
            self.max_iterations,
            self.cost_tolerance,
            self.parameter_tolerance,
            self.gradient_tolerance,
            self.timeout,
            self.damping,
            self.damping_min,
            self.damping_max,
            self.damping_increase_factor,
            self.damping_decrease_factor,
            self.trust_region_radius,
            self.min_step_quality,
            self.good_step_quality,
            if self.use_jacobi_scaling {
                "enabled"
            } else {
                "disabled"
            },
            if self.compute_covariances {
                "enabled"
            } else {
                "disabled"
            }
        );
    }
}

/// Result from step computation
struct StepResult {
    step: Mat<f64>,
    gradient_norm: f64,
    predicted_reduction: f64,
}

/// Result from step evaluation
struct StepEvaluation {
    accepted: bool,
    cost_reduction: f64,
    rho: f64,
}

/// Levenberg-Marquardt solver for nonlinear least squares optimization.
///
/// Implements the damped Gauss-Newton method with adaptive damping parameter λ that
/// interpolates between Gauss-Newton and gradient descent based on step quality.
///
/// # Algorithm
///
/// At each iteration k:
/// 1. Compute residual `r(xₖ)` and Jacobian `J(xₖ)`
/// 2. Solve augmented system: `(J^T·J + λI)·h = -J^T·r`
/// 3. Evaluate step quality: `ρ = (actual reduction) / (predicted reduction)`
/// 4. If `ρ > 0`: Accept step and update `xₖ₊₁ = xₖ ⊕ h`, decrease λ
/// 5. If `ρ ≤ 0`: Reject step (keep `xₖ₊₁ = xₖ`), increase λ
/// 6. Check convergence criteria
///
/// The damping parameter λ is updated using Nielsen's smooth formula:
/// `λₖ₊₁ = λₖ · max(1/3, 1 - (2ρ - 1)³)` for accepted steps,
/// or `λₖ₊₁ = λₖ · ν` (with increasing ν) for rejected steps.
///
/// # Examples
///
/// ```no_run
/// use apex_solver::optimizer::levenberg_marquardt::{LevenbergMarquardtConfig, LevenbergMarquardt};
/// use apex_solver::linalg::LinearSolverType;
///
/// # fn main() {
/// let config = LevenbergMarquardtConfig::new()
///     .with_max_iterations(100)
///     .with_damping(1e-3)
///     .with_damping_bounds(1e-12, 1e12)
///     .with_jacobi_scaling(true);
///
/// let mut solver = LevenbergMarquardt::with_config(config);
/// # }
/// ```
///
/// # See Also
///
/// - [`LevenbergMarquardtConfig`] - Configuration options
/// - [`GaussNewton`](crate::GaussNewton) - Undamped variant (faster but less robust)
/// - [`DogLeg`](crate::DogLeg) - Alternative trust region method
pub struct LevenbergMarquardt {
    config: LevenbergMarquardtConfig,
    jacobi_scaling: Option<SparseColMat<usize, f64>>,
    observers: OptObserverVec,
}

impl Default for LevenbergMarquardt {
    fn default() -> Self {
        Self::new()
    }
}

impl LevenbergMarquardt {
    /// Create a new Levenberg-Marquardt solver with default configuration.
    pub fn new() -> Self {
        Self::with_config(LevenbergMarquardtConfig::default())
    }

    /// Create a new Levenberg-Marquardt solver with the given configuration.
    pub fn with_config(config: LevenbergMarquardtConfig) -> Self {
        Self {
            config,
            jacobi_scaling: None,
            observers: OptObserverVec::new(),
        }
    }

    /// Add an observer to monitor optimization progress.
    ///
    /// Observers are notified at each iteration with the current variable values.
    /// This enables real-time visualization, logging, metrics collection, etc.
    ///
    /// # Arguments
    ///
    /// * `observer` - Any type implementing `OptObserver`
    ///
    /// # Examples
    ///
    /// ```no_run
    /// use apex_solver::{LevenbergMarquardt, LevenbergMarquardtConfig};
    /// # use apex_solver::core::problem::Problem;
    /// # use std::collections::HashMap;
    ///
    /// # fn main() -> Result<(), Box<dyn std::error::Error>> {
    /// let mut solver = LevenbergMarquardt::new();
    ///
    /// #[cfg(feature = "visualization")]
    /// {
    ///     use apex_solver::observers::RerunObserver;
    ///     let rerun_observer = RerunObserver::new(true)?;
    ///     solver.add_observer(rerun_observer);
    /// }
    ///
    /// // ... optimize ...
    /// # Ok(())
    /// # }
    /// ```
    pub fn add_observer(&mut self, observer: impl crate::optimizer::OptObserver + 'static) {
        self.observers.add(observer);
    }

    /// Update the damping parameter based on step quality, using the trust region
    /// approach of Algorithm 6.18 in *Introduction to Optimization and Data Fitting*.
    /// Returns `true` if the step is accepted.
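    ///
    /// A conceptual trace with hand-worked numbers (assuming λ = 1e-3 and ν = 2 on
    /// entry; illustration only, not output from this crate):
    ///
    /// ```text
    /// ρ = 0.9  → λ ← 1e-3 · max(1/3, 1 - 0.8³) = 4.88e-4, ν reset to 2   (accepted)
    /// ρ = -0.2 → λ ← 1e-3 · 2 = 2e-3, ν ← 4                              (rejected)
    /// ρ = -0.1 → λ ← 2e-3 · 4 = 8e-3, ν ← 8                              (rejected again)
    /// ```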
    fn update_damping(&mut self, rho: f64) -> bool {
        if rho > 0.0 {
            // Step accepted - decrease damping
            let coff = 2.0 * rho - 1.0;
            self.config.damping *= (1.0_f64 / 3.0).max(1.0 - coff * coff * coff);
            self.config.damping = self.config.damping.max(self.config.damping_min);
            self.config.damping_nu = 2.0;
            true
        } else {
            // Step rejected - increase damping
            self.config.damping *= self.config.damping_nu;
            self.config.damping_nu *= 2.0;
            self.config.damping = self.config.damping.min(self.config.damping_max);
            false
        }
    }

    /// Compute the predicted cost reduction from the local quadratic model.
    ///
    /// The general formula is `predicted = -step^T·g - ½·step^T·H·step`. Since the step
    /// solves `(H + λI)·step = -g`, we have `H·step = -g - λ·step`, which simplifies the
    /// predicted reduction to `½·step^T·(λ·step - g)`.
    fn compute_predicted_reduction(&self, step: &Mat<f64>, gradient: &Mat<f64>) -> f64 {
        let diff = self.config.damping * step - gradient;
        (0.5 * step.transpose() * &diff)[(0, 0)]
    }

    /// Compute the optimization step by solving the augmented system
    fn compute_levenberg_marquardt_step(
        &self,
        residuals: &Mat<f64>,
        scaled_jacobian: &SparseColMat<usize, f64>,
        linear_solver: &mut Box<dyn SparseLinearSolver>,
    ) -> Result<StepResult, OptimizerError> {
        // Solve augmented equation: (J_scaled^T * J_scaled + λI) * dx_scaled = -J_scaled^T * r
        let residuals_owned = residuals.as_ref().to_owned();
        let scaled_step = linear_solver
            .solve_augmented_equation(&residuals_owned, scaled_jacobian, self.config.damping)
            .map_err(|e| OptimizerError::LinearSolveFailed(e.to_string()).log_with_source(e))?;

        // Get cached gradient and Hessian from the solver
        let gradient = linear_solver.get_gradient().ok_or_else(|| {
            OptimizerError::NumericalInstability("Gradient not available".into()).log()
        })?;
        let _hessian = linear_solver.get_hessian().ok_or_else(|| {
            OptimizerError::NumericalInstability("Hessian not available".into()).log()
        })?;
        let gradient_norm = gradient.norm_l2();

        // Apply inverse Jacobi scaling to get the final step (if enabled)
        let step = if self.config.use_jacobi_scaling {
            let scaling = self
                .jacobi_scaling
                .as_ref()
                .ok_or_else(|| OptimizerError::JacobiScalingNotInitialized.log())?;
            &scaled_step * scaling
        } else {
            scaled_step
        };

        // Compute predicted reduction using scaled values
        let predicted_reduction = self.compute_predicted_reduction(&step, gradient);

        Ok(StepResult {
            step,
            gradient_norm,
            predicted_reduction,
        })
    }

    /// Evaluate and apply a step, handling acceptance/rejection based on step quality
    fn evaluate_and_apply_step(
        &mut self,
        step_result: &StepResult,
        state: &mut InitializedState,
        problem: &Problem,
    ) -> error::ApexSolverResult<StepEvaluation> {
        // Apply parameter updates using manifold operations
        let _step_norm = apply_parameter_step(
            &mut state.variables,
            step_result.step.as_ref(),
            &state.sorted_vars,
        );

        // Compute new cost (residual only, no Jacobian needed for step evaluation)
        let new_residual = problem.compute_residual_sparse(&state.variables)?;
        let new_cost = compute_cost(&new_residual);

        // Compute step quality
        let rho = crate::optimizer::compute_step_quality(
            state.current_cost,
            new_cost,
            step_result.predicted_reduction,
        );

        // Update damping and decide whether to accept the step
        let accepted = self.update_damping(rho);

        let cost_reduction = if accepted {
            // Accept the step - parameters already updated
            let reduction = state.current_cost - new_cost;
            state.current_cost = new_cost;
            reduction
        } else {
            // Reject the step - revert parameter changes
            apply_negative_parameter_step(
                &mut state.variables,
                step_result.step.as_ref(),
                &state.sorted_vars,
            );
            0.0
        };

        Ok(StepEvaluation {
            accepted,
            cost_reduction,
            rho,
        })
    }

    pub fn optimize(
        &mut self,
        problem: &Problem,
        initial_params: &HashMap<String, (ManifoldType, DVector<f64>)>,
    ) -> Result<SolverResult<HashMap<String, VariableEnum>>, error::ApexSolverError> {
        let start_time = Instant::now();
        let mut iteration = 0;
        let mut cost_evaluations = 1; // Initial cost evaluation
        let mut jacobian_evaluations = 0;
        let mut successful_steps = 0;
        let mut unsuccessful_steps = 0;

        // Initialize optimization state
        let mut state = crate::optimizer::initialize_optimization_state(problem, initial_params)?;

        // Create linear solver - must be after variable initialization for Schur solver
        let mut linear_solver: Box<dyn SparseLinearSolver> = match self.config.linear_solver_type {
            LinearSolverType::SparseCholesky => Box::new(SparseCholeskySolver::new()),
            LinearSolverType::SparseQR => Box::new(SparseQRSolver::new()),
            LinearSolverType::SparseSchurComplement => Box::new(
                SchurSolverAdapter::new_with_structure_and_config(
                    &state.variables,
                    &state.variable_index_map,
                    self.config.schur_variant,
                    self.config.schur_preconditioner,
                )
                .map_err(|e| {
                    OptimizerError::LinearSolveFailed(format!(
                        "Failed to initialize Schur solver: {}",
                        e
                    ))
                    .log()
                })?,
            ),
        };

        // Initialize summary tracking variables
        let mut max_gradient_norm: f64 = 0.0;
        let mut max_parameter_update_norm: f64 = 0.0;
        let mut total_cost_reduction = 0.0;
        let mut final_gradient_norm;
        let mut final_parameter_update_norm;

        // Initialize iteration statistics tracking
        let mut iteration_stats = Vec::with_capacity(self.config.max_iterations);
        let mut previous_cost = state.current_cost;

        // Print configuration and header if debug level is enabled
        if tracing::enabled!(tracing::Level::DEBUG) {
            self.config.print_configuration();
            IterationStats::print_header();
        }

        // Main optimization loop
        loop {
            let iter_start = Instant::now();

            // Evaluate residuals and Jacobian
            let (residuals, jacobian) = problem.compute_residual_and_jacobian_sparse(
                &state.variables,
                &state.variable_index_map,
                &state.symbolic_structure,
            )?;
            jacobian_evaluations += 1;

            // Process Jacobian (apply scaling if enabled)
            let scaled_jacobian = if self.config.use_jacobi_scaling {
                crate::optimizer::process_jacobian(&jacobian, &mut self.jacobi_scaling, iteration)?
            } else {
                jacobian
            };

            // Compute optimization step
            let step_result = self.compute_levenberg_marquardt_step(
                &residuals,
                &scaled_jacobian,
                &mut linear_solver,
            )?;

            // Update tracking variables
            max_gradient_norm = max_gradient_norm.max(step_result.gradient_norm);
            final_gradient_norm = step_result.gradient_norm;
            let step_norm = step_result.step.norm_l2();
            max_parameter_update_norm = max_parameter_update_norm.max(step_norm);
            final_parameter_update_norm = step_norm;

            // Evaluate and apply step (handles accept/reject)
            let step_eval = self.evaluate_and_apply_step(&step_result, &mut state, problem)?;
            cost_evaluations += 1;

            // Update counters based on acceptance
            if step_eval.accepted {
                successful_steps += 1;
                total_cost_reduction += step_eval.cost_reduction;
            } else {
                unsuccessful_steps += 1;
            }

            // OPTIMIZATION: Only collect iteration statistics if debug level is enabled.
            // This eliminates ~2-5ms of overhead per iteration for non-debug optimization.
            if tracing::enabled!(tracing::Level::DEBUG) {
                let iter_elapsed_ms = iter_start.elapsed().as_secs_f64() * 1000.0;
                let total_elapsed_ms = start_time.elapsed().as_secs_f64() * 1000.0;

                let stats = IterationStats {
                    iteration,
                    cost: state.current_cost,
                    cost_change: previous_cost - state.current_cost,
                    gradient_norm: step_result.gradient_norm,
                    step_norm,
                    tr_ratio: step_eval.rho,
                    tr_radius: self.config.damping,
                    ls_iter: 0, // Direct solver (Cholesky) has no iterations
                    iter_time_ms: iter_elapsed_ms,
                    total_time_ms: total_elapsed_ms,
                    accepted: step_eval.accepted,
                };

                iteration_stats.push(stats.clone());
                stats.print_line();
            }

            previous_cost = state.current_cost;

            // Notify all observers with current state
            crate::optimizer::notify_observers(
                &mut self.observers,
                &state.variables,
                iteration,
                state.current_cost,
                step_result.gradient_norm,
                Some(self.config.damping),
                step_norm,
                Some(step_eval.rho),
                linear_solver.as_ref(),
            );

            // Check convergence
            let elapsed = start_time.elapsed();

            // Compute parameter norm for relative parameter tolerance check
            let parameter_norm = crate::optimizer::compute_parameter_norm(&state.variables);

            // `state.current_cost` already reflects the outcome of this step: it was
            // updated to the new cost if the step was accepted, and left unchanged
            // (with parameters reverted) if it was rejected, so it can be used
            // directly as the post-step cost for the convergence check.
            let new_cost = state.current_cost;

            // Cost before this step (add the reduction back if the step was accepted)
            let cost_before_step = if step_eval.accepted {
                state.current_cost + step_eval.cost_reduction
            } else {
                state.current_cost
            };

            if let Some(status) = crate::optimizer::check_convergence(&ConvergenceParams {
                iteration,
                current_cost: cost_before_step,
                new_cost,
                parameter_norm,
                parameter_update_norm: step_norm,
                gradient_norm: step_result.gradient_norm,
                elapsed,
                step_accepted: step_eval.accepted,
                max_iterations: self.config.max_iterations,
                gradient_tolerance: self.config.gradient_tolerance,
                parameter_tolerance: self.config.parameter_tolerance,
                cost_tolerance: self.config.cost_tolerance,
                min_cost_threshold: self.config.min_cost_threshold,
                timeout: self.config.timeout,
                trust_region_radius: Some(self.config.trust_region_radius),
                min_trust_region_radius: Some(self.config.min_trust_region_radius),
            }) {
                // Print summary only if debug level is enabled
                if tracing::enabled!(tracing::Level::DEBUG) {
                    let summary = crate::optimizer::create_optimizer_summary(
                        "Levenberg-Marquardt",
                        state.initial_cost,
                        state.current_cost,
                        iteration + 1,
                        Some(successful_steps),
                        Some(unsuccessful_steps),
                        max_gradient_norm,
                        final_gradient_norm,
                        max_parameter_update_norm,
                        final_parameter_update_norm,
                        total_cost_reduction,
                        elapsed,
                        iteration_stats.clone(),
                        status.clone(),
                        Some(self.config.damping),
                        None,
                        Some(step_eval.rho),
                    );
                    debug!("{}", summary);
                }

                // Compute covariances if enabled
                let covariances = if self.config.compute_covariances {
                    problem.compute_and_set_covariances(
                        &mut linear_solver,
                        &mut state.variables,
                        &state.variable_index_map,
                    )
                } else {
                    None
                };

                // Notify observers that optimization is complete
                let final_parameters: HashMap<String, VariableEnum> = state
                    .variables
                    .iter()
                    .map(|(k, v)| (k.clone(), v.clone()))
                    .collect();
                self.observers
                    .notify_complete(&final_parameters, iteration + 1);

                return Ok(crate::optimizer::build_solver_result(
                    status,
                    iteration + 1,
                    state,
                    elapsed,
                    final_gradient_norm,
                    final_parameter_update_norm,
                    cost_evaluations,
                    jacobian_evaluations,
                    covariances,
                ));
            }

            // Note: max-iterations and timeout checks are handled inside check_convergence()

            iteration += 1;
        }
    }
}
// Implement Solver trait
impl Solver for LevenbergMarquardt {
    type Config = LevenbergMarquardtConfig;
    type Error = error::ApexSolverError;

    fn new() -> Self {
        Self::default()
    }

    fn optimize(
        &mut self,
        problem: &Problem,
        initial_params: &HashMap<String, (ManifoldType, DVector<f64>)>,
    ) -> Result<SolverResult<HashMap<String, VariableEnum>>, Self::Error> {
        self.optimize(problem, initial_params)
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::factors::Factor;
    use crate::optimizer::OptimizationStatus;
    use nalgebra::{DMatrix, dvector};

    /// Custom Rosenbrock factor 1: r1 = 10(x2 - x1²).
    /// Demonstrates extensibility - custom factors can be defined outside of factors.rs.
    #[derive(Debug, Clone)]
    struct RosenbrockFactor1;

    impl Factor for RosenbrockFactor1 {
        fn linearize(
            &self,
            params: &[DVector<f64>],
            compute_jacobian: bool,
        ) -> (DVector<f64>, Option<DMatrix<f64>>) {
            let x1 = params[0][0];
            let x2 = params[1][0];

            // Residual: r1 = 10(x2 - x1²)
            let residual = dvector![10.0 * (x2 - x1 * x1)];

            let jacobian = if compute_jacobian {
                // Jacobian: ∂r1/∂x1 = -20·x1, ∂r1/∂x2 = 10
                let mut jacobian = DMatrix::zeros(1, 2);
                jacobian[(0, 0)] = -20.0 * x1;
                jacobian[(0, 1)] = 10.0;

                Some(jacobian)
            } else {
                None
            };

            (residual, jacobian)
        }

        fn get_dimension(&self) -> usize {
            1
        }
    }

    /// Custom Rosenbrock factor 2: r2 = 1 - x1.
    /// Demonstrates extensibility - custom factors can be defined outside of factors.rs.
    #[derive(Debug, Clone)]
    struct RosenbrockFactor2;

    impl Factor for RosenbrockFactor2 {
        fn linearize(
            &self,
            params: &[DVector<f64>],
            compute_jacobian: bool,
        ) -> (DVector<f64>, Option<DMatrix<f64>>) {
            let x1 = params[0][0];

            // Residual: r2 = 1 - x1
            let residual = dvector![1.0 - x1];

            let jacobian = if compute_jacobian {
                // Jacobian: ∂r2/∂x1 = -1
                Some(DMatrix::from_element(1, 1, -1.0))
            } else {
                None
            };

            (residual, jacobian)
        }

        fn get_dimension(&self) -> usize {
            1
        }
    }
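
    /// Illustrative sketch, not part of the solver's API: solves the damped
    /// normal equations (JᵀJ + λI)h = -Jᵀr from the module docs for a single
    /// scalar residual, to show how λ interpolates between a Gauss-Newton step
    /// and a short gradient-descent step. The helper name is hypothetical.
    fn scalar_damped_step(j: f64, r: f64, lambda: f64) -> f64 {
        // For scalars the damped normal equations reduce to (j² + λ)h = -j·r.
        -(j * r) / (j * j + lambda)
    }

    #[test]
    fn test_damping_interpolates_gauss_newton_and_gradient_descent() {
        let (j, r) = (2.0, 4.0);
        // λ → 0: the Gauss-Newton step h = -r/j = -2.
        assert!((scalar_damped_step(j, r, 0.0) - (-2.0)).abs() < 1e-12);
        // Large λ: the step approaches the scaled gradient -Jᵀr/λ, i.e. a
        // short step along the steepest-descent direction.
        assert!(scalar_damped_step(j, r, 1e6).abs() < 1e-4);
        // Increasing λ monotonically shrinks the step length.
        assert!(scalar_damped_step(j, r, 1.0).abs() < scalar_damped_step(j, r, 0.0).abs());
    }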

    #[test]
    fn test_rosenbrock_optimization() -> Result<(), Box<dyn std::error::Error>> {
        // Rosenbrock function test:
        // Minimize: r1² + r2² where
        //   r1 = 10(x2 - x1²)
        //   r2 = 1 - x1
        // Starting point: [-1.2, 1.0]
        // Expected minimum: [1.0, 1.0]

        let mut problem = Problem::new();
        let mut initial_values = HashMap::new();

        // Add variables using the Rn manifold (Euclidean space)
        initial_values.insert("x1".to_string(), (ManifoldType::RN, dvector![-1.2]));
        initial_values.insert("x2".to_string(), (ManifoldType::RN, dvector![1.0]));

        // Add custom factors (demonstrates extensibility!)
        problem.add_residual_block(&["x1", "x2"], Box::new(RosenbrockFactor1), None);
        problem.add_residual_block(&["x1"], Box::new(RosenbrockFactor2), None);

        // Configure the Levenberg-Marquardt optimizer
        let config = LevenbergMarquardtConfig::new()
            .with_max_iterations(100)
            .with_cost_tolerance(1e-8)
            .with_parameter_tolerance(1e-8)
            .with_gradient_tolerance(1e-10);

        let mut solver = LevenbergMarquardt::with_config(config);
        let result = solver.optimize(&problem, &initial_values)?;

        // Extract final values
        let x1_final = result
            .parameters
            .get("x1")
            .ok_or("x1 not found")?
            .to_vector()[0];
        let x2_final = result
            .parameters
            .get("x2")
            .ok_or("x2 not found")?
            .to_vector()[0];

        // Verify convergence to [1.0, 1.0]
        assert!(
            matches!(
                result.status,
                OptimizationStatus::Converged
                    | OptimizationStatus::CostToleranceReached
                    | OptimizationStatus::ParameterToleranceReached
                    | OptimizationStatus::GradientToleranceReached
            ),
            "Optimization should converge"
        );
        assert!(
            (x1_final - 1.0).abs() < 1e-4,
            "x1 should converge to 1.0, got {}",
            x1_final
        );
        assert!(
            (x2_final - 1.0).abs() < 1e-4,
            "x2 should converge to 1.0, got {}",
            x2_final
        );
        assert!(
            result.final_cost < 1e-6,
            "Final cost should be near zero, got {}",
            result.final_cost
        );
        Ok(())
    }
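
    /// Illustrative sketch, not part of the solver's API: Nielsen's damping
    /// update factor from the module docs, λₖ₊₁ = λₖ · max(1/3, 1 - (2ρ - 1)³).
    /// The helper name `nielsen_damping_factor` is hypothetical and exists only
    /// to exercise the formula's qualitative behavior.
    fn nielsen_damping_factor(rho: f64) -> f64 {
        let t = 2.0 * rho - 1.0;
        (1.0_f64 / 3.0).max(1.0 - t * t * t)
    }

    #[test]
    fn test_nielsen_damping_factor_behavior() {
        // A perfect model prediction (ρ = 1) shrinks λ by the maximum factor 1/3.
        assert!((nielsen_damping_factor(1.0) - 1.0 / 3.0).abs() < 1e-12);
        // A neutral step (ρ = 0.5) leaves λ unchanged: 1 - 0³ = 1.
        assert!((nielsen_damping_factor(0.5) - 1.0).abs() < 1e-12);
        // A poor step (ρ = 0) doubles λ: 1 - (-1)³ = 2.
        assert!((nielsen_damping_factor(0.0) - 2.0).abs() < 1e-12);
        // The factor is clamped below at 1/3, so λ cannot collapse to zero
        // in a single update no matter how good the step is.
        assert!(nielsen_damping_factor(10.0) >= 1.0 / 3.0 - 1e-12);
    }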
}