// apex_solver/optimizer/levenberg_marquardt.rs
//! Levenberg-Marquardt algorithm implementation.
//!
//! The Levenberg-Marquardt (LM) method is a robust and widely used algorithm for solving
//! nonlinear least squares problems of the form:
//!
//! ```text
//! min f(x) = ½||r(x)||² = ½Σᵢ rᵢ(x)²
//! ```
//!
//! where `r: ℝⁿ → ℝᵐ` is the residual vector function.
//!
//! # Algorithm Overview
//!
//! The Levenberg-Marquardt method solves the damped normal equations at each iteration:
//!
//! ```text
//! (J^T·J + λI)·h = -J^T·r
//! ```
//!
//! where:
//! - `J` is the Jacobian matrix (m × n)
//! - `r` is the residual vector (m × 1)
//! - `h` is the step vector (n × 1)
//! - `λ` is the adaptive damping parameter (scalar)
//! - `I` is the identity matrix (or diagonal scaling matrix)
//!
//! ## Damping Parameter Strategy
//!
//! The damping parameter λ adapts based on step quality:
//!
//! - **λ → 0** (small damping): Behaves like Gauss-Newton with fast (near-quadratic) convergence
//! - **λ → ∞** (large damping): Behaves like gradient descent with a guaranteed descent direction
//!
//! This interpolation between Gauss-Newton and gradient descent provides excellent robustness
//! while maintaining fast convergence near the solution.
//!
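//! Concretely, the two limits of the damped system can be read off directly (a sketch
//! of the algebra, not solver output):
//!
//! ```text
//! λ → 0:          (J^T·J)·h = -J^T·r                    (pure Gauss-Newton step)
//! λ ≫ ‖J^T·J‖:    λ·h ≈ -J^T·r  ⇒  h ≈ -(1/λ)·J^T·r     (short gradient-descent step)
//! ```
//!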
//! ## Step Acceptance and Damping Update
//!
//! The algorithm evaluates each proposed step using the gain ratio:
//!
//! ```text
//! ρ = (actual reduction) / (predicted reduction)
//!   = [f(xₖ) - f(xₖ + h)] / [f(xₖ) - L(h)]
//! ```
//!
//! where `L(h) = f(xₖ) + h^T·g + ½h^T·H·h` is the local quadratic model, with
//! gradient `g = J^T·r` and Gauss-Newton Hessian approximation `H = J^T·J`.
//!
//! **Step acceptance:**
//! - If `ρ > 0`: Accept step (cost decreased), decrease λ to trust the model more
//! - If `ρ ≤ 0`: Reject step (cost increased), increase λ to be more conservative
//!
//! **Damping update** (Nielsen's formula):
//! ```text
//! λₖ₊₁ = λₖ · max(1/3, 1 - (2ρ - 1)³)
//! ```
//!
//! This provides smooth, data-driven adaptation of the damping parameter.
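//!
//! For example, Nielsen's multiplicative factor for a few gain ratios (standalone
//! arithmetic mirroring the update rule above; not tied to any solver internals):
//!
//! ```
//! let nielsen = |rho: f64| (1.0_f64 / 3.0).max(1.0 - (2.0 * rho - 1.0).powi(3));
//! assert!((nielsen(1.0) - 1.0 / 3.0).abs() < 1e-12); // perfect step: λ shrinks by 3×
//! assert!((nielsen(0.5) - 1.0).abs() < 1e-12);       // marginal step: λ unchanged
//! assert!(nielsen(0.9) < 0.5);                       // good step: λ roughly halves
//! ```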
//!
//! ## Convergence Properties
//!
//! - **Global convergence**: Converges to a stationary point from essentially any starting guess (under standard smoothness assumptions)
//! - **Local quadratic convergence**: Near the solution, behaves like Gauss-Newton
//! - **Robust to poor initialization**: Adaptive damping prevents divergence
//! - **Handles ill-conditioning**: Large λ stabilizes a nearly singular Hessian
//!
//! ## When to Use
//!
//! Levenberg-Marquardt is the best general-purpose choice when:
//! - Initial parameter guess may be far from the optimum
//! - Problem conditioning is unknown
//! - Robustness is prioritized over raw speed
//! - You want reliable convergence across diverse problem types
//!
//! For problems with specific structure, consider:
//! - [`GaussNewton`](crate::GaussNewton) if well-conditioned with good initialization
//! - [`DogLeg`](crate::DogLeg) for explicit trust region control
//!
//! # Implementation Features
//!
//! - **Sparse matrix support**: Efficient handling of large-scale problems via the `faer` sparse library
//! - **Adaptive damping**: Nielsen's formula for smooth parameter adaptation
//! - **Robust linear solvers**: Cholesky (fast) or QR (stable) factorization
//! - **Jacobi scaling**: Optional diagonal preconditioning for mixed-scale problems
//! - **Covariance computation**: Optional uncertainty quantification after convergence
//! - **Manifold operations**: Native support for optimization on Lie groups (SE2, SE3, SO2, SO3)
//! - **Comprehensive diagnostics**: Detailed summaries of convergence and performance
//!
//! # Mathematical Background
//!
//! The augmented Hessian `J^T·J + λI` combines two beneficial properties:
//!
//! 1. **Positive definiteness**: Solvable for any `λ > 0`, even when `J^T·J` is singular
//! 2. **Regularization**: Prevents taking steps in poorly-determined directions
//!
//! The trust region interpretation: λ controls an implicit spherical trust region in
//! which larger λ restricts the step size, keeping the linear model valid.
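//!
//! The positive-definiteness claim follows from a one-line argument (a sketch, using
//! the same symbols as above): for any nonzero `x` and `λ > 0`,
//!
//! ```text
//! x^T·(J^T·J + λI)·x = ‖J·x‖² + λ‖x‖² > 0
//! ```
//!
//! so the augmented system always has a unique solution.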
//!
//! # Examples
//!
//! ## Basic usage
//!
//! ```no_run
//! use apex_solver::LevenbergMarquardt;
//! use apex_solver::core::problem::Problem;
//! use std::collections::HashMap;
//!
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let mut problem = Problem::new();
//! // ... add residual blocks (factors) to problem ...
//!
//! let initial_values = HashMap::new();
//! // ... initialize parameters ...
//!
//! let mut solver = LevenbergMarquardt::new();
//! let result = solver.optimize(&problem, &initial_values)?;
//!
//! # Ok(())
//! # }
//! ```
//!
//! ## Advanced configuration
//!
//! ```no_run
//! use apex_solver::optimizer::levenberg_marquardt::{LevenbergMarquardtConfig, LevenbergMarquardt};
//! use apex_solver::linalg::LinearSolverType;
//!
//! # fn main() {
//! let config = LevenbergMarquardtConfig::new()
//!     .with_max_iterations(100)
//!     .with_cost_tolerance(1e-6)
//!     .with_damping(1e-3)  // Initial damping
//!     .with_damping_bounds(1e-12, 1e12)  // Min/max damping
//!     .with_jacobi_scaling(true);  // Improve conditioning
//!
//! let mut solver = LevenbergMarquardt::with_config(config);
//! # }
//! ```
//!
//! # References
//!
//! - Levenberg, K. (1944). "A Method for the Solution of Certain Non-Linear Problems in Least Squares". *Quarterly of Applied Mathematics*.
//! - Marquardt, D. W. (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters". *Journal of the Society for Industrial and Applied Mathematics*.
//! - Madsen, K., Nielsen, H. B., & Tingleff, O. (2004). *Methods for Non-Linear Least Squares Problems* (2nd ed.). Chapter 3.
//! - Nocedal, J. & Wright, S. (2006). *Numerical Optimization* (2nd ed.). Springer. Chapter 10.
//! - Nielsen, H. B. (1999). "Damping Parameter in Marquardt's Method". Technical Report IMM-REP-1999-05.

use crate::core::problem::{Problem, VariableEnum};
use crate::error;
use crate::linalg::{
    LinearSolverType, SchurPreconditioner, SchurSolverAdapter, SchurVariant, SparseCholeskySolver,
    SparseLinearSolver, SparseQRSolver,
};
use crate::optimizer::{
    ConvergenceParams, InitializedState, IterationStats, OptObserverVec, OptimizerError, Solver,
    SolverResult, apply_negative_parameter_step, apply_parameter_step, compute_cost,
};
use apex_manifolds::ManifoldType;

use faer::{Mat, sparse::SparseColMat};
use nalgebra::DVector;
use std::collections::HashMap;
use std::time::{Duration, Instant};
use tracing::debug;

/// Configuration parameters for the Levenberg-Marquardt optimizer.
///
/// Controls the adaptive damping strategy, convergence criteria, and numerical stability
/// enhancements for the Levenberg-Marquardt algorithm.
///
/// # Builder Pattern
///
/// All configuration options can be set using the builder pattern:
///
/// ```
/// use apex_solver::optimizer::levenberg_marquardt::LevenbergMarquardtConfig;
///
/// let config = LevenbergMarquardtConfig::new()
///     .with_max_iterations(100)
///     .with_damping(1e-3)
///     .with_damping_bounds(1e-12, 1e12)
///     .with_jacobi_scaling(true);
/// ```
///
/// # Damping Parameter Behavior
///
/// The damping parameter λ controls the trade-off between Gauss-Newton and gradient descent:
///
/// - **Initial damping** (`damping`): Starting value (default: 1e-3)
/// - **Damping bounds** (`damping_min`, `damping_max`): Valid range (default: 1e-12 to 1e12)
/// - **Adaptation**: Automatically adjusted based on step quality using Nielsen's formula
///
/// # Convergence Criteria
///
/// The optimizer terminates when ANY of the following conditions is met:
///
/// - **Cost tolerance**: `|cost_k - cost_{k-1}| < cost_tolerance`
/// - **Parameter tolerance**: `||step|| < parameter_tolerance`
/// - **Gradient tolerance**: `||J^T·r|| < gradient_tolerance`
/// - **Maximum iterations**: `iteration >= max_iterations`
/// - **Timeout**: `elapsed_time >= timeout`
///
/// # See Also
///
/// - [`LevenbergMarquardt`] - The solver that uses this configuration
/// - [`GaussNewtonConfig`](crate::GaussNewtonConfig) - Undamped variant
/// - [`DogLegConfig`](crate::DogLegConfig) - Trust region alternative
#[derive(Clone)]
pub struct LevenbergMarquardtConfig {
    /// Type of linear solver for the linear systems
    pub linear_solver_type: LinearSolverType,
    /// Maximum number of iterations
    pub max_iterations: usize,
    /// Convergence tolerance for cost function
    pub cost_tolerance: f64,
    /// Convergence tolerance for parameter updates
    pub parameter_tolerance: f64,
    /// Convergence tolerance for gradient norm
    pub gradient_tolerance: f64,
    /// Timeout duration
    pub timeout: Option<Duration>,
    /// Initial damping parameter
    pub damping: f64,
    /// Minimum damping parameter
    pub damping_min: f64,
    /// Maximum damping parameter
    pub damping_max: f64,
    /// Damping increase factor (when step rejected)
    pub damping_increase_factor: f64,
    /// Damping decrease factor (when step accepted)
    pub damping_decrease_factor: f64,
    /// Damping ν parameter (growth factor applied on consecutive rejected steps)
    pub damping_nu: f64,
    /// Trust region radius
    pub trust_region_radius: f64,
    /// Minimum step quality for acceptance
    pub min_step_quality: f64,
    /// Good step quality threshold
    pub good_step_quality: f64,
    /// Minimum diagonal value for regularization
    pub min_diagonal: f64,
    /// Maximum diagonal value for regularization
    pub max_diagonal: f64,
    /// Minimum objective function cutoff (optional early termination)
    ///
    /// If set, optimization terminates when cost falls below this threshold.
    /// Useful for early stopping when a "good enough" solution is acceptable.
    ///
    /// Default: None (disabled)
    pub min_cost_threshold: Option<f64>,
    /// Minimum trust region radius before termination
    ///
    /// When the trust region radius falls below this value, the optimizer
    /// terminates as it indicates the search has converged or the problem
    /// is ill-conditioned. Matches Ceres Solver's min_trust_region_radius.
    ///
    /// Default: 1e-32 (Ceres-compatible)
    pub min_trust_region_radius: f64,
    /// Maximum condition number for Jacobian matrix (optional check)
    ///
    /// If set, the optimizer checks if condition_number(J^T*J) exceeds this
    /// threshold and terminates with IllConditionedJacobian status.
    /// Note: Computing condition number is expensive, so this is disabled by default.
    ///
    /// Default: None (disabled)
    pub max_condition_number: Option<f64>,
    /// Minimum relative cost decrease for step acceptance
    ///
    /// Used in computing step quality (rho = actual_reduction / predicted_reduction).
    /// Steps with rho < min_relative_decrease are rejected. Matches Ceres Solver's
    /// min_relative_decrease parameter.
    ///
    /// Default: 1e-3 (Ceres-compatible)
    pub min_relative_decrease: f64,
    /// Use Jacobi column scaling (preconditioning)
    ///
    /// When enabled, normalizes Jacobian columns by their L2 norm before solving.
    /// This can improve convergence for problems with mixed parameter scales
    /// (e.g., positions in meters + angles in radians) but adds ~5-10% overhead.
    ///
    /// Default: false (avoids the overhead; enable for mixed-scale problems)
    pub use_jacobi_scaling: bool,
    /// Compute per-variable covariance matrices (uncertainty estimation)
    ///
    /// When enabled, computes covariance by inverting the Hessian matrix after
    /// convergence. The full covariance matrix is extracted into per-variable
    /// blocks stored in both Variable structs and SolverResult.
    ///
    /// Default: false (to avoid performance overhead)
    pub compute_covariances: bool,
    /// Schur complement solver variant (for bundle adjustment problems)
    ///
    /// When using LinearSolverType::SparseSchurComplement, this determines which
    /// variant of the Schur complement method to use:
    /// - Sparse: Direct sparse Cholesky factorization (most accurate, moderate speed)
    /// - Iterative: Preconditioned Conjugate Gradients (memory efficient, good for large problems)
    /// - PowerSeries: Power series approximation (fastest, less accurate)
    ///
    /// Default: Sparse
    pub schur_variant: SchurVariant,
    /// Schur complement preconditioner type
    ///
    /// Determines the preconditioning strategy for iterative Schur methods:
    /// - Diagonal: Simple diagonal preconditioner (fast, less effective)
    /// - BlockDiagonal: Block-diagonal preconditioner (balanced)
    /// - IncompleteCholesky: Incomplete Cholesky factorization (slower, more effective)
    ///
    /// Default: Diagonal
    pub schur_preconditioner: SchurPreconditioner,
    // Note: Visualization is now handled via the observer pattern.
    // Use `solver.add_observer(RerunObserver::new(true)?)` to enable visualization.
    // This provides cleaner separation of concerns and allows multiple observers.
}

impl Default for LevenbergMarquardtConfig {
    fn default() -> Self {
        Self {
            linear_solver_type: LinearSolverType::default(),
            // Ceres Solver default: 50 (changed from 100 for compatibility)
            max_iterations: 50,
            // Ceres Solver default: 1e-6 (changed from 1e-8 for compatibility)
            cost_tolerance: 1e-6,
            // Ceres Solver default: 1e-8 (unchanged)
            parameter_tolerance: 1e-8,
            // Ceres Solver default: 1e-10 (changed from 1e-8 for compatibility)
            // Note: Typically should be 1e-4 * cost_tolerance per Ceres docs
            gradient_tolerance: 1e-10,
            timeout: None,
            damping: 1e-3, // Increased from 1e-4 for better initial convergence on BA
            damping_min: 1e-12,
            damping_max: 1e12,
            damping_increase_factor: 10.0,
            damping_decrease_factor: 0.3,
            damping_nu: 2.0,
            trust_region_radius: 1e4,
            min_step_quality: 0.0,
            good_step_quality: 0.75,
            min_diagonal: 1e-6,
            max_diagonal: 1e32,
            // New Ceres-compatible parameters
            min_cost_threshold: None,
            min_trust_region_radius: 1e-32,
            max_condition_number: None,
            min_relative_decrease: 1e-3,
            // Existing parameters
            // Jacobi scaling disabled by default for Schur solvers (incompatible with block structure)
            // Enable manually for Cholesky/QR solvers on mixed-scale problems
            use_jacobi_scaling: false,
            compute_covariances: false,
            // Schur complement parameters
            schur_variant: SchurVariant::default(),
            schur_preconditioner: SchurPreconditioner::default(),
        }
    }
}

impl LevenbergMarquardtConfig {
    /// Create a new Levenberg-Marquardt configuration with default values.
    pub fn new() -> Self {
        Self::default()
    }

    /// Set the linear solver type
    pub fn with_linear_solver_type(mut self, linear_solver_type: LinearSolverType) -> Self {
        self.linear_solver_type = linear_solver_type;
        self
    }

    /// Set the maximum number of iterations
    pub fn with_max_iterations(mut self, max_iterations: usize) -> Self {
        self.max_iterations = max_iterations;
        self
    }

    /// Set the cost tolerance
    pub fn with_cost_tolerance(mut self, cost_tolerance: f64) -> Self {
        self.cost_tolerance = cost_tolerance;
        self
    }

    /// Set the parameter tolerance
    pub fn with_parameter_tolerance(mut self, parameter_tolerance: f64) -> Self {
        self.parameter_tolerance = parameter_tolerance;
        self
    }

    /// Set the gradient tolerance
    pub fn with_gradient_tolerance(mut self, gradient_tolerance: f64) -> Self {
        self.gradient_tolerance = gradient_tolerance;
        self
    }

    /// Set the timeout duration
    pub fn with_timeout(mut self, timeout: Duration) -> Self {
        self.timeout = Some(timeout);
        self
    }

    /// Set the initial damping parameter.
    pub fn with_damping(mut self, damping: f64) -> Self {
        self.damping = damping;
        self
    }

    /// Set the damping parameter bounds.
    pub fn with_damping_bounds(mut self, min: f64, max: f64) -> Self {
        self.damping_min = min;
        self.damping_max = max;
        self
    }

    /// Set the damping adjustment factors.
    pub fn with_damping_factors(mut self, increase: f64, decrease: f64) -> Self {
        self.damping_increase_factor = increase;
        self.damping_decrease_factor = decrease;
        self
    }

    /// Set the trust region parameters.
    pub fn with_trust_region(mut self, radius: f64, min_quality: f64, good_quality: f64) -> Self {
        self.trust_region_radius = radius;
        self.min_step_quality = min_quality;
        self.good_step_quality = good_quality;
        self
    }

    /// Set minimum objective function cutoff for early termination.
    ///
    /// When set, optimization terminates with MinCostThresholdReached status
    /// if the cost falls below this threshold. Useful for early stopping when
    /// a "good enough" solution is acceptable.
    pub fn with_min_cost_threshold(mut self, min_cost: f64) -> Self {
        self.min_cost_threshold = Some(min_cost);
        self
    }

    /// Set minimum trust region radius before termination.
    ///
    /// When the trust region radius falls below this value, optimization
    /// terminates with TrustRegionRadiusTooSmall status.
    /// Default: 1e-32 (Ceres-compatible)
    pub fn with_min_trust_region_radius(mut self, min_radius: f64) -> Self {
        self.min_trust_region_radius = min_radius;
        self
    }

    /// Set maximum condition number for Jacobian matrix.
    ///
    /// If set, the optimizer checks if condition_number(J^T*J) exceeds this
    /// threshold and terminates with IllConditionedJacobian status.
    /// Note: Computing condition number is expensive, disabled by default.
    pub fn with_max_condition_number(mut self, max_cond: f64) -> Self {
        self.max_condition_number = Some(max_cond);
        self
    }

    /// Set minimum relative cost decrease for step acceptance.
    ///
    /// Steps with rho = (actual_reduction / predicted_reduction) below this
    /// threshold are rejected. Default: 1e-3 (Ceres-compatible)
    pub fn with_min_relative_decrease(mut self, min_decrease: f64) -> Self {
        self.min_relative_decrease = min_decrease;
        self
    }

    /// Enable or disable Jacobi column scaling (preconditioning).
    ///
    /// When enabled, normalizes Jacobian columns by their L2 norm before solving.
    /// Can improve convergence for mixed-scale problems but adds ~5-10% overhead.
    pub fn with_jacobi_scaling(mut self, use_jacobi_scaling: bool) -> Self {
        self.use_jacobi_scaling = use_jacobi_scaling;
        self
    }

    /// Enable or disable covariance computation (uncertainty estimation).
    ///
    /// When enabled, computes the full covariance matrix by inverting the Hessian
    /// after convergence, then extracts per-variable covariance blocks.
    pub fn with_compute_covariances(mut self, compute_covariances: bool) -> Self {
        self.compute_covariances = compute_covariances;
        self
    }

    /// Set Schur complement solver variant
    pub fn with_schur_variant(mut self, variant: SchurVariant) -> Self {
        self.schur_variant = variant;
        self
    }

    /// Set Schur complement preconditioner
    pub fn with_schur_preconditioner(mut self, preconditioner: SchurPreconditioner) -> Self {
        self.schur_preconditioner = preconditioner;
        self
    }

    /// Configuration optimized for bundle adjustment problems.
    ///
    /// This preset uses settings tuned for large-scale bundle adjustment:
    /// - **Schur complement solver** with iterative PCG (memory efficient)
    /// - **Schur-Jacobi preconditioner** (Ceres-style, best PCG convergence)
    /// - **Moderate initial damping** (1e-3) - not too aggressive
    /// - **20 max iterations** - favors early stopping once reprojection error is small
    /// - **Ceres-default tolerances** (cost 1e-6, parameter 1e-8, gradient 1e-10)
    ///
    /// This configuration matches Ceres Solver's recommended BA settings and
    /// should achieve similar convergence quality.
    ///
    /// # Example
    ///
    /// ```
    /// use apex_solver::optimizer::levenberg_marquardt::LevenbergMarquardtConfig;
    ///
    /// let config = LevenbergMarquardtConfig::for_bundle_adjustment();
    /// ```
    pub fn for_bundle_adjustment() -> Self {
        Self::default()
            .with_linear_solver_type(LinearSolverType::SparseSchurComplement)
            .with_schur_variant(SchurVariant::Iterative)
            .with_schur_preconditioner(SchurPreconditioner::SchurJacobi)
            .with_damping(1e-3) // Moderate initial damping (Ceres default)
            .with_max_iterations(20) // Reduced for early stop when RMSE < 1px
            // Match Ceres tolerances for faster convergence
            .with_cost_tolerance(1e-6) // Ceres function_tolerance (was 1e-12)
            .with_parameter_tolerance(1e-8) // Ceres parameter_tolerance (was 1e-14)
            .with_gradient_tolerance(1e-10) // Relaxed (was 1e-16)
    }

    // Note: the former `with_visualization()` method has been removed.
    // Use the observer pattern instead:
    //   let mut solver = LevenbergMarquardt::with_config(config);
    //   solver.add_observer(RerunObserver::new(true)?);
    // This provides cleaner separation of concerns and allows multiple observers.
    // The Rerun observer logs time series of cost, gradient norm, damping, and step
    // quality, plus sparse Hessian heat maps, gradient vectors, and real-time manifold
    // state updates (for SE2/SE3 problems). It requires the `visualization` feature in
    // `Cargo.toml`; use `verbose` for terminal logging.

    /// Print configuration parameters (verbose mode only).
    pub fn print_configuration(&self) {
        debug!(
            "Configuration:\n  Solver:        Levenberg-Marquardt\n  Linear solver: {:?}\n  Convergence Criteria:\n  Max iterations:      {}\n  Cost tolerance:      {:.2e}\n  Parameter tolerance: {:.2e}\n  Gradient tolerance:  {:.2e}\n  Timeout:             {:?}\n  Damping Parameters:\n  Initial damping:     {:.2e}\n  Damping range:       [{:.2e}, {:.2e}]\n  Increase factor:     {:.2}\n  Decrease factor:     {:.2}\n  Trust Region:\n  Initial radius:      {:.2e}\n  Min step quality:    {:.2}\n  Good step quality:   {:.2}\n  Numerical Settings:\n  Jacobi scaling:      {}\n  Compute covariances: {}",
            self.linear_solver_type,
            self.max_iterations,
            self.cost_tolerance,
            self.parameter_tolerance,
            self.gradient_tolerance,
            self.timeout,
            self.damping,
            self.damping_min,
            self.damping_max,
            self.damping_increase_factor,
            self.damping_decrease_factor,
            self.trust_region_radius,
            self.min_step_quality,
            self.good_step_quality,
            if self.use_jacobi_scaling {
                "enabled"
            } else {
                "disabled"
            },
            if self.compute_covariances {
                "enabled"
            } else {
                "disabled"
            }
        );
    }
}

/// Result from step computation
struct StepResult {
    step: Mat<f64>,
    gradient_norm: f64,
    predicted_reduction: f64,
}

/// Result from step evaluation
struct StepEvaluation {
    accepted: bool,
    cost_reduction: f64,
    rho: f64,
}

/// Levenberg-Marquardt solver for nonlinear least squares optimization.
///
/// Implements the damped Gauss-Newton method with adaptive damping parameter λ that
/// interpolates between Gauss-Newton and gradient descent based on step quality.
///
/// # Algorithm
///
/// At each iteration k:
/// 1. Compute residual `r(xₖ)` and Jacobian `J(xₖ)`
/// 2. Solve augmented system: `(J^T·J + λI)·h = -J^T·r`
/// 3. Evaluate step quality: `ρ = (actual reduction) / (predicted reduction)`
/// 4. If `ρ > 0`: Accept step and update `xₖ₊₁ = xₖ ⊕ h`, decrease λ
/// 5. If `ρ ≤ 0`: Reject step (keep `xₖ₊₁ = xₖ`), increase λ
/// 6. Check convergence criteria
///
/// The damping parameter λ is updated using Nielsen's smooth formula:
/// `λₖ₊₁ = λₖ · max(1/3, 1 - (2ρ - 1)³)` for accepted steps,
/// or `λₖ₊₁ = λₖ · ν` (with increasing ν) for rejected steps.
///
/// # Examples
///
/// ```no_run
/// use apex_solver::optimizer::levenberg_marquardt::{LevenbergMarquardtConfig, LevenbergMarquardt};
/// use apex_solver::linalg::LinearSolverType;
///
/// # fn main() {
/// let config = LevenbergMarquardtConfig::new()
///     .with_max_iterations(100)
///     .with_damping(1e-3)
///     .with_damping_bounds(1e-12, 1e12)
///     .with_jacobi_scaling(true);
///
/// let mut solver = LevenbergMarquardt::with_config(config);
/// # }
/// ```
///
/// # See Also
///
/// - [`LevenbergMarquardtConfig`] - Configuration options
/// - [`GaussNewton`](crate::GaussNewton) - Undamped variant (faster but less robust)
/// - [`DogLeg`](crate::DogLeg) - Alternative trust region method
pub struct LevenbergMarquardt {
    config: LevenbergMarquardtConfig,
    jacobi_scaling: Option<SparseColMat<usize, f64>>,
    observers: OptObserverVec,
}

impl Default for LevenbergMarquardt {
    fn default() -> Self {
        Self::new()
    }
}

impl LevenbergMarquardt {
    /// Create a new Levenberg-Marquardt solver with default configuration.
    pub fn new() -> Self {
        Self::with_config(LevenbergMarquardtConfig::default())
    }

    /// Create a new Levenberg-Marquardt solver with the given configuration.
    pub fn with_config(config: LevenbergMarquardtConfig) -> Self {
        Self {
            config,
            jacobi_scaling: None,
            observers: OptObserverVec::new(),
        }
    }

    /// Add an observer to monitor optimization progress.
    ///
    /// Observers are notified at each iteration with the current variable values.
    /// This enables real-time visualization, logging, metrics collection, etc.
    ///
    /// # Arguments
    ///
    /// * `observer` - Any type implementing `OptObserver`
    ///
    /// # Examples
    ///
    /// ```no_run
    /// use apex_solver::{LevenbergMarquardt, LevenbergMarquardtConfig};
    /// # use apex_solver::core::problem::Problem;
    /// # use std::collections::HashMap;
    ///
    /// # fn main() -> Result<(), Box<dyn std::error::Error>> {
    /// let mut solver = LevenbergMarquardt::new();
    ///
    /// #[cfg(feature = "visualization")]
    /// {
    ///     use apex_solver::observers::RerunObserver;
    ///     let rerun_observer = RerunObserver::new(true)?;
    ///     solver.add_observer(rerun_observer);
    /// }
    ///
    /// // ... optimize ...
    /// # Ok(())
    /// # }
    /// ```
    pub fn add_observer(&mut self, observer: impl crate::optimizer::OptObserver + 'static) {
        self.observers.add(observer);
    }

    /// Update the damping parameter based on step quality, trust-region style.
    /// Reference: *Introduction to Optimization and Data Fitting*, Algorithm 6.18.
    fn update_damping(&mut self, rho: f64) -> bool {
        if rho > 0.0 {
            // Step accepted - decrease damping via Nielsen's formula and reset ν
            let coeff = 2.0 * rho - 1.0;
            self.config.damping *= (1.0_f64 / 3.0).max(1.0 - coeff * coeff * coeff);
            self.config.damping = self.config.damping.max(self.config.damping_min);
            self.config.damping_nu = 2.0;
            true
        } else {
            // Step rejected - increase damping and double ν for the next rejection
            self.config.damping *= self.config.damping_nu;
            self.config.damping_nu *= 2.0;
            self.config.damping = self.config.damping.min(self.config.damping_max);
            false
        }
    }

    /// Compute the predicted cost reduction from the linear model.
    ///
    /// Since the step solves `(H + λI)·h = -g`, we have `H·h = -g - λ·h`, so the
    /// model reduction `-h^T·g - ½·h^T·H·h` simplifies to `½·h^T·(λ·h - g)`.
    fn compute_predicted_reduction(&self, step: &Mat<f64>, gradient: &Mat<f64>) -> f64 {
        // predicted_reduction = 0.5 * step^T * (damping * step - gradient)
        let diff = self.config.damping * step - gradient;
        (0.5 * step.transpose() * &diff)[(0, 0)]
    }

    /// Compute optimization step by solving the augmented system
    fn compute_levenberg_marquardt_step(
        &self,
        residuals: &Mat<f64>,
        scaled_jacobian: &SparseColMat<usize, f64>,
        linear_solver: &mut Box<dyn SparseLinearSolver>,
    ) -> Result<StepResult, OptimizerError> {
        // Solve augmented equation: (J_scaled^T * J_scaled + λI) * dx_scaled = -J_scaled^T * r
        let residuals_owned = residuals.as_ref().to_owned();
        let scaled_step = linear_solver
            .solve_augmented_equation(&residuals_owned, scaled_jacobian, self.config.damping)
            .map_err(|e| OptimizerError::LinearSolveFailed(e.to_string()).log_with_source(e))?;

        // Get cached gradient and Hessian from the solver
        let gradient = linear_solver.get_gradient().ok_or_else(|| {
            OptimizerError::NumericalInstability("Gradient not available".into()).log()
        })?;
        // The Hessian is fetched only to verify the solver cached it; the value is unused here.
        let _hessian = linear_solver.get_hessian().ok_or_else(|| {
            OptimizerError::NumericalInstability("Hessian not available".into()).log()
        })?;
        let gradient_norm = gradient.norm_l2();

        // Apply inverse Jacobi scaling to get final step (if enabled)
        let step = if self.config.use_jacobi_scaling {
            let scaling = self
                .jacobi_scaling
                .as_ref()
                .ok_or_else(|| OptimizerError::JacobiScalingNotInitialized.log())?;
            &scaled_step * scaling
        } else {
            scaled_step
        };

        // Compute predicted reduction from the step and the solver's cached gradient
        let predicted_reduction = self.compute_predicted_reduction(&step, gradient);

        Ok(StepResult {
            step,
            gradient_norm,
            predicted_reduction,
        })
    }

    /// Evaluate and apply step, handling acceptance/rejection based on step quality
    fn evaluate_and_apply_step(
        &mut self,
        step_result: &StepResult,
        state: &mut InitializedState,
        problem: &Problem,
    ) -> error::ApexSolverResult<StepEvaluation> {
        // Apply parameter updates using manifold operations
        let _step_norm = apply_parameter_step(
            &mut state.variables,
            step_result.step.as_ref(),
            &state.sorted_vars,
        );

        // Compute new cost (residual only, no Jacobian needed for step evaluation)
        let new_residual = problem.compute_residual_sparse(&state.variables)?;
        let new_cost = compute_cost(&new_residual);

        // Compute step quality
        let rho = crate::optimizer::compute_step_quality(
            state.current_cost,
            new_cost,
            step_result.predicted_reduction,
        );

        // Update damping and decide whether to accept step
        let accepted = self.update_damping(rho);

        let cost_reduction = if accepted {
            // Accept the step - parameters already updated
            let reduction = state.current_cost - new_cost;
            state.current_cost = new_cost;
            reduction
        } else {
            // Reject the step - revert parameter changes
            apply_negative_parameter_step(
                &mut state.variables,
                step_result.step.as_ref(),
                &state.sorted_vars,
            );
            0.0
        };

        Ok(StepEvaluation {
            accepted,
            cost_reduction,
            rho,
        })
    }

    pub fn optimize(
        &mut self,
        problem: &Problem,
        initial_params: &HashMap<String, (ManifoldType, DVector<f64>)>,
    ) -> Result<SolverResult<HashMap<String, VariableEnum>>, error::ApexSolverError> {
        let start_time = Instant::now();
        let mut iteration = 0;
        let mut cost_evaluations = 1; // Initial cost evaluation
        let mut jacobian_evaluations = 0;
        let mut successful_steps = 0;
        let mut unsuccessful_steps = 0;

        // Initialize optimization state
        let mut state = crate::optimizer::initialize_optimization_state(problem, initial_params)?;

        // Create linear solver - must be after variable initialization for Schur solver
        let mut linear_solver: Box<dyn SparseLinearSolver> = match self.config.linear_solver_type {
            LinearSolverType::SparseCholesky => Box::new(SparseCholeskySolver::new()),
            LinearSolverType::SparseQR => Box::new(SparseQRSolver::new()),
            LinearSolverType::SparseSchurComplement => Box::new(
                SchurSolverAdapter::new_with_structure_and_config(
                    &state.variables,
                    &state.variable_index_map,
                    self.config.schur_variant,
                    self.config.schur_preconditioner,
                )
                .map_err(|e| {
                    OptimizerError::LinearSolveFailed(format!(
                        "Failed to initialize Schur solver: {}",
                        e
                    ))
                    .log()
                })?,
            ),
        };

        // Initialize summary tracking variables
        let mut max_gradient_norm: f64 = 0.0;
        let mut max_parameter_update_norm: f64 = 0.0;
        let mut total_cost_reduction = 0.0;
        let mut final_gradient_norm;
        let mut final_parameter_update_norm;

        // Initialize iteration statistics tracking
        let mut iteration_stats = Vec::with_capacity(self.config.max_iterations);
        let mut previous_cost = state.current_cost;

        // Print configuration and header if debug level is enabled
        if tracing::enabled!(tracing::Level::DEBUG) {
            self.config.print_configuration();
            IterationStats::print_header();
        }

        // Main optimization loop
        loop {
            let iter_start = Instant::now();
            // Evaluate residuals and Jacobian
            let (residuals, jacobian) = problem.compute_residual_and_jacobian_sparse(
                &state.variables,
                &state.variable_index_map,
                &state.symbolic_structure,
            )?;
            jacobian_evaluations += 1;

            // Process Jacobian (apply scaling if enabled)
            let scaled_jacobian = if self.config.use_jacobi_scaling {
                crate::optimizer::process_jacobian(&jacobian, &mut self.jacobi_scaling, iteration)?
            } else {
                jacobian
            };

            // Compute optimization step
            let step_result = self.compute_levenberg_marquardt_step(
                &residuals,
                &scaled_jacobian,
                &mut linear_solver,
            )?;

            // Update tracking variables
            max_gradient_norm = max_gradient_norm.max(step_result.gradient_norm);
            final_gradient_norm = step_result.gradient_norm;
            let step_norm = step_result.step.norm_l2();
            max_parameter_update_norm = max_parameter_update_norm.max(step_norm);
            final_parameter_update_norm = step_norm;

            // Evaluate and apply step (handles accept/reject)
            let step_eval = self.evaluate_and_apply_step(&step_result, &mut state, problem)?;
            cost_evaluations += 1;

            // Update counters based on acceptance
            if step_eval.accepted {
                successful_steps += 1;
                total_cost_reduction += step_eval.cost_reduction;
            } else {
                unsuccessful_steps += 1;
            }

            // OPTIMIZATION: Only collect iteration statistics if debug level is enabled.
            // This eliminates ~2-5ms overhead per iteration for non-debug optimization.
            if tracing::enabled!(tracing::Level::DEBUG) {
                let iter_elapsed_ms = iter_start.elapsed().as_secs_f64() * 1000.0;
                let total_elapsed_ms = start_time.elapsed().as_secs_f64() * 1000.0;

                let stats = IterationStats {
                    iteration,
                    cost: state.current_cost,
                    cost_change: previous_cost - state.current_cost,
                    gradient_norm: step_result.gradient_norm,
                    step_norm,
                    tr_ratio: step_eval.rho,
                    tr_radius: self.config.damping,
                    ls_iter: 0, // Direct solver (Cholesky) has no iterations
                    iter_time_ms: iter_elapsed_ms,
                    total_time_ms: total_elapsed_ms,
                    accepted: step_eval.accepted,
                };

                iteration_stats.push(stats.clone());
                stats.print_line();
            }

            previous_cost = state.current_cost;

            // Notify all observers with current state
            crate::optimizer::notify_observers(
                &mut self.observers,
                &state.variables,
                iteration,
                state.current_cost,
                step_result.gradient_norm,
                Some(self.config.damping),
                step_norm,
                Some(step_eval.rho),
                linear_solver.as_ref(),
            );

            // Check convergence
            let elapsed = start_time.elapsed();

            // Compute parameter norm for relative parameter tolerance check
            let parameter_norm = crate::optimizer::compute_parameter_norm(&state.variables);
            // Cost for the convergence check: `state.current_cost` already reflects
            // an accepted step, and is unchanged after a rejected one (the parameter
            // update was reverted), so no branching is needed.
            let new_cost = state.current_cost;

            // Cost before this step: add back the reduction for an accepted step.
            // cost_reduction is 0.0 for rejected steps, so this holds in both cases.
            let cost_before_step = state.current_cost + step_eval.cost_reduction;
            if let Some(status) = crate::optimizer::check_convergence(&ConvergenceParams {
                iteration,
                current_cost: cost_before_step,
                new_cost,
                parameter_norm,
                parameter_update_norm: step_norm,
                gradient_norm: step_result.gradient_norm,
                elapsed,
                step_accepted: step_eval.accepted,
                max_iterations: self.config.max_iterations,
                gradient_tolerance: self.config.gradient_tolerance,
                parameter_tolerance: self.config.parameter_tolerance,
                cost_tolerance: self.config.cost_tolerance,
                min_cost_threshold: self.config.min_cost_threshold,
                timeout: self.config.timeout,
                trust_region_radius: Some(self.config.trust_region_radius),
                min_trust_region_radius: Some(self.config.min_trust_region_radius),
            }) {
                // Print summary only if debug level is enabled
                if tracing::enabled!(tracing::Level::DEBUG) {
                    let summary = crate::optimizer::create_optimizer_summary(
                        "Levenberg-Marquardt",
                        state.initial_cost,
                        state.current_cost,
                        iteration + 1,
                        Some(successful_steps),
                        Some(unsuccessful_steps),
                        max_gradient_norm,
                        final_gradient_norm,
                        max_parameter_update_norm,
                        final_parameter_update_norm,
                        total_cost_reduction,
                        elapsed,
                        iteration_stats.clone(),
                        status.clone(),
                        Some(self.config.damping),
                        None,
                        Some(step_eval.rho),
                    );
                    debug!("{}", summary);
                }

                // Compute covariances if enabled
                let covariances = if self.config.compute_covariances {
                    problem.compute_and_set_covariances(
                        &mut linear_solver,
                        &mut state.variables,
                        &state.variable_index_map,
                    )
                } else {
                    None
                };

                // Notify observers that optimization is complete
                let final_parameters: HashMap<String, VariableEnum> = state
                    .variables
                    .iter()
                    .map(|(k, v)| (k.clone(), v.clone()))
                    .collect();
                self.observers
                    .notify_complete(&final_parameters, iteration + 1);

                return Ok(crate::optimizer::build_solver_result(
                    status,
                    iteration + 1,
                    state,
                    elapsed,
                    final_gradient_norm,
                    final_parameter_update_norm,
                    cost_evaluations,
                    jacobian_evaluations,
                    covariances,
                ));
            }

            // Note: Max iterations and timeout checks are now handled inside check_convergence()

            iteration += 1;
        }
    }
}

// Implement Solver trait
impl Solver for LevenbergMarquardt {
    type Config = LevenbergMarquardtConfig;
    type Error = error::ApexSolverError;

    fn new() -> Self {
        Self::default()
    }

    fn optimize(
        &mut self,
        problem: &Problem,
        initial_params: &HashMap<String, (ManifoldType, DVector<f64>)>,
    ) -> Result<SolverResult<HashMap<String, VariableEnum>>, Self::Error> {
        self.optimize(problem, initial_params)
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::factors::Factor;
    use crate::optimizer::OptimizationStatus;
    use nalgebra::{DMatrix, dvector};

    /// Custom Rosenbrock Factor 1: r1 = 10(x2 - x1²)
    /// Demonstrates extensibility - custom factors can be defined outside of factors.rs
    #[derive(Debug, Clone)]
    struct RosenbrockFactor1;

    impl Factor for RosenbrockFactor1 {
        fn linearize(
            &self,
            params: &[DVector<f64>],
            compute_jacobian: bool,
        ) -> (DVector<f64>, Option<DMatrix<f64>>) {
            let x1 = params[0][0];
            let x2 = params[1][0];

            // Residual: r1 = 10(x2 - x1²)
            let residual = dvector![10.0 * (x2 - x1 * x1)];

            let jacobian = if compute_jacobian {
                // Jacobian: ∂r1/∂x1 = -20*x1, ∂r1/∂x2 = 10
                let mut jacobian = DMatrix::zeros(1, 2);
                jacobian[(0, 0)] = -20.0 * x1;
                jacobian[(0, 1)] = 10.0;

                Some(jacobian)
            } else {
                None
            };

            (residual, jacobian)
        }

        fn get_dimension(&self) -> usize {
            1
        }
    }

    /// Custom Rosenbrock Factor 2: r2 = 1 - x1
    /// Demonstrates extensibility - custom factors can be defined outside of factors.rs
    #[derive(Debug, Clone)]
    struct RosenbrockFactor2;

    impl Factor for RosenbrockFactor2 {
        fn linearize(
            &self,
            params: &[DVector<f64>],
            compute_jacobian: bool,
        ) -> (DVector<f64>, Option<DMatrix<f64>>) {
            let x1 = params[0][0];

            // Residual: r2 = 1 - x1
            let residual = dvector![1.0 - x1];

            let jacobian = if compute_jacobian {
                // Jacobian: ∂r2/∂x1 = -1
                Some(DMatrix::from_element(1, 1, -1.0))
            } else {
                None
            };

            (residual, jacobian)
        }

        fn get_dimension(&self) -> usize {
            1
        }
    }
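
    /// Sketch (standalone 1-D case, not solver output): the damped normal
    /// equations (JᵀJ + λI)·h = -Jᵀ·r that `compute_levenberg_marquardt_step`
    /// solves, reduced to scalars with illustrative values.
    #[test]
    fn test_damped_normal_equation_scalar() {
        let (j, r) = (2.0_f64, 4.0_f64);
        // With λ = 1: (4 + 1)·h = -8, so h = -1.6.
        let h = -(j * r) / (j * j + 1.0);
        assert!((h + 1.6).abs() < 1e-12);
        // Large damping shrinks the step toward a small gradient-descent step.
        let h_damped = -(j * r) / (j * j + 1e6);
        assert!(h_damped.abs() < h.abs());
    }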
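    /// Sketch (standalone, not part of the solver API): Nielsen's damping-update
    /// factor max(1/3, 1 - (2ρ - 1)³) from the module docs, checked at a few
    /// simple points. The helper `nielsen_factor` is hypothetical and exists
    /// only for this test.
    #[test]
    fn test_nielsen_damping_factor_sketch() {
        fn nielsen_factor(rho: f64) -> f64 {
            (1.0 - (2.0 * rho - 1.0).powi(3)).max(1.0 / 3.0)
        }
        // A perfect step (ρ = 1) shrinks damping by the maximum factor of 1/3.
        assert!((nielsen_factor(1.0) - 1.0 / 3.0).abs() < 1e-12);
        // A neutral step (ρ = 0.5) leaves damping unchanged (factor 1).
        assert!((nielsen_factor(0.5) - 1.0).abs() < 1e-12);
        // The factor is clamped below at 1/3, so damping never collapses too fast.
        assert!((nielsen_factor(2.0) - 1.0 / 3.0).abs() < 1e-12);
    }
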
    #[test]
    fn test_rosenbrock_optimization() -> Result<(), Box<dyn std::error::Error>> {
        // Rosenbrock function test:
        // Minimize: r1² + r2² where
        //   r1 = 10(x2 - x1²)
        //   r2 = 1 - x1
        // Starting point: [-1.2, 1.0]
        // Expected minimum: [1.0, 1.0]

        let mut problem = Problem::new();
        let mut initial_values = HashMap::new();

        // Add variables using Rn manifold (Euclidean space)
        initial_values.insert("x1".to_string(), (ManifoldType::RN, dvector![-1.2]));
        initial_values.insert("x2".to_string(), (ManifoldType::RN, dvector![1.0]));

        // Add custom factors (demonstrates extensibility!)
        problem.add_residual_block(&["x1", "x2"], Box::new(RosenbrockFactor1), None);
        problem.add_residual_block(&["x1"], Box::new(RosenbrockFactor2), None);

        // Configure Levenberg-Marquardt optimizer
        let config = LevenbergMarquardtConfig::new()
            .with_max_iterations(100)
            .with_cost_tolerance(1e-8)
            .with_parameter_tolerance(1e-8)
            .with_gradient_tolerance(1e-10);

        let mut solver = LevenbergMarquardt::with_config(config);
        let result = solver.optimize(&problem, &initial_values)?;

        // Extract final values
        let x1_final = result
            .parameters
            .get("x1")
            .ok_or("x1 not found")?
            .to_vector()[0];
        let x2_final = result
            .parameters
            .get("x2")
            .ok_or("x2 not found")?
            .to_vector()[0];

        // Verify convergence to [1.0, 1.0]
        assert!(
            matches!(
                result.status,
                OptimizationStatus::Converged
                    | OptimizationStatus::CostToleranceReached
                    | OptimizationStatus::ParameterToleranceReached
                    | OptimizationStatus::GradientToleranceReached
            ),
            "Optimization should converge"
        );
        assert!(
            (x1_final - 1.0).abs() < 1e-4,
            "x1 should converge to 1.0, got {}",
            x1_final
        );
        assert!(
            (x2_final - 1.0).abs() < 1e-4,
            "x2 should converge to 1.0, got {}",
            x2_final
        );
        assert!(
            result.final_cost < 1e-6,
            "Final cost should be near zero, got {}",
            result.final_cost
        );
        Ok(())
    }
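
    /// Sketch (standalone, no solver types): the predicted-reduction formula
    /// used by `compute_predicted_reduction`, 0.5 * hᵀ(λh - g), evaluated for
    /// scalar inputs. The values below are arbitrary illustrative numbers,
    /// not solver output.
    #[test]
    fn test_predicted_reduction_formula_scalar() {
        let (h, g, lambda) = (0.5_f64, -2.0_f64, 10.0_f64);
        // predicted = 0.5 * h * (λh - g) = 0.5 * 0.5 * (5.0 + 2.0) = 1.75
        let predicted = 0.5 * h * (lambda * h - g);
        assert!((predicted - 1.75).abs() < 1e-12);
        // Along a descent direction (hᵀg < 0) with λ ≥ 0, the prediction is positive.
        assert!(predicted > 0.0);
    }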
}