//! Levenberg-Marquardt algorithm implementation.
//!
//! The Levenberg-Marquardt (LM) method is a robust and widely-used algorithm for solving
//! nonlinear least squares problems of the form:
//!
//! ```text
//! min f(x) = ½||r(x)||² = ½Σᵢ rᵢ(x)²
//! ```
//!
//! where `r: ℝⁿ → ℝᵐ` is the residual vector function.
//!
//! # Algorithm Overview
//!
//! The Levenberg-Marquardt method solves the damped normal equations at each iteration:
//!
//! ```text
//! (J^T·J + λI)·h = -J^T·r
//! ```
//!
//! where:
//! - `J` is the Jacobian matrix (m × n)
//! - `r` is the residual vector (m × 1)
//! - `h` is the step vector (n × 1)
//! - `λ` is the adaptive damping parameter (scalar)
//! - `I` is the identity matrix (or diagonal scaling matrix)
//!
//! ## Damping Parameter Strategy
//!
//! The damping parameter λ adapts based on step quality:
//!
//! - **λ → 0** (small damping): Behaves like Gauss-Newton with fast quadratic convergence
//! - **λ → ∞** (large damping): Behaves like gradient descent with guaranteed descent direction
//!
//! This interpolation between Newton and gradient descent provides excellent robustness
//! while maintaining fast convergence near the solution.
//!
//! ## Step Acceptance and Damping Update
//!
//! The algorithm evaluates each proposed step using the gain ratio:
//!
//! ```text
//! ρ = (actual reduction) / (predicted reduction)
//! = [f(xₖ) - f(xₖ + h)] / [f(xₖ) - L(h)]
//! ```
//!
//! where `L(h) = f(xₖ) + h^T·g + ½h^T·H·h` is the local quadratic model.
//!
//! **Step acceptance:**
//! - If `ρ > 0`: Accept step (cost decreased), decrease λ to trust the model more
//! - If `ρ ≤ 0`: Reject step (cost increased), increase λ to be more conservative
//!
//! **Damping update** (Nielsen's formula):
//! ```text
//! λₖ₊₁ = λₖ · max(1/3, 1 - (2ρ - 1)³)
//! ```
//!
//! This provides smooth, data-driven adaptation of the damping parameter.
//!
//! ## Convergence Properties
//!
//! - **Global convergence**: Guaranteed to find a stationary point from any starting guess
//! - **Local quadratic convergence**: Near the solution, behaves like Gauss-Newton
//! - **Robust to poor initialization**: Adaptive damping prevents divergence
//! - **Handles ill-conditioning**: Large λ stabilizes nearly singular Hessian
//!
//! ## When to Use
//!
//! Levenberg-Marquardt is the best general-purpose choice when:
//! - Initial parameter guess may be far from the optimum
//! - Problem conditioning is unknown
//! - Robustness is prioritized over raw speed
//! - You want reliable convergence across diverse problem types
//!
//! For problems with specific structure, consider:
//! - [`GaussNewton`](crate::GaussNewton) if well-conditioned with good initialization
//! - [`DogLeg`](crate::DogLeg) for explicit trust region control
//!
//! # Implementation Features
//!
//! - **Sparse matrix support**: Efficient handling of large-scale problems via `faer` sparse library
//! - **Adaptive damping**: Nielsen's formula for smooth parameter adaptation
//! - **Robust linear solvers**: Cholesky (fast) or QR (stable) factorization
//! - **Jacobi scaling**: Optional diagonal preconditioning for mixed-scale problems
//! - **Covariance computation**: Optional uncertainty quantification after convergence
//! - **Manifold operations**: Native support for optimization on Lie groups (SE2, SE3, SO2, SO3)
//! - **Comprehensive diagnostics**: Detailed summaries of convergence and performance
//!
//! # Mathematical Background
//!
//! The augmented Hessian `J^T·J + λI` combines two beneficial properties:
//!
//! 1. **Positive definiteness**: Always solvable even when `J^T·J` is singular
//! 2. **Regularization**: Prevents taking steps in poorly-determined directions
//!
//! The trust region interpretation: λ controls an implicit spherical trust region where
//! larger λ restricts step size, ensuring the linear model remains valid.
//!
//! # Examples
//!
//! ## Basic usage
//!
//! ```no_run
//! use apex_solver::LevenbergMarquardt;
//! use apex_solver::core::problem::Problem;
//! use std::collections::HashMap;
//!
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let mut problem = Problem::new();
//! // ... add residual blocks (factors) to problem ...
//!
//! let initial_values = HashMap::new();
//! // ... initialize parameters ...
//!
//! let mut solver = LevenbergMarquardt::new();
//! let result = solver.optimize(&problem, &initial_values)?;
//!
//! # Ok(())
//! # }
//! ```
//!
//! ## Advanced configuration
//!
//! ```no_run
//! use apex_solver::optimizer::levenberg_marquardt::{LevenbergMarquardtConfig, LevenbergMarquardt};
//! use apex_solver::linalg::LinearSolverType;
//!
//! # fn main() {
//! let config = LevenbergMarquardtConfig::new()
//! .with_max_iterations(100)
//! .with_cost_tolerance(1e-6)
//! .with_damping(1e-3) // Initial damping
//! .with_damping_bounds(1e-12, 1e12) // Min/max damping
//! .with_jacobi_scaling(true); // Improve conditioning
//!
//! let mut solver = LevenbergMarquardt::with_config(config);
//! # }
//! ```
//!
//! # References
//!
//! - Levenberg, K. (1944). "A Method for the Solution of Certain Non-Linear Problems in Least Squares". *Quarterly of Applied Mathematics*.
//! - Marquardt, D. W. (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters". *Journal of the Society for Industrial and Applied Mathematics*.
//! - Madsen, K., Nielsen, H. B., & Tingleff, O. (2004). *Methods for Non-Linear Least Squares Problems* (2nd ed.). Chapter 3.
//! - Nocedal, J. & Wright, S. (2006). *Numerical Optimization* (2nd ed.). Springer. Chapter 10.
//! - Nielsen, H. B. (1999). "Damping Parameter in Marquardt's Method". Technical Report IMM-REP-1999-05.
use crate::core::problem::{Problem, VariableEnum};
use crate::error;
use crate::linalg::{
LinearSolverType, SchurPreconditioner, SchurSolverAdapter, SchurVariant, SparseCholeskySolver,
SparseLinearSolver, SparseQRSolver,
};
use crate::optimizer::{
ConvergenceParams, InitializedState, IterationStats, OptObserverVec, OptimizerError, Solver,
SolverResult, apply_negative_parameter_step, apply_parameter_step, compute_cost,
};
use apex_manifolds::ManifoldType;
use faer::{Mat, sparse::SparseColMat};
use nalgebra::DVector;
use std::collections::HashMap;
use std::time::{Duration, Instant};
use tracing::debug;
/// Configuration parameters for the Levenberg-Marquardt optimizer.
///
/// Controls the adaptive damping strategy, convergence criteria, and numerical stability
/// enhancements for the Levenberg-Marquardt algorithm.
///
/// # Builder Pattern
///
/// All configuration options can be set using the builder pattern:
///
/// ```
/// use apex_solver::optimizer::levenberg_marquardt::LevenbergMarquardtConfig;
///
/// let config = LevenbergMarquardtConfig::new()
/// .with_max_iterations(100)
/// .with_damping(1e-3)
/// .with_damping_bounds(1e-12, 1e12)
/// .with_jacobi_scaling(true);
/// ```
///
/// # Damping Parameter Behavior
///
/// The damping parameter λ controls the trade-off between Gauss-Newton and gradient descent:
///
/// - **Initial damping** (`damping`): Starting value (default: 1e-4)
/// - **Damping bounds** (`damping_min`, `damping_max`): Valid range (default: 1e-12 to 1e12)
/// - **Adaptation**: Automatically adjusted based on step quality using Nielsen's formula
///
/// # Convergence Criteria
///
/// The optimizer terminates when ANY of the following conditions is met:
///
/// - **Cost tolerance**: `|cost_k - cost_{k-1}| < cost_tolerance`
/// - **Parameter tolerance**: `||step|| < parameter_tolerance`
/// - **Gradient tolerance**: `||J^T·r|| < gradient_tolerance`
/// - **Maximum iterations**: `iteration >= max_iterations`
/// - **Timeout**: `elapsed_time >= timeout`
///
/// # See Also
///
/// - [`LevenbergMarquardt`] - The solver that uses this configuration
/// - [`GaussNewtonConfig`](crate::GaussNewtonConfig) - Undamped variant
/// - [`DogLegConfig`](crate::DogLegConfig) - Trust region alternative
#[derive(Clone)]
pub struct LevenbergMarquardtConfig {
/// Type of linear solver for the linear systems
pub linear_solver_type: LinearSolverType,
/// Maximum number of iterations
pub max_iterations: usize,
/// Convergence tolerance for cost function
pub cost_tolerance: f64,
/// Convergence tolerance for parameter updates
pub parameter_tolerance: f64,
/// Convergence tolerance for gradient norm
pub gradient_tolerance: f64,
/// Timeout duration
pub timeout: Option<Duration>,
/// Initial damping parameter
pub damping: f64,
/// Minimum damping parameter
pub damping_min: f64,
/// Maximum damping parameter
pub damping_max: f64,
/// Damping increase factor (when step rejected)
pub damping_increase_factor: f64,
/// Damping decrease factor (when step accepted)
pub damping_decrease_factor: f64,
/// Damping nu parameter
pub damping_nu: f64,
/// Trust region radius
pub trust_region_radius: f64,
/// Minimum step quality for acceptance
pub min_step_quality: f64,
/// Good step quality threshold
pub good_step_quality: f64,
/// Minimum diagonal value for regularization
pub min_diagonal: f64,
/// Maximum diagonal value for regularization
pub max_diagonal: f64,
/// Minimum objective function cutoff (optional early termination)
///
/// If set, optimization terminates when cost falls below this threshold.
/// Useful for early stopping when a "good enough" solution is acceptable.
///
/// Default: None (disabled)
pub min_cost_threshold: Option<f64>,
/// Minimum trust region radius before termination
///
/// When the trust region radius falls below this value, the optimizer
/// terminates as it indicates the search has converged or the problem
/// is ill-conditioned. Matches Ceres Solver's min_trust_region_radius.
///
/// Default: 1e-32 (Ceres-compatible)
pub min_trust_region_radius: f64,
/// Maximum condition number for Jacobian matrix (optional check)
///
/// If set, the optimizer checks if condition_number(J^T*J) exceeds this
/// threshold and terminates with IllConditionedJacobian status.
/// Note: Computing condition number is expensive, so this is disabled by default.
///
/// Default: None (disabled)
pub max_condition_number: Option<f64>,
/// Minimum relative cost decrease for step acceptance
///
/// Used in computing step quality (rho = actual_reduction / predicted_reduction).
/// Steps with rho < min_relative_decrease are rejected. Matches Ceres Solver's
/// min_relative_decrease parameter.
///
/// Default: 1e-3 (Ceres-compatible)
pub min_relative_decrease: f64,
/// Use Jacobi column scaling (preconditioning)
///
/// When enabled, normalizes Jacobian columns by their L2 norm before solving.
/// This can improve convergence for problems with mixed parameter scales
/// (e.g., positions in meters + angles in radians) but adds ~5-10% overhead.
///
/// Default: false (to avoid performance overhead and faster convergence)
pub use_jacobi_scaling: bool,
/// Compute per-variable covariance matrices (uncertainty estimation)
///
/// When enabled, computes covariance by inverting the Hessian matrix after
/// convergence. The full covariance matrix is extracted into per-variable
/// blocks stored in both Variable structs and SolverResult.
///
/// Default: false (to avoid performance overhead)
pub compute_covariances: bool,
/// Schur complement solver variant (for bundle adjustment problems)
///
/// When using LinearSolverType::SparseSchurComplement, this determines which
/// variant of the Schur complement method to use:
/// - Sparse: Direct sparse Cholesky factorization (most accurate, moderate speed)
/// - Iterative: Preconditioned Conjugate Gradients (memory efficient, good for large problems)
/// - PowerSeries: Power series approximation (fastest, less accurate)
///
/// Default: Sparse
pub schur_variant: SchurVariant,
/// Schur complement preconditioner type
///
/// Determines the preconditioning strategy for iterative Schur methods:
/// - Diagonal: Simple diagonal preconditioner (fast, less effective)
/// - BlockDiagonal: Block-diagonal preconditioner (balanced)
/// - IncompleteCholesky: Incomplete Cholesky factorization (slower, more effective)
///
/// Default: Diagonal
pub schur_preconditioner: SchurPreconditioner,
// Note: Visualization is now handled via the observer pattern.
// Use `solver.add_observer(RerunObserver::new(true)?)` to enable visualization.
// This provides cleaner separation of concerns and allows multiple observers.
}
impl Default for LevenbergMarquardtConfig {
fn default() -> Self {
Self {
linear_solver_type: LinearSolverType::default(),
// Ceres Solver default: 50 (changed from 100 for compatibility)
max_iterations: 50,
// Ceres Solver default: 1e-6 (changed from 1e-8 for compatibility)
cost_tolerance: 1e-6,
// Ceres Solver default: 1e-8 (unchanged)
parameter_tolerance: 1e-8,
// Ceres Solver default: 1e-10 (changed from 1e-8 for compatibility)
// Note: Typically should be 1e-4 * cost_tolerance per Ceres docs
gradient_tolerance: 1e-10,
timeout: None,
damping: 1e-3, // Increased from 1e-4 for better initial convergence on BA
damping_min: 1e-12,
damping_max: 1e12,
damping_increase_factor: 10.0,
damping_decrease_factor: 0.3,
damping_nu: 2.0,
trust_region_radius: 1e4,
min_step_quality: 0.0,
good_step_quality: 0.75,
min_diagonal: 1e-6,
max_diagonal: 1e32,
// New Ceres-compatible parameters
min_cost_threshold: None,
min_trust_region_radius: 1e-32,
max_condition_number: None,
min_relative_decrease: 1e-3,
// Existing parameters
// Jacobi scaling disabled by default for Schur solvers (incompatible with block structure)
// Enable manually for Cholesky/QR solvers on mixed-scale problems
use_jacobi_scaling: false,
compute_covariances: false,
// Schur complement parameters
schur_variant: SchurVariant::default(),
schur_preconditioner: SchurPreconditioner::default(),
}
}
}
impl LevenbergMarquardtConfig {
/// Create a new Levenberg-Marquardt configuration with default values.
pub fn new() -> Self {
Self::default()
}
/// Set the linear solver type
pub fn with_linear_solver_type(mut self, linear_solver_type: LinearSolverType) -> Self {
self.linear_solver_type = linear_solver_type;
self
}
/// Set the maximum number of iterations
pub fn with_max_iterations(mut self, max_iterations: usize) -> Self {
self.max_iterations = max_iterations;
self
}
/// Set the cost tolerance
pub fn with_cost_tolerance(mut self, cost_tolerance: f64) -> Self {
self.cost_tolerance = cost_tolerance;
self
}
/// Set the parameter tolerance
pub fn with_parameter_tolerance(mut self, parameter_tolerance: f64) -> Self {
self.parameter_tolerance = parameter_tolerance;
self
}
/// Set the gradient tolerance
pub fn with_gradient_tolerance(mut self, gradient_tolerance: f64) -> Self {
self.gradient_tolerance = gradient_tolerance;
self
}
/// Set the timeout duration
pub fn with_timeout(mut self, timeout: Duration) -> Self {
self.timeout = Some(timeout);
self
}
/// Set the initial damping parameter.
pub fn with_damping(mut self, damping: f64) -> Self {
self.damping = damping;
self
}
/// Set the damping parameter bounds.
pub fn with_damping_bounds(mut self, min: f64, max: f64) -> Self {
self.damping_min = min;
self.damping_max = max;
self
}
/// Set the damping adjustment factors.
pub fn with_damping_factors(mut self, increase: f64, decrease: f64) -> Self {
self.damping_increase_factor = increase;
self.damping_decrease_factor = decrease;
self
}
/// Set the trust region parameters.
pub fn with_trust_region(mut self, radius: f64, min_quality: f64, good_quality: f64) -> Self {
self.trust_region_radius = radius;
self.min_step_quality = min_quality;
self.good_step_quality = good_quality;
self
}
/// Set minimum objective function cutoff for early termination.
///
/// When set, optimization terminates with MinCostThresholdReached status
/// if the cost falls below this threshold. Useful for early stopping when
/// a "good enough" solution is acceptable.
pub fn with_min_cost_threshold(mut self, min_cost: f64) -> Self {
self.min_cost_threshold = Some(min_cost);
self
}
/// Set minimum trust region radius before termination.
///
/// When the trust region radius falls below this value, optimization
/// terminates with TrustRegionRadiusTooSmall status.
/// Default: 1e-32 (Ceres-compatible)
pub fn with_min_trust_region_radius(mut self, min_radius: f64) -> Self {
self.min_trust_region_radius = min_radius;
self
}
/// Set maximum condition number for Jacobian matrix.
///
/// If set, the optimizer checks if condition_number(J^T*J) exceeds this
/// threshold and terminates with IllConditionedJacobian status.
/// Note: Computing condition number is expensive, disabled by default.
pub fn with_max_condition_number(mut self, max_cond: f64) -> Self {
self.max_condition_number = Some(max_cond);
self
}
/// Set minimum relative cost decrease for step acceptance.
///
/// Steps with rho = (actual_reduction / predicted_reduction) below this
/// threshold are rejected. Default: 1e-3 (Ceres-compatible)
pub fn with_min_relative_decrease(mut self, min_decrease: f64) -> Self {
self.min_relative_decrease = min_decrease;
self
}
/// Enable or disable Jacobi column scaling (preconditioning).
///
/// When enabled, normalizes Jacobian columns by their L2 norm before solving.
/// Can improve convergence for mixed-scale problems but adds ~5-10% overhead.
pub fn with_jacobi_scaling(mut self, use_jacobi_scaling: bool) -> Self {
self.use_jacobi_scaling = use_jacobi_scaling;
self
}
/// Enable or disable covariance computation (uncertainty estimation).
///
/// When enabled, computes the full covariance matrix by inverting the Hessian
/// after convergence, then extracts per-variable covariance blocks.
pub fn with_compute_covariances(mut self, compute_covariances: bool) -> Self {
self.compute_covariances = compute_covariances;
self
}
/// Set Schur complement solver variant
pub fn with_schur_variant(mut self, variant: SchurVariant) -> Self {
self.schur_variant = variant;
self
}
/// Set Schur complement preconditioner
pub fn with_schur_preconditioner(mut self, preconditioner: SchurPreconditioner) -> Self {
self.schur_preconditioner = preconditioner;
self
}
/// Configuration optimized for bundle adjustment problems.
///
/// This preset uses settings tuned for large-scale bundle adjustment:
/// - **Schur complement solver** with iterative PCG (memory efficient)
/// - **Schur-Jacobi preconditioner** (Ceres-style, best PCG convergence)
/// - **Moderate initial damping** (1e-3) - not too aggressive
/// - **200 max iterations** (BA often needs more iterations for full convergence)
/// - **Very tight tolerances** matching Ceres Solver for accurate reconstruction
///
/// This configuration matches Ceres Solver's recommended BA settings and
/// should achieve similar convergence quality.
///
/// # Example
///
/// ```
/// use apex_solver::optimizer::levenberg_marquardt::LevenbergMarquardtConfig;
///
/// let config = LevenbergMarquardtConfig::for_bundle_adjustment();
/// ```
pub fn for_bundle_adjustment() -> Self {
Self::default()
.with_linear_solver_type(LinearSolverType::SparseSchurComplement)
.with_schur_variant(SchurVariant::Iterative)
.with_schur_preconditioner(SchurPreconditioner::SchurJacobi)
.with_damping(1e-3) // Moderate initial damping (Ceres default)
.with_max_iterations(20) // Reduced for early stop when RMSE < 1px
// Match Ceres tolerances for faster convergence
.with_cost_tolerance(1e-6) // Ceres function_tolerance (was 1e-12)
.with_parameter_tolerance(1e-8) // Ceres parameter_tolerance (was 1e-14)
.with_gradient_tolerance(1e-10) // Relaxed (was 1e-16)
}
/// Enable real-time visualization (graphical debugging).
///
/// When enabled, optimization progress is logged to a Rerun viewer with:
/// - Time series plots of cost, gradient norm, damping, step quality
/// - Sparse Hessian matrix visualization as heat map
/// - Gradient vector visualization
/// - Real-time manifold state updates (for SE2/SE3 problems)
///
/// **Note:** Requires the `visualization` feature to be enabled in `Cargo.toml`.
/// Use `verbose` for terminal logging.
///
/// # Arguments
///
/// * `enable` - Whether to enable visualization
// Note: with_visualization() method has been removed.
// Use the observer pattern instead:
// let mut solver = LevenbergMarquardt::with_config(config);
// solver.add_observer(RerunObserver::new(true)?);
// This provides cleaner separation and allows multiple observers.
/// Print configuration parameters (verbose mode only)
pub fn print_configuration(&self) {
debug!(
"Configuration:\n Solver: Levenberg-Marquardt\n Linear solver: {:?}\n Convergence Criteria:\n Max iterations: {}\n Cost tolerance: {:.2e}\n Parameter tolerance: {:.2e}\n Gradient tolerance: {:.2e}\n Timeout: {:?}\n Damping Parameters:\n Initial damping: {:.2e}\n Damping range: [{:.2e}, {:.2e}]\n Increase factor: {:.2}\n Decrease factor: {:.2}\n Trust Region:\n Initial radius: {:.2e}\n Min step quality: {:.2}\n Good step quality: {:.2}\n Numerical Settings:\n Jacobi scaling: {}\n Compute covariances: {}",
self.linear_solver_type,
self.max_iterations,
self.cost_tolerance,
self.parameter_tolerance,
self.gradient_tolerance,
self.timeout,
self.damping,
self.damping_min,
self.damping_max,
self.damping_increase_factor,
self.damping_decrease_factor,
self.trust_region_radius,
self.min_step_quality,
self.good_step_quality,
if self.use_jacobi_scaling {
"enabled"
} else {
"disabled"
},
if self.compute_covariances {
"enabled"
} else {
"disabled"
}
);
}
}
/// Result from step computation
struct StepResult {
step: Mat<f64>,
gradient_norm: f64,
predicted_reduction: f64,
}
/// Result from step evaluation
struct StepEvaluation {
accepted: bool,
cost_reduction: f64,
rho: f64,
}
/// Levenberg-Marquardt solver for nonlinear least squares optimization.
///
/// Implements the damped Gauss-Newton method with adaptive damping parameter λ that
/// interpolates between Gauss-Newton and gradient descent based on step quality.
///
/// # Algorithm
///
/// At each iteration k:
/// 1. Compute residual `r(xₖ)` and Jacobian `J(xₖ)`
/// 2. Solve augmented system: `(J^T·J + λI)·h = -J^T·r`
/// 3. Evaluate step quality: `ρ = (actual reduction) / (predicted reduction)`
/// 4. If `ρ > 0`: Accept step and update `xₖ₊₁ = xₖ ⊕ h`, decrease λ
/// 5. If `ρ ≤ 0`: Reject step (keep `xₖ₊₁ = xₖ`), increase λ
/// 6. Check convergence criteria
///
/// The damping parameter λ is updated using Nielsen's smooth formula:
/// `λₖ₊₁ = λₖ · max(1/3, 1 - (2ρ - 1)³)` for accepted steps,
/// or `λₖ₊₁ = λₖ · ν` (with increasing ν) for rejected steps.
///
/// # Examples
///
/// ```no_run
/// use apex_solver::optimizer::levenberg_marquardt::{LevenbergMarquardtConfig, LevenbergMarquardt};
/// use apex_solver::linalg::LinearSolverType;
///
/// # fn main() {
/// let config = LevenbergMarquardtConfig::new()
/// .with_max_iterations(100)
/// .with_damping(1e-3)
/// .with_damping_bounds(1e-12, 1e12)
/// .with_jacobi_scaling(true);
///
/// let mut solver = LevenbergMarquardt::with_config(config);
/// # }
/// ```
///
/// # See Also
///
/// - [`LevenbergMarquardtConfig`] - Configuration options
/// - [`GaussNewton`](crate::GaussNewton) - Undamped variant (faster but less robust)
/// - [`DogLeg`](crate::DogLeg) - Alternative trust region method
pub struct LevenbergMarquardt {
config: LevenbergMarquardtConfig,
jacobi_scaling: Option<SparseColMat<usize, f64>>,
observers: OptObserverVec,
}
impl Default for LevenbergMarquardt {
fn default() -> Self {
Self::new()
}
}
impl LevenbergMarquardt {
/// Create a new Levenberg-Marquardt solver with default configuration.
pub fn new() -> Self {
Self::with_config(LevenbergMarquardtConfig::default())
}
/// Create a new Levenberg-Marquardt solver with the given configuration.
pub fn with_config(config: LevenbergMarquardtConfig) -> Self {
Self {
config,
jacobi_scaling: None,
observers: OptObserverVec::new(),
}
}
/// Add an observer to monitor optimization progress.
///
/// Observers are notified at each iteration with the current variable values.
/// This enables real-time visualization, logging, metrics collection, etc.
///
/// # Arguments
///
/// * `observer` - Any type implementing `OptObserver`
///
/// # Examples
///
/// ```no_run
/// use apex_solver::{LevenbergMarquardt, LevenbergMarquardtConfig};
/// # use apex_solver::core::problem::Problem;
/// # use std::collections::HashMap;
///
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let mut solver = LevenbergMarquardt::new();
///
/// #[cfg(feature = "visualization")]
/// {
/// use apex_solver::observers::RerunObserver;
/// let rerun_observer = RerunObserver::new(true)?;
/// solver.add_observer(rerun_observer);
/// }
///
/// // ... optimize ...
/// # Ok(())
/// # }
/// ```
pub fn add_observer(&mut self, observer: impl crate::optimizer::OptObserver + 'static) {
self.observers.add(observer);
}
/// Update damping parameter based on step quality using trust region approach
/// Reference: Introduction to Optimization and Data Fitting
/// Algorithm 6.18
fn update_damping(&mut self, rho: f64) -> bool {
if rho > 0.0 {
// Step accepted - decrease damping
let coff = 2.0 * rho - 1.0;
self.config.damping *= (1.0_f64 / 3.0).max(1.0 - coff * coff * coff);
self.config.damping = self.config.damping.max(self.config.damping_min);
self.config.damping_nu = 2.0;
true
} else {
// Step rejected - increase damping
self.config.damping *= self.config.damping_nu;
self.config.damping_nu *= 2.0;
self.config.damping = self.config.damping.min(self.config.damping_max);
false
}
}
/// Compute predicted cost reduction from linear model
/// Standard LM formula: 0.5 * step^T * (damping * step - gradient)
fn compute_predicted_reduction(&self, step: &Mat<f64>, gradient: &Mat<f64>) -> f64 {
// Standard Levenberg-Marquardt predicted reduction formula
// predicted_reduction = -step^T * gradient - 0.5 * step^T * H * step
// = 0.5 * step^T * (damping * step - gradient)
let diff = self.config.damping * step - gradient;
(0.5 * step.transpose() * &diff)[(0, 0)]
}
/// Compute optimization step by solving the augmented system
fn compute_levenberg_marquardt_step(
&self,
residuals: &Mat<f64>,
scaled_jacobian: &SparseColMat<usize, f64>,
linear_solver: &mut Box<dyn SparseLinearSolver>,
) -> Result<StepResult, OptimizerError> {
// Solve augmented equation: (J_scaled^T * J_scaled + λI) * dx_scaled = -J_scaled^T * r
let residuals_owned = residuals.as_ref().to_owned();
let scaled_step = linear_solver
.solve_augmented_equation(&residuals_owned, scaled_jacobian, self.config.damping)
.map_err(|e| OptimizerError::LinearSolveFailed(e.to_string()).log_with_source(e))?;
// Get cached gradient and Hessian from the solver
let gradient = linear_solver.get_gradient().ok_or_else(|| {
OptimizerError::NumericalInstability("Gradient not available".into()).log()
})?;
let _hessian = linear_solver.get_hessian().ok_or_else(|| {
OptimizerError::NumericalInstability("Hessian not available".into()).log()
})?;
let gradient_norm = gradient.norm_l2();
// Apply inverse Jacobi scaling to get final step (if enabled)
let step = if self.config.use_jacobi_scaling {
let scaling = self
.jacobi_scaling
.as_ref()
.ok_or_else(|| OptimizerError::JacobiScalingNotInitialized.log())?;
&scaled_step * scaling
} else {
scaled_step
};
// Compute predicted reduction using scaled values
let predicted_reduction = self.compute_predicted_reduction(&step, gradient);
Ok(StepResult {
step,
gradient_norm,
predicted_reduction,
})
}
/// Evaluate and apply step, handling acceptance/rejection based on step quality
fn evaluate_and_apply_step(
&mut self,
step_result: &StepResult,
state: &mut InitializedState,
problem: &Problem,
) -> error::ApexSolverResult<StepEvaluation> {
// Apply parameter updates using manifold operations
let _step_norm = apply_parameter_step(
&mut state.variables,
step_result.step.as_ref(),
&state.sorted_vars,
);
// Compute new cost (residual only, no Jacobian needed for step evaluation)
let new_residual = problem.compute_residual_sparse(&state.variables)?;
let new_cost = compute_cost(&new_residual);
// Compute step quality
let rho = crate::optimizer::compute_step_quality(
state.current_cost,
new_cost,
step_result.predicted_reduction,
);
// Update damping and decide whether to accept step
let accepted = self.update_damping(rho);
let cost_reduction = if accepted {
// Accept the step - parameters already updated
let reduction = state.current_cost - new_cost;
state.current_cost = new_cost;
reduction
} else {
// Reject the step - revert parameter changes
apply_negative_parameter_step(
&mut state.variables,
step_result.step.as_ref(),
&state.sorted_vars,
);
0.0
};
Ok(StepEvaluation {
accepted,
cost_reduction,
rho,
})
}
pub fn optimize(
&mut self,
problem: &Problem,
initial_params: &HashMap<String, (ManifoldType, DVector<f64>)>,
) -> Result<SolverResult<HashMap<String, VariableEnum>>, error::ApexSolverError> {
let start_time = Instant::now();
let mut iteration = 0;
let mut cost_evaluations = 1; // Initial cost evaluation
let mut jacobian_evaluations = 0;
let mut successful_steps = 0;
let mut unsuccessful_steps = 0;
// Initialize optimization state
let mut state = crate::optimizer::initialize_optimization_state(problem, initial_params)?;
// Create linear solver - must be after variable initialization for Schur solver
let mut linear_solver: Box<dyn SparseLinearSolver> = match self.config.linear_solver_type {
LinearSolverType::SparseCholesky => Box::new(SparseCholeskySolver::new()),
LinearSolverType::SparseQR => Box::new(SparseQRSolver::new()),
LinearSolverType::SparseSchurComplement => Box::new(
SchurSolverAdapter::new_with_structure_and_config(
&state.variables,
&state.variable_index_map,
self.config.schur_variant,
self.config.schur_preconditioner,
)
.map_err(|e| {
OptimizerError::LinearSolveFailed(format!(
"Failed to initialize Schur solver: {}",
e
))
.log()
})?,
),
};
// Initialize summary tracking variables
let mut max_gradient_norm: f64 = 0.0;
let mut max_parameter_update_norm: f64 = 0.0;
let mut total_cost_reduction = 0.0;
let mut final_gradient_norm;
let mut final_parameter_update_norm;
// Initialize iteration statistics tracking
let mut iteration_stats = Vec::with_capacity(self.config.max_iterations);
let mut previous_cost = state.current_cost;
// Print configuration and header if debug level is enabled
if tracing::enabled!(tracing::Level::DEBUG) {
self.config.print_configuration();
IterationStats::print_header();
}
// Main optimization loop
loop {
let iter_start = Instant::now();
// Evaluate residuals and Jacobian
let (residuals, jacobian) = problem.compute_residual_and_jacobian_sparse(
&state.variables,
&state.variable_index_map,
&state.symbolic_structure,
)?;
jacobian_evaluations += 1;
// Process Jacobian (apply scaling if enabled)
let scaled_jacobian = if self.config.use_jacobi_scaling {
crate::optimizer::process_jacobian(&jacobian, &mut self.jacobi_scaling, iteration)?
} else {
jacobian
};
// Compute optimization step
let step_result = self.compute_levenberg_marquardt_step(
&residuals,
&scaled_jacobian,
&mut linear_solver,
)?;
// Update tracking variables
max_gradient_norm = max_gradient_norm.max(step_result.gradient_norm);
final_gradient_norm = step_result.gradient_norm;
let step_norm = step_result.step.norm_l2();
max_parameter_update_norm = max_parameter_update_norm.max(step_norm);
final_parameter_update_norm = step_norm;
// Evaluate and apply step (handles accept/reject)
let step_eval = self.evaluate_and_apply_step(&step_result, &mut state, problem)?;
cost_evaluations += 1;
// Update counters based on acceptance
if step_eval.accepted {
successful_steps += 1;
total_cost_reduction += step_eval.cost_reduction;
} else {
unsuccessful_steps += 1;
}
// OPTIMIZATION: Only collect iteration statistics if debug level is enabled
// This eliminates ~2-5ms overhead per iteration for non-debug optimization
if tracing::enabled!(tracing::Level::DEBUG) {
let iter_elapsed_ms = iter_start.elapsed().as_secs_f64() * 1000.0;
let total_elapsed_ms = start_time.elapsed().as_secs_f64() * 1000.0;
let stats = IterationStats {
iteration,
cost: state.current_cost,
cost_change: previous_cost - state.current_cost,
gradient_norm: step_result.gradient_norm,
step_norm,
tr_ratio: step_eval.rho,
tr_radius: self.config.damping,
ls_iter: 0, // Direct solver (Cholesky) has no iterations
iter_time_ms: iter_elapsed_ms,
total_time_ms: total_elapsed_ms,
accepted: step_eval.accepted,
};
iteration_stats.push(stats.clone());
stats.print_line();
}
previous_cost = state.current_cost;
// Notify all observers with current state
crate::optimizer::notify_observers(
&mut self.observers,
&state.variables,
iteration,
state.current_cost,
step_result.gradient_norm,
Some(self.config.damping),
step_norm,
Some(step_eval.rho),
linear_solver.as_ref(),
);
// Check convergence
let elapsed = start_time.elapsed();
// Compute parameter norm for relative parameter tolerance check
let parameter_norm = crate::optimizer::compute_parameter_norm(&state.variables);
// Compute new cost for convergence check (state may already have new cost if step accepted)
let new_cost = if step_eval.accepted {
state.current_cost
} else {
// Use cost before step application
state.current_cost
};
// Cost before this step (need to add back reduction if step was accepted)
let cost_before_step = if step_eval.accepted {
state.current_cost + step_eval.cost_reduction
} else {
state.current_cost
};
if let Some(status) = crate::optimizer::check_convergence(&ConvergenceParams {
iteration,
current_cost: cost_before_step,
new_cost,
parameter_norm,
parameter_update_norm: step_norm,
gradient_norm: step_result.gradient_norm,
elapsed,
step_accepted: step_eval.accepted,
max_iterations: self.config.max_iterations,
gradient_tolerance: self.config.gradient_tolerance,
parameter_tolerance: self.config.parameter_tolerance,
cost_tolerance: self.config.cost_tolerance,
min_cost_threshold: self.config.min_cost_threshold,
timeout: self.config.timeout,
trust_region_radius: Some(self.config.trust_region_radius),
min_trust_region_radius: Some(self.config.min_trust_region_radius),
}) {
// Print summary only if debug level is enabled
if tracing::enabled!(tracing::Level::DEBUG) {
let summary = crate::optimizer::create_optimizer_summary(
"Levenberg-Marquardt",
state.initial_cost,
state.current_cost,
iteration + 1,
Some(successful_steps),
Some(unsuccessful_steps),
max_gradient_norm,
final_gradient_norm,
max_parameter_update_norm,
final_parameter_update_norm,
total_cost_reduction,
elapsed,
iteration_stats.clone(),
status.clone(),
Some(self.config.damping),
None,
Some(step_eval.rho),
);
debug!("{}", summary);
}
// Compute covariances if enabled
let covariances = if self.config.compute_covariances {
problem.compute_and_set_covariances(
&mut linear_solver,
&mut state.variables,
&state.variable_index_map,
)
} else {
None
};
// Notify observers that optimization is complete
let final_parameters: HashMap<String, VariableEnum> = state
.variables
.iter()
.map(|(k, v)| (k.clone(), v.clone()))
.collect();
self.observers
.notify_complete(&final_parameters, iteration + 1);
return Ok(crate::optimizer::build_solver_result(
status,
iteration + 1,
state,
elapsed,
final_gradient_norm,
final_parameter_update_norm,
cost_evaluations,
jacobian_evaluations,
covariances,
));
}
// Note: Max iterations and timeout checks are now handled inside check_convergence()
iteration += 1;
}
}
}
// Implement Solver trait
impl Solver for LevenbergMarquardt {
type Config = LevenbergMarquardtConfig;
type Error = error::ApexSolverError;
fn new() -> Self {
Self::default()
}
fn optimize(
&mut self,
problem: &Problem,
initial_params: &HashMap<String, (ManifoldType, DVector<f64>)>,
) -> Result<SolverResult<HashMap<String, VariableEnum>>, Self::Error> {
self.optimize(problem, initial_params)
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::factors::Factor;
use crate::optimizer::OptimizationStatus;
use nalgebra::{DMatrix, dvector};
/// Custom Rosenbrock Factor 1: r1 = 10(x2 - x1²)
/// Demonstrates extensibility - custom factors can be defined outside of factors.rs
#[derive(Debug, Clone)]
struct RosenbrockFactor1;
impl Factor for RosenbrockFactor1 {
fn linearize(
&self,
params: &[DVector<f64>],
compute_jacobian: bool,
) -> (DVector<f64>, Option<DMatrix<f64>>) {
let x1 = params[0][0];
let x2 = params[1][0];
// Residual: r1 = 10(x2 - x1²)
let residual = dvector![10.0 * (x2 - x1 * x1)];
let jacobian = if compute_jacobian {
// Jacobian: ∂r1/∂x1 = -20*x1, ∂r1/∂x2 = 10
let mut jacobian = DMatrix::zeros(1, 2);
jacobian[(0, 0)] = -20.0 * x1;
jacobian[(0, 1)] = 10.0;
Some(jacobian)
} else {
None
};
(residual, jacobian)
}
fn get_dimension(&self) -> usize {
1
}
}
/// Custom Rosenbrock Factor 2: r2 = 1 - x1
/// Demonstrates extensibility - custom factors can be defined outside of factors.rs
#[derive(Debug, Clone)]
struct RosenbrockFactor2;
impl Factor for RosenbrockFactor2 {
fn linearize(
&self,
params: &[DVector<f64>],
compute_jacobian: bool,
) -> (DVector<f64>, Option<DMatrix<f64>>) {
let x1 = params[0][0];
// Residual: r2 = 1 - x1
let residual = dvector![1.0 - x1];
let jacobian = if compute_jacobian {
// Jacobian: ∂r2/∂x1 = -1
Some(DMatrix::from_element(1, 1, -1.0))
} else {
None
};
(residual, jacobian)
}
fn get_dimension(&self) -> usize {
1
}
}
#[test]
fn test_rosenbrock_optimization() -> Result<(), Box<dyn std::error::Error>> {
// Rosenbrock function test:
// Minimize: r1² + r2² where
// r1 = 10(x2 - x1²)
// r2 = 1 - x1
// Starting point: [-1.2, 1.0]
// Expected minimum: [1.0, 1.0]
let mut problem = Problem::new();
let mut initial_values = HashMap::new();
// Add variables using Rn manifold (Euclidean space)
initial_values.insert("x1".to_string(), (ManifoldType::RN, dvector![-1.2]));
initial_values.insert("x2".to_string(), (ManifoldType::RN, dvector![1.0]));
// Add custom factors (demonstrates extensibility!)
problem.add_residual_block(&["x1", "x2"], Box::new(RosenbrockFactor1), None);
problem.add_residual_block(&["x1"], Box::new(RosenbrockFactor2), None);
// Configure Levenberg-Marquardt optimizer
let config = LevenbergMarquardtConfig::new()
.with_max_iterations(100)
.with_cost_tolerance(1e-8)
.with_parameter_tolerance(1e-8)
.with_gradient_tolerance(1e-10);
let mut solver = LevenbergMarquardt::with_config(config);
let result = solver.optimize(&problem, &initial_values)?;
// Extract final values
let x1_final = result
.parameters
.get("x1")
.ok_or("x1 not found")?
.to_vector()[0];
let x2_final = result
.parameters
.get("x2")
.ok_or("x2 not found")?
.to_vector()[0];
// Verify convergence to [1.0, 1.0]
assert!(
matches!(
result.status,
OptimizationStatus::Converged
| OptimizationStatus::CostToleranceReached
| OptimizationStatus::ParameterToleranceReached
| OptimizationStatus::GradientToleranceReached
),
"Optimization should converge"
);
assert!(
(x1_final - 1.0).abs() < 1e-4,
"x1 should converge to 1.0, got {}",
x1_final
);
assert!(
(x2_final - 1.0).abs() < 1e-4,
"x2 should converge to 1.0, got {}",
x2_final
);
assert!(
result.final_cost < 1e-6,
"Final cost should be near zero, got {}",
result.final_cost
);
Ok(())
}
}