Levenberg–Marquardt Method

The Levenberg–Marquardt (LM) method is an iterative optimization algorithm that blends the strengths of Gauss–Newton (fast convergence near the solution) and gradient descent (robust progress far from it). It balances the advantages of both methods by incorporating a damping parameter.

1. Method

The compromise between steepest descent and Newton's method is achieved by introducing a damping parameter \( \alpha \).

The Levenberg–Marquardt method iteratively updates the parameters \( \mathbf{x} \) using the formula \[ \mathbf{x}_{k+1} = \mathbf{x}_k + t_k \, \mathbf{d}^{LM}, \] where \( t_k \) is the step size at iteration \( k \) and \( \mathbf{d}^{LM} \) is the LM descent direction defined below. The process continues until convergence.

2. Algorithm

The Levenberg–Marquardt algorithm can be summarized in the following steps:

  1. Initialize \( \mathbf{x}_0 \) and \( \alpha_0 \)
  2. Compute the gradient \( \nabla f(\mathbf{x}_k) \) and the Hessian \( H(\mathbf{x}_k) \)
  3. Compute the descent direction \( \mathbf{d}^{LM} = -(H + \alpha I)^{-1} \nabla f(\mathbf{x}_k) \)
  4. Compute the step size \( t_k \)
  5. Update the parameters: \( \mathbf{x}_{k+1} = \mathbf{x}_k + t_k \mathbf{d}^{LM} \)
  6. Check for convergence; if not converged, return to step 2 (a code sketch of this loop follows)
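
A minimal NumPy sketch of this loop is given below. The function name lm_minimize, the fixed damping \( \alpha \), and the fixed step size \( t \) are illustrative choices for this write-up, not part of any standard library; the gradient and Hessian are assumed to be supplied as callables.

```python
import numpy as np

def lm_minimize(grad, hess, x0, alpha=0.5, t=0.5, tol=1e-8, max_iter=100):
    """Levenberg-Marquardt loop with fixed damping alpha and step size t."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:                      # step 6: convergence check
            break
        H = hess(x)                                      # step 2: Hessian
        d = -np.linalg.solve(H + alpha * np.eye(n), g)   # step 3: LM direction
        x = x + t * d                                    # step 5: parameter update
    return x
```

A production implementation would also adapt \( \alpha \) between iterations (shrinking it when a step reduces \( f \), growing it otherwise); the fixed-\( \alpha \) version above matches the simplified setting used in the example that follows.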

3. Example

Consider the following example function: \[ f(x, y) = (x - 1)^2 + (y - 2)^2 \] The gradient of this function is: \[ \nabla f(x, y) = (2(x - 1), 2(y - 2)) \] The Hessian is: \[ H(x, y) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \] The function \( f(x, y) \) is a simple quadratic function with a minimum at \( (1, 2) \). We can use the Levenberg–Marquardt algorithm to find the minimum of this function.
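
To make the example concrete, here is one way to code the function, its gradient, and its Hessian for use with the lm_minimize sketch above (NumPy assumed; all names are illustrative):

```python
import numpy as np

def f(x):
    # f(x, y) = (x - 1)^2 + (y - 2)^2
    return (x[0] - 1) ** 2 + (x[1] - 2) ** 2

def grad(x):
    # gradient: (2(x - 1), 2(y - 2))
    return np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])

def hess(x):
    # constant Hessian: 2I
    return np.array([[2.0, 0.0], [0.0, 2.0]])
```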

The algorithm starts with an initial guess \( (x_0, y_0) = (0, 0) \), a damping parameter \( \alpha = 0.5 \), and a fixed step size \( t_k = 0.5 \). The first iteration proceeds as follows: \[ \begin{align*} \nabla f(x_0, y_0) &= (2(0 - 1), 2(0 - 2)) = (-2, -4) \\ H(x_0, y_0) &= \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \\ \alpha I &= \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix} \\ d^{LM} &= -\begin{pmatrix} 2 + 0.5 & 0 \\ 0 & 2 + 0.5 \end{pmatrix}^{-1} \begin{pmatrix} -2 \\ -4 \end{pmatrix} \\ &= -\begin{pmatrix} 0.4 & 0 \\ 0 & 0.4 \end{pmatrix} \begin{pmatrix} -2 \\ -4 \end{pmatrix} \\ &= \begin{pmatrix} 0.8 \\ 1.6 \end{pmatrix} \\ x_1 &= x_0 + t_k d^{LM} \\ &= \begin{pmatrix} 0 \\ 0\end{pmatrix} + 0.5 \begin{pmatrix} 0.8 \\ 1.6 \end{pmatrix} \\ &= \begin{pmatrix} 0.4 \\ 0.8 \end{pmatrix} \end{align*} \] The algorithm iteratively updates the parameters until convergence.
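
The same first step can be reproduced numerically with the sketch above (again assuming the illustrative lm_minimize, grad, and hess defined earlier):

```python
import numpy as np

x0 = np.array([0.0, 0.0])
d = -np.linalg.solve(hess(x0) + 0.5 * np.eye(2), grad(x0))
print(x0 + 0.5 * d)   # [0.4 0.8], matching the hand computation

# Running the full loop drives the iterates toward the minimum at (1, 2):
print(lm_minimize(grad, hess, x0, alpha=0.5, t=0.5))   # approx. [1. 2.]
```

Because \( H + \alpha I = 2.5 I \) here, each update contracts the error toward \( (1, 2) \) by the constant factor \( 1 - 0.5 \cdot (2 / 2.5) = 0.6 \), so the iterates converge linearly.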

4. Discussion