First-order methods such as Steepest Descent and Conjugate Gradient use only the gradient of the function \( f(\mathbf{x}) \) to choose the direction toward a local minimum. Newton's Method, in contrast, uses both the first and second derivatives of the function at the current point to determine its step. Using the second derivative typically allows Newton's method to reach a local minimum in fewer iterations than first-order methods (Steepest Descent and Conjugate Gradient). However, it is important to choose the initial guess \( \mathbf{x}_0 \) close to the local minimum; otherwise the method may diverge.
Newton's method is an iterative optimization algorithm that uses the first and second derivatives of a function to find its local minima or maxima. In each iteration, the algorithm updates the current point \( \mathbf{x}_k \) using the formula: \[ \mathbf{x}_{k+1} = \mathbf{x}_k + d_k, \quad \text{where } \quad d_k = - H^{-1}(\mathbf{x}_k) \nabla f(\mathbf{x}_k) \] where \( d_k \) is the descent direction, \( H(\mathbf{x}_k) \) is the second derivative, also known as the Hessian, and \( \nabla f(\mathbf{x}_k) \) is the gradient of the function at \( \mathbf{x}_k \). For a function of one variable, the Hessian is a scalar; for a function of multiple variables, it is a square matrix. The Hessian provides information about the curvature of the function, allowing for more accurate step directions than first-order methods.
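To make the scalar-Hessian case concrete, here is a minimal sketch of a single Newton update for a one-variable function. The function \( f(x) = x^2 - 4x \) and the helper names below are illustrative assumptions, not taken from the text above.

```python
# Minimal sketch: one Newton update for a one-variable function.
# The example f(x) = x**2 - 4*x (minimum at x = 2) is illustrative only.

def newton_step_1d(x, grad, hess):
    """Return x - f'(x) / f''(x), the scalar-Hessian form of the update."""
    return x - grad(x) / hess(x)

grad = lambda x: 2 * x - 4   # first derivative of f(x) = x^2 - 4x
hess = lambda x: 2.0         # second derivative (scalar "Hessian")

x0 = 10.0
x1 = newton_step_1d(x0, grad, hess)
print(x1)  # 2.0 -- for a quadratic, a single step lands exactly on the minimum
```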
The algorithm for Newton's method is as follows:

1. Choose an initial guess \( \mathbf{x}_0 \) and a convergence tolerance.
2. At the current point \( \mathbf{x}_k \), compute the gradient \( \nabla f(\mathbf{x}_k) \) and the Hessian \( H(\mathbf{x}_k) \).
3. Compute the step \( d_k = -H^{-1}(\mathbf{x}_k) \nabla f(\mathbf{x}_k) \).
4. Update \( \mathbf{x}_{k+1} = \mathbf{x}_k + d_k \).
5. Repeat steps 2-4 until the gradient norm (or the step size) falls below the tolerance.

A Python sketch of these steps is given below.
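The following is a minimal sketch of these steps; the function name `newton_method`, the default tolerance, and the iteration cap are illustrative choices, not prescribed by the text.

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=50):
    """Minimal Newton's method sketch: x_{k+1} = x_k - H(x_k)^{-1} grad(x_k).

    grad(x) must return the gradient vector and hess(x) the Hessian matrix
    at the point x. Stops when the gradient norm drops below `tol`.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # converged: gradient is (nearly) zero
            break
        d = -np.linalg.solve(hess(x), g)   # d_k = -H^{-1} g, via a linear solve
        x = x + d                          # x_{k+1} = x_k + d_k
    return x
```

Solving the linear system \( H d_k = -\nabla f \) with `np.linalg.solve` is the usual way to apply \( H^{-1} \) without forming the inverse matrix explicitly.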
Consider the function \( f(x, y) = x^2 + y^2 \) with global minimum at \( (0, 0) \). The gradient and Hessian are given by: \[ \nabla f(x, y) = \begin{pmatrix} 2x \\ 2y \end{pmatrix}, \quad H(x, y) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \] Starting from an initial guess \( (x_0, y_0) = (1, 1) \), the algorithm iteratively updates the point until convergence. The first iteration would yield: \[ \mathbf{x}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}^{-1} \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \] The algorithm converges to the minimum \( (0, 0) \) in one step.
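As a quick check, a sketch like the `newton_method` function above reproduces this one-step convergence; the lambda helpers below are illustrative.

```python
import numpy as np

# f(x, y) = x^2 + y^2 from the example above
grad = lambda p: np.array([2 * p[0], 2 * p[1]])      # gradient (2x, 2y)
hess = lambda p: np.array([[2.0, 0.0], [0.0, 2.0]])  # constant Hessian

x_min = newton_method(grad, hess, x0=[1.0, 1.0])     # uses the sketch defined above
print(x_min)  # [0. 0.] -- reaches the minimum in a single step
```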
Newton's method has several advantages:

- It uses curvature (second-derivative) information, so its steps are better scaled than pure gradient steps.
- Near a local minimum it typically converges in far fewer iterations than Steepest Descent or Conjugate Gradient.
- For a quadratic function, such as the example above, it reaches the minimum in a single step.
However, it also has some disadvantages:

- Each iteration requires computing the Hessian and solving a linear system with it, which becomes expensive when the number of variables is large.
- The Hessian must be invertible, and it should be positive definite for \( d_k \) to be a descent direction.
- The method is sensitive to the starting point: if \( \mathbf{x}_0 \) is far from the local minimum, the iterations may diverge.