Conjugate Gradient Method

The conjugate gradient method is also an iterative first-order optimization algorithm used to find the minimum of a function. While the steepest descent method uses only the gradient direction at the current iterate, the conjugate gradient method also takes previous iterations into account to find a better search direction. This typically improves the convergence speed of the algorithm.

1. Method

Consider a function \( f: \mathbb{R}^n \rightarrow \mathbb{R} \) with \( x \) as the input variable, \( \nabla f(x) \) as the gradient, and \( x_0 \) as the initial guess for the optimization problem. For the first iteration \(k = 1\), the input variable \(x = x_0\) is updated to \(x = x_1\) using the formula below:

\[ x_1 = x_0 + t_0 d_0 \]

where \(t_0\) is the step length and \(d_0 = - \nabla f(x_0)\) is the descent direction at the first iteration.

Subsequently, for the next iteration \(k = 2\), the input variable \(x\) is updated to \(x_2\) using the formula:

\[ x_2 = x_1 + t_1\,d_1, \quad\quad \text{where } \quad d_1 = -\nabla f(x_1) + \beta_1 d_0 \]

where \(d_1\) is the descent direction and \(t_1\) is the step length in the second iteration. Note that at each iteration \( k > 1 \) the algorithm uses the gradients from both the current and the previous iteration to compute the descent direction.

To generalize the above updates, for any iteration \(k \geq 1\):

\[ x_{k+1} = x_{k} + t_k\,d_k, \quad\quad \text{where } \quad d_k = -\nabla f(x_{k}) + \beta_k d_{k-1} \]

There are several methods proposed for computing the parameter \(\beta_k\); see the section below. Using gradient information from previous iterations helps the method reach the optimal value faster.
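Unrolling the recursion one step makes this explicit:

\[ d_k = -\nabla f(x_k) + \beta_k \left( -\nabla f(x_{k-1}) + \beta_{k-1} d_{k-2} \right) = -\nabla f(x_k) - \beta_k \nabla f(x_{k-1}) + \beta_k \beta_{k-1} d_{k-2}, \]

so each new direction is a weighted combination of the current gradient and all earlier gradients accumulated in \(d_{k-1}\).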

2. Beta Computation Methods

The parameter \(\beta_k\) can be computed according to any of the following formulas:

2.1 Fletcher-Reeves (FR)
\[ \beta_k^{FR} = \frac{\|\nabla f(x_k)\|^2}{\|\nabla f(x_{k-1})\|^2} \]
2.2 Polak-Ribiere (PR)
\[ \beta_k^{PR} = \frac{\nabla f(x_k)^T (\nabla f(x_k) - \nabla f(x_{k-1}))}{\|\nabla f(x_{k-1})\|^2} \]
2.3 Hestenes-Stiefel (HS)
\[ \beta_k^{HS} = \frac{\nabla f(x_k)^T (\nabla f(x_k) - \nabla f(x_{k-1}))}{d_{k-1}^T (\nabla f(x_k) - \nabla f(x_{k-1}))} \]
2.4 Dai-Yuan (DY)
\[ \beta_k^{DY} = \frac{\|\nabla f(x_k)\|^2}{d_{k-1}^T (\nabla f(x_k) - \nabla f(x_{k-1}))} \]
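As a rough sketch of how these formulas translate to code, the helper below computes \(\beta_k\) for each variant using NumPy. The function name compute_beta and its signature are illustrative choices for this sketch, not part of any established library.

import numpy as np

def compute_beta(grad_new, grad_old, d_old, method="FR"):
    """Compute the conjugate gradient parameter beta_k.

    grad_new : gradient at the current point, grad f(x_k)
    grad_old : gradient at the previous point, grad f(x_{k-1})
    d_old    : previous descent direction d_{k-1}
    method   : one of "FR", "PR", "HS", "DY"
    """
    y = grad_new - grad_old                            # gradient difference
    if method == "FR":                                 # Fletcher-Reeves
        return (grad_new @ grad_new) / (grad_old @ grad_old)
    if method == "PR":                                 # Polak-Ribiere
        return (grad_new @ y) / (grad_old @ grad_old)
    if method == "HS":                                 # Hestenes-Stiefel
        return (grad_new @ y) / (d_old @ y)
    if method == "DY":                                 # Dai-Yuan
        return (grad_new @ grad_new) / (d_old @ y)
    raise ValueError(f"unknown method: {method}")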

3. Algorithm

The algorithm for the conjugate gradient method is as follows:

Step 1: Choose an initial guess \(x_0\), set \(k = 0\), and compute \(d_0 = -\nabla f(x_0)\).
Step 2: Update the iterate: \(x_{k+1} = x_k + t_k d_k\).
Step 3: Compute the gradient \(\nabla f(x_{k+1})\); if its norm is sufficiently small, stop.
Step 4: Compute \(\beta_{k+1}\) using one of the formulas in Section 2.
Step 5: Update the direction \(d_{k+1} = -\nabla f(x_{k+1}) + \beta_{k+1} d_k\), set \(k = k + 1\), and return to Step 2.
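A minimal Python sketch of this procedure is shown below. The constant step length and the Fletcher-Reeves choice of \(\beta_k\) are assumptions made for illustration; in practice \(t_k\) is usually chosen by a line search.

import numpy as np

def conjugate_gradient(grad, x0, step=0.1, max_iter=100, tol=1e-8):
    """Nonlinear conjugate gradient with a fixed step length (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # d_0 = -grad f(x_0)
    for k in range(max_iter):
        x = x + step * d                     # x_{k+1} = x_k + t_k d_k
        g_new = grad(x)
        if np.linalg.norm(g_new) < tol:      # stop when the gradient is (near) zero
            break
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves beta_k
        d = -g_new + beta * d                # d_k = -grad f(x_k) + beta_k d_{k-1}
        g = g_new
    return x

Applied to the function used in the example below, a call such as conjugate_gradient(lambda x: 2 * x, [1.0, 1.0]) drives the iterate toward the minimizer at the origin.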

4. Example

In this two-iteration example of the conjugate gradient method, consider the function \(f(x, y) = x^2 + y^2\). The gradient is given by \(\nabla f(x, y) = [2x, 2y]^T\). Starting from the initial guess \(x_0 = [1, 1]^T\), we apply the conjugate gradient method with a constant step length \(t_k = 0.1\). The iterations proceed as follows:

Iteration 1:
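Working through the first update with \(t_0 = 0.1\) (a sketch of the computation):

\[ \nabla f(x_0) = [2, 2]^T, \qquad d_0 = -\nabla f(x_0) = [-2, -2]^T, \]
\[ x_1 = x_0 + t_0 d_0 = [1, 1]^T + 0.1\,[-2, -2]^T = [0.8, 0.8]^T, \qquad f(x_1) = 1.28. \]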

Iteration 2:
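Assuming the Fletcher-Reeves formula for \(\beta_1\) (the example does not fix a particular choice), the second update is:

\[ \nabla f(x_1) = [1.6, 1.6]^T, \qquad \beta_1^{FR} = \frac{\|\nabla f(x_1)\|^2}{\|\nabla f(x_0)\|^2} = \frac{5.12}{8} = 0.64, \]
\[ d_1 = -\nabla f(x_1) + \beta_1 d_0 = [-1.6, -1.6]^T + 0.64\,[-2, -2]^T = [-2.88, -2.88]^T, \]
\[ x_2 = x_1 + t_1 d_1 = [0.8, 0.8]^T + 0.1\,[-2.88, -2.88]^T = [0.512, 0.512]^T, \qquad f(x_2) \approx 0.524. \]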