Conjugate Gradient Method

The conjugate gradient method is also an iterative first-order optimization algorithm used to find the minimum of a function. While the steepest descent method uses only the gradient direction at the current iterate, the conjugate gradient method also takes previous iterations into account to find a better search direction. This typically improves the convergence speed of the algorithm.

1. Method

Consider a function \( f: \mathbb{R}^n \rightarrow \mathbb{R} \) with \( x \) as the input variable, \( \nabla f(x) \) as the gradient, and \( x_0 \) as the initial guess for the optimization problem. For the first iteration \(k = 1\), the input variable \(x = x_0\) is updated to \(x = x_1\) using the formula below:

\[ x_1 = x_0 + t_0 d_0 \]

where \(t_0\) is the step length and \(d_0 = - \nabla f(x_0)\) is the descent direction at the first iteration.

Subsequently, for the next iteration \(k = 2\), the input variable \(x\) is updated to \(x_2\) using the formula:

\[ x_2 = x_1 + t_1\,d_1, \quad\quad \text{where } \quad d_1 = -\nabla f(x_1) + \beta_1 d_0 \]

where \(d_1\) is the descent direction and \(t_1\) is the step length in the second iteration. Note that at each iteration \( k > 1 \) the algorithm uses the gradients from both the current and the previous iteration to compute the descent direction.

To generalize the above updates, for any iteration \(k \geq 1\):

\[ x_{k+1} = x_{k} + t_k\,d_k, \quad\quad \text{where } \quad d_k = -\nabla f(x_{k}) + \beta_k d_{k-1} \]

There are several methods proposed for computing the parameter \(\beta_k\); see the section below. Using gradient information from previous iterations helps the method reach the optimal value faster.
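Unrolling the recursion one step makes this explicit:

\[ d_k = -\nabla f(x_k) + \beta_k \left( -\nabla f(x_{k-1}) + \beta_{k-1} d_{k-2} \right) = -\nabla f(x_k) - \beta_k \nabla f(x_{k-1}) + \beta_k \beta_{k-1} d_{k-2}, \]

so each new direction is a weighted combination of the current gradient and all earlier gradients accumulated in \(d_{k-1}\).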

2. Beta Computation Methods

The parameter \(\beta_k\) can be computed according to any of the following formulas:

2.1 Fletcher-Reeves (FR)
\[ \beta_k^{FR} = \frac{\|\nabla f(x_k)\|^2}{\|\nabla f(x_{k-1})\|^2} \]
2.2 Polak-Ribiere (PR)
\[ \beta_k^{PR} = \frac{\nabla f(x_k)^T (\nabla f(x_k) - \nabla f(x_{k-1}))}{\|\nabla f(x_{k-1})\|^2} \]
2.3 Hestenes-Stiefel (HS)
\[ \beta_k^{HS} = \frac{\nabla f(x_k)^T (\nabla f(x_k) - \nabla f(x_{k-1}))}{d_{k-1}^T (\nabla f(x_k) - \nabla f(x_{k-1}))} \]
2.4 Dai-Yuan (DY)
\[ \beta_k^{DY} = \frac{\|\nabla f(x_k)\|^2}{d_{k-1}^T (\nabla f(x_k) - \nabla f(x_{k-1}))} \]
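As a rough sketch of how these formulas translate to code, the helper below computes \(\beta_k\) for each variant using NumPy. The function name compute_beta and its signature are illustrative choices for this sketch, not part of any established library.

import numpy as np

def compute_beta(grad_new, grad_old, d_old, method="FR"):
    """Compute the conjugate gradient parameter beta_k.

    grad_new : gradient at the current point, grad f(x_k)
    grad_old : gradient at the previous point, grad f(x_{k-1})
    d_old    : previous descent direction d_{k-1}
    method   : one of "FR", "PR", "HS", "DY"
    """
    y = grad_new - grad_old                            # gradient difference
    if method == "FR":                                 # Fletcher-Reeves
        return (grad_new @ grad_new) / (grad_old @ grad_old)
    if method == "PR":                                 # Polak-Ribiere
        return (grad_new @ y) / (grad_old @ grad_old)
    if method == "HS":                                 # Hestenes-Stiefel
        return (grad_new @ y) / (d_old @ y)
    if method == "DY":                                 # Dai-Yuan
        return (grad_new @ grad_new) / (d_old @ y)
    raise ValueError(f"unknown method: {method}")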

3. Algorithm

The algorithm for the conjugate gradient method is as follows:

Step 1: Choose an initial guess \(x_0\), set \(k = 0\), and compute \(d_0 = -\nabla f(x_0)\).
Step 2: Update the iterate: \(x_{k+1} = x_k + t_k d_k\).
Step 3: Compute the gradient \(\nabla f(x_{k+1})\); if its norm is sufficiently small, stop.
Step 4: Compute \(\beta_{k+1}\) using one of the formulas in Section 2.
Step 5: Update the direction \(d_{k+1} = -\nabla f(x_{k+1}) + \beta_{k+1} d_k\), set \(k = k + 1\), and return to Step 2.
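A minimal Python sketch of this procedure is shown below. The constant step length and the Fletcher-Reeves choice of \(\beta_k\) are assumptions made for illustration; in practice \(t_k\) is usually chosen by a line search.

import numpy as np

def conjugate_gradient(grad, x0, step=0.1, max_iter=100, tol=1e-8):
    """Nonlinear conjugate gradient with a fixed step length (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # d_0 = -grad f(x_0)
    for k in range(max_iter):
        x = x + step * d                     # x_{k+1} = x_k + t_k d_k
        g_new = grad(x)
        if np.linalg.norm(g_new) < tol:      # stop when the gradient is (near) zero
            break
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves beta_k
        d = -g_new + beta * d                # d_k = -grad f(x_k) + beta_k d_{k-1}
        g = g_new
    return x

Applied to the function used in the example below, a call such as conjugate_gradient(lambda x: 2 * x, [1.0, 1.0]) drives the iterate toward the minimizer at the origin.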

4. Example

In this two-iteration example of the conjugate gradient method, consider the function \(f(x, y) = x^2 + y^2\). The gradient is given by \(\nabla f(x, y) = [2x, 2y]^T\). Starting from the initial guess \(x_0 = [1, 1]^T\), we apply the conjugate gradient method with a constant step length \(t_k = 0.1\). The iterations proceed as follows:

Iteration 1:
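Working through the first update with \(t_0 = 0.1\) (a sketch of the computation):

\[ \nabla f(x_0) = [2, 2]^T, \qquad d_0 = -\nabla f(x_0) = [-2, -2]^T, \]
\[ x_1 = x_0 + t_0 d_0 = [1, 1]^T + 0.1\,[-2, -2]^T = [0.8, 0.8]^T, \qquad f(x_1) = 1.28. \]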

Iteration 2:
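Assuming the Fletcher-Reeves formula for \(\beta_1\) (the example does not fix a particular choice), the second update is:

\[ \nabla f(x_1) = [1.6, 1.6]^T, \qquad \beta_1^{FR} = \frac{\|\nabla f(x_1)\|^2}{\|\nabla f(x_0)\|^2} = \frac{5.12}{8} = 0.64, \]
\[ d_1 = -\nabla f(x_1) + \beta_1 d_0 = [-1.6, -1.6]^T + 0.64\,[-2, -2]^T = [-2.88, -2.88]^T, \]
\[ x_2 = x_1 + t_1 d_1 = [0.8, 0.8]^T + 0.1\,[-2.88, -2.88]^T = [0.512, 0.512]^T, \qquad f(x_2) \approx 0.524. \]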