Conjugate gradient methods

From optimization
Revision as of 01:35, 8 June 2015 by ErikZuehlke (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Author: Erik Zuehlke
Stewards: Dajun Yue and Fengqi You

Magnus Hestenes one of the creators of the Conjugate Gradient Method [1]



The conjugate gradient method is a mathematical technique that can be useful for the optimization of both linear and non-linear systems. This technique is generally used as an iterative algorithm, however, it can be used as a direct method, and it will produce a numerical solution. Generally this method is used for very large systems where it is not practical to solve with a direct method. This method was developed by Magnus Hestenes and Eduard Stiefel[1].

The Conjugate Gradient Model for Linear Systems

There are a set of linear equations that we want to solve represented in vector notation as:

Ax = b

Here A is an n x n known symmetric, real, and positive definite matrix, b is a known vector and we want to solve for the vector x. A unique solution to this problem is represented by the vector x*.

There is a set of n conjugate vectors P where P=\{\mathbf{p}_k: \forall i\neq k, i, k \in [1,n], \langle\mathbf{p}_i,\mathbf{p}_k\rangle_{A}=0  \}. Within the set P the solution x* can be expanded out.

So  \mathbf{x}_* = \sum^{n}_{i=1} \alpha_i \mathbf{p}_i and from this  \mathbf{b}=\mathbf{A}\mathbf{x}_* = \sum^{n}_{i=1} \alpha_i  \mathbf{A} \mathbf{p}_i.

By substituting the above equations it is shown that:  \mathbf{p}_k^\mathrm{T} \mathbf{b}=\alpha_k\mathbf{p}_k^\mathrm{T} \mathbf{A} \mathbf{p}_k.

Finally the coeffcients  \alpha_k = \frac{\langle \mathbf{p}_k, \mathbf{b}\rangle}{\,\,\,\|\mathbf{p}_k\|_\mathbf{A}^2}.

This method solves the given system of linear equations by finding n conjugate vectors, and then computing the coeffcients  \alpha_k[2,3].

Non-Linear Conjugate Gradient Method

Image of the Conjugate Gradient method graphically[4]

This is a non-linear form of the Conjugate Gradient Method and will be used to show the iterative nature of the method.

Lets start with a quadratic function \displaystyle f(x)

A residual should be calculated and in the non-linear case the residual is always the negative of the gradient r_i=-\displaystyle f'(x) The search direction should then be calculated using the Gram-Schmidt conjugation of the residuals[4]. A line search is then used and this is more difficult than in the linear case. A value of \alpha_i is found so that \displaystyle f(x_i+\alpha_i*d_i) is minimized. This is accomplished by ensuring the gradient is orthogonal to the search direction.

To find the value of \beta there are multiple methods. Two of the better equations are the Fletcher-Reeves (which is used in linear GC) and the Polak-Ribiere method. The former converges only if initial guess is sufficiently close to the desired minimum, while the latter can sometimes cycle infinitely but often converges more quickly.

Fletcher-Reeves: \beta^{FR} = \frac{\Delta x_n^\top \Delta x_n}
{\Delta x_{n-1}^\top \Delta x_{n-1}}

Polak-Ribiere: \beta^{PR} = \frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})}
{\Delta x_{n-1}^\top \Delta x_{n-1}}

A way to guarantee the Polak-Ribiere method to converge is by choosing \beta = max{\beta^{PR},0}this restarts the method if the Polak-Ribiere method becomes less than zero, and then a new search direction can be chosen[4].

Numerical Example of the method

This will be an example of the linear method.

Ax=b is a linear system and it is described by

\mathbf{A} \mathbf{x}= \begin{bmatrix}
5 & 1 \\
1 & 2 \end{bmatrix}\begin{bmatrix}
x_1 \\
x_2 \end{bmatrix} = \begin{bmatrix}
2 \\
2 \end{bmatrix}

An initial guess of x_0 = \begin{bmatrix}
1 \\
2 \end{bmatrix} is made.

The residual will be found which is computed from the formula r0 = b - Ax0. so in our case r0 = \begin{bmatrix}
-5 \\
-3 \end{bmatrix}

As this is the first iteration the residual vector will be used as the initial search direction. Now α0 is calculated using the equation  
\alpha_0 = \frac{\mathbf{r}_0^\mathrm{T} \mathbf{r}_0}{\mathbf{p}_0^\mathrm{T} \mathbf{A p}_0} =
\frac{\begin{bmatrix} -5 & -3 \end{bmatrix} \begin{bmatrix} -5 \\ -3 \end{bmatrix}}{  \begin{bmatrix} -5 & -3 \end{bmatrix} \begin{bmatrix} 5 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} -5 \\ -3 \end{bmatrix}  } =

With our first alpha found we can now compute \mathbf{x}_1=\mathbf{x}_0 + \alpha_0\mathbf{p}_0 = 
\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \frac{34}{173} \begin{bmatrix} -5 \\ -3 \end{bmatrix} = \begin{bmatrix} 0.017 \\ 1.411 \end{bmatrix}.

With the new x now calculated one must calculate the next residual using the following equation:

\mathbf{r}_1 = \mathbf{r}_0 - \alpha_0 \mathbf{A} \mathbf{p}_0 = 
\begin{bmatrix} -5 \\ -3 \end{bmatrix} - \frac{34}{173} \begin{bmatrix} 5 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} -5 \\ -3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ -0.25 \end{bmatrix}.

With our new residual we now must compute p1 but first we need to find β0.

\beta_0 = \frac{\mathbf{r}_1^\mathrm{T} \mathbf{r}_1}{\mathbf{r}_0^\mathrm{T} \mathbf{r}_0} =
\frac{\begin{bmatrix} .75 & -.25 \end{bmatrix} \begin{bmatrix} .75 \\ -0.25 \end{bmatrix}}{\begin{bmatrix} -5 & -3 \end{bmatrix} \begin{bmatrix} -5 \\ -3 \end{bmatrix}} = 0.0183

With β0 now the next search direction p1 can be calculated:

\mathbf{p}_1 = \mathbf{r}_1 + \beta_0 \mathbf{p}_0 = 
\begin{bmatrix} .75 \\ -.25 \end{bmatrix} + 0.0183 \begin{bmatrix} -5 \\ -3 \end{bmatrix} = \begin{bmatrix} 0.6585 \\ -0.3049 \end{bmatrix}.

With this it is the last of the unique computations. To continue on in the problem one would solve for α2 using the same equation as was for α1. Then x2 would be found. One would then continue on finding new residuals and search vectors until the solution converges.


[1] Straeter, T. A. "On the Extension of the Davidon-Broyden Class of Rank One, Quasi-Newton Minimization Methods to an Infinite Dimensional Hilbert Space with Applications to Optimal Control Problems". NASA Technical Reports Server. NASA. Retrieved 10 October 2011.

[2] Wikipedia page for Conjugate Gradient Methods

[3] Hestenes, Magnus R.; Stiefel, Eduard (December 1952). "Methods of Conjugate Gradients for Solving Linear Systems". Journal of Research of the National Bureau of Standards 49 (6)

[4] Shewchuk, J. R. "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain" 1994, August 4. Carnegie Mellon Univerisity