Quasi-Newton methods


Author: Vincent Cericola [ChE 345, Spring 2015]
Steward: Dajun Yue and Fengqi You

Quasi-Newton Methods (QNMs) are a class of optimization methods used in Non-Linear Programming when the full Newton's Method is too time consuming or difficult to use. More specifically, these methods are used to find a local minimum of a function f(x) that is twice-differentiable. Quasi-Newton Methods offer distinct advantages over the full Newton's Method for large and complex non-linear problems. They are not perfect, however, and can have drawbacks depending on the exact type of Quasi-Newton Method used and the problem to which it is applied. Despite this, Quasi-Newton Methods are generally worth using, with the exception of very simple problems.

Introduction

There are numerous QNMs used to optimize twice-differentiable functions. They differ in the exact method used to approximate the inverse Hessian matrix (H^{-1}) that appears in the full Newton's Method. The primary benefit of QNMs is that they do not need to compute and invert the exact Hessian at every iteration, so the behavior of a given QNM depends heavily on the approximation it uses. The simplest approximation reuses the same value of the inverse Hessian for every iteration. Obviously, this is not ideal, but it is illustrative nonetheless.
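Concretely, the two update rules differ only in where the inverse Hessian comes from; here  [H^{-1}]_k denotes whichever approximation of the inverse Hessian a particular QNM supplies (the notation matches the procedure below):

 \text{Newton's Method:} \quad x^{k+1} = x^k - [\nabla^2 f(x^k)]^{-1} \nabla f(x^k)
 \text{Quasi-Newton Method:} \quad x^{k+1} = x^k - [H^{-1}]_k \nabla f(x^k), \qquad [H^{-1}]_k \approx [\nabla^2 f(x^k)]^{-1}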

Differences from Newton’s Method

While similar to the full Newton's Method, the Quasi-Newton Method has some distinct differences. These differences are summarized briefly in the table below:

Newton's Method                                           | Quasi-Newton Method
Computationally expensive                                 | Computationally cheap
Slow computation                                          | Fast(er) computation
Must calculate the second derivative at each iteration    | No second derivative needed
Must solve a linear system of equations at each iteration | No linear system to solve
Fewer steps to converge                                   | More steps to converge
More precise convergence path                             | Less precise convergence path


Advantages:

While the last two rows in the table appear to be disadvantages of the Quasi-Newton Method, the faster per-iteration computation can balance them out. For large and complicated problems, this trade-off favors Quasi-Newton Methods over the full Newton's Method because the overall solution time is lower.

Disadvantages:

The lack of precision in the Hessian approximation leads to slower convergence in terms of the number of steps. Because of this, Quasi-Newton Methods can be slower (and thus worse) than the full Newton's Method for simple problems, where the extra time needed to compute the exact inverse Hessian is small. An additional potential disadvantage of Quasi-Newton Methods is the need to store the inverse Hessian approximation, which can require a large amount of memory and can be detrimental for large, complicated systems.

Procedure

The procedure is much the same as regular Newton’s Method with a modification to the Hessian Matrix step. Again, the modification depends on the specific type of Quasi-Newton Method being used.

For a given f(x) that is twice-differentiable:
1. Choose a starting point  x_0
2. Calculate the search direction by estimating  H^{-1} (method varies)
3. Calculate the change in  x using the following equation:

 x^{k+1} = x^k - [H^{-1}]_k \nabla f(x^k)

4. Determine the new  x value,  x^{k+1}
5. Determine whether the method has converged, using the convergence criterion on the gradient ( \nabla f(x^{k+1}) \approx 0 )
6. If not at an optimum, repeat from step 2
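As a rough illustration of this loop, the Python sketch below follows the steps above under the assumption that the caller supplies the gradient and an inverse-Hessian approximation; the names (quasi_newton, inv_hessian_approx, tol) are chosen here only for illustration and do not come from any particular library.

import numpy as np

def quasi_newton(grad, x0, inv_hessian_approx, tol=1e-8, max_iter=100):
    """Sketch of the generic quasi-Newton loop described above.

    grad               -- callable returning the gradient of f at x
    inv_hessian_approx -- callable returning the approximate inverse Hessian
                          at x; the approximation scheme is what distinguishes
                          one Quasi-Newton Method from another
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        # Steps 5-6: convergence check on the gradient
        if np.linalg.norm(g) < tol:
            break
        # Steps 2-4: x^{k+1} = x^k - [H^{-1}]_k * grad f(x^k)
        x = x - inv_hessian_approx(x) @ g
    return x

# Usage on the example objective f(x) = 2*x1^2 + 3*x2^2 (see the next section),
# with the simplest scheme: reuse one fixed inverse Hessian at every iteration.
grad = lambda x: np.array([4.0 * x[0], 6.0 * x[1]])
H_inv = np.array([[0.25, 0.0], [0.0, 1.0 / 6.0]])
x_star = quasi_newton(grad, [1.0, 1.0], lambda x: H_inv)  # converges to [0, 0]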

Example

The following is a brief numerical example of one type of Quasi-Newton Method. This method uses the original inverse Hessian for each iteration.

Objective function:
min  f(x) = 2x_1^2 + 3x_2^2

Step 1: Choose starting point  x_0
 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

Step 2: Calculate inverse Hessian (approximate)
 \nabla f(x) = \begin{bmatrix} 4x_1 \\ 6x_2 \end{bmatrix}
 \nabla^2 f(x) = \begin{bmatrix} 4 & 0 \\ 0 & 6 \end{bmatrix}
 H^{-1} = \begin{bmatrix} 0.25 & 0 \\ 0 & 1/6 \end{bmatrix}

Step 3: Find new  x
 x^1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 0.25 & 0 \\ 0 & 1/6 \end{bmatrix} \begin{bmatrix} 4 \\ 6 \end{bmatrix}

Step 4: Determine new  x value
 x^1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

Step 5: Determine if converged
 \nabla f(x^1) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
Converged!

Step 6: Repeat (if necessary)
Since this example converged, this step is not necessary. If the problem had not converged, the solution would have resumed from step 2. Unlike in Newton's Method, the inverse Hessian would not be recalculated exactly; in this particular example, the initial inverse Hessian would simply be reused at each step. (Because the objective here is quadratic, its Hessian is constant, so this "approximation" happens to be exact and the method converges in a single step.) In most Quasi-Newton Methods, there would instead be some kind of "update" to the original value, and these updates vary from method to method.
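One widely used update of this kind is the BFGS formula, which is not the scheme used in the example above but illustrates what such an update looks like. A minimal sketch of the BFGS inverse-Hessian update, assuming NumPy and the same vector conventions as the earlier sketch, is:

import numpy as np

def bfgs_inverse_update(H_inv, s, y):
    # H_inv : current inverse Hessian approximation
    # s     : step taken, x^{k+1} - x^k
    # y     : change in gradient, grad f(x^{k+1}) - grad f(x^k)
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    # BFGS: H_{k+1} = (I - rho*s*y^T) H_k (I - rho*y*s^T) + rho*s*s^T
    return ((I - rho * np.outer(s, y)) @ H_inv @ (I - rho * np.outer(y, s))
            + rho * np.outer(s, s))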

Conclusion

Quasi-Newton Methods are an efficient way to optimize functions when either computation or iteration is costly. While their exact update formulas vary, they generally determine the optimum faster and more efficiently than Newton's Method when the problem is complex.

References

  1. Vandenberghe, L. (n.d.). Quasi-Newton Methods. Retrieved from http://www.seas.ucla.edu/~vandenbe/236C/lectures/qnewton.pdf
  2. Bryan, K. (n.d.). Quasi-Newton Methods. Retrieved from https://www.rose-hulman.edu/~bryan/lottamath/quasinewton.pdf
  3. Blomgren, P. (n.d.). Numerical Optimization: Quasi-Newton Methods. Retrieved from http://terminus.sdsu.edu/SDSU/Math693a_f2013/Lectures/18/lecture.pdf
  4. Quasi-Newton Method. Retrieved from http://en.wikipedia.org/wiki/Quasi-Newton_method
  5. Newton's Method. Retrieved from http://en.wikipedia.org/wiki/Newton%27s_method
  6. Lambers, J. (n.d.). Quasi-Newton Methods. Retrieved from http://www.math.usm.edu/lambers/mat461/spr10/lecture24.pdf