# Difference between revisions of "Trust-region methods"

Yewenhe0904 (Talk | contribs) (→Conjugated Gradient Steihaug’s Method) |
Yewenhe0904 (Talk | contribs) (→Example) |
||

(17 intermediate revisions by one user not shown) | |||

Line 3: | Line 3: | ||

Date Presented: Apr. 10, 2014 | Date Presented: Apr. 10, 2014 | ||

− | ==Introduction== | + | ==Introduction== |

− | Trust-region method (TRM) is one of the most important numerical optimization | + | [[File: Trust-Region Method Overview.png|thumb|right|A pictorial view of trust-region method optimization trajectory]] |

− | + | Trust-region method (TRM) is one of the most important numerical optimization methods in solving nonlinear programming (NLP) problems. It works in a way that first define a region around the current best solution, in which a certain model (usually a quadratic model) can to some extent approximate the original objective function. TRM then take a step forward according to the model depicts within the region. Unlike the line search methods, TRM usually determines the step size before the improving direction (or at the same time). If a notable decrease (our following discussion will based on minimization problems) is gained after the step forward, then the model is believed to be a good representation of the original objective function. If the improvement is too subtle or even a negative improvement is gained, then the model is not to be believed as a good representation of the original objective function within that region. The convergence can be ensured that the size of the “trust region” (usually defined by the radius in Euclidean norm) in each iteration would depend on the improvement previously made. | |

− | + | ||

− | + | ||

− | + | ||

==Important Concepts== | ==Important Concepts== | ||

+ | [[File:TRM_stepsize.png|thumb|right|The picture shows both the stepsize and the improving direction is a consequence of a pre-determined trust-region size.]] | ||

'''Trust-region''' | '''Trust-region''' | ||

In most cases, the trust-region is defined as a spherical area of radius <math>\Delta_k</math> in which the trust-region subproblem lies. | In most cases, the trust-region is defined as a spherical area of radius <math>\Delta_k</math> in which the trust-region subproblem lies. | ||

+ | |||

'''Trust-region subproblem''' | '''Trust-region subproblem''' | ||

Line 26: | Line 25: | ||

Where <math>\Delta_k</math> is the trust region radius, <math>g_k</math> is the gradient at current point and <math>B_k</math> is the hessian (or a hessian approximation). It is easy to find the solution to the trust-region subproblem if <math>B_k</math> is positive definite. | Where <math>\Delta_k</math> is the trust region radius, <math>g_k</math> is the gradient at current point and <math>B_k</math> is the hessian (or a hessian approximation). It is easy to find the solution to the trust-region subproblem if <math>B_k</math> is positive definite. | ||

+ | |||

'''Actual reduction and predicted reduction''' | '''Actual reduction and predicted reduction''' | ||

Line 35: | Line 35: | ||

Whether to take a more ambitious step or a more conservative one is depend on the ratio between the actual reduction gained by true reduction in the original objective function and the predicted reduction expected in the model function. Empirical threshold values of the ratio <math>\rho_k</math> will guide us in determining the size of the trust-region. | Whether to take a more ambitious step or a more conservative one is depend on the ratio between the actual reduction gained by true reduction in the original objective function and the predicted reduction expected in the model function. Empirical threshold values of the ratio <math>\rho_k</math> will guide us in determining the size of the trust-region. | ||

− | |||

− | |||

− | |||

==Trust Region Algorithm== | ==Trust Region Algorithm== | ||

Before implementing the trust-region algorithm, we should first determine several parameters. | Before implementing the trust-region algorithm, we should first determine several parameters. | ||

− | <math>\Delta_M</math> is the upper bound for the size of the trust region. <math>\eta_1</math>, <math>\eta_2</math> and <math>\eta_3</math>,<math>t_1</math>,<math>t_2</math> are the threshold values for evaluating the goodness of the quadratic model thus for determining the trust-region’s size in the next iteration. | + | <math>\Delta_M</math> is the upper bound for the size of the trust region. <math>\eta_1</math>, <math>\eta_2</math> and <math>\eta_3</math>,<math>t_1</math>,<math>t_2</math> are the threshold values for evaluating the goodness of the quadratic model thus for determining the trust-region’s size in the next iteration. A typical set for these values are <math>0=<\eta_1<=\eta_2</math>, <math>\eta_2=0.25</math> and <math>\eta_3=0.75</math>,<math>t_1=0.25</math>,<math>t_2=2.0</math>. |

---- | ---- | ||

'''Pseudo-code''' | '''Pseudo-code''' | ||

− | + | '''Set''' the starting point at <math>x_0</math>, set the iteration number <math>k=1</math> | |

− | ''' | + | '''for''' <math>k=1,2...</math> |

− | Get the improving step by solving trust-region | + | Get the improving step by solving trust-region sub-problem () |

Evaluate <math>\rho_k</math> from equation() | Evaluate <math>\rho_k</math> from equation() | ||

− | ''' | + | '''if''' <math>\rho_k<\eta_2</math> |

<math>\Delta_{k+1}=t_1\Delta_k</math> | <math>\Delta_{k+1}=t_1\Delta_k</math> | ||

− | ''' | + | '''else''' |

− | ''' | + | '''if''' <math>\rho_k>\eta_3</math> and <math>p_k=||\Delta_k||</math> (full step and model is a good approximation) |

<math>\Delta_{k+1}=min(t_2\Delta_k,\Delta_M)</math> | <math>\Delta_{k+1}=min(t_2\Delta_k,\Delta_M)</math> | ||

− | ''' | + | '''else''' |

<math>\Delta_{k+1}=\Delta_k</math> | <math>\Delta_{k+1}=\Delta_k</math> | ||

− | ''' | + | '''if''' <math>\rho_k>\eta_1</math> |

<math>x_{k+1}=x_k+p_k</math> | <math>x_{k+1}=x_k+p_k</math> | ||

− | ''' | + | '''else''' |

<math>x_{k+1}=x_k</math>(the model is not a good approximation and need to solve another trust-region subproblem within a smaller trust-region) | <math>x_{k+1}=x_k</math>(the model is not a good approximation and need to solve another trust-region subproblem within a smaller trust-region) | ||

Line 82: | Line 79: | ||

==Methods of Solving the Trust-region Subproblem== | ==Methods of Solving the Trust-region Subproblem== | ||

===Cauchy point calculation=== | ===Cauchy point calculation=== | ||

+ | [[File:TRM_CauchyPoint.png|thumb|right|Illustration of Cauchy point calculation.]] | ||

In line search methods, we may find an improving direction from the gradient information, that is, by taking the steepest descent direction with regard to the maximum range we could make. We can solve the trust-region subproblem in an inexpensive way. This method is also denoted as the Cauchy point calculation. We can also express the improving step explicitly by the following closed-form equations | In line search methods, we may find an improving direction from the gradient information, that is, by taking the steepest descent direction with regard to the maximum range we could make. We can solve the trust-region subproblem in an inexpensive way. This method is also denoted as the Cauchy point calculation. We can also express the improving step explicitly by the following closed-form equations | ||

Line 90: | Line 88: | ||

otherwise <math>\tau_k=min~ ({||g_k||}^3/(\Delta_k{g_k}^TB_kg_k),1)</math> | otherwise <math>\tau_k=min~ ({||g_k||}^3/(\Delta_k{g_k}^TB_kg_k),1)</math> | ||

− | |||

− | |||

− | |||

===Limitations and Further Improvements=== | ===Limitations and Further Improvements=== | ||

Though Cauchy point is cheap to implement, like the steepest descent method, it performs poorly in some cases. Varies kinds of improvements are based on including the curvature information from <math>B_k</math>. | Though Cauchy point is cheap to implement, like the steepest descent method, it performs poorly in some cases. Varies kinds of improvements are based on including the curvature information from <math>B_k</math>. | ||

====Dogleg Method==== | ====Dogleg Method==== | ||

− | If <math>B_k</math> is positive definite (we can use quasi-Newton | + | [[File:TRM_Dogleg.png|thumb|right|Illustration of dogleg method trajectory.]] |

+ | If <math>B_k</math> is positive definite (we can use quasi-Newton Hessian approximation &updating to guarantee), then a V-shaped trajectory can be determined by | ||

if <math>0<=\tau<=1,~~p(\tau)=\tau p^U</math> | if <math>0<=\tau<=1,~~p(\tau)=\tau p^U</math> | ||

Line 103: | Line 99: | ||

if <math>1<=\tau<=2,~~p(\tau)=\tau p^U+(\tau-1)(p^B-p^U)</math> | if <math>1<=\tau<=2,~~p(\tau)=\tau p^U+(\tau-1)(p^B-p^U)</math> | ||

− | where <math>p^U=-\frac{g^Tg}{g^TBg}g</math> is the steepest descent direction | + | where <math>p^U=-\frac{g^Tg}{g^TBg}g</math> is the steepest descent direction and <math>p^B</math> is the optimal solution of the quadratic model <math>m_k(p)</math>. Therefore, a further improvement could be achieved compared to using only Cauchy point calculation method in one iteration. (Note that hessian or approximate hessian will be evaluated in dogleg method) |

− | + | ||

− | Note that hessian or approximate hessian will be evaluated | + | |

− | + | ||

− | + | ||

− | |||

====Conjugated Gradient Steihaug’s Method==== | ====Conjugated Gradient Steihaug’s Method==== | ||

− | The most widely used method for solving a trust-region sub-problem is by using the idea of conjugated gradient (CG) method for minimizing a quadratic function since CG guarantees convergence within a finite number of iterations for a quadratic programming. Also, CG Steihaug’s method has the merit of Cauchy point and | + | The most widely used method for solving a trust-region sub-problem is by using the idea of conjugated gradient (CG) method for minimizing a quadratic function since CG guarantees convergence within a finite number of iterations for a quadratic programming. Also, CG Steihaug’s method has the merit of Cauchy point calculation and dogleg method that both in terms of super-linear convergence rate and inexpensiveness to compute.(No expensive Hessian evaluation) |

---- | ---- | ||

Line 160: | Line 151: | ||

==Example== | ==Example== | ||

+ | [[File:TRM-Branin.png|thumb|right|Contour of a 'Branin' function.]] | ||

Here we use the trust-region method to solve an unconstrained problem as an example. The trust-region subproblems are solved by calculate the Cauchy point. | Here we use the trust-region method to solve an unconstrained problem as an example. The trust-region subproblems are solved by calculate the Cauchy point. | ||

− | <math>min~f(x_1,x_2)= | + | <math>min~f(x_1,x_2)=(x_2-0.129{x_1}^2+1.6x_1-6)^2+6.07cos(x_1)+10 </math> (This is the Branin function which is widely used as a test function. It has 3 global optima.) |

− | Starting point <math>x_1= | + | Starting point <math>x_1=6.00, x_2=14.00</math> The iteration stops when the stopping criteria <math>||g_k||<=0.01</math> is met. |

− | <math>\ | + | <math>\Delta_0=2.0, \Delta_M=5.0, t_1=0.25, t_2=2.0, \eta_1=0.2,\eta_2=0.25,\eta_3=0.75</math> |

− | |||

− | Optimization trajectory | + | '''Improving Process''' [[File:TRM-Iter1.png|thumb|right|Graphical illustration of iteration 1 (The quadratic model's contours are marked as red lines).]][[File:TRM-Iter2.png|thumb|right|Graphical illustration of iteration2.]][[File:TRM-Iter3.png|thumb|right|Graphical illustration of iteration3.]][[File:TRM-Iter4.png|thumb|right|Graphical illustration of iteration4.]][[File:TRM-Iter5.png|thumb|right|Graphical illustration of iteration5.]][[File:TRM-Iter6.png|thumb|right|Graphical illustration of iteration6.]][[File:TRM-Trajectory.png|thumb|right|Optimization trajectory of the example function(unconstrained).]] |

---- | ---- | ||

− | |||

− | <math> | + | '''Iteration 1''': The algorithm start from the initial point(marked as a green dot) <math>x_1=6.00, x_2=14.00</math>. The trust-region is defined as the area inside the circle centered at the starting point. The contour of the quadratic model can be visualized. After calculating the Cauchy point, <math>\rho_k</math> is evaluated and a full step was taken since the model gives a good prediction. Set <math>x_{k+1}=x_k+p_k(x_1=5.767,x_2=12.014)</math> and <math>\Delta_k=min(2\Delta_k,\Delta_M)~(\rho_k>\eta_3</math> and a full step was taken.<math>)</math> (The current best solution is denoted as the red dot.) |

− | |||

− | <math> | + | '''Iteration 2''': Start with <math>x_1=5.767,x_2=12.014</math> and an enlarged trust-region. The new iteration gives a more ambitious full step to the new point<math>(x_1=4.800,x_2=8.132</math>). With <math>\rho_k=0.980</math>, the model is "trusted" again to increase its size in the next iteration. |

− | |||

− | <math> | + | '''Iteration 3''': Start with <math>x_1=4.800,x_2=8.132</math> and an enlarged trust-region. The new iteration gives a satisfactory but not good enough to the new point<math>(x_1=1.668,x_2=4.235</math>). With <math>\rho_k=0.578</math>, which is not high enough to trigger a new increment for the trust-region's size. So the radius is maintained in the next iteration. |

− | |||

− | <math> | + | '''Iteration 4''': Start with <math>x_1=1.668,x_2=4.235</math> and the maximum-sized trust-region. The new iteration gives a poor prediction. With <math>\rho_k=-0.160</math>, which incurs the decrease in the trust-region's size to improve the model's validity. Current best solution is unchanged and the radius for the trust-region is diminished to 1/4 of the current iteration. |

− | |||

− | <math> | + | '''Iteration 5''': Start with <math>x_1=1.668,x_2=4.235</math> and a shrinked trust-region. The new iteration gives a satisfactory but not good enough to the new point<math>(x_1=2.887,x_2=3.956</math>). With <math>\rho_k=0.729</math>, which is not high enough to trigger a new increment for the trust-region's size. So the radius is maintained in the next iteration. |

− | |||

− | <math> | + | '''Iteration 6''': Start with <math>x_1=2.887,x_2=3.956</math>. The new iteration gives a satisfactory but not a full step to the new point<math>(x_1=2.594,x_2=3.109</math>). With <math>\rho_k=0.989</math>, which is high enough to trigger a new increment for the trust-region's size, however not a full step is taken thereby the radius is maintained in the next iteration. |

− | |||

− | + | .... | |

− | + | ---- | |

+ | The following table is a summary for the improving process. Notably the Cauchy point calculation does not give an efficient convergence rate especially at the end stage of the computation(not a full step was taken so it is essentially equivalent to a steepest decent line search algorithm). Dogleg and CG Steihaug's method will give faster convergence as explained previously. | ||

− | + | [[File:TRM-Table.png]] | |

− | + | Another graphical illustration is available at Kranf site: [http://www.applied-mathematics.net/optimization/optimizationIntro.html] | |

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | ||

==Conclusion== | ==Conclusion== | ||

Line 232: | Line 199: | ||

- Pick improving direction | - Pick improving direction | ||

− | - Pick | + | - Pick step-size to minimize |

- Update the incumbent solution | - Update the incumbent solution | ||

+ | |||

+ | - Convergence rate is not guaranteed | ||

Trust-region methods: | Trust-region methods: | ||

− | - Pick the | + | - Pick the step-size (the trust-region sub-problem is constrained) |

− | - Solving the | + | - Solving the sub-problem using the approximated model |

- If the improvement is acceptable, update the incumbent solution and the size of the trust-region | - If the improvement is acceptable, update the incumbent solution and the size of the trust-region | ||

+ | |||

+ | - Can have super-linear convergence rate when conjugated gradient method or dogleg method is used | ||

==References== | ==References== | ||

− | 1. J. Nocedal, S. J. Wright, Numerical optimization. | + | [1] W. Sun and Y.-x. Yuan, Optimization theory and methods : nonlinear programming. New York: Springer, 2006. |

+ | |||

+ | [2] J. Nocedal, S. J. Wright, and SpringerLink (Online service). (2006). Numerical optimization (2nd ed.). Available: [http://turing.library.northwestern.edu/login?url=http://dx.doi.org/10.1007/978-0-387-40065-5] | ||

− | + | [3] L. Hei, "Practical techniques for nonlinear optimization," Ph D, Northwestern University, 2007. | |

− | + | [4] Wikipedia page for Trust region. Available: [http://en.wikipedia.org/wiki/Trust_region] |

## Latest revision as of 15:35, 5 June 2014

Authors: Wenhe (Wayne) Ye (ChE 345 Spring 2014) Steward: Dajun Yue, Fengqi You Date Presented: Apr. 10, 2014

## Contents |

## Introduction

Trust-region method (TRM) is one of the most important numerical optimization methods in solving nonlinear programming (NLP) problems. It works in a way that first define a region around the current best solution, in which a certain model (usually a quadratic model) can to some extent approximate the original objective function. TRM then take a step forward according to the model depicts within the region. Unlike the line search methods, TRM usually determines the step size before the improving direction (or at the same time). If a notable decrease (our following discussion will based on minimization problems) is gained after the step forward, then the model is believed to be a good representation of the original objective function. If the improvement is too subtle or even a negative improvement is gained, then the model is not to be believed as a good representation of the original objective function within that region. The convergence can be ensured that the size of the “trust region” (usually defined by the radius in Euclidean norm) in each iteration would depend on the improvement previously made.

## Important Concepts

**Trust-region**

In most cases, the trust-region is defined as a spherical area of radius in which the trust-region subproblem lies.

**Trust-region subproblem**

If we are using the quadratic model to approximate the original objective function, then our optimization problem is essentially reduced to solving a sequence of trust-region subporblems

Where is the trust region radius, is the gradient at current point and is the hessian (or a hessian approximation). It is easy to find the solution to the trust-region subproblem if is positive definite.

**Actual reduction and predicted reduction**

The most critical issue underlying the trust-region method is to update the size of the trust-region at every iteration. If the current iteration makes a satisfactory reduction, we may exploits our model more in the next iteration by setting a larger . If we only achieved a limited improvement after the current iteration, the radius of the trust-region then should not have any increase, or in the worst cases, we may decrease the size of the trust-region by adjusting the radius to a smaller value to check the model’s validity.

Whether to take a more ambitious step or a more conservative one is depend on the ratio between the actual reduction gained by true reduction in the original objective function and the predicted reduction expected in the model function. Empirical threshold values of the ratio will guide us in determining the size of the trust-region.

## Trust Region Algorithm

Before implementing the trust-region algorithm, we should first determine several parameters. is the upper bound for the size of the trust region. , and ,, are the threshold values for evaluating the goodness of the quadratic model thus for determining the trust-region’s size in the next iteration. A typical set for these values are , and ,,.

**Pseudo-code**

**Set** the starting point at , set the iteration number

**for**

Get the improving step by solving trust-region sub-problem ()

Evaluate from equation()

**if**

**else**

**if** and (full step and model is a good approximation)

**else**

**if**

**else**

(the model is not a good approximation and need to solve another trust-region subproblem within a smaller trust-region)

**end**
>

## Methods of Solving the Trust-region Subproblem

### Cauchy point calculation

In line search methods, we may find an improving direction from the gradient information, that is, by taking the steepest descent direction with regard to the maximum range we could make. We can solve the trust-region subproblem in an inexpensive way. This method is also denoted as the Cauchy point calculation. We can also express the improving step explicitly by the following closed-form equations

if

otherwise

### Limitations and Further Improvements

Though Cauchy point is cheap to implement, like the steepest descent method, it performs poorly in some cases. Varies kinds of improvements are based on including the curvature information from .

#### Dogleg Method

If is positive definite (we can use quasi-Newton Hessian approximation &updating to guarantee), then a V-shaped trajectory can be determined by

if

if

where is the steepest descent direction and is the optimal solution of the quadratic model . Therefore, a further improvement could be achieved compared to using only Cauchy point calculation method in one iteration. (Note that hessian or approximate hessian will be evaluated in dogleg method)

#### Conjugated Gradient Steihaug’s Method

The most widely used method for solving a trust-region sub-problem is by using the idea of conjugated gradient (CG) method for minimizing a quadratic function since CG guarantees convergence within a finite number of iterations for a quadratic programming. Also, CG Steihaug’s method has the merit of Cauchy point calculation and dogleg method that both in terms of super-linear convergence rate and inexpensiveness to compute.(No expensive Hessian evaluation)

**Pseudo-code for CG Steihaug method in solving trust region sub-problem**

*Given tolerance ;*

**Set**

**if**

**return** ;

**for**

**if**

Find such that minimizes

and satisfies ;

**return** ;

**Set** ;

**Set** ;

**if**

Find such that satisfies ;

**return** ;

**Set** ;

**if**

**return** ;

**Set** ;

**Set**

**end **

## Example

Here we use the trust-region method to solve an unconstrained problem as an example. The trust-region subproblems are solved by calculate the Cauchy point.

(This is the Branin function which is widely used as a test function. It has 3 global optima.)

Starting point The iteration stops when the stopping criteria is met.

**Improving Process**

**Iteration 1**: The algorithm start from the initial point(marked as a green dot) . The trust-region is defined as the area inside the circle centered at the starting point. The contour of the quadratic model can be visualized. After calculating the Cauchy point, is evaluated and a full step was taken since the model gives a good prediction. Set and and a full step was taken. (The current best solution is denoted as the red dot.)

**Iteration 2**: Start with and an enlarged trust-region. The new iteration gives a more ambitious full step to the new point). With , the model is "trusted" again to increase its size in the next iteration.

**Iteration 3**: Start with and an enlarged trust-region. The new iteration gives a satisfactory but not good enough to the new point). With , which is not high enough to trigger a new increment for the trust-region's size. So the radius is maintained in the next iteration.

**Iteration 4**: Start with and the maximum-sized trust-region. The new iteration gives a poor prediction. With , which incurs the decrease in the trust-region's size to improve the model's validity. Current best solution is unchanged and the radius for the trust-region is diminished to 1/4 of the current iteration.

**Iteration 5**: Start with and a shrinked trust-region. The new iteration gives a satisfactory but not good enough to the new point). With , which is not high enough to trigger a new increment for the trust-region's size. So the radius is maintained in the next iteration.

**Iteration 6**: Start with . The new iteration gives a satisfactory but not a full step to the new point). With , which is high enough to trigger a new increment for the trust-region's size, however not a full step is taken thereby the radius is maintained in the next iteration.

....

The following table is a summary for the improving process. Notably the Cauchy point calculation does not give an efficient convergence rate especially at the end stage of the computation(not a full step was taken so it is essentially equivalent to a steepest decent line search algorithm). Dogleg and CG Steihaug's method will give faster convergence as explained previously.

Another graphical illustration is available at Kranf site: [1]

## Conclusion

**Trust-Region vs. Line Search**

Line search methods:

- Pick improving direction

- Pick step-size to minimize

- Update the incumbent solution

- Convergence rate is not guaranteed

Trust-region methods:

- Pick the step-size (the trust-region sub-problem is constrained)

- Solving the sub-problem using the approximated model

- If the improvement is acceptable, update the incumbent solution and the size of the trust-region

- Can have super-linear convergence rate when conjugated gradient method or dogleg method is used

## References

[1] W. Sun and Y.-x. Yuan, Optimization theory and methods : nonlinear programming. New York: Springer, 2006.

[2] J. Nocedal, S. J. Wright, and SpringerLink (Online service). (2006). Numerical optimization (2nd ed.). Available: [2]

[3] L. Hei, "Practical techniques for nonlinear optimization," Ph D, Northwestern University, 2007.

[4] Wikipedia page for Trust region. Available: [3]