Sequential quadratic programming
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fengqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP). It is powerful enough for real problems because it can handle any degree of non-linearity, including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked out analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. The method dates back to 1963 and was developed and refined in the 1970s.[1] SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton's method, both of which are explained briefly below. Previous exposure to the component methods, as well as to Lagrange multipliers and Karush-Kuhn-Tucker (KKT) conditions, is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \min_x \ f(x) </math> <br/><br />
<math> \text{s.t. } h(x) = 0 </math> <br/><br />
<math> \text{and } g(x) \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many optimization variables, in which case h(x) and g(x) are systems of constraints.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrange multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu) = f(x) + \sum_i \lambda_i h_i(x) + \sum_i \mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLPs). The system formed from this gradient is known as the KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is simply feasibility: h(x) was constrained to zero in the original NLP. The third KKT condition is a bit trickier, in that only the set of active inequality constraints, denoted <math>g^*</math>, need satisfy this equality. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrange multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints; those inactive constraints can therefore be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions by guess-and-check to find critical points. Conventionally, the first step is to guess that every inequality constraint is inactive. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems are reasonable to tackle with the active set method, because only then are the KKT conditions linear. Sequential quadratic programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
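The guess-and-check loop above can be sketched on a toy problem. The following is an illustrative example (the problem and all names are assumptions, not from this article): minimizing <math>(x-2)^2</math> subject to <math>x \le 1</math> takes exactly two active-set iterations.

```python
# Guess-and-check active set on a tiny QP:  min (x - 2)^2  s.t.  x <= 1.
# The problem and names are illustrative, not from the article.

def solve_tiny_qp():
    # Iteration 1: guess the inequality g(x) = x - 1 <= 0 is inactive,
    # so solve the unconstrained stationarity condition f'(x) = 2(x - 2) = 0.
    x = 2.0
    if x - 1 > 0:
        # g(2) = 1 > 0: the constraint is violated, so mark it active.
        # Iteration 2: enforce x - 1 = 0 and recover the multiplier from
        # stationarity f'(x) + mu * g'(x) = 0  ->  mu = -2(x - 2).
        x = 1.0
        mu = -2 * (x - 2)
        assert mu > 0  # positive multiplier: the active guess is consistent
    return x

print(solve_tiny_qp())
```

The positive multiplier in the second iteration confirms the active guess; a negative multiplier would have sent the constraint back to the inactive set.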
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and in inverse proportion to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, while a shallow incline that is rapidly flattening out is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps of the form below: <br/><br />
<math>x_{k+1} = x_k - \frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minima, a positive gradient should decrease the guess and vice versa, and the second derivative is positive. Near maxima, a positive gradient should increase the guess and vice versa, but the second derivative is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region: the improvement reverses direction if it overshoots. This is an important consideration in non-convex problems with multiple local minima and maxima; Newton's Method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method transforms the iteration above into a matrix equation. <br/><br />
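As a minimal sketch (the function is an assumption chosen for illustration), the update above can be applied to the one-variable function <math>f(x) = x^4 - 3x</math>, whose critical point satisfies <math>f'(x) = 4x^3 - 3 = 0</math>:

```python
# Newton's method for a critical point of f(x) = x**4 - 3x:
# iterate x <- x - f'(x) / f''(x) until the step is negligible.

def newton_critical_point(x, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        grad = 4 * x**3 - 3       # f'(x)
        hess = 12 * x**2          # f''(x)
        step = grad / hess
        x -= step
        if abs(step) < tol:
            break
    return x

x_star = newton_critical_point(1.0)
print(x_star)  # the root of 4x^3 - 3 = 0, i.e. (3/4)**(1/3)
```

Starting the iteration at a different guess inside the same convex region converges to the same critical point, as the sign convention in the text predicts.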
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function are also critical points of the Lagrangian function and vice versa, because the Lagrangian function is equal to the objective function at a KKT point: all constraints are either equal to zero or inactive. The algorithm thus simply iterates Newton's method to find critical points of the Lagrangian function. Since the Lagrange multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = \begin{bmatrix} \nabla_{xx}^2 L & (\nabla h)^T & (\nabla g^*)^T \\ \nabla h & 0 & 0 \\ \nabla g^* & 0 & 0 \end{bmatrix} </math> <br/><br />
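The iteration above can be carried out numerically on a small equality-constrained problem. The sketch below is illustrative (the problem, <math>\min\, x_1^4 + x_2^2</math> subject to <math>x_1 + x_2 - 1 = 0</math>, and the code are assumptions, not from this article), assuming NumPy is available:

```python
import numpy as np

# Newton's method on the Lagrangian of
#   min  x1**4 + x2**2   s.t.  x1 + x2 - 1 = 0
# with unknowns z = [x1, x2, lam]; iterate z <- z - (grad^2 L)^-1 grad L.

def grad_L(z):
    x1, x2, lam = z
    return np.array([4 * x1**3 + lam,   # dL/dx1
                     2 * x2 + lam,      # dL/dx2
                     x1 + x2 - 1])      # dL/dlam  (feasibility)

def hess_L(z):
    x1, x2, lam = z
    return np.array([[12 * x1**2, 0.0, 1.0],
                     [0.0,        2.0, 1.0],
                     [1.0,        1.0, 0.0]])

z = np.array([0.5, 0.5, 0.0])           # initial guess for (x1, x2, lam)
for _ in range(20):
    z = z - np.linalg.solve(hess_L(z), grad_L(z))

x1, x2, lam = z
print(x1, x2)   # feasible (x1 + x2 = 1) and stationary (4*x1**3 = 2*x2 = -lam)
```

Because the constraint is linear and the objective smooth, the iteration converges in a handful of steps; the converged multiplier satisfies the stationarity conditions exactly as the gradient formula above requires.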
<br/><br />
Unlike the active set method, SQP entirely eliminates the need to solve a system of non-linear equations, no matter how non-linear the objective and constraints. In theory, if the derivative expressions above can be formulated analytically and then coded, software could iterate very quickly because the structure of the system does not change. In practice, however, the Hessian <math>\nabla^2 L</math> is often not invertible, because variables are likely to be linearly bounded from above and below. The improvement direction "p" for the Newton's Method iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic programming algorithms. The sub-problem is derived as follows:<br/><br />
<br/><br />
<math> \nabla^2 L_k \, p = -\nabla L_k</math> <br/><br />
<br/><br />
Since p is the incremental change to the current iterate, this equation resembles a two-term Taylor series for the gradient of the Lagrangian set to zero, which shows that a Taylor expansion with the increment p as the variable is equivalent to a Newton iteration. Decomposing the different equations within this system and halving the second-order term to match the Taylor series, a minimization sub-problem can be obtained. This sub-problem is quadratic and thus must be solved with non-linear methods, which once again introduces the need to solve a non-linear problem into the algorithm, but this predictable sub-problem in the single unknown p is much easier to tackle than the parent problem. <br/><br />
<math>\min_p \ f_k + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k\, p</math> <br/><br />
<math>\text{s.t. } \nabla h_k p + h_k = 0</math> <br/><br />
<math>\text{and } \nabla g_k p + g_k \le 0</math> <br/><br />
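When only equality constraints are active, this sub-problem can be solved directly through its own KKT linear system, and its solution coincides with the Newton step. The sketch below is illustrative (same assumed toy problem as before, not from this article), assuming NumPy:

```python
import numpy as np

# One SQP step via the equality-constrained QP sub-problem
#   min_p  grad_f.T @ p + 0.5 * p.T @ H @ p   s.t.  A @ p + h = 0,
# whose KKT system is  [[H, A.T], [A, 0]] @ [p, lam] = [-grad_f, -h].

x = np.array([0.5, 0.5])                  # current iterate
grad_f = np.array([4 * x[0]**3, 2 * x[1]])  # gradient of x1**4 + x2**2
H = np.diag([12 * x[0]**2, 2.0])          # Hessian of the Lagrangian (lam = 0 here)
A = np.array([[1.0, 1.0]])                # Jacobian of h(x) = x1 + x2 - 1
h = np.array([x[0] + x[1] - 1.0])

KKT = np.block([[H, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([-grad_f, -h])
sol = np.linalg.solve(KKT, rhs)
p, lam_new = sol[:2], sol[2]
print(p)   # the step to add to x; here p = [0.1, -0.1]
```

Adding p to x reproduces exactly the first Newton iterate of the full Lagrangian system, which is the point of the derivation above: the QP sub-problem is just a reorganized Newton step.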
<br/><br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.]]<br />
<br />
<math> \max \ Z = \sin(u)\cos(v) + \cos(u)\sin(v)</math> <br/><br />
<math> \text{s.t. } 0 \le u+v \le \pi</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non-linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \sin(u)\cos(v) + \cos(u)\sin(v) = \sin(u+v)</math> <br/><br />
<br/><br />
The first constraint then restricts the feasible region to the first half-period of the sine function, over which <math>\sin(u+v)</math> is concave and has a single maximum. That maximum occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
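The cubic <math>v^3 + v = \frac{\pi}{2}</math> has no tidy closed form, but a few scalar Newton steps (an illustrative sketch, not part of the original solution) reproduce the values quoted above:

```python
from math import pi

# Solve r(v) = v**3 + v - pi/2 = 0 with Newton's method: v <- v - r(v)/r'(v).
v = 1.0
for _ in range(25):
    v -= (v**3 + v - pi / 2) / (3 * v**2 + 1)

u = v**3
print(round(v, 3), round(u, 3))   # -> 0.883 0.688
```

Since <math>v^3 + v</math> is increasing and convex, the iteration converges from any positive starting guess.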
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian function with its gradient and Hessian is as follows: <br/><br />
<math> L = \sin(u)\cos(v) + \cos(u)\sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \cos(v)\cos(u) - \sin(u)\sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ \cos(v)\cos(u) - \sin(u)\sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = \sin(u)\cos(v) + \cos(u)\sin(v)</math><br/><br />
<br/><br />
<br />
[[File:Code_for_Wiki.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.]]<br />
<br />
The first important limitation in using SQP is now apparent: the Hessian matrix is not invertible because it is not full rank. We will switch to the quadratic minimization sub-problem above, but even with this alternate framework, the gradient of the constraints must be full rank. For now, this can be handled by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent. This can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-7u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand side is relatively close to zero for the range of possible values of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying, however, for a problem that is easily solved by inspection; it illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. Now the problem is ready to be solved. The MATLAB code in Figure 2 was implemented, using the function fmincon to solve the minimization sub-problems; fmincon itself uses an SQP algorithm. In each step, the incumbent guess is plugged into the gradient, Hessian, and constraint arrays, which then become parameters for the minimization sub-problem. <br/><br />
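For readers without MATLAB, a comparable check (an illustrative sketch, not the article's code) can be run with SciPy's SLSQP solver, which, like fmincon, is an SQP-type method. Assuming NumPy and SciPy are installed, it recovers the solution found by inspection without the full-rank workaround, since the solver handles the inequality constraints internally:

```python
import numpy as np
from scipy.optimize import minimize

# Maximize sin(u)cos(v) + cos(u)sin(v) = sin(u+v) by minimizing its negative,
# subject to u = v**3 and 0 <= u + v <= pi.
def neg_Z(x):
    u, v = x
    return -(np.sin(u) * np.cos(v) + np.cos(u) * np.sin(v))

constraints = [
    {"type": "eq",   "fun": lambda x: x[0] - x[1]**3},         # u = v^3
    {"type": "ineq", "fun": lambda x: x[0] + x[1]},            # u + v >= 0
    {"type": "ineq", "fun": lambda x: np.pi - (x[0] + x[1])},  # u + v <= pi
]

res = minimize(neg_Z, x0=[0.5, 0.5], method="SLSQP", constraints=constraints)
u, v = res.x
print(round(u, 3), round(v, 3))   # close to the by-inspection solution
```

The converged iterate sits on the equality constraint with both inequality constraints inactive, matching the analysis above.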
<br/><br />
<br />
=Conclusion=<br />
SQP is powerful enough to be used in commercial software but is also burdened by some intricacy. In addition to the complication of needing full-rank constraint gradients, the Hessian matrix can be very difficult or laborious to assemble analytically. Commercial SQP packages include checks for the feasibility of the sub-problem in order to account for rank deficiencies. In addition to fmincon, SNOPT and filterSQP are two other commercial SQP packages, and each uses a different non-linear method to solve the quadratic sub-problem.[1] [https://optimization.mccormick.northwestern.edu/index.php/Line_search_methods Line search methods] and [https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods trust-region methods] are trusted options for this step, and sub-gradient methods have also been proposed. The other common modification to SQP (ubiquitous in commercial packages) is to invoke [https://optimization.mccormick.northwestern.edu/index.php/Quasi-Newton_methods quasi-Newton methods] in order to avoid computing the Hessian entirely. SQP is thus very much a family of algorithms rather than a stand-alone tool for optimization. At its core, it is a method for turning large, highly non-linear problems into a sequence of small quadratic problems, reducing the computational expense of the solution.<br />
<br />
=Sources=<br />
[1] Nocedal, J. and Wright, S. Numerical Optimization, 2nd ed., Ch. 18. Springer, 2006. <br/><br />
[2] You, Fengqi. Lecture Notes, Chemical Engineering 345 Optimization. Northwestern University, 2015. <br/></div>
<hr />
<div></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-07T20:37:08Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. The method dates back to 1963 and was developed and refined in the 1970's .[1] SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. Theoretically, If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change. In practice, however, it is likely that the divergence will not be an invertible matrix because variables are likely to be linearly bound from above and below. The improvement direction "p" for the Newton's Method iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic algorithms. The subproblem is derived as follows:<br/><br />
<br/><br />
<math> p = \frac{\nabla L}{\nabla^2 L} = \frac{(\nabla L)p}{(\nabla^2 L)p}</math> <br/><br />
<br/><br />
Since p is an incremental change to the objective function, this equation then resembles a two-term Taylor Series for the derivative of the objective function, which shows that a Taylor expansion with the increment p as a variable is equivalent to a Newton iteration. Decomposing the different equations within this system and cutting the second order term in half to match Taylor Series concepts, a minimization sub-problem can be obtained. This problem is quadratic and thus must be solved with non-linear methods, which once again introduces the need to solve a non-linear problem into the algorithm, but this predictable sub-problem with one variable is much easier to tackle than the parent problem. <br/><br />
<math>\text{min(p) } f_k(x) + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k p</math> <br/><br />
<math>\text{s.t. } \nabla h_k p + h_k = 0</math> <br/><br />
<math>\text{ and } \nabla g_k p + g_k = 0</math> <br/><br />
<br/><br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian funtion with its gradient and divergence are as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
The first important limitation in using SQP is now apparent: the divergence matrix is not invertible because it is not full rank. We will switch to the quadratic minimization sub-problem above, but even with this alternate framework, the gradient of the constraints must be full rank. This can be handled for now by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent. This can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-7u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand-side is relatively close to zero for the range of possible values of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying, however, for a problem that's easily solved by inspection. It illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. Now, the problem is ready to be solved. The MATLAB code in figure two was implemented, using the function fmincon to solve the minimization subproblems. fmincon is itself an SQP piece of software. In each step, the incumbent guess is plugged into the gradient, hessian, and constraint arrays, which then become parameters for the minimization problem. <br/><br />
<br/><br />
<br />
=Conclusion=<br />
SQP is powerful enough to be used in commercial software but also burdened by some intricacy. In addition to the complication from needing full-rank constraint gradients, the divergence matrix can be very difficult or laborious to assemble analytically. Commercial SQP packages include checks for the feasibility of the sub-problem in order to account for rank deficiencies. In addition to fmincon, SNOPT and FILTERSQP are two other commercial SQP packages, and each uses a different non-linear method to solve the quadratic subproblem.[1] [https://optimization.mccormick.northwestern.edu/index.php/Line_search_methods Line search methods] and [https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods trust-region methods] are trusted options for this step, and sub-gradient methods have also been proposed. The other common modification to SQP (ubiquitous to commercial packages) is to invoke [https://optimization.mccormick.northwestern.edu/index.php/Quasi-Newton_methods quasi-Newton methods] in order to avoid computing the Hessian entirely. SQP is thus very much a family of algorithms rather than a stand-alone tool for optimization. At its core, it is a method for turning large, very non-linear problems into a sequence of small quadratic problems to reduce the computational expense of the problem.<br />
<br />
=Sources=<br />
[1] Nocedal, J. and Wright, S. Numerical Optimization, 2nd. ed., Ch. 18. Springer, 2006. <br/><br />
[2] You, Fengqi. Lecture Notes, Chemical Engineering 345 Optimization. Northwestern University, 2015. <br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-07T20:34:01Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. The method dates back to 1963 and was developed and refined in the 1970's .[1] SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where its gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for a multi-constraint NLP). The system formed from this gradient is known as the KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is simply feasibility: h(x) was constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints, denoted <math>g^*</math>, needs to satisfy this equality. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrange multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, and those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
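As a minimal numeric sketch of these conditions (the one-variable toy problem and all numbers below are my own illustration, not from this page), consider minimizing f(x) = (x - 2)^2 subject to g(x) = x - 1 &le; 0; the constraint is active at the optimum x* = 1 with multiplier &mu; = 2:

```python
import numpy as np

# Toy problem (illustrative, not from the article):
#   min f(x) = (x - 2)^2   s.t.  g(x) = x - 1 <= 0
# The unconstrained minimum x = 2 is infeasible, so g is active: x* = 1.
grad_f = lambda x: 2.0 * (x - 2.0)   # df/dx
grad_g = lambda x: 1.0               # dg/dx
g = lambda x: x - 1.0

x_star, mu_star = 1.0, 2.0           # candidate KKT point

# Stationarity: grad f + mu * grad g = 0
assert abs(grad_f(x_star) + mu_star * grad_g(x_star)) < 1e-12
# Active constraint sits at its limit: g(x*) = 0
assert abs(g(x_star)) < 1e-12
# Multiplier is non-negative, as required for an active inequality
assert mu_star >= 0.0
print("KKT conditions hold at x* =", x_star)
```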
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active-set method solves the KKT conditions using guess and check to find critical points. Conventionally, the first step is to guess that every inequality constraint is inactive. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active-set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active-set method, because their KKT conditions are linear. Sequential quadratic programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's method.<br/><br />
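The guess-and-check loop can be written out for a single inequality constraint; the quadratic toy problem and all numbers below are my own sketch, not from this page:

```python
import numpy as np

# Guess-and-check active set on a small QP (illustrative example):
#   min x^2 + y^2   s.t.  g(x, y) = 1 - x - y <= 0
def solve_with_active_set(active):
    if not active:
        # Unconstrained stationary point of f: gradient (2x, 2y) = 0
        return np.array([0.0, 0.0]), 0.0
    # Constraint treated as an equality: solve the linear KKT system
    #   2x - mu = 0,  2y - mu = 0,  x + y = 1
    A = np.array([[2.0, 0.0, -1.0],
                  [0.0, 2.0, -1.0],
                  [1.0, 1.0,  0.0]])
    b = np.array([0.0, 0.0, 1.0])
    x, y, mu = np.linalg.solve(A, b)
    return np.array([x, y]), mu

active = False                      # first guess: constraint inactive
for _ in range(5):
    xy, mu = solve_with_active_set(active)
    violated = (1.0 - xy.sum()) > 1e-12
    if not active and violated:
        active = True               # violated -> activate next iteration
    elif active and mu < 0.0:
        active = False              # negative multiplier -> deactivate
    else:
        break                       # consistent guess: KKT point found

print(xy, mu)                       # expect x = y = 0.5 with mu = 1
```

The first guess (constraint inactive) lands at the infeasible unconstrained minimum, so the constraint is activated and the second pass solves the equality-constrained KKT system, exactly the iteration described above.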
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's method is to improve a guess in proportion to how quickly the function is changing at the guess and in inverse proportion to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical points of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} = x_k - (\nabla^2 f)^{-1} \nabla f</math> <br/><br />
<br/><br />
The negative sign is important. Near minima, a positive gradient should decrease the guess and vice versa, and the second derivative is positive. Near maxima, a positive gradient should increase the guess and vice versa, and the second derivative is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maxima and minima: Newton's method will find the critical point closest to the original guess. Incorporating Newton's method into the active-set method will transform the iteration above into a matrix equation. <br/><br />
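The iteration is short enough to sketch directly; the function f(x) = exp(x) - 2x (minimum at x = ln 2), the starting guess, and the stopping tolerance are illustrative choices of mine:

```python
import math

# Newton iteration for a critical point of f(x) = exp(x) - 2x
# (illustrative function; its minimum is at x = ln 2).
# Step: x <- x - f'(x)/f''(x), mirroring x_{k+1} = x_k - (grad^2 f)^{-1} grad f
df  = lambda x: math.exp(x) - 2.0    # f'(x)
d2f = lambda x: math.exp(x)          # f''(x); positive near the minimum

x = 0.0                              # initial guess
for _ in range(20):
    step = df(x) / d2f(x)
    x -= step
    if abs(step) < 1e-12:            # converged once updates vanish
        break

print(x, math.log(2))                # both ~0.6931
```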
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa, because the Lagrangian function equals the objective function at a KKT point: all constraints are either equal to zero or inactive. The algorithm is thus simply an iteration of Newton's method to find critical points of the Lagrangian function. Since the Lagrange multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h^T & 0 & 0 \\ \nabla g^T & 0 & 0 \end{bmatrix} </math> <br/><br />
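Assembling this gradient and Hessian and iterating is straightforward for a small equality-constrained example; the problem min x + y s.t. x^2 + y^2 = 1 and the starting point are my own illustration, not from this page:

```python
import numpy as np

# Full Newton iteration on the Lagrangian of a toy problem:
#   min x + y   s.t.  h(x, y) = x^2 + y^2 - 1 = 0
# L = x + y + lam*(x^2 + y^2 - 1); iterate z <- z - (grad^2 L)^{-1} grad L.
def grad_L(z):
    x, y, lam = z
    return np.array([1 + 2*lam*x,        # dL/dx
                     1 + 2*lam*y,        # dL/dy
                     x**2 + y**2 - 1])   # dL/dlam = h

def hess_L(z):
    x, y, lam = z
    return np.array([[2*lam, 0.0,   2*x],
                     [0.0,   2*lam, 2*y],
                     [2*x,   2*y,   0.0]])

z = np.array([-0.5, -0.5, 0.5])          # initial guess (x, y, lam)
for _ in range(20):
    step = np.linalg.solve(hess_L(z), grad_L(z))
    z = z - step
    if np.linalg.norm(step) < 1e-12:
        break

print(z)   # approximately [-0.7071 -0.7071 0.7071], i.e. x = y = -1/sqrt(2)
```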
<br/><br />
Unlike in the active-set method, the need to solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. In theory, if the derivative expressions above can be formulated analytically and then coded, software could iterate very quickly because the form of the system does not change. In practice, however, the Hessian of the Lagrangian is often not invertible, because variables are likely to be linearly bound from above and below. The improvement direction <math>p</math> for the Newton iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic programming algorithms. The sub-problem is derived as follows:<br/><br />
<br/><br />
<math> \nabla L_k + (\nabla^2 L_k)\, p = 0 </math> <br/><br />
<br/><br />
Since <math>p</math> is an incremental change to the variables, this equation resembles a two-term Taylor series for the gradient of the Lagrangian, which shows that a Taylor expansion with the increment p as the variable is equivalent to a Newton iteration. Decomposing the different equations within this system and halving the second-order term to match Taylor-series convention, a minimization sub-problem can be obtained. This sub-problem is quadratic and thus must still be solved with non-linear methods, which once again introduces a non-linear problem into the algorithm, but this predictable sub-problem in the single unknown <math>p</math> is much easier to tackle than the parent problem. <br/><br />
<math>\min_p \; f_k + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k\, p</math> <br/><br />
<math>\text{s.t. } \nabla h_k^T p + h_k = 0</math> <br/><br />
<math>\text{and } \nabla g_k^T p + g_k \le 0</math> <br/><br />
<br/><br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{max } Z = \sin(u)\cos(v) + \cos(u)\sin(v)</math> <br/><br />
<math> \text{s.t. } 0 \le u+v \le \pi</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non-linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \sin(u)\cos(v) + \cos(u)\sin(v) = \sin(u+v)</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
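The by-inspection algebra can be confirmed numerically; the bisection bracket [0, 1.5] is my own choice:

```python
import math

# Numerically confirm the by-inspection solution: solve v^3 + v = pi/2
# by bisection (the root is bracketed between 0 and 1.5).
f = lambda v: v**3 + v - math.pi / 2.0
lo, hi = 0.0, 1.5                      # f(lo) < 0 < f(hi)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if f(mid) > 0.0:
        hi = mid                       # root lies in the lower half
    else:
        lo = mid                       # root lies in the upper half

v = 0.5 * (lo + hi)
u = v**3
print(round(v, 3), round(u, 3))        # -> 0.883 0.688
```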
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian function with its gradient and Hessian is as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
The first important limitation in using SQP is now apparent: this Hessian matrix is not invertible because it is not full rank. We will switch to the quadratic minimization sub-problem above, but even in this alternate framework, the gradient of the constraints must be full rank. This can be handled for now by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent, which can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-7u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand side is relatively close to zero over the range of possible values of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying for a problem that is easily solved by inspection, but it illustrates that SQP is truly best suited for problems with highly non-linear objectives and constraints. Now the problem is ready to be solved. The MATLAB code in Figure 2 was implemented, using the function fmincon to solve the minimization sub-problems; fmincon itself implements an SQP-type algorithm. In each step, the incumbent guess is plugged into the gradient, Hessian, and constraint arrays, which then become parameters of the minimization sub-problem. <br/><br />
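Since the Figure 2 listing is only shown as an image, a rough Python equivalent is sketched below, with SciPy's SLSQP solver standing in for fmincon; the initial guess and tolerances are my own assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the example problem with an off-the-shelf SQP solver
# (SciPy's SLSQP in place of the article's fmincon-based MATLAB code).
# Maximize sin(u+v) by minimizing its negative.
objective = lambda x: -(np.sin(x[0]) * np.cos(x[1])
                        + np.cos(x[0]) * np.sin(x[1]))

constraints = [
    {"type": "eq",   "fun": lambda x: x[0] - x[1]**3},      # u = v^3
    {"type": "ineq", "fun": lambda x: x[0] + x[1]},         # u + v >= 0
    {"type": "ineq", "fun": lambda x: np.pi - x[0] - x[1]}, # u + v <= pi
]

result = minimize(objective, x0=[0.5, 0.5], method="SLSQP",
                  constraints=constraints)
u, v = result.x
print(round(u, 3), round(v, 3))   # approximately 0.688 0.883
```

The solver recovers the by-inspection answer without the manual rank repair, since SLSQP handles the constraint linearization internally.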
<br/><br />
<br />
=Conclusion=<br />
SQP is powerful enough to be used in commercial software but is also burdened by some intricacy. In addition to the complication of needing full-rank constraint gradients, the Hessian of the Lagrangian can be very difficult or laborious to assemble analytically. Commercial SQP packages include checks on the feasibility of the sub-problem in order to account for rank deficiencies. In addition to fmincon, SNOPT and filterSQP are two other commercial SQP packages, and each uses a different non-linear method to solve the quadratic sub-problem.[1] [[Line_search_methods|Line search methods]] and [[trust-region]] methods are trusted options for this step, and sub-gradient methods have also been proposed. The other common modification to SQP (ubiquitous in commercial packages) is to invoke [[Quasi-Newton_methods|quasi-Newton methods]] in order to avoid computing the Hessian entirely. SQP is thus very much a family of algorithms rather than a stand-alone tool for optimization. At its core, it is a method for turning large, very non-linear problems into a sequence of small quadratic problems in order to reduce the computational expense.<br />
<br />
=Sources=<br />
[1] Nocedal, J. and Wright, S. Numerical Optimization, 2nd. ed., Ch. 18. Springer, 2006. <br/><br />
[2] You, Fengqi. Lecture Notes, Chemical Engineering 345 Optimization. Northwestern University, 2015. <br/></div>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. The method dates back to 1963 and was developed and refined in the 1970's .[1] SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. Theoretically, If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change. In practice, however, it is likely that the divergence will not be an invertible matrix because variables are likely to be linearly bound from above and below. The improvement direction "p" for the Newton's Method iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic algorithms. The subproblem is derived as follows:<br/><br />
<br/><br />
<math> p = \frac{\nabla L}{\nabla^2 L} = \frac{(\nabla L)p}{(\nabla^2 L)p}</math> <br/><br />
<br/><br />
Since p is an incremental change to the objective function, this equation then resembles a two-term Taylor Series for the derivative of the objective function, which shows that a Taylor expansion with the increment p as a variable is equivalent to a Newton iteration. Decomposing the different equations within this system and cutting the second order term in half to match Taylor Series concepts, a minimization sub-problem can be obtained. This problem is quadratic and thus must be solved with non-linear methods, which once again introduces the need to solve a non-linear problem into the algorithm, but this predictable sub-problem with one variable is much easier to tackle than the parent problem. <br/><br />
<math>\text{min(p) } f_k(x) + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k p</math> <br/><br />
<math>\text{s.t. } \nabla h_k p + h_k = 0</math> <br/><br />
<math>\text{ and } \nabla g_k p + g_k = 0</math> <br/><br />
<br/><br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian funtion with its gradient and divergence are as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
The first important limitation in using SQP is now apparent: the divergence matrix is not invertible because it is not full rank. We will switch to the quadratic minimization sub-problem above, but even with this alternate framework, the gradient of the constraints must be full rank. This can be handled for now by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent. This can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-7u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand-side is relatively close to zero for the range of possible values of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying, however, for a problem that's easily solved by inspection. It illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. Now, the problem is ready to be solved. The MATLAB code in figure two was implemented, using the function fmincon to solve the minimization subproblems. fmincon is itself an SQP piece of software. In each step, the incumbent guess is plugged into the gradient, hessian, and constraint arrays, which then become parameters for the minimization problem. <br/><br />
<br/><br />
<br />
=Conclusion=<br />
SQP is powerful enough to be used in commercial software but also burdened by some intricacy. In addition to the complication from needing full-rank constraint gradients, the divergence matrix can be very difficult or laborious to assemble analytically. Commercial SQP packages include checks for the feasibility of the sub-problem in order to account for rank deficiencies. In addition to fmincon, SNOPT and FILTERSQP are two other commercial SQP packages, and each uses a different non-linear method to solve the quadratic subproblem.[1] [[Line_search_methods Line search methods]] and [[trust-region]] methods are trusted options for this step, and sub-gradient methods have also been proposed. The other common modification to SQP (ubiquitous to commercial packages) is to invoke [[Quasi-Newton_methods Quasi-Newton Methods]] methods in order to avoid computing the Hessian entirely. SQP is thus very much a family of algorithms rather than a stand-alone tool for optimization. At its core, it is a method for turning large, very non-linear problems into a sequence of small quadratic problems to reduce the computational expense of the problem.<br />
<br />
=Sources=<br />
[1] Nocedal, J. and Wright, S. Numerical Optimization, 2nd. ed., Ch. 18. Springer, 2006. <br/><br />
[2] You, Fengqi. Lecture Notes, Chemical Engineering 345 Optimization. Northwestern University, 2015. <br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-07T20:27:29Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. The method dates back to 1963 and was developed and refined in the 1970's .[1] SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} = x_k - (\nabla^2 f)^{-1} \nabla f </math> <br/><br />
<br/><br />
The negative sign is important. Near minima, a positive gradient should decrease the guess and vice versa, and the second derivative is positive. Near maxima, a positive gradient should increase the guess and vice versa, and the second derivative is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maxima and minima, since Newton's method will find the critical point closest to the original guess. Incorporating Newton's method into the active set method will transform the iteration above into a matrix equation. <br/><br />
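A scalar sketch of the iteration above (the function <math>f(x) = \cos(x)</math> is chosen here only for illustration):

```python
import math

def newton_critical_point(df, d2f, x0, tol=1e-12, max_iter=50):
    """Iterate x_{k+1} = x_k - f'(x_k)/f''(x_k) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = cos(x): starting near 3, the iteration converges to the critical
# point closest to the guess, the minimum at x = pi.
x_star = newton_critical_point(lambda x: -math.sin(x), lambda x: -math.cos(x), 3.0)
```

Note how the same update finds the minimum here even though the gradient alone would point elsewhere; the sign of the second derivative steers the step.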
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrange multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = \begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g^* \\ \nabla h^T & 0 & 0 \\ (\nabla g^*)^T & 0 & 0 \end{bmatrix} </math> <br/><br />
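As a hedged sketch of this iteration on a small hypothetical problem (minimize <math>x + y</math> subject to <math>x^2 + y^2 = 1</math>, chosen for illustration; it is not from the article), pure Python with a hand-rolled linear solve suffices:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def newton_kkt(x, y, lam, iters=25):
    """Newton steps on grad L = 0 for L = x + y + lam*(x^2 + y^2 - 1)."""
    for _ in range(iters):
        grad = [1 + 2 * lam * x, 1 + 2 * lam * y, x * x + y * y - 1]
        hess = [[2 * lam, 0.0, 2 * x],
                [0.0, 2 * lam, 2 * y],
                [2 * x, 2 * y, 0.0]]
        dx, dy, dlam = solve(hess, [-g for g in grad])
        x, y, lam = x + dx, y + dy, lam + dlam
    return x, y, lam

# Converges to the constrained minimum x = y = -1/sqrt(2) with lam = 1/sqrt(2).
x, y, lam = newton_kkt(-1.0, -1.0, 1.0)
```

Each Newton step is one solve of the bordered system above: the top-left block is the Hessian of the Lagrangian in the primal variables, and the constraint gradients border it.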
<br/><br />
Unlike the active set method, the need to solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. Theoretically, if the derivative expressions above can be formulated analytically and then coded, software could iterate very quickly because the structure of the system doesn't change. In practice, however, the Hessian matrix is often not invertible, for example because variables are linearly bounded from above and below. The improvement direction <math>p</math> for the Newton iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic programming algorithms. The sub-problem is derived as follows:<br/><br />
<br/><br />
<math> \nabla^2 L_k \, p = -\nabla L_k</math> <br/><br />
<br/><br />
This linear system is exactly the first-order optimality condition of a quadratic model of the Lagrangian: expanding <math>L</math> in a Taylor series about the current iterate and truncating after the second-order term yields a quadratic function of the step <math>p</math>, whose stationary point satisfies the Newton equation above. Minimizing this quadratic model subject to the linearized constraints gives the sub-problem below. The sub-problem is quadratic and thus must still be solved with non-linear methods, which once again introduces a non-linear problem into the algorithm, but this predictable sub-problem in the step <math>p</math> is much easier to tackle than the parent problem. <br/><br />
<math>\min_p \; f_k + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2 L_k \, p</math> <br/><br />
<math>\text{s.t. } \nabla h_k^T p + h_k = 0</math> <br/><br />
<math>\text{ and } \nabla g_k^T p + g_k \le 0</math> <br/><br />
<br/><br />
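To make the sub-problem concrete, take one iterate of a hypothetical toy problem (minimize <math>x + y</math> subject to <math>h = x^2 + y^2 - 1 = 0</math>, chosen for illustration; it is not from the article) at the point <math>(x, y, \lambda) = (-1, -1, 1)</math>. There <math>\nabla_{xx}^2 L = 2\lambda I = 2I</math>, so the equality-constrained QP has a simple closed form:

```python
# QP sub-problem data at the iterate (x, y, lam) = (-1, -1, 1):
grad_f = [1.0, 1.0]          # gradient of f = x + y
grad_h = [-2.0, -2.0]        # gradient of h = x^2 + y^2 - 1 at (-1, -1)
h_val = 1.0                  # h(-1, -1) = 1

# Stationarity of the QP with Hessian 2I:  2*p + lam_new*grad_h = -grad_f,
# so p = -(grad_f + lam_new*grad_h)/2.  Substituting into the linearized
# constraint grad_h . p + h = 0 and solving for the new multiplier:
gh_dot_gf = sum(a * b for a, b in zip(grad_h, grad_f))   # = -4
gh_norm2 = sum(a * a for a in grad_h)                    # = 8
lam_new = (2.0 * h_val - gh_dot_gf) / gh_norm2           # = 0.75
p = [-(gf + lam_new * gh) / 2.0 for gf, gh in zip(grad_f, grad_h)]
# p = [0.25, 0.25]: the same step a raw Newton iteration on grad L would take,
# and lam_new is the updated multiplier.
```

The QP's own KKT conditions reproduce the Newton step, which is the sense in which each SQP iteration is a quadratic program.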
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.]]
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non-linear while also being easy to solve by inspection for reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint restricts the feasible zone to the first half of a period of the sine function, over which <math>\sin(u+v)</math> is concave, so the maximization problem is well-behaved. The maximum of the sine function within this region occurs at <math>u+v=\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
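The root of <math>v^3 + v = \frac{\pi}{2}</math> can be checked numerically, e.g. with a simple bisection (a sketch for verification, not part of the original solution):

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Bisection for a root of f on [lo, hi], assuming f changes sign there."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

v = bisect(lambda t: t**3 + t - math.pi / 2.0, 0.0, 1.0)  # ~0.883
u = v**3                                                   # ~0.688
```

By construction u + v then satisfies the first-order condition u + v = pi/2 to within the bisection tolerance.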
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian function with its gradient and Hessian is as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.]]
<br />
The first important limitation in using SQP is now apparent: the Hessian matrix is not invertible because it is not full rank. We will switch to the quadratic minimization sub-problem above, but even with this alternate framework, the gradient of the constraints must be full rank. This can be handled for now by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent, which can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
The constraint <math> u + v \le \pi</math> can be replaced with the slightly stricter <math> u + v + \exp(-7u) \le \pi</math>, which implies the original. <br/><br />
<br/><br />
The added term is close to zero for the relevant range of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying for a problem that's easily solved by inspection, and it illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. Now the problem is ready to be solved. The MATLAB code in Figure 2 was implemented, using the function fmincon to solve the minimization sub-problems; fmincon itself uses an SQP algorithm. In each step, the incumbent guess is plugged into the gradient, Hessian, and constraint arrays, which then become parameters for the minimization problem. <br/><br />
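Since Figure 2 is only an image, here is a hedged pure-Python sketch of the same idea (it is not the author's MATLAB listing). Both inequality constraints are inactive at the solution, so the sketch drops them from the active set and applies the Newton iteration to the equality-constrained Lagrangian <math>L = \sin(u+v) + \lambda_1(u - v^3)</math>:

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def sqp_example(u, v, lam, iters=20):
    """Newton iteration on grad L = 0, with L = sin(u+v) + lam*(u - v^3)."""
    for _ in range(iters):
        s, c = math.sin(u + v), math.cos(u + v)
        grad = [c + lam, c - 3.0 * lam * v * v, u - v**3]
        hess = [[-s, -s, 1.0],
                [-s, -s - 6.0 * lam * v, -3.0 * v * v],
                [1.0, -3.0 * v * v, 0.0]]
        du, dv, dlam = solve(hess, [-g for g in grad])
        u, v, lam = u + du, v + dv, lam + dlam
    return u, v, lam

# From a nearby starting guess the iteration recovers u = 0.688, v = 0.883
# (so u + v = pi/2) with lam = 0.
u, v, lam = sqp_example(0.5, 0.8, 0.0)
```

The gradient and Hessian here are the rows of the arrays given above with the inactive <math>\mu_1</math> and <math>\mu_2</math> terms deleted; the multiplier converges to zero because the equality constraint does not bind the unconstrained maximum of <math>\sin(u+v)</math>.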
<br/><br />
<br />
=Conclusion=<br />
SQP is powerful enough to be used in commercial software but is also burdened by some intricacy. In addition to the complication of needing full-rank constraint gradients, the Hessian matrix can be very difficult or laborious to assemble analytically. Commercial SQP packages include checks for the feasibility of the sub-problem in order to account for rank deficiencies. In addition to fmincon, SNOPT and filterSQP are two other commercial SQP packages, and each uses a different non-linear method to solve the quadratic sub-problem.[1] [[Line-Search]] and [[trust-region]] methods are trusted options for this step, and sub-gradient methods have also been proposed. The other common modification to SQP (ubiquitous in commercial packages) is to invoke [[Quasi-Newton Methods]] in order to avoid computing the Hessian entirely. SQP is thus very much a family of algorithms rather than a stand-alone tool for optimization. At its core, it is a method for turning large, very non-linear problems into a sequence of small quadratic problems to reduce the computational expense of the problem.<br />
<br />
=Sources=<br />
[1] Nocedal, J. and Wright, S. Numerical Optimization, 2nd. ed., Ch. 18. Springer, 2006. <br/><br />
[2] You, Fengqi. Lecture Notes, Chemical Engineering 345 Optimization. Northwestern University, 2015. <br/></div>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. The method dates back to 1963 and was developed and refined in the 1970's .[1] SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. Theoretically, If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change. In practice, however, it is likely that the divergence will not be an invertible matrix because variables are likely to be linearly bound from above and below. The improvement direction "p" for the Newton's Method iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic algorithms. The subproblem is derived as follows:<br/><br />
<br/><br />
<math> p = \frac{\nabla L}{\nabla^2 L} = \frac{(\nabla L)p}{(\nabla^2 L)p}</math> <br/><br />
<br/><br />
Since p is an incremental change to the objective function, this equation then resembles a two-term Taylor Series for the derivative of the objective function, which shows that a Taylor expansion with the increment p as a variable is equivalent to a Newton iteration. Decomposing the different equations within this system and cutting the second order term in half to match Taylor Series concepts, a minimization sub-problem can be obtained. This problem is quadratic and thus must be solved with non-linear methods, which once again introduces the need to solve a non-linear problem into the algorithm, but this predictable sub-problem with one variable is much easier to tackle than the parent problem. <br/><br />
<math>\text{min(p) } f_k(x) + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k p</math> <br/><br />
<math>\text{s.t. } \nabla h_k p + h_k = 0</math> <br/><br />
<math>\text{ and } \nabla g_k p + g_k = 0</math> <br/><br />
<br/><br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian funtion with its gradient and divergence are as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
The first important limitation in using SQP is now apparent: the divergence matrix is not invertible because it is not full rank. We will switch to the quadratic minimization sub-problem above, but even with this alternate framework, the gradient of the constraints must be full rank. This can be handled for now by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent. This can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-7u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand-side is relatively close to zero for the range of possible values of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying, however, for a problem that's easily solved by inspection. It illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. Now, the problem is ready to be solved. The MATLAB code in figure two was implemented, using the function fmincon to solve the minimization subproblems. fmincon is itself an SQP piece of software. In each step, the incumbent guess is plugged into the gradient, hessian, and constraint arrays, which then become parameters for the minimization problem. <br/><br />
<br/><br />
<br />
=Conclusion=<br />
SQP is powerful enough to be used in commercial software but also burdened by some intricacy. In addition to the complication from needing full-rank constraint gradients, the divergence matrix can be very difficult or laborious to assemble analytically. Commercial SQP packages include checks for the feasibility of the sub-problem in order to account for rank deficiencies. In addition to fmincon, SNOPT and FILTERSQP are two other commercial SQP packages, and each uses a different non-linear method to solve the quadratic subproblem. Line-Search and trust-region methods are trusted options for this step, and sub-gradient methods have also been proposed. The other common modifications to SQP is to invoke [[Quasi-Newton_methods]] methods in order to avoid computing the Hessian entirely.<br />
<br />
=Sources=<br />
[1] You, Fengqi. Lecture Notes, Chemical Engineering 345 Optimization. Northwestern University, 2015. <br/><br />
[2] Nocedal, J. and Wright, S. Numerical Optimization, 2nd. ed., Ch. 18. Springer, 2006.</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-07T20:13:03Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. The method dates back to 1963 and was developed and refined in the 1970's .[1] SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. Theoretically, If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change. In practice, however, it is likely that the divergence will not be an invertible matrix because variables are likely to be linearly bound from above and below. The improvement direction "p" for the Newton's Method iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic algorithms. The subproblem is derived as follows:<br/><br />
<br/><br />
<math> p = \frac{\nabla L}{\nabla^2 L} = \frac{(\nabla L)p}{(\nabla^2 L)p}</math> <br/><br />
<br/><br />
Since p is an incremental change to the objective function, this equation then resembles a two-term Taylor Series for the derivative of the objective function, which shows that a Taylor expansion with the increment p as a variable is equivalent to a Newton iteration. Decomposing the different equations within this system and cutting the second order term in half to match Taylor Series concepts, a minimization sub-problem can be obtained. This problem is quadratic and thus must be solved with non-linear methods, which once again introduces the need to solve a non-linear problem into the algorithm, but this predictable sub-problem with one variable is much easier to tackle than the parent problem. <br/><br />
<math>\text{min(p) } f_k + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k p</math> <br/><br />
<math>\text{s.t. } \nabla h_k p + h_k = 0</math> <br/><br />
<math>\text{ and } \nabla g_k p + g_k \le 0</math> <br/><br />
<br/><br />
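Because the sub-problem's own KKT conditions are linear in p, one SQP step reduces to a linear solve when all constraints are treated as active equalities. A hedged Python sketch of that step (the function name is illustrative, not from any particular library):

```python
import numpy as np

def solve_qp_subproblem(grad_f, B, A, c):
    """Solve  min_p  grad_f^T p + 0.5 p^T B p   s.t.  A p + c = 0.
    All constraints are treated as active equalities -- a simplifying
    assumption for this sketch; real SQP codes manage an active set."""
    n, m = B.shape[0], A.shape[0]
    # The QP's KKT conditions form one linear system in (p, lam):
    #   [ B  A^T ] [ p  ]   [ -grad_f ]
    #   [ A   0  ] [ lam] = [ -c      ]
    K = np.block([[B, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([-grad_f, -c])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]            # step p and multipliers

# Tiny check:  min p1 + 0.5*(p1^2 + p2^2)   s.t.  p1 + p2 - 1 = 0
p, lam = solve_qp_subproblem(np.array([1.0, 0.0]), np.eye(2),
                             np.array([[1.0, 1.0]]), np.array([-1.0]))
print(p)   # p is approximately (0, 1)
```

The linear KKT matrix here is the same bordered structure as <math>\nabla^2 L</math> above, which is why the sub-problem inherits the full-rank requirement on the constraint gradients.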
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non-linear yet easy to solve by inspection, giving a reference solution. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint restricts the feasible region to the first half-period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
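The by-inspection values can be confirmed numerically; a quick bisection sketch on <math>v^3 + v - \frac{\pi}{2} = 0</math>:

```python
import math

# Solve v**3 + v = pi/2 by bisection to confirm the values above.
def f(v):
    return v**3 + v - math.pi / 2

lo, hi = 0.0, 1.0            # f(0) < 0 and f(1) > 0, so a root lies between
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if f(mid) > 0:
        hi = mid
    else:
        lo = mid

v = 0.5 * (lo + hi)
u = v**3
print(round(v, 3), round(u, 3))   # -> 0.883 0.688
```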
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian function with its gradient and Hessian is as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
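As a check, the gradient and Hessian above can be transcribed directly into code (a Python sketch; the article's own implementation, shown in Figure 2, is MATLAB):

```python
import numpy as np

def grad_L(u, v, lam1, mu1, mu2):
    # Transcription of the gradient vector above
    t = np.cos(u) * np.cos(v) - np.sin(u) * np.sin(v)   # equals cos(u + v)
    return np.array([t + lam1 - mu1 + mu2,
                     t - 3 * lam1 * v**2 - mu1 + mu2,
                     u - v**3,
                     -u - v,
                     u + v - np.pi])

def hess_L(u, v, lam1, mu1, mu2):
    # Transcription of the matrix above, with Z = sin(u + v)
    Z = np.sin(u) * np.cos(v) + np.cos(u) * np.sin(v)
    return np.array([[-Z,  -Z,                1.0,     -1.0, 1.0],
                     [-Z,  -(Z + 6*lam1*v),  -3*v**2,  -1.0, 1.0],
                     [1.0, -3*v**2,           0.0,      0.0, 0.0],
                     [-1.0, -1.0,             0.0,      0.0, 0.0],
                     [1.0,  1.0,              0.0,      0.0, 0.0]])

H = hess_L(1.0, 1.0, 0.0, 0.0, 0.0)
print(np.linalg.matrix_rank(H))   # -> 4, not 5: the two bound rows are dependent
```

Evaluating the rank at any point shows the matrix is singular: the last two rows, which come from the two bounds on <math>u + v</math>, are negatives of each other.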
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.]]<br />
<br />
The first important limitation in using SQP is now apparent: the Hessian matrix is singular because it is not full rank; the rows arising from the two bounds on <math>u + v</math> are linearly dependent. We will switch to the quadratic minimization sub-problem above, but even with this alternate framework, the gradient of the constraints must be full rank. This can be handled for now by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent, accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-7u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand side is very close to zero over the range of feasible values of <math>u</math>, so the feasible region is essentially unchanged. This complication is certainly annoying for a problem that is easily solved by inspection; it illustrates that SQP is best reserved for problems with highly non-linear objectives and constraints. Now the problem is ready to be solved. The MATLAB code in Figure 2 was implemented, using the function fmincon to solve the minimization subproblems; fmincon itself implements an SQP algorithm. In each step, the incumbent guess is plugged into the gradient, Hessian, and constraint arrays, which then become the parameters of the minimization sub-problem. <br/><br />
<br/><br />
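Figure 2's MATLAB loop can be mimicked with a short Python sketch. As a simplification (an assumption of this sketch, not the article's exact procedure), only the equality constraint <math>u = v^3</math> is kept, since both bounds on <math>u + v</math> turn out to be inactive at the optimum; this also avoids the rank-deficiency issue, so plain Newton steps on the reduced KKT system suffice:

```python
import numpy as np

# Reduced example: minimize -sin(u + v) subject to u - v^3 = 0,
# so L = -sin(u + v) + lam * (u - v^3). The bound constraints on u + v
# are dropped on the assumption that they are inactive at the optimum.
def grad_L(z):
    u, v, lam = z
    c = np.cos(u + v)
    return np.array([-c + lam,                 # dL/du
                     -c - 3 * lam * v**2,      # dL/dv
                     u - v**3])                # dL/dlam

def hess_L(z):
    u, v, lam = z
    s = np.sin(u + v)
    return np.array([[s,   s,                1.0],
                     [s,   s - 6 * lam * v, -3 * v**2],
                     [1.0, -3 * v**2,        0.0]])

z = np.array([1.0, 1.0, 0.0])          # initial guess (u, v, lambda)
for _ in range(8):                      # Newton iterations on grad L = 0
    z = z - np.linalg.solve(hess_L(z), grad_L(z))

u, v = z[0], z[1]
print(round(u, 3), round(v, 3))         # -> 0.688 0.883
```

Starting from (1, 1, 0), the iterates are already within about 10^-4 of the solution after two steps, matching the values found by inspection.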
<br />
=Conclusion=<br />
SQP is powerful enough to be used in commercial software but is also burdened by some intricacy. In addition to the complication of needing full-rank constraint gradients, the Hessian matrix can be very difficult or laborious to assemble analytically. Commercial SQP packages include checks on the feasibility of the sub-problem in order to account for rank deficiencies. In addition to fmincon, SNOPT and FILTERSQP are two other SQP packages, and each uses a different non-linear method to solve the quadratic subproblem. Line-search and trust-region methods are standard options for this step, and sub-gradient methods have also been proposed. Another common modification to SQP is to invoke quasi-Newton methods in order to avoid computing the Hessian entirely.<br />
<br />
=Sources=<br />
[1] You, Fengqi. Lecture Notes, Chemical Engineering 345 Optimization. Northwestern University, 2015. <br/><br />
[2] Nocedal, J. and Wright, S. Numerical Optimization, 2nd. ed., Ch. 18. Springer, 2006.</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-04T20:42:37Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fengqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. Theoretically, If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change. In practice, however, it is likely that the divergence will not be an invertible matrix because variables are likely to be linearly bound from above and below. The improvement direction "p" for the Newton's Method iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic algorithms. The subproblem is derived as follows:<br/><br />
<br/><br />
<math> p = \frac{\nabla L}{\nabla^2 L} = \frac{(\nabla L)p}{(\nabla^2 L)p}</math> <br/><br />
<br/><br />
Decomposing the different equations within this system, a minimization formula can be obtained: <br/><br />
<math>\text{min(p)} f_k(x) + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k p</math> <br/><br />
<math> s.t. \nabla h_k p + h_k = 0</math> <br/><br />
<math> \text{ and } \nabla g_k p + g_k = 0</math> <br/><br />
<br/><br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian funtion with its gradient and divergence are as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
The first important limitation in using SQP is now apparent: the divergence matrix is not invertible because it is not full rank. We will switch to the alternate formulation above, but even with this alternate framework, the gradient of the constraints must be full rank. This can be handled for now by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent. This can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-1000u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand-side is relatively close to zero for the range of possible values of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying, however, for a problem that's easily solved by inspection. It illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. Now, the problem is ready to be solved. The MATLAB code in figure two was implemented, using the function fmincon to solve the minimization subproblems. Starting with an initial guess of <math>(u, v, \lambda_1, \mu_1, \mu_2) = (1, 1, 0, 0, 0)</math>, SQP converges in 4 iterations. <br/><br />
<br/><br />
<br />
=Conclusion=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-04T20:41:36Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. Theoretically, If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change. In practice, however, it is likely that the divergence will not be an invertible matrix because variables are likely to be linearly bound from above and below. The improvement direction "p" for the Newton's Method iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic algorithms. The subproblem is derived as follows:<br/><br />
<br/><br />
<math> p = \frac{\nabla L}{\nabla^2 L} = \frac{(\nabla L)p}{(\nabla^2 L)p}</math> <br/><br />
<br/><br />
Decomposing the different equations within this system, a minimization formula can be obtained: <br/><br />
<math>\text{min(p)} f_k(x) + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k p</math> <br/><br />
<math> such that \nabla h_k p + h_k = 0</math> <br/><br />
<math> \text{ and } \nabla g_k p + g_k = 0</math> <br/><br />
<br/><br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian funtion with its gradient and divergence are as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
The first important limitation in using SQP is now apparent: the divergence matrix is not invertible because it is not full rank. We will switch to the alternate formulation above, but even with this alternate framework, the gradient of the constraints must be full rank. This can be handled for now by artificially constraining the problem a bit further so that the derivatives of the inequality constraints are not linearly dependent. This can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-1000u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand-side is relatively close to zero for the range of possible values of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying, however, for a problem that's easily solved by inspection. It illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. Now, the problem is ready to be solved. The MATLAB code in figure two was implemented, using the function fmincon to solve the minimization subproblems. Starting with an initial guess of <math>(u, v, \lambda_1, \mu_1, \mu_2) = (1, 1, 0, 0, 0)</math>, SQP converges in 4 iterations. <br/><br />
<br/><br />
<br />
=Conclusion=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-04T20:38:33Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. Theoretically, If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change. In practice, however, it is likely that the divergence will not be an invertible matrix because variables are likely to be linearly bound from above and below. The improvement direction "p" for the Newton's Method iterations is thus typically found in a more indirect fashion: with a quadratic minimization sub-problem that is solved using quadratic algorithms. The subproblem is derived as follows:<br/><br />
<br/><br />
<math> p = \frac{\nabla L}{\nabla^2 L} = \frac{(\nabla L)p}{(\nabla^2 L)p}</math> <br/><br />
<br/><br />
Decomposing the different equations within this system, a minimization formula can be obtained: <br/><br />
<math>\text{min(p)} f_k(x) + \nabla f_k^T p + \frac{1}{2}p^T\nabla_{xx}^2L_k p</math> <br/><br />
<math> such that \nabla h_k p + h_k = 0</math> <br/><br />
<math> \text{ and } \nabla g_k p + g_k = 0</math> <br/><br />
<br/><br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \max\; Z = \sin(u)\cos(v) + \cos(u)\sin(v)</math> <br/><br />
<math> \text{s.t. } 0 \le u+v \le \pi</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen because it is highly non-linear yet easy to solve by inspection, providing a reference solution. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint restricts the feasible region to the first half of one period of the sine function, over which <math>sin(u+v)</math> is concave, so the maximizer is unique. The maximum of the sine function within this region occurs at <math>u+v=\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
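This by-inspection solution can be double-checked numerically by applying Newton's method (the same workhorse SQP uses) to the scalar equation <math>v^3 + v = \frac{\pi}{2}</math>; a short Python sketch:

```python
import math

# Newton's method on r(v) = v^3 + v - pi/2, with r'(v) = 3v^2 + 1.
v = 1.0  # initial guess
for _ in range(20):
    v -= (v**3 + v - math.pi / 2) / (3 * v**2 + 1)

u = v**3  # equality constraint u = v^3
# At the optimum u + v = pi/2, so the objective sin(u + v) equals 1.
```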
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian function with its gradient and Hessian is as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
[[File:SQPWikiCode.JPG|frame|Figure 2: MATLAB program for performing sequential Newton steps on quadratic subproblem.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
The first important limitation of SQP is now apparent: the Hessian matrix is not invertible because it is not full rank. We will switch to the quadratic sub-problem formulation above, but even in that framework the gradient of the constraints must be full rank. For now, this can be handled by artificially constraining the problem slightly further so that the derivatives of the inequality constraints are not linearly dependent. This can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
The constraint <math> u + v \le \pi</math> is replaced with the slightly stronger <math> u + v + exp(-100u) \le \pi</math> <br/><br />
<br/><br />
The added term is essentially zero for the relevant values of <math>u</math>, so the feasible region changes only slightly. This complication is admittedly awkward for a problem that is easily solved by inspection, but it illustrates that SQP is truly best suited to problems with highly non-linear objectives and constraints. With this modification, the gradient and Hessian of the Lagrangian function become: <br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2(1-100exp(-100u)) \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v + exp(-100u)- \pi \end{bmatrix} </math><br/><br />
<br/> <br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z+10^4\mu_2exp(-100u) & -Z & 1 & -1 & 1-100exp(-100u) \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1-100exp(-100u) & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
<math>\text{ with }Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/> <br />
<br />
Now, the problem is ready to be solved. The MATLAB code in Figure 2 was implemented to generate the iterations in Table 1 below. Starting with an initial guess of <math>(u, v, \lambda_1, \mu_1, \mu_2) = (1, 1, 0, 0, 0)</math>, SQP converges in 4 iterations. <br/><br />
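The converged values can also be sanity-checked directly against the KKT conditions: at <math>u+v=\frac{\pi}{2}</math> neither inequality constraint is active, so both <math>\mu</math> multipliers are zero, and since <math>cos(u+v)=0</math> the stationarity rows force <math>\lambda_1=0</math> as well. A short Python check of the stationarity and feasibility rows of the gradient above (the multiplier values are reasoned from the text, not taken from the missing table):

```python
import math

# Solution reported above; all multipliers vanish at the optimum
# because neither inequality constraint is active there.
u, v = 0.688, 0.883
lam1 = mu1 = mu2 = 0.0

# Stationarity rows dL/du and dL/dv of the (modified) gradient:
dL_du = (math.cos(v) * math.cos(u) - math.sin(u) * math.sin(v)
         + lam1 - mu1 + mu2 * (1 - 100 * math.exp(-100 * u)))
dL_dv = (math.cos(v) * math.cos(u) - math.sin(u) * math.sin(v)
         - 3 * lam1 * v**2 - mu1 + mu2)
feas = u - v**3  # equality constraint u = v^3
```

All three residuals are close to zero, confirming the reported optimum satisfies the KKT conditions to the stated precision.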
<br/><br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k^T p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>\text{s.t. } \nabla h_k p_x + h_k = 0 \text{ and } \nabla g_k p_x + g_k = 0</math> <br />
<br/></div>Ben Goodman
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and in inverse proportion to how quickly the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement step should be large, while a shallow incline that is rapidly flattening out is likely to be near a critical point, so the improvement step should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minima, a positive gradient should decrease the guess and vice versa, and the second derivative is positive. Near maxima, a positive gradient should increase the guess and vice versa, but the second derivative is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement step will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maxima and minima. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
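A minimal sketch of this update in code (our own example, not from the article), using it to find the critical point of <math>f(x) = sin(x)</math> nearest a starting guess of 1:

```python
import math

# Newton's method for critical points: x_{k+1} = x_k - f'(x_k)/f''(x_k).
# The step is large on steep slopes and damped by strong curvature.
def newton_critical_point(df, d2f, x, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# f(x) = sin(x), so f'(x) = cos(x) and f''(x) = -sin(x).
x_star = newton_critical_point(math.cos, lambda x: -math.sin(x), 1.0)
# x_star is approximately pi/2, the maximum of sin(x) nearest the guess.
```

Note how the negative second derivative near the maximum makes the step climb uphill, as described above.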
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g^* \\ \nabla h^T & 0 & 0 \\ \nabla g^{*T} & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, SQP entirely eliminates the need to solve a system of non-linear equations, no matter how non-linear the objective and constraints. If the derivative expressions above can be formulated analytically and then coded, software can iterate very quickly because the structure of the system doesn't change.<br/><br />
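The matrix Newton iteration above can be sketched on a toy equality-constrained problem of our own (not from the article): minimize <math>x^2</math> subject to <math>x - 1 = 0</math>, with <math>L(x,\lambda) = x^2 + \lambda(x-1)</math>:

```python
import numpy as np

# One-variable, one-equality-constraint toy problem (our own example):
#   min x^2  s.t.  h(x) = x - 1 = 0,  L(x, lam) = x^2 + lam*(x - 1).

def grad_L(z):
    x, lam = z
    return np.array([2.0 * x + lam,   # dL/dx  (stationarity row)
                     x - 1.0])        # dL/dlam (feasibility row)

def hess_L(z):
    return np.array([[2.0, 1.0],      # [ Lxx,     grad h ]
                     [1.0, 0.0]])     # [ grad h,  0      ]

z = np.array([0.0, 0.0])              # initial guess (x, lam)
for _ in range(5):
    z = z - np.linalg.solve(hess_L(z), grad_L(z))
# Because this L is quadratic, the very first step lands on x = 1, lam = -2.
```

Solving the linear system with the Hessian, rather than forming its inverse, is the standard way to take the <math>(\nabla^2 L_k)^{-1} \nabla L_k</math> step in practice.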
<br/><br />
<br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non-linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
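The cubic <math>v^3 + v = \frac{\pi}{2}</math> has no tidy closed form, so the decimal values above come from a numerical root-find; a quick check of our own (any root-finder would do, bisection keeps it dependency-free):

```python
import math

# Numerical check of the by-inspection solution: solve v^3 + v = pi/2.
def bisect(f, lo, hi, tol=1e-12):
    # f must change sign on [lo, hi]; here f(0) < 0 and f(2) > 0.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

v = bisect(lambda t: t**3 + t - math.pi / 2.0, 0.0, 2.0)
u = v**3
# v is approximately 0.883 and u approximately 0.688, matching the algebra above.
```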
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian function with its gradient and Hessian is as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
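Evaluating this matrix numerically makes its defect concrete. The quick check below (our own, not part of the original article) builds the matrix at the initial guess <math>(u, v, \lambda_1, \mu_1, \mu_2) = (1, 1, 0, 0, 0)</math> used later and inspects its rank:

```python
import numpy as np

# Evaluate the 5x5 second-derivative matrix of the example's Lagrangian
# at the initial guess (u, v, lam1, mu1, mu2) = (1, 1, 0, 0, 0).
u, v, lam1 = 1.0, 1.0, 0.0
Z = np.sin(u) * np.cos(v) + np.cos(u) * np.sin(v)   # = sin(u + v)
H = np.array([
    [-Z,   -Z,                    1.0,         -1.0, 1.0],
    [-Z,   -(Z + 6.0 * lam1 * v), -3.0 * v**2, -1.0, 1.0],
    [1.0,  -3.0 * v**2,           0.0,          0.0, 0.0],
    [-1.0, -1.0,                  0.0,          0.0, 0.0],
    [1.0,   1.0,                  0.0,          0.0, 0.0],
])
rank = np.linalg.matrix_rank(H)
# The fourth and fifth rows are exact negatives of each other, so the rank
# is 4, not 5: the matrix is singular and the Newton step cannot be taken.
```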
The first important limitation in using SQP is now apparent: this Hessian matrix is not invertible because it is not full rank; the fourth and fifth rows are negatives of each other. This can be handled for now by artificially constraining the problem a bit further so that the Hessian matrix is full rank, which can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
Replace the constraint <math> u + v \le \pi</math> with the slightly tighter <math> u + v + exp(-100u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand side is nearly zero except very close to <math>u = 0</math>, so the feasible region has not changed much. This complication is certainly annoying, however, for a problem that's easily solved by inspection. It illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. With this modification, the gradient and Hessian of the Lagrangian function are now as follows: <br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2(1-100exp(-100u)) \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v + exp(-100u)- \pi \end{bmatrix} </math><br/><br />
<br/> <br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z+10^4\mu_2exp(-100u) & -Z & 1 & -1 & 1-100exp(-100u) \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1-100exp(-100u) & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
<math>\text{ with }Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
Starting with an initial guess of <math>(u, v, \lambda_1, \mu_1, \mu_2) = (1, 1, 0, 0, 0)</math>, the SQP iteration can now be carried out. <br/><br />
First Iteration: <br/><br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k^T p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>s.t. </math> <br />
<br/></div>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change.<br/><br />
<br/><br />
<br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The Lagrangian funtion with its gradient and divergence are as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
The first important limitation in using SQP is now apparent: the divergence matrix is not invertible because it is not full rank. This can be handled for now by artificially constraining the problem a bit further so that the divergence matrix is full rank. This can be accomplished through a small modification to the <math>\mu_2</math> constraint. <br/><br />
If <math> u + v \le \pi</math>, then it is also true that <math> u + v + exp(-1000u) \le \pi</math> <br/><br />
<br/><br />
The addition to the left-hand-side is relatively close to zero for the range of possible values of <math>u</math>, so the feasible region has not changed much. This complication is certainly annoying, however, for a problem that's easily solved by inspection. It illustrates that SQP is truly best for problems with highly non-linear objectives and constraints. With this modification, the divergence and gradient of the Lagrangian function are now as follows: <br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2(1-100exp(-100u)) \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v + exp(-100u)- \pi \end{bmatrix} </math><br />
and <br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z+10^4exp(-100U) & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
Starting with an initial guess of <math>(u, v, \lambda_1, \mu_1, \mu_2) = (1, 1, 0, 0, 0)</math>, SQP converges in XXX iterations. <br/><br />
First Iteration: <br/><br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k *p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>s.t. </math> <br />
<br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-03T22:47:48Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change.<br/><br />
<br/><br />
<br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. Lagrangian funtion with its gradient and divergence are as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br />
with <math>Z = sin(u)cos(v) + cos(u)sin(v)</math><br/><br />
<br/><br />
<br />
Starting with an initial guess of <math>(u, v, \lambda_1, \mu_1, \mu_2) = (1, 1, 0, 0, 0)</math>, SQP converges in XXX iterations. <br/><br />
First Iteration: <br/><br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k *p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>s.t. </math> <br />
<br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-03T22:24:35Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background: Prerequisite Methods=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change.<br/><br />
<br/><br />
<br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. Lagrangian funtion with its gradient and divergence are as follows: <br/><br />
<math> L = sin(u)cos(v) + cos(u)sin(v) + \lambda_1 (u - v^3) + \mu_1 ( -u - v) + \mu_2 (u + v - \pi)</math> <br/><br />
<br/><br />
<br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{du} \\ \frac{dL}{dv} \\ \frac{dL}{d\lambda_1} \\ \frac{dL}{d\mu_1} \\ \frac{dL}{d\mu_2} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} cos(v)cos(u) - sin(u)sin(v) + \lambda_1 - \mu_1 + \mu_2 \\ cos(v)cos(u) - sin(u)sin(v) - 3\lambda_1v^2 - \mu_1 + \mu_2 \\ u - v^3 \\ -u - v \\ u + v - \pi \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<math>\nabla^2 L = </math><math>\begin{bmatrix} -Z & -Z & 1 & -1 & 1 \\ -Z & -(Z + 6\lambda_1v^2) & -3v^2 & -1 & 1 \\ 1 & -3v^2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k *p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>s.t. </math> <br />
<br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-03T21:56:33Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change.<br/><br />
<br/><br />
<br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Figure 1: Solution of Example Problem by Inspection.<span style="font-size: 12pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
Now, the problem will be solved using the sequential quadratic programming algorithm. The gradient and divergence of the Lagrangian function are as follows: <br/><br />
<br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k *p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>s.t. </math> <br />
<br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-03T21:48:56Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change.<br/><br />
<br/><br />
<br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.jpg|frame|Solution of Example Problem by Inspection.<span style="font-size: 8pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k *p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>s.t. </math> <br />
<br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-03T21:48:03Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, the need to ever solve a system of non-linear equations has been entirely eliminated in SQP, no matter how non-linear the objective and constraints. If the derivative expressions above can be formulated analytically then coded, software could iterate very quickly because the system doesn't change.<br/><br />
<br/><br />
<br />
<br />
==Example Problem==<br />
[[File:Wiki_inspection.JPG|frame|Solution of Example Problem by Inspection.<span style="font-size: 8pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le \text{(u+v)} \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non linear but also easy to solve by inspection as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
<br/><br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k *p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>s.t. </math> <br />
<br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-06-03T21:46:30Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world. It is powerful enough for real problems because it can handle any degree of non-linearity including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) were constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints need satisfy this equality, the active set being denoted by <math>g^*</math>. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, so those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions using guess and check to find critical points. Guessing that every inequality constraints is inactive is conventionally the first step. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration, and if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method because the KKT conditions are linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly expiring is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minimums, a positive gradient should decrease the guess and vice versa, and the divergence is positive. Near maximums, a positive gradient should increase the guess and vice versa, but the divergence is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maximums and minimums. Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method will transform the iteration above into a matrix equation. <br/><br />
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L_k)^{-1} \nabla L_k</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = \begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g^* \\ \nabla h^T & 0 & 0 \\ (\nabla g^*)^T & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
Unlike the active set method, SQP never requires solving a system of non-linear equations directly, no matter how non-linear the objective and constraints: each iteration solves only the linear system above. If the derivative expressions can be formulated analytically and then coded, software can iterate very quickly because the structure of the system does not change between iterations.<br/><br />
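A minimal sketch of this Newton iteration on the Lagrangian, for a hypothetical equality-constrained problem min x<sub>1</sub><sup>2</sup> + x<sub>2</sub><sup>2</sup> s.t. x<sub>1</sub> + x<sub>2</sub> = 1 (the problem and all names are assumptions for illustration, not this page's example):

```python
import numpy as np

# Full-space Newton step on the Lagrangian:
#   [x; lambda] <- [x; lambda] - (grad^2 L)^{-1} grad L
# Hypothetical problem: min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0.

def grad_L(z):
    x1, x2, lam = z
    return np.array([2 * x1 + lam,      # dL/dx1
                     2 * x2 + lam,      # dL/dx2
                     x1 + x2 - 1.0])    # dL/dlambda = h(x)

def hess_L(z):
    return np.array([[2.0, 0.0, 1.0],   # [ grad_xx^2 L   grad h ]
                     [0.0, 2.0, 1.0],
                     [1.0, 1.0, 0.0]])  # [ grad h^T          0  ]

z = np.array([3.0, -2.0, 0.0])          # initial guess (x1, x2, lambda)
for _ in range(20):
    step = np.linalg.solve(hess_L(z), grad_L(z))  # solve instead of inverting
    z = z - step
    if np.linalg.norm(step) < 1e-12:
        break

print(z)   # converges to x = (0.5, 0.5) with lambda = -1
```

Because this toy objective is quadratic and the constraint linear, the KKT system is linear and a single Newton step lands on the exact solution; for genuinely non-linear problems the loop keeps iterating.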
<br/><br />
<br />
<br />
==Example Problem==<br />
[[File:Wiki_Inspection.JPG|frame|Solution of Example Problem by Inspection.<span style="font-size: 8pt; position:relative; bottom: 0.3em;">10</span>]]<br />
<br />
<math> \text{ Max Z = sin(u)cos(v) + cos(u)sin(v)}</math> <br/><br />
<math> \text{ s.t. } 0 \le u + v \le \pi</math> <br/><br />
<math> \text{ } u = v^3 </math> <br/><br />
<br/><br />
<br />
This example problem was chosen for being highly non-linear but also easy to solve by inspection, as a reference. The objective function Z is a trigonometric identity: <br/><br />
<math> \text{ sin(u)cos(v) + cos(u)sin(v) = sin(u+v)}</math> <br/><br />
<br/><br />
The first constraint then just restricts the feasible zone to the first half of a period of the sine function, making the problem convex. The maximum of the sine function within this region occurs at <math>\frac{\pi}{2}</math>, as shown in Figure 1. The last constraint then makes the problem easy to solve algebraically: <br/><br />
<math> u + v = \frac{\pi}{2}</math> <br/><br />
<math> u = v^3 </math> <br/><br />
<math>v^3 + v = \frac{\pi}{2}</math> <br/><br />
<math>v = 0.883</math> and <math>u = v^3 = 0.688</math> <br/><br />
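The algebraic solution above can be checked numerically. The short script below (an illustrative sketch; the loop bounds and tolerance are assumptions) applies Newton's method to v<sup>3</sup> + v = &pi;/2, recovers u = v<sup>3</sup>, and confirms the objective attains its maximum of 1:

```python
import math

# Numerical check of the by-inspection solution: solve v^3 + v = pi/2
# with Newton's method, then confirm sin(u)cos(v) + cos(u)sin(v) = 1.

v = 1.0
for _ in range(50):
    r  = v**3 + v - math.pi / 2     # residual of v^3 + v = pi/2
    dr = 3 * v**2 + 1               # derivative of the residual
    v -= r / dr
    if abs(r) < 1e-12:
        break

u = v**3                            # last constraint: u = v^3
print(round(v, 3), round(u, 3))     # 0.883 0.688, matching the text
Z = math.sin(u) * math.cos(v) + math.cos(u) * math.sin(v)
print(Z)                            # approximately 1.0, i.e. sin(pi/2)
```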
<br/><br />
<br />
<br />
<br />
<br/><br />
<math> \text{min Z} = L(x_k,\lambda_k, \mu_k) + \nabla L_k^T p_x + \frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>s.t. </math> <br />
<br/></div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/File:Wiki_Inspection.jpgFile:Wiki Inspection.jpg2015-06-03T21:42:47Z<p>Ben Goodman: Relevant to example problem in Sequential Quadratic Programming Page</p>
<hr />
<div>Relevant to example problem in Sequential Quadratic Programming Page</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-28T04:55:36Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
<br/><br />
<br />
<br />
=Introduction=<br />
Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP). It is powerful enough for real problems because it can handle any degree of non-linearity, including non-linearity in the constraints. The main disadvantage is that the method incorporates several derivatives, which likely need to be worked out analytically in advance of iterating to a solution, so SQP becomes quite cumbersome for large problems with many variables or constraints. SQP combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method, both of which are explained briefly below. Previous exposure to the component methods, as well as to Lagrangian multipliers and Karush-Kuhn-Tucker (KKT) conditions, is helpful in understanding SQP. The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <math>x</math> is potentially a vector of many variables for the optimization, in which case h(x) and g(x) are systems.<br/><br />
<br/><br />
<br />
<br />
=Background=<br />
==Karush-Kuhn-Tucker (KKT) Conditions and the Lagrangian Function==<br />
The Lagrangian function combines all the information about the problem into one function using Lagrangian multipliers <math>\lambda</math> for equality constraints and <math>\mu</math> for inequality constraints:<br />
<math>L(x,\lambda,\mu) = f(x) + \sum_i \lambda_i h_i(x) + \sum_i \mu_i g_i(x)</math> <br/><br />
<br/><br />
A single function can be optimized by finding critical points where the gradient is zero. This procedure now includes <math>\lambda</math> and <math>\mu</math> as variables (which are vectors for multi-constraint NLP). The system formed from this gradient is given the label KKT conditions: <br/><br />
<br/><br />
<math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
The second KKT condition is merely feasibility; h(x) was constrained to zero in the original NLP. The third KKT condition is a bit trickier in that only the set of active inequality constraints, denoted <math>g^*</math>, need satisfy this equality. Inequality constraints that are nowhere near the optimal solution are inconsequential, but constraints that actively participate in determining the optimal solution will be at their limit of zero, and thus the third KKT condition holds. Ultimately, the Lagrangian multipliers describe the change in the objective function with respect to a change in a constraint, so <math>\mu</math> is zero for inactive constraints, and those inactive constraints can be considered removed from the Lagrangian function before the gradient is even taken. <br/><br />
<br/><br />
<br />
==The Active Set Method and its Limitations==<br />
The active set method solves the KKT conditions by guess and check to find critical points. The conventional first step is to guess that every inequality constraint is inactive. After solving the remaining system for <math>x</math>, feasibility can be checked. If any constraints are violated, they should be considered active in the next iteration; if any multipliers are found to be negative, their constraints should be considered inactive in the next iteration. Efficient convergence and potentially large systems of equations are of some concern, but the main limitation of the active set method is that many of the derivative expressions in the KKT conditions could still be highly non-linear and thus difficult to solve. Indeed, only quadratic problems seem reasonable to tackle with the active set method, because only then are the KKT conditions linear. Sequential Quadratic Programming addresses this key limitation by incorporating a means of handling highly non-linear functions: Newton's Method.<br/><br />
<br/><br />
<br />
==Newton's Method==<br />
The main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how quickly the function is accelerating at the guess. Walking through a few extreme scenarios makes this approach more intuitive: a long, steep incline in a function will not be close to a critical point, so the improvement should be large, and a shallow incline that is rapidly flattening is likely to be near a critical point, so the improvement should be small. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The negative sign is important. Near minima, a positive gradient should decrease the guess and vice versa, and the second derivative is positive. Near maxima, a positive gradient should increase the guess and vice versa, but the second derivative is negative. This sign convention also prevents the algorithm from escaping a single convex or concave region; the improvement will reverse direction if it overshoots. This is an important consideration in non-convex problems with multiple local maxima and minima: Newton's method will find the critical point closest to the original guess. Incorporating Newton's Method into the active set method transforms the iteration above into a matrix equation. <br/><br />
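A minimal sketch of this iteration, using the hypothetical example <math>f(x) = x^3 - 3x</math> (so <math>f' = 3x^2 - 3</math> and <math>f'' = 6x</math>, with a minimum at <math>x = 1</math> and a maximum at <math>x = -1</math>), shows that the same update finds whichever critical point is nearest the starting guess:<br/><br />

```python
# Newton iteration x_{k+1} = x_k - f'(x_k) / f''(x_k) for critical points of
# f(x) = x^3 - 3x, where f'(x) = 3x^2 - 3 and f''(x) = 6x.

def newton_critical_point(x, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        grad = 3 * x**2 - 3          # f'(x)
        curv = 6 * x                 # f''(x), the "acceleration" of f
        step = grad / curv
        x = x - step
        if abs(step) < tol:
            break
    return x

print(newton_critical_point(2.0))    # converges to the minimum at x = 1
print(newton_critical_point(-2.0))   # converges to the maximum at x = -1
```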
<br/><br />
<br />
<br />
=The SQP Algorithm= <br />
Critical points of the objective function will also be critical points of the Lagrangian function and vice versa because the Lagrangian function is equal to the objective function at a KKT point; all constraints are either equal to zero or inactive. The algorithm is thus simply iterating Newton's method to find critical points of the Lagrangian function. Since the Lagrangian multipliers are additional variables, the iteration forms a system:<br/><br />
<math> \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} =</math><br />
<math> \begin{bmatrix} x_{k} \\ \lambda_{k} \\ \mu_{k} \end{bmatrix} -</math><math> (\nabla^2 L)^{-1} \nabla L</math> <br/><br />
<br/><br />
<br />
Recall: <math>\nabla L =</math><math>\begin{bmatrix} \frac{dL}{dx} \\ \frac{dL}{d\lambda} \\ \frac{dL}{d\mu} \end{bmatrix} =</math><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g^* \\ h \\ g^* \end{bmatrix} </math> <br/><br />
<br/><br />
Then <math>\nabla^2 L = </math><math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
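This Newton system can be assembled and solved directly with NumPy. The sketch below uses a hypothetical equality-constrained problem, <math>\text{min } x_1^2 + x_2^2</math> s.t. <math>x_1 + x_2 = 1</math>; because the problem is quadratic, a single Newton step on the Lagrangian lands exactly on the solution <math>x = (1/2, 1/2)</math>, <math>\lambda = -1</math>:<br/><br />

```python
import numpy as np

# One Newton step on the Lagrangian of the toy problem
#   min x1^2 + x2^2   s.t.  h(x) = x1 + x2 - 1 = 0
# L(x, lam) = x1^2 + x2^2 + lam * (x1 + x2 - 1)

def grad_L(z):
    x1, x2, lam = z
    return np.array([2*x1 + lam, 2*x2 + lam, x1 + x2 - 1])

# Second-derivative matrix of L in all variables (constant for this QP):
# upper-left block is the Hessian in x; borders are the constraint gradient.
hess_L = np.array([[2.0, 0.0, 1.0],
                   [0.0, 2.0, 1.0],
                   [1.0, 1.0, 0.0]])

z = np.array([5.0, -3.0, 0.0])              # arbitrary starting guess (x1, x2, lam)
z = z - np.linalg.solve(hess_L, grad_L(z))  # single Newton step

print(z)   # [ 0.5  0.5 -1. ]
```

Starting from any guess, one step recovers the exact solution, which is precisely why Newton's method makes a quadratic sub-problem cheap to solve.<br/><br />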
<br/><br />
The final concept fundamental to SQP is the Taylor series expansion: the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point, with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods. <br/><br />
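A quick numerical check of this claim, on the hypothetical example <math>f(x) = e^x</math> expanded about <math>x_0 = 0</math> (so <math>f(x) \approx 1 + x + x^2/2</math>), shows the quadratic model's error shrinking rapidly as the deviation shrinks; this is what lets SQP replace a highly non-linear function with a quadratic sub-problem near the current guess:<br/><br />

```python
import math

# Second-order Taylor model of f(x) = exp(x) about x0 = 0:
#   f(x) ~ 1 + x + x^2 / 2
# The model error falls off cubically in the deviation from x0.

def quadratic_model(x):
    return 1.0 + x + 0.5 * x**2

for dx in (0.5, 0.1, 0.01):
    err = abs(math.exp(dx) - quadratic_model(dx))
    print(dx, err)
```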
==The Quadratic Sub-Problem==<br />
As with the active set method, the Lagrangian function forms the basis of SQP: <br/><br />
<br/><br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
<br/><br />
The active set method alone must be performed with only the first order term of the Taylor series for <math>L</math> so that the resulting sub-problem is linear. Newton's method in tandem allows the second order term of the Taylor series to be added, forming a quadratic sub-problem, because Newton's method converges in one iteration for quadratic problems. The quadratic sub-problem is itself a minimization problem with an improvement parameter <math>p_x</math> and the Lagrangian multipliers <math>\lambda</math> and <math>\mu</math> as the variables. The functions in the problem have been fed the incumbent guesses for <math>x</math>, <math>\lambda</math>, and <math>\mu</math>, and so are denoted with a subscript "k": they have been evaluated at <math>x_k</math>, <math>\lambda_k</math>, and <math>\mu_k</math>. The problem is: <br/><br />
<br/><br />
<math> \text{min Z} =</math><math>L(x_k,\lambda_k, \mu_k) +</math><math>\nabla L_k *p_x +</math><math>\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math> \text{s.t. } h_k + \nabla h_k^T p_x = 0 </math> <br/><br />
<math> \text{and } g_k + \nabla g_k^T p_x \le 0 </math> <br />
<br/><br />
Z is, in essence, the derivative of the objective function and has a minimum of zero in this algorithm, because the improvement parameters will solve to zero once convergence to a critical point has been achieved. <br />
<br />
<br />
As with the active set method alone, the KKT conditions of the Lagrangian function form the system of equations below. In the active set method alone, this system is solved directly for values of <math>x</math>, <math>\lambda</math>, and <math>\mu</math>. <br/><br />
<br/><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g \\ h \\ g \end{bmatrix} </math><math>=0</math> <br/><br />
<br/><br />
In SQP, the system is not solved but rather fed the incumbent guesses for <math>x</math>, <math>\lambda</math>, and <math>\mu</math>. Similarly, a "second" derivative matrix is also fed the incumbent guess, and these computed values then scale terms in the system of equations for the improvement parameters <math>p_x</math> and <math>p_\lambda</math>. The "second" derivative matrix does indeed include the Hessian of the Lagrangian function with respect to <math>x</math>, but it can be understood more accurately as the gradient of the KKT conditions system above: derivatives with respect to each variable. This matrix is then: <br/><br />
<br/><br />
<math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
The system to solve is then:<br/><br />
<br/><br />
<math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix} \begin{bmatrix} p_x \\ p_\lambda \\ p_\mu \end{bmatrix} = -\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g \\ h \\ g \end{bmatrix}</math> <br/><br />
<br />
<br />
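The full iteration can be sketched in Python on a hypothetical equality-constrained problem, <math>\text{min } x_1^4 + x_2^2</math> s.t. <math>x_1 + x_2 = 1</math>; each pass solves the linear system above for the improvement parameters and updates the incumbent guess. (Inequality constraints would be layered on top via the active set logic described earlier.)<br/><br />

```python
import numpy as np

# SQP loop for the toy equality-constrained problem
#   min x1^4 + x2^2   s.t.  h(x) = x1 + x2 - 1 = 0
# Each iteration solves (second-derivative matrix) * p = -(KKT system)
# for the improvement parameters p and updates the guess z = (x1, x2, lam).

def sqp(z, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        x1, x2, lam = z
        kkt = np.array([4*x1**3 + lam,    # stationarity in x1
                        2*x2 + lam,       # stationarity in x2
                        x1 + x2 - 1])     # feasibility
        second = np.array([[12*x1**2, 0.0, 1.0],
                           [0.0,      2.0, 1.0],
                           [1.0,      1.0, 0.0]])
        p = np.linalg.solve(second, -kkt)  # improvement parameters
        z = z + p
        if np.linalg.norm(p) < tol:
            break
    return z

x1, x2, lam = sqp(np.array([1.0, 1.0, 0.0]))
print(x1 + x2)          # ~1.0: feasibility holds at convergence
print(4*x1**3 + lam)    # ~0.0: stationarity holds at convergence
```

Because the objective is non-quadratic, several iterations are needed here, but each sub-problem is solved exactly in one linear solve, which is the efficiency SQP is built around.<br/><br />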
=Convergence Analysis=<br />
=Example=</div>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. The basic idea is that Lagrangian parameters represent the change in the objective with respect to the constraint, which allows the chain rule and single-variable calculus optimization approaches to be invoked. <br/><br />
<br/><br />
Briefly, the main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The final concept fundamental to SQP is Taylor Series expansions; the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods. <br/><br />
=SQP Algorithm=<br />
As with the active set method, the Lagrangian function forms the basis of SQP: <br/><br />
<br/><br />
<math>L(x,\lambda,\mu) = f(x) + \sum_i \lambda_i h_i(x) + \sum_i \mu_i g_i(x)</math> <br/><br />
<br/><br />
On its own, the active set method must be performed with only the first-order term of the Taylor series for <math>L</math> so that the resulting sub-problem is linear. Newton's method in tandem allows the second-order term to be retained, forming a quadratic sub-problem, because Newton's method converges in one iteration on quadratic problems. The quadratic sub-problem is itself a minimization problem with the improvement step <math>p_x</math> and the Lagrangian multipliers <math>\lambda</math> and <math>\mu</math> as the variables. The functions are evaluated at the incumbent guesses <math>x_k</math>, <math>\lambda_k</math>, and <math>\mu_k</math>, and so are denoted with a subscript "k". The problem is: <br/><br />
<br/><br />
<math>\text{min } Z = L(x_k,\lambda_k,\mu_k) + \nabla L_k^T p_x + \frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x</math> <br/><br />
<math>\text{s.t. } h(x_k) + \nabla h(x_k)^T p_x = 0</math> <br/><br />
<math>g(x_k) + \nabla g(x_k)^T p_x \le 0</math> <br/><br />
<br/><br />
Z is, in essence, a quadratic model of the Lagrangian built at the incumbent guess. Its minimum approaches zero as the algorithm proceeds, because the improvement parameters solve to zero once convergence to a critical point has been achieved. <br />
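The claim that Newton's method converges in one iteration on a quadratic can be verified directly. In this sketch (the matrix and vector are arbitrary illustrations, not from the page), a single Newton step from any starting point lands exactly on the minimizer of <math>f(p) = \frac{1}{2} p^T A p - b^T p</math>: <br/><br />

```python
import numpy as np

# Quadratic objective f(p) = 0.5 p^T A p - b^T p with A symmetric
# positive definite; its unique minimizer is the solution of A p = b.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

p0 = np.array([10.0, -7.0])        # arbitrary starting guess
grad = A @ p0 - b                  # gradient of f at p0
step = np.linalg.solve(A, -grad)   # Newton step: -(Hessian)^{-1} grad
p1 = p0 + step

# One step reaches the exact minimizer A^{-1} b.
assert np.allclose(p1, np.linalg.solve(A, b))
```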
<br />
<br />
As with the active set method alone, the Lagrangian function defined above is used under the KKT conditions. <br/><br />
The KKT conditions form the system of equations below. In the active set method alone, this system is solved directly for values of <math>x</math>, <math>\lambda</math>, and <math>\mu</math>. <br/><br />
<br/><br />
<math>\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g \\ h \\ g \end{bmatrix} = 0</math> <br/><br />
<br/><br />
In SQP, the system is not solved directly; instead, it is evaluated at the incumbent guesses for <math>x</math>, <math>\lambda</math>, and <math>\mu</math>. Similarly, a "second" derivative matrix is evaluated at the incumbent guess, and these computed values scale the terms in the system of equations for the improvement parameters <math>p_x</math>, <math>p_\lambda</math>, and <math>p_\mu</math>. The "second" derivative matrix does include the Hessian of the Lagrangian function with respect to <math>x</math>, but it is understood more accurately as the Jacobian of the KKT system above: the derivatives of each equation with respect to each variable. This matrix is then: <br/><br />
<br/><br />
<math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h^T & 0 & 0 \\ \nabla g^T & 0 & 0 \end{bmatrix} </math> <br/><br />
<br/><br />
<br />
The system to solve for the improvement parameters, with every term evaluated at the incumbent guess, is then:<br/><br />
<br/><br />
<math>\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h^T & 0 & 0 \\ \nabla g^T & 0 & 0 \end{bmatrix} \begin{bmatrix} p_x \\ p_\lambda \\ p_\mu \end{bmatrix} = - \begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g \\ h \\ g \end{bmatrix}</math> <br/><br />
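To make one iteration of this Newton-KKT step concrete, here is a hand-checkable sketch for a small equality-constrained problem (the problem is an illustration, not from the page): minimize <math>x_1^2 + x_2^2</math> subject to <math>x_1 + x_2 - 1 = 0</math>. Because the objective is quadratic and the constraint linear, a single step from any guess reaches the optimum <math>x^* = (0.5, 0.5)</math> with multiplier <math>\lambda^* = -1</math>: <br/><br />

```python
import numpy as np

# min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0
# Lagrangian L = f + lam*h; KKT residual and its Jacobian:
def kkt_residual(x, lam):
    return np.array([2 * x[0] + lam,      # dL/dx1
                     2 * x[1] + lam,      # dL/dx2
                     x[0] + x[1] - 1.0])  # h(x)

def kkt_matrix():
    # [[hess_xx L, grad h], [grad h^T, 0]] -- constant for this problem
    return np.array([[2.0, 0.0, 1.0],
                     [0.0, 2.0, 1.0],
                     [1.0, 1.0, 0.0]])

x, lam = np.zeros(2), 0.0   # incumbent guess
p = np.linalg.solve(kkt_matrix(), -kkt_residual(x, lam))
x, lam = x + p[:2], lam + p[2]

assert np.allclose(x, [0.5, 0.5]) and np.isclose(lam, -1.0)
```

For a genuinely non-linear objective or constraints, this step would be repeated, re-evaluating the residual and matrix at each new incumbent guess.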
=Convergence Analysis=<br />
=Example=</div>
As with the active set method alone, the Lagrangian function is used under KKT conditions:<br/><br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
The KKT conditions form the system of equations below. In the active set method alone, this system is solved directly for values of <math>x</math>, <math>\lambda</math>, and<math>\mu</math> <br/><br />
<math>\begin{bmatrix}\nabla f(x)+ \lambda \nabla h(x)+ \mu \nabla g(x)\\h(x)\\g(x)\end{bmatrix}</math><math>=0</math> <br/><br />
In SQP, the system is not solved but rather fed the incumbent guess for <math>x</math>, <math>\lambda</math>, and<math>\mu</math>. Similarly, the second derivative of the Lagrangian function is also fed the incumbent guess, and these computed values then scale terms in the system of equation for the improvement parameters <math>p_x</math> and <math>p_\lambda</math>. The system to solve is then:<br/><br />
<br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-26T00:39:29Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. The basic idea is that Lagrangian parameters represent the change in the objective with respect to the constraint, which allows the chain rule and single-variable calculus optimization approaches to be invoked. <br/><br />
Briefly, the main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. The iterations converge to critical values of any function <math>f</math> with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
The final concept fundamental to SQP is Taylor Series expansions; the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods. <br/><br />
=SQP Algorithm=<br />
The active set method alone must be performed with only the first order term of the Taylor Series so that the resulting sub-problem is linear. Newton's method in tandem allows the second order term of the Taylor Series to be added, forming a quadratic sub-problem, because Newton's method converges in one iteration for quadratic problems. <br/><br />
As with the active set method alone, the Lagrangian function is used under KKT conditions:<br/><br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
The KKT conditions form the system of equations below. In the active set method alone, this system is solved directly for values of <math>x</math>, <math>\lambda</math>, and<math>\mu</math> <br/><br />
<math>\begin{bmatrix}\nabla f(x) + \lambda \nabla h(x) + \mu \nabla g(x)\\h(x)\\g(x)\end{bmatrix} = 0</math> <br/><br />
In SQP, the system is not solved but rather fed the incumbent guess for <math>x</math>, <math>\lambda</math>, and<math>\mu</math>. Similarly, the second derivative of the Lagrangian function is also fed the incumbent guess, and these computed values then scale terms in the system of equation for the improvement parameters <math>p_x</math> and <math>p_\lambda</math>. The system to solve is then:<br/><br />
<br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-26T00:24:28Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. Newton’s method is an algorithm for converging to critical values of any function with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
Briefly, the idea with this algorithm is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess.<br/><br />
<br/><br />
The final concept fundamental to SQP is Taylor Series expansions; the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods. <br/><br />
=SQP Algorithm=<br />
The active set method alone must be performed with only the first order term of the Taylor Series so that the resulting sub-problem is linear. Newton's method in tandem allows the second order term of the Taylor Series to be added, forming a quadratic sub-problem, because Newton's method converges in one iteration for quadratic problems. <br/><br />
As with the active set method alone, the Lagrangian function is used under KKT conditions:<br/><br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math><math>\sum_i</math><math>\lambda_i h_i(x)+</math><math>\sum_i</math><math>\mu_i g_i(x)</math> <br/><br />
The KKT conditions form the system of equations below. In the active set method alone, this system is solved directly for values of <math>x</math> and <math>\lambda</math> <br/><br />
In SQP, the system is not solved but rather fed the incumbent guess for <math>x</math> and <math>\lambda</math>. Similarly, the second derivative of the Lagrangian function is also fed the incumbent guess, and these computed values then scale terms in the system of equation for the improvement parameters <math>p_x</math> and <math>p_\lambda</math>. The system to solve is then:<br/><br />
<br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-26T00:15:54Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. Newton’s method is an algorithm for converging to critical values of any function with improvement steps that follow the form below: <br/><br />
<math>x_{k+1} =</math> <math> x_k - </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
Briefly, the idea with this algorithm is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess.<br/><br />
<br/><br />
The final concept fundamental to SQP is Taylor Series expansions; the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods. <br/><br />
=SQP Algorithm=<br />
The active set method alone must be performed with only the first order term of the Taylor Series so that the resulting sub-problem is linear. Newton's method in tandem allows the second order term of the Taylor Series to be added, forming a quadratic sub-problem, because Newton's method converges in one iteration for quadratic problems. <br/><br />
As with the active set method alone, the Lagrangian function is used under KKT conditions:<br/><br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math> <br/><br />
The KKT conditions form the system of equations below. In the active set method alone, this system is solved directly for values of <math>x</math> and <math>\lambda</math> <br/><br />
In SQP, the system is not solved but rather fed the incumbent guess for <math>x</math> and <math>\lambda</math>, and these computed values then scale the first order improvement parameter <math>p</math> from the Taylor series. Similarly, the second derivative of the Lagrangian function is fed the incumbent guess and scales the second order term for <math>p_x</math>. <math>p_{\lambda}</math> is scaled in the second order term by the derivative of the complementary KKT condition. Thus, the second-order parameters form the gradient (evaluated for each variable) of the first-order parameters. The system to solve is then:<br/><br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-26T00:13:10Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. Newton’s method is an algorithm for converging to critical values of any function with improvement steps that follow the form below: <br/><br />
<math>x_{k+1}</math> <math> \text{= x_k -} </math> <math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
Briefly, the idea with this algorithm is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess.<br/><br />
<br/><br />
The final concept fundamental to SQP is Taylor Series expansions; the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods. <br/><br />
=SQP Algorithm=<br />
The active set method alone must be performed with only the first order term of the Taylor Series so that the resulting sub-problem is linear. Newton's method in tandem allows the second order term of the Taylor Series to be added, forming a quadratic sub-problem, because Newton's method converges in one iteration for quadratic problems. <br/><br />
As with the active set method alone, the Lagrangian function is used under KKT conditions:<br/><br />
<math>L(x,\lambda,\mu)</math><math>\text{ = f(x) +}</math> <br/><br />
The KKT conditions form the system of equations below. In the active set method alone, this system is solved directly for values of <math>x</math> and <math>\lambda</math> <br/><br />
In SQP, the system is not solved but rather fed the incumbent guess for <math>x</math> and <math>\lambda</math>, and these computed values then scale the first order improvement parameter <math>p</math> from the Taylor series. Similarly, the second derivative of the Lagrangian function is fed the incumbent guess and scales the second order term for <math>p_x</math>. <math>p_{\lambda}</math> is scaled in the second order term by the derivative of the complementary KKT condition. Thus, the second-order parameters form the gradient (evaluated for each variable) of the first-order parameters. The system to solve is then:<br/><br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-25T03:06:39Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. Newton’s method is an algorithm for converging to critical values of any function with improvement steps that follow the form below: <br/><br />
<math>x_{k+1}</math><math> \text{ = x_k - } </math><math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
Briefly, the idea with this algorithm is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess.<br/><br />
<br/><br />
The final concept fundamental to SQP is Taylor Series expansions; the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods. <br/><br />
=SQP Algorithm=<br />
The active set method alone must be performed with only the first order term of the Taylor Series so that the resulting sub-problem is linear. Newton's method in tandem allows the second order term of the Taylor Series to be added, forming a quadratic sub-problem, because Newton's method converges in one iteration for quadratic problems. <br/><br />
As with the active set method alone, the Lagrangian function is used under KKT conditions:<br/><br />
<math>L(x,\lamda,\mu)</math><math>\text{ = f(x) +}</math> <br/><br />
The KKT conditions form the system of equations below. In the active set method alone, this system is solved directly for values of <math>x</math> and <math>\lambda</math> <br/><br />
In SQP, the system is not solved but rather fed the incumbent guess for <math>x</math> and <math>\lambda</math>, and these computed values then scale the first order improvement parameter <math>p</math> from the Taylor series. Similarly, the second derivative of the Lagrangian function is fed the incumbent guess and scales the second order term for <math>p_x</math>. <math>p_{\lambda}</math> is scaled in the second order term by the derivative of the complementary KKT condition. Thus, the second-order parameters form the gradient (evaluated for each variable) of the first-order parameters. The system to solve is then:<br/><br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-25T02:53:22Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. Newton’s method is an algorithm for converging to critical values of any function with improvement steps that follow the form below: <br/><br />
<math>x_{k+1}</math><math> \text{ = x_k - } </math><math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
<br/><br />
Briefly, the idea with this algorithm is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess.<br/><br />
<br/><br />
The final concept fundamental to SQP is Taylor Series expansions; the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods. <br/><br />
=SQP Algorithm=<br />
The active set method alone must be performed with only the first order term of the Taylor Series so that the resulting sub-problem is linear. Newton's method in tandem allows the second order term of the Taylor Series to be added, forming a quadratic sub-problem, because Newton's method converges in one iteration for quadratic problems. <br/><br />
As with the active set method alone, the Lagrangian function is used under KKT conditions:<br />
<math>L(x,\lamda,\mu)</math><math>\text{ = f(x) +}</math><br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-25T02:21:56Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. Newton’s method is an algorithm for converging to critical values of any function with improvement steps that follow the form below: <br/><br />
<math>x_{k+1}</math><math> \text{ = x_k - } </math><math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
=SQP Algorithms=<br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodmanhttps://optimization.mccormick.northwestern.edu/index.php/Sequential_quadratic_programmingSequential quadratic programming2015-05-25T02:21:17Z<p>Ben Goodman: </p>
<hr />
<div>Authored by: Ben Goodman (ChE 345 Spring 2016) <br/><br />
Steward: Dajun Yue and Fenqi You<br/><br />
=Introduction=<br />
Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.<br/><br />
<br/><br />
The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:<br/><br />
<math> \text{min f(x)} </math> <br/><br />
<math> \text{s.t. h(x) = 0} </math> <br/><br />
<math> \text{and g(x)} \le 0 </math> <br/><br />
<br/><br />
with f(x), h(x), and g(x) each potentially non-linear. <br/><br />
=Background=<br />
Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. Newton’s method is an algorithm for converging to critical values of any function with improvement steps that follow the form below: <br/><br />
<math>x_{k+1}</math><math> \text{ = x_k - }</math><math>\frac{\nabla f}{\nabla^2 f} </math> <br/><br />
=SQP Algorithms=<br />
=Convergence Analysis=<br />
=Example=</div>Ben Goodman