# Sequential quadratic programming

Jump to: navigation, search

Authored by: Ben Goodman (ChE 345 Spring 2016)
Steward: Dajun Yue and Fenqi You

# Introduction

Sequential quadratic programming (SQP) combines two fundamental algorithms for solving non-linear optimization problems: an active set method and Newton’s method. The approach creates quadratic sub-problems using the active set method, and these subproblems reveal the best improvement to be made to a current guess. Newton’s method can then find the solution to this sub-problem in one iteration because it is quadratic. The added efficiency from this dual approach makes SQP appropriate for larger non-linear problems and problems with high non-linearity in the constraints.

The abstracted, general problem below will be used for the remainder of this page to explain and discuss SQP:
$\text{min f(x)}$
$\text{s.t. h(x) = 0}$
$\text{and g(x)} \le 0$

with f(x), h(x), and g(x) each potentially non-linear.

# Background

Previous knowledge of the component methods is helpful in understanding sequential quadratic programming. Briefly, the active set method confines the search for optimal solutions to regions where the objective function is increasing significantly with respect to constraint functions. Lagrangian Parameters and KKT conditions provide the framework to find these regions and converge to the optimum solution. The basic idea is that Lagrangian parameters represent the change in the objective with respect to the constraint, which allows the chain rule and single-variable calculus optimization approaches to be invoked.

Briefly, the main idea behind Newton's Method is to improve a guess in proportion to how quickly the function is changing at the guess and inversely proportional to how the function is accelerating at the guess. The iterations converge to critical values of any function $f$ with improvement steps that follow the form below:
$x_{k+1} =$ $x_k -$ $\frac{\nabla f}{\nabla^2 f}$

The final concept fundamental to SQP is Taylor Series expansions; the idea that any function can be well represented by an infinite series of polynomial terms. This concept extends to expressing derivatives as a series of polynomial deviations from a given starting point with each term scaled by the analytical derivative evaluated at the starting point. When the deviation is small, one or two terms can be used with adequate accuracy. This concept allows highly non-linear problems to be handled using linear and quadratic methods.

# SQP Algorithm

The active set method alone must be performed with only the first order term of the Taylor Series so that the resulting sub-problem is linear. Newton's method in tandem allows the second order term of the Taylor Series to be added, forming a quadratic sub-problem, because Newton's method converges in one iteration for quadratic problems. The quadratic sub-problem is itself a minimization problems with improvement parameters $p_x$, $p_\lambda$, and $p_\mu$ as the variables. The functions in the problem have been fed incumbent guesses for $x$, $\lambda$, and$\mu$, and so the functions are denoted with a subscript "k," they have been fed $x_k$, $\lambda_k$, and$\mu_k$. The problem is:

$\text{min Z} =$$L(x_k,\lambda_k, \mu_k) +$$\nabla L_k *p_x +$$\frac{1}{2} p_x^T \nabla_{xx}^2 L_k p_x$
$s.t.$
Z is, in essence, the derivative of the objective function and has a minimum of zero in this algorithm because the improvement parameters will solve to zero once this global convergence to a critical point has been achieved.

As with the active set method alone, the Lagrangian function is used under KKT conditions:
$L(x,\lambda,\mu)$$\text{ = f(x) +}$$\sum_i$$\lambda_i h_i(x)+$$\sum_i$$\mu_i g_i(x)$
The KKT conditions form the system of equations below. In the active set method alone, this system is solved directly for values of $x$, $\lambda$, and$\mu$.

$\begin{bmatrix} \nabla f + \lambda \nabla h + \mu \nabla g \\ h \\ g \end{bmatrix}$$=0$

In SQP, the system is not solved but rather fed the incumbent guess for $x$, $\lambda$, and$\mu$. Similarly, a "second" derivative matrix is also fed the incumbent guess, and these computed values then scale terms in the system of equation for the improvement parameters $p_x$ and $p_\lambda$. The "second" derivative matrix does indeed include the divergence of the Lagrangian function with respect to $x$, but it can be understood more accurately to be the gradient of the KKT conditions system above; derivatives with respect to each variable. This matrix is then:

$\begin{bmatrix} \nabla_{xx}^2 L & \nabla h & \nabla g \\ \nabla h & 0 & 0 \\ \nabla g & 0 & 0 \end{bmatrix}$

The system to solve is then: