Difference between revisions of "Subgradient optimization"

From optimization
Jump to: navigation, search
(Example: Piecewise linear minimization)
(Example: Piecewise linear minimization)
Line 46: Line 46:
The subgradient in this case is <math>g=a_j</math>. Thus the iterative update is then:
The subgradient in this case is <math>g=a_j</math>. Thus the iterative update is then:
Where <math>j</math> is chosen such to satisfy [[File:Example4.png]]

Revision as of 16:38, 24 May 2015

Author Name: Aaron Anderson (ChE 345 Spring 2015)
Steward: Dajun Yue and Fengqi You

A convex nondifferentiable function (blue) with red "subtangent" lines approximating the derivative at the nondifferentiable point x0.

Subgradient Optimization (or Subgradient Method) is an iterative algorithm for minimizing convex functions, used predominantly in Nondifferentiable optimization for functions that are convex but nondifferentiable. It is often slower than Newton's Method when applied to convex differentiable functions, but can be used on convex nondifferentiable functions where Newton's Method will not converge. It was first developed by Naum Z. Shor in the Soviet Union in the 1960's.



The Subgradient (related to Subderivative and Subdifferential) of a function is a way of generalizing or approximating the derivative of a convex function at nondifferentiable points. The definition of a subgradient is as follows: g is a subgradient of f at x if, for all y, the following is true:


An example of the subgradient of a nondifferentiable convex function f can be seen below:


Where g_1 is a subgradient at point x_1 and g_2 and g_3 are subgradients at point x_2. Notice that when the function is differentiable, such as at point x_1, the subgradient, g_1, just becomes the gradient to the function. Other important factors of the subgradient to note are that the subgradient gives a linear global underestimator of f and if f is convex, then there is at least one subgradient at every point in its domain. The set of all subgradients at a certain point is called the subdifferential, and is written as \partial f(x_0) at point x_0.

The Subgradient Method

Suppose f:\mathbb{R}^n \to \mathbb{R} is a convex function with domain \mathbb{R}^n. To minimize f the subgradient method uses the iteration:


Where k is the number of iterations, x^{(k)} is the kth iterate, g^{(x)} is any subgradient at x^{(k)}, and \alpha_k(> 0) is the kth step size. Thus, at each iteration of the subgradient method, we take a step in the direction of a negative subgradient. As explained above, when f is differentiable, g^{(k)} simply reduces to \nablaf(x^{(k)}). It is also important to note that the subgradient method is not a descent method in that the new iterate is not always the best iterate. Thus we need some way to keep track of the best solution found so far, i.e. the one with the smallest function value. We can do this by, after each step, setting


and setting i_{\text{best}}^{(k)} = k if x^{(k)} is the best (smallest) point found so far. Thus we have:


which gives the best objective value found in k iterations. Since this value is decreasing, it has a limit (which can be -\infty).

Step size

Several different step size rules can be used:

  • Constant step size: \alpha_k = h independent of k.
  • Constant step length: Stepsize1.png This means that Stepsize2.png
  • Square summable but not summable: These step sizes satisfy
One typical example is Stepsize4.png where a>0 and b\ge0.
  • Nonsummable diminishing: These step sizes satisfy
One typical example is Stepsize6.png where a>0.

An important thing to note is that for all four of the rules given here, the step sizes are determined "off-line", or before the method is iterated. Thus the step sizes do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step size rules used for descent methods for differentiable functions where the step sizes do depend on preceding iterations.

Convergence Results

There are different results on convergence for the subgradient method depending on the different step size rules applied. For constant step size rules and constant step length rules the subgradient method is guaranteed to converge within some range of the optimal value. Thus:


where f^{*} is the optimal solution to the problem and \epsilon is the aforementioned range of convergence. This means that the subgradient method finds a point within \epsilon of the optimal solution f^{*}. \epsilon is number that is a function of the step size parameter h, and as h decreases the range of convergence \epsilon also decreases, i.e. the solution of the subgradient method gets closer to f^{*} with a smaller step size parameter h. For the diminishing step size rule and the square summable but not summable rule, the algorithm is guaranteed to converge to the optimal value or Convergence2.png When the function f is differentiable the subgradient method with constant step size yields convergence to the optimal value, provided the parameter h is small enough.

Example: Piecewise linear minimization

Suppose we wanted to minimize the following piecewise linear convex function using the subgradient method:


Since this is a linear programming problem finding a subgradient is simple: given x we can find an index j for which:


The subgradient in this case is g=a_j. Thus the iterative update is then:


Where j is chosen such to satisfy Example4.png