Subgradient optimization

Author Name: Aaron Anderson (ChE 345 Spring 2015)
Steward: Dajun Yue and Fengqi You

Subgradient Optimization (or the Subgradient Method) is an iterative algorithm for minimizing convex functions, used predominantly in nondifferentiable optimization for functions that are convex but nondifferentiable. It is often slower than Newton's Method when applied to convex differentiable functions, but it can be used on convex nondifferentiable functions where Newton's Method will not converge. It was first developed by Naum Z. Shor in the Soviet Union in the 1960s.

Introduction

Figure: A convex nondifferentiable function (blue) with red "subtangent" lines approximating the derivative at the nondifferentiable point x_0.

The Subgradient (related to Subderivative and Subdifferential) of a function is a way of generalizing or approximating the derivative of a convex function at nondifferentiable points. The definition of a subgradient is as follows: g is a subgradient of f at x if, for all y, the following is true:

f(y) \ge f(x) + g^{T}(y - x)

An example of the subgradient of a nondifferentiable convex function f can be seen below:

Figure: A nondifferentiable convex function f with subgradient g_1 at the point x_1 and subgradients g_2 and g_3 at the point x_2.

Here g_1 is a subgradient at the point x_1, and g_2 and g_3 are subgradients at the point x_2. Notice that where the function is differentiable, such as at the point x_1, the subgradient g_1 is simply the gradient of the function. Two other important properties of the subgradient are that it gives a global linear underestimator of f, and that if f is convex, then there is at least one subgradient at every point in its domain. The set of all subgradients at a point x_0 is called the subdifferential of f at x_0 and is written \partial f(x_0).
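
As a concrete illustration (a standard example, not shown in the article's figures), consider the absolute value function f(x) = |x|. At x = 0 every g \in [-1, 1] satisfies the subgradient inequality |y| \ge g \cdot y for all y, so the subdifferential is

\partial f(x) = \begin{cases} \{-1\}, & x < 0 \\ [-1, 1], & x = 0 \\ \{+1\}, & x > 0 \end{cases}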

The Subgradient Method

Suppose f:\mathbb{R}^n \to \mathbb{R} is a convex function with domain \mathbb{R}^n. To minimize f, the subgradient method uses the iteration:

x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}

Where k is the iteration number, x^{(k)} is the kth iterate, g^{(k)} is any subgradient of f at x^{(k)}, and \alpha_k > 0 is the kth step size. Thus, at each iteration of the subgradient method, we take a step in the direction of a negative subgradient. As explained above, when f is differentiable, g^{(k)} simply reduces to \nabla f(x^{(k)}). It is also important to note that the subgradient method is not a descent method: the new iterate does not always have a smaller objective value than the previous one. Thus we need some way to keep track of the best solution found so far, i.e. the one with the smallest function value. We can do this by, after each step, setting

f_{\text{best}}^{(k)} = \min\{ f_{\text{best}}^{(k-1)},\, f(x^{(k)}) \}

and setting i_{\text{best}}^{(k)} = k if x^{(k)} gives the smallest objective value found so far. Thus we have:

f_{\text{best}}^{(k)} = \min\{ f(x^{(1)}), \ldots, f(x^{(k)}) \}

which gives the best objective value found in k iterations. Since f_{\text{best}}^{(k)} is nonincreasing, it has a limit (which can be -\infty).
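
The iteration and the bookkeeping above can be written in a few lines of code. The following Python sketch uses a nonsummable diminishing step size a/\sqrt{k} and an illustrative test problem; the function, subgradient oracle, and constants are assumptions made for the example, not part of the article.

import numpy as np

def subgradient_method(f, subgrad, x0, num_iters=1000, a=1.0):
    # Minimize a convex (possibly nondifferentiable) f with the subgradient method.
    # f       : returns the objective value at x
    # subgrad : returns any subgradient of f at x
    # a       : constant in the diminishing step size a / sqrt(k)
    x = x0.copy()
    f_best, x_best = f(x), x.copy()
    for k in range(1, num_iters + 1):
        g = subgrad(x)              # any subgradient at the current iterate
        alpha = a / np.sqrt(k)      # nonsummable diminishing step size
        x = x - alpha * g           # step in a negative subgradient direction
        if f(x) < f_best:           # not a descent method, so track the best point found
            f_best, x_best = f(x), x.copy()
    return x_best, f_best

# Example: minimize the nondifferentiable convex function f(x) = ||x||_1
f = lambda x: np.sum(np.abs(x))
subgrad = lambda x: np.sign(x)      # a valid subgradient of the 1-norm at every x
x_best, f_best = subgradient_method(f, subgrad, x0=np.array([3.0, -2.0]))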

Step size

Several different step size rules can be used:

  • Constant step size: \alpha_k = h independent of k.
  • Constant step length: \alpha_k = h / \lVert g^{(k)} \rVert_2. This means that \lVert x^{(k+1)} - x^{(k)} \rVert_2 = h.
  • Square summable but not summable: These step sizes satisfy
\alpha_k \ge 0, \qquad \sum_{k=1}^{\infty} \alpha_k^2 < \infty, \qquad \sum_{k=1}^{\infty} \alpha_k = \infty
One typical example is \alpha_k = a/(b+k) where a > 0 and b \ge 0.
  • Nonsummable diminishing: These step sizes satisfy
\alpha_k \ge 0, \qquad \lim_{k\rightarrow\infty} \alpha_k = 0, \qquad \sum_{k=1}^{\infty} \alpha_k = \infty
One typical example is \alpha_k = a/\sqrt{k} where a > 0.

An important thing to note is that for all four of the rules given here, the step sizes are determined "off-line", that is, before the method is run. The step sizes therefore do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step size rules used in descent methods for differentiable functions, where the step sizes do depend on preceding iterations.
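
For concreteness, the four rules can be written as simple schedules. This is a minimal sketch (the function names and default constants are illustrative assumptions), where each rule depends only on the iteration counter k and, for the constant step length rule, on the current subgradient g:

import numpy as np

def constant_step_size(k, g, h=0.1):
    return h                        # alpha_k = h for all k

def constant_step_length(k, g, h=0.1):
    return h / np.linalg.norm(g)    # gives ||x^(k+1) - x^(k)||_2 = h

def square_summable_not_summable(k, g, a=1.0, b=10.0):
    return a / (b + k)              # sum of alpha_k^2 is finite, sum of alpha_k diverges

def nonsummable_diminishing(k, g, a=1.0):
    return a / np.sqrt(k)           # alpha_k -> 0 and sum of alpha_k diverges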

Convergence Results

Convergence results for the subgradient method depend on the step size rule used. For the constant step size rule and the constant step length rule, the subgradient method is guaranteed to converge to within some range of the optimal value. Thus:

\lim_{k\rightarrow\infty} f_{\text{best}}^{(k)} - f^{*} < \epsilon

where f^{*} is the optimal value of the problem and \epsilon is the aforementioned range of convergence. This means that the subgradient method finds a point whose objective value is within \epsilon of the optimal value f^{*}. Here \epsilon is a function of the step size parameter h, and as h decreases so does \epsilon; that is, the subgradient method gets closer to f^{*} as the step size parameter h is made smaller. For the nonsummable diminishing step size rule and the square summable but not summable rule, the algorithm is guaranteed to converge to the optimal value, i.e. \lim_{k\rightarrow\infty} f_{\text{best}}^{(k)} = f^{*}.
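
These statements follow from a standard bound on the subgradient method (not stated explicitly in this article), which holds under the usual assumptions that \lVert g \rVert_2 \le G for every subgradient encountered and that \lVert x^{(1)} - x^{*} \rVert_2 \le R:

f_{\text{best}}^{(k)} - f^{*} \le \frac{R^2 + G^2 \sum_{i=1}^{k} \alpha_i^2}{2 \sum_{i=1}^{k} \alpha_i}

For the constant step size \alpha_k = h, the right-hand side tends to G^2 h / 2 as k \rightarrow \infty, which gives one explicit form of \epsilon and shows why it shrinks with h; for the square summable but not summable and nonsummable diminishing rules, the right-hand side tends to zero, giving convergence to f^{*}.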