Author Name: Aaron Anderson (ChE 345 Spring 2015)
Steward: Dajun Yue and Fengqi You
Subgradient Optimization (or Subgradient Method) is an iterative algorithm for minimizing convex functions, used predominantly in Nondifferentiable optimization for functions that are convex but nondifferentiable. It is often slower than Newton's Method when applied to convex differentiable functions, but can be used on convex nondifferentiable functions where Newton's Method will not converge. It was first developed by Naum Z. Shor in the Soviet Union in the 1960's.
The Subgradient (related to Subderivative and Subdifferential) of a function is a way of generalizing or approximating the derivative of a convex function at nondifferentiable points. The definition of a subgradient is as follows: is a subgradient of at if, for all , the following is true:
An example of the subgradient of a nondifferentiable convex function can be seen below:
Where is a subgradient at point and and are subgradients at point . Notice that when the function is differentiable, such as at point , the subgradient, , just becomes the gradient to the function. Other important factors of the subgradient to note are that the subgradient gives a linear global underestimator of and if is convex, then there is at least one subgradient at every point in its domain. The set of all subgradients at a certain point is called the subdifferential, and is written as at point .
The Subgradient Method
Suppose is a convex function with domain . To minimize the subgradient method uses the iteration:
Where is the number of iterations, is the th iterate, is any subgradient at , and is the th step size. Thus, at each iteration of the subgradient method, we take a step in the direction of a negative subgradient. As explained above, when is differentiable, simply reduces to . It is also important to note that the subgradient method is not a descent method in that the new iterate is not always the best iterate. Thus we need some way to keep track of the best solution found so far, i.e. the one with the smallest function value. We can do this by, after each step, setting
and setting if is the best (smallest) point found so far. Thus we have:
which gives the best objective value found in iterations. Since this value is decreasing, it has a limit (which can be ).
Several different step size rules can be used:
- Constant step size: independent of .
- Constant step length: This means that
- Square summable but not summable: These step sizes satisfy
- Nonsummable diminishing: These step sizes satisfy
An important thing to note is that for all four of the rules given here, the step sizes are determined "off-line", or before the method is iterated. Thus the step sizes do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step size rules used for descent methods for differentiable functions where the step sizes do depend on preceding iterations.
There are different results on convergence for the subgradient method depending on the different step size rules applied. For constant step size rules and constant step length rules the subgradient method is guaranteed to converge within some range of the optimal value. Thus:
where is the optimal solution to the problem and is the aforementioned range of convergence. This means that the subgradient method finds a point within of the optimal solution . is number that is a function of the step size parameter , and as decreases the range of convergence also decreases, i.e. the solution of the subgradient method gets closer to with a smaller step size parameter . For the diminishing step size rule and the square summable but not summable rule, the algorithm is guaranteed to converge to the optimal value or When the function is differentiable the subgradient method with constant step size yields convergence to the optimal value, provided the parameter is small enough.
Example: Piecewise linear minimization
Suppose we wanted to minimize the following piecewise linear convex function using the subgradient method:
Since this is a linear programming problem finding a subgradient is simple: given we can find an index for which:
The subgradient in this case is . Thus the iterative update is then:
Where is chosen such to satisfy
In order to apply the subgradient method to this problem all that is needed is some way to calculate and the ability to carry out the iterative update. Even if the problem is dense and very large (where standard linear programming might fail), if there is some efficient way to calculate then the subgradient method is a reasonable choice for algorithm.
Consider a problem with variables and terms and with data and generated from a normal distribution. We will consider all four of the step size rules mentioned above and will plot or the difference between the optimal solution and the subgradient solution as a function of , the nuber of iterations.
For the constant step size rule for several values of the following plot was obtained:
For the constant step length rule for several values of the following plot was obtained:
The above figures reveal a trade-off: a larger step size parameter gives a faster convergence but in the end gives a larger range of suboptimality so it is important to determine an that will converge close to the optimal solution without taking a very large number of iterations.
For the subgradient method using diminishing step size rules, both the nonsummable diminishing step size rule (blue) and the square summable but not summable step size rule (red) are plotted below for convergence: