Consider the following, which we'll call the primal optimization problem:

$$\begin{aligned} \min_w \quad & f(w) \\ \text{s.t.} \quad & g_i(w) \le 0, \quad i = 1, \ldots, k \\ & h_i(w) = 0, \quad i = 1, \ldots, l. \end{aligned}$$
To solve it, we start by defining the generalized Lagrangian

$$\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w).$$
Here, the $\alpha_i$'s and $\beta_i$'s are the Lagrange multipliers. Consider the quantity

$$\theta_{\mathcal{P}}(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta).$$
Here, the "$\mathcal{P}$" subscript stands for "primal." Let some $w$ be given. If $w$ violates any of the primal constraints (i.e., if either $g_i(w) > 0$ or $h_i(w) \neq 0$ for some $i$), then you should be able to verify that

$$\theta_{\mathcal{P}}(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \; f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w) = \infty.$$
Conversely, if the constraints are indeed satisfied for a particular value of $w$, then $\theta_{\mathcal{P}}(w) = f(w)$. Hence,

$$\theta_{\mathcal{P}}(w) = \begin{cases} f(w) & \text{if } w \text{ satisfies primal constraints} \\ \infty & \text{otherwise.} \end{cases}$$
Thus, $\theta_{\mathcal{P}}$ takes the same value as the objective in our problem for all values of $w$ that satisfy the primal constraints, and is positive infinity if the constraints are violated. Hence, if we consider the minimization problem

$$\min_w \theta_{\mathcal{P}}(w) = \min_w \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta),$$
we see that it is the same problem as (i.e., has the same solutions as) our original, primal problem. For later use, we also define the optimal value of the objective to be $p^* = \min_w \theta_{\mathcal{P}}(w)$; we call this the value of the primal problem.
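To make $\theta_{\mathcal{P}}$ concrete, here is a minimal numerical sketch on an assumed toy problem (not from the text): minimize $f(w) = w^2$ subject to $w \ge 1$, i.e., $g(w) = 1 - w \le 0$, with no equality constraints. A grid-based maximum over $\alpha$ stands in for the true supremum:

```python
import numpy as np

# Assumed toy problem (illustrative, not from the text):
#   minimize f(w) = w^2  subject to  g(w) = 1 - w <= 0  (i.e., w >= 1).
def f(w):
    return w ** 2

def g(w):
    return 1.0 - w

def lagrangian(w, alpha):
    # No equality constraints here, so there are no beta terms.
    return f(w) + alpha * g(w)

def theta_P(w, alphas=np.linspace(0.0, 1e4, 10001)):
    """Grid approximation of max over alpha >= 0 of L(w, alpha)."""
    return np.max(lagrangian(w, alphas))

print(theta_P(0.5))   # infeasible w: grows with the alpha grid ("infinity" in the limit)
print(theta_P(2.0))   # feasible w: max attained at alpha = 0, equals f(2) = 4.0

# p* = min_w theta_P(w); the minimum over the feasible region is at w = 1.
p_star = min(theta_P(w) for w in np.linspace(1.0, 3.0, 201))
print(p_star)         # 1.0
```

For infeasible $w$ the inner maximum grows without bound as the $\alpha$ grid is extended, approximating $\infty$; for feasible $w$ it is attained at $\alpha = 0$ and equals $f(w)$.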
Now, let's look at a slightly different problem. We define

$$\theta_{\mathcal{D}}(\alpha, \beta) = \min_w \mathcal{L}(w, \alpha, \beta).$$
Here, the "$\mathcal{D}$" subscript stands for "dual." Note also that whereas in the definition of $\theta_{\mathcal{P}}$ we were optimizing (maximizing) with respect to $\alpha, \beta$, here we are minimizing with respect to $w$.
We can now pose the dual optimization problem:

$$\max_{\alpha, \beta \,:\, \alpha_i \ge 0} \theta_{\mathcal{D}}(\alpha, \beta) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta).$$
This is exactly the same as our primal problem shown above, except that the order of the "$\max$" and the "$\min$" are now exchanged. We also define the optimal value of the dual problem's objective to be $d^* = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \theta_{\mathcal{D}}(\alpha, \beta)$.
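As a sketch of the dual side, the same grid idea can be applied to an assumed toy problem (not from the text): minimize $w^2$ subject to $1 - w \le 0$. Here $\mathcal{L}(w, \alpha) = w^2 + \alpha(1 - w)$, minimized over $w$ at $w = \alpha/2$, so $\theta_{\mathcal{D}}(\alpha) = \alpha - \alpha^2/4$ in closed form:

```python
import numpy as np

# Assumed toy problem: minimize w^2 subject to g(w) = 1 - w <= 0.
# L(w, alpha) = w^2 + alpha*(1 - w); its minimizer over w is w = alpha/2,
# so theta_D(alpha) = alpha - alpha^2/4.  We approximate the min on a grid.
def theta_D(alpha, ws=np.linspace(-5.0, 5.0, 10001)):
    return np.min(ws ** 2 + alpha * (1.0 - ws))

alphas = np.linspace(0.0, 10.0, 1001)
d_star = max(theta_D(a) for a in alphas)
print(d_star)   # ~1.0, attained near alpha = 2
```

In this example the dual optimum $d^* = 1$ coincides with the primal optimum of the same problem, a preview of the strong-duality conditions discussed below.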
How are the primal and the dual problems related? It can easily be shown that

$$d^* = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = p^*.$$
(You should convince yourself of this; it follows from the "$\max \min$" of a function always being less than or equal to the "$\min \max$.") However, under certain conditions, we will have

$$d^* = p^*,$$
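The max-min inequality holds for any function of two arguments, not just Lagrangians. A quick randomized sanity check (a sketch, with $\mathcal{L}(w_i, \alpha_j)$ replaced by arbitrary random matrices):

```python
import numpy as np

# For any matrix A (think A[i, j] = L(w_i, alpha_j)):
#   max_j min_i A[i, j]  <=  min_i max_j A[i, j].
rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.normal(size=(8, 8))
    max_min = A.min(axis=0).max()   # max over j of the min over i
    min_max = A.max(axis=1).min()   # min over i of the max over j
    assert max_min <= min_max
print("max-min <= min-max held in all 1000 random trials")
```

The inequality is immediate: for any fixed $(i^*, j^*)$, $\min_i A_{i j^*} \le A_{i^* j^*} \le \max_j A_{i^* j}$.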
so that we can solve the dual problem in lieu of the primal problem. Let's see what these conditions are.
Suppose $f$ and the $g_i$'s are convex, and the $h_i$'s are affine; i.e., there exist $a_i$, $b_i$ so that $h_i(w) = a_i^T w + b_i$. (When $f$ has a Hessian, it is convex if and only if the Hessian is positive semi-definite; for instance, $f(w) = w^T w$ is convex, and all linear and affine functions are also convex. A function can also be convex without being differentiable, but we won't need those more general definitions of convexity here. "Affine" means the same thing as linear, except that we also allow the extra intercept term $b_i$.) Suppose further that the constraints $g_i$ are (strictly) feasible; this means that there exists some $w$ so that $g_i(w) < 0$ for all $i$.
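As a quick illustration of the Hessian test (a sketch using an arbitrary example matrix, not from the text): for a quadratic $f(w) = w^T A w$ with symmetric $A$, the Hessian is $2A$, and convexity can be checked by inspecting its eigenvalues:

```python
import numpy as np

# f(w) = w^T A w with symmetric A has Hessian 2A; f is convex iff 2A is
# positive semi-definite, i.e., all of its eigenvalues are >= 0.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])              # arbitrary symmetric example
hessian = 2.0 * A
eigvals = np.linalg.eigvalsh(hessian)   # eigenvalues of a symmetric matrix
print(eigvals.min() >= 0)               # True -> this quadratic is convex
```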
Under our above assumptions, there must exist $w^*$, $\alpha^*$, $\beta^*$ so that $w^*$ is the solution to the primal problem, $\alpha^*$, $\beta^*$ are the solution to the dual problem, and moreover $p^* = d^* = \mathcal{L}(w^*, \alpha^*, \beta^*)$. Additionally, $w^*$, $\alpha^*$, and $\beta^*$ satisfy the Karush-Kuhn-Tucker (KKT) conditions, which are as follows:

$$\frac{\partial}{\partial w_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0, \quad i = 1, \ldots, n$$
$$\frac{\partial}{\partial \beta_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0, \quad i = 1, \ldots, l$$
$$\alpha_i^* \, g_i(w^*) = 0, \quad i = 1, \ldots, k$$
$$g_i(w^*) \le 0, \quad i = 1, \ldots, k$$
$$\alpha_i^* \ge 0, \quad i = 1, \ldots, k$$
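For the assumed toy problem minimize $w^2$ subject to $1 - w \le 0$ (an illustrative example, not from the text), the optimum can be worked out by hand as $w^* = 1$, $\alpha^* = 2$. The following sketch checks each KKT condition at that point (there are no equality constraints, so the $\beta$ condition is vacuous):

```python
# Assumed toy problem: minimize f(w) = w^2 subject to g(w) = 1 - w <= 0.
# Hand-derived optimum: w* = 1, alpha* = 2, with L(w, alpha) = w^2 + alpha*(1 - w).
w_star, alpha_star = 1.0, 2.0

dL_dw = 2.0 * w_star - alpha_star            # d/dw [w^2 + alpha*(1 - w)]
assert dL_dw == 0.0                          # stationarity: gradient of L vanishes
assert alpha_star * (1.0 - w_star) == 0.0    # complementary slackness
assert (1.0 - w_star) <= 0.0                 # primal feasibility: g(w*) <= 0
assert alpha_star >= 0.0                     # dual feasibility: alpha* >= 0
print("all KKT conditions hold at (w*, alpha*) = (1, 2)")
```

Note that complementary slackness holds here because the constraint is active ($g(w^*) = 0$) while its multiplier is strictly positive.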