
Consider the following, which we'll call the primal optimization problem:

\min_w \; f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \;\; i = 1, \ldots, k, \qquad h_i(w) = 0, \;\; i = 1, \ldots, l.
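
To make the definitions that follow concrete, it helps to carry along a toy instance; the following one-dimensional example is added here purely for illustration and is not part of the general setup:

\min_w \; w^2 \quad \text{s.t.} \quad g_1(w) = 1 - w \le 0,

i.e., minimize w^2 subject to w \ge 1, with no equality constraints (l = 0). By inspection, the minimum is attained at w = 1 with objective value 1.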

To solve it, we start by defining the generalized Lagrangian

L(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w).
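
In the toy instance above, there is a single multiplier \alpha_1 = \alpha, and the generalized Lagrangian is simply

L(w, \alpha) = w^2 + \alpha (1 - w).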

Here, the \alpha_i's and \beta_i's are the Lagrange multipliers. Consider the quantity

\theta_P(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta).

Here, the “P” subscript stands for “primal.” Let some w be given. If w violates any of the primal constraints (i.e., if either g_i(w) > 0 or h_i(w) \ne 0 for some i), then you should be able to verify that

\theta_P(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \left[ f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w) \right] = \infty.

Conversely, if the constraints are indeed satisfied for a particular value of w, then \theta_P(w) = f(w). Hence,

\theta_P(w) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ \infty & \text{otherwise.} \end{cases}
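
We can check this in the toy instance: if w < 1, then g_1(w) = 1 - w > 0, and letting \alpha \to \infty drives w^2 + \alpha(1 - w) to infinity; if w \ge 1, then \alpha g_1(w) \le 0 for every \alpha \ge 0, so the maximum is attained at \alpha = 0. Hence

\theta_P(w) = \begin{cases} w^2 & \text{if } w \ge 1 \\ \infty & \text{otherwise.} \end{cases}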

Thus, \theta_P takes the same value as the objective in our problem for all values of w that satisfy the primal constraints, and is positive infinity if the constraints are violated. Hence, if we consider the minimization problem

\min_w \theta_P(w) = \min_w \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta),

we see that it is the same problem as (i.e., it has the same solutions as) our original primal problem. For later use, we also define the optimal value of the objective to be p^* = \min_w \theta_P(w); we call this the value of the primal problem.
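
Continuing the toy instance,

p^* = \min_w \theta_P(w) = \min_{w \ge 1} w^2 = 1,

attained at w^* = 1, matching the solution found by inspection.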

Now, let's look at a slightly different problem. We define

\theta_D(\alpha, \beta) = \min_w L(w, \alpha, \beta).

Here, the “D” subscript stands for “dual.” Note also that whereas in the definition of \theta_P we were optimizing (maximizing) with respect to \alpha, \beta, here we are minimizing with respect to w.
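
In the toy instance, L(w, \alpha) = w^2 + \alpha(1 - w) is a convex quadratic in w, so the inner minimization can be carried out in closed form: setting \partial L / \partial w = 2w - \alpha = 0 gives w = \alpha / 2, and therefore

\theta_D(\alpha) = \frac{\alpha^2}{4} + \alpha \left( 1 - \frac{\alpha}{2} \right) = \alpha - \frac{\alpha^2}{4}.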

We can now pose the dual optimization problem:

\max_{\alpha, \beta \,:\, \alpha_i \ge 0} \theta_D(\alpha, \beta) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta).

This is exactly the same as our primal problem shown above, except that the order of the “max” and the “min” is now exchanged. We also define the optimal value of the dual problem's objective to be d^* = \max_{\alpha, \beta : \alpha_i \ge 0} \theta_D(\alpha, \beta).
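
For the toy instance, the dual problem is \max_{\alpha \ge 0} (\alpha - \alpha^2/4). Setting the derivative 1 - \alpha/2 to zero gives \alpha^* = 2 \ge 0, so

d^* = \theta_D(2) = 2 - 1 = 1 = p^*.

(That the two values coincide here is no accident, as we will see below.)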

How are the primal and the dual problems related? It can easily be shown that

d^* = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta) = p^*.

(You should convince yourself of this; it follows from the fact that the “max min” of a function is always less than or equal to the “min max.”) However, under certain conditions, we will have
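
Spelling the argument out: for any w and any \alpha, \beta with \alpha_i \ge 0,

\theta_D(\alpha, \beta) = \min_{w'} L(w', \alpha, \beta) \;\le\; L(w, \alpha, \beta) \;\le\; \max_{\alpha', \beta' \,:\, \alpha_i' \ge 0} L(w, \alpha', \beta') = \theta_P(w).

Taking the maximum over \alpha, \beta on the left-hand side and the minimum over w on the right-hand side then gives d^* \le p^*.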

d^* = p^*,

so that we can solve the dual problem in lieu of the primal problem. Let's see what these conditions are.

Suppose f and the g_i's are convex, and the h_i's are affine; i.e., there exist a_i, b_i so that h_i(w) = a_i^T w + b_i. (When f has a Hessian, it is convex if and only if the Hessian is positive semi-definite. For instance, f(w) = w^T w is convex; similarly, all linear and affine functions are also convex. A function f can also be convex without being differentiable, but we won't need those more general definitions of convexity here. “Affine” means the same thing as linear, except that we also allow the extra intercept term b_i.) Suppose further that the constraints g_i are (strictly) feasible; this means that there exists some w so that g_i(w) < 0 for all i.
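
Note that the toy instance satisfies all of these conditions: f(w) = w^2 has second derivative 2 > 0 and so is convex, g_1(w) = 1 - w is affine (hence convex), there are no h_i's, and taking w = 2 gives g_1(2) = -1 < 0, so the constraint is strictly feasible.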

Under our above assumptions, there must exist w^*, \alpha^*, \beta^* so that w^* is the solution to the primal problem, \alpha^*, \beta^* are the solution to the dual problem, and moreover p^* = d^* = L(w^*, \alpha^*, \beta^*). Moreover, w^*, \alpha^* and \beta^* satisfy the Karush-Kuhn-Tucker (KKT) conditions, which are as follows:

\frac{\partial}{\partial w_i} L(w^*, \alpha^*, \beta^*) = 0, \quad i = 1, \ldots, n
\frac{\partial}{\partial \beta_i} L(w^*, \alpha^*, \beta^*) = 0, \quad i = 1, \ldots, l
\alpha_i^* \, g_i(w^*) = 0, \quad i = 1, \ldots, k
g_i(w^*) \le 0, \quad i = 1, \ldots, k
\alpha_i^* \ge 0, \quad i = 1, \ldots, k.

The third equation above is called the KKT dual complementarity condition; it implies that if \alpha_i^* > 0, then g_i(w^*) = 0 (i.e., the constraint g_i(w) \le 0 is active, holding with equality).
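
As a quick numerical sanity check, here is a minimal sketch (written for the toy instance above; the helper names theta_P and theta_D are ours, chosen to mirror the notation) that evaluates both objectives on a grid and checks the KKT conditions at (w^*, \alpha^*) = (1, 2):

# Toy instance: min_w w^2  s.t.  1 - w <= 0  (i.e., w >= 1).
import numpy as np

f = lambda w: w ** 2
g = lambda w: 1.0 - w                       # constraint g(w) <= 0

def theta_P(w):
    # max over alpha >= 0 of f(w) + alpha * g(w): infinite when g(w) > 0,
    # otherwise attained at alpha = 0.
    return f(w) if g(w) <= 0 else np.inf

def theta_D(alpha):
    # min over w of w^2 + alpha*(1 - w); dL/dw = 2w - alpha = 0 => w = alpha/2.
    w = alpha / 2.0
    return f(w) + alpha * g(w)

p_star = min(theta_P(w) for w in np.linspace(-3.0, 3.0, 60001))    # ~1, at w* = 1
d_star = max(theta_D(a) for a in np.linspace(0.0, 10.0, 100001))   # ~1, at alpha* = 2
print(p_star, d_star)                       # strong duality: p* = d* = 1

w_s, a_s = 1.0, 2.0                         # candidate (w*, alpha*)
print(2.0 * w_s - a_s)                      # stationarity: dL/dw = 0
print(a_s * g(w_s))                         # dual complementarity: alpha* g(w*) = 0
print(g(w_s) <= 0.0, a_s >= 0.0)            # primal and dual feasibility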

