
Consider the following, which we'll call the primal optimization problem:

\min_w \; f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \;\; i = 1, \ldots, k, \qquad h_i(w) = 0, \;\; i = 1, \ldots, l.
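
To make the definitions that follow concrete, it helps to carry along a toy instance; the following one-dimensional example is added here purely for illustration and is not part of the general setup:

\min_w \; w^2 \quad \text{s.t.} \quad g_1(w) = 1 - w \le 0,

i.e., minimize w^2 subject to w \ge 1, with no equality constraints (l = 0). By inspection, the minimum is attained at w = 1 with objective value 1.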

To solve it, we start by defining the generalized Lagrangian

L(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w).
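
In the toy instance above, there is a single multiplier \alpha_1 = \alpha, and the generalized Lagrangian is simply

L(w, \alpha) = w^2 + \alpha (1 - w).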

Here, the \alpha_i's and \beta_i's are the Lagrange multipliers. Consider the quantity

\theta_P(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta).

Here, the “P” subscript stands for “primal.” Let some w be given. If w violates any of the primal constraints (i.e., if either g_i(w) > 0 or h_i(w) \ne 0 for some i), then you should be able to verify that

\theta_P(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \left[ f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w) \right] = \infty.

Conversely, if the constraints are indeed satisfied for a particular value of w, then \theta_P(w) = f(w). Hence,

\theta_P(w) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ \infty & \text{otherwise.} \end{cases}
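
We can check this in the toy instance: if w < 1, then g_1(w) = 1 - w > 0, and letting \alpha \to \infty drives w^2 + \alpha(1 - w) to infinity; if w \ge 1, then \alpha g_1(w) \le 0 for every \alpha \ge 0, so the maximum is attained at \alpha = 0. Hence

\theta_P(w) = \begin{cases} w^2 & \text{if } w \ge 1 \\ \infty & \text{otherwise.} \end{cases}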

Thus, \theta_P takes the same value as the objective in our problem for all values of w that satisfy the primal constraints, and is positive infinity if the constraints are violated. Hence, if we consider the minimization problem

\min_w \theta_P(w) = \min_w \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta),

we see that it is the same problem as (i.e., it has the same solutions as) our original primal problem. For later use, we also define the optimal value of the objective to be p^* = \min_w \theta_P(w); we call this the value of the primal problem.
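
Continuing the toy instance,

p^* = \min_w \theta_P(w) = \min_{w \ge 1} w^2 = 1,

attained at w^* = 1, matching the solution found by inspection.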

Now, let's look at a slightly different problem. We define

\theta_D(\alpha, \beta) = \min_w L(w, \alpha, \beta).

Here, the “D” subscript stands for “dual.” Note also that whereas in the definition of \theta_P we were optimizing (maximizing) with respect to \alpha, \beta, here we are minimizing with respect to w.
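
In the toy instance, L(w, \alpha) = w^2 + \alpha(1 - w) is a convex quadratic in w, so the inner minimization can be carried out in closed form: setting \partial L / \partial w = 2w - \alpha = 0 gives w = \alpha / 2, and therefore

\theta_D(\alpha) = \frac{\alpha^2}{4} + \alpha \left( 1 - \frac{\alpha}{2} \right) = \alpha - \frac{\alpha^2}{4}.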

We can now pose the dual optimization problem:

\max_{\alpha, \beta \,:\, \alpha_i \ge 0} \theta_D(\alpha, \beta) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta).

This is exactly the same as our primal problem shown above, except that the order of the “max” and the “min” is now exchanged. We also define the optimal value of the dual problem's objective to be d^* = \max_{\alpha, \beta : \alpha_i \ge 0} \theta_D(\alpha, \beta).
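
For the toy instance, the dual problem is \max_{\alpha \ge 0} (\alpha - \alpha^2/4). Setting the derivative 1 - \alpha/2 to zero gives \alpha^* = 2 \ge 0, so

d^* = \theta_D(2) = 2 - 1 = 1 = p^*.

(That the two values coincide here is no accident, as we will see below.)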

How are the primal and the dual problems related? It can easily be shown that

d^* = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta) = p^*.

(You should convince yourself of this; it follows from the fact that the “max min” of a function is always less than or equal to the “min max.”) However, under certain conditions, we will have
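
Spelling the argument out: for any w and any \alpha, \beta with \alpha_i \ge 0,

\theta_D(\alpha, \beta) = \min_{w'} L(w', \alpha, \beta) \;\le\; L(w, \alpha, \beta) \;\le\; \max_{\alpha', \beta' \,:\, \alpha_i' \ge 0} L(w, \alpha', \beta') = \theta_P(w).

Taking the maximum over \alpha, \beta on the left-hand side and the minimum over w on the right-hand side then gives d^* \le p^*.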

d^* = p^*,

so that we can solve the dual problem in lieu of the primal problem. Let's see what these conditions are.

Suppose f and the g_i's are convex, and the h_i's are affine; i.e., there exist a_i, b_i so that h_i(w) = a_i^T w + b_i. (When f has a Hessian, it is convex if and only if the Hessian is positive semi-definite. For instance, f(w) = w^T w is convex; similarly, all linear and affine functions are also convex. A function f can also be convex without being differentiable, but we won't need those more general definitions of convexity here. “Affine” means the same thing as linear, except that we also allow the extra intercept term b_i.) Suppose further that the constraints g_i are (strictly) feasible; this means that there exists some w so that g_i(w) < 0 for all i.
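
Note that the toy instance satisfies all of these conditions: f(w) = w^2 has second derivative 2 > 0 and so is convex, g_1(w) = 1 - w is affine (hence convex), there are no h_i's, and taking w = 2 gives g_1(2) = -1 < 0, so the constraint is strictly feasible.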

Under our above assumptions, there must exist w^*, \alpha^*, \beta^* so that w^* is the solution to the primal problem, \alpha^*, \beta^* are the solution to the dual problem, and moreover p^* = d^* = L(w^*, \alpha^*, \beta^*). Moreover, w^*, \alpha^* and \beta^* satisfy the Karush-Kuhn-Tucker (KKT) conditions, which are as follows:

\frac{\partial}{\partial w_i} L(w^*, \alpha^*, \beta^*) = 0, \quad i = 1, \ldots, n
\frac{\partial}{\partial \beta_i} L(w^*, \alpha^*, \beta^*) = 0, \quad i = 1, \ldots, l
\alpha_i^* \, g_i(w^*) = 0, \quad i = 1, \ldots, k
g_i(w^*) \le 0, \quad i = 1, \ldots, k
\alpha_i^* \ge 0, \quad i = 1, \ldots, k.

The third equation above is called the KKT dual complementarity condition; it implies that if \alpha_i^* > 0, then g_i(w^*) = 0 (i.e., the constraint g_i(w) \le 0 is active, holding with equality).
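
As a quick numerical sanity check, here is a minimal sketch (written for the toy instance above; the helper names theta_P and theta_D are ours, chosen to mirror the notation) that evaluates both objectives on a grid and checks the KKT conditions at (w^*, \alpha^*) = (1, 2):

# Toy instance: min_w w^2  s.t.  1 - w <= 0  (i.e., w >= 1).
import numpy as np

f = lambda w: w ** 2
g = lambda w: 1.0 - w                       # constraint g(w) <= 0

def theta_P(w):
    # max over alpha >= 0 of f(w) + alpha * g(w): infinite when g(w) > 0,
    # otherwise attained at alpha = 0.
    return f(w) if g(w) <= 0 else np.inf

def theta_D(alpha):
    # min over w of w^2 + alpha*(1 - w); dL/dw = 2w - alpha = 0 => w = alpha/2.
    w = alpha / 2.0
    return f(w) + alpha * g(w)

p_star = min(theta_P(w) for w in np.linspace(-3.0, 3.0, 60001))    # ~1, at w* = 1
d_star = max(theta_D(a) for a in np.linspace(0.0, 10.0, 100001))   # ~1, at alpha* = 2
print(p_star, d_star)                       # strong duality: p* = d* = 1

w_s, a_s = 1.0, 2.0                         # candidate (w*, alpha*)
print(2.0 * w_s - a_s)                      # stationarity: dL/dw = 0
print(a_s * g(w_s))                         # dual complementarity: alpha* g(w*) = 0
print(g(w_s) <= 0.0, a_s >= 0.0)            # primal and dual feasibility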

