
But from Equation  [link] , the last term must be zero, so we obtain

$$L(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \, (x^{(i)})^T x^{(j)}.$$

Recall that we got to the equation above by minimizing $L$ with respect to $w$ and $b$. Putting this together with the constraints $\alpha_i \geq 0$ (that we always had) and the constraint [link], we obtain the following dual optimization problem:

$$\begin{aligned}
\max_{\alpha} \quad & W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle \\
\text{s.t.} \quad & \alpha_i \geq 0, \quad i = 1, \ldots, m \\
& \sum_{i=1}^{m} \alpha_i y^{(i)} = 0.
\end{aligned}$$
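To make the dual objective concrete, here is a minimal NumPy sketch that evaluates $W(\alpha)$ on a hypothetical two-point, one-dimensional training set (the data and the candidate $\alpha$ are illustrative, worked out by hand, not produced by any solver):

```python
import numpy as np

# Toy 1-D training set: x = -1 labelled -1, and x = +1 labelled +1.
X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])

def W(alpha):
    """Dual objective: sum_i alpha_i - (1/2) sum_{i,j} y_i y_j alpha_i alpha_j <x_i, x_j>."""
    K = X @ X.T  # Gram matrix of pairwise inner products <x^(i), x^(j)>
    return alpha.sum() - 0.5 * np.sum(np.outer(y * alpha, y * alpha) * K)

alpha = np.array([0.5, 0.5])
# Feasibility: alpha_i >= 0 and sum_i alpha_i y^(i) = 0.
assert np.all(alpha >= 0) and abs(alpha @ y) < 1e-12
print(W(alpha))  # -> 0.5
```

A real solver (such as the SMO algorithm discussed later) would maximize $W(\alpha)$ over all feasible $\alpha$; this sketch only checks feasibility and evaluates the objective at one candidate point.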

You should also be able to verify that the conditions required for $p^* = d^*$ and the KKT conditions ([link]) to hold are indeed satisfied in our optimization problem. Hence, we can solve the dual in lieu of solving the primal problem. Specifically, in the dual problem above, we have a maximization problem in which the parameters are the $\alpha_i$'s. We'll talk later about the specific algorithm that we're going to use to solve the dual problem, but if we are indeed able to solve it (i.e., find the $\alpha$'s that maximize $W(\alpha)$ subject to the constraints), then we can use Equation [link] to go back and find the optimal $w$'s as a function of the $\alpha$'s. Having found $w^*$, by considering the primal problem, it is also straightforward to find the optimal value for the intercept term $b$ as

$$b^* = -\frac{\displaystyle \max_{i : y^{(i)} = -1} {w^*}^T x^{(i)} + \min_{i : y^{(i)} = 1} {w^*}^T x^{(i)}}{2}.$$

(Check for yourself that this is correct.)
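As a sanity check, here is a small NumPy sketch that recovers $w^*$ and $b^*$ from a set of dual variables on a hypothetical one-dimensional data set (the $\alpha$ values are worked out by hand for this toy problem, not produced by a solver):

```python
import numpy as np

# Toy 1-D training set: x = -1 labelled -1, and x = +1 labelled +1.
# For this set, the dual optimum is alpha = [0.5, 0.5] (both points
# end up as support vectors); these values are assumed, worked out by hand.
X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])
alpha = np.array([0.5, 0.5])

# w* = sum_i alpha_i y^(i) x^(i)
w = (alpha * y) @ X

# b* = -( max over negative points of w^T x  +  min over positive points of w^T x ) / 2
scores = X @ w
b = -(scores[y == -1].max() + scores[y == 1].min()) / 2

print(w, b)  # -> [1.] 0.0
```

For this symmetric toy set the separating hyperplane is $w = 1$, $b = 0$, i.e. the classifier thresholds on the sign of $x$, which is what we would expect by inspection.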

Before moving on, let's also take a more careful look at Equation [link], which gives the optimal value of $w$ in terms of (the optimal value of) $\alpha$. Suppose we've fit our model's parameters to a training set, and now wish to make a prediction at a new input point $x$. We would then calculate $w^T x + b$, and predict $y = 1$ if and only if this quantity is bigger than zero. But using [link], this quantity can also be written:

$$w^T x + b = \sum_{i=1}^{m} \alpha_i y^{(i)} (x^{(i)})^T x + b = \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b.$$

Hence, if we've found the $\alpha_i$'s, in order to make a prediction, we have to calculate a quantity that depends only on the inner product between $x$ and the points in the training set. Moreover, we saw earlier that the $\alpha_i$'s will all be zero except for the support vectors. Thus, many of the terms in the sum above will be zero, and we really need to find only the inner products between $x$ and the support vectors (of which there is often only a small number) in order to calculate [link] and make our prediction.

By examining the dual form of the optimization problem, we gained significant insight into the structure of the problem, and were also able to write the entire algorithm in terms of only inner products between input feature vectors. In the next section, we will exploit this property to apply kernels to our classification problem. The resulting algorithm, support vector machines, will be able to learn efficiently in very high dimensional spaces.

Kernels

Back in our discussion of linear regression, we had a problem in which the input $x$ was the living area of a house, and we considered performing regression using the features $x$, $x^2$ and $x^3$ (say) to obtain a cubic function. To distinguish between these two sets of variables, we'll call the "original" input value the input attributes of a problem (in this case, $x$, the living area). When that is mapped to some new set of quantities that are then passed to the learning algorithm, we'll call those new quantities the input features. (Unfortunately, different authors use different terms to describe these two things, but we'll try to use this terminology consistently in these notes.) We will also let $\Phi$ denote the feature mapping, which maps from the attributes to the features. For instance, in our example, we had

$$\Phi(x) = \begin{bmatrix} x \\ x^2 \\ x^3 \end{bmatrix}.$$
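In code, this cubic feature mapping might look like the following sketch (the function name `phi` is our own choice for illustration):

```python
import numpy as np

def phi(x):
    """Map the scalar attribute x (e.g. living area) to the cubic features (x, x^2, x^3)."""
    return np.array([x, x**2, x**3])

print(phi(2.0))  # -> [2. 4. 8.]
```

A learning algorithm would then be run on `phi(x)` instead of the raw attribute `x`.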





Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
