
How can we find the value of $\gamma^{(i)}$? Well, $w/\|w\|$ is a unit-length vector pointing in the same direction as $w$. Since $A$ represents $x^{(i)}$, we therefore find that the point $B$ is given by $x^{(i)} - \gamma^{(i)} \cdot w/\|w\|$. But this point lies on the decision boundary, and all points $x$ on the decision boundary satisfy the equation $w^T x + b = 0$. Hence,

$$w^T\left(x^{(i)} - \gamma^{(i)}\,\frac{w}{\|w\|}\right) + b = 0.$$

Solving for $\gamma^{(i)}$ yields

$$\gamma^{(i)} = \frac{w^T x^{(i)} + b}{\|w\|} = \left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}.$$
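This derivation is easy to check numerically. The following is a minimal sketch (not part of the original notes) that uses arbitrary toy values for $w$, $b$, and $x^{(i)}$, computes $\gamma^{(i)}$ from the formula above, and verifies that the projected point $B$ does lie on the decision boundary.

```python
import numpy as np

# Toy values (assumptions, not from the text): a candidate separator and one point.
w = np.array([2.0, 1.0])
b = -3.0
x_i = np.array([3.0, 2.0])          # plays the role of the point A

# gamma_i as derived above: (w^T x_i + b) / ||w||
gamma_i = (w @ x_i + b) / np.linalg.norm(w)

# The point B = x_i - gamma_i * w / ||w|| should satisfy w^T B + b = 0.
B = x_i - gamma_i * w / np.linalg.norm(w)
print(gamma_i)        # signed distance from x_i to the boundary
print(w @ B + b)      # ~0, confirming B lies on the decision boundary
```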

This was worked out for the case of a positive training example at $A$ in the figure, where being on the "positive" side of the decision boundary is good. More generally, we define the geometric margin of $(w, b)$ with respect to a training example $(x^{(i)}, y^{(i)})$ to be

$$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right).$$

Note that if $\|w\| = 1$, then the functional margin equals the geometric margin; this thus gives us a way of relating these two different notions of margin. Also, the geometric margin is invariant to rescaling of the parameters; i.e., if we replace $w$ with $2w$ and $b$ with $2b$, then the geometric margin does not change. This will in fact come in handy later. Specifically, because of this invariance to the scaling of the parameters, when trying to fit $w$ and $b$ to training data, we can impose an arbitrary scaling constraint on $w$ without changing anything important; for instance, we can demand that $\|w\| = 1$, or $|w_1| = 5$, or $|w_1 + b| + |w_2| = 2$, and any of these can be satisfied simply by rescaling $w$ and $b$.
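The rescaling invariance is also easy to verify directly. The sketch below (with assumed toy values) shows that replacing $(w, b)$ by $(2w, 2b)$ doubles the functional margin but leaves the geometric margin unchanged.

```python
import numpy as np

# Toy example (assumed values) checking the rescaling invariance described above.
w, b = np.array([2.0, 1.0]), -3.0
x_i, y_i = np.array([3.0, 2.0]), 1.0

def functional_margin(w, b, x, y):
    return y * (w @ x + b)

def geometric_margin(w, b, x, y):
    return y * (w @ x + b) / np.linalg.norm(w)

# Functional margin doubles under (2w, 2b); geometric margin stays the same.
print(functional_margin(w, b, x_i, y_i), functional_margin(2 * w, 2 * b, x_i, y_i))
print(geometric_margin(w, b, x_i, y_i), geometric_margin(2 * w, 2 * b, x_i, y_i))
```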

Finally, given a training set $S = \{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$, we also define the geometric margin of $(w, b)$ with respect to $S$ to be the smallest of the geometric margins on the individual training examples:

$$\gamma = \min_{i = 1, \ldots, m} \gamma^{(i)}.$$
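In code, this amounts to computing the per-example geometric margins and taking their minimum. Below is a hedged sketch over a hypothetical toy training set (the data and parameters are assumptions for illustration only).

```python
import numpy as np

# Hypothetical toy training set; labels y in {-1, +1}.
X = np.array([[3.0, 2.0],
              [0.0, 0.0],
              [4.0, 4.0]])
y = np.array([1.0, -1.0, 1.0])
w, b = np.array([2.0, 1.0]), -3.0

# Per-example geometric margins y_i * (w^T x_i + b) / ||w||, then their minimum.
gammas = y * (X @ w + b) / np.linalg.norm(w)
gamma = gammas.min()
print(gammas)   # geometric margin of each training example
print(gamma)    # geometric margin of (w, b) with respect to S
```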

The optimal margin classifier

Given a training set, it seems from our previous discussion that a natural desideratum is to try to find a decision boundary that maximizes the (geometric) margin, since this would reflect a very confident set of predictions on the training set and a good "fit" to the training data. Specifically, this will result in a classifier that separates the positive and the negative training examples with a "gap" (geometric margin).

For now, we will assume that we are given a training set that is linearly separable; i.e., that it is possible to separate the positive and negative examples using some separating hyperplane. How do we find the one that achieves the maximum geometric margin? We can pose the following optimization problem:

$$\begin{aligned}
\max_{\gamma, w, b}\quad & \gamma \\
\text{s.t.}\quad & y^{(i)}(w^T x^{(i)} + b) \ge \gamma, \quad i = 1, \ldots, m \\
& \|w\| = 1.
\end{aligned}$$

I.e., we want to maximize $\gamma$, subject to each training example having functional margin at least $\gamma$. The $\|w\| = 1$ constraint moreover ensures that the functional margin equals the geometric margin, so we are also guaranteed that all the geometric margins are at least $\gamma$. Thus, solving this problem will result in $(w, b)$ with the largest possible geometric margin with respect to the training set.
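To make the constraints concrete, the following sketch (with assumed toy data and an assumed candidate solution) only checks whether a given $(\gamma, w, b)$ is feasible for this problem; it does not solve the optimization.

```python
import numpy as np

# Hypothetical toy data and a candidate solution (assumptions for illustration).
X = np.array([[3.0, 2.0],
              [0.0, 0.0],
              [4.0, 4.0]])
y = np.array([1.0, -1.0, 1.0])

scale = np.linalg.norm([2.0, 1.0])
w = np.array([2.0, 1.0]) / scale    # enforce ||w|| = 1
b = -3.0 / scale                    # rescale b consistently
gamma = 1.3

margin_ok = np.all(y * (X @ w + b) >= gamma)     # y^(i)(w^T x^(i) + b) >= gamma for all i
unit_norm_ok = np.isclose(np.linalg.norm(w), 1.0)
print(margin_ok and unit_norm_ok)                # True: this (gamma, w, b) is feasible
```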

If we could solve the optimization problem above, we'd be done. But the "$\|w\| = 1$" constraint is a nasty (non-convex) one, and this problem certainly isn't in any format that we can plug into standard optimization software to solve. So, let's try transforming the problem into a nicer one. Consider:


