
Notation

To make our discussion of SVMs easier, we first need to introduce a new notation for talking about classification. We will consider a linear classifier for a binary classification problem with labels $y$ and features $x$. From now on, we'll use $y \in \{-1, 1\}$ (instead of $\{0, 1\}$) to denote the class labels. Also, rather than parameterizing our linear classifier with the vector $\theta$, we will use parameters $w, b$, and write our classifier as

$$h_{w,b}(x) = g(w^T x + b).$$

Here, $g(z) = 1$ if $z \geq 0$, and $g(z) = -1$ otherwise. This "$w, b$" notation allows us to explicitly treat the intercept term $b$ separately from the other parameters. (We also drop the convention we had previously of letting $x_0 = 1$ be an extra coordinate in the input feature vector.) Thus, $b$ takes the role of what was previously $\theta_0$, and $w$ takes the role of $[\theta_1 \ldots \theta_n]^T$.

Note also that, from our definition of $g$ above, our classifier will directly predict either $1$ or $-1$ (cf. the perceptron algorithm), without first going through the intermediate step of estimating the probability of $y$ being $1$ (which is what logistic regression did).
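As a minimal sketch of this notation, the classifier can be written in a few lines of plain Python. The helper names (`g`, `dot`, `h`) and the numeric values of `w`, `b`, and the inputs are assumptions chosen only for illustration; they do not come from the notes.

```python
def g(z):
    """Threshold function: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def h(w, b, x):
    """Linear classifier h_{w,b}(x) = g(w^T x + b)."""
    return g(dot(w, x) + b)


# Hypothetical parameters and inputs, for illustration only.
w = [2.0, -1.0]
b = -1.0

print(h(w, b, [3.0, 1.0]))  # w^T x + b = 6 - 1 - 1 = 4, so the prediction is 1
print(h(w, b, [0.0, 2.0]))  # w^T x + b = 0 - 2 - 1 = -3, so the prediction is -1
```

Note that the intercept `b` is passed separately, and the input `x` carries no extra constant coordinate.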

Functional and geometric margins

Let's formalize the notions of the functional and geometric margins. Given a training example $(x^{(i)}, y^{(i)})$, we define the functional margin of $(w, b)$ with respect to the training example as

$$\hat{\gamma}^{(i)} = y^{(i)} (w^T x + b).$$

Note that if $y^{(i)} = 1$, then for the functional margin to be large (i.e., for our prediction to be confident and correct), we need $w^T x + b$ to be a large positive number. Conversely, if $y^{(i)} = -1$, then for the functional margin to be large, we need $w^T x + b$ to be a large negative number. Moreover, if $y^{(i)} (w^T x + b) > 0$, then our prediction on this example is correct. (Check this yourself.) Hence, a large functional margin represents a confident and correct prediction.
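The sign and size of the functional margin can be checked numerically. The following is a small sketch in plain Python; the function name `functional_margin` and all numeric values are hypothetical, chosen only to illustrate the three cases.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def functional_margin(w, b, x, y):
    """Functional margin y * (w^T x + b) for a label y in {-1, 1}."""
    return y * (dot(w, x) + b)


# Hypothetical parameters, for illustration only.
w, b = [2.0, -1.0], -1.0

# Positive example on the positive side: margin is positive (correct).
m_pos = functional_margin(w, b, [3.0, 1.0], 1)

# Negative example on the negative side: margin is again positive (correct).
m_neg = functional_margin(w, b, [0.0, 2.0], -1)

# Positive example on the negative side: margin is negative (misclassified).
m_wrong = functional_margin(w, b, [0.0, 2.0], 1)

print(m_pos, m_neg, m_wrong)
```

A positive value corresponds to a correct prediction, and a larger positive value to a more confident one.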

For a linear classifier with the choice of $g$ given above (taking values in $\{-1, 1\}$), however, there is one property of the functional margin that makes it a poor measure of confidence. Given our choice of $g$, if we replace $w$ with $2w$ and $b$ with $2b$, then since $g(w^T x + b) = g(2 w^T x + 2b)$, this does not change $h_{w,b}(x)$ at all. That is, $g$, and hence also $h_{w,b}(x)$, depends only on the sign, not the magnitude, of $w^T x + b$. However, replacing $(w, b)$ with $(2w, 2b)$ also multiplies our functional margin by a factor of 2. Thus, by exploiting our freedom to scale $w$ and $b$, we can make the functional margin arbitrarily large without changing anything meaningful. Intuitively, it might therefore make sense to impose some sort of normalization condition, such as $\|w\|_2 = 1$; i.e., we might replace $(w, b)$ with $(w / \|w\|_2, b / \|w\|_2)$ and instead consider the functional margin of $(w / \|w\|_2, b / \|w\|_2)$. We'll come back to this later.
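The scaling argument can be illustrated with a short sketch in plain Python. The helper `normalize` and the numeric values are assumptions made for this example: doubling the parameters doubles the functional margin, while dividing by the Euclidean norm of the weight vector gives the same value for both parameterizations.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def functional_margin(w, b, x, y):
    """Functional margin y * (w^T x + b)."""
    return y * (dot(w, x) + b)


def normalize(w, b):
    """Rescale (w, b) so that the Euclidean norm of w equals 1."""
    norm = math.sqrt(dot(w, w))
    return [wi / norm for wi in w], b / norm


# Hypothetical parameters and example, for illustration only.
w, b = [3.0, 4.0], -5.0
x, y = [3.0, 4.0], 1

w2, b2 = [2 * wi for wi in w], 2 * b  # the rescaled parameters (2w, 2b)

margin = functional_margin(w, b, x, y)            # 9 + 16 - 5 = 20
margin_scaled = functional_margin(w2, b2, x, y)   # twice as large

# After normalization, both parameterizations give the same margin.
margin_unit = functional_margin(*normalize(w, b), x, y)
margin_scaled_unit = functional_margin(*normalize(w2, b2), x, y)

print(margin, margin_scaled, margin_unit, margin_scaled_unit)
```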

Given a training set $S = \{(x^{(i)}, y^{(i)}); i = 1, \ldots, m\}$, we also define the functional margin of $(w, b)$ with respect to $S$ to be the smallest of the functional margins of the individual training examples. Denoted by $\hat{\gamma}$, this can therefore be written:

$$\hat{\gamma} = \min_{i = 1, \ldots, m} \hat{\gamma}^{(i)}.$$
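In code, the functional margin with respect to a training set is just a minimum over the per-example margins. Here is a minimal sketch in plain Python, assuming a toy training set `S` of `(x, y)` pairs invented for illustration; the function name is likewise hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def functional_margin_of_set(w, b, examples):
    """Smallest functional margin y * (w^T x + b) over all (x, y) pairs."""
    return min(y * (dot(w, x) + b) for x, y in examples)


# Hypothetical parameters and toy training set, for illustration only.
w, b = [2.0, -1.0], -1.0
S = [
    ([3.0, 1.0], 1),   # individual margin 4.0
    ([0.0, 2.0], -1),  # individual margin 3.0
    ([1.0, 0.5], 1),   # individual margin 0.5, the closest call
]

gamma_hat = functional_margin_of_set(w, b, S)
print(gamma_hat)
```

The result is determined entirely by the example with the smallest margin, here the third one.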

Next, let's talk about geometric margins. Consider the picture below:

[Figure: measuring the distance from the line to each type of data point.]

The decision boundary corresponding to $(w, b)$ is shown, along with the vector $w$. Note that $w$ is orthogonal (at $90^\circ$) to the separating hyperplane. (You should convince yourself that this must be the case.) Consider the point at A, which represents the input $x^{(i)}$ of some training example with label $y^{(i)} = 1$. Its distance to the decision boundary, $\gamma^{(i)}$, is given by the line segment AB.
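One way to check the orthogonality claim is sketched below. The points $x_1$ and $x_2$ are symbols introduced here for illustration (they are not in the original notes): they denote any two points lying on the decision boundary.

```latex
% Both points lie on the decision boundary, so each satisfies w^T x + b = 0.
\begin{align*}
w^T x_1 + b &= 0, \\
w^T x_2 + b &= 0.
\end{align*}
% Subtracting the second equation from the first eliminates b:
\[
w^T (x_1 - x_2) = 0.
\]
```

Since $x_1 - x_2$ can be any direction lying within the boundary, $w$ has zero inner product with every such direction, which is what it means for $w$ to be orthogonal to the separating hyperplane.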

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4