
And then you run exactly the same support vector machine algorithm, only everywhere you see these inner products, you replace them with that kernel. What you've just done is take a support vector machine and implicitly replace each of your feature vectors x with a very high dimensional feature vector.

It turns out that the Gaussian kernel corresponds to a feature vector that's infinite dimensional. Nonetheless, you can run a support vector machine in a finite amount of time, even though you're working with infinite dimensional feature vectors, because all you ever need to do is compute these kernel values, and you don't ever need to represent these infinite dimensional feature vectors explicitly.
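To make that concrete, here is a minimal sketch, not from the lecture itself, of the Gaussian kernel and the kernel (Gram) matrix an SVM solver would consume in place of explicit inner products; the function names and the default sigma are illustrative.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)).

    This equals an inner product <phi(x), phi(z)> in an infinite
    dimensional feature space, but phi is never formed explicitly.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def gram_matrix(X, sigma=1.0):
    """Kernel matrix K[i, j] = K(x_i, x_j) over the training set X.

    An SVM solver only ever needs these m x m kernel values, which is
    why training stays finite even with an infinite dimensional phi.
    """
    m = X.shape[0]
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = gaussian_kernel(X[i], X[j], sigma)
    return K
```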

Why is this a good idea? It turns out – I think I started off talking about support vector machines by saying that we wanted to develop non-linear learning algorithms. So here's one useful picture to keep in mind, which is this: let's say your original data – I didn't mean to draw that slanted. Let's say you have one-dimensional input data. You just have one feature x in R. What a kernel does is the following. It takes your original input data and maps it to a very high dimensional feature space.

In the case of Gaussian kernels, an infinite dimensional feature space – for pedagogical reasons, I'll draw two dimensions here. So say [inaudible] very high dimensional feature space where – like so. So it takes all your data in R^1 and maps it to R infinity, and then you run a support vector machine in this infinite dimensional – or at least extremely high dimensional – space, and you find the optimal margin classifier; in other words, the classifier that separates your data in this very high dimensional space with the largest possible geometric margin.

In this example that I just drew, whereas your data was not linearly separable in your original one-dimensional space, when you map it to this much higher dimensional space, it becomes linearly separable, so you can use a linear classifier to [inaudible] data that is not linearly separable in your original space. This is how support vector machines output nonlinear decision boundaries, and in the entire process, all you ever need to do is solve convex optimization problems. Questions about any of this?
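As a small illustration of that picture (the data values and the explicit feature map below are made up for the example, not taken from the lecture), a 1-D dataset that is not linearly separable can become separable after mapping each x to a higher dimensional phi(x):

```python
import numpy as np

# A 1-D dataset that is not linearly separable: the positives sit on
# both sides of the negatives (label +1 when |x| > 1, else -1).
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.where(np.abs(x) > 1.0, 1, -1)

# Explicit feature map phi(x) = (x, x^2).  In this 2-D space the two
# classes are separated by the linear boundary x^2 = 1.
phi = np.column_stack([x, x ** 2])

# A linear decision rule in the mapped space: sign(w . phi(x) + b).
w, b = np.array([0.0, 1.0]), -1.0
predictions = np.sign(phi @ w + b)
print(np.all(predictions == y))   # True: separable after the mapping
```

The Gaussian kernel plays the same role, except that phi is infinite dimensional and is never written out; the SVM only touches the kernel values.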

Student: [Inaudible] sigma?

Instructor (Andrew Ng): Yeah, so sigma is – let's see. Well, I was going to talk about [inaudible] later. One way to choose sigma is to set aside a small amount of your data and train an SVM using, say, two thirds of your data. Try different values of sigma, then see which one works best on a separate hold-out cross validation set – on a separate set that you're testing on.
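A minimal sketch of that hold-out procedure, assuming hypothetical helpers train_svm and error_rate that stand in for your SVM trainer and evaluator (they are not defined in the lecture):

```python
import numpy as np

def choose_sigma(X, y, candidate_sigmas, train_svm, error_rate,
                 train_frac=2/3, seed=0):
    """Pick sigma by hold-out validation: train on roughly two thirds of
    the data for each candidate sigma and keep the value with the lowest
    error on the held-out third."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    train_idx, val_idx = idx[:n_train], idx[n_train:]

    best_sigma, best_err = None, float("inf")
    for sigma in candidate_sigmas:
        model = train_svm(X[train_idx], y[train_idx], sigma)
        err = error_rate(model, X[val_idx], y[val_idx])
        if err < best_err:
            best_sigma, best_err = sigma, err
    return best_sigma
```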

This is similar to some of the learning algorithms we talked about before – locally weighted linear regression had a bandwidth parameter – so there are a number of parameters in some of these algorithms that you can choose by setting aside some data to test on. I'll talk more about model selection [inaudible] explicitly. Are there other questions?





Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
