
And if you fill in a bunch more points, you get a curve like that, and then you can keep going. Let’s say for a point like that, you can ask what’s the probability of Y being one? Well, if it’s that far out, then clearly, it belongs to this rightmost Gaussian, and so the probability of Y being a one would be very high; it would be almost one, okay?

And so you can repeat this exercise for a bunch of points. All right, compute P of Y equals one given X for a bunch of points, and if you connect up these points, you find that the curve you get [Pause] plotted takes the form of a sigmoid function, okay?

So, in other words, when you make the assumptions under the Gaussian discriminant analysis model, that P of X given Y is Gaussian, when you go back and compute what P of Y given X is, you actually get back exactly the same sigmoid function that we’re using in logistic regression, okay?
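As a sketch of why the sigmoid appears, assume a one-dimensional X, a variance σ² shared by both Gaussians, and φ = P(Y = 1). Dividing Bayes' rule through by its numerator and simplifying the ratio of the two Gaussian densities gives the form below. The names θ₀ and θ₁ are introduced here only for illustration.

```latex
\begin{align*}
P(y=1 \mid x)
  &= \frac{1}{1 + \dfrac{p(x \mid y=0)\,(1-\phi)}{p(x \mid y=1)\,\phi}}
   = \frac{1}{1 + \exp\!\bigl(-(\theta_1 x + \theta_0)\bigr)}, \\
\theta_1 &= \frac{\mu_1 - \mu_0}{\sigma^2}, \qquad
\theta_0 = \frac{\mu_0^2 - \mu_1^2}{2\sigma^2} + \log\frac{\phi}{1-\phi}.
\end{align*}
```

The quadratic terms in x cancel only because the two classes share the same variance, which is what leaves a linear function of x inside the exponential.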

But it turns out the key difference is that Gaussian discriminant analysis will end up choosing a different position and steepness of the sigmoid than logistic regression would. Is there a question?

Student: I’m just wondering, the Gaussian of P of Y [inaudible] you do?

Instructor (Andrew Ng) :No, let’s see. The Gaussian – so this Gaussian is P of X given Y = 1, and this Gaussian is P of X given Y = 0; does that make sense? Anything else?

Student: Okay.

Instructor (Andrew Ng) :Yeah?

Student: When you were drawing all the dots, how did you decide what P of Y given X was?

Instructor (Andrew Ng) :What – say that again.

Student: I’m sorry. Could you go over how you figured out where to draw each dot?

Instructor (Andrew Ng) :Let’s see, okay. So the computation is as follows, right? The steps are I have the training set, and so given my training set, I’m going to fit a Gaussian discriminant analysis model to it, and what that means is I’ll build a model for P of X given Y = 1. I’ll build a model for P of X given Y = 0, and I’ll also fit a Bernoulli distribution to P of Y, okay?

So, in other words, given my training set, I’ll fit P of X given Y and P of Y to my data, and now I’ve chosen my parameters phi, mu0, mu1, and sigma, okay? Then this is the process I went through to plot all these dots, right? It’s just I pick a point on the X axis, and then I compute P of Y given X for that value of X, and P of Y = 1 conditioned on X will be some value between zero and one. It’ll be some real number, and whatever that real number is, I then plot it on the vertical axis, okay?

And the way I compute P of Y = 1 conditioned on X is I would use these quantities. I would use P of X given Y and P of Y, and, sort of, plug them into Bayes rule, and that allows me to compute P of Y given X from these three quantities; does that make sense?
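The steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: X is one-dimensional, both classes share one variance, and the training set below is made up. The function names (`fit_gda_1d`, `gaussian_pdf`, `posterior_y1`) are hypothetical and do not come from the lecture.

```python
import math

def fit_gda_1d(xs, ys):
    """Maximum-likelihood fit of 1-D GDA with a variance shared by both classes."""
    n = len(xs)
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    phi = len(x1) / n                    # Bernoulli parameter, P(y = 1)
    mu0 = sum(x0) / len(x0)              # mean of the class-0 Gaussian
    mu1 = sum(x1) / len(x1)              # mean of the class-1 Gaussian
    # Shared variance: average squared deviation from each point's own class mean.
    sigma2 = sum((x - (mu1 if y == 1 else mu0)) ** 2
                 for x, y in zip(xs, ys)) / n
    return phi, mu0, mu1, sigma2

def gaussian_pdf(x, mu, sigma2):
    """Density of a 1-D Gaussian with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def posterior_y1(x, phi, mu0, mu1, sigma2):
    """P(y = 1 | x) from Bayes' rule, using p(x | y) and P(y)."""
    numerator = gaussian_pdf(x, mu1, sigma2) * phi
    denominator = numerator + gaussian_pdf(x, mu0, sigma2) * (1.0 - phi)
    return numerator / denominator

# Made-up training set: class 0 on the left, class 1 on the right.
x_train = [-2.0, -1.0, 0.0, 2.0, 3.0, 4.0]
y_train = [0, 0, 0, 1, 1, 1]
phi, mu0, mu1, sigma2 = fit_gda_1d(x_train, y_train)

# "Pick a point on the X axis, compute P(y = 1 | x), plot it on the vertical axis."
grid = [-4.0 + i for i in range(11)]     # -4, -3, ..., 6
dots = [posterior_y1(x, phi, mu0, mu1, sigma2) for x in grid]
```

Each entry of `dots` is the height of one plotted dot. With this toy data the values rise from near zero on the left to near one on the right, tracing out the sigmoid shape.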

Student: Yeah.

Instructor (Andrew Ng) :Was there something more that –

Student: And how did you model P of X; is that –

Instructor (Andrew Ng) :Oh, okay. Yeah, so – well, got this right here. So P of X can be written as, right, so P of X given Y = 0 × P of Y = 0 + P of X given Y = 1 × P of Y = 1, right? And so each of these terms, P of X given Y and P of Y, these are terms I can get directly out of my Gaussian discriminant analysis model. Each of these terms is something that my model gives me directly, so they get plugged in as the denominator, and by doing that, that’s how I compute P of Y = 1 given X, make sense?
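Written out, the spoken formula corresponds to Bayes' rule with the denominator expanded by the law of total probability. The notation here is a reconstruction from what is said, not a copy of the board.

```latex
\[
P(y=1 \mid x)
  = \frac{p(x \mid y=1)\,P(y=1)}
         {p(x \mid y=0)\,P(y=0) + p(x \mid y=1)\,P(y=1)}
\]
```

Every factor on the right-hand side is either one of the two fitted Gaussians or the fitted Bernoulli distribution, so no separate model for P of X is needed.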

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4