And it turns out, generative learning algorithms often do surprisingly well even when these modeling assumptions are not met. But one tradeoff is that, by making stronger assumptions about the data, Gaussian discriminant analysis often needs less data in order to fit, like, an okay model, even if there's less training data.
Whereas, in contrast, logistic regression, by making weaker assumptions, is more robust to errors in your modeling assumptions, but sometimes it takes a slightly larger training set to fit than Gaussian discriminant analysis. Question?
Student: In order to make any assumption about the number [inaudible], but here we assume that p(y = 1) is equal to the number of [inaudible]. Is that true when the number of samples is small?
Instructor (Andrew Ng): Okay. So let's see. So there's a question of, is this true? What was that? Let me translate that differently. So the modeling assumptions are made independently of the size of your training set, right? So, like, in least squares regression, well, in all of these models I'm assuming that these are random variables drawn from some distribution, and then, finally, I'm given a single training set and use that to estimate the parameters of the distribution, right?
Student: So what's the probability of y = 1?
Instructor (Andrew Ng): Probability of y = 1?
Student: Yeah, you used the –
Instructor (Andrew Ng): Sort of. This goes back to the philosophy of maximum likelihood estimation, right? I'm assuming that p(y) is equal to phi to the y times one minus phi to the one minus y. So I'm assuming that there's some true value of phi generating all my data, and then, well, when I write this, I guess, maybe what I should write isn't, so when I write this, I guess there are really two values of phi. One is the true underlying value of phi that was used to generate the data, and then there's the maximum likelihood estimate of the value of phi. And so when I was writing those formulas earlier, those formulas for phi, and mu0, and mu1 were really the maximum likelihood estimates for phi, mu0, and mu1, and that's different from the true underlying values of phi, mu0, and mu1, but –
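To make the distinction concrete, the Bernoulli model being described, and the maximum likelihood estimate it leads to, can be written out. (This derivation is not in the transcript; it is the standard formulation the instructor is referring to, with m denoting the number of training examples.)

```latex
% Bernoulli model for the label:
p(y) = \phi^{y} (1 - \phi)^{1 - y}, \qquad y \in \{0, 1\}

% Log-likelihood over m training examples:
\ell(\phi) = \sum_{i=1}^{m} \left[ y^{(i)} \log \phi + (1 - y^{(i)}) \log (1 - \phi) \right]

% Setting the derivative to zero gives the MLE:
\frac{\partial \ell}{\partial \phi} = 0
\;\Longrightarrow\;
\hat{\phi} = \frac{1}{m} \sum_{i=1}^{m} 1\{ y^{(i)} = 1 \}
```

So the maximum likelihood estimate of phi is simply the fraction of training examples with label 1, while the true phi is a fixed but unknown quantity of the distribution that generated the data.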
Student: [Off mic].
Instructor (Andrew Ng): Yeah, right. So the maximum likelihood estimate comes from the data, and there's some, sort of, true underlying value of phi that I'm trying to estimate, and my maximum likelihood estimate is my attempt to estimate the true value. But, you know, by notational convention we often just write it as phi as well, without bothering to distinguish between the maximum likelihood value and the true underlying value that I'm assuming is out there, and that I'm only hoping to estimate.
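As a minimal numerical sketch (not part of the lecture), the maximum likelihood estimates for phi, mu0, and mu1 discussed here can be computed from a labeled dataset in a few lines. The function name `gda_mle` and the toy data are illustrative assumptions:

```python
import numpy as np

def gda_mle(X, y):
    """Closed-form maximum likelihood estimates for the GDA parameters.

    X: (m, n) array of feature vectors; y: (m,) array of 0/1 labels.
    phi is the fraction of positive examples (MLE of p(y = 1));
    mu0 and mu1 are the per-class means of the feature vectors.
    """
    phi = np.mean(y == 1)          # MLE of p(y = 1)
    mu0 = X[y == 0].mean(axis=0)   # mean of the negative examples
    mu1 = X[y == 1].mean(axis=0)   # mean of the positive examples
    return phi, mu0, mu1

# Toy data: 4 examples, 2 features, balanced labels.
X = np.array([[0.0, 1.0], [1.0, 0.0], [4.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])
phi, mu0, mu1 = gda_mle(X, y)
# phi = 0.5, mu0 = [0.5, 0.5], mu1 = [5.0, 5.0]
```

These estimates are what a finite training set actually gives you; as the student's question suggests, with few samples they can be far from the true underlying parameters.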
Actually, yeah, so for questions like these about maximum likelihood and so on, I would point to the Friday discussion section as a good time to ask questions about, sort of, probabilistic definitions like these as well. Are there any other questions? No? Great. Okay.