So the maximum likelihood estimate for phi would be the sum over i of y(i), divided by m, or written alternatively as the sum over all your training examples of the indicator y(i) = 1, divided by m, okay? In other words, the maximum likelihood estimate for the Bernoulli parameter phi is just the fraction of training examples with label one, with y equals 1. The maximum likelihood estimate for mu0 is this, okay? You should stare at this for a second and see if it makes sense.
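To make this concrete, here is a minimal NumPy sketch of that estimate, using a small made-up label vector (the data is hypothetical, just for illustration):

```python
import numpy as np

# Hypothetical binary labels y(i) for m = 5 training examples.
y = np.array([1, 0, 1, 1, 0])

# Maximum likelihood estimate of phi: the fraction of examples with label 1.
# For 0/1 labels this is the same as sum(y) / m.
phi = np.mean(y == 1)
```

With three of the five labels equal to one, this gives phi = 0.6.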
Actually, I’ll just write the one for mu1 on the next board while you do that. Okay? So what this is – the denominator is the sum over your training set of the indicator y(i) = 0. So for every training example for which y(i) = 0, this will increment the count by one, all right?
So the denominator is just the number of examples with label zero, all right? And then the numerator will be, let’s see, the sum from i = 1 to m, and every time y(i) is equal to 0, this indicator will be a one, and otherwise it will be zero, and so this indicator function means that you’re including only the terms for which y(i) is equal to zero, because for all the terms where y(i) is equal to one, this summand will be equal to zero. And then you multiply that by x(i), and so the numerator is really the sum of the x(i)’s corresponding to examples where the class labels were zero, okay? Raise your hand if this makes sense. Okay, cool.
So just to say this in plain words, this just means: look through your training set, find all the examples for which y = 0, and take the average of the value of x over all the examples for which y = 0. So take all your negative training examples and average their values of x, and that’s mu0, okay?
If this notation is still a little bit cryptic – if you’re still not sure why this equation translates into what I just said – do go home and stare at it for a while until it just makes sense. This is, sort of, no surprise. It just says that to estimate the mean for the negative examples, you take all your negative examples and average them. So no surprise, but this is useful practice with the notation.
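The class-conditional means described above can be sketched in NumPy like this; the features and labels here are assumed toy data, not anything from the lecture:

```python
import numpy as np

# Hypothetical 2-D features x(i) and binary labels y(i) (toy data).
X = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [2.0, 2.0],
              [0.0, 1.0]])
y = np.array([1, 0, 1, 0])

# mu0: sum of indicator{y(i) = 0} * x(i), divided by the number of
# label-0 examples -- i.e. just the average x over negative examples.
mu0 = X[y == 0].mean(axis=0)

# mu1: the same thing over the positive examples.
mu1 = X[y == 1].mean(axis=0)
```

The boolean mask `y == 0` plays exactly the role of the indicator function in the formula: it selects the rows of X whose label is zero before averaging.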
[Inaudible] derive the maximum likelihood estimate for sigma. I won’t do that here; you can read that in the notes yourself. And so, having fit the parameters phi, mu0, mu1, and sigma to your data, you now need to make a prediction. You know, when you’re given a new value of x, when you’re given a new cancer, you need to predict whether it’s malignant or benign.
Your prediction is then going to be, let’s say, the most likely value of y given x. I should write a semicolon and the parameters there. I’ll just leave that off – which is the arg max over y of p of y given x, by Bayes’ rule, all right? And that is, in turn, just that, because the denominator p of x doesn’t depend on y, and if p of y is uniform.
In other words, if each of your classes is equally likely – so if p of y takes the same value for all values of y – then this is just the arg max over y of p of x given y, okay?
This happens sometimes, maybe not very often, so usually you end up using the full formula, where you compute p of x given y and p of y using your model, okay?
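Putting the prediction step together, here is a hedged sketch of the arg max over y of p(x given y) times p(y), for Gaussian class-conditionals with a shared covariance sigma. The function name and the example numbers are my own, chosen for illustration:

```python
import numpy as np

def gda_predict(x, phi, mu0, mu1, sigma):
    """Return arg max over y of p(x | y) * p(y) for two Gaussian
    class-conditionals sharing covariance sigma (a sketch, not the
    lecture's exact code)."""
    inv = np.linalg.inv(sigma)

    def log_gaussian(mu):
        # Log of the Gaussian density up to a constant: with a shared
        # sigma, the normalizer is identical for both classes and
        # cancels in the arg max, so we can drop it.
        d = x - mu
        return -0.5 * d @ inv @ d

    # log p(x | y) + log p(y) for each class; compare in log space.
    log_p0 = log_gaussian(mu0) + np.log(1 - phi)
    log_p1 = log_gaussian(mu1) + np.log(phi)
    return int(log_p1 > log_p0)

# Toy usage: class means at the origin and at (2, 2), identity covariance.
pred = gda_predict(np.array([1.8, 1.9]), 0.5,
                   np.zeros(2), np.array([2.0, 2.0]), np.eye(2))
```

A point near (2, 2) gets label 1 and a point near the origin gets label 0, since with equal priors the decision just compares distances to the two means under sigma.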