
And so having estimated all these parameters, when you’re given a new piece of email that you want to classify, you can then compute P of Y given X using Bayes rule, right? Same as before because together these parameters give you a model for P of X given Y and for P of Y, and by using Bayes rule, given these two terms, you can compute P of Y given X, and there’s your spam classifier, okay? Turns out we need one more elaboration to this idea, but let me check if there are questions about this so far.
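As a rough illustration of the classification step described here, the sketch below computes P(y = 1 | x) from the class prior and the per-word probabilities using Bayes rule. The function name `posterior_spam` and its parameter names are assumptions made for this example, not names from the lecture, and it assumes every probability lies strictly between 0 and 1.

```python
import math


def posterior_spam(x, phi_y, phi_j_given_1, phi_j_given_0):
    """Return P(y=1 | x) for a binary word-presence vector x.

    phi_y            : P(y = 1), the prior probability of spam
    phi_j_given_1[j] : P(x_j = 1 | y = 1)
    phi_j_given_0[j] : P(x_j = 1 | y = 0)
    Hypothetical helper; assumes all probabilities are strictly in (0, 1).
    """
    # Work in log space so a product over many words does not underflow.
    log_p1 = math.log(phi_y)
    log_p0 = math.log(1.0 - phi_y)
    for xj, p1, p0 in zip(x, phi_j_given_1, phi_j_given_0):
        # Naive Bayes assumption: words are independent given the label.
        log_p1 += math.log(p1 if xj == 1 else 1.0 - p1)
        log_p0 += math.log(p0 if xj == 1 else 1.0 - p0)
    # Bayes rule: P(y=1|x) = P(x|y=1)P(y=1) / [P(x|y=1)P(y=1) + P(x|y=0)P(y=0)]
    shift = max(log_p1, log_p0)
    num = math.exp(log_p1 - shift)
    den = num + math.exp(log_p0 - shift)
    return num / den
```

For instance, with a two-word vocabulary, `posterior_spam([1, 0], 0.5, [0.8, 0.1], [0.2, 0.5])` weighs 0.5 × 0.8 × 0.9 = 0.36 for spam against 0.5 × 0.2 × 0.5 = 0.05 for non-spam, giving a posterior of 0.36 / 0.41.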

Student: So does this model depend on the number of inputs?

Instructor (Andrew Ng) :What do you mean, number of inputs, the number of features?

Student: No, number of samples.

Instructor (Andrew Ng) :Well, M is the number of training examples, so given M training examples, this is the formula for the maximum likelihood estimate of the parameters, right? So other questions, does it make sense? So M is the number of training examples, and when you have M training examples, you plug them into this formula, and that’s how you compute the maximum likelihood estimates.
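The formula referred to here is not reproduced in this part of the transcript. A minimal sketch, assuming the usual counting form of the maximum likelihood estimates for Naive Bayes with binary features (fraction of spam emails, and fraction of emails in each class that contain word j), might look like the following. The function name `fit_naive_bayes` is hypothetical, and the sketch assumes the training set contains at least one example of each class.

```python
def fit_naive_bayes(X, y):
    """Estimate Naive Bayes parameters by counting over M training examples.

    X : list of M binary feature vectors (1 if word j appears in the email)
    y : list of M labels (1 = spam, 0 = not spam)
    Hypothetical helper; assumes both classes appear at least once.
    """
    m = len(X)          # number of training examples
    n = len(X[0])       # number of words in the vocabulary
    num_spam = sum(y)
    num_ham = m - num_spam

    # P(y = 1): fraction of training emails labeled spam.
    phi_y = num_spam / m

    # P(x_j = 1 | y = 1): fraction of spam emails containing word j.
    phi_j_given_1 = [
        sum(X[i][j] for i in range(m) if y[i] == 1) / num_spam
        for j in range(n)
    ]
    # P(x_j = 1 | y = 0): fraction of non-spam emails containing word j.
    phi_j_given_0 = [
        sum(X[i][j] for i in range(m) if y[i] == 0) / num_ham
        for j in range(n)
    ]
    return phi_y, phi_j_given_1, phi_j_given_0
```

The model itself (the Naive Bayes assumption) does not change with M; only the counts that go into these estimates do.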

Student: By training examples, you mean M is the number of emails?

Instructor (Andrew Ng) :Yeah, right. So, right. So it’s, kind of, your training set. I would go through all the email I’ve gotten in the last two months and label them as spam or not spam, and so you have – I don’t know, like, a few hundred emails labeled as spam or not spam, and that will comprise your training set, X1, Y1 through XM, YM, where X is one of those vectors representing which words appeared in the email and Y is 0 or 1 depending on whether the email is spam or not spam, okay?
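To make the representation concrete, here is a small sketch of how one email could be turned into such a vector. The function name `email_to_vector` and the whitespace tokenization are assumptions for illustration only; the lecture does not specify how emails are split into words.

```python
def email_to_vector(email_text, vocabulary):
    """Map an email to a binary vector over a fixed word list.

    Entry j is 1 if vocabulary[j] appears anywhere in the email, else 0.
    Hypothetical helper: lowercases and splits on whitespace only.
    """
    words_in_email = set(email_text.lower().split())
    return [1 if word in words_in_email else 0 for word in vocabulary]
```

With the vocabulary `["buy", "hello", "pills"]`, the email "Buy cheap pills now" maps to `[1, 0, 1]`; words outside the vocabulary, such as "cheap", are simply ignored.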

Student: So you are saying that this model depends on the number of examples, but the last model doesn’t depend on the models, but your phi is the same for either one.

Instructor (Andrew Ng) :They’re different things, right? There’s the model which is – the modeling assumptions aren’t made very well. I’m assuming that – I’m making the Naive Bayes assumption. So the probabilistic model is an assumption on the joint distribution of X and Y. That’s what the model is, and then I’m given a fixed number of training examples. I’m given M training examples, and then it’s, like, after I’m given the training set, I’ll then go and write down the maximum likelihood estimates of the parameters, right? So that’s, sort of, maybe we should take that offline for – yeah, ask a question?

Student: Then how would you do this, like, if this [inaudible] didn’t work?

Instructor (Andrew Ng) :Say that again.

Student: How would you do it, say, like the 50,000 words –

Instructor (Andrew Ng) :Oh, okay. How to do this with the 50,000 words, yeah. So it turns out this is, sort of, a very practical question, really. How do I come up with this list of words? One common way to do this is to actually go through all your emails – in practice, one common way to come up with a list of words is to just take all the words that appear in your training set.

That’s one fairly common way to do it, or if that turns out to be too many words, you can take all words that appear at least three times in your training set. So words that you didn’t even see three times in the emails you got in the last two months, you discard. So those are – I was talking about going through a dictionary, which is a nice way of thinking about it, but in practice, you might go through your training set and then just take the union of all the words that appear in it.
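The vocabulary-building recipe described above can be sketched as follows. The function name `build_vocabulary`, the `min_count` parameter, and the whitespace tokenization are assumptions for this example; setting `min_count=3` corresponds to keeping only words seen at least three times in the training set.

```python
from collections import Counter


def build_vocabulary(emails, min_count=1):
    """Return the sorted list of words seen at least min_count times.

    emails    : list of raw email strings from the training set
    min_count : minimum total number of occurrences across all emails
    Hypothetical helper: lowercases and splits on whitespace only.
    """
    counts = Counter()
    for text in emails:
        # Count every occurrence of every word across the training set.
        counts.update(text.lower().split())
    return sorted(word for word, c in counts.items() if c >= min_count)
```

With the default `min_count=1` this is just the union of all words in the training set; raising the threshold discards rare words and shrinks the feature vector.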

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4