
We now have:

$$
\begin{aligned}
p(x_1, \ldots, x_{50000} \mid y)
  &= p(x_1 \mid y)\, p(x_2 \mid y, x_1)\, p(x_3 \mid y, x_1, x_2) \cdots p(x_{50000} \mid y, x_1, \ldots, x_{49999}) \\
  &= p(x_1 \mid y)\, p(x_2 \mid y)\, p(x_3 \mid y) \cdots p(x_{50000} \mid y) \\
  &= \prod_{i=1}^{n} p(x_i \mid y)
\end{aligned}
$$

The first equality simply follows from the usual properties of probabilities, and the second equality used the NB assumption. We note that even though the Naive Bayes assumption is an extremely strong assumption, the resulting algorithm works well on many problems.
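To make the factorization concrete, here is a minimal sketch (an illustrative assumption, not part of the notes themselves): suppose an email is represented as a binary vector x of word indicators, and phi_given_y holds the per-word probabilities p(x_i = 1 | y) for one fixed class. The class-conditional likelihood is then just a product of independent Bernoulli factors.

```python
import numpy as np

def class_conditional_likelihood(x, phi_given_y):
    """p(x_1, ..., x_n | y) under the Naive Bayes assumption:
    each word contributes an independent Bernoulli factor."""
    per_word = np.where(x == 1, phi_given_y, 1 - phi_given_y)
    return np.prod(per_word)
```

In practice a product of 50,000 small numbers underflows in floating point, so implementations usually sum log-probabilities instead; the prediction sketch later in this section does exactly that.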

Our model is parameterized by $\phi_{i|y=1} = p(x_i = 1 \mid y = 1)$, $\phi_{i|y=0} = p(x_i = 1 \mid y = 0)$, and $\phi_y = p(y = 1)$. As usual, given a training set $\{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$, we can write down the joint likelihood of the data:

$$\mathcal{L}(\phi_y, \phi_{j|y=0}, \phi_{j|y=1}) = \prod_{i=1}^{m} p(x^{(i)}, y^{(i)}).$$

Maximizing this with respect to $\phi_y$, $\phi_{i|y=0}$, and $\phi_{i|y=1}$ gives the maximum likelihood estimates:

$$
\begin{aligned}
\phi_{j|y=1} &= \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}} \\[4pt]
\phi_{j|y=0} &= \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}} \\[4pt]
\phi_y &= \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}{m}
\end{aligned}
$$

In the equations above, the "$\wedge$" symbol means "and." The parameters have a very natural interpretation. For instance, $\phi_{j|y=1}$ is just the fraction of the spam ($y = 1$) emails in which word $j$ appears.
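Because these estimates are just counts, they are easy to compute directly. Below is a minimal NumPy sketch, assuming (as an illustration only) that the training emails are stored as an m-by-n binary matrix X, with X[i, j] = 1 when word j appears in email i, and that y is the corresponding label vector:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Maximum likelihood estimates of the Naive Bayes parameters.

    X : (m, n) binary matrix, X[i, j] = 1 if word j appears in email i
    y : (m,) binary label vector, 1 = spam, 0 = non-spam
    """
    phi_y = np.mean(y == 1)                 # fraction of spam emails
    phi_given_y1 = X[y == 1].mean(axis=0)   # fraction of spam emails containing each word
    phi_given_y0 = X[y == 0].mean(axis=0)   # fraction of non-spam emails containing each word
    return phi_y, phi_given_y1, phi_given_y0
```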

Having fit all these parameters, to make a prediction on a new example with features $x$, we then simply calculate

$$
\begin{aligned}
p(y = 1 \mid x)
  &= \frac{p(x \mid y = 1)\, p(y = 1)}{p(x)} \\[4pt]
  &= \frac{\left(\prod_{i=1}^{n} p(x_i \mid y = 1)\right) p(y = 1)}
          {\left(\prod_{i=1}^{n} p(x_i \mid y = 1)\right) p(y = 1) + \left(\prod_{i=1}^{n} p(x_i \mid y = 0)\right) p(y = 0)},
\end{aligned}
$$

and pick whichever class has the higher posterior probability.
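As a rough sketch (not the notes' own implementation), the posterior above can be computed from the fitted parameters as follows. With $n = 50{,}000$ words the products underflow in floating point, so the sketch accumulates log-probabilities instead; it also assumes every parameter estimate lies strictly between 0 and 1, which the Laplace smoothing discussed below guarantees.

```python
import numpy as np

def predict_spam_probability(x, phi_y, phi_given_y1, phi_given_y0):
    """Posterior p(y = 1 | x) for a new binary feature vector x.

    Products over tens of thousands of word probabilities underflow,
    so each class's term is accumulated as a sum of logs.
    Assumes all parameter estimates are strictly in (0, 1).
    """
    log_p1 = np.log(phi_y) + np.sum(
        np.log(np.where(x == 1, phi_given_y1, 1 - phi_given_y1)))
    log_p0 = np.log(1 - phi_y) + np.sum(
        np.log(np.where(x == 1, phi_given_y0, 1 - phi_given_y0)))
    # Normalize the two class scores back into a probability.
    m = max(log_p1, log_p0)
    p1 = np.exp(log_p1 - m) / (np.exp(log_p1 - m) + np.exp(log_p0 - m))
    return p1  # predict y = 1 (spam) when p1 > 0.5
```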

Lastly, we note that while we have developed the Naive Bayes algorithm mainly for the case of problems where the features $x_i$ are binary-valued, the generalization to where $x_i$ can take values in $\{1, 2, \ldots, k_i\}$ is straightforward. Here, we would simply model $p(x_i \mid y)$ as multinomial rather than as Bernoulli. Indeed, even if some original input attribute (say, the living area of a house, as in our earlier example) were continuous valued, it is quite common to discretize it, that is, turn it into a small set of discrete values, and apply Naive Bayes. For instance, if we use some feature $x_i$ to represent living area, we might discretize the continuous values as follows:

Living area (sq. feet) | < 400 | 400-800 | 800-1200 | 1200-1600 | > 1600
$x_i$                  |   1   |    2    |    3     |     4     |   5

Thus, for a house with living area 890 square feet, we would set the value of the corresponding feature $x_i$ to 3. We can then apply the Naive Bayes algorithm, and model $p(x_i \mid y)$ with a multinomial distribution, as described previously. When the original, continuous-valued attributes are not well-modeled by a multivariate normal distribution, discretizing the features and using Naive Bayes (instead of GDA) will often result in a better classifier.
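A small sketch of this bucketing step, using the boundaries from the table above (the function name and exact handling of values on a boundary are just illustrative assumptions):

```python
import numpy as np

# Bucket edges from the table above, in square feet.
EDGES = np.array([400, 800, 1200, 1600])

def discretize_living_area(sq_feet):
    """Map a continuous living area to the discrete feature value in {1, ..., 5}."""
    return int(np.searchsorted(EDGES, sq_feet, side="right")) + 1

# Example: discretize_living_area(890) returns 3, matching the text.
```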

Laplace smoothing

The Naive Bayes algorithm as we have described it will work fairly well for many problems, but there is a simple change that makes it work much better, especially for text classification. Let's briefly discuss a problem with the algorithm in its current form, and then talk about how we can fix it.

Consider spam/email classification, and let's suppose that, after completing CS229 and having done excellent work on the project, you decide around June 2003 to submit the work you did to the NIPS conference for publication. (NIPS is one of the top machine learning conferences, and the deadline for submitting a paper is typically in late June or early July.) Because you end up discussing the conference in your emails, you also start getting messages with the word "nips" in it. But this is your first NIPS paper, and until this time, you had not previously seen any emails containing the word "nips"; in particular, "nips" did not ever appear in your training set of spam/non-spam emails. Assuming that "nips" was the 35000th word in the dictionary, your Naive Bayes spam filter therefore had picked its maximum likelihood estimates of the parameters $\phi_{35000|y}$ to be

$$
\phi_{35000|y=1} = \frac{\sum_{i=1}^{m} 1\{x_{35000}^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}} = 0, \qquad
\phi_{35000|y=0} = \frac{\sum_{i=1}^{m} 1\{x_{35000}^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}} = 0,
$$

since "nips" never appears in any of the training emails, spam or non-spam.
