
We now have:

$$
\begin{aligned}
p(x_1, \ldots, x_{50000} \mid y)
  &= p(x_1 \mid y)\, p(x_2 \mid y, x_1)\, p(x_3 \mid y, x_1, x_2) \cdots p(x_{50000} \mid y, x_1, \ldots, x_{49999}) \\
  &= p(x_1 \mid y)\, p(x_2 \mid y)\, p(x_3 \mid y) \cdots p(x_{50000} \mid y) \\
  &= \prod_{i=1}^{n} p(x_i \mid y)
\end{aligned}
$$

The first equality simply follows from the usual properties of probabilities, and the second equality used the NB assumption. We note that even though the Naive Bayes assumption is an extremely strong assumption, the resulting algorithm works well on many problems.
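To make the factorization concrete, here is a minimal sketch (an illustrative assumption, not part of the notes themselves): suppose an email is represented as a binary vector x of word indicators, and phi_given_y holds the per-word probabilities p(x_i = 1 | y) for one fixed class. The class-conditional likelihood is then just a product of independent Bernoulli factors.

```python
import numpy as np

def class_conditional_likelihood(x, phi_given_y):
    """p(x_1, ..., x_n | y) under the Naive Bayes assumption:
    each word contributes an independent Bernoulli factor."""
    per_word = np.where(x == 1, phi_given_y, 1 - phi_given_y)
    return np.prod(per_word)
```

In practice a product of 50,000 small numbers underflows in floating point, so implementations usually sum log-probabilities instead; the prediction sketch later in this section does exactly that.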

Our model is parameterized by $\phi_{i|y=1} = p(x_i = 1 \mid y = 1)$, $\phi_{i|y=0} = p(x_i = 1 \mid y = 0)$, and $\phi_y = p(y = 1)$. As usual, given a training set $\{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$, we can write down the joint likelihood of the data:

$$\mathcal{L}(\phi_y, \phi_{j|y=0}, \phi_{j|y=1}) = \prod_{i=1}^{m} p(x^{(i)}, y^{(i)}).$$

Maximizing this with respect to $\phi_y$, $\phi_{i|y=0}$, and $\phi_{i|y=1}$ gives the maximum likelihood estimates:

$$
\begin{aligned}
\phi_{j|y=1} &= \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}} \\[4pt]
\phi_{j|y=0} &= \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}} \\[4pt]
\phi_y &= \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}{m}
\end{aligned}
$$

In the equations above, the "$\wedge$" symbol means "and." The parameters have a very natural interpretation. For instance, $\phi_{j|y=1}$ is just the fraction of the spam ($y = 1$) emails in which word $j$ appears.
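Because these estimates are just counts, they are easy to compute directly. Below is a minimal NumPy sketch, assuming (as an illustration only) that the training emails are stored as an m-by-n binary matrix X, with X[i, j] = 1 when word j appears in email i, and that y is the corresponding label vector:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Maximum likelihood estimates of the Naive Bayes parameters.

    X : (m, n) binary matrix, X[i, j] = 1 if word j appears in email i
    y : (m,) binary label vector, 1 = spam, 0 = non-spam
    """
    phi_y = np.mean(y == 1)                 # fraction of spam emails
    phi_given_y1 = X[y == 1].mean(axis=0)   # fraction of spam emails containing each word
    phi_given_y0 = X[y == 0].mean(axis=0)   # fraction of non-spam emails containing each word
    return phi_y, phi_given_y1, phi_given_y0
```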

Having fit all these parameters, to make a prediction on a new example with features $x$, we then simply calculate

$$
\begin{aligned}
p(y = 1 \mid x)
  &= \frac{p(x \mid y = 1)\, p(y = 1)}{p(x)} \\[4pt]
  &= \frac{\left(\prod_{i=1}^{n} p(x_i \mid y = 1)\right) p(y = 1)}
          {\left(\prod_{i=1}^{n} p(x_i \mid y = 1)\right) p(y = 1) + \left(\prod_{i=1}^{n} p(x_i \mid y = 0)\right) p(y = 0)},
\end{aligned}
$$

and pick whichever class has the higher posterior probability.
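As a rough sketch (not the notes' own implementation), the posterior above can be computed from the fitted parameters as follows. With $n = 50{,}000$ words the products underflow in floating point, so the sketch accumulates log-probabilities instead; it also assumes every parameter estimate lies strictly between 0 and 1, which the Laplace smoothing discussed below guarantees.

```python
import numpy as np

def predict_spam_probability(x, phi_y, phi_given_y1, phi_given_y0):
    """Posterior p(y = 1 | x) for a new binary feature vector x.

    Products over tens of thousands of word probabilities underflow,
    so each class's term is accumulated as a sum of logs.
    Assumes all parameter estimates are strictly in (0, 1).
    """
    log_p1 = np.log(phi_y) + np.sum(
        np.log(np.where(x == 1, phi_given_y1, 1 - phi_given_y1)))
    log_p0 = np.log(1 - phi_y) + np.sum(
        np.log(np.where(x == 1, phi_given_y0, 1 - phi_given_y0)))
    # Normalize the two class scores back into a probability.
    m = max(log_p1, log_p0)
    p1 = np.exp(log_p1 - m) / (np.exp(log_p1 - m) + np.exp(log_p0 - m))
    return p1  # predict y = 1 (spam) when p1 > 0.5
```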

Lastly, we note that while we have developed the Naive Bayes algorithm mainly for the case of problems where the features $x_i$ are binary-valued, the generalization to where $x_i$ can take values in $\{1, 2, \ldots, k_i\}$ is straightforward. Here, we would simply model $p(x_i \mid y)$ as multinomial rather than as Bernoulli. Indeed, even if some original input attribute (say, the living area of a house, as in our earlier example) were continuous valued, it is quite common to discretize it, that is, turn it into a small set of discrete values, and apply Naive Bayes. For instance, if we use some feature $x_i$ to represent living area, we might discretize the continuous values as follows:

Living area (sq. feet) | < 400 | 400-800 | 800-1200 | 1200-1600 | > 1600
$x_i$                  |   1   |    2    |    3     |     4     |   5

Thus, for a house with living area 890 square feet, we would set the value of the corresponding feature $x_i$ to 3. We can then apply the Naive Bayes algorithm, and model $p(x_i \mid y)$ with a multinomial distribution, as described previously. When the original, continuous-valued attributes are not well-modeled by a multivariate normal distribution, discretizing the features and using Naive Bayes (instead of GDA) will often result in a better classifier.
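A small sketch of this bucketing step, using the boundaries from the table above (the function name and exact handling of values on a boundary are just illustrative assumptions):

```python
import numpy as np

# Bucket edges from the table above, in square feet.
EDGES = np.array([400, 800, 1200, 1600])

def discretize_living_area(sq_feet):
    """Map a continuous living area to the discrete feature value in {1, ..., 5}."""
    return int(np.searchsorted(EDGES, sq_feet, side="right")) + 1

# Example: discretize_living_area(890) returns 3, matching the text.
```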

Laplace smoothing

The Naive Bayes algorithm as we have described it will work fairly well for many problems, but there is a simple change that makes it work much better, especially for text classification. Let's briefly discuss a problem with the algorithm in its current form, and then talk about how we can fix it.

Consider spam/email classification, and let's suppose that, after completing CS229 and having done excellent work on the project, you decide around June 2003 to submit the work you did to the NIPS conference for publication. (NIPS is one of the top machine learning conferences, and the deadline for submitting a paper is typically in late June or early July.) Because you end up discussing the conference in your emails, you also start getting messages with the word "nips" in it. But this is your first NIPS paper, and until this time, you had not previously seen any emails containing the word "nips"; in particular, "nips" did not ever appear in your training set of spam/non-spam emails. Assuming that "nips" was the 35000th word in the dictionary, your Naive Bayes spam filter therefore had picked its maximum likelihood estimates of the parameters $\phi_{35000|y}$ to be

$$
\phi_{35000|y=1} = \frac{\sum_{i=1}^{m} 1\{x_{35000}^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}} = 0, \qquad
\phi_{35000|y=0} = \frac{\sum_{i=1}^{m} 1\{x_{35000}^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}} = 0,
$$

since "nips" never appears in any of the training emails, spam or non-spam.
