The multivariate normal distribution in $n$ dimensions, parameterized by a mean vector $\mu \in \mathbb{R}^n$ and a covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, has density

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right).$$

In the equation above, $|\Sigma|$ denotes the determinant of the matrix $\Sigma$.
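As a quick sanity check on this formula, here is a minimal NumPy sketch (the helper name gaussian_density is ours) that evaluates the density directly and compares it against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_density(x, mu, sigma):
    """Evaluate p(x; mu, sigma) exactly as in the formula above."""
    n = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.linalg.det(sigma) ** 0.5)
    # Solve sigma @ v = diff rather than forming the explicit inverse.
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
x = np.array([0.5, -0.2])

print(gaussian_density(x, mu, sigma))         # direct evaluation
print(multivariate_normal(mu, sigma).pdf(x))  # should print the same value
```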

For a random variable $X$ distributed $\mathcal{N}(\mu, \Sigma)$, the mean is (unsurprisingly) given by $\mu$:

$$E[X] = \int_x x \, p(x; \mu, \Sigma) \, dx = \mu.$$

The covariance of a vector-valued random variable $Z$ is defined as $\mathrm{Cov}(Z) = E[(Z - E[Z])(Z - E[Z])^T]$. This generalizes the notion of the variance of a real-valued random variable. The covariance can also be defined as $\mathrm{Cov}(Z) = E[ZZ^T] - (E[Z])(E[Z])^T$. (You should be able to prove to yourself that these two definitions are equivalent.) If $X \sim \mathcal{N}(\mu, \Sigma)$, then

$$\mathrm{Cov}(X) = \Sigma.$$
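To see both identities numerically, here is a short sampling sketch (the particular $\mu$ and $\Sigma$ below are ours); the sample mean and both forms of the sample covariance should come out close to $\mu$ and $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
sigma = np.array([[1.0, 0.8], [0.8, 1.0]])

samples = rng.multivariate_normal(mu, sigma, size=100_000)

print(samples.mean(axis=0))           # approximately mu
print(np.cov(samples, rowvar=False))  # approximately sigma

# Second definition: Cov(Z) = E[Z Z^T] - E[Z] E[Z]^T.
outer = samples[:, :, None] * samples[:, None, :]  # per-sample Z Z^T
print(outer.mean(axis=0) - np.outer(samples.mean(axis=0), samples.mean(axis=0)))
```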

Here are some examples of what the density of a Gaussian distribution looks like:

[Figures: 3-D surface plots of two-dimensional Gaussian densities, each centered at (0, 0), with peaks of different heights.]

The left-most figure shows a Gaussian with mean zero (that is, the 2×1 zero vector) and covariance matrix $\Sigma = I$ (the 2×2 identity matrix). A Gaussian with zero mean and identity covariance is also called the standard normal distribution. The middle figure shows the density of a Gaussian with zero mean and $\Sigma = 0.6I$; the rightmost figure shows one with $\Sigma = 2I$. We see that as $\Sigma$ becomes larger, the Gaussian becomes more “spread-out,” and as it becomes smaller, the distribution becomes more “compressed.”
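A quick numerical illustration of this (the sampling code is ours, not from the source): with $\Sigma = cI$, each coordinate's standard deviation is $\sqrt{c}$, so the density spreads out as $c$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for c in (0.6, 1.0, 2.0):
    samples = rng.multivariate_normal([0.0, 0.0], c * np.eye(2), size=100_000)
    # Per-coordinate standard deviation should be close to sqrt(c).
    print(c, samples.std(axis=0))
```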

Let's look at some more examples.

[Figures: 3-D surface plots of zero-mean Gaussian densities; from left to right, the mass concentrates increasingly along the line $x_1 = x_2$.]

The figures above show Gaussians with mean 0, and with covariance matrices respectively

$$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}; \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}; \quad \Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}.$$

The leftmost figure shows the familiar standard normal distribution, and we see that as we increase the off-diagonal entry in $\Sigma$, the density becomes more “compressed” towards the 45° line (given by $x_1 = x_2$). We can see this more clearly when we look at the contours of the same three densities:

[Figures: contour plots of the same three densities; circular contours on the left, and increasingly narrow ellipses aligned with $x_1 = x_2$ in the middle and right.]

Here's one last set of examples generated by varying $\Sigma$:

[Figures: contour plots; elliptical contours aligned with $x_1 = -x_2$ in the left and (narrower) middle figures, and a larger tilted ellipse in the right figure.]

The plots above used, respectively,

$$\Sigma = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix}; \quad \Sigma = \begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}; \quad \Sigma = \begin{bmatrix} 3 & 0.8 \\ 0.8 & 1 \end{bmatrix}.$$

From the leftmost and middle figures, we see that by decreasing the off-diagonal elements of the covariance matrix, the density now becomes “compressed” again, but in the opposite direction. Lastly, as we vary the parameters more generally, the contours will form ellipses (the rightmost figure shows an example).
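The following plotting sketch (grid resolution and figure layout are ours) reproduces the three contour shapes just discussed, using scipy.stats.multivariate_normal and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Evaluate each density on a common grid.
xs = np.linspace(-3, 3, 200)
X1, X2 = np.meshgrid(xs, xs)
grid = np.dstack([X1, X2])

covs = [
    np.array([[1.0, -0.5], [-0.5, 1.0]]),  # ellipses along x1 = -x2
    np.array([[1.0, -0.8], [-0.8, 1.0]]),  # same direction, narrower
    np.array([[3.0, 0.8], [0.8, 1.0]]),    # a more general tilted ellipse
]

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, cov in zip(axes, covs):
    density = multivariate_normal([0.0, 0.0], cov).pdf(grid)
    ax.contour(X1, X2, density)
    ax.set_aspect("equal")
plt.show()
```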

As our last set of examples, fixing $\Sigma = I$ and varying $\mu$, we can also move the mean of the density around.

[Figures: 3-D surface plots of identity-covariance Gaussian densities whose peaks sit at three different locations.]

The figures above were generated using $\Sigma = I$, and respectively

$$\mu = \begin{bmatrix} 1 \\ 0 \end{bmatrix}; \quad \mu = \begin{bmatrix} -0.5 \\ 0 \end{bmatrix}; \quad \mu = \begin{bmatrix} -1 \\ -1.5 \end{bmatrix}.$$
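A brief numerical check of this (the sampling code is ours): shifting $\mu$ translates the density without changing its shape, so the sample mean simply tracks $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
for mu in ([1.0, 0.0], [-0.5, 0.0], [-1.0, -1.5]):
    samples = rng.multivariate_normal(mu, np.eye(2), size=100_000)
    print(mu, samples.mean(axis=0))  # sample mean is close to mu
```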

The Gaussian discriminant analysis model

When we have a classification problem in which the input features $x$ are continuous-valued random variables, we can then use the Gaussian Discriminant Analysis (GDA) model, which models $p(x \mid y)$ using a multivariate normal distribution. The model is:

$$\begin{aligned} y &\sim \mathrm{Bernoulli}(\phi) \\ x \mid y = 0 &\sim \mathcal{N}(\mu_0, \Sigma) \\ x \mid y = 1 &\sim \mathcal{N}(\mu_1, \Sigma) \end{aligned}$$

Writing out the distributions, this is:

$$\begin{aligned} p(y) &= \phi^y (1 - \phi)^{1-y} \\ p(x \mid y = 0) &= \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \right) \\ p(x \mid y = 1) &= \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) \right) \end{aligned}$$

Here, the parameters of our model are $\phi$, $\Sigma$, $\mu_0$ and $\mu_1$. (Note that while there are two different mean vectors $\mu_0$ and $\mu_1$, this model is usually applied using only one covariance matrix $\Sigma$.) The log-likelihood of the data is given by

$$\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma) \, p(y^{(i)}; \phi).$$
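As a sketch, this log-likelihood can be evaluated directly for a given parameter setting (the function name gda_log_likelihood is ours; it assumes 0/1 labels):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gda_log_likelihood(X, y, phi, mu0, mu1, sigma):
    """Sum over examples of log p(x | y; mu0, mu1, sigma) + log p(y; phi)."""
    log_px = np.where(
        y == 1,
        multivariate_normal(mu1, sigma).logpdf(X),
        multivariate_normal(mu0, sigma).logpdf(X),
    )
    log_py = y * np.log(phi) + (1 - y) * np.log(1 - phi)
    return np.sum(log_px + log_py)
```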

By maximizing $\ell$ with respect to the parameters, we find the maximum likelihood estimates of the parameters (see problem set 1) to be:

$$\begin{aligned} \phi &= \frac{1}{m} \sum_{i=1}^{m} 1\{y^{(i)} = 1\} \\ \mu_0 &= \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 0\} \, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}} \\ \mu_1 &= \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\} \, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}} \\ \Sigma &= \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} - \mu_{y^{(i)}} \right) \left( x^{(i)} - \mu_{y^{(i)}} \right)^T. \end{aligned}$$
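Here is a minimal implementation of these estimators (the function name fit_gda and the synthetic data below are ours); fitting data generated from the model itself should approximately recover the generating parameters:

```python
import numpy as np

def fit_gda(X, y):
    """Maximum likelihood estimates for GDA, following the formulas above.

    X: (m, n) matrix of inputs; y: (m,) array of 0/1 labels.
    """
    m = X.shape[0]
    phi = np.mean(y == 1)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Shared covariance: residuals are taken from each example's own class mean.
    residuals = X - np.where((y == 1)[:, None], mu1, mu0)
    sigma = residuals.T @ residuals / m
    return phi, mu0, mu1, sigma

# Synthetic data drawn from the model itself.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.multivariate_normal([-1.0, -1.0], np.eye(2), size=500),
    rng.multivariate_normal([1.0, 1.0], np.eye(2), size=500),
])
y = np.array([0] * 500 + [1] * 500)
print(fit_gda(X, y))  # approximately phi=0.5, mu0=(-1,-1), mu1=(1,1), sigma=I
```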

Source: OpenStax, Machine Learning. OpenStax CNX, Oct 14, 2013. Download for free at http://cnx.org/content/col11500/1.4