If we were fitting a full, unconstrained covariance matrix $\Sigma$ to data, it was necessary that $m \ge n + 1$ in order for the maximum likelihood estimate of $\Sigma$ not to be singular. Under either of the two restrictions above, we may obtain a non-singular $\Sigma$ when $m \ge 2$.
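As a quick numerical illustration of this point, here is a minimal sketch (assuming NumPy is available; the particular choices of $n$ and $m$ are arbitrary) showing that when there are fewer datapoints than dimensions, the maximum likelihood covariance estimate is rank-deficient and hence singular:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 5                        # dimension n exceeds sample size m

X = rng.normal(size=(m, n))         # m datapoints in R^n
mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / m   # maximum likelihood estimate of the covariance

# The centered data matrix has rank at most m - 1, so Sigma does too.
print(np.linalg.matrix_rank(Sigma))  # 4, i.e. less than n = 10
print(np.linalg.det(Sigma))          # numerically zero: Sigma is singular
```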
However, restricting $\Sigma$ to be diagonal also means modeling the different coordinates $x_i$, $x_j$ of the data as being uncorrelated and independent. Often, it would be nice to be able to capture some interesting correlation structure in the data. If we were to use either of the restrictions on $\Sigma$ described above, we would therefore fail to do so. In this set of notes, we will describe the factor analysis model, which uses more parameters than the diagonal $\Sigma$ and captures some correlations in the data, but also without having to fit a full covariance matrix.
Before describing factor analysis, we digress to talk about how to find conditional and marginal distributions of random variables with a joint multivariate Gaussian distribution.
Suppose we have a vector-valued random variable
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$
where $x_1 \in \mathbb{R}^r$, $x_2 \in \mathbb{R}^s$, and $x \in \mathbb{R}^{r+s}$. Suppose $x \sim \mathcal{N}(\mu, \Sigma)$, where
$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$
Here, $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\Sigma_{11} \in \mathbb{R}^{r \times r}$, $\Sigma_{12} \in \mathbb{R}^{r \times s}$, and so on. Note that since covariance matrices are symmetric, $\Sigma_{12} = \Sigma_{21}^T$.
Under our assumptions, $x_1$ and $x_2$ are jointly multivariate Gaussian. What is the marginal distribution of $x_1$? It is not hard to see that $\mathrm{E}[x_1] = \mu_1$, and that $\mathrm{Cov}(x_1) = \mathrm{E}[(x_1 - \mu_1)(x_1 - \mu_1)^T] = \Sigma_{11}$. To see that the latter is true, note that by definition of the joint covariance of $x_1$ and $x_2$, we have that
$$\begin{aligned}
\mathrm{Cov}(x) &= \Sigma \\
&= \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \\
&= \mathrm{E}\left[(x - \mu)(x - \mu)^T\right] \\
&= \mathrm{E}\left[ \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^T \right] \\
&= \mathrm{E}\begin{bmatrix} (x_1 - \mu_1)(x_1 - \mu_1)^T & (x_1 - \mu_1)(x_2 - \mu_2)^T \\ (x_2 - \mu_2)(x_1 - \mu_1)^T & (x_2 - \mu_2)(x_2 - \mu_2)^T \end{bmatrix}.
\end{aligned}$$
Matching the upper-left subblocks in the matrices in the second and the last lines above gives the result.
Since marginal distributions of Gaussians are themselves Gaussian, we therefore have that the marginal distribution of $x_1$ is given by $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$.
Also, we can ask, what is the conditional distribution of $x_1$ given $x_2$? By referring to the definition of the multivariate Gaussian distribution, it can be shown that $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where
$$\begin{aligned}
\mu_{1|2} &= \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \\
\Sigma_{1|2} &= \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.
\end{aligned}$$
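To make these two rules concrete, here is a small sketch (assuming NumPy; the specific values of $\mu$, $\Sigma$, and the observed $x_2$ are made-up illustrative numbers) that reads off the marginal of $x_1$ and computes the conditional mean and covariance from the formulas above:

```python
import numpy as np

# A made-up joint Gaussian over x = [x1; x2] with r = 2, s = 1.
mu1 = np.array([0.0, 1.0])
mu2 = np.array([2.0])
Sigma11 = np.array([[2.0, 0.3],
                    [0.3, 1.0]])
Sigma12 = np.array([[0.5],
                    [0.2]])
Sigma22 = np.array([[1.5]])

# Marginal of x1: simply x1 ~ N(mu1, Sigma11); nothing to compute.

# Conditional of x1 given an observed value of x2, using the formulas above
# (solve is used instead of forming the inverse of Sigma22 explicitly).
x2 = np.array([2.5])
mu_cond = mu1 + Sigma12 @ np.linalg.solve(Sigma22, x2 - mu2)
Sigma_cond = Sigma11 - Sigma12 @ np.linalg.solve(Sigma22, Sigma12.T)

print(mu_cond)     # conditional mean  mu_{1|2}
print(Sigma_cond)  # conditional covariance  Sigma_{1|2}
```

Note that the conditional covariance $\Sigma_{1|2}$ does not depend on the observed value of $x_2$, only the conditional mean does.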
When working with the factor analysis model in the next section, these formulas for finding conditional and marginal distributions of Gaussians will be very useful.
In the factor analysis model, we posit a joint distribution on $(x, z)$ as follows, where $z \in \mathbb{R}^k$ is a latent random variable:
$$\begin{aligned}
z &\sim \mathcal{N}(0, I) \\
x \mid z &\sim \mathcal{N}(\mu + \Lambda z, \Psi).
\end{aligned}$$
Here, the parameters of our model are the vector $\mu \in \mathbb{R}^n$, the matrix $\Lambda \in \mathbb{R}^{n \times k}$, and the diagonal matrix $\Psi \in \mathbb{R}^{n \times n}$. The value of $k$ is usually chosen to be smaller than $n$.
Thus, we imagine that each datapoint $x^{(i)}$ is generated by sampling a $k$-dimensional multivariate Gaussian $z^{(i)}$. Then, it is mapped to a $k$-dimensional affine space of $\mathbb{R}^n$ by computing $\mu + \Lambda z^{(i)}$. Lastly, $x^{(i)}$ is generated by adding covariance $\Psi$ noise to $\mu + \Lambda z^{(i)}$.
Equivalently (convince yourself that this is the case), we can therefore also define the factor analysis model according to
$$\begin{aligned}
z &\sim \mathcal{N}(0, I) \\
\epsilon &\sim \mathcal{N}(0, \Psi) \\
x &= \mu + \Lambda z + \epsilon,
\end{aligned}$$
where $\epsilon$ and $z$ are independent.
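The generative story above translates directly into a sampler. Here is a minimal sketch (assuming NumPy; the values of $n$, $k$, $m$, and the parameters $\mu$, $\Lambda$, $\Psi$ are arbitrary illustrative choices) that draws datapoints exactly as described: sample $z$, map it through $\mu + \Lambda z$, then add noise with diagonal covariance $\Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 5, 2, 10000               # data dimension, latent dimension, sample size

mu = rng.normal(size=n)             # parameter mu in R^n
Lam = rng.normal(size=(n, k))       # parameter Lambda in R^{n x k}
psi_diag = rng.uniform(0.1, 0.5, size=n)
Psi = np.diag(psi_diag)             # diagonal noise covariance Psi

z = rng.normal(size=(m, k))                      # z ~ N(0, I)
eps = rng.normal(size=(m, n)) * np.sqrt(psi_diag)  # eps ~ N(0, Psi), Psi diagonal
x = mu + z @ Lam.T + eps                         # x = mu + Lambda z + eps

# Sanity check: marginally x ~ N(mu, Lambda Lambda^T + Psi), so for large m the
# sample covariance should be close to Lam @ Lam.T + Psi.
print(np.abs(np.cov(x, rowvar=False) - (Lam @ Lam.T + Psi)).max())
```

The sanity check at the end previews the marginal distribution of $x$ that the conditional/marginal formulas from the previous section yield for this model.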