
If we were fitting a full, unconstrained covariance matrix $\Sigma$ to data, it was necessary that $m \ge n+1$ in order for the maximum likelihood estimate of $\Sigma$ not to be singular. Under either of the two restrictions above, we may obtain a non-singular $\Sigma$ when $m \ge 2$.
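To see why the $m \ge n+1$ requirement matters, here is a minimal numpy sketch (the variable names and values are ours, chosen for illustration): with $m \le n$ samples, the centered data points span at most an $(m-1)$-dimensional subspace, so the maximum likelihood estimate of $\Sigma$ has rank at most $m-1 < n$ and is singular.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 5, 3                      # dimension n, but only m < n + 1 samples
    X = rng.normal(size=(m, n))      # rows are data points x^(i)

    mu = X.mean(axis=0)
    # Maximum likelihood estimate: (1/m) * sum_i (x^(i) - mu)(x^(i) - mu)^T
    Sigma = (X - mu).T @ (X - mu) / m

    print(np.linalg.matrix_rank(Sigma))   # at most m - 1 = 2, so Sigma is singular
    print(np.linalg.det(Sigma))           # determinant is (numerically) zero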

However, restricting $\Sigma$ to be diagonal also means modeling the different coordinates $x_i$, $x_j$ of the data as being uncorrelated and independent. Often, it would be nice to be able to capture some interesting correlation structure in the data; if we were to use either of the restrictions on $\Sigma$ described above, we would fail to do so. In this set of notes, we will describe the factor analysis model, which uses more parameters than the diagonal $\Sigma$ and captures some correlations in the data, but without having to fit a full covariance matrix.

Marginals and conditionals of Gaussians

Before describing factor analysis, we digress to talk about how to find conditional and marginal distributions of random variables with a joint multivariate Gaussian distribution.

Suppose we have a vector-valued random variable

\[
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},
\]

where $x_1 \in \mathbb{R}^r$, $x_2 \in \mathbb{R}^s$, and $x \in \mathbb{R}^{r+s}$. Suppose $x \sim \mathcal{N}(\mu, \Sigma)$, where

\[
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.
\]

Here, $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\Sigma_{11} \in \mathbb{R}^{r \times r}$, $\Sigma_{12} \in \mathbb{R}^{r \times s}$, and so on. Note that since covariance matrices are symmetric, $\Sigma_{12} = \Sigma_{21}^T$.

Under our assumptions, $x_1$ and $x_2$ are jointly multivariate Gaussian. What is the marginal distribution of $x_1$? It is not hard to see that $\mathrm{E}[x_1] = \mu_1$, and that $\mathrm{Cov}(x_1) = \mathrm{E}[(x_1 - \mu_1)(x_1 - \mu_1)^T] = \Sigma_{11}$. To see that the latter is true, note that by the definition of the joint covariance of $x_1$ and $x_2$, we have that

\[
\begin{aligned}
\mathrm{Cov}(x) = \Sigma &= \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \\
&= \mathrm{E}\left[(x - \mu)(x - \mu)^T\right] \\
&= \mathrm{E}\left[ \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^T \right] \\
&= \mathrm{E}\begin{bmatrix} (x_1 - \mu_1)(x_1 - \mu_1)^T & (x_1 - \mu_1)(x_2 - \mu_2)^T \\ (x_2 - \mu_2)(x_1 - \mu_1)^T & (x_2 - \mu_2)(x_2 - \mu_2)^T \end{bmatrix}.
\end{aligned}
\]

Matching the upper-left subblocks in the matrices in the second and the last lines above gives the result.

Since marginal distributions of Gaussians are themselves Gaussian, we therefore have that the marginal distribution of $x_1$ is given by $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$.
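As a quick empirical sanity check, here is a minimal numpy sketch (the block sizes and parameter values are made up for illustration): we sample from the joint Gaussian and verify that the empirical mean and covariance of the $x_1$ block match $\mu_1$ and $\Sigma_{11}$.

    import numpy as np

    rng = np.random.default_rng(0)
    r, s = 2, 2

    # Illustrative joint parameters; A A^T + I guarantees a valid covariance.
    mu = np.array([0.0, 1.0, -1.0, 2.0])
    A = rng.normal(size=(r + s, r + s))
    Sigma = A @ A.T + np.eye(r + s)

    X = rng.multivariate_normal(mu, Sigma, size=200_000)

    # Empirical marginal of x1 (the first r coordinates)
    print(X[:, :r].mean(axis=0))           # close to mu[:r]
    print(np.cov(X[:, :r], rowvar=False))  # close to Sigma[:r, :r]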

Also, we can ask, what is the conditional distribution of $x_1$ given $x_2$? By referring to the definition of the multivariate Gaussian distribution, it can be shown that $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where

\[
\begin{aligned}
\mu_{1|2} &= \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \\
\Sigma_{1|2} &= \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.
\end{aligned}
\]
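These formulas translate directly into code. Here is a minimal numpy sketch (the function name gaussian_conditional is ours) that computes the parameters of $x_1 \mid x_2$ from a partitioned mean and covariance; for numerical stability it uses a linear solve rather than forming $\Sigma_{22}^{-1}$ explicitly.

    import numpy as np

    def gaussian_conditional(mu, Sigma, r, x2):
        """Parameters of x1 | x2 for x ~ N(mu, Sigma), with x1 = x[:r], x2 = x[r:]."""
        mu1, mu2 = mu[:r], mu[r:]
        S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
        S21, S22 = Sigma[r:, :r], Sigma[r:, r:]
        # K = Sigma_12 Sigma_22^{-1}, via a solve instead of an explicit inverse
        K = np.linalg.solve(S22.T, S12.T).T
        mu_cond = mu1 + K @ (x2 - mu2)   # mu_1 + Sigma_12 Sigma_22^{-1} (x2 - mu2)
        Sigma_cond = S11 - K @ S21       # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
        return mu_cond, Sigma_cond

Plugging in the partitioned $\mu$ and $\Sigma$ from the sketch above together with an observed $x_2$ returns the conditional mean and covariance directly.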

When working with the factor analysis model in the next section, these formulas for finding conditional and marginal distributions of Gaussians will be very useful.

The factor analysis model

In the factor analysis model, we posit a joint distribution on $(x, z)$ as follows, where $z \in \mathbb{R}^k$ is a latent random variable:

\[
\begin{aligned}
z &\sim \mathcal{N}(0, I) \\
x \mid z &\sim \mathcal{N}(\mu + \Lambda z, \Psi).
\end{aligned}
\]

Here, the parameters of our model are the vector $\mu \in \mathbb{R}^n$, the matrix $\Lambda \in \mathbb{R}^{n \times k}$, and the diagonal matrix $\Psi \in \mathbb{R}^{n \times n}$. The value of $k$ is usually chosen to be smaller than $n$.

Thus, we imagine that each datapoint $x^{(i)}$ is generated by sampling a $k$-dimensional multivariate Gaussian $z^{(i)}$. Then, it is mapped to a $k$-dimensional affine space of $\mathbb{R}^n$ by computing $\mu + \Lambda z^{(i)}$. Lastly, $x^{(i)}$ is generated by adding covariance $\Psi$ noise to $\mu + \Lambda z^{(i)}$.

Equivalently (convince yourself that this is the case), we can therefore also define the factor analysis model according to

\[
\begin{aligned}
z &\sim \mathcal{N}(0, I) \\
\epsilon &\sim \mathcal{N}(0, \Psi) \\
x &= \mu + \Lambda z + \epsilon,
\end{aligned}
\]

where $\epsilon$ and $z$ are independent.
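This equivalent formulation makes the generative process easy to simulate. The following is a minimal Python sketch (the parameter values are made up for illustration): it samples $z$, maps it through $\mu + \Lambda z$, and adds diagonal-covariance noise. As a check, the empirical covariance of the samples should approach $\Lambda \Lambda^T + \Psi$, which follows from the independence of $z$ and $\epsilon$.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, m = 4, 2, 200_000

    # Illustrative parameters (not fit to any data)
    mu = np.array([1.0, -2.0, 0.5, 3.0])
    Lambda = rng.normal(size=(n, k))
    Psi = np.diag(rng.uniform(0.1, 0.5, size=n))   # diagonal noise covariance

    z = rng.normal(size=(m, k))                            # z ~ N(0, I)
    eps = rng.normal(size=(m, n)) * np.sqrt(np.diag(Psi))  # eps ~ N(0, Psi), Psi diagonal
    X = mu + z @ Lambda.T + eps                            # x = mu + Lambda z + eps

    # Empirical check: Cov(x) should be close to Lambda Lambda^T + Psi
    print(np.cov(X, rowvar=False))
    print(Lambda @ Lambda.T + Psi)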
