If we were fitting a full, unconstrained covariance matrix $\Sigma$ to data, it was necessary that $m \ge n + 1$ in order for the maximum likelihood estimate of $\Sigma$ not to be singular. Under either of the two restrictions above, we may obtain a non-singular $\Sigma$ when $m \ge 2$.
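As a quick numerical illustration of this point, here is a minimal sketch (assuming NumPy is available; the particular choices of $n$ and $m$ are arbitrary) showing that when there are fewer datapoints than dimensions, the maximum likelihood covariance estimate is rank-deficient and hence singular:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 5                        # dimension n exceeds sample size m

X = rng.normal(size=(m, n))         # m datapoints in R^n
mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / m   # maximum likelihood estimate of the covariance

# The centered data matrix has rank at most m - 1, so Sigma does too.
print(np.linalg.matrix_rank(Sigma))  # 4, i.e. less than n = 10
print(np.linalg.det(Sigma))          # numerically zero: Sigma is singular
```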
However, restricting $\Sigma$ to be diagonal also means modeling the different coordinates $x_i$, $x_j$ of the data as being uncorrelated and independent. Often, it would be nice to be able to capture some interesting correlation structure in the data. If we were to use either of the restrictions on $\Sigma$ described above, we would therefore fail to do so. In this set of notes, we will describe the factor analysis model, which uses more parameters than the diagonal $\Sigma$ and captures some correlations in the data, but also without having to fit a full covariance matrix.
Before describing factor analysis, we digress to talk about how to find conditional and marginal distributions of random variables with a joint multivariate Gaussian distribution.
Suppose we have a vector-valued random variable
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$
where $x_1 \in \mathbb{R}^r$, $x_2 \in \mathbb{R}^s$, and $x \in \mathbb{R}^{r+s}$. Suppose $x \sim \mathcal{N}(\mu, \Sigma)$, where
$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$
Here, $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\Sigma_{11} \in \mathbb{R}^{r \times r}$, $\Sigma_{12} \in \mathbb{R}^{r \times s}$, and so on. Note that since covariance matrices are symmetric, $\Sigma_{12} = \Sigma_{21}^T$.
Under our assumptions, $x_1$ and $x_2$ are jointly multivariate Gaussian. What is the marginal distribution of $x_1$? It is not hard to see that $\mathrm{E}[x_1] = \mu_1$, and that $\mathrm{Cov}(x_1) = \mathrm{E}[(x_1 - \mu_1)(x_1 - \mu_1)^T] = \Sigma_{11}$. To see that the latter is true, note that by definition of the joint covariance of $x_1$ and $x_2$, we have that
$$\begin{aligned}
\mathrm{Cov}(x) &= \Sigma \\
&= \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \\
&= \mathrm{E}\left[(x - \mu)(x - \mu)^T\right] \\
&= \mathrm{E}\left[ \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^T \right] \\
&= \mathrm{E}\begin{bmatrix} (x_1 - \mu_1)(x_1 - \mu_1)^T & (x_1 - \mu_1)(x_2 - \mu_2)^T \\ (x_2 - \mu_2)(x_1 - \mu_1)^T & (x_2 - \mu_2)(x_2 - \mu_2)^T \end{bmatrix}.
\end{aligned}$$
Matching the upper-left subblocks in the matrices in the second and the last lines above gives the result.
Since marginal distributions of Gaussians are themselves Gaussian, we therefore have that the marginal distribution of $x_1$ is given by $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$.
Also, we can ask, what is the conditional distribution of $x_1$ given $x_2$? By referring to the definition of the multivariate Gaussian distribution, it can be shown that $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where
$$\begin{aligned}
\mu_{1|2} &= \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \\
\Sigma_{1|2} &= \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.
\end{aligned}$$
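To make these two rules concrete, here is a small sketch (assuming NumPy; the specific values of $\mu$, $\Sigma$, and the observed $x_2$ are made-up illustrative numbers) that reads off the marginal of $x_1$ and computes the conditional mean and covariance from the formulas above:

```python
import numpy as np

# A made-up joint Gaussian over x = [x1; x2] with r = 2, s = 1.
mu1 = np.array([0.0, 1.0])
mu2 = np.array([2.0])
Sigma11 = np.array([[2.0, 0.3],
                    [0.3, 1.0]])
Sigma12 = np.array([[0.5],
                    [0.2]])
Sigma22 = np.array([[1.5]])

# Marginal of x1: simply x1 ~ N(mu1, Sigma11); nothing to compute.

# Conditional of x1 given an observed value of x2, using the formulas above
# (solve is used instead of forming the inverse of Sigma22 explicitly).
x2 = np.array([2.5])
mu_cond = mu1 + Sigma12 @ np.linalg.solve(Sigma22, x2 - mu2)
Sigma_cond = Sigma11 - Sigma12 @ np.linalg.solve(Sigma22, Sigma12.T)

print(mu_cond)     # conditional mean  mu_{1|2}
print(Sigma_cond)  # conditional covariance  Sigma_{1|2}
```

Note that the conditional covariance $\Sigma_{1|2}$ does not depend on the observed value of $x_2$, only the conditional mean does.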
When working with the factor analysis model in the next section, these formulas for finding conditional and marginal distributions of Gaussians will be very useful.
In the factor analysis model, we posit a joint distribution on $(x, z)$ as follows, where $z \in \mathbb{R}^k$ is a latent random variable:
$$\begin{aligned}
z &\sim \mathcal{N}(0, I) \\
x \mid z &\sim \mathcal{N}(\mu + \Lambda z, \Psi).
\end{aligned}$$
Here, the parameters of our model are the vector $\mu \in \mathbb{R}^n$, the matrix $\Lambda \in \mathbb{R}^{n \times k}$, and the diagonal matrix $\Psi \in \mathbb{R}^{n \times n}$. The value of $k$ is usually chosen to be smaller than $n$.
Thus, we imagine that each datapoint $x^{(i)}$ is generated by sampling a $k$-dimensional multivariate Gaussian $z^{(i)}$. Then, it is mapped to a $k$-dimensional affine space of $\mathbb{R}^n$ by computing $\mu + \Lambda z^{(i)}$. Lastly, $x^{(i)}$ is generated by adding covariance $\Psi$ noise to $\mu + \Lambda z^{(i)}$.
Equivalently (convince yourself that this is the case), we can therefore also define the factor analysis model according to
$$\begin{aligned}
z &\sim \mathcal{N}(0, I) \\
\epsilon &\sim \mathcal{N}(0, \Psi) \\
x &= \mu + \Lambda z + \epsilon,
\end{aligned}$$
where $\epsilon$ and $z$ are independent.
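The generative story above translates directly into a sampler. Here is a minimal sketch (assuming NumPy; the values of $n$, $k$, $m$, and the parameters $\mu$, $\Lambda$, $\Psi$ are arbitrary illustrative choices) that draws datapoints exactly as described: sample $z$, map it through $\mu + \Lambda z$, then add noise with diagonal covariance $\Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 5, 2, 10000               # data dimension, latent dimension, sample size

mu = rng.normal(size=n)             # parameter mu in R^n
Lam = rng.normal(size=(n, k))       # parameter Lambda in R^{n x k}
psi_diag = rng.uniform(0.1, 0.5, size=n)
Psi = np.diag(psi_diag)             # diagonal noise covariance Psi

z = rng.normal(size=(m, k))                      # z ~ N(0, I)
eps = rng.normal(size=(m, n)) * np.sqrt(psi_diag)  # eps ~ N(0, Psi), Psi diagonal
x = mu + z @ Lam.T + eps                         # x = mu + Lambda z + eps

# Sanity check: marginally x ~ N(mu, Lambda Lambda^T + Psi), so for large m the
# sample covariance should be close to Lam @ Lam.T + Psi.
print(np.abs(np.cov(x, rowvar=False) - (Lam @ Lam.T + Psi)).max())
```

The sanity check at the end previews the marginal distribution of $x$ that the conditional/marginal formulas from the previous section yield for this model.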