
Just to recap where we were with PCA, principal component analysis: I said that in PCA, we imagine that we have some very high-dimensional data that perhaps lies approximately on some low-dimensional subspace. So if you had a data set like this, you might find that that's the first principal component of the data, and that's the second principal component of this 2-D data.

To summarize the algorithm, we have three steps. The first step of PCA was to normalize the data to zero mean and unit variance. So subtract out the mean of your training examples, so the data now has zero mean, and then normalize each of your features so that the variance of each feature is now one.
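
In code, this first step might look roughly like this in Python, say, where X is an m-by-n array holding one training example per row (the names here are just for illustration):

    import numpy as np

    def normalize(X):
        # Step 1 of PCA: subtract out the mean of each feature so the data
        # has zero mean, then rescale each feature to unit variance.
        X = X - X.mean(axis=0)
        X = X / X.std(axis=0)
        return X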

The next step was to compute the covariance matrix sigma of your zero-mean data, which you compute as sigma = (1/m) times the sum over i of x^(i) times x^(i) transpose, and then you find the top K eigenvectors of sigma. So last time we saw applications of this. For example, one of the applications was eigenfaces, where each of your training examples, x^(i), is an image. So if your pictures of faces are 100 pixels by 100 pixels, then each of your training examples, x^(i), will be a 10,000-dimensional vector, corresponding to the 10,000 grayscale intensity pixel values.
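
Steps two and three might look roughly like this, where X is the zero-mean, normalized data from step one and K is the number of components you want to keep:

    import numpy as np

    def top_k_components(X, K):
        m = X.shape[0]
        # Step 2: covariance matrix sigma = (1/m) * sum_i x^(i) x^(i)^T.
        Sigma = (X.T @ X) / m
        # Step 3: top-K eigenvectors of sigma (eigh, since sigma is symmetric).
        eigvals, eigvecs = np.linalg.eigh(Sigma)
        order = np.argsort(eigvals)[::-1]   # largest eigenvalues first
        return eigvecs[:, order[:K]]        # n-by-K matrix of top-K eigenvectors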

There are 10,000 pixel values in each of your 100 by 100 images. So the eigenfaces application was one where the training examples comprised pictures of people's faces. Then we ran PCA, and to measure the distance between, say, a face here and a face there, we would project both face images onto the subspace and then measure the distance along the subspace. So in eigenfaces, you use something like 50 principal components.
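
That comparison might look roughly like this, where U is the n-by-K matrix of top eigenvectors from the previous step and face_a and face_b are two normalized face images stored as n-vectors (again, just illustrative names):

    import numpy as np

    def project(x, U):
        # Coordinates of the image x in the K-dimensional principal subspace.
        return U.T @ x

    def eigenface_distance(face_a, face_b, U):
        # Distance measured along the subspace rather than in pixel space.
        return np.linalg.norm(project(face_a, U) - project(face_b, U))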

So the difficulty of working with problems like these is that in step two of the algorithm, we construct the covariance matrix sigma. The covariance matrix now becomes a 10,000 by 10,000 matrix, which has 100 million entries, which is huge.
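
Just to make concrete how huge that is, if you stored sigma using 64-bit floats, a quick back-of-the-envelope calculation gives:

    n = 100 * 100                    # 10,000 pixels per image
    entries = n * n                  # 100,000,000 entries in sigma
    print(entries * 8 / 1e9, "GB")   # roughly 0.8 GB just to store sigma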

So we would like to apply PCA to very, very high-dimensional data with the goal of reducing its dimension, but step two of this algorithm requires constructing this covariance matrix sigma, an extremely large matrix, which you can't do. We'll come back to this in a second.

It turns out one of the other frequently used applications of PCA is actually to text data. So here's what I mean. Remember our vectorial representation of emails? This was way back when we were talking about supervised learning algorithms for spam classification. You remember I said that, given a piece of email or a piece of text document, you can represent it using a very high-dimensional vector by writing down a list of all the words in your dictionary: somewhere you had the word learn, somewhere you had the word study, and so on.

Depending on whether each word appears or does not appear in your text document, you put either a one or a zero there. This is the representation we used back in lecture five or lecture six for representing text documents when we were building Naive Bayes based classifiers for spam classification.
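
That representation might look roughly like this, with a toy six-word dictionary standing in for a real dictionary of tens of thousands of words:

    dictionary = ["a", "aardvark", "buy", "learn", "study", "zygote"]

    def to_feature_vector(document, dictionary):
        words = set(document.lower().split())
        # Put a 1 for each dictionary word that appears in the document, else 0.
        return [1 if w in words else 0 for w in dictionary]

    print(to_feature_vector("learn and study every day", dictionary))
    # -> [0, 0, 0, 1, 1, 0]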

Source: OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013. Download for free at http://cnx.org/content/col11500/1.4