
You can compute the square root of something and just not worry about it; you know the computer will give you the right answer. For most reasonably sized matrices, even up to thousands-by-thousands matrices, I think of the SVD routine the same way as a square root function: if you call it, it'll give you back the right answer, and you don't have to worry too much about it.

If you have extremely large matrices, like a million-by-a-million matrix, I might start to worry a bit, but for a few-thousand-by-a-few-thousand matrix, this is implemented very well today.

Interviewee: [Inaudible].

Instructor (Andrew Ng): What's the complexity of SVD? That's a good question. I actually don't know. I want to guess it's roughly on the order of N-cubed. I'm not sure. [Inaudible] algorithms, so I don't know what's known about the convergence of these algorithms.

The example I drew out was for a fat matrix, a matrix that is [inaudible]. In the same way, you can also call SVD on a tall matrix, one that is taller than it is wide, and it would decompose it into a product of three matrices like that, okay?
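As a quick NumPy sketch of that point (the sizes and variable names here are arbitrary, not from the lecture), the SVD routine accepts either shape and hands back the three factors:

```python
import numpy as np

rng = np.random.default_rng(0)
A_wide = rng.standard_normal((5, 8))   # wider than it is tall
A_tall = rng.standard_normal((8, 5))   # taller than it is wide

for A in (A_wide, A_tall):
    # "Thin" SVD: the factors share the smaller of the two dimensions.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(A.shape, "->", U.shape, s.shape, Vt.shape)
    # Either way, A is recovered as the product of the three factors.
    assert np.allclose(A, U @ np.diag(s) @ Vt)
```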

The nice thing about this is that we can use it to compute eigenvectors and PCA very efficiently. In particular, the covariance matrix sigma was this: it was the sum over your training examples of x^(i) x^(i) transpose. So go back and recall the definition of the design matrix, which I think I described in lecture two when we derived the closed-form solution to least squares. The design matrix was the matrix where I took my examples and stacked them in rows. We call this the design matrix X.

So if you construct the design matrix, then the covariance matrix sigma can be written as just X transpose X. Okay? I hope you see why X transpose X gives you the sum of outer products of the vectors. If you aren't seeing this right now, just go home and convince yourself that it's true.
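Here's a minimal NumPy check of that identity, with made-up sizes: X transpose X really is the sum of the outer products x^(i) x^(i) transpose when the examples are stacked in the rows of X.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 4                      # m examples, n features (sizes are arbitrary)
X = rng.standard_normal((m, n))   # design matrix: one example per row

# Sum over the training examples of the outer products x^(i) x^(i)^T.
sum_of_outer_products = sum(np.outer(x_i, x_i) for x_i in X)

assert np.allclose(X.T @ X, sum_of_outer_products)
```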

To get the top K eigenvectors of sigma, you would take the matrix X and compute its SVD, so you get U S V transpose. Then the top K columns of V are the top K eigenvectors of X transpose X, which are therefore the top K eigenvectors of your covariance matrix sigma.
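Here's a small NumPy sketch of this recipe, again with arbitrary sizes and a made-up K. With the examples stacked in the rows of X, the eigenvectors of X transpose X come out as the columns of V (the rows of V transpose); the explicit eigendecomposition is only there as a sanity check, and eigenvectors are defined only up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, K = 20, 6, 3
X = rng.standard_normal((m, n))   # design matrix: one example per row

# SVD of the design matrix; singular values come back in decreasing order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
top_k_from_svd = Vt[:K]                      # rows of V^T, i.e. columns of V

# Explicit eigendecomposition of sigma = X^T X for comparison.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
top_k_explicit = eigvecs[:, ::-1][:, :K].T   # reorder to decreasing eigenvalue

# Corresponding eigenvectors should agree up to sign, so |dot product| is 1.
assert np.allclose(np.abs(np.sum(top_k_from_svd * top_k_explicit, axis=1)), 1.0)
```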

So in our example, if you have 50,000 words in your dictionary, then the design matrix would be an m-by-50,000 matrix, say 100 by 50,000 if you have 100 examples. So X would be quite tractable to represent and to compute the SVD of, whereas the matrix sigma, which is 50,000 by 50,000, would be much harder to represent. So this gives you an efficient way to implement PCA.
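As rough arithmetic on those sizes (my own back-of-the-envelope numbers, assuming 8-byte floating-point entries):

```python
m, n = 100, 50_000
bytes_X = m * n * 8        # design matrix X: 100 x 50,000
bytes_sigma = n * n * 8    # covariance matrix sigma: 50,000 x 50,000
print(f"X:     {bytes_X / 1e6:.0f} MB")      # about 40 MB
print(f"sigma: {bytes_sigma / 1e9:.0f} GB")  # about 20 GB
```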

The reason I want to talk about this is that in previous years I didn't talk [inaudible]. In the class projects, I found a number of students trying to implement SVD on huge problems and [inaudible], so this is a much better way to implement PCA if you have extremely high-dimensional data. If you have low-dimensional data, say 50- or 100-dimensional data, then computing sigma is no problem and you can do it the old way, but otherwise, use the SVD to implement this.
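Putting the pieces together, here is a minimal sketch of PCA implemented both ways. The function name, the centering step, and the sizes in the usage example are assumptions for illustration rather than code from the lecture.

```python
import numpy as np

def pca_top_k(X, K, use_svd=True):
    """Return the top-K principal directions of the data in the rows of X."""
    X = X - X.mean(axis=0)                   # center the data (assumed preprocessing)
    if use_svd:
        # Thin SVD of the m x n design matrix; the columns of V are the directions.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return Vt[:K].T                      # n x K
    else:
        # The "old way": form sigma = X^T X explicitly, fine when n is small.
        eigvals, eigvecs = np.linalg.eigh(X.T @ X)
        return eigvecs[:, ::-1][:, :K]       # n x K, by decreasing eigenvalue

# Usage: project 100 examples with 5,000 features onto the top 10 directions.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5_000))
W = pca_top_k(X, K=10)                       # 5,000 x 10
Z = (X - X.mean(axis=0)) @ W                 # 100 x 10 reduced representation
print(W.shape, Z.shape)
```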

Source: OpenStax, Machine learning. OpenStax CNX, Oct 14, 2013. Download for free at http://cnx.org/content/col11500/1.4