A brief discussion of PCA.

Principal component analysis

PCA is essentially just SVD. The only difference is that the data are usually centered (by subtracting a grand mean) before the SVD is computed. There are three ways of viewing PCA, and each gives a different insight into what PCA does.

Low-rank approximation

$$\min_{Z} \; \tfrac{1}{2}\,\lVert X - Z \rVert_F^2 \quad \text{subject to} \quad \operatorname{rank}(Z) \le K$$

where the Frobenius norm is the matrix analogue of a sum of squares. This gives the dimension-reduction interpretation. The solution to the problem is

$$Z = \sum_{k=1}^{K} u_k\, d_k\, v_k^T,$$

where $u_k$, $d_k$, and $v_k$ are the $k$-th left singular vector, singular value, and right singular vector of $X$.

We do lose some information when reducing the dimension, but the majority of the variance is retained in the lower-rank matrix. The eigenvalues tell us how significant each eigenvector (direction) is, so we order them by magnitude and discard the smallest few, since the contribution of components along those directions is small compared with directions having large eigenvalues. PCA guarantees the best rank-K approximation to X (this is the Eckart–Young theorem). The tuning parameter K can be chosen by cross-validation or by AIC/BIC. This property is especially useful for visualizing high-dimensional data.
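As an illustration, the rank-K solution can be computed directly from the truncated SVD. Below is a minimal NumPy sketch on synthetic low-rank-plus-noise data (the matrix X, the choice K = 5, and all variable names are assumptions made for this example, not from the text):

```python
# Minimal sketch of the rank-K approximation via truncated SVD (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
# 100 x 20 data with roughly 5 underlying patterns plus noise (illustrative)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 20)) + 0.2 * rng.normal(size=(100, 20))
X = X - X.mean(axis=0)                      # center the data before the SVD

U, d, Vt = np.linalg.svd(X, full_matrices=False)

K = 5
Z = U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]   # best rank-K approximation of X

# Fraction of total variance retained by the first K components
explained = (d[:K] ** 2).sum() / (d ** 2).sum()
print(f"rank-{K} reconstruction error: {np.linalg.norm(X - Z, 'fro'):.3f}")
print(f"variance explained by first {K} PCs: {explained:.2%}")
```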

Matrix factorization

$$\min_{U, D, V} \; \tfrac{1}{2}\,\lVert X - U D V^T \rVert_F^2 \quad \text{subject to} \quad U^T U = I, \;\; V^T V = I, \;\; D \in \operatorname{diag}^{+}$$

This gives the pattern-recognition interpretation. The first column of U gives the first major pattern in sample (row) space, while the first column of V gives the first major pattern in feature (column) space. This property is also useful in recommender systems: many popular collaborative-filtering algorithms, such as SVD++ and biased SVD, are built on this "project onto the major patterns" idea.
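As a sketch of this factorization view, the snippet below (synthetic data; all names are assumptions for the example) extracts U, D, and V with NumPy, checks the orthogonality constraints, and reads off the leading row-space and column-space patterns:

```python
# Sketch of the matrix-factorization view of PCA (synthetic, illustrative data).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
X = X - X.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# The orthogonality constraints of the factorization hold up to rounding error
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(V.T @ V, np.eye(V.shape[1]))

first_sample_pattern  = U[:, 0]   # dominant pattern across samples (rows)
first_feature_pattern = V[:, 0]   # dominant pattern across features (columns)
print("leading singular value:", d[0])
```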

Covariance

$$\max_{v_K} \; v_K^T X^T X\, v_K \quad \text{subject to} \quad v_K^T v_K = 1, \;\; v_K^T v_j = 0 \;\; \text{for } j < K$$

Here $X^T X$ behaves like the covariance matrix of a multivariate Gaussian. This is essentially the eigenvalue problem of the covariance: $X^T X = V D^2 V^T$ and $X X^T = U D^2 U^T$. The interpretation is that we are maximizing the variance captured in the column and row spaces.
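The equivalence between the SVD of X and the eigendecomposition of X^T X can be checked numerically. The following sketch (synthetic data; all names are illustrative assumptions) verifies that the eigenvalues of X^T X are the squared singular values and that its eigenvectors match the columns of V up to sign:

```python
# Sketch verifying X^T X = V D^2 V^T numerically (synthetic data).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
X = X - X.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)

eigvals, eigvecs = np.linalg.eigh(X.T @ X)        # returned in ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print(np.allclose(eigvals, d ** 2))               # eigenvalues of X^T X are d_j^2
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T))) # eigenvectors equal columns of V up to sign
```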

Figure: PCA (figure credit: https://onlinecourses.science.psu.edu/stat857/node/35)

The intuition behind PCA

The intuition behind PCA is as follows. The first PC (principal component) is the linear combination of variables corresponding to the direction of maximal sample variance (the major pattern of the dataset, the most spread-out direction). Each succeeding PC then finds the direction of highest remaining variance under the constraint that it be orthogonal (uncorrelated) to the preceding ones. Geometrically, this is a coordinate transformation: the newly formed axes correspond to the newly constructed linear combinations of variables. The number of new axes (variables) we keep is usually much lower than the number of axes (variables) in the original dataset, yet they still explain most of the variance present in the data.
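A small numerical illustration of this intuition (synthetic, correlated data; all names are assumptions): the PC scores are the coordinates on the new axes, their sample variances are decreasing, and they are mutually uncorrelated:

```python
# Sketch: PC scores as uncorrelated coordinates with decreasing variance.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))   # correlated features
X = X - X.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T                       # coordinates on the new (rotated) axes

print(np.round(scores.var(axis=0, ddof=1), 3))   # variances in decreasing order
cov = np.cov(scores, rowvar=False)
print(np.allclose(cov, np.diag(np.diag(cov))))   # off-diagonals ~ 0: uncorrelated
```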

Another interesting insight

Another interesting insight into PCA comes from its relationship to ridge regression (the L2 penalty). The ridge fit can be written as:

$$\hat{Y} = X\hat{\beta}^{\text{ridge}} = \sum_{j=1}^{p} u_j \, \frac{d_j^2}{d_j^2 + \lambda} \, u_j^T y$$

The middle term, $\frac{d_j^2}{d_j^2 + \lambda}$, shrinks the singular values. For major patterns with large singular values, $\lambda$ has little shrinking effect; for directions with small singular values, $\lambda$ shrinks them heavily towards zero (though not exactly to zero, unlike the lasso's L1 penalty, which performs feature selection). This non-uniform shrinkage gives ridge regression a grouping effect, which is why it is often used when features are strongly correlated (it effectively works with the orthogonal major patterns rather than the individual correlated features). PCA itself is easy to implement: feed the (n × p) data matrix to an SVD routine (e.g. the svd command in Matlab), extract the PC loadings (the columns of V) and the PC scores (UD), and we obtain the major patterns we want.
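The sketch below makes the last two points concrete (synthetic regression data; the value of λ and all names are assumptions): it computes PCA loadings and scores from a single SVD call, forms the shrinkage factors $d_j^2/(d_j^2+\lambda)$, and checks the SVD form of the ridge fit against the direct ridge solution:

```python
# Sketch: PCA via one SVD call, plus the ridge shrinkage factors d_j^2/(d_j^2 + lam).
import numpy as np

rng = np.random.default_rng(4)
n, p = 80, 10
# Strongly correlated features: three underlying patterns plus a little noise
X = rng.normal(size=(n, 3)) @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
X = X - X.mean(axis=0)                       # center the n x p data matrix
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt.T                              # PC loadings (columns of V)
scores = U * d                               # PC scores (UD)

lam = 2.0
shrink = d ** 2 / (d ** 2 + lam)             # ridge shrinkage factors
y_hat_svd = U @ (shrink * (U.T @ y))         # ridge fit written in its SVD form

# Compare with the direct ridge solution beta = (X^T X + lam I)^{-1} X^T y
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(y_hat_svd, X @ beta))      # True
print(np.round(shrink, 3))                   # near 1 for large d_j, much smaller otherwise
```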


Source: OpenStax, Comparison of three different matrix factorization techniques for unsupervised machine learning. OpenStax CNX. Dec 18, 2013. Download for free at http://cnx.org/content/col11602/1.1
