As a simple example of dimensionality reduction, consider a bending string of beads, as depicted in figure 2. The input data has $3 \times 7 = 21$ dimensions (if given as the $\mathrm{(x,y,z)}$ coordinates of each of the seven beads), but the beads always move collectively from the "bent" arrangement (left) to the "straight" arrangement (right). Under this simplified view, the process can be considered one-dimensional, and a meaningful axis for it would represent the "degree of straightness" of the system. Using this axis, each string of beads can be replaced by a single number, its "coordinate" along the proposed axis. The location of a shape along this axis then indicates at a glance which stage of the bending process it is in.
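The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the construction behind figure 2: the planar bending model, the unit bond length, and the helper names `bead_string` and `straightness` are all hypothetical. The "degree of straightness" is taken here to be the end-to-end distance divided by the contour length, which is only one of several reasonable choices.

```python
import math

N_BEADS = 7   # beads per string, as in the text
BOND = 1.0    # assumed distance between neighbouring beads

def bead_string(bend):
    """Return one conformation as a flat list of 3 * 7 = 21 coordinates.

    Hypothetical bending model: every bond is rotated by `bend` radians
    relative to the previous one, and the string stays in the xy-plane.
    """
    x = y = 0.0
    coords = [x, y, 0.0]
    for k in range(N_BEADS - 1):
        x += BOND * math.cos(k * bend)
        y += BOND * math.sin(k * bend)
        coords += [x, y, 0.0]
    return coords

def straightness(coords):
    """End-to-end distance over contour length; 1.0 for a straight string."""
    end_to_end = math.dist(coords[:3], coords[-3:])
    contour = BOND * (N_BEADS - 1)
    return end_to_end / contour

# Six conformations, going from bent (0.5 rad per bond) to straight (0 rad).
bends = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
conformations = [bead_string(b) for b in bends]      # each has 21 numbers
coordinates = [straightness(c) for c in conformations]  # each is 1 number

for b, s in zip(bends, coordinates):
    print(f"bend per bond = {b:.1f} rad -> straightness = {s:.3f}")
```

Each 21-dimensional conformation is summarized by one number that grows steadily as the string straightens, which is exactly the role of the one-dimensional axis described above. In this toy case the axis was chosen by hand; dimensionality reduction methods aim to discover such axes from the data.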
When dimensionality reduction methods are applied to molecular motion data, the goal is to find the main "directions" or "axes" collectively followed by the atoms, and the placement of each input conformation along these axes. The meaning of such axes can be intuitive or abstract, depending on the technique used and how complex the system is. We can reword the definition of dimensionality reduction when working with molecular motion samples as:
The remainder of this module describes two dimensionality reduction techniques and their application to molecular motion data. These are Principal Components Analysis (PCA), a linear method, and ISOmetric feature MAPping (Isomap), a non-linear method.
We will start our discussion of Principal Components Analysis, or PCA, by considering a very general data set of points, as depicted in figure 3. Each point in this simple data set is given as a 3-dimensional vector $\mathrm{(x,y,z)}$ (the discussion will later turn to the molecular motion domain and the interpretation of such data). Even though the data set is given as 3-dimensional points, it is clear from the figure that the points lie mostly on a two-dimensional surface. Our objective is then to find the inherent 2-dimensional parameterization of this data set. (For a full discussion of PCA, see Principal Component Analysis by I. T. Jolliffe.)
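A minimal numerical sketch of this setting follows, using NumPy. The data are synthetic stand-ins, not the data set of figure 3: points are drawn on an arbitrarily chosen tilted plane with a small amount of off-plane noise, and all variable names and scale values are assumptions made for illustration. PCA is computed here through an eigendecomposition of the covariance matrix, which is one standard way to obtain the principal components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two orthonormal directions spanning an (assumed) tilted plane, plus its normal.
e1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
e2 = np.array([-1.0, 1.0, 1.0]) / np.sqrt(3.0)
normal = np.cross(e1, e2)

# Synthetic 3-D points: large spread within the plane, small spread off it.
n_points = 500
u = rng.normal(scale=3.0, size=n_points)
v = rng.normal(scale=1.5, size=n_points)
noise = rng.normal(scale=0.05, size=n_points)
points = np.outer(u, e1) + np.outer(v, e2) + np.outer(noise, normal)

# PCA: center the data, then diagonalize the 3 x 3 covariance matrix.
centered = points - points.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues

# Reorder so the direction of largest variance comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Fraction of the total variance captured by each principal component.
explained = eigvals / eigvals.sum()

# 2-D parameterization: coordinates of each point along the first two axes.
reduced = centered @ eigvecs[:, :2]

print("explained variance fractions:", np.round(explained, 4))
print("reduced data shape:", reduced.shape)
```

With data of this kind, the first two components should account for nearly all of the variance and the third should point along the plane's normal, so discarding it loses very little information. The array `reduced` then plays the role of the 2-dimensional parameterization mentioned above.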