
As a simple example of dimensionality reduction, consider the case of a bending string of beads, as depicted in figure 2. The input data has 3×7 = 21 dimensions (if given as the (x,y,z) coordinates of each of the 7 beads), but the beads always move collectively from the "bent" arrangement (left) to the "straight" arrangement (right). Under this simplified view, the process can be considered one-dimensional, and a meaningful axis for it would represent the "degree of straightness" of the system. Using this axis, each string of beads can be replaced by a single number, its "coordinate" along the proposed axis. The location of a shape along this axis then quickly indicates which stage of the bending process it is in.

Figure 2: Sampled data from a chain of 7 beads can be given by the (x,y,z) positions of each bead, for a total of 21 "apparent" degrees of freedom. However, the process is inherently one-dimensional.
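
The bead example can be made concrete with a small sketch. The code below is only an illustration under assumptions that are not part of the original text: the chain is modeled as 7 equally spaced beads on a planar circular arc, the constants `SPACING` and `MAX_CURVATURE` are hypothetical, and the end-to-end distance is used as one possible stand-in for the "degree of straightness" axis.

```python
import math

N_BEADS = 7
SPACING = 1.0        # hypothetical arc length between neighboring beads
MAX_CURVATURE = 0.4  # hypothetical curvature of the fully "bent" chain

def bead_chain(straightness):
    """Return the 21 coordinates of a chain for a straightness value in [0, 1].

    Assumption: the chain bends as a circular arc in the z = 0 plane.
    """
    curvature = (1.0 - straightness) * MAX_CURVATURE
    coords = []
    for i in range(N_BEADS):
        s = i * SPACING  # arc length from the first bead
        if curvature < 1e-12:
            x, y = s, 0.0  # straight chain lies along the x axis
        else:
            x = math.sin(curvature * s) / curvature
            y = (1.0 - math.cos(curvature * s)) / curvature
        coords.extend([x, y, 0.0])
    return coords

def end_to_end(coords):
    """One-number summary: distance between the first and the last bead."""
    return math.dist(coords[:3], coords[-3:])

# Eleven sampled shapes, from fully bent (0.0) to fully straight (1.0).
samples = [bead_chain(t / 10) for t in range(11)]
summary = [end_to_end(s) for s in samples]
```

Every sample is a 21-dimensional point, yet all of them were generated from a single parameter. In this toy model the end-to-end distance grows steadily as the chain straightens, so it orders the samples along the bending process in the same way the proposed axis would.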

When dimensionality reduction methods are applied to molecular motion data, the goal is to find the main "directions" or "axes" collectively followed by the atoms, and the placement of each input conformation along these axes. The meaning of such axes can be intuitive or abstract, depending on the technique used and how complex the system is. We can reword the definition of dimensionality reduction when working with molecular motion samples as:

    Dimensionality reduction of molecular motion data

  • INPUT: A set of molecular conformations sampled from some physical process, given as the (x,y,z) coordinates for each atom. These are 3N-dimensional points for a molecule with N atoms.
  • OUTPUT: A set of d coordinates for each input conformation, such that d << 3N. These d coordinates should help classify the conformations throughout the main stages of the studied process.
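
As a minimal sketch of the input format described above, the following code turns per-atom (x,y,z) coordinates into 3N-dimensional points. The two conformations and the helper name `flatten_conformation` are hypothetical and serve only as an illustration.

```python
# Two hypothetical conformations of a 3-atom molecule (N = 3),
# each given as a list of (x, y, z) tuples, one per atom.
conformations = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.9, 0.0)],
]

def flatten_conformation(atoms):
    """Concatenate per-atom (x, y, z) tuples into one 3N-dimensional point."""
    return [coordinate for atom in atoms for coordinate in atom]

# Each conformation becomes a single point with 3N = 9 coordinates.
points = [flatten_conformation(c) for c in conformations]
n_atoms = len(conformations[0])
```

A dimensionality reduction method would take `points` as its input and return d numbers per conformation, with d much smaller than 3N.
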
Dimensionality reduction methods can be either linear or non-linear. Linear methods typically compute the low-dimensional representation of each input point through linear combinations and/or linear matrix operations. Non-linear methods either use non-linear mathematics or modify linear methods with algorithmic techniques that encode the data's "curvature" (such as Isomap, explained later). Both categories have advantages and disadvantages, which will become clear through the rest of this module. More information on different linear and non-linear techniques for dimensionality reduction is given in [7]. A more extensive classification of dimensionality reduction techniques into several categories, with examples of different algorithms, can be found in this hypertext article.

The remainder of this module describes two dimensionality reduction techniques and their application to molecular motion data. These are Principal Components Analysis (PCA), a linear method, and ISOmetric feature MAPping (Isomap), a non-linear method.

Principal components analysis

We will start our discussion of Principal Components Analysis, or PCA, by considering a very general data set of points, as depicted in figure 3. Each point in this simple data set is given as a 3-dimensional vector (x,y,z); the discussion will later turn to the molecular motion domain and the interpretation of such data. Even though the data set is given as 3-dimensional points, it is clear from the figure that the points are distributed mostly on a two-dimensional surface. Our objective is then to find the inherent, 2-dimensional parameterization of this data set. (For a full discussion of PCA, see Principal Components Analysis by I. T. Jolliffe.)
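
To make this setting tangible, the sketch below builds a synthetic 3-dimensional data set that lies close to a plane and recovers a 2-dimensional parameterization with PCA, computed here through a singular value decomposition in NumPy. The data, the plane orientation, and the noise level are all assumptions chosen for illustration; they do not reproduce the data set of figure 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points on a tilted plane in 3D, plus small noise.
plane_coords = rng.uniform(-1.0, 1.0, size=(200, 2))
plane_basis = np.array([[1.0, 0.0, 0.5],
                        [0.0, 1.0, -0.3]])
points = plane_coords @ plane_basis + 0.01 * rng.normal(size=(200, 3))

# PCA: center the data, then take the SVD of the centered matrix.
centered = points - points.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Rows of vt are the principal directions, ordered by decreasing variance.
explained = singular_values**2 / np.sum(singular_values**2)

# Keep the first two directions: each 3D point gets 2 new coordinates.
coords_2d = centered @ vt[:2].T
```

In this example the first two entries of `explained` account for almost all of the variance and the third is close to zero, which is the numerical counterpart of the observation that the points lie mostly on a two-dimensional surface.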

Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6
