<< Chapter < Page Chapter >> Page >
Isomap at work on the "swiss roll" data set. a) The input data is given as three-dimensional, but is really two-dimensional in nature. b) Each point in the data set is connected to its neighbors to form a neighborhood graph, overlaid in light grey. Geodesic distances are approximated by computing shortest paths along the neighborhood graph (red). c) MDS applied to the geodesic distances has the effect of "unrolling" the swiss roll into its natural, two-dimensional parameterization. The neighborhood graph has been overlaid for comparison. Now, the Euclidean distances (in blue) approximate the original geodesic distances (in red).

The Isomap method can be put in algorithmic form in the following way:

    The isomap algorithm

  • Build the neighborhood graph. Take all input points and connect each point to the closest ones according to the distance measure used. Different criteria can be used to select the closest neighbors, such as the k closest or all points within some threshold distance.
  • Compute all pairwise geodesic distances. These are computed as the shortest paths on the neighborhood graph. Efficient algorithms for computing all-pairs-shortest-paths exist, such as Dijkstra's algorithm. All geodesic distances can be put in a matrix D .
  • Perform MDS on geodesic distances. Take the matrix D and apply MDS to it. That is, apply the double-centering formula explained above to produce B , and compute the low-dimensional coordinates by computing B 's eigenvectors and eigenvalues.

The Isomap algorithm captures the non-linearity of the data set automatically, from the data itself. It returns a low-dimensional projection for each point; these projections can be used to understand the underlying data distribution better. However, Isomap does not return "modes of motion" like PCA does, along which other points can be interpolated. Also, Isomap is much more expensive than PCA, since building a neighborhood graph and computing all-pairs-shortest-paths can have quadratic complexity on the number of input points. Plus, ther is the hidden cost of the distance measure itself, which can also be quite expensive. Regardless of its computational cost, Isomap has proven extremely useful in describing non-linear processes with few parameters. In particular, the work in Das et al. applied Isomap successfully to the study of a protein folding reaction.

In order to apply Isomap to a set of molecular conformations (gathered, for example, through molecular dynamics simulations) all that is needed is a distance measure between two conformations. The most popular distance measure for conformations is lRMSD, discussed in the module Molecular Distance Measures . There is no need to pre-align all the conformations in this case, since the lRMSD distance already includes pairwise alignment. Thus, the Isomap algorithm as described above can be directly applied to molecular conformations. Choosing an appropriate value for a neighborhood parameter (such as k ) may require a bit of experience, though, and it may depend on the amount and variability of the data. It should be noted that now, the shape of the low-dimensional surface where we hypothesize the points lie is unknown to us. In the swiss roll example above, it was obvious that the data was sampled from a two-dimensional surface. For molecular data, however, we do not know, a priori, what the surface looks like. But we know that the process should be low-dimensional and highly non-linear in nature. Of course, the distance measure used as an underlying operation has an impact on the final coordinates as well. The validation of the Isomap method using the lRMSD distance on protein folding data was done in Das et al. , where a statistical analysis method was used to prove its usefulness.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Geometric methods in structural computational biology' conversation and receive update notifications?

Ask