
Figure 12 shows the results of applying Isomap to the same molecular trajectory used for the CV-N example in the PCA section. CV-N is a protein that is known to have an "intermediate" state in the folding mechanism, and it is also known to fold through a very ordered process following a well-defined "folding route".

The first two Isomap coordinates for the CV-N protein folding trajectory. Left: the projection of each simulated point onto the first two low-dimensional coordinates. Right: A free-energy (probability) surface computed using the Isomap coordinates.

The free-energy plot in figure 12 (right) shows the advantage of the Isomap coordinates over PCA. Isomap identifies the CV-N intermediate (seen as a free-energy minimum in blue) between the unfolded and folded states, and the highest-probability route connecting the intermediate to the folded state is also visible in this rendering. The PCA coordinates did not identify an intermediate or a folding route, since they are simply a linear projection of the original coordinates. Isomap, in contrast, captures the underlying connectivity of the data and returns coordinates that show the progression of the reaction through its different stages.
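The contrast between a linear projection and a connectivity-based embedding can be illustrated on a toy data set. The sketch below is not the CV-N analysis: it assumes a synthetic three-quarter circular arc as a stand-in for a curved "reaction path", and the helper names (`isomap`, `pca`, `pairwise_distances`) and the choice of 4 neighbors are illustrative. It builds a nearest-neighbor graph, computes shortest-path (geodesic) distances, and applies classical multidimensional scaling, then checks how well the first coordinate of each method tracks the position along the arc.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def isomap(X, n_neighbors=4, n_components=2):
    """Minimal Isomap: kNN graph -> shortest paths -> classical MDS."""
    n = X.shape[0]
    D = pairwise_distances(X)
    # Neighborhood graph: keep only edges to the nearest neighbors.
    G = np.full((n, n), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)          # make the graph undirected
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall shortest paths approximate geodesic distances.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if not np.isfinite(G).all():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - 1.0 / n         # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

def pca(X, n_components=2):
    """Linear projection onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy "trajectory": points along three quarters of a circle (an assumption,
# chosen only because no single straight axis can order them correctly).
theta = np.linspace(0.0, 1.5 * np.pi, 100)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Absolute correlation of the first coordinate with the true progress variable.
iso_corr = abs(np.corrcoef(isomap(X, 4, 1)[:, 0], theta)[0, 1])
pca_corr = abs(np.corrcoef(pca(X, 1)[:, 0], theta)[0, 1])
print(f"Isomap: {iso_corr:.4f}  PCA: {pca_corr:.4f}")
```

On this arc the first Isomap coordinate is nearly a linear function of the position along the curve, while the first principal component folds the two ends of the arc back toward each other. The all-pairs shortest-path loop is cubic in the number of points, so this version is only suitable for small examples.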

The Isomap procedure can be applied to different molecular models. As another, smaller example, consider the Alanine Dipeptide molecule shown in figure 13 (left). This is an all-atom molecule that has been studied thoroughly and is known to have two predominant shapes: an "extended" shape (also called C5, with a variant called PII) and a "helical" shape (also called alpha, with three variants: right-handed, P-like, and left-handed). Applying the Isomap algorithm to this molecule, without any bias from a priori known information, yields the low-dimensional coordinates on the right of figure 13 (shown again as a free-energy plot). The first two coordinates (top right) are already enough to identify all five major states of the peptide and the possible transitions between them. Applying PCA to such a simple molecule yields comparable results (not shown), but PCA cannot differentiate the left-handed helix shape from the right-handed one. Only the geodesic formulation of Isomap can.

Alanine Dipeptide. Left: a) A helical conformation. b) An extended conformation. Right: Top: free-energy as a function of the first two Isomap coordinates computed from a Molecular Dynamics trajectory. Bottom: free-energy as a function of the first and third coordinates, for comparison (these explain the data variance better but do not add new states or transitions).
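The free-energy (probability) surfaces in figures 12 and 13 relate the population of each region of the low-dimensional space to a free energy. A minimal sketch of one common way to build such a surface is given below, assuming the standard relation F = -kT ln P applied to a two-dimensional histogram of the embedded points. The function name `free_energy_surface`, the bin count, and the synthetic two-state sample are illustrative assumptions, not the procedure used to produce the figures.

```python
import numpy as np

def free_energy_surface(coords, bins=50, kT=1.0):
    """Estimate F(x, y) = -kT * ln P(x, y) from 2-D embedded coordinates.

    coords : array of shape (n_points, 2), e.g. the first two Isomap coordinates.
    Returns F (indexed [x_bin, y_bin]) and the bin edges along each axis.
    Empty bins get F = +inf; the lowest free energy is shifted to zero.
    """
    H, xedges, yedges = np.histogram2d(coords[:, 0], coords[:, 1], bins=bins)
    P = H / H.sum()                      # probability of each bin
    with np.errstate(divide="ignore"):   # log(0) -> -inf for empty bins
        F = -kT * np.log(P)
    F -= F[np.isfinite(F)].min()         # set the global minimum to zero
    return F, xedges, yedges

# Synthetic two-state sample (an assumption for illustration only):
# a heavily populated state near (0, 0) and a sparser one near (3, 3).
rng = np.random.default_rng(0)
state_a = rng.normal(loc=0.0, scale=0.5, size=(4000, 2))
state_b = rng.normal(loc=3.0, scale=0.5, size=(1000, 2))
F_demo, xe, ye = free_energy_surface(np.vstack([state_a, state_b]), bins=30)
print("number of populated bins:", int(np.isfinite(F_demo).sum()))
```

With `kT=1.0` the result is expressed in units of kT. Heavily populated regions appear as minima, which is how states such as the CV-N intermediate show up in these plots.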

Although the Isomap coordinates describe non-linear molecular processes extremely well, their computation is very expensive, as mentioned earlier. The most expensive step is the computation of the neighborhood graph. Approximations can speed up this step, at the expense of some accuracy in identifying the "true" nearest neighbors of each point. Plaku et al. proposed an approximation technique that computes nearest neighbors for high-dimensional data sets orders of magnitude faster by first reducing the dimensionality of all points and then querying for neighbors. The tradeoff between precision and speed is controllable through several algorithm parameters.
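The general idea of "reduce the dimensionality, then query for neighbors" can be sketched as follows. This is not the method of Plaku et al.: a random Gaussian projection is used here as a stand-in for the dimensionality-reduction step, and the names `approx_knn`, `proj_dim`, and `n_candidates` are hypothetical. Candidates are selected using distances in the projected space and then re-ranked using distances in the original space, so `n_candidates` acts as one knob for the precision/speed tradeoff.

```python
import numpy as np

def sq_dists(X):
    """Squared Euclidean distances between rows; self-distance set to inf."""
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)   # a point is never its own neighbor
    return D

def exact_knn(X, k):
    """Reference answer: indices of the k true nearest neighbors of each point."""
    return np.argsort(sq_dists(X), axis=1)[:, :k]

def approx_knn(X, k, proj_dim, n_candidates, seed=0):
    """Approximate kNN: search in a random low-dimensional projection,
    then re-rank the candidates using distances in the original space."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], proj_dim)) / np.sqrt(proj_dim)
    Y = X @ R                                               # reduced points
    cand = np.argsort(sq_dists(Y), axis=1)[:, :n_candidates]
    diff = X[:, None, :] - X[cand]                          # (n, n_candidates, d)
    true_d = (diff ** 2).sum(axis=-1)
    order = np.argsort(true_d, axis=1)[:, :k]
    return np.take_along_axis(cand, order, axis=1)

def recall(approx, exact):
    """Fraction of true neighbors that the approximate search recovered."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    return hits / exact.size

# Synthetic data (an assumption): 200 points with 3 underlying degrees of
# freedom, embedded linearly in 60 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 60))
k = 10
truth = exact_knn(X, k)
for c in (10, 50, 199):
    r = recall(approx_knn(X, k, proj_dim=5, n_candidates=c), truth)
    print(f"n_candidates={c:3d}  recall={r:.3f}")
```

Because the candidate lists are nested, raising `n_candidates` cannot lower the recall, and examining all other points recovers the exact neighbors at full cost. The brute-force search in the projected space is kept only for brevity; in practice that step would use a spatial index, which is where the speedup comes from.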

Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6