
The computations required to simulate protein motion in silico are very expensive and involve non-trivial force field evaluations, as explained above. These simulations provide us with the (x,y,z) positions of all atoms in the molecule; for a molecule with N atoms, this amounts to 3N numbers per conformation. For interestingly sized molecules (such as proteins) the number of atoms is large and thus the dimensionality of the obtained data is extremely high. That is, a conformation sample for a protein with N atoms can be thought of as a 3N-dimensional point.
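As a small illustration of this representation, the sketch below flattens a conformation with N atoms into a single list of 3N numbers. The function name `flatten_conformation` and the coordinate values are made up for this example and do not come from any simulation package.

```python
# A conformation is stored here as a list of N atoms, each an (x, y, z) tuple.
# The coordinates are invented values used only for illustration.
def flatten_conformation(atoms):
    """Turn a list of N (x, y, z) tuples into one list of 3N numbers."""
    return [coord for atom in atoms for coord in atom]


conformation = [
    (0.0, 0.0, 0.0),
    (1.5, 0.0, 0.0),
    (1.5, 1.5, 0.0),
    (0.0, 1.5, 1.0),
]

# The flattened list is the 3N-dimensional point for this conformation.
point = flatten_conformation(conformation)
print(len(conformation), "atoms ->", len(point), "numbers")
```

A real protein has thousands of atoms, so each sampled conformation becomes a point with many thousands of coordinates.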

Figure 1: Protein folding is an important biological process. Individual atom motions are very complex, but the overall process seems almost one-dimensional, that is, following a direction from "unfolded" to "folded".

However, many biological processes are known to be highly structured at the molecular level, since the constituent atoms self-organize to achieve their biochemical goal. An example of such a process is protein folding, the process by which a protein achieves its thermodynamically stable three-dimensional shape to perform its biological function (a depiction of a protein folding process is shown in figure 1). To study such processes based on data gathered through simulations, there is a need to "summarize" the high-dimensional conformational data. Simply visualizing the time series of a moving protein as produced by simulation packages does not provide much insight into the process itself. One way of summarizing these motions is to turn conformations into a low-dimensional representation, such as a vector with very few components, that captures the "highlights" of the process. This data analysis process, turning high-dimensional data into low-dimensional data for better interpretation, is called dimensionality reduction and is discussed next.

Dimensionality reduction

When molecular shapes are sampled throughout some physical-chemical process that involves the motion of the molecule, there is a need to simplify the high-dimensional (albeit redundant) representation of a molecule given as a 3N-dimensional point, since it is believed that the actual degrees of freedom (DoFs) of the process are far fewer than 3N, as explained before. The resulting simplified representation should make it possible to classify the different conformations along one or more "directions" or "axes" that discriminate sufficiently between them.

Dimensionality reduction techniques analyze a set of points, given as input, and produce a corresponding low-dimensional representation for each point. The goal is to discover the true dimensionality of a data set that is only apparently high-dimensional. There exist mathematical tools to perform automatic dimensionality reduction, based on arbitrary input data in the form of high-dimensional points (not just molecules). Although different techniques achieve their goals in different ways, and each has advantages and disadvantages, the most general definition of dimensionality reduction could be stated as:

    Dimensionality reduction

  • INPUT: A set of M-dimensional points.
  • OUTPUT: A set of d-dimensional points, one for each of the input points, where d << M.
Some dimensionality reduction methods can also produce other useful information, such as a "direction vector" that can be used to interpolate atomic positions continuously along the main motions (as in PCA, see below). For both a general discussion and specific methodology on dimensionality reduction, refer to [7], and for more information on non-linear methods, see [8].
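The INPUT/OUTPUT definition above can be made concrete with a minimal sketch. It assumes a linear method in the spirit of PCA, restricted to a single output dimension (d = 1), and estimates the direction of largest variance by power iteration. All function names (`center`, `main_direction`, `reduce_to_1d`, `interpolate`) and the toy data are hypothetical, chosen only for this illustration; a real analysis would typically rely on a linear algebra library.

```python
import math


def center(points):
    """Subtract the mean point so the data is centered at the origin."""
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    return mean, centered


def main_direction(centered, iterations=100):
    """Estimate the direction of largest variance by power iteration.

    Applies X^T X to a vector repeatedly without forming the matrix.
    Assumes the uniform starting vector is not orthogonal to the answer.
    """
    dim = len(centered[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # scores = X v, then w = X^T scores
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def reduce_to_1d(points):
    """Map each M-dimensional point to a 1-dimensional point."""
    mean, centered = center(points)
    direction = main_direction(centered)
    low_dim = [[sum(a * b for a, b in zip(row, direction))]
               for row in centered]
    return mean, direction, low_dim


def interpolate(mean, direction, t):
    """Rebuild an M-dimensional point at position t along the direction."""
    return [m + t * d for m, d in zip(mean, direction)]


# Toy input: 5 "conformations" of 2 atoms (M = 6). Both atoms move together
# as t grows, and one coordinate carries a small unrelated wiggle.
wiggle = [0.01, -0.01, 0.01, -0.01, 0.01]
points = [[float(t), 0.0, wiggle[t], 1.0, float(t), 0.0] for t in range(5)]

mean, direction, low_dim = reduce_to_1d(points)
print("direction:", [round(c, 3) for c in direction])
print("low-dimensional points:", [[round(c, 3) for c in p] for p in low_dim])
```

Each six-dimensional input point is mapped to a single number that orders the conformations along the shared motion, and `interpolate` uses the recovered direction vector to generate intermediate atomic positions along it.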

Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6