<< Chapter < Page Chapter >> Page >

As for the case of rotation matrices, the translational part of the alignment consists of making the centroids of the two data sets coincide. To find the optimal rotation using quaternions, recall that the dot product of two vectors is maximized when the vectors are in the same direction. The same is true when the vectors are represented as quaternions. Using this property, we can define a quantity that we want to maximize (proof here ):

The objective function for rotational alignment (quaternion form)

We want to find the rotation on set A that maximizes the sum of the dot products of the rotated vectors of A with the vectors of B, all expressed as offsets from the set centroids.
Equivalently, using the last property from the section "Introduction to quaternions", we get:
The objective restated.
Now, recall that quaternion multiplication can be represented by matrices, and that the quaterions a and b have a 0 real component:
These substitutions will be used to restate the function to be maximized.
Using these matrices, we can derive a new form for the objective function:
The third step follows because each term in the sum is multiplied on the left and right by q, so the q factors can be moved outside the sum. The fourth step simply renames the sum of matrix products to a single matrix, N, based on which we can find q.
where:
Now the problem is stated in terms of a matrix product optimization.
The quaternion that maximizes this product is the eigenvector of N that corresponds to its most positive eigenvalue (proof here ). The eigenvalues can be found by solving the following equation, which is quartic in lambda:
I is the 4x4 identity matrix.
This quartic equation can be solved by a number of standard approaches. Finally, given the maximum eigenvalue lambda-max, the quaternion corresponding to the optimal rotation is the eigenvector v:
This equation can be solved to find the optimal rotation.
A closed-form solution to this equation for v can be found by applying techniques from linear algebra. One possible algorithm, based on constructing a matrix of cofactors, is presented in appendix A5 of the source paper [3] .

In summary, the alignment algorithm works as follows:

  • Recalculate atom coordinates as displacements from the centroid of each molecule. The optimal translation superimposes the centroids.
  • Construct the matrix N based on matrices A and B for each atom.
  • Find the maximum eigenvalue by solving the quartic eigenvalue equation.
  • Find the eigenvector corresponding to this eigenvalue. This vector is the quaternion corresponding to the optimal rotation.

This method appears computationally intensive, but has the major advantage over other approaches of being a closed-form, unique solution.

RMSD and lRMSD are not ideally suited for all applications. For example, consider the case of a given conformation A, and a set S of other conformations generated by some means. The goal is to estimate which conformations in S are closest in potential energy to A, making the assumption that they will be the conformations most structurally similar to A. The lRMSD measure will find the conformations in which the overall average atomic displacement is least. The problem is that if the quantity of interest is the potential energy of conformations, not all atoms can be treated equally. Those on the outside of the protein can often move a fair amount without dramatically affecting the energy. In contrast, the core of the molecule tends to be more compact, and therefore a slight change in the relative positions of a pair of atoms could lead to overlap of the atoms, and therefore a completely infeasible structure and high potential energy. A class of distance measures and pseudo-measures based on intramolecular distances have been developed to address this shortcoming of RMSD-based measures.

Assume we wish to compare two conformations P and Q of a molecule with N atoms. Let p ij be the distance between atom i and atom j in conformation P, and let q ij be the same distance for conformation Q. Then the intramolecular distance is defined as

Intra-molecular distance (dRMSD)
One of the main computational advantages of this class of approaches is that we do not have to compute the alignment between P and Q. On the other hand, for this metric we need to sum over aquadratic number of terms, whereas for RMSD the number of terms is linear in the number of atoms. Approximations can be made to speed up this computation, as shown in [7] . Also, the intramolecular distance measure given above, which is sometimes referred to as the dRMSD, is subject to the problem that pairs of atoms most distant from each other are the ones that contribute the greatest amount to their measured difference.

An interesting open problem is to come up with physically meaningful molecular distance metric that allows for fast nearestneighbor computations. This can be useful for, for example, clustering conformations. One proposed method is the contact distance . Contact distance requires constructing a contact map matrix for each conformation indicating which pairs of atoms are less than some threshold separation. The distance measure is then a measure of the difference of the contact maps.

Contact distance

Contact maps (C) are calculated for each structure, and the differences in these contact maps used to define a distance D.
Other distance measures attempt to weight each pair in the dRMSD based on how close the atoms are, with closer pairs given more weight, in keeping with the intuition that small changes in the relative positions of nearby atoms are more likely to result in collisions. One such measure is the normalized Holm and Sander Score .

Holm and sander distance

This distance function is weighted to accentuate the importance of differences in structures that are relatively close to each other. These are the contacts most likely to affect the potential energy of the structure.
This score is technically a pseudo-measure rather than a measure because it does not necessarily obey the triangle inequality .

The definition of distance measures remains an open problem. For reference on ongoing work, see articles that compare several methods, such as [5] .

The first two papers are the original descriptions of the Kabsch Algorithm, and use rotations represented as orthonormal matrices to find the correct rotational transformation. Many software packages use this alignment method. The third and fourth papers use quaternions. The alignment method presented in the previous section comes from the third paper:

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Geometric methods in structural computational biology' conversation and receive update notifications?

Ask