Augmentation

Augmentation is an application of depth-first search that begins with the list of seed matches. Assuming that there are more than four motif points, we must find correspondences within the target for the motif points that remain unmatched. Interpret the list of seed matches as a stack of partially complete matches. Pop the first match off the stack and, using the lRMSD alignment of this match, plot the position p of the next unmatched motif point s_i relative to the aligned orientation of the motif. In the spherical region V around p, identify all target points t_i inside V that are compatible with s_i. For each such t_i, compute the lRMSD alignment of all correlated points, including the new correlation (s_i, t_i). If the new alignment satisfies our first two criteria and there are no more unmatched motif points, put this match into a heap that maintains the match with the smallest lRMSD. If there are more unmatched motif points, put the partial match back onto the stack. Continue to test correlations in this manner until V contains no more target points that satisfy our criteria. Then return to the stack and pop off the next match, repeating this process until the stack is empty.
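The search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the published implementation. The names `augment`, `centroid_align`, `radius`, `max_rmsd`, and `compatible` are all hypothetical. The acceptance criteria are reduced to a single lRMSD threshold. The alignment step is passed in as a function, and the toy `centroid_align` used here only translates the motif, so it is a stand-in for a true lRMSD (rotation plus translation) alignment.

```python
import heapq
import math
from itertools import count


def centroid_align(motif, target, match):
    """Toy alignment (assumption): superimpose centroids by translation only.

    A real implementation would compute the optimal rigid-body (lRMSD) alignment.
    Returns a function mapping motif points into the target frame, and the RMSD.
    """
    ms = [motif[s] for s, _ in match]
    ts = [target[t] for _, t in match]
    n = len(match)
    shift = tuple(sum(t[k] for t in ts) / n - sum(m[k] for m in ms) / n
                  for k in range(3))

    def transform(point):
        return tuple(point[k] + shift[k] for k in range(3))

    rmsd = math.sqrt(sum(math.dist(transform(m), t) ** 2
                         for m, t in zip(ms, ts)) / n)
    return transform, rmsd


def augment(motif, target, seeds, align, radius, max_rmsd,
            compatible=lambda s, t: True):
    """Depth-first augmentation of seed matches.

    motif, target: lists of (x, y, z) points.
    seeds: partial matches, each a tuple of (motif_index, target_index) pairs.
    Returns a min-heap of (lRMSD, tie_breaker, match); entry 0 is the best match.
    """
    stack = list(seeds)          # partially complete matches
    complete = []                # heap keyed on lRMSD
    tie = count()                # tie-breaker so equal lRMSDs never compare matches
    while stack:
        match = stack.pop()
        used_s = {s for s, _ in match}
        used_t = {t for _, t in match}
        unmatched = [i for i in range(len(motif)) if i not in used_s]
        if not unmatched:        # seed was already complete; nothing to extend
            continue
        s_next = unmatched[0]
        transform, _ = align(motif, target, match)
        p = transform(motif[s_next])          # predicted position of s_next
        for t, point in enumerate(target):
            if t in used_t or not compatible(s_next, t):
                continue
            if math.dist(p, point) > radius:  # outside the sphere V around p
                continue
            candidate = match + ((s_next, t),)
            _, rmsd = align(motif, target, candidate)
            if rmsd > max_rmsd:               # stand-in for the acceptance criteria
                continue
            if len(candidate) == len(motif):
                heapq.heappush(complete, (rmsd, next(tie), candidate))
            else:
                stack.append(candidate)
    return complete


# Made-up data: the target contains a translated copy of the motif plus a distractor.
motif = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0), (50.0, 50.0, 50.0)]
seeds = [((0, 0), (1, 1))]
best = augment(motif, target, seeds, centroid_align, radius=1.0, max_rmsd=0.5)
```

Passing the alignment in as a callback keeps the search logic separate from the geometry, so a proper rigid-body alignment can be substituted without changing `augment`.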

Filtering matches

Structural similarity is important to functional annotation only if a strong correlation exists between identifiably significant structural similarity and functional similarity. However, the existence of a match alone does not guarantee functional similarity. lRMSD can be a differentiating factor. If matches of homologous proteins represent statistically significant structural similarity over what is expected by random chance, we could differentiate on lRMSD, as long as we can evaluate the statistical significance of the lRMSD of a match.

BLAST first calculated the statistical significance of sequence matches with a combinatorial model of the space of similar sequences. Determining the statistical significance of structural matches has also been attempted. Modeling was applied for the PINTS database to estimate the probability of a structural match given a particular lRMSD. An artificial distribution was parameterized by motif size and amino acid composition in order to fit a given data set, and the p-value was calculated relative to that distribution. Another approach was taken in the algorithm JESS, which uses comparative analysis to generate a significance score relative to a specific population of known motifs. Both methods have disadvantages. The artificial models of PINTS are not parameterized by the geometry of motifs and, all else being equal, produce identical distributions for motifs of different geometry. JESS, on the other hand, depends on a set of known motifs; should this set change, all significance scores would have to be revised.

Local structural alignment methods operate on the assumption that local structural and chemical similarity implies functional similarity. A statistical model has been developed that can be used to identify the degree of similarity sufficient to follow this implication. Given a match m with lRMSD r between motif S and target T, exactly one of two hypotheses must hold:

  • H_0: S and T are structurally dissimilar
  • H_A: S and T are structurally similar
The proposed statistical model implements this measurement by computing a motif profile. Motif profiles are frequency distributions (see Figure below) of match lRMSDs between S and the entire Protein Data Bank (PDB), which is essentially a large set of functionally unrelated proteins. A motif profile is a histogram in which the horizontal axis measures lRMSD and the vertical axis measures the number of matches at each lRMSD. Motif profiles provide very complete information about matches typical of H_0. If we suspect that a match m has lRMSD r indicative of functional similarity, we can use the motif profile to determine the probability p of observing another match m' with smaller lRMSD by computing the area under the curve to the left of r, relative to the entire area. The probability p, often referred to as the p-value, is the measure of statistical significance. Given a standard of statistical significance alpha, if p is less than alpha, then the probability of observing a match m' with lRMSD r' less than r is so low that we reject the null hypothesis in favor of the alternative hypothesis. Consequently, the lRMSD whose p-value is exactly alpha marks a threshold: for matches with lRMSD below it, our statistical model predicts structural and chemical similarity sufficient to imply functional similarity. Such matches are considered statistically significant.
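As a rough illustration of the p-value computation, the sketch below treats the motif profile as a plain list of match lRMSDs and takes the fraction of them that fall below r, which corresponds to the area to the left of r relative to the whole. The function names `p_value` and `is_significant` are hypothetical, the profile values are made up, and working from raw lRMSD values instead of a binned histogram is a simplifying assumption.

```python
from bisect import bisect_left


def p_value(profile_lrmsds, r):
    """Fraction of profile matches whose lRMSD is strictly smaller than r."""
    ordered = sorted(profile_lrmsds)
    # bisect_left counts the entries strictly less than r in the sorted list
    return bisect_left(ordered, r) / len(ordered)


def is_significant(profile_lrmsds, r, alpha):
    """Reject the null hypothesis H_0 when the p-value is below alpha."""
    return p_value(profile_lrmsds, r) < alpha


# Made-up motif profile: lRMSDs of matches against unrelated proteins.
profile = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

p_mid = p_value(profile, 1.2)   # two of ten profile matches are below 1.2
p_low = p_value(profile, 0.4)   # no profile match is below 0.4
```

With a threshold of alpha = 0.05, the match at lRMSD 0.4 would be called significant in this toy profile and the match at 1.2 would not. A real profile would contain one lRMSD per protein in the database, giving a much finer-grained estimate.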


Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6