Protein classification

Once alignment algorithms have been implemented, it is possible to explore different classifications of proteins. It would be natural to classify proteins solely according to their structure, but much richer data is available. Current classifications of proteins integrate sequence and structure information in order to maximize their utility. These include the following:

  • SCOP (Structural Classification of Proteins) is a database of all proteins whose structures have been determined, organized by family (evolutionary relationship), superfamily (structural and functional similarity), and fold (similar secondary structure, with similar arrangement and topological connections). SCOP was constructed largely by manual inspection and annotation.
  • CATH Protein Structure Classification is another database of protein structures, organized by Class (secondary structure content), Architecture (orientation of secondary structures), Topology (overall shape and connectivity), and Homologous Superfamily (evolutionary relationship inferred from sequence and structure similarity). CATH uses SSAP (the Sequential Structure Alignment Program, a secondary structure element-based method) for structural comparison.

The biological purpose of designing global structural alignment algorithms was the identification, classification, and ultimately prediction of protein function, under the hypothesis that protein structure dictates protein function. At the same time, it was realized that small changes to a protein could lead to massive changes in function, or to no change at all. This suggested that global structure alignment and comparison (as well as global sequence alignment and comparison) should not be the only tools used for function prediction.

In particular, it was realized that active sites, clusters of amino acids on the surface of a protein that make up only a tiny fraction of the entire protein, were often strongly related to protein function. In a continuation of the effort to predict protein function through structural comparison, algorithms were developed to compare functionally relevant substructures of proteins. We refer to these algorithms collectively as local structure alignment algorithms.

Local matching: geometric hashing, pose clustering and match augmentation

Algorithms for local structure alignment address a similar computational problem: selecting a correspondence between a motif, a tiny substructure of a protein (often between 3 and 20 amino acids), and a target, a full protein structure. Once a correspondence has been established, the "distance" between the motif and the matched part of the target is measured using lRMSD.
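
As an illustration of the final measurement step, the following minimal sketch computes lRMSD (the RMSD left after optimal rigid superposition) for a motif and a matched set of target points. It assumes the correspondence is already known and is given by index order, and that each amino acid is represented by a single 3D point. It uses Horn's quaternion formulation, in which the best achievable RMSD follows from the largest eigenvalue of a 4x4 symmetric matrix; this is one standard way to obtain the value and is not necessarily the method used by the algorithms named in this module. The function names `lrmsd` and `_largest_eigenvalue` are hypothetical, chosen only for this example.

```python
import math

def _largest_eigenvalue(sym, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi rotations)."""
    a = [row[:] for row in sym]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < 1e-30:          # off-diagonal part is numerically zero
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle in [-pi/4, pi/4] that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                y = 2 * a[p][q] if diff >= 0 else -2 * a[p][q]
                theta = 0.5 * math.atan2(y, abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def lrmsd(motif, match):
    """RMSD between two equally sized 3D point lists after optimal superposition.

    Point i of `motif` is assumed to correspond to point i of `match`.
    """
    if not motif or len(motif) != len(match):
        raise ValueError("point lists must be non-empty and of equal length")
    n = len(motif)

    def centered(points):
        c = [sum(p[k] for p in points) / n for k in range(3)]
        return [[p[k] - c[k] for k in range(3)] for p in points]

    a, b = centered(motif), centered(match)
    # Sum of squared norms of both centered point sets.
    g = sum(x * x for p in a for x in p) + sum(x * x for p in b for x in p)
    # Correlation matrix S[i][j] = sum over points of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(key)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))
```

For a match that is an exact rotated and translated copy of the motif, `lrmsd` returns a value that is zero up to rounding error; larger values indicate a poorer geometric fit.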

Some algorithms for local structure alignment are based on pattern matching algorithms. Pattern matching algorithms seek target substructures, called matches, which have maximal geometric similarity to the motif. An excellent example of this type of algorithm is Geometric Hashing, a very flexible framework for geometric pattern matching under noisy constraints. Geometric Hashing has been adapted to alignment by atom position, alignment by backbone C-alpha, multiple structural alignment, and alignment of hinge-bending and flexible protein models. Other algorithms for substructural alignment include JESS, PINTS, webFEATURE, and pvSOAR. The description below concentrates on the work developed in .
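
To make the general idea of Geometric Hashing concrete, here is a small, simplified sketch; it is not the method of any of the works cited above. It assumes each amino acid is reduced to one 3D point, that motif and target differ only by a rigid motion plus small noise, and that a fixed bin size (`bin_size`, an arbitrary choice here) is adequate. Every ordered triple of non-collinear motif points defines a reference frame; the quantized coordinates of the remaining motif points in that frame are stored in a hash table. Target frames then vote for motif frames that place points in the same bins. All names (`frame_keys`, `build_table`, `best_match`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations

def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])

def _unit(u):
    n = math.sqrt(_dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)

def frame_keys(points, basis, bin_size):
    """Quantized coordinates of all points (except the origin) in the frame of `basis`."""
    i, j, k = basis
    origin = points[i]
    e1 = _sub(points[j], origin)
    normal = _cross(e1, _sub(points[k], origin))
    if _dot(normal, normal) < 1e-12:      # collinear triple: no frame
        return []
    x, z = _unit(e1), _unit(normal)
    y = _cross(z, x)                      # right-handed orthonormal frame
    keys = []
    for idx, p in enumerate(points):
        if idx == i:
            continue
        d = _sub(p, origin)
        keys.append((idx, tuple(round(_dot(d, axis) / bin_size) for axis in (x, y, z))))
    return keys

def build_table(motif, bin_size):
    """Preprocessing: hash every motif point under every motif reference frame."""
    table = defaultdict(list)
    for basis in permutations(range(len(motif)), 3):
        for idx, key in frame_keys(motif, basis, bin_size):
            table[key].append((basis, idx))
    return table

def best_match(motif, target, bin_size=0.5):
    """Recognition: return (votes, {motif index: target index}) for the best frame pair."""
    table = build_table(motif, bin_size)
    best_votes, best_map = 0, {}
    for t_basis in permutations(range(len(target)), 3):
        votes = defaultdict(dict)         # motif basis -> {motif idx: target idx}
        for t_idx, key in frame_keys(target, t_basis, bin_size):
            for m_basis, m_idx in table.get(key, ()):
                votes[m_basis].setdefault(m_idx, t_idx)
        for m_basis, pairs in votes.items():
            if len(pairs) > best_votes:
                best_votes = len(pairs)
                best_map = dict(pairs)
                best_map[m_basis[0]] = t_basis[0]   # the two frame origins also correspond
    return best_votes, best_map
```

This brute-force version enumerates every ordered triple of target points and is only practical for very small inputs. The correspondence it returns could then be scored with lRMSD.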

Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6