12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


2 Fold Recognition

based on the returned alignment then one would have one of the simplest protein structure prediction techniques (Fig. 2.4a).

The obvious shortcoming of this approach is the limited ability of the simple 20 × 20 scoring functions to detect anything but close (>30% sequence identity) homology. Given that we know sequences can diverge well below this threshold of sequence identity whilst maintaining highly similar structures, it was clear that there would be many homologous relationships we were missing with this approach, and which, if detectable, would permit a substantial increase in our ability to predict structure.

2.3.1 Using Predicted Structural Features

One of the earliest attempts at pushing homology recognition beyond simple sequence matching was developed by Bowie et al. (1991). The idea is based on the fact that certain structural features of a protein sequence can be predicted in the absence of an explicit template. Most notably, the secondary structure, i.e. the locations of alpha-helices and beta-strands, can now be predicted with an accuracy approaching 80% using programs such as PSIPRED (Jones 1999a). Given that structure is more conserved than sequence, a pair of remotely homologous proteins will contain similar patterns of secondary structure elements even in the absence of any obvious sequence similarity. In addition, the solvent exposure of a residue can be predicted with relatively high accuracy (e.g. Kim and Park 2003), as can the presence of tight beta-hairpin turns (e.g. Kumar et al. 2005).

These predicted structural features provide us with further information that can be used together with the sequence matching.
When aligning two amino acids from the query and template one can calculate a compatibility score based on a mutation matrix such as BLOSUM plus terms involving secondary structure matching and solvent exposure:

$$S_{ij} = \mathrm{Seq}_{ij} + \mathrm{SS}_{ij} + \mathrm{Solv}_{ij}$$

where $S_{ij}$ is the overall score for matching residue $i$ in the query sequence with residue $j$ in the template sequence, $\mathrm{Seq}_{ij}$ is the score from the BLOSUM matrix for matching $i$ and $j$, $\mathrm{SS}_{ij}$ is the score for matching the predicted secondary structure type at residue $i$ with the known secondary structure at residue $j$, and $\mathrm{Solv}_{ij}$ is the score for matching the burial state predicted for residue $i$ with the known burial state at residue $j$. Simple versions of such scoring functions are depicted in Table 2.1b and c, where identical states (helix matched to helix, for example) receive a score of +1 and all other combinations receive −1. Often the functions will be more elaborate and be based on empirical observation of the frequency with which the different states tend to be aligned in known homologues. This is analogous to the progression from a simple identity-based sequence matching matrix towards the more sensitive BLOSUM-style matrix.
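
The combined score can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the function names, the one-letter state codes, and in particular `TOY_SUBST`, a tiny hand-written table that merely stands in for a real BLOSUM matrix (its values are placeholders, not BLOSUM entries). The secondary structure and burial terms use the simple +1/−1 scheme described above.

```python
# Placeholder substitution scores standing in for a real BLOSUM matrix.
# ASSUMPTION: these values are illustrative only, not taken from BLOSUM.
TOY_SUBST = {
    ("A", "A"): 4, ("L", "L"): 4, ("K", "K"): 5,
    ("A", "L"): -1, ("A", "K"): -1, ("K", "L"): -2,
}


def seq_score(a, b, subst=TOY_SUBST):
    """Seq_ij: symmetric lookup of a residue pair in the substitution table."""
    if (a, b) in subst:
        return subst[(a, b)]
    return subst[(b, a)]


def state_score(predicted, known):
    """Simple state matching: +1 for identical states, -1 otherwise."""
    return 1 if predicted == known else -1


def combined_score(q_res, q_ss, q_burial, t_res, t_ss, t_burial,
                   subst=TOY_SUBST):
    """S_ij = Seq_ij + SS_ij + Solv_ij for one query/template residue pair."""
    return (seq_score(q_res, t_res, subst)
            + state_score(q_ss, t_ss)
            + state_score(q_burial, t_burial))


def score_matrix(query, template, subst=TOY_SUBST):
    """Build the full S matrix; each position is (residue, ss, burial)."""
    return [[combined_score(*q, *t, subst=subst) for t in template]
            for q in query]


# Query: predicted states. Template: states observed in the known structure.
# Secondary structure: H = helix, E = strand, C = coil.
query = [("A", "H", "buried"), ("K", "H", "exposed"), ("L", "E", "buried")]
template = [("L", "H", "buried"), ("K", "C", "exposed")]

S = score_matrix(query, template)
for row in S:
    print(row)
```

A matrix like `S` would then be handed to a standard dynamic programming alignment in place of the plain substitution scores. Swapping `state_score` for a table of empirically derived state-pair scores would correspond to the more elaborate functions mentioned above.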
