12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


been developed into several variants, enabling prediction of disorder within the terminal regions of proteins (Li et al. 1999), prediction of regions likely to serve as recognition motifs (VL-XT (Iakoucheva et al. 2002)) or a combined prediction of short and long regions of disorder (VSL2 (Peng et al. 2006)). Because short disordered regions are context dependent, i.e. their lack of structure depends on their structural environment, whereas disorder of long regions stands on its own, this combined approach results in one of the most powerful algorithms of disorder prediction.

A computationally different ML approach is the application of support vector machines (SVMs), as exemplified by DISOPRED2 (Ward et al. 2004). This algorithm searches for a hyperplane in a feature space that separates ordered and disordered proteins. The hyperplane may either be linear or non-linear, and unbalanced class frequencies of data from ordered (e.g. proteins in PDB) and disordered (e.g. proteins in DisProt (Sickmeier et al. 2007)) proteins are taken into consideration. Sequence profiles generated by PSI-BLAST are also incorporated as an input.

5.3.6 Prediction Based on Contact Potentials

Some predictors operate on a completely different principle, based on the idea that IDPs cannot fold because their amino acids cannot make sufficient inter-residue interactions to overcome the unfavourable decrease in entropy accompanying folding. There are several predictors based on this principle, which apply simple statistical principles (FoldUnfold (Galzitskaya et al. 2006)), compare pairwise interaction potentials (Ucon (Schlessinger et al. 2007)), or estimate the total inter-residue interaction energy of a chain (IUPred (Dosztanyi et al. 2005a, b)).
The latter is described in some detail.

To estimate the total pairwise interaction energy realized by a polypeptide chain, IUPred uses low-resolution force fields (statistical potentials) derived from globular proteins. The underlying idea is that the contribution of a residue depends not only on its type, but also on other amino acids, i.e. its potential partners, in the sequence. Because a probabilistic treatment of the potential interactions of all residues with all others is not tractable, the problem is simplified by a quadratic expression in the amino acid composition. The contribution of an amino acid is approximated by an energy predictor matrix, which relates the energy contribution of amino acid i to that of amino acid j. The parameters of the matrix are determined by least squares fitting to actual globular proteins. By this approach, the average energy level of disordered proteins (−0.07 arbitrary units) is significantly more unfavourable than that of globular proteins (−0.81 arbitrary units), which suggests that the approach is informative on the gross structural status of proteins (Fig. 5.3). When only a pre-defined local sequential neighbourhood is considered in the calculations, the approach provides sequence-specific information on disorder, forming the basis of IUPred (Dosztanyi et al. 2005a, b).
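The composition-based energy estimate described above can be illustrated with a minimal Python sketch. It is not the IUPred implementation: the function names, the `window` parameter and its default value, and the treatment of sequence ends are assumptions made for illustration, and the matrix `P_demo` holds random placeholder values rather than the published energy predictor matrix. The sketch only shows the shape of the calculation, i.e. a quadratic form in the amino acid composition for the whole chain, and a per-residue estimate restricted to a local sequential neighbourhood.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def composition(segment):
    """Return the 20-element amino acid frequency vector of a segment."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in segment:
        counts[AA_INDEX[aa]] += 1
    # An empty segment yields an all-zero vector instead of dividing by zero.
    return counts / max(len(segment), 1)


def chain_energy(sequence, P):
    """Average energy per residue as a quadratic form c^T P c in the composition c."""
    c = composition(sequence)
    return float(c @ P @ c)


def local_energies(sequence, P, window=10):
    """Per-position energy using only residues within `window` positions.

    `window` is a hypothetical parameter standing in for the pre-defined
    local sequential neighbourhood mentioned in the text.
    """
    energies = []
    for pos, aa in enumerate(sequence):
        lo = max(0, pos - window)
        hi = min(len(sequence), pos + window + 1)
        # Potential partners: neighbours on both sides, excluding the residue itself.
        neighbours = sequence[lo:pos] + sequence[pos + 1:hi]
        c = composition(neighbours)
        # Row of P for this residue type, weighted by the local composition.
        energies.append(float(P[AA_INDEX[aa]] @ c))
    return energies


# Placeholder matrix for demonstration only: NOT the fitted IUPred parameters.
rng = np.random.default_rng(0)
P_demo = rng.normal(size=(20, 20))

if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    print(chain_energy(seq, P_demo))
    print(local_energies(seq, P_demo, window=5)[:3])
```

With a matrix fitted to globular proteins by least squares, as the text describes, more unfavourable (less negative) values would point towards disorder; with the random placeholder used here the numbers carry no biological meaning.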
