From Protein Structure to Function with Bioinformatics.pdf

306 I.A. Cymerman et al.

proteins with proteins from the genomes of ten pathogenic organisms responsible for neglected diseases. The pathogen and host genomes were first scanned for proteins homologous to those known to interact. When structural information for the interaction was not available, the pipeline proceeded by employing simple sequence similarity scores. This approach produced few predictions, however, since strict criteria were necessary for confident interaction prediction. More interesting and powerful was the explicit comparative modelling of the potential interaction partners based on protein complex templates. These modelled complexes were assessed using a statistical potential, with favourable interactions passed on to a further ingenious filter. This filter employed known information about (sub-)cellular localization and function in order to eliminate from consideration interactions which could not occur in vivo. Thus, only host proteins known to be expressed in skin, lymph node or lung were considered as possible interaction partners for Mycobacterium leprae proteins. Pathogen proteins were also required to pass specific biological criteria. For M. leprae, for example, a protein had to have a relevant GO annotation (e.g. pathogenesis) or be annotated as being extracellular or surface located. The number of filtered predictions varied from 0 to 1,501 between the pathogens. Rather few known interactions were available with which to benchmark the technique, but the method predicted four of the 33 interactions demonstrated at the time. In the remaining cases there was no available template to model the interaction, suggesting that this lack was consistently responsible for the low coverage of known interactions (Davis et al. 2007).
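The biological-context filter described above can be illustrated with a minimal Python sketch. It is not the implementation of Davis et al. (2007): the `Protein` record, its field names, the annotation labels and the example protein names are all hypothetical, chosen only to mirror the M. leprae criteria given in the text (host expression in skin, lymph node or lung; pathogen protein with a relevant GO annotation or an extracellular or surface location).

```python
from dataclasses import dataclass

# Host tissues relevant to M. leprae infection, as listed in the text.
RELEVANT_HOST_TISSUES = {"skin", "lymph node", "lung"}
# Illustrative annotation labels; the real vocabulary is an assumption.
RELEVANT_GO_TERMS = {"pathogenesis"}
EXPOSED_LOCATIONS = {"extracellular", "surface"}


@dataclass(frozen=True)
class Protein:
    """Hypothetical minimal record for a host or pathogen protein."""
    name: str
    tissues: frozenset = frozenset()   # host expression sites
    go_terms: frozenset = frozenset()  # GO annotation labels
    location: str = ""                 # annotated localization


def host_passes(host: Protein) -> bool:
    # Keep host proteins expressed in at least one relevant tissue.
    return bool(host.tissues & RELEVANT_HOST_TISSUES)


def pathogen_passes(pathogen: Protein) -> bool:
    # Keep pathogen proteins with a relevant GO term OR an exposed location.
    has_go = bool(pathogen.go_terms & RELEVANT_GO_TERMS)
    return has_go or pathogen.location in EXPOSED_LOCATIONS


def filter_predictions(pairs):
    """Return only (host, pathogen) pairs that could plausibly meet in vivo."""
    return [(h, p) for h, p in pairs if host_passes(h) and pathogen_passes(p)]


# Hypothetical example data, for illustration only.
host_skin = Protein("hostA", tissues=frozenset({"skin"}))
host_brain = Protein("hostB", tissues=frozenset({"brain"}))
path_surface = Protein("pathX", location="surface")
path_cytosol = Protein("pathY", location="cytoplasm")

candidates = [
    (host_skin, path_surface),   # both criteria satisfied
    (host_brain, path_surface),  # host not expressed in a relevant tissue
    (host_skin, path_cytosol),   # pathogen protein not exposed or annotated
]
kept = filter_predictions(candidates)
```

In this sketch only the first candidate pair survives. In a real pipeline the filter would run after the statistical-potential assessment, so that only favourably scored complexes are checked against expression and localization data.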
Interestingly, one prediction was experimentally validated: the method predicted the interaction of falcipain-2 and cystatin (PDB code 1yvb) based on the earlier structure of cathepsin-H bound to stefin A (PDB code 1nb3) (Fig. 12.2). The two enzymes share around 24% sequence identity while the inhibitors are around 11% identical. The success of the prediction in the face of these sequence differences and considerable structural variation (Fig. 12.2) illustrates the power of the methodology.

12.4.4 Function Predictions from Template-Free Models

Until the recent development of more powerful but highly computer-intensive algorithms (Bradley et al. 2005), a reasonable objective for template-free (ab initio or de novo) modelling was simply to achieve the correct fold rather than highly accurate predictions (see Chapter 1). This has limited the range of function inference techniques that could be applied, and means that most predictions in the literature are based mainly on the protein fold predicted, and its functional correlations (discussed in Chapter 6).

In an early large-scale application of ROSETTA, Bonneau et al. (2002) produced models for 510 Pfam families with average length of less than 150 residues. These were of unknown structure at the time, but for some a function was known or suspected. Tentative predictions could be bolstered by the modelling results in several cases. For example, PF01938, the TRAM domain, was suspected at the
