From Protein Structure to Function with Bioinformatics.pdf
11 Function Predictions of Structural Genomics Results 279

determine how successful the structure-based methods in ProFunc were at predicting function. These 93 proteins of known function were submitted to the ProFunc server and the top scoring matches from each method retrieved and stored. The results were then backdated to the deposition date for each structure to ensure that what was being measured was how successful the server would have been had it been fully operational from the start. The resultant top hit for each method was then manually compared with the known function for each protein and a judgement made as to whether the prediction was correct.

The results from the study indicated that, of the methods available as part of the ProFunc server, the fold recognition and “reverse template” approaches were the most successful with approximately 60% of the known functions identified correctly. Detailed investigation revealed that both of these methods often identify the same function by matching to the same protein but cases could be found where one method succeeded where the other failed. This is due to the fact that the fold matching is looking at the global similarity in compared proteins whereas the “reverse template” approach is a very local comparison. One major drawback identified in the study was its inability to address the question of how structure-based approaches compare against sequence-based approaches.
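The backdating step described above can be illustrated with a minimal sketch. It is not the ProFunc implementation: the `Hit` record, its field names, the "higher score is better" convention and the strict "released before deposition" cut-off are all assumptions made for illustration. The idea is simply to discard any match that was not yet available when the query structure was deposited, and only then pick the top hit per method.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    """One match returned by a prediction method (hypothetical record layout)."""
    method: str      # e.g. "fold" or "reverse_template" (labels are assumptions)
    matched_id: str  # identifier of the matched database entry
    score: float     # assumed convention: higher is better
    released: date   # date the matched entry became available


def backdated_top_hit(hits: List[Hit], method: str, deposition: date) -> Optional[Hit]:
    """Return the best-scoring hit for `method` among entries released
    strictly before the query's deposition date, or None if there is none."""
    eligible = [h for h in hits if h.method == method and h.released < deposition]
    return max(eligible, key=lambda h: h.score, default=None)


# Illustrative data only; these are not results from the study.
example_hits = [
    Hit("fold", "entryA", 12.0, date(2004, 5, 1)),
    Hit("fold", "entryB", 9.5, date(2001, 3, 15)),
    Hit("reverse_template", "entryB", 3.1, date(2001, 3, 15)),
]

top = backdated_top_hit(example_hits, "fold", date(2002, 1, 1))
print(top.matched_id if top else "no hit")  # prints "entryB"
```

In this toy example the higher-scoring `entryA` is ignored because it appeared after the assumed deposition date, so the prediction is judged on `entryB` alone. The same filter applied to sequence databases would also require rolling back every derived pattern and profile to the cut-off date.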
However, this is a generic problem which has not been adequately addressed in the literature due to the immense difficulties involved in accurately rolling back the sequence databases, as well as patterns and profiles derived from them, to a particular date.

In addition to the general comparison of methods, Watson et al. (2007) presented some specific examples of function predictions, some of which were verified. An example of successful structure-based function prediction was the BioH protein from Escherichia coli (Sanishvili et al. 2003). This protein was known to be involved in biotin synthesis but no biochemical function had been assigned to it. Analysis of the structure using ProFunc returned a highly significant match (with an RMSD of 0.28 Å) to an enzyme active-site template for the Ser-His-Asp catalytic triad of the lipases. Fold comparison using DALI indicated structural similarity to a large number of proteins with a variety of enzymatic functions, although the sequence identity of these hits was low, ranging from 15% to 25%. The closest matches included a bromoperoxidase (EC 1.11.1.10), an aminopeptidase (EC 3.4.11.5), two epoxide hydrolases (EC 3.3.2.3), two haloalkane dehalogenases (EC 3.8.1.5), and a lyase (EC 4.2.1.39). Only through extensive manual analysis of these enzymes and a literature review would it have become obvious that each of these contains a Ser-His-Asp catalytic triad in its active site. The enzyme active-site template search identified the presence of the catalytic triad instantly. Experimental characterisation of this protein revealed it to be a novel carboxylesterase acting on short acyl chain substrates (Sanishvili et al.
2003).

A further example illustrating how functional clues can be derived from the structure involves a hypothetical protein (IsdG) from Staphylococcus aureus. Sequence-based analysis by methods in ProFunc revealed a variety of functions, including antibiotic biosynthesis monooxygenase, cysteine peptidase, oxidoreductase, methyltransferase, epimerase, transportation, possible RNA binding, and others. When the structure was examined using the MSDfold/SSM service, the top