12.07.2015 Views

View - ResearchGate

124 Date

described by Verjovsky Marcotte and Marcotte (16), which provides a confidence value for each predicted functional link. The test is introduced here in brief, and users are encouraged to refer to the original publication for details and tips on computational implementation. The author takes into account two types of possible ambiguities when determining the probability of finding functionally linked proteins by random chance. First, the probability of finding k fusions by random chance is calculated based on the hypergeometric distribution, given the number of BLAST hits for proteins X and Y in a database of size N.

\[ p(\text{number of fusions} \ge k \mid x, y, N) = 1 - \sum_{i=0}^{k-1} p(i \mid x, y, N) \]

Here, x and y represent hits to proteins X and Y in the database, respectively, and i represents a counter for summation.

Next, the author introduces a correction term, which addresses potential problems arising because of the presence of paralogs of the proteins X and Y.

\[ p(X, Y \text{ are functionally linked in the presence of paralogs}) = \frac{1}{\max(X_{\text{paralogs}}, Y_{\text{paralogs}})} \]

This term directly addresses problems encountered when deciding the accuracy of identifying proteins represented in the fusions; if X and Y are represented by single copies, then the probability of finding linked proteins will be one. The probability decreases as more paralogs of X and Y occur in the genome. The final probability of finding proteins X and Y linked by random chance, given these conditions, is then simply the product of the two probabilities. Based on the information provided by the author, this score performs adequately when benchmarked against information derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (17).

3.2.2.3. EXPANDING THE ROSETTA STONE METHOD

The Rosetta stone method can be extended to include the entire genome, whereby functional links are identified on a genome-wide scale, using the entire protein complement as input (Fig. 1). The fullest potential of this method can be achieved when the query genome is a part of the database, and all sequences in the query genome are compared with each other. Besides identifying functional links, this also identifies fusion proteins in the query genome that serve to link independent proteins in other genomes. Thus, the entire landscape of functional linkages is revealed, along with information about fusions that might indicate coupling between pathways and systems.
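
The two-part confidence score from the statistical test described earlier (the hypergeometric term and the paralog correction) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the original publication (16): the function names are hypothetical, the per-term probability p(i | x, y, N) is assumed to be the standard hypergeometric mass function (i fusions when y hits are drawn from N sequences, x of which are hits to X), and the paralog counts are assumed to include the protein itself, so that a single-copy protein has a count of 1.

```python
from math import comb


def hypergeom_pmf(i, x, y, n):
    """Assumed form of p(i | x, y, N): the standard hypergeometric probability
    of exactly i fusions, given x hits to X and y hits to Y among n sequences."""
    if i < 0 or i > min(x, y):
        return 0.0
    return comb(x, i) * comb(n - x, y - i) / comb(n, y)


def p_at_least_k_fusions(k, x, y, n):
    """p(number of fusions >= k | x, y, N) = 1 - sum_{i=0}^{k-1} p(i | x, y, N)."""
    return 1.0 - sum(hypergeom_pmf(i, x, y, n) for i in range(k))


def paralog_term(x_paralogs, y_paralogs):
    """Correction term 1 / max(Xparalogs, Yparalogs).
    Assumption: each count includes the protein itself (minimum 1)."""
    return 1.0 / max(x_paralogs, y_paralogs)


def combined_score(k, x, y, n, x_paralogs, y_paralogs):
    """Product of the two probabilities, as described in the text."""
    return p_at_least_k_fusions(k, x, y, n) * paralog_term(x_paralogs, y_paralogs)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from any real genome.
    print(combined_score(k=3, x=40, y=25, n=100000, x_paralogs=2, y_paralogs=1))
```

Exact integer binomials from `math.comb` keep the sum free of rounding problems for small k; for very large databases, a log-space formulation may be preferable.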
