Estimating Protein Function Using Protein–Protein Relationships

and a threshold needs to be adopted for discarding false positives and functional linkages that conflict with known biological facts. Unfortunately, it is difficult to devise a cutoff that confidently describes both the profile similarity and the biological validity of the linkages. One way of obtaining a primary cutoff value relies on shuffled profiles: mutual information scores between normal profiles that match or fall below the highest score observed when comparing shuffled profiles can be discarded. Other techniques can also be imagined, such as using a set of known linkages to derive a true-positive to false-positive ratio, which can then be used as a threshold.

Once a reasonable set of matching profiles is obtained, annotations of the included proteins can be searched for overrepresentation of a particular function. Overrepresented annotations reveal functional links to particular pathways, suggesting a putative role for the query protein, especially if the query protein is uncharacterized. As a test case, the profile of the P. falciparum protein PFB0445c was generated and compared with profiles of all known P. falciparum proteins. The results capture functional links between PFB0445c and other helicases in the parasite genome:

Query PFB0445c (helicase, putative)
0.70  PF10_0309    (hypothetical protein)
0.69  MAL6P1.119   (DEAD/DEAH box ATP-dependent RNA helicase, putative)
0.61  MAL7P1.113   (DEAD box helicase, putative)
0.58  PF14_0436    (helicase, truncated, putative)
0.57  PFE0215w     (ATP-dependent helicase, putative)

In this example, mutual information scores in the left column indicate confidence in the functional links; the greater the mutual information value, the greater the confidence in the predicted linkage.
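The search for overrepresented annotations described above can be sketched as a one-sided hypergeometric test. The chapter does not name a specific statistic, so this choice is an assumption. The function name and all counts below are hypothetical and serve only to illustrate the arithmetic.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated proteins when
    `sample` proteins are drawn without replacement from a genome of
    `population` proteins, `annotated` of which carry the annotation."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers for illustration only (not taken from the chapter):
# a genome of 5000 proteins, 60 of them annotated as helicases, and
# 5 proteins linked to the query, 4 of which carry the helicase annotation.
p = hypergeom_pvalue(population=5000, annotated=60, sample=5, hits=4)
print(f"p = {p:.3g}")
```

A small p-value indicates that the annotation occurs among the linked proteins more often than chance would suggest. When many annotations are tested, a multiple-testing correction would normally be applied as well.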
Comparison against the Protein families (Pfam) database (http://www.sanger.ac.uk/Software/Pfam/) reveals that the hypothetical protein PF10_0309 included in the results also contains helicase domains, demonstrating that the method captures biologically valid functional links. Profile data used for this example are available for download from the plasmoMAP website (http://cbil.upenn.edu/plasmoMAP/) (8). A score of 0.559, based on scores derived from a comparison of permuted profiles, was used as the cutoff in this example.

As described previously, the input query set can be expanded to include the entire protein complement of any given genome. After profiles are constructed for all proteins, an all-vs-all comparison of profile similarity reveals functional linkages on a local and genome-wide scale. This is highly useful in understanding relationships between genes and, in some cases, can reveal new systems and pathways, especially if a majority of the components involved are of unknown function (5).
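The permuted-profile cutoff and the all-vs-all comparison can be sketched together as follows. This is a minimal illustration. It assumes profiles have already been discretized into a small number of bins (here simply 0/1), which may differ from the binning used for the published scores. The function names and the toy profiles are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length discrete profiles."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over observed (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log2(c * n / (pa[x] * pb[y]))
        for (x, y), c in pab.items()
    )


def shuffled_cutoff(profiles, rng):
    """Highest score observed when independently shuffled profiles are compared."""
    shuffled = [rng.sample(p, len(p)) for p in profiles.values()]
    return max(mutual_information(a, b) for a, b in combinations(shuffled, 2))


def all_vs_all(profiles, cutoff):
    """Score every protein pair; keep those scoring strictly above the cutoff."""
    links = []
    for (id1, p1), (id2, p2) in combinations(profiles.items(), 2):
        score = mutual_information(p1, p2)
        if score > cutoff:  # scores that match or fall below the cutoff are discarded
            links.append((score, id1, id2))
    return sorted(links, reverse=True)


# Toy presence/absence profiles across 12 genomes (invented for illustration).
profiles = {
    "protA": [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0],
    "protC": [0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1],
}
cutoff = shuffled_cutoff(profiles, random.Random(0))
for score, id1, id2 in all_vs_all(profiles, cutoff):
    print(f"{score:.2f}  {id1}  {id2}")
```

With only a handful of short profiles the shuffled maximum is noisy. On a real genome the shuffled comparison would cover many more pairs, and could be repeated over several permutations, before a cutoff is fixed.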