View - ResearchGate
Estimating Protein Function Using Protein–Protein Relationships

        system "myBlastParser.pl -i myBLASTOutputForProtein_j >> myParsedBLASTOutputForProtein_j";
        compress myBLASTOutputForProtein_j;
        move myBLASTOutputForProtein_j.compressed to dirstoreRawBLASTData/;
    }
    close myInputFile;

3.1.4. Parsing BLAST Results

BLAST results must be parsed so that only the information needed for generating phylogenetic profiles and identifying Rosetta stone sequences is retained from the sequence matches against the database. This greatly reduces the size of the input required for subsequent steps. For every match of the query sequence against the database, at least five details need to be captured and retained from the raw output:

1. The unique identifier of the subject sequence.
2. The genome to which the subject sequence belongs.
3. The BLAST expectation value of the high-scoring pair (HSP).
4. The start and stop positions of the HSP on the query sequence.
5. The start and stop positions of the HSP on the subject sequence.

Besides these attributes, other information, such as raw scores or the percentage of sequence identity, can also be captured (see also Note 2). As the user becomes more familiar with the methods, these additional attributes can be used as filters, or even as substitutes for the primary attributes, when deciding the quality of a match or hit against the reference database.

One possible form of output from a parser program is shown below:

>query >subject raw_score: value | E-value: value | query_start: value | query_end: value | subject_start: value | subject_end: value | match_length: value | identity_percentage: value | similarity_percentage: value | query_length: value | subject_length: value

>hsapiens|gi|20093443 >hsapiens|gi|20093443 raw_score: 300 | E-value: 1e-155 | query_start: 1 | query_end: 140 | subject_start: 1 | subject_end: 140 | match_length: 140 | identity_percentage: 100 | similarity_percentage: 100 | query_length: 140 | subject_length: 140
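As an illustration of how a downstream step might read records in the form shown above, here is a minimal Python sketch. It is not the chapter's parser: the function name `parse_record`, the choice of numeric types for each field, and the assumption that the genome name is the part of the subject identifier before the first `|` (as in `hsapiens|gi|20093443`) are all hypothetical and would need adjusting to the identifiers actually used in the reference database.

```python
import re
from typing import Dict, Union

# Field names are taken from the example record in the text.
# Which fields are integers and which are floats is an assumption.
INT_FIELDS = {
    "raw_score", "query_start", "query_end", "subject_start",
    "subject_end", "match_length", "query_length", "subject_length",
}
FLOAT_FIELDS = {"E-value", "identity_percentage", "similarity_percentage"}


def parse_record(record: str) -> Dict[str, Union[str, int, float]]:
    """Parse one '>query >subject key: value | ...' record into a dict."""
    match = re.match(r"\s*>(\S+)\s+>(\S+)\s+(.*)", record, re.DOTALL)
    if match is None:
        raise ValueError("record does not start with '>query >subject'")
    query, subject, rest = match.groups()

    parsed: Dict[str, Union[str, int, float]] = {
        "query": query,
        "subject": subject,
        # Assumption: the genome is the prefix before the first '|'.
        "subject_genome": subject.split("|", 1)[0],
    }

    # The remaining attributes are 'key: value' pairs separated by '|'.
    for field in rest.split("|"):
        if not field.strip():
            continue
        key, _, value = field.partition(":")
        key, value = key.strip(), value.strip()
        if key in INT_FIELDS:
            parsed[key] = int(value)
        elif key in FLOAT_FIELDS:
            parsed[key] = float(value)
        else:
            parsed[key] = value
    return parsed
```

With the record parsed into a dictionary, the five required attributes (subject identifier, genome, E-value, and the HSP coordinates on query and subject) are available by name, and the optional attributes can be used as additional filters, for example by keeping only hits whose `E-value` falls below a user-chosen threshold.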