View - ResearchGate
54 Gonye et al.

dynamic nature of the databases containing TF information and user-specified parameter options require online retrieval rather than offline processing for all the promoters in the PAINT promoter database.

4. FeasnetBuilder: a Perl module that processes the output of the TF inspection/discovery programs and produces a candidate interaction matrix, termed Feasnet, for the genes of interest.
5. FeasnetAnalyzer and FeasnetViewer: Perl and R modules that contain functions for analysis and visualization of PAINT results (TRE-Pvaluator, StatFilter, R, GD graphics library, Graphviz available at http://www.graphviz.org). A matrix image with optional clustering of data and a network layout diagram are available.

A detailed description of each of the modules and the input–output relationships is presented next.

2.2. PAINT Modules

2.2.1. PAINT Promoter Database and Preprocessor Module

For an organism of interest, the principal requirement for constructing the promoter database is an annotated genome sequence assembly. Several genome assemblies are available for mammalian systems, for example, Ensembl (20), Santa Cruz (http://genome.ucsc.edu), and Celera (http://www.celera.com). For each of the human, mouse, and rat genomes, an UpstreamDB database was constructed for all the annotated genes (known and putative) in the corresponding Ensembl genome database. For each gene, 5000 bp upstream (5′ to the gene) were retrieved from the Ensembl database. The retrieved sequence was placed in the database only if at least 300 bp of sequence immediately 5′ to the gene was available. The genome database contains sequences in 5′ to 3′ orientation on a single strand of DNA (conventionally denoted as +1). For genes located on strand –1, the sequence from the genome database was reversed and the complementary bases were computed (that is, the sequence was reverse-complemented) to produce the upstream sequences.

One key aspect of any promoter analysis is using the correct sequence to represent the cis-regulatory control regions. Note that this requires information about the 5′-untranslated region of each gene in order to correctly identify the transcription start site (TSS) and, hence, the adjacent cis-regulatory control region for each gene. To overcome the limitations of the incomplete annotation in the Ensembl database, early versions of PAINT used 5′-untranslated regions from RIKEN clone sequences to estimate the TSS in the mouse genome (11,21). Subsequent versions of the Ensembl annotation incorporated the experimentally determined 5′-untranslated sequence to the extent available, thus improving the TSS estimate significantly. Hence, starting from version 3.0, the preprocessor module in PAINT considers, for each gene, the starting position of the
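The upstream-sequence retrieval described in Section 2.2.1 can be illustrated with a minimal Python sketch. This is not PAINT's own code (the modules described in the text are written in Perl and R); the function names, the 0-based half-open coordinate convention, and the reading of "available" as "enough bases exist before the end of the assembled sequence" are all assumptions made for illustration. Only the 5000-bp window, the 300-bp minimum, and the reverse-complement step for strand –1 come from the text.

```python
from typing import Optional

UPSTREAM_LEN = 5000  # bp retrieved 5' of each gene (value from the text)
MIN_UPSTREAM = 300   # minimum bp required to keep a sequence (value from the text)

# Translation table for complementing bases; N stands for an unknown base.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_sequence(
    chrom_seq: str,
    gene_start: int,
    gene_end: int,
    strand: int,
    length: int = UPSTREAM_LEN,
    min_length: int = MIN_UPSTREAM,
) -> Optional[str]:
    """Return up to `length` bp located 5' of a gene, read 5'->3' on the gene's strand.

    Hypothetical convention: `chrom_seq` is the +1 strand, and the gene occupies
    the 0-based half-open interval [gene_start, gene_end) on that strand.
    Returns None when fewer than `min_length` bp are available.
    """
    if strand == 1:
        # 5' of a +1 gene means lower coordinates on the +1 strand.
        begin = max(0, gene_start - length)
        region = chrom_seq[begin:gene_start]
    elif strand == -1:
        # 5' of a -1 gene means higher coordinates on the +1 strand,
        # so the slice is reverse-complemented to read it 5'->3'.
        end = min(len(chrom_seq), gene_end + length)
        region = reverse_complement(chrom_seq[gene_end:end])
    else:
        raise ValueError("strand must be 1 or -1")

    if len(region) < min_length:
        return None
    return region


if __name__ == "__main__":
    toy = "AAACCCGGGTTT"
    # Small window sizes so the toy sequence is long enough.
    print(upstream_sequence(toy, 6, 9, 1, length=4, min_length=2))   # prints ACCC
    print(upstream_sequence(toy, 3, 6, -1, length=4, min_length=2))  # prints ACCC
```

For a gene on strand –1, the region 5′ of the gene lies at higher coordinates on the stored +1 strand, which is why the sketch slices after `gene_end` before reverse-complementing. A real pipeline would also need to decide how to treat runs of unknown bases, which the text does not specify.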