View - ResearchGate


12.07.2015 Views

54 Gonye et al.

dynamic nature of the databases containing TF information and user-specified parameter options require online retrieval rather than offline processing for all the promoters in the PAINT promoter database.

4. FeasnetBuilder: a Perl module that processes the output of the TF inspection/discovery programs and produces a candidate interaction matrix, termed Feasnet, for the genes of interest.

5. FeasnetAnalyzer and FeasnetViewer: a Perl and R module that contains functions for analysis and visualization of PAINT results (TRE-Pvaluator, StatFilter, R, GD graphics library, Graphviz available at http://www.graphviz.org). A matrix image with optional clustering of data and a network layout diagram are available.

A detailed description of each of the modules and the input–output relationships is presented next.

2.2. PAINT Modules

2.2.1. PAINT Promoter Database and Preprocessor Module

For an organism of interest, the principal requirement for constructing the promoter database is an annotated genome sequence assembly. Several genome assemblies are available for mammalian systems, for example, Ensembl (20), Santa Cruz (http://genome.ucsc.edu), and Celera (http://www.celera.com). For each of the human, mouse, and rat genomes, an UpstreamDB database was constructed for all the annotated genes (known and putative) in the corresponding Ensembl genome database. For each gene, 5000 bp upstream (5′ to the gene) were retrieved from the Ensembl database. The retrieved sequence was placed in the database only if at least 300 bp of sequence immediately 5′ to the gene was available. The genome database contains sequences in 5′ to 3′ orientation on a single strand (conventionally denoted as +1) of DNA. For genes located on strand –1, the sequence from the genome database was reversed and the complementary bases were computed to produce the upstream sequences.

One key aspect of any promoter analysis is using the correct sequence to represent the cis-regulatory control regions.
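
The upstream retrieval rules described for UpstreamDB (a 5000-bp window, a 300-bp minimum, and reverse complementation for genes on strand –1) can be illustrated with a minimal Python sketch. This is not PAINT's Perl code: the function names, the 1-based inclusive coordinates, and the reading of "available" as "present before the end of the sequence" are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical helper names; PAINT's own preprocessor is a Perl module.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_sequence(chrom_seq: str, gene_start: int, gene_end: int,
                      strand: int, window: int = 5000,
                      min_len: int = 300) -> Optional[str]:
    """Return up to `window` bp located 5' of a gene, read 5'->3' on the
    gene's own strand, or None if fewer than `min_len` bp exist.

    Assumptions: `chrom_seq` is the +1 strand, and `gene_start` and
    `gene_end` are 1-based inclusive +1-strand coordinates with
    gene_start <= gene_end.
    """
    if strand == 1:
        # Upstream lies at lower coordinates than the gene start.
        last = gene_start - 1
        first = max(1, last - window + 1)
        seq = chrom_seq[first - 1:last]
    elif strand == -1:
        # Upstream lies at higher +1-strand coordinates than the gene end;
        # reverse-complement it so it reads 5'->3' on strand -1.
        first = gene_end + 1
        last = min(len(chrom_seq), gene_end + window)
        seq = reverse_complement(chrom_seq[first - 1:last])
    else:
        raise ValueError("strand must be +1 or -1")
    # Mirror the rule that a sequence is stored only above a minimum length.
    return seq if len(seq) >= min_len else None
```

For example, `upstream_sequence(chrom, 12000, 15000, -1)` would return the reverse complement of +1-strand positions 15001 to 20000, provided `chrom` is long enough to supply at least 300 bp.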
Choosing the correct sequence requires information about the 5′-untranslated region of each gene in order to correctly identify the transcription start site and, hence, the adjacent cis-regulatory control region for each gene. To overcome the limitations of the incomplete annotation in the Ensembl database, early versions of PAINT utilized the 5′-untranslated region from RIKEN clone sequences to estimate the transcription start site in the mouse genome (11,21). Subsequent versions of the Ensembl annotation incorporated the experimentally determined 5′-untranslated sequence to the extent available, thus improving the transcription start site (TSS) estimate significantly. Hence, starting from version 3.0, the preprocessor module in PAINT considers, for each gene, the starting position of the

