View - ResearchGate
Modeling Transcription Factor Target Promoters 131

The drawback of these integrative approaches is that these programs tend to miss many functional TFBSs that show very little sequence conservation even across modestly distant species, because of single-nucleotide substitutions and small indels within the regulatory regions.

Novel high-throughput technologies, such as ChIP-chip, have enabled genome-wide identification of the epigenetic mechanisms and protein–DNA interactions that affect gene expression (29). In ChIP-chip experiments, chromatin immunoprecipitation of specific protein/DNA complexes is followed by microarray analysis to probe a promoter microarray panel (e.g., a CpG-island microarray panel [30]). In recent years, the author (31–34) and others (35) have successfully used ChIP-chip assays to find the target genes of TFs in mammalian systems. The major focus of this chapter is to introduce different bioinformatics tools that identify TFBSs in a set of genomic sequences, and to discuss the application of these methods in the high-level analysis of ChIP-chip experimental data.

2. Materials

The user must have access to a computer with Internet access; for example, a PC running Microsoft Windows or Linux, an Apple Macintosh, or a UNIX workstation. The user should be familiar with the use of Netscape Navigator or Microsoft Internet Explorer, and with the R statistical package (http://www.r-project.org/). If R is not readily available, the user can download the R base package from the R-project website (through http://CRAN.R-project.org). The classification packages "rpart" and "randomForest" should be downloaded and installed in R. The user-friendly commercial CART software from Salford Systems (http://www.salford-systems.com) and the professional version of TRANSFAC from Genomatix (http://www.genomatix.de) would be helpful, but are not necessary. A list of commonly used TFBS prediction programs based on PWM and phylogenetic footprinting approaches is provided in Table 1.

3. Methods

First, an overview of the methodology is provided in Subheading 3.1., then a worked example is presented in Subheading 3.2.

3.1. An Overview of In Silico Identification of TF Target Promoters

Quite a few methods are available to scan for TFBSs in a candidate promoter sequence. The simplest method of searching for a TFBS is by its consensus sequence, that is, the preferred nucleotide at each position of the binding site (36). Perhaps the most widely used method is the PWM approach, wherein a candidate TFBS is represented by a matrix of nucleotide scores reflecting the likelihood of each nucleotide at each position of the site (37). Although consensus sequence and
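
The consensus and PWM descriptions in Subheading 3.1. can be illustrated with a minimal Python sketch. It is not the chapter's own procedure, nor the scoring scheme of any particular program in Table 1. The aligned sites, the pseudocount of 1, the uniform background of 0.25, the example promoter string, and the score threshold are all assumptions invented for illustration. The sketch builds a log-odds PWM from the aligned sites, reads off the consensus as the top-scoring nucleotide in each column, and slides the matrix along a sequence to report windows scoring at or above the threshold.

```python
import math

BASES = "ACGT"

# Hypothetical aligned binding sites (invented for illustration only).
SITES = ["TGACCT", "TGACCC", "TGACCT", "TGTCCT", "TGACCA"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return a list of per-position dicts of log2-odds scores.

    Assumes all sites have equal length and a uniform background.
    """
    width = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def consensus(pwm):
    """Consensus sequence: the highest-scoring base in each column."""
    return "".join(max(col, key=col.get) for col in pwm)


def scan(sequence, pwm, threshold):
    """Slide the PWM along the sequence and collect windows whose
    summed score is at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other symbols
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


pwm = build_pwm(SITES)
site_consensus = consensus(pwm)
# Hypothetical promoter fragment and threshold, chosen for the demo.
hits = scan("AAATGACCTGGG", pwm, threshold=5.0)
print(site_consensus, hits)
```

A consensus search accepts or rejects a window outright, whereas the PWM score grades each window, so a site with one weakly preferred nucleotide can still pass a suitably chosen threshold. Only the forward strand is scanned here; a practical scan would also score the reverse complement.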