12.07.2015 Views

Initial sequencing and analysis of the human genome - Vitagenes


nucleotides to collections of chromosomes. Unless noted, all analyses were conducted on the assembled draft genome sequence described above.

Figure 9 provides a high-level view of the contents of the draft genome sequence, at a scale of about 3.8 Mb per centimetre. Of course, navigating information spanning nearly ten orders of magnitude requires computational tools to extract the full value. We have created and made freely available various 'Genome Browsers'. Browsers were developed and are maintained by the University of California at Santa Cruz (Fig. 10) and the EnsEMBL project of the European Bioinformatics Institute and the Sanger Centre (Fig. 11). Additional browsers have been created; URLs are listed at www.nhgri.nih.gov/genome_hub. These web-based computer tools allow users to view an annotated display of the draft genome sequence, with the ability to scroll along the chromosomes and zoom in or out to different scales. They include: the nucleotide sequence, sequence contigs, clone contigs, sequence coverage and finishing status, local GC content, CpG islands, known STS markers from previous genetic and physical maps, families of repeat sequences, known genes, ESTs and mRNAs, predicted genes, SNPs and sequence similarities with other organisms (currently the pufferfish Tetraodon nigroviridis).
These browsers will be updated as the draft genome sequence is refined and corrected as additional annotations are developed.

Box 2
Sources of publicly available sequence data and other relevant genomic information

http://genome.ucsc.edu/
University of California at Santa Cruz
Contains the assembly of the draft genome sequence used in this paper and updates

http://genome.wustl.edu/gsc/human/Mapping/
Washington University
Contains links to clone and accession maps of the human genome

http://www.ensembl.org
EBI/Sanger Centre
Allows access to DNA and protein sequences with automatic baseline annotation

http://www.ncbi.nlm.nih.gov/genome/guide/
NCBI
Views of chromosomes and maps and loci with links to other NCBI resources

http://www.ncbi.nlm.nih.gov/genemap99/
Gene map 99: contains data and viewers for radiation hybrid maps of EST-based STSs

In addition to using the Genome Browsers, one can download from these sites the entire draft genome sequence together with the annotations in a computer-readable format. The sequences of the underlying sequenced clones are all available through the public sequence databases. URLs for these and other genome websites are listed in Box 2. A larger list of useful URLs can be found at www.nhgri.nih.gov/genome_hub.
An introduction to using the draft genome sequence, as well as associated databases and analytical tools, is provided in an accompanying paper 111.

In addition, the human cytogenetic map has been integrated with the draft genome sequence as part of a related project. The BAC Resource Consortium 103 established dense connections between the maps using more than 7,500 sequenced large-insert clones that had been cytogenetically mapped by FISH; the average density of the map is 2.3 clones per Mb. Although the precision of the integration is limited by the resolution of FISH, the links provide a powerful tool for the analysis of cytogenetic aberrations in inherited diseases and cancer. These cytogenetic links can also be accessed through the Genome Browsers.

Long-range variation in GC content

The existence of GC-rich and GC-poor regions in the human genome was first revealed by experimental studies involving density gradient separation, which indicated substantial variation in average GC content among large fragments. Subsequent studies have indicated that these GC-rich and GC-poor regions may have different biological properties, such as gene density, composition of repeat sequences, correspondence with cytogenetic bands and recombination rate 112–117.
Many of these studies were indirect, owing to the lack of sufficient sequence data.

The draft genome sequence makes it possible to explore the variation in GC content in a direct and global manner. Visual inspection (Fig. 9) confirms that local GC content undergoes substantial long-range excursions from its genome-wide average of 41%. If the genome were drawn from a uniform distribution of GC content, the local GC content in a window of size n bp should be $41 \pm \sqrt{(41)(59)/n}\,\%$. Fluctuations would be modest, with the standard deviation being halved as the window size is quadrupled: for example, 0.70%, 0.35%, 0.17% and 0.09% for windows of size 5, 20, 80 and 320 kb.

The draft genome sequence, however, contains many regions with much more extreme variation. There are huge regions (> 10 Mb) with GC content far from the average.
For example, the most distal 48 Mb of chromosome 1p (from the telomere to about STS marker D1S3279) has an average GC content of 47.1%, and chromosome 13 has a 40-Mb region (roughly between STS marker A005X38 and

Box 2 (continued)

http://compbio.ornl.gov/channel/index.html
Oak Ridge National Laboratory
Java viewers for human genome data

http://hgrep.ims.u-tokyo.ac.jp/
RIKEN and the University of Tokyo
Gives an overview of the entire human genome structure

http://snp.cshl.org/
The SNP Consortium
Includes a variety of ways to query for SNPs in the human genome

http://www.ncbi.nlm.nih.gov/Omim/
Online Mendelian Inheritance in Man
Contains information about human genes and disease

http://www.nhgri.nih.gov/ELSI/ and http://www.ornl.gov/hgmis/elsi/elsi.html
NHGRI and DOE
Contains information, links and articles on a wide range of social, ethical and legal issues

Figure 12 Histogram of GC content of 20-kb windows in the draft genome sequence. (x-axis: GC content, 20 to 70; y-axis: number of 20-kb windows, 0 to 12,000.)

876 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com
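The comparison between the expected and observed spread of GC content can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline. The function names (`expected_sd_percent`, `window_gc_percent`, `histogram`) are hypothetical, the demo sequence is randomly generated rather than taken from the draft genome, and excluding ambiguous bases such as N from each window is an assumption made here for simplicity.

```python
import math
import random
import statistics
from collections import Counter


def expected_sd_percent(window_bp, gc_percent=41.0):
    """Standard deviation (in percentage points) of window GC content,
    assuming every base is independently G or C with probability gc_percent/100."""
    return math.sqrt(gc_percent * (100.0 - gc_percent) / window_bp)


def window_gc_percent(sequence, window_bp):
    """GC percentage of each complete, non-overlapping window.

    Assumption: only A, C, G and T are counted; windows with no called
    bases (for example, runs of N) are skipped.
    """
    values = []
    for start in range(0, len(sequence) - window_bp + 1, window_bp):
        window = sequence[start:start + window_bp].upper()
        gc = window.count("G") + window.count("C")
        called = gc + window.count("A") + window.count("T")
        if called == 0:
            continue
        values.append(100.0 * gc / called)
    return values


def histogram(values, bin_width=5.0):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter(bin_width * math.floor(v / bin_width) for v in values)


if __name__ == "__main__":
    # Expected spread under the independent-base model, for the window
    # sizes quoted in the text.
    for kb in (5, 20, 80, 320):
        print(f"{kb:>3} kb: {expected_sd_percent(kb * 1000):.2f}%")

    # Illustrative random sequence with 41% GC (not real genome data).
    rng = random.Random(0)
    seq = "".join(rng.choices("ACGT", weights=[29.5, 20.5, 20.5, 29.5], k=200_000))
    gc_values = window_gc_percent(seq, 20_000)
    print("observed SD:", round(statistics.pstdev(gc_values), 2))
    print("histogram:", dict(histogram(gc_values)))
```

The first loop reproduces the 0.70%, 0.35%, 0.17% and 0.09% figures quoted in the text for 5, 20, 80 and 320 kb windows. Running `window_gc_percent` on real chromosome sequence instead of the random demo string would show how far the observed spread exceeds that expectation.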
