
Initial sequencing and analysis of the human genome - Vitagenes


articles

well as targeted regions of mammalian genomes 34–37. These projects showed that large-scale sequencing was feasible and developed the two-phase paradigm for genome sequencing. In the first, 'shotgun', phase, the genome is divided into appropriately sized segments and each segment is covered to a high degree of redundancy (typically, eight- to tenfold) through the sequencing of randomly selected subfragments. The second is a 'finishing' phase, in which sequence gaps are closed and remaining ambiguities are resolved through directed analysis. The results also showed that complete genomic sequence provided information about genes, regulatory regions and chromosome structure that was not readily obtainable from cDNA studies alone.

In 1995, genome scientists considered a proposal 38 that would have involved producing a draft genome sequence of the human genome in a first phase and then returning to finish the sequence in a second phase. After vigorous debate, it was decided that such a plan was premature for several reasons.
These included the need first to prove that high-quality, long-range finished sequence could be produced from most parts of the complex, repeat-rich human genome; the sense that many aspects of the sequencing process were still rapidly evolving; and the desirability of further decreasing costs.

Instead, pilot projects were launched to demonstrate the feasibility of cost-effective, large-scale sequencing, with a target completion date of March 1999. The projects successfully produced finished sequence with 99.99% accuracy and no gaps 39. They also introduced bacterial artificial chromosomes (BACs) 40, a new large-insert cloning system that proved to be more stable than the cosmids and yeast artificial chromosomes (YACs) 41 that had been used previously. The pilot projects drove the maturation and convergence of sequencing strategies, while producing 15% of the human genome sequence. With successful completion of this phase, the human genome sequencing effort moved into full-scale production in March 1999.

The idea of first producing a draft genome sequence was revived at this time, both because the ability to finish such a sequence was no longer in doubt and because there was great hunger in the scientific community for human sequence data.
In addition, some scientists favoured prioritizing the production of a draft genome sequence over regional finished sequence because of concerns about commercial plans to generate proprietary databases of human sequence that might be subject to undesirable restrictions on use 42–44.

The consortium focused on an initial goal of producing, in a first production phase lasting until June 2000, a draft genome sequence covering most of the genome. Such a draft genome sequence, although not completely finished, would rapidly allow investigators to begin to extract most of the information in the human sequence. Experiments showed that sequencing clones covering about 90% of the human genome to a redundancy of about four- to fivefold ('half-shotgun' coverage; see Box 1) would accomplish this 45,46. The draft genome sequence goal has been achieved, as described below.

The second sequence production phase is now under way.
Its aims are to achieve full-shotgun coverage of the existing clones during 2001, to obtain clones to fill the remaining gaps in the physical map, and to produce a finished sequence (apart from regions that cannot be cloned or sequenced with currently available techniques) no later than 2003.

Strategic issues

Hierarchical shotgun sequencing

Soon after the invention of DNA sequencing methods 47,48, the shotgun sequencing strategy was introduced 49–51; it has remained the fundamental method for large-scale genome sequencing 52–54 for the past 20 years. The approach has been refined and extended to make it more efficient. For example, improved protocols for fragmenting and cloning DNA allowed construction of shotgun libraries with more uniform representation. The practice of sequencing from both ends of double-stranded clones ('double-barrelled' shotgun sequencing) was introduced by Ansorge and others 37 in 1990, allowing the use of 'linking information' between sequence fragments.

The application of shotgun sequencing was also extended by applying it to larger and larger DNA molecules: from plasmids (~4 kilobases (kb)) to cosmid clones 37 (40 kb), to artificial chromosomes cloned in bacteria and yeast 55 (100–500 kb) and bacterial genomes 56 (1–2 megabases (Mb)).
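
The redundancy levels quoted above can be given a rough quantitative meaning with the standard Poisson (Lander–Waterman style) model of random shotgun sampling. These are eight- to tenfold for a full shotgun and about four- to fivefold for 'half-shotgun' coverage. The Python sketch below is illustrative only. It assumes the idealized conditions that the text itself says do not hold for the human genome: fixed-length reads placed uniformly at random, no repeats and no cloning bias. Under those assumptions, a base is missed by every read with probability about e^(-c) at redundancy c. The expected number of contigs is about N·e^(-c(1-θ)) for N reads, where θ is the minimum detectable overlap expressed as a fraction of read length. The function names and simulation parameters are hypothetical choices for this example.

```python
import math
import random


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson model: probability that a given base is covered by no read."""
    return math.exp(-coverage)


def expected_contigs(num_reads: int, coverage: float, min_overlap_frac: float = 0.0) -> float:
    """Lander-Waterman estimate of the number of contigs (islands).

    min_overlap_frac is the minimum overlap needed to detect a join,
    as a fraction of read length (theta in the usual notation).
    """
    return num_reads * math.exp(-coverage * (1.0 - min_overlap_frac))


def simulate_uncovered_fraction(genome_len: int, read_len: int, coverage: float,
                                seed: int = 0) -> float:
    """Place reads uniformly at random and measure the fraction of bases never covered.

    Assumes the idealized case: a repeat-free genome with no cloning bias.
    """
    rng = random.Random(seed)
    num_reads = round(coverage * genome_len / read_len)
    covered = bytearray(genome_len)  # 0 = not yet covered, 1 = covered
    ones = b"\x01" * read_len
    for _ in range(num_reads):
        start = rng.randrange(genome_len - read_len + 1)
        covered[start:start + read_len] = ones
    return covered.count(0) / genome_len


if __name__ == "__main__":
    genome_len, read_len = 2_000_000, 500  # hypothetical toy parameters
    for c in (4, 5, 8, 10):
        n = round(c * genome_len / read_len)
        print(f"{c:>2}x  uncovered: model {expected_uncovered_fraction(c):.5f}, "
              f"simulated {simulate_uncovered_fraction(genome_len, read_len, c):.5f}; "
              f"expected contigs ~ {expected_contigs(n, c):.1f}")
```

In this idealized model, fourfold coverage leaves about 1.8% of bases unread (e^-4 ≈ 0.018), while eightfold coverage leaves about 0.03% (e^-8 ≈ 0.00034). This is consistent with using half-shotgun coverage for a draft and full-shotgun coverage before finishing. The simulation is included only as a cross-check of the closed-form estimate.
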
In principle, a genome of arbitrary size may be directly sequenced by the shotgun method, provided that it contains no repeated sequence and can be uniformly sampled at random. The genome can then be assembled using the simple computer science technique of 'hashing' (in which one detects overlaps by consulting an alphabetized look-up table of all k-letter words in the data). Mathematical analysis of the expected number of gaps as a function of coverage is similarly straightforward 57.

Practical difficulties arise because of repeated sequences and cloning bias. Small amounts of repeated sequence pose little problem for shotgun sequencing. For example, one can readily assemble typical bacterial genomes (about 1.5% repeat) or the euchromatic portion of the fly genome (about 3% repeat). By contrast, the human genome is filled (>50%) with repeated sequences, including interspersed repeats derived from transposable elements, and long genomic regions that have been duplicated in tandem, palindromic or dispersed fashion (see below). These include large duplicated segments (50–500 kb) with high sequence identity (98–99.9%), at which mispairing during recombination creates deletions responsible for genetic syndromes. Such features complicate the assembly of a correct and finished genome sequence.

There are two approaches for sequencing large repeat-rich genomes.
The first is a whole-genome shotgun sequencing approach, as has been used for the repeat-poor genomes of viruses, bacteria and flies, using linking information and computational

[Figure 2 schematic: genomic DNA → BAC library → organized mapped large clone contigs → BAC to be sequenced → shotgun clones → shotgun sequence → assembly]

Figure 2 Idealized representation of the hierarchical shotgun sequencing strategy. A library is constructed by fragmenting the target genome and cloning it into a large-fragment cloning vector; here, BAC vectors are shown. The genomic DNA fragments represented in the library are then organized into a physical map and individual BAC clones are selected and sequenced by the random shotgun strategy. Finally, the clone sequences are assembled to reconstruct the sequence of the genome.

NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com © 2001 Macmillan Magazines Ltd 863
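
The 'hashing' approach to overlap detection described earlier consults a look-up table of all k-letter words in the data. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the assembler used by the consortium. It assumes short, error-free reads. A Python dict (a hash table) stands in for the alphabetized look-up table. Every k-mer shared by two reads is reported as a candidate overlap, together with the relative offset it implies. The function names, the toy genome and the toy reads are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reads: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Look-up table mapping each k-letter word to every (read, offset) where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return dict(index)


def candidate_overlaps(reads: dict[str, str], k: int) -> dict[tuple[str, str], set[int]]:
    """For every pair of reads sharing a k-mer, collect the implied offset of the
    second read relative to the first (offset in a minus offset in b)."""
    hits: dict[tuple[str, str], set[int]] = defaultdict(set)
    for positions in build_kmer_index(reads, k).values():
        for name_a, pos_a in positions:
            for name_b, pos_b in positions:
                if name_a < name_b:  # each unordered pair once; skips self-hits
                    hits[(name_a, name_b)].add(pos_a - pos_b)
    return dict(hits)


if __name__ == "__main__":
    # Hypothetical toy "genome" and three overlapping error-free reads cut from it.
    genome = "ACCGTAAATGGGCTGATCATGCTTAAACCCTGTGCATCCTACTG"
    reads = {"r1": genome[0:24], "r2": genome[12:36], "r3": genome[26:44]}
    print(candidate_overlaps(reads, k=8))

    # A word repeated inside read "x" gives the pair (x, y) two conflicting offsets.
    repeat = "GATCATGC"
    repeat_reads = {"x": "TTTT" + repeat + "GGGG" + repeat, "y": repeat + "AAAA"}
    print(candidate_overlaps(repeat_reads, k=8))
```

With the toy reads, each adjacent pair receives a single consistent offset: 12 for r1/r2 and 14 for r2/r3. That is all a repeat-free genome requires. The second example shows, in miniature, why repeats cause trouble. A k-letter word that occurs twice gives the same pair of reads two different candidate offsets, and the look-up table alone cannot decide which is correct.
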
