Essential Cell Biology 5th edition
HOW WE KNOW
SEQUENCING THE HUMAN GENOME
When DNA sequencing techniques became fully automated, determining the order of the nucleotides in a piece of DNA went from being an elaborate Ph.D. thesis project to a routine laboratory chore. Feed DNA into the sequencing machine, add the necessary reagents, and out comes the sought-after result: the order of As, Ts, Gs, and Cs. Nothing could be simpler.
So why was sequencing the human genome such a formidable task? Largely because of its size. The DNA sequencing methods employed at the time were limited by the physical size of the gel used to separate the labeled fragments (see, for example, Figure Q10–9). At most, only a few hundred nucleotides could be read from a single gel. How, then, do you handle a genome that contains billions of nucleotide pairs?
The solution is to break the genome into fragments and sequence these smaller pieces. The main challenge then comes in piecing the short fragments together in the correct order to yield a comprehensive sequence of a whole chromosome, and ultimately a whole genome. There are two main strategies for accomplishing this genomic breakage and reassembly: the shotgun method and the clone-by-clone approach.
Shotgun sequencing
The most straightforward approach to sequencing a genome is to break it into random fragments, separate and sequence each of the single-stranded fragments, and then use a powerful computer to order these pieces using sequence overlaps to guide the assembly (Figure 10–18). This approach is called the shotgun sequencing strategy. As an analogy, imagine shredding several copies of Essential Cell Biology (ECB), mixing up the pieces, and then trying to put one whole copy of the book back together again by matching up the words or phrases or sentences that appear on each piece. (Several copies would be needed to generate enough overlap for reassembly.) It could be done, but it would be much easier if the book were, say, only two pages long.
For this reason, a straight-out shotgun approach is the strategy of choice only for sequencing small genomes. The method proved its worth in 1995, when it was used to sequence the genome of the infectious bacterium Haemophilus influenzae, the first organism to have its complete genome sequence determined.
The trouble with shotgun sequencing is that the reassembly process can be derailed by repetitive nucleotide sequences. Although rare in bacteria, these sequences make up a large fraction of vertebrate genomes (see Figure 9–33). Highly repetitive DNA segments make it difficult to piece DNA sequences back together accurately (Figure 10–19). Returning to the ECB analogy, this chapter alone contains more than a few instances of the phrase “the human genome.” Imagine that one slip of paper from the shredded ECBs contains the information: “So why was sequencing the human genome” (which appears at the start of this section); another contains the information: “the human genome sequence consortium combined shotgun sequencing with a clone-by-clone approach” (which appears below). You might be tempted to join these two segments together based on the overlapping phrase “the human genome.” But you would wind up with the nonsensical statement: “So why was sequencing the human genome sequence consortium combined shotgun sequencing with a clone-by-clone approach.” You would also lose the several paragraphs of important text that originally appeared between these two instances of “the human genome.”
And that’s just in this section. The phrase “the human genome” appears in many chapters of this book. Such repetition compounds the problem of placing each fragment in its correct context. To circumvent these assembly problems, researchers in the human genome sequence consortium combined shotgun sequencing with a clone-by-clone approach.
[Figure 10–18 diagram: multiple copies of the genome undergo random fragmentation; one strand of each fragment is sequenced (sequences of two fragments: ---GCCATTAGTTCA and GTTCAGCATTG---); the sequence is then assembled, and the original sequence ---GCCATTAGTTCAGCATTG--- is reconstructed based on sequence overlap.]
Figure 10–18 Shotgun sequencing is the method of choice for small genomes. The genome is first broken into much smaller, overlapping fragments. Each fragment is then sequenced, and the genome is assembled based on overlapping sequences.
Clone-by-clone
In this approach, researchers started by preparing a genomic DNA library. They broke the human genome into overlapping fragments, 100–200 kilobase pairs in size. They then plugged these segments into bacterial artificial chromosomes (BACs) and inserted them into E. coli. (BACs are similar to the bacterial plasmids discussed earlier, except they can carry much larger pieces of DNA.) As the bacteria divided, they copied the BACs, thus producing a collection of overlapping cloned fragments (see Figure 10–8).
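The overlap-based assembly illustrated in Figure 10–18 can be sketched in a few lines of Python. This is only a minimal illustration, not the software used by any sequencing project: the function names and the `min_overlap` threshold are assumptions made for the example, and both reads are assumed to come from the same strand, in the same orientation, with no sequencing errors.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# The two fragment sequences shown in Figure 10-18.
fragment_a = "GCCATTAGTTCA"
fragment_b = "GTTCAGCATTG"

assembled = merge(fragment_a, fragment_b)
print(assembled)
```

The two reads share the five nucleotides GTTCA, so the merged result matches the reconstructed sequence in the figure. A real assembler must also handle reads from the opposite strand, sequencing errors, and millions of reads at once.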
Sequencing DNA
[Figure 10–19 diagram: multiple copies of a genome that contains two stretches of repetitive DNA separated by intervening information undergo random fragmentation, and the fragments are sequenced. The sequences of two fragments, GATTACAGATTACAGATTACA--- and ---GATTACAGATTACAGATTACA, are assembled incorrectly into ---GATTACAGATTACAGATTACAGATTACA---, and the intervening information is lost.]
Figure 10–19 Repetitive DNA sequences in a genome make it
difficult to accurately assemble its fragments. In this example,
the DNA contains two segments of repetitive DNA, each made
of many copies of the sequence GATTACA. When the resulting
sequences are examined, two fragments from different parts
of the DNA appear to overlap. Assembling these sequences
incorrectly would result in a loss of the information (in brackets)
that lies between the original repeats.
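The misassembly described in Figure 10–19 can be reproduced with a toy example. The sketch below is an illustration under stated assumptions: the flanking and intervening sequences are invented, the reads are error-free, and the "assembler" simply joins two reads on their longest suffix-prefix overlap.

```python
def overlap_length(left: str, right: str, min_overlap: int = 4) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


REPEAT = "GATTACA"
# Hypothetical unique sequences around and between the two repeat blocks.
left_flank, intervening, right_flank = "CCGT", "TTGCCGAAGTC", "AGGC"
genome = left_flank + REPEAT * 3 + intervening + REPEAT * 3 + right_flank

# One read ends in the first repeat block; the other starts in the second.
read_1 = left_flank + REPEAT * 3
read_2 = REPEAT * 3 + right_flank

size = overlap_length(read_1, read_2)
misassembled = read_1 + read_2[size:]

print(len(genome), len(misassembled))
print(intervening in misassembled)
```

Because the two reads look identical across the repeat, the join collapses both repeat blocks into one, and the unique sequence between them disappears from the result.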
The researchers then determined where each of these
DNA fragments fit into the existing map of the human
genome. To do this, different restriction enzymes were
used to cut each clone to generate a unique restriction-site
“signature.” The locations of the restriction sites in
each fragment allowed researchers to map each BAC
clone onto a restriction map of a whole human genome
that had been generated previously using the same set of
restriction enzymes (Figure 10–20).
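As a rough sketch of how a restriction-site signature can place a clone on a map, the Python below slides a clone's pattern of cut sites along a genome-wide restriction map and reports where the order of enzymes and the spacing between sites agree. All positions and the map itself are invented for illustration, the enzyme labels A to E are borrowed from Figure 10–20, and exact matching is an assumption: real measurements of fragment sizes carry error and would need a tolerance.

```python
from typing import List, Tuple

Site = Tuple[int, str]  # (position in kilobases, enzyme label)


def signature(sites: List[Site]) -> List[Site]:
    """Express each site relative to the first, so the pattern is position-independent."""
    first = sites[0][0]
    return [(pos - first, enzyme) for pos, enzyme in sites]


def place_clone(genome_map: List[Site], clone_sites: List[Site]) -> List[int]:
    """Return the map positions at which the clone's signature matches."""
    target = signature(clone_sites)
    n = len(target)
    hits = []
    for start in range(len(genome_map) - n + 1):
        window = genome_map[start:start + n]
        if signature(window) == target:
            hits.append(window[0][0])
    return hits


# Hypothetical restriction map of one genome segment.
genome_map = [(10, "A"), (35, "A"), (60, "D"), (95, "B"), (120, "B"),
              (150, "A"), (190, "B"), (230, "C"), (260, "E"), (300, "C")]
# Hypothetical cut sites measured within one BAC clone (clone coordinates).
clone = [(5, "D"), (40, "B"), (65, "B"), (95, "A")]

print(place_clone(genome_map, clone))
```

The clone's sites are recorded in its own coordinates, so only the relative spacing and the enzyme order are compared. A signature that matched at several map positions would be ambiguous, which is one reason several enzymes are used.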
Knowing the relative positions of the cloned fragments,
the researchers then selected some 30,000 BACs, sheared
each into smaller fragments, and determined the nucleotide
sequence of each BAC separately using the shotgun
method. They could then assemble the whole genome
sequence by stitching together the sequences of thousands
of individual BACs that span the length of the
genome.
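The final stitching step can be sketched as follows. The example assumes that each BAC has already been assembled on its own and that its map position is known; the toy genome, the BAC coordinates, and the function names are all hypothetical. Each BAC here contains one block of GATTACA repeats, and the two BACs overlap only in unique sequence.

```python
from typing import List, Tuple


def overlap_length(left: str, right: str, min_overlap: int = 4) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(bacs: List[Tuple[int, str]], min_overlap: int = 4) -> str:
    """Join BAC sequences in map order; each BAC must overlap its neighbor."""
    ordered = [seq for _, seq in sorted(bacs)]
    assembled = ordered[0]
    for seq in ordered[1:]:
        size = overlap_length(assembled, seq, min_overlap)
        if size == 0:
            raise ValueError("gap between neighboring BACs")
        assembled += seq[size:]
    return assembled


REPEAT = "GATTACA"
genome = "CCGT" + REPEAT * 3 + "TTGCCGAAGTC" + REPEAT * 3 + "AGGC"

# (map position, assembled BAC sequence), deliberately listed out of order.
bacs = [(27, genome[27:61]), (0, genome[0:33])]

print(stitch(bacs) == genome)
```

Because the join is made only between neighbors on the map, the two repeat blocks are never compared with each other, and the sequence between them is retained.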
The beauty of this approach was that it was relatively
easy to accurately determine where the BAC fragments
belong in the genome. This mapping step reduced the
likelihood that regions containing repetitive sequences
were assembled incorrectly, and it virtually eliminated
the possibility that sequences from different chromosomes
were mistakenly joined together. Returning to
the textbook analogy, the BAC-based approach is akin to
first separating your copies of ECB into individual pages
and then shredding each page into its own separate pile.
It should be much easier to put the book back together
when one pile of fragments contains words from page 1,
a second pile from page 2, and so on. And there’s virtually
no chance of mistakenly sticking a sentence from
page 40 into the middle of a paragraph on page 412.
All together now
The clone-by-clone approach produced the first draft
of the human genome sequence in 2000 and the completed
sequence in 2004. As the set of instructions that
specify all of the RNA and protein molecules needed to
build a human being, this string of genetic bits holds the
secrets to human development and physiology. But the
sequence was also of great value to researchers interested
in comparative genomics or in the physiology of
other organisms: it eased the assembly of nucleotide
sequences from other mammalian genomes—mice, rats,
dogs, and other primates. It also made it much easier to
determine the nucleotide sequences of the genomes of
individual humans by providing a framework on which
the new sequences could be simply superimposed.
The first human sequence was the only mammalian
genome completed in this methodical way. But the
Human Genome Project was an unqualified success in
that it provided the techniques, confidence, and momentum
that drove the development of the next generation
of DNA sequencing methods, which are now rapidly
transforming all areas of biology.
[Figure 10–20 diagram: the restriction patterns for individual BAC clones are aligned against the restriction map of one segment of the human genome; cleavage sites for restriction nucleases A, B, C, D, and E are marked.]
Figure 10–20 Individual BAC clones are
positioned on the physical map of the
human genome sequence on the basis of
their restriction-site “signatures.” Clones are
digested with five different restriction enzymes,
and the sites at which the different enzymes cut
each clone are recorded. The distinctive pattern
of restriction sites allows investigators to order
the fragments and place them on a restriction
map of a human genome that had been
previously generated using the same nucleases.