Essential Cell Biology 5th edition

14.07.2022 Views

348HOW WE KNOWSEQUENCING THE HUMAN GENOMEWhen DNA sequencing techniques became fully automated,determining the order of the nucleotides in apiece of DNA went from being an elaborate Ph.D. thesisproject to a routine laboratory chore. Feed DNA into thesequencing machine, add the necessary reagents, andout comes the sought-after result: the order of As, Ts,Gs, and Cs. Nothing could be simpler.So why was sequencing the human genome such aformidable task? Largely because of its size. The DNAsequencing methods employed at the time were limitedby the physical size of the gel used to separate thelabeled fragments (see, for example, Figure Q10−9). Atmost, only a few hundred nucleotides could be readfrom a single gel. How, then, do you handle a genomethat contains billions of nucleotide pairs?The solution is to break the genome into fragments andsequence these smaller pieces. The main challenge thencomes in piecing the short fragments together in thecorrect order to yield a comprehensive sequence of awhole chromosome, and ultimately a whole genome.There are two main strategies for accomplishing thisgenomic breakage and reassembly: the shotgun methodand the clone-by-clone approach.Shotgun sequencingThe most straightforward approach to sequencing agenome is to break it into random fragments, separateand sequence each of the single-stranded fragments,and then use a powerful computer to order these piecesusing sequence overlaps to guide the assembly (Figure10–18). This approach is called the shotgun sequencingstrategy. As an analogy, imagine shredding several copiesof Essential Cell Biology (ECB), mixing up the pieces,RANDOMFRAGMENTATIONmultiple copiesof genomeand then trying to put one whole copy of the book backtogether again by matching up the words or phrases orsentences that appear on each piece. (Several copieswould be needed to generate enough overlap for reassembly.)It could be done, but it would be much easier ifthe book were, say, only two pages long.For this reason, a straight-out shotgun approach is thestrategy of choice only for sequencing small genomes.The method proved its worth in 1995, when it was usedto sequence the genome of the infectious bacteriumHaemophilus influenzae, the first organism to have itscomplete genome sequence determined. The troublewith shotgun sequencing is that the reassembly processcan be derailed by repetitive nucleotide sequences.Although rare in bacteria, these sequences make up alarge fraction of vertebrate genomes (see Figure 9–33).Highly repetitive DNA segments make it difficult topiece DNA sequences back together accurately (Figure10–19). Returning to the ECB analogy, this chapter alonecontains more than a few instances of the phrase “thehuman genome.” Imagine that one slip of paper from theshredded ECBs contains the information: “So why wassequencing the human genome” (which appears at thestart of this section); another contains the information:“the human genome sequence consortium combinedshotgun sequencing with a clone-by-clone approach”(which appears below). You might be tempted to jointhese two segments together based on the overlappingphrase “the human genome.” But you would wind upwith the nonsensical statement: “So why was sequencingthe human genome sequence consortium combinedshotgun sequencing with a clone-by-clone approach.”You would also lose the several paragraphs of importanttext that originally appeared between these twoinstances of “the human genome.”And that’s just in this section. The phrase “the humangenome” appears in many chapters of this book. Suchrepetition compounds the problem of placing eachfragment in its correct context. To circumvent theseassembly problems, researchers in the human genomesequence consortium combined shotgun sequencingwith a clone-by-clone approach.---GCCATTAGTTCASEQUENCE ONE STRANDOF FRAGMENTSGTTCAGCATTG---ASSEMBLESEQUENCE---GCCATTAGTTCAGCATTG---ORIGINAL SEQUENCERECONSTRUCTED BASEDON SEQUENCE OVERLAPsequencesof twofragmentsFigure 10–18 Shotgunsequencing is themethod of choice forsmall genomes. Thegenome is first brokeninto much smaller,overlapping fragments.Each fragment is thensequenced, and thegenome is assembledbased on overlappingsequences.Clone-by-cloneIn this approach, researchers started by preparing agenomic DNA library. They broke the human genomeinto overlapping fragments, 100–200 kilobase pairs insize. They then plugged these segments into bacterialartificial chromosomes (BACs) and inserted them intoE. coli. (BACs are similar to the bacterial plasmids discussedearlier, except they can carry much larger piecesof DNA.) As the bacteria divided, they copied the BACs,thus producing a collection of overlapping cloned fragments(see Figure 10–8).

Sequencing DNA349repetitive DNAinterveninginformationRANDOMFRAGMENTATIONSEQUENCEFRAGMENTSGATTACAGATTACAGATTACA------GATTACAGATTACAGATTACASEQUENCE ASSEMBLEDINCORRECTLYmultiple copiesof genome---GATTACAGATTACAGATTACAGATTACA---intervening information is lostsequencesof twofragmentsFigure 10–19 Repetitive DNA sequences in a genome make itdifficult to accurately assemble its fragments. In this example,the DNA contains two segments of repetitive DNA, each madeof many copies of the sequence GATTACA. When the resultingsequences are examined, two fragments from different partsof the DNA appear to overlap. Assembling these sequencesincorrectly would result in a loss of the information (in brackets)that lies between the original repeats.ECB5 e10.25/10.19The researchers then determined where each of theseDNA fragments fit into the existing map of the humangenome. To do this, different restriction enzymes wereused to cut each clone to generate a unique restrictionsite“signature.” The locations of the restriction sites ineach fragment allowed researchers to map each BACclone onto a restriction map of a whole human genomethat had been generated previously using the same set ofrestriction enzymes (Figure 10–20).Knowing the relative positions of the cloned fragments,the researchers then selected some 30,000 BACs, shearedeach into smaller fragments, and determined the nucleotidesequence of each BAC separately using the shotgunmethod. They could then assemble the whole genomesequence by stitching together the sequences of thousandsof individual BACs that span the length of thegenome.The beauty of this approach was that it was relativelyeasy to accurately determine where the BAC fragmentsbelong in the genome. This mapping step reduced thelikelihood that regions containing repetitive sequenceswere assembled incorrectly, and it virtually eliminatedthe possibility that sequences from different chromosomeswere mistakenly joined together. Returning tothe textbook analogy, the BAC-based approach is akin tofirst separating your copies of ECB into individual pagesand then shredding each page into its own separate pile.It should be much easier to put the book back togetherwhen one pile of fragments contains words from page 1,a second pile from page 2, and so on. And there’s virtuallyno chance of mistakenly sticking a sentence frompage 40 into the middle of a paragraph on page 412.All together nowThe clone-by-clone approach produced the first draftof the human genome sequence in 2000 and the completedsequence in 2004. As the set of instructions thatspecify all of the RNA and protein molecules needed tobuild a human being, this string of genetic bits holds thesecrets to human development and physiology. But thesequence was also of great value to researchers interestedin comparative genomics or in the physiology ofother organisms: it eased the assembly of nucleotidesequences from other mammalian genomes—mice, rats,dogs, and other primates. It also made it much easier todetermine the nucleotide sequences of the genomes ofindividual humans by providing a framework on whichthe new sequences could be simply superimposed.The first human sequence was the only mammaliangenome completed in this methodical way. But theHuman Genome Project was an unqualified success inthat it provided the techniques, confidence, and momentumthat drove the development of the next generationof DNA sequencing methods, which are now rapidlytransforming all areas of biology.restriction map of one segmentof human genomerestriction patternfor individual BACclonescleavage sites for restriction nucleases A, B, C, D, and EAA D B BA B C ECFigure 10–20 Individual BAC clones arepositioned on the physical map of thehuman genome sequence on the basis oftheir restriction-site “signatures.” Clones aredigested with five different restriction enzymes,and the sites at which the different enzymes cuteach clone are recorded. The distinctive patternof restriction sites allows investigators to orderthe fragments and place them on a restrictionmap of a human genome that had beenpreviously generated using the same nucleases.

Sequencing DNA

349

repetitive DNA

intervening

information

RANDOM

FRAGMENTATION

SEQUENCE

FRAGMENTS

GATTACAGATTACAGATTACA---

---GATTACAGATTACAGATTACA

SEQUENCE ASSEMBLED

INCORRECTLY

multiple copies

of genome

---GATTACAGATTACAGATTACAGATTACA---

intervening information is lost

sequences

of two

fragments

Figure 10–19 Repetitive DNA sequences in a genome make it

difficult to accurately assemble its fragments. In this example,

the DNA contains two segments of repetitive DNA, each made

of many copies of the sequence GATTACA. When the resulting

sequences are examined, two fragments from different parts

of the DNA appear to overlap. Assembling these sequences

incorrectly would result in a loss of the information (in brackets)

that lies between the original repeats.

ECB5 e10.25/10.19

The researchers then determined where each of these

DNA fragments fit into the existing map of the human

genome. To do this, different restriction enzymes were

used to cut each clone to generate a unique restrictionsite

“signature.” The locations of the restriction sites in

each fragment allowed researchers to map each BAC

clone onto a restriction map of a whole human genome

that had been generated previously using the same set of

restriction enzymes (Figure 10–20).

Knowing the relative positions of the cloned fragments,

the researchers then selected some 30,000 BACs, sheared

each into smaller fragments, and determined the nucleotide

sequence of each BAC separately using the shotgun

method. They could then assemble the whole genome

sequence by stitching together the sequences of thousands

of individual BACs that span the length of the

genome.

The beauty of this approach was that it was relatively

easy to accurately determine where the BAC fragments

belong in the genome. This mapping step reduced the

likelihood that regions containing repetitive sequences

were assembled incorrectly, and it virtually eliminated

the possibility that sequences from different chromosomes

were mistakenly joined together. Returning to

the textbook analogy, the BAC-based approach is akin to

first separating your copies of ECB into individual pages

and then shredding each page into its own separate pile.

It should be much easier to put the book back together

when one pile of fragments contains words from page 1,

a second pile from page 2, and so on. And there’s virtually

no chance of mistakenly sticking a sentence from

page 40 into the middle of a paragraph on page 412.

All together now

The clone-by-clone approach produced the first draft

of the human genome sequence in 2000 and the completed

sequence in 2004. As the set of instructions that

specify all of the RNA and protein molecules needed to

build a human being, this string of genetic bits holds the

secrets to human development and physiology. But the

sequence was also of great value to researchers interested

in comparative genomics or in the physiology of

other organisms: it eased the assembly of nucleotide

sequences from other mammalian genomes—mice, rats,

dogs, and other primates. It also made it much easier to

determine the nucleotide sequences of the genomes of

individual humans by providing a framework on which

the new sequences could be simply superimposed.

The first human sequence was the only mammalian

genome completed in this methodical way. But the

Human Genome Project was an unqualified success in

that it provided the techniques, confidence, and momentum

that drove the development of the next generation

of DNA sequencing methods, which are now rapidly

transforming all areas of biology.

restriction map of one segment

of human genome

restriction pattern

for individual BAC

clones

cleavage sites for restriction nucleases A, B, C, D, and E

AA D B BA B C EC

Figure 10–20 Individual BAC clones are

positioned on the physical map of the

human genome sequence on the basis of

their restriction-site “signatures.” Clones are

digested with five different restriction enzymes,

and the sites at which the different enzymes cut

each clone are recorded. The distinctive pattern

of restriction sites allows investigators to order

the fragments and place them on a restriction

map of a human genome that had been

previously generated using the same nucleases.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!