12.07.2015 Views

Initial sequencing and analysis of the human genome - Vitagenes

Initial sequencing and analysis of the human genome - Vitagenes

Initial sequencing and analysis of the human genome - Vitagenes

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

articlesposable elements in categories re¯ecting equal amounts <strong>of</strong> sequencedivergence. In Fig. 18b <strong>the</strong> data are grouped into four binscorresponding to successive 25-Myr periods, on <strong>the</strong> basis <strong>of</strong> anapproximate clock. Figure 19 shows <strong>the</strong> mean ages <strong>of</strong> varioussubfamilies <strong>of</strong> DNA transposons. Several facts are apparent from<strong>the</strong>se graphs. First, most interspersed repeats in <strong>the</strong> <strong>human</strong> <strong>genome</strong>predate <strong>the</strong> eu<strong>the</strong>rian radiation. This is a testament to <strong>the</strong> extremelyslow rate with which nonfunctional sequences are cleared fromvertebrate <strong>genome</strong>s (see below concerning comparison with <strong>the</strong> ¯y).Second, LINE <strong>and</strong> SINE elements have extremely long lives. Themonophyletic LINE1 <strong>and</strong> Alu lineages are at least 150 <strong>and</strong> 80 Myrold, respectively. In earlier times, <strong>the</strong> reigning transposons wereLINE2 <strong>and</strong> MIR 148,158 . The SINE MIR was perfectly adapted forreverse transcription by LINE2, as it carried <strong>the</strong> same 50-basesequence at its 39 end. When LINE2 became extinct 80±100 Myrago, it spelled <strong>the</strong> doom <strong>of</strong> MIR.Third, <strong>the</strong>re were two major peaks <strong>of</strong> DNA transposon activity(Fig. 19). The ®rst involved Charlie elements <strong>and</strong> occurred longbefore <strong>the</strong> eu<strong>the</strong>rian radiation; <strong>the</strong> second involved Tigger elements<strong>and</strong> occurred after this radiation. Because DNA transposons canproduce large-scale chromosome rearrangements 159±162 , it is possiblethat widespread activity could be involved in speciation events.Fourth, <strong>the</strong>re is no evidence for DNA transposon activity in <strong>the</strong>past 50 Myr in <strong>the</strong> <strong>human</strong> <strong>genome</strong>. The youngest two DNAtransposon families that we can identify in <strong>the</strong> draft <strong>genome</strong>sequence (MER75 <strong>and</strong> MER85) show 6±7% divergence from <strong>the</strong>irrespective consensus sequences representing <strong>the</strong> ancestral element(Fig. 19), indicating that <strong>the</strong>y were active before <strong>the</strong> divergence <strong>of</strong><strong>human</strong>s <strong>and</strong> new world monkeys. Moreover, <strong>the</strong>se elements wererelatively unsuccessful, toge<strong>the</strong>r contributing just 125 kb to <strong>the</strong> draft<strong>genome</strong> sequence.Number <strong>of</strong> kb covered by family8,0007,0006,0005,0004,0003,0002,0001,000CharliePiggyBac-likeMarinerTc2TiggerUnclassifiedZaphod00 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4Average nucleotide substitution level for familyFigure 19 Median ages <strong>and</strong> per cent <strong>of</strong> <strong>the</strong> <strong>genome</strong> covered by subfamilies <strong>of</strong> DNAtransposons. The Charlie <strong>and</strong> Zaphod elements were hobo-Activator-Tam3 (hAT) DNAtransposons; Mariner, Tc2 <strong>and</strong> Tigger were Tc1-like elements. Unlike retroposons, DNAtransposons are thought to have a short life span in a <strong>genome</strong>. Thus, <strong>the</strong> average ormedian divergence <strong>of</strong> copies from <strong>the</strong> consensus is a particularly accurate measure <strong>of</strong> <strong>the</strong>age <strong>of</strong> <strong>the</strong> DNA transposon copies.Finally, LTR retroposons appear to be teetering on <strong>the</strong> brink <strong>of</strong>extinction, if <strong>the</strong>y have not already succumbed. For example, <strong>the</strong>most proli®c elements (ERVL <strong>and</strong> MaLRs) ¯ourished for more than100 Myr but appear to have died out about 40 Myr ago 163,164 . Only asingle LTR retroposon family (HERVK10) is known to have transposedsince our divergence from <strong>the</strong> chimpanzee 7 Myr ago, withonly one known copy (in <strong>the</strong> HLA region) that is not sharedbetween all <strong>human</strong>s 165 . In <strong>the</strong> draft <strong>genome</strong> sequence, we canidentify only three full-length copies with all ORFs intact (<strong>the</strong>®nal total may be slightly higher owing to <strong>the</strong> imperfect state <strong>of</strong><strong>the</strong> draft <strong>genome</strong> sequence).More generally, <strong>the</strong> overall activity <strong>of</strong> all transposons has declinedmarkedly over <strong>the</strong> past 35±50 Myr, with <strong>the</strong> possible exception <strong>of</strong>LINE1 (Fig. 18). Indeed, apart from an exceptional burst <strong>of</strong> activity<strong>of</strong> Alus peaking around 40 Myr ago, <strong>the</strong>re would appear to havebeen a fairly steady decline in activity in <strong>the</strong> hominid lineage since<strong>the</strong> mammalian radiation. The extent <strong>of</strong> <strong>the</strong> decline must be evengreater than it appears because old repeats are gradually removed byr<strong>and</strong>om deletion <strong>and</strong> because old repeat families are harder torecognize <strong>and</strong> likely to be under-represented in <strong>the</strong> repeat databases.(We con®rmed that <strong>the</strong> decline in transposition is not an artefactarising from errors in <strong>the</strong> draft <strong>genome</strong> sequence, which, inprinciple, could increase <strong>the</strong> divergence level in recent elements.First, <strong>the</strong> sequence error rate (Table 9) is far too low to have asigni®cant effect on <strong>the</strong> apparent age <strong>of</strong> recent transposons; <strong>and</strong>second, <strong>the</strong> same result is seen if one considers only ®nishedsequence.)What explains <strong>the</strong> decline in transposon activity in <strong>the</strong> lineageleading to <strong>human</strong>s? We return to this question below, in <strong>the</strong> context<strong>of</strong> <strong>the</strong> observation that <strong>the</strong>re is no similar decline in <strong>the</strong> mouse<strong>genome</strong>.Comparison with o<strong>the</strong>r organisms. We compared <strong>the</strong> complement<strong>of</strong> transposable elements in <strong>the</strong> <strong>human</strong> <strong>genome</strong> with those <strong>of</strong> <strong>the</strong>o<strong>the</strong>r sequenced eukaryotic <strong>genome</strong>s. We analysed <strong>the</strong> ¯y, worm<strong>and</strong> mustard weed <strong>genome</strong>s for <strong>the</strong> number <strong>and</strong> nature <strong>of</strong> repeats(Table 12) <strong>and</strong> <strong>the</strong> age distribution (Fig. 20). (For <strong>the</strong> ¯y, weanalysed <strong>the</strong> 114 Mb <strong>of</strong> un®nished `large' contigs produced by <strong>the</strong>whole-<strong>genome</strong> shotgun assembly 166 , which are reported to representeuchromatic sequence. Similar results were obtained by analysing30 Mb <strong>of</strong> ®nished euchromatic sequence.) The <strong>human</strong> <strong>genome</strong>st<strong>and</strong>s in stark contrast to <strong>the</strong> <strong>genome</strong>s <strong>of</strong> <strong>the</strong> o<strong>the</strong>r organisms.(1) The euchromatic portion <strong>of</strong> <strong>the</strong> <strong>human</strong> <strong>genome</strong> has a muchhigher density <strong>of</strong> transposable element copies than <strong>the</strong> euchromaticDNA <strong>of</strong> <strong>the</strong> o<strong>the</strong>r three organisms. The repeats in <strong>the</strong> o<strong>the</strong>rorganisms may have been slightly underestimated because <strong>the</strong>repeat databases for <strong>the</strong> o<strong>the</strong>r organisms are less complete thanfor <strong>the</strong> <strong>human</strong>, especially with regard to older elements; on <strong>the</strong> o<strong>the</strong>rh<strong>and</strong>, recent additions to <strong>the</strong>se databases appear to increase <strong>the</strong>repeat content only marginally.(2) The <strong>human</strong> <strong>genome</strong> is ®lled with copies <strong>of</strong> ancient transposons,whereas <strong>the</strong> transposons in <strong>the</strong> o<strong>the</strong>r <strong>genome</strong>s tend to be <strong>of</strong> morerecent origin. The difference is most marked with <strong>the</strong> ¯y, but is clearfor <strong>the</strong> o<strong>the</strong>r <strong>genome</strong>s as well. The accumulation <strong>of</strong> old repeats islikely to be determined by <strong>the</strong> rate at which organisms engage in`housecleaning' through genomic deletion. Studies <strong>of</strong> pseudogenesTable 12 Number <strong>and</strong> nature <strong>of</strong> interspersed repeats in eukaryotic <strong>genome</strong>sPercentage<strong>of</strong> basesHuman Fly Worm Mustard weedApproximatenumber <strong>of</strong>familiesPercentage<strong>of</strong> basesApproximatenumber <strong>of</strong>familiesPercentage<strong>of</strong> basesApproximatenumber <strong>of</strong>familiesPercentage<strong>of</strong> basesApproximatenumber <strong>of</strong>familiesLINE/SINE 33.40% 6 0.70% 20 0.40% 10 0.50% 10LTR 8.10% 100 1.50% 50 0.00% 4 4.80% 70DNA 2.80% 60 0.70% 20 5.30% 80 5.10% 80Total 44.40% 170 3.10% 90 6.50% 90 10.50% 160...................................................................................................................................................................................................................................................................................................................................................................The complete <strong>genome</strong>s <strong>of</strong> ¯y, worm, <strong>and</strong> chromosomes 2 <strong>and</strong> 4 <strong>of</strong> mustard weed (as deposited at ncbi.nlm.nih.gov/genbank/<strong>genome</strong>s) were screened against <strong>the</strong> repeats in RepBase Update 5.02(September 2000) with RepeatMasker at sensitive settings.882 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!