Initial sequencing and analysis of the human genome - Vitagenes
Initial sequencing and analysis of the human genome - Vitagenes
Initial sequencing and analysis of the human genome - Vitagenes
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
articlesposable elements in categories re¯ecting equal amounts <strong>of</strong> sequencedivergence. In Fig. 18b <strong>the</strong> data are grouped into four binscorresponding to successive 25-Myr periods, on <strong>the</strong> basis <strong>of</strong> anapproximate clock. Figure 19 shows <strong>the</strong> mean ages <strong>of</strong> varioussubfamilies <strong>of</strong> DNA transposons. Several facts are apparent from<strong>the</strong>se graphs. First, most interspersed repeats in <strong>the</strong> <strong>human</strong> <strong>genome</strong>predate <strong>the</strong> eu<strong>the</strong>rian radiation. This is a testament to <strong>the</strong> extremelyslow rate with which nonfunctional sequences are cleared fromvertebrate <strong>genome</strong>s (see below concerning comparison with <strong>the</strong> ¯y).Second, LINE <strong>and</strong> SINE elements have extremely long lives. Themonophyletic LINE1 <strong>and</strong> Alu lineages are at least 150 <strong>and</strong> 80 Myrold, respectively. In earlier times, <strong>the</strong> reigning transposons wereLINE2 <strong>and</strong> MIR 148,158 . The SINE MIR was perfectly adapted forreverse transcription by LINE2, as it carried <strong>the</strong> same 50-basesequence at its 39 end. When LINE2 became extinct 80±100 Myrago, it spelled <strong>the</strong> doom <strong>of</strong> MIR.Third, <strong>the</strong>re were two major peaks <strong>of</strong> DNA transposon activity(Fig. 19). The ®rst involved Charlie elements <strong>and</strong> occurred longbefore <strong>the</strong> eu<strong>the</strong>rian radiation; <strong>the</strong> second involved Tigger elements<strong>and</strong> occurred after this radiation. Because DNA transposons canproduce large-scale chromosome rearrangements 159±162 , it is possiblethat widespread activity could be involved in speciation events.Fourth, <strong>the</strong>re is no evidence for DNA transposon activity in <strong>the</strong>past 50 Myr in <strong>the</strong> <strong>human</strong> <strong>genome</strong>. The youngest two DNAtransposon families that we can identify in <strong>the</strong> draft <strong>genome</strong>sequence (MER75 <strong>and</strong> MER85) show 6±7% divergence from <strong>the</strong>irrespective consensus sequences representing <strong>the</strong> ancestral element(Fig. 19), indicating that <strong>the</strong>y were active before <strong>the</strong> divergence <strong>of</strong><strong>human</strong>s <strong>and</strong> new world monkeys. Moreover, <strong>the</strong>se elements wererelatively unsuccessful, toge<strong>the</strong>r contributing just 125 kb to <strong>the</strong> draft<strong>genome</strong> sequence.Number <strong>of</strong> kb covered by family8,0007,0006,0005,0004,0003,0002,0001,000CharliePiggyBac-likeMarinerTc2TiggerUnclassifiedZaphod00 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4Average nucleotide substitution level for familyFigure 19 Median ages <strong>and</strong> per cent <strong>of</strong> <strong>the</strong> <strong>genome</strong> covered by subfamilies <strong>of</strong> DNAtransposons. The Charlie <strong>and</strong> Zaphod elements were hobo-Activator-Tam3 (hAT) DNAtransposons; Mariner, Tc2 <strong>and</strong> Tigger were Tc1-like elements. Unlike retroposons, DNAtransposons are thought to have a short life span in a <strong>genome</strong>. Thus, <strong>the</strong> average ormedian divergence <strong>of</strong> copies from <strong>the</strong> consensus is a particularly accurate measure <strong>of</strong> <strong>the</strong>age <strong>of</strong> <strong>the</strong> DNA transposon copies.Finally, LTR retroposons appear to be teetering on <strong>the</strong> brink <strong>of</strong>extinction, if <strong>the</strong>y have not already succumbed. For example, <strong>the</strong>most proli®c elements (ERVL <strong>and</strong> MaLRs) ¯ourished for more than100 Myr but appear to have died out about 40 Myr ago 163,164 . Only asingle LTR retroposon family (HERVK10) is known to have transposedsince our divergence from <strong>the</strong> chimpanzee 7 Myr ago, withonly one known copy (in <strong>the</strong> HLA region) that is not sharedbetween all <strong>human</strong>s 165 . In <strong>the</strong> draft <strong>genome</strong> sequence, we canidentify only three full-length copies with all ORFs intact (<strong>the</strong>®nal total may be slightly higher owing to <strong>the</strong> imperfect state <strong>of</strong><strong>the</strong> draft <strong>genome</strong> sequence).More generally, <strong>the</strong> overall activity <strong>of</strong> all transposons has declinedmarkedly over <strong>the</strong> past 35±50 Myr, with <strong>the</strong> possible exception <strong>of</strong>LINE1 (Fig. 18). Indeed, apart from an exceptional burst <strong>of</strong> activity<strong>of</strong> Alus peaking around 40 Myr ago, <strong>the</strong>re would appear to havebeen a fairly steady decline in activity in <strong>the</strong> hominid lineage since<strong>the</strong> mammalian radiation. The extent <strong>of</strong> <strong>the</strong> decline must be evengreater than it appears because old repeats are gradually removed byr<strong>and</strong>om deletion <strong>and</strong> because old repeat families are harder torecognize <strong>and</strong> likely to be under-represented in <strong>the</strong> repeat databases.(We con®rmed that <strong>the</strong> decline in transposition is not an artefactarising from errors in <strong>the</strong> draft <strong>genome</strong> sequence, which, inprinciple, could increase <strong>the</strong> divergence level in recent elements.First, <strong>the</strong> sequence error rate (Table 9) is far too low to have asigni®cant effect on <strong>the</strong> apparent age <strong>of</strong> recent transposons; <strong>and</strong>second, <strong>the</strong> same result is seen if one considers only ®nishedsequence.)What explains <strong>the</strong> decline in transposon activity in <strong>the</strong> lineageleading to <strong>human</strong>s? We return to this question below, in <strong>the</strong> context<strong>of</strong> <strong>the</strong> observation that <strong>the</strong>re is no similar decline in <strong>the</strong> mouse<strong>genome</strong>.Comparison with o<strong>the</strong>r organisms. We compared <strong>the</strong> complement<strong>of</strong> transposable elements in <strong>the</strong> <strong>human</strong> <strong>genome</strong> with those <strong>of</strong> <strong>the</strong>o<strong>the</strong>r sequenced eukaryotic <strong>genome</strong>s. We analysed <strong>the</strong> ¯y, worm<strong>and</strong> mustard weed <strong>genome</strong>s for <strong>the</strong> number <strong>and</strong> nature <strong>of</strong> repeats(Table 12) <strong>and</strong> <strong>the</strong> age distribution (Fig. 20). (For <strong>the</strong> ¯y, weanalysed <strong>the</strong> 114 Mb <strong>of</strong> un®nished `large' contigs produced by <strong>the</strong>whole-<strong>genome</strong> shotgun assembly 166 , which are reported to representeuchromatic sequence. Similar results were obtained by analysing30 Mb <strong>of</strong> ®nished euchromatic sequence.) The <strong>human</strong> <strong>genome</strong>st<strong>and</strong>s in stark contrast to <strong>the</strong> <strong>genome</strong>s <strong>of</strong> <strong>the</strong> o<strong>the</strong>r organisms.(1) The euchromatic portion <strong>of</strong> <strong>the</strong> <strong>human</strong> <strong>genome</strong> has a muchhigher density <strong>of</strong> transposable element copies than <strong>the</strong> euchromaticDNA <strong>of</strong> <strong>the</strong> o<strong>the</strong>r three organisms. The repeats in <strong>the</strong> o<strong>the</strong>rorganisms may have been slightly underestimated because <strong>the</strong>repeat databases for <strong>the</strong> o<strong>the</strong>r organisms are less complete thanfor <strong>the</strong> <strong>human</strong>, especially with regard to older elements; on <strong>the</strong> o<strong>the</strong>rh<strong>and</strong>, recent additions to <strong>the</strong>se databases appear to increase <strong>the</strong>repeat content only marginally.(2) The <strong>human</strong> <strong>genome</strong> is ®lled with copies <strong>of</strong> ancient transposons,whereas <strong>the</strong> transposons in <strong>the</strong> o<strong>the</strong>r <strong>genome</strong>s tend to be <strong>of</strong> morerecent origin. The difference is most marked with <strong>the</strong> ¯y, but is clearfor <strong>the</strong> o<strong>the</strong>r <strong>genome</strong>s as well. The accumulation <strong>of</strong> old repeats islikely to be determined by <strong>the</strong> rate at which organisms engage in`housecleaning' through genomic deletion. Studies <strong>of</strong> pseudogenesTable 12 Number <strong>and</strong> nature <strong>of</strong> interspersed repeats in eukaryotic <strong>genome</strong>sPercentage<strong>of</strong> basesHuman Fly Worm Mustard weedApproximatenumber <strong>of</strong>familiesPercentage<strong>of</strong> basesApproximatenumber <strong>of</strong>familiesPercentage<strong>of</strong> basesApproximatenumber <strong>of</strong>familiesPercentage<strong>of</strong> basesApproximatenumber <strong>of</strong>familiesLINE/SINE 33.40% 6 0.70% 20 0.40% 10 0.50% 10LTR 8.10% 100 1.50% 50 0.00% 4 4.80% 70DNA 2.80% 60 0.70% 20 5.30% 80 5.10% 80Total 44.40% 170 3.10% 90 6.50% 90 10.50% 160...................................................................................................................................................................................................................................................................................................................................................................The complete <strong>genome</strong>s <strong>of</strong> ¯y, worm, <strong>and</strong> chromosomes 2 <strong>and</strong> 4 <strong>of</strong> mustard weed (as deposited at ncbi.nlm.nih.gov/genbank/<strong>genome</strong>s) were screened against <strong>the</strong> repeats in RepBase Update 5.02(September 2000) with RepeatMasker at sensitive settings.882 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com