Initial sequencing and analysis of the human genome - Vitagenes

Figure 33 a–d, Sequence properties of segmental duplications. Distributions of length and per cent nucleotide identity for segmental duplications are shown as a function of the number of aligned bp, for the subset of finished genome sequence. Intrachromosomal, red; interchromosomal, blue.
[Figure not reproduced. Panels plot the sum of aligned bases (kb) against similarity (%), from 90 to 99%, and against length of alignment (kb), from 1 to 50+, for intrachromosomal and interchromosomal duplications.]

duplications may well be underestimated by the current analysis. An understanding of the biology, pathology and evolution of these duplications will require specialized efforts within these exceptional regions of the human genome. The presence and distribution of such segments may provide evolutionary fodder for processes of exon shuffling and a general increase in protein diversity associated with domain accretion.
It will be important to consider both genome-wide duplication events and more restricted punctuated events of genome duplication as forces in the evolution of vertebrate genomes.

Table 17 Fraction of the draft genome sequence in inter- and intrachromosomal duplications

Chromosome  Intrachromosomal (%)  Interchromosomal (%)  All (%)
1            2.1                   1.7                   3.4
2            1.6                   1.6                   2.6
3            1.8                   1.4                   2.7
4            1.5                   2.2                   3.0
5            1.0                   0.9                   1.8
6            1.5                   1.4                   2.7
7            3.6                   1.8                   4.5
8            1.2                   1.5                   2.1
9            2.1                   2.3                   3.8
10           3.3                   2.0                   4.7
11           2.7                   1.4                   3.7
12           2.1                   1.2                   2.8
13           1.7                   1.6                   3.0
14           0.6                   0.6                   1.2
15           4.1                   4.4                   6.7
16           3.4                   3.4                   5.5
17           4.4                   1.7                   5.7
18           0.9                   1.0                   1.9
19           5.4                   1.6                   6.3
20           0.8                   1.4                   2.0
21           1.9                   4.0                   4.8
22           6.8                   7.7                  11.9
X            1.2                   1.1                   2.2
Y           10.9                  13.1                  20.8
NA           2.3                   7.8                   8.3
UL          11.6                  20.8                  22.2
Total        2.3                   2.0                   3.6

Excludes duplications with identities >98% to avoid artefactual duplication due to incomplete merger in the assembly process. Calculation was performed on an earlier version of the draft genome sequence based on data available in July 2000 and reflects the duplications found within the total amount of finished sequence then.
Note that there is some overlap between the interchromosomal and intrachromosomal sets.

Gene content of the human genome

Genes (or at least their coding regions) comprise only a tiny fraction of human DNA, but they represent the major biological function of the genome and the main focus of interest by biologists. They are also the most challenging feature to identify in the human genome sequence.

The ultimate goal is to compile a complete list of all human genes and their encoded proteins, to serve as a 'periodic table' for biomedical research (ref. 243). But this is a difficult task. In organisms with small genomes, it is straightforward to identify most genes by the presence of long ORFs. In contrast, human genes tend to have small exons (encoding an average of only 50 codons) separated by long introns (some exceeding 10 kb). This creates a signal-to-noise problem, with the result that computer programs for direct gene prediction have only limited accuracy. Instead, computational prediction of human genes must rely largely on the availability of cDNA sequences or on sequence conservation with genes and proteins from other organisms.
This approach is adequate for strongly conserved genes (such as histones or ubiquitin), but may be less sensitive to rapidly evolving genes (including many crucial to speciation, sex determination and fertilization).

Here we describe our efforts to recognize both the RNA genes and protein-coding genes in the human genome. We also study the properties of the predicted human protein set, attempting to discern how the human proteome differs from those of invertebrates such as worm and fly.

Noncoding RNAs

Although biologists often speak of a tight coupling between 'genes

Table 18 Cross-species comparison for large, highly homologous segmental duplications

Percentage of genome (%)
          Fly    Worm   Human (finished)*
> 1 kb    1.2    4.25   3.25
> 5 kb    0.37   1.50   2.86
> 10 kb   0.08   0.66   2.52

* This is an underestimate of the total amount of segmental duplication in the human genome because it only reflects duplication detectable with available finished sequence. The proportion of segmental duplications of > 1 kb is probably about 5% (see text).

892 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com
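
The signal-to-noise problem described under 'Gene content of the human genome' can be illustrated with a minimal sketch of ORF scanning. The function name find_orfs, the 100-codon threshold and the toy sequences are assumptions made for illustration only; this is not the gene-prediction method used in the paper, and the sketch scans only the forward strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, n_codons) for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    n_codons counts the ATG but not the stop. Coordinates are 0-based
    and end-exclusive, and the span includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the ORF being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs)


# Toy sequences (hypothetical): a 150-codon ORF, as might be found in a
# compact genome, and a 50-codon stretch matching the average human
# exon size quoted in the text.
long_orf = "ATG" + "GCT" * 149 + "TAA"
exon_sized = "ATG" + "GCT" * 49 + "TAA"

print(find_orfs(long_orf))      # passes the 100-codon threshold
print(find_orfs(exon_sized))    # filtered out at the same threshold
```

A length threshold that cleanly picks out the long ORF discards the exon-sized stretch. Lowering the threshold far enough to recover 50-codon segments would also admit many chance ORFs in noncoding sequence, which is one way to see why the text turns to cDNA evidence and cross-species conservation instead.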
