
Initial sequencing and analysis of the human genome - Vitagenes


multifunctional proteins and 3 for cytoskeletal/structural. No such groups were found for defence and immunity or cell-cell communication.

The 1-1-1-1 groups probably represent key functions that have not undergone duplication and elaboration in the various lineages. They include many anabolic enzymes responsible for such functions as the respiratory chain and nucleotide biosynthesis. In contrast, there are few catabolic enzymes. Anabolic pathways branch less frequently than catabolic pathways, so the scarcity of catabolic enzymes in these groups indicates that alternative routes and displacements are more frequent in catabolic reactions.

If proteins from the single-celled yeast are excluded from the analysis, there are 1,195 1-1-1 groups (one member each in human, fly and worm). The additional groups include many examples of more complex signalling proteins, such as receptor-type and src-like tyrosine kinases, likely to have arisen early in the metazoan lineage. The fact that this set comprises only a small proportion of the proteome of each of the animals indicates that, apart from a modest conserved core, there has been extensive elaboration and innovation within the protein complement.

Most proteins do not show simple 1-1-1 orthologous relationships across the three animals. To illustrate this, we investigated the nuclear hormone receptor family. In the human proteome, this family consists of 60 different 'classical' members, each with a zinc finger and a ligand-binding domain.
In comparison, the fly proteome has 19 and the worm proteome has 220. As shown in Fig. 39, few simple orthologous relationships can be derived among these homologues. Moreover, where potential subgroups of orthologues and paralogues could be identified, it was apparent that the functions of the subgroup members could differ significantly. For example, the fly receptor for the fly-specific hormone ecdysone and the human retinoic acid receptors cluster together on the basis of sequence similarity. Such examples underscore that inferring functional similarity from sequence similarity among these three organisms is not trivial in most cases.

[Figure 39: Simplified cladogram (relationship tree) of the 'many-to-many' relationships of classical nuclear receptors in human, worm and fly. Triangles indicate expansion within one lineage; bars represent single members. Numbers in parentheses indicate the number of paralogues in each group. Group labels: hepatocyte nuclear factors; steroid hormone; ecdysone; retinoic acids; vitamin D3; thyroid hormone; peroxisome proliferator activated; apolipoprotein regulatory protein.]

New vertebrate domains and proteins. We then explored how the proteome of vertebrates (as represented by the human) differs from those of the other species considered.
The 1,262 InterPro families were scanned to identify those that contain only vertebrate proteins. Only 94 (7%) of the families were 'vertebrate-specific'. These represent 70 protein families and 24 domain families. Only one of the 94 families represents enzymes, which is consistent with the ancient origins of most enzymes 336. The single vertebrate-specific enzyme family identified was that of the pancreatic or eosinophil-associated ribonucleases. These enzymes evolved rapidly, possibly to combat vertebrate pathogens 337.

The relatively small proportion of vertebrate-specific multicopy families suggests that few new protein domains have been invented in the vertebrate lineage, and that most protein domains trace at least as far back as a common animal ancestor. This conclusion must be tempered by the fact that the InterPro classification system is incomplete; additional vertebrate-specific families undoubtedly exist that have not yet been recognized in the InterPro system.

The 94 vertebrate-specific families appear to reflect important physiological differences between vertebrates and other eukaryotes. Defence and immunity proteins (23 families) and proteins that function in the nervous system (17 families) are particularly enriched in this set. These data indicate the recent emergence or rapid divergence of these proteins.

Representative human proteins were previously known for nearly all of the vertebrate-specific families. This was not surprising, given the anthropocentrism of biological research.
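The vertebrate-specific scan described above is, at its core, a subset test: a family qualifies only if every species contributing a member is a vertebrate. The Python sketch below is a minimal illustration under stated assumptions: each family is assumed to be already reduced to the set of species whose proteins it contains, and the family names, species labels and the `VERTEBRATES` set are hypothetical placeholders, not InterPro identifiers or the authors' actual pipeline. It also checks the percentage quoted for 94 of 1,262 families.

```python
from typing import Dict, Set

# Hypothetical taxon set used only for this illustration.
VERTEBRATES: Set[str] = {"human", "mouse", "zebrafish"}


def vertebrate_specific(families: Dict[str, Set[str]],
                        vertebrates: Set[str] = VERTEBRATES) -> Set[str]:
    """Return the families whose members all come from vertebrate species.

    Families with no recorded species are skipped (an empty set is falsy).
    """
    return {
        name
        for name, species in families.items()
        if species and species <= vertebrates  # subset test
    }


def percent(part: int, whole: int) -> float:
    """Share as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    # Hypothetical toy families mapped to the species they contain.
    toy_families = {
        "FAM_A": {"human", "fly", "worm", "yeast"},  # widely shared
        "FAM_B": {"human", "zebrafish"},             # vertebrate-only
        "FAM_C": {"human", "worm"},                  # includes an invertebrate
    }
    print(sorted(vertebrate_specific(toy_families)))  # ['FAM_B']
    # Counts reported in the text: 94 of 1,262 InterPro families.
    print(percent(94, 1262))  # 7.4
```

The exact share is about 7.4%, which the text rounds to 7%.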
The analysis did, however, identify the first mammalian proteins belonging to two of these families. Both families were originally defined in fish. The first is the family of polar fish antifreeze III proteins. We found a human sialic acid synthase containing a domain homologous to the polar fish antifreeze III protein (BAA91818.1). This finding suggests that fish created the antifreeze function by adapting this domain. We also found a human protein (CAB60269.1) homologous to the ependymin found in teleost fish. Ependymins are major glycoproteins of fish brains that have been claimed to be involved in long-term memory formation 338. The function of the mammalian ependymin homologue will need to be elucidated.

New architectures from old domains. Whereas there appears to be only modest invention at the level of new vertebrate protein domains, there appears to be substantial innovation in the creation of new vertebrate proteins. This innovation is evident at the level of domain architecture, defined as the linear arrangement of domains within a polypeptide. New architectures can be created by shuffling, adding or deleting domains, resulting in new proteins from old parts.

We quantified the number of distinct protein architectures found in yeast, worm, fly and human by using the SMART annotation resource 339 (Fig. 40).
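The definition of domain architecture above can be made concrete: if each protein is annotated with its domains in N- to C-terminal order, an architecture is simply that ordered tuple, and counting distinct architectures means counting distinct tuples. The Python sketch below assumes such pre-computed, ordered domain lists (it does not call SMART); the protein IDs and domain labels are illustrative placeholders. It shows that shuffling, adding or deleting a domain each yields a new architecture, while two proteins with the same domain order count once.

```python
from typing import Dict, List, Set, Tuple

# An architecture is the ordered (N- to C-terminal) tuple of domain labels.
Architecture = Tuple[str, ...]


def architecture(domains: List[str]) -> Architecture:
    """Freeze an ordered list of domain labels into a hashable architecture."""
    return tuple(domains)


def distinct_architectures(proteome: Dict[str, List[str]]) -> Set[Architecture]:
    """Return the set of distinct architectures in a proteome.

    `proteome` maps (hypothetical) protein IDs to ordered domain labels;
    proteins with no annotated domains are skipped.
    """
    return {architecture(doms) for doms in proteome.values() if doms}


if __name__ == "__main__":
    base = ["SH3", "SH2", "kinase"]          # illustrative labels only
    toy_proteome = {
        "p1": base,
        "p2": list(base),                    # same order: same architecture
        "p3": ["SH2", "SH3", "kinase"],      # shuffled domains
        "p4": base + ["PH"],                 # domain added
        "p5": ["SH2", "kinase"],             # domain deleted
    }
    print(len(distinct_architectures(toy_proteome)))  # 4
```

A tuple is used rather than a set so that domain order, on which the definition depends, is preserved; comparing proteomes then reduces to comparing the sizes of these sets.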
The human proteome set contained 1.8 times as many protein architectures as worm or fly and 5.8 times as many as yeast. This difference is most prominent in the recent evolution of novel extracellular and transmembrane architectures in the human lineage. Human extracellular proteins show the greatest innovation: the human has 2.3 times as many extracellular architectures as fly and 2.0 times as many as worm. The larger number of human architectures does not simply reflect differences in the number of domains known in these organisms; the result remains qualitatively the same even if the number of architectures in each organism is normalized by dividing by the total number of domains (not shown). (We also checked that the larger number of human

904 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com
