Initial sequencing and analysis of the human genome - Vitagenes

articles include the DiGeorge/velocardiofacial syndrome region on chromosome 22 (ref. 238) and the Williams–Beuren syndrome recurrent deletion on chromosome 7 (ref. 239).

The availability of the genome sequence also allows rapid identification of paralogues of disease genes, which is valuable for two reasons. First, mutations in a paralogous gene may give rise to a related genetic disease. A good example, discovered through use of the genome sequence, is achromatopsia (complete colour blindness). The CNGA3 gene, encoding the α-subunit of the cone photoreceptor cyclic GMP-gated channel, had been shown to harbour mutations in some families with achromatopsia. Computational searching of the genome sequences revealed the paralogous gene encoding the corresponding β-subunit, CNGB3 (which had not been apparent from EST databases). The CNGB3 gene was rapidly shown to be the cause of achromatopsia in other families (refs 407, 408). Another example is provided by the presenilin-1 and presenilin-2 genes, in which mutations can cause early-onset Alzheimer's disease (refs 423, 424).
Second, the paralogue may provide an opportunity for therapeutic intervention, as exemplified by attempts to reactivate the fetally expressed haemoglobin genes in individuals with sickle cell disease or β-thalassaemia, caused by mutations in the β-globin gene (ref. 425).

We undertook a systematic search for paralogues of 971 known human disease genes with entries in both the Online Mendelian Inheritance in Man (OMIM) database (http://www.ncbi.nlm.nih.gov/Omim/) and either the SwissProt or TrEMBL protein databases. We identified 286 potential paralogues (with the requirement of a match of at least 50 amino acids with identity greater than 70% but less than 90% if on the same chromosome, and less than 95% if on a different chromosome). Although this analysis may have identified some pseudogenes, 89% of the matches showed homology over more than one exon in the new target sequence, suggesting that many are functional.
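The paralogue criteria above can be expressed as a simple filter. This is a minimal sketch, not the authors' pipeline. It assumes each match has already been reduced to three values: an aligned length in amino acids, a count of identical residues, and a flag saying whether the match lies on the disease gene's own chromosome. The alignment step is not shown. The `Hit` record, the `is_candidate_paralogue` function and the gene names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One disease-protein match against the genome (hypothetical record format)."""
    disease_gene: str
    aligned_aa: int        # length of the matching region, in amino acids
    identical_aa: int      # identical residues within that region
    same_chromosome: bool  # True if the match lies on the disease gene's own chromosome

    @property
    def identity(self) -> float:
        """Percent identity over the aligned region."""
        return 100.0 * self.identical_aa / self.aligned_aa


def is_candidate_paralogue(hit: Hit) -> bool:
    """Apply the thresholds stated in the text:
    at least 50 aligned amino acids, identity above 70%, and identity
    below 90% (same chromosome) or below 95% (different chromosome)."""
    if hit.aligned_aa < 50:  # checked first, so identity is never computed for an empty alignment
        return False
    upper = 90.0 if hit.same_chromosome else 95.0
    return 70.0 < hit.identity < upper


# Hypothetical example hits (gene names are placeholders).
hits = [
    Hit("GENE_A", aligned_aa=120, identical_aa=96, same_chromosome=True),    # 80.0%: kept
    Hit("GENE_A", aligned_aa=120, identical_aa=110, same_chromosome=True),   # ~91.7%, same chromosome: rejected
    Hit("GENE_B", aligned_aa=120, identical_aa=110, same_chromosome=False),  # ~91.7%, other chromosome: kept
    Hit("GENE_C", aligned_aa=40, identical_aa=36, same_chromosome=False),    # only 40 aa: rejected
]
kept = [h for h in hits if is_candidate_paralogue(h)]
print(len(kept))  # 2
```

The same identity value (about 91.7%) is accepted or rejected depending only on the chromosome flag. This shows how the stricter same-chromosome cap changes the outcome.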
This analysis shows the potential for rapid identification of disease gene paralogues in silico.

Table 26 Disease genes positionally cloned using the draft genome sequence

| Locus | Disorder | Reference(s) |
|---|---|---|
| BRCA2 | Breast cancer susceptibility | 55 |
| AIRE | Autoimmune polyglandular syndrome type 1 (APS1 or APECED) | 389 |
| PEX1 | Peroxisome biogenesis disorder | 390, 391 |
| PDS | Pendred syndrome | 392 |
| XLP | X-linked lymphoproliferative disease | 393 |
| DFNA5 | Nonsyndromic deafness | 394 |
| ATP2A2 | Darier's disease | 395 |
| SEDL | X-linked spondyloepiphyseal dysplasia tarda | 396 |
| WISP3 | Progressive pseudorheumatoid dysplasia | 397 |
| CCM1 | Cerebral cavernous malformations | 398, 399 |
| COL11A2/DFNA13 | Nonsyndromic deafness | 400 |
| LGMD 2G | Limb-girdle muscular dystrophy | 401 |
| EVC | Ellis-Van Creveld syndrome, Weyer's acrodental dysostosis | 402 |
| ACTN4 | Familial focal segmental glomerulosclerosis | 403 |
| SCN1A | Generalized epilepsy with febrile seizures plus type 2 | 404 |
| AASS | Familial hyperlysinaemia | 405 |
| NDRG1 | Hereditary motor and sensory neuropathy-Lom | 406 |
| CNGB3 | Total colour-blindness | 407, 408 |
| MUL | Mulibrey nanism | 409 |
| USH1C | Usher type 1C | 410, 411 |
| MYH9 | May-Hegglin anomaly | 412, 413 |
| PRKAR1A | Carney's complex | 414 |
| MYH9 | Nonsyndromic hereditary deafness DFNA17 | 415 |
| SCA10 | Spinocerebellar ataxia type 10 | 416 |
| OPA1 | Optic atrophy | 417 |
| XLCSNB | X-linked congenital stationary night blindness | 418 |
| FGF23 | Hypophosphataemic rickets | 419 |
| GAN | Giant axonal neuropathy | 420 |
| AAAS | Triple-A syndrome | 421 |
| HSPG2 | Schwartz-Jampel syndrome | 422 |

Drug targets

Over the past century, the pharmaceutical industry has largely depended upon a limited set of drug targets to develop new therapies.
A recent compendium (refs 426, 427) lists 483 drug targets as accounting for virtually all drugs on the market. Knowing the complete set of human genes and proteins will greatly expand the search for suitable drug targets. Although only a minority of human genes may be drug targets, it has been predicted that the number will exceed several thousand, and this prospect has led to a massive expansion of genomic research in pharmaceutical research and development. A few examples will illustrate the point.

(1) The neurotransmitter serotonin (5-HT) mediates rapid excitatory responses through ligand-gated channels. The previously identified 5-HT3A receptor gene produces functional receptors, but with a much smaller conductance than observed in vivo. Cross-hybridization experiments and analysis of ESTs failed to reveal any other homologues of the known receptor. Recently, however, by searching the human draft genome sequence at low stringency, a putative homologue was identified within a PAC clone from the long arm of chromosome 11 (ref. 428). The homologue was shown to be expressed in the amygdala, caudate and hippocampus, and a full-length cDNA was subsequently obtained.
The gene, which codes for a serotonin receptor, was named 5-HT3B. When assembled in a heterodimer with 5-HT3A, it was shown to account for the large-conductance neuronal serotonin channel. Given the central role of the serotonin pathway in mood disorders and schizophrenia, the discovery of a major new therapeutic target is of considerable interest.

(2) The contractile and inflammatory actions of the cysteinyl leukotrienes, formerly known as the slow reacting substance of anaphylaxis (SRS-A), are mediated through specific receptors. The second such receptor, CysLT2, was identified using the combination of a rat EST and the human genome sequence. This led to the cloning of a gene with 38% amino-acid identity to the only other receptor that had previously been identified (ref. 429). This new receptor, which shows high-affinity binding to several leukotrienes, maps to a region of chromosome 13 that is linked to atopic asthma. The gene is expressed in airway smooth muscles and in the heart. As the leukotriene pathway has been a significant target for the development of drugs against asthma, the discovery of a new receptor has obvious and important consequences.

(3) Abundant deposition of β-amyloid in senile plaques is the hallmark of Alzheimer's disease. β-Amyloid is generated by proteolytic processing of the amyloid precursor protein (APP).
One of the enzymes involved is the β-site APP-cleaving enzyme (BACE), which is a transmembrane aspartyl protease. Computational searching of the public human draft genome sequence recently identified a new sequence homologous to BACE, encoding a protein now named BACE2 (refs 430, 431). BACE2, which has 52% amino-acid sequence identity to BACE, contains two active protease sites and maps to the obligatory Down's syndrome region of chromosome 21, as does APP. This raises the question of whether the extra copies of both BACE2 and APP may contribute to accelerated deposition of β-amyloid in the brains of Down's syndrome patients. The development of antagonists to BACE and BACE2 represents a promising approach to preventing Alzheimer's disease.

Given these examples, we undertook a systematic effort to identify paralogues of the classic drug target proteins in the draft genome sequence. The target list (ref. 427) was used to identify 603 entries in the SwissProt database with unique accession numbers. These were then searched against the current genome sequence database, using the requirement that a match should have 70–100% identity to at least 50 amino acids. Matches to named proteins were ignored, as we assumed that these represented known homologues. We found 18 putative novel paralogues (Table 27), including apparent dopamine receptors, purinergic receptors and insulin-like growth factor receptors.
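The drug-target search differs from the disease-gene search in two ways. The identity window is 70–100% with no chromosome-dependent cap, and matches to already named proteins are discarded. A minimal sketch of that filtering step follows, under stated assumptions. The names `GenomeMatch`, `unique_accessions` and `novel_paralogues`, the `locus_id` field and the example accessions are all hypothetical. Counting repeated hits to one genomic locus as a single paralogue is this sketch's own assumption, not something the text specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenomeMatch:
    """Hypothetical record: one drug-target protein matched against the genome sequence."""
    query_accession: str                 # SwissProt accession of the known drug target
    locus_id: str                        # hypothetical identifier for the matching genomic region
    aligned_aa: int                      # aligned length, in amino acids
    identical_aa: int                    # identical residues within the alignment
    matched_protein_name: Optional[str]  # None if the region matches no named protein


def unique_accessions(accessions: list[str]) -> list[str]:
    """Collapse the target list to unique accession numbers, keeping first-seen order."""
    return list(dict.fromkeys(accessions))


def novel_paralogues(matches: list[GenomeMatch]) -> set[str]:
    """Return genomic loci that pass the >=50 aa, 70-100% identity filter
    and do not correspond to an already named protein."""
    loci = set()
    for m in matches:
        if m.aligned_aa < 50:
            continue
        identity = 100.0 * m.identical_aa / m.aligned_aa
        if not 70.0 <= identity <= 100.0:
            continue
        if m.matched_protein_name is not None:
            continue  # assumed to be a known homologue, as in the text
        loci.add(m.locus_id)  # a set, so repeated hits to one locus count once
    return loci


# Hypothetical accessions and loci, for illustration only.
accs = unique_accessions(["P1", "P2", "P1"])
matches = [
    GenomeMatch("P1", "locus_a", 200, 160, None),             # 80%, unnamed: novel
    GenomeMatch("P1", "locus_b", 200, 190, "KNOWN_PROTEIN"),  # named protein: ignored
    GenomeMatch("P2", "locus_c", 200, 120, None),             # 60%: below threshold
    GenomeMatch("P2", "locus_a", 60, 50, None),               # ~83%, same locus as the first hit
]
print(sorted(novel_paralogues(matches)))  # ['locus_a']
```

Unlike the disease-gene filter, the lower bound here is inclusive ("70–100%"). A match at exactly 70% identity is therefore kept.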
In six cases, the novel paralogue matches at least one EST, adding confidence that this search process can identify novel functional genes. For the remaining 12 putative paralogues without an EST match, all have long ORFs and all but

912 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com