12.07.2015 Views

Evolutionary Rate Heterogeneity of Core and Attachment Proteins in ...

Evolutionary Rate Heterogeneity of Core and Attachment Proteins in ...

Evolutionary Rate Heterogeneity of Core and Attachment Proteins in ...

SHOW MORE
SHOW LESS
  • No tags were found...

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

GBE<strong>Evolutionary</strong> <strong>Rate</strong> <strong>Heterogeneity</strong> <strong>of</strong> <strong>Core</strong> <strong>and</strong> <strong>Attachment</strong><strong>Prote<strong>in</strong>s</strong> <strong>in</strong> Yeast Prote<strong>in</strong> ComplexesS<strong>and</strong>ip Chakraborty <strong>and</strong> Tapash Ch<strong>and</strong>ra Ghosh*Bio<strong>in</strong>formatics Centre, Bose Institute, Kolkata, West Bengal, India*Correspond<strong>in</strong>g author: E-mail: tapash@jcbose.ac.<strong>in</strong>.Accepted: June 23, 2013AbstractIn general, prote<strong>in</strong>s do not work alone; they form macromolecular complexes to play fundamental roles <strong>in</strong> diverse cellular functions.On the basis <strong>of</strong> their iterative cluster<strong>in</strong>g procedure <strong>and</strong> frequency <strong>of</strong> occurrence <strong>in</strong> the macromolecular complexes, the prote<strong>in</strong>subunits have been categorized as core <strong>and</strong> attachment. <strong>Core</strong> prote<strong>in</strong> subunits are the ma<strong>in</strong> functional elements, whereas attachmentprote<strong>in</strong>s act as modifiers or activators <strong>in</strong> prote<strong>in</strong> complexes. In this article, us<strong>in</strong>g the current data set <strong>of</strong> yeast prote<strong>in</strong> complexes,we found that core prote<strong>in</strong>s are evolv<strong>in</strong>g at a faster rate than attachment prote<strong>in</strong>s <strong>in</strong> spite <strong>of</strong> their functional importance. Interest<strong>in</strong>gly,our <strong>in</strong>vestigation revealed that attachment prote<strong>in</strong>s are present <strong>in</strong> a higher number <strong>of</strong> macromolecular complexes than core prote<strong>in</strong>s.We also observed that the prote<strong>in</strong> complex number (def<strong>in</strong>ed as the number <strong>of</strong> prote<strong>in</strong> complexes <strong>in</strong> which a prote<strong>in</strong> subunit belongs)has a stronger <strong>in</strong>fluence on gene/prote<strong>in</strong> essentiality than multifunctionality. F<strong>in</strong>ally, our results suggest that the observed differences<strong>in</strong> the rates <strong>of</strong> prote<strong>in</strong> evolution between core <strong>and</strong> attachment prote<strong>in</strong>s are due to differences <strong>in</strong> prote<strong>in</strong> complex number <strong>and</strong>expression level. Moreover, we conclude that prote<strong>in</strong>s which are present <strong>in</strong> higher numbers <strong>of</strong> macromolecular complexes enhancetheir overall expression level by <strong>in</strong>creas<strong>in</strong>g their transcription rate as well as translation rate, <strong>and</strong> thus the prote<strong>in</strong> complex numberimposes a strong selection pressure on the evolution <strong>of</strong> yeast proteome.Key words: core prote<strong>in</strong>, attachment prote<strong>in</strong>, evolutionary rate, yeast, prote<strong>in</strong> complex, prote<strong>in</strong> complex number.IntroductionDifferent prote<strong>in</strong>s evolve at different rates. The central problem<strong>in</strong> molecular evolution is to comprehend the factors forthe differential evolutionary rates <strong>of</strong> prote<strong>in</strong>s. The orig<strong>in</strong>s <strong>of</strong>variation <strong>in</strong> prote<strong>in</strong> evolutionary rates have rema<strong>in</strong>ed the coreissue between the selectionists <strong>and</strong> the neutralists (Fisher1930; Kimura <strong>and</strong> Ohta 1974; Kimura 1983; Razeto-Barryet al. 2011). The neutral theory <strong>of</strong> molecular evolution accentuatesthat most <strong>of</strong> the nucleotide substitutions <strong>in</strong> a gene aredue to the r<strong>and</strong>om fixation <strong>of</strong> neutral mutations (Kimura1983), which is <strong>in</strong> accordance with the proposal that functionallyimportant genes should evolve slower than less importantgenes (Kimura <strong>and</strong> Ohta 1974). Accord<strong>in</strong>g to this proposal, ithas been demonstrated that essential genes evolve slowerthan the genes which are dispensable (Jordan et al. 2002).Moreover, prote<strong>in</strong>s those <strong>in</strong>teract with a large number <strong>of</strong> partnersevolve at a slower rate compared with the prote<strong>in</strong>s withfewer number <strong>of</strong> <strong>in</strong>teract<strong>in</strong>g partners (Hirsh <strong>and</strong> Fraser 2001;Jordan et al. 2002; Fraser et al. 2003; Hahn <strong>and</strong> Kern 2005;Lemos et al. 2005; Chakraborty et al. 2011; Alvarez-Ponce<strong>and</strong> Fares 2012). However, these results have been questionedby some researchers (Batada et al. 2006). On the other h<strong>and</strong>,it has been demonstrated that the gene expression level is animportant constra<strong>in</strong>t <strong>in</strong> prote<strong>in</strong> evolution (Subramanian <strong>and</strong>Kumar 2004; Drummond et al. 2005, 2006; Drummond <strong>and</strong>Wilke 2008); highly expressed prote<strong>in</strong>s are more conservedthan lowly expressed prote<strong>in</strong>s. This phenomenon has beenattributed to natural selection (Popescu et al. 2006).Moreover,itwasshownthatabouthalf<strong>of</strong>thevariation<strong>in</strong>prote<strong>in</strong> evolutionary rates can be expla<strong>in</strong>ed by expression level(Drummond et al. 2005).<strong>Prote<strong>in</strong>s</strong> do not carry out their functions alone, they <strong>of</strong>tenact by participat<strong>in</strong>g <strong>in</strong> macromolecular complexes <strong>and</strong> playdifferent functional roles (Qiu <strong>and</strong> Noble 2008; Chakrabortyet al. 2010). Additionally, it has been found that the multiprote<strong>in</strong>complexes are enriched for essential genes (Semple et al.2008), <strong>and</strong> prote<strong>in</strong>s those are present <strong>in</strong> several prote<strong>in</strong> complexassemblies have a tendency to be more essential thanprote<strong>in</strong>s that are present <strong>in</strong> fewer prote<strong>in</strong> complex assemblies(Pereira-Leal et al. 2006). In contrast, Pache et al. (2009)reported that there is no such significant correlation betweenDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014ß The Author(s) 2013. Published by Oxford University Press on behalf <strong>of</strong> the Society for Molecular Biology <strong>and</strong> Evolution.This is an Open Access article distributed under the terms <strong>of</strong> the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permitsnon-commercial re-use, distribution, <strong>and</strong> reproduction <strong>in</strong> any medium, provided the orig<strong>in</strong>al work is properly cited. For commercial re-use, please contact journals.permissions@oup.com1366 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013


Chakraborty <strong>and</strong> GhoshGBEEssential Genes <strong>and</strong> Calculation <strong>of</strong> EssentialityTo obta<strong>in</strong> a data set <strong>of</strong> essential genes <strong>of</strong> yeast, we downloadedthe essential genes list from the OGEE database (http://ogeedb.embl.de/#summary, last accessed July 15, 2013)(Chen et al. 2012). Moreover, we used the fitness quantitativedata <strong>of</strong> Ste<strong>in</strong>metz et al. (2002) to calculate the essentiality foreach gene. The fitness data were downloaded from Pacheet al. (2009) (Additional file 3: Quantitative fitness data:http://www.biomedcentral.com/1752-0509/3/74/additional,last accessed July 15, 2013) <strong>and</strong> considered only the growth <strong>in</strong>the YPD medium (1% Bacto-peptone, 2% yeast extract, <strong>and</strong>2% glucose). The normalized essentiality (E i ) was calculated by f iE i ¼ 1f maxwhere f i represents the fitness values for the deletion <strong>of</strong> the ithgene <strong>and</strong> f max represents the maximum fitness value.Therefore, the gene essentiality ranges from 0 to 1, with anessentiality <strong>of</strong> 0 denot<strong>in</strong>g that a gene has no measurable effecton fitness, whereas 1 denotes the gene is essential for survival.Calculation <strong>of</strong> <strong>Evolutionary</strong> <strong>Rate</strong>We calculated the average dN/dS ratio as a measure <strong>of</strong> evolutionaryrate, which is the ratio <strong>of</strong> the number <strong>of</strong> nonsynonymoussubstitutions per nonsynonymous site (dN)tothenumber<strong>of</strong> synonymous substitutions per synonymous site (dS). To calculatethe dN/dS ratio, we compared S. cerevisiae sequenceswith its orthologous sequences <strong>in</strong> S. paradoxus, S. bayanus, <strong>and</strong>S. mikatae, as they are sibl<strong>in</strong>g species (Naumov et al. 1992). Weobta<strong>in</strong>ed whole cod<strong>in</strong>g sequences <strong>of</strong> four yeast stra<strong>in</strong>s from theSaccharomyces Genome Database (Cherry et al. 2012) (forS. cerevisiae: http://downloads.yeastgenome.org/sequence/S288C_reference/orf_dna/orf_cod<strong>in</strong>g.fasta.gz [last accessedJuly 15, 2013]; S. paradoxus: http://downloads.yeastgenome.org/sequence/fungi/S_paradoxus/archive/MIT/orf_dna/orf_genomic.fasta.gz [last accessed July 15, 2013]; S. bayanus: http://downloads.yeastgenome.org/sequence/fungi/S_bayanus/archive/MIT/orf_dna/orf_genomic.fasta.gz [last accessed July 15,2013]; <strong>and</strong> S. mikatae: http://downloads.yeastgenome.org/sequence/fungi/S_mikatae/archive/MIT/orf_dna/orf_genomic.fasta.gz [last accessed July 15, 2013]) to calculate the evolutionaryrates (dN/dS). Then, we translate these sequences <strong>in</strong>to prote<strong>in</strong>sequence us<strong>in</strong>g the EMBOSS Transeq program (http://www.ebi.ac.uk/Tools/st/emboss_transeq/, last accessed July15, 2013). By us<strong>in</strong>g the National Center for BiotechnologyInformation BlastP program (version 2.2.17) (Altschul et al.1997), we identified the orthologous prote<strong>in</strong>s <strong>of</strong> S. cerevisiae<strong>in</strong> S. paradoxus, S. baynus, <strong>and</strong> S. mikatate. Weusedtheexpectationvalue 1.0 10 5 as a cut<strong>of</strong>f <strong>and</strong> ma<strong>in</strong>ta<strong>in</strong>ed at least75% sequence similarity between the two sequences with am<strong>in</strong>imum alignment overlap 80%. The gaps allowed <strong>in</strong> thealignment were less than 3%. We also verified our resultswith the results <strong>of</strong> Kellis et al. (2003). Then the pair-wisealignment was performed us<strong>in</strong>g the ClustalW (version 2.0)(Lark<strong>in</strong> et al. 2007) for each set <strong>of</strong> orthologs gene pair. dN/dSvalues were calculated by Yang <strong>and</strong> Nielsen method (Yang <strong>and</strong>Nielsen 2000) us<strong>in</strong>g the PAML package (version 4) (Yang 2007).We took the average values <strong>of</strong> dN/dS for each S. cerevisiae ORFwhere at least one orthologous pair was present (supplementarytable S2, Supplementary Material onl<strong>in</strong>e).Prote<strong>in</strong>–Prote<strong>in</strong> Interactions DataWe collected the prote<strong>in</strong>–prote<strong>in</strong> <strong>in</strong>teractions (PPIs) <strong>in</strong>formationfrom the DIP (Database <strong>of</strong> Interact<strong>in</strong>g <strong>Prote<strong>in</strong>s</strong>: http://dip.doe-mbi.ucla.edu/, last accessed July 15, 2013) (Salw<strong>in</strong>skiet al. 2004), MINT (Molecular INTeraction Database: http://m<strong>in</strong>t.bio.uniroma2.it/m<strong>in</strong>t/, last accessed July 15, 2013)(Licata et al. 2012), <strong>and</strong> IntAct (http://www.ebi.ac.uk/<strong>in</strong>tact,last accessed July 15, 2013) (Kerrien et al. 2012). In thosedatabases, the PPIs were documented experimentally bygenome wide two-hybrid screen, immune precipitation, aff<strong>in</strong>ityb<strong>in</strong>d<strong>in</strong>g, antibody blockage, <strong>and</strong> so on. To collect highthroughputPPIs data from the DIP database, we used theCORE data set <strong>of</strong> S. cerevisiae (baker’s yeast)(Scere20120228CR). In the CORE data set, the PPIs were identifiedby high-throughput methods <strong>and</strong> small-scale experiments,thus the data <strong>in</strong> the CORE are highly reliable (Deaneet al. 2002). However, there is no predef<strong>in</strong>ed high confidencedata <strong>in</strong> MINT <strong>and</strong> IntAct, thus we used the confidence score tocollect the high-throughput data from MINT <strong>and</strong> IntAct. Wecalculated the average confidence score for each data set <strong>and</strong>then used this average value as a cut<strong>of</strong>f to identify the highthroughputPPIs (for MINT, the cut<strong>of</strong>f score is 0.30, <strong>and</strong> forIntAct, the cut <strong>of</strong>f score is 0.40). After merg<strong>in</strong>g the threedatabases (DIP, MINT, <strong>and</strong> IntAct) <strong>and</strong> remov<strong>in</strong>g the redundant<strong>and</strong> self-<strong>in</strong>teractions, we obta<strong>in</strong>ed 3,580 <strong>in</strong>dividual prote<strong>in</strong>s<strong>and</strong> 12,624 b<strong>in</strong>ary <strong>in</strong>teractions (fig. 1) (supplementarytable S3, Supplementary Material onl<strong>in</strong>e).Prote<strong>in</strong> Expression, Transcription <strong>Rate</strong>,<strong>and</strong> Translation <strong>Rate</strong>The gene expression (number <strong>of</strong> mRNAs molecules per cell) <strong>and</strong>transcription rate (number <strong>of</strong> mRNAs molecules per hour) werecollected from genome wide expression analysis <strong>of</strong> Holstegeet al. (1998) (http://web.wi.mit.edu/young/pub/data/orf_transcriptome.txt, last accessed July 15, 2013). We also used amore up-to-date (Miller et al. 2011) expression data set forvalidation purposes. Dynamic transcriptome analysis wasused to realistically monitor the mRNA metabolism <strong>in</strong> yeast(Miller et al. 2011). We downloaded the translation rate(number <strong>of</strong> prote<strong>in</strong>s molecules per seconds) <strong>of</strong> yeast prote<strong>in</strong>sfrom the data set <strong>of</strong> Arava et al. (2003).S<strong>of</strong>twareWe used SPSS (version 13.0) (Nie et al. 1970) <strong>and</strong> Tanagra(version 1.4.36) (Rokotomalala 2005) for all statisticalDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 20141368 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013


<strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> Yeast Prote<strong>in</strong> ComplexesGBEFIG. 1.—Venn diagram <strong>of</strong> PPI <strong>in</strong>formation collected from the DIP,MINT, <strong>and</strong> IntAct. Here prote<strong>in</strong> is abbreviated by “P” <strong>and</strong> PPI is abbreviatedby “I.” The “light green” color <strong>in</strong>dicates the DIP data set doma<strong>in</strong>, “lightblue” color <strong>in</strong>dicates the MINT data set doma<strong>in</strong>, <strong>and</strong> “light saffron” color<strong>in</strong>dicates the IntAct data set doma<strong>in</strong>.calculations. In all statistical analysis, we used 95% level <strong>of</strong>confidence as a measure <strong>of</strong> significance. The network statistic(Degree) was calculated us<strong>in</strong>g the Pajek s<strong>of</strong>tware package(Batagelj <strong>and</strong> Mrvar 2004).Results<strong>Evolutionary</strong> <strong>Rate</strong>s, Multifunctionality, <strong>and</strong> Essentiality <strong>of</strong><strong>Core</strong> <strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong>Yeast prote<strong>in</strong> complexes were taken from the supplementarydata set <strong>of</strong> Gav<strong>in</strong> et al. (2006), which comprised 1,487 prote<strong>in</strong>sbelong<strong>in</strong>g to 491 prote<strong>in</strong> complexes. In their work, Gav<strong>in</strong>et al. (2006) identified 357 <strong>and</strong> 339 subunits as core <strong>and</strong>attachment prote<strong>in</strong>s, respectively. The rema<strong>in</strong><strong>in</strong>g 791 subunitsare present <strong>in</strong> both core <strong>and</strong> attachment prote<strong>in</strong>s. We havenot considered those 791 prote<strong>in</strong>s for evolutionary rate analysesdue to their presence <strong>in</strong> both core <strong>and</strong> attachment prote<strong>in</strong>s.Estimation <strong>of</strong> evolutionary rates <strong>in</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s shows that attachment prote<strong>in</strong>s are more conservedthan core prote<strong>in</strong>s (average evolutionary rates <strong>of</strong> core prote<strong>in</strong>s:0.0901 (þ0.0031) (N ¼ 348), attachment prote<strong>in</strong>s:0.0772 (þ0.0040) (N ¼ 325); Mann–Whitney U test,P ¼ 2.5 10 5 )(fig. 2).The function <strong>of</strong> a prote<strong>in</strong> is one <strong>of</strong> the most importantparameters that <strong>in</strong>fluence prote<strong>in</strong> evolution, <strong>and</strong> it has beenreported that multifunctional genes evolve slowly (Wilsonet al. 1977; Podder et al. 2009). Moreover, core prote<strong>in</strong>s arealways present <strong>in</strong> all different subsets <strong>of</strong> is<strong>of</strong>orms <strong>of</strong> a complexassembly, <strong>and</strong> thus they have been reported as the functionalunits <strong>of</strong> that complex assembly (Dezso et al. 2003; Gav<strong>in</strong> et al.2006). Therefore, the role <strong>of</strong> each core prote<strong>in</strong> is crucial forthe functional <strong>in</strong>tegrity <strong>of</strong> a prote<strong>in</strong> complex (Dezso et al.2003). Hence, we measured the multifunctionality <strong>of</strong> core<strong>and</strong> attachment prote<strong>in</strong>s by count<strong>in</strong>g unique GO-BP terms,which are widely used to calculate the multifunctionality <strong>of</strong> aprote<strong>in</strong> (Camon et al. 2004; Pál et al. 2006; Salathé et al.2006; Podder et al. 2009; Su et al. 2010; Flicek et al. 2011;Yang <strong>and</strong> Gaut 2011). We observed that core prote<strong>in</strong>s are<strong>in</strong>volved <strong>in</strong> a higher number <strong>of</strong> biological processes thanattachment prote<strong>in</strong>s (table 1). Nonetheless, the GO terms,widely used for functional characterization but several drawbackshave been reported to be associated with the GO terms(Dolan et al. 2005; Park et al. 2011). In particular, GO annotationis systematically redundant <strong>and</strong> the biological doma<strong>in</strong> is<strong>in</strong>consistent (Park et al. 2011). Therefore, we also used theMIPS-CYGD (Guldener et al. 2005; Mak<strong>in</strong>o <strong>and</strong> Gojobori2006; Chakraborty et al. 2010; Mewes et al. 2011) <strong>and</strong>theKEGG (Kanehisa et al. 2012) databases, <strong>and</strong> we countedthe number <strong>of</strong> pathways <strong>in</strong> which a prote<strong>in</strong> takes part. Wefound similar trends, that is, core prote<strong>in</strong>s are <strong>in</strong>volved <strong>in</strong> ahigher number <strong>of</strong> pathways/functions than attachmentprote<strong>in</strong>s (table 1). However, we found that core prote<strong>in</strong>sevolve faster than the attachment prote<strong>in</strong>s <strong>in</strong> spite <strong>of</strong> theirhigher multifunctionality. Thus the prevail<strong>in</strong>g idea that,“multifunctional prote<strong>in</strong>s evolve at a slower rate” (Wilsonet al. 1977; Podder et al. 2009) does not seem to be compatible<strong>in</strong> expla<strong>in</strong><strong>in</strong>g the evolutionary rate differences betweencore <strong>and</strong> attachment prote<strong>in</strong>s.It has been reported that both gene essentiality <strong>and</strong> prote<strong>in</strong>multifunctionality are negatively associated with prote<strong>in</strong>evolutionary rates (Jordan et al. 2002; Podder et al. 2009).Therefore, it could be expected that essential genes aremultifunctional. Indeed, we found a positive associationbetween gene essentiality <strong>and</strong> multifunctionality <strong>in</strong> ourdata set (Spearman’s r multifunctionality vs. essentiality ¼ 0.1689,P ¼ 1.0 10 6 ), that is, the proportion <strong>of</strong> essential genes ishigher <strong>in</strong> highly multifunctional prote<strong>in</strong>s. To validate the aboveresults, we further divided our data set <strong>in</strong>to two groups byus<strong>in</strong>g the mean value <strong>of</strong> GO-multifunctionality as a cut<strong>of</strong>f.Genes with a multifunctionality higher than that <strong>of</strong> the correspond<strong>in</strong>gmean value were considered as highly functional(HF) (N ¼ 436) <strong>and</strong> those with a lower one, as low-functional(LF) (N ¼ 260) groups. We found a significantly (two-sidedFisher’s exact test, P ¼ 1.1 10 3 ) higher proportion <strong>of</strong> essentialgenes <strong>in</strong> the HF group (41.28%) compared with the LFgroup (28.85%); <strong>in</strong>dicat<strong>in</strong>g that HF prote<strong>in</strong>s are more essentialthan the low functional prote<strong>in</strong>s. Hence, one would expect ahigher proportion <strong>of</strong> essential genes <strong>in</strong> core prote<strong>in</strong>s thanattachment prote<strong>in</strong>s. Surpris<strong>in</strong>gly, we observed a similar proportion<strong>of</strong> essential genes <strong>in</strong> core <strong>and</strong> attachment prote<strong>in</strong>s(two-sided Fisher’s exact test, <strong>Core</strong>: 38.66%, <strong>Attachment</strong>s:34.51%, P ¼ 2.7 10 1 ). Moreover, we found that core<strong>and</strong> attachment prote<strong>in</strong>s exhibit similar essentiality (averageDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013 1369


Chakraborty <strong>and</strong> GhoshGBEFIG. 2.—Average evolutionary rates (dN/dS) <strong>of</strong> core <strong>and</strong> attachment prote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparison performed by two-tailedMann–Whitney U test.Table 1Average Multifunctionality <strong>and</strong> Pathways <strong>of</strong> <strong>Core</strong> <strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong>Database Used <strong>Core</strong> <strong>Attachment</strong> Significance Level (P)KEGG 1.8992 (þ0.1209) (N ¼ 129) 1.6994 (þ0.0939) (N ¼ 173) 3.3 10 2MIPS 2.6627 (þ0.0873) (N ¼ 335) 2.4174 (þ0.0801) (N ¼ 321) 4.4 10 2GO-BP 4.2068 (þ0.1934) (N ¼ 266) 3.3746 (þ0.1814) (N ¼ 283) 2.1 10 5essentiality <strong>of</strong> core prote<strong>in</strong>s: 0.1496 ( þ 0.0091) (N ¼ 214),attachment prote<strong>in</strong>s: 0.1685 (þ0.0086) (N ¼ 213); Mann–Whitney U test, P ¼ 5.7 10 2 ). The neutral theory <strong>of</strong> molecularevolution postulates that functionally important genesevolve more slowly than the less important genes (Hirsh <strong>and</strong>Fraser 2001; Jordan et al. 2002; Liao et al. 2006). In ourdata set when we separated the genes <strong>in</strong>to essential <strong>and</strong>nonessential genes, we also found that essential genesevolve slower than nonessential genes (average evolutionaryrate <strong>of</strong> essential genes: 0.0735 (þ0.0039) (N ¼ 251), nonessentialgenes: 0.0904 (þ0.0032) (N ¼ 416); Mann–Whitney Utest, P ¼ 8.5 10 5 ). Moreover, when we separated essential<strong>and</strong> nonessential genes <strong>in</strong>to core <strong>and</strong> attachment prote<strong>in</strong>s, wefound that the evolutionary rates <strong>of</strong> attachment prote<strong>in</strong>s aresignificantly lower than core prote<strong>in</strong>s <strong>in</strong> both the gene pools(fig. 2). Therefore, the observed evolutionary rate differencesbetween core <strong>and</strong> attachment prote<strong>in</strong>s are <strong>in</strong>dependent <strong>of</strong>gene essentiality. Previously, Hurst <strong>and</strong> Smith (1999) demonstratedthat gene essentiality is a very weak correlate <strong>of</strong> therates <strong>of</strong> prote<strong>in</strong> evolution. They observed that the “nonimmunenon-essential genes” (“non-immune” genes refersto those genes that do not belong to the immune system)evolve at a similar rate to the “essential genes.” Even <strong>in</strong> thisstudy we also observed no significant differences <strong>in</strong> evolutionaryrates between “essential core prote<strong>in</strong>s” <strong>and</strong> “nonessentialattachment prote<strong>in</strong>s” (average evolutionary rates<strong>of</strong> essential core prote<strong>in</strong>s: 0.0760 (þ0.0042) (N ¼ 137), nonessentialattachment prote<strong>in</strong>s: 0.0818 (þ0.0050) (N ¼ 207);Mann–Whitney U test, P ¼ 1.0 10 1 ). These results may <strong>in</strong>dicatethat there are some other constra<strong>in</strong>ts which can expla<strong>in</strong>the variation <strong>in</strong> rates <strong>of</strong> prote<strong>in</strong> evolution better than the geneessentiality or multifunctionality.Prote<strong>in</strong> Connectivity <strong>and</strong> <strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> <strong>Core</strong><strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong>In general, a biological system <strong>in</strong> a cell can be considered as acomplex network, <strong>and</strong> the PPIs are one such network that actsas the source <strong>of</strong> many biological functions (Mete et al. 2008).<strong>Prote<strong>in</strong>s</strong> with many <strong>in</strong>teraction partners are expected to bemultifunctional. We analyzed the correlation between thenumber <strong>of</strong> <strong>in</strong>teract<strong>in</strong>g partners <strong>of</strong> core/attachment prote<strong>in</strong>s(whose GO-BP annotations <strong>and</strong> connectivity data were available)<strong>and</strong> its multifunctionality. As expected, we found asignificant positive correlation (Spearman’s r GO-BP vs. connectivity¼ 0.3005, P ¼ 1.0 10 6 ) between GO-BP <strong>and</strong> connectivity<strong>in</strong> the PPI network. Previously, it has been reported that theprote<strong>in</strong> connectivity <strong>in</strong> a PPI network is negatively correlatedwith the evolutionary rates (Hirsh <strong>and</strong> Fraser 2001; Fraser et al.2002, 2003; Jordan et al. 2002; Hahn <strong>and</strong> Kern 2005; Lemoset al. 2005; Chakraborty et al. 2010, 2011; Alvarez-Ponce <strong>and</strong>Fares 2012). Similar trend has been also observed <strong>in</strong> our core/attachment data set (Spearman’s r connectivity vs. dN/dS ¼ 0.1907, P ¼ 1.0 10 6 ). The above observations motivatedus to <strong>in</strong>vestigate if there exists any variation <strong>of</strong> connectivity<strong>in</strong> core <strong>and</strong> attachment prote<strong>in</strong>s or not. Surpris<strong>in</strong>gly, wedid not f<strong>in</strong>d any significant difference (average connectivity <strong>of</strong>core prote<strong>in</strong>s: 8.0364 (þ0.3565) (N ¼ 357), attachment prote<strong>in</strong>s:13.8260 (þ2.3445) (N ¼ 339); Mann–Whitney U test,P ¼ 5.4 10 1 ) <strong>in</strong> the connectivity between core <strong>and</strong> attachmentprote<strong>in</strong>s. This result <strong>in</strong>dicates that the observedDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 20141370 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013


<strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> Yeast Prote<strong>in</strong> ComplexesGBEFIG. 3.—Average evolutionary rates (dN/dS) <strong>of</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s <strong>in</strong> two b<strong>in</strong>es. The statistical comparison performed by two-tailedMann–Whitney U test.difference <strong>in</strong> prote<strong>in</strong> evolution between core <strong>and</strong> attachmentprote<strong>in</strong>s is <strong>in</strong>dependent <strong>of</strong> prote<strong>in</strong> connectivity. To confirm theabove observation, we b<strong>in</strong>ned core <strong>and</strong> attachment prote<strong>in</strong>s<strong>in</strong>to two groups accord<strong>in</strong>g to their connectivity <strong>in</strong> the PPI network<strong>and</strong> analyzed their evolutionary rates. Interest<strong>in</strong>gly, wefound that core prote<strong>in</strong>s are still evolv<strong>in</strong>g at a higher rate thanattachment prote<strong>in</strong>s <strong>in</strong> same PPI b<strong>in</strong> (fig. 3). Thus we can saythat the observed difference <strong>in</strong> evolutionary rates betweencore <strong>and</strong> attachment prote<strong>in</strong>s may be steered by factorsother than the prote<strong>in</strong> connectivity.Prote<strong>in</strong> Complex Number <strong>and</strong> <strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong><strong>Core</strong> <strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong>Earlier, it has been demonstrated that prote<strong>in</strong> complexnumber significantly modulates the rates <strong>of</strong> prote<strong>in</strong> evolutionwhen compared with prote<strong>in</strong> connectivity <strong>in</strong> PPI networks(Chakraborty et al. 2010; Das et al. 2013). Thus, we calculatedprote<strong>in</strong> complex number <strong>in</strong> both core <strong>and</strong> attachment prote<strong>in</strong>s.We observed that prote<strong>in</strong> complex number is significantlyhigher <strong>in</strong> attachment prote<strong>in</strong>s than that <strong>in</strong> coreprote<strong>in</strong>s (fig. 4a). Furthermore, we reconfirmed this result byus<strong>in</strong>g the data set <strong>of</strong> prote<strong>in</strong> complexes developed byBenschop et al. (2010) <strong>and</strong> found similar trends (fig. 4b).We also found a significant negative correlation between prote<strong>in</strong>complex number <strong>and</strong> dN/dS (table 2) which is <strong>in</strong> agreementwith our previously published results (Chakraborty et al.2010). Therefore, the prote<strong>in</strong> complex number could be thecause <strong>of</strong> evolutionary rate differences between core <strong>and</strong>attachment prote<strong>in</strong>s. Interest<strong>in</strong>gly, we have found considerabledifferences <strong>in</strong> the distribution <strong>of</strong> prote<strong>in</strong> complex numbersbetween core <strong>and</strong> attachment prote<strong>in</strong>s, rang<strong>in</strong>g from1 to 3 for core prote<strong>in</strong>s <strong>and</strong> from 1 to 22 for attachmentprote<strong>in</strong>s by consider<strong>in</strong>g the data set <strong>of</strong> Gav<strong>in</strong> et al. (2006).To verify the effect <strong>of</strong> prote<strong>in</strong> complex number <strong>in</strong> controll<strong>in</strong>gthe evolutionary rate differences between core <strong>and</strong> attachmentprote<strong>in</strong>s, we divided the whole data <strong>of</strong> attachment prote<strong>in</strong>s<strong>in</strong>to two groups; attachment prote<strong>in</strong>s with a prote<strong>in</strong>FIG. 4.—(a) Average prote<strong>in</strong> complex number (prote<strong>in</strong> complexnumber calculated by us<strong>in</strong>g the Gav<strong>in</strong> et al. [2006] data set) <strong>of</strong> core <strong>and</strong>attachment prote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparison performed bytwo-tailed Mann–Whitney U test. (b) Average prote<strong>in</strong> complex number(prote<strong>in</strong> complex number calculated by us<strong>in</strong>g the Benschop et al. [2010]data set) <strong>of</strong> core <strong>and</strong> attachment prote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparisonperformed by two-tailed Mann–Whitney U test.complex number from 1 to 3, named as Attach-I, <strong>and</strong> therest ones as Attach-II (prote<strong>in</strong> complex number ranges from4 to 22). Compar<strong>in</strong>g the prote<strong>in</strong> evolutionary rates betweencore <strong>and</strong> Attach-I prote<strong>in</strong>s, we found marg<strong>in</strong>ally significant(Mann–Whitney U test, P ¼ 5.3 10 2 ) differences betweenthem (fig. 5). At this po<strong>in</strong>t, it should be mentioned that there isonly one core prote<strong>in</strong> which has complex number 3, <strong>and</strong>when this core prote<strong>in</strong> was excluded from the data set <strong>and</strong>the rests were compared with attachment prote<strong>in</strong>s hav<strong>in</strong>gcomplex numbers 1<strong>and</strong> 2, we found no significant difference<strong>in</strong> their evolutionary rates (average evolutionary rates <strong>of</strong> coreprote<strong>in</strong>s: 0.0899 (þ0.0031) (N ¼ 347), attachment prote<strong>in</strong>s:0.0811 (þ0.0057) (N ¼ 114); Mann–Whitney U test,P ¼ 8.6 10 2 ). However, the evolutionary rate differenceswere significant between core <strong>and</strong> the Attach-II prote<strong>in</strong>s, asalso between Attach-I <strong>and</strong> Attach-II prote<strong>in</strong>s (fig. 5). Theseresults <strong>in</strong>dicate that the differences <strong>in</strong> evolutionary ratesbetween core <strong>and</strong> attachment prote<strong>in</strong>s disappear when thecomplex number is factored out.Prote<strong>in</strong> Complex Number <strong>and</strong> Expression LevelA large number <strong>of</strong> studies us<strong>in</strong>g organisms rang<strong>in</strong>g frombacteria to mammals (Akashi 2001; Subramanian <strong>and</strong>Downloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013 1371


Chakraborty <strong>and</strong> GhoshGBETable 2Spearman’s Correlation Analysis <strong>of</strong> Prote<strong>in</strong> Complex Number (Tak<strong>in</strong>g <strong>Core</strong> <strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong> from Gav<strong>in</strong> et al. [2006]) versus<strong>Evolutionary</strong><strong>Rate</strong>s (dN/dS), Essentiality, Connectivity, Expression Level, Transcription <strong>Rate</strong>, <strong>and</strong> Translation <strong>Rate</strong>Prote<strong>in</strong>complexnumber(Gav<strong>in</strong> et al.2006)dN/dS Essentiality Connectivity ExpressionLevel(Holstegeet al. 1998)ExpressionLevel (Milleret al. 2011)Transcription<strong>Rate</strong> (Holstegeet al. 1998)Transcription<strong>Rate</strong> (Milleret al. 2011)Translation<strong>Rate</strong> (Aravaet al. 2003)r ¼ 0.1572 r ¼ 0.1659 r ¼ 0.0284 r ¼ 0.3987 r ¼ 0.3990 r ¼ 0.4032 r ¼ 0.3977 r ¼ 0.3239P¼ 4.2 10 5 P¼ 5.7 10 4 P¼ 4.5 10 1 P¼ 1.0 10 6 P¼ 1.0 10 6 P¼ 1.0 10 6 P¼ 1.0 10 6 P¼ 1.0 10 6N¼ 673 N¼ 428 N¼ 696 N¼ 687 N¼ 657 N¼ 649 N¼ 649 N¼ 659FIG. 5.—Average evolutionary rates (dN/dS) <strong>of</strong><strong>Core</strong>,Attach-I,<strong>and</strong>Attach-II prote<strong>in</strong>s <strong>in</strong> yeast. The pair wise comparison performed bytwo-tailed Mann–Whitney U test.Kumar 2004; Drummond et al. 2006) demonstrated that geneexpression levels are strongly correlated with prote<strong>in</strong> evolutionaryrates, <strong>and</strong> it has been hypothesized that highly expressedgenes are under strong selection pressure to reducemistranslation-<strong>in</strong>duced prote<strong>in</strong> misfold<strong>in</strong>g (Drummond et al.2005; Drummond <strong>and</strong> Wilke 2008). In our previous study, wefound that prote<strong>in</strong> complex number is one <strong>of</strong> the importantconstra<strong>in</strong>ts <strong>in</strong> guid<strong>in</strong>g prote<strong>in</strong> evolutionary rates <strong>and</strong> also observedthat prote<strong>in</strong>s those are associated with a large number<strong>of</strong> complexes (higher prote<strong>in</strong> complex number) have higherexpression levels (Chakraborty et al. 2010). Once we determ<strong>in</strong>edthe average expression level <strong>of</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s, we were able to show that attachment prote<strong>in</strong>s havea higher average expression level compared with that <strong>of</strong> coreprote<strong>in</strong>s (fig. 6a <strong>and</strong> b), <strong>and</strong> a strong significant positivecorrelation exists between prote<strong>in</strong> complex number <strong>and</strong>expression level (table 2). Moreover, we found that complexnumber <strong>and</strong> the expression level <strong>in</strong>dependently affectevolutionary rates (multivariate regression analysis:b expression level ¼ 0.0601, P ¼ 8.1 10 5 ; b complex number ¼0.1012, P ¼ 3.2 10 11 ), which is consistent with our previousstudy (Chakraborty et al. 2010). Accord<strong>in</strong>g toDrummond et al. (2005), the expression level <strong>of</strong> a prote<strong>in</strong><strong>in</strong>fluences the rates <strong>of</strong> prote<strong>in</strong> evolution through the translationevents. Furthermore, we know that prote<strong>in</strong> expressionlevel can be <strong>in</strong>creased by three possible steps by 1) <strong>in</strong>creas<strong>in</strong>gtranscription rate, 2) <strong>in</strong>creas<strong>in</strong>g translational rate, or 3) <strong>in</strong>creas<strong>in</strong>gboth transcription <strong>and</strong> translation rate. Here, we analyzedthe transcription rate <strong>and</strong> translation rate <strong>in</strong> the whole prote<strong>in</strong>complex data set <strong>and</strong> observed that the transcription rate <strong>and</strong>translation rate <strong>in</strong>creases with the <strong>in</strong>crement <strong>of</strong> prote<strong>in</strong> complexnumber (table 2). We also found that attachment prote<strong>in</strong>sshowed higher values <strong>of</strong> average transcription <strong>and</strong>translation rates than core prote<strong>in</strong>s (for Holstege data set: averagetranscription rate <strong>of</strong> core prote<strong>in</strong>s: 3.8349 (þ0.4394)(N ¼ 335), attachment prote<strong>in</strong>s: 20.9643 (þ2.0421)(N ¼ 322); Mann–Whitney U test, P ¼ 9.6 10 16 ; for Millerdata set: average transcription rate <strong>of</strong> core prote<strong>in</strong>s: 22.1872(þ0.9876) (N ¼ 329), attachment prote<strong>in</strong>s: 51.9403 (þ3.0335)(N ¼ 320); Mann–Whitney U test, P ¼ 6.9 10 16 ;<strong>and</strong>forArava data set: average translation rate <strong>of</strong> core prote<strong>in</strong>s:0.1893 (þ0.0296) (N ¼ 331), attachment prote<strong>in</strong>s: 0.9446(þ0.1325) (N ¼ 328); Mann–Whitney U test, P ¼ 5.8 10 9 ).Therefore, it is clear that prote<strong>in</strong>s with higher complex number<strong>in</strong>crease their transcription <strong>and</strong> translation rates, <strong>and</strong> this<strong>in</strong>creased transcription <strong>and</strong> translation rates enhance theoverall prote<strong>in</strong> expression levels. F<strong>in</strong>ally, the elevated expressionlevels decrease the evolutionary rates (Spearman’s r evolutionaryrate vs. expression level_Holstage ¼ 0.4894, P ¼ 1.0 10 6 ;Spearman’s r evolutionary rate vs. expression level_Miller ¼ 0.4857,P ¼ 1.0 10 6 ) due to avoidance <strong>of</strong> mistranslational-<strong>in</strong>ducedprote<strong>in</strong> misfold<strong>in</strong>g (Drummond et al. 2005, 2006; Drummond<strong>and</strong> Wilke 2008). Thus, when we controlled for expressionlevels, we did not obta<strong>in</strong> any significant difference <strong>in</strong> therates <strong>of</strong> prote<strong>in</strong> evolution between core <strong>and</strong> attachment prote<strong>in</strong>s(fig. 7).DiscussionIt has been reported that complex-form<strong>in</strong>g prote<strong>in</strong>s evolveslower than the noncomplex form<strong>in</strong>g prote<strong>in</strong>s (Teichmann2002), <strong>and</strong> we have also observed the same <strong>in</strong> yeast(Chakraborty et al. 2010). Moreover, with<strong>in</strong> the complexform<strong>in</strong>gprote<strong>in</strong>s, the subunits those are associated with alarge number <strong>of</strong> prote<strong>in</strong> complexes evolve slower than theprote<strong>in</strong>s associated with a lower number <strong>of</strong> prote<strong>in</strong> complexesDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 20141372 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013


<strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> Yeast Prote<strong>in</strong> ComplexesGBEFIG. 7.—Average evolutionary rates (dN/dS) <strong>of</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s <strong>in</strong> different b<strong>in</strong>s. The statistical comparison performed bytwo-tailed Mann–Whitney U test.FIG.6.—(a) Average expression level (us<strong>in</strong>g the Holstege et al. [1998]data set) <strong>of</strong> core <strong>and</strong> attachment prote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparisonperformed by two-tailed Mann–Whitney U test. (b) Average expressionlevel (us<strong>in</strong>g the Miller et al. [2011] data set) <strong>of</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparison performed by two-tailedMann–Whitney U test.(Chakraborty et al. 2010; Das et al. 2013). The prote<strong>in</strong> complexnumber showed a strong positive correlation with geneexpression levels <strong>and</strong> negative correlation with evolutionaryrates, suggest<strong>in</strong>g that prote<strong>in</strong> complex number is a significantfactor modulat<strong>in</strong>g prote<strong>in</strong> evolutionary rates. In this work, weutilized the yeast prote<strong>in</strong> complex data set (Gav<strong>in</strong> et al. 2006)for the classification perta<strong>in</strong><strong>in</strong>g to core <strong>and</strong> attachmentprote<strong>in</strong>s <strong>and</strong> analyzed their evolutionary rates to address the<strong>in</strong>fluence <strong>of</strong> gene dispensability, prote<strong>in</strong> multifunctionality,prote<strong>in</strong> connectivity, prote<strong>in</strong> complex number, <strong>and</strong> geneexpression level on prote<strong>in</strong> evolutionary rates.Analysis <strong>of</strong> complex form<strong>in</strong>g versus noncomplex prote<strong>in</strong>sdemonstrates that prote<strong>in</strong>s <strong>in</strong> complexes have higheressentiality than the noncomplex prote<strong>in</strong>s (average essentiality<strong>of</strong> complex prote<strong>in</strong>s: 0.1665 (þ0.0045) (N ¼ 867), noncomplexprote<strong>in</strong>s: 0.1002 (þ0.0014) (N ¼ 3,348); Mann–WhitneyU test, P ¼ 1.1 10 65 ). With<strong>in</strong> the complex-form<strong>in</strong>g prote<strong>in</strong>s,core prote<strong>in</strong>s were reported to be crucial for the function<strong>of</strong> the prote<strong>in</strong> complex assembly (Dezso et al. 2003; Gav<strong>in</strong>et al. 2006), <strong>in</strong>dicat<strong>in</strong>g that core prote<strong>in</strong>s contribute moretoward fitness <strong>of</strong> an organism. Moreover, it has been reportedthat essential genes are associated with multifunctionalfeatures <strong>of</strong> genes (Liao et al. 2006). Therefore, we measuredmultifunctionality <strong>of</strong> core <strong>and</strong> attachment prote<strong>in</strong>s <strong>and</strong> foundthat core prote<strong>in</strong>s are associated with a higher number <strong>of</strong>biological processes compared with attachment prote<strong>in</strong>s.However, <strong>in</strong> our data set, we did not observe any significantdifference <strong>in</strong> the essentiality between core <strong>and</strong> attachmentprote<strong>in</strong>s. These results <strong>in</strong>dicate that the essentiality <strong>of</strong> agiven gene or prote<strong>in</strong> does not depend only on its multifunctionality,perhaps there are some other parameters which <strong>in</strong>fluencethe gene/prote<strong>in</strong> essentiality. In this study, weobserved that attachments prote<strong>in</strong>s are present <strong>in</strong> more complexassemblies than core prote<strong>in</strong>s, which may <strong>in</strong>crease theiressentiality s<strong>in</strong>ce they modify or enhance the activity <strong>of</strong> a largenumber <strong>of</strong> complex assemblies. Therefore, one can reasonablyassume that prote<strong>in</strong> essentiality <strong>and</strong> prote<strong>in</strong> complex numberare <strong>in</strong>terrelated to each other. To verify our hypothesis, wedeterm<strong>in</strong>ed the correlation between gene essentiality <strong>and</strong> prote<strong>in</strong>complex number <strong>in</strong> our data set, <strong>and</strong> <strong>in</strong>deed, we founda strong positive association between them (Spearman’sr prote<strong>in</strong> complex number vs. essentiality ¼ 0.2681, P ¼ 1.0 10 6 ),which contradicts the results <strong>of</strong> Pache et al. (2009) that theessentiality does not depend on the prote<strong>in</strong> complex number.Thus both multifunctionality <strong>and</strong> prote<strong>in</strong> complex number <strong>in</strong>fluencethe gene/prote<strong>in</strong> essentiality. We have obta<strong>in</strong>ed a similarproportion <strong>of</strong> essential prote<strong>in</strong>s <strong>in</strong> both core <strong>and</strong>attachment prote<strong>in</strong>s suggest<strong>in</strong>g that core prote<strong>in</strong>s executehigher multifunctionality, whereas attachment prote<strong>in</strong>s havea higher prote<strong>in</strong> complex number. These results are consistentwith the recent study on fitness effect <strong>and</strong> prote<strong>in</strong> complexthat core <strong>and</strong> attachment prote<strong>in</strong>s are equally important forthefunction<strong>of</strong>anorganism(Pache et al. 2009). <strong>Evolutionary</strong>rate differences between core <strong>and</strong> attachment prote<strong>in</strong>s can be<strong>in</strong>terpreted <strong>in</strong> the framework <strong>of</strong> neutralist/selectionist controversy.<strong>Core</strong> prote<strong>in</strong>s evolve faster than attachment prote<strong>in</strong>s,although core prote<strong>in</strong>s are more multifunctional than attachmentprote<strong>in</strong>s, <strong>and</strong> these results are not compatible with theneutral theory <strong>of</strong> molecular evolution. However, attachmentprote<strong>in</strong>s hav<strong>in</strong>g higher expression levels should have higherimpact on essentiality, <strong>and</strong> thus the slowly evolv<strong>in</strong>g nature <strong>of</strong>Downloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013 1373


Chakraborty <strong>and</strong> GhoshGBEattachment prote<strong>in</strong>s than core prote<strong>in</strong>s can apparently beaccommodated <strong>in</strong> neutral theory. Moreover, we demonstratedthat core prote<strong>in</strong>s have higher multifunctionality <strong>and</strong>also have higher impact on gene/prote<strong>in</strong> essentiality. Indeed,we found no significant difference <strong>in</strong> gene essentialitybetween core <strong>and</strong> attachment prote<strong>in</strong>s. However, we observedsignificant evolutionary rate differences between core<strong>and</strong> attachment prote<strong>in</strong>s, which are <strong>in</strong>consistent with the neutraltheory <strong>of</strong> molecular evolution. Recently, Razeto-Barry et al.(2011) predicted that <strong>in</strong> a selective scenario, more multifunctionalprote<strong>in</strong>s should evolve faster than the less multifunctionalprote<strong>in</strong>s when there is no difference <strong>in</strong> the size <strong>of</strong> fitnesseffects. They also conclude that a higher number <strong>of</strong> mutationshave been fixed <strong>in</strong> more multifunctional prote<strong>in</strong>s by positiveselection. Our results confirm the prediction <strong>of</strong> Razeto-Barryet al. (2011).In this study, we also observed that prote<strong>in</strong> complexnumber has emerged as an important parameter expla<strong>in</strong><strong>in</strong>gthe differences <strong>in</strong> evolutionary rates between core <strong>and</strong> attachmentprote<strong>in</strong>s. However, attachment prote<strong>in</strong>s are lessmultifunctional, but they participate <strong>in</strong> a high number <strong>of</strong> prote<strong>in</strong>complexes <strong>and</strong> thus their transcription rates <strong>and</strong> translationrates <strong>in</strong>crease. The <strong>in</strong>creased transcription rate <strong>and</strong>translation rate <strong>in</strong> turn enhances the overall prote<strong>in</strong> expressionlevels <strong>and</strong> the expression level imposes a strong selection pressureon attachment prote<strong>in</strong>s. Previously, Drummond et al.(2005) also demonstrated that the majority <strong>of</strong> the variation<strong>in</strong> prote<strong>in</strong> evolution is ma<strong>in</strong>ly guided by the expression level.This is further confirmed by our study, when we control theexpression level or complex number, the difference <strong>in</strong> evolutionaryrates between core <strong>and</strong> attachment prote<strong>in</strong>s disappears.Hence, from our analysis, we showed that the rates<strong>of</strong> prote<strong>in</strong> evolution between core <strong>and</strong> attachment prote<strong>in</strong>sare ma<strong>in</strong>ly guided by prote<strong>in</strong> complex number <strong>and</strong> expressionlevel. To f<strong>in</strong>d out the relative <strong>in</strong>fluence <strong>in</strong> evolutionary rates,we performed a multivariate regression analysis <strong>and</strong> foundthat both expression level <strong>and</strong> prote<strong>in</strong> complex number <strong>in</strong>dependentlycontrol the evolutionary rate but essentiality did notshow any significant contribution.Supplementary MaterialSupplementary tables S1–S3 areavailableatGenome Biology<strong>and</strong> Evolution onl<strong>in</strong>e (http://www.gbe.oxfordjournals.org/).AcknowledgmentsThis work was supported by the Department <strong>of</strong> Biotechnology,Government <strong>of</strong> India <strong>and</strong> Department <strong>of</strong> Science <strong>and</strong>Technology, Government <strong>of</strong> India (Sanction No. BT/PR692/BID/7/369/2011). The authors are thankful to Pr<strong>of</strong>essorGiorgio Bernardi, Department <strong>of</strong> Biology, University Rome 3,Viale Marconi 446, Rome 00146, Italy, <strong>and</strong> Dr Bratati Kahali,Department <strong>of</strong> Internal Medic<strong>in</strong>e, Division <strong>of</strong> Gastroenterology<strong>and</strong> Department <strong>of</strong> Computational Medic<strong>in</strong>e <strong>and</strong>Bio<strong>in</strong>formatics, University <strong>of</strong> Michigan Medical School, AnnArbor, MI 48109 for critical read<strong>in</strong>g <strong>of</strong> the manuscript. Theyare also thankful to the two anonymous reviewers for theirvaluable comments to improve the manuscript.Literature CitedAkashi H. 2001. Gene expression <strong>and</strong> molecular evolution. Curr Op<strong>in</strong>Genet Dev. 11:660–666.Altschul SF, et al. 1997. Gapped BLAST <strong>and</strong> PSI-BLAST: a new generation<strong>of</strong> prote<strong>in</strong> database search programs. Nucleic Acids Res. 25:3389–3402.Alvarez-Ponce D, Fares MA. 2012. <strong>Evolutionary</strong> rate <strong>and</strong> duplicability <strong>in</strong> thearabidopsis thaliana prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teraction network. Genome BiolEvol. 4:1263–1274.Arava Y, et al. 2003. Genome-wide analysis <strong>of</strong> mRNA translation pr<strong>of</strong>iles <strong>in</strong>Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 100:3889–3894.Batada NN, Hurst LD, Tyers M. 2006. <strong>Evolutionary</strong> <strong>and</strong> physiologicalimportance <strong>of</strong> hub prote<strong>in</strong>s. PLoS Comput Biol. 2:748–756.Batagelj V, Mrvar A. 2004. Pajek—analysis <strong>and</strong> visualization <strong>of</strong> large networks.In: Jünger M, Mutzel P, editors. Graph draw<strong>in</strong>g s<strong>of</strong>tware. Berl<strong>in</strong>Heidelberg (Germany): Spr<strong>in</strong>ger. p. 77–103.Benschop JJ, et al. 2010. A consensus <strong>of</strong> core prote<strong>in</strong> complex compositionsfor Saccharomyces cerevisiae. Mol Cell. 38:916–928.Camon E, et al. 2004. The Gene Ontology Annotation (GOA) Database:shar<strong>in</strong>g knowledge <strong>in</strong> Uniprot with Gene Ontology. Nucleic Acids Res.32:D262–D266.Chakraborty S, Kahali B, Ghosh TC. 2010. Prote<strong>in</strong> complex form<strong>in</strong>g abilityis favored over the features <strong>of</strong> <strong>in</strong>teract<strong>in</strong>g partners <strong>in</strong> determ<strong>in</strong><strong>in</strong>g theevolutionary rates <strong>of</strong> prote<strong>in</strong>s <strong>in</strong> the yeast prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teractionnetworks. BMC Syst Biol. 4:155.Chakraborty S, et al. 2011. Insights <strong>in</strong>to eukaryotic <strong>in</strong>teract<strong>in</strong>g prote<strong>in</strong>evolution. In: Pontarotti P, editor. <strong>Evolutionary</strong> biology: concepts, biodiversity,macroevolution <strong>and</strong> genome evolution. Berl<strong>in</strong> Heidelberg(Germany): Spr<strong>in</strong>ger. p. 51–70.Chen W-H, M<strong>in</strong>guez P, Lercher MJ, Bork P. 2012. OGEE: an onl<strong>in</strong>e geneessentiality database. Nucleic Acids Res. 40:901–906.Cherry JM, et al. 2012. Saccharomyces Genome Database: the genomicsresource <strong>of</strong> budd<strong>in</strong>g yeast. Nucleic Acids Res. 40:D700–D705.Das J, Chakraborty S, Podder S, Ghosh TC. 2013. Complex-form<strong>in</strong>g prote<strong>in</strong>sescape the robust regulations <strong>of</strong> miRNA <strong>in</strong> human. FEBS Lett.587:2284–2287.Deane CM, Salw<strong>in</strong>ski L, Xenarios I, Eisenberg D. 2002. Prote<strong>in</strong><strong>in</strong>teractions—two methods for assessment <strong>of</strong> the reliability <strong>of</strong> highthroughput observations. Mol Cell Proteomics. 1:349–356.Dezso Z, Oltvai ZN, Barabási AL. 2003. Bio<strong>in</strong>formatics analysis <strong>of</strong> experimentallydeterm<strong>in</strong>ed prote<strong>in</strong> complexes <strong>in</strong> the yeast Saccharomycescerevisiae. Genome Res. 13:2450–2454.Dolan ME, Ni L, Camon E, Blake JA. 2005. A procedure for assess<strong>in</strong>g GOannotation consistency. Bio<strong>in</strong>formatics 21:I136–I143.Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. 2005. Whyhighly expressed prote<strong>in</strong>s evolve slowly. Proc Natl Acad Sci U S A. 102:14338–14343.Drummond DA, Raval A, Wilke CO. 2006. A s<strong>in</strong>gle determ<strong>in</strong>ant dom<strong>in</strong>atesthe rate <strong>of</strong> yeast prote<strong>in</strong> evolution. Mol Biol Evol. 23:327–337.Drummond DA, Wilke CO. 2008. Mistranslation-<strong>in</strong>duced prote<strong>in</strong> misfold<strong>in</strong>gas a dom<strong>in</strong>ant constra<strong>in</strong>t on cod<strong>in</strong>g-sequence evolution. Cell 134:341–352.Fisher RA. 1930. The genetical theory <strong>of</strong> natural selection. Oxford:Clarendon Press.Flicek P, et al. 2011. Ensembl 2011. Nucleic Acids Res. 39:D800–D806.Downloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 20141374 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013


<strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> Yeast Prote<strong>in</strong> ComplexesGBEFraser HB, Hirsh AE, Ste<strong>in</strong>metz LM, Scharfe C, Feldman MW. 2002.<strong>Evolutionary</strong> rate <strong>in</strong> the prote<strong>in</strong> <strong>in</strong>teraction network. Science 296:750–752.Fraser HB, Wall DP, Hirsh AE. 2003. A simple dependence between prote<strong>in</strong>evolution rate <strong>and</strong> the number <strong>of</strong> prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teractions. BMCEvol Biol. 3:11.Gav<strong>in</strong> AC, et al. 2006. Proteome survey reveals modularity <strong>of</strong> the yeast cellmach<strong>in</strong>ery. Nature 440:631–636.Guldener U, et al. 2005. CYGD: the comprehensive Yeast GenomeDatabase. Nucleic Acids Research 33:D364–D368.Hahn MW, Kern AD. 2005. Comparative genomics <strong>of</strong> centrality <strong>and</strong>essentiality <strong>in</strong> three eukaryotic prote<strong>in</strong>-<strong>in</strong>teraction networks. Mol BiolEvol. 22:803–806.Hart GT, Lee I, Marcotte ER. 2007. A high-accuracy consensus map <strong>of</strong>yeast prote<strong>in</strong> complexes reveals modular nature <strong>of</strong> gene essentiality.BMC Bio<strong>in</strong>formatics 8:236.Hirsh AE, Fraser HB. 2001. Prote<strong>in</strong> dispensability <strong>and</strong> rate <strong>of</strong> evolution.Nature 411:1046–1049.Holstege FCP, et al. 1998. Dissect<strong>in</strong>g the regulatory circuitry <strong>of</strong> a eukaryoticgenome. Cell 95:717–728.Hurst LD, Smith NGC. 1999. Do essential genes evolve slowly? Curr Biol. 9:747–750.Jordan IK, Rogoz<strong>in</strong> IB, Wolf YI, Koon<strong>in</strong> EV. 2002. Essential genes are moreevolutionarily conserved than are nonessential genes <strong>in</strong> bacteria.Genome Res. 12:962–968.Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. 2012. KEGG for<strong>in</strong>tegration <strong>and</strong> <strong>in</strong>terpretation <strong>of</strong> large-scale molecular data sets.Nucleic Acids Res. 40:D109–D114.Kellis M, Patterson N, Endrizzi M, Birren B, L<strong>and</strong>er ES. 2003. Sequenc<strong>in</strong>g<strong>and</strong> comparison <strong>of</strong> yeast species to identify genes <strong>and</strong> regulatoryelements. Nature 423:241–254.Kerrien S, et al. 2012. The IntAct molecular <strong>in</strong>teraction database <strong>in</strong> 2012.Nucleic Acids Res. 40:D841–D846.Kimura M. 1983. The neutral theory <strong>of</strong> molecular evolution. Cambridge:Cambridge University Press.Kimura M, Ohta T. 1974. Some pr<strong>in</strong>ciples govern<strong>in</strong>g molecular evolution.Proc Natl Acad Sci U S A. 71:2848–2852.Lark<strong>in</strong> MA, et al. 2007. Clustal W <strong>and</strong> clustal X version 2.0. Bio<strong>in</strong>formatics23:2947–2948.Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. 2005. Evolution <strong>of</strong>prote<strong>in</strong>s <strong>and</strong> gene expression levels are coupled <strong>in</strong> Drosophila <strong>and</strong> are<strong>in</strong>dependently associated with mRNA abundance, prote<strong>in</strong> length, <strong>and</strong>number <strong>of</strong> prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teractions. Mol Biol Evol. 22:1345–1354.Liao B-Y, Scott NM, Zhang J. 2006. Impacts <strong>of</strong> gene essentiality, expressionpattern, <strong>and</strong> gene compactness on the evolutionary rate <strong>of</strong> mammalianprote<strong>in</strong>s. Mol Biol Evol. 23:2072–2080.Licata L, et al. 2012. MINT, the molecular <strong>in</strong>teraction database: 2012update. Nucleic Acids Res. 40:D857–D861.Mak<strong>in</strong>o T, Gojobori T. 2006. The evolutionary rate <strong>of</strong> a prote<strong>in</strong> is <strong>in</strong>fluencedby features <strong>of</strong> the <strong>in</strong>teract<strong>in</strong>g partners. Mol Biol Evol. 23:784–789.Mete M, Tang F, Xu X, Yuruk N. 2008. A structural approach for f<strong>in</strong>d<strong>in</strong>gfunctional modules from large biological networks. BMCBio<strong>in</strong>formatics 9:S19.Mewes HW, et al. 2011. MIPS: curated databases <strong>and</strong> comprehensivesecondary data resources <strong>in</strong> 2010. Nucleic Acids Res. 39:D220–D224.Miller C, et al. 2011. Dynamic transcriptome analysis measures rates <strong>of</strong>mRNA synthesis <strong>and</strong> decay <strong>in</strong> yeast. Mol Syst Biol. 7:458.Naumov G, Naumova ES, Lantto RA, Louis EJ, Korhola M. 1992. Genetichomology between Saccharomyces cerevisiae <strong>and</strong> its sibl<strong>in</strong>gspecies Saccharomyces paradoxus <strong>and</strong> Saccharomyces bayanus—electrophoretic karyotypes. Yeast 8:599–612.Nie NH, Bent DH, Hull CH. 1970. SPSS: statistical package for the socialsciences. NewYork: McGraw-Hill.Pache RA, Babu MM, Aloy P. 2009. Exploit<strong>in</strong>g gene deletion fitness effects<strong>in</strong> yeast to underst<strong>and</strong> the modular architecture <strong>of</strong> prote<strong>in</strong> complexesunder different growth conditions. BMC Syst Biol. 3:74.Pál C, Papp B, Lercher MJ. 2006. An <strong>in</strong>tegrated view <strong>of</strong> prote<strong>in</strong> evolution.Nat Rev Genet. 7:337–348.Pang CNI, Krycer JR, Lek A, Wilk<strong>in</strong>s MR. 2008. Are prote<strong>in</strong>complexes made <strong>of</strong> cores, modules <strong>and</strong> attachments? Proteomics 8:425–434.ParkYR,KimJ,LeeHW,YoonYJ,KimJH.2011.GOChase-II:correct<strong>in</strong>gsemantic <strong>in</strong>consistencies from gene ontology-based annotations forgene products. BMC Bio<strong>in</strong>formatics 12:S40.Pereira-Leal JB, Levy ED, Teichmann SA. 2006. The orig<strong>in</strong>s <strong>and</strong> evolution <strong>of</strong>functional modules: lessons from prote<strong>in</strong> complexes. Philos Trans RSoc Lond B Biol Sci. 361:507–517.Podder S, Mukhopadhyay P, Ghosh TC. 2009. Multifunctionality dom<strong>in</strong>antlydeterm<strong>in</strong>es the rate <strong>of</strong> human housekeep<strong>in</strong>g <strong>and</strong> tissue specific<strong>in</strong>teract<strong>in</strong>g prote<strong>in</strong> evolution. Gene 439:11–24.Popescu CE, Borza T, Bielawski JP, Lee RW. 2006. <strong>Evolutionary</strong> rates <strong>and</strong>expression level <strong>in</strong> Chlamydomonas. Genetics 172:1567–1576.Pu S, Vlasblom J, Emili A, Greenblatt J, Wodak SJ. 2007. Identify<strong>in</strong>gfunctional modules <strong>in</strong> the physical <strong>in</strong>teractome <strong>of</strong> Saccharomycescerevisiae. Proteomics 7:944–960.Qiu J, Noble WS. 2008. Predict<strong>in</strong>g co-complexed prote<strong>in</strong> pairs fromheterogeneous data. PLoS Comput Biol. 4:e1000054.Razeto-Barry P, Diaz J, Cotoras D, Vasquez RA. 2011. Molecular evolution,mutation size <strong>and</strong> gene pleiotropy: a geometric reexam<strong>in</strong>ation.Genetics 187:877–885.Rokotomalala R. 2005. TANAGRA: a free s<strong>of</strong>tware for research <strong>and</strong> academicpurposes. Advances <strong>in</strong> grid comput<strong>in</strong>g. EGC 2005. In:Proceed<strong>in</strong>gs <strong>of</strong> European Grid Conference; 2005 Feb 14–16;Amsterdam, The Netherl<strong>and</strong>s. Berl<strong>in</strong> (Germany): Spr<strong>in</strong>ger. Vol. 2,p. 697–702.Salathé M, Ackermann M, Bonhoeffer S. 2006. The effect <strong>of</strong> multifunctionalityon the rate <strong>of</strong> evolution <strong>in</strong> yeast. Mol Biol Evol. 23:721–722.Salw<strong>in</strong>ski L, et al. 2004. The Database <strong>of</strong> Interact<strong>in</strong>g <strong>Prote<strong>in</strong>s</strong>: 2004update. Nucleic Acids Res. 32:D449–D451.Semple JI, Vavouri T, Lehner B. 2008. A simple pr<strong>in</strong>ciple concern<strong>in</strong>g therobustness <strong>of</strong> prote<strong>in</strong> complex activity to changes <strong>in</strong> gene expression.BMC Syst Biol. 2:1.Ste<strong>in</strong>metz LM, et al. 2002. Systematic screen for human disease genes <strong>in</strong>yeast. Nat Genet. 31:400–404.Su Z, Zeng Y, Gu X. 2010. A prelim<strong>in</strong>ary analysis <strong>of</strong> gene pleiotropyestimated from prote<strong>in</strong> sequences. J Exp Zool B Mol Dev Evol. 314:115–122.Subramanian S, Kumar S. 2004. Gene expression <strong>in</strong>tensity shapes evolutionaryrates <strong>of</strong> the prote<strong>in</strong>s encoded by the vertebrate genome.Genetics 168:373–381.Teichmann SA. 2002. The constra<strong>in</strong>ts prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teractions place onsequence divergence. J Mol Biol. 324:399–407.Wilson AC, Carlson SS, White TJ. 1977. Biochemical evolution. Annu RevBiochem. 46:573–639.Yang L, Gaut BS. 2011. Factors that contribute to variation <strong>in</strong> evolutionaryrate among Arabidopsis genes. Mol Biol Evol. 28:2359–2369.Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. MolBiol Evol. 24:1586–1591.Yang ZH, Nielsen R. 2000. Estimat<strong>in</strong>g synonymous <strong>and</strong> nonsynonymoussubstitution rates under realistic evolutionary models. Mol Biol Evol.17:32–43.Associate editor: Takashi GojoboriDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013 1375

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!