Evolutionary Rate Heterogeneity of Core and Attachment Proteins in ...
Evolutionary Rate Heterogeneity of Core and Attachment Proteins in ...
Evolutionary Rate Heterogeneity of Core and Attachment Proteins in ...
- No tags were found...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
GBE<strong>Evolutionary</strong> <strong>Rate</strong> <strong>Heterogeneity</strong> <strong>of</strong> <strong>Core</strong> <strong>and</strong> <strong>Attachment</strong><strong>Prote<strong>in</strong>s</strong> <strong>in</strong> Yeast Prote<strong>in</strong> ComplexesS<strong>and</strong>ip Chakraborty <strong>and</strong> Tapash Ch<strong>and</strong>ra Ghosh*Bio<strong>in</strong>formatics Centre, Bose Institute, Kolkata, West Bengal, India*Correspond<strong>in</strong>g author: E-mail: tapash@jcbose.ac.<strong>in</strong>.Accepted: June 23, 2013AbstractIn general, prote<strong>in</strong>s do not work alone; they form macromolecular complexes to play fundamental roles <strong>in</strong> diverse cellular functions.On the basis <strong>of</strong> their iterative cluster<strong>in</strong>g procedure <strong>and</strong> frequency <strong>of</strong> occurrence <strong>in</strong> the macromolecular complexes, the prote<strong>in</strong>subunits have been categorized as core <strong>and</strong> attachment. <strong>Core</strong> prote<strong>in</strong> subunits are the ma<strong>in</strong> functional elements, whereas attachmentprote<strong>in</strong>s act as modifiers or activators <strong>in</strong> prote<strong>in</strong> complexes. In this article, us<strong>in</strong>g the current data set <strong>of</strong> yeast prote<strong>in</strong> complexes,we found that core prote<strong>in</strong>s are evolv<strong>in</strong>g at a faster rate than attachment prote<strong>in</strong>s <strong>in</strong> spite <strong>of</strong> their functional importance. Interest<strong>in</strong>gly,our <strong>in</strong>vestigation revealed that attachment prote<strong>in</strong>s are present <strong>in</strong> a higher number <strong>of</strong> macromolecular complexes than core prote<strong>in</strong>s.We also observed that the prote<strong>in</strong> complex number (def<strong>in</strong>ed as the number <strong>of</strong> prote<strong>in</strong> complexes <strong>in</strong> which a prote<strong>in</strong> subunit belongs)has a stronger <strong>in</strong>fluence on gene/prote<strong>in</strong> essentiality than multifunctionality. F<strong>in</strong>ally, our results suggest that the observed differences<strong>in</strong> the rates <strong>of</strong> prote<strong>in</strong> evolution between core <strong>and</strong> attachment prote<strong>in</strong>s are due to differences <strong>in</strong> prote<strong>in</strong> complex number <strong>and</strong>expression level. Moreover, we conclude that prote<strong>in</strong>s which are present <strong>in</strong> higher numbers <strong>of</strong> macromolecular complexes enhancetheir overall expression level by <strong>in</strong>creas<strong>in</strong>g their transcription rate as well as translation rate, <strong>and</strong> thus the prote<strong>in</strong> complex numberimposes a strong selection pressure on the evolution <strong>of</strong> yeast proteome.Key words: core prote<strong>in</strong>, attachment prote<strong>in</strong>, evolutionary rate, yeast, prote<strong>in</strong> complex, prote<strong>in</strong> complex number.IntroductionDifferent prote<strong>in</strong>s evolve at different rates. The central problem<strong>in</strong> molecular evolution is to comprehend the factors forthe differential evolutionary rates <strong>of</strong> prote<strong>in</strong>s. The orig<strong>in</strong>s <strong>of</strong>variation <strong>in</strong> prote<strong>in</strong> evolutionary rates have rema<strong>in</strong>ed the coreissue between the selectionists <strong>and</strong> the neutralists (Fisher1930; Kimura <strong>and</strong> Ohta 1974; Kimura 1983; Razeto-Barryet al. 2011). The neutral theory <strong>of</strong> molecular evolution accentuatesthat most <strong>of</strong> the nucleotide substitutions <strong>in</strong> a gene aredue to the r<strong>and</strong>om fixation <strong>of</strong> neutral mutations (Kimura1983), which is <strong>in</strong> accordance with the proposal that functionallyimportant genes should evolve slower than less importantgenes (Kimura <strong>and</strong> Ohta 1974). Accord<strong>in</strong>g to this proposal, ithas been demonstrated that essential genes evolve slowerthan the genes which are dispensable (Jordan et al. 2002).Moreover, prote<strong>in</strong>s those <strong>in</strong>teract with a large number <strong>of</strong> partnersevolve at a slower rate compared with the prote<strong>in</strong>s withfewer number <strong>of</strong> <strong>in</strong>teract<strong>in</strong>g partners (Hirsh <strong>and</strong> Fraser 2001;Jordan et al. 2002; Fraser et al. 2003; Hahn <strong>and</strong> Kern 2005;Lemos et al. 2005; Chakraborty et al. 2011; Alvarez-Ponce<strong>and</strong> Fares 2012). However, these results have been questionedby some researchers (Batada et al. 2006). On the other h<strong>and</strong>,it has been demonstrated that the gene expression level is animportant constra<strong>in</strong>t <strong>in</strong> prote<strong>in</strong> evolution (Subramanian <strong>and</strong>Kumar 2004; Drummond et al. 2005, 2006; Drummond <strong>and</strong>Wilke 2008); highly expressed prote<strong>in</strong>s are more conservedthan lowly expressed prote<strong>in</strong>s. This phenomenon has beenattributed to natural selection (Popescu et al. 2006).Moreover,itwasshownthatabouthalf<strong>of</strong>thevariation<strong>in</strong>prote<strong>in</strong> evolutionary rates can be expla<strong>in</strong>ed by expression level(Drummond et al. 2005).<strong>Prote<strong>in</strong>s</strong> do not carry out their functions alone, they <strong>of</strong>tenact by participat<strong>in</strong>g <strong>in</strong> macromolecular complexes <strong>and</strong> playdifferent functional roles (Qiu <strong>and</strong> Noble 2008; Chakrabortyet al. 2010). Additionally, it has been found that the multiprote<strong>in</strong>complexes are enriched for essential genes (Semple et al.2008), <strong>and</strong> prote<strong>in</strong>s those are present <strong>in</strong> several prote<strong>in</strong> complexassemblies have a tendency to be more essential thanprote<strong>in</strong>s that are present <strong>in</strong> fewer prote<strong>in</strong> complex assemblies(Pereira-Leal et al. 2006). In contrast, Pache et al. (2009)reported that there is no such significant correlation betweenDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014ß The Author(s) 2013. Published by Oxford University Press on behalf <strong>of</strong> the Society for Molecular Biology <strong>and</strong> Evolution.This is an Open Access article distributed under the terms <strong>of</strong> the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permitsnon-commercial re-use, distribution, <strong>and</strong> reproduction <strong>in</strong> any medium, provided the orig<strong>in</strong>al work is properly cited. For commercial re-use, please contact journals.permissions@oup.com1366 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013
Chakraborty <strong>and</strong> GhoshGBEEssential Genes <strong>and</strong> Calculation <strong>of</strong> EssentialityTo obta<strong>in</strong> a data set <strong>of</strong> essential genes <strong>of</strong> yeast, we downloadedthe essential genes list from the OGEE database (http://ogeedb.embl.de/#summary, last accessed July 15, 2013)(Chen et al. 2012). Moreover, we used the fitness quantitativedata <strong>of</strong> Ste<strong>in</strong>metz et al. (2002) to calculate the essentiality foreach gene. The fitness data were downloaded from Pacheet al. (2009) (Additional file 3: Quantitative fitness data:http://www.biomedcentral.com/1752-0509/3/74/additional,last accessed July 15, 2013) <strong>and</strong> considered only the growth <strong>in</strong>the YPD medium (1% Bacto-peptone, 2% yeast extract, <strong>and</strong>2% glucose). The normalized essentiality (E i ) was calculated by f iE i ¼ 1f maxwhere f i represents the fitness values for the deletion <strong>of</strong> the ithgene <strong>and</strong> f max represents the maximum fitness value.Therefore, the gene essentiality ranges from 0 to 1, with anessentiality <strong>of</strong> 0 denot<strong>in</strong>g that a gene has no measurable effecton fitness, whereas 1 denotes the gene is essential for survival.Calculation <strong>of</strong> <strong>Evolutionary</strong> <strong>Rate</strong>We calculated the average dN/dS ratio as a measure <strong>of</strong> evolutionaryrate, which is the ratio <strong>of</strong> the number <strong>of</strong> nonsynonymoussubstitutions per nonsynonymous site (dN)tothenumber<strong>of</strong> synonymous substitutions per synonymous site (dS). To calculatethe dN/dS ratio, we compared S. cerevisiae sequenceswith its orthologous sequences <strong>in</strong> S. paradoxus, S. bayanus, <strong>and</strong>S. mikatae, as they are sibl<strong>in</strong>g species (Naumov et al. 1992). Weobta<strong>in</strong>ed whole cod<strong>in</strong>g sequences <strong>of</strong> four yeast stra<strong>in</strong>s from theSaccharomyces Genome Database (Cherry et al. 2012) (forS. cerevisiae: http://downloads.yeastgenome.org/sequence/S288C_reference/orf_dna/orf_cod<strong>in</strong>g.fasta.gz [last accessedJuly 15, 2013]; S. paradoxus: http://downloads.yeastgenome.org/sequence/fungi/S_paradoxus/archive/MIT/orf_dna/orf_genomic.fasta.gz [last accessed July 15, 2013]; S. bayanus: http://downloads.yeastgenome.org/sequence/fungi/S_bayanus/archive/MIT/orf_dna/orf_genomic.fasta.gz [last accessed July 15,2013]; <strong>and</strong> S. mikatae: http://downloads.yeastgenome.org/sequence/fungi/S_mikatae/archive/MIT/orf_dna/orf_genomic.fasta.gz [last accessed July 15, 2013]) to calculate the evolutionaryrates (dN/dS). Then, we translate these sequences <strong>in</strong>to prote<strong>in</strong>sequence us<strong>in</strong>g the EMBOSS Transeq program (http://www.ebi.ac.uk/Tools/st/emboss_transeq/, last accessed July15, 2013). By us<strong>in</strong>g the National Center for BiotechnologyInformation BlastP program (version 2.2.17) (Altschul et al.1997), we identified the orthologous prote<strong>in</strong>s <strong>of</strong> S. cerevisiae<strong>in</strong> S. paradoxus, S. baynus, <strong>and</strong> S. mikatate. Weusedtheexpectationvalue 1.0 10 5 as a cut<strong>of</strong>f <strong>and</strong> ma<strong>in</strong>ta<strong>in</strong>ed at least75% sequence similarity between the two sequences with am<strong>in</strong>imum alignment overlap 80%. The gaps allowed <strong>in</strong> thealignment were less than 3%. We also verified our resultswith the results <strong>of</strong> Kellis et al. (2003). Then the pair-wisealignment was performed us<strong>in</strong>g the ClustalW (version 2.0)(Lark<strong>in</strong> et al. 2007) for each set <strong>of</strong> orthologs gene pair. dN/dSvalues were calculated by Yang <strong>and</strong> Nielsen method (Yang <strong>and</strong>Nielsen 2000) us<strong>in</strong>g the PAML package (version 4) (Yang 2007).We took the average values <strong>of</strong> dN/dS for each S. cerevisiae ORFwhere at least one orthologous pair was present (supplementarytable S2, Supplementary Material onl<strong>in</strong>e).Prote<strong>in</strong>–Prote<strong>in</strong> Interactions DataWe collected the prote<strong>in</strong>–prote<strong>in</strong> <strong>in</strong>teractions (PPIs) <strong>in</strong>formationfrom the DIP (Database <strong>of</strong> Interact<strong>in</strong>g <strong>Prote<strong>in</strong>s</strong>: http://dip.doe-mbi.ucla.edu/, last accessed July 15, 2013) (Salw<strong>in</strong>skiet al. 2004), MINT (Molecular INTeraction Database: http://m<strong>in</strong>t.bio.uniroma2.it/m<strong>in</strong>t/, last accessed July 15, 2013)(Licata et al. 2012), <strong>and</strong> IntAct (http://www.ebi.ac.uk/<strong>in</strong>tact,last accessed July 15, 2013) (Kerrien et al. 2012). In thosedatabases, the PPIs were documented experimentally bygenome wide two-hybrid screen, immune precipitation, aff<strong>in</strong>ityb<strong>in</strong>d<strong>in</strong>g, antibody blockage, <strong>and</strong> so on. To collect highthroughputPPIs data from the DIP database, we used theCORE data set <strong>of</strong> S. cerevisiae (baker’s yeast)(Scere20120228CR). In the CORE data set, the PPIs were identifiedby high-throughput methods <strong>and</strong> small-scale experiments,thus the data <strong>in</strong> the CORE are highly reliable (Deaneet al. 2002). However, there is no predef<strong>in</strong>ed high confidencedata <strong>in</strong> MINT <strong>and</strong> IntAct, thus we used the confidence score tocollect the high-throughput data from MINT <strong>and</strong> IntAct. Wecalculated the average confidence score for each data set <strong>and</strong>then used this average value as a cut<strong>of</strong>f to identify the highthroughputPPIs (for MINT, the cut<strong>of</strong>f score is 0.30, <strong>and</strong> forIntAct, the cut <strong>of</strong>f score is 0.40). After merg<strong>in</strong>g the threedatabases (DIP, MINT, <strong>and</strong> IntAct) <strong>and</strong> remov<strong>in</strong>g the redundant<strong>and</strong> self-<strong>in</strong>teractions, we obta<strong>in</strong>ed 3,580 <strong>in</strong>dividual prote<strong>in</strong>s<strong>and</strong> 12,624 b<strong>in</strong>ary <strong>in</strong>teractions (fig. 1) (supplementarytable S3, Supplementary Material onl<strong>in</strong>e).Prote<strong>in</strong> Expression, Transcription <strong>Rate</strong>,<strong>and</strong> Translation <strong>Rate</strong>The gene expression (number <strong>of</strong> mRNAs molecules per cell) <strong>and</strong>transcription rate (number <strong>of</strong> mRNAs molecules per hour) werecollected from genome wide expression analysis <strong>of</strong> Holstegeet al. (1998) (http://web.wi.mit.edu/young/pub/data/orf_transcriptome.txt, last accessed July 15, 2013). We also used amore up-to-date (Miller et al. 2011) expression data set forvalidation purposes. Dynamic transcriptome analysis wasused to realistically monitor the mRNA metabolism <strong>in</strong> yeast(Miller et al. 2011). We downloaded the translation rate(number <strong>of</strong> prote<strong>in</strong>s molecules per seconds) <strong>of</strong> yeast prote<strong>in</strong>sfrom the data set <strong>of</strong> Arava et al. (2003).S<strong>of</strong>twareWe used SPSS (version 13.0) (Nie et al. 1970) <strong>and</strong> Tanagra(version 1.4.36) (Rokotomalala 2005) for all statisticalDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 20141368 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013
<strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> Yeast Prote<strong>in</strong> ComplexesGBEFIG. 1.—Venn diagram <strong>of</strong> PPI <strong>in</strong>formation collected from the DIP,MINT, <strong>and</strong> IntAct. Here prote<strong>in</strong> is abbreviated by “P” <strong>and</strong> PPI is abbreviatedby “I.” The “light green” color <strong>in</strong>dicates the DIP data set doma<strong>in</strong>, “lightblue” color <strong>in</strong>dicates the MINT data set doma<strong>in</strong>, <strong>and</strong> “light saffron” color<strong>in</strong>dicates the IntAct data set doma<strong>in</strong>.calculations. In all statistical analysis, we used 95% level <strong>of</strong>confidence as a measure <strong>of</strong> significance. The network statistic(Degree) was calculated us<strong>in</strong>g the Pajek s<strong>of</strong>tware package(Batagelj <strong>and</strong> Mrvar 2004).Results<strong>Evolutionary</strong> <strong>Rate</strong>s, Multifunctionality, <strong>and</strong> Essentiality <strong>of</strong><strong>Core</strong> <strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong>Yeast prote<strong>in</strong> complexes were taken from the supplementarydata set <strong>of</strong> Gav<strong>in</strong> et al. (2006), which comprised 1,487 prote<strong>in</strong>sbelong<strong>in</strong>g to 491 prote<strong>in</strong> complexes. In their work, Gav<strong>in</strong>et al. (2006) identified 357 <strong>and</strong> 339 subunits as core <strong>and</strong>attachment prote<strong>in</strong>s, respectively. The rema<strong>in</strong><strong>in</strong>g 791 subunitsare present <strong>in</strong> both core <strong>and</strong> attachment prote<strong>in</strong>s. We havenot considered those 791 prote<strong>in</strong>s for evolutionary rate analysesdue to their presence <strong>in</strong> both core <strong>and</strong> attachment prote<strong>in</strong>s.Estimation <strong>of</strong> evolutionary rates <strong>in</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s shows that attachment prote<strong>in</strong>s are more conservedthan core prote<strong>in</strong>s (average evolutionary rates <strong>of</strong> core prote<strong>in</strong>s:0.0901 (þ0.0031) (N ¼ 348), attachment prote<strong>in</strong>s:0.0772 (þ0.0040) (N ¼ 325); Mann–Whitney U test,P ¼ 2.5 10 5 )(fig. 2).The function <strong>of</strong> a prote<strong>in</strong> is one <strong>of</strong> the most importantparameters that <strong>in</strong>fluence prote<strong>in</strong> evolution, <strong>and</strong> it has beenreported that multifunctional genes evolve slowly (Wilsonet al. 1977; Podder et al. 2009). Moreover, core prote<strong>in</strong>s arealways present <strong>in</strong> all different subsets <strong>of</strong> is<strong>of</strong>orms <strong>of</strong> a complexassembly, <strong>and</strong> thus they have been reported as the functionalunits <strong>of</strong> that complex assembly (Dezso et al. 2003; Gav<strong>in</strong> et al.2006). Therefore, the role <strong>of</strong> each core prote<strong>in</strong> is crucial forthe functional <strong>in</strong>tegrity <strong>of</strong> a prote<strong>in</strong> complex (Dezso et al.2003). Hence, we measured the multifunctionality <strong>of</strong> core<strong>and</strong> attachment prote<strong>in</strong>s by count<strong>in</strong>g unique GO-BP terms,which are widely used to calculate the multifunctionality <strong>of</strong> aprote<strong>in</strong> (Camon et al. 2004; Pál et al. 2006; Salathé et al.2006; Podder et al. 2009; Su et al. 2010; Flicek et al. 2011;Yang <strong>and</strong> Gaut 2011). We observed that core prote<strong>in</strong>s are<strong>in</strong>volved <strong>in</strong> a higher number <strong>of</strong> biological processes thanattachment prote<strong>in</strong>s (table 1). Nonetheless, the GO terms,widely used for functional characterization but several drawbackshave been reported to be associated with the GO terms(Dolan et al. 2005; Park et al. 2011). In particular, GO annotationis systematically redundant <strong>and</strong> the biological doma<strong>in</strong> is<strong>in</strong>consistent (Park et al. 2011). Therefore, we also used theMIPS-CYGD (Guldener et al. 2005; Mak<strong>in</strong>o <strong>and</strong> Gojobori2006; Chakraborty et al. 2010; Mewes et al. 2011) <strong>and</strong>theKEGG (Kanehisa et al. 2012) databases, <strong>and</strong> we countedthe number <strong>of</strong> pathways <strong>in</strong> which a prote<strong>in</strong> takes part. Wefound similar trends, that is, core prote<strong>in</strong>s are <strong>in</strong>volved <strong>in</strong> ahigher number <strong>of</strong> pathways/functions than attachmentprote<strong>in</strong>s (table 1). However, we found that core prote<strong>in</strong>sevolve faster than the attachment prote<strong>in</strong>s <strong>in</strong> spite <strong>of</strong> theirhigher multifunctionality. Thus the prevail<strong>in</strong>g idea that,“multifunctional prote<strong>in</strong>s evolve at a slower rate” (Wilsonet al. 1977; Podder et al. 2009) does not seem to be compatible<strong>in</strong> expla<strong>in</strong><strong>in</strong>g the evolutionary rate differences betweencore <strong>and</strong> attachment prote<strong>in</strong>s.It has been reported that both gene essentiality <strong>and</strong> prote<strong>in</strong>multifunctionality are negatively associated with prote<strong>in</strong>evolutionary rates (Jordan et al. 2002; Podder et al. 2009).Therefore, it could be expected that essential genes aremultifunctional. Indeed, we found a positive associationbetween gene essentiality <strong>and</strong> multifunctionality <strong>in</strong> ourdata set (Spearman’s r multifunctionality vs. essentiality ¼ 0.1689,P ¼ 1.0 10 6 ), that is, the proportion <strong>of</strong> essential genes ishigher <strong>in</strong> highly multifunctional prote<strong>in</strong>s. To validate the aboveresults, we further divided our data set <strong>in</strong>to two groups byus<strong>in</strong>g the mean value <strong>of</strong> GO-multifunctionality as a cut<strong>of</strong>f.Genes with a multifunctionality higher than that <strong>of</strong> the correspond<strong>in</strong>gmean value were considered as highly functional(HF) (N ¼ 436) <strong>and</strong> those with a lower one, as low-functional(LF) (N ¼ 260) groups. We found a significantly (two-sidedFisher’s exact test, P ¼ 1.1 10 3 ) higher proportion <strong>of</strong> essentialgenes <strong>in</strong> the HF group (41.28%) compared with the LFgroup (28.85%); <strong>in</strong>dicat<strong>in</strong>g that HF prote<strong>in</strong>s are more essentialthan the low functional prote<strong>in</strong>s. Hence, one would expect ahigher proportion <strong>of</strong> essential genes <strong>in</strong> core prote<strong>in</strong>s thanattachment prote<strong>in</strong>s. Surpris<strong>in</strong>gly, we observed a similar proportion<strong>of</strong> essential genes <strong>in</strong> core <strong>and</strong> attachment prote<strong>in</strong>s(two-sided Fisher’s exact test, <strong>Core</strong>: 38.66%, <strong>Attachment</strong>s:34.51%, P ¼ 2.7 10 1 ). Moreover, we found that core<strong>and</strong> attachment prote<strong>in</strong>s exhibit similar essentiality (averageDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013 1369
Chakraborty <strong>and</strong> GhoshGBEFIG. 2.—Average evolutionary rates (dN/dS) <strong>of</strong> core <strong>and</strong> attachment prote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparison performed by two-tailedMann–Whitney U test.Table 1Average Multifunctionality <strong>and</strong> Pathways <strong>of</strong> <strong>Core</strong> <strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong>Database Used <strong>Core</strong> <strong>Attachment</strong> Significance Level (P)KEGG 1.8992 (þ0.1209) (N ¼ 129) 1.6994 (þ0.0939) (N ¼ 173) 3.3 10 2MIPS 2.6627 (þ0.0873) (N ¼ 335) 2.4174 (þ0.0801) (N ¼ 321) 4.4 10 2GO-BP 4.2068 (þ0.1934) (N ¼ 266) 3.3746 (þ0.1814) (N ¼ 283) 2.1 10 5essentiality <strong>of</strong> core prote<strong>in</strong>s: 0.1496 ( þ 0.0091) (N ¼ 214),attachment prote<strong>in</strong>s: 0.1685 (þ0.0086) (N ¼ 213); Mann–Whitney U test, P ¼ 5.7 10 2 ). The neutral theory <strong>of</strong> molecularevolution postulates that functionally important genesevolve more slowly than the less important genes (Hirsh <strong>and</strong>Fraser 2001; Jordan et al. 2002; Liao et al. 2006). In ourdata set when we separated the genes <strong>in</strong>to essential <strong>and</strong>nonessential genes, we also found that essential genesevolve slower than nonessential genes (average evolutionaryrate <strong>of</strong> essential genes: 0.0735 (þ0.0039) (N ¼ 251), nonessentialgenes: 0.0904 (þ0.0032) (N ¼ 416); Mann–Whitney Utest, P ¼ 8.5 10 5 ). Moreover, when we separated essential<strong>and</strong> nonessential genes <strong>in</strong>to core <strong>and</strong> attachment prote<strong>in</strong>s, wefound that the evolutionary rates <strong>of</strong> attachment prote<strong>in</strong>s aresignificantly lower than core prote<strong>in</strong>s <strong>in</strong> both the gene pools(fig. 2). Therefore, the observed evolutionary rate differencesbetween core <strong>and</strong> attachment prote<strong>in</strong>s are <strong>in</strong>dependent <strong>of</strong>gene essentiality. Previously, Hurst <strong>and</strong> Smith (1999) demonstratedthat gene essentiality is a very weak correlate <strong>of</strong> therates <strong>of</strong> prote<strong>in</strong> evolution. They observed that the “nonimmunenon-essential genes” (“non-immune” genes refersto those genes that do not belong to the immune system)evolve at a similar rate to the “essential genes.” Even <strong>in</strong> thisstudy we also observed no significant differences <strong>in</strong> evolutionaryrates between “essential core prote<strong>in</strong>s” <strong>and</strong> “nonessentialattachment prote<strong>in</strong>s” (average evolutionary rates<strong>of</strong> essential core prote<strong>in</strong>s: 0.0760 (þ0.0042) (N ¼ 137), nonessentialattachment prote<strong>in</strong>s: 0.0818 (þ0.0050) (N ¼ 207);Mann–Whitney U test, P ¼ 1.0 10 1 ). These results may <strong>in</strong>dicatethat there are some other constra<strong>in</strong>ts which can expla<strong>in</strong>the variation <strong>in</strong> rates <strong>of</strong> prote<strong>in</strong> evolution better than the geneessentiality or multifunctionality.Prote<strong>in</strong> Connectivity <strong>and</strong> <strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> <strong>Core</strong><strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong>In general, a biological system <strong>in</strong> a cell can be considered as acomplex network, <strong>and</strong> the PPIs are one such network that actsas the source <strong>of</strong> many biological functions (Mete et al. 2008).<strong>Prote<strong>in</strong>s</strong> with many <strong>in</strong>teraction partners are expected to bemultifunctional. We analyzed the correlation between thenumber <strong>of</strong> <strong>in</strong>teract<strong>in</strong>g partners <strong>of</strong> core/attachment prote<strong>in</strong>s(whose GO-BP annotations <strong>and</strong> connectivity data were available)<strong>and</strong> its multifunctionality. As expected, we found asignificant positive correlation (Spearman’s r GO-BP vs. connectivity¼ 0.3005, P ¼ 1.0 10 6 ) between GO-BP <strong>and</strong> connectivity<strong>in</strong> the PPI network. Previously, it has been reported that theprote<strong>in</strong> connectivity <strong>in</strong> a PPI network is negatively correlatedwith the evolutionary rates (Hirsh <strong>and</strong> Fraser 2001; Fraser et al.2002, 2003; Jordan et al. 2002; Hahn <strong>and</strong> Kern 2005; Lemoset al. 2005; Chakraborty et al. 2010, 2011; Alvarez-Ponce <strong>and</strong>Fares 2012). Similar trend has been also observed <strong>in</strong> our core/attachment data set (Spearman’s r connectivity vs. dN/dS ¼ 0.1907, P ¼ 1.0 10 6 ). The above observations motivatedus to <strong>in</strong>vestigate if there exists any variation <strong>of</strong> connectivity<strong>in</strong> core <strong>and</strong> attachment prote<strong>in</strong>s or not. Surpris<strong>in</strong>gly, wedid not f<strong>in</strong>d any significant difference (average connectivity <strong>of</strong>core prote<strong>in</strong>s: 8.0364 (þ0.3565) (N ¼ 357), attachment prote<strong>in</strong>s:13.8260 (þ2.3445) (N ¼ 339); Mann–Whitney U test,P ¼ 5.4 10 1 ) <strong>in</strong> the connectivity between core <strong>and</strong> attachmentprote<strong>in</strong>s. This result <strong>in</strong>dicates that the observedDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 20141370 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013
<strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> Yeast Prote<strong>in</strong> ComplexesGBEFIG. 3.—Average evolutionary rates (dN/dS) <strong>of</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s <strong>in</strong> two b<strong>in</strong>es. The statistical comparison performed by two-tailedMann–Whitney U test.difference <strong>in</strong> prote<strong>in</strong> evolution between core <strong>and</strong> attachmentprote<strong>in</strong>s is <strong>in</strong>dependent <strong>of</strong> prote<strong>in</strong> connectivity. To confirm theabove observation, we b<strong>in</strong>ned core <strong>and</strong> attachment prote<strong>in</strong>s<strong>in</strong>to two groups accord<strong>in</strong>g to their connectivity <strong>in</strong> the PPI network<strong>and</strong> analyzed their evolutionary rates. Interest<strong>in</strong>gly, wefound that core prote<strong>in</strong>s are still evolv<strong>in</strong>g at a higher rate thanattachment prote<strong>in</strong>s <strong>in</strong> same PPI b<strong>in</strong> (fig. 3). Thus we can saythat the observed difference <strong>in</strong> evolutionary rates betweencore <strong>and</strong> attachment prote<strong>in</strong>s may be steered by factorsother than the prote<strong>in</strong> connectivity.Prote<strong>in</strong> Complex Number <strong>and</strong> <strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong><strong>Core</strong> <strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong>Earlier, it has been demonstrated that prote<strong>in</strong> complexnumber significantly modulates the rates <strong>of</strong> prote<strong>in</strong> evolutionwhen compared with prote<strong>in</strong> connectivity <strong>in</strong> PPI networks(Chakraborty et al. 2010; Das et al. 2013). Thus, we calculatedprote<strong>in</strong> complex number <strong>in</strong> both core <strong>and</strong> attachment prote<strong>in</strong>s.We observed that prote<strong>in</strong> complex number is significantlyhigher <strong>in</strong> attachment prote<strong>in</strong>s than that <strong>in</strong> coreprote<strong>in</strong>s (fig. 4a). Furthermore, we reconfirmed this result byus<strong>in</strong>g the data set <strong>of</strong> prote<strong>in</strong> complexes developed byBenschop et al. (2010) <strong>and</strong> found similar trends (fig. 4b).We also found a significant negative correlation between prote<strong>in</strong>complex number <strong>and</strong> dN/dS (table 2) which is <strong>in</strong> agreementwith our previously published results (Chakraborty et al.2010). Therefore, the prote<strong>in</strong> complex number could be thecause <strong>of</strong> evolutionary rate differences between core <strong>and</strong>attachment prote<strong>in</strong>s. Interest<strong>in</strong>gly, we have found considerabledifferences <strong>in</strong> the distribution <strong>of</strong> prote<strong>in</strong> complex numbersbetween core <strong>and</strong> attachment prote<strong>in</strong>s, rang<strong>in</strong>g from1 to 3 for core prote<strong>in</strong>s <strong>and</strong> from 1 to 22 for attachmentprote<strong>in</strong>s by consider<strong>in</strong>g the data set <strong>of</strong> Gav<strong>in</strong> et al. (2006).To verify the effect <strong>of</strong> prote<strong>in</strong> complex number <strong>in</strong> controll<strong>in</strong>gthe evolutionary rate differences between core <strong>and</strong> attachmentprote<strong>in</strong>s, we divided the whole data <strong>of</strong> attachment prote<strong>in</strong>s<strong>in</strong>to two groups; attachment prote<strong>in</strong>s with a prote<strong>in</strong>FIG. 4.—(a) Average prote<strong>in</strong> complex number (prote<strong>in</strong> complexnumber calculated by us<strong>in</strong>g the Gav<strong>in</strong> et al. [2006] data set) <strong>of</strong> core <strong>and</strong>attachment prote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparison performed bytwo-tailed Mann–Whitney U test. (b) Average prote<strong>in</strong> complex number(prote<strong>in</strong> complex number calculated by us<strong>in</strong>g the Benschop et al. [2010]data set) <strong>of</strong> core <strong>and</strong> attachment prote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparisonperformed by two-tailed Mann–Whitney U test.complex number from 1 to 3, named as Attach-I, <strong>and</strong> therest ones as Attach-II (prote<strong>in</strong> complex number ranges from4 to 22). Compar<strong>in</strong>g the prote<strong>in</strong> evolutionary rates betweencore <strong>and</strong> Attach-I prote<strong>in</strong>s, we found marg<strong>in</strong>ally significant(Mann–Whitney U test, P ¼ 5.3 10 2 ) differences betweenthem (fig. 5). At this po<strong>in</strong>t, it should be mentioned that there isonly one core prote<strong>in</strong> which has complex number 3, <strong>and</strong>when this core prote<strong>in</strong> was excluded from the data set <strong>and</strong>the rests were compared with attachment prote<strong>in</strong>s hav<strong>in</strong>gcomplex numbers 1<strong>and</strong> 2, we found no significant difference<strong>in</strong> their evolutionary rates (average evolutionary rates <strong>of</strong> coreprote<strong>in</strong>s: 0.0899 (þ0.0031) (N ¼ 347), attachment prote<strong>in</strong>s:0.0811 (þ0.0057) (N ¼ 114); Mann–Whitney U test,P ¼ 8.6 10 2 ). However, the evolutionary rate differenceswere significant between core <strong>and</strong> the Attach-II prote<strong>in</strong>s, asalso between Attach-I <strong>and</strong> Attach-II prote<strong>in</strong>s (fig. 5). Theseresults <strong>in</strong>dicate that the differences <strong>in</strong> evolutionary ratesbetween core <strong>and</strong> attachment prote<strong>in</strong>s disappear when thecomplex number is factored out.Prote<strong>in</strong> Complex Number <strong>and</strong> Expression LevelA large number <strong>of</strong> studies us<strong>in</strong>g organisms rang<strong>in</strong>g frombacteria to mammals (Akashi 2001; Subramanian <strong>and</strong>Downloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013 1371
Chakraborty <strong>and</strong> GhoshGBETable 2Spearman’s Correlation Analysis <strong>of</strong> Prote<strong>in</strong> Complex Number (Tak<strong>in</strong>g <strong>Core</strong> <strong>and</strong> <strong>Attachment</strong> <strong>Prote<strong>in</strong>s</strong> from Gav<strong>in</strong> et al. [2006]) versus<strong>Evolutionary</strong><strong>Rate</strong>s (dN/dS), Essentiality, Connectivity, Expression Level, Transcription <strong>Rate</strong>, <strong>and</strong> Translation <strong>Rate</strong>Prote<strong>in</strong>complexnumber(Gav<strong>in</strong> et al.2006)dN/dS Essentiality Connectivity ExpressionLevel(Holstegeet al. 1998)ExpressionLevel (Milleret al. 2011)Transcription<strong>Rate</strong> (Holstegeet al. 1998)Transcription<strong>Rate</strong> (Milleret al. 2011)Translation<strong>Rate</strong> (Aravaet al. 2003)r ¼ 0.1572 r ¼ 0.1659 r ¼ 0.0284 r ¼ 0.3987 r ¼ 0.3990 r ¼ 0.4032 r ¼ 0.3977 r ¼ 0.3239P¼ 4.2 10 5 P¼ 5.7 10 4 P¼ 4.5 10 1 P¼ 1.0 10 6 P¼ 1.0 10 6 P¼ 1.0 10 6 P¼ 1.0 10 6 P¼ 1.0 10 6N¼ 673 N¼ 428 N¼ 696 N¼ 687 N¼ 657 N¼ 649 N¼ 649 N¼ 659FIG. 5.—Average evolutionary rates (dN/dS) <strong>of</strong><strong>Core</strong>,Attach-I,<strong>and</strong>Attach-II prote<strong>in</strong>s <strong>in</strong> yeast. The pair wise comparison performed bytwo-tailed Mann–Whitney U test.Kumar 2004; Drummond et al. 2006) demonstrated that geneexpression levels are strongly correlated with prote<strong>in</strong> evolutionaryrates, <strong>and</strong> it has been hypothesized that highly expressedgenes are under strong selection pressure to reducemistranslation-<strong>in</strong>duced prote<strong>in</strong> misfold<strong>in</strong>g (Drummond et al.2005; Drummond <strong>and</strong> Wilke 2008). In our previous study, wefound that prote<strong>in</strong> complex number is one <strong>of</strong> the importantconstra<strong>in</strong>ts <strong>in</strong> guid<strong>in</strong>g prote<strong>in</strong> evolutionary rates <strong>and</strong> also observedthat prote<strong>in</strong>s those are associated with a large number<strong>of</strong> complexes (higher prote<strong>in</strong> complex number) have higherexpression levels (Chakraborty et al. 2010). Once we determ<strong>in</strong>edthe average expression level <strong>of</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s, we were able to show that attachment prote<strong>in</strong>s havea higher average expression level compared with that <strong>of</strong> coreprote<strong>in</strong>s (fig. 6a <strong>and</strong> b), <strong>and</strong> a strong significant positivecorrelation exists between prote<strong>in</strong> complex number <strong>and</strong>expression level (table 2). Moreover, we found that complexnumber <strong>and</strong> the expression level <strong>in</strong>dependently affectevolutionary rates (multivariate regression analysis:b expression level ¼ 0.0601, P ¼ 8.1 10 5 ; b complex number ¼0.1012, P ¼ 3.2 10 11 ), which is consistent with our previousstudy (Chakraborty et al. 2010). Accord<strong>in</strong>g toDrummond et al. (2005), the expression level <strong>of</strong> a prote<strong>in</strong><strong>in</strong>fluences the rates <strong>of</strong> prote<strong>in</strong> evolution through the translationevents. Furthermore, we know that prote<strong>in</strong> expressionlevel can be <strong>in</strong>creased by three possible steps by 1) <strong>in</strong>creas<strong>in</strong>gtranscription rate, 2) <strong>in</strong>creas<strong>in</strong>g translational rate, or 3) <strong>in</strong>creas<strong>in</strong>gboth transcription <strong>and</strong> translation rate. Here, we analyzedthe transcription rate <strong>and</strong> translation rate <strong>in</strong> the whole prote<strong>in</strong>complex data set <strong>and</strong> observed that the transcription rate <strong>and</strong>translation rate <strong>in</strong>creases with the <strong>in</strong>crement <strong>of</strong> prote<strong>in</strong> complexnumber (table 2). We also found that attachment prote<strong>in</strong>sshowed higher values <strong>of</strong> average transcription <strong>and</strong>translation rates than core prote<strong>in</strong>s (for Holstege data set: averagetranscription rate <strong>of</strong> core prote<strong>in</strong>s: 3.8349 (þ0.4394)(N ¼ 335), attachment prote<strong>in</strong>s: 20.9643 (þ2.0421)(N ¼ 322); Mann–Whitney U test, P ¼ 9.6 10 16 ; for Millerdata set: average transcription rate <strong>of</strong> core prote<strong>in</strong>s: 22.1872(þ0.9876) (N ¼ 329), attachment prote<strong>in</strong>s: 51.9403 (þ3.0335)(N ¼ 320); Mann–Whitney U test, P ¼ 6.9 10 16 ;<strong>and</strong>forArava data set: average translation rate <strong>of</strong> core prote<strong>in</strong>s:0.1893 (þ0.0296) (N ¼ 331), attachment prote<strong>in</strong>s: 0.9446(þ0.1325) (N ¼ 328); Mann–Whitney U test, P ¼ 5.8 10 9 ).Therefore, it is clear that prote<strong>in</strong>s with higher complex number<strong>in</strong>crease their transcription <strong>and</strong> translation rates, <strong>and</strong> this<strong>in</strong>creased transcription <strong>and</strong> translation rates enhance theoverall prote<strong>in</strong> expression levels. F<strong>in</strong>ally, the elevated expressionlevels decrease the evolutionary rates (Spearman’s r evolutionaryrate vs. expression level_Holstage ¼ 0.4894, P ¼ 1.0 10 6 ;Spearman’s r evolutionary rate vs. expression level_Miller ¼ 0.4857,P ¼ 1.0 10 6 ) due to avoidance <strong>of</strong> mistranslational-<strong>in</strong>ducedprote<strong>in</strong> misfold<strong>in</strong>g (Drummond et al. 2005, 2006; Drummond<strong>and</strong> Wilke 2008). Thus, when we controlled for expressionlevels, we did not obta<strong>in</strong> any significant difference <strong>in</strong> therates <strong>of</strong> prote<strong>in</strong> evolution between core <strong>and</strong> attachment prote<strong>in</strong>s(fig. 7).DiscussionIt has been reported that complex-form<strong>in</strong>g prote<strong>in</strong>s evolveslower than the noncomplex form<strong>in</strong>g prote<strong>in</strong>s (Teichmann2002), <strong>and</strong> we have also observed the same <strong>in</strong> yeast(Chakraborty et al. 2010). Moreover, with<strong>in</strong> the complexform<strong>in</strong>gprote<strong>in</strong>s, the subunits those are associated with alarge number <strong>of</strong> prote<strong>in</strong> complexes evolve slower than theprote<strong>in</strong>s associated with a lower number <strong>of</strong> prote<strong>in</strong> complexesDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 20141372 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013
<strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> Yeast Prote<strong>in</strong> ComplexesGBEFIG. 7.—Average evolutionary rates (dN/dS) <strong>of</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s <strong>in</strong> different b<strong>in</strong>s. The statistical comparison performed bytwo-tailed Mann–Whitney U test.FIG.6.—(a) Average expression level (us<strong>in</strong>g the Holstege et al. [1998]data set) <strong>of</strong> core <strong>and</strong> attachment prote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparisonperformed by two-tailed Mann–Whitney U test. (b) Average expressionlevel (us<strong>in</strong>g the Miller et al. [2011] data set) <strong>of</strong> core <strong>and</strong> attachmentprote<strong>in</strong>s <strong>in</strong> yeast. The statistical comparison performed by two-tailedMann–Whitney U test.(Chakraborty et al. 2010; Das et al. 2013). The prote<strong>in</strong> complexnumber showed a strong positive correlation with geneexpression levels <strong>and</strong> negative correlation with evolutionaryrates, suggest<strong>in</strong>g that prote<strong>in</strong> complex number is a significantfactor modulat<strong>in</strong>g prote<strong>in</strong> evolutionary rates. In this work, weutilized the yeast prote<strong>in</strong> complex data set (Gav<strong>in</strong> et al. 2006)for the classification perta<strong>in</strong><strong>in</strong>g to core <strong>and</strong> attachmentprote<strong>in</strong>s <strong>and</strong> analyzed their evolutionary rates to address the<strong>in</strong>fluence <strong>of</strong> gene dispensability, prote<strong>in</strong> multifunctionality,prote<strong>in</strong> connectivity, prote<strong>in</strong> complex number, <strong>and</strong> geneexpression level on prote<strong>in</strong> evolutionary rates.Analysis <strong>of</strong> complex form<strong>in</strong>g versus noncomplex prote<strong>in</strong>sdemonstrates that prote<strong>in</strong>s <strong>in</strong> complexes have higheressentiality than the noncomplex prote<strong>in</strong>s (average essentiality<strong>of</strong> complex prote<strong>in</strong>s: 0.1665 (þ0.0045) (N ¼ 867), noncomplexprote<strong>in</strong>s: 0.1002 (þ0.0014) (N ¼ 3,348); Mann–WhitneyU test, P ¼ 1.1 10 65 ). With<strong>in</strong> the complex-form<strong>in</strong>g prote<strong>in</strong>s,core prote<strong>in</strong>s were reported to be crucial for the function<strong>of</strong> the prote<strong>in</strong> complex assembly (Dezso et al. 2003; Gav<strong>in</strong>et al. 2006), <strong>in</strong>dicat<strong>in</strong>g that core prote<strong>in</strong>s contribute moretoward fitness <strong>of</strong> an organism. Moreover, it has been reportedthat essential genes are associated with multifunctionalfeatures <strong>of</strong> genes (Liao et al. 2006). Therefore, we measuredmultifunctionality <strong>of</strong> core <strong>and</strong> attachment prote<strong>in</strong>s <strong>and</strong> foundthat core prote<strong>in</strong>s are associated with a higher number <strong>of</strong>biological processes compared with attachment prote<strong>in</strong>s.However, <strong>in</strong> our data set, we did not observe any significantdifference <strong>in</strong> the essentiality between core <strong>and</strong> attachmentprote<strong>in</strong>s. These results <strong>in</strong>dicate that the essentiality <strong>of</strong> agiven gene or prote<strong>in</strong> does not depend only on its multifunctionality,perhaps there are some other parameters which <strong>in</strong>fluencethe gene/prote<strong>in</strong> essentiality. In this study, weobserved that attachments prote<strong>in</strong>s are present <strong>in</strong> more complexassemblies than core prote<strong>in</strong>s, which may <strong>in</strong>crease theiressentiality s<strong>in</strong>ce they modify or enhance the activity <strong>of</strong> a largenumber <strong>of</strong> complex assemblies. Therefore, one can reasonablyassume that prote<strong>in</strong> essentiality <strong>and</strong> prote<strong>in</strong> complex numberare <strong>in</strong>terrelated to each other. To verify our hypothesis, wedeterm<strong>in</strong>ed the correlation between gene essentiality <strong>and</strong> prote<strong>in</strong>complex number <strong>in</strong> our data set, <strong>and</strong> <strong>in</strong>deed, we founda strong positive association between them (Spearman’sr prote<strong>in</strong> complex number vs. essentiality ¼ 0.2681, P ¼ 1.0 10 6 ),which contradicts the results <strong>of</strong> Pache et al. (2009) that theessentiality does not depend on the prote<strong>in</strong> complex number.Thus both multifunctionality <strong>and</strong> prote<strong>in</strong> complex number <strong>in</strong>fluencethe gene/prote<strong>in</strong> essentiality. We have obta<strong>in</strong>ed a similarproportion <strong>of</strong> essential prote<strong>in</strong>s <strong>in</strong> both core <strong>and</strong>attachment prote<strong>in</strong>s suggest<strong>in</strong>g that core prote<strong>in</strong>s executehigher multifunctionality, whereas attachment prote<strong>in</strong>s havea higher prote<strong>in</strong> complex number. These results are consistentwith the recent study on fitness effect <strong>and</strong> prote<strong>in</strong> complexthat core <strong>and</strong> attachment prote<strong>in</strong>s are equally important forthefunction<strong>of</strong>anorganism(Pache et al. 2009). <strong>Evolutionary</strong>rate differences between core <strong>and</strong> attachment prote<strong>in</strong>s can be<strong>in</strong>terpreted <strong>in</strong> the framework <strong>of</strong> neutralist/selectionist controversy.<strong>Core</strong> prote<strong>in</strong>s evolve faster than attachment prote<strong>in</strong>s,although core prote<strong>in</strong>s are more multifunctional than attachmentprote<strong>in</strong>s, <strong>and</strong> these results are not compatible with theneutral theory <strong>of</strong> molecular evolution. However, attachmentprote<strong>in</strong>s hav<strong>in</strong>g higher expression levels should have higherimpact on essentiality, <strong>and</strong> thus the slowly evolv<strong>in</strong>g nature <strong>of</strong>Downloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013 1373
Chakraborty <strong>and</strong> GhoshGBEattachment prote<strong>in</strong>s than core prote<strong>in</strong>s can apparently beaccommodated <strong>in</strong> neutral theory. Moreover, we demonstratedthat core prote<strong>in</strong>s have higher multifunctionality <strong>and</strong>also have higher impact on gene/prote<strong>in</strong> essentiality. Indeed,we found no significant difference <strong>in</strong> gene essentialitybetween core <strong>and</strong> attachment prote<strong>in</strong>s. However, we observedsignificant evolutionary rate differences between core<strong>and</strong> attachment prote<strong>in</strong>s, which are <strong>in</strong>consistent with the neutraltheory <strong>of</strong> molecular evolution. Recently, Razeto-Barry et al.(2011) predicted that <strong>in</strong> a selective scenario, more multifunctionalprote<strong>in</strong>s should evolve faster than the less multifunctionalprote<strong>in</strong>s when there is no difference <strong>in</strong> the size <strong>of</strong> fitnesseffects. They also conclude that a higher number <strong>of</strong> mutationshave been fixed <strong>in</strong> more multifunctional prote<strong>in</strong>s by positiveselection. Our results confirm the prediction <strong>of</strong> Razeto-Barryet al. (2011).In this study, we also observed that prote<strong>in</strong> complexnumber has emerged as an important parameter expla<strong>in</strong><strong>in</strong>gthe differences <strong>in</strong> evolutionary rates between core <strong>and</strong> attachmentprote<strong>in</strong>s. However, attachment prote<strong>in</strong>s are lessmultifunctional, but they participate <strong>in</strong> a high number <strong>of</strong> prote<strong>in</strong>complexes <strong>and</strong> thus their transcription rates <strong>and</strong> translationrates <strong>in</strong>crease. The <strong>in</strong>creased transcription rate <strong>and</strong>translation rate <strong>in</strong> turn enhances the overall prote<strong>in</strong> expressionlevels <strong>and</strong> the expression level imposes a strong selection pressureon attachment prote<strong>in</strong>s. Previously, Drummond et al.(2005) also demonstrated that the majority <strong>of</strong> the variation<strong>in</strong> prote<strong>in</strong> evolution is ma<strong>in</strong>ly guided by the expression level.This is further confirmed by our study, when we control theexpression level or complex number, the difference <strong>in</strong> evolutionaryrates between core <strong>and</strong> attachment prote<strong>in</strong>s disappears.Hence, from our analysis, we showed that the rates<strong>of</strong> prote<strong>in</strong> evolution between core <strong>and</strong> attachment prote<strong>in</strong>sare ma<strong>in</strong>ly guided by prote<strong>in</strong> complex number <strong>and</strong> expressionlevel. To f<strong>in</strong>d out the relative <strong>in</strong>fluence <strong>in</strong> evolutionary rates,we performed a multivariate regression analysis <strong>and</strong> foundthat both expression level <strong>and</strong> prote<strong>in</strong> complex number <strong>in</strong>dependentlycontrol the evolutionary rate but essentiality did notshow any significant contribution.Supplementary MaterialSupplementary tables S1–S3 areavailableatGenome Biology<strong>and</strong> Evolution onl<strong>in</strong>e (http://www.gbe.oxfordjournals.org/).AcknowledgmentsThis work was supported by the Department <strong>of</strong> Biotechnology,Government <strong>of</strong> India <strong>and</strong> Department <strong>of</strong> Science <strong>and</strong>Technology, Government <strong>of</strong> India (Sanction No. BT/PR692/BID/7/369/2011). The authors are thankful to Pr<strong>of</strong>essorGiorgio Bernardi, Department <strong>of</strong> Biology, University Rome 3,Viale Marconi 446, Rome 00146, Italy, <strong>and</strong> Dr Bratati Kahali,Department <strong>of</strong> Internal Medic<strong>in</strong>e, Division <strong>of</strong> Gastroenterology<strong>and</strong> Department <strong>of</strong> Computational Medic<strong>in</strong>e <strong>and</strong>Bio<strong>in</strong>formatics, University <strong>of</strong> Michigan Medical School, AnnArbor, MI 48109 for critical read<strong>in</strong>g <strong>of</strong> the manuscript. Theyare also thankful to the two anonymous reviewers for theirvaluable comments to improve the manuscript.Literature CitedAkashi H. 2001. Gene expression <strong>and</strong> molecular evolution. Curr Op<strong>in</strong>Genet Dev. 11:660–666.Altschul SF, et al. 1997. Gapped BLAST <strong>and</strong> PSI-BLAST: a new generation<strong>of</strong> prote<strong>in</strong> database search programs. Nucleic Acids Res. 25:3389–3402.Alvarez-Ponce D, Fares MA. 2012. <strong>Evolutionary</strong> rate <strong>and</strong> duplicability <strong>in</strong> thearabidopsis thaliana prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teraction network. Genome BiolEvol. 4:1263–1274.Arava Y, et al. 2003. Genome-wide analysis <strong>of</strong> mRNA translation pr<strong>of</strong>iles <strong>in</strong>Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 100:3889–3894.Batada NN, Hurst LD, Tyers M. 2006. <strong>Evolutionary</strong> <strong>and</strong> physiologicalimportance <strong>of</strong> hub prote<strong>in</strong>s. PLoS Comput Biol. 2:748–756.Batagelj V, Mrvar A. 2004. Pajek—analysis <strong>and</strong> visualization <strong>of</strong> large networks.In: Jünger M, Mutzel P, editors. Graph draw<strong>in</strong>g s<strong>of</strong>tware. Berl<strong>in</strong>Heidelberg (Germany): Spr<strong>in</strong>ger. p. 77–103.Benschop JJ, et al. 2010. A consensus <strong>of</strong> core prote<strong>in</strong> complex compositionsfor Saccharomyces cerevisiae. Mol Cell. 38:916–928.Camon E, et al. 2004. The Gene Ontology Annotation (GOA) Database:shar<strong>in</strong>g knowledge <strong>in</strong> Uniprot with Gene Ontology. Nucleic Acids Res.32:D262–D266.Chakraborty S, Kahali B, Ghosh TC. 2010. Prote<strong>in</strong> complex form<strong>in</strong>g abilityis favored over the features <strong>of</strong> <strong>in</strong>teract<strong>in</strong>g partners <strong>in</strong> determ<strong>in</strong><strong>in</strong>g theevolutionary rates <strong>of</strong> prote<strong>in</strong>s <strong>in</strong> the yeast prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teractionnetworks. BMC Syst Biol. 4:155.Chakraborty S, et al. 2011. Insights <strong>in</strong>to eukaryotic <strong>in</strong>teract<strong>in</strong>g prote<strong>in</strong>evolution. In: Pontarotti P, editor. <strong>Evolutionary</strong> biology: concepts, biodiversity,macroevolution <strong>and</strong> genome evolution. Berl<strong>in</strong> Heidelberg(Germany): Spr<strong>in</strong>ger. p. 51–70.Chen W-H, M<strong>in</strong>guez P, Lercher MJ, Bork P. 2012. OGEE: an onl<strong>in</strong>e geneessentiality database. Nucleic Acids Res. 40:901–906.Cherry JM, et al. 2012. Saccharomyces Genome Database: the genomicsresource <strong>of</strong> budd<strong>in</strong>g yeast. Nucleic Acids Res. 40:D700–D705.Das J, Chakraborty S, Podder S, Ghosh TC. 2013. Complex-form<strong>in</strong>g prote<strong>in</strong>sescape the robust regulations <strong>of</strong> miRNA <strong>in</strong> human. FEBS Lett.587:2284–2287.Deane CM, Salw<strong>in</strong>ski L, Xenarios I, Eisenberg D. 2002. Prote<strong>in</strong><strong>in</strong>teractions—two methods for assessment <strong>of</strong> the reliability <strong>of</strong> highthroughput observations. Mol Cell Proteomics. 1:349–356.Dezso Z, Oltvai ZN, Barabási AL. 2003. Bio<strong>in</strong>formatics analysis <strong>of</strong> experimentallydeterm<strong>in</strong>ed prote<strong>in</strong> complexes <strong>in</strong> the yeast Saccharomycescerevisiae. Genome Res. 13:2450–2454.Dolan ME, Ni L, Camon E, Blake JA. 2005. A procedure for assess<strong>in</strong>g GOannotation consistency. Bio<strong>in</strong>formatics 21:I136–I143.Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. 2005. Whyhighly expressed prote<strong>in</strong>s evolve slowly. Proc Natl Acad Sci U S A. 102:14338–14343.Drummond DA, Raval A, Wilke CO. 2006. A s<strong>in</strong>gle determ<strong>in</strong>ant dom<strong>in</strong>atesthe rate <strong>of</strong> yeast prote<strong>in</strong> evolution. Mol Biol Evol. 23:327–337.Drummond DA, Wilke CO. 2008. Mistranslation-<strong>in</strong>duced prote<strong>in</strong> misfold<strong>in</strong>gas a dom<strong>in</strong>ant constra<strong>in</strong>t on cod<strong>in</strong>g-sequence evolution. Cell 134:341–352.Fisher RA. 1930. The genetical theory <strong>of</strong> natural selection. Oxford:Clarendon Press.Flicek P, et al. 2011. Ensembl 2011. Nucleic Acids Res. 39:D800–D806.Downloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 20141374 Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013
<strong>Evolutionary</strong> <strong>Rate</strong>s <strong>of</strong> Yeast Prote<strong>in</strong> ComplexesGBEFraser HB, Hirsh AE, Ste<strong>in</strong>metz LM, Scharfe C, Feldman MW. 2002.<strong>Evolutionary</strong> rate <strong>in</strong> the prote<strong>in</strong> <strong>in</strong>teraction network. Science 296:750–752.Fraser HB, Wall DP, Hirsh AE. 2003. A simple dependence between prote<strong>in</strong>evolution rate <strong>and</strong> the number <strong>of</strong> prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teractions. BMCEvol Biol. 3:11.Gav<strong>in</strong> AC, et al. 2006. Proteome survey reveals modularity <strong>of</strong> the yeast cellmach<strong>in</strong>ery. Nature 440:631–636.Guldener U, et al. 2005. CYGD: the comprehensive Yeast GenomeDatabase. Nucleic Acids Research 33:D364–D368.Hahn MW, Kern AD. 2005. Comparative genomics <strong>of</strong> centrality <strong>and</strong>essentiality <strong>in</strong> three eukaryotic prote<strong>in</strong>-<strong>in</strong>teraction networks. Mol BiolEvol. 22:803–806.Hart GT, Lee I, Marcotte ER. 2007. A high-accuracy consensus map <strong>of</strong>yeast prote<strong>in</strong> complexes reveals modular nature <strong>of</strong> gene essentiality.BMC Bio<strong>in</strong>formatics 8:236.Hirsh AE, Fraser HB. 2001. Prote<strong>in</strong> dispensability <strong>and</strong> rate <strong>of</strong> evolution.Nature 411:1046–1049.Holstege FCP, et al. 1998. Dissect<strong>in</strong>g the regulatory circuitry <strong>of</strong> a eukaryoticgenome. Cell 95:717–728.Hurst LD, Smith NGC. 1999. Do essential genes evolve slowly? Curr Biol. 9:747–750.Jordan IK, Rogoz<strong>in</strong> IB, Wolf YI, Koon<strong>in</strong> EV. 2002. Essential genes are moreevolutionarily conserved than are nonessential genes <strong>in</strong> bacteria.Genome Res. 12:962–968.Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. 2012. KEGG for<strong>in</strong>tegration <strong>and</strong> <strong>in</strong>terpretation <strong>of</strong> large-scale molecular data sets.Nucleic Acids Res. 40:D109–D114.Kellis M, Patterson N, Endrizzi M, Birren B, L<strong>and</strong>er ES. 2003. Sequenc<strong>in</strong>g<strong>and</strong> comparison <strong>of</strong> yeast species to identify genes <strong>and</strong> regulatoryelements. Nature 423:241–254.Kerrien S, et al. 2012. The IntAct molecular <strong>in</strong>teraction database <strong>in</strong> 2012.Nucleic Acids Res. 40:D841–D846.Kimura M. 1983. The neutral theory <strong>of</strong> molecular evolution. Cambridge:Cambridge University Press.Kimura M, Ohta T. 1974. Some pr<strong>in</strong>ciples govern<strong>in</strong>g molecular evolution.Proc Natl Acad Sci U S A. 71:2848–2852.Lark<strong>in</strong> MA, et al. 2007. Clustal W <strong>and</strong> clustal X version 2.0. Bio<strong>in</strong>formatics23:2947–2948.Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. 2005. Evolution <strong>of</strong>prote<strong>in</strong>s <strong>and</strong> gene expression levels are coupled <strong>in</strong> Drosophila <strong>and</strong> are<strong>in</strong>dependently associated with mRNA abundance, prote<strong>in</strong> length, <strong>and</strong>number <strong>of</strong> prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teractions. Mol Biol Evol. 22:1345–1354.Liao B-Y, Scott NM, Zhang J. 2006. Impacts <strong>of</strong> gene essentiality, expressionpattern, <strong>and</strong> gene compactness on the evolutionary rate <strong>of</strong> mammalianprote<strong>in</strong>s. Mol Biol Evol. 23:2072–2080.Licata L, et al. 2012. MINT, the molecular <strong>in</strong>teraction database: 2012update. Nucleic Acids Res. 40:D857–D861.Mak<strong>in</strong>o T, Gojobori T. 2006. The evolutionary rate <strong>of</strong> a prote<strong>in</strong> is <strong>in</strong>fluencedby features <strong>of</strong> the <strong>in</strong>teract<strong>in</strong>g partners. Mol Biol Evol. 23:784–789.Mete M, Tang F, Xu X, Yuruk N. 2008. A structural approach for f<strong>in</strong>d<strong>in</strong>gfunctional modules from large biological networks. BMCBio<strong>in</strong>formatics 9:S19.Mewes HW, et al. 2011. MIPS: curated databases <strong>and</strong> comprehensivesecondary data resources <strong>in</strong> 2010. Nucleic Acids Res. 39:D220–D224.Miller C, et al. 2011. Dynamic transcriptome analysis measures rates <strong>of</strong>mRNA synthesis <strong>and</strong> decay <strong>in</strong> yeast. Mol Syst Biol. 7:458.Naumov G, Naumova ES, Lantto RA, Louis EJ, Korhola M. 1992. Genetichomology between Saccharomyces cerevisiae <strong>and</strong> its sibl<strong>in</strong>gspecies Saccharomyces paradoxus <strong>and</strong> Saccharomyces bayanus—electrophoretic karyotypes. Yeast 8:599–612.Nie NH, Bent DH, Hull CH. 1970. SPSS: statistical package for the socialsciences. NewYork: McGraw-Hill.Pache RA, Babu MM, Aloy P. 2009. Exploit<strong>in</strong>g gene deletion fitness effects<strong>in</strong> yeast to underst<strong>and</strong> the modular architecture <strong>of</strong> prote<strong>in</strong> complexesunder different growth conditions. BMC Syst Biol. 3:74.Pál C, Papp B, Lercher MJ. 2006. An <strong>in</strong>tegrated view <strong>of</strong> prote<strong>in</strong> evolution.Nat Rev Genet. 7:337–348.Pang CNI, Krycer JR, Lek A, Wilk<strong>in</strong>s MR. 2008. Are prote<strong>in</strong>complexes made <strong>of</strong> cores, modules <strong>and</strong> attachments? Proteomics 8:425–434.ParkYR,KimJ,LeeHW,YoonYJ,KimJH.2011.GOChase-II:correct<strong>in</strong>gsemantic <strong>in</strong>consistencies from gene ontology-based annotations forgene products. BMC Bio<strong>in</strong>formatics 12:S40.Pereira-Leal JB, Levy ED, Teichmann SA. 2006. The orig<strong>in</strong>s <strong>and</strong> evolution <strong>of</strong>functional modules: lessons from prote<strong>in</strong> complexes. Philos Trans RSoc Lond B Biol Sci. 361:507–517.Podder S, Mukhopadhyay P, Ghosh TC. 2009. Multifunctionality dom<strong>in</strong>antlydeterm<strong>in</strong>es the rate <strong>of</strong> human housekeep<strong>in</strong>g <strong>and</strong> tissue specific<strong>in</strong>teract<strong>in</strong>g prote<strong>in</strong> evolution. Gene 439:11–24.Popescu CE, Borza T, Bielawski JP, Lee RW. 2006. <strong>Evolutionary</strong> rates <strong>and</strong>expression level <strong>in</strong> Chlamydomonas. Genetics 172:1567–1576.Pu S, Vlasblom J, Emili A, Greenblatt J, Wodak SJ. 2007. Identify<strong>in</strong>gfunctional modules <strong>in</strong> the physical <strong>in</strong>teractome <strong>of</strong> Saccharomycescerevisiae. Proteomics 7:944–960.Qiu J, Noble WS. 2008. Predict<strong>in</strong>g co-complexed prote<strong>in</strong> pairs fromheterogeneous data. PLoS Comput Biol. 4:e1000054.Razeto-Barry P, Diaz J, Cotoras D, Vasquez RA. 2011. Molecular evolution,mutation size <strong>and</strong> gene pleiotropy: a geometric reexam<strong>in</strong>ation.Genetics 187:877–885.Rokotomalala R. 2005. TANAGRA: a free s<strong>of</strong>tware for research <strong>and</strong> academicpurposes. Advances <strong>in</strong> grid comput<strong>in</strong>g. EGC 2005. In:Proceed<strong>in</strong>gs <strong>of</strong> European Grid Conference; 2005 Feb 14–16;Amsterdam, The Netherl<strong>and</strong>s. Berl<strong>in</strong> (Germany): Spr<strong>in</strong>ger. Vol. 2,p. 697–702.Salathé M, Ackermann M, Bonhoeffer S. 2006. The effect <strong>of</strong> multifunctionalityon the rate <strong>of</strong> evolution <strong>in</strong> yeast. Mol Biol Evol. 23:721–722.Salw<strong>in</strong>ski L, et al. 2004. The Database <strong>of</strong> Interact<strong>in</strong>g <strong>Prote<strong>in</strong>s</strong>: 2004update. Nucleic Acids Res. 32:D449–D451.Semple JI, Vavouri T, Lehner B. 2008. A simple pr<strong>in</strong>ciple concern<strong>in</strong>g therobustness <strong>of</strong> prote<strong>in</strong> complex activity to changes <strong>in</strong> gene expression.BMC Syst Biol. 2:1.Ste<strong>in</strong>metz LM, et al. 2002. Systematic screen for human disease genes <strong>in</strong>yeast. Nat Genet. 31:400–404.Su Z, Zeng Y, Gu X. 2010. A prelim<strong>in</strong>ary analysis <strong>of</strong> gene pleiotropyestimated from prote<strong>in</strong> sequences. J Exp Zool B Mol Dev Evol. 314:115–122.Subramanian S, Kumar S. 2004. Gene expression <strong>in</strong>tensity shapes evolutionaryrates <strong>of</strong> the prote<strong>in</strong>s encoded by the vertebrate genome.Genetics 168:373–381.Teichmann SA. 2002. The constra<strong>in</strong>ts prote<strong>in</strong>-prote<strong>in</strong> <strong>in</strong>teractions place onsequence divergence. J Mol Biol. 324:399–407.Wilson AC, Carlson SS, White TJ. 1977. Biochemical evolution. Annu RevBiochem. 46:573–639.Yang L, Gaut BS. 2011. Factors that contribute to variation <strong>in</strong> evolutionaryrate among Arabidopsis genes. Mol Biol Evol. 28:2359–2369.Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. MolBiol Evol. 24:1586–1591.Yang ZH, Nielsen R. 2000. Estimat<strong>in</strong>g synonymous <strong>and</strong> nonsynonymoussubstitution rates under realistic evolutionary models. Mol Biol Evol.17:32–43.Associate editor: Takashi GojoboriDownloaded from http://gbe.oxfordjournals.org/ by guest on August 10, 2014Genome Biol. Evol. 5(7):1366–1375. doi:10.1093/gbe/evt096 Advance Access publication June 27, 2013 1375