
Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks

Koon-Kiu Yan^a, Gang Fang^a, Nitin Bhardwaj^a, Roger P. Alexander^a, and Mark Gerstein^b,a,c,1

^b Program in Computational Biology and Bioinformatics, ^a Department of Molecular Biophysics and Biochemistry, and ^c Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520

Edited* by Gregory A. Petsko, Brandeis University, Waltham, MA, and approved April 2, 2010 (received for review December 20, 2009)

The genome has often been called the operating system (OS) for a living organism. A computer OS is described by a regulatory control network termed the call graph, which is analogous to the transcriptional regulatory network in a cell. To apply our firsthand knowledge of the architecture of software systems to understand cellular design principles, we present a comparison between the transcriptional regulatory network of a well-studied bacterium (Escherichia coli) and the call graph of a canonical OS (Linux) in terms of topology and evolution. We show that both networks have a fundamentally hierarchical layout, but there is a key difference: The transcriptional regulatory network possesses a few global regulators at the top and many targets at the bottom; conversely, the call graph has many regulators controlling a small set of generic functions. This top-heavy organization leads to highly overlapping functional modules in the call graph, in contrast to the relatively independent modules in the regulatory network. We further develop a way to measure evolutionary rates comparably between the two networks and explain this difference in terms of network evolution. The process of biological evolution via random mutation and subsequent selection tightly constrains the evolution of regulatory network hubs. The call graph, however, exhibits rapid evolution of its highly connected generic components, made possible by designers’ continual fine-tuning. These findings stem from the design principles of the two systems: robustness for biological systems and cost effectiveness (reuse) for software systems.

systems biology ∣ adaptive complex systems

Complex systems are characterized by interactions among huge numbers of heterogeneous constituents. In particular, many complex systems are adaptive, meaning the interconnections are shaped progressively by a changing environment. The driving forces of adaptation are common design principles such as the reduction of cost and the enhancement of system robustness (1). Optimal solutions are determined by trade-offs between conflicting principles and therefore vary from system to system. Over the past decade, the study of networks has emerged as an interdisciplinary research field aiming to discover the underlying principles of complex systems and to develop tools or algorithms for analyzing them. By capturing the interconnections between individual components, networks not only serve as backbones to study the emergent properties of complex systems, but they also provide an abstract framework that facilitates the cross-disciplinary comparison of different adaptive complex systems, ranging from biological systems to technological ones (2).

Cross-disciplinary comparison between biological systems and commonplace systems such as organization hierarchies (3, 4) and engineering devices should be of particular interest to systems biologists. Despite tremendous advancement in high-throughput experiments and computational algorithms, the study of biological systems in general still suffers from limitations in accuracy and completeness of data. Insights gained from systems to which we have direct access and which we understand thoroughly can extend our knowledge of biological ones.

Like biological systems, software systems such as a computer operating system (OS) are adaptive systems undergoing evolution. Whereas the evolution of biological systems is subject to natural selection, the evolution of software systems is under the constraints of hardware architecture and customer requirements. Since the pioneering work of Lehman (5), the evolutionary pressure on software has been studied among engineers. Interestingly enough, biological and software systems both execute information processing tasks. Whereas biological information processing is mediated by complex interactions between genes, proteins, and various small molecules, software systems exhibit a comparable level of complexity in the interconnections between functions. Understanding the structure and evolution of their underlying networks sheds light on the design principles of both natural and man-made information processing systems.

The master control plan of a cell is its transcriptional regulatory network. The transcriptional regulatory network coordinates gene expression in response to environmental and intracellular signals, resulting in the execution of cellular processes such as cell division and metabolism. Understanding how cellular control processes are orchestrated by transcription factors (TFs) is a fundamental objective of systems biology (6–9), and therefore a great deal of effort has been focused on understanding the structure and evolution of transcriptional regulatory networks. Analogous to the transcriptional regulatory network in a cell, a computer OS consists of thousands of functions organized into a so-called call graph, which is a directed network whose nodes are functions with directed edges leading from a function to each other function it calls.
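As a minimal sketch of the call-graph definition above, the directed network can be stored as an adjacency mapping built from (caller, callee) pairs. The function names in the toy edge list are hypothetical placeholders rather than data from the Linux kernel analysis, and `build_call_graph` is an illustrative helper, not the authors' tooling.

```python
from collections import defaultdict


def build_call_graph(calls):
    """Return {function: set of functions it calls} from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)  # directed edge caller -> callee
        graph[callee]              # touch the key so pure callees are nodes too
    return dict(graph)


# Hypothetical toy edge list (illustrative names only).
calls = [("main", "init"), ("main", "run"), ("init", "log"), ("run", "log")]
call_graph = build_call_graph(calls)
```

Under this representation a transcriptional regulatory network has the same shape, with (transcription factor, target gene) pairs in place of (caller, callee) pairs.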
Whereas the genome-wide transcriptional regulatory network and the call graph are static representations of all possible regulatory relationships and calls, both transcription regulation and function activation are dynamic. Different sets of transcription factors and target genes forming so-called functional modules (10) are activated at different times and in response to different environmental conditions. In the same way, complex OSs are organized into modules consisting of functions that are executed for various tasks.

Author contributions: K.-K.Y., G.F., N.B., R.P.A., and M.G. designed research; K.-K.Y. performed research; G.F., N.B., and R.P.A. contributed new reagents/analytic tools; K.-K.Y. analyzed data; and K.-K.Y. and M.G. wrote the paper.
The authors declare no conflict of interest.
*This Direct Submission article had a prearranged editor.
Freely available online through the PNAS open access option.
1 To whom correspondence should be addressed. E-mail: Mark.Gerstein@yale.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.0914771107/-/DCSupplemental.
BIOPHYSICS AND COMPUTATIONAL BIOLOGY
www.pnas.org/cgi/doi/10.1073/pnas.0914771107 PNAS Early Edition

Fig. 1. The hierarchical layout of the E. coli transcriptional regulatory network and the Linux call graph. (Left) The transcriptional regulatory network of E. coli. (Right) The call graph of the Linux kernel. Nodes are classified into three categories on the basis of their location in the hierarchy: master regulators (nodes with zero in-degree, yellow), workhorses (nodes with zero out-degree, green), and middle managers (nodes with nonzero in- and out-degree, purple). Persistent genes and persistent functions (as defined in the main text) are shown in a larger size. The majority of persistent genes are located at the workhorse level, but persistent functions are underrepresented in the workhorse level. For easy visualization of the Linux call graph, we sampled 10% of the nodes for display. Under the sampling, the relative portion of nodes in the three levels and the ratio between persistent and nonpersistent nodes are preserved compared to the original network. The entire E. coli transcriptional regulatory network is displayed.

Here we perform a one-to-one comparison between the transcriptional regulatory network of Escherichia coli and the call graph of the Linux kernel, which are both canonical systems. E. coli is one of the most well-annotated model organisms. The study of its transcriptional regulatory network has a long history (11–15). On the software side, the Linux kernel is the central component of one of the most popular and well-documented OSs.
Since its creation by Linus Torvalds in 1991, it has been continuously revised, and its count of source lines of code has increased from around 10,000 in the original version 0.01 to more than 12 million in version 2.6.33. Therefore, the two systems are ideal candidates for an in-depth cross-disciplinary comparison.

Results

Comparison of Basic Topology and Hierarchical Structure. In a directed network, the in-degree and out-degree of a node refer to the number of regulators calling the node and the number of target genes or functions called by the node, respectively. The networks of interest in this study are displayed in Fig. 1 and their key attributes are listed in Table 1. As discussed in earlier studies (3, 13), transcriptional regulatory networks exhibit a characteristic pyramidal hierarchical layout, in which there are a few master TFs on the top and most TFs are at the middle, regulating a set of non-TF target genes. We refer to these non-TF targets as workhorses (16). The existence of a hierarchical organization implies the existence of a downward information flow in response to various forms of stimuli. The Linux call graph has a similar intrinsic direction, where the chain of command starts from high-level starting functions like “main” and flows to many other downstream functions following the outgoing edges. To further investigate the structure of the two networks, we divide nodes into three categories (Fig. 1): master regulators (nodes with zero in-degree), workhorses (nodes with zero out-degree), and middle managers (nodes with nonzero in- and out-degree). Fig. 2A shows the distribution of these categories. In the E. coli transcriptional regulatory network, the fraction of workhorses is large and the top two layers each comprise less than 5% of the total number of genes. In the call graph, on the contrary, over 80% of functions are located in the upper levels of the hierarchy. In other words, unlike the conventional pyramidal hierarchy exhibited by the E. coli transcriptional regulatory network, the Linux call graph exhibits a top-heavy structure.

Table 1. Statistics of the E. coli regulatory network and the Linux call graph

                                   E. coli transcriptional regulatory network   Linux call graph
Number of nodes                    1,378                                        12,391
Number of persistent nodes         72* (5%)                                     5,120 (41%)
Number of edges                    2,967                                        33,553
Number of modules                  64                                           3,665
Number of comparative references   200 bacterial genomes                        24 versions of kernels
Years of evolution                 Billions                                     20

*In the E. coli genome 72 out of 212 persistent genes could be mapped to the transcriptional regulatory network.

The discrepancy we find in the hierarchical organization is related to the discrepancy in degree distribution. Like other complex networks such as social networks and the World Wide Web, both transcriptional regulatory networks and call graphs possess hubs, the highly connected nodes at the tail of the skewed degree distribution (17). The Linux call graph possesses in-degree hubs (nodes with many incoming edges) but no out-degree hubs (nodes with a high number of outgoing edges) (see Fig. 2B).
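The three-level classification and the degree counts described above can be sketched as follows. This is an illustrative example on a hypothetical toy network, assuming the edges are given as unique (regulator, target) pairs; the helper names are assumptions and do not come from the original study.

```python
from collections import Counter


def degrees(edges):
    """Map every node to its (in-degree, out-degree) in a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    nodes = set(in_deg) | set(out_deg)
    return {n: (in_deg[n], out_deg[n]) for n in nodes}


def classify(edges):
    """Label each node as master regulator, middle manager, or workhorse."""
    levels = {}
    for node, (n_in, n_out) in degrees(edges).items():
        if n_in == 0:
            levels[node] = "master regulator"  # zero in-degree
        elif n_out == 0:
            levels[node] = "workhorse"         # zero out-degree
        else:
            levels[node] = "middle manager"    # nonzero in- and out-degree
    return levels


def level_fractions(levels):
    """Fraction of all nodes that fall in each level."""
    counts = Counter(levels.values())
    return {level: count / len(levels) for level, count in counts.items()}


# Hypothetical top-heavy toy graph: several callers share one workhorse.
edges = [("a", "m"), ("b", "m"), ("c", "m"), ("m", "w"), ("a", "w")]
levels = classify(edges)
fractions = level_fractions(levels)
```

In this toy graph three of the five nodes are master regulators and only one is a workhorse, which mimics the top-heavy shape reported for the call graph; the in-degree of a node such as `w` is what would mark it as an in-degree hub in a larger network.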
The skewed in-degree distribution has been reported in software networks other than the Linux call graph (18). In particular, in-degree hubs in the Linux call graph are enriched at the bottom of the network hierarchy. They are workhorses called by a large number of regulators from the upper levels. In contrast, in the E. coli regulatory network, there are hubs with high out-degree but not high in-degree; i.e., no gene is regulated by many different transcription factors (see Fig. 2B). The out-degree hubs in the E. coli regulatory network regulate many workhorses at the bottom of the hierarchy.

Comparison of Functional Modules and Node Reuse. Modularity is an important concept in both biology and engineering (19). In fact, the technique of modular programming is widely employed in modern software design (20). As discussed earlier, dynamical functional modules expressed under different conditions in transcriptional regulatory networks resemble the modules of functions responsible for different computational tasks. Modules can be labeled naturally by the master regulators controlling them, because every middle manager and workhorse in the hierarchy is controlled by at least one master regulator. Modules defined in this way have been termed regulons (15) or origons (21). Specifically, we define a functional module in both call graphs and transcriptional regulatory networks as the subnetwork that consists of all the downstream nodes executed or controlled by a specific master regulator (Fig. 3A).

Many nodes can be members of several different functional modules. To quantify this phenomenon, we define the reuse of a node on the basis of the fraction of modules in the network to which it belongs. Nodes with high reuse are called generic. Unsurprisingly, we find that the in-degree hubs are executed most often and thus are more reusable than other nodes (Pearson correlation r = 0.16, P < 10^-95 for the Linux call graph, and r = 0.53, P < 10^-100 for the E. coli regulatory network). The most generic function is the well-known function “printk,” which is responsible for standard display and thus called by over 90% of functional modules. In the E. coli regulatory network, one of the most generic nodes is the outer membrane porin “ompF” that controls the diffusion of various metabolites. It is reused by 20% of the modules. Generally speaking, nodes in the Linux call graph have on average higher reuse than those in the E. coli transcriptional regulatory network (8.4% and 3.5%, respectively, P < 10^-12 in t test; see Fig. 3B). The difference is topologically attributed to the pyramidal versus top-heavy organization. The narrow base in the Linux call graph leads to a higher average reuse. Indeed, many generic functions are workhorses such as the string manipulation function “strlen.”

As shown in Fig. 3B, one of the most striking differences concerning the organization of modules in the transcriptional regulatory network and the call graph is the overlap of modules. In the Linux call graph, two randomly chosen modules overlap by more than 80%.
On the other hand, the average overlap in the E. coli transcriptional regulatory network is less than 5%. We shall discuss later how such differences in the overlap of modules play a key role in robustness and fragility of the two systems.

Comparison of Network Evolution and Node Persistence. The core components of a system are usually those that survive the evolutionary process. It is instructive to study those “survivors” in both the E. coli transcriptional regulatory network and the Linux call graph. In the Linux kernel, we focus on persistent functions, defined as those that exist in every version of software development. Persistent functions in software systems are analogous to persistent genes in biological systems, which are genes that are consistently present in a large number of genomes (22). We identified persistent functions in the Linux kernel on the basis of their appearance in all versions of the Linux source code used in this study and persistent genes in the E. coli genome by examining their distribution across a group of over 200 phylogenetically diverse bacterial genomes (see Materials and Methods for details). As shown in Fig. 1, most persistent genes in the E. coli regulatory network are workhorses: 71 out of 72 compared to 1,243 out of 1,378 for all genes (P < 10^-3 by permutation test). On the other hand, in the Linux call graph, persistent functions are present at all three levels but are significantly enriched only among the master regulators and middle managers (4,680 out of 5,120 persistent functions are master regulators and middle managers, compared

Fig. 2. Comparison of the E. coli transcriptional regulatory network and Linux call graph in terms of topology and hierarchical structure. (A) The distribution of the three categories in the E. coli transcriptional regulatory network and the Linux call graph. The transcriptional regulatory network (1,378 nodes) follows a conventional hierarchical picture, with a few top regulators and many workhorse proteins. The Linux call graph (12,391 nodes), on the other hand, possesses many regulators; the number of workhorse routines is much lower in proportion. (B) Degree distributions of the E. coli transcriptional regulatory network and the Linux call graph. The regulatory network has a broad out-degree distribution but a narrow in-degree distribution. The situation is reversed in the call graph, where we can find in-degree hubs, but the out-degree distribution is rather narrow. An out-degree hub in the E. coli regulatory network and an in-degree hub in the Linux call graph are shown.

Fig. 3. Modules in the E. coli transcriptional regulatory network and Linux call graph. (A) Definition of modules, reuse, and overlap. A module is characterized by a master regulator, with zero in-degree, and all of the nodes regulated directly or indirectly by the master regulator. Here there are three modules (M1, M2, and M3) represented by three triangles. Reuse of a node is defined as the fraction of modules to which the node belongs. This quantity is illustrated with the two labeled nodes. One is shared by M1 and M2 but not M3, and thus the reuse is 2/3. The other belongs to only M3; its reuse is therefore 1/3. The overlap between a pair of modules is defined by the size of their intersection normalized by their union. The overlap of M2 and M3 is thus 2/11. (B) Statistics of modules in the E. coli transcriptional regulatory network and the Linux call graph. The average overlap is given by the mean overlap between pairs of randomly chosen modules. Nodes in the call graph are in general more generic; i.e., they are reused by more modules.
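The definitions of module, reuse, and overlap given in the Fig. 3 caption can be sketched in a few lines. The toy hierarchy below is hypothetical and is not the network drawn in the figure, so its overlap values differ from the 2/11 quoted there. The sketch also assumes that a module contains only the nodes downstream of its master regulator (not the regulator itself), and it averages the overlap over all module pairs rather than over randomly sampled pairs.

```python
from fractions import Fraction
from itertools import combinations


def modules(graph):
    """Map each master regulator (zero in-degree) to its set of downstream nodes."""
    targets = {t for callees in graph.values() for t in callees}
    masters = [n for n in graph if n not in targets and graph[n]]
    result = {}
    for master in masters:
        seen, stack = set(), list(graph[master])
        while stack:  # iterative depth-first traversal along outgoing edges
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        result[master] = seen
    return result


def reuse(node, mods):
    """Fraction of modules to which the node belongs."""
    return Fraction(sum(node in members for members in mods.values()), len(mods))


def overlap(a, b):
    """Size of the intersection of two modules divided by the size of their union."""
    return Fraction(len(a & b), len(a | b))


def mean_overlap(mods):
    """Mean overlap over all pairs of modules (assumes at least two modules)."""
    pairs = list(combinations(mods.values(), 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy hierarchy with three master regulators M1, M2, and M3.
graph = {"M1": {"x", "s"}, "M2": {"s", "y"}, "M3": {"y", "z"},
         "x": set(), "s": set(), "y": set(), "z": set()}
mods = modules(graph)
```

Here node `s` belongs to the modules of M1 and M2 but not M3, so `reuse("s", mods)` is 2/3, while `z` belongs only to the module of M3 and has reuse 1/3, mirroring the two labeled nodes in the caption. Using `Fraction` keeps these ratios exact.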


are revised more <strong>of</strong>ten. In fact, <strong>the</strong> adaptive functions dist<strong>in</strong>guish<strong>the</strong>mselves by hav<strong>in</strong>g higher values <strong>of</strong> reuse (12.6% versus 4.4%,Wilcoxon rank-sum test P < 10 −20 ) than <strong>the</strong> conservativefunctions.DiscussionWe have presented a comparative analysis between <strong>the</strong> transcriptionalregula<strong>to</strong>ry network <strong>of</strong> E. coli and <strong>the</strong> call graph <strong>of</strong> <strong>the</strong>L<strong>in</strong>ux <strong>operat<strong>in</strong>g</strong> system and explored <strong>the</strong>ir similarities and differences<strong>in</strong> hierarchical structure, modularity <strong>of</strong> organization, andpersistence <strong>of</strong> nodes. A summary <strong>of</strong> <strong>the</strong> comparison can be found<strong>in</strong> Table 2. The two networks are shaped by different underly<strong>in</strong>gdesign pr<strong>in</strong>ciples, which are deeply connected <strong>to</strong> <strong>the</strong> <strong>in</strong>terplay between<strong>the</strong> <strong>systems</strong> and <strong>the</strong>ir environments. From a <strong>to</strong>pologicalstandpo<strong>in</strong>t, it is <strong>in</strong>trigu<strong>in</strong>g that two dist<strong>in</strong>ct evolutionary processesboth lead <strong>to</strong> <strong>the</strong> emergence <strong>of</strong> hierarchy <strong>in</strong> <strong>the</strong> control andregulation layouts, probably because hierarchy is a most effectiveway <strong>to</strong> transfer <strong>in</strong>formation and coord<strong>in</strong>ate processes. Never<strong>the</strong>less,we have observed several <strong>in</strong>tr<strong>in</strong>sic differences between <strong>the</strong>two hierarchical networks. 
To a certain extent, the presence of in-degree hub functions and the top-heavy hierarchy found in the call graph can be readily explained by common programming practices. In general, for the sake of clarity and easy debugging, programmers are encouraged to break code down into pieces and reuse certain functions; functions that are called by many others, i.e., in-degree hubs, are therefore favored. The reuse of code leads to generic functions, which also accounts for the increased overlap between modules in the Linux call graph. These programming practices are rooted in considerations of cost effectiveness. From an engineering point of view, the reuse of common nodes between modules is a cost-effective way to construct a complex system. However, such optimized usage of functions comes at the expense of robustness, because the breakdown of a generic function causes problems in many modules. More importantly, generic functions lead to potential fragility in the sense that modifying any module may require compensating changes in a generic function.
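The notion of an in-degree hub used above (a function called by many others) can be made concrete with a short Python sketch. The call list, the function names, and the threshold parameter `min_callers` are all hypothetical and serve only as an illustration; they are not taken from the Linux call graph analyzed in this study.

```python
from collections import Counter


def in_degrees(calls):
    """Count the distinct callers of each callee from (caller, callee) pairs."""
    distinct_calls = set(calls)  # repeated call sites between a pair count once
    return Counter(callee for _caller, callee in distinct_calls)


def in_degree_hubs(calls, min_callers):
    """Return, in sorted order, the functions with at least min_callers callers."""
    degrees = in_degrees(calls)
    return sorted(name for name, d in degrees.items() if d >= min_callers)


# Toy call list with invented function names (illustrative only).
TOY_CALLS = [
    ("load_config", "parse_line"),
    ("load_config", "log_error"),
    ("parse_line", "copy_text"),
    ("open_device", "log_error"),
    ("open_device", "copy_text"),
    ("write_block", "log_error"),
    ("write_block", "copy_text"),
    ("write_block", "copy_text"),  # duplicate call site, counted once
]

print(in_degree_hubs(TOY_CALLS, min_callers=3))
```

In this toy example the two utility functions are each reached from three different callers, so a fault in either one would propagate to every module that depends on it.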
Because of this fragility, generic functions have to be updated more often (as reflected by the class of rapidly revising functions in Fig. 4A). The low overlap between modules in biological networks, on the other hand, increases robustness. Modules tend to work more independently by recruiting different sets of workhorses from the broad base of the network hierarchy.

The study of persistent genes in biological networks and persistent functions in call graphs offers insight into the evolution of hierarchies. Persistent genes form the core machinery of life, the so-called paleome (23). They usually are not regulators but workhorse genes that perform vital tasks. In fact, most persistent genes are enzymes. The enrichment of persistent genes at the bottom of the regulatory hierarchy in E. coli is in accordance with the view that orthologous proteins are rather similar in function whereas regulatory changes are the main driving forces of evolution (9). To a certain extent, biological evolution is building from the bottom to the top.
In contrast, persistent functions in the Linux call graph are usually not bottom-level workhorses but “controllers.” This difference suggests that not only do software networks possess more regulators than workhorses, but the regulators are also maintained on purpose, and thus the evolution goes from top to bottom.

The trade-off between robustness and cost effectiveness in biological and software systems is deeply related to the nature of their evolutionary processes. Biological evolution is mediated by random mutations followed by natural selection; a hub protein in a biological network is in general hard to evolve because of the constraints imposed by its many interactions. This constrained evolution is evinced by the negative correlation between node centrality and evolutionary rate in biological networks (24, 25). The random mutation and selection process underlying biological evolution prohibits the frequent targeted changes required for nodes to become generic. The system is then forced to pay for maintaining a large set of specially designed components performing a variety of functions in response to environmental changes.
In contrast, engineering systems are fundamentally different. Both in-degree and betweenness centrality (26) are positively correlated with the rate of revision in the Linux call graph (see Fig. 4B for in-degree; Spearman correlation r = 0.26, P < 10^-82 for betweenness). In other words, in software engineering, a system that needs to continually adapt to new conditions is cost effective only by paying the price of constantly fine-tuning its most highly accessed functions.

Reuse is extremely common in designing man-made systems. For biological systems, to what extent they reuse their repertoires and by what means they sustain robustness at the same time are questions of much interest. It was recently proposed that the repertoire of enzymes could be viewed as the toolbox of an organism (27). As the genome of an organism grows larger, it can reuse its tools more often and thus require fewer and fewer new tools for novel metabolic tasks.
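As an illustration of the rank correlation quoted above between node centrality and rate of revision, the following Python sketch computes a Spearman coefficient using only the standard library. It is a minimal sketch under stated assumptions: the helper names are introduced here, ties receive average ranks, no P value is computed, and the in-degree and revision counts are invented toy values, not data from the Linux call graph.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy values: in-degree and number of revisions for five functions.
toy_in_degree = [1, 2, 2, 5, 9]
toy_revisions = [0, 1, 3, 2, 7]
print(spearman(toy_in_degree, toy_revisions))
```

A rank-based coefficient is a natural choice for heavy-tailed quantities such as in-degree, because it depends only on the ordering of the values and not on their magnitudes.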
Under this toolbox model, the number of enzymes grows more slowly than the number of transcription factors as the size of the genome increases. Previous studies (4) have made the related finding that as one moves towards more complex organisms, the transcriptional regulatory network has an increasingly top-heavy structure with a relatively narrow base. Thus, it may be that further analysis will demonstrate the increasing resemblance of more complex eukaryotic regulatory networks to the structure of the Linux call graph.

Table 2. One-to-one comparison between the E. coli regulatory network and the Linux call graph

| Category | Property | E. coli transcriptional regulatory network | Linux call graph |
| --- | --- | --- | --- |
| Basic properties of systems | Nodes | Genes (TFs & targets) | Functions (subroutines) |
| Basic properties of systems | Edges | Transcriptional regulation | Function calls |
| Basic properties of systems | External constraints | Natural environment | Hardware architecture, customer requirements |
| Basic properties of systems | Origin of evolutionary changes | Random mutation & natural selection | Designers’ fine-tuning |
| Hierarchical organization | Structure | Pyramidal | Top-heavy |
| Hierarchical organization | Characteristic hubs | Upper-level TFs with high out-degree | Generic workhorse functions with high in-degree |
| Organization of modules | Downstream modules as labeled by | Master TFs responsible for sensing environmental signals | High-level starting functions that initiate execution for specific tasks |
| Organization of modules | Node reuse | Low | High |
| Organization of modules | Overlap between modules | Low | High |
| Persistent nodes | Characteristics | Specialized (nongeneric) workhorses | Generic or reusable functions |
| Persistent nodes | Location in hierarchy | Mostly bottom | Mostly top |
| Persistent nodes | Evolutionary rate | Mostly conservative (e.g., dnaA) | Conservative (e.g., strlen) & adaptive (e.g., mempool_alloc) |
| Design principles | Building of hierarchy | Bottom up | Top down |
| Design principles | Optimal solution favors | Robustness | Cost effectiveness (reuse of components) |

Materials and Methods

Network Information. Data on the E. coli transcriptional regulatory network were obtained from RegulonDB (15). The largest connected component of the network consists of 1,378 genes with 2,967 interactions. Linux source code was downloaded from the Linux Kernel Archives (http://www.kernel.org). To address the evolution of the kernel, 24 stable versions were used, from 2.6.4 to 2.6.27, spanning from March 2004 to October 2008. In general, the release of a new version, say, from 2.5 to 2.6, is accompanied by major changes. We worked on the 24 releases within version 2.6 and focused on the gradual evolution exhibited in these releases.
For some of these releases, an additional patch was required in order to compile (SI Text). The source code was compiled on a MacBook with a 2 GHz Intel Core 2 Duo processor and 2 GB of memory by using the compiler GCC 3.4.6, and call graphs were extracted from the compiled code by using the tool CodeViz (release 1.0.11) by Gorman (http://www.csn.ul.ie/~mel/projects/codeviz/) (see SI Text). The network analysis presented in this study was performed on the most recent version of the Linux kernel downloaded (v. 2.6.27), in which there are 12,391 functions related by 33,553 calls. The network can be downloaded from http://networks.gersteinlab.org/callgraph.

Persistent Genes. The persistence index and the list of 212 persistent genes in E. coli K12 were obtained from ref. 22. Among them, 72 can be mapped to the largest component of the transcriptional regulatory network. We quantify conservation by the ratio of nonsynonymous to synonymous substitution rates (dN/dS) (28). The two rates were estimated by aligning E. coli K12 proteins with their orthologs from Salmonella typhimurium LT2. The list of orthologs was downloaded from the ATGC database (29). Alignment was done by using the tool PAL2NAL (30), and dN/dS values were estimated by the PAML package (31).

Persistent Functions.
A function is defined as persistent if it appears in all the compiled call graphs (v. 2.6.4 to v. 2.6.27). The list of persistent functions can be found at http://networks.gersteinlab.org/callgraph. In this definition, we do not take into account the precise changes in the code of the function. The frequency of revision for a particular function was estimated by parsing the patch files (see SI Text). A function is regarded as revised if there is any change in its code.

ACKNOWLEDGMENTS. We thank the anonymous reviewers whose valuable suggestions helped to improve the quality of the manuscript. K.-K.Y. acknowledges Lucas Lochovsky for useful discussion and critical reading of an early manuscript. K.-K.Y. acknowledges Kevin Yip for useful discussion. This work is supported by the National Institutes of Health.

1. Alon U (2007) An Introduction to Systems Biology (Chapman & Hall/CRC, London).
2. Barabási A (2002) LINKED: The New Science of Networks (Perseus, Cambridge, MA).
3. Yu H, Gerstein M (2006) Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci USA 103:14724–14731.
4. Bhardwaj N, Yan KK, Gerstein M (2010) Analysis of diverse regulatory networks in a hierarchical context shows consistent tendencies for collaboration in the middle levels. Proc Natl Acad Sci USA 107:6841–6846.
5. Lehman MM (1980) Programs, life cycles, and laws of software evolution. Proc IEEE 68:1060–1076.
6. Lee TI, et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298:799–804.
7. Bolouri H, Davidson EH (2002) Modeling transcriptional regulatory networks. Bioessays 24:1118–1129.
8. Barabási A, Oltvai ZN (2004) Network biology: Understanding the cell’s functional organization. Nat Rev Genet 5:101–113.
9. Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA (2004) Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14:283–291.
10. Luscombe NM, et al. (2004) Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431:308–312.
11. Thieffry D, Huerta AM, Perez-Rueda E, Collado-Vides J (1998) From specific gene regulation to genomic networks: A global analysis of transcriptional regulation in Escherichia coli. Bioessays 20:433–440.
12. Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31:64–68.
13. Ma H, et al. (2004) An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res 32:6643–6649.
14. Seshasayee AS, Fraser GM, Babu MM, Luscombe NM (2009) Principles of transcriptional regulation and evolution of the metabolic system in E. coli. Genome Res 19:79–91.
15. Gama-Castro S, et al. (2008) RegulonDB (version 6.0): Gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res 36:D120–124.
16. Maslov S, Sneppen K (2005) Computational architecture of the yeast regulatory network. Phys Biol 2:S94–100.
17. Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512.
18. Myers CR (2003) Software systems as complex networks: Structure, function, and evolvability of software collaboration graphs. Phys Rev E 68:046116.
19. Alon U (2003) Biological networks: The tinkerer as an engineer. Science 301:1866–1867.
20. Parnas DL (1972) On the criteria to be used in decomposing systems into modules. Commun ACM 15:1053–1058.
21. Balazsi G, Barabasi A, Oltvai ZN (2005) Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli. Proc Natl Acad Sci USA 102:7841–7846.
22. Fang G, Rocha EPC, Danchin A (2008) Persistence drives gene clustering in bacterial genomes. BMC Genomics 9:4.
23. Danchin A (2009) Bacteria as computers making computers. FEMS Microbiol Rev 33:3–26.
24. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate in the protein interaction network. Science 296:750–752.
25. Kim PM, Korbel JO, Gerstein MB (2007) Positive selection at the protein network periphery: Evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci USA 104:20274–20279.
26. Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M (2007) The importance of bottlenecks in protein networks: Correlation with gene essentiality and expression dynamics. PLoS Comput Biol 3:e59.
27. Maslov S, Krishna S, Pang TY, Sneppen K (2009) Toolbox model of evolution of prokaryotic metabolic networks and their regulation. Proc Natl Acad Sci USA 106:9743–9748.
28. Jordan IK, Rogozin IB, Wolf YI, Koonin EV (2002) Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 12:962–968.
29. Novichkov PS, Ratnere I, Wolf YI, Koonin EV, Dubchak I (2009) ATGC: A database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes. Nucleic Acids Res 37:D448–454.
30. Suyama M, Torrents D, Bork P (2006) PAL2NAL: Robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 34:W609–612.
31. Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586–1591.

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.0914771107 Yan et al.
