
Comparing genomes to computer operating systems in terms of the ...


Fig. 1. The hierarchical layout of the E. coli transcriptional regulatory network and the Linux call graph. (Left) The transcriptional regulatory network of E. coli. (Right) The call graph of the Linux Kernel. Nodes are classified into three categories on the basis of their location in the hierarchy: master regulators (nodes with zero in-degree, Yellow), workhorses (nodes with zero out-degree, Green), and middle managers (nodes with nonzero in- and out-degree, Purple). Persistent genes and persistent functions (as defined in the main text) are shown in a larger size. The majority of persistent genes are located at the workhorse level, but persistent functions are underrepresented in the workhorse level. For easy visualization of the Linux call graph, we sampled 10% of the nodes for display. Under the sampling, the relative portion of nodes in the three levels and the ratio between persistent and nonpersistent nodes are preserved compared to the original network. The entire E. coli transcriptional regulatory network is displayed.

component of one of the most popular and well-documented OSs.
Since its creation by Linus Torvalds in 1991, it has been continuously revised, and its source lines of code have increased from around 10,000 in the original version 0.01 to more than 12 million in version 2.6.33. Therefore, the two systems are ideal candidates for an in-depth cross-disciplinary comparison.

Results

Comparison of Basic Topology and Hierarchical Structure. In a directed network, the in-degree and out-degree of a node refer to the number of regulators calling the node and the number of target genes or functions called by the node, respectively. The networks of interest in this study are displayed in Fig. 1 and their key attributes are listed in Table 1. As discussed in earlier studies (3, 13), transcriptional regulatory networks exhibit a characteristic pyramidal hierarchical layout, in which there are a few master TFs on the top and most TFs are at the middle, regulating a set of non-TF target genes. We refer to these non-TF targets as workhorses (16).
The existence of a hierarchical organization implies the existence of a downward information flow in response to various forms of stimuli. The Linux call graph has a similar intrinsic direction, where the chain of command starts from high-level starting functions like “main” and flows to many other downstream functions following the outgoing edges. To further investigate the structure of the two networks, we divide nodes into three categories (Fig. 1): master regulators (nodes with zero in-degree), workhorses (nodes with zero out-degree), and middle managers (nodes with nonzero in- and out-degree). Fig. 2A shows the distribution of these categories. In the E. coli transcriptional regulatory network, the fraction of workhorses is large and the top two layers each comprise less than 5% of the total number of genes. In the call graph, on the contrary, over 80% of functions are located in the upper levels of the hierarchy. In other words, unlike the conventional pyramidal hierarchy exhibited by the E. coli transcriptional regulatory network, the Linux call graph exhibits a top-heavy structure.

Table 1. Statistics of the E. coli regulatory network and the Linux call graph

                                    E. coli transcriptional
                                    regulatory network         Linux call graph
Number of nodes                     1,378                      12,391
Number of persistent nodes          72* (5%)                   5,120 (41%)
Number of edges                     2,967                      33,553
Number of modules                   64                         3,665
Number of comparative references    200 bacterial genomes      24 versions of kernels
Years of evolution                  Billions                   20

*In the E. coli genome, 72 out of 212 persistent genes could be mapped to the transcriptional regulatory network.

The discrepancy we find in the hierarchical organization is related to the discrepancy in degree distribution. Like other complex networks such as social networks and the World Wide Web, both transcriptional regulatory networks and call graphs possess hubs, the highly connected nodes at the tail of the skewed degree distribution (17). The Linux call graph possesses in-degree hubs (nodes with many incoming edges) but no out-degree hubs (nodes with a high number of outgoing edges) (see Fig. 2B).
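The three-way classification and the degree counts behind it can be illustrated with a minimal Python sketch. It assumes the network is available as a list of (source, target) edges; the function names `classify_hierarchy` and `level_fractions` and the toy call graph are hypothetical and are not taken from the study. Self-loops, if present, would count toward both degrees, which is an assumption of this sketch rather than a rule stated in the text.

```python
from collections import Counter


def classify_hierarchy(edges):
    """Label every node of a directed network by its hierarchy level.

    edges: iterable of (source, target) pairs, e.g. (caller, callee)
    or (transcription factor, target gene).
    """
    in_deg, out_deg = Counter(), Counter()
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))

    levels = {}
    for node in nodes:
        if in_deg[node] == 0:
            levels[node] = "master regulator"   # nothing calls or regulates it
        elif out_deg[node] == 0:
            levels[node] = "workhorse"          # it calls or regulates nothing
        else:
            levels[node] = "middle manager"     # nonzero in- and out-degree
    return levels, in_deg, out_deg


def level_fractions(levels):
    """Fraction of nodes found at each hierarchy level."""
    counts = Counter(levels.values())
    return {level: count / len(levels) for level, count in counts.items()}


# Toy call graph (illustrative only).
toy_edges = [
    ("main", "parse"), ("main", "run"),
    ("parse", "log"), ("run", "log"), ("run", "alloc"),
]
levels, in_deg, out_deg = classify_hierarchy(toy_edges)
fractions = level_fractions(levels)
```

In this toy graph, `log` has the highest in-degree and sits at the workhorse level, which mirrors the in-degree hubs described for the Linux call graph.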
The skewed in-degree distribution has been reported in software networks other than the Linux call graph (18). In particular, in-degree hubs in the Linux call graph are enriched at the bottom of the network hierarchy. They are workhorses called by a large number of regulators from the upper levels. In contrast, in the E. coli regulatory network, there are hubs with high out-degree but not high in-degree; i.e., no gene is regulated by many different transcription factors (see Fig. 2B). The out-degree hubs in the E. coli regulatory network regulate many workhorses at the bottom of the hierarchy.

Comparison of Functional Modules and Node Reuse. Modularity is an important concept in both biology and engineering (19). In fact, the technique of modular programming is widely employed in modern software design (20). As discussed earlier, dynamical functional modules expressed under different conditions in transcriptional regulatory networks resemble the modules of functions responsible for different computational tasks.
Modules can be labeled naturally by the master regulators controlling them, because every middle manager and workhorse in the hierarchy is controlled by at least one master regulator. Modules defined in this way have been termed regulons (15) or origons (21). Specifically, we define a functional module in both call graphs and transcriptional regulatory networks as the subnetwork that consists of all the downstream nodes executed or controlled by a specific master regulator (Fig. 3A).

Many nodes can be members of several different functional modules. To quantify this phenomenon, we define the reuse of a node on the basis of the fraction of modules in the network to which it belongs. Nodes with high reuse are called generic. Unsurprisingly, we find that the in-degree hubs are executed most often and thus are more reusable than other nodes (Pearson correlation r = 0.16, P < 10^−95 for the Linux call graph, and

2 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.0914771107 Yan et al.
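The module and reuse definitions given above can be sketched in Python as follows. This is one possible reading under stated assumptions, not the authors' implementation: a module is taken to be the set of nodes reachable from a master regulator (the master itself is excluded, which is an assumption), and reuse is the number of modules containing a node divided by the total number of modules. The names `functional_modules` and `node_reuse` and the toy edges are hypothetical.

```python
from collections import defaultdict


def functional_modules(edges):
    """Map each master regulator to the set of its downstream nodes."""
    children = defaultdict(set)
    nodes, has_parent = set(), set()
    for src, dst in edges:
        children[src].add(dst)
        has_parent.add(dst)
        nodes.update((src, dst))

    masters = nodes - has_parent            # nodes with zero in-degree
    modules = {}
    for master in masters:
        seen, stack = set(), [master]
        while stack:                        # iterative depth-first traversal
            node = stack.pop()
            for child in children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        modules[master] = seen
    return modules


def node_reuse(modules):
    """Fraction of modules to which each downstream node belongs."""
    membership = defaultdict(int)
    for members in modules.values():
        for node in members:
            membership[node] += 1
    return {node: count / len(modules) for node, count in membership.items()}


# Toy network with two master regulators (illustrative only).
toy_edges = [
    ("main_a", "parse"), ("main_a", "log"),
    ("main_b", "log"), ("parse", "alloc"),
]
modules = functional_modules(toy_edges)
reuse = node_reuse(modules)
```

Here `log` belongs to both modules and therefore has the highest reuse, while `parse` and `alloc` belong only to the module of `main_a`.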
