
Chargaffs Legacy.pdf - Biology

D.R. Forsdyke, J.R. Mortimer / Gene 261 (2000) 127–137

profile for a genomic region was constructed. When sequences of much larger genomes became available in the 1990s, Donald Forsdyke and colleagues showed that a window of 1 kb can identify more precisely the locations of genes (open reading frames) and their transcriptional orientation (Dang et al., 1998; Bell and Forsdyke, 1999a,b; Fig. 2). Furthermore, Jean Lobry showed that the relationship to the origin of replication is a feature of several microbial genomes ('skew analysis'; Frank and Lobry, 1999; Rocha et al., 1999).

Chargaff's main interest in the base cluster phenomenon was that, prior to the emergence of nucleic acid sequencing technology, it provided some measure of the uniqueness of the base order of a nucleic acid. Szybalski's main interest was the possibility that the clustering played a role in the control of transcription. This implied an evolutionary selection pressure for clustering so that organisms with clusters would better control transcription than organisms which did not have clusters. However, following a better understanding of Chargaff's second parity rule as a reflection of nucleic acid secondary structure (see Section 3), it was recognized that in many cases a selection pressure for clustering was likely to have arisen at the post-transcriptional level.

3. The second parity rule

Chargaff's first parity rule for duplex DNA was consistent with a base on one strand of the Watson-Crick duplex requiring a complementary base on the other strand of the duplex. By extrapolation, the existence of a parity rule for single strands of nucleic acid (Chargaff's second parity rule) suggested intrastrand base pairing. At least by virtue of the composition of the stems in stem-loop secondary structures there should be an approximate equivalence between the Chargaff base pairs. Do genomes have the potential to form such secondary structures? What adaptive forces (if any) could have created them?
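The window-based analysis and the second parity rule described above can be illustrated with a minimal sketch. It assumes non-overlapping windows and normalizes each difference by the number of A, C, G and T bases in the window; the function name `window_differences` and these normalization choices are illustrative assumptions and are not taken from the cited papers. A value of `R-Y` near zero indicates that the single strand obeys the second parity rule in that window, whereas positive values indicate purine-loading of the strand examined.

```python
def window_differences(seq, window=1000):
    """Per-window base differences for ONE strand of DNA.

    Hypothetical helper: returns a list of dicts with
      'A-T': (A - T) / N   deviation from A = T on the single strand
      'G-C': (G - C) / N   deviation from G = C on the single strand
      'R-Y': (A + G - C - T) / N   purine excess over pyrimidines
    where N counts only unambiguous bases in the window.
    Windows are non-overlapping; a trailing partial window is ignored.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        a, c, g, t = (w.count(base) for base in "ACGT")
        n = a + c + g + t
        if n == 0:  # window made only of ambiguous symbols
            continue
        results.append({
            "start": start,
            "A-T": (a - t) / n,
            "G-C": (g - c) / n,
            "R-Y": (a + g - c - t) / n,
        })
    return results
```

Applied to the 'top' strand of an uncharted sequence, a run of windows with positive `R-Y` would be consistent with rightward transcription under Szybalski's rule, and a run of negative values with leftward transcription.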
These questions began to be addressed when the genomic sequences of various bacterial viruses (bacteriophages) first became available in the 1970s. It became evident that genomes contain multiple levels of information, and that some forms of information conflict with others (Grantham, 1972; Grantham et al., 1985, 1986).

Fig. 2. Szybalski's transcription direction rule, evaluated as 'Chargaff differences' (deviations from Chargaff's second parity rule). Heavy horizontal arrows refer to the 'top' and 'bottom' strands of duplex DNA. Grey boxes refer to intergenic DNA. Black balls represent RNA polymerases with thin horizontal arrows indicating the direction of transcription. In the case of leftward transcription the Chargaff difference for the top strand is in favour of pyrimidines (Y). In the case of rightward transcription the Chargaff difference for the top strand is in favour of purines (R). It follows that RNAs tend to be purine-loaded. When applied to uncharted DNA, Chargaff difference patterns provide clues to the location of genes and their transcriptional orientation.

In some phages the genome is RNA (e.g. R17, MS2), and in others the genome is DNA (e.g. T4). In classical Darwinian terms it was assumed that bacteriophages which encode optimal proteins are best adapted to their environment. Thus, the environment acting on virus proteins would have selected for survival the 'fittest' viruses whose genes encoded optimal proteins. However, it was observed that the base sequence seems to serve the needs of nucleic acid structure just as much as the protein-encoding function. Indeed, the needs of nucleic acid structure are sometimes served better than those of the protein-encoding function. Since bacteriophages do not encode rRNAs and tRNAs, the possibility arose that it was the structure of mRNAs which was of selective importance. Winston Salser noted (Salser, 1970):

"RNA phage R17 has very extensive regions of highly ordered base pairing. It has seemed likely that this might be necessary to allow phage packaging. Bernice Ricard and I were therefore somewhat surprised to find that T4 messengers [mRNAs], which do not have to be packaged, also have a very large amount of secondary structure. … Our results suggest that a high degree of secondary structure may be important in the functioning of most mRNA molecules. Because of the very high Tm's [temperature at which the 'melting' of secondary structure is half maximum] we do not think that the base pairing seen is random. The possible functions of such extensive regions of base-pairing are unknown."

In 1972 Andrew Ball (Ball, 1972, 1973a,b) went further, noting that:

"The selection pressure for specific base pairing in a messenger RNA severely limits its coding potential", … so that … "there is a pressure for some amino acid sequences to be selected according to criteria which are distinct from the structure and function of the protein they constitute."

This conclusion was supported by better algorithms for calculating RNA secondary structure (Jaeger et al., 1990), which showed that for many mRNA sequences the energetics of the folding of the natural sequence are better than those of the corresponding shuffled sequence (i.e.
one would have to shuffle and fold many times to arrive by chance at a structure approaching the stability of the natural sequence; Le and Maizel, 1989; Forsdyke, 1995a; Seffens and Digby, 1999). It appeared that 'the hand of evolution' had arranged the order of bases to support mRNA structure, sometimes at the expense of the coding function.

Although abundantly present in the cytoplasm, tRNAs and rRNAs are encoded by a relatively small part of microbial genomes. In microorganisms the sequences of mRNAs, as RNA entities, are more representative of the genome. If many mRNAs have highly significant secondary structure, then the corresponding genomic regions should also have this potential. Indeed, the primary evolutionary pressure for the elaboration of mRNA secondary structure might have been at the genomic level rather than at the mRNA level. If so, regions of a genome which are not transcribed into mRNAs might also demonstrate potential for secondary structure. When folding algorithms were applied to the sequences of individual DNA strands, it was found that there is indeed considerable potential for secondary structure. Stem-loop potential in DNA is not restricted to regions encoding mRNA (or rRNA or tRNA), but is also present in introns and in intergenic DNA. Stem-loop potential, greater than that of the corresponding shuffled sequence, is diffusely distributed throughout the genomes of all species examined (Forsdyke, 1995a,b,c; Heximer et al., 1996).

The conflict with the protein-encoding function was found to be particularly apparent in the case of genes evolving rapidly under positive Darwinian selection (Forsdyke, 1995b, 1996a). In these cases the pressure to adapt the protein sequence has been so powerful that base order has not been able to support stem-loop potential. The natural protein-encoding sequence has less stem-loop potential than the corresponding shuffled sequence. Stem-loop potential is then diverted to introns, which are more conserved than the surrounding exons.
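The logic of comparing a natural sequence with its shuffled versions can be sketched as a simple permutation test. The cited studies used energy-based folding algorithms; the sketch below does not. As a loudly labelled assumption, it substitutes a crude stand-in score (the number of k-mers whose reverse complement also occurs in the sequence, which could in principle pair to form a stem) so that the example stays self-contained. The names `stem_score` and `shuffle_test`, the choice of k = 6 and the fixed random seed are all hypothetical.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def stem_score(seq, k=6):
    """Crude proxy for stem-loop potential (NOT a folding energy).

    Counts positions whose k-mer has its reverse complement somewhere
    in the same strand. Self-complementary k-mers count as matches.
    """
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for i in range(len(seq) - k + 1)
               if revcomp(seq[i:i + k]) in kmers)


def shuffle_test(seq, n_shuffles=200, k=6, seed=0):
    """Compare the natural base order with shuffled orders.

    Shuffling keeps base composition but destroys base order.
    Returns (natural_score, p), where p estimates how often a shuffled
    sequence scores at least as high as the natural one.
    """
    rng = random.Random(seed)
    natural = stem_score(seq, k)
    bases = list(seq)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if stem_score("".join(bases), k) >= natural:
            at_least += 1
    return natural, (at_least + 1) / (n_shuffles + 1)
```

A small p suggests that base order, and not merely base composition, contributes to the pairing potential. For real work the stand-in score would be replaced by a folding energy from an established RNA or DNA folding program.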
Intron stem-loop potential is greater than in the corresponding shuffled sequence. This supports an explanation for the early origin of introns, which postulates that protein-encoding potential was imposed on prototypic genomes already supporting stem-loop potential (Forsdyke, 1995a,b,c).

At least by virtue of the stems in DNA stem-loop structures, it follows that there would have been an evolutionary pressure for single strands of DNA to have approximate equivalences of the Watson-Crick pairing bases. Thus, Chargaff's second parity rule is consistent with single strands of DNA having considerable potential for forming secondary structure. The possible adaptive value of this secondary structure at the genomic level will be discussed after first considering the adaptive value of purine-loading at the post-transcriptional level.

4. Adaptive value of purine loading

An important implication of what is now called 'Szybalski's transcription direction rule' is that RNAs, in general, tend to be purine-loaded (Figs. 1 and 2). This observation is suggested by Chargaff's early work on the base composition of total RNA from various species, but his data would then have mainly reflected the compositions of the most abundant RNA form, the ribosomal RNAs (rRNAs; Chargaff, 1951; Elson and Chargaff, 1955). When purine-loading was found to apply to RNAs in general, the possibility arose that the selection pressure for this might have arisen, not at the transcription level, but post-transcriptionally at the level of individual RNA molecules. Given that mRNAs tend (i) to be loaded with runs of purines (the cluster rule) and (ii) to have an elaborate secondary structure (consistent with Chargaff's second parity rule), where in the structures would purine clusters be found?

Since for base-pairing purine clusters must be matched with complementary pyrimidine clusters, and since pyrimidine clusters are scarce in mRNAs, purine clusters should occupy the unpaired regions of mRNA secondary structures, primarily the loops. Indeed, this is where they are found in calculated structures (Bell and Forsdyke, 1999b). Why should it be advantageous for an mRNA to 'load' its loops with purines?

A possible answer to this question derived from the studies of Jun-ichi Tomizawa and his colleagues on the way 'sense' and 'antisense' RNA molecules interact, prior to forming a double-strand duplex molecule (dsRNA; Eguchi et al., 1991; Bull et al., 1998). It was found that sense sequences in RNA search out complementary antisense sequences through 'kissing' interactions between the tips of the loops of stem-loop structures. If Watson-Crick base-pairing between the loops is achieved, the interaction may then proceed to the formation of dsRNA. Thus, if RNAs were generally purine-loaded, 'kissing' interactions would decrease, and hence the probability of forming dsRNA would decrease. How could failure to form dsRNA be of selective advantage? This will be considered in two steps. First, we consider how the tendency to purine-load might have first arisen to prevent self-RNA–self-RNA interactions. Second, we consider how cells might have further exploited this tendency, in order to recognize non-self-RNAs.

The 'distraction hypothesis' (Bell and Forsdyke, 1999b) points out that the physico-chemical state of the 'crowded' cytosol (Fulton, 1982; Forsdyke, 1995d) is likely to be highly conducive to a most fundamental interaction, that between the codons of mRNAs and anticodons at the tips of loops in tRNAs. This results in transient formation of dsRNA segments, which normally would not exceed five base pairs (Bossi and Roth, 1980).
However, a cytosolic environment which favours mRNA-tRNA kissing interactions would also favour mRNA-mRNA kissing interactions. This 'distraction' could have impeded mRNA translation and hence could have decreased the rate of protein synthesis. If, when not seriously conflicting with their coding role, mRNAs had accepted purine mutations in loop regions, then this 'purine-loading' would have diminished mRNA-mRNA interactions and increased the efficiency of protein synthesis. In some cases, so important has been the need to accept purine mutations, even the coding role appears to have been compromised (Rocha et al., 1999; Lao and Forsdyke, 2000). When this compromise is not possible, then the length of the protein can be increased by inserting simple sequence repeats of amino acids with purine-rich codons between functional domains (Cristillo et al., 1998; Forsdyke, 2001).

The presence of mechanisms to avoid inadvertent formation of extensive segments of dsRNAs as a result of interactions between self-RNAs leaves open the possibility of the evolution of mechanisms utilizing dsRNA as an intracellular alarm against not-self-RNAs. This was first recognized in the interferon response by which a cell infected by a virus communicates that fact to other cells of an organism. It was found that dsRNA was a most powerful inducer of interferon and of other antiviral adaptations (Isaacs, 1962; Ehrenfeld and Hunt, 1971; Marcus, 1983). However, Phillip Sharp (1999) noted that 'it remains a mystery how cells treated with interferon specifically suppress the translation of viral mRNAs in their cytoplasm and not cellular mRNAs'.
Thus, as with so many problems in biology, the self/not-self discrimination aspect appears central. Recently, dsRNA has been found to mediate intracellular alarms in a variety of plants and animals infected with intracellular pathogens; these pathogens, in turn, have evolved mechanisms to inactivate components of the alarm system (Voinnet et al., 1999; Fire, 1999).

Although the mechanism of dsRNA formation is still uncertain (Hamilton and Baulcombe, 1999), one explanation for how an exogenous agent is recognised as not-self and triggers the formation of dsRNA challenges our usual way of regarding the mRNA population of a cell. This population seems obviously heterogeneous because it consists of multiple mRNA species each encoding a different product, a powerful viewpoint which preempts the search for alternative explanations. However, if we consider the possibility that heterogeneity can serve more than one purpose, then we can regard the heterogeneous intracellular mRNA population in the same way as we regard the heterogeneous extracellular immunoglobulin population. The latter constitutes the first barrier against foreign pathogens ('not-self'), which select from among the immunoglobulins (which have been randomly generated and then pre-selected for non-reaction with self antigens) those with specificity for the coat antigens of the pathogen.

In the case of a virus, the coat is relinquished at the time of entry into a cell, and the cell's first possible line of intracellular defence would be to recognize the foreign nucleic acid as 'not-self', either in its genomic form, or in the form of an early transcript. This implicates dsRNA. Perhaps a segment of a particular host mRNA species, because it happened to have sufficient sequence complementarity, would form a double-stranded segment with viral RNA of sufficient length (at least two helical turns) to trigger an alarm response (Izant and Weintraub, 1984; Melton, 1985; Robertson and Mathews, 1996; Tian et al., 2000).
Thus, the heterogeneous mRNA population can be seen as consisting of 'RNA antibodies', which have been preselected over evolutionary time not to react with 'self' RNAs, while retaining, and if possible developing, the potential to react with 'not-self' RNAs. Since, due to failure of transcription termination, a low level of transcription of extra-genic DNA is possible (Heximer et al., 1998), the maximum potential repertoire of 'RNA antibodies' is limited only by genome size. A second line of intracellular defence would be recognition of the translation products of virus genes; a mechanism for this form of self/not-self discrimination is discussed elsewhere (Forsdyke, 1995d).
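The idea that a host mRNA segment might pair with viral RNA over a sufficient length can be made concrete with a small sketch. It searches for the longest stretch of perfect antiparallel Watson-Crick complementarity between two RNAs. Several simplifications are assumptions of the sketch and not claims of the text: G-U wobble pairs, mismatches and bulges are ignored, secondary structure within each RNA is ignored, and the threshold of 22 base pairs is only an illustrative reading of 'two helical turns' (taking roughly 11 base pairs per turn of an RNA duplex). The function names are hypothetical.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq):
    """Reverse complement of an RNA sequence written with A, C, G, U."""
    return seq.translate(_RNA_COMPLEMENT)[::-1]


def longest_duplex(rna_a, rna_b):
    """Length of the longest perfectly paired antiparallel segment.

    A segment of rna_a pairs with rna_b exactly when it occurs as a
    substring of the reverse complement of rna_b, so this reduces to a
    longest-common-substring search (dynamic programming, O(len_a * len_b)).
    """
    target = revcomp_rna(rna_b)
    best = 0
    prev = [0] * (len(target) + 1)
    for x in rna_a:
        cur = [0] * (len(target) + 1)
        for j, y in enumerate(target, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best


def could_trigger_alarm(host_mrna, viral_rna, min_bp=22):
    """True if a duplex of at least min_bp (assumed threshold) is possible."""
    return longest_duplex(host_mrna, viral_rna) >= min_bp
```

Running `could_trigger_alarm` for every host transcript against a viral sequence would give a first, very rough impression of the 'RNA antibody' repertoire that the paragraph above envisages.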


5. The GC rule

We propose above that in some circumstances evolutionary selective pressures have acted to preserve nucleic acid secondary structure, sometimes at the expense of an encoded protein. The idea that this might also apply to the species-dependent component of the base composition, (C + G)%, arose from Noboru Sueoka's demonstration in 1961, before the genetic code was deciphered, that the amino acid composition of the proteins of microorganisms is influenced, not just by the demands of the environment on the proteins, but also by the base composition of the genome encoding those proteins. The observation has since been abundantly confirmed in a wide variety of animal and plant species (Lobry, 1997).

Sueoka (1961) further pointed out that for individual 'strains' of Tetrahymena the (C + G)% (referred to as 'GC') tends to be uniform throughout the genome:

"If one compares the distribution of DNA molecules of Tetrahymena strains of different mean GC contents, it is clear that the difference in mean values is due to a rather uniform difference of GC content in individual molecules. In other words, assuming that strains of Tetrahymena have a common phylogenetic origin, when the GC content of DNA of a particular strain changes, all the molecules undergo increases or decreases of GC pairs in similar amounts.
This result is consistent with the idea that the base composition is rather uniform not only among DNA molecules of an organism, but also with respect to different parts of a given molecule."

Again, this observation has been abundantly confirmed for a wide variety of species (Muto and Osawa, 1987), although many organisms considered higher on the evolutionary scale have their genomes sectored into regions of low or high (C + G)% (Bernardi and Bernardi, 1986; Bernardi, 2000; see Section 9).

Sueoka (1961) also noted a link between (C + G)% and reproductive isolation for strains of Tetrahymena:

"DNA base composition is a reflection of phylogenetic relationship. Furthermore, it is evident that those strains which mate with one another (i.e. strains within the same 'variety') have similar base compositions. Thus strains of variety 1 …, which are freely intercrossed, have similar mean GC content."

When the genetic code was deciphered in the early 1960s, it was observed that there are more codons than amino acids, so that most amino acids can correspond to more than one triplet codon. This gives some flexibility to a nucleic acid sequence. Sometimes an amino acid can be encoded from among as many as six possible synonymous codons. Walter Fitch (1974) noted that 'the degeneracy of the genetic code provides an enormous plasticity to achieve secondary structure without sacrificing specificity of the message'. Yet, as outlined above, sometimes even this 'plasticity' is insufficient, so that, with the exception of genes under positive Darwinian selection (Forsdyke, 1995b, 1996a), genomic secondary structure ('fold pressure') and (C + G)% 'call the tune'. Non-synonymous codon changes modify the amino acid sequence, sometimes at the expense of protein structure and function.
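The 'plasticity' provided by the degeneracy of the code can be quantified with a short sketch based on the standard genetic code. The table below is the standard code written in DNA letters with the bases ordered T, C, A, G. The helper `synonymous_gc_range` is a hypothetical illustration, not a method from the cited work: it reports the lowest and highest (C + G) fraction that a coding sequence could reach by synonymous codon choice alone, assuming a non-empty peptide written in one-letter amino acid symbols.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Number of synonymous codons per amino acid ('*' marks stop codons).
DEGENERACY = Counter(CODON_TABLE.values())


def synonymous_gc_range(protein):
    """Lowest and highest (C + G) fraction achievable for a peptide
    when only synonymous codons are exchanged (no amino acid changes)."""
    lowest = highest = 0
    for aa in protein:
        gc_counts = [sum(base in "GC" for base in codon)
                     for codon, encoded in CODON_TABLE.items()
                     if encoded == aa]
        lowest += min(gc_counts)
        highest += max(gc_counts)
    n_bases = 3 * len(protein)
    return lowest / n_bases, highest / n_bases
```

Leucine, serine and arginine each have six codons, whereas methionine and tryptophan have one. The width of the interval returned by `synonymous_gc_range` indicates how far (C + G)% pressure can be accommodated before non-synonymous changes become necessary.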
A protein has to adapt to the demands of the environment, but it also has to adapt to genomic forces which we will show have derived, not from the conventional environment acting upon the conventional ('classical') phenotype, but from what we call the 'reproductive environment' acting on the 'genome phenotype', or 'reprotype'. Thus Bernardi and Bernardi noted in 1986 that:

"The organismal phenotype comprises two components, the classical phenotype, corresponding to the 'gene products', and a 'genome phenotype' which is defined by [base] compositional constraints."

6. Codon choice

The issue of which codon was employed in a particular circumstance was considered by Richard Grantham, who noted in 1972 that codon choice was not random in microorganisms, 'suggesting a mechanism against [base] composition drift'. Observing that 'little latitude appears left for "neutral" or synonymous mutations in coliphage codons', he was led to his 'genome hypothesis', which specified that undefined adaptive genomic pressure(s) caused changes in base composition and hence in codon choice (Grantham et al., 1986):

"Each … species has a 'system' or coding strategy for choosing among synonymous codons. This system or dialect is repeated in each gene of a genome and hence is a characteristic of the genome."

There was also a sense that the coding strategy was of relevance to the most fundamental aspects of an organism's biology:

"What is the fundamental explanation for interspecific variation in coding strategy? Are we faced with a situation of continuous variation within and between species, thus embracing a Darwinian perspective of gradual separation of populations to form new species …? This is the heart of the problem of molecular evolution."

Grantham and his colleagues further pointed to the need to determine 'how much independence exists between the two levels of evolution' (that of the genome phenotype and of the classical phenotype) and considered 'it is too easy just to say most mutations are neutral'. However, non-adaptive 'neutralist' explanations gained much support (Filipski, 1990; Sueoka, 1995). Paul Sharp and his colleagues concluded (Sharp et al., 1993) that the main factors influencing codon choice are mutational biases and the need for highly expressed genes to be efficiently translated. Although phenotypic adaptive factors such as the need to translate an abundant mRNA efficiently can influence codon choice, genomic factors, identified here as stem-loop potential ('fold pressure') and (C + G)%, play an important and often dominant role.

7. Thermophilic bacteria

The secondary structure of nucleic acids with a high (C + G)% is more stable than that of nucleic acids with a low (C + G)%. GC bonds are associated with a more stable nucleic acid structure than AT or AU bonds. This is reflected in the base composition of RNAs whose structure is vital for their function, namely rRNAs and tRNAs. Free of coding constraints, yet required to form part of the precise structure of ribosomes, rRNAs might more readily accept mutations which increase GC content than do mRNAs. Indeed, the GC content of rRNAs is directly proportional to the normal growth temperature, so that rRNAs of thermophilic bacteria are highly enriched in G and C (Dalgaard and Garrett, 1993; Forterre and Elie, 1993; Galtier and Lobry, 1997).
However, although optimum growth temperature correlates positively with the GC content of rRNA, it does not correlate similarly with the GC content of genomic DNA, and hence with that of the mRNA populations transcribed from that DNA.

The finding of no consistent trend towards a high genomic GC in thermophilic organisms has been interpreted as supporting the neutralist argument that variations in genomic GC are the consequences of mutational biases and are, in themselves, of no adaptive value (Filipski, 1990; Galtier and Lobry, 1997). However, the finding is also consistent with the argument that genomic GC is too important merely to follow the dictates of temperature, since its primary role is related to other more fundamental adaptations (Bernardi and Bernardi, 1986).

Galtier and Lobry (1997) have argued that 'any secondary structure that must endure high temperatures requires a high G + C content'. This would include both the classical Watson-Crick secondary structure involving inter-strand base pairing, and any secondary structure involving intrastrand base pairing (Murchie et al., 1992). However, the stability of genomic DNA at high temperatures might be achieved in ways other than by an increase in GC content (Bernardi, 2000). These include association with polyamines (Oshima et al., 1990), and relaxation of supercoiling (Friedman et al., 1995). There is no reason to believe that in thermophiles DNA is not able to maintain both its classical duplex structure with H-bonding between opposite strands, and any secondary structures involving intrastrand H-bonding. As we propose (see Section 9), the latter structures would be critical only under certain clearly defined, but selectively very important, circumstances, namely when recombination repair is required. The most enduring DNA secondary structure, even at high temperatures, would be the classical duplex form.

8. The 'holy grail' of Romanes and Bateson

With hindsight it seems that, in identifying (C + G)% as the species variant component of the base composition, Chargaff had uncovered what we might now recognize as the 'holy grail' of speciation first postulated in 1886 by Charles Darwin's research associate, George Romanes (Forsdyke, 1999a,b). Romanes had pointed to what we would now call non-genic variations in the germ-line, which would tend to isolate an individual reproductively from other members of its species, but not from members that had undergone the same variation. William Bateson further postulated a non-genic inherited variation, which would remain constant for a species, whereas genic variations could occur within a species. The non-genic variations, in whatever was responsible for carrying hereditary information from generation to generation (not known at that time), would have the potential to lead to species differentiation, so that variant members of a species ('not-self') would not successfully reproduce with members of the main species ('self'). The latter would constitute the 'reproductive environment' moulding the genome phenotype (reprotype).

Once reproductive isolation was achieved, the natural selection postulated by Darwin would be able to further increase species differentiation by allowing the survival of organisms with advantageous genic variations, and disallowing the survival of organisms with disadvantageous genic variations. These genic variations would affect the classical phenotype. Romanes referred to his holy grail (speciating factor) as an 'intrinsic peculiarity' of the reproductive system. Bateson described his holy grail as a speciating factor uniformly attached to the same 'residue' as the genes, but distinct from the genes. These are just the properties we find in the (C + G)% (Forsdyke, 1996b, 1998).

A metaphor for the role (C + G)% might play in keeping individuals reproductively isolated from each other is provided by the word 'dialect' (Grantham et al., 1986).
A common language brings people together, and in this way is conducive to sexual reproduction. But languages can vary, first into dialects and then into independent sub-languages. Linguistic differences keep people apart, and this difference in the reproductive environment militates against sexual reproduction.


At the molecular level, we see similar forces acting at the level of meiosis. Here paternal and maternal chromosomal homologs align. Forsdyke has proposed that if there is sufficient sequence identity, so that the DNA 'dialects' [(C + G)%] match, meiosis is likely to progress through various check-points (Page and Orr-Weaver, 1996), and gametes will be formed. If there is insufficient identity (i.e. the DNA 'dialects' do not match) meiosis will fail, gametes will not form, and the individual will be sterile, a 'mule'. Thus, the original parental lines will be reproductively isolated from each other, and as such would be defined as distinct species. Changes in the (C + G)% dialects have the potential to initiate speciation (Forsdyke, 1996b).

9. The unpairing postulate

Muller (1922) suggested that the pairing of genes as parts of chromosomes undergoing meiotic synapsis might provide clues to gene structure and replication:

"It is evident that the very same forces which cause the genes to grow should also cause like genes to attract each other, … If the two phenomena are thus dependent on a common principle in the make-up of the gene, progress made in the study of one of them should help in the solution of the other."

In 1954 he set his students an essay 'How does the Watson-Crick model account for synapsis?' (Carlson, 1981). Crick (1971) took up the challenge with his 'unpairing postulate' by which the two strands of the classical DNA duplex would unpair to allow a homology search.

Recent work on the meiotic alignment of chromosomes suggests that initiation of speciation involves that aspect of the genome phenotype (reprotype), which is constituted by the genome-wide potential to extrude stem-loops. Those of the maternal and paternal chromosomal homologs may mutually explore each other and test for 'self' DNA complementarity, using the 'kissing' mechanism of Tomizawa (Eguchi et al., 1991; Hawley and Arbel, 1993; Kleckner, 1997).
If sufficient complementarity is found (i.e. the genomes are reprotypically compatible), then crossing over and recombination can occur. The main adaptive value of this would be to provide for the correction of errors in the individual homologs (Winge, 1917; Bernstein and Bernstein, 1991).

Where does the (C + G)% 'dialect' come into this? It has been observed that small fluctuations in (C + G)% would have a major effect on the ability of duplex DNA molecules to extrude stem-loops and on the pattern of loops which then occur. A very small difference in (C + G)% (reprotypic difference) would mark a meiotically pairing DNA as 'not-self'. This would impair the kissing interaction with 'self' DNA (Forsdyke, 1998; Bull et al., 1998), and so would disrupt meiosis and allow divergence between the two parental lines, thus initiating speciation (Forsdyke, 1996b). Consistent with this, direct tests of incipient speciation in the fruit fly (the phenomenon known as Haldane's rule) implicate differences in DNA per se, rather than in distinct genes, as initiating reproductive isolation (Naveira and Maside, 1998; Forsdyke, 2000a). Similar considerations may apply to the phenomenon of 'heteroduplex resistance' between recombining DNAs of different bacterial species (Majewski and Cohan, 1998).

Once a speciation process has begun, prezygotic factors and postzygotic factors other than (C + G)% are likely to replace the original difference in (C + G)% as a barrier to reproduction (i.e. a barrier to recombination). In this circumstance, (C + G)% becomes free to adopt other roles, such as the prevention of intragenomic recombination. This could involve intragenomic differentiation of regions of high and low (C + G)% (isochores). These have the potential to 'reproductively isolate' (i.e. recombinationally isolate) different parts of the genome.
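Regional differences in (C + G)% of the kind discussed above can be examined with a simple window profile. The sketch below is illustrative only: the function names, the 1 kb default window and the decision to ignore ambiguous symbols such as N are assumptions of the sketch, not parameters taken from the cited studies.

```python
def gc_percent(seq):
    """(C + G)% of a sequence, counting only A, C, G and T.

    Returns NaN if the sequence contains no unambiguous bases.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else float("nan")


def gc_profile(seq, window=1000, step=1000):
    """List of (start, (C + G)%) for successive windows along a sequence.

    With step == window the windows do not overlap; a trailing partial
    window is ignored.
    """
    return [(start, gc_percent(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

A genome with uniform (C + G)% gives a flat profile, whereas a genome sectored into isochores gives long plateaus at distinct levels. Comparing the profiles of two homologous regions would show whether their 'dialects' differ.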
Thus, the attempted duplication of a globin gene into α-globin and β-globin genes might have failed since sequence similarity would favour recombination between the two genes and any incipient differences would be eliminated. However, the duplication appears to have involved relocation to a different isochore with corresponding changes in (C + G)%, so that the two genes became recombinationally isolated. As a consequence of the differences in (C + G)% the corresponding mRNAs utilize different codons for corresponding amino acids, even though both mRNAs are translated in the same cell using the same ribosomes and tRNA populations. Thus, it is unlikely that the primary pressure to differentiate codons arose at the translational level (Grantham et al., 1986). Isochores would have arisen as a random fluctuation in base composition in a genomic region such that one product of a gene duplication was able to survive for a sufficient number of generations to allow functional differentiation to occur. The regional base compositional fluctuation would then have 'hitch-hiked' through the generations on the successful duplicate.

Fig. 3. Summary of potentially conflicting evolutionary pressures as manifest at the level of mRNA (dark line with arrow-head). (1) (C + G)% pressure ('GC pressure') acting primarily at the genomic level, and secondarily affecting mRNA base composition. (2) Fold (stem-loop) pressure acting primarily at the genomic level and secondarily affecting mRNA base order. (3) Purine pressure acting primarily at the cytoplasmic level to enrich loops with purines. (4) Coding pressure deriving from classical environmental interactions with the conventional phenotype, which result in base changes in the protein-encoding part of the mRNA. (5) Regulatory pressures (small grey boxes) acting primarily at the cytoplasmic level, which result in base changes mainly in the 5′ and 3′ non-coding regions.

Table 1
Postulated evolutionary processes leading to multiple levels of information in genomes

Environmental selective factors | Selection for mutations which: | Primary effect on DNA function | Biological result | Observed features of modern DNA
Classical phenotypic selective factors | Change encoded proteins | None | Change in classical phenotype | Usually change in a base pair in accordance with Chargaff's first parity rule
Competitors with more efficient translation | Purine-load RNAs | None | Efficient translation with no 'self' dsRNA formation | Chargaff's cluster rule and Szybalski's transcription direction rule
Mutagens | Promote DNA stem-loop potential | Within species meiotic recombination promoted | Change in genome phenotype for DNA repair | Chargaff's second parity rule
Recombinationally 'not-self' sexual partners | Impair homology search between DNAs of species members whose sequences are diverging | Meiotic recombination is impaired | Change in genome phenotype for speciation | Chargaff's GC rule

The multiple evolutionary pressures acting on mRNAs and genomes are summarized in Fig. 3 and Table 1. It should be noted that changes in genetic fitness (changes in the number of progeny transmitted to future generations) could result from changes in the classical phenotype and/or in the genome phenotype. A mutation which appeared neutral with respect to protein function might nevertheless affect fitness by changing the genome phenotype.
It is argued elsewhere that the apparently neutral polymorphism of some intracellular proteins is an adaptation to facilitate intracellular self/not-self discrimination (Forsdyke, 2000b).

10. Universal Darwinism?

Chargaff was well aware of the evolutionary implications of his work, writing of 'the survival of the fittest nucleic acids' (Chargaff, 1951). However, Biology, satisfied with what was called 'the modern synthesis' (Carlson, 1981), had not kept pace with Chemistry, and it was difficult for him to explore the implications of his discovery (Chargaff, 1979). Romanes and Bateson, who had challenged Darwinian dogma, had been dismissed by generations of biologists bewitched by genes and all that they promised. To put it mildly, the level of discourse was not constructive. Systematist Ernst Mayr (1980), when expressing 'some thoughts on the history of the evolutionary synthesis', classified Bateson as among those geneticists who failed to understand evolution (implying that others understood it better). The plant geneticist Ledyard Stebbins (1980), commenting on 'botany and the synthetic theory of evolution', blamed Bateson for 'delaying the synthesis'. The 'universal Darwinist' Richard Dawkins, lamenting the position he perceived to have been taken by the 'mutationists of the early part of this century', noted (Dawkins, 1983) that:

"For historians there remains the baffling enigma of how such distinguished biologists as … W. Bateson … could rest satisfied with such a crassly inadequate theory. … The irony with which we must now read W. Bateson's dismissal of Darwin is almost painful."

Despite the overwhelming rhetoric to the contrary, we are now beginning to appreciate the deep truths which Romanes and Bateson were attempting to convey (Forsdyke, 1999a,b, 2000a). When Bateson died in 1926, Chargaff was twenty. He is still with us, a citizen of the twenty-first century.
We wish him well.

Acknowledgements

We thank Richard Grantham and Tim Greenland for a very helpful review of the manuscript, and Jim Gerlach for assistance with computer configuration. Academic Press, Cold Spring Harbor Laboratory Press, Elsevier Science and the National Research Council of Canada gave permission to place full-text versions of some of the above cited papers in Forsdyke's web pages, which may be accessed at: http://post.queensu.ca/~forsdyke/homepage.htm.

References

Ball, L.A., 1972. Implications of secondary structure in messenger RNA. J. Theor. Biol. 36, 313–320.


Ball, L.A., 1973a. Secondary structure and coding potential of the coat protein gene of bacteriophage MS2. Nature New Biol. 242, 44–45.
Ball, L.A., 1973b. Mutual influence of the secondary structure and information content of a messenger RNA. J. Theor. Biol. 41, 243–247.
Bell, S.J., Forsdyke, D.R., 1999a. Accounting units in DNA. J. Theor. Biol. 197, 51–61.
Bell, S.J., Forsdyke, D.R., 1999b. Deviations from Chargaff's second parity rule correlate with direction of transcription. J. Theor. Biol. 197, 63–76.
Bernardi, G., Bernardi, G., 1986. Compositional constraints and genome evolution. J. Mol. Evol. 24, 1–11.
Bernardi, G., 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241, 3–17.
Bernstein, C., Bernstein, H., 1991. Aging, Sex and DNA Repair. Academic Press, San Diego, CA.
Bossi, L., Roth, J.R., 1980. The influence of codon context on genetic code translation. Nature 286, 123–127.
Bull, J.J., Jacobson, A., Badgett, M.R., Molineux, I.J., 1998. Viral escape from antisense RNA. Molec. Microbiol. 28, 835–846.
Carlson, E.A., 1981. Genes, Radiation and Society. The Life and Work of H.J. Muller. Cornell University Press, Ithaca, NY, pp. 390–392.
Chargaff, E., 1950. Chemical specificity of nucleic acids and mechanism of their enzymic degradation. Experientia 6, 201–209.
Chargaff, E., 1951. Structure and function of nucleic acids as cell constituents. Fed. Proc. 10, 654–659.
Chargaff, E., 1963. Essays on Nucleic Acids. Elsevier, Amsterdam.
Chargaff, E., 1979. How genetics got a chemical education. Ann. NY Acad. Sci. 325, 345–360.
Crick, F., 1971. General model for the chromosomes of higher organisms. Nature 234, 25–27.
Cristillo, A.D., Lillicrap, T.P., Forsdyke, D.R., 1998. Purine-loading of EBNA-1 mRNA avoids sense-antisense 'collisions'. FASEB J. 12, A1453.
Dalgaard, J.Z., Garrett, A., 1993. Archaeal hyperthermophile genes. In: Kates, M., Kushner, D.J., Matheson, A.T. (Eds.), The Biochemistry of Archaea (Archaebacteria). Elsevier, Amsterdam, pp. 535–562.
Dang, K.D., Dutt, P.B., Forsdyke, D.R., 1998. Chargaff differences correlate with transcription direction in the bithorax complex of Drosophila. Biochem. Cell Biol. 76, 129–137.
Dawkins, R., 1983. Universal Darwinism. In: Bendall, D.S. (Ed.), Evolution from Molecules to Man. Cambridge University Press, Cambridge, pp. 405–425.
Eguchi, Y., Itoh, T., Tomizawa, J., 1991. Antisense RNA. Annu. Rev. Biochem. 60, 631–652.
Ehrenfeld, E., Hunt, T., 1971. Double-stranded poliovirus RNA inhibits initiation of protein synthesis by reticulocyte lysates. Proc. Natl. Acad. Sci. USA 68, 1075–1078.
Elson, D., Chargaff, E., 1955. Evidence of common regularities in the composition of pentose nucleic acids. Biochim. Biophys. Acta 17, 367–376.
Filipski, J., 1990. Evolution of DNA sequences. Contributions of mutational bias and selection to the origin of chromosomal compartments. Adv. Mut. Res. 2, 1–54.
Fire, A., 1999. RNA-triggered gene silencing. Trends Genet. 15, 358–363.
Fitch, W.M., 1974. The large extent of putative secondary nucleic acid structure in random nucleotide sequences of amino acid-derived messenger-RNA. J. Mol. Evol. 3, 279–291.
Forsdyke, D.R., 1995a. A stem-loop 'kissing' model for the initiation of recombination and the origin of introns. Mol. Biol. Evol. 12, 949–958.
Forsdyke, D.R., 1995b. Conservation of stem-loop potential in introns of snake venom phospholipase A2 genes. An application of FORS-D analysis. Mol. Biol. Evol. 12, 1157–1165.
Forsdyke, D.R., 1995c. Relative roles of primary sequence and (G + C)% in determining the hierarchy of frequencies of complementary trinucleotide pairs in DNAs of different species. J. Mol. Evol. 41, 573–581.
Forsdyke, D.R., 1995d. Entropy-driven protein self-aggregation as the basis for self/not-self discrimination in the crowded cytosol. J. Biol. Sys. 3, 273–287.
Forsdyke, D.R., 1996a. Stem-loop potential in MHC genes: a new way of evaluating positive Darwinian selection. Immunogenetics 43, 182–189.
Forsdyke, D.R., 1996b. Different biological species 'broadcast' their DNAs at different (C + G)% 'wavelengths'. J. Theor. Biol. 178, 405–417.
Forsdyke, D.R., 1998. An alternative way of thinking about stem-loops in DNA. A case study of the human G0S2 gene. J. Theor. Biol. 192, 489–504.
Forsdyke, D.R., 1999a. Two levels of information in DNA. Relationship of Romanes' 'intrinsic' variability of the reproductive system, and Bateson's 'residue' to the species-dependent component of the base composition, (C + G)%. J. Theor. Biol. 201, 47–61.
Forsdyke, D.R., 1999b. The origin of species, revisited. Queen's Quarterly 106, 112–134.
Forsdyke, D.R., 2000a. Haldane's rule: hybrid sterility affects the heterogametic sex first because sexual differentiation is on the path to species differentiation. J. Theor. Biol. 204, 443–452.
Forsdyke, D.R., 2000b. Double-stranded RNA and/or heat-shock as initiators of chaperone mode switches in diseases associated with protein aggregation. Cell Stress Chaperones 5, 375–376.
Forsdyke, D.R., 2001. Search for a Victorian. The Origin of Species, Revisited. McGill-Queen's University Press, Montreal (in press).
Forterre, P., Elie, C., 1993. Chromosome structure, DNA topoisomerases, and DNA polymerases in archaebacteria (archeae). In: Kates, M., Kushner, D.J., Matheson, A. (Eds.), The Biochemistry of Archaea (Archaebacteria). Elsevier, Amsterdam, pp. 325–345.
Frank, A.C., Lobry, J.R., 1999. Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms. Gene 238, 65–77.
Friedman, S.M., Malik, M., Drlica, K., 1995. DNA supercoiling in a thermotolerant mutant of Escherichia coli. Mol. Gen. Genet. 248, 417–422.
Fulton, A.B., 1982. How crowded is the cytoplasm? Cell 30, 345–347.
Galtier, N., Lobry, J.R., 1997. Relationships between genomic G + C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol. 44, 632–636.
Grantham, R., 1972. Codon base randomness and composition drift in coliphage. Nature New Biol. 237, 265–266.
Grantham, R., Greenland, T., Louail, S., Mouchiroud, D., Prato, J.L., Gouy, M., Gautier, C., 1985. Molecular evolution of viruses as seen by nucleic acid sequence study. Bull. Inst. Past. 83, 95–148.
Grantham, R., Perrin, P., Mouchiroud, D., 1986. Patterns in codon usage in different kinds of species. Oxford Surv. Evol. Biol. 3, 48–81.
Hamilton, A.J., Baulcombe, D.C., 1999. A species of small antisense RNA in post-transcriptional gene silencing in plants. Science 286, 950–951.
Hawley, R.S., Arbel, T., 1993. Yeast genetics and the fall of the classical view of meiosis. Cell 72, 301–303.
Heximer, S.P., Cristillo, A.D., Russell, L., Forsdyke, D.R., 1996. Sequence analysis and expression in cultured lymphocytes of the human FOSB gene (G0S3). DNA Cell Biol. 15, 1025–1038.
Heximer, S.P., Cristillo, A.D., Russell, L., Forsdyke, D.R., 1998. Expression and processing of G0/G1 Switch Gene 24 (G0S24/TIS11/NUP475) RNA in cultured human blood mononuclear cells. DNA Cell Biol. 17, 249–263.
Isaacs, A., 1962. Antiviral action of interferon. Br. Med. J. 2, 353–355.
Izant, J.G., Weintraub, H., 1984. Inhibition of thymidine kinase gene expression by anti-sense RNA: a molecular approach to genetic analysis. Cell 36, 1007–1015.
Jaeger, J.A., Turner, D.H., Zuker, M., 1990. Predicting optimal and suboptimal secondary structure for RNA. Meth. Enzymol. 183, 281–317.
Kleckner, N., 1997. Interactions between and along chromosomes during meiosis. Harvey Lect. 91, 21–45.
Lao, P.J., Forsdyke, D.R., 2000. Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine. Genome Res. 10, 228–236.
Le, S-Y., Maizel, J.V., 1989. A method for assessing the statistical significance of RNA folding. J. Theor. Biol. 138, 495–510.
Lobry, J.R., 1997. Influence of genomic G + C content on average amino-acid composition of proteins from 59 bacterial species. Gene 205, 309–316.
Majewski, J., Cohan, F.M., 1998. The effect of mismatch repair and heteroduplex formation on sexual isolation in Bacillus. Genetics 148, 13–18.
Marcus, P., 1983. Interferon induction by viruses: one molecule of dsRNA as the threshold for induction. Interferon 5, 115–180.
Mayr, E., 1980. Some thoughts on the history of the evolutionary synthesis. In: Mayr, E., Provine, W.B. (Eds.), The Evolutionary Synthesis: Perspectives on the Unification of Biology. Harvard University Press, Cambridge, MA, pp. 1–48.
Melton, D.A., 1985. Injected antisense RNAs specifically block messenger RNA translation in vivo. Proc. Natl. Acad. Sci. USA 82, 144–148.
Muller, H.J., 1922. Variation due to change in the individual gene. Am. Nat. 56, 32–50.
Murchie, A.I.H., Bowater, R., Aboul-Ela, F., Lilley, D.M.J., 1992. Helix opening transitions in supercoiled DNA. Biochim. Biophys. Acta 1131, 1–15.
Muto, A., Osawa, S., 1987. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. USA 84, 166–169.
Nakamura, Y., Gojobori, T., Ikemura, T., 1999. Codon usage tabulated from the international DNA sequence databases: its status 1999. Nucleic Acids Res. 27, 292.
Naveira, H.F., Maside, X.R., 1998. The genetics of hybrid male sterility in Drosophila. In: Howard, D.J., Berlocher, S.H. (Eds.), Endless Forms: Species and Speciation. Oxford University Press, New York, pp. 330–338.
Oshima, T., Hamasaki, N., Uzawa, T., Friedman, S.M., 1990. Biochemical functions of unusual polyamines found in the cells of extreme thermophiles. In: Goldembeg, S.H., Algranati, I.D. (Eds.), The Biology and Chemistry of Polyamines. Oxford University Press, New York, pp. 1–10.
Page, A.W., Orr-Weaver, T.L., 1996. Stopping and starting the meiotic cycle. Curr. Opin. Genet. Dev. 7, 23–31.
Robertson, H.D., Mathews, M.B., 1996. The regulation of the protein kinase PKR by RNA. Biochimie 78, 909–914.
Rocha, E.P.C., Danchin, A., Viari, A., 1999. Universal replication biases in bacteria. Mol. Microbiol. 32, 11–16.
Rudner, R., Karkas, J.D., Chargaff, E., 1968. Separation of B. subtilis DNA into complementary strands. III. Proc. Natl. Acad. Sci. USA 60, 921–922.
Salser, W., 1970. Discussion. Cold Spring Harbor NY Symp. Quant. Biol. 35, 19.
Seffens, W., Digby, D., 1999. mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucleic Acids Res. 27, 1578–1584.
Sharp, P.A., 1999. RNAi and double-stranded RNA. Genes Dev. 13, 139–141.
Sharp, P.M., Stenico, M., Peden, J.F., Lloyd, A.T., 1993. Codon usage: mutational bias, translation selection, or both? Biochem. Soc. Trans. 21, 835–841.
Smithies, O., Engels, W.R., Devereux, J.R., Slightom, J.L., Shen, S., 1981. Base substitutions, length differences and DNA strand asymmetries in the human Gγ and Aγ fetal globin gene region. Cell 26, 345–353.
Stebbins, G.L., 1980. Botany and the synthetic theory of evolution. In: Mayr, E., Provine, W.B. (Eds.), The Evolutionary Synthesis: Perspectives on the Unification of Biology. Harvard University Press, Cambridge, MA, pp. 139–152.
Sueoka, N., 1961. Compositional correlations between deoxyribonucleic acid and protein. Cold Spring Harbor NY Symp. Quant. Biol. 26, 35–43.
Sueoka, N., 1995. Intrastrand parity rules of DNA base composition and usage biases of synonymous codons. J. Mol. Evol. 40, 318–325.
Szybalski, W., Kubinski, H., Sheldrick, P., 1966. Pyrimidine clusters on the transcribing strands of DNA and their possible role in the initiation of RNA synthesis. Cold Spring Harbor NY Symp. Quant. Biol. 31, 123–127.
Szybalski, W., Bovre, K., Fiandt, M., Guha, A., Hradecna, Z., Kumar, S., Lozeron, H.A., Maher, V.M., Nijkamp, H.J.J., Summers, W.C., Taylor, K., 1969. Transcriptional controls in developing bacteriophages. J. Cell Physiol. 74 (Suppl. 1), 33–70.
Tian, B., White, R.J., Xia, T., Welle, S., Turner, D.H., Mathews, M.B., Thornton, C.A., 2000. Expanded CUG repeat RNAs form hairpins that activate the double-stranded RNA-dependent protein kinase PKR. RNA 6, 79–87.
Voinnet, O., Pinto, Y.M., Baulcombe, D.C., 1999. Suppression of gene silencing: a general strategy used by diverse DNA and RNA viruses of plants. Proc. Natl. Acad. Sci. USA 96, 14147–14152.
Watson, J.D., Crick, F.H.C., 1953. Genetical implications of the structure of deoxyribonucleic acid. Nature 171, 964–967.
Winge, Ö., 1917. The chromosomes, their number and general importance. Compte. Rend. Trav. Lab. Carlsberg 13, 131–275.
