D.R. Forsdyke, J.R. Mortimer / Gene 261 (2000) 127-137

5. The GC rule

We propose above that in some circumstances evolutionary selective pressures have acted to preserve nucleic acid secondary structure, sometimes at the expense of an encoded protein. The idea that this might also apply to the species-dependent component of the base composition, (C + G)%, arose from Noboru Sueoka's demonstration in 1961, before the genetic code was deciphered, that the amino acid composition of the proteins of microorganisms is influenced, not just by the demands of the environment on the proteins, but also by the base composition of the genome encoding those proteins. The observation has since been abundantly confirmed in a wide variety of animal and plant species (Lobry, 1997).

Sueoka (1961) further pointed out that for individual 'strains' of Tetrahymena the (C + G)% (referred to as 'GC') tends to be uniform throughout the genome:

"If one compares the distribution of DNA molecules of Tetrahymena strains of different mean GC contents, it is clear that the difference in mean values is due to a rather uniform difference of GC content in individual molecules. In other words, assuming that strains of Tetrahymena have a common phylogenetic origin, when the GC content of DNA of a particular strain changes, all the molecules undergo increases or decreases of GC pairs in similar amounts.
This result is consistent with the idea that the base composition is rather uniform not only among DNA molecules of an organism, but also with respect to different parts of a given molecule."

Again, this observation has been abundantly confirmed for a wide variety of species (Muto and Osawa, 1987), although many organisms considered higher on the evolutionary scale have their genomes sectored into regions of low or high (C + G)% (Bernardi and Bernardi, 1986; Bernardi, 2000; see Section 9).

Sueoka (1961) also noted a link between (C + G)% and reproductive isolation for strains of Tetrahymena:

"DNA base composition is a reflection of phylogenetic relationship. Furthermore, it is evident that those strains which mate with one another (i.e. strains within the same 'variety') have similar base compositions. Thus strains of variety 1 ..., which are freely intercrossed, have similar mean GC content."

When the genetic code was deciphered in the early 1960s, it was observed that there are more codons than amino acids, so that most amino acids can correspond to more than one triplet codon. This gives some flexibility to a nucleic acid sequence. Sometimes an amino acid can be encoded from among as many as six possible synonymous codons. Walter Fitch (1974) noted that 'the degeneracy of the genetic code provides an enormous plasticity to achieve secondary structure without sacrificing specificity of the message'. Yet, as outlined above, sometimes even this 'plasticity' is insufficient, so that, with the exception of genes under positive Darwinian selection (Forsdyke, 1995b, 1996a), genomic secondary structure ('fold pressure') and (C + G)% 'call the tune'. Non-synonymous codon changes modify the amino acid sequence, sometimes at the expense of protein structure and function.
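The 'plasticity' that Fitch attributes to the degenerate code can be illustrated with a minimal sketch. Assuming the standard genetic code (written here as DNA codons), it groups codons by amino acid and then builds, for one peptide, the synonymous coding sequences with the lowest and the highest (C + G)%. The function names (`gc_percent`, `translate`, `gc_range_for_peptide`) and the example peptide are illustrative assumptions, not taken from the works cited.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group synonymous codons under the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(seq):
    """Number of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq.upper())


def gc_percent(seq):
    """(C + G)% of a non-empty DNA sequence."""
    return 100.0 * gc_count(seq) / len(seq)


def translate(seq):
    """Translate a DNA coding sequence (length divisible by 3) to one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def gc_range_for_peptide(peptide):
    """Return the lowest-GC and highest-GC coding sequences for a peptide."""
    low = "".join(min(SYNONYMS[aa], key=gc_count) for aa in peptide)
    high = "".join(max(SYNONYMS[aa], key=gc_count) for aa in peptide)
    return low, high


if __name__ == "__main__":
    peptide = "LSR"  # Leu, Ser, Arg: each has six synonymous codons
    low, high = gc_range_for_peptide(peptide)
    for label, seq in (("low GC", low), ("high GC", high)):
        print(label, seq, translate(seq), round(gc_percent(seq), 1))
```

For the peptide Leu-Ser-Arg, the two extreme sequences encode the same amino acids while differing by more than 50 percentage points in (C + G)%. This is the latitude within which a genomic pressure on base composition could act without altering the protein.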
A protein has to adapt to the demands of the environment, but it also has to adapt to genomic forces which we will show have derived, not from the conventional environment acting upon the conventional ('classical') phenotype, but from what we call the 'reproductive environment' acting on the 'genome phenotype', or 'reprotype'. Thus Bernardi and Bernardi noted in 1986 that:

"The organismal phenotype comprises two components, the classical phenotype, corresponding to the 'gene products', and a 'genome phenotype' which is defined by [base] compositional constraints."

6. Codon choice

The issue of which codon was employed in a particular circumstance was considered by Richard Grantham, who noted in 1972 that codon choice was not random in microorganisms, 'suggesting a mechanism against [base] composition drift'. Observing that 'little latitude appears left for "neutral" or synonymous mutations in coliphage codons', he was led to his 'genome hypothesis', which specified that undefined adaptive genomic pressure(s) caused changes in base composition and hence in codon choice (Grantham et al., 1986):

"Each ... species has a 'system' or coding strategy for choosing among synonymous codons. This system or dialect is repeated in each gene of a genome and hence is a characteristic of the genome."

There was also a sense that the coding strategy was of relevance to the most fundamental aspects of an organism's biology:

"What is the fundamental explanation for interspecific variation in coding strategy? Are we faced with a situation of continuous variation within and between species, thus embracing a Darwinian perspective of gradual separation of populations to form new species ...? This is the heart of the problem of molecular evolution."

Grantham and his colleagues further pointed to the need to determine 'how much independence exists between the
two levels of evolution' (that of the genome phenotype and that of the classical phenotype) and considered 'it is too easy just to say most mutations are neutral'. However, non-adaptive 'neutralist' explanations gained much support (Filipski, 1990; Sueoka, 1995). Paul Sharp and his colleagues concluded (Sharp et al., 1993) that the main factors influencing codon choice are mutational biases and the need for highly expressed genes to be efficiently translated. Although phenotypic adaptive factors such as the need to translate an abundant mRNA efficiently can influence codon choice, genomic factors, identified here as stem-loop potential ('fold pressure') and (C + G)%, play an important and often dominant role.

7. Thermophilic bacteria

The secondary structure of nucleic acids with a high (C + G)% is more stable than that of nucleic acids with a low (C + G)%. GC bonds are associated with a more stable nucleic acid structure than AT or AU bonds. This is reflected in the base composition of RNAs whose structure is vital for their function, namely rRNAs and tRNAs. Free of coding constraints, yet required to form part of the precise structure of ribosomes, rRNAs might more readily accept mutations which increase GC content than do mRNAs. Indeed, the GC content of rRNAs is directly proportional to the normal growth temperature, so that rRNAs of thermophilic bacteria are highly enriched in G and C (Dalgaard and Garrett, 1993; Forterre and Elie, 1993; Galtier and Lobry, 1997).
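The link between GC content and duplex stability stated at the start of this section can be sketched numerically. The example below computes (C + G)% and applies the Wallace rule, a rough rule of thumb for short oligonucleotides (about 14 to 20 bases) that assigns 2 °C per A/T pair and 4 °C per G/C pair. The rule is brought in here purely as an illustration: it is not used in the works cited, it does not apply to genomes or folded rRNAs, and the function names are assumptions.

```python
def gc_percent(seq):
    """(C + G)% of a non-empty DNA or RNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature (deg C) of a short oligonucleotide duplex.

    Wallace rule: Tm = 2 * (A + T) + 4 * (G + C).
    Only a coarse estimate, intended for oligos of roughly 14-20 bases.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T") + seq.count("U")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Two hypothetical 14-mers of equal length but opposite base composition.
    for oligo in ("ATATATATATATAT", "GCGCGCGCGCGCGC"):
        print(oligo, gc_percent(oligo), wallace_tm(oligo))
```

Under this approximation the GC-only 14-mer melts at twice the estimated temperature of the AT-only 14-mer, which is the direction of the effect described for rRNAs of thermophiles.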
However, although optimum growth temperature correlates positively with the GC content of rRNA, it does not correlate similarly with the GC content of genomic DNA, and hence with that of the mRNA populations transcribed from that DNA.

The finding of no consistent trend towards a high genomic GC in thermophilic organisms has been interpreted as supporting the neutralist argument that variations in genomic GC are the consequences of mutational biases and are, in themselves, of no adaptive value (Filipski, 1990; Galtier and Lobry, 1997). However, the finding is also consistent with the argument that genomic GC is too important merely to follow the dictates of temperature, since its primary role is related to other more fundamental adaptations (Bernardi and Bernardi, 1986).

Galtier and Lobry (1997) have argued that 'any secondary structure that must endure high temperatures requires a high G + C content'. This would include both the classical Watson-Crick secondary structure involving inter-strand base pairing, and any secondary structure involving intrastrand base pairing (Murchie et al., 1992). However, the stability of genomic DNA at high temperatures might be achieved in ways other than by an increase in GC content (Bernardi, 2000). These include association with polyamines (Oshima et al., 1990), and relaxation of supercoiling (Friedman et al., 1995). There is no reason to believe that in thermophiles DNA is not able to maintain both its classical duplex structure with H-bonding between opposite strands, and any secondary structures involving intrastrand H-bonding. As we propose (see Section 9), the latter structures would be critical only under certain clearly defined, but selectively very important, circumstances, namely when recombination repair is required. The most enduring DNA secondary structure, even at high temperatures, would be the classical duplex form.

8. The 'holy grail' of Romanes and Bateson

With hindsight it seems that, in identifying (C + G)% as the species-variant component of the base composition, Chargaff had uncovered what we might now recognize as the 'holy grail' of speciation first postulated in 1886 by Charles Darwin's research associate, George Romanes (Forsdyke, 1999a,b). Romanes had pointed to what we would now call non-genic variations in the germ-line, which would tend to isolate an individual reproductively from other members of its species, but not from members that had undergone the same variation. William Bateson further postulated a non-genic inherited variation, which would remain constant for a species, whereas genic variations could occur within a species. The non-genic variations, in whatever was responsible for carrying hereditary information from generation to generation (not known at that time), would have the potential to lead to species differentiation, so that variant members of a species ('not-self') would not successfully reproduce with members of the main species ('self'). The latter would constitute the 'reproductive environment' moulding the genome phenotype (reprotype).

Once reproductive isolation was achieved, the natural selection postulated by Darwin would be able to further increase species differentiation by allowing the survival of organisms with advantageous genic variations, and disallowing the survival of organisms with disadvantageous genic variations. These genic variations would affect the classical phenotype. Romanes referred to his holy grail (speciating factor) as an 'intrinsic peculiarity' of the reproductive system. Bateson described his holy grail as a speciating factor uniformly attached to the same 'residue' as the genes, but distinct from the genes. These are just the properties we find in the (C + G)% (Forsdyke, 1996b, 1998).

A metaphor for the role (C + G)% might play in keeping individuals reproductively isolated from each other is provided by the word 'dialect' (Grantham et al., 1986).
A common language brings people together, and in this way is conducive to sexual reproduction. But languages can vary, first into dialects and then into independent sub-languages. Linguistic differences keep people apart, and this difference in the reproductive environment militates against sexual reproduction.