
The Chloroplast Genomes of the Green Algae Pyramimonas, Monomastix and Pycnococcus Shed New Light on the Evolutionary History of Prasinophytes and the Origin of the Secondary Chloroplasts of Euglenids

Research Article

MBE Advance Access published December 12, 2008

Monique Turmel,* Marie-Christine Gagnon,* Charley J. O’Kelly,† Christian Otis,* and Claude Lemieux*

*Département de biochimie et de microbiologie, Université Laval, Québec (Québec) Canada; and †Botany Department, University of Hawaii, Honolulu

Running Head: Analysis of 3 Prasinophyte Chloroplast Genomes

Key Words: prasinophyte green algae, euglenids, chloroplast genome evolution, phylogenomics, secondary endosymbiosis, genome reduction, horizontal DNA transfers

Abbreviations: cpDNA – chloroplast DNA, IR – inverted repeat, LSC – large single copy region, LSU – large subunit, ML – maximum likelihood, ORF – open reading frame, PCR – polymerase chain reaction, SC – single copy region, SSC – small single copy region, SSU – small subunit

Corresponding Author: Monique Turmel, Département de biochimie et microbiologie, Université Laval, 1030 avenue de la Médecine, Québec (Québec) Canada G1V 0A6.

Tel: (418) 656-2131 ext. 7623; FAX: (418) 656-7176; Email: monique.turmel@rsvs.ulaval.ca

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Downloaded from http://mbe.oxfordjournals.org/ by guest on July 15, 2013


Abstract

Because they represent the earliest divergences of the Chlorophyta and include the smallest known eukaryotes (e.g. the coccoid Ostreococcus), the morphologically diverse unicellular green algae making up the Prasinophyceae are central to our understanding of the evolutionary patterns that accompanied the radiation of chlorophytes and the reduction of cell size in some lineages. Seven prasinophyte lineages, 4 of which exhibit a coccoid cell organization (neither flagella nor scales), were uncovered from analysis of nuclear-encoded 18S rDNA data; however, their order of divergence remains unknown. In this study, the chloroplast genome sequences of the scaly quadriflagellate Pyramimonas parkeae (clade I), the coccoid Pycnococcus provasolii (clade V) and the scaly uniflagellate Monomastix (unknown affiliation) were determined, annotated and compared to those previously reported for green algae/land plants, including 2 prasinophytes (Nephroselmis olivacea, clade III and Ostreococcus tauri, clade II). The chlorarachniophyte Bigelowiella natans and the euglenid Euglena gracilis, whose chloroplasts presumably originate from distinct green algal endosymbionts, were also included in our comparisons. The 3 newly sequenced prasinophyte genomes differ considerably from one another and from their homologs in overall structure, gene content and gene order, with the 80,211-bp Pycnococcus and 114,528-bp Monomastix genomes (98 and 94 conserved genes, respectively) resembling the 71,666-bp Ostreococcus genome (88 genes) in featuring a significantly reduced gene content. The 101,605-bp Pyramimonas genome (110 genes) features 2 conserved genes (rpl22 and ycf65) and ancestral gene linkages previously unrecognized in chlorophytes as well as a DNA primase gene putatively acquired from a virus. The Pyramimonas and Euglena cpDNAs revealed uniquely shared derived gene clusters. Besides providing unequivocal evidence that the green algal ancestor of the euglenid chloroplasts belonged to the Pyramimonadales, phylogenetic analyses of concatenated chloroplast genes and proteins elucidated the position of Monomastix and showed that the Mamiellales, a clade comprising Ostreococcus and Monomastix, are sister to the Pyramimonadales + Euglena clade. Our results also revealed that major reduction in gene content and restructuring of the chloroplast genome occurred in conjunction with important changes in cell organization in at least 2 independent prasinophyte lineages, the Mamiellales and the Pycnococcaceae.

Introduction

The green plants (Viridiplantae) are divided between two major lineages: the Chlorophyta, containing the bulk of the extant green algae, and the Streptophyta, containing the green algae belonging to the Charophyceae sensu Mattox and Stewart (1984) and all land plants (Lewis and McCourt 2004). It is thought that the first green plants were unicellular green algae bearing nonmineralized organic scales on their cell body and/or their flagella (Mattox and Stewart 1984). This hypothesis was put forward when it was recognized that flagellated reproductive cells (zoospores, gametes) of some taxa in both the Chlorophyta and Streptophyta are covered by a layer of square-shaped scales, which also occur as an underlayer in many prasinophytes. Free-living scaly flagellates have been ascribed mainly to the Prasinophyceae, a nonmonophyletic class representing the earliest divergences of the Chlorophyta (Steinkotter et al. 1994; Nakayama et al. 1998; Fawley et al. 2000; Guillou et al. 2004; Proschold and Leliaert 2007). This morphologically heterogeneous assemblage of green algae gave rise to the 3 advanced classes designated as the Trebouxiophyceae, Ulvophyceae and Chlorophyceae (Lewis and McCourt 2004). Note that the scaly biflagellate Mesostigma viride, traditionally classified within the Prasinophyceae, has been formally excluded from this class and placed in the Streptophyta (Marin and Melkonian 1999; Lemieux et al. 2007; Rodriguez-Ezpeleta et al. 2007). Prasinophytes have always fascinated phycologists because their study has the potential to shed light on the nature of the last common ancestor of all green plants and on the origin of the advanced chlorophytes.

The concept of the class Prasinophyceae has been under constant revision since its formal description by Moestrup and Throndsen (1988) (Sym and Pienaar 1993); in the last few years, it has profoundly changed with the description of several new taxa and the analysis of environmental sequences. Most prasinophytes are found in marine habitats and considerable diversity is observed with respect to cell shape and size, flagella number and behavior, mitotic and cytokinetic mechanisms, and biochemical features such as accessory photosynthetic pigments and storage products (Melkonian 1990; O'Kelly 1992; Sym and Pienaar 1993; Latasa et al. 2004). Some species lack flagella, others lack scales, and in some cases, both flagella and scales are absent (e.g. Ostreococcus tauri). The small-sized members of the Prasinophyceae, particularly those belonging to 3 genera of the Mamiellales (Micromonas, Bathycoccus, and Ostreococcus), are prominent in the oceanic picoplankton (comprising organisms less than 3 µm in diameter) (Guillou et al. 2004). Included in this category is the smallest free-living eukaryote known to date, Ostreococcus tauri (Courties et al. 1994). Phylogenetic studies using molecular data, in particular the nuclear-encoded small subunit (SSU) rRNA gene, identified 7 monophyletic groups of prasinophytes at the base of the Chlorophyta (Steinkotter et al. 1994; Nakayama et al. 1998; Fawley et al. 2000; Guillou et al. 2004); however, their order of divergence could not be resolved. Despite this uncertainty, it appears that the coccoid form evolved more than once in the Prasinophyceae (Fawley et al. 2000; Guillou et al. 2004). Coccoid cells are distributed among 4 lineages (clade II, Mamiellales; clade V, Pseudocourfieldiales, Pycnococcaceae; clade VI, Prasinococcales; and clade VII, no order assigned to this clade), two of which (clades II and V) exhibit both the coccoid and flagellated cell organizations.

Comparative analysis of chloroplast genomes has helped resolve problematic relationships among green algae and land plants (Wolf et al. 2005; Qiu et al. 2006; Jansen et al. 2007; Lemieux et al. 2007; Turmel et al. 2008), although the phylogenetic positions of some green plant lineages have remained contentious (Pombert et al. 2005; Turmel et al. 2006; Lemieux et al. 2007). The only 2 complete chloroplast DNA (cpDNA) sequences currently available for prasinophytes, those of the scaly biflagellate Nephroselmis olivacea (clade III, Pseudocourfieldiales, Nephroselmidaceae) (Turmel et al. 1999b) and of the tiny coccoid Ostreococcus tauri (clade II, Mamiellales) (Robbens et al. 2007), have revealed contrasting evolutionary patterns which can be designated as ancestral and reduced-derived, respectively. Whereas the 200.8-kb Nephroselmis genome harbors the largest gene repertoire yet reported for a chlorophyte (128 different conserved genes compared to about 138 genes for the deepest branching streptophyte algae) and has retained many ancestral gene clusters, the nearly 3-fold smaller Ostreococcus genome, which is the most compact chlorophyte cpDNA known to date, displays a reduced set of 88 genes whose order is highly scrambled. As in most other chloroplast genomes, 2 identical copies of a large inverted repeat (IR) are separated by single-copy (SC) regions; however, the 2 prasinophyte genomes differ remarkably in their quadripartite architectures. The Nephroselmis architectural design closely resembles that found in all streptophyte IR-containing cpDNAs: the SC regions are vastly unequal in size, each SC region is characterized by a highly conserved set of genes, and the rRNA operon encoded by the IR is transcribed toward the small SC (SSC) region. In Ostreococcus, the SC regions have essentially the same number of genes; the few genes (just 5) that would be expected to map to the SSC region in streptophyte cpDNAs are confined to the same SC region, and the rRNA operon is transcribed away from the latter SC region (see supplementary fig. 1, Supplementary Material). This gene partitioning pattern is reminiscent of that reported for the cpDNAs of the ulvophytes Pseudendoclonium akinetum and Oltmannsiellopsis viridis (Pombert et al. 2005; Pombert et al. 2006).

To explore the relationships among prasinophyte lineages and to better understand the mode of cpDNA evolution in the Prasinophyceae, we sequenced the cpDNAs of the scaly quadriflagellate Pyramimonas parkeae (clade I, Pyramimonadales), the coccoid Pycnococcus provasolii (clade V, Pseudocourfieldiales, Pycnococcaceae), and the scaly uniflagellate Monomastix (unknown affiliation) and compared these genomes to those previously reported for Nephroselmis (Turmel et al. 1999b), Ostreococcus (Robbens et al. 2007), other chlorophytes (Wakasugi et al. 1997; Maul et al. 2002; Pombert et al. 2005; Bélanger et al. 2006; de Cambiaire et al. 2006; Pombert et al. 2006; de Cambiaire et al. 2007; Brouard et al. 2008), the deep-branching streptophytes Mesostigma (Lemieux et al. 2000) and Chlorokybus atmophyticus (Lemieux et al. 2007), the euglenid Euglena gracilis (Hallick et al. 1993) and the chlorarachniophyte Bigelowiella natans (Rogers et al. 2007). The latter two photosynthetic eukaryotes, which presumably gained their chloroplasts via independent secondary endosymbiotic events (Rogers et al. 2007), were included in our comparisons in an attempt to gain more detailed information about the green algal donors of their chloroplasts. We found that the 3 newly sequenced prasinophyte genomes differ considerably from one another and from their previously sequenced homologs at the overall structure, gene content and gene order levels, with both the Monomastix and Pycnococcus genomes featuring a reduced pattern of evolution. Our phylogenetic analyses of sequence data offered significant insights into the phylogeny and evolution of prasinophytes and provided unequivocal evidence that the euglenid chloroplasts were secondarily acquired from a member of the Pyramimonadales.

Materials and Methods

Strains and Culture Conditions

Pyramimonas parkeae (CCMP 726) and Pycnococcus provasolii (CCMP 1203), two marine species, were obtained from the Provasoli-Guillard National Center for Culture of Marine Phytoplankton (West Boothbay Harbor, Maine) and grown in K medium (Keller et al. 1987) under 12 h light/dark cycles. Monomastix sp., a freshwater strain originally collected by H. R. Preisig in New Zealand, originates from the personal collection of CJO. This strain, which is available upon request to MT, was grown in modified Volvox medium (McCracken et al. 1980) under 12 h light/dark cycles.

Cloning and Sequencing of Chloroplast Genomes

The complete cpDNA sequences of Pyramimonas, Monomastix and Pycnococcus were generated essentially as described previously (Turmel et al. 2005). For each green alga, A + T-rich organelle DNA was separated from nuclear DNA by CsCl-bisbenzimide isopycnic centrifugation of total cellular DNA (Turmel et al. 1999a). The organelle DNA fraction was sheared by nebulization to produce 1500- to 3000-bp fragments that were subsequently cloned into a plasmid vector, either pBluescript II KS+ or pSMART-HCKan (Lucigen Corporation, Middleton, WI). After hybridization of the resulting clones with the original DNA used for cloning, plasmids from positive clones were purified with the QIAprep 96 Miniprep kit (Qiagen Inc., Mississauga, Canada) and sequenced using universal primers. DNA assembly was carried out using AUTOASSEMBLER 2.1.1 (Applied BioSystems, Foster City, CA) or SEQUENCHER 4.2 (Gene Codes Corporation, Ann Arbor, MI). Distinct contigs of cpDNA origin were ordered by polymerase chain reaction (PCR) amplification with primers specific to contig ends. The amplified fragments encompassing uncloned regions were sequenced on both strands.

Chloroplast Genome Analyses

Genes and all open reading frames (ORFs) larger than 100 codons were identified as described previously (Turmel et al. 2006). Secondary structures of group I and group II introns were modeled according to Michel et al. (1989) and Michel and Westhof (1990), respectively. Short repeats in the Monomastix genome were identified using REPuter 2.74 (Kurtz et al. 2001) and the number of copies of each repeat was determined with FINDPATTERNS of the GCG package (Accelrys, San Diego, CA). For all 3 newly sequenced prasinophyte genomes, regions containing nonoverlapping repeated elements were mapped with RepeatMasker (http://www.repeatmasker.org/) running under the WU-BLAST 2.0 search engine (http://blast.wustl.edu/), using the repeats ≥ 30 bp identified with REPuter as input sequences. Conserved gene clusters exhibiting identical gene polarities in selected green algal cpDNAs were identified using a custom-built program.
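The ORF criterion stated above (open reading frames larger than 100 codons) can be illustrated with a minimal Python sketch. It is not the annotation procedure of Turmel et al. (2006), whose details are not given here. The function name `find_orfs`, the ATG-only start rule and the three stop codons TAA, TAG and TGA are assumptions made purely for illustration.

```python
# Assumption: ORFs start at ATG and end at the first in-frame TAA, TAG or TGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_orfs(seq, min_codons=100):
    """Scan all six reading frames for ORFs larger than min_codons.

    Each hit is (strand, frame, start, end, n_codons). Coordinates are
    0-based on the scanned strand, end is exclusive and includes the stop
    codon, and n_codons counts the sense codons only.
    """
    orfs = []
    strands = (("+", seq.upper()), ("-", reverse_complement(seq)))
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG since the previous stop codon
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons > min_codons:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None
    return orfs


# Hypothetical toy sequence: ATG, 120 alanine codons, then a stop codon.
toy = "ATG" + "GCT" * 120 + "TAA"
print(find_orfs(toy))
```

Real annotation would also consider alternative start codons and sequence similarity to known genes; the sketch only shows how a length threshold on codons translates into a frame-by-frame scan.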

Sequencing of the Monomastix 18S rRNA Gene and Phylogenetic Analysis

The nuclear-encoded SSU rRNA gene was amplified from total cellular DNA by PCR using the specific primers NS1 (White et al. 1990) and 18L (Hamby and Zimmer 1991). The resulting PCR product was purified and sequenced directly using these primers and two internal primers. The Monomastix nuclear-encoded SSU rDNA sequence was aligned manually against the alignment prepared by Guillou et al. (2004) from 83 chlorophytes and 12 streptophytes. A data set of 1663 positions was obtained after removing ambiguously aligned regions using GBLOCKS 0.91b (Castresana 2000) and the same filtration parameters employed by Guillou et al. (2004). Maximum likelihood (ML) trees were inferred using Treefinder (version of April 2008) (Jobb et al. 2004) with the best model fitting the data [TN + I (proportion of invariable sites) + Γ (4 discrete rate categories)] under the Akaike information criterion. Bootstrap values were calculated for 100 replications.

Phylogenetic Inferences from Whole Genome Sequence Data

An amino acid data set and the corresponding nucleotide data set with first and second codon positions were derived from the completely sequenced cpDNAs of Bigelowiella (NC_008408), Euglena (NC_001603), and 22 green plants [species names and accession numbers, except those for Oedogonium cardiacum (NC_011031) and Leptosira terrestris (NC_009681), are provided in table 3 of Lemieux et al. (2007)]. These data sets were allowed to contain missing data; however, the proportion of missing data was limited by selecting for analysis the protein-coding genes that are shared by at least 14 taxa. Seventy genes met this criterion: atpA, B, E, F, H, I, ccsA, cemA, chlB, I, L, N, clpP, ftsH, infA, petA, B, D, G, L, psaA, B, C, I, J, M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, rpl2, 5, 14, 16, 20, 23, 32, 36, rpoA, B, C1, C2, rps2, 3, 4, 7, 8, 9, 11, 12, 14, 18, 19, tufA, ycf1, 3, 4, 12. The amino acid data set was prepared as follows. The deduced amino acid sequences from the 70 individual genes were aligned using MUSCLE 3.7 (Edgar 2004), the ambiguously aligned regions in each alignment were removed using GBLOCKS 0.91b (Castresana 2000) with the -b2 option (minimal number of sequences for a flank position) set to 13, and the protein alignments were concatenated. To obtain the nucleotide data set, the multiple sequence alignment of each protein was converted into a codon alignment, the poorly aligned and divergent regions in each codon alignment were excluded using GBLOCKS 0.91b with the options -b2=13 and -t=c (the latter specifying that selected sequences are complete codons), the individual codon alignments were concatenated, and finally third codon positions were excluded with PAUP* 4.0b10 (Swofford 2003). Missing characters represented 5.9% and 5.8% of the amino acid and nucleotide data sets, respectively.
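Two of the data-set steps described above, retaining only genes shared by at least 14 taxa and excluding third codon positions from a codon alignment, can be sketched in a few lines of Python. The actual analysis relied on GBLOCKS and PAUP*; the function names, input layouts and toy data below are hypothetical and serve only to illustrate the logic.

```python
def genes_shared_by(gene_presence, min_taxa=14):
    """Return the sorted list of genes present in at least min_taxa taxa.

    gene_presence maps a taxon name to the set of gene names it encodes.
    """
    counts = {}
    for genes in gene_presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return sorted(gene for gene, n in counts.items() if n >= min_taxa)


def drop_third_positions(codon_alignment):
    """Keep only first and second codon positions of each aligned sequence.

    codon_alignment maps a taxon name to an aligned nucleotide sequence
    whose length is a multiple of 3 (complete codons, gaps allowed).
    """
    trimmed = {}
    for taxon, seq in codon_alignment.items():
        if len(seq) % 3 != 0:
            raise ValueError(f"{taxon}: length {len(seq)} is not a multiple of 3")
        trimmed[taxon] = "".join(seq[i:i + 2] for i in range(0, len(seq), 3))
    return trimmed


# Hypothetical toy input: 15 taxa all encode rbcL, only 3 of them encode ycf1.
presence = {f"taxon{i}": {"rbcL"} for i in range(15)}
for name in ("taxon0", "taxon1", "taxon2"):
    presence[name].add("ycf1")

print(genes_shared_by(presence))                  # only rbcL passes the filter
print(drop_third_positions({"taxon0": "ATGGCT"}))  # ATG GCT -> AT GC
```

Filtering on taxon count before alignment is what keeps the proportion of missing characters low in the concatenated matrix, at the cost of discarding genes with a patchy distribution.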

Treefinder (version of April 2008) was used to perform the ML analyses and to identify the best model fitting the data under the Akaike information criterion. The amino acid data set was analyzed using the cpREV + F (observed amino acid frequencies) + Γ (5 categories) model of sequence evolution. Trees were inferred from the nucleotide data set using the GTR + Γ (5 categories) model. Confidence of branch points was estimated by 500 bootstrap replications.

Bayesian inference was conducted using MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003). The model selected was cpREV + F + Γ for the inference from the amino acid data set and GTR + Γ for the inference from the nucleotide data set. Rates across sites were modeled on a discrete gamma distribution with 5 categories. Two independent Markov chain Monte Carlo runs, each consisting of 3 heated chains in addition to the cold chain, were carried out using the default parameters. For the analysis of the nucleotide data set, the length of each run was 3 million generations after a burn-in phase of 500,000 generations; for the amino acid data set, it was 1 million generations after a burn-in phase of 150,000 generations. Trees were sampled every 100 generations. Convergence of the 2 independent runs was verified according to the output of the ‘sump’ command; this output was also used to determine the burn-in phase. Posterior probability values were estimated from the trees sampled from both runs using the ‘sumt’ command.

Reconstruction of Ancestral Character States

A data set of gene content was prepared from the chloroplast genomes of the streptophytes Mesostigma and Chlorokybus, the prasinophytes, and Euglena by coding the presence and absence of genes as binary characters. Gene order in each of these chloroplast genomes was converted to all possible pairs of signed genes (i.e., taking into account gene polarity) and a gene order data set was obtained by coding as binary characters the presence/absence of the ancestral gene pairs conserved in at least one streptophyte and one prasinophyte. The gene content and gene order data sets were merged to produce a data set of combined ancestral characters. Losses of these characters on the best tree topology inferred from sequence data were mapped using MacClade 4.08 (Maddison and Maddison 2000). The most parsimonious reconstructions of ancestral character states were inferred under the Dollo principle of parsimony (Farris 1977).
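The gene-order coding described above can be illustrated with a short Python sketch. It assumes that each genome is supplied as an ordered list of (gene, strand) tuples on a circular map, and that an adjacency read from the opposite strand counts as the same character; both choices, like the function names and toy genomes, are assumptions for illustration. The Dollo parsimony reconstruction performed in MacClade is not reproduced.

```python
def signed_pairs(genome, circular=True):
    """Return the set of adjacent signed gene pairs in a genome.

    genome is an ordered list of (gene, strand) tuples with strand +1 or -1.
    Each pair is stored in a canonical form so that the same adjacency read
    from the opposite DNA strand yields the same key.
    """
    n = len(genome)
    if n < 2:
        return set()
    pairs = set()
    last = n if circular else n - 1
    for i in range(last):
        a, b = genome[i], genome[(i + 1) % n]
        forward = (a, b)
        # Reading the other strand reverses the order and flips both signs.
        reverse = ((b[0], -b[1]), (a[0], -a[1]))
        pairs.add(min(forward, reverse))
    return pairs


def ancestral_pairs(streptophytes, prasinophytes):
    """Pairs found in at least one streptophyte and at least one prasinophyte."""
    in_strep = set().union(*(signed_pairs(g) for g in streptophytes.values()))
    in_pras = set().union(*(signed_pairs(g) for g in prasinophytes.values()))
    return sorted(in_strep & in_pras)


def binary_matrix(genomes, characters):
    """Code each character (signed pair) as 1 if present in a genome, else 0."""
    return {
        name: [1 if pair in signed_pairs(genome) else 0 for pair in characters]
        for name, genome in genomes.items()
    }


# Hypothetical toy genomes, not taken from the paper.
strep = {"S1": [("psbA", 1), ("rbcL", 1), ("atpB", -1), ("petA", 1)]}
pras = {
    "P1": [("psbA", 1), ("rbcL", 1), ("petA", 1), ("atpB", -1)],
    "P2": [("rbcL", -1), ("psbA", -1), ("petA", -1), ("atpB", 1)],
}
chars = ancestral_pairs(strep, pras)
print(chars)
print(binary_matrix({**strep, **pras}, chars))
```

The resulting 0/1 rows can be appended to a gene presence/absence matrix to give the combined character set that is then mapped onto the tree.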

Results and Discussion

Pyramimonas cpDNA Features an Ancestral Quadripartite Structure and a Large Repertoire of Genes

Of the 3 newly sequenced prasinophyte genomes, only that of Pyramimonas displays a large IR (table 1). At 101,605 bp, this genome is 2-fold smaller than its Nephroselmis homolog, a size difference attributable to a much shorter IR, gene losses, and a more compact gene organization. As shown in fig. 1, the 2 copies of the IR sequence, each 13,057 bp in size and encoding 11 genes, are separated by SC regions of 10,338 and 65,153 bp comprising 12 and 76 genes, respectively. In this figure, genes are color-coded according to whether their orthologs are usually found within the IR, the SSC or the large SC (LSC) region in streptophyte cpDNAs. It can be seen that the pattern of gene partitioning among the SC regions of the Pyramimonas genome closely resembles that observed for streptophytes. Considering that the Pyramimonas IR is about 2-fold larger and encodes additional genes relative to that of Mesostigma and that the IR is known to contract and expand through gene conversion events (Goulding et al. 1996), the observation that the termini of the Pyramimonas IR contain genes characteristic of the adjacent SC regions is not surprising. The most important deviation from the highly conserved partitioning pattern displayed by streptophytes concerns the locations of chlL and chlN. These 2 genes, which would be expected to be present in the SSC region, lie within the IR near the LSC region.
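As a quick consistency check on the sizes quoted above, the two IR copies and the two SC regions should add up to the reported genome length. The snippet below uses only numbers stated in the text (the 200.8-kb Nephroselmis size comes from the Introduction); the variable names are illustrative.

```python
# Sizes in base pairs, as quoted in the text for Pyramimonas cpDNA.
ir_bp = 13_057        # one copy of the inverted repeat
small_sc_bp = 10_338  # smaller single-copy region
large_sc_bp = 65_153  # larger single-copy region

# A quadripartite genome is two IR copies plus the two single-copy regions.
genome_bp = 2 * ir_bp + small_sc_bp + large_sc_bp
print(genome_bp)  # 101605

# Size ratio relative to the 200.8-kb Nephroselmis genome.
nephroselmis_bp = 200_800
ratio = nephroselmis_bp / genome_bp
print(round(ratio, 2))  # 1.98
```

The sum reproduces the 101,605-bp total, and the ratio of about 1.98 agrees with the statement that the genome is 2-fold smaller than its Nephroselmis homolog.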

<strong>The</strong> <strong>Pyramimonas</strong> chloroplast genome encodes 110 conserved genes, i.e. genes found in<br />

several o<strong>the</strong>r cpDNAs and usually present in cyanobacteria. <strong>The</strong> products <strong>of</strong> <strong>the</strong>se genes consist<br />

<strong>of</strong> 81 proteins and 29 RNA species (2 rRNAs and 27 tRNAs) (table 2). <strong>The</strong> set <strong>of</strong> 27 tRNAs is<br />

sufficient to decode all 61 sense codons provided that <strong>the</strong> tRNA species encoded by trnV(uac),<br />

trnA(ugc), trnT(ugu), trnS(uga), trnL(uag) and trnP(ugg) recognize all 4 members <strong>of</strong> <strong>the</strong>ir<br />

respective codon family through superwobble pairing between <strong>the</strong> first position <strong>of</strong> <strong>the</strong> anticodon<br />

and <strong>the</strong> third position <strong>of</strong> <strong>the</strong> codon (Rogalski et al. 2008). <strong>The</strong> size <strong>of</strong> <strong>the</strong> <strong>Pyramimonas</strong><br />

chloroplast gene complement closely matches those observed for <strong>the</strong> trebouxiophytes Chlorella<br />

vulgaris and Leptosira and for <strong>the</strong> ulvophytes Pseudendoclonium and Oltmannsiellopsis (de<br />

Cambiaire et al. 2007). Although it is significantly reduced compared to its Nephroselmis<br />

counterpart (table 2), <strong>the</strong> set <strong>of</strong> <strong>Pyramimonas</strong> chloroplast genes includes 6 ndh genes (ndhA and<br />

ndhD through ndhH) typically present in streptophytes but previously found only in<br />

Nephroselmis in <strong>the</strong> Chlorophyta, as well as 2 protein-coding genes reported here for <strong>the</strong> first<br />

time in a chlorophyte chloroplast genome, rpl22 and ycf65 (supplementary table 1,<br />

Supplementary Material). <strong>The</strong> ycf65 gene is present in both Mesostigma and Chlorokybus but<br />

missing in <strong>the</strong> o<strong>the</strong>r investigated streptophytes, whereas rpl22 shows a widespread distribution in<br />

<strong>the</strong> Streptophyta and also resides in <strong>the</strong> Euglena chloroplasts. Perhaps not surprisingly, most <strong>of</strong><br />

<strong>the</strong> 22 chloroplast genes present in Nephroselmis but absent in <strong>Pyramimonas</strong> are also missing<br />

from some chlorophytes belonging to <strong>the</strong> Trebouxiophyceae, Ulvophyceae or Chlorophyceae<br />

(supplementary table 1, Supplementary Material). Only 5 genes (cemA, petD, petL, psbM, and<br />

rrf) represent exceptions and interestingly, all 5, except rrf (<strong>the</strong> 5S rRNA gene), are also lacking<br />

in <strong>the</strong> Ostreococcus and Euglena chloroplasts. <strong>The</strong> analysis <strong>of</strong> <strong>the</strong> nuclear genome from both<br />

Ostreococcus tauri and Ostreococcus lucimarinus revealed that cemA, petD and psbM have been<br />

transferred to <strong>the</strong> nucleus (Derelle et al. 2006; Palenik et al. 2007; Robbens et al. 2007).<br />

Considering that <strong>the</strong>se genes are essential for chloroplast function, <strong>the</strong>y are also likely to be<br />

nuclear-encoded in <strong>Pyramimonas</strong>. Since no case <strong>of</strong> chloroplast to nucleus transfer has been<br />

documented for rrf, <strong>the</strong> possibility exists that this conserved gene is present in <strong>Pyramimonas</strong><br />

cpDNA and that its sequence has diverged beyond recognition.<br />
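
The decoding argument made above for the 27 tRNAs can be illustrated with a short Python sketch. It is not part of the original analysis: the function name `codons_read` and the two rule tables are illustrative assumptions, and the pairing rules are deliberately simplified (base modifications are ignored). Under standard wobble, an unmodified U at the first anticodon position is taken to read only A and G; under superwobble it is assumed to read all 4 bases, so that each of the 6 tRNAs named in the text covers a complete 4-codon family.

```python
# Minimal sketch (assumed, simplified rules): which codons does an anticodon read?
COMPLEMENT = {"a": "u", "u": "a", "g": "c", "c": "g"}

# Third codon bases read by each unmodified first anticodon base (simplified).
STANDARD_WOBBLE = {"u": "ag", "g": "cu", "c": "g", "a": "u"}
# Superwobble assumption: an unmodified U at the first position reads all 4 bases.
SUPERWOBBLE = dict(STANDARD_WOBBLE, u="acgu")


def codons_read(anticodon, rules):
    """Return the codons (5'->3') read by an anticodon written 5'->3'."""
    first, second, third = anticodon.lower()
    # Codon positions 1 and 2 pair strictly with anticodon positions 3 and 2.
    prefix = COMPLEMENT[third] + COMPLEMENT[second]
    # Codon position 3 pairs with the first anticodon base under the chosen rules.
    return sorted(prefix + base for base in rules[first])


# Anticodons of the 6 tRNA genes named in the text.
SUPERWOBBLE_TRNAS = {
    "trnV": "uac", "trnA": "ugc", "trnT": "ugu",
    "trnS": "uga", "trnL": "uag", "trnP": "ugg",
}

for gene, anticodon in SUPERWOBBLE_TRNAS.items():
    print(gene, anticodon,
          codons_read(anticodon, STANDARD_WOBBLE),
          codons_read(anticodon, SUPERWOBBLE))
```

For trnV(uac), for example, the sketch returns 2 codons (gua, gug) under the standard rules and all 4 valine codons of the GUN family under the superwobble assumption.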

We found 2 large ORFs that are not associated with any introns, orf454 and orf510. For <strong>the</strong><br />

orf510, present in <strong>the</strong> LSC region near <strong>the</strong> IR, our Blast searches against <strong>the</strong> nonredundant<br />

protein sequence database <strong>of</strong> <strong>the</strong> National Center for Biotechnology Information failed to<br />

identify any putative function for <strong>the</strong> potential encoded protein. However, <strong>the</strong> product <strong>of</strong> <strong>the</strong><br />

orf454 localized in <strong>the</strong> IR revealed sequence similarity with <strong>the</strong> conserved domain <strong>of</strong> phage<br />

associated DNA primases (COG3378, E-value = 1e-06). Interestingly, in <strong>the</strong> course <strong>of</strong> <strong>the</strong><br />

present study, we have found that <strong>the</strong> orf389 in <strong>the</strong> Nephroselmis IR (Turmel et al. 1999b) also<br />

encodes a putative protein with <strong>the</strong> conserved domain <strong>of</strong> phage associated DNA primases<br />

(COG3378, E-value = 2e-12). Given that viruses have been observed in <strong>Pyramimonas</strong> (Moestrup<br />

and Thomsen 1974; Sandaa et al. 2001) and Nephroselmis (Nakayama et al. 2007), it is tempting<br />

to speculate that <strong>the</strong> abovementioned orf454 and orf389 originated from horizontal transfer <strong>of</strong><br />

viral genes. <strong>The</strong>re are only a few documented cases <strong>of</strong> non-standard, free-standing chloroplast<br />

genes that were acquired via horizontal gene transfer and all <strong>the</strong>se cases involve genes that<br />

participate in DNA recombination or replication (Khan et al. 2007; Brouard et al. 2008; Cattolico<br />

et al. 2008). Like <strong>the</strong> orf454 and orf389, <strong>the</strong> 2 horizontally transferred genes identified in <strong>the</strong><br />

chlorophycean green alga Oedogonium cardiacum are housed in <strong>the</strong> IR (Brouard et al. 2008).<br />

In general, <strong>the</strong> conserved genes present in <strong>Pyramimonas</strong> cpDNA are densely packed (table<br />

1). Prominent exceptions are those in <strong>the</strong> regions containing <strong>the</strong> orf454 and orf510 (fig. 1). <strong>The</strong>re<br />

are 2 cases <strong>of</strong> overlapping genes (psbC-psbD and ndhC-ndhK); for <strong>the</strong> remaining genes,<br />

intergenic spacers vary between 3 and 2517 bp, with an average size <strong>of</strong> 159 bp. Consistent with<br />

this high degree <strong>of</strong> compaction, only a few short repeats, mostly direct repeats, were identified<br />

(table 2); <strong>the</strong>y are found mainly in <strong>the</strong> large spacer adjacent to <strong>the</strong> orf510.<br />
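
Spacer statistics of the kind quoted above (minimum, maximum and average size, with overlapping gene pairs set aside) can be derived from gene coordinates as in the following Python sketch. The function name `spacer_lengths` and the toy coordinates are hypothetical and are not taken from the Pyramimonas annotation; the sketch also assumes a linear map and ignores strand and the circularity of the genome.

```python
from statistics import mean


def spacer_lengths(genes):
    """Compute intergenic spacers from (name, start, end) tuples.

    Coordinates are assumed to be 1-based and inclusive. Returns the list of
    spacer sizes in bp and the list of overlapping neighbouring gene pairs.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    spacers, overlaps = [], []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        gap = start_b - end_a - 1
        if gap < 0:
            overlaps.append((name_a, name_b))  # genes overlap: no spacer
        else:
            spacers.append(gap)
    return spacers, overlaps


# Hypothetical coordinates, for illustration only.
toy_genes = [
    ("geneA", 1, 300),
    ("geneB", 304, 900),
    ("geneC", 880, 1500),
    ("geneD", 1701, 2000),
]

spacers, overlaps = spacer_lengths(toy_genes)
print(min(spacers), max(spacers), mean(spacers), overlaps)
```

With these toy coordinates the spacers are 3 and 200 bp (average 101.5 bp), and geneB and geneC are reported as an overlapping pair.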

Like its Ostreococcus and Pycnococcus homologs (see below), <strong>the</strong> <strong>Pyramimonas</strong> genome<br />

features a single intron, a group II intron in atpB. However, <strong>the</strong> <strong>Pyramimonas</strong> atpB intron and<br />

those <strong>of</strong> Ostreococcus and Pycnococcus are inserted at different sites and carry distinct ORFs,<br />

indicating that <strong>the</strong>y arose from separate events <strong>of</strong> horizontal DNA transfer. It should be pointed<br />

out here that <strong>the</strong> currently available chloroplast genome data strongly support <strong>the</strong> notion that no<br />

introns were present in <strong>the</strong> chloroplast <strong>of</strong> <strong>the</strong> common ancestor <strong>of</strong> all green plants (Turmel et al.<br />

1999b; Lemieux et al. 2000; Lemieux et al. 2007). <strong>The</strong> orf608 <strong>of</strong> <strong>the</strong> <strong>Pyramimonas</strong> group IIA<br />

intron is located within domain IV <strong>of</strong> <strong>the</strong> intron secondary structure and carries <strong>the</strong> reverse<br />

transcriptase (cd01651) and maturase (pfam01348) domains, but not <strong>the</strong> endonuclease domain,<br />

<strong>of</strong> reverse transcriptases encoded by group II introns. <strong>The</strong> endonuclease domain, which carries<br />

out second-strand DNA cleavage during group II intron mobility (Lambowitz and Zimmerly<br />

2004), was most likely lost after <strong>the</strong> horizontal transfer <strong>of</strong> <strong>the</strong> intron into <strong>the</strong> <strong>Pyramimonas</strong><br />

chloroplast. <strong>The</strong> orf608 product shares strong sequence similarity with reverse transcriptases<br />

encoded by <strong>the</strong> genomes <strong>of</strong> firmicute bacteria and by <strong>the</strong> mitochondrial cox1 genes <strong>of</strong> fungi, <strong>the</strong><br />

brown alga Pylaiella littoralis, and <strong>the</strong> cryptophyte Rhodomonas salina.<br />

Like its Ostreococcus Homolog, Pycnococcus cpDNA has a Reduced Gene Content and is<br />

Highly Compact<br />

<strong>The</strong> Pycnococcus chloroplast genome is <strong>the</strong> smallest and most compact <strong>of</strong> <strong>the</strong> 3<br />

prasinophyte genomes sequenced during this study (table 1 and fig. 2). It is only 8.6 kb larger<br />

than Ostreococcus cpDNA and contains 10 additional conserved genes, for a total <strong>of</strong> 98<br />

genes. In terms <strong>of</strong> size, this gene repertoire, which consists <strong>of</strong> 65 protein genes and 33 RNA genes<br />

encoding 2 rRNAs, 30 tRNAs and <strong>the</strong> RNA component <strong>of</strong> RNase P (table 2), is similar to that<br />

observed for chlorophycean green algae (Brouard et al. 2008). <strong>The</strong> tRNA complement includes 1<br />

tRNA species not previously documented in any chlorophytes [tRNA Pro (GGG)] but like its<br />

Ostreococcus homolog, lacks <strong>the</strong> tRNA species that reads <strong>the</strong> AUA codon [i.e. <strong>the</strong> tRNA Ile<br />

(CAU) where C is modified post-transcriptionally to lysidine]. As in <strong>Pyramimonas</strong> cpDNA, <strong>the</strong><br />

5S rRNA gene was not detected. Moreover, <strong>the</strong> Pycnococcus genome is missing <strong>the</strong> protein-<br />

coding genes psaJ and rpoB, which are present in all o<strong>the</strong>r investigated chlorophytes. Although<br />

<strong>the</strong> Pycnococcus, Ostreococcus and <strong>Pyramimonas</strong> cpDNAs all show a reduced gene content<br />

compared to <strong>the</strong> Nephroselmis genome, <strong>the</strong>ir sets <strong>of</strong> genes show substantial differences (table 2).<br />

No vestigial IR region was identified in Pycnococcus cpDNA. <strong>The</strong> genes generally found<br />

in this region are dispersed throughout <strong>the</strong> genome; in contrast, several genes usually present<br />

within <strong>the</strong> SSC region in genomes displaying an ancestral quadripartite structure [chlN, chlL,<br />

ycf1, cysT and trnP(ggg)] remained clustered toge<strong>the</strong>r (supplementary fig. 2, Supplementary<br />

Material). <strong>The</strong>re are 2 cases <strong>of</strong> overlapping genes (ycf4-rnpB and psbD-psbC); for <strong>the</strong> o<strong>the</strong>r<br />

coding regions, intergenic spacers were found to vary from 0 to 383 bp, for an average length <strong>of</strong><br />

102 bp.<br />

<strong>The</strong> Pycnococcus atpB intron shares with its Ostreococcus counterpart <strong>the</strong> same insertion<br />

position and a large ORF in domain IV that features <strong>the</strong> reverse transcriptase (cd01651),<br />

maturase (pfam08388), and HNH endonuclease (cd00085) domains <strong>of</strong> reverse transcriptases<br />

encoded by group II introns. <strong>The</strong> Pycnococcus and Ostreococcus intron ORFs share strong<br />

similarity with one ano<strong>the</strong>r and with reverse transcriptase genes found in several cyanobacterial<br />

species as well as in group II introns present in <strong>the</strong> mitochondrial large subunit (LSU) rRNA<br />

gene <strong>of</strong> <strong>the</strong> red alga Porphyra purpurea (Burger et al. 1999) and <strong>the</strong> chloroplast psbA genes <strong>of</strong><br />

Chlamydomonas sp. CCMP 1619 (Odom et al. 2004) and Euglena myxocylindracea (Sheveleva<br />

and Hallick 2004).<br />

<strong>The</strong> Monomastix <strong>Chloroplast</strong> Genome has a Reduced Gene Content but is Loosely Packed with<br />

Genes<br />

Compared to its Pycnococcus homolog, <strong>the</strong> Monomastix chloroplast genome is 34 kb<br />

larger, has a deficit <strong>of</strong> four genes, and contains five additional introns (table 1, fig. 3 and<br />

supplementary fig. 3, Supplementary Material). Its increased size is largely accounted for by <strong>the</strong><br />

expansion <strong>of</strong> intergenic spacers. <strong>The</strong> latter vary from 3 to 2566 bp, for an average size <strong>of</strong> 524 bp,<br />

and contain a myriad <strong>of</strong> short repeated sequences rich in G+C. <strong>The</strong> 94 conserved genes specify<br />

64 proteins and 30 RNAs (3 rRNAs, 26 tRNAs, and <strong>the</strong> RNA component <strong>of</strong> RNase P) (table 2).<br />

<strong>The</strong> 26 tRNAs can decode all 61 sense codons assuming that tRNA Arg (ACG), where A is<br />

modified to inosine, recognizes all 4 codons <strong>of</strong> <strong>the</strong> CGX family. <strong>The</strong> reduced gene content <strong>of</strong><br />

Monomastix is more like <strong>the</strong> gene complement <strong>of</strong> Ostreococcus than that <strong>of</strong> Pycnococcus (table<br />

2). It features 9 genes that are missing from Ostreococcus and lacks only 3 genes that are present<br />

in this alga, including psaC, a gene shared by <strong>the</strong> chloroplasts <strong>of</strong> all previously investigated<br />

chlorophytes. Although short dispersed repeats were mapped predominantly to intergenic<br />

regions, a small fraction was found within <strong>the</strong> coding regions <strong>of</strong> 5 genes (ftsH, rpoB, rpoC1,<br />

rpoC2 and ftsH) and within 2 introns (psbA intron and rrl intron 4) (supplementary fig. 4,<br />

Supplementary Material). This distribution pattern resembles those reported for o<strong>the</strong>r<br />

chlorophyte cpDNAs rich in short repeats (Maul et al. 2002; Pombert et al. 2005; Bélanger et al.<br />

2006; de Cambiaire et al. 2006; Pombert et al. 2006; de Cambiaire et al. 2007). Ranging from 19<br />

to 58 nucleotides, <strong>the</strong> most abundant short dispersed repeats <strong>of</strong> Monomastix were classified into<br />

4 families (A and A1, B and B1, C and D) according to <strong>the</strong>ir sequence motifs; moreover, some<br />

repeats displaying partial sequences characteristic <strong>of</strong> distinct families were discerned<br />

(supplementary fig. 5, Supplementary Material). <strong>The</strong> hybrid nature <strong>of</strong> <strong>the</strong> latter dispersed repeats,<br />

which were assigned to 6 categories (named AB, AC, AD, A1D, A1B and BD), suggests <strong>the</strong>y<br />

arose through recombination between regions carrying different repeats.<br />
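
The gene counts reported in the text for the 3 newly sequenced genomes can be cross-checked with a few lines of Python. The sketch below only restates numbers given above; the dictionary layout and the key names (for example `rnase_p_rna`) are assumptions made for illustration, and the Ostreococcus total is merely the value implied by the statement that Pycnococcus has 10 additional genes.

```python
# Gene counts as stated in the text (key names are illustrative assumptions).
gene_content = {
    "Pyramimonas": {"total": 110, "proteins": 81, "rRNAs": 2, "tRNAs": 27, "rnase_p_rna": 0},
    "Pycnococcus": {"total": 98, "proteins": 65, "rRNAs": 2, "tRNAs": 30, "rnase_p_rna": 1},
    "Monomastix": {"total": 94, "proteins": 64, "rRNAs": 3, "tRNAs": 26, "rnase_p_rna": 1},
}


def rna_genes(counts):
    """Sum the RNA-specifying genes of one genome."""
    return counts["rRNAs"] + counts["tRNAs"] + counts["rnase_p_rna"]


for alga, counts in gene_content.items():
    # Protein genes plus RNA genes should equal the reported total.
    consistent = counts["proteins"] + rna_genes(counts) == counts["total"]
    print(alga, rna_genes(counts), consistent)

# Differences quoted in the text.
monomastix_deficit = gene_content["Pycnococcus"]["total"] - gene_content["Monomastix"]["total"]
implied_ostreococcus_total = gene_content["Pycnococcus"]["total"] - 10
print(monomastix_deficit, implied_ostreococcus_total)
```

The totals are internally consistent for all 3 genomes, and the deficit of 4 genes in Monomastix relative to Pycnococcus follows directly from the reported totals (98 versus 94).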

<strong>The</strong> Monomastix chloroplast genome contains a single group II intron, located in<br />

trnK(uuu), and 5 group I introns, 1 <strong>of</strong> which resides in psbA and 4 in <strong>the</strong> LSU rRNA gene (rrl)<br />

(fig. 1). <strong>The</strong> IIB trnK intron is inserted within <strong>the</strong> D arm <strong>of</strong> <strong>the</strong> tRNA secondary structure<br />

following G23 and lacks an ORF. All o<strong>the</strong>r trnK(uuu) introns that have been identified in<br />

streptophyte cpDNAs carry an internal ORF with a maturase domain (matK) and are inserted<br />

within <strong>the</strong> anticodon loop (Turmel et al. 2006). In view <strong>of</strong> <strong>the</strong>ir ability to encode a homing<br />

endonuclease, <strong>the</strong> 5 Monomastix group I introns are likely to be mobile and were probably<br />

captured via horizontal intracellular and/or intercellular DNA transfer. <strong>The</strong> IA2 psbA intron,<br />

found at position 525 relative to <strong>the</strong> corresponding Mesostigma gene, specifies a potential<br />

homing endonuclease with <strong>the</strong> GIY-YIG motif and has chloroplast homologs with <strong>the</strong> same<br />

insertion site and highly similar endonuclease genes in <strong>the</strong> ulvophytes Oltmannsiellopsis and<br />

Pseudendoclonium and <strong>the</strong> chlorophycean green algae Oedogonium and Chlamydomonas<br />

reinhardtii (Brouard et al. 2008). <strong>The</strong> 4 remaining group I introns encode potential<br />

LAGLIDADG homing endonucleases (Côté et al. 1993; Lucas et al. 2001) and also share<br />

identical insertion sites with a large number <strong>of</strong> chlorophyte (Lucas et al. 2001; Brouard et al.<br />

2008) and cyanobacterial (Haugen et al. 2007) introns. <strong>The</strong> first and third LSU rDNA introns,<br />

whose insertion positions correspond to sites 1931 and 2500 in <strong>the</strong> E. coli 23S rRNA, fall within<br />

subgroup IB4, whereas <strong>the</strong> second and fourth introns inserted at sites 1951 and 2593 belong to<br />

<strong>the</strong> IA3 family. Like its Chlamydomonas homolog I-CreI, <strong>the</strong> Monomastix site-2593 intron-<br />

encoded homing endonuclease (I-MsoI) has been characterized at <strong>the</strong> 3-dimensional level in <strong>the</strong><br />

presence <strong>of</strong> its DNA target site, revealing that <strong>the</strong> 2 isoschizomers display strikingly different<br />

protein/DNA contacts (Lucas et al. 2001; Chevalier et al. 2003). Interestingly, at sites 1931, 2500<br />

and 2593, <strong>the</strong> Monomastix mitochondrial LSU rRNA gene features introns whose structures<br />

and ORFs are similar to those found at identical sites in <strong>the</strong> chloroplast gene (Lucas et al. 2001)<br />

(unpublished data <strong>of</strong> MT, MCG, CO and CL), highlighting <strong>the</strong> possibility that mobile group I<br />

introns were exchanged between different organellar compartments in <strong>the</strong> Monomastix lineage.<br />

Evidence supporting such intracellular exchanges <strong>of</strong> group I introns has also been reported for<br />

<strong>the</strong> Nephroselmis (Turmel et al. 1999a) and Pseudendoclonium (Pombert et al. 2006) lineages.<br />

<strong>Pyramimonas</strong> and Euglena cpDNAs Show Striking Similarities in Gene Order<br />

Gene orders in <strong>the</strong> three newly sequenced prasinophyte chloroplast genomes were<br />

compared with one ano<strong>the</strong>r and with those in previously examined chlorophytes, <strong>the</strong><br />

streptophytes Mesostigma and Chlorokybus, <strong>the</strong> euglenid Euglena and <strong>the</strong> chlorarachniophyte<br />

Bigelowiella. In all pairwise genome comparisons, except that including <strong>Pyramimonas</strong> and<br />

Euglena, <strong>the</strong> vast majority <strong>of</strong> <strong>the</strong> identified syntenic blocks were composed exclusively <strong>of</strong> gene<br />

clusters commonly found in streptophytes and chlorophytes. Ancestral clusters <strong>of</strong> this type<br />

display substantial variability among <strong>the</strong> Euglena and prasinophyte genomes (fig. 4). Clearly, <strong>the</strong><br />

gene-rich genome <strong>of</strong> Nephroselmis exhibits <strong>the</strong> highest number <strong>of</strong> genes (94 genes) mapping to<br />

clusters predating <strong>the</strong> split <strong>of</strong> <strong>the</strong> Chlorophyta and Streptophyta. Breakpoints within ancestral<br />

clusters proved to be too variable in positions to determine which <strong>of</strong> <strong>the</strong> compared genomes are<br />

<strong>the</strong> most closely related. Note that our comparisons <strong>of</strong> <strong>the</strong> <strong>Pyramimonas</strong> genome with those <strong>of</strong><br />

Mesostigma and Chlorokybus disclosed ancestral gene linkages that had not been reported in any<br />

chlorophyte cpDNA (e.g. psbH-petB-petD, R(ccg)-rbcL-atpB-atpE). <strong>The</strong> ancestral rps2-atpI<br />

linkage detected in <strong>the</strong> Euglena genome was also previously unrecognized in chlorophytes.<br />

Comparison <strong>of</strong> gene orders in <strong>the</strong> <strong>Pyramimonas</strong> and Euglena cpDNAs revealed striking<br />

similarities between <strong>the</strong>se genomes. Almost two-thirds <strong>of</strong> <strong>the</strong> 87 genes (56 genes) in Euglena<br />

cpDNA were found to be part <strong>of</strong> collinear regions, for a total <strong>of</strong> 16 syntenic blocks. Thirty-five<br />

<strong>of</strong> <strong>the</strong>se genes form 8 blocks that exhibit gene linkages unique to <strong>Pyramimonas</strong> and Euglena (fig.<br />

5). Four blocks contain exclusively derived linkages, whereas <strong>the</strong> remaining 4 also include<br />

ancestral gene linkages present in chlorophytes and streptophytes (<strong>the</strong> rpl23, rpl32, rps12 and rrs<br />

clusters). It is interesting to note that in each <strong>of</strong> <strong>the</strong> latter 4 blocks, a pair <strong>of</strong> adjacent genes was<br />

cleanly excised from <strong>the</strong> Euglena genome following <strong>the</strong> formation <strong>of</strong> <strong>the</strong> derived linkages. <strong>The</strong><br />

syntenic block containing <strong>the</strong> triad psbK-ycf12-psaM is not uniquely shared by <strong>the</strong> <strong>Pyramimonas</strong><br />

and Euglena chloroplasts. Being also present in Chlorella, Pseudendoclonium and<br />

Oltmannsiellopsis but not in streptophytes, this derived cluster must have arisen in prasinophytes<br />

and have been transmitted by vertical descent to <strong>the</strong> trebouxiophyte and ulvophyte lineages.<br />
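
The notion of syntenic blocks used in this section can be sketched in Python as runs of genes whose neighbour relationships are conserved in 2 genomes. This is only an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, each gene is assumed to occur once per genome, gene orientation and genome circularity are ignored, and the 2 toy gene orders are invented (only the gene names are borrowed from the text).

```python
def adjacencies(order):
    """Return the set of unordered neighbouring gene pairs in a gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def shared_blocks(order_a, order_b):
    """List runs of >= 2 genes in order_a whose adjacencies also occur in order_b."""
    shared = adjacencies(order_a) & adjacencies(order_b)
    blocks, current = [], [order_a[0]]
    for left, right in zip(order_a, order_a[1:]):
        if frozenset((left, right)) in shared:
            current.append(right)       # linkage conserved: extend the block
        else:
            if len(current) > 1:
                blocks.append(current)  # close the block at the breakpoint
            current = [right]
    if len(current) > 1:
        blocks.append(current)
    return blocks


# Invented gene orders, for illustration only.
genome_a = ["psbK", "ycf12", "psaM", "rbcL", "atpB", "atpE", "rps2"]
genome_b = ["atpE", "atpB", "rbcL", "rps2", "psaM", "ycf12", "psbK"]

print(shared_blocks(genome_a, genome_b))
```

On the toy data the sketch reports 2 blocks, psbK-ycf12-psaM and rbcL-atpB-atpE, even though both are inverted in the second genome; the breakpoint between psaM and rbcL separates them.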

Monomastix Occupies an Early-Diverging Branch <strong>of</strong> <strong>the</strong> Mamiellales in 18S rDNA Trees<br />

Monomastix has been historically affiliated with <strong>the</strong> Prasinophyceae; however, <strong>the</strong> finding<br />

that its body scales are not typical <strong>of</strong> those found in prasinophytes but are more like those <strong>of</strong> <strong>the</strong><br />

chrysophyte Chromulina placentula (Manton 1967) led to <strong>the</strong> exclusion <strong>of</strong> this genus from <strong>the</strong><br />

Prasinophyceae (Melkonian 1990; Sym and Pienaar 1993). Very limited molecular information<br />

has been reported so far for Monomastix, explaining why its phylogenetic status has remained<br />

enigmatic. In <strong>the</strong> present study, we determined <strong>the</strong> sequence <strong>of</strong> <strong>the</strong> Monomastix nuclear-encoded<br />

SSU rRNA gene and compared it with those available for o<strong>the</strong>r prasinophytes and some<br />

representatives <strong>of</strong> <strong>the</strong> Trebouxiophyceae, Ulvophyceae and Chlorophyceae. Trees inferred with<br />

ML unambiguously showed that Monomastix represents an early-diverging lineage <strong>of</strong> <strong>the</strong><br />

Mamiellales (clade II) (fig. 6). This uniflagellate, which has non-prasinophyte scales, was<br />

resolved as <strong>the</strong> first branch <strong>of</strong> this morphologically diverse clade. An unquestionable affinity<br />

<strong>the</strong>refore exists between Ostreococcus and Monomastix even though <strong>the</strong>se 2 taxa belong to<br />

different lineages <strong>of</strong> <strong>the</strong> Mamiellales. <strong>The</strong> naked Ostreococcus is closely related to <strong>the</strong> scaly<br />

Bathycoccus and <strong>the</strong> clade uniting <strong>the</strong>se nonflagellated genera is sister to that containing <strong>the</strong><br />

flagellated genera Mamiella (2 flagella), Mantoniella (1 flagellum), Micromonas (naked, 1<br />

flagellum), and <strong>the</strong> new genus represented by isolate RCC 391 (2 flagella).<br />

<strong>Chloroplast</strong> Phylogenomic Analyses Unite <strong>the</strong> Pyramimonadales with <strong>the</strong> Mamiellales and<br />

Identify <strong>the</strong> Pyramimonadales as <strong>the</strong> Source <strong>of</strong> <strong>the</strong> Euglenid <strong>Chloroplast</strong>s<br />

To explore <strong>the</strong> relationships among prasinophyte lineages (in particular clades I, II, III and<br />

V) as well as <strong>the</strong> relationships <strong>of</strong> chlorophyte chloroplasts with <strong>the</strong> secondarily acquired<br />

chloroplasts <strong>of</strong> Bigelowiella and Euglena, we generated data sets <strong>of</strong> 70 concatenated proteins and<br />

genes (first and second codon positions) from completely sequenced chloroplast genomes and<br />

analyzed <strong>the</strong>m using <strong>the</strong> ML and Bayesian methods (fig. 7). As expected, both <strong>the</strong> protein and<br />

gene trees identified a strongly supported clade uniting <strong>the</strong> 2 representatives <strong>of</strong> <strong>the</strong> Mamiellales,<br />

Monomastix and Ostreococcus. This clade is sister to a robust monophyletic group clustering <strong>the</strong><br />

<strong>Pyramimonas</strong> (scaly, 4 or 8 flagella) and Euglena chloroplasts. While this sister relationship<br />

received 87% bootstrap support in <strong>the</strong> protein ML tree (fig. 7A), exclusion <strong>of</strong> <strong>the</strong> long-branch<br />

taxa Euglena and Bigelowiella from <strong>the</strong> analysis resulted in 97% bootstrap support for <strong>the</strong><br />

<strong>Pyramimonas</strong> + Monomastix + Ostreococcus clade (data not shown). In all analyses, <strong>the</strong> scaly<br />

biflagellate Nephroselmis was sister to all chlorophytes analyzed, whereas <strong>the</strong> position <strong>of</strong> <strong>the</strong><br />

naked, nonflagellated Pycnococcus remained equivocal. <strong>The</strong> latter prasinophyte was resolved as<br />

sister to <strong>the</strong> core chlorophytes in <strong>the</strong> protein tree (fig. 7A), but was sister to <strong>the</strong> Mamiellales,<br />

Pyramimonadales and euglenids in <strong>the</strong> gene tree (fig. 7B). <strong>The</strong> protein and gene trees thus differ<br />

only in <strong>the</strong> branching position <strong>of</strong> <strong>the</strong> core chlorophytes with respect to <strong>the</strong> prasinophyte lineages.<br />
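
The gene data set described above retains only first and second codon positions. A minimal Python sketch of that filtering and concatenation step is given below; the function names and the toy alignments are hypothetical, the sequences are assumed to be in-frame codon alignments, and taxa missing from any gene are simply dropped, which may differ from how the actual data sets were assembled.

```python
def first_second_positions(cds):
    """Drop every third base of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


def concatenate(gene_alignments):
    """Concatenate codon positions 1 and 2 of several genes per taxon.

    gene_alignments is a list of dicts mapping taxon name -> aligned CDS.
    Only taxa present in every gene are kept.
    """
    taxa = sorted(set.intersection(*(set(aln) for aln in gene_alignments)))
    return {
        taxon: "".join(first_second_positions(aln[taxon]) for aln in gene_alignments)
        for taxon in taxa
    }


# Toy alignments, for illustration only; the taxa differ at third positions.
gene1 = {"taxonA": "ATGGCT", "taxonB": "ATGGCA"}
gene2 = {"taxonA": "TTTAAA", "taxonB": "TTCAAG"}

print(concatenate([gene1, gene2]))
```

In this toy case the 2 taxa become identical after filtering, because all of their differences lie at third codon positions, which are the sites most prone to saturation.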

Because phylogenetic analyses based on <strong>the</strong> whole-genome approach are inherently<br />

associated with sparse taxon sampling, <strong>the</strong>y can lead to trees robustly supporting an artifactual<br />

clustering <strong>of</strong> taxa (Brinkmann and Philippe 2008; Heath et al. 2008). Caution must <strong>the</strong>refore be<br />

exercised in <strong>the</strong> interpretation <strong>of</strong> <strong>the</strong> observed topologies. In <strong>the</strong> case <strong>of</strong> trees derived from<br />

complete genome sequences, structural features <strong>of</strong> <strong>the</strong>se genomes can be used as independent<br />

data to test topologies (Rokas 2006). In <strong>the</strong> present study, <strong>the</strong> strong alliance we uncovered<br />

between <strong>the</strong> <strong>Pyramimonas</strong> and Euglena chloroplasts is streng<strong>the</strong>ned by a number <strong>of</strong> gene<br />

linkages that are unique to <strong>the</strong> cpDNAs <strong>of</strong> <strong>the</strong>se algae (fig. 5). Based on this finding, we infer<br />

with confidence that <strong>the</strong> green algal partner in <strong>the</strong> secondary endosymbiosis that gave rise to<br />

euglenids was a member <strong>of</strong> <strong>the</strong> Pyramimonadales. Euglenids are unicellular organisms that<br />

belong to <strong>the</strong> Excavata, a supergroup <strong>of</strong> eukaryotes including diverse nonphotosyn<strong>the</strong>tic groups<br />

like diplomonads, retortamonads, parabasalids, oxymonads and jakobids (Baldauf et al. 2000;<br />

Keeling et al. 2005; Baldauf 2008). Euglenids are <strong>the</strong> only photosyn<strong>the</strong>tic excavates and <strong>the</strong>y are<br />

specifically related to a subgroup containing <strong>the</strong> kinetoplastids and diplonemids (Triemer and<br />

Farmer 2007). Prior to our study, published data were consistent with <strong>the</strong> notion that <strong>the</strong> euglenid<br />

chloroplasts evolved from a green algal endosymbiont that was allied to prasinophytes (Ishida et<br />

al. 1997; Turmel et al. 1999b; Rogers et al. 2007); however, it remained unknown which <strong>of</strong> <strong>the</strong><br />

monophyletic groups <strong>of</strong> prasinophytes harbored <strong>the</strong> closest relative <strong>of</strong> <strong>the</strong> euglenid<br />

endosymbiont. In agreement with our results, <strong>the</strong> ML tree that Ishida et al. (1997) inferred from<br />

<strong>the</strong> amino acid sequences <strong>of</strong> elongation factor Tu identified a strongly supported clade clustering<br />

<strong>Pyramimonas</strong> disomata and <strong>the</strong> euglenids Euglena gracilis and Astasia longa; however, this<br />

<strong>Pyramimonas</strong> species was <strong>the</strong> only prasinophyte sampled in this single-gene analysis. Likewise,<br />

considering that <strong>Pyramimonas</strong> parkeae is <strong>the</strong> unique representative <strong>of</strong> <strong>the</strong> Pyramimonadales in<br />

our chloroplast phylogenomic study, <strong>the</strong>re remain uncertainties about <strong>the</strong> exact<br />

pyramimonadalean lineage that was <strong>the</strong> source <strong>of</strong> <strong>the</strong> euglenid chloroplasts.<br />

In <strong>the</strong> eukaryotic tree <strong>of</strong> life based on nuclear-encoded genes, euglenids and<br />

chlorarachniophytes fall within distinct branches. Like euglenids, chlorarachniophytes belong to<br />

a supergroup <strong>of</strong> eukaryotes that is primarily nonphotosyn<strong>the</strong>tic, <strong>the</strong> Rhizaria (Keeling et al. 2005;<br />

Baldauf 2008). By robustly placing Bigelowiella at a separate position from Euglena, our<br />

chloroplast phylogenomic analyses strongly reinforce <strong>the</strong> hypo<strong>the</strong>sis that <strong>the</strong> euglenid and<br />

chlorarachniophyte chloroplasts trace back to 2 independent secondary endosymbioses (Rogers<br />

et al. 2007; Takahashi et al. 2007) (fig. 7). Although <strong>the</strong> chloroplast <strong>of</strong> Bigelowiella was found to<br />

be sister to those <strong>of</strong> <strong>the</strong> ulvophytes Pseudendoclonium and Oltmannsiellopsis in both <strong>the</strong> protein<br />

and gene trees, broader sampling <strong>of</strong> core chlorophytes will be required to pinpoint <strong>the</strong> closest<br />

green algal relative <strong>of</strong> <strong>the</strong> chlorarachniophyte endosymbiont.<br />

<strong>The</strong> most unexpected finding that emerged from our study is <strong>the</strong> observation that <strong>the</strong><br />

<strong>Pyramimonas</strong> + Euglena clade is sister to <strong>the</strong> Monomastix + Ostreococcus clade. While <strong>the</strong><br />

existence <strong>of</strong> a sister relationship between <strong>the</strong> Pyramimonadales and Mamiellales has not been<br />

previously documented, it is compatible with <strong>the</strong> resemblance that <strong>the</strong>se monophyletic groups<br />

display at <strong>the</strong> level <strong>of</strong> flagellar scale structure (Melkonian 1984; Melkonian 1990; O'Kelly 1992;<br />

Sym and Pienaar 1993) and with <strong>the</strong> branching order inferred from 18S rDNA data. Although <strong>the</strong><br />

Pyramimonadales emerge just before <strong>the</strong> Mamiellales in most 18S rDNA trees (Steinkotter et al.<br />

1994; Nakayama et al. 1998; Fawley et al. 2000; Guillou et al. 2004), <strong>the</strong>se lineages form a<br />

weakly supported clade in <strong>the</strong> ML tree recently reported by Nakayama et al. (2007). No<br />

similarities were found at <strong>the</strong> chloroplast gene order level that link <strong>the</strong> Pyramimonadales and<br />

Mamiellales to <strong>the</strong> exclusion <strong>of</strong> o<strong>the</strong>r chlorophyte groups; however, losses <strong>of</strong> at least 4 genes<br />

(cemA, cysT, petL and rpl19) could be traced back unambiguously to <strong>the</strong> common ancestor <strong>of</strong> <strong>the</strong><br />

Pyramimonadales and Mamiellales (supplementary table 1, Supplementary Material).<br />
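The sister relationship discussed above can be stated precisely on a rooted tree: two groups are sisters when a single internal node has exactly those two groups as its children. The Python sketch below is illustrative only. The nested-tuple tree is an assumed, simplified encoding of the chlorophyte relationships described in the text (Nephroselmis diverging first, then Pycnococcus, then the two clades of interest), and TREE, leaf_set and are_sister are hypothetical helper names, not part of any software used in this study.

```python
# Assumed, simplified rooted topology (ingroup only); leaves are strings,
# internal nodes are tuples of children.
TREE = (
    "Nephroselmis",
    ("Pycnococcus",
     (("Pyramimonas", "Euglena"), ("Monomastix", "Ostreococcus"))),
)


def leaf_set(node):
    """Return the set of leaf names found below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def are_sister(tree, group_a, group_b):
    """True if some bifurcating node has group_a and group_b as its two children."""
    if isinstance(tree, str):
        return False
    child_sets = {leaf_set(child) for child in tree}
    if len(tree) == 2 and child_sets == {frozenset(group_a), frozenset(group_b)}:
        return True
    return any(are_sister(child, group_a, group_b) for child in tree)


pyramimonas_euglena = {"Pyramimonas", "Euglena"}
mamiellales = {"Monomastix", "Ostreococcus"}
print(are_sister(TREE, pyramimonas_euglena, mamiellales))
print(are_sister(TREE, {"Pycnococcus"}, {"Nephroselmis"}))
```

On this toy topology the first query holds and the second does not, because Pycnococcus and Nephroselmis do not share an immediate common node. The check says nothing about statistical support; it only tests the branching pattern of a given tree.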

Because <strong>the</strong> Pyramimonadales and Mamiellales are distinguished by prominent<br />

morphological differences, <strong>the</strong> existence <strong>of</strong> a sister relationship between <strong>the</strong>se lineages has<br />

important implications for <strong>the</strong> evolution <strong>of</strong> prasinophytes. All members <strong>of</strong> <strong>the</strong> Pyramimonadales,<br />

which represent <strong>the</strong> 5 genera indicated in figure 6 and also probably <strong>the</strong> Tasmanites (a fossil<br />

resembling <strong>the</strong> phycoma stages <strong>of</strong> Cymbomonas, Pterosperma, and Halosphaera, which has<br />

been found in Precambrian deposits), share a number <strong>of</strong> synapomorphic characters and have at<br />

least 4 flagella and a complex scaly covering consisting <strong>of</strong> 3 layers <strong>of</strong> scales on <strong>the</strong> cell body and<br />

<strong>of</strong> 2 layers on <strong>the</strong> flagella (Melkonian 1984; Melkonian 1990; Sym and Pienaar 1993). <strong>The</strong><br />

intermediate scale layer on <strong>the</strong> cell body consists <strong>of</strong> spiderweb-shaped scales in Pterosperma and<br />

is homologous to <strong>the</strong> outer scale layer on <strong>the</strong> flagellum (<strong>the</strong> limulus scales) and to <strong>the</strong> spiderweb<br />

scales <strong>of</strong> <strong>the</strong> Mamiellales. <strong>The</strong> limuloid scales <strong>of</strong> Cymbomonas are also reminiscent <strong>of</strong> <strong>the</strong><br />

spiderweb scales <strong>of</strong> <strong>the</strong> Mamiellales, particularly during morphogenesis (Moestrup et al. 2003).<br />

Interestingly, an apparent food-uptake apparatus is present in Cymbomonas, which has been<br />

interpreted as a character inherited from a phagotrophic ancestor <strong>of</strong> <strong>the</strong> green plants and<br />

subsequently lost during evolution <strong>of</strong> <strong>the</strong> green algae (Moestrup et al. 2003). On <strong>the</strong> o<strong>the</strong>r hand,<br />

<strong>the</strong> members <strong>of</strong> <strong>the</strong> Mamiellales show reduced morphological complexity and are characterized<br />

by a progressive simplification <strong>of</strong> cellular structure and a reduction in cell size that occurred<br />

concomitantly with <strong>the</strong> loss <strong>of</strong> scales (Nakayama et al. 1998). <strong>The</strong>y lack an underlayer <strong>of</strong> square-<br />

shaped scales (such scales are present in most o<strong>the</strong>r prasinophyte lineages and <strong>the</strong> flagellate<br />

reproductive cells <strong>of</strong> streptophytes) and no microtubular flagellar roots are attached to <strong>the</strong> basal<br />

body no. 2. A sister relationship between <strong>the</strong> Pyramimonadales and Mamiellales implies that<br />

some <strong>of</strong> <strong>the</strong> cellular features displayed by <strong>the</strong> Mamiellales were derived from <strong>the</strong> more complex<br />

organization seen in <strong>the</strong> Pyramimonadales and presumably in <strong>the</strong> common ancestor <strong>of</strong> all<br />

chlorophytes. In this context, it is worth mentioning that <strong>the</strong> nature <strong>of</strong> <strong>the</strong> progenitor <strong>of</strong> all green<br />

plants has generated intense debate and is still controversial (Melkonian 1984; O'Kelly 1992;<br />

Sym and Pienaar 1993). A better understanding <strong>of</strong> <strong>the</strong> relationships among prasinophyte lineages<br />

will be required before one can infer with confidence evolutionary scenarios <strong>of</strong> cellular changes.<br />

At present, <strong>the</strong> identity <strong>of</strong> <strong>the</strong> earliest-diverging chlorophyte lineage remains uncertain.<br />

Intriguingly, <strong>the</strong> trees inferred from 18S rDNA sequences (Guillou et al. 2004; Nakayama et al.<br />

2007) are in discordance with <strong>the</strong> chloroplast phylogenomic trees reported in this study with<br />

regard to <strong>the</strong> position <strong>of</strong> <strong>the</strong> Nephroselmis genus (clade III). <strong>The</strong> early-diverging position<br />

observed for <strong>the</strong> Nephroselmis representative in chloroplast trees is in agreement with <strong>the</strong> high<br />

degree <strong>of</strong> ancestral features found in <strong>the</strong> cpDNA <strong>of</strong> this taxon (see fig. 8) but contrasts sharply<br />

with <strong>the</strong> much later divergence observed for <strong>the</strong> genus in 18S rDNA trees. In <strong>the</strong> latter trees, <strong>the</strong><br />

branch occupied by Nephroselmis species emerges near <strong>the</strong> lineage containing Pycnococcus and<br />

Pseudoscourfieldia marina, <strong>the</strong> clade VII containing only picoplanktonic species, and <strong>the</strong> clade<br />

containing <strong>the</strong> core chlorophytes (Chlorodendrales sensu Melkonian (1990) +<br />

Trebouxiophyceae + Ulvophyceae + Chlorophyceae). Toge<strong>the</strong>r, <strong>the</strong>se lineages form a large clade<br />

that is well supported in ML analysis (fig. 6). Given <strong>the</strong> close relationship observed on <strong>the</strong> basis<br />

<strong>of</strong> scale structure between Nephroselmis and <strong>the</strong> genera Tetraselmis and Scherffelia, Nakayama<br />

et al. (2007) proposed that <strong>the</strong> common ancestor <strong>of</strong> <strong>the</strong> clade containing Nephroselmis and <strong>the</strong><br />

core chlorophytes had 2 layers <strong>of</strong> small scales on <strong>the</strong> flagella (square-shaped scales and rod-shaped<br />

scales) and cell body (square scales and stellate scales). <strong>The</strong> abovementioned discrepancy<br />

between nuclear and chloroplast trees highlights <strong>the</strong> need for analysis <strong>of</strong> chloroplast genomes<br />

from additional prasinophytes. Sampling <strong>of</strong> chloroplast genomes from all 7 known lineages <strong>of</strong><br />

prasinophytes will be required to determine <strong>the</strong> exact position <strong>of</strong> Nephroselmis relative to <strong>the</strong><br />

Pycnococcaceae, Pyramimonadales and Mamiellales.<br />

Losses <strong>of</strong> Multiple Ancestral cpDNA Characters in Independent Prasinophyte Lineages are<br />

Correlated with Major Cellular Remodeling<br />

To trace some <strong>of</strong> <strong>the</strong> evolutionary changes that occurred at <strong>the</strong> chloroplast genome level<br />

during <strong>the</strong> evolution <strong>of</strong> prasinophytes and euglenids, losses <strong>of</strong> 62 genes and 75 ancestral gene<br />

pairs were mapped on <strong>the</strong> tree topology inferred from sequence data (fig. 8). In this analysis, <strong>the</strong><br />

core chlorophytes were excluded and <strong>the</strong> streptophytes Mesostigma and Chlorokybus were used<br />

as outgroup. Although multiple characters were lost in independent lineages, a substantial<br />

fraction <strong>of</strong> losses are uniquely shared. In particular, <strong>the</strong> monophyletic group containing <strong>the</strong><br />

Mamiellales + euglenids + Pyramimonadales and <strong>the</strong> node linking <strong>the</strong> latter clade with <strong>the</strong><br />

Pycnococcaceae are supported by several changes that occurred only once. Because <strong>the</strong> nuclear<br />

genome <strong>of</strong> just one prasinophyte genus (Ostreococcus) has been sequenced so far (Derelle et al.<br />

2006; Palenik et al. 2007), we cannot interpret our results in terms <strong>of</strong> gene transfers from <strong>the</strong><br />

chloroplast to <strong>the</strong> nucleus. Most <strong>of</strong> <strong>the</strong> genes that vanished from <strong>the</strong> chloroplast genome<br />

probably fall into this category; however, some might have disappeared entirely from <strong>the</strong> cell<br />

because <strong>the</strong>ir requirement is restricted to certain growth and physiological conditions (e.g. <strong>the</strong><br />

chl genes associated with chlorophyll syn<strong>the</strong>sis in <strong>the</strong> dark, <strong>the</strong> cys genes involved in sulfate and<br />

thiosulfate transport, and <strong>the</strong> ndh genes associated with chlororespiration).<br />
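The idea of mapping losses onto a fixed topology, and of separating uniquely shared losses from independent ones, can be sketched in a few lines of Python. This is a minimal illustration under a Dollo-type assumption (each character is present at the root and, once lost, is never regained); it is not presented as the procedure used to produce fig. 8. The tree is an assumed, simplified encoding of the relationships described in the text, the names TREE, leaves and dollo_losses are hypothetical, and the presence/absence sets for geneA and geneB are invented toy data, not values taken from supplementary table 1.

```python
# Assumed, simplified rooted topology (ingroup only); leaves are strings,
# internal nodes are tuples of children.
TREE = (
    "Nephroselmis",
    ("Pycnococcus",
     (("Pyramimonas", "Euglena"), ("Monomastix", "Ostreococcus"))),
)

# Hypothetical toy data: taxa in which each placeholder gene is still present.
PRESENCE = {
    "geneA": {"Nephroselmis", "Pycnococcus"},
    "geneB": {"Nephroselmis", "Pyramimonas", "Monomastix"},
}


def leaves(node):
    """Return the list of leaf names found below a node."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def dollo_losses(node, present):
    """Place one loss on the stem of every maximal clade that lacks the gene.

    Assumes the gene was present at the root and cannot be regained.
    Each loss is reported as the sorted tuple of taxa descending from it.
    """
    tips = leaves(node)
    if not any(tip in present for tip in tips):
        return [tuple(sorted(tips))]  # whole clade lacks the gene: one loss here
    if isinstance(node, str):
        return []  # leaf that still carries the gene
    events = []
    for child in node:
        events.extend(dollo_losses(child, present))
    return events


for gene, present in PRESENCE.items():
    events = dollo_losses(TREE, present)
    kind = "uniquely shared" if len(events) == 1 else "independent"
    print(gene, len(events), kind, events)
```

In this toy case geneA is lost once, on the branch leading to the four taxa that lack it, whereas geneB requires three separate losses. Counting events per character in this way is what distinguishes a change that supports a node from one that recurred in unrelated lineages.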

<strong>The</strong> chloroplast genome sustained important reduction in gene content in at least 3 separate<br />

lineages, namely <strong>the</strong> lineages leading to Euglena, to <strong>the</strong> mamiellalean genera Monomastix and<br />

Ostreococcus, and to Pycnococcus (fig. 8). In light <strong>of</strong> <strong>the</strong> close affinity <strong>of</strong> <strong>the</strong> <strong>Pyramimonas</strong> and<br />

Euglena chloroplast genomes, we propose that <strong>the</strong> secondary endosymbiosis that gave rise to <strong>the</strong><br />

euglenid chloroplasts was accompanied by extensive gene losses. Similar extinction <strong>of</strong> numerous<br />

chloroplast genes has been associated with <strong>the</strong> secondary endosymbiosis that involved <strong>the</strong><br />

capture <strong>of</strong> a red alga and generated <strong>the</strong> chloroplasts <strong>of</strong> heterokonts, cryptophytes and haptophytes<br />

(Khan et al. 2007; Oudot-Le Secq et al. 2007; Cattolico et al. 2008). With regard to <strong>the</strong><br />

Mamiellales, it appears that <strong>the</strong> common ancestor <strong>of</strong> Monomastix and Ostreococcus had already<br />

experienced multiple chloroplast gene losses (fig. 8), implying that <strong>the</strong>se events might have<br />

accompanied <strong>the</strong> simplification <strong>of</strong> cell organization that presumably coincided with <strong>the</strong><br />

emergence <strong>of</strong> <strong>the</strong> Mamiellales. Moreover, as indicated by <strong>the</strong> higher frequency <strong>of</strong> gene losses in<br />

<strong>the</strong> Ostreococcus lineage compared to <strong>the</strong> Monomastix lineage, part <strong>of</strong> <strong>the</strong> gene losses in <strong>the</strong><br />

former lineage were likely connected with <strong>the</strong> evolution <strong>of</strong> <strong>the</strong> coccoid cell organization and <strong>the</strong><br />

reduction in cell size. Pycnococcus represents an independent coccoid lineage that sustained<br />

considerable reduction <strong>of</strong> <strong>the</strong> chloroplast genome, and as observed for Ostreococcus, <strong>the</strong>re was<br />

strong pressure to maintain a compact genome organization. In contrast, <strong>the</strong> genome <strong>of</strong> <strong>the</strong><br />

mamiellalean Monomastix followed a divergent evolutionary pathway and became loosely<br />

packed with genes following proliferation <strong>of</strong> small dispersed repeats (table 1 and supplementary<br />

fig. 4, Supplementary Material).<br />

<strong>The</strong> pressure to maintain <strong>the</strong> ancestral quadripartite architecture became relaxed during <strong>the</strong><br />

evolution <strong>of</strong> prasinophytes and euglenids. <strong>The</strong> IR was lost a minimum <strong>of</strong> 3 times (fig. 8), an<br />

observation that is not surprising given that independent IR losses have been documented for <strong>the</strong><br />

class Trebouxiophyceae (de Cambiaire et al. 2007) and for land plants (Palmer 1991; Raubeson<br />

and Jansen 2005). More unexpected was our finding that <strong>the</strong> 3 examined IR-containing<br />

prasinophyte cpDNAs differ significantly in <strong>the</strong> distribution <strong>of</strong> <strong>the</strong>ir genes among <strong>the</strong> 2 SC<br />

regions and in <strong>the</strong> orientation <strong>of</strong> <strong>the</strong> IR relative to <strong>the</strong>se regions. While <strong>the</strong> Nephroselmis genome<br />

is <strong>the</strong> most similar to <strong>the</strong> gene partitioning pattern observed for streptophytes and some non-<br />

green algae (Turmel et al. 1999b), <strong>the</strong> reduced genome <strong>of</strong> Ostreococcus shows a pattern<br />

(supplementary fig. 1, Supplementary Material) more like that observed for <strong>the</strong> ulvophytes<br />

Pseudendoclonium and Oltmannsiellopsis (Pombert et al. 2005; Pombert et al. 2006). When <strong>the</strong><br />

latter pattern was identified in Pseudendoclonium, it was hypo<strong>the</strong>sized that it might represent an<br />

intermediate form between <strong>the</strong> highly derived pattern found in <strong>the</strong> chlorophycean green alga<br />

Chlamydomonas reinhardtii and <strong>the</strong> ancestral quadripartite structure found in streptophytes,<br />

Nephroselmis, and probably early-diverging trebouxiophytes, thus lending support to <strong>the</strong> notion<br />

that <strong>the</strong> Ulvophyceae is sister to <strong>the</strong> Chlorophyceae (Pombert et al. 2005). However, <strong>the</strong> great<br />

variability in <strong>the</strong> quadripartite structure uncovered here for <strong>the</strong> Prasinophyceae and recently<br />

reported for <strong>the</strong> Chlorophyceae (de Cambiaire et al. 2006; Brouard et al. 2008) casts doubt on <strong>the</strong><br />

phylogenetic value <strong>of</strong> this genomic feature. Clearly, <strong>the</strong>se data indicate that chloroplast genome<br />

rearrangements led to <strong>the</strong> exchanges <strong>of</strong> genes between opposite SC regions on multiple<br />

occasions during <strong>the</strong> evolutionary history <strong>of</strong> chlorophytes.<br />

Conclusions<br />

<strong>The</strong> chloroplast genome <strong>of</strong> prasinophytes exhibits much more fluidity in gene content and<br />

arrangement than anticipated from <strong>the</strong> earlier reports on <strong>the</strong> Nephroselmis and Ostreococcus<br />

genomes. Major reduction and restructuring <strong>of</strong> <strong>the</strong> chloroplast genome occurred in conjunction<br />

with changes in cell organization in at least 2 lineages, <strong>the</strong> Mamiellales and Pycnococcaceae. By<br />

disclosing <strong>the</strong> existence <strong>of</strong> a sister relationship between <strong>the</strong> Mamiellales and Pyramimonadales,<br />

our study represents a significant step towards a better understanding <strong>of</strong> prasinophyte evolution.<br />

Fur<strong>the</strong>rmore, it <strong>of</strong>fers for <strong>the</strong> first time compelling evidence that <strong>the</strong> evolutionary history <strong>of</strong> <strong>the</strong><br />

prasinophytes was directly linked with <strong>the</strong> acquisition <strong>of</strong> photosyn<strong>the</strong>sis through secondary<br />

endosymbiosis by a subgroup <strong>of</strong> excavates, <strong>the</strong> euglenids. Two independent lines <strong>of</strong> evidence,<br />

trees inferred from sequence data and <strong>the</strong> presence <strong>of</strong> uniquely shared derived gene clusters,<br />

robustly support <strong>the</strong> notion that <strong>the</strong> green algal ancestor <strong>of</strong> <strong>the</strong> euglenid chloroplasts belonged<br />

to <strong>the</strong> Pyramimonadales. Although sampling <strong>of</strong> Bigelowiella has not enabled us to pinpoint <strong>the</strong><br />

green algal donor <strong>of</strong> chlorarachniophyte chloroplasts, <strong>the</strong> inferred trees streng<strong>the</strong>n <strong>the</strong><br />

hypo<strong>the</strong>sis that chloroplasts arose independently in chlorarachniophytes and euglenids.<br />

Considering that pyramimonadaleans are richer in ancestral characters at <strong>the</strong> chloroplast genome<br />

level and exhibit a more pronounced level <strong>of</strong> cell asymmetry and complexity compared to <strong>the</strong><br />

mamiellaleans, it is plausible that cell asymmetry characterized <strong>the</strong> common ancestor <strong>of</strong> <strong>the</strong>se<br />

lineages. Consistent with <strong>the</strong> hypo<strong>the</strong>sis that <strong>the</strong> common ancestor <strong>of</strong> all chlorophytes also<br />

featured an asymmetrical cell architecture is <strong>the</strong> observation that Nephroselmis occupies <strong>the</strong><br />

earliest divergence <strong>of</strong> <strong>the</strong> Chlorophyta and displays <strong>the</strong> highest conservation <strong>of</strong> ancestral<br />

characters. Future chloroplast genome investigations incorporating <strong>the</strong> Chlorodendrales, <strong>the</strong> 2<br />

picoplanktonic lineages not sampled in <strong>the</strong> present study, and a broader range <strong>of</strong> taxa in each<br />

lineage should resolve fur<strong>the</strong>r <strong>the</strong> branching pattern <strong>of</strong> prasinophyte lineages and clarify <strong>the</strong><br />

number <strong>of</strong> separate events that gave rise to coccoid forms and to streamlining <strong>of</strong> <strong>the</strong> chloroplast genome.<br />

Supplementary Material<br />

Supplementary figures 1-5, supplementary table 1, <strong>the</strong> data sets used in phylogenetic<br />

analyses, and <strong>the</strong> data set used to infer <strong>the</strong> evolutionary scenario <strong>of</strong> character losses are available<br />

at Molecular Biology and Evolution online (http://mbe.oxfordjournals.org/). <strong>The</strong> fully annotated<br />

chloroplast genome sequences <strong>of</strong> Monomastix, Pycnococcus and <strong>Pyramimonas</strong> have been<br />

deposited in <strong>the</strong> GenBank database under <strong>the</strong> accession numbers FJ493497, FJ493498, and<br />

FJ493499, respectively. <strong>The</strong> GenBank accession number for <strong>the</strong> Monomastix 18S rDNA<br />

sequence determined in this study is FJ493496.<br />

Acknowledgments<br />

We thank Mathieu Blais and Bertrand Caillier for <strong>the</strong>ir assistance in cloning and<br />

sequencing <strong>the</strong> <strong>Pyramimonas</strong> chloroplast genome. This study was supported by a grant from <strong>the</strong><br />

Natural Sciences and Engineering Research Council <strong>of</strong> Canada (to M.T. and C.L.).<br />

Literature Cited<br />

Baldauf SL. 2008. An overview <strong>of</strong> <strong>the</strong> phylogeny and diversity <strong>of</strong> eucaryotes. J Syst Evol.<br />

46:263-273.<br />

Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF. 2000. A kingdom-level phylogeny <strong>of</strong><br />

eukaryotes based on combined protein data. Science. 290:972-977.<br />

Bélanger A-S, Brouard J-S, Charlebois P, Otis C, Lemieux C, Turmel M. 2006. Distinctive<br />

architecture <strong>of</strong> <strong>the</strong> chloroplast genome in <strong>the</strong> chlorophycean green alga Stigeoclonium<br />

helveticum. Mol Gen Genomics. 276:464-477.<br />

Brinkmann H, Philippe H. 2008. Animal phylogeny and large-scale sequencing: progress and<br />

pitfalls. J Syst Evol. 46:274-286.<br />

Brouard J-S, Otis C, Lemieux C, Turmel M. 2008. <strong>Chloroplast</strong> DNA sequence <strong>of</strong> <strong>the</strong> green alga<br />

Oedogonium cardiacum (Chlorophyceae): Unique genome architecture, derived<br />

characters shared with <strong>the</strong> Chaetophorales and novel genes acquired through horizontal<br />

transfer. BMC Genomics. 9:290.<br />

Burger G, Saint-Louis D, Gray MW, Lang BF. 1999. Complete sequence <strong>of</strong> <strong>the</strong> mitochondrial<br />

DNA <strong>of</strong> <strong>the</strong> red alga Porphyra purpurea. Cyanobacterial introns and shared ancestry <strong>of</strong><br />

red and green algae. Plant Cell. 11:1675-1694.<br />

Castresana J. 2000. Selection <strong>of</strong> conserved blocks from multiple alignments for <strong>the</strong>ir use in<br />

phylogenetic analysis. Mol Biol Evol. 17:540-552.<br />

Cattolico R, Jacobs M, Zhou Y, Chang J, Duplessis M, Lybrand T, McKay J, Ong H, Sims E,<br />

Rocap G. 2008. <strong>Chloroplast</strong> genome sequencing analysis <strong>of</strong> Heterosigma akashiwo<br />

CCMP452 (West Atlantic) and NIES293 (West Pacific) strains. BMC Genomics. 9:211.<br />

Chevalier B, Turmel M, Lemieux C, Monnat RJ, Stoddard BL. 2003. Flexible DNA target site<br />

recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol<br />

Biol. 329:253-269.<br />

Côté V, Mercier J-P, Lemieux C, Turmel M. 1993. <strong>The</strong> single group-I intron in <strong>the</strong> chloroplast<br />

rrnL gene <strong>of</strong> Chlamydomonas humicola encodes a site-specific DNA endonuclease (I-<br />

ChuI). Gene. 129:69-76.<br />

Courties C, Vaquer A, Troussellier M, Lautier J, Chretiennot-Dinet MJ, Neveux J, Machado C,<br />

Claustre H. 1994. Smallest eukaryotic organism. Nature. 370:255.<br />

de Cambiaire J-C, Otis C, Lemieux C, Turmel M. 2006. <strong>The</strong> complete chloroplast genome<br />

sequence <strong>of</strong> <strong>the</strong> chlorophycean green alga Scenedesmus obliquus reveals a compact gene<br />

organization and a biased distribution <strong>of</strong> genes on <strong>the</strong> two DNA strands. BMC Evol Biol.<br />

6:37.<br />

de Cambiaire J-C, Otis C, Lemieux C, Turmel M. 2007. <strong>The</strong> chloroplast genome sequence <strong>of</strong> <strong>the</strong><br />

green alga Leptosira terrestris: multiple losses <strong>of</strong> <strong>the</strong> inverted repeat and extensive<br />

genome rearrangements within <strong>the</strong> Trebouxiophyceae. BMC Genomics. 8:213.<br />

Derelle E, Ferraz C, Rombauts S et al. 2006. Genome analysis <strong>of</strong> <strong>the</strong> smallest free-living<br />

eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci USA.<br />

103:11647-11652.<br />

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high<br />

throughput. Nucleic Acids Res. 32:1792-1797.<br />

Farris JS. 1977. Phylogenetic analysis under Dollo's Law. Syst Zool. 26:77-88.<br />

Fawley MW, Yun Y, Qin M. 2000. Phylogenetic analyses <strong>of</strong> 18S rDNA sequences reveal a new<br />

coccoid lineage <strong>of</strong> <strong>the</strong> Prasinophyceae (Chlorophyta). J Phycol. 36:387-393.<br />

Goulding SE, Olmstead RG, Morden CW, Wolfe KH. 1996. Ebb and flow <strong>of</strong> <strong>the</strong> chloroplast<br />

inverted repeat. Mol Gen Genet. 252:195-206.<br />

Guillou L, Eikrem W, Chrétiennot-Dinet M-J, Le Gall F, Massana R, Romari K, Pedrós-Alió C,<br />

Vaulot D. 2004. Diversity <strong>of</strong> picoplanktonic prasinophytes assessed by direct nuclear<br />

SSU rDNA sequencing <strong>of</strong> environmental samples and novel isolates retrieved from<br />

oceanic and coastal marine ecosystems. Protist. 155:193-214.<br />

Hallick RB, Hong L, Drager RG, Favreau MR, Monfort A, Orsat B, Spielmann A, Stutz E. 1993.<br />

Complete sequence <strong>of</strong> Euglena gracilis chloroplast DNA. Nucleic Acids Res. 21:3537-<br />

3544.<br />

Hamby RK, Zimmer EA. 1991. Ribosomal RNA as a phylogenetic tool in plant systematics. In:<br />

Soltis P, Soltis D, Doyle J, editors. Molecular Systematics in Plants. New York:<br />

Routledge, Chapman and Hall. p. 50-91.<br />

Haugen P, Bhattacharya D, Palmer JD, Turner S, Lewis LA, Pryer KM. 2007. Cyanobacterial<br />

ribosomal RNA genes with multiple, endonuclease-encoding group I introns. BMC Evol<br />

Biol. 7:159.<br />

Heath TA, Hedtke SM, Hillis DM. 2008. Taxon sampling and <strong>the</strong> accuracy <strong>of</strong> phylogenetic<br />

analyses. J Syst Evol. 46:239-257.<br />

Ishida K, Cao Y, Hasegawa M, Okada N, Hara Y. 1997. <strong>The</strong> origin <strong>of</strong> chlorarachniophyte<br />

plastids, as inferred from phylogenetic comparisons <strong>of</strong> amino acid sequences <strong>of</strong> EF-Tu. J<br />

Mol Evol. 45:682-687.<br />

Jansen RK, Cai Z, Raubeson LA et al. 2007. Analysis <strong>of</strong> 81 genes from 64 plastid genomes<br />

resolves relationships in angiosperms and identifies genome-scale evolutionary patterns.<br />

Proc Natl Acad Sci USA. 104:19369-19374.<br />

Jobb G, von Haeseler A, Strimmer K. 2004. TREEFINDER: a powerful graphical analysis<br />

environment for molecular phylogenetics. BMC Evol Biol. 4:18.<br />

Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE, Roger AJ, Gray MW.<br />

2005. <strong>The</strong> tree <strong>of</strong> eukaryotes. Trends Ecol Evol. 20:670-676.<br />

Keller MD, Selvin RC, Claus W, Guillard RRL. 1987. Media for <strong>the</strong> culture <strong>of</strong> oceanic<br />

ultraphytoplankton. J Phycol. 23:633-638.<br />

Khan H, Parks N, Kozera C, Curtis BA, Parsons BJ, Bowman S, Archibald JM. 2007. Plastid<br />

genome sequence <strong>of</strong> <strong>the</strong> cryptophyte alga Rhodomonas salina CCMP1319: lateral<br />

transfer <strong>of</strong> putative DNA replication machinery and a test <strong>of</strong> chromist plastid phylogeny.<br />

Mol Biol Evol. 24:1832-1842.<br />

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter:<br />

<strong>the</strong> manifold applications <strong>of</strong> repeat analysis on a genomic scale. Nucleic Acids Res.<br />

29:4633-4642.<br />

Lambowitz AM, Zimmerly S. 2004. Mobile group II introns. Annu Rev Genet. 38:1-35.<br />

Latasa M, Scharek R, Le Gall F, Guillou L. 2004. Pigment suites and taxonomic groups in<br />

Prasinophyceae. J Phycol. 40:1149-1155.<br />

Lemieux C, Otis C, Turmel M. 2000. Ancestral chloroplast genome in Mesostigma viride reveals<br />

an early branch <strong>of</strong> green plant evolution. Nature. 403:649-652.<br />

Lemieux C, Otis C, Turmel M. 2007. A clade uniting <strong>the</strong> green algae Mesostigma viride and<br />

Chlorokybus atmophyticus represents <strong>the</strong> deepest branch <strong>of</strong> <strong>the</strong> Streptophyta in<br />

chloroplast genome-based phylogenies. BMC Biology. 5:2.<br />

Lewis LA, McCourt RM. 2004. <strong>Green</strong> algae and <strong>the</strong> origin <strong>of</strong> land plants. Am J Bot. 91:1535-<br />

1556.<br />

Lucas P, Otis C, Mercier J-P, Turmel M, Lemieux C. 2001. Rapid evolution <strong>of</strong> <strong>the</strong> DNA-binding<br />

site in LAGLIDADG homing endonucleases. Nucleic Acids Res. 29:960-969.<br />

Maddison D, Maddison W. 2000. MacClade 4: analysis <strong>of</strong> phylogeny and character evolution.<br />

Sunderland, MA: Sinauer Associates.<br />

Manton I. 1967. Electron microscopical observations on a clone <strong>of</strong> Monomastix Scherffel in<br />

culture. Nova Hedwigia. 14:1-11.<br />

Marin B, Melkonian M. 1999. Mesostigmatophyceae, a new class <strong>of</strong> streptophyte green algae<br />

revealed by SSU rRNA sequence comparisons. Protist. 150:399-417.<br />

Mattox KR, Stewart KD. 1984. Classification <strong>of</strong> <strong>the</strong> green algae: a concept based on comparative<br />

cytology. In: Irvine DEG, John DM, editors. <strong>The</strong> Systematics <strong>of</strong> <strong>the</strong> <strong>Green</strong> <strong>Algae</strong>.<br />

London: Academic Press. p. 29-72.<br />

Maul JE, Lilly JW, Cui L, dePamphilis CW, Miller W, Harris EH, Stern DB. 2002. <strong>The</strong><br />

Chlamydomonas reinhardtii plastid chromosome: islands <strong>of</strong> genes in a sea <strong>of</strong> repeats.<br />

Plant Cell. 14:2659-2679.<br />

McCracken DA, Nadakavukaren MJ, Cain JR. 1980. A biochemical and ultrastructural evaluation of the taxonomic position of Glaucosphaera vacuolata Korsch. New Phytol. 86:39-44.

Melkonian M. 1984. Flagellar apparatus ultrastructure in relation to green algal classification. In: Irvine DEG, John DM, editors. The Systematics of the Green Algae. London: Academic Press. p. 73-120.

Melkonian M. 1990. Phylum Chlorophyta. Class Prasinophyceae. In: Margulis L, Corliss JO, Melkonian M, Chapman DJ, editors. Handbook of Protoctista. The Structure, Cultivation, Habitats and Life Histories of the Eukaryotic Microorganisms and their Descendants Exclusive of Animals, Plants and Fungi. Boston: Jones and Bartlett Publishers. p. 600-607.

Michel F, Umesono K, Ozeki H. 1989. Comparative and functional anatomy of group II catalytic introns – a review. Gene. 82:5-30.

Michel F, Westhof E. 1990. Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. J Mol Biol. 216:585-610.

Moestrup O, Inouye I, Hori T. 2003. Ultrastructural studies on Cymbomonas tetramitiformis (Prasinophyceae). I. General structure, scale microstructure, and ontogeny. Can J Bot. 81:657-671.

Moestrup O, Thomsen HA. 1974. An ultrastructural study of the flagellate Pyramimonas orientalis with particular emphasis on Golgi apparatus activity and the flagellar apparatus. Protoplasma. 81:247-269.



Moestrup O, Throndsen J. 1988. Light and electron microscopical studies on Pseudoscourfieldia marina, a primitive scaly green flagellate (Prasinophyceae) with posterior flagella. Can J Bot. 66:1415-1434.

Nakayama T, Marin B, Kranz HD, Surek B, Huss VAR, Inouye I, Melkonian M. 1998. The basal position of scaly green flagellates among the green algae (Chlorophyta) is revealed by analyses of nuclear-encoded SSU rRNA sequences. Protist. 149:367-380.

Nakayama T, Suda S, Kawachi M, Inouye I. 2007. Phylogeny and ultrastructure of Nephroselmis and Pseudoscourfieldia (Chlorophyta), including the description of Nephroselmis anterostigmatica sp. nov. and a proposal for the Nephroselmidales ord. nov. Phycologia. 46:680-697.

O'Kelly CJ. 1992. Flagellar apparatus architecture and the phylogeny of “green algae”: chlorophytes, euglenoids, glaucophytes. In: Menzel D, editor. The cytoskeleton of the algae. Boca Raton: CRC Press. p. 315-345.

Odom OW, Shenkenberg DL, Garcia JA, Herrin DL. 2004. A horizontally acquired group II intron in the chloroplast psbA gene of a psychrophilic Chlamydomonas: in vitro self-splicing and genetic evidence for maturase activity. RNA. 10:1097-1107.

Oudot-Le Secq M-P, Grimwood J, Shapiro H, Armbrust EV, Bowler C, Green BR. 2007. Chloroplast genomes of the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana: comparison with other plastid genomes of the red lineage. Mol Genet Genomics. 277:427-439.

Palenik B, Grimwood J, Aerts A et al. 2007. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci USA. 104:7705-7710.



Palmer JD. 1991. Plastid chromosomes: structure and evolution. In: Bogorad L, Vasil K, editors. The Molecular Biology of Plastids. San Diego: Academic Press. p. 5-53.

Pombert J-F, Lemieux C, Turmel M. 2006. The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes. BMC Biology. 4:3.

Pombert J-F, Otis C, Lemieux C, Turmel M. 2005. The chloroplast genome sequence of the green alga Pseudendoclonium akinetum (Ulvophyceae) reveals unusual structural features and new insights into the branching order of chlorophyte lineages. Mol Biol Evol. 22:1903-1918.

Proschold T, Leliaert F. 2007. Systematics of the green algae: conflict of classic and modern approaches. In: Brodie J, Lewis J, editors. Unravelling the Algae: The Past, Present, and Future of Algal Systematics. Boca Raton: CRC Press, Taylor & Francis. p. 123-153.

Qiu YL, Li LB, Wang B et al. 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511-15516.

Raubeson LA, Jansen RK. 2005. Chloroplast genomes of plants. In: Henry RJ, editor. Plant Diversity and Evolution: Genotypic and Phenotypic Variation in Higher Plants. Wallingford: CABI Publishing. p. 45-68.

Robbens S, Derelle E, Ferraz C, Wuyts J, Moreau H, Van de Peer Y. 2007. The complete chloroplast and mitochondrial DNA sequence of Ostreococcus tauri: organelle genomes of the smallest eukaryote are examples of compaction. Mol Biol Evol. 24:956-968.

Rodriguez-Ezpeleta N, Philippe H, Brinkmann H, Becker B, Melkonian M. 2007. Phylogenetic analyses of nuclear, mitochondrial, and plastid multigene data sets support the placement of Mesostigma in the Streptophyta. Mol Biol Evol. 24:723-731.



Rogalski M, Karcher D, Bock R. 2008. Superwobbling facilitates translation with reduced tRNA sets. Nat Struct Mol Biol. 15:192-198.

Rogers MB, Gilson PR, Su V, McFadden GI, Keeling PJ. 2007. The complete chloroplast genome of the chlorarachniophyte Bigelowiella natans: evidence for independent origins of chlorarachniophyte and euglenid secondary endosymbionts. Mol Biol Evol. 24:54-62.

Rokas A. 2006. Genomics and the tree of life. Science. 313:1897-1899.

Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19:1572-1574.

Sandaa RA, Heldal M, Castberg T, Thyrhaug R, Bratbak G. 2001. Isolation and characterization of two viruses with large genome size infecting Chrysochromulina ericina (Prymnesiophyceae) and Pyramimonas orientalis (Prasinophyceae). Virology. 290:272-280.

Sheveleva EV, Hallick RB. 2004. Recent horizontal intron transfer to a chloroplast genome. Nucleic Acids Res. 32:803-810.

Steinkotter J, Bhattacharya D, Semmelroth I, Bibeau C, Melkonian M. 1994. Prasinophytes form independent lineages within the Chlorophyta: Evidence from ribosomal RNA sequence comparisons. J Phycol. 30:340-345.

Swofford DL. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sunderland, MA: Sinauer Associates.

Sym SD, Pienaar RN. 1993. The class Prasinophyceae. In: Round FE, Chapman DJ, editors. Progress in Phycological Research. Bristol: Biopress Ltd. p. 281-376.



Takahashi F, Okabe Y, Nakada T, Sekimoto H, Ito M, Kataoka H, Nozaki H. 2007. Origins of the secondary plastids of Euglenophyta and Chlorarachniophyta as revealed by an analysis of the plastid-targeting, nuclear-encoded gene psbO. J Phycol. 43:1302-1309.

Triemer R, Farmer M. 2007. A decade of euglenoid molecular phylogenetics. In: Brodie J, Lewis J, editors. Unravelling the Algae: The Past, Present, and Future of Algal Systematics. Boca Raton: CRC Press, Taylor & Francis. p. 315-330.

Turmel M, Brouard JS, Gagnon C, Otis C, Lemieux C. 2008. Deep division in the Chlorophyceae (Chlorophyta) revealed by chloroplast phylogenomic analyses. J Phycol. 44:739-750.

Turmel M, Lemieux C, Burger G, Lang BF, Otis C, Plante I, Gray MW. 1999a. The complete mitochondrial DNA sequences of Nephroselmis olivacea and Pedinomonas minor: two radically different evolutionary patterns within green algae. Plant Cell. 11:1717-1729.

Turmel M, Otis C, Lemieux C. 2005. The complete chloroplast DNA sequences of the charophycean green algae Staurastrum and Zygnema reveal that the chloroplast genome underwent extensive changes during the evolution of the Zygnematales. BMC Biology. 3:22.

Turmel M, Otis C, Lemieux C. 1999b. The complete chloroplast DNA sequence of the green alga Nephroselmis olivacea: insights into the architecture of ancestral chloroplast genomes. Proc Natl Acad Sci USA. 96:10248-10253.

Turmel M, Otis C, Lemieux C. 2006. The chloroplast genome sequence of Chara vulgaris sheds new light into the closest green algal relatives of land plants. Mol Biol Evol. 23:1324-1338.



Wakasugi T, Nagai T, Kapoor M et al. 1997. Complete nucleotide sequence of the chloroplast genome from the green alga Chlorella vulgaris: the existence of genes possibly involved in chloroplast division. Proc Natl Acad Sci USA. 94:5967-5972.

White TJ, Bruns T, Lee S, Taylor J. 1990. Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In: Innis MA, Gelfand DH, Sninsky JJ, White TJ, editors. PCR Protocols: A Guide to Methods and Applications. San Diego: Academic Press. p. 315-322.

Wolf PG, Karol KG, Mandoli DF, Kuehl J, Arumuganathan K, Ellis MW, Mishler BD, Kelch DG, Olmstead RG, Boore JL. 2005. The first complete chloroplast genome sequence of a lycophyte, Huperzia lucidula (Lycopodiaceae). Gene. 350:117-128.



Table 1
General Features of Prasinophyte cpDNAs

Feature                     Nephroselmis  Pyramimonas  Pycnococcus  Monomastix  Ostreococcus
Size (bp)
  Total                     200,799       101,605      80,211       114,528     71,666
  IR                        46,137        13,057       – a          – a         6,825
  LSC                       92,126        65,153       – a          – a         35,684
  SSC                       16,399        10,338       – a          – a         22,332
A+T (%)                     57.9          65.3         60.5         61.0        60.1
Conserved genes (no.) b     128           110          98           94          88
Introns
  Fraction of genome (%)    0             2.7          3.3          4.6         5.2
  Group I (no.)             0             0            0            5           0
  Group II (no.)            0             1            1            1           1
Intergenic sequences c
  Fraction of genome (%)    32.6          19.6         11.6         43.9        15.1
  Average size (bp)         352           159          102          524         115
Short repeated sequences d
  Fraction of genome (%)    0.5           0.5          0.1          17.6        0

a Because Pycnococcus and Monomastix cpDNAs lack an IR, only the total sizes of these genomes are given.

b Conserved genes refer to free-standing coding sequences usually present in chloroplast genomes. Genes present in the IR were counted only once.

c In addition to conserved genes, all ORFs ≥100 codons were considered as gene sequences.

d Non-overlapping repeat elements were mapped on each genome with RepeatMasker using the repeats ≥30 bp identified with REPuter as input sequences.
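
The size rows of Table 1 can be cross-checked with simple arithmetic: in a quadripartite genome the IR occurs in two copies, so the total size should equal 2 × IR + LSC + SSC. The following Python sketch applies that check to the three IR-containing genomes. It uses only values from Table 1; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Sizes in bp, copied from Table 1 (only the genomes that contain an IR).
SIZES = {
    "Nephroselmis": {"total": 200_799, "IR": 46_137, "LSC": 92_126, "SSC": 16_399},
    "Pyramimonas": {"total": 101_605, "IR": 13_057, "LSC": 65_153, "SSC": 10_338},
    "Ostreococcus": {"total": 71_666, "IR": 6_825, "LSC": 35_684, "SSC": 22_332},
}


def quadripartite_total(ir, lsc, ssc):
    """Expected genome size when the IR is present in two copies."""
    return 2 * ir + lsc + ssc


for name, s in SIZES.items():
    computed = quadripartite_total(s["IR"], s["LSC"], s["SSC"])
    status = "consistent" if computed == s["total"] else "MISMATCH"
    print(f"{name}: reported {s['total']} bp, computed {computed} bp ({status})")
```

All three genomes come out consistent, which is a quick way to confirm that the table was transcribed without digit errors.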



Table 2
Gene Repertoires of Prasinophyte cpDNAs

Gene a       Nephroselmis  Pyramimonas  Pycnococcus  Monomastix  Ostreococcus
accD         +             –            –            –           –
ccsA         +             +            –            +           –
cemA         +             –            +            –           –
chlB         +             +            –            –           –
chlI         +             +            +            +           –
chlL         +             +            +            –           –
chlN         +             +            +            –           –
cysA         +             –            –            –           –
cysT         +             –            +            –           –
ftsI         +             –            –            –           –
ftsW         +             –            –            –           –
minD         +             –            –            –           –
ndhA         +             +            –            –           –
ndhB         +             +            –            –           –
ndhC         +             +            –            –           –
ndhD         +             +            –            –           –
ndhE         +             +            –            –           –
ndhF         +             +            –            –           –
ndhG         +             +            –            –           –
ndhH         +             +            –            –           –
ndhI         +             +            –            –           –
ndhK         +             +            –            –           –
petD         +             –            +            +           –
petL         +             –            +            –           –
petN         +             +            +            +           –
psaC         +             +            +            –           +
psaJ         +             +            –            +           +
psaM         –             +            –            +           +
psbM         +             –            –            +           –
rne          +             –            –            –           –
rnpB         +             –            +            +           –
rpl12        +             +            –            –           –
rpl19        +             –            +            –           –
rpl22        –             +            –            –           –
rpl32        +             +            –            +           +
rpoB         +             +            –            +           +
rps9         +             +            –            –           +
rrf          +             –            –            +           +
trnG(gcc)    +             +            +            +           –
trnI(cau)    +             +            –            +           –
trnL(caa)    +             –            –            –           –
trnL(gag)    +             –            +            –           +



trnP(ggg)    –             –            +            –           –
trnR(ccg)    +             +            +            –           –
trnS(cga)    +             –            +            –           –
trnS(gga)    +             –            +            –           –
trnT(ggu)    +             –            –            –           –
ycf4         +             +            +            –           –
ycf20        + b           +            –            +           –
ycf47        +             –            –            –           –
ycf62        +             –            –            –           –
ycf65        –             +            –            –           –
ycf81        +             –            –            –           –

a Only the genes that are missing in one or more genomes are indicated. A total of 80 genes are shared by all compared cpDNAs: atpA, B, E, F, H, I, clpP, ftsH, infA, petA, B, G, psaA, B, I, psbA, B, C, D, E, F, H, I, J, K, L, N, T, Z, rbcL, rpl2, 5, 14, 16, 20, 23, 36, rpoA, C1, C2, rps2, 3, 4, 7, 8, 11, 12, 14, 18, 19, rrl, rrs, tufA, trnA(ugc), C(gca), D(guc), E(uuc), F(gaa), G(ucc), H(gug), I(gau), K(uuu), L(uaa), L(uag), Me(cau), Mf(cau), N(guu), P(ugg), Q(uug), R(acg), R(ucu), S(gcu), S(uga), T(ugu), V(uac), W(cca), Y(gua), ycf1, 3, 12.

b ycf20 is present as a pseudogene in Nephroselmis (unpublished data); it is located downstream of ndhE and corresponds to orf111 in the gene map reported by Turmel et al. (1999b).

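Tables 1 and 2 can be reconciled with a short tally: the 80 genes shared by all five cpDNAs (footnote a) plus the genes marked "+" for a given genome in Table 2 should give that genome's conserved-gene count in Table 1. The Python sketch below performs this tally. The presence/absence strings are transcribed from Table 2, with "-" standing for the table's "–"; the encoding, the variable names, and the decision to exclude the Nephroselmis ycf20 pseudogene from the count (marked "p") are assumptions made for illustration and are not stated in the tables.

```python
GENOMES = ["Nephroselmis", "Pyramimonas", "Pycnococcus", "Monomastix", "Ostreococcus"]
SHARED_GENES = 80  # genes present in all five cpDNAs (Table 2, footnote a)

# "gene pattern" pairs; each pattern has one character per genome, in the
# column order of Table 2: "+" present, "-" absent, "p" pseudogene (footnote b).
TABLE2 = """
accD +---- ccsA ++-+- cemA +-+-- chlB ++--- chlI ++++- chlL +++-- chlN +++--
cysA +---- cysT +-+-- ftsI +---- ftsW +---- minD +----
ndhA ++--- ndhB ++--- ndhC ++--- ndhD ++--- ndhE ++---
ndhF ++--- ndhG ++--- ndhH ++--- ndhI ++--- ndhK ++---
petD +-++- petL +-+-- petN ++++- psaC +++-+ psaJ ++-++ psaM -+-++ psbM +--+-
rne +---- rnpB +-++- rpl12 ++--- rpl19 +-+-- rpl22 -+--- rpl32 ++-++
rpoB ++-++ rps9 ++--+ rrf +--++
trnG(gcc) ++++- trnI(cau) ++-+- trnL(caa) +---- trnL(gag) +-+-+ trnP(ggg) --+--
trnR(ccg) +++-- trnS(cga) +-+-- trnS(gga) +-+-- trnT(ggu) +----
ycf4 +++-- ycf20 p+-+- ycf47 +---- ycf62 +---- ycf65 -+--- ycf81 +----
"""

# Split the block into alternating gene names and presence patterns.
tokens = TABLE2.split()
patterns = dict(zip(tokens[0::2], tokens[1::2]))

# Conserved genes per genome = shared genes + variable genes marked "+".
conserved = {
    genome: SHARED_GENES + sum(p[i] == "+" for p in patterns.values())
    for i, genome in enumerate(GENOMES)
}

for genome, count in conserved.items():
    print(f"{genome}: {count} conserved genes")
```

Under these assumptions the tally reproduces the "Conserved genes (no.)" row of Table 1 (128, 110, 98, 94 and 88).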


FIG. 1.—Gene map of Pyramimonas cpDNA. The 2 copies of the IR sequence are represented by thick lines. Genes (filled boxes) on the outside of the map are transcribed in a clockwise direction. Coding sequences not commonly found in cpDNA are shown in gray. The single intron in atpB is represented by an open box. The color-code denotes the genomic regions containing the corresponding genes in the cpDNAs of Nephroselmis and streptophytes: magenta, SSC; cyan, LSC; and yellow, IR. Given the variable gene content of the IR in these ancestral-type genomes, only the genes invariably present in this region (i.e., those forming the rRNA operon) were represented in yellow. tRNA genes are indicated by the 1-letter amino acid code (Me, elongator methionine; Mf, initiator methionine) followed by the anticodon in parentheses.
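
The tRNA label convention described in this legend (1-letter amino acid code, an optional e or f for elongator or initiator methionine, and the anticodon in parentheses) is regular enough to parse mechanically. The Python sketch below is a hypothetical helper written for illustration; the function name and the returned tuple layout are assumptions, not part of the original work.

```python
import re

# One uppercase amino acid letter, optional "e"/"f" qualifier, and a
# three-letter RNA anticodon in parentheses, e.g. "Me(cau)" or "G(ucc)".
LABEL = re.compile(r"^(?P<aa>[A-Z])(?P<kind>[ef]?)\((?P<anticodon>[acgu]{3})\)$")

KINDS = {"e": "elongator", "f": "initiator", "": None}


def parse_trna_label(label):
    """Split a map label into (amino acid, methionine kind, anticodon)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a tRNA label: {label!r}")
    return match["aa"], KINDS[match["kind"]], match["anticodon"]


for label in ["Me(cau)", "Mf(cau)", "G(ucc)", "L(uaa)"]:
    aa, kind, anticodon = parse_trna_label(label)
    print(f"trn{label}: amino acid {aa}, kind {kind}, anticodon {anticodon}")
```

Labels that do not follow the convention, such as protein-coding gene names, raise a ValueError instead of being silently misread.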

FIG. 2.—Gene map of Pycnococcus cpDNA. Genes (filled boxes) on the outside of the map are transcribed in a clockwise direction. The single intron in atpB is represented by an open box. The orf163 and orf175 revealed no detectable similarity with any known gene sequences. The genes whose orthologs are found within the IR, SSC and LSC regions in Nephroselmis and streptophyte cpDNAs are color-coded in supplementary figure 2, Supplementary Material.

FIG. 3.—Gene map of Monomastix cpDNA. Genes (filled boxes) on the outside of the map are transcribed in a clockwise direction. Introns are represented by open boxes. The orf122 and orf125 revealed no detectable similarity with any known gene sequences. The genes whose orthologs are found within the IR, SSC and LSC regions in Nephroselmis and streptophyte cpDNAs are color-coded in supplementary figure 3, Supplementary Material.



FIG. 4.—Conservation of ancestral gene clusters in prasinophyte and Euglena cpDNAs. Ancestral clusters were defined as those containing genes in the same order and polarity in at least 1 streptophyte and 1 prasinophyte. For each genome, the set of genes making up each of the identified ancestral clusters is shown as black boxes connected by a horizontal line. Black boxes that are contiguous but not linked together indicate that the corresponding genes are not adjacent on the genome. Gray boxes denote individual genes that have been relocated elsewhere on the chloroplast genome and empty boxes denote missing genes. The relative polarities of the genes are not represented in this figure; for this information, consult the maps shown in figures 1-3 or that previously reported for the Nephroselmis genome (Turmel et al. 1999b).
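
The criterion in this legend (same order and same polarity in two genomes) can be illustrated at the level of adjacent gene pairs. The Python sketch below is a simplified, hypothetical illustration and not the procedure used in the study: it reduces clusters to gene pairs, and the two toy gene orders and their strand assignments are invented for the example. Each gene is written as (name, strand), and a pair is stored in a canonical form so that reading the same region from the opposite strand yields the same pair.

```python
def signed_pairs(genome):
    """Adjacent gene pairs of a circular map, independent of reading strand.

    `genome` is a list of (gene, strand) tuples with strand +1 or -1.
    """
    pairs = set()
    n = len(genome)
    for i in range(n):
        (a, sa), (b, sb) = genome[i], genome[(i + 1) % n]
        forward = ((a, sa), (b, sb))
        # The same two genes as seen when the map is read on the other strand.
        reverse = ((b, -sb), (a, -sa))
        pairs.add(min(forward, reverse))
    return pairs


# Toy gene orders (hypothetical strands, for illustration only).
streptophyte_toy = [("psbB", 1), ("psbT", 1), ("psbN", -1), ("psbH", 1), ("rbcL", 1)]
# Same block read from the opposite strand, with one extra gene inserted.
prasinophyte_toy = [("rbcL", -1), ("psbH", -1), ("psbN", 1), ("psbT", -1),
                    ("psbB", -1), ("atpB", 1)]

shared = signed_pairs(streptophyte_toy) & signed_pairs(prasinophyte_toy)
for pair in sorted(shared):
    print(pair)
```

Pairs that survive the intersection keep both their order and their relative polarity in the two toy genomes; chaining overlapping shared pairs would then give candidate clusters.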

FIG. 5.—Derived gene clusters uniquely shared by the Euglena and Pyramimonas cpDNAs. The genes shown as gray boxes represent the derived components of these clusters; those shown as black boxes exhibit an ancestral organization. The genes shown as empty boxes are missing in Euglena cpDNA.

FIG. 6.—Phylogenetic position of Monomastix among prasinophytes as inferred from nuclear-encoded SSU rDNA sequences. The figure presents the best ML tree. Bootstrap values are shown on the corresponding nodes. The names of the taxa whose chloroplast genomes were examined in the present study are shown on a black background. Clade numbering follows that of Guillou et al. (2004).

FIG. 7.—Phylogenies inferred from 70 concatenated chloroplast genes (first 2 codon positions) and their deduced amino acid sequences. (A) Best ML tree inferred from the amino acid data set. (B) Best ML tree inferred from the nucleotide data set. The bootstrap values obtained in ML analyses and the posterior probability values obtained in Bayesian analyses are shown on the left and right, respectively, on the corresponding nodes.
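
The nucleotide data set named in this legend keeps only the first 2 positions of each codon. The Python sketch below shows what that filtering and the concatenation across genes amount to. It is a toy illustration, not the pipeline used in the study: the function names, the gene and taxon names, and the sequences are all hypothetical.

```python
def first_two_codon_positions(cds):
    """Drop every third nucleotide from an in-frame (aligned) coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


def concatenate_genes(alignments, taxa):
    """Concatenate the filtered genes of each taxon, in sorted gene order."""
    return {
        taxon: "".join(
            first_two_codon_positions(alignments[gene][taxon])
            for gene in sorted(alignments)
        )
        for taxon in taxa
    }


# Hypothetical two-gene, two-taxon alignment; the taxa differ only at
# third codon positions.
alignments = {
    "geneA": {"taxon1": "ATGGCTAAA", "taxon2": "ATGGCAAAG"},
    "geneB": {"taxon1": "TTTGGA", "taxon2": "TTCGGG"},
}

supermatrix = concatenate_genes(alignments, ["taxon1", "taxon2"])
for taxon, sequence in supermatrix.items():
    print(taxon, sequence)
```

In this toy case the two taxa become identical after filtering, which illustrates why third positions, where many substitutions are synonymous, are commonly left out of deep-level analyses.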

FIG. 8.—Losses of chloroplast genes and gene pairs during the evolution of prasinophytes and euglenids. Unique losses are indicated by squares, whereas convergent losses in 2 or more lineages are indicated by triangles. Red and blue symbols refer to losses of genes and gene pairs, respectively. Some gene pairs disappeared as a result of gene losses; those that were not correlated with any gene losses are denoted by dots. The number below each taxon name indicates the total number of conserved genes in the chloroplast genome. Losses of the IR occurred in the 3 indicated lineages.



[Figure 1. Gene map of Pyramimonas parkeae chloroplast DNA, 101,605 bp; regions labelled IR A, IR B, SSC and LSC.]



[Figure 2. Gene map of Pycnococcus provasolii chloroplast DNA, 80,211 bp.]



[Figure 3. Gene map of Monomastix sp. chloroplast DNA, 114,528 bp.]



[Figure 4. Gene cluster diagram; rows labelled Nephroselmis, Pycnococcus, Monomastix, Ostreococcus, Pyramimonas and Euglena.]


[Figure 5. Gene cluster diagram; rows labelled Euglena and Pyramimonas.]



[Figure 6. Phylogenetic tree; scale bar 0.05. Clade labels: Chlorophyceae; Ulvophyceae; Trebouxiophyceae; Clade IV, Chlorodendrales; Clade VII; Clade V, Pseudoscourfieldiales, Pycnococcaceae; Clade III, Pseudoscourfieldiales, Nephroselmidaceae; Clade II, Mamiellales; Clade I, Pyramimonadales; Clade VI, Prasinococcales; Streptophyta.]



Figure 7 (two tree panels, A and B; paired support values at the nodes are omitted because the tree layout is not recoverable from the extracted text). Groups labelled in both panels: Streptophyta, Prasinophyceae, Core chlorophytes.<br />
Tip order in the first tree (scale bar 0.1): Mesostigma, Chlorokybus, Chara, Zygnema, Staurastrum, Chaetosphaeridium, Physcomitrella, Marchantia, Anthoceros, Nephroselmis, Euglena, Pyramimonas, Monomastix, Ostreococcus, Pycnococcus, Chlorella, Bigelowiella, Oltmannsiellopsis, Pseudendoclonium, Leptosira, Oedogonium, Stigeoclonium, Scenedesmus, Chlamydomonas.<br />
Tip order in the second tree (scale bar 0.05): Mesostigma, Chlorokybus, Chara, Zygnema, Staurastrum, Chaetosphaeridium, Physcomitrella, Marchantia, Anthoceros, Nephroselmis, Pycnococcus, Euglena, Pyramimonas, Monomastix, Ostreococcus, Chlorella, Leptosira, Bigelowiella, Oltmannsiellopsis, Pseudendoclonium, Oedogonium, Stigeoclonium, Scenedesmus, Chlamydomonas.<br />



Figure 8 (diagram relating the chloroplast genomes listed below to an inferred ancestor; the gene names and gene-pair labels annotated on its branches, such as psaC or 3'rpoB-5'rpoC1, are omitted because their branch assignments are not recoverable from the extracted text).<br />
Genomes labelled: Ancestor, 134 genes; Pyramimonas, 101.6 kb, 110 genes; Monomastix, 114.5 kb, 94 genes; Ostreococcus, 71.7 kb, 88 genes; Nephroselmis, 200.8 kb, 128 genes; Pycnococcus, 80.2 kb, 98 genes; Euglena, 143.2 kb, 87 genes.<br />
Clade labels: Clade I, Pyramimonadales; Clade II, Mamiellales; Clade III, Nephroselmidaceae; Clade V, Pycnococcaceae. The marker "-IR" appears next to Monomastix, Euglena and Pycnococcus.<br />
