MUSTANG is a novel family of domesticated transposase genes ...
MUSTANG is a novel family of domesticated transposase genes ...
MUSTANG is a novel family of domesticated transposase genes ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>MUSTANG</strong> <strong>is</strong> a <strong>novel</strong> <strong>family</strong> <strong>of</strong> <strong>domesticated</strong> <strong>transposase</strong> <strong>genes</strong> found in diverse<br />
angiosperms<br />
Rebecca K. Cowan, Douglas R. Hoen, Daniel J. Schoen, and Thomas E. Bureau*<br />
McGill University, Biology Department, 1205 Dr. Penfield Avenue, Montreal, Quebec H3A 1B1<br />
* corresponding author: 1205 Ave Docteur Penfield, Montreal, Quebec, Canada H3A 1B1<br />
phone: (514) 398-6472; fax: (514) 398-5069; email: thomas.bureau@mcgill.ca<br />
Th<strong>is</strong> manuscript <strong>is</strong> intended as a Research Article.<br />
Running head: Self<strong>is</strong>h DNA domestication in flowering plant genomes<br />
Key words: Genome; Evolution; Transposons; Plants<br />
Abbreviations:<br />
FAR1 = FAR-RED IMPAIRED RESPONSE PROTEIN 1<br />
MUG = <strong>MUSTANG</strong><br />
MULE = Mutator-like element<br />
TAM = transposon-associated mudrA<br />
TDM = transposon-d<strong>is</strong>sociated mudrA<br />
TIR = terminal inverted repeat<br />
TSD = target site duplication<br />
MBE Advance Access publ<strong>is</strong>hed June 29, 2005<br />
© The Author 2005. Publ<strong>is</strong>hed by Oxford University Press on behalf <strong>of</strong> the Society for Molecular Biology<br />
and Evolution. All rights reserved. For perm<strong>is</strong>sions, please e-mail: journals.perm<strong>is</strong>sions@oupjournals.org<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/ by guest on July 15, 2013
Abstract<br />
While transposons have traditionally been viewed as genomic parasites or “junk DNA”, the<br />
d<strong>is</strong>covery <strong>of</strong> transposon-derived host <strong>genes</strong> has fuelled an ongoing debate over the evolutionary<br />
role <strong>of</strong> transposons. In particular, while mobility-related open reading frames have been known<br />
to acquire host functions, the contribution <strong>of</strong> these types <strong>of</strong> events to the evolution <strong>of</strong> <strong>genes</strong> <strong>is</strong> not<br />
well known. Here we report that genome-wide searches for Mutator <strong>transposase</strong>-derived host<br />
<strong>genes</strong> in Arabidops<strong>is</strong> thaliana (Columbia-0) and Oryza sativa ssp. japonica (cv. Nipponbare)<br />
(<strong>domesticated</strong> rice) identified 121 sequences, including the taxonomically-conserved<br />
<strong>MUSTANG</strong>1. Syntenic <strong>MUSTANG</strong>1 orthologs in such varied plant species as rice, poplar,<br />
Arabidops<strong>is</strong>, and Medicago truncatula appear to be under purifying selection. However, despite<br />
the evidence <strong>of</strong> th<strong>is</strong> pathway <strong>of</strong> gene evolution, <strong>MUSTANG</strong>1 belongs to one <strong>of</strong> only two Mutator-<br />
like gene families with members in both monocotyledonous and dicotyledonous plants,<br />
suggesting that Mutator-like elements seldom evolve into taxonomically widespread host <strong>genes</strong>.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Introduction<br />
Mobile elements have traditionally been viewed as “junk DNA” or as genomic parasites<br />
(Doolittle and Sapienza 1980; Orgel and Crick 1980). However, reports <strong>of</strong> transposon-derived<br />
<strong>genes</strong> contributing to cellular processes (Agrawal, Eastman, and Schatz 1998; Mi et al. 2000;<br />
Hudson, L<strong>is</strong>ch, and Quail 2003) have indicated that such genetic elements may influence the<br />
evolution <strong>of</strong> their host by providing a source <strong>of</strong> <strong>novel</strong> coding capacity. Other lines <strong>of</strong> evidence<br />
such as sequence analys<strong>is</strong> and conservation studies, also support th<strong>is</strong> view (International Human<br />
Genome Sequencing Constorium 2001; Hammer, Strehl, and Hagemann 2005; Zdobnov et al.<br />
2005). Th<strong>is</strong>, in turn, has spurred controversy concerning the extent to which transposable<br />
elements contribute to host evolution (Hurst and Werren 2001).<br />
In plants, the only transposon-like <strong>genes</strong> with known host functions, far-red impaired<br />
response protein 1 (FAR1), far-red elongated hypocotyl 3 (FHY3) and the FAR1 related<br />
sequences (FRS), are related to the <strong>family</strong> <strong>of</strong> DNA transposons called Mutator-like elements<br />
(MULEs) (Hudson, L<strong>is</strong>ch, and Quail 2003; Lin and Wang 2004). MULEs are DNA transposons,<br />
which move via a “cut-and-paste” mechan<strong>is</strong>m (L<strong>is</strong>ch 2002). The mudrA gene <strong>of</strong> autonomous<br />
elements encodes the <strong>transposase</strong> MURA that <strong>is</strong> required for MULE mobility (L<strong>is</strong>ch et al. 1999).<br />
Though most MULEs are bounded by terminal-inverted repeats (TIRs) <strong>of</strong> ~200 bp in length,<br />
some MULEs lack terminal symmetry and are termed non-TIR MULEs (Yu, Wright, and Bureau<br />
2000). In maize, MURA binds specifically to the terminal sequences <strong>of</strong> MULEs (Benito and<br />
Walbot 1997). Th<strong>is</strong> association between the <strong>transposase</strong> and the terminal sequences appears<br />
necessary for exc<strong>is</strong>ion <strong>of</strong> the element to occur, as it <strong>is</strong> in other DNA transposons (Steiniger-<br />
White, Rayment, and Reznik<strong>of</strong>f 2004). Upon insertion <strong>of</strong> the MULE into the genome, 9-11 bp<br />
target site duplications (TSDs) are created (Bennetzen 1996).<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Transposon-derived <strong>genes</strong> with known host functions, such as FAR1 and FHY3, lack<br />
transposon-specific terminal sequences (Hudson, L<strong>is</strong>ch, and Quail 2003). Th<strong>is</strong> <strong>is</strong> likely because<br />
the transposition <strong>of</strong> a <strong>domesticated</strong> mobile element, one that has gained a host function, could be<br />
detrimental to its host. The removal <strong>of</strong> such a gene from the vicinity <strong>of</strong> c<strong>is</strong>-regulatory factors, or<br />
its insertion into a heterochromatic region, could dramatically change its expression (van<br />
Leeuwen et al. 2001). Thus, selection should act to remove mobility-enabling features such as<br />
TIRs.<br />
We took advantage <strong>of</strong> th<strong>is</strong> expected h<strong>is</strong>torical selection pressure in our search for<br />
potentially <strong>domesticated</strong> <strong>genes</strong>, by mining the genomes <strong>of</strong> Oryza sativa ssp. japonica (cv.<br />
Nipponbare) (<strong>domesticated</strong> rice) and Arabidops<strong>is</strong> thaliana (Columbia-0) for sequences that<br />
shared similarity to mudrA, but lacked transposon features such as TIRs. We then examined the<br />
evolutionary h<strong>is</strong>tories <strong>of</strong> these putative transposon-derived <strong>genes</strong> by performing phylogenetic<br />
reconstructions and clustering analyses on all mudrA-related sequences. The results <strong>of</strong> our<br />
analys<strong>is</strong> are d<strong>is</strong>cussed with respect to the importance that transposons play in the evolution <strong>of</strong><br />
plant <strong>genes</strong>.<br />
Materials and Methods<br />
Identification <strong>of</strong> transposon-associated and transposon-d<strong>is</strong>sociated mudrA <strong>genes</strong><br />
In order to determine the abundance <strong>of</strong> <strong>domesticated</strong> mudrA <strong>genes</strong> in plants, we searched<br />
the Oryza sativa ssp. japonica (cv. Nipponbare) (<strong>domesticated</strong> rice) and Arabidops<strong>is</strong> thaliana<br />
(Columbia-0) genomes for sequences with similarity to mudrA that were not associated with<br />
transposon structures such as terminal sequences and TSDs. mudrA-like sequences were<br />
identified by using Zea mays mudrA (GI:540581) as a query for a BLASTP (Altschul et al. 1997)<br />
search (cut<strong>of</strong>f = E < 1e-10) <strong>of</strong> predicted <strong>genes</strong> from the Arabidops<strong>is</strong> TIGR 5.0 release<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
(www.tigr.org) and from rice build 2 pseudomolecules (DDBJ accessions numbers AP006867<br />
through AP006877 and NCBI accession number AE016959), predicted using FGENESH<br />
(http://www.s<strong>of</strong>tberry.com/berry.phtml?topic=f<strong>genes</strong>h) in the monocot trained settings.<br />
Similarly, Zea mays mudrA (GI:540581) was used as a query for a PSI-BLAST (Altschul et al.<br />
1997) search against the rice predicted gene set and The Arabidops<strong>is</strong> Information Resource<br />
(TAIR) UNIPROT dataset (www.arabidops<strong>is</strong>.org) (cut<strong>of</strong>f = E < e-15, second PSI-BLAST<br />
iteration) in order to identify more divergent mudrA-like sequences. Mined mudrA <strong>genes</strong> are<br />
l<strong>is</strong>ted in supplementary table 1.<br />
Whether the mined mudrA <strong>genes</strong> were associated with a transposon structure was<br />
determined by (a) identifying correctly oriented MULE termini separated by less than 30 kbp via<br />
homology searches with models and queries based on the TIR and non-TIR termini <strong>of</strong> previously<br />
identified families using HMMER (Eddy 1998) and BLASTN (Altschul et al. 1997); (b) using<br />
BL2SEQ (Tatusova and Madden 1999) with an expect value <strong>of</strong> 100 and a m<strong>is</strong>match penalty <strong>of</strong> –1<br />
to identify TIRs in the 10 kbp flanking regions; (c) using the <strong>genes</strong> and flanking sequences as<br />
BLASTN (Altschul et al. 1997) queries to identify repetitiveness extending beyond the putative<br />
coding region; and (d) scanning extreme terminal sequences <strong>of</strong> TIRs and repetitive regions for<br />
TSDs. Genes with TSDs were considered transposon-associated mudrA <strong>genes</strong> (TAMs), those<br />
lacking TSDs but with other transposon features were considered potentially transposon-<br />
associated mudrA <strong>genes</strong> (pTAMs), while those lacking all transposon features were considered<br />
transposon-d<strong>is</strong>sociated mudrA <strong>genes</strong> (TDMs).<br />
Arabidops<strong>is</strong> <strong>genes</strong> were considered to have open reading frames (ORFs) if they were so<br />
annotated (www.arabidops<strong>is</strong>.org), or if they corresponded to a gene in the TAIR UNIPROT<br />
dataset. Rice <strong>genes</strong> were considered to have ORFs if they corresponded to a predicted gene from<br />
the rice build 2 pseudomolecules. Sequences for predicted rice mudrA <strong>genes</strong> are l<strong>is</strong>ted in<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
supplementary table 2. The TAIR sequence viewer tool was used to determine if Arabidops<strong>is</strong><br />
<strong>genes</strong> were expressed. To determine which rice <strong>genes</strong> were expressed, rice cDNAs were mapped<br />
to the build 2 rice pseudomolecules using BLAT (Kent 2002) with default parameters and the<br />
“fine” option. Similarly, rice expressed sequence tags (ESTs) were mapped using BLAT (Kent<br />
2002) with trimmed trailing poly-A and leading poly-T tails, 3 kbp maximum intron size, and<br />
90% overall identity (matches/length), rejecting ESTs with alignments to more than one location<br />
with less than 3% identity difference. cDNAs and ESTs that overlapped predicted mudrA <strong>genes</strong><br />
by > 100 bp were considered to be associated with them.<br />
Clustering and phylogenetic analyses<br />
Inter-relationships among TDMs and TAMs were examined through clustering and<br />
phylogenetic analyses based on the conserved MULE-specific domain, MUDR (Hershberger et<br />
al. 1995; Marchler-Bauer et al. 2003). To identify MUDR domains in the mined mudrA <strong>genes</strong>,<br />
the mudrA <strong>genes</strong> were used as queries for TBLASTN (Altschul et al. 1997) and BLASTP<br />
(Altschul et al. 1997) searches against the 21 diverse MUDR domains l<strong>is</strong>ted in the NCBI Entrez<br />
(http://www.ncbi.nlm.nih.gov/entrez) entry for pfam03108. Where both protein and nucleotide<br />
sequences were available, the BLASTP-identified MUDR domain was retained, though it was<br />
manually edited if a predicted intron eliminated a large portion <strong>of</strong> the domain. More divergent<br />
mined mudrA <strong>genes</strong> were used as PSI-BLAST (Altschul et al. 1997) queries in order to identify<br />
regions within them that aligned with the MUDR domain <strong>of</strong> conserved mudrA <strong>genes</strong>. See<br />
supplementary table 1 for a l<strong>is</strong>t <strong>of</strong> detected MUDR domain regions.<br />
For the clustering analys<strong>is</strong>, MUDR regions were iteratively clustered at every 5% change<br />
in identity using BLASTCLUST (Altschul et al. 1997) with default parameters (fig. 1 and<br />
supplementary table 3).<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
For the phylogenetic analys<strong>is</strong>, CLUSTALW (Higgins, Thompson, and Gibson 1994) was<br />
used to generate a multiple alignment <strong>of</strong> the MUDR domain regions, omitting truncated<br />
sequences less than 100 amino acid residues in length. In addition, the sequences were edited so<br />
that gaps in the alignment caused by less than 1% <strong>of</strong> the sequences were eliminated. A neighbor-<br />
joining tree with 100 bootstrap repetitions was generated from th<strong>is</strong> alignment using PHYLIP<br />
(Retief 2000; Felsenstein 2004) with default parameters (fig. 2 and supplementary fig. 1)<br />
Phylogenetic and syntenic relationships within the <strong>MUSTANG</strong> (MUG) gene <strong>family</strong><br />
MUG1 <strong>genes</strong> in other plant species were identified by using rice and Arabidops<strong>is</strong> MUG1<br />
as queries for TBLASTN (Altschul et al. 1997) searches against the nonredundant NCBI<br />
(http://www.ncbi.nlm.nih.gov/BLAST), JGI (www.jgi.doe.gov/poplar), and Plant Genome<br />
Database (http://www.plantgdb.org) databases. A multiple alignment <strong>of</strong> the amino acid<br />
sequences <strong>of</strong> fungal MULE hop78 (Chalvet et al. 2003), TAMs At2g07100 and At1g12720, and<br />
the MUG <strong>family</strong> proteins was generated using CLUSTALW (Higgins, Thompson, and Gibson<br />
1994). The alignment was edited so that the ends corresponded to those <strong>of</strong> MUG1. A maximum<br />
parsimony tree with 100 bootstrap replications was constructed from th<strong>is</strong> alignment using<br />
PHYLIP (Retief 2000; Felsenstein 2004) with default parameters (fig. 3).<br />
Predicted proteins encoded by the 50 kbp regions flanking the MUG1 <strong>genes</strong> <strong>of</strong><br />
Arabidops<strong>is</strong>, rice, and poplar were identified from the TAIR (www.arabidops<strong>is</strong>.org), TIGR<br />
(www.tigr.org), and JGI (www.jgi.doe.gov/poplar) datasets, respectively. Predicted proteins<br />
encoded by the Medicago contig (GI:50284582) were determined using the GENSCAN<br />
webserver with Arabidops<strong>is</strong> trained settings (Burge and Karlin 1997). These predicted <strong>genes</strong><br />
were used for an unfiltered all-against-all BLASTP (Altschul et al. 1997) search with an effective<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
database length <strong>of</strong> 740,307,180 in order to uncover microsynteny (fig. 4, supplementary fig. 2,<br />
and supplementary table 4).<br />
dN/dS estimates<br />
A multiple alignment <strong>of</strong> MUG1 orthologs was generated using CLUSTALW (Higgins,<br />
Thompson, and Gibson 1994). Corresponding nucleotide sequences were manually aligned using<br />
the amino acid alignment as a guide. After eliminating ambiguity sites, the CODEML program<br />
in the PAML package (Yang 1997) was used to estimate the dN/dS ratio for the provided<br />
nucleotide sequence alignment and maximum parsimony tree (constructed using the PHYLIP<br />
(Retief 2000; Felsenstein 2004) package with default parameters). dN/dS values for the N-<br />
terminal, MUDR domain, central, zinc finger domain and C-terminal subsections <strong>of</strong> MUG1 were<br />
similarly estimated.<br />
Results and D<strong>is</strong>cussion<br />
Transposon-d<strong>is</strong>sociated mudrA-like <strong>genes</strong> in rice and Arabidops<strong>is</strong><br />
Our search for sequences with similarity to mudrA that were not associated with<br />
transposon structures revealed 45 such transposon-d<strong>is</strong>sociated mudrA <strong>genes</strong> (TDMs) in<br />
Arabidops<strong>is</strong> and 76 in rice (table 1). Approximately nine-tenths <strong>of</strong> these have predicted ORFs<br />
(table 1). Strikingly, 40% (18/45) and 37% (28/76) <strong>of</strong> TDMs in Arabidops<strong>is</strong> and rice,<br />
respectively, match ESTs and/or cDNAs as compared to only 1% (2/143) and 6% (32/498) <strong>of</strong><br />
Arabidops<strong>is</strong> and rice transposon-associated mudrA <strong>genes</strong> (TAMs) (table 1), hinting that the<br />
expression pr<strong>of</strong>ile <strong>of</strong> TDMs <strong>is</strong> more similar to that <strong>of</strong> host <strong>genes</strong> than that <strong>of</strong> mobile elements.<br />
Taxonomic d<strong>is</strong>tribution <strong>of</strong> TDMs and TAMs<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Clustering analys<strong>is</strong>, based on the MUDR domains <strong>of</strong> the mined TAMs and TDMs, found<br />
that most <strong>of</strong> these sequences clustered exclusively with other MUDR domains from the same<br />
species until the identity threshold was < 40% (fig.1). A neighbor-joining tree also indicated that<br />
the MUDR domains form mainly species-specific clades (fig. 2). Th<strong>is</strong> suggests that many TDMs<br />
have evolved from MULEs since the monocot-dicot divergence 140-150 million years ago (Chaw<br />
et al. 2004). Nonetheless, two exceptional groups with both monocot and dicot members were<br />
identified, and both <strong>of</strong> these involve TDMs (fig. 1, fig. 2, and supplementary fig. 1).<br />
The first <strong>of</strong> these exceptional groups includes members <strong>of</strong> the FAR1 gene <strong>family</strong>, including<br />
two TDMs, FAR1 and FHY3, that have proven host functions in far-red light response (Hudson,<br />
L<strong>is</strong>ch, and Quail 2003). The second group, a previously unreported <strong>family</strong> henceforth referred to<br />
as the <strong>MUSTANG</strong> (MUG) gene <strong>family</strong>, <strong>is</strong> composed, with one exception, entirely <strong>of</strong> TDMs<br />
(supplementary table 3, supplementary fig. 1). However, while the neighbor-joining tree<br />
indicates that a TAM gene, At2g07100, <strong>is</strong> a part <strong>of</strong> the MUG gene <strong>family</strong>, the clustering analys<strong>is</strong><br />
places it with other Arabidops<strong>is</strong> TAMs. To resolve th<strong>is</strong> d<strong>is</strong>crepancy, as well as to gain greater<br />
resolution <strong>of</strong> the internal branches <strong>of</strong> the MUG gene <strong>family</strong>, a maximum parsimony tree was<br />
constructed using the entire protein sequences (fig. 3).<br />
The maximum parsimony tree places At2g07100 with other TAMs, rather than inside the<br />
MUG gene <strong>family</strong>, with high bootstrap support <strong>of</strong> 97% (fig. 3). The shorter sequences used to<br />
construct the original neighbor-joining tree, as well as the fact that the predicted ORF <strong>of</strong><br />
At2g07100 has accumulated several stop codons, may account for the differing placement <strong>of</strong><br />
At2g07100 and lower associated bootstrap values in th<strong>is</strong> tree (supplementary fig. 1). The<br />
maximum parsimony tree divides the remaining TDMs into two primary clades, both <strong>of</strong> which<br />
include monocot and dicot members. Th<strong>is</strong> indicates that there were several members <strong>of</strong> the<br />
MUG gene <strong>family</strong> before the divergence <strong>of</strong> the monocot and dicot lineages.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
<strong>MUSTANG</strong>1 <strong>is</strong> a <strong>novel</strong>, taxonomically widespread gene<br />
Interestingly, a pair <strong>of</strong> potential rice/Arabidops<strong>is</strong> orthologs called MUG1 are not only<br />
highly similar to each other at the amino acid level (BL2SEQ E = 0.0, identity = 71%), but are<br />
also similar to sequences encoded by the indica subspecies <strong>of</strong> O. sativa, Medicago truncatula,<br />
Populus trichocarpa (poplar), and Zea mays (maize) (BL2SEQ E = 0.0, identity > 66%). The<br />
maximum parsimony tree supports the hypothes<strong>is</strong> that these <strong>genes</strong> are MUG1 orthologs (fig. 3).<br />
The tree also indicates that the two poplar MUG1 <strong>genes</strong> probably arose through a duplication<br />
event sometime after the Medicago and poplar lineages diverged. The duplication <strong>is</strong> unlikely to<br />
be the result <strong>of</strong> transposition as the ~20 kbp regions flanking the poplar <strong>genes</strong> share significant<br />
nucleotide level similarity (BL2SEQ E < e-120, identity > 78%). The predicted <strong>genes</strong> in the 50<br />
kbp flanking regions <strong>of</strong> the rice and Arabidops<strong>is</strong> MUG1 <strong>genes</strong> are similar to the predicted <strong>genes</strong><br />
in the poplar flanking regions (fig. 4 and supplementary fig. 2). Even the flanking regions <strong>of</strong> the<br />
Medicago MUG1 gene, which <strong>is</strong> located on an unordered contig, d<strong>is</strong>play microsynteny with the<br />
flanking regions <strong>of</strong> the other plant species (fig. 4 and supplementary fig. 2). It <strong>is</strong> impossible to<br />
determine whether the maize MUG1-like gene <strong>is</strong> located in a syntenic region as the available<br />
sequence <strong>is</strong> only 3547 bp in length. However, the syntenic and phylogenetic relationships<br />
between the other MUG1 <strong>genes</strong> support their orthology and eliminate the possibility that the<br />
d<strong>is</strong>tribution <strong>of</strong> MUG1 <strong>genes</strong> arose through horizontal transfer.<br />
MUG1 functionality<br />
The rice, Arabidops<strong>is</strong>, maize, and one <strong>of</strong> the poplar MUG1 orthologs are associated with<br />
cDNAs, ESTs or both (fig. 3). Th<strong>is</strong> expression information comes from such varied t<strong>is</strong>sues as<br />
rice calli, corn endosperm and ears, and Arabidops<strong>is</strong> calli, developing seeds, inflorescence<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
mer<strong>is</strong>tem, and seedling leaves. As an indication <strong>of</strong> whether th<strong>is</strong> expressed gene <strong>is</strong> performing a<br />
host function, we calculated the non-synonymous to synonymous substitution rate ratio (dN/dS)<br />
for the six identified orthologs. A dN/dS ratio <strong>of</strong> significantly less than 1 suggests that a coding<br />
sequence <strong>is</strong> being, or has until recently been, maintained by purifying selection, and <strong>is</strong> thus a<br />
good indication <strong>of</strong> whether a gene has protein coding functionality (Hurst 2002). The dN/dS<br />
ratio <strong>of</strong> 0.0683 for the MUG1 <strong>genes</strong> <strong>is</strong> significantly smaller than 1 (P < 0.001; log likelihood ratio<br />
test).<br />
Like the MUG1 <strong>genes</strong>, the transcription factors FAR1 and FHY3, which function as<br />
regulators <strong>of</strong> far-red light response (Hudson, L<strong>is</strong>ch, and Quail 2003), have also evolved from a<br />
MULE <strong>transposase</strong> to perform a host function. Given how d<strong>is</strong>similar MUG1 <strong>is</strong> to these proteins<br />
(BL2SEQ E > 0.17, identity < 24%), it seems unlikely that MUG1 also functions in far-red light<br />
response. However, like FAR1 and FHY3, MUG1 does have a highly conserved zinc finger<br />
domain. In fact, with a dN/dS ratio <strong>of</strong> 0.0072, the zinc finger domain appears to be the most<br />
highly conserved region <strong>of</strong> the MUG1 gene. Therefore, as with FAR1, MUG1 may bind DNA in<br />
a manner that evolved directly from the way in which MURA <strong>transposase</strong> binds to TIRs<br />
(Hudson, L<strong>is</strong>ch, and Quail 2003). Th<strong>is</strong> would be cons<strong>is</strong>tent with other transposon-derived host<br />
<strong>genes</strong> whose mode <strong>of</strong> action <strong>is</strong> remin<strong>is</strong>cent <strong>of</strong> that <strong>of</strong> the transposon gene they evolved from. For<br />
instance, the vertebrate immune system proteins RAG1 and RAG2 ass<strong>is</strong>t V(D)J recombination <strong>of</strong><br />
antibody <strong>genes</strong> by binding to and nicking DNA in a manner highly similar to DNA <strong>transposase</strong>s,<br />
from which they are likely derived (Agrawal, Eastman, and Schatz 1998). In addition, the co-<br />
opted human endogenous retroviral envelope (env) protein syncytin mediates trophoblast cell<br />
fusion during placental development (Chang et al. 2004), while env proteins in general are<br />
thought to promote membrane fusion for viral entry into cells (Colman and Lawrence 2003).<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Conclusions<br />
The rarity <strong>of</strong> taxonomically widespread MULE-d<strong>is</strong>sociated mudrA <strong>genes</strong> <strong>is</strong> such that they<br />
appear to represent an unusual detour rather than a common mechan<strong>is</strong>m for generating <strong>novel</strong><br />
proteins. These results, however, do not exclude the possibility that more recently <strong>domesticated</strong><br />
mudrA <strong>genes</strong> may be common among individual monocot and dicot lineages not investigated<br />
here. The six rice and four Arabidops<strong>is</strong> expressed TDMs that do not belong to either the FAR1<br />
or MUG gene families may be representative <strong>of</strong> such cases. Moreover, TDMs may frequently be<br />
“born”, but “die out” soon after. One reason for th<strong>is</strong> could be that many <strong>domesticated</strong> <strong>genes</strong> may<br />
benefit their host under specific environmental conditions. Another possible explanation <strong>is</strong> that<br />
newly <strong>domesticated</strong> <strong>genes</strong>, which have not yet lost their mobility enabling structures, may<br />
transpose, thus losing the c<strong>is</strong> regulatory factors required for their host function, and thereby<br />
reverting to self<strong>is</strong>h DNA. In fact, in evolutionary terms, it <strong>is</strong> likely that such sequences have<br />
been more successful as transposons than as host <strong>genes</strong>.<br />
Acknowledgements<br />
We would like to thank Nikoleta Juretic and Ken Hastings for their helpful comments on the<br />
manuscript. Th<strong>is</strong> study was supported by D<strong>is</strong>covery grants from the National Science and<br />
Engineering Research Council (NSERC) <strong>of</strong> Canada to T.E.B. and D.J.S. and a grant from the<br />
McGill University William Dawson Scholar Chair to T.E.B.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Literature Cited<br />
Agrawal, A., Q. M. Eastman, and D. G. Schatz. 1998. Transposition mediated by RAG1 and<br />
RAG2 and its implications for the evolution <strong>of</strong> the immune system. Nature 394:744-751.<br />
Altschul, S. F., T. L. Madden, A. .A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman.<br />
1997. Gapped BLAST and PSI-BLAST: a new generation <strong>of</strong> protein database search<br />
programs. Nucleic Acids Res. 25:3389-3402.<br />
Benito, M. I., and V. Walbot. 1997. Characterization <strong>of</strong> the maize Mutator transposable element<br />
MURA <strong>transposase</strong> as a DNA-binding protein. Mol. Cell. Biol. 17:5165-5175.<br />
Bennetzen, J. L. 1996. The Mutator transposable element system <strong>of</strong> maize. Pp. 195–229 in H.<br />
Saedler and A. Gierl, eds. Transposable Elements. Springer-Verlag, Berlin.<br />
Burge, C., and S. Karlin. 1997. Prediction <strong>of</strong> complete gene structures in human genomic DNA.<br />
J. Mol. Biol. 268:78-94.<br />
Chalvet, F., C. Grimaldi, F. Kaper, T. Langin, and M. J. Daboussi. 2003. Hop, an active Mutator-<br />
like element in the genome <strong>of</strong> the fungus Fusarium oxysporum. Mol. Biol. Evol. 20:1362-<br />
1375.<br />
Chang, C., P. T. Chen, G. D. Chang, C. J. Huang, and H. Chen. 2004. Functional characterization<br />
<strong>of</strong> the placental fusogenic membrane protein syncytin. Biol. Reprod. 71:1956-1962.<br />
Chaw, S. M., C. C. Chang, H. L. Chen, and W. H. Li. 2004. Dating the monocot-dicot divergence<br />
and the origin <strong>of</strong> core eudicots using whole chloroplast genomes. J. Mol. Evol. 58:424-<br />
441.<br />
Colman, P. M., and M. C. Lawrence. 2003. The structural biology <strong>of</strong> type I viral membrane<br />
fusion. Nat. Rev. Mol. Cell. Biol. 4:309-319.<br />
Doolittle, W. F., and C. Sapienza. 1980. Self<strong>is</strong>h <strong>genes</strong>, the phenotype paradigm and genome<br />
evolution. Nature 284:601-603.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Eddy, S. R. 1998. Pr<strong>of</strong>ile hidden Markov models. Bioinformatics 14:755-763.<br />
Felsenstein, J. 2004. PHYLIP (Phylogeny Inference Package) version 3.6b. D<strong>is</strong>tributed by the<br />
author, Department <strong>of</strong> Genome Sciences, University <strong>of</strong> Washington, Seattle.<br />
Hammer, S. E., S. Strehl, and S. Hagemann. 2005. Homologs <strong>of</strong> Drosophila P transposons were<br />
mobile in zebraf<strong>is</strong>h but have been <strong>domesticated</strong> in a common ancestor <strong>of</strong> chicken and<br />
human. Mol. Biol. Evol. 22:833-844.<br />
Hershberger, R. J., M. I. Benito, K. J. Hardeman, C. Warren, V. L. Chandler, and V. Walbot.<br />
1995. Characterization <strong>of</strong> the major transcripts encoded by the regulatory MuDR<br />
transposable element <strong>of</strong> maize. Genetics 140:1087-1098.<br />
Higgins, D., J. Thompson, and T. Gibson. 1994. CLUSTAL W: improving the sensitivity <strong>of</strong><br />
progressive multiple sequence alignment through sequence weighting, position-specific<br />
gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.<br />
Hudson, M. E., D. R. L<strong>is</strong>ch, and P. H. Quail. 2003. The FHY3 and FAR1 <strong>genes</strong> encode<br />
<strong>transposase</strong>-related proteins involved in regulation <strong>of</strong> gene expression by the<br />
phytochrome A-signaling pathway. Plant J. 34:453-471.<br />
Hurst, G. D., and J. H. Werren. 2001. The role <strong>of</strong> self<strong>is</strong>h genetic elements in eukaryotic<br />
evolution. Nat. Rev. Genet. 2:597-606.<br />
Hurst, L. D. 2002. The Ka/Ks ratio: diagnosing the form <strong>of</strong> sequence evolution. Trends Genet.<br />
18:486.<br />
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analys<strong>is</strong> <strong>of</strong><br />
the human genome. Nature 409:860-921.<br />
Kent, W. J. 2002. BLAT--the BLAST-like alignment tool. Genome Res. 12:656-664.<br />
Lin, R., and H. Wang. 2004. Arabidops<strong>is</strong> FHY3/FAR1 gene <strong>family</strong> and d<strong>is</strong>tinct roles <strong>of</strong> its<br />
members in light control <strong>of</strong> Arabidops<strong>is</strong> development. Plant Physiol. 136:4010-4022.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
L<strong>is</strong>ch, D. 2002. Mutator transposons. Trends Plant Sci. 7:498-504.<br />
L<strong>is</strong>ch, D., L. Girard, M. Donlin, and M. Freeling. 1999. Functional analys<strong>is</strong> <strong>of</strong> deletion<br />
derivatives <strong>of</strong> the maize transposon MuDR delineates roles for the MURA and MURB<br />
proteins. Genetics 151:331-341.<br />
Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott et al. (27 co-authors). 2003. CDD: a<br />
curated Entrez database <strong>of</strong> conserved domain alignments. Nucleic Acids Res. 31:383-387.<br />
Mi, S., X. Lee, X. Li et al. (12 co-authors). 2000. Syncytin <strong>is</strong> a captive retroviral envelope protein<br />
involved in human placental morpho<strong>genes</strong><strong>is</strong>. Nature 403:785-789.<br />
Orgel, L. E., and F. H. Crick. 1980. Self<strong>is</strong>h DNA: the ultimate parasite. Nature 284:604-607.<br />
Retief, J. D. 2000. Phylogenetic analys<strong>is</strong> using PHYLIP. Methods Mol. Biol. 132:243-258.<br />
Steiniger-White, M., I. Rayment, and W. S. Reznik<strong>of</strong>f. 2004. Structure/function insights into Tn5<br />
transposition. Curr. Opin. Struct. Biol. 14:50-57.<br />
Tatusova, T. A., and T. L. Madden. 1999. Blast 2 sequences - a new tool for comparing protein<br />
and nucleotide sequences. FEMS Microbiol. Lett. 174:247-250.<br />
van Leeuwen, W., T. Ruttink, A. W. Borst-Vrenssen, L. H. van der Plas, and A. R. van der Krol.<br />
2001. Characterization <strong>of</strong> position-induced spatial and temporal regulation <strong>of</strong> transgene<br />
promoter activity in plants. J. Exp. Bot. 52:949-959.<br />
Yang, Z. 1997. PAML: a program package for phylogenetic analys<strong>is</strong> by maximum likelihood.<br />
Comput. Appl. Biosci. 13:555-556.<br />
Yu, Z., S. I. Wright, and T. E. Bureau. 2000. Mutator-like elements in Arabidops<strong>is</strong> thaliana.<br />
Structure, diversity and evolution. Genetics 156:2019-2031.<br />
Zdobnov, E. M., M. Campillos, E. D. Harrington, D. Torrents, and P. Bork. 2005. Protein<br />
coding potential <strong>of</strong> retroviruses and other transposable elements in vertebrate genomes.<br />
Nucleic Acids Res. 33:946-954.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Figure 1 Relationships between monocot and dicot mudrA <strong>genes</strong> as determined by clustering at<br />
40% identity. The number <strong>of</strong> clusters <strong>of</strong> various sizes <strong>is</strong> shown for monocot (diagonal fill), dicot<br />
(light grey fill) and ‘other’ (dark grey fill). The number <strong>of</strong> MUG (M) and FAR1 (F) clusters <strong>of</strong> a<br />
given size <strong>is</strong> indicated by a number or <strong>is</strong> equal to one where labeled with no number. The<br />
‘Number <strong>of</strong> Clusters’ ax<strong>is</strong> <strong>is</strong> d<strong>is</strong>contiguous. The ‘other’ category includes MUG and FAR1<br />
clusters containing both monocots and dicots as well as one cluster containing as its single<br />
member the fungal outgroup hop78 (Chalvet et al. 2003). Note that FAR1 <strong>family</strong> <strong>is</strong> still in<br />
several clusters at th<strong>is</strong> at th<strong>is</strong> level <strong>of</strong> identity.<br />
Figure 2 Relationships between monocot and dicot mudrA <strong>genes</strong> as determined by phylogenetic<br />
analys<strong>is</strong>, depicted as a majority rule neighbor-joining tree rooted at MULE hop78 (Chalvet et al.<br />
2003). Monocot leaves are labelled in grey, dicot in black.<br />
Figure 3 Maximum parsimony tree <strong>of</strong> the MUG gene <strong>family</strong>. Th<strong>is</strong> majority rule consensus tree<br />
<strong>is</strong> rooted with fungal MULE hop78 (Chalvet et al. 2003). Bootstrap values are shown at nodes.<br />
All <strong>genes</strong> shown are transposon-d<strong>is</strong>sociated mudrA <strong>genes</strong> (TDMs) and belong to the MUG gene<br />
<strong>family</strong> except for hop78 and At2g07100 and At1g12720 which are transposon-associated mudrA<br />
<strong>genes</strong> (TAMs). Genes indicated by one or two aster<strong>is</strong>ks are associated with ESTs and cDNAs,<br />
respectively. Oryza sativa subspecies japonica gene names are shown beginning with “R”,<br />
Oryza sativa subspecies indica with “Ri”, Arabidops<strong>is</strong> thaliana with “A”, Zea mays with “Z”,<br />
Medicago truncatula with “M”, and Populus trichocarpa with “P”.<br />
Figure 4 Syntenic 100 kbp region surrounding MUG1 <strong>genes</strong>. The MUG1 <strong>genes</strong> are depicted in<br />
red. Genes shown with the same shading/pattern share protein-level similarity according to an<br />
all-against-all BLASTP search. A color version <strong>of</strong> th<strong>is</strong> figure <strong>is</strong> available as supplementary fig.<br />
2. See supplementary table 4 for E values. Vertical lines indicate gaps in the sequence. The<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Medicago sequence represents an unordered contig. Poplar I and II refer, respectively, to the<br />
molecules scaffold_201 containing the MUG1 gene eugene3.02010019, and LG_XIII containing<br />
the MUG1 gene grail3.0016035401. Note that diagram <strong>is</strong> not to scale.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Supplementary Figure 1 Neighbor-joining tree <strong>of</strong> the mudrA super<strong>family</strong>. The tree depicted <strong>is</strong> a<br />
majority rule consensus tree constructed from an alignment <strong>of</strong> the MUDR domain regions <strong>of</strong><br />
mudrA-like <strong>genes</strong>. The tree <strong>is</strong> rooted with fungal MULE hop78 (Chalvet et al. 2003). Bootstrap<br />
values are shown at nodes. Oryza sativa subspecies japonica gene names are shown beginning<br />
with “R”, Oryza sativa subspecies indica with “Ri”, Arabidops<strong>is</strong> thaliana with “A”, Zea mays<br />
with “Z”, Medicago truncatula with “M”, and Populus trichocarpa with “P”.<br />
Supplementary Figure 2 Color version <strong>of</strong> Figure 4. Genes shown in the same color share<br />
protein-level similarity according to an all-against-all BLASTP search. See supplementary table<br />
4 for E values.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Table 1 The number and d<strong>is</strong>tribution <strong>of</strong> mudrA <strong>genes</strong> in rice and Arabidops<strong>is</strong>.<br />
Arabidops<strong>is</strong><br />
Rice<br />
TAMs TDMs pTAMs<br />
total 143 45 18<br />
ORF 77 36 9<br />
expressed 2 18 0<br />
cDNA 1 11 0<br />
EST 2 18 0<br />
total 498 76 60<br />
ORF 493 76 60<br />
expressed 32 28 6<br />
cDNA 26 25 3<br />
EST 12 20 4<br />
TAMs, TDMs, and pTAMs refer to transposon-associated mudrA <strong>genes</strong>, transposon-d<strong>is</strong>sociated<br />
mudrA <strong>genes</strong>, and potentially transposon-associated mudrA <strong>genes</strong>, respectively. The latter class<br />
includes elements that lack well-defined TSDs but have other transposon features such as TIRs<br />
and repetitiveness extending beyond the coding region.<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
a<br />
MUG<br />
FAR1<br />
FAR1<br />
MUG<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
97<br />
97<br />
97<br />
94<br />
97<br />
89<br />
90<br />
hop78<br />
At2g07100<br />
97<br />
At1g12720<br />
R2_001_0826**<br />
97<br />
R6_003_1496**<br />
R10_001_0051**<br />
97<br />
At5g34853*<br />
At3g05850**<br />
97<br />
At5g48965**<br />
97<br />
At3g06940**<br />
R1_005_0969**<br />
97<br />
At5g16505**<br />
At1g06740*<br />
94<br />
At2g30640*<br />
ZmGSStuc04-27- 13378.1*<br />
97<br />
R12_011_1237**<br />
97<br />
RiCtg035198<br />
At3g04605**<br />
95<br />
Mth2-35g8*<br />
87<br />
Pgrail3.0016035401*<br />
97<br />
Peugene3.02010019<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013
ice<br />
Arabidops<strong>is</strong><br />
poplar I<br />
poplar II<br />
Medicago<br />
Downloaded from<br />
http://mbe.oxfordjournals.org/<br />
by guest on July 15, 2013