MUSTANG is a novel family of domesticated transposase genes found in diverse angiosperms

Rebecca K. Cowan, Douglas R. Hoen, Daniel J. Schoen, and Thomas E. Bureau*

McGill University, Biology Department, 1205 Dr. Penfield Avenue, Montreal, Quebec H3A 1B1

* corresponding author: 1205 Ave Docteur Penfield, Montreal, Quebec, Canada H3A 1B1

phone: (514) 398-6472; fax: (514) 398-5069; email: thomas.bureau@mcgill.ca

This manuscript is intended as a Research Article.

Running head: Selfish DNA domestication in flowering plant genomes

Key words: Genome; Evolution; Transposons; Plants

Abbreviations:

FAR1 = FAR-RED IMPAIRED RESPONSE PROTEIN 1

MUG = MUSTANG

MULE = Mutator-like element

TAM = transposon-associated mudrA

TDM = transposon-dissociated mudrA

TIR = terminal inverted repeat

TSD = target site duplication

MBE Advance Access published June 29, 2005

© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oupjournals.org

Downloaded from http://mbe.oxfordjournals.org/ by guest on July 15, 2013


Abstract

While transposons have traditionally been viewed as genomic parasites or “junk DNA”, the discovery of transposon-derived host genes has fuelled an ongoing debate over the evolutionary role of transposons. In particular, while mobility-related open reading frames have been known to acquire host functions, the contribution of these types of events to the evolution of genes is not well known. Here we report that genome-wide searches for Mutator transposase-derived host genes in Arabidopsis thaliana (Columbia-0) and Oryza sativa ssp. japonica (cv. Nipponbare) (domesticated rice) identified 121 sequences, including the taxonomically-conserved MUSTANG1. Syntenic MUSTANG1 orthologs in such varied plant species as rice, poplar, Arabidopsis, and Medicago truncatula appear to be under purifying selection. However, despite the evidence of this pathway of gene evolution, MUSTANG1 belongs to one of only two Mutator-like gene families with members in both monocotyledonous and dicotyledonous plants, suggesting that Mutator-like elements seldom evolve into taxonomically widespread host genes.



Introduction

Mobile elements have traditionally been viewed as “junk DNA” or as genomic parasites (Doolittle and Sapienza 1980; Orgel and Crick 1980). However, reports of transposon-derived genes contributing to cellular processes (Agrawal, Eastman, and Schatz 1998; Mi et al. 2000; Hudson, Lisch, and Quail 2003) have indicated that such genetic elements may influence the evolution of their host by providing a source of novel coding capacity. Other lines of evidence, such as sequence analysis and conservation studies, also support this view (International Human Genome Sequencing Consortium 2001; Hammer, Strehl, and Hagemann 2005; Zdobnov et al. 2005). This, in turn, has spurred controversy concerning the extent to which transposable elements contribute to host evolution (Hurst and Werren 2001).

In plants, the only transposon-like genes with known host functions, far-red impaired response protein 1 (FAR1), far-red elongated hypocotyl 3 (FHY3) and the FAR1 related sequences (FRS), are related to the family of DNA transposons called Mutator-like elements (MULEs) (Hudson, Lisch, and Quail 2003; Lin and Wang 2004). MULEs move via a “cut-and-paste” mechanism (Lisch 2002). The mudrA gene of autonomous elements encodes the transposase MURA, which is required for MULE mobility (Lisch et al. 1999). Though most MULEs are bounded by terminal inverted repeats (TIRs) of ~200 bp in length, some MULEs lack terminal symmetry and are termed non-TIR MULEs (Yu, Wright, and Bureau 2000). In maize, MURA binds specifically to the terminal sequences of MULEs (Benito and Walbot 1997). This association between the transposase and the terminal sequences appears necessary for excision of the element to occur, as it is in other DNA transposons (Steiniger-White, Rayment, and Reznikoff 2004). Upon insertion of the MULE into the genome, 9-11 bp target site duplications (TSDs) are created (Bennetzen 1996).



Transposon-derived genes with known host functions, such as FAR1 and FHY3, lack transposon-specific terminal sequences (Hudson, Lisch, and Quail 2003). This is likely because the transposition of a domesticated mobile element, one that has gained a host function, could be detrimental to its host. The removal of such a gene from the vicinity of cis-regulatory factors, or its insertion into a heterochromatic region, could dramatically change its expression (van Leeuwen et al. 2001). Thus, selection should act to remove mobility-enabling features such as TIRs.

We took advantage of this expected historical selection pressure in our search for potentially domesticated genes, by mining the genomes of Oryza sativa ssp. japonica (cv. Nipponbare) (domesticated rice) and Arabidopsis thaliana (Columbia-0) for sequences that shared similarity to mudrA, but lacked transposon features such as TIRs. We then examined the evolutionary histories of these putative transposon-derived genes by performing phylogenetic reconstructions and clustering analyses on all mudrA-related sequences. The results of our analysis are discussed with respect to the role that transposons play in the evolution of plant genes.

Materials and Methods

Identification of transposon-associated and transposon-dissociated mudrA genes

In order to determine the abundance of domesticated mudrA genes in plants, we searched the Oryza sativa ssp. japonica (cv. Nipponbare) (domesticated rice) and Arabidopsis thaliana (Columbia-0) genomes for sequences with similarity to mudrA that were not associated with transposon structures such as terminal sequences and TSDs. mudrA-like sequences were identified by using Zea mays mudrA (GI:540581) as a query for a BLASTP (Altschul et al. 1997) search (cutoff = E < 1e-10) of predicted genes from the Arabidopsis TIGR 5.0 release (www.tigr.org) and from rice build 2 pseudomolecules (DDBJ accession numbers AP006867 through AP006877 and NCBI accession number AE016959), predicted using FGENESH (http://www.softberry.com/berry.phtml?topic=fgenesh) in the monocot trained settings. Similarly, Zea mays mudrA (GI:540581) was used as a query for a PSI-BLAST (Altschul et al. 1997) search against the rice predicted gene set and The Arabidopsis Information Resource (TAIR) UNIPROT dataset (www.arabidopsis.org) (cutoff = E < 1e-15, second PSI-BLAST iteration) in order to identify more divergent mudrA-like sequences. Mined mudrA genes are listed in supplementary table 1.
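The E-value cutoff used in the BLASTP search can be illustrated with a short sketch. It assumes the hits were exported in a 12-column tab-separated layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the manuscript does not state the output format actually used, and the gene names and values below are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue=1e-10):
    """Return subject IDs of hits whose E-value is strictly below max_evalue.

    Assumes 12 tab-separated columns with the E-value in column 11
    (index 10); this layout is an assumption, not the authors' format.
    """
    kept = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, evalue = row[1], float(row[10])
        if evalue < max_evalue and subject_id not in kept:
            kept.append(subject_id)
    return kept


# Hypothetical hits: geneA passes the E < 1e-10 cutoff, geneB does not.
example = (
    "mudrA\tgeneA\t45.0\t300\t150\t5\t1\t300\t10\t310\t1e-50\t200\n"
    "mudrA\tgeneB\t25.0\t120\t80\t6\t1\t120\t5\t125\t0.002\t40\n"
)
print(filter_hits(example))
```

The same function with `max_evalue=1e-15` would correspond to the stricter cutoff described for the PSI-BLAST search.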

Whether the mined mudrA genes were associated with a transposon structure was determined by (a) identifying correctly oriented MULE termini separated by less than 30 kbp via homology searches with models and queries based on the TIR and non-TIR termini of previously identified families using HMMER (Eddy 1998) and BLASTN (Altschul et al. 1997); (b) using BL2SEQ (Tatusova and Madden 1999) with an expect value of 100 and a mismatch penalty of –1 to identify TIRs in the 10 kbp flanking regions; (c) using the genes and flanking sequences as BLASTN (Altschul et al. 1997) queries to identify repetitiveness extending beyond the putative coding region; and (d) scanning extreme terminal sequences of TIRs and repetitive regions for TSDs. Genes with TSDs were considered transposon-associated mudrA genes (TAMs), those lacking TSDs but with other transposon features were considered potentially transposon-associated mudrA genes (pTAMs), while those lacking all transposon features were considered transposon-dissociated mudrA genes (TDMs).
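The three-way classification described above reduces to a small decision rule. The sketch below is only an illustration of that rule; the function and argument names are hypothetical, and each flag stands for the outcome of one of the searches (a) to (d).

```python
def classify_mudra_gene(has_tsd, has_termini=False,
                        has_flanking_tirs=False, has_extended_repeat=False):
    """Classify a mined mudrA gene as 'TAM', 'pTAM', or 'TDM'.

    has_tsd             -- target site duplications were found (step d)
    has_termini         -- correctly oriented MULE termini found (step a)
    has_flanking_tirs   -- TIRs found in the flanking regions (step b)
    has_extended_repeat -- repetitiveness beyond the coding region (step c)
    """
    if has_tsd:
        return "TAM"   # transposon-associated
    if has_termini or has_flanking_tirs or has_extended_repeat:
        return "pTAM"  # potentially transposon-associated
    return "TDM"       # transposon-dissociated


print(classify_mudra_gene(has_tsd=False, has_flanking_tirs=True))
```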

Arabidopsis genes were considered to have open reading frames (ORFs) if they were so annotated (www.arabidopsis.org), or if they corresponded to a gene in the TAIR UNIPROT dataset. Rice genes were considered to have ORFs if they corresponded to a predicted gene from the rice build 2 pseudomolecules. Sequences for predicted rice mudrA genes are listed in supplementary table 2. The TAIR sequence viewer tool was used to determine if Arabidopsis genes were expressed. To determine which rice genes were expressed, rice cDNAs were mapped to the build 2 rice pseudomolecules using BLAT (Kent 2002) with default parameters and the “fine” option. Similarly, rice expressed sequence tags (ESTs) were mapped using BLAT (Kent 2002) with trimmed trailing poly-A and leading poly-T tails, 3 kbp maximum intron size, and 90% overall identity (matches/length), rejecting ESTs with alignments to more than one location with less than 3% identity difference. cDNAs and ESTs that overlapped predicted mudrA genes by > 100 bp were considered to be associated with them.
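The EST filters in this paragraph can be sketched as follows. This is a hedged illustration, not the authors' pipeline: it assumes half-open genomic coordinates, treats the "3% identity difference" rule as a comparison between the best and second-best alignment of one EST, and uses hypothetical field names and coordinates.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Overlap length in bp between half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def unique_best_alignment(alignments, min_identity=0.90, min_gap=0.03):
    """Return the best alignment of one EST, or None if it fails the filters.

    Each alignment is a dict with 'matches', 'length', 'start', 'end';
    identity is computed as matches / length.
    """
    scored = sorted(
        ((a["matches"] / a["length"], a) for a in alignments),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] < min_identity:
        return None
    # Reject ESTs whose second-best location is within 3% identity of the best.
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_gap:
        return None
    return scored[0][1]


def is_associated(gene_start, gene_end, alignment, min_overlap=100):
    """True if the alignment overlaps the predicted gene by more than min_overlap bp."""
    return overlap_bp(gene_start, gene_end,
                      alignment["start"], alignment["end"]) > min_overlap


# Hypothetical EST with two candidate locations (96% and 90% identity).
est = [
    {"matches": 480, "length": 500, "start": 1200, "end": 1700},
    {"matches": 450, "length": 500, "start": 9000, "end": 9500},
]
best = unique_best_alignment(est)
print(best is not None and is_associated(1000, 1500, best))
```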

Clustering and phylogenetic analyses

Inter-relationships among TDMs and TAMs were examined through clustering and phylogenetic analyses based on the conserved MULE-specific domain, MUDR (Hershberger et al. 1995; Marchler-Bauer et al. 2003). To identify MUDR domains in the mined mudrA genes, the mudrA genes were used as queries for TBLASTN (Altschul et al. 1997) and BLASTP (Altschul et al. 1997) searches against the 21 diverse MUDR domains listed in the NCBI Entrez (http://www.ncbi.nlm.nih.gov/entrez) entry for pfam03108. Where both protein and nucleotide sequences were available, the BLASTP-identified MUDR domain was retained, though it was manually edited if a predicted intron eliminated a large portion of the domain. More divergent mined mudrA genes were used as PSI-BLAST (Altschul et al. 1997) queries in order to identify regions within them that aligned with the MUDR domain of conserved mudrA genes. See supplementary table 1 for a list of detected MUDR domain regions.

For the clustering analysis, MUDR regions were iteratively clustered at every 5% change in identity using BLASTCLUST (Altschul et al. 1997) with default parameters (fig. 1 and supplementary table 3).
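The idea of re-clustering at successive identity thresholds can be sketched with a toy single-linkage procedure. This is not BLASTCLUST itself, which also applies an alignment-coverage criterion; the sequence names and pairwise identities are hypothetical, and the sketch only shows how groups merge as the threshold is lowered in 5% steps.

```python
def single_linkage(names, identity, threshold):
    """Group sequences linked by any pair with identity >= threshold (percent)."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical MUDR regions: two Arabidopsis (At) and one rice (Os) sequence.
names = ["At_1", "At_2", "Os_1"]
identity = {("At_1", "At_2"): 80, ("At_1", "Os_1"): 35, ("At_2", "Os_1"): 30}

# Re-cluster at every 5% step from 95% down to 30% identity.
for threshold in range(95, 25, -5):
    print(threshold, single_linkage(names, identity, threshold))
```

In this toy input the two Arabidopsis sequences join at 80% identity, while the rice sequence joins only once the threshold drops to 35%, which mirrors the species-specific clustering pattern the analysis is designed to detect.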



For the phylogenetic analysis, CLUSTALW (Higgins, Thompson, and Gibson 1994) was used to generate a multiple alignment of the MUDR domain regions, omitting truncated sequences less than 100 amino acid residues in length. In addition, the sequences were edited so that gaps in the alignment caused by less than 1% of the sequences were eliminated. A neighbor-joining tree with 100 bootstrap repetitions was generated from this alignment using PHYLIP (Retief 2000; Felsenstein 2004) with default parameters (fig. 2 and supplementary fig. 1).
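One way to read the gap-editing step is that alignment columns in which fewer than 1% of the sequences carry a residue (so that the gap in all other sequences is caused by a rare insertion) were removed. The sketch below implements that reading, which is an assumption; the function name and the toy alignment are hypothetical.

```python
def drop_rare_insertion_columns(alignment, min_fraction=0.01):
    """Remove columns where fewer than min_fraction of sequences have a residue.

    alignment is a list of equal-length strings with '-' marking gaps.
    """
    n = len(alignment)
    length = len(alignment[0])
    keep = [
        col for col in range(length)
        if sum(seq[col] != "-" for seq in alignment) / n >= min_fraction
    ]
    return ["".join(seq[c] for c in keep) for seq in alignment]


# Toy alignment of four sequences; a 50% threshold is used here only so that
# the effect is visible with so few sequences.
toy = ["MK-LV", "MKALV", "MK-LV", "MK-IV"]
print(drop_rare_insertion_columns(toy, min_fraction=0.5))
```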

Phylogenetic and syntenic relationships within the MUSTANG (MUG) gene family

MUG1 genes in other plant species were identified by using rice and Arabidopsis MUG1 as queries for TBLASTN (Altschul et al. 1997) searches against the nonredundant NCBI (http://www.ncbi.nlm.nih.gov/BLAST), JGI (www.jgi.doe.gov/poplar), and Plant Genome Database (http://www.plantgdb.org) databases. A multiple alignment of the amino acid sequences of fungal MULE hop78 (Chalvet et al. 2003), TAMs At2g07100 and At1g12720, and the MUG family proteins was generated using CLUSTALW (Higgins, Thompson, and Gibson 1994). The alignment was edited so that the ends corresponded to those of MUG1. A maximum parsimony tree with 100 bootstrap replications was constructed from this alignment using PHYLIP (Retief 2000; Felsenstein 2004) with default parameters (fig. 3).

Predicted proteins encoded by the 50 kbp regions flanking the MUG1 genes of Arabidopsis, rice, and poplar were identified from the TAIR (www.arabidopsis.org), TIGR (www.tigr.org), and JGI (www.jgi.doe.gov/poplar) datasets, respectively. Predicted proteins encoded by the Medicago contig (GI:50284582) were determined using the GENSCAN webserver with Arabidopsis trained settings (Burge and Karlin 1997). These predicted genes were used for an unfiltered all-against-all BLASTP (Altschul et al. 1997) search with an effective database length of 740,307,180 in order to uncover microsynteny (fig. 4, supplementary fig. 2, and supplementary table 4).

dN/dS estimates

A multiple alignment of MUG1 orthologs was generated using CLUSTALW (Higgins, Thompson, and Gibson 1994). Corresponding nucleotide sequences were manually aligned using the amino acid alignment as a guide. After eliminating ambiguity sites, the CODEML program in the PAML package (Yang 1997) was used to estimate the dN/dS ratio for the provided nucleotide sequence alignment and maximum parsimony tree (constructed using the PHYLIP (Retief 2000; Felsenstein 2004) package with default parameters). dN/dS values for the N-terminal, MUDR domain, central, zinc finger domain and C-terminal subsections of MUG1 were similarly estimated.

Results and Discussion

Transposon-dissociated mudrA-like genes in rice and Arabidopsis

Our search for sequences with similarity to mudrA that were not associated with transposon structures revealed 45 such transposon-dissociated mudrA genes (TDMs) in Arabidopsis and 76 in rice (table 1). Approximately nine-tenths of these have predicted ORFs (table 1). Strikingly, 40% (18/45) and 37% (28/76) of TDMs in Arabidopsis and rice, respectively, match ESTs and/or cDNAs as compared to only 1% (2/143) and 6% (32/498) of Arabidopsis and rice transposon-associated mudrA genes (TAMs) (table 1), hinting that the expression profile of TDMs is more similar to that of host genes than that of mobile elements.

Taxonomic distribution of TDMs and TAMs

Clustering analysis, based on the MUDR domains of the mined TAMs and TDMs, found that most of these sequences clustered exclusively with other MUDR domains from the same species until the identity threshold was < 40% (fig. 1). A neighbor-joining tree also indicated that the MUDR domains form mainly species-specific clades (fig. 2). This suggests that many TDMs have evolved from MULEs since the monocot-dicot divergence 140-150 million years ago (Chaw et al. 2004). Nonetheless, two exceptional groups with both monocot and dicot members were identified, and both of these involve TDMs (fig. 1, fig. 2, and supplementary fig. 1).

The first of these exceptional groups includes members of the FAR1 gene family, among them two TDMs, FAR1 and FHY3, that have proven host functions in far-red light response (Hudson, Lisch, and Quail 2003). The second group, a previously unreported family henceforth referred to as the MUSTANG (MUG) gene family, is composed, with one exception, entirely of TDMs (supplementary table 3, supplementary fig. 1). However, while the neighbor-joining tree indicates that a TAM gene, At2g07100, is a part of the MUG gene family, the clustering analysis places it with other Arabidopsis TAMs. To resolve this discrepancy, as well as to gain greater resolution of the internal branches of the MUG gene family, a maximum parsimony tree was constructed using the entire protein sequences (fig. 3).

The maximum parsimony tree places At2g07100 with other TAMs, rather than inside the MUG gene family, with high bootstrap support of 97% (fig. 3). The shorter sequences used to construct the original neighbor-joining tree, as well as the fact that the predicted ORF of At2g07100 has accumulated several stop codons, may account for the differing placement of At2g07100 and the lower associated bootstrap values in the neighbor-joining tree (supplementary fig. 1). The maximum parsimony tree divides the remaining TDMs into two primary clades, both of which include monocot and dicot members. This indicates that there were several members of the MUG gene family before the divergence of the monocot and dicot lineages.

MUSTANG1 is a novel, taxonomically widespread gene

Interestingly, a pair of potential rice/Arabidopsis orthologs called MUG1 are not only highly similar to each other at the amino acid level (BL2SEQ E = 0.0, identity = 71%), but are also similar to sequences encoded by the indica subspecies of O. sativa, Medicago truncatula, Populus trichocarpa (poplar), and Zea mays (maize) (BL2SEQ E = 0.0, identity > 66%). The maximum parsimony tree supports the hypothesis that these genes are MUG1 orthologs (fig. 3). The tree also indicates that the two poplar MUG1 genes probably arose through a duplication event sometime after the Medicago and poplar lineages diverged. The duplication is unlikely to be the result of transposition as the ~20 kbp regions flanking the poplar genes share significant nucleotide level similarity (BL2SEQ E < 1e-120, identity > 78%). The predicted genes in the 50 kbp flanking regions of the rice and Arabidopsis MUG1 genes are similar to the predicted genes in the poplar flanking regions (fig. 4 and supplementary fig. 2). Even the flanking regions of the Medicago MUG1 gene, which is located on an unordered contig, display microsynteny with the flanking regions of the other plant species (fig. 4 and supplementary fig. 2). It is impossible to determine whether the maize MUG1-like gene is located in a syntenic region as the available sequence is only 3547 bp in length. However, the syntenic and phylogenetic relationships between the other MUG1 genes support their orthology and eliminate the possibility that the distribution of MUG1 genes arose through horizontal transfer.

MUG1 functionality

The rice, Arabidopsis, maize, and one of the poplar MUG1 orthologs are associated with cDNAs, ESTs or both (fig. 3). This expression information comes from such varied tissues as rice calli, corn endosperm and ears, and Arabidopsis calli, developing seeds, inflorescence meristem, and seedling leaves. As an indication of whether this expressed gene is performing a host function, we calculated the non-synonymous to synonymous substitution rate ratio (dN/dS) for the six identified orthologs. A dN/dS ratio of significantly less than 1 suggests that a coding sequence is being, or has until recently been, maintained by purifying selection, and is thus a good indication of whether a gene has protein coding functionality (Hurst 2002). The dN/dS ratio of 0.0683 for the MUG1 genes is significantly smaller than 1 (P < 0.001; log likelihood ratio test).
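A likelihood ratio test of this kind can be sketched as follows. The sketch assumes the common setup in which a null model with dN/dS fixed at 1 is compared against an alternative model with dN/dS estimated freely, so the test statistic is compared to a chi-square distribution with one degree of freedom; the manuscript does not spell out this setup, and the log-likelihood values below are hypothetical placeholders, not the authors' results.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """Likelihood ratio test with 1 degree of freedom.

    Returns (statistic, p_value), where statistic = 2 * (lnl_alt - lnl_null)
    and the p-value is the chi-square (1 df) upper tail, which equals
    erfc(sqrt(statistic / 2)).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for the two models (placeholders only).
stat, p = lrt_p_value(lnl_null=-5100.0, lnl_alt=-5000.0)
print(stat, p < 0.001)
```

Using the closed-form tail for one degree of freedom keeps the example within the standard library; with more free parameters a general chi-square survival function would be needed instead.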

Like the MUG1 genes, the transcription factors FAR1 and FHY3, which function as regulators of far-red light response (Hudson, Lisch, and Quail 2003), have also evolved from a MULE transposase to perform a host function. Given how dissimilar MUG1 is to these proteins (BL2SEQ E > 0.17, identity < 24%), it seems unlikely that MUG1 also functions in far-red light response. However, like FAR1 and FHY3, MUG1 does have a highly conserved zinc finger domain. In fact, with a dN/dS ratio of 0.0072, the zinc finger domain appears to be the most highly conserved region of the MUG1 gene. Therefore, as with FAR1, MUG1 may bind DNA in a manner that evolved directly from the way in which MURA transposase binds to TIRs (Hudson, Lisch, and Quail 2003). This would be consistent with other transposon-derived host genes whose mode of action is reminiscent of that of the transposon gene they evolved from. For instance, the vertebrate immune system proteins RAG1 and RAG2 assist V(D)J recombination of antibody genes by binding to and nicking DNA in a manner highly similar to DNA transposases, from which they are likely derived (Agrawal, Eastman, and Schatz 1998). In addition, the co-opted human endogenous retroviral envelope (env) protein syncytin mediates trophoblast cell fusion during placental development (Chang et al. 2004), while env proteins in general are thought to promote membrane fusion for viral entry into cells (Colman and Lawrence 2003).



Conclusions

The rarity of taxonomically widespread MULE-dissociated mudrA genes is such that they appear to represent an unusual detour rather than a common mechanism for generating novel proteins. These results, however, do not exclude the possibility that more recently domesticated mudrA genes may be common among individual monocot and dicot lineages not investigated here. The six rice and four Arabidopsis expressed TDMs that do not belong to either the FAR1 or MUG gene families may be representative of such cases. Moreover, TDMs may frequently be “born”, but “die out” soon after. One reason for this could be that many domesticated genes may benefit their host only under specific environmental conditions. Another possible explanation is that newly domesticated genes, which have not yet lost their mobility-enabling structures, may transpose, thus losing the cis-regulatory factors required for their host function, and thereby reverting to selfish DNA. In fact, in evolutionary terms, it is likely that such sequences have been more successful as transposons than as host genes.

Acknowledgements

We would like to thank Nikoleta Juretic and Ken Hastings for their helpful comments on the manuscript. This study was supported by Discovery grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada to T.E.B. and D.J.S. and a grant from the McGill University William Dawson Scholar Chair to T.E.B.



Literature Cited

Agrawal, A., Q. M. Eastman, and D. G. Schatz. 1998. Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature 394:744-751.

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.

Benito, M. I., and V. Walbot. 1997. Characterization of the maize Mutator transposable element MURA transposase as a DNA-binding protein. Mol. Cell. Biol. 17:5165-5175.

Bennetzen, J. L. 1996. The Mutator transposable element system of maize. Pp. 195–229 in H. Saedler and A. Gierl, eds. Transposable Elements. Springer-Verlag, Berlin.

Burge, C., and S. Karlin. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78-94.

Chalvet, F., C. Grimaldi, F. Kaper, T. Langin, and M. J. Daboussi. 2003. Hop, an active Mutator-like element in the genome of the fungus Fusarium oxysporum. Mol. Biol. Evol. 20:1362-1375.

Chang, C., P. T. Chen, G. D. Chang, C. J. Huang, and H. Chen. 2004. Functional characterization of the placental fusogenic membrane protein syncytin. Biol. Reprod. 71:1956-1962.

Chaw, S. M., C. C. Chang, H. L. Chen, and W. H. Li. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J. Mol. Evol. 58:424-441.

Colman, P. M., and M. C. Lawrence. 2003. The structural biology of type I viral membrane fusion. Nat. Rev. Mol. Cell. Biol. 4:309-319.

Doolittle, W. F., and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284:601-603.



Eddy, S. R. 1998. Profile hidden Markov models. Bioinformatics 14:755-763.

Felsenstein, J. 2004. PHYLIP (Phylogeny Inference Package) version 3.6b. Distributed by the author, Department of Genome Sciences, University of Washington, Seattle.

Hammer, S. E., S. Strehl, and S. Hagemann. 2005. Homologs of Drosophila P transposons were mobile in zebrafish but have been domesticated in a common ancestor of chicken and human. Mol. Biol. Evol. 22:833-844.

Hershberger, R. J., M. I. Benito, K. J. Hardeman, C. Warren, V. L. Chandler, and V. Walbot. 1995. Characterization of the major transcripts encoded by the regulatory MuDR transposable element of maize. Genetics 140:1087-1098.

Higgins, D., J. Thompson, and T. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

Hudson, M. E., D. R. Lisch, and P. H. Quail. 2003. The FHY3 and FAR1 genes encode transposase-related proteins involved in regulation of gene expression by the phytochrome A-signaling pathway. Plant J. 34:453-471.

Hurst, G. D., and J. H. Werren. 2001. The role of selfish genetic elements in eukaryotic evolution. Nat. Rev. Genet. 2:597-606.

Hurst, L. D. 2002. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 18:486.

International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.

Kent, W. J. 2002. BLAT--the BLAST-like alignment tool. Genome Res. 12:656-664.

Lin, R., and H. Wang. 2004. Arabidopsis FHY3/FAR1 gene family and distinct roles of its members in light control of Arabidopsis development. Plant Physiol. 136:4010-4022.

Lisch, D. 2002. Mutator transposons. Trends Plant Sci. 7:498-504.<br />

Lisch, D., L. Girard, M. Donlin, and M. Freeling. 1999. Functional analysis of deletion derivatives of the maize transposon MuDR delineates roles for the MURA and MURB proteins. Genetics 151:331-341.<br />

Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott et al. (27 co-authors). 2003. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31:383-387.<br />

Mi, S., X. Lee, X. Li et al. (12 co-authors). 2000. Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature 403:785-789.<br />

Orgel, L. E., and F. H. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284:604-607.<br />

Retief, J. D. 2000. Phylogenetic analysis using PHYLIP. Methods Mol. Biol. 132:243-258.<br />

Steiniger-White, M., I. Rayment, and W. S. Reznikoff. 2004. Structure/function insights into Tn5 transposition. Curr. Opin. Struct. Biol. 14:50-57.<br />

Tatusova, T. A., and T. L. Madden. 1999. Blast 2 sequences - a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174:247-250.<br />

van Leeuwen, W., T. Ruttink, A. W. Borst-Vrenssen, L. H. van der Plas, and A. R. van der Krol. 2001. Characterization of position-induced spatial and temporal regulation of transgene promoter activity in plants. J. Exp. Bot. 52:949-959.<br />

Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.<br />

Yu, Z., S. I. Wright, and T. E. Bureau. 2000. Mutator-like elements in Arabidopsis thaliana. Structure, diversity and evolution. Genetics 156:2019-2031.<br />

Zdobnov, E. M., M. Campillos, E. D. Harrington, D. Torrents, and P. Bork. 2005. Protein coding potential of retroviruses and other transposable elements in vertebrate genomes. Nucleic Acids Res. 33:946-954.<br />

Figure 1 Relationships between monocot and dicot mudrA genes as determined by clustering at 40% identity. The number of clusters of various sizes is shown for monocot (diagonal fill), dicot (light grey fill) and ‘other’ (dark grey fill). The number of MUG (M) and FAR1 (F) clusters of a given size is indicated by a number or is equal to one where labeled with no number. The ‘Number of Clusters’ axis is discontiguous. The ‘other’ category includes MUG and FAR1 clusters containing both monocots and dicots as well as one cluster containing as its single member the fungal outgroup hop78 (Chalvet et al. 2003). Note that the FAR1 family is still split among several clusters at this level of identity.<br />

Figure 2 Relationships between monocot and dicot mudrA genes as determined by phylogenetic analysis, depicted as a majority rule neighbor-joining tree rooted at MULE hop78 (Chalvet et al. 2003). Monocot leaves are labeled in grey, dicot in black.<br />

Figure 3 Maximum parsimony tree of the MUG gene family. This majority rule consensus tree is rooted with fungal MULE hop78 (Chalvet et al. 2003). Bootstrap values are shown at nodes. All genes shown are transposon-dissociated mudrA genes (TDMs) and belong to the MUG gene family, except for hop78, At2g07100, and At1g12720, which are transposon-associated mudrA genes (TAMs). Genes indicated by one or two asterisks are associated with ESTs and cDNAs, respectively. Oryza sativa subspecies japonica gene names are shown beginning with “R”, Oryza sativa subspecies indica with “Ri”, Arabidopsis thaliana with “A”, Zea mays with “Z”, Medicago truncatula with “M”, and Populus trichocarpa with “P”.<br />

Figure 4 Syntenic 100 kbp region surrounding MUG1 genes. The MUG1 genes are depicted in red. Genes shown with the same shading/pattern share protein-level similarity according to an all-against-all BLASTP search. A color version of this figure is available as supplementary fig. 2. See supplementary table 4 for E values. Vertical lines indicate gaps in the sequence. The Medicago sequence represents an unordered contig. Poplar I and II refer, respectively, to the molecules scaffold_201 containing the MUG1 gene eugene3.02010019, and LG_XIII containing the MUG1 gene grail3.0016035401. Note that the diagram is not to scale.<br />

Supplementary Figure 1 Neighbor-joining tree of the mudrA superfamily. The tree depicted is a majority rule consensus tree constructed from an alignment of the MUDR domain regions of mudrA-like genes. The tree is rooted with fungal MULE hop78 (Chalvet et al. 2003). Bootstrap values are shown at nodes. Oryza sativa subspecies japonica gene names are shown beginning with “R”, Oryza sativa subspecies indica with “Ri”, Arabidopsis thaliana with “A”, Zea mays with “Z”, Medicago truncatula with “M”, and Populus trichocarpa with “P”.<br />

Supplementary Figure 2 Color version of Figure 4. Genes shown in the same color share protein-level similarity according to an all-against-all BLASTP search. See supplementary table 4 for E values.<br />

Table 1 The number and distribution of mudrA genes in rice and Arabidopsis.<br />

Species       Category    TAMs   TDMs   pTAMs<br />
Arabidopsis   total        143     45      18<br />
Arabidopsis   ORF           77     36       9<br />
Arabidopsis   expressed      2     18       0<br />
Arabidopsis   cDNA           1     11       0<br />
Arabidopsis   EST            2     18       0<br />
Rice          total        498     76      60<br />
Rice          ORF          493     76      60<br />
Rice          expressed     32     28       6<br />
Rice          cDNA          26     25       3<br />
Rice          EST           12     20       4<br />

TAMs, TDMs, and pTAMs refer to transposon-associated mudrA genes, transposon-dissociated mudrA genes, and potentially transposon-associated mudrA genes, respectively. The latter class includes elements that lack well-defined TSDs but have other transposon features such as TIRs and repetitiveness extending beyond the coding region.<br />
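As a reading aid only, the counts in Table 1 can be turned into the fraction of genes in each class with expression evidence. The sketch below is not part of the original analysis. It assumes that the first block of rows in the table belongs to Arabidopsis and the second to rice, following the order of the species labels. The names `TABLE_1` and `expressed_fraction` are hypothetical.

```python
# Counts copied from the "total" and "expressed" rows of Table 1.
# ASSUMPTION: the first block of rows is Arabidopsis and the second is rice,
# following the order in which the species labels appear.
TABLE_1 = {
    "Arabidopsis": {
        "total": {"TAMs": 143, "TDMs": 45, "pTAMs": 18},
        "expressed": {"TAMs": 2, "TDMs": 18, "pTAMs": 0},
    },
    "Rice": {
        "total": {"TAMs": 498, "TDMs": 76, "pTAMs": 60},
        "expressed": {"TAMs": 32, "TDMs": 28, "pTAMs": 6},
    },
}


def expressed_fraction(species, gene_class):
    """Return expressed / total for one species and one gene class."""
    counts = TABLE_1[species]
    return counts["expressed"][gene_class] / counts["total"][gene_class]


if __name__ == "__main__":
    # Print one line per species and class, e.g. "Rice TDMs 0.368".
    for species in TABLE_1:
        for gene_class in ("TAMs", "TDMs", "pTAMs"):
            fraction = expressed_fraction(species, gene_class)
            print(f"{species} {gene_class} {fraction:.3f}")
```

Under that assumption, the fraction with expression evidence is higher for TDMs than for TAMs in both species.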

[Figure image not reproduced; surviving labels: MUG, FAR1.]<br />

[Figure 3 tree image not reproduced. Leaf labels in order of appearance: hop78, At2g07100, At1g12720, R2_001_0826**, R6_003_1496**, R10_001_0051**, At5g34853*, At3g05850**, At5g48965**, At3g06940**, R1_005_0969**, At5g16505**, At1g06740*, At2g30640*, ZmGSStuc04-27- 13378.1*, R12_011_1237**, RiCtg035198, At3g04605**, Mth2-35g8*, Pgrail3.0016035401*, Peugene3.02010019. Bootstrap values shown at nodes range from 87 to 97.]<br />

[Figure 4 image not reproduced; track labels: rice, Arabidopsis, poplar I, poplar II, Medicago.]<br />
