phylogenetic relationships and classification of didelphid marsupials ...


10 BULLETIN AMERICAN MUSEUM OF NATURAL HISTORY NO. 322

TABLE 3
Primers Used to Amplify and Sequence BRCA1 and vWF

Primer name     Sequence (5′ to 3′)
BRCA1-F1        TCATTACTGCCTGAGATCACCAG
BRCA1-F47       TATTGCCTAACACAGACAGCAT
BRCA1-F593      CAACAATATTGAAGACAAAATATTAGGAAA
BRCA1-F1163     ATGARACWGAACTACWGATCGATAG
BRCA1-F1163a    AATGAGACTGAACTACAGATCGAT
BRCA1-F1697     TTWGATGRTTGTTCATCYRAAAACAC
BRCA1-R743      TTGATGAAATCCTCAGGCTGYAGGT
BRCA1-R1218     GAAGYCTTCTGCTGCGTCTGA
BRCA1-R1343     CTAACATTTGATCACTATCAGTAG
BRCA1-R1780     TAAATAYTGGGTRTCRAGTTCACT
BRCA1-R2078     GAAATTTCCTGGTTGTTTCCAGCAA
BRCA1-R2151     TCCTTTTGATYAGGAACTTGTGAAATT
VWF-F104        GGTGTGATGGAGCGTTTACACATCTC
VWF-F120        GACTTGGCYTTYCTSYTGGATGGCTC
VWF-F557        CCTGGGCTACCTCTGTGACCTGGT
VWF-R655        CTTCTAGCACAAACACCACATCCAGAACCA
VWF-R743        CTCACATCCATYCGTTGCATCA
VWF-R1141       ATCTCATCSGTRGCRGGATTGC

down protocol as described in Jansa and Voss (2000). Reamplification reactions were performed using Taq DNA polymerase (Promega Corp.) in 25 μl reactions for 30 PCR cycles. The resulting PCR products were sequenced in both directions using the amplification primers and dye-terminator chemistry on an AB 3700 automated sequencer.

We searched the draft Monodelphis domestica genome using the BLAT algorithm (modified BLAST; Kent, 2002) to determine the copy number and chromosomal location of query sequences from M. brevicaudata. As reported on the Ensembl database (www.ensembl.org/index.html), the available reference genome (assembly MonDom5) was released in October 2006 and has a base coverage of approximately 7.33.

ALIGNMENT AND PHYLOGENETIC ANALYSIS: We aligned DNA sequences with reference to translated amino-acid sequences using MacClade 4.08 (http://macclade.org). Aligned sequences were then analyzed phylogenetically using maximum parsimony (MP) as implemented by PAUP* ver. 4.0b10 (Swofford, 1998), maximum likelihood (ML) as implemented by GARLI ver.
0.95 (Zwickl, 2006), and Bayesian inference as implemented by MrBayes ver. 3.1.1 (Ronquist and Huelsenbeck, 2003). For MP analyses, all molecular characters were treated as unordered and equally weighted, and all tree searches were heuristic, with at least 10 replicates of random stepwise taxon addition followed by tree bisection-reconnection (TBR) branch swapping. To choose the best models of nucleotide substitution for ML and Bayesian analyses, we examined the fit of various models separately for each of our five gene partitions (IRBP, vWF, BRCA1, DMP1, and first and second codon positions of RAG1) based on neighbor-joining trees of Jukes-Cantor-corrected distances, using both hierarchical likelihood-ratio tests (hLRTs) and the Akaike Information Criterion (AIC) as implemented in ModelTest 3.7 (Posada and Crandall, 1998). Where the two approaches disagreed on model choice, we used the model selected by the AIC, for reasons outlined by Posada and Buckley (2004). For both ML and Bayesian searches, the best-fit model was specified, but model parameter values were not fixed. For ML analyses, we conducted three independent runs of genetic-algorithm searches in GARLI, with random starting topologies and automatic termination after 10,000 generations with no improvement in log-likelihood scores.

2009 VOSS AND JANSA: DIDELPHID MARSUPIALS 11

For Bayesian analysis of each gene partition, we conducted two independent runs of Metropolis-coupled Markov-chain Monte Carlo (MCMCMC), each with one cold and three incrementally heated chains. For each run, we assumed uniform-interval priors for all parameters except base composition, for which we assumed a Dirichlet prior. Runs were allowed to proceed for 5 × 10⁶ generations, and trees were sampled every 100 generations. We evaluated the burn-in for each run and pooled the post-burn-in trees to calculate estimated parameter distributions and posterior probabilities for each node.

We assessed nodal support from MP and ML analyses using nonparametric bootstrapping (Felsenstein, 1985). Bootstrap values for the parsimony analyses (MPBS) were calculated in PAUP* from 1000 pseudoreplicated datasets, each of which was analyzed heuristically with 10 random-addition replicates and TBR branch swapping. Bootstrap values for the likelihood analyses (MLBS) were calculated in GARLI using genetic-algorithm searches of 1000 pseudoreplicated datasets, allowing model parameters to be estimated for each pseudoreplicate.

We analyzed the combined-gene dataset using ML as implemented in RAxML-VI-HPC (ver. 2.2.3; Stamatakis, 2006). We specified the GTRMIX model, which performs initial tree inference using a GTRCAT approximation, with final topology evaluation performed under a GTRGAMMA model. We allowed parameters to be estimated independently across the five gene partitions. To evaluate nodal support for this combined-gene, mixed-model analysis, we performed 1000 bootstrap replicates, again allowing model parameters to be estimated independently across the five genes. We also performed a Bayesian analysis of the combined-gene dataset, using the same MCMCMC settings given above. For this analysis, we specified the best-fit model for each gene, decoupled estimation of substitution parameters across the partitions, and allowed each gene to assume a separate rate.
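The nodal-support procedures described above (nonparametric bootstrapping, and posterior clade probabilities from pooled post-burn-in MCMCMC samples) end in the same step: counting how often a clade occurs in a collection of trees. The Python sketch below illustrates that step, together with the column resampling behind a single bootstrap pseudoreplicate. It assumes a simplified, hypothetical representation in which each tree is reduced to the set of clades it contains; the function names are illustrative only, and the tree searches themselves (PAUP*, GARLI, RAxML, MrBayes) are not reproduced. It is not the authors' pipeline.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Set

# Hypothetical representations used only for this sketch:
# an alignment maps taxon name -> aligned sequence (all of equal length),
# and a tree is summarized by the set of clades it contains, each clade
# being a frozenset of taxon names.
Alignment = Dict[str, str]
Tree = Set[FrozenSet[str]]


def bootstrap_pseudoreplicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Draw one nonparametric bootstrap pseudoreplicate (Felsenstein, 1985).

    Alignment columns are sampled with replacement; the same column indices
    are applied to every taxon so that sites stay aligned.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def pool_post_burnin(runs: Sequence[Sequence[Tree]], burnins: Sequence[int]) -> List[Tree]:
    """Discard each run's burn-in samples, then pool the remaining trees."""
    if len(runs) != len(burnins):
        raise ValueError("need one burn-in value per run")
    return [tree for run, b in zip(runs, burnins) for tree in run[b:]]


def clade_support(trees: Sequence[Tree]) -> Dict[FrozenSet[str], float]:
    """Return the fraction of trees that contain each clade.

    For trees inferred from bootstrap pseudoreplicates this is the bootstrap
    proportion; for pooled post-burn-in MCMC samples it estimates the clade's
    posterior probability.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACTTACGA", "D": "TCTTACGA"}
    print(bootstrap_pseudoreplicate(toy, rng))

    ab, cd, ac = frozenset("AB"), frozenset("CD"), frozenset("AC")
    run1 = [{ac}, {ab, cd}, {ab, cd}]  # toy tree samples from two MCMC runs
    run2 = [{ac}, {ab, cd}, {ac}]
    pooled = pool_post_burnin([run1, run2], burnins=[1, 1])
    print(clade_support(pooled)[ab])  # 0.75
```

Burn-in is passed per run because it was evaluated separately for each run. With the settings reported here (5 × 10⁶ generations, sampling every 100), each run would yield roughly 50,000 sampled trees before burn-in removal; in practice, MrBayes's own `sumt` command handles burn-in removal and clade tallying.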
We analyzed the nonmolecular (morphological + karyotypic) data alone and in combination with the molecular data using MP and Bayesian approaches. Because parsimony analysis of the nonmolecular dataset recovered a large number of suboptimal trees, we first performed 1000 random-taxon-addition replicates with TBR branch swapping, saving only 10 trees per replicate. We then used this pool of 10,000 trees as the starting point for an unbounded heuristic search. Parsimony analysis of the combined (molecular + nonmolecular) dataset did not exhibit this problem; therefore, we analyzed the combined dataset using unbounded heuristic searches with 1000 replicates of random-taxon addition and TBR branch swapping. For Bayesian analysis of the morphological dataset alone, we specified the Mkv model (Lewis, 2001) with a Γ-distributed rate parameter and ascertainment bias corrected for the omission of constant characters (lset coding = variable). For Bayesian analysis of the combined (molecular + nonmolecular) dataset, we specified this same model for the morphological data and applied the best-fit model to each of the five gene partitions. As above, we allowed parameters to be estimated independently across all partitions and used the same MCMCMC settings. To assess the impact of the large amount of missing data from Chacodelphys, we performed all analyses that included nonmolecular characters with and without this taxon.

ONLINE DATA ARCHIVES: All of the new molecular sequences produced for this study have been deposited in GenBank with accession numbers FJ159278–FJ159314 and FJ159316–FJ159370 (for a complete list of GenBank accession numbers of all analyzed sequences, old and new, see table 9). All of our datasets, selected ML and Bayesian analyses, and associated trees have been deposited on TreeBase (http://www.treebase.org) with accession numbers S2164, M4107, and M4108.
Our nonmolecular data matrix has also been deposited on MorphoBank (http://morphobank.geongrid.org) with accession number X600.

COMPARATIVE MORPHOLOGY

The literature on didelphid comparative morphology is widely scattered, and no adequate review of this topic has yet been published. Although the following accounts are far from comprehensive, they include most of the anatomical features that have been surveyed widely among extant genera and that provide relevant taxonomic infor-

