21.02.2014 Views

Molecular Systematics of Nematodes - Russian Journal of Nematology



Molecular Systematics of Nematodes

Databases and BLAST

Sergei A. Subbotin

Department of Nematology, University of California, Riverside, USA

Gent University, Gent, Belgium

Center of Parasitology, Russian Academy of Sciences, Moscow, Russia


Databases

One of the most important tasks in molecular biology is comparing sequences you have obtained yourself with all known sequences collected in a database. This procedure is called a homology search. Numerous genetic databases are maintained around the world. Probably the largest nucleic acid databases are:

http://www.embl-heidelberg.de/

http://www.ncbi.nlm.nih.gov/

http://www.nig.ac.jp/


GenBank

NCBI Resources

NCBI (National Center for Biotechnology Information) is a resource for molecular biology information. NCBI creates and maintains public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information.

The NCBI site is updated constantly; changes include new databases and tools for data mining.

NCBI offers several searchable literature, molecular, and genomic databases and many bioinformatics tools. An up-to-date list of databases and tools can be found on the NCBI Sitemap.

Location: www.ncbi.nlm.nih.gov


GenBank

From 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months.
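As a rough illustration of what an 18-month doubling time implies, the arithmetic can be sketched in Python. The function name and the starting value are hypothetical and chosen only for this example; the model is idealised exponential growth, not GenBank's actual release statistics.

```python
def projected_bases(initial_bases: float, months_elapsed: float,
                    doubling_months: float = 18.0) -> float:
    """Idealised exponential growth: N(t) = N0 * 2 ** (t / T)."""
    return initial_bases * 2 ** (months_elapsed / doubling_months)


# Hypothetical starting point of 1,000,000 bases (not a real GenBank figure).
start = 1_000_000
for years in (3, 6, 9):
    print(years, "years:", round(projected_bases(start, years * 12)))
```

Under this assumption, every three years corresponds to two doublings, so the total grows fourfold over that period.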


GenBank

[Figure: NCBI Sitemap]


Entrez

• Entrez is a retrieval system for searching several linked NCBI databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, and others.

• Entrez categories can be searched by subject, author, or unique identifiers such as accession numbers, as well as by phrases, truncated terms, and combined sets. There is also a simple Entrez tutorial.
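The search options described here (field-restricted terms, truncated terms, combined sets) can be sketched as query strings. The example below assumes the NCBI E-utilities `esearch` endpoint with its `db`, `term`, and `retmax` parameters, none of which appear on the slide; the helper names are hypothetical, and the author name is the one used in the PubMed search example in these slides. The code only builds the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Address of the NCBI E-utilities "esearch" endpoint (an assumption; not on the slide).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(text: str, field: str) -> str:
    """Attach an Entrez field tag, e.g. 'Perry RN' + 'Author' -> 'Perry RN[Author]'."""
    return f"{text}[{field}]"


def combine(*terms: str, operator: str = "AND") -> str:
    """Join several terms into one combined query with a Boolean operator."""
    return f" {operator} ".join(f"({t})" for t in terms)


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch request URL."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Author search combined with a truncated term ('nematod*' matches nematode, nematodes, ...).
query = combine(field_term("Perry RN", "Author"), "nematod*")
print(query)
print(esearch_url("pubmed", query))
```

Keeping query construction separate from the network call makes it easy to inspect the exact term before submitting it, whether through the web form or a script.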


GenBank

PubMed: Allows searching by author name and journal title, and offers a new Preview/Index option. The PubMed database provides access to over 12 million MEDLINE citations dating back to the mid-1960s. It includes History and Clipboard options, which may enhance your search session. NCBI provides a simple PubMed tutorial.


PubMed

Search: Perry RN


GenBank

Nucleotide Database: The nucleotide database contains sequence data from GenBank, EMBL, and DDBJ, the members of the tripartite international collaboration of sequence databases. Nucleotide allows the user to retrieve nucleotide sequences in both GenBank and FASTA formats.

The Entrez Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, and PDB. The number of bases in these databases continues to grow at an exponential rate. As of April 2006, there were over 130 billion bases in GenBank and RefSeq alone.


GenBank

Taxonomy Database: The taxonomy database contains the names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence. You can search for nucleotide, protein, and structure data from specific taxonomic groupings, from the domain level (archaea, bacteria, eukaryota) down to the species level.


GenBank

Taxonomy Browser is…

• browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)

• taxonomy information such as genetic codes

• molecular data on extinct organisms

http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/


Taxonomy Browser


BLAST (Sequence Similarity Search)

BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases, regardless of whether the query is protein or DNA. For a better understanding of BLAST, refer to the BLAST Course, which explains the basics of the BLAST algorithm, or to the NCBI BLAST tutorial.

http://www.ncbi.nlm.nih.gov/BLAST/


BLAST (Sequence Similarity Search)

Sequence alignments provide a powerful way to compare novel sequences with previously characterized genes. Both functional and evolutionary information can be inferred from well-designed queries and alignments. BLAST 2.0 (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases.

BLAST: the most popular data-mining tool ever!

For non-coding DNA, use blastn. Never forget that blastn is suitable only for closely related DNA sequences (more than 70 percent identical).
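To make the 70 percent rule of thumb concrete, percent identity can be computed from two sequences that have already been aligned. The sketch below is a minimal illustration, not part of BLAST itself: the function name, the gap-handling convention, and the two sequence fragments are all made up for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Columns where both sequences have a gap are ignored; a gap opposite a base
    counts as a mismatch (one possible convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Two made-up, already aligned fragments (illustrative only).
seq1 = "ACGTTGCA-ACGT"
seq2 = "ACGTAGCATACGA"
identity = percent_identity(seq1, seq2)
print(f"{identity:.1f}% identical")
print("within the blastn rule of thumb" if identity > 70 else "below the blastn rule of thumb")
```

Different programs count gapped columns differently, so identity values from this sketch may not match those reported in a BLAST alignment exactly.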


BLAST (Sequence Similarity Search)

1. Point your browser to the NCBI BLAST server at: http://www.ncbi.nlm.nih.gov/BLAST

2. Under the Nucleotide heading, click the Nucleotide-Nucleotide (blastn) link

3. Paste your sequence into the search window

4. Click the Blast! button

5. Click the Format button (and wait)

An overview of the BLAST output

1. A graphic display: Shows you where your query is similar to other sequences

2. A hit list: The names of sequences similar to your query, ranked by similarity

3. The alignments: Every alignment between your query and the reported hits

4. The parameters: A list of the various parameters used for the search


BLAST (Sequence Similarity Search)

The graphic display

• Your query sequence is at the top

• Each bar represents the portion of another sequence that is similar to your query sequence

• Red bars indicate the most similar sequences, pink bars indicate matches that are somewhat weaker, and green bars indicate matches that are not impressive at all. Blue and black hits (not shown here) are bad hits
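The color coding of the bars can be sketched as a simple lookup on the alignment score. The score boundaries used below (40, 50, 80, 200) are the ones conventionally shown in the NCBI color key; they are not stated on the slide, so treat them as an assumption, and the function name is hypothetical.

```python
def bar_color(alignment_score: float) -> str:
    """Map an alignment score to the bar color in the graphic display.

    Boundaries follow the conventional NCBI color key (assumed, not from the slide).
    """
    if alignment_score >= 200:
        return "red"      # most similar sequences
    if alignment_score >= 80:
        return "pink"     # somewhat weaker matches
    if alignment_score >= 50:
        return "green"    # unimpressive matches
    if alignment_score >= 40:
        return "blue"     # bad hits
    return "black"        # bad hits


for score in (250, 120, 60, 45, 20):
    print(score, bar_color(score))
```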

The hit list

Each line contains four important features:

• The sequence accession number and name: this hyperlink takes you to the database entry that contains this sequence

• Description

• The bit score: a measure of the statistical significance of the alignment. The higher the bit score, the more similar the two sequences. Matches below 50 bits are very unreliable

• The E-value (expectation value): an estimate of the number of times you could expect such a good match purely by chance. The lower the E-value, the more significant the match and, in general, the more similar the sequences. If the E-value is less than 1 × 10⁻⁵⁰, the hit is very similar to the query sequence and is very likely to be evolutionarily related.
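The link between bit score and E-value can be sketched with the standard Karlin-Altschul relation E = m · n · 2^(−S′), where S′ is the bit score and m and n are the query and database lengths. This formula is not given on the slide, and real BLAST searches use effective (corrected) lengths, so the numbers below are only approximate. The function names, the search sizes, and the "needs closer inspection" label are hypothetical; the two thresholds are the rules of thumb quoted above.

```python
def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """Karlin-Altschul relation: E = m * n * 2 ** (-S'), with S' the bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def describe_hit(bit_score: float, e_value: float) -> str:
    """Apply the two rules of thumb from the hit-list description."""
    if bit_score < 50:
        return "very unreliable"
    if e_value < 1e-50:
        return "very similar, very likely related"
    return "needs closer inspection"  # hypothetical label for everything in between


# Hypothetical search: a 500-base query against a database of 1e9 bases.
m, n = 500, 10**9
for score in (40, 100, 250):
    e = expected_hits(score, m, n)
    print(f"bit score {score}: E = {e:.2e} -> {describe_hit(score, e)}")
```

Because E scales with the size of the search space, the same bit score gives a larger E-value in a larger database, which is one reason to read both columns of the hit list together.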


GenBank

1. Locus gives the locus name

2. Definition provides a short definition of the gene

3. Accession lists the accession number, a unique identifier within and across various databases

4. Source gives the common name of the organism to which the sequence belongs

5. Organism gives a more complete identification of the organism, complete with its technical (!!!) taxonomic classification

6. Reference introduces a section where the credits for the sequence determination are given

7. Features describes precisely the gene regions and the associated biological properties

Select FASTA in the Display window!
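The header fields listed above can be pulled out of a flat-file record with a few lines of Python. This is a minimal sketch assuming the usual GenBank flat-file layout, in which the keyword occupies the first 12 columns and continuation lines begin with spaces; the function name is hypothetical, and the record below is made up for illustration (the accession XX000000 is a placeholder, not a real entry).

```python
def parse_header(record_text: str) -> dict:
    """Collect header fields (LOCUS, DEFINITION, ...) of a GenBank flat-file record."""
    fields = {}
    current = None
    for line in record_text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break  # the header ends where the feature table or sequence begins
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            current = keyword
            fields[current] = value
        elif current:
            fields[current] += " " + value  # continuation of the previous field
    return fields


# Made-up record; ljust(12) pads each keyword to the 12-column keyword area.
SAMPLE = "\n".join([
    "LOCUS".ljust(12) + "XX000000  600 bp  DNA  linear  INV 01-JAN-2000",
    "DEFINITION".ljust(12) + "Hypothetical nematode 18S ribosomal RNA gene, partial",
    "".ljust(12) + "sequence.",
    "ACCESSION".ljust(12) + "XX000000",
    "SOURCE".ljust(12) + "Caenorhabditis elegans",
    "  ORGANISM".ljust(12) + "Caenorhabditis elegans",
    "".ljust(12) + "Eukaryota; Metazoa; Ecdysozoa; Nematoda.",
    "FEATURES".ljust(21) + "Location/Qualifiers",
])

header = parse_header(SAMPLE)
for key in ("LOCUS", "DEFINITION", "ACCESSION", "SOURCE", "ORGANISM"):
    print(f"{key}: {header[key]}")
```

For real work, an established parser is a safer choice than this sketch, which ignores the reference block structure and the feature table.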


GenBank

Select the sequence, then copy and paste it into a new text file. Create a file with several sequences, including a sequence from an outgroup taxon.
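Instead of pasting by hand, the combined file can also be assembled with a short script. The sketch below writes several sequences, including an outgroup, in FASTA format; the taxon names, the sequences, and the output filename `sequences.fas` are all made up for illustration.

```python
from pathlib import Path


def to_fasta(records: dict, line_width: int = 60) -> str:
    """Format {name: sequence} pairs as FASTA text, wrapping sequence lines."""
    entries = []
    for name, sequence in records.items():
        seq = "".join(sequence.split()).upper()  # drop spaces and line breaks from pasted text
        wrapped = [seq[i:i + line_width] for i in range(0, len(seq), line_width)]
        entries.append(">" + name + "\n" + "\n".join(wrapped))
    return "\n".join(entries) + "\n"


# Made-up names and sequences; the last entry plays the role of the outgroup.
records = {
    "ingroup_species_A": "acgtacgtac gtacgt",
    "ingroup_species_B": "ACGTTCGTACGAACGT",
    "outgroup_species": "ACGATCGTTCGAACGA",
}
fasta_text = to_fasta(records)
Path("sequences.fas").write_text(fasta_text)  # hypothetical output filename
print(fasta_text)
```

Short, unique names without spaces in the header lines tend to cause the fewest problems in downstream alignment and tree-building programs.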


TreeBASE

TreeBASE is a relational database designed to manage and explore information on phylogenetic relationships. Its main function is to store published phylogenetic trees and data matrices. It also includes bibliographic information on phylogenetic studies, and some details on taxa, characters, algorithms used, and analyses performed. The database is designed to allow retrieval and recombination of trees and data from different studies, and it can be explored interactively using trees included in the database. TreeBASE therefore provides a means of assessing and synthesizing phylogenetic knowledge.

http://www.treebase.org/treebase/


Useful databases for nematologists

WormBase (http://www.wormbase.org) is the central data repository for information about Caenorhabditis elegans and related nematodes. As a model organism database, WormBase extends beyond the genomic sequence, integrating experimental results with an extensively annotated view of the genome. WormBase also provides a large array of research and analysis tools.

NemaGene (http://www.nematode.net) is a web-accessible resource for investigating gene sequences from nematode genomes. The database is an outgrowth of the parasitic nematode EST project. ESTs (expressed sequence tags) are usually shorter than the full-length mRNAs from which they are derived and are prone to sequencing errors. The database provides EST cluster consensus sequences, enhanced online BLAST search tools, and functional classification of cluster sequences.


Useful databases for nematologists

NEMBASE (http://www.nematodes.org) is a database providing access to the sequence and associated metadata currently being generated as part of the parasitic nematode EST project. Users may query the database on the basis of BLAST annotation, sequence similarity, or expression profiles. NEMBASE also features an interactive tool that allows the simultaneous display and analysis of the relative similarity relationships of groups of sequences to other databases.

NemAToL (http://nematol.unh.edu) is an open database dedicated to collecting, archiving, and organizing video images, other morphological information, DNA sequences, alignments, and other reference materials for the study of the phylogeny, diversity, taxonomy, systematics, and ecology of nematodes.
