Molecular Systematics of Nematodes - Russian Journal of Nematology

Molecular Systematics of Nematodes

Databases and BLAST 

Sergei A. Subbotin 

Department of Nematology, University of California, Riverside, USA, 

Gent University, Gent, Belgium 

Center of Parasitology, Russian Academy of Sciences, Moscow, Russia

Databases

One of the most important tasks in molecular biology is comparing the sequences you have obtained yourself with all known sequences collected in a database. This procedure is called a homology search. Numerous genetic databases are maintained around the world. Probably the largest nucleic acid databases are:

http://www.embl-heidelberg.de/ 

http://www.ncbi.nlm.nih.gov/ 

http://www.nig.ac.jp/

GenBank


NCBI Resources: NCBI (the National Center for Biotechnology Information) is a resource for molecular biology information. NCBI creates and maintains public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information.

The NCBI site is constantly being updated, and some of the changes include new databases and tools for data mining.

NCBI offers several searchable literature, molecular, and genomic databases and many bioinformatic tools. An up-to-date list of databases and tools can be found on the NCBI Sitemap.

Location: www.ncbi.nlm.nih.gov

GenBank 

From 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months.
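As a rough illustration of what an 18-month doubling time implies, the sketch below computes the fold increase over a given number of years. The function name and the choice of 2006 as the end year are assumptions made for this example only; it is simple arithmetic, not a model fitted to GenBank statistics.

```python
def growth_factor(years: float, doubling_time_years: float = 1.5) -> float:
    """Fold increase after `years` if the size doubles every `doubling_time_years`."""
    return 2.0 ** (years / doubling_time_years)


# 1982 to 2006 is 24 years, i.e. 16 doublings of 18 months each.
print(growth_factor(2006 - 1982))  # 65536.0
```

Under this assumption the database would be about 65,000 times larger after 24 years.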

GenBank 

NCBI Sitemap

GenBank

Entrez 

• Entrez: Entrez is a retrieval system designed for searching the major linked databases of the NCBI, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, and others. Entrez categories can be searched by subject, author, or unique identifiers such as accession numbers, and queries may use phrases, truncated terms, and combined sets. There is also a simple Entrez tutorial.

GenBank 

PubMed: Allows searching by author name and journal title, and offers a new Preview/Index option. The PubMed database provides access to over 12 million MEDLINE citations dating back to the mid-1960s. It includes History and Clipboard options, which may enhance your search session. NCBI provides a simple PubMed tutorial.

PubMed 

Search: Perry RN
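The same author search can also be expressed as a URL for the NCBI E-utilities `esearch` service instead of the web form. The sketch below only builds the URL and does not contact NCBI. The helper name `esearch_url` and the `retmax` default are assumptions for illustration; `db`, `term`, and `retmax` are standard `esearch` parameters, and `[Author]` is the PubMed field tag for author names.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities "esearch" endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch query URL for an Entrez database."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH}?{query}"


url = esearch_url("pubmed", "Perry RN[Author]")
print(url)
```

Opening the printed URL in a browser should return an XML list of matching PubMed identifiers.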

GenBank 

Nucleotide Database: The nucleotide database contains sequence data from GenBank, EMBL, and DDBJ, the members of the tripartite international collaboration of sequence databases. Nucleotide allows the user to retrieve nucleotide sequences in both GenBank and FASTA formats.

The Entrez Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, and PDB. The number of bases in these databases continues to grow at an exponential rate. As of April 2006, there are over 130 billion bases in GenBank and RefSeq alone.
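FASTA is the simpler of the two retrieval formats mentioned above: each record is a header line starting with `>` followed by one or more lines of sequence. A minimal reader might look like the sketch below. The function name and the sample sequences are made up for illustration and are not real database entries.

```python
def parse_fasta(text: str) -> dict:
    """Return {identifier: sequence} from FASTA-formatted text."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("FASTA header without an identifier")
            # The identifier is the first word of the header line.
            name = header.split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


SAMPLE = """>seq1 made-up example sequence
ACGTA
CGTAC
>seq2 another made-up sequence
ACGTTCGTACGG
"""

records = parse_fasta(SAMPLE)
print(records["seq1"])  # ACGTACGTAC
```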

GenBank 

Taxonomy Database: The taxonomy database contains the names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence. You can search for nucleotide, protein, and structure data from specific taxonomic groupings, from the domain level (archaea, bacteria, eukaryota) down to the species level.

GenBank 

Taxonomy Browser is…

• a browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)

• taxonomy information such as genetic codes

• molecular data on extinct organisms

http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/

Taxonomy Browser

BLAST (Sequence Similarity Search)

BLAST: BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. For a better understanding of BLAST you can refer to the BLAST Course, which explains the basics of the BLAST algorithm, or to the NCBI BLAST tutorial.

http://www.ncbi.nlm.nih.gov/BLAST/


Sequence alignments provide a powerful way to compare novel sequences with previously characterized genes. Both functional and evolutionary information can be inferred from well-designed queries and alignments. BLAST 2.0 (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases.

BLAST: the most popular data-mining tool ever!

For non-coding DNA, use blastn.

Never forget that blastn is only for closely related DNA sequences (more than 70 percent identical).
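To make the 70 percent guideline concrete, the sketch below computes percent identity for two sequences that have already been aligned to the same length, with `-` marking gaps. Counting only columns where neither sequence has a gap is an assumption of this example; other definitions use the full alignment length as the denominator. The sequences are made up.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over gap-free columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity)        # 90.0
print(identity > 70)   # True
```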


1. Point your browser to the NCBI BLAST server at: http://www.ncbi.nlm.nih.gov/BLAST

2. Under the Nucleotide heading, click the Nucleotide-Nucleotide (blastn) link

3. Paste your sequence in the search window

4. Click the Blast! button

5. Click the Format button (and wait)
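The submit-then-format sequence above has a scripted counterpart in the NCBI BLAST URL API, where a search is first submitted (`CMD=Put`) and the results are later fetched with the request ID that the server returns (`CMD=Get`). The sketch below only assembles the two requests and sends nothing. The helper names, the `nt` database choice, and the request ID are assumptions for illustration; consult the current NCBI documentation and usage guidelines before sending real requests.

```python
from urllib.parse import urlencode

# Endpoint used by the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_request(query: str, program: str = "blastn", database: str = "nt") -> str:
    """Form-encoded body for submitting a search (steps 2 to 4 above)."""
    return urlencode(
        {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    )


def get_request_url(rid: str, format_type: str = "Text") -> str:
    """URL for retrieving formatted results (step 5 above) for a request ID."""
    return BLAST_URL + "?" + urlencode(
        {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    )


body = put_request("ACGTACGTACGTACGTACGT")   # made-up query sequence
results_url = get_request_url("EXAMPLE_RID")  # hypothetical request ID
print(body)
print(results_url)
```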

An overview of the BLAST output

1. A graphic display: Shows you where your query is similar to other sequences

2. A hit list: The names of sequences similar to your query, ranked by similarity

3. The alignments: Every alignment between your query and the reported hits

4. The parameters: A list of the various parameters used for the search


The graphic display

• Your query sequence is at the top

• Each bar represents the portion of another sequence similar to your query sequence

• Red bars indicate the most similar sequences, pink bars indicate matches that are somewhat weaker, and green bars indicate matches that are not impressive at all. Blue and black bars (none in this example) indicate poor hits
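The color ranking described above can be written as a small lookup. The numeric thresholds below are an assumption: they follow the color key commonly shown on the NCBI graphic display (alignment scores below 40, 40 to 50, 50 to 80, 80 to 200, and 200 or more) and are not stated in these slides.

```python
def score_color(score: float) -> str:
    """Map an alignment score to the bar color used in the graphic display.

    Thresholds are assumed from the commonly shown NCBI color key.
    """
    if score >= 200:
        return "red"    # most similar sequences
    if score >= 80:
        return "pink"   # somewhat weaker matches
    if score >= 50:
        return "green"  # unimpressive matches
    if score >= 40:
        return "blue"   # poor hits
    return "black"      # poor hits


for s in (250, 120, 60, 45, 30):
    print(s, score_color(s))
```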

The hit list 

Each line contains four important features: 

• The sequence accession number and name: this hyperlink takes you to the database entry that contains this sequence

• Description

• The bit score: a measure of the statistical significance of the alignment. The higher the bit score, the more similar the two sequences. Matches below 50 bits are very unreliable

• The E-value (the expectation value): an estimate of the number of times you could expect such a good match purely by chance. The lower the E-value, the less likely it is that the match arose by chance. If the E-value is less than 1 × 10^-50, the hit is very similar to the query sequence and is very likely to be evolutionarily related.
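The bit score and the E-value are linked by the standard relation E = m × n × 2^(-S'), where S' is the bit score and m and n are the query and database lengths. The sketch below applies it directly. Real BLAST uses length-corrected "effective" values for m and n, so treat the numbers as order-of-magnitude estimates; the function name and the example lengths are assumptions for illustration.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score` bits."""
    return query_len * db_len * 2.0 ** (-bit_score)


# A 1000-base query searched against a database of one billion bases.
weak = expect_value(50, 1000, 10**9)
strong = expect_value(200, 1000, 10**9)
print(f"{weak:.1e}")    # 8.9e-04
print(f"{strong:.1e}")  # 6.2e-49
```

Each additional bit halves the E-value, which is why high bit scores correspond to vanishingly small expectation values.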

GenBank 

1. Locus gives us the locus name

2. Definition provides a short definition of the gene

3. Accession lists the accession number, a unique identifier within and across various databases

4. Source gives the common name of the organism to which the sequence belongs

5. Organism gives a more complete identification of the organism, complete with its technical (!!!) taxonomic classification

6. Reference introduces a section where the credits for the sequence determination are given

7. Features describe precisely the gene regions and the associated biological properties
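The fields listed above appear as keywords at the start of lines in a GenBank flat-file record, with continuation lines indented beneath them. A minimal reader for the header part might look like the sketch below. The sample record, including its accession number and abbreviated lineage, is made up for illustration and is not a real entry; the function name is also an assumption.

```python
SAMPLE = """\
LOCUS       XX000000                 20 bp    DNA     linear   INV 01-JAN-2000
DEFINITION  Hypothetical nematode sequence, made up for this example.
ACCESSION   XX000000
SOURCE      Caenorhabditis elegans
  ORGANISM  Caenorhabditis elegans
            Eukaryota; Metazoa; Ecdysozoa; Nematoda.
"""


def parse_header(text: str) -> dict:
    """Collect keyword fields from the header of a GenBank flat-file record."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(" " * 10):
            # Deeply indented line: continuation of the previous field.
            if key is not None:
                fields[key] += " " + line.strip()
        else:
            parts = line.split(None, 1)
            key = parts[0]
            fields[key] = parts[1].strip() if len(parts) > 1 else ""
    return fields


header = parse_header(SAMPLE)
print(header["ACCESSION"])  # XX000000
print(header["ORGANISM"])
```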

Select FASTA in the Display window!

GenBank 

Select the sequence, then copy and paste it into a new text file. Create a file with several sequences, including a sequence from an outgroup taxon.
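Assembling such a file by hand is error-prone once there are more than a few sequences, so the step can also be scripted. The sketch below joins ingroup and outgroup records into one FASTA-formatted text. All names and sequences are made up, and the 60-character line width is a common convention rather than a requirement.

```python
def to_fasta(records, width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


ingroup = [("ingroup_1", "ACGTACGTAC"), ("ingroup_2", "ACGTTCGTAC")]
outgroup = [("outgroup_1", "ACGAACGTTC")]

fasta_text = to_fasta(ingroup + outgroup)
print(fasta_text)
```

The resulting text can be saved to a file and passed to an alignment program.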

TreeBASE

TreeBASE is a relational database designed to manage and explore information on phylogenetic relationships. Its main function is to store published phylogenetic trees and data matrices. It also includes bibliographic information on phylogenetic studies, and some details on taxa, characters, algorithms used, and analyses performed. The database is designed to allow retrieval and recombination of trees and data from different studies, and it can be explored interactively using trees included in the database. TreeBASE therefore provides a means of assessing and synthesizing phylogenetic knowledge.

http://www.treebase.org/treebase/

Useful databases for nematologists 

WormBase (http://www.wormbase.org) is the central data repository for information about Caenorhabditis elegans and related nematodes. As a model organism database, WormBase extends beyond the genomic sequence, integrating experimental results with an extensively annotated view of the genome. WormBase also provides a large array of research and analysis tools.

NemaGene (http://www.nematode.net) is a web-accessible resource for investigating gene sequences from nematode genomes. The database is an outgrowth of the parasitic nematode EST project. ESTs (Expressed Sequence Tags) are usually shorter than the full-length mRNAs from which they are derived and are prone to sequencing errors. The database provides EST cluster consensus sequences, enhanced online BLAST search tools, and functional classification of cluster sequences.


NEMBASE (http://www.nematodes.org) is a database providing access to the sequence and associated meta-data currently being generated as part of the parasitic nematode EST project. Users may query the database on the basis of BLAST annotation, sequence similarity, or expression profiles. NEMBASE also features an interactive tool which allows the simultaneous display and analysis of the relative similarity relationships of groups of sequences to other databases.

NemAToL (http://nematol.unh.edu) is an open database dedicated to collecting, archiving, and organizing video images, other morphological information, DNA sequences, alignments, and other reference materials for the study of the phylogeny, diversity, taxonomy, systematics, and ecology of nematodes.
