21.01.2013 Views

note - FIZ Karlsruhe


Introduction to similarity searching

Introduction

Similarity Searching

Similarity searching is an advanced form of sequence searching, and several established algorithms¹ are available for use in sequence databases. A typical approach is to take a given query sequence and compare it algorithmically with every sequence in a database using a detailed similarity scoring matrix. Each database record is assigned a score relative to the query, and those answers that exceed a defined minimum similarity threshold are delivered to the searcher, ranked by a similarity or identity score.
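
The score, threshold and rank steps described above can be sketched in a few lines of Python. This is an illustration only: it uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real similarity scoring matrix, and the record identifiers, sequences and threshold are hypothetical. It is not the scoring used by BLAST or FASTA.

```python
from difflib import SequenceMatcher


def similarity(query: str, subject: str) -> float:
    """Stand-in similarity score in percent (0-100).

    Assumption: difflib's ratio replaces a real scoring matrix here.
    """
    matcher = SequenceMatcher(None, query, subject, autojunk=False)
    return 100.0 * matcher.ratio()


def rank_by_similarity(query: str, database: dict, threshold: float) -> list:
    """Score every record, keep those at or above the threshold, rank them."""
    scored = [(rec_id, similarity(query, seq)) for rec_id, seq in database.items()]
    hits = [(rec_id, score) for rec_id, score in scored if score >= threshold]
    # Highest score first, as in a ranked answer set.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical database records (illustrative sequences only).
database = {
    "REC1": "MKTAYIAKQR",
    "REC2": "MKTAYIAKQL",
    "REC3": "GGGGGGGGGG",
}

hits = rank_by_similarity("MKTAYIAKQR", database, threshold=50.0)
for rec_id, score in hits:
    print(f"{rec_id}\t{score:.1f}")
```

With these example records, the identical sequence ranks first, the sequence with one differing residue ranks second, and the unrelated sequence falls below the threshold and is not reported.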

Sequences that are identical to a similarity search query receive the highest score, i.e. 100% similarity. The key benefit of similarity searching, however, is that it also delivers answers at less than 100% similarity, which are often similar enough to be of interest to the searcher. These include answers with insertions, deletions and non-matching regions. It is often impossible to achieve this result via the SCM approach.
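
To illustrate how a sequence containing a deletion can still score below, but close to, 100%, here is a minimal sketch of a textbook global alignment (Needleman-Wunsch) followed by a percent-identity calculation. The match, mismatch and gap values are assumptions chosen for simplicity, and the sequences are made up. RUN BLAST and RUN GETSIM rely on different, heuristic algorithms, so this shows the principle rather than their implementation.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns both sequences with gaps."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Hypothetical query and a database sequence with one base deleted.
query_aln, subject_aln = global_align("ACGTACGT", "ACGTCGT")
print(query_aln)
print(subject_aln)
print(percent_identity(query_aln, subject_aln))
```

Here the subject sequence lacks one base of the query, so the alignment contains a single gap and the identity is 87.5% (7 identical columns out of 8) instead of 100%.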

There are two basic options for similarity searching in DGENE:

• RUN BLAST – software using the NCBI BLAST algorithm

• RUN GETSIM – software based upon the FASTA algorithm

NCBI BLAST

The RUN BLAST function makes use of the industry-standard BLAST methodology and is used in DGENE with the permission of the National Center for Biotechnology Information (NCBI). The Basic Local Alignment Search Tool (BLAST) in its gapped form was described by Altschul et al.² in 1997.

BLAST search modes

• Protein similarity (BLASTP) (/SQP) [default]

• Nucleic acid similarity (BLASTN) (/SQN)

• Translated protein similarity³ (TBLASTN) (/TSQN)

RUN BLAST offers both offline BATCH (page 79) and ALERT (page 83) options.
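
The translated protein similarity mode listed above compares a protein query with nucleotide records that are translated into protein. As a rough sketch of what that translation involves, the following Python code translates a DNA sequence in all six reading frames (three on each strand) using the standard genetic code. The function names and the example sequence are hypothetical, and this is not the TBLASTN implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict:
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical nucleotide record.
frames = six_frame_translations("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

A protein query would then be compared with each of the six translated sequences in turn, using the same kind of scoring as a protein-against-protein search.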

¹ See for example: http://www.ebi.ac.uk/Tools/similarity.html

² Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402. See: http://www.ncbi.nlm.nih.gov/pubmed/9254694.

³ A protein query sequence searched against a nucleotide database translated in all six reading frames (three on each strand).

Page 12 | GENESEQ on STN (DGENE) Workshop Manual
