note - FIZ Karlsruhe
Introduction to similarity searching
Introduction
Similarity Searching
Similarity searching is an advanced form of sequence searching, and several established algorithms¹ are available for use with sequence databases. A typical approach is to take a query sequence and compare it algorithmically with every sequence in the database, using a similarity scoring matrix. Each database record is assigned a score relative to the query, and the records that exceed a defined minimum similarity threshold are delivered to the searcher, ranked by similarity or identity score.
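The workflow just described (score every record against the query, keep those above a threshold, rank the rest) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DGENE, BLAST or FASTA implementation: BLAST and FASTA are fast heuristics, whereas this sketch uses an exhaustive Smith-Waterman local alignment, and it replaces a real substitution matrix (such as BLOSUM62) with a toy scheme of match +2, mismatch -1, gap -2. The function names, the `min_score` parameter and the three-record database are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy scoring scheme (assumption): a real search would use a substitution
# matrix such as BLOSUM62 and affine gap penalties.
MATCH, MISMATCH, GAP = 2, -1, -2


def local_score(query: str, subject: str) -> int:
    """Return the best Smith-Waterman local alignment score (linear gaps)."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0] * (len(subject) + 1)
        for j, s in enumerate(subject, start=1):
            diagonal = prev[j - 1] + (MATCH if q == s else MISMATCH)
            # Cells never drop below zero, which is what makes the alignment local.
            curr[j] = max(0, diagonal, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity_search(query: str, database: Dict[str, str], min_score: int) -> List[Tuple[str, int]]:
    """Score every record, keep those above min_score, rank best first."""
    hits = [(rec_id, local_score(query, seq)) for rec_id, seq in database.items()]
    hits = [hit for hit in hits if hit[1] > min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical three-record database.
database = {"A1": "ACGTACGT", "A2": "ACGAACGT", "A3": "TTTTTTTT"}
print(similarity_search("ACGTACGT", database, min_score=10))
# [('A1', 16), ('A2', 13)]
```

The identical record A1 scores highest, the single-mismatch record A2 still clears the threshold, and A3 (best local score 2) is dropped.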
Sequences identical to the query have the highest possible score, i.e. 100% similarity. The key benefit of similarity searching, however, is that it also retrieves answers below 100% similarity, many of which are still similar enough to interest the searcher. These include answers with insertions, deletions and non-matching regions, which are often impossible to retrieve with the SCM approach.
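To make "less than 100% similarity" concrete, the minimal sketch below computes percent identity over a gapped pairwise alignment. It assumes the alignment has already been produced, uses `-` as the gap character, and divides identical positions by the full alignment length including gap columns. That is one common convention; other tools may normalize differently, and the function name is hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity of a pairwise alignment, with '-' marking gaps.

    Identical residues are divided by the full alignment length, so gap
    columns (insertions/deletions) and mismatches lower the percentage.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


print(percent_identity("ACGTACGT", "ACGTACGT"))  # 100.0
# One insertion (column 5) and one mismatch (column 9): 7 identical of 9 columns.
print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))  # 77.8
```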
There are two basic options for similarity searching in DGENE:
• RUN BLAST – software using the NCBI BLAST algorithm
• RUN GETSIM – software based upon the FASTA algorithm
NCBI BLAST
The RUN BLAST function uses the industry-standard BLAST methodology and is provided in DGENE with the permission of the National Center for Biotechnology Information (NCBI). The gapped version of the Basic Local Alignment Search Tool (BLAST) was described by Altschul et al.² in 1997.
BLAST search modes
• Protein similarity (BLASTP) (/SQP) [default]
• Nucleic acid similarity (BLASTN) (/SQN)
• Translated protein similarity³ (TBLASTN) (/TSQN)
RUN BLAST offers both offline BATCH (page 79) and ALERT (page 83) options.
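The translated protein similarity (TBLASTN) mode compares a protein query with nucleotide sequences after translating them into protein; NCBI's TBLASTN considers all six reading frames. The sketch below illustrates only that translation step, assuming the standard genetic code: it translates a DNA string in the three frames of the given strand and the three frames of its reverse complement. It is not NCBI or STN code; the function names and the `+1`…`-3` frame labels are hypothetical, and a real search would go on to align the protein query against each translated frame.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons; '*' is a stop, 'X' an unrecognised codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> Dict[str, str]:
    """Translate a DNA sequence in three forward and three reverse frames."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```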
1 See, for example: http://www.ebi.ac.uk/Tools/similarity.html
2 Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402. See: http://www.ncbi.nlm.nih.gov/pubmed/9254694.
3 A protein query sequence searched against a nucleotide database translated in all six reading frames (three on each strand).
Page 12 | GENESEQ on STN (DGENE) Workshop Manual