21.01.2013 Views

note - FIZ Karlsruhe

RUN BLAST<br />

Advanced similarity searching<br />

The RUN BLAST function makes use of the industry standard BLAST methodology, and is used<br />

in DGENE with the permission of the National Center for Biotechnology Information (NCBI).<br />

The Basic Local Alignment Search Tool (BLAST), in its gapped form, was described by Altschul et al.¹ in 1997.<br />

BLAST is a sequence comparison algorithm², optimized for speed, that searches sequence<br />

databases for optimal local alignments to a query. The initial search is done for a word of length<br />

"W" that scores at least "T" when compared to the query using a substitution matrix. Word hits are<br />

then extended in either direction in an attempt to generate an alignment with a score exceeding the<br />

threshold of "S". The "T" parameter dictates the speed and sensitivity of the search.<br />
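The word search and extension described above can be sketched in a few lines of Python. This is a toy illustration only, not the NCBI or DGENE implementation: the match/mismatch scores stand in for a real substitution matrix, the extension is ungapped, and the `x_drop` cut-off and all function names are assumptions made for the example.

```python
from itertools import product


def score_pair(a, b, match=5, mismatch=-4):
    """Toy substitution score (assumption): a real search uses a matrix such as BLOSUM62."""
    return match if a == b else mismatch


def neighborhood_words(query, w, t, alphabet):
    """Map every word of length w scoring at least t against a query word to its query positions."""
    words = {}
    for qi in range(len(query) - w + 1):
        q_word = query[qi:qi + w]
        for cand in product(alphabet, repeat=w):
            if sum(score_pair(a, b) for a, b in zip(q_word, cand)) >= t:
                words.setdefault("".join(cand), []).append(qi)
    return words


def extend_hit(query, subject, qi, si, w, x_drop=10):
    """Extend a word hit in both directions without gaps.

    Each direction stops once the running score falls more than x_drop
    below the best score seen (x_drop is an assumption of this sketch).
    Returns (score, query_start, query_end), with query_end exclusive.
    """
    seed = sum(score_pair(query[qi + k], subject[si + k]) for k in range(w))

    def walk(q, s_pos, step):
        best = run = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s_pos < len(subject):
            run += score_pair(query[q], subject[s_pos])
            length += 1
            if run > best:
                best, best_len = run, length
            elif best - run > x_drop:
                break
            q += step
            s_pos += step
        return best, best_len

    right_gain, right_len = walk(qi + w, si + w, +1)
    left_gain, left_len = walk(qi - 1, si - 1, -1)
    return seed + left_gain + right_gain, qi - left_len, qi + w + right_len


def search(query, subject, w=3, t=15, s=20, alphabet="ACGT"):
    """Return (score, query_start, query_end, subject_start) for alignments scoring above s."""
    words = neighborhood_words(query, w, t, alphabet)
    found = set()
    for si in range(len(subject) - w + 1):
        for qi in words.get(subject[si:si + w], ()):
            score, q_start, q_end = extend_hit(query, subject, qi, si, w)
            if score > s:
                found.add((score, q_start, q_end, si - (qi - q_start)))
    return sorted(found, reverse=True)


if __name__ == "__main__":
    for hit in search("ACGTACGT", "TTACGTACGTTT"):
        print(hit)
```

In this sketch, lowering `t` enlarges the set of neighborhood words, so more word hits are found and extended: the search becomes more sensitive but slower, which is the trade-off the "T" parameter controls.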

BLAST search modes<br />

• Protein similarity (BLASTP) (/SQP) [default]<br />

• Translated protein similarity³ (TBLASTN) (/TSQN)<br />

• Nucleic acid similarity (BLASTN) (/SQN)<br />

1. Single strand (/SQN SIN)<br />

2. Complementary strand (/SQN COM)<br />

3. Both strands (/SQN BOTH) [default]<br />
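As an illustration of the three strand options, the following Python sketch lists the query strands each option would cover. It assumes that the complementary strand is the reverse complement of the query as entered; how DGENE handles strands internally is not stated here, and the function names are hypothetical.

```python
# Base pairing used to build the complementary strand (plain DNA letters only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary strand, read in the conventional 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def query_strands(seq, mode="BOTH"):
    """List the strands covered by a strand option (SIN, COM or BOTH).

    Assumption: COM means the reverse complement of the query as entered.
    """
    mode = mode.upper()
    if mode == "SIN":
        return [seq]
    if mode == "COM":
        return [reverse_complement(seq)]
    if mode == "BOTH":
        return [seq, reverse_complement(seq)]
    raise ValueError(f"unknown strand option: {mode!r}")


if __name__ == "__main__":
    print(query_strands("ATGC", "BOTH"))
```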

RUN BLAST offers both offline BATCH (page 79) and ALERT (page 83) search options. For the<br />

basic steps of a BLAST search, including how to gather, display and review results, see page 14.<br />

BLAST query limits<br />

BLAST accepts sequence queries up to 10,000 characters in length for all search modes. While<br />

the command line is limited to 256 characters, longer queries can be conveniently prepared offline<br />

and uploaded (see page 14). The uploaded query can be displayed and saved online for future<br />

use. BLAST reports at most the 10,000 best-scoring answers.<br />
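Before going online, a query can be checked against these limits. The Python helper below is a minimal sketch: the limits are the ones stated above, while the function name and the `command_overhead` parameter (the characters taken up by the command text itself, which depends on the command entered) are assumptions made for the example.

```python
MAX_QUERY_LENGTH = 10_000   # longest sequence query accepted, all search modes
MAX_COMMAND_LINE = 256      # command line limit in characters
MAX_ANSWERS = 10_000        # most best-scoring answers that can be reported


def plan_query(sequence, command_overhead=0):
    """Decide how a sequence query can be submitted.

    command_overhead is a hypothetical allowance for the command text that
    shares the command line with the sequence.
    Returns "command line" or "upload"; raises ValueError if the query is
    empty or longer than MAX_QUERY_LENGTH.
    """
    seq = "".join(sequence.split())  # drop spaces and line breaks
    if not seq:
        raise ValueError("empty query")
    if len(seq) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"query has {len(seq)} characters; the limit is {MAX_QUERY_LENGTH}"
        )
    if len(seq) + command_overhead <= MAX_COMMAND_LINE:
        return "command line"
    return "upload"


if __name__ == "__main__":
    print(plan_query("MKT" * 50))    # 150 characters
    print(plan_query("MKT" * 500))   # 1,500 characters
```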

<strong>note</strong><br />

RUN BLAST requires substantially fewer computational resources than RUN<br />

GETSIM, which is based on FASTA methodology. Searches conducted using<br />

RUN BLAST will therefore usually finish much sooner online than RUN<br />

GETSIM searches. However, BLAST is sometimes less sensitive than FASTA,<br />

and so may not always retrieve as comprehensive a set of results as a<br />

GETSIM (FASTA) search (see page 47).<br />

1 Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and<br />

David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs."<br />

Nucleic Acids Res. 25:3389-3402. See: http://www.ncbi.nlm.nih.gov/pubmed/9254694.<br />

2 See: http://www.ncbi.nlm.nih.gov/books/NBK21097/.<br />

3 A protein query sequence searched against a nucleotide database translated in all three reading frames.<br />

GENESEQ on STN (DGENE) Workshop Manual | Page 47
