21.01.2013 Views

note - FIZ Karlsruhe


The uploaded sequence text file becomes a query L-number:

UPLOAD SUCCESSFULLY COMPLETED
L1 GENERATED

You may check if the sequence was uploaded correctly with D LQUE:

=> D LQUE

Introduction to similarity searching

L1 ANSWER 1 DGENE COPYRIGHT 2012 THOMSON REUTERS on STN

LQUE tccagtgtgtccgctactccgctccccctcagtcctcagttcctcacctagcggtnnnnggcncgcggagacg
tagatggcggcttcggaggcggccaggcggcgcaacaccggtgacgagaggggacggtgatcgcgatccacag
cctggaggagtggagcatccagatcgaggaggccaacagcgccaagaagctggtggtgattgacttcactgca
acatggtgtcctccntnccgcnccatggctccaatttttnctgatatggccaagaagtccc

Running GETSIM and BLAST

Once the query sequence is uploaded, the RUN GETSIM or RUN BLAST similarity search may be started. To initiate the desired search, the following field codes may be specified:

/SQP for searching peptide sequences [default]
/SQN for searching nucleotide sequences (Options: /SQN SIN, /SQN COM or /SQN BOTH)
/TSQN for searching peptide sequences against a nucleotide sequence database

For /SQP and /TSQN a peptide sequence is expected; for /SQN, a nucleotide sequence. If the content of the query sequence appears not to match the field code, a warning is issued. A GETSIM search is automatically adjusted to yield the most appropriate candidate answers for the search type, query length and current size of the database. In contrast, the various user-defined BLAST options provided by NCBI should be appended to the RUN BLAST command as required. The syntax for including BLAST options is shown by example below.
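The mismatch warning mentioned above can be approximated locally before a query is uploaded. The following Python sketch is not STN's own check, whose rules are not given here. It simply assumes that a query made up almost entirely of the letters a, c, g, t, u and n is a nucleotide sequence. The function names and the 0.9 threshold are illustrative assumptions.

```python
from typing import Optional

# Assumption: these letters are treated as nucleotide codes (n = unknown base).
NUCLEOTIDE_CODES = set("acgtun")

# Field codes from the text and the query type each one expects.
EXPECTED_TYPE = {"/SQP": "peptide", "/TSQN": "peptide", "/SQN": "nucleotide"}


def guess_sequence_type(sequence: str, threshold: float = 0.9) -> str:
    """Guess 'nucleotide' or 'peptide' from the share of nucleotide codes."""
    letters = [ch for ch in sequence.lower() if ch.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    share = sum(ch in NUCLEOTIDE_CODES for ch in letters) / len(letters)
    return "nucleotide" if share >= threshold else "peptide"


def mismatch_warning(sequence: str, field_code: str) -> Optional[str]:
    """Return a warning if the guessed type disagrees with the field code."""
    expected = EXPECTED_TYPE[field_code.upper()]
    guessed = guess_sequence_type(sequence)
    if guessed != expected:
        return (f"query looks like a {guessed} sequence, "
                f"but {field_code} expects a {expected} sequence")
    return None  # no mismatch detected
```

A short peptide made up only of A, C, G, T and N residues would be misclassified by this heuristic, so the result is a hint rather than a verdict.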

Example

=> RUN BLAST L1 /SQP -F F -W 2 -E 10000 -M PAM30
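For scripted sessions, the command line in the example can be assembled from its parts. The Python sketch below only builds the text of the command. The helper name `build_run_blast` is hypothetical, and the option letters are copied from the example without interpreting them.

```python
# Field codes named in the text; anything else is rejected.
FIELD_CODES = {"/SQP", "/SQN", "/TSQN"}


def build_run_blast(l_number, field_code="/SQP", options=None):
    """Assemble a RUN BLAST command string (hypothetical helper)."""
    # "/SQN BOTH" style values are allowed: only the first token is checked.
    if field_code.split()[0].upper() not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field_code}")
    parts = ["RUN", "BLAST", l_number, field_code]
    # Options are appended in insertion order as "-LETTER VALUE" pairs.
    for flag, value in (options or {}).items():
        parts += [f"-{flag}", str(value)]
    return " ".join(parts)


command = build_run_blast(
    "L1", "/SQP", {"F": "F", "W": 2, "E": 10000, "M": "PAM30"}
)
print(command)  # RUN BLAST L1 /SQP -F F -W 2 -E 10000 -M PAM30
```

Building the string from plain ASCII hyphens also avoids the typographic dashes that word processors tend to substitute when a command is pasted from a document.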

The BLAST options most commonly adjusted by sequence searchers are as follows.

Low Complexity Filtering (on by default) (-F)

The low complexity filter masks segments of the query sequence that have low compositional complexity, as determined by programs specific to peptide or nucleotide sequences. Such segments can produce hits that are statistically significant but biologically uninteresting. Filtering is applied to the query sequence, and masked regions are indicated by a series of Xs for peptide sequences and Ns for nucleotide sequences. Low complexity filtering can be turned off (i.e. set to F for False).
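To make the idea of masking concrete, here is a toy Python sketch. It is not the algorithm BLAST uses; it substitutes a simple sliding-window Shannon entropy test, and the window length and entropy threshold are arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits per symbol) of the characters in a window."""
    total = len(window)
    counts = Counter(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(sequence, window=12, min_entropy=1.0, mask_char="N"):
    """Replace every window whose entropy is below min_entropy with mask_char.

    Toy illustration only: use mask_char="X" for peptides, "N" for nucleotides.
    """
    masked = list(sequence)
    for start in range(max(len(sequence) - window + 1, 0)):
        # Entropy is always measured on the original, unmasked sequence.
        if shannon_entropy(sequence[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


query = "acgtacgtacgt" + "a" * 16 + "acgtacgtacgt"
print(mask_low_complexity(query))
```

In this sketch the run of 16 a's is replaced by Ns, and a few neighbouring bases may be masked as well because windows overlapping the run are still dominated by a single letter.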

Word Size (-W)

Word Size is the length of the fragments (words) into which a query sequence is split to seed a BLAST search. For SQN the default is 11 and the range 7-23. For all other BLAST searches the default is 3 and the range 2-3. For short search queries, reducing the word size below the default can give improved search results.
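The defaults and ranges above can be captured in a small lookup table, and the effect of word size on a short query can be seen by listing its overlapping words. This Python sketch is an illustration under stated assumptions: the function names are hypothetical, and the way BLAST actually seeds a search from its words is more involved than simple splitting.

```python
# Defaults and allowed ranges as stated in the text.
WORD_SIZE_RULES = {
    "/SQN": {"default": 11, "min": 7, "max": 23},
    "/SQP": {"default": 3, "min": 2, "max": 3},
    "/TSQN": {"default": 3, "min": 2, "max": 3},
}


def resolve_word_size(field_code, word_size=None):
    """Return the word size to use, falling back to the default."""
    rule = WORD_SIZE_RULES[field_code]
    if word_size is None:
        return rule["default"]
    if not rule["min"] <= word_size <= rule["max"]:
        raise ValueError(
            f"-W {word_size} is outside {rule['min']}-{rule['max']} for {field_code}"
        )
    return word_size


def query_words(sequence, word_size):
    """List the overlapping words of length word_size in a query."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]


print(query_words("MKTAY", resolve_word_size("/SQP")))     # ['MKT', 'KTA', 'TAY']
print(query_words("MKTAY", resolve_word_size("/SQP", 2)))  # ['MK', 'KT', 'TA', 'AY']
```

A query of length n contains n - W + 1 words of size W, so a smaller W gives a short query more, and less specific, starting points for matches.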

GENESEQ on STN (DGENE) Workshop Manual | Page 15
