
note - FIZ Karlsruhe


Introduction to similarity searching

GETSIM alignments for nucleic acid sequences

Similarity in GETSIM alignments is represented by a line of dots or blanks placed between two lines: the query sequence (upper line) and the hit subject sequence (lower line). Two dots (a colon, :) mark a full match between two nucleic acid letters, one dot marks a "family" similarity match between uracil (U) and thymine (T), and a blank marks non-matching letters. Gaps, represented by an underscore, may be introduced into the query or the subject sequence to improve the alignment of the two sequences. Introducing gaps in this way reduces the similarity score.
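To illustrate why gaps lower the score, here is a minimal Smith-Waterman sketch in Python, assuming a linear gap penalty and illustrative scores (match +5, mismatch -4, gap -8). The function name `smith_waterman_score` and these parameters are assumptions for demonstration only; the manual does not state the scoring scheme GETSIM uses, so the numbers will not match the scores shown in the examples.

```python
def smith_waterman_score(query: str, subject: str,
                         match: int = 5, mismatch: int = -4,
                         gap: int = -8) -> int:
    """Best local alignment score with a linear gap penalty.

    The default scores are illustrative assumptions, not GETSIM's parameters.
    Letters are compared case-insensitively; U/T pairs count as mismatches here.
    """
    q, s = query.lower(), subject.lower()
    prev = [0] * (len(s) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(q) + 1):
        curr = [0] * (len(s) + 1)
        for j in range(1, len(s) + 1):
            diag = prev[j - 1] + (match if q[i - 1] == s[j - 1] else mismatch)
            up = prev[j] + gap          # query letter aligned to a gap in the subject
            left = curr[j - 1] + gap    # subject letter aligned to a gap in the query
            # Flooring every cell at 0 is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("acgtacgt", "acgtacgt"))   # identical: no gap needed
print(smith_waterman_score("acgtacgt", "acgttacgt"))  # extra base forces a gap
```

With these assumed values, two identical 8-base strings score 40, while one extra base in the subject forces a gap into the best local alignment and drops the score to 32.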

Example

ALIGN Smith-Waterman score: 57
33 na overlap starting at 546

aggagugguaggucuuacgaugccagcuguaau ← Query
:: : .  .:  .  .:  : ::::::.: ::.
agtattcatatttactaacaagccagctggaat ← Answer
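The marker line can be reproduced mechanically from the two aligned rows. The sketch below is a hypothetical helper (`na_marker_line` is not a GETSIM function) that applies the rules described above: ':' for identical letters, '.' for a U/T pair, and a blank for mismatches and gap columns. Run on the example rows, it yields the marker line shown in the example.

```python
def na_marker_line(query: str, subject: str) -> str:
    """Build a GETSIM-style similarity line for two aligned nucleic acid rows.

    ':' marks identical letters, '.' marks a U/T "family" match, and a blank
    marks a mismatch or a gap column ('_').
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    marks = []
    for q, s in zip(query.lower(), subject.lower()):
        if q == "_" or s == "_":
            marks.append(" ")  # gap column: no similarity mark
        elif q == s:
            marks.append(":")  # full match
        elif {q, s} == {"u", "t"}:
            marks.append(".")  # U/T family match
        else:
            marks.append(" ")  # mismatch
    return "".join(marks)


query = "aggagugguaggucuuacgaugccagcuguaau"
subject = "agtattcatatttactaacaagccagctggaat"
print(query)
print(na_marker_line(query, subject))
print(subject)
```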

GETSIM alignments for amino acid sequences

Similarity in GETSIM alignments of amino acid sequences is represented in the same way: a line of dots or blanks placed between the query sequence (upper line) and the hit subject sequence (lower line). Two dots (a colon, :) mark a full match between two amino acid letters, one dot marks an amino acid family match, and a blank marks non-matching letters. Gaps, represented by an underscore, may be introduced into the query or the subject sequence to improve the alignment of the two sequences. Introducing gaps in this way reduces the similarity score.

Example

ALIGN Smith-Waterman score: 80
49 aa overlap starting at 569

vge_gaiplsigyatllhmdqgvalgrvlpmvmlggltaiiisgclnql ← Query
::: :: :: . ::      : .::: .. .::....  ....:   .:
vgeygasplclpyap__pegqpaalgftvalvmmnsfcflvvagayikl ← Answer
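Amino acid family matches can be marked the same way, given a table of residue families. The manual does not list the families GETSIM uses, so the groups in `FAMILIES` below are a hypothetical, simplified physicochemical grouping chosen for illustration, and `aa_marker_line` is likewise a hypothetical helper. Applied to the first nine columns of the example above, it reproduces the marks shown there, but it is not guaranteed to match GETSIM's marker line elsewhere.

```python
# Hypothetical residue families, for illustration only: the manual does not
# say which amino acids GETSIM groups together.
FAMILIES = [
    set("ilvm"),  # aliphatic / hydrophobic
    set("fwy"),   # aromatic
    set("krh"),   # basic
    set("de"),    # acidic
    set("nq"),    # amides
    set("st"),    # hydroxyl
    set("ag"),    # small
]


def aa_marker_line(query: str, subject: str, families=FAMILIES) -> str:
    """Return ':' for identity, '.' for a same-family pair, ' ' otherwise."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    marks = []
    for q, s in zip(query.lower(), subject.lower()):
        if q == "_" or s == "_":
            marks.append(" ")  # gap column
        elif q == s:
            marks.append(":")  # full match
        elif any(q in fam and s in fam for fam in families):
            marks.append(".")  # family match under the assumed groups
        else:
            marks.append(" ")  # mismatch
    return "".join(marks)


print(aa_marker_line("vge_gaipl", "vgeygaspl"))  # first nine columns of the example
print(aa_marker_line("vgeilk", "vgdvlr"))        # e/d, i/v and k/r are family pairs here
```

Swapping in a different grouping (or a substitution matrix with a score threshold) only requires changing how the '.' case is decided.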

HELPFUL HINT

GETSIM is slower than BLAST, so running GETSIM in BATCH mode is recommended. BATCH mode also has higher query limits. See page 79.

GENESEQ on STN (DGENE) Workshop Manual | Page 19
