note - FIZ Karlsruhe
Introduction to similarity searching
GETSIM alignments for nucleic acid sequences
Similarity in GETSIM alignments is represented by a line of dots or blanks placed between two
lines representing the query sequence (upper line) and the hit subject sequence (lower line). Two
dots (:) mark a full match between two nucleic acid sequence letters, and blanks show non-matching
letters. One dot (.) represents a “family” similarity match between Uracil (U) and Thymine (T). Gaps,
represented by an underscore, may be introduced into the query or the subject sequence for a
better alignment of both sequences. Inclusion of gaps in this way reduces the similarity score.
Example<br />
ALIGN Smith-Waterman score: 57<br />
33 na overlap starting at 546<br />
aggagugguaggucuuacgaugccagcuguaau ← Query<br />
:: : .  .:  .  .:  : ::::::.: ::.
Agtattcatatttactaacaagccagctggaat ← Answer<br />
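The marker rules described above can be sketched in a few lines of Python. This is only an illustration of the notation, not GETSIM's own code: the function name `nucleotide_match_line` is hypothetical, and two details are assumptions (letters are compared case-insensitively, and a position containing a gap underscore is shown as a blank).

```python
def nucleotide_match_line(query: str, subject: str, gap: str = "_") -> str:
    """Build the marker line printed between two aligned nucleic acid sequences.

    ':' marks a full match, '.' marks the U/T family match, ' ' marks
    everything else. Both inputs must already be aligned (equal length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")

    markers = []
    for q, s in zip(query.lower(), subject.lower()):  # assumption: case is ignored
        if gap in (q, s):
            markers.append(" ")   # assumption: a gap position is shown as a blank
        elif q == s:
            markers.append(":")   # two dots: full match
        elif {q, s} == {"u", "t"}:
            markers.append(".")   # one dot: Uracil/Thymine family match
        else:
            markers.append(" ")   # blank: non-matching letters
    return "".join(markers)


# The two sequences from the example above.
query = "aggagugguaggucuuacgaugccagcuguaau"
answer = "Agtattcatatttactaacaagccagctggaat"

print(query)
print(nucleotide_match_line(query, answer))
print(answer)
```

Printed in a fixed-width font, each marker sits directly between the two letters it describes.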
GETSIM alignments for amino acid sequences<br />
Similarity in GETSIM alignments of amino acid sequences is represented by a line of dots or blanks
placed between two lines representing the query sequence (upper line) and the hit subject sequence
(lower line). Two dots (:) mark a full match between two amino acid letters, and one dot (.) represents
an amino acid family match. Blanks show non-matching letters. Gaps, represented by an
underscore, may be introduced into the query or the subject sequence for a better alignment of
both sequences. Inclusion of gaps in this way reduces the similarity score.
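Why a gap lowers the score can be seen in a minimal Smith-Waterman sketch. The scoring values here (`match=2`, `mismatch=-1`, `gap=-2`) and the function name are assumptions chosen for illustration. GETSIM's actual scoring scheme is not described in this manual, so the numbers below will not reproduce the scores it reports.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score using a linear gap penalty.

    All scoring parameters are hypothetical illustration values.
    """
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for x in a:
        curr = [0]              # first column is always 0
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if x == y else mismatch)
            gap_in_b = prev[j] + gap        # letter of a aligned to a gap
            gap_in_a = curr[j - 1] + gap    # letter of b aligned to a gap
            curr.append(max(0, diagonal, gap_in_b, gap_in_a))
            best = max(best, curr[j])
        prev = curr
    return best


# Seven identical letters: 7 matches x 2 = 14
print(smith_waterman_score("vgeygas", "vgeygas"))  # 14

# One letter missing from the query: 6 matches x 2, minus one gap penalty = 10
print(smith_waterman_score("vgegas", "vgeygas"))   # 10
```

The second call aligns `vge_gas` against `vgeygas`. The underscore lets the remaining letters line up, but the gap penalty is subtracted from the score.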
Example<br />
ALIGN Smith-Waterman score: 80<br />
49 aa overlap starting at 569<br />
vge_gaiplsigyatllhmdqgvalgrvlpmvmlggltaiiisgclnql ← Query<br />
::: :: :: . ::      : .::: .. .::....  ....:   .:
vgeygasplclpyap__pegqpaalgftvalvmmnsfcflvvagayikl ← Answer<br />
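A displayed alignment can also be tallied programmatically. The sketch below counts full matches, family matches, mismatches and gap positions from the three lines. The function name `summarize_alignment` is hypothetical. It assumes the marker line is column-aligned with the two sequences, so the spacing of the marker line used here is inferred: every colon is placed over an identical pair of letters.

```python
from collections import Counter


def summarize_alignment(query: str, markers: str, subject: str,
                        gap: str = "_") -> Counter:
    """Count each kind of position in a three-line GETSIM-style alignment."""
    if not (len(query) == len(markers) == len(subject)):
        raise ValueError("all three lines must have the same length")

    counts = Counter()
    for q, m, s in zip(query, markers, subject):
        if gap in (q, s):
            counts["gap"] += 1            # underscore in either sequence
        elif m == ":":
            counts["full_match"] += 1     # two dots
        elif m == ".":
            counts["family_match"] += 1   # one dot
        else:
            counts["mismatch"] += 1       # blank
    return counts


# The amino acid example above, with inferred column-aligned marker spacing.
query   = "vge_gaiplsigyatllhmdqgvalgrvlpmvmlggltaiiisgclnql"
markers = "::: :: :: . ::      : .::: .. .::....  ....:   .:"
answer  = "vgeygasplclpyap__pegqpaalgftvalvmmnsfcflvvagayikl"

for kind, n in summarize_alignment(query, markers, answer).items():
    print(f"{kind}: {n}")
```

The same function works for nucleic acid alignments, because it reads only the marker characters and the gap underscore.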
Helpful HINT
GETSIM is slower than BLAST, so using GETSIM in BATCH mode is<br />
recommended. In addition, BATCH query limits are higher. See page 79.<br />
GENESEQ on STN (DGENE) Workshop Manual | Page 19