11.07.2015 Views

Bioinformatics for DNA Sequence Analysis.pdf - Index of

Similarity Searching Using BLAST

Fig. 1.3. PAM250 and BLOSUM45 substitution matrices.

selection, are called accepted point mutations (PAM). For Dayhoff's PAM matrices, groups of proteins with 85% or more sequence similarity were analyzed and their 1,571 substitutions were cataloged. Each cell of a PAM matrix corresponds to the frequency, in substitutions per 100 residues, between two given amino acids. A frequency of one substitution per 100 residues is referred to as one PAM unit. When the matrices were created in the 1970s, however, only a limited number and variety of protein sequences were available, so they are biased toward small, globular proteins. It is also important to note that each PAM matrix corresponds to a specific evolutionary distance and that each is simply an extrapolation of the original. For example, a PAM250 matrix (Fig. 1.3) is constructed by multiplying the PAM1 matrix by itself 250 times and is viewed as a typical scoring matrix for proteins separated by 250 PAM units of evolutionary distance, loosely equated with 250 million years of evolution.

2.3.2. BLOSUM Matrices

To overcome some of the drawbacks of PAM matrices, Henikoff and Henikoff developed the BLOSUM matrices in 1992 (14). These matrices were based on the BLOCKS database, which organizes proteins into blocks, where each block, defined by an alignment of motifs, corresponds to a family. Whereas the original PAM matrix was calculated with proteins with at least 85% identity, BLOSUM matrices are each calculated separately using conserved motifs at or below a specific evolutionary distance. This diversity of matrices, coupled with their basis in larger datasets, makes the BLOSUM matrices more robust at detecting similarity at greater evolutionary distances and more accurate, in many cases, at performing local similarity searches (15).

2.3.3. Choosing a Matrix

When choosing a matrix, it is important to consider the alternatives. Do not simply accept the default setting without some initial consideration. In general, finding similarity at increasing divergence corresponds to increasing PAM matrices (PAM1, PAM40, PAM120, etc.) and decreasing BLOSUM matrices (BLOSUM90,
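
The PAM extrapolation described earlier on this page (PAM1 multiplied by itself to reach a larger evolutionary distance) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dayhoff's data or method in full: the three-letter alphabet, the values in `TOY_PAM1`, and the helper names `mat_mul` and `mat_pow` are all hypothetical. Published PAM scoring matrices are log-odds transformations of such probabilities, and that step is omitted here.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square-matrix product of a and b."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m: Matrix, power: int) -> Matrix:
    """Return m raised to `power` (power >= 1) by repeated multiplication."""
    if power < 1:
        raise ValueError("power must be at least 1")
    result = [row[:] for row in m]
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result


# Hypothetical PAM1-like matrix for a made-up 3-residue alphabet.
# Row i gives the probability that residue i is unchanged (diagonal)
# or replaced by residue j after one PAM unit. Each row sums to 1.
TOY_PAM1: Matrix = [
    [0.990, 0.007, 0.003],
    [0.007, 0.990, 0.003],
    [0.003, 0.003, 0.994],
]

# Extrapolate the toy matrix to a distance of 250 PAM units.
toy_pam250 = mat_pow(TOY_PAM1, 250)

for row in toy_pam250:
    print(["%.4f" % p for p in row])
```

In this toy case the diagonal entries shrink as the power grows, reflecting that a residue is less likely to be unchanged at larger evolutionary distances, while each row still sums to 1.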
