12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


2 Fold Recognition

dependent on the quality of the initial alignment, which again was the problem in the first place. To avoid some of this dependence on the initial alignment, the process is iterated several times, realigning the sequence with a contribution from the threading score on the previous iteration.

A quite complex approach called ‘double dynamic programming’ was developed by Jones et al. (1992) and is used in the program THREADER. This approach proved quite successful in the early days of CASP. A full description of this technique is beyond the scope of this chapter. In summary, however, the idea is to align a single position in the query sequence with a single position in the template structure. One then uses a conventional alignment algorithm to align the remainder of the sequence so as to optimise the potential with respect to this one fixed position. The optimal alignment found is then added to a scoring matrix. This process is repeated for every possible pair of residues in the sequence and structure (or at least a reasonably large subset of them), each time accumulating the optimal alignments in the secondary scoring matrix. Finally, the secondary scoring matrix is used to generate a final alignment that attempts to pass through as many of the accumulated alignments as possible. It is this dual-level alignment that accounts for the name double dynamic programming. It is essentially a method to break down the threading problem into a large number of simple problems whose solutions are combined to produce a single answer, a theme repeated in many of the methods described below.

The Gibbs sampling algorithm was applied to the problem of threading by Bryant (1996). The technique begins with a random alignment. At each step it randomly chooses a core secondary structure element C, generates all possible alternative alignments for it, calculates each new alignment score S and chooses a new alignment with probability proportional to exp(−S/kT), where k is the Boltzmann constant and T is a notional ‘temperature’ of the system. At every iteration a different randomly chosen core element is the target for alignment. A simulated annealing protocol is used whereby the ‘temperature’ of the system is slowly reduced over time. Starting at a high initial temperature means that poorly scoring alignments are accepted almost as frequently as well-scoring ones. This is suitable at the beginning of the simulation, as a good overall alignment is highly unlikely to arise by chance. However, as the temperature drops, it becomes progressively less likely that poorly scoring alignments are accepted and the system gradually ‘settles’ on a globally low energy alignment. Simulated annealing is a widely used approach to many optimisation problems in bioinformatics and elsewhere. The method does not guarantee a global optimum alignment, but is very fast and gives good performance.

The divide and conquer threading algorithm (Xu et al. 1998) repeatedly divides the structure model into sub-models, solves the alignment problem for the sub-models, and combines the sub-solutions to find a globally optimal alignment. Similarly, the branch and bound search algorithm (Lathrop and Smith 1996) repeatedly divides the threading search space into smaller subsets and always chooses the most promising subset to split next. Eventually the most promising subset contains only one alignment, which is a global optimum. Finding the global optimum is extremely time-consuming, so a so-called ‘anytime’ version
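
As a rough illustration of the Gibbs sampling step with simulated annealing described earlier in this section, the following Python sketch may help. It is not Bryant's implementation. The scoring function `toy_score`, the `TARGET` positions, the geometric cooling schedule and all parameter values are assumptions made purely for illustration, and the Boltzmann constant k is folded into the temperature. Each core element is reduced to a single integer position.

```python
import math
import random

# Hypothetical "ideal" positions for three core elements (illustration only).
TARGET = [2, 5, 9]

def toy_score(alignment):
    """Toy score S (lower is better): distance from TARGET plus a
    penalty of 10 for each adjacent pair of elements that is out of order."""
    s = sum(abs(a - t) for a, t in zip(alignment, TARGET))
    s += 10 * sum(1 for a, b in zip(alignment, alignment[1:]) if a >= b)
    return s

def gibbs_anneal(n_elements, n_positions, score,
                 t_start=5.0, t_end=0.05, steps=5000, seed=0):
    rng = random.Random(seed)
    # Begin with a random alignment: one position per core element.
    alignment = [rng.randrange(n_positions) for _ in range(n_elements)]
    best, best_score = list(alignment), score(alignment)
    for step in range(steps):
        # Assumed geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Randomly choose one core element C to realign.
        c = rng.randrange(n_elements)
        # Score every alternative placement of C, keeping the others fixed.
        scores = [score(alignment[:c] + [pos] + alignment[c + 1:])
                  for pos in range(n_positions)]
        # Weights proportional to exp(-S/T); subtracting the minimum score
        # only rescales the weights and avoids numerical underflow.
        s_min = min(scores)
        weights = [math.exp(-(s - s_min) / t) for s in scores]
        alignment[c] = rng.choices(range(n_positions), weights=weights)[0]
        current = scores[alignment[c]]
        if current < best_score:
            best, best_score = list(alignment), current
    return best, best_score

best, best_score = gibbs_anneal(n_elements=3, n_positions=12, score=toy_score)
print(best, best_score)
```

At high temperature the weights are nearly uniform, so poor placements are chosen almost as often as good ones. As the temperature falls, the sampling concentrates on the lowest-scoring placements, which mirrors the ‘settling’ behaviour described in the text.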
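
The branch and bound idea described above, always splitting the most promising subset until it contains a single alignment, can be sketched in Python as a best-first search. This is a minimal toy under stated assumptions, not the formulation of Lathrop and Smith: the table `cost`, the rule that core elements must occupy strictly increasing positions, and the optimistic bound (the sum of each remaining element's cheapest placement) are all hypothetical. A subset of the search space is represented by a partial alignment that fixes the first few elements.

```python
import heapq

def branch_and_bound(cost):
    """cost[i][p]: hypothetical score for placing core element i at
    position p (lower is better). Positions must strictly increase."""
    n = len(cost)
    # rest[i]: optimistic cost of placing elements i..n-1, ignoring order.
    rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        rest[i] = rest[i + 1] + min(cost[i])
    # Heap entries: (lower bound, exact cost so far, partial alignment).
    heap = [(rest[0], 0, ())]
    expanded = 0
    while heap:
        bound, so_far, partial = heapq.heappop(heap)
        i = len(partial)
        if i == n:
            # The most promising subset is a single complete alignment,
            # so no other subset can contain a better one.
            return list(partial), so_far, expanded
        expanded += 1
        start = partial[-1] + 1 if partial else 0
        # Split this subset by fixing the position of element i.
        for p in range(start, len(cost[i])):
            new_cost = so_far + cost[i][p]
            heapq.heappush(heap, (new_cost + rest[i + 1], new_cost,
                                  partial + (p,)))
    return None, float("inf"), expanded

cost = [
    [4, 1, 3, 5, 6, 7],
    [1, 4, 5, 3, 4, 6],
    [7, 6, 2, 3, 2, 1],
]
alignment, total, expanded = branch_and_bound(cost)
print(alignment, total, expanded)
```

Because the bound never overestimates the true cost of any alignment in a subset, the first complete alignment popped from the heap is a global optimum for this toy problem. The price is that the number of subsets expanded can grow very quickly as the bound becomes looser.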
