12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf


2 Fold Recognition

distribute themselves according to a Boltzmann distribution. Second, one can calculate the potential of mean force responsible for the observed statistics via the Boltzmann equation. The ‘energy’ associated with a given property $p$ is:

$$E(p) = -\log\left[\frac{n_{\mathrm{obs}}(p)}{n_{\mathrm{exp}}(p)}\right]$$

where $n_{\mathrm{obs}}(p)$ is the observed value of $p$ and $n_{\mathrm{exp}}(p)$ is the ‘expected’ value of $p$ in a reference state that assumes there are no specific interactions or preferences.

Implementing this usually means discretizing distances and producing a look-up table of force-field values rather than the continuous differentiable functions used in molecular mechanics (with some recent exceptions). From a threading perspective, this look-up table permits one to assign an ‘energy’ to a given threaded structure. Each amino acid in the model will have some degree of exposure/burial. Depending on the amino acid type in question, one can reference the look-up table for a value for having, say, a valine that is 30% exposed. One can assign an energy to the entire model by simply summing these ‘energies’ across all residues in the model. (Note that summing can be used due to the log term in the equation.)

A more interesting and powerful energy function can be derived if one considers interacting pairs of amino acids. One may count the frequency with which one amino acid type is found in close proximity to another, e.g. how often do we observe a leucine residue 4 Å from a valine residue? As above, one gathers such statistics for every different pairing of the 20 amino acid types. One then calculates the expected frequency based on the number of observations of each of the amino acid types at any distance separation.

In fact the typical pair-potentials in widespread use are usually rather more elaborate.
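The look-up-table idea for residue exposure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind any particular threading program: the function names `build_lookup` and `model_energy`, the number of exposure bins, the score of zero for unseen combinations, and the choice of reference state (residue type and exposure treated as independent) are all assumptions made for the example.

```python
import math
from collections import Counter


def exposure_bin(exposure, n_bins):
    """Map a fractional exposure in [0, 1] to a discrete bin index."""
    return min(int(exposure * n_bins), n_bins - 1)


def build_lookup(observations, n_bins=5):
    """Build a table of E = -log(n_obs / n_exp) per (residue type, exposure bin).

    observations: iterable of (amino_acid, exposure_fraction) pairs taken
    from known structures.
    """
    n_obs = Counter()     # observed count of each (residue type, bin)
    per_type = Counter()  # count of each residue type over all bins
    per_bin = Counter()   # count of each bin over all residue types
    for aa, exposure in observations:
        b = exposure_bin(exposure, n_bins)
        n_obs[(aa, b)] += 1
        per_type[aa] += 1
        per_bin[b] += 1
    total = sum(per_type.values())

    table = {}
    for (aa, b), observed in n_obs.items():
        # Assumed reference state: residue type and exposure are independent,
        # so the expected count is the product of the marginals over the total.
        expected = per_type[aa] * per_bin[b] / total
        table[(aa, b)] = -math.log(observed / expected)
    return table


def model_energy(table, model, n_bins=5, unseen=0.0):
    """Sum the per-residue 'energies' over a threaded model.

    model: iterable of (amino_acid, exposure_fraction) pairs for the model.
    Combinations absent from the table score `unseen` (an assumption).
    """
    return sum(
        table.get((aa, exposure_bin(exposure, n_bins)), unseen)
        for aa, exposure in model
    )
```

With such a table, the score for "a valine that is 30% exposed" is a single dictionary look-up, and the score for the whole model is the sum over its residues. Lower values indicate combinations seen more often than the reference state predicts.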
For those more mathematically inclined readers I present below the full treatment of a widely used pair potential. Others may feel free to skip this section.

Contacts can be classified into distance bins up to some ceiling (say 30 Å). Distance bins can then be further subdivided into sequence separation bins for close range (say three to nine residues apart) and long range (>9 residues apart). In addition, even with 50,000 protein structures in the database, data sparsity can be a problem with this many subdivisions and so an observation weighting scheme is introduced (the $M_{ijk}\sigma$ term) which essentially only counts an observation if it has been seen $1/\sigma$ times. The residue energy $E_k^{ij}$ for pair $ij$ separated by $k$ residues in distance bin $l$ is then calculated as:

$$E_k^{ij} = RT \ln\left[1 + M_{ijk}\,\sigma\right] - RT \ln\left[1 + M_{ijk}\,\sigma\,\frac{f_k^{ij}(l)}{f_k^{xx}(l)}\right]$$

where $M_{ijk}$ is the number of occurrences for pair $ij$ separated by $k$ residues, $\sigma$ is the observation weight (often set to 1/50), $f_k^{ij}(l)$ is the relative frequency of pair $ij$ separated by $k$ residues in distance class $l$:
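The formulas for the relative frequencies are not given in this extract. As a hedged illustration of how the sparse-data correction behaves, the sketch below assumes that a relative frequency is simply the count in distance bin $l$ divided by the count over all distance bins, and that the reference frequency is the same ratio pooled over all residue-type pairs. The function name `pair_energy`, its argument names, and the default of reporting energies in units of $RT$ are choices made for the example only.

```python
import math


def pair_energy(count_ij_l, count_ij, count_xx_l, count_xx,
                sigma=1 / 50, rt=1.0):
    """E = RT ln(1 + M*sigma) - RT ln(1 + M*sigma * f_ij(l) / f_xx(l)).

    count_ij_l: observations of pair ij at separation k in distance bin l
    count_ij:   observations of pair ij at separation k, all bins (M)
    count_xx_l: observations of any pair at separation k in distance bin l
    count_xx:   observations of any pair at separation k, all bins
    rt:         energy scale; 1.0 reports the result in units of RT
    Assumes count_xx_l and count_xx are non-zero.
    """
    m = count_ij
    # Assumed definition: relative frequency = bin count / total count.
    f_ij = count_ij_l / count_ij if count_ij else 0.0
    f_xx = count_xx_l / count_xx
    return (rt * math.log(1 + m * sigma)
            - rt * math.log(1 + m * sigma * f_ij / f_xx))


# A pair seen three times more often in this bin than the pooled reference
# scores as favourable (negative); a pair never observed scores exactly zero.
favourable = pair_energy(30, 100, 100, 1000)
no_data = pair_energy(0, 0, 100, 1000)
```

The weighting has the effect described in the text: when $M_{ijk}$ is small relative to $1/\sigma$, both logarithms stay close to zero and the pair contributes little, whereas for well-sampled pairs the expression approaches the plain log-ratio of observed to reference frequency.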
