Protein Engineering Protocols - Mycobacteriology research center


12.07.2015

1
Combinatorial Protein Design Strategies Using Computational Methods

Hidetoshi Kono, Wei Wang, and Jeffery G. Saven

From: Methods in Molecular Biology, vol. 352: Protein Engineering Protocols
Edited by: K. M. Arndt and K. M. Müller © Humana Press Inc., Totowa, NJ

Summary

Computational methods continue to facilitate efforts in protein design. Most of this work has focused on searching sequence space to identify one or a few sequences compatible with a given structure and functionality. Probabilistic computational methods provide information regarding the range of amino acid variability permitted by desired functional and structural constraints. Such methods may be used to guide the construction of both individual sequences and combinatorial libraries of proteins.

Key Words: De novo protein design; combinatorial libraries; computational protein design; biased codons.

1. Introduction

1.1. Protein Design

Through attempts to design protein structures, including those having particular functions, researchers can refine our understanding of the forces and effects that specify the properties of the folded state. In addition, control over the design of particular folded-state structures will likely lead to new synthetic proteins having the efficiency and specificity of biological proteins. Potential applications include therapeutics, sensors, catalysts, and materials. The successful design of proteins is possible even without a complete quantitative understanding of all the forces involved in specifying their structures.

Designing proteins is nontrivial, however, because of both their complexity and the subtlety of the interactions that specify the folded state. Proteins are large (tens to hundreds of amino acid residues), and many structural variables specify the folded state, including sequence, backbone topology, and side-chain conformations. Each residue may have multiple conformations, even if the backbone structure is specified. In addition to this structural complexity, there is also sequence complexity. Design involves identifying folding sequences from the enormous ensemble of possible sequences. This search is guided by the large degree of “consistency” observed in folded proteins (1). On average, a folded protein is atomically well packed with favorable van der Waals interactions, hydrophobic residues are sequestered from solvent, and most hydrogen-bonding interactions are satisfied. However, this consistency is often complex and may have little simplifying symmetry. In addition, such noncovalent interactions are some of the most difficult to quantify accurately, and estimating free energies associated with mutation or structural ordering remains a subtle area of computational research (2,3). Although detailed simulation methods for estimating free-energy differences have predictive power, we cannot presently expect to use them to determine the relative stability changes of large numbers of sequences. Nonetheless, molecular potentials derived from small molecules and from the protein structure database do contain partial information regarding the interactions and forces known to be important for specifying and stabilizing protein structures. In some cases, the optimization of such potentials has led to striking successes in protein design (4). Such potentials are necessarily approximate, and any sequence so designed is likely sensitive to the particular potential and target structure used. Alternatively, the partial information contained in these potentials may be used in a probabilistic manner to yield the likelihoods of the amino acids.
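The idea of turning an approximate potential into amino acid likelihoods can be sketched as follows, under strong simplifying assumptions. If each residue site is treated independently and the sequence entropy is maximized subject to a fixed average energy, the probabilities at each site take a Boltzmann-like form, $p_i(a) \propto \exp[-\beta E_i(a)]$. The per-site energies, the parameter `beta`, and the helper names `site_probabilities` and `site_entropy` below are hypothetical; the energy values are invented for illustration, and fuller treatments must also account for interactions between sites, which this sketch ignores.

```python
import math

# Hypothetical per-site energies (arbitrary units) for a few amino acids at
# two residue positions. In practice such values would come from an
# approximate molecular potential evaluated on the target backbone; these
# numbers are made up purely for illustration.
SITE_ENERGIES = {
    "site_1": {"LEU": -2.0, "ILE": -1.8, "VAL": -1.5, "ALA": -0.5, "LYS": 1.0},
    "site_2": {"LYS": -1.0, "GLU": -1.0, "ARG": -0.9, "ALA": -0.4, "LEU": 0.8},
}


def site_probabilities(energies, beta=1.0):
    """Boltzmann-weighted amino acid probabilities at a single site.

    Assumes sites are independent: p(a) = exp(-beta*E(a)) / sum_b exp(-beta*E(b)).
    beta is an assumed inverse-temperature-like parameter; larger beta
    concentrates probability on the lowest-energy amino acids.
    """
    # Shift by the minimum energy for numerical stability; the shift cancels
    # on normalization, so the probabilities are unchanged.
    e_min = min(energies.values())
    weights = {aa: math.exp(-beta * (e - e_min)) for aa, e in energies.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def site_entropy(probs):
    """Shannon entropy S = -sum p ln p (natural log) of one site's distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0.0)


if __name__ == "__main__":
    for site, energies in SITE_ENERGIES.items():
        probs = site_probabilities(energies, beta=1.0)
        s = site_entropy(probs)
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        summary = ", ".join(f"{aa}:{p:.2f}" for aa, p in ranked)
        # exp(S) is one common "effective number of amino acids" at a site.
        print(f"{site}: {summary}  effective no. of amino acids = {math.exp(s):.2f}")
```

Raising `beta` concentrates probability on the lowest-energy residues and lowers the site entropy; `exp(S)` then gives a rough effective number of amino acids per site, which is one possible way to quantify site-specific variability when deciding which residues to allow at a position.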
A probabilistic approach is also appropriate for characterizing the full variability of sequences that may fold to a common structure, because there are likely to be an enormous number of such sequences, far more than can be addressed via sequence search or enumeration. Such probabilistic approaches are also particularly appropriate for de novo protein design in the context of combinatorial protein experiments, which create and rapidly assay many sequences. Although combinatorial methods can address very large numbers of sequences ($10^{4}$–$10^{12}$), these numbers are still infinitesimal compared with the number of possible protein sequences, e.g., $20^{100} \approx 10^{130}$ for a 100-residue protein. Thus, even with combinatorial methods, we still must focus on selected regions of sequence space. This is most often accomplished by preselecting a few residue sites within the protein by inspection and allowing full or partial variability at these sites. Recently, computational methods have been developed that can keep track of a much wider range of sequence variability and provide quantitative methods for winnowing and focusing the sequence space. Herein, we discuss computational methods for sequence design with an emphasis on probabilistic methods that address the site-specific amino acid variability for a given structure.

1.2. Directed Methods of Protein Design

Here, “directed protein design” refers to the identification of a sequence (or a set of sequences) likely to fold to a predetermined backbone structure. Each

