Protein Engineering Protocols - Mycobacteriology research center
1
Combinatorial Protein Design Strategies Using Computational Methods

Hidetoshi Kono, Wei Wang, and Jeffery G. Saven

From: Methods in Molecular Biology, vol. 352: Protein Engineering Protocols. Edited by: K. M. Arndt and K. M. Müller © Humana Press Inc., Totowa, NJ

Summary

Computational methods continue to facilitate efforts in protein design. Most of this work has focused on searching sequence space to identify one or a few sequences compatible with a given structure and functionality. Probabilistic computational methods provide information regarding the range of amino acid variability permitted by desired functional and structural constraints. Such methods may be used to guide the construction of both individual sequences and combinatorial libraries of proteins.

Key Words: De novo protein design; combinatorial libraries; computational protein design; biased codons.

1. Introduction

1.1. Protein Design

Through attempts to design protein structures, including those having particular functions, researchers can refine the understanding we have of the forces and effects that specify the properties of the folded state. In addition, control over the design of particular folded state structures will likely lead to new synthetic proteins having the efficiency and specificity of biological proteins. Such applications include therapeutics, sensors, catalysts, and materials. The successful design of proteins is possible even without a complete quantitative understanding of all the forces involved in specifying their structures.

Designing proteins is nontrivial, however, because of both their complexity and the subtlety of the interactions that specify the folded state. Proteins are large (tens to hundreds of amino acid residues), and many structural variables specify the folded state, including sequence, backbone topology, and side-chain conformations. Each residue may have multiple conformations, even if the backbone
structure is specified. In addition to this structural complexity, there is also sequence complexity. Design involves identifying folding sequences from the enormous ensemble of possible sequences. This search is guided by the large degree of “consistency” observed in folded proteins (1). On average, a folded protein is atomically well-packed with favorable van der Waals interactions, hydrophobic residues are sequestered from solvent, and most hydrogen-bonding interactions are satisfied. However, this consistency is often complex and may have little simplifying symmetry. In addition, such noncovalent interactions are some of the most difficult to quantify accurately, and estimating free energies associated with mutation or structural ordering remains a subtle area of computational research (2,3). Despite the predictive power of detailed simulation methods for estimating free energy differences, we cannot at present expect to use them to determine the relative stability changes of large numbers of sequences. Nonetheless, molecular potentials derived from small molecules and from the protein structure database do contain partial information regarding the interactions and forces known to be important for specifying and stabilizing protein structures. In some cases, the optimization of such potentials has led to striking successes in protein design (4). Such potentials are necessarily approximate, and any sequence so designed is likely sensitive to the particular potential and target structure used. Alternatively, the partial information contained in these potentials may be used in a probabilistic manner to yield the likelihoods of the amino acids.
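One simple way to picture how an approximate potential can yield amino acid likelihoods is to Boltzmann-weight per-residue energies at a single site. The sketch below is an illustration only and is not presented as the method of this chapter: the energies are invented, the `site_probabilities` helper and the effective temperature `kT` are hypothetical, and each site is treated as independent of all others.

```python
import math

# Hypothetical energies (kcal/mol) for six residues at one buried site.
# These values are invented for illustration; they come from no real potential.
site_energies = {
    "LEU": -2.0,
    "ILE": -1.8,
    "VAL": -1.5,
    "ALA": -0.5,
    "SER": 0.5,
    "ASP": 2.0,
}


def site_probabilities(energies, kT=0.6):
    """Convert per-residue energies into normalized Boltzmann probabilities.

    kT is an assumed effective temperature in the same units as the energies.
    """
    # Shift by the minimum energy so the exponentials stay numerically stable.
    e_min = min(energies.values())
    weights = {aa: math.exp(-(e - e_min) / kT) for aa, e in energies.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


probs = site_probabilities(site_energies)

# Print residues from most to least probable.
for aa, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.3f}")
```

Under these assumptions, lowering `kT` concentrates the probability on the lowest-energy residue, which approaches the choice of a single optimized sequence, whereas raising `kT` spreads probability over more residues, as might be wanted when specifying a combinatorial library.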
A probabilistic approach is also appropriate for characterizing the full variability of sequences that may fold to a common structure, because there are likely to be an enormous number of such sequences, far more than can be addressed via sequence search or enumeration. Such probabilistic approaches are also particularly appropriate for de novo protein design in the context of combinatorial protein experiments, which create and rapidly assay many sequences. Although combinatorial methods can address very large numbers of sequences (10^4–10^12), these numbers are still infinitesimal compared with the numbers of possible protein sequences, e.g., 20^100 ≈ 10^130 for a 100-residue protein. Thus, even with combinatorial methods, we still must focus on selected regions of sequence space. This is most often accomplished by preselecting a few residue sites within the protein by inspection and allowing full or partial variability at these sites. Recently, computational methods have been developed that can keep track of a much wider range of sequence variability and provide quantitative methods for winnowing and focusing the sequence space. Herein, we discuss computational methods for sequence design with an emphasis on probabilistic methods that address the site-specific amino acid variability for a given structure.

1.2. Directed Methods of Protein Design

Here, “directed protein design” refers to the identification of a sequence (or a set of sequences) likely to fold to a predetermined backbone structure. Each