Transition Pathways in Complex Systems: Application of the Finite ...

Transition Pathways in Complex Systems: Application of the Finite ... Transition Pathways in Complex Systems: Application of the Finite ...

from math.princeton.edu More from this publisher

10.07.2015 Views

Page 2 and 3: AbstractThe finite-temperature stri
Page 4 and 5: transitions in such systems has bec
Page 6 and 7: II.THEORETICAL BACKGROUNDWe begin w
Page 8 and 9: to do so, it is helpful to consider
Page 10 and 11: III.FINITE-TEMPERATURE STRING METHO
Page 12 and 13: andˆn(α) ‖ ϕ ′ (α). (19)Thi
Page 14 and 15: dynamics on each hyperplane. The pa
Page 16 and 17: obtained two-dimensional potential
Page 18 and 19: θ is approximately constant along
Page 20 and 21: inferred that the reaction coordina
Page 22 and 23: APPENDIXWe derive the formula (12)
Page 24 and 25: LIST OF FIGURESFIG. 1: Transition t
Page 26 and 27: Γ zABFIG. 1:26
Page 28 and 29: FIG. 3:28
Page 30 and 31: FIG. 5:30
Page 32 and 33: 180’S 2S 3C 7eqS 190C 7eqψ0C ax
Page 34 and 35: 43P(p Cax)2100 0.2 0.4 0.6 0.8 1p C
Page 36 and 37: FIG. 11:36
Page 38 and 39: 1.5P(pαR)1.00.5000.20.4pαR0.60.81

AbstractThe finite-temperature string method proposed by E, Ren and Vanden-Eijnden is a very effectiveway of identifying transition mechanisms and transition rates between metastable states insystems with complex energy landscapes. In this paper, we discuss the theoretical backgroundand algorithmic details of the finite-temperature string method, as well as application to the studyof isomerization reaction of the alanine dipeptide, both in vacuum and in explicit solvent.Wedemonstrate that the method allows us to identify directly the iso-committor surfaces, which areapproximated by hyperplanes, in the region of configuration space where the most probable transitiontrajectories are concentrated. These results are verified subsequently by computing directlythe committor distribution on the hyperplanes that define the transition state region.PACS numbers: 2.50.-r, 02.60.-x, 82.20.Pm2

I. INTRODUCTIONThis paper is a continuation of the works presented in Refs. 1 and 2 on the study oftransition pathways and transition rates between metastable states in complex systems. Inthe present paper, we will focus on the finite-temperature string method (FTS), presentedin Ref. 2. We will discuss the theoretical background behind FTS, the algorithmic detailsand implementation issues, and we will demonstrate the success and difficulties of FTS forthe example of conformation changes of an alanine dipeptide molecule, both in vacuum andwith explicit solvent.The study of transition between metastable states has been a topic of great interestin many areas for many years. For systems with simple energy landscapes in which themetastable states are separated by a few isolated barriers, there exists a solid theoreticalfoundation as well as satisfactory computational tools for identifying the transition pathwaysand transition rates. The key objects in this case are the transition states, which are saddlepoints on the potential energy landscape that separate the metastable states. The relevantnotion for the transition pathways is that of minimum energy paths (MEP). MEPs arepaths in configuration space that connects the metastable states along which the potentialforce is parallel to the tangent vector. MEP allows us to identify the relevant saddle pointswhich act as bottlenecks for a particular transition. Several computational methods havebeen developed for finding the MEPs. Most successful among these methods are the nudgedelastic band (NEB) method 3 and the zero-temperature string method (ZTS). 1 Once the MEPis obtained, transition rates can be computed using several strategies (see for example Refs.1 and 4). Alternatively, one may look directly for the saddle point, and several methodshave been developed for this purpose, including the conjugate peak refinement method, 18the ridge method 19 and the dimer method. 20The situation is quite different for systems with rough energy landscapes, as is the case fortypical chemical reactions of solvated systems. In this case, traditional notions of transitionstates have to be reconsidered since there may not exist specific microscopic configurationsthat identify the bottleneck of the transition. Instead the potential energy landscape typicallycontains numerous saddle points, most of which are separated by barriers that are lessthan or comparable to k B T , and therefore do not act as barriers. There is not a unique mostprobable path for the transition. Instead, a collection of paths are important. 21 Describing3

transitions in such systems has become a central issue in recent years.The standard practice for identifying transition pathways and transition rates in complexsystems is to coarse-grain the system using a pre-determined “reaction coordinate,” usuallyselected on an empirical basis. The free energy landscape associated with this reaction coordinateis then calculated, using for example the blue-sampling technique, 5 and the transitionstates are identified as the saddle points in the free energy landscape.Such a procedure based on intuitive notions of reaction coordinates has been both helpfuland misleading. For simple enough systems, for which a lot is known about the mechanismsof the transition, this procedure allows one to make use of and further refine the existingknowledge. But as has been pointed out by Chandler and collaborators, 6,7 if little is knownabout the mechanism of the transition, a poor choice of the reaction coordinate can lead towrong predictions for the transition mechanism. Furthermore, intuitively reasonable coarsegrainedvariables may not be good reaction coordinates. 7In view of this, it is helpful to make the notion of reaction coordinates more precise inorder to place our discussions on a firm basis. There is indeed a distinguished reactioncoordinate, defined with the help of the backward Kolmogorov equation, which specifies, ateach point of the configuration space or phase space, the probability that the reaction willsucceed if the system is initiated at that particular configuration or phase space location(see the discussion in section II). All other “reaction coordinates” are approximate, andthe quality of the approximation can in principle be quantified. This particular notion ofreaction coordinate is crucial for our discussion, since it identifies the so-called iso-committorsurfaces, which is a central object in our study.There are several ways of characterizing this distinguished reaction coordinate that identifiesthe iso-committor surfaces. Besides being the solution of the backward Kolmogorovequation, it also satisfies a variational principle, as we discuss below. Nevertheless, it is oftenimpractical to find this reaction coordinate, since it is an object that may depend on verymany variables. The backward Kolmogorov equation is an equation in a huge dimensionalspace. What we are really interested in, after all, are the ensemble of transition trajectories.This point of view has been emphasized by Chandler and co-workers, in their developmentof the transition path sampling technique (TPS). 6,7Our viewpoint is consistent with that of TPS, but there is a crucial difference: TPS viewsthe transition path ensemble in the space of trajectories parameterized by the physical time,4

whereas we view the transition path ensemble in the configuration space with parameterizationchosen for the convenience of computations. As we show below, this latter viewpointoffers a number of advantages, the most important of which is that in configuration space, thetransition path ensemble can be identified by looking at the equilibrium distribution of thesystem restricted on the family of the iso-committor surfaces that separate the metastablesets.We assume that the transition path ensemble is localized in the configuration space, i.e.they form isolated tubes. We call them transition tubes. Our primary objective is to identifythe transition tubes as well as the iso-committor surfaces within the tubes. By focusingon the transition tubes, we reduce a problem in large dimensional space, the backwardKolmogorov equation, to a large coupled system in one dimensional space, along the tube.With this understanding, we can easily appreciate the basic principles behind the finitetemperaturestring method. Consistent with the localization assumption, we approximatethe iso-committor surfaces within the tubes by hyperplanes. We then adopt the variationalcharacterization of the iso-committor surfaces (see (10)), but we restrict the trial functions toa family of hyperplanes parametrized by a curve that passes through the center of mass (withrespect to the equilibrium distribution) on each plane. We call this curve the string. Thefinite-temperature string method is a way of finding the family of minimizing hyperplanes.It is worth noting that in the case when the energy landscape is simple and smooth,the transition tube reduces to MEP in the zero-temperature limit. At the same time, thefinite-temperature string method reduces to the zero-temperature string method. 1This paper is organized as follows. In section II, we recall some theoretical backgroundfor describing transition in complex systems. In section III we discuss the finite-temperaturestring method, its foundation and implementation. FTS is then used in section IV to studythe conformation changes of the alanine dipeptide, first in vacuum (section IV A) and thenin explicit solvent (section IV B). In each case we identify the transition tube using FTS,analyze the reaction coordinates and transition states. This is possible since FTS gives theiso-committor surfaces near the transition tube.5

II.THEORETICAL BACKGROUNDWe begin with some theoretical background. To simplify the presentation, we will focuson the case when the system of interest obeys over-damped friction dynamics:γẋ = −∇V (x) + √ 2γβ −1 η (1)where V is the potential energy of the system, γ is the friction coefficient, η is a whitenoise forcing, and β = 1/k B T is the inverse temperature. More general forms of Langevindynamics are considered in Ref. 9, where it is shown that the mechanism of transition isunaffected by the value of the friction coefficient at least under the assumption that themechanism of transition can be described within the configuration space alone. The systemdescribed by (1) has a standard equilibrium distributionρ e (x) = Z −1 e −βV (x) (2)in the configuration space Ω, where Z = ∫ Ω e−βV (x) dx is a normalization constant. We willassume that there are two metastable states described by two subsets, A and B, of theconfiguration space Ω respectively, and we will refer to A and B as being the metastablesets. For simplicity, we will assume that there are no other metastable sets, i.e.∫ρ e (x)dx ≈ 1 (3)A∪BWe are interested in characterizing the mechanism of transition between the metastablesets A and B. In the simplest situation when the energy landscape is smooth and themetastable states are separated by a few isolated barriers, this is usually done by identifyingthe transition states, which are saddle points on the potential energy landscape. The mostprobable path for the transition is the so-called minimum energy path which is the pathin the configuration space such that the potential force is parallel to the tangent along thepath. 1,3As has been discussed earlier, 2,6 these concepts are no longer appropriate if the energylandscape is rough with many saddle points, most of which are separated by potential energybarriers that are less than or comparable to k B T , and therefore do not act as barriers forthe transition. In this case, the notion of transition states has to be replaced by a transitionstate ensemble which is a probability distribution of states. 8,96

A. Transition path ensemble in trajectory spaceHow do we characterize such an ensemble and the most probable transition paths? Toaddress this question, Chandler et al proposed studying the ensemble of reactive trajectories,which are trajectories that successfully make the transition, parametrized by the physicaltime.By assigning appropriate probability densities to these trajectories, Monte Carloprocedures can be developed to sample the reactive trajectories. Indeed this is the basicidea behind transition path sampling 6,7 (see also Ref. 10 for an alternative technique tosample reactive trajectories). The efficiency gain of TPS over standard molecular dynamicscomes from the fact that reactive trajectories are much shorter than the time it takes betweensuccessive transitions. Therefore, TPS allows one to sample much more reactive trajectoriesthan in a standard molecular dynamics run. Each reactive trajectory can be post-processedto determine the committor value of the points along this trajectory, i.e. the probabilitythat the reaction will succeed if initiated at that particular point. The collection of pointswith committor value close to 1 2form the transition state ensemble.TPS in principle gives a systematic procedure for studying reactive paths in complexsystems. In practice, however, it has several difficulties. First of all, harvesting enoughreactive paths for statistical analysis might be very demanding.This task is not madeeasier by the fact the paths are parametrized by physical time, and these trajectories can bequite long compared to the time-step used in the simulations. In addition, the ensemble ofreactive trajectories does not give directly the main object of interest, the transition stateensemble, and it requires a very non-trivial post-processing step to analyze the reactive pathsin order to extract the transition state ensemble. As explained before, this post-processingstep involves launching trajectories at each point along each reactive path to determine thecommittor value of that point, and this can be quite tedious indeed. 11B. Transition path ensemble in configuration spaceAn alternative viewpoint was proposed in Ref. 2 in connection with the finite-temperaturestring method. Its theoretical background was explained in Refs. 8 and 9. Instead ofconsidering the ensemble of dynamic trajectories parametrized by the physical time, oneviews the transition paths in the configuration space. To understand why it is advantageous7

to do so, it is helpful to consider first a distinguished reaction coordinate q(x) defined bythe solutions of the backward Kolmogorov equation:−∇V · ∇q + β −1 ∆q = 0, q| x∈A = 0, q| x∈B = 1, (4)The solution to this equation has a very simple probabilistic interpretation: 12 q(x) is theprobability that a trajectory initiated at x reaches the metastable set B before it reaches theset A. In other words, in terms of the reaction from A to B, q(x) gives the probability thatthis reaction will succeed if it has already made it to the point x.The level sets (or iso-surfaces) of q, Γ z= {x : q(x) = z}, z ∈ [0, 1], define the isocommittorsurfaces: For any fixed z, trajectories initiated at points on Γ z have the sameprobability to reach B before reaching A.The iso-committor surfaces allow one to define the transition path ensemble in the configurationspace in a different way, by restricting the equilibrium distribution (2) on thesesurfaces (see figure 1). To see how this is done, consider an arbitrary surface S in the configurationspace and an arbitrary point x on S. The probability density of hitting S at x byany typical trajectory isρ S (x) = Z −1S e−βV (x) (5)where the normalization factor Z S is the following integral over the surface S: Z S =∫S e−βV (x) dσ(x), where dσ(x) is the surface element on S. (Note that ρ S (x) is a densitythat is attached to and lives in S, and that is why Z S is not equal to the normalizationfactor Z in (2)).On the other hand, if we consider only reactive trajectories, then thisprobability density changes to˜ρ S (x) =˜Z−1S q(x)(1 − q(x))e−βV , (6)where ˜Z S = ∫ q(x)(1 − S q(x))e−βV dσ(x). This can be seen as follows (see also Ref. 9). ˜ρ S (x)is equal to the probability density that a trajectory (whether reactive or not) hits S at x,times the probability that the trajectory came from A in the past and the probability that itreaches B before reaching A in the future. Using the strong Markov property and statisticaltime reversibility, this latter quantity is given by q(x)(1 − q(x)), and this gives (6).If S is an iso-committor surface, we see that ˜ρ S = ρ S , i.e., the probability of hitting aniso-committor surface at x by a hopping trajectory is simply the equilibrium distributionrestricted on the surface.8

Let ρ z (x) = ρ S (x) when S = Γ z , the iso-committor surfaces labeled by z ∈ [0, 1]. We cannow define the transition path ensemble as the one-parameter family of probability densities{ρ z (x), z ∈ [0, 1]} on the iso-committor surfaces. The discussion above tells us that ρ z (x)gives the target distribution when reactive trajectories hit the surface Γ z .C. Transition tubes in configuration spaceIn principle the set on which the transition path ensemble concentrates can be a verycomplicated set. In the following, we will assume the distribution ρ z (x) is singly peaked oneach iso-committor surface, and consequently the set becomes a localized tube. The casewhen ρ z is peaked at several isolated locations and hence there are several isolated tubes issimilar. The tube can be characterized by its centerline and width. To be more precise, thecenterline is defined asϕ(z) = 〈x〉 Γz (7)where the average is with respect to ρ z (x) which is the restricted equilibrium distribution onΓ z . The width of the tube can be characterized by the eigenvalues of the covariance matrixC(z) = 〈(x − ϕ(z)) ⊗ (x − ϕ(z))〉 Γz . (8)The tube can be parameterized by other parameters, e.g. the normalized arc-length of thecenterline.Remark. The transition tube can also be characterized by the probability that the reactivetrajectories stay inside the tube. To be more precise, we proceed as follows. Fix a positivenumber p ∈ (0, 1). Let c(z) be the subset of Γ z with smallest volume such that∫∫e −βV dσ(x) = p e −βV dσ(x), (9)c(z)q(x)=zThe union of the c(z) for z ∈ [0, 1] defines the effective transition tube.The above discussion establishes a most remarkable fact, namely if considered in configurationspace, the ensemble of reactive trajectories can be characterized by equilibrium distributionson the iso-committor surfaces. This is the basic idea behind the finite-temperaturestring method. It is easy to see that if the potential is smooth, then in the zero-temperaturelimit, the transition path ensemble or the transition tube reduces to the minimum energypath.9

III.FINITE-TEMPERATURE STRING METHODA. DerivationFor numerical purpose we will label the iso-committor surfaces by α ∈ [0, 1] which is anincreasing function of z, e.g. the normalized arc-length of the centerline of the transitiontube. We make the following two assumptions:1. The transition tube is thin compared to the local radius of curvature of the centerline.2. The iso-committor surfaces can be approximated by hyperplanes within the transitiontube.The precise form of the second assumption is given in (30). By this assumption we avoidthe intersection of the hyperplanes inside the transition tube.Our objective is to find the transition tube as well as the reaction coordinate q(x) whoselevel sets are the iso-committor surfaces. For this purpose it is useful to recall a variationalformulation of q(x): As the solution of (4), q(x) has an equivalent characterization as beingthe minimizer of∫I =Ω\(A∪B)subject to the boundary conditions in (4).e −βV |∇q| 2 dx. (10)In general this variational problem is difficult to solve since it is on a large dimensionalspace. But under the assumptions above, we can make an approximation by restricting thetrial functions to functions whose level sets are hyperplanes.We will represent the oneparameterfamily of hyperplanes, {P α , α ∈ [0, 1]}, by their unit normal ˆn(α) and the meanposition in the plane with respect to the equilibrium distribution restricted to P α , i.e.∫Pϕ(α) = 〈x〉 Pα ≡ αxe −βV dσ(x)∫P αe −βV dσ(x) . (11)It was shown in Ref. 8 and 9 (see also the Appendix) that (10) reduces toI =∫ 10(f ′ (α)) 2 e −βF (α) |ˆn · ϕ ′ | −1 dα (12)under the assumptions described above. Here f(α) = q(ϕ(α)), the prime denotes the derivativewith respect to α, and F (α) is the free energy∫F (α) = −β −1 log e −βV (x) dσ(x). (13)P α10

The minimization condition for this functional is given by:ˆn(α) ‖ ϕ ′ (α) (14)where ϕ ′ (α) is the tangent vector of the string at parameter α, and∫ α0f(α) =eβF (α′) dα ′∫ 10 eβF (α′ )dα . (15)′Equation (14) says that the centerline ϕ(α) must be normal to the family of hyperplanes.A simple application of Laplace’s method on the integrals in (15) implies that the isocommittorsurface at α ∗ where f(α ∗ ) = 1 2roughly coincides with the surface on which Fattains maximum. Therefore the transition state ensemble can be characterized by either ofthe following:(i). F (α) is within k B T of its maximum value;(ii). f(α) ≈ 1 2 .We emphasize that for (i) to be true, the correct free energy has to be used, i.e. one hasto select the right reaction coordinate. We also note that the transition state region can bequite broad, if F or f changes slowly at α ∗ .A rigorous derivation of the conditions (14) and (15) can be found in Ref. 8. Heuristically,the minimum of the factor |ˆn · ϕ ′ | −1 in (12) is achieved under the condition (14) (we assumethe length of ϕ ′ (α) is a constant). Then the functional (12) reduces toI =∫ 10e −βF (α) (f ′ (α)) 2 dα (16)up to a constant. The minimizer of (16) is given by (15), which is also the solution to theone-dimensional version of the backward Kolmogorov equation:with boundary conditions f(0) = 0 and f(1) = 1.−F ′ (α)f ′ (α) + β −1 f ′′ (α) = 0 (17)B. ImplementationThe finite-temperature string method is an iterative method for solvingϕ(α) = 〈x〉 Pα , (18)11

andˆn(α) ‖ ϕ ′ (α). (19)This is done by seamlessly combining sampling on the hyperplanes with the dynamics of thehyperplanes. In (18) and (19), one of the equations can be viewed as a kinematic constraintand the other as the equilibrium condition. In the discussions above, we parametrizedthe planes according to (18), then (19) came out as the equilibrium condition. In theimplementation of FTS, particularly for systems with rough energy landscapes, we find itmore convenient to parametrize the planes according to (19) and set up an iterative procedureto solve (18) in order to find the transition tube. The object that we iterate upon is theso-called string, which is normal to the hyperplanes. At steady state, it coincides with thecenterline of the transition tube. At each iteration step n, we denote the mean string andassociated perpendicular hyperplanes by ϕ n (α) and Pαn respectively, where α ∈ [0, 1] is aparameterization of the string. The next iteration is given by:ϕ n+1 (α ′ ) = 〈x〉 P n α(20)where α ′ = g(α) is obtained by reparameterization in order to enforce a particular constrainton the parametrization (say, by normalized arc-length).However, due to the roughness of the energy landscape, using (20) directly may lead toan unstable numerical algorithm. This difficulty is overcome by adding a smoothing stepfor the mean string. The computation of ϕ n+1 therefore follows four steps:a. Calculate the mean position ˜ϕ(α) on Pα n by constrained dynamics;b. Smoothing ˜ϕ(α) gives ¯ϕ(α);c. Reparameterization of ¯ϕ(α) gives ϕ n+1 (α);d. Reinitialization of the sampling on the new hyperplanes.Notice that the smoothing step is used to accelerate convergence and avoid numerical instability.The details of each step are given below. A schematic of the computation procedureis shown in figure 2.12

a. Sampling the energy landscape around ϕ n . The computation of 〈x〉 P n αis done asfollows. On each hyperplane P n αa Langevin dynamicsẋ = −(∇V ) ⊥ (x) + √ 2β −1 η ⊥ (21)is carried out, where (·) ⊥ denotes the projection to the hyperplane Pα n , and η is a white-noiseforcing. The mean position of the system on P n α is calculated from (21):˜ϕ(α) =1T 1 − T 0∫ T1T 0x(t)dt (22)where T 0 is the relaxation time and T 1 is the total simulation time. Constrained moleculardynamics can be used as well to calculate the mean position.For systems with very rough energy landscapes, it is convenient to add a constrainingpotential that localizes the sampling in (21) to region close to the current mean string.This is again used to avoid numerical instability, but has no effect on the final result. Thefollowing potential is used in our calculation⎧0,⎪⎨r < cV c (x) = (r − d) −12 − 2((c − d)(r − d)) −6 , c ≤ r < d(23)⎪⎩∞, r ≥ d.Here r = |x − ϕ n |, ϕ n is the current mean string and c and d are two parameters. Under thisconstraining force, sampling in (21) is confined in a tube with radius d around the currentmean string. Moreover, the system experiences no constraining force in a even smaller tubewith radius c. The constraining force is gradually switched off as the computation converges.This is done by gradually increase the confining parameter c.b. Smoothing the mean string. The calculated mean string ˜ϕ(α) may not be verysmooth and this may make the tangent vector so inaccurate that it introduces numericalinstabilities. Therefore at each iteration, we smooth out ˜ϕ(α) by solving the followingoptimization problemmin¯ϕ(α)∫ 10(C 1 κ(α) + C 2 || ¯ϕ(α) − ˜ϕ(α)|| M ) dα. (24)Here κ(α) is the curvature of ¯ϕ(α), ||x|| M = (x, Mx) 1/2 is the L 2 norm of x weightedby M = (C(α) + I) −1 where C(α) is the covariance matrix of x generated by constrained13

dynamics on each hyperplane. The parameters C 1 , C 2 ∈ [0, 1] determine the relative weightsof the two terms. As two extreme cases, the solution of (24) is a line segment when C 1 = 1,C 2 = 0 and the solution ¯ϕ(α) = ˜ϕ(α) when C 1 = 0, C 2 = 1.The minimization problem (24) can be solved by the steepest descent method, or moreadvanced techniques, e.g. the quasi-Newton (BFGS) method.c. Reparameterization. At the third step of each iteration, the mean string ¯ϕ(α) isreparameterized, for example, by normalized arc-length, to give ϕ n+1 (α). The reparameterizationis done by polynomial interpolation. 1d. Reinitialization. A new collection of hyperplanes P n+1αwhich are orthogonal toϕ n+1 (α) are obtained after the reparameterization step. To start the sampling process onthe new planes, we need to generate the initial configurations for the Langevin dynamics(21). A naive way to generate the initial configurations is to project the realizations onP n αn+1from the previous sampling process onto the new planes Pα. But unfortunately thisprocedure usually result in abrupt changes in the molecular structure and thus very largepotential force. In the case that the artificial constraining potential (23) is used, the directprojection becomes even worse since it is possible that the projected realizations are out ofthe tube defined by the constraining potential and thus have infinite force.In our implementation, this difficulty is overcome by inserting a few intermediate hyperplanes,denoted by { ˜P j , j = 0, 1, · · · , J}, in between P n αn+1and Pα, and each time weproject the realization from ˜P j to ˜P j+1 , followed by a few steps of relaxation on ˜P j+1 . This isanalogous to choosing small time steps for numerical stability when solving ODEs or PDEs.Here ˜P 0 = Pα n, ˜P J = Pαn+1 , and { ˜P j , j = 1, · · · , J − 1} are obtained by linear interpolationbetween P n α and P n+1α, i.e. the unit normals are given by((1 − j J )ˆn0 + j )J ˆnJ , j = 1, 2, · · · J − 1 (25)ˆn j = cwhere ˆn 0 and ˆn J are the unit normals to P n α and P n+1αfactor. The interpolated plane ˜P j goes through φ j where φ j is given byrespectively, and c is the normalizationφ j = (1 − j J )φ0 + j J φJ , j = 1, 2, · · · J − 1 (26)where φ 0 = ϕ n α and φJ = ϕ n+1α . On each interpolated hyperplane ˜P j , the constraining force(23) is centered at φ j .14

For each α, we move the realization x on P n αj = 1, 2, · · · , J, we first project x from ˜P j−1 to ˜P j :to Pn+1α in J steps. At the j-th step,x := x − ((x − φ j ) · ˆn j )ˆn j . (27)Then we relax the projected configuration on ˜P j for a short time T 2 (a few time steps):ẋ(t) = −(∇V ) ⊥,j (x) + √ 2β −1 η ⊥,j , 0 ≤ t ≤ T 2 (28)where (∇V ) ⊥,j = ∇V − (∇V, ˆn j )ˆn j and V (x) includes the potential of the system and theconstraining potential. Similar projection is done for η.In the following application to the alanine dipeptide, the parameter T 2 is taken to be afew time steps for (28), and J = 10.Sampling on the new hyperplanes P n+1 begins after we obtain the new set of realizations,and the above four-step procedure repeats.When the iteration converges with satisfactory accuracy, the mean string gives the centerlineof the tube, the variance of x(α) on the hyperplanes give the width of the transitiontube, and the function f(α) in (15) gives the committor value of each plane. To obtain f(α),the free energy (13) must be computed, which can be done by calculating the mean forceby sampling via (21) on the final planes, and thermodynamic integration.IV.APPLICATION TO THE ALANINE DIPEPTIDEThe ball and stick model of the alanine dipeptide is shown in figure 3. Despite its simplechemical structure, it exhibits some of the important features common to biomolecules, withthe backbone degree of freedoms (dihedral angles φ and ψ), three methyl groups (CH 3 ), aswell as the polar groups (N-H, C=O) which form hydrogen bonds with water in aqueoussolution. The isomerization of alanine dipeptide has been the subject of several theoreticaland computational studies and therefore serves as an excellent test for FTS.Apostolakis et al calculated the transition pathways and barriers for the isomerizationof alanine dipeptide. 13 In their work, a two-dimensional potential of mean force on the φ-ψ plane was first calculated by the adaptive umbrella sampling scheme. The conjugatepeak refinement algorithm and targeted molecular dynamics technique were applied on the15

obtained two-dimensional potential to obtain the transition pathways. As a result, themechanism of the transition is pre-assumed to depend only on the two torsion angles φ andψ. Bolhuis et al applied TPS to study the isomerization of alanine dipeptide in vacuum andin solution. 14 Their calculation used all-atom representation for the molecule and explicitsolvent model. An ensemble of transition pathways was collected, from which the transitionstate ensemble was found by determining the committor for each configurations visited by thetrajectories. Their analysis shows that more degrees of freedom than the two torsion angleare necessary to describe the reaction coordinates. Recently, Ma and Dinner introduced astatistical method for identifying reaction coordinates from a database of candidate physicalvariables, and applied their method to study the isomerization of alanine dipeptide. 11 In thefollowing, we apply FTS to study the transition of the alanine dipeptide both in vacuumand in explicit solvent.A. Alanine dipeptide in vacuumWe use the full atomic representation of the alanine dipeptide molecule with theCHARMM22 force field. 15 We have not used the CMAP correction to the dihedral anglepotentials 17 since it is unclear how the CMAP correction influences the dynamics of proteins.Figure 4 shows the adiabatic energy landscape of the molecule as a function of the twobackbone dihedral angles φ and ψ. There are two stable conformers C 7eq and C ax . The stateC 7eq is further split into two sub-states (denoted by C 7eq and C ′ 7eq in figure 4) separated bya small barrier.Normally, one needs to identify the initial and final states before FTS can be used.This case is different since there is some periodicity in the system. The initial string (andthe realizations on each hyperplane) is obtained by rotating the two backbone dihedralangles of the dipeptide along the diagonal line from (−180 ◦ , 180 ◦ ) to (180 ◦ , −180 ◦ ), withall other internal degrees of freedom kept fixed. Only non-hydrogen atoms are representedon the string. The string is discretized using 40 points and correspondingly 40 hyperplanesare evolved in the computation. At each iteration, time averaging is performed for 0.5psfollowing the dynamics described in (21) with T = 272K. A hard-core constraining potentialis added to restrict the sampling to a tube with width 0.2Å (c = 0.2 and d = 1 in (23))around the string. This constraint is gradually removed as the string converges to the steady16

state. The parameters used in smoothing the string are C 1 = C 2 = 1.To fix the overall rotation and translation of the molecule, we constrained one Carbonatom at the origin, one neighboring Carbon atom at the positive x-axis, and one Nitrogenatom in the xy-plane by harmonic springs. The computation converges after about 60-70iterations. It takes about several-hour CPU time on a single processor PC.The converged string and the tube are shown in figure 5. This tube is represented byprojecting onto the φ-ψ plane the ensemble of the realizations on each hyperplane correspondingto the converged string. As we discussed earlier, this tube should coincide withthe tube obtained from the most probable transition paths between the stable conformersC ax and C 7eq . The free energy F along the string is shown in figure 6 (upper panel). The freeenergy has three local maxima S 1 , S 2 and S 3 , corresponding to two transition pathways fromC ax to C 7eq : (a) C ax - S 1 - C ′ 7eq - S 2 - C 7eq ; and (b) C ax - S 3 - C 7eq . The transition at S 1 andS 3 is quite sharp. The energy difference of C ax relative to C 7eq is about E = 2.5 kcal/mol.The energy barrier is about ∆E = 7.0 kcal/mol at S 1 , and about ∆E = 7.5 kcal/mol atS 3 . Path (a) goes through an intermediate metastable state C ′ 7eq around (−150 ◦ , 170 ◦ ). Butthe energy barrier from C ′ 7eq to C 7eq is comparable to k B T and it has little effect on thetransition from C ax to C 7eq .Shown in the lower panel of figure 6 is the committor function f(α) to the state C ax forthe pathway C 7eq - S 3 - C ax calculated using the formula (15). The transition state regionis identified as the hyperplane labeled by α ⋆ that satisfies f(α ⋆ ) ≈ 1 . This plane is close to2the one with maximum free energy between C 7eq and C axFigure 7 shows the projection of the level curves of the equilibrium probability density onthe hyperplanes corresponding to the local maxima of the free energy. We see that at S 1 andS 2 , the projected transition states are localized on lines which are roughly perpendicular tothe projected mean path (dotted line), which suggests that the two backbone dihedral anglesmay parametrize reasonably well the reaction coordinate q(x) for this transition. However,the projected transition states at S 3 spread out in φ-ψ plane and are no longer concentratedon the line perpendicular to the projected mean path. This indicates that in addition toφ and ψ, there are other degree of freedoms that are important for this transition. In Ref.14, the calculation by TPS shows that an additional torsion angle θ plays an importantrole in this transition, and the angles φ and θ provides a good set of coarse variables toparametrize the reaction coordinate q(x). However, our calculation shows that the angle17

θ is approximately constant along the transition tube, and the projected transition stateson φ-θ plane is still quite broad and far from being concentrated on the line perpendicularto the projected mean string (see figure 8). Therefore, we conclude that φ and θ are notsufficient to parametrize the reaction coordinate q(x). This discrepancy may be due to thedifferent force fields used in the two calculations: We used the CHARMM22 force field, andthe authors in Ref. 14 used AMBER.To see that we indeed obtained the right transition tube and transition state ensemble,we computed the committor distribution of the plane P α ⋆for which f(α ⋆ ) ≈ 1 2by initiatingtrajectories from this plane (this plane corresponds to the transition state region S 3 ). Recallthat in our calculation the string is discretized to a collection of points parametrized by{α i }. In general α ⋆ in figure 6 is not in this discrete set. To compensate for this, wecompute the committor distribution for the two neighboring hyperplanes, one on each sideof α ⋆ . We randomly picked 2490 points based on the equilibrium distribution on each ofthe two planes. Starting from each of these points, 5000 trajectories were launched, fromwhich the committor distribution P Caxor the probability that the trajectory goes to C ax iscalculated. Figure 9 shows the distribution of P Cax . The distribution is peaked at P Cax = 0.3and P Cax= 0.7 respectively. This confirms our prediction that the transition state regionmust be in between these two hyperplanes.We next refined the string and added one more plane between the two neighboring planesdiscussed above. The committor distribution of the new hyperplane is shown in figure 10.The peak is located around 1 2indicating that we indeed obtained the right transition stateensemble. The small deviation of the peak from 1 2is due to the approximation to the transitionstate surface by hyperplanes, and also to sampling errors in estimating the committorvalues. In figure 11, we plot four of the trajectories initiated from S 1 . They stay well withinthe transition tube that we obtained by FTS. It is interesting to note that computing thecommittor distribution of P α ⋆to confirm the result of FTS is much more expensive thanobtaining P α ⋆by FTS.B. Alanine dipeptide in explicit solventThe system consists of a full atom representation of an alanine dipeptide molecule modeledwith the CHARMM22 force field, and 198 water molecules in a periodic cubic box.18

The temperature of the solvated system is at 298.15 K. The side length of the cubic box is18.4Å. Ewald summation was used to treat the long-range Coulomb force.Only the backbone atoms on the dipeptide are represented on the string. Other atomson the dipeptide as well as the water molecules are not included in the string, but theycontribute to the force field as well as the width of the transition tube. For the initial string,we rotated ψ from −180 ◦ to 180 ◦ while keeping φ fixed at 50 ◦ . The string is discretizedusing 30 points in the configuration space, and correspondingly 30 hyperplanes are evolved.To remove the rigid body translation and rotations for the dipeptide, we used a frameof reference which moves with the dipeptide. At each iteration, a constrained Langevindynamics is carried out for 2 ps on each hyperplane in order to calculate the new position ofthe string. A hard core constraining potential is applied to restrict the Langevin dynamicsto a neighborhood of width 0.4Å (c = 0.4 and d = 1 in (23)) of the string. The parametersused in smoothing the string are C 1 = C 2 = 1. The computation converged to a steadystate at 45 iterations of 40,000 dynamics steps. The mean string was subsequently fixed, theconstraining potential was removed, and the distributions in the hyperplanes were sampled.Figure 12 shows the converged transition tube on the φ-ψ plane. The background of thefigure shows iso-contours of the free energy as a function of the φ-ψ variables. The freeenergy was obtained from 50 adaptive umbrella sampling 22 simulations of 100 ps at 300 Kusing the CHARMM program. 16 The biasing potential of those simulations was describedby a combination of 12 harmonic functions and a constant in each dimension. Compared tothe alanine dipeptide in vacuum, the angle ψ of C 7eq is shifted to a higher value by about90 ◦ . In addition, the helical metastable state has appeared. This state has a significantπ-helix contribution in the CHARMM22 potential without the CMAP correction. 17 Thereare two transition path ensembles from C 7eq to α R : (a) C 7eq - S 1 - α R ; and (b) C 7eq - S 2 -α R .The projected hyperplanes onto the φ − ψ plane concentrate on the lines perpendicularto the projected string. Therefore, the two backbone dihedral angles, φ and ψ, are qualifiedas a set of coarse variables to parametrize the reaction coordinate. Note that the transitiontube is curved and deviates from vertical lines, therefore both φ and ψ are necessary todescribe the transition dynamics, but they may not be sufficient. This is consistent with theconclusion in Ref. 14, where Bolhuis and coauthors calculated the committor distributionfor configurations with ψ = 60 ◦ . A uniform distribution was obtained from which the author19

inferred that the reaction coordinate includes other important degrees of freedom, e.g. thesolvent motion.To see whether the computed transition tube identifies with reasonable accuracy thetransition state region, we computed the committor distribution on the hyperplane wherethe committor estimate was closest to 1/2. This is the plane centered around (−96 ◦ , 51.5 ◦ )in the φ-ψ map. We initiated 100 trajectories from each of 896 points sampled according tothe equilibrium distribution restricted to the hyperplane. We used finite-friction Langevindynamics with a friction coefficient of 20 ps −1 as implemented in the CHARMM program. 16The trajectory was stopped once the ψ angle was larger than 100 ◦ or smaller than −20 ◦ andthe trajectory was counted as committed to the extended or the helical basin respectively.Due to the very long commitment times required by these trajectories, we limited ourselvesto a smaller ensemble than in the vacuum study. The calculated committor distributionis shown in figure 13. The distribution is peaked around 1 , but it is less peaked than the2one obtained in vacuum. The average value of the committor distribution is 0.50 and thestandard deviation is 0.21. One component of the distribution width is due to the limitedsampling of the ensemble by only 100 trajectories per initial condition. The limited samplingby only 100 trajectories per initial condition contributes a standard deviation of 0.05 to theresults of sampling a sharp 0.5 committor iso-surface. The remaining part of the error isdue to the approximation of the hyper-surfaces by hyperplanes and due to the assumptionthat the isocommittors can be described on the peptide backbone degrees of freedom.Figure 14 shows four typical reactive trajectories. These trajectories have equal probabilityto fall into C 7eq or α R , and they follow the transition tube quite well. Comparedto the behavior in vacuum, the dynamics of the solvated system is quite diffusive due tothe collisions with water molecules and the lack of a high energetic barrier in the transitionregion.V. CONCLUSIONSIn this paper FTS was successfully applied to study the transition events of the alaninedipeptide in vacuum and in explicit solvent. The results of FTS were justified by launchingtrajectories from the transition state region and the calculation of the committor distributionsto confirm that FTS allows one indeed to find iso-committor surfaces, in particular the20

one with committor value 1 corresponding to the transition state region.2It is illuminating to make a comparison between the philosophies behind TPS and FTS.TPS and FTS are after the same objects, namely the ensemble of transition paths and transitionstates. Their key difference is in the way that the transition paths are parameterized.TPS considers transition paths in the space of physical trajectories, parametrized by thereal time. FTS considers transition paths in configuration space, parametrized, for example,by the arc-length of the centerline of the transition tube. This different parametrizationgives FTS considerable advantage. First, the ensemble of transition paths can be characterizedby equilibrium densities, if we consider where they hit the iso-committor surfaces.Secondly, this allows us to give a direct link between the transition path ensemble and thetransition state ensemble. In addition, it also allows us to develop sampling procedures bywhich a collection of paths are harvested at the same time, instead of one by one as in TPS.In other words FTS allows one to bypass completely the calculation of reactive trajectoriesparametrized by time and directly obtain the iso-committor surfaces and the transitiontubes which are more relevant to describe the mechanism of the transition.To conclude, FTS is a powerful tool for determining effective transition pathways incomplex systems with multiscale energy landscapes. It does not require specifying a reactioncoordinate beforehand. It allows us to determine the hyperplanes which are approximationsto the iso-committor surfaces in configuration space by evolving a smooth curve. The smoothcurve converges to the center of a tube by which transitions occur with high probability.AcknowledgmentsThe authors thank M. Karplus for helpful discussions. E and Vanden-Eijnden was partiallysupported by ONR grants N00014-01-1-0674 and N00014-04-1-0565, and NSF grantsDMS01-01439, DMS02-09959 and DMS02-39625. PM acknowledges support by the MarieCurie European Fellowship grant number MEIF-CT-2003-501953. The explicit water calculationswere performed in the Crimson Grid cluster at Harvard, the SDSC teragrid and theCIMS computer center.21

APPENDIXWe derive the formula (12) from (10):∫I = |∇q| 2 e −βV dxΩ ′ ∫ 1= |∇q|∫Ω 2 e −βV δ(q(x) − z)dzdx′ 0∫ 1 ∫= dz |∇q| 2 e −βV δ(q(x) − z)dx0 Ω∫ ′ 1 ∫= f ′ (α)dα |∇q|e −βV δ(ˆn · (x − ϕ(α)))dx0Ω∫ ′ 1∫= f ′ (α)|∇q(ϕ)|dα e −βV δ(ˆn · (x − ϕ(α)))dx0Ω ′=∫ 10f ′ (α) 2 (ˆn · ϕ ′ ) −1 e −βF (α) dα.Here we denote Ω\(A ∪ B) by Ω ′ . To go from the third to the fourth line, we made theassumption that the iso-committor surface is locally planar with normal ˆn, and definedf(α) = q(ϕ(α)). To go from the fourth to the fifth line, we used the following assumption〈(ˆn ′ · (x − ϕ)) 2 〉 Pα ≪ (ˆn · ϕ ′ ) 2 (30)where the average is with respect to the equilibrium distribution restricted to P α . In the laststep, we defined the free energy F (α) as in (13) and used f ′ (α) = ∇q(ϕ)·ϕ ′ = |∇q(ϕ)|(ˆn·ϕ ′ ).(29)∗Electronic address: weiqing@math.pinceton.edu† Electronic address: eve2@cims.nyu.edu‡ Electronic address: plm@tammy.harvard.edu§ Electronic address: weinan@math.princeton.edu1 W. E, W. Ren, and E. Vanden-Eijnden, Phys. Rev. B 66, 052301 (2002)2 W. E, W. Ren, and E. Vanden-Eijnden, J. Phys. Chem. 109, 6688 (2005)3 H. Jónsson, G. Mills, and K. W. Jacobsen, Nudged elastic band method for finding minimumenergy paths of transitions. In: Classical and Quantum Dynamics in Condensed Phase Simulations.Edited by: Berne, B. J.; Ciccoti, G.; Coker, D. F. World Scientific, 1998.4 W. E, W. Ren, and E. Vanden-Eijnden, unpublished22

5 E. A. Carter, G. Ciccotti, J. T. Hynes, and R. Kapral, Chem. Phys. Lett. 156, 472 (1989)6 P. G. Bolhuis, D. Chandler, C. Dellago, and P. Geissler, Ann. Rev. Phys. Chem. 59, 291 (2002)7 C. Dellago, P. G. Bolhuis, and P. L. Geissler, Adv. Chem. Phys. 123 (2002)8 W. E and E. Vanden-Eijnden, Metastability, conformation dynamics, and transition pathwaysin complex system, in: Multiscale Modeling and Simulation, S. Attinger and P. Koumoutsakoseds. LNCSE 39, Springer, 2004.9 W. E, W. Ren, and E. Vanden-Eijnden, submitted to Chem. Phys. Lett.10 R. Olender and R. Elber, J. Chem. Phys. 105, 9299 (1996)11 A. Ma and A. Dinner, J. Phys. Chem. 109, 6769 (2005)12 C. W. Gardiner, Handbook of stochastic methods for physics, chemistry, and the natural sciences.Springer, 1997.13 J. Apostolakis, P. Ferrara, and A. Caflisch, J. Chem. Phys. 10, 2099 (1999)14 P. G. Bolhuis, C. Dellago, and D. Chandler, Proc. Natl. Acad. Sci. 97, 5877 (2000)15 A. D. MacKerell, D. Bashford, M. Bellott, R. L. Dunbrack, J. D. Evanseck, M. J. Field, S.Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau,C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher, B. Roux, M.Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin,and M. Karplus, J. Phys. Chem. B 102, 3586-3616 (1998)16 B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus,J. Comp. Chem. 4, 187-217 (1983)17 M. Feig, A. D. MacKerell, and C. L. Brooks, J. Phys. Chem. B 107, 2831 (2003)18 S. Fischer, and M. Karplus, Chem. Phys. Lett. 194, 252-261 (1992)19 I. V. Ionova, and E. A. Carter, J. Chem. Phys. 98, 6377-6386 (1993)20 G. Henkelman, and H. Jónsson, J. Chem. Phys. 111, 7010-7022 (1999)21 L. R. Pratt, J. Chem. Phys. 85, 5045-5048 (1986)22 C. Bartels, and M. Karplus, J. Comput. Chem. 18, 1450-1462 (1997)23

LIST OF FIGURESFIG. 1: Transition tube between the metastable sets A and B.Also shown are the isocommittorsurfaces Γ z .FIG. 2: Schematic of the computation procedure. Top figure (1st step): Sampling the hyperplanePα n . The empty circles are the mean positions on the hyperplanes; Second figure(2nd step): Smoothing ˜ϕ(α) to obtain ¯ϕ(α); Third figure (3rd step): Reparameterizationof ¯ϕ(α) to obtain ϕ n+1 (α); Last figure (4th step): Reinitializing the samplingprocess on P n+1 . The realizations on P n (empty circles) are moved to P n+1 (emptydiamonds).FIG. 3: Schematic representation of the alanine dipeptide (CH 3 -CONH-CHCH 3 -CONH-CH 3 ).The backbone dihedral angles are labeled by φ: C-N-C-C and ψ: N-C-C-N. The pictureis taken from Ref. 14.FIG. 4: Adiabatic energy landscape of the alanine dipeptide. The energy landscape is obtainedby minimizing the potential energy of the molecule with (φ, ψ) fixed. The contoursare drawn at multiples of 1 kcal/mol above the C 7eq minimum. There are two stableconformers C 7eq and C ax .FIG. 5: The transition tube between C 7eq and C ax . Two pathways exist: (a)C ax - S 1 - C ′ 7eq- S 2 - C 7eq ; and (b): C ax - S 3 - C 7eq . The dotted line is the path of (φ, ψ) along themean string. The converged hyperplanes across the points denoted by diamonds areapproximations to the transition state surfaces.FIG. 6: Top figure: Free energy of the alanine dipeptide along the transition tube shown infigure 5; Lower figure: Committor f(α) along the path C 7eq - S 3 - C ax . The transitionstate ensemble for the transition C 7eq - S 3 - C ax is on the hyperplane labeled by α ⋆determined by f(α ⋆ ) = 1 2 .Note that this plane is also very close to the one withmaximum free energy between C 7eq and C ax .FIG. 7: Transition state ensemble projected onto φ-ψ plane. The level curves show the densityof the realizations on the hyperplanes corresponding to the local maxima of the freeenergy. These planes have committor value f(α) very close 1 2for the correspondingtransition.24

FIG. 8: Transition tube projected onto φ-θ plane. The level curves show the density of thetransition states at S 3 .FIG. 9: Committor distribution for the transition state region S 3 . Two histograms correspondto the two hyperplanes with index α directly above and below α ⋆ , where f(α ⋆ ) = 1.2FIG. 10: Committor distribution for the refined hyperplane with committor value 1.2FIG. 11: Four trajectories initiated from the transition state region S 1 . The trajectories haveequal probabilities to go to C 7eq and C ax . The trajectories stay in the transition tubeobtained by string method.FIG. 12: Transition tube between C 7eq and α R projected onto the φ-ψ plane. The contour plotshows the potential of mean force of the φ-ψ angles. The iso-contours are separated by0.5 kcal/mol. The thick black points that are connected with lines show the locationof the mean string.FIG. 13: Committor distribution function for the transition state region S 1 which is the planecentered around (−96 ◦ , 51.5 ◦ ) in the φ-ψ map.FIG. 14: Four typical trajectories initiated from the transition state region S 1 . The trajectorieshave equal probability to go to C 7eq and α R , and they follow the transition tube.25

Γ zABFIG. 1:26

~ϕAϕ nBAϕ _ϕ nBϕ n+1Aϕ nBϕ n+1Aϕ nBFIG. 2:27

FIG. 3:28

180C 7eq’90C 7eqψ0C ax−90−180−180 −90 0 90 180φFIG. 4:29

FIG. 5:30

8S 3’S 1S 276E (kcal/mol)5432C ax10C 7eq0 0.2 0.4 0.6 0.8 1αC 7eqC 7eq10.80.6f(α)0.40.200 0.1 0.2 0.3 α* 0.4αFIG. 6:31

180’S 2S 3C 7eqS 190C 7eqψ0C ax−90−180−180 −90 0 90 180φFIG. 7:32

FIG. 8:33

43P(p Cax)2100 0.2 0.4 0.6 0.8 1p Cax43P(p Cax)2100 0.2 0.4 0.6 0.8 1p CaxFIG. 9:34

32P(p Cax)100 0.2 0.4 0.6 0.8 1p CaxFIG. 10:35

FIG. 11:36

FIG. 12:37

1.5P(pαR)1.00.5000.20.4pαR0.60.81FIG. 13:38

FIG. 14:39

Transition Pathways in Complex Systems: Application of the Finite ...

Transition Pathways in Complex Systems: Application of the Finite ... ... View more Transition Pathways in Complex Systems: Application of the Finite ...

Delete template?

Save as template ?

Transition Pathways in Complex Systems: Application of the Finite ... Transition Pathways in Complex Systems: Application of the Finite ...