over processors. All processors should hold approximately the same number of plane waves. If a plane wave for the wavefunction cutoff is on a certain processor, the same plane wave should be on the same processor for the density cutoff. The distribution of the plane waves should be such that at the beginning or end of a three-dimensional Fourier transform no additional communication is needed. To achieve all of these goals the following heuristic algorithm$^{137}$ is used. The plane waves are ordered into "pencils". Each pencil holds all plane waves with the same $g_y$ and $g_z$ components. The pencils are numbered according to the total number of plane waves they contain. Pencils are distributed over processors in a "round robin" fashion, switching directions after each round. This is first done for the wavefunction cutoff. For the density cutoff the distribution is carried over, and all new pencils are distributed according to the same algorithm. Experience shows that this algorithm gives good load balancing on both levels, the total number of plane waves and the total number of pencils. The number of pencils on a processor is proportional to the work for the first step of the three-dimensional Fourier transform.

Special care has to be taken for the processor that holds the $\mathbf{G} = 0$ component. This component has to be treated individually in the calculation of the overlaps. The processor that holds this component will be called $p_0$.

3.9.3 CPMD Program: Computational Kernels

There are three communication routines used most in the parallelization of the CPMD code. All of them are collective communication routines, meaning that all processors are involved. This also implies that synchronization steps are performed during the execution of these routines. Occasionally other communication routines have to be used (e.g. in the output routines for the collection of data), but they do not appear in the basic computational kernels. The three routines are the Broadcast, GlobalSum, and MatrixTranspose. In the Broadcast routine data is sent from one processor $p_x$ to all other processors

$$x_p \leftarrow x_{p_x} \; . \qquad (265)$$

In the GlobalSum routine a data item is replaced on each processor by the sum over this quantity on all processors

$$x_p \leftarrow \sum_p x_p \; . \qquad (266)$$

The MatrixTranspose changes the distribution pattern of a matrix, e.g. from row distribution to column distribution

$$x(p, :) \leftarrow x(:, p) \; . \qquad (267)$$

On a parallel computer with $P$ processors, a typical latency time $t_L$ (the time for the first data item to arrive) and a bandwidth $B$, the time spent in the communication routines for $N$ data items is approximately

Broadcast: $\log_2[P]\,\{t_L + N/B\}$
GlobalSum: $\log_2[P]\,\{t_L + N/B\}$
MatrixTranspose: $P\,t_L + N/(PB)$
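The round-robin pencil distribution described at the start of this section can be made concrete with a short sketch. The following C fragment is only an illustration under assumed data structures (the names Pencil and distribute_pencils are hypothetical, and CPMD's actual bookkeeping, written in Fortran, differs): pencils are sorted by descending plane-wave count and then dealt out in a "snake" order, 0, 1, ..., P-1, P-1, ..., 1, 0, 0, 1, ..., so that the largest pencils are spread evenly.

```c
#include <stdlib.h>

/* A pencil: all plane waves sharing the same (g_y, g_z). Only the
   per-pencil plane-wave count matters for the load balancing here. */
typedef struct { int id; int nwaves; } Pencil;

/* Sort pencils by descending number of plane waves. */
static int cmp_desc(const void *a, const void *b) {
    int na = ((const Pencil *)a)->nwaves, nb = ((const Pencil *)b)->nwaves;
    return (nb > na) - (nb < na);
}

/* Deal the sorted pencils out to P processors in a round robin that
   switches direction after each round; owner[id] receives the rank. */
void distribute_pencils(Pencil *pencils, int npencils, int P, int *owner) {
    qsort(pencils, npencils, sizeof(Pencil), cmp_desc);
    int p = 0, dir = +1;
    for (int i = 0; i < npencils; ++i) {
        owner[pencils[i].id] = p;
        if ((p == P - 1 && dir > 0) || (p == 0 && dir < 0))
            dir = -dir;          /* end of a round: reverse direction */
        else
            p += dir;
    }
}
```

For the density cutoff the same routine would be applied only to the pencils that are not already fixed by the wavefunction-cutoff distribution.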
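The three collectives map one-to-one onto standard MPI operations. The following C sketch shows one plausible realization; it is an assumption for illustration, not the actual CPMD wrappers.

```c
#include <mpi.h>

/* Broadcast, Eq. (265): processor px sends its copy of x to all others. */
void broadcast(double *x, int n, int px, MPI_Comm comm) {
    MPI_Bcast(x, n, MPI_DOUBLE, px, comm);
}

/* GlobalSum, Eq. (266): every processor ends up with the sum over all
   processors, accumulated in place. */
void global_sum(double *x, int n, MPI_Comm comm) {
    MPI_Allreduce(MPI_IN_PLACE, x, n, MPI_DOUBLE, MPI_SUM, comm);
}

/* MatrixTranspose, Eq. (267): switch a matrix from row to column
   distribution. Each processor exchanges one equal-sized block with
   every other; blocks must be packed contiguously in rank order. */
void matrix_transpose(const double *sendbuf, double *recvbuf,
                      int blocksize, MPI_Comm comm) {
    MPI_Alltoall(sendbuf, blocksize, MPI_DOUBLE,
                 recvbuf, blocksize, MPI_DOUBLE, comm);
}
```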
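Evaluating the timing model above makes the different character of the routines visible: Broadcast and GlobalSum move the full N items through log2[P] stages, whereas the MatrixTranspose moves only N/P items per processor but pays P latencies, so it becomes latency-bound for small messages. The numbers in the sketch below are invented for illustration only.

```c
#include <math.h>
#include <stdio.h>

/* Evaluate the communication-time model; compile with -lm. */
int main(void) {
    double P  = 128;    /* processors (illustrative)           */
    double tL = 1e-6;   /* latency in seconds (illustrative)   */
    double B  = 1e8;    /* bandwidth in items/s (illustrative) */
    double N  = 1e6;    /* number of data items (illustrative) */

    double t_bcast = log2(P) * (tL + N / B);  /* Broadcast       */
    double t_gsum  = log2(P) * (tL + N / B);  /* GlobalSum       */
    double t_trans = P * tL + N / (P * B);    /* MatrixTranspose */

    printf("Broadcast : %.3e s\n", t_bcast);
    printf("GlobalSum : %.3e s\n", t_gsum);
    printf("Transpose : %.3e s\n", t_trans);
    return 0;
}
```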
