Ab initio molecular dynamics: Theory and Implementation

are the improved load balancing for the Fourier transforms and the bigger data packages in the matrix transposes. The number of plane waves in the row groups (N_PW^pr) is calculated as the sum over all local plane waves in the corresponding column groups.

    MODULE Density
      rho(1:N_x^pr, 1:N_y, 1:N_z) = 0
      FOR i = 1:N_b, 2*P_c
        CALL ParallelTranspose(c(:,i), colgrp)
        scr1(1:N_x, 1:N_r^pencil) = 0
        FOR j = 1:N_PW^pr
          scr1(ipg(1,j), mapxy(ipg(2,j), ipg(3,j))) = c(j,i) + I*c(j,i+1)
          scr1(img(1,j), mapxy(img(2,j), img(3,j))) = CONJG[c(j,i) + I*c(j,i+1)]
        END
        CALL ParallelFFT3D("INV", scr1, scr2, rowgrp)
        rho(1:N_x^pr, 1:N_y, 1:N_z) = rho(1:N_x^pr, 1:N_y, 1:N_z)
                                    + REAL[scr2(1:N_x^pr, 1:N_y, 1:N_z)]**2
                                    + IMAG[scr2(1:N_x^pr, 1:N_y, 1:N_z)]**2
      END
      CALL GlobalSum(rho, colgrp)

The use of two task groups in the example shown in Fig. 13 increases the speedup on 256 processors from 120 to 184 on a Cray T3E/600 computer.

The effect of the non-scalability of the global communication used in CPMD is shown in Fig. 14. This example shows the percentage of time spent in the global communication routines (global sums and broadcasts) and the time spent in the parallel Fourier transforms for a system of 64 silicon atoms with an energy cutoff of 12 Rydberg. It can clearly be seen that the global sums and broadcasts do not scale and therefore become more important the more processors are used. The Fourier transforms, on the other hand, scale nicely over this range of processors. Where the communication becomes dominant depends on the size of the system and on the ratio of communication to CPU performance.

Finally, the memory available on each processor may become a bottleneck for large computations. The replicated-data approach adopted for some arrays in the implementation of the code limits the system size that can be processed on a given type of computer.
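The inner loop of the algorithm above relies on a standard trick: because each orbital is real in real space, two bands can be packed into a single complex array (c(j,i) + I*c(j,i+1)) and transformed with one inverse FFT, after which the real and imaginary parts recover the two bands. A minimal serial sketch in Python/NumPy follows; the function names, the use of full FFT grids instead of distributed pencils, and the assumption of an even band count are illustrative choices, not the actual CPMD implementation:

```python
import numpy as np

def invfft_two_real_bands(cg1, cg2):
    """Inverse-transform two real-valued bands with a single complex FFT.

    cg1, cg2: complex plane-wave coefficient grids satisfying the
    Hermitian symmetry c(-G) = conj(c(G)), so that each band is real
    in real space. Packing c = cg1 + i*cg2 into one grid, a single
    inverse FFT returns band 1 in the real part and band 2 in the
    imaginary part.
    """
    scr = np.fft.ifftn(cg1 + 1j * cg2)
    return scr.real, scr.imag

def accumulate_density(coeff_grids):
    """rho(r) = sum_i |psi_i(r)|^2, processing bands two at a time.

    Assumes an even number of bands, mirroring the stride-2 band loop
    in the pseudocode above.
    """
    rho = np.zeros(coeff_grids[0].shape)
    for i in range(0, len(coeff_grids), 2):
        psi_a, psi_b = invfft_two_real_bands(coeff_grids[i], coeff_grids[i + 1])
        rho += psi_a**2 + psi_b**2
    return rho

# Demo: build Hermitian-symmetric coefficients from two real test fields.
rng = np.random.default_rng(0)
f1, f2 = rng.random((8, 8, 8)), rng.random((8, 8, 8))
c1, c2 = np.fft.fftn(f1), np.fft.fftn(f2)
r1, r2 = invfft_two_real_bands(c1, c2)
assert np.allclose(r1, f1) and np.allclose(r2, f2)
```

In the parallel version, the inverse FFT and the accumulation act only on each task group's local pencils and x-planes, and the partial densities are combined by the final global sum over the column groups.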
In the outline given in this chapter there are two types of arrays that scale quadratically with system size and are replicated: the overlap matrix of the projectors with the wavefunctions (fnl) and the overlap matrices of the wavefunctions themselves (smat). The fnl matrix is involved in two types of calculations, where the parallel loop runs either over the bands or over the projectors. To avoid communication, two copies of the array are kept on each processor; each copy holds the data needed in one of the two distribution patterns. This scheme needs only a small adaptation of the code described above.

The distribution of the overlap matrices (smat) causes some more problems. In
