
[Figure: parallel scaling of CASINO 2.6 and CASINO 2.8 on JaguarPF with a fixed target population per core, compared with ideal linear scaling. The vertical axis is [CPU time (2592 cores) / CPU time (N cores)] x 2592; the horizontal axis is the number N of processor cores, up to 1.2e+05.]

The largest calculations that have been done were by MDT on up to 524288 cores of Japan's K computer, where a similar scaling was achieved.

Note, however, that perfect linear scaling may require that the combination of your hardware and MPI implementation is capable of genuinely asynchronous non-blocking MPI, i.e., that commands like MPI_ISEND actually do what they are supposed to (in some MPI implementations this functionality is 'faked'). Understanding the extent to which this is true requires further study.
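
As an illustration of what 'genuinely asynchronous' means here, the minimal C sketch below (not taken from the casino source; the buffer size and dummy work loop are arbitrary) posts a non-blocking send, performs unrelated computation, and only then waits on the request. With true asynchronous progress the message is transferred while the compute loop runs; with a 'faked' implementation the data only move inside MPI_Wait.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        static double buf[N], rbuf[N];
        double work = 0.0;
        int rank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        if (nprocs < 2) MPI_Abort(MPI_COMM_WORLD, 1);

        MPI_Request req = MPI_REQUEST_NULL;
        if (rank == 0)       /* post a non-blocking send and return at once */
            MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        else if (rank == 1)  /* matching non-blocking receive               */
            MPI_Irecv(rbuf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

        /* Unrelated computation: with genuine asynchronous progress the
           message transfer overlaps with this loop.                      */
        for (long i = 1; i <= 100000000L; i++)
            work += 1.0 / (double)i;

        if (rank <= 1)
            MPI_Wait(&req, MPI_STATUS_IGNORE);

        if (rank == 0)
            printf("work = %f\n", work);
        MPI_Finalize();
        return 0;
    }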

39 OpenMP support<br />

39.1 Introduction<br />

In addition to MPI, casino also has a preliminary implementation of OpenMP, which is currently considered experimental; further development depends on the outcome of testing that is now under way.

It is believed that the top-performance computing systems of this decade (2010–2019), which should reach the exaflop scale, will have processors with a hierarchical architecture due to limitations in the amount of power that can reasonably be delivered to and dissipated from each processing unit [115]. It is likely that the different levels in the hierarchy will require multiple simultaneous approaches to parallelism, with an OpenMP-like level parallelizing across the cores in one or a few CPUs and an MPI-like level parallelizing across the entire system.

For a pure-MPI QMC calculation with P processors, the total computation time t is roughly given by t ≈ MCt_c/P, where M is the number of steps, C is the number of configurations and t_c is the average time to move one configuration at each step. However, on very large computers one can be in a situation where the desired C and P are such that P > C, which means that there will be nodes with no configurations in them (and thus idle), which is a waste of resources.
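
For illustration (with made-up numbers): if the target population is C = 2048 configurations but the job is run on P = 8192 MPI processes, only 2048 processes hold a configuration and the remaining 6144 sit idle, wasting three quarters of the machine.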

The second level of parallelism becomes useful precisely when P > C: running multiple OpenMP threads on the cores within each node makes it possible to keep C small while still using all the available cores, effectively reducing t_c in the cost formula above.
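
As a purely illustrative sketch of such a hybrid scheme (in C, with a hypothetical move_electron routine and made-up sizes; this is not taken from the casino source), each MPI process owns a few configurations, and the per-electron work within a single configuration move is shared among OpenMP threads, which is what reduces t_c:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define NELEC 256          /* electrons per configuration (made up)      */
    #define NCONF_PER_RANK 4   /* configurations owned by each MPI process   */

    /* Stand-in for the per-electron work of one configuration-move step. */
    static double move_electron(int iconf, int ielec)
    {
        return 1.0e-6 * (double)(iconf * NELEC + ielec);
    }

    int main(int argc, char **argv)
    {
        int provided, rank;
        /* Threads live inside each MPI process, so threaded MPI is requested. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0;
        for (int iconf = 0; iconf < NCONF_PER_RANK; iconf++) {
            /* Second level of parallelism: the work for ONE configuration
               move is split across OpenMP threads, reducing t_c.          */
            #pragma omp parallel for reduction(+:local)
            for (int ielec = 0; ielec < NELEC; ielec++)
                local += move_electron(iconf, ielec);
        }

        /* First level of parallelism: combine results across MPI processes. */
        double total;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("total = %f\n", total);

        MPI_Finalize();
        return 0;
    }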
