14.09.2014 Views

CASINO manual - Theory of Condensed Matter


different processors interact via the population-control mechanism. The population on each processor fluctuates, and the cost of each time step is determined by the processor with the largest population. It is therefore necessary to even up the distribution of configurations between processors from time to time. Unfortunately, transferring configurations between processors can be costly, and is usually the principal limitation on the scaling of the DMC method with the number of processors. (There is also a small cost associated with the communication required to decide on a reference energy after each time step, but we assume this to be negligible henceforth.)

[Note that with the release of CASINO 2.8 in Feb 2011, it was shown that the effective cost of configuration transfers can be reduced to essentially zero by using fancy MPI tricks such as non-blocking sends and receives, so the following discussion is largely academic, but we retain it for completeness.]

38.3.2 Behaviour of the population on each processor

Let $T_{\rm redist}\tau$ be the redistribution period, i.e., we redistribute the configuration population after every $T_{\rm redist}$ time steps $\tau$. At any given time the population on a processor $p$ must be increasing or decreasing exponentially, because the mean energy $e_p$ of the configuration population on that processor is unlikely to be exactly equal to the reference energy $E_T$. We assume that $e_p - E_T$ remains roughly constant over the redistribution period, i.e., that the autocorrelation period is much longer than the redistribution period. At the start of the redistribution period the population $C_p(1)$ on each processor is the same. At the end of the redistribution period, the expected population on processor $p$ is $C_p(T_{\rm redist}) = C_p(1) \exp[-(e_p - E_T) T_{\rm redist} \tau]$. Hence $\bar{C}(T_{\rm redist}) \approx \bar{C}(1) \exp[-(\bar{E} - E_T) T_{\rm redist} \tau] + O(T_{\rm redist}^2 \tau^2)$, where the bar denotes an average over the processors, and so the average growth or decay of the population is the same as that of the entire population (which should be small, because $E_T$ is chosen so as to ensure this).
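The per-processor growth law above can be illustrated numerically. The following is only a sketch: the function name, the local energies, the time step and the starting population are hypothetical values chosen for illustration and are not taken from CASINO.

```python
import math

def expected_populations(c_start, local_energies, e_ref, t_redist, tau):
    """Expected population C_p(T_redist) = C_p(1) exp[-(e_p - E_T) T_redist tau]
    on each processor, assuming e_p - E_T stays constant over the period."""
    return [c_start * math.exp(-(e_p - e_ref) * t_redist * tau)
            for e_p in local_energies]

# Hypothetical inputs: four processors, equal starting populations.
local_energies = [-1.02, -1.00, -0.98, -1.00]   # e_p (illustrative only)
e_ref = -1.00                                   # reference energy E_T
c_start, t_redist, tau = 100.0, 10, 0.01

pops = expected_populations(c_start, local_energies, e_ref, t_redist, tau)

# Average over processors, compared with the growth implied by the mean energy.
mean_pop = sum(pops) / len(pops)
mean_energy = sum(local_energies) / len(local_energies)
estimate = c_start * math.exp(-(mean_energy - e_ref) * t_redist * tau)

print(pops)
print(mean_pop, estimate)
```

The processor with the lowest local energy ends the period with the largest population, while the processor-averaged population differs from the mean-energy estimate only at second order in $T_{\rm redist}\tau$.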

38.3.3 Optimal redistribution period<br />

Let $A$ be the cost of propagating a single configuration over one time step. Let $B$ be the cost of transferring a single configuration between processors.

Let $q$ be the processor with the largest number of configurations, i.e., with the lowest energy $e_q \equiv \min\{e_p\}$. Both the cost of propagating configurations and the cost of transferring configurations are determined by processor $q$. The expected number of configurations on processor $q$ at the end of the redistribution period (i.e., after $T_{\rm redist}$ time steps) is $\max\{C_p(T_{\rm redist})\} \approx \bar{C}(T_{\rm redist}) + cT_{\rm redist} + O(T_{\rm redist}^2)$, where $c = \bar{C}(1)(\bar{E} - \min\{e_p\})\tau$. NB, $\langle \bar{C}(1) \rangle = N_C/P$, where $N_C$ is the target population and $P$ is the number of processors, and $\langle c \rangle$ is a positive constant.
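The expression for $c$ follows from expanding the exponential growth law to first order in $T_{\rm redist}\tau$. The steps below are a reconstruction from the definitions in the text, using the fact that all processors start the period with the same population, $C_q(1) = \bar{C}(1)$:

```latex
\begin{align*}
C_q(T_{\rm redist})
  &= \bar{C}(1)\exp[-(e_q - E_T)T_{\rm redist}\tau]
   \approx \bar{C}(1)\,[1 - (e_q - E_T)T_{\rm redist}\tau], \\
\bar{C}(T_{\rm redist})
  &\approx \bar{C}(1)\,[1 - (\bar{E} - E_T)T_{\rm redist}\tau], \\
C_q(T_{\rm redist}) - \bar{C}(T_{\rm redist})
  &\approx \bar{C}(1)\,(\bar{E} - e_q)\,\tau\,T_{\rm redist}
   = c\,T_{\rm redist}.
\end{align*}
```

Since $e_q = \min\{e_p\} \le \bar{E}$, the coefficient $c$ is non-negative, consistent with processor $q$ holding the excess configurations.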

At the end of the redistribution period, $cT_{\rm redist}$ configurations are to be transferred from processor $q$, at an expected cost of $B\langle c \rangle T_{\rm redist}$. Spread over the $T_{\rm redist}$ time steps of the period, the average cost of transferring configurations per time step is therefore $B\langle c \rangle$, which is independent of $T_{\rm redist}$.

The average cost per time step of waiting for the processor $q$ with the greatest number of configurations to finish propagating all its excess configurations is

$$\frac{A\langle c \rangle\,[0 + 1 + \ldots + (T_{\rm redist} - 1)]}{T_{\rm redist}} = \frac{A\langle c \rangle (T_{\rm redist} - 1)}{2}. \tag{459}$$
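The arithmetic behind Eq. (459) can be checked directly: the excess on processor $q$ grows by roughly $\langle c \rangle$ configurations per step, so the waiting cost is an arithmetic series averaged over the period. The sketch below uses hypothetical integer values for $A$ and $\langle c \rangle$ so that the comparison is exact; the function names are illustrative only.

```python
from fractions import Fraction

def mean_waiting_cost(a, c, t_redist):
    """Average per-step waiting cost, summed explicitly:
    A*c*[0 + 1 + ... + (T_redist - 1)] / T_redist."""
    total = sum(a * c * k for k in range(t_redist))
    return Fraction(total, t_redist)

def closed_form(a, c, t_redist):
    """Closed form of the same quantity: A*c*(T_redist - 1)/2."""
    return Fraction(a * c * (t_redist - 1), 2)

# Hypothetical integer cost A and excess growth rate c, for exact arithmetic.
a, c = 3, 2
for t_redist in (1, 2, 5, 10):
    print(t_redist, mean_waiting_cost(a, c, t_redist), closed_form(a, c, t_redist))
```

For $T_{\rm redist} = 1$ the waiting cost vanishes, because the populations are evened up before any excess can accumulate.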

So the total average cost per time step in DMC is

$$T = \frac{AN_C}{P} + \frac{A\langle c \rangle (T_{\rm redist} - 1)}{2} + B\langle c \rangle. \tag{460}$$

Clearly the redistribution period should be chosen to be as small as possible to minimize $T$, since the only term that depends on $T_{\rm redist}$ grows linearly with it. Numerical tests confirm that increasing the redistribution period only acts to slow down calculations. One should therefore choose $T_{\rm redist} = 1$, i.e., redistribution should take place after every time step. We assume this to be the case henceforth (the previously existing keyword redist_period, which allowed the user to do otherwise, has in any case now been deleted).
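The total cost model can be evaluated for a few redistribution periods to see the trend. This is a sketch under assumed, purely illustrative values of $A$, $B$, $\langle c \rangle$, $N_C$ and $P$; the function name is hypothetical and the numbers are not CASINO timings.

```python
def cost_per_step(a, b, c_mean, n_config, n_proc, t_redist):
    """Total average cost per time step:
    A*N_C/P + A*<c>*(T_redist - 1)/2 + B*<c>."""
    propagate = a * n_config / n_proc          # balanced propagation cost
    waiting = a * c_mean * (t_redist - 1) / 2  # waiting for the fullest processor
    transfer = b * c_mean                      # independent of T_redist
    return propagate + waiting + transfer

# Hypothetical parameters, chosen only to illustrate the trend.
a, b, c_mean = 1.0, 5.0, 0.5
n_config, n_proc = 1000, 10

costs = {t: cost_per_step(a, b, c_mean, n_config, n_proc, t)
         for t in (1, 2, 5, 10)}
best = min(costs, key=costs.get)

for t, cost in costs.items():
    print(t, cost)
print("cheapest redistribution period:", best)
```

Because the transfer term does not depend on the period and the waiting term grows linearly with it, the smallest period is always the cheapest in this model, whatever values are assumed for the parameters.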

38.4 Scaling of the DMC algorithm with the number of processors

The cost $A$ of propagating each configuration scales as $N^\alpha$. For typical systems, where extended orbitals represented in a localized basis are used and the CPU time is dominated by the evaluation of
