12.07.2015

Ab initio molecular dynamics: Theory and Implementation

[Figure 13: Speedup versus Processors]

Figure 13. Maximal theoretical speedup for a calculation with a real space grid of dimension 100 (solid line). Effective speedup for a 32 water molecule system with an energy cutoff of 70 Rydberg and a real space grid of dimension 100 (dotted line with diamonds).

… limitation is related to poor load balancing, or the computation becomes dominated by the non-scaling part of the communication routines. Load-balancing problems in the CPMD code are almost exclusively due to the distribution of the real space arrays: only the x coordinate is distributed, and there are typically on the order of 100 grid points in each direction. Figure 13 shows the maximal theoretical speedup for a calculation with a real space grid of dimension 100. The steps are caused by the granularity of the problem: the grid dimension is an integer, so the x planes cannot be shared evenly among an arbitrary number of processors. No further speedup can be achieved once 100 processors are reached, because each processor then holds at most one plane. The second curve in Fig. 13 shows measurements with the full CPMD code and clearly demonstrates that the load-balancing problem in the Fourier transforms affects the performance of this particular example. Where these steps appear and how severe the performance losses are depends, of course, on the system under consideration.

To overcome this limitation, a method based on processor groups has been implemented in the code. For the two most important routines in which the real space grid load-balancing problem appears, the calculation of the charge density and the application of the local potential, a second level of parallelism is introduced. The processors are arranged in a two-dimensional grid, and groups are built according to the row and column indices. Each processor is a member of its column group (colgrp) and its row group (rowgrp).
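The group membership on a two-dimensional processor grid can be illustrated with a small sketch. It assumes a row-major placement of ranks (rank = row × ncols + col); the function name `processor_grid_groups` and this layout are hypothetical and not taken from CPMD. In an MPI program the same groups would typically be created with `MPI_Comm_split`, using the row index as colour for the row groups and the column index as colour for the column groups.

```python
def processor_grid_groups(nproc: int, nrows: int) -> dict:
    """Map every rank to its grid position and its two groups.

    Hypothetical layout: ranks are placed row-major on an
    nrows x ncols grid, so rank = row * ncols + col.
    """
    if nrows < 1 or nproc % nrows != 0:
        raise ValueError("nproc must be a positive multiple of nrows")
    ncols = nproc // nrows
    layout = {}
    for rank in range(nproc):
        row, col = divmod(rank, ncols)
        # rowgrp: all ranks sharing this row index
        rowgrp = [row * ncols + c for c in range(ncols)]
        # colgrp: all ranks sharing this column index
        colgrp = [r * ncols + col for r in range(nrows)]
        layout[rank] = {"row": row, "col": col,
                        "rowgrp": rowgrp, "colgrp": colgrp}
    return layout


if __name__ == "__main__":
    for rank, info in processor_grid_groups(6, 2).items():
        print(rank, info)
```

For 6 processors in 2 rows, rank 4 sits at row 1, column 1. Its row group is [3, 4, 5] and its column group is [1, 4]. Every processor therefore belongs to exactly one group of each kind.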
In a first step, a data exchange within the column group ensures that all the data needed to perform the Fourier transforms within the row groups are available. Each row group then performs its Fourier transforms independently, and finally another data exchange within the column groups restores the original data distribution. This scheme (shown in the pseudo code for the density calculation) needs roughly twice the amount of communication. Advantages …
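The staircase of the solid curve in Fig. 13 can be reproduced with a simple granularity model. If N x planes are distributed over P processors, the processor holding the most planes, ⌈N/P⌉, sets the run time, so the speedup cannot exceed N/⌈N/P⌉. The sketch below evaluates this bound. It also evaluates R·N/⌈N/C⌉ for R row groups of C processors, under illustrative assumptions that are not stated in the text: each row group handles an equal share of the transforms independently, and the extra communication of the group scheme is ignored. Both function names are hypothetical.

```python
def max_speedup(nplanes: int, nproc: int) -> float:
    """Upper bound on the speedup when nplanes x planes are split over
    nproc processors: the busiest processor holds ceil(nplanes/nproc)."""
    if nplanes < 1 or nproc < 1:
        raise ValueError("nplanes and nproc must be positive")
    busiest = (nplanes + nproc - 1) // nproc  # integer ceiling
    return nplanes / busiest


def max_speedup_grouped(nplanes: int, nrows: int, ncols: int) -> float:
    """Same bound for nrows row groups of ncols processors each.

    Illustrative assumptions: every row group transforms 1/nrows of the
    work independently, and communication cost is ignored.
    """
    return nrows * max_speedup(nplanes, ncols)


if __name__ == "__main__":
    for p in (25, 34, 50, 99, 100, 200):
        print(p, round(max_speedup(100, p), 2))
    print("2 x 100 grouped:", max_speedup_grouped(100, 2, 100))
```

For N = 100, the plain bound stays at 50 for every P from 50 to 99 and saturates at 100 for P ≥ 100. In the grouped model, 200 processors arranged as 2 row groups of 100 give a bound of 200. This grouped bound is optimistic, because the scheme roughly doubles the communication.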