Figure 14. Percentage of total CPU time spent in global communication routines (solid line) and in Fourier transform routines (dashed line) for a system of 64 silicon atoms on a Cray T3E/600 computer. (Axes: percentage versus number of processors.)

addition to the adaptation of the overlap routine, the matrix multiply routines needed for the orthogonalization step also have to be done in parallel. Although libraries for these tasks are available, the complexity of the code increases considerably.

3.9.5 Summary

Efficient parallel algorithms for the plane wave–pseudopotential density functional theory method exist. Implementations of these algorithms are available and were used in most of the large-scale applications presented at the end of this paper (Sect. 5). Depending on the size of the problem, excellent speedups can be achieved even on computers with several hundred processors. The limitations presented in the preceding paragraph are of importance for high-end applications. Together with the extensions presented, existing plane wave codes are also well suited for the next generation of supercomputers.

4 Advanced Techniques: Beyond . . .

4.1 Introduction

The discussion up to this point revolved essentially around the "basic" ab initio molecular dynamics methodologies. This means in particular that classical nuclei evolve in the electronic ground state in the microcanonical ensemble. This combination already allows a multitude of applications, but there are many circumstances where the underlying approximations are unsatisfactory. Among these cases are

