10.07.2015

Large scale and hybrid computing with CP2K - Prace Training Portal



GPU application benchmark

● >400 multiplications for 1 run.
● Additional thresholding in multiplications (fewer flops for the same data)
● This week's results... subject to change

20736 atoms (6912 water molecules), matrix dim 159000, on 576 XK6 nodes, ~60 matrix multiplications / iter.

XK6 without GPU : 1965s per iteration
XK6 with GPU : 924s per iteration
Speedup 2.12x

MPI performance (bandwidth) appears to be the bottleneck (e.g. 50% slowdown without custom rank reordering) :

● Still need to figure out MPI performance (incl. effectiveness of overlap).
● Is the dynamic linking still an issue ?
● Any interference between GPU+CPU ?
● One communication thread per node enough ?
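
The arithmetic behind the quoted speedup can be checked with a short Python sketch. The two timings and the count of about 60 multiplications per iteration come from the slide; the function names are hypothetical. The per-multiplication figure is only an upper bound on the average, assuming (unrealistically) that the whole iteration is spent in matrix multiplications.

```python
# Timings quoted on the slide (seconds per iteration, 576 XK6 nodes).
CPU_ONLY_S = 1965.0
WITH_GPU_S = 924.0
MULTS_PER_ITER = 60  # the slide says "~60", so this is approximate


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Ratio of the baseline time to the accelerated time."""
    return baseline_s / accelerated_s


def max_time_per_multiplication(iteration_s: float, n_mults: int) -> float:
    """Upper bound on the average seconds per multiplication.

    Assumes the entire iteration is spent multiplying matrices, which
    overestimates the true cost of a single multiplication.
    """
    return iteration_s / n_mults


if __name__ == "__main__":
    print(f"speedup: {speedup(CPU_ONLY_S, WITH_GPU_S):.3f}x")
    print(f"saved per iteration: {CPU_ONLY_S - WITH_GPU_S:.0f} s")
    for label, t in (("without GPU", CPU_ONLY_S), ("with GPU", WITH_GPU_S)):
        bound = max_time_per_multiplication(t, MULTS_PER_ITER)
        print(f"{label}: at most {bound:.2f} s per multiplication on average")
```

The unrounded ratio is slightly above 2.12, so the slide's figure appears to be truncated rather than rounded.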
