10.07.2015 Views

Large scale and hybrid computing with CP2K - Prace Training Portal

The GPU port of DBCSR (II)

Open Challenges:
● Only one process per node can connect to the GPU.
  Need a functional OMPed code as well.
● Work sharing between GPU and CPU not yet optimal.
  Have the CPU also do stacks of multiplications?
● Sending data panels to GPU not yet overlapping with any work.
  Could be started as soon as MPI completes. Maybe double buffering.
● GPU kernel optimized for 23x23 matrices (water molecule). We need many sizes.
  Needs an auto-tuning and auto-generating framework for the GPU.
  A strategy to deal with smaller blocks.
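
The work-sharing question on this slide (should the CPU also process stacks of multiplications?) can be illustrated with a minimal Python sketch. It is not DBCSR code and does not touch a GPU: the names `multiply_block` and `run_stacks`, the worker labels "gpu" and "cpu", and the layout of a stack as a list of `(ia, ib, ic)` index triples are all assumptions made for illustration. Both workers are ordinary threads pulling stacks from one shared queue, so whichever is free takes the next stack.

```python
import queue
import threading


def multiply_block(a, b, c):
    """Accumulate c += a * b for small dense blocks stored as lists of rows."""
    for i, a_row in enumerate(a):
        for k, a_ik in enumerate(a_row):
            for j, b_kj in enumerate(b[k]):
                c[i][j] += a_ik * b_kj


def run_stacks(stacks, a_blocks, b_blocks, c_blocks, workers=("gpu", "cpu")):
    """Process every stack exactly once, whichever worker grabs it first.

    Each stack is a list of (ia, ib, ic) triples (hypothetical layout) meaning
    c_blocks[ic] += a_blocks[ia] * b_blocks[ib].
    Returns the number of stacks each worker processed.
    """
    work = queue.Queue()
    for stack in stacks:
        work.put(stack)

    # One lock per C block, because two workers may target the same block.
    c_locks = [threading.Lock() for _ in c_blocks]
    processed = {name: 0 for name in workers}

    def worker(name):
        while True:
            try:
                stack = work.get_nowait()
            except queue.Empty:
                return  # no stacks left
            for ia, ib, ic in stack:
                with c_locks[ic]:
                    multiply_block(a_blocks[ia], b_blocks[ib], c_blocks[ic])
            processed[name] += 1  # each worker updates only its own counter

    threads = [threading.Thread(target=worker, args=(name,)) for name in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Pulling from a shared queue balances the load dynamically: a faster worker simply ends up taking more stacks, with no static split to tune. In a real hybrid code the "gpu" worker would launch a kernel per stack and the per-block locks would likely be replaced by a scheme that keeps the two devices off the same C blocks; those details are outside what this sketch shows.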
