Auto-generating optimized CUDA for stencil ... - FEniCS Project
Performance tuning parameters
High-level interface to low-level hardware-specific optimizations

1. For-loop clauses
› handle data decomposition and thread management
› nest(), tile(), chunksize()
2. Compiler flags for data locality
› Register: -register
› Software-managed memory: -shared
› Cache: -preferL1
[Figure: memory hierarchy levels: Device Memory, Shared Memory/L1 cache, Register File]
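The for-loop clauses above can be read as a work decomposition. The sketch below is plain C, not Mint output, and it assumes the usual reading of the clauses: a tile is the block of points handled by one thread block, a chunk is the set of points handled by one thread, so the thread-block shape is tile divided by chunksize. The type `dim3i` and all three function names are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type (not part of Mint or CUDA). */
typedef struct { int x, y, z; } dim3i;

/* Thread-block shape implied by tile/chunksize, assuming one chunk per thread. */
dim3i threads_per_block(dim3i tile, dim3i chunk)
{
    assert(chunk.x > 0 && chunk.y > 0 && chunk.z > 0);
    assert(tile.x % chunk.x == 0 && tile.y % chunk.y == 0 && tile.z % chunk.z == 0);
    dim3i t = { tile.x / chunk.x, tile.y / chunk.y, tile.z / chunk.z };
    return t;
}

/* Number of tiles needed to cover an n.x * n.y * n.z iteration space. */
dim3i blocks_in_grid(dim3i n, dim3i tile)
{
    dim3i b = { (n.x + tile.x - 1) / tile.x,
                (n.y + tile.y - 1) / tile.y,
                (n.z + tile.z - 1) / tile.z };
    return b;
}

/* Walk the decomposition serially on the CPU and count visits per point.
   Returns 1 if every point is visited exactly once, 0 otherwise. */
int covers_exactly_once(dim3i n, dim3i tile, dim3i chunk)
{
    dim3i tpb = threads_per_block(tile, chunk);
    dim3i grid = blocks_in_grid(n, tile);
    size_t total = (size_t)n.x * n.y * n.z;
    unsigned char *visits = calloc(total, 1);
    if (visits == NULL) return 0;

    /* b* = tile (thread block) index, t* = thread index inside the tile */
    for (int bz = 0; bz < grid.z; bz++)
    for (int by = 0; by < grid.y; by++)
    for (int bx = 0; bx < grid.x; bx++)
    for (int tz = 0; tz < tpb.z; tz++)
    for (int ty = 0; ty < tpb.y; ty++)
    for (int tx = 0; tx < tpb.x; tx++) {
        /* c* = point index inside the chunk owned by this thread */
        for (int cz = 0; cz < chunk.z; cz++)
        for (int cy = 0; cy < chunk.y; cy++)
        for (int cx = 0; cx < chunk.x; cx++) {
            int x = bx * tile.x + tx * chunk.x + cx;
            int y = by * tile.y + ty * chunk.y + cy;
            int z = bz * tile.z + tz * chunk.z + cz;
            /* partial tiles at the domain edge are guarded by a bounds check */
            if (x < n.x && y < n.y && z < n.z)
                visits[((size_t)z * n.y + y) * n.x + x]++;
        }
    }

    int ok = 1;
    for (size_t i = 0; i < total; i++)
        if (visits[i] != 1) ok = 0;
    free(visits);
    return ok;
}
```

Under this reading, tile(16,16,64) with chunksize(1,1,64) gives a 16 x 16 x 1 thread block in which each thread sweeps 64 points along z. A larger chunk along one axis trades thread count for more work, and more potential register reuse, per thread.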
/* Data transfers */
#pragma mint copy(U,toDevice,(n+2),(m+2),(k+2))
#pragma mint copy(Unew,toDevice,(n+2),(m+2),(k+2))
#pragma mint parallel
{
  while( t++ < T ){
    #pragma mint for nest(all) tile(16,16,64) chunksize(1,1,64)
    for (int z=1; z