19.06.2015 Views

Auto-generating optimized CUDA for stencil ... - FEniCS Project

Auto-generating optimized CUDA for stencil ... - FEniCS Project

Auto-generating optimized CUDA for stencil ... - FEniCS Project

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

! Per<strong>for</strong>mance(tuning(parameters(<br />

! HighSlevel(interface(to(lowSlevel(<br />

hardware(specific(optimizations(<br />

(<br />

1. ForSloop(clauses(((<br />

› handle(data(decomposition(and(thread(<br />

management(<br />

› nest((),(tile((),(chunksize(()(<br />

2. (Compiler(flags(<strong>for</strong>(data(locality(<br />

(<br />

› Register:(Sregister(<br />

› SoftwareSmanaged(memory:(Sshared(<br />

› Cache:(SpreferL1(<br />

Device Memory<br />

Shared Memory/L1 cache<br />

Register File<br />

13<br />

#pragma mint copy(U,toDevice,(n+2),(m+2),(k+2))<br />

(n+2),(m+2),(k+2))!<br />

#pragma mint copy(Unew,toDevice,(n+2),(m+2),(k+2))<br />

(n+2),(m+2),(k+2))!<br />

#pragma mint parallel<br />

{<br />

Data<br />

while( t++ < T ){<br />

Transfers<br />

#pragma mint <strong>for</strong> nest(all) tile(16,16,64) chunksize(1,1,64)<br />

<strong>for</strong> (int z=1; z

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!