19.06.2015

Auto-generating optimized CUDA for stencil ... - FEniCS Project


• Aims at programmer productivity and high performance
• Simplifies application development
• Based on a modest number of compiler directives
  › #pragma mint for
  › Incremental parallelization
• Abstracts away the programmer’s view of the hardware
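The following is a minimal sketch of incremental parallelization. It shows a single 2D five-point Jacobi sweep annotated with the two directives named on these slides, `#pragma mint parallel` and `#pragma mint for`. The grid size `N` and the function name `jacobi_sweep` are hypothetical, and the exact directive spelling should be checked against the Mint documentation. The C standard requires a compiler to ignore pragmas it does not recognize, so the annotated file still builds and runs as ordinary serial C. That property is what lets directives be added one loop nest at a time.

```c
#include <assert.h>

#define N 8 /* hypothetical grid size, including a one-cell boundary */

/* One Jacobi sweep of the 2D five-point stencil over the interior points.
 * The mint pragmas are hints for the Mint translator; an ordinary C
 * compiler ignores them, so this function also runs correctly as serial C. */
void jacobi_sweep(double out[N][N], double in[N][N])
{
    assert(out != in); /* Jacobi needs distinct input and output grids */
#pragma mint parallel
    {
#pragma mint for
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                out[i][j] = 0.25 * (in[i - 1][j] + in[i + 1][j] +
                                    in[i][j - 1] + in[i][j + 1]);
    }
}
```

A plain gcc or clang build may print an unknown-pragma warning. Calling `jacobi_sweep(b, a)` on a zeroed grid whose top row is 4.0 sets `b[1][j]` to 1.0 for every interior column `j`.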

[Figure: example applications (Seismic Modeling, Cardiac Simulation, Turbulent Flow) and a hardware diagram labeled Main Memory, L2, Mint, Device Memory]

• Source-to-source translator for Nvidia GPUs
  › Parallelizes loop nests
  › Relieves the programmer of a variety of tedious tasks
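Parallelizing a loop nest for a GPU means mapping each iteration to a thread, identified by its block and thread indices. The sketch below emulates that mapping sequentially in plain C for a 2D five-point stencil. It illustrates the general CUDA indexing scheme and is not Mint's generated code. The block sizes `BX`/`BY` and the names `thread_body`, `launch_emulated`, and `serial_sweep` are hypothetical.

```c
#include <assert.h>

#define N 11 /* hypothetical grid size, including a one-cell boundary */
#define BX 4 /* hypothetical thread-block width (blockDim.x) */
#define BY 4 /* hypothetical thread-block height (blockDim.y) */

/* Reference: the original serial loop nest. */
void serial_sweep(double out[N][N], double in[N][N])
{
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            out[i][j] = 0.25 * (in[i - 1][j] + in[i + 1][j] +
                                in[i][j - 1] + in[i][j + 1]);
}

/* What one GPU thread would do: derive its point from block and thread
 * indices (offset by 1 to skip the boundary), then update that point.
 * The guard discards threads of partially filled edge blocks. */
static void thread_body(double out[N][N], double in[N][N],
                        int bx, int by, int tx, int ty)
{
    int i = by * BY + ty + 1;
    int j = bx * BX + tx + 1;
    if (i < N - 1 && j < N - 1)
        out[i][j] = 0.25 * (in[i - 1][j] + in[i + 1][j] +
                            in[i][j - 1] + in[i][j + 1]);
}

/* Sequential stand-in for a kernel launch: the grid size uses ceiling
 * division so every interior point is covered by exactly one thread. */
void launch_emulated(double out[N][N], double in[N][N])
{
    const int interior = N - 2;
    const int grid_x = (interior + BX - 1) / BX;
    const int grid_y = (interior + BY - 1) / BY;
    for (int by = 0; by < grid_y; by++)
        for (int bx = 0; bx < grid_x; bx++)
            for (int ty = 0; ty < BY; ty++)
                for (int tx = 0; tx < BX; tx++)
                    thread_body(out, in, bx, by, tx, ty);
}
```

With `N = 11`, the interior is 9 × 9. A 3 × 3 grid of 4 × 4 blocks is therefore "launched", and the last row and column of blocks are only partly used. The result matches `serial_sweep` exactly.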

[Figure: C + directives → Mint → CUDA]

• Motif-specific auto-optimizer
  › Targets stencil methods
  › Incorporates semantic knowledge into compiler analysis
  › Performs data-locality optimizations via on-chip memory
  › Compiler flags for performance tuning
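A common way to exploit on-chip memory for stencils is to have each thread block stage its tile plus a one-cell halo once, then compute every point of the tile from that staged copy. Neighbouring points then share loads instead of re-reading device memory. The sketch below emulates this general technique on the CPU, using a small local buffer to stand in for shared memory. It is not Mint's actual optimization, and the names `load`, `global_reads`, `sweep_naive`, and `sweep_tiled` are hypothetical.

```c
#include <assert.h>

#define N 11 /* hypothetical grid size, including a one-cell boundary */
#define BX 4 /* hypothetical tile width */
#define BY 4 /* hypothetical tile height */

long global_reads = 0; /* reads of the array standing in for device memory */

/* Every read of the input grid goes through here so reads can be counted. */
static double load(double in[N][N], int i, int j)
{
    global_reads++;
    return in[i][j];
}

/* Naive sweep: each interior point reads its four neighbours directly. */
void sweep_naive(double out[N][N], double in[N][N])
{
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            out[i][j] = 0.25 * (load(in, i - 1, j) + load(in, i + 1, j) +
                                load(in, i, j - 1) + load(in, i, j + 1));
}

/* Tiled sweep: each BY x BX tile first stages its points plus a one-cell
 * halo in a small buffer (standing in for on-chip shared memory), then
 * computes every point of the tile from that buffer. */
void sweep_tiled(double out[N][N], double in[N][N])
{
    double tile[BY + 2][BX + 2];
    for (int i0 = 1; i0 < N - 1; i0 += BY)
        for (int j0 = 1; j0 < N - 1; j0 += BX) {
            /* Stage: tile[ti][tj] holds in[i0 - 1 + ti][j0 - 1 + tj]. */
            for (int ti = 0; ti < BY + 2; ti++)
                for (int tj = 0; tj < BX + 2; tj++) {
                    int i = i0 - 1 + ti, j = j0 - 1 + tj;
                    if (i < N && j < N)
                        tile[ti][tj] = load(in, i, j);
                }
            /* Compute: only points inside the grid interior are written. */
            for (int ti = 1; ti <= BY; ti++)
                for (int tj = 1; tj <= BX; tj++) {
                    int i = i0 - 1 + ti, j = j0 - 1 + tj;
                    if (i < N - 1 && j < N - 1)
                        out[i][j] = 0.25 * (tile[ti - 1][tj] + tile[ti + 1][tj] +
                                            tile[ti][tj - 1] + tile[ti][tj + 1]);
                }
        }
}
```

For this 11 × 11 grid with 4 × 4 tiles, `sweep_naive` performs 324 reads of the input and `sweep_tiled` performs 225 (halo corners included), with identical results. Larger tiles amortize the halo better. On a GPU, the benefit comes from threads of a block reusing the staged values.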

[Figure: Mint execution model: serial code on the host thread; an accelerated region containing data-parallel for loops and host regions; a data-parallel for runs as a kernel over thread blocks, with data in device memory]

Accelerated Region
• #pragma mint parallel
  › Indicates the accelerated region
• #pragma mint for
  › Marks the enclosed loop nest for acceleration
  › Three additional clauses for optimization

Data Transfer
• #pragma mint copy
  › Expresses data transfers between the host and device

Synchronization
• #pragma mint single
  › Handles a serial section
• #pragma mint barrier
  › Synchronizes host and device threads
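The directives listed above can be combined into a complete time-stepping loop. This sketch relies on argument forms recalled from the Mint publications: `copy(array, toDevice|fromDevice, dims...)`, and the `nest(all)` clause on `mint for`. The other two `for` clauses are, as we recall, `tile` and `chunksize`. Verify all of these, and the placement of the copies, against the Mint documentation before relying on them. The grid size `N`, the `row_t` type, and `jacobi` are hypothetical. Swapping the two grids is serial work, so it sits in a `mint single` section. As before, a plain C compiler ignores the pragmas and runs the loop serially.

```c
#include <assert.h>

#define N 8 /* hypothetical grid size, including a one-cell boundary */

typedef double row_t[N]; /* one grid row; row_t * points at a 2D grid */

/* Runs `steps` Jacobi iterations with 2D five-point updates and returns the
 * grid holding the final iterate (either a or b). Both grids must carry the
 * same boundary values. Directive spellings, clause arguments, and copy
 * placement are assumptions; a plain C compiler ignores them. */
row_t *jacobi(row_t *a, row_t *b, int steps)
{
    row_t *U = a;    /* current iterate */
    row_t *Unew = b; /* next iterate */

#pragma mint copy(U, toDevice, N, N)
#pragma mint copy(Unew, toDevice, N, N)
#pragma mint parallel
    {
        for (int t = 0; t < steps; t++) {
#pragma mint for nest(all)
            for (int i = 1; i < N - 1; i++)
                for (int j = 1; j < N - 1; j++)
                    Unew[i][j] = 0.25 * (U[i - 1][j] + U[i + 1][j] +
                                         U[i][j - 1] + U[i][j + 1]);
#pragma mint single
            {
                /* Serial section: swap the roles of the two grids. */
                row_t *tmp = U;
                U = Unew;
                Unew = tmp;
            }
        }
    }
#pragma mint copy(U, fromDevice, N, N)
    return U;
}
```

Running two steps on zeroed grids with a top boundary of 4.0 returns the first grid `a`, because the roles swap after each step.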
