Auto-generating optimized CUDA for stencil ... - FEniCS Project
• Aims at programmer productivity and high performance
• Simplifies application development
• Based on a modest number of compiler directives
  › #pragma mint for
  › Incremental parallelization
• Abstracts away the programmer's view of the hardware
[Figure: application examples, labeled Seismic Modeling, Cardiac Simulation, Turbulent Flow]
[Figure: hardware diagram, labeled Main Memory, L2, Device Memory]

Mint
• Source-to-source translator for Nvidia GPUs
  › Parallelizes loop nests
  › Relieves the programmer of a variety of tedious tasks
[Figure: translation flow, C + directives → Mint → CUDA]
• Motif-specific auto-optimizer
  › Targets stencil methods
  › Incorporates semantic knowledge into compiler analysis
  › Performs data locality optimizations via on-chip memory
  › Compiler flags for performance tuning
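The "data locality optimizations via on-chip memory" bullet can be illustrated without a GPU. The following is a minimal sketch in plain C, not Mint's generated code: a small local buffer stands in for on-chip shared memory, each tile stages its inputs plus halo cells into that buffer once, and the stencil then reads only from the buffer. The names `stencil_naive`, `stencil_tiled`, and `TILE` are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4   /* hypothetical tile width; real tile sizes are tuned per GPU */

/* Reference version: 3-point stencil read straight from "global" memory.
   in has n + 2 elements (one ghost cell on each side); out has n elements. */
void stencil_naive(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (in[i] + in[i + 1] + in[i + 2]) / 3.0;
}

/* Tiled version: each tile first stages its inputs plus two halo cells into
   a small local buffer (standing in for on-chip memory), then computes only
   from that buffer. */
void stencil_tiled(const double *in, double *out, size_t n)
{
    assert(in != NULL && out != NULL);

    for (size_t start = 0; start < n; start += TILE) {   /* one tile per "block" */
        size_t len = (n - start < TILE) ? (n - start) : TILE;
        double local[TILE + 2];                           /* tile plus halo */

        for (size_t j = 0; j < len + 2; j++)              /* load phase */
            local[j] = in[start + j];

        for (size_t j = 0; j < len; j++)                  /* compute phase */
            out[start + j] = (local[j] + local[j + 1] + local[j + 2]) / 3.0;
    }
}
```

Both functions produce the same output; the tiled one reads each input from the large array once per tile instead of once per stencil point, which is the kind of reuse an on-chip buffer is meant to provide.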
[Figure: execution model, labeled Serial code, Accelerated Region, Data parallel for, Host Region, Host Thread, kernel, Block, Device Memory]
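The labels in the execution-model diagram suggest that a data-parallel for loop inside the accelerated region runs as a kernel spread over blocks of threads. A rough serial emulation in plain C, assuming the usual CUDA-style index mapping (global index = block index × block size + thread index), might look as follows. The names `kernel_fn`, `scale_kernel`, and `launch` are hypothetical; this is a sketch of the idea, not of Mint's runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Signature of a "kernel": called once per (block, thread) pair. */
typedef void (*kernel_fn)(size_t block, size_t thread, size_t block_size,
                          double *data, size_t n);

/* Kernel body: each logical thread handles at most one element. */
void scale_kernel(size_t block, size_t thread, size_t block_size,
                  double *data, size_t n)
{
    size_t i = block * block_size + thread;   /* global index */
    if (i < n)                                /* guard for the partial last block */
        data[i] *= 2.0;
}

/* Host-side "launch": serially visits every block and every thread in it.
   Returns the number of blocks used. */
size_t launch(kernel_fn kernel, double *data, size_t n, size_t block_size)
{
    assert(block_size > 0);

    size_t num_blocks = (n + block_size - 1) / block_size;   /* ceiling division */
    for (size_t b = 0; b < num_blocks; b++)
        for (size_t t = 0; t < block_size; t++)
            kernel(b, t, block_size, data, n);
    return num_blocks;
}
```

For example, `launch(scale_kernel, v, 10, 4)` covers ten elements with three blocks of four threads; the bounds check in the kernel keeps the two surplus threads of the last block idle.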
Accelerated Region
• #pragma mint parallel
  › Indicates the accelerated region
• #pragma mint for
  › Marks the enclosed loop nest for acceleration
  › 3 additional clauses for optimizations
Data Transfer
• #pragma mint copy
  › Expresses data transfers between the host and device
Synchronization
• #pragma mint single
  › Handles serial sections
• #pragma mint barrier
  › Synchronizes host and device threads
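To show how the directives listed above might fit together, here is a hedged sketch of a 2D Jacobi sweep annotated with four of them. The slides give only the directive names, so the argument layout of `copy` and the clause spellings `nest(...)` and `tile(...)` are assumptions, as are the names `jacobi2d`, `row_t`, and `N`. `#pragma mint barrier` is left out because the slides do not show where it is required. A standard C compiler ignores pragmas it does not recognize, so the same file also compiles and runs serially.

```c
#include <assert.h>

#define N 6                       /* interior points per dimension (illustrative) */
typedef double row_t[N + 2];      /* one grid row, including two ghost cells */

/* Runs `steps` Jacobi sweeps of a 5-point averaging stencil.
   Both buffers must hold the same boundary values on entry.
   Returns whichever of the two buffers holds the final result. */
row_t *jacobi2d(row_t *U, row_t *Unew, int steps)
{
    assert(U != Unew);

    /* Host-to-device transfers (argument layout is an assumption). */
    #pragma mint copy(U, toDevice, (N + 2), (N + 2))
    #pragma mint copy(Unew, toDevice, (N + 2), (N + 2))

    #pragma mint parallel
    {
        for (int t = 0; t < steps; t++) {
            /* Loop nest marked for acceleration (clause names are assumptions). */
            #pragma mint for nest(2) tile(16, 16)
            for (int y = 1; y <= N; y++)
                for (int x = 1; x <= N; x++)
                    Unew[y][x] = 0.25 * (U[y][x - 1] + U[y][x + 1] +
                                         U[y - 1][x] + U[y + 1][x]);

            /* Serial section: swap the two buffers. */
            #pragma mint single
            {
                row_t *tmp = U;
                U = Unew;
                Unew = tmp;
            }
        }
    }

    /* Device-to-host transfer of the result. */
    #pragma mint copy(U, fromDevice, (N + 2), (N + 2))
    return U;
}
```

Because the buffers are swapped after every sweep, the caller should use the returned pointer rather than assume the result lives in the first argument.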