07.01.2013 Views

3D graphics eBook - Course Materials Repository


Stencil codes

Implementation Issues

Many simulation codes may be formulated naturally as stencil codes. Since computing time and memory consumption grow linearly with the number of array elements, parallel implementations of stencil codes are of paramount importance to research. [6] This is challenging since the computations are tightly coupled (each cell update depends on neighboring cells) and most stencil codes are memory bound (i.e. the ratio of memory accesses to calculations is high). [7] Virtually all current parallel architectures have been explored for executing stencil codes efficiently [8]; at the moment GPGPUs have proven to be the most efficient. [9]
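
To make the coupling and the memory-bound behavior concrete, the following is a minimal C++ sketch of one sweep of a 5-point averaging stencil on a 2D grid. The function name `jacobi_sweep` and the row-major layout are assumptions chosen for illustration; they are not taken from any of the cited works.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one sweep of a 5-point averaging stencil over the
// interior of an nx-by-ny grid stored in row-major order.
// Boundary cells are copied unchanged from the input.
std::vector<double> jacobi_sweep(const std::vector<double>& in,
                                 std::size_t nx, std::size_t ny) {
    assert(in.size() == nx * ny);
    std::vector<double> out(in);  // start from a copy so boundaries are kept
    for (std::size_t y = 1; y + 1 < ny; ++y) {
        for (std::size_t x = 1; x + 1 < nx; ++x) {
            const std::size_t i = y * nx + x;
            // Each update reads the four direct neighbors (tight coupling).
            // Naive count, ignoring caches: 4 floating-point operations
            // (3 additions, 1 multiplication) for 4 loads and 1 store.
            out[i] = 0.25 * (in[i - 1] + in[i + 1] + in[i - nx] + in[i + nx]);
        }
    }
    return out;
}
```

With roughly one arithmetic operation per memory access in this naive count, a loop of this kind tends to be limited by memory bandwidth rather than by arithmetic throughput.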

Libraries

Because stencil codes are both important to computer simulations and computationally demanding, a number of efforts aim at creating reusable libraries to support scientists in implementing new stencil codes. The libraries are mostly concerned with parallelization, but may also tackle other challenges, such as I/O, steering and checkpointing. They may be classified by their API.

Patch-Based Libraries

This is a traditional design. The library manages a set of n-dimensional scalar arrays, which the user code may access to perform updates. The library handles the synchronization of the boundaries (dubbed ghost zones or halos). The advantage of this interface is that the user code may loop over the arrays, which makes it easy to integrate legacy codes. [10] The disadvantage is that the library cannot handle cache blocking (as this has to be done within the loops [11]) or wrap the code for accelerators (e.g. via CUDA or OpenCL). Notable implementations include Cactus [12], a physics problem-solving environment, and waLBerla [13].
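
The division of labor in a patch-based interface can be sketched in C++ as follows. The `Patch` class, its `sync_halo` method and the `user_update` function are hypothetical names invented for this illustration and do not reflect the API of Cactus or waLBerla. The sketch uses a single 1D patch with periodic boundaries; a real library would instead exchange ghost cells with neighboring patches, possibly on other processes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical library side: a 1D patch with n interior cells plus one
// ghost cell on each side.
class Patch {
public:
    explicit Patch(std::size_t n) : data_(n + 2, 0.0) { assert(n > 0); }

    // Number of interior cells.
    std::size_t size() const { return data_.size() - 2; }

    // Valid indices are -1 .. n; -1 and n address the ghost cells.
    double& at(long i) {
        assert(i >= -1 && i <= static_cast<long>(size()));
        return data_[static_cast<std::size_t>(i + 1)];
    }
    double at(long i) const {
        assert(i >= -1 && i <= static_cast<long>(size()));
        return data_[static_cast<std::size_t>(i + 1)];
    }

    // Fill the ghost cells. Assumption for this sketch: periodic boundaries
    // within one patch instead of an exchange between patches.
    void sync_halo() {
        data_.front() = data_[data_.size() - 2];  // left ghost = last interior
        data_.back() = data_[1];                  // right ghost = first interior
    }

private:
    std::vector<double> data_;
};

// User side: a plain loop over the array, as legacy code would write it.
// The loop order is fixed here, so the library cannot reorder or block it.
void user_update(const Patch& old_patch, Patch& new_patch) {
    assert(old_patch.size() == new_patch.size());
    const long n = static_cast<long>(old_patch.size());
    for (long i = 0; i < n; ++i) {
        new_patch.at(i) =
            (old_patch.at(i - 1) + old_patch.at(i) + old_patch.at(i + 1)) / 3.0;
    }
}
```

A time step in this sketch consists of calling `sync_halo()` on the old patch and then running `user_update`; the loop itself stays entirely in user code.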

Cell-Based Libraries

These libraries move the interface to updating single simulation cells: only the current cell and its neighbors are exposed to the user code, e.g. via getter/setter methods. The advantage of this approach is that the library can tightly control which cells are updated in which order, which is useful not only for implementing cache blocking [9], but also for running the same code on multi-cores and GPUs. [14] This approach requires users to recompile their source code together with the library; otherwise a function call would be required for every cell update, which would seriously impair performance. Avoiding that overhead is only feasible with techniques such as class templates or metaprogramming, which is also the reason why this design is only found in newer libraries. Examples are Physis [15] and LibGeoDecomp [16].
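
A cell-based interface can be sketched in C++ along the following lines. All names here (`Neighborhood`, `step`, `HeatCell`) are hypothetical and chosen for illustration; they are not the API of Physis or LibGeoDecomp. The user supplies only the update of a single cell, while the templated library code owns the traversal. Because `step` is instantiated with the concrete cell type at compile time, the compiler is free to inline the per-cell update instead of issuing an indirect call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical library side: read-only view of one cell's neighborhood in a
// 1D grid with periodic boundaries.
template <typename Cell>
class Neighborhood {
public:
    Neighborhood(const std::vector<Cell>& grid, std::size_t center)
        : grid_(grid), center_(center) {}

    // Relative access: offset -1 (left), 0 (self) or +1 (right).
    const Cell& operator[](int offset) const {
        assert(offset >= -1 && offset <= 1);
        const std::size_t n = grid_.size();
        const std::size_t shift =
            static_cast<std::size_t>(static_cast<long long>(n) + offset);
        return grid_[(center_ + shift) % n];
    }

private:
    const std::vector<Cell>& grid_;
    std::size_t center_;
};

// Hypothetical library side: owns the traversal order. It could equally
// visit cells in cache-sized blocks without any change to user code.
template <typename Cell>
void step(const std::vector<Cell>& old_grid, std::vector<Cell>& new_grid) {
    assert(old_grid.size() == new_grid.size());
    for (std::size_t i = 0; i < old_grid.size(); ++i) {
        new_grid[i].update(Neighborhood<Cell>(old_grid, i));
    }
}

// User side: only the update rule for a single cell.
struct HeatCell {
    double temp = 0.0;

    template <typename Hood>
    void update(const Hood& hood) {
        temp = (hood[-1].temp + hood[0].temp + hood[1].temp) / 3.0;
    }
};
```

A time step is then a single call such as `step(old_grid, new_grid)` on two `std::vector<HeatCell>` of equal size. Since the library templates are instantiated with the user's cell type, user code and library have to be compiled together.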

References

[1] Roth, Gerald et al. (1997) Proceedings of SC'97: High Performance Networking and Computing. Compiling Stencils in High Performance Fortran. (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.1505)

[2] Sloot, Peter M.A. et al. (May 28, 2002) Computational Science - ICCS 2002: International Conference, Amsterdam, The Netherlands, April 21-24, 2002. Proceedings, Part I. (http://books.google.com/books?id=qVcLw1UAFUsC&pg=PA843&dq=stencil+array&sig=g3gYXncOThX56TUBfHE7hnlSxJg#PPA843,M1) Page 843. Publisher: Springer. ISBN 3540435913.

[3] Fey, Dietmar et al. (2010) Grid-Computing: Eine Basistechnologie für Computational Science (http://books.google.com/books?id=RJRZJHVyQ4EC&pg=PA51&dq=fey+grid&hl=de&ei=uGk8TtDAAo_zsgbEoZGpBQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCoQ6AEwAA#v=onepage&q&f=true). Page 439. Publisher: Springer. ISBN 3540797467.

[4] Yang, Laurence T.; Guo, Minyi. (August 12, 2005) High-Performance Computing: Paradigm and Infrastructure. (http://books.google.com/books?id=qA4DbnFB2XcC&pg=PA221&dq=Stencil+codes&as_brr=3&sig=H8wdKyABXT5P7kUh4lQGZ9C5zDk) Page 221. Publisher: Wiley-Interscience. ISBN 047165471X.

[5] Micikevicius, Paulius et al. (2009) 3D finite difference computation on GPUs using CUDA (http://portal.acm.org/citation.cfm?id=1513905) Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. ISBN 978-1-60558-517-8.

[6] Datta, Kaushik (2009) Auto-tuning Stencil Codes for Cache-Based Multicore Platforms (http://www.cs.berkeley.edu/~kdatta/pubs/EECS-2009-177.pdf), Ph.D. Thesis
