3D graphics eBook - Course Materials Repository
Stencil codes<br />
Implementation Issues<br />
Many simulation codes may be formulated naturally as stencil codes. Since computing time and memory<br />
consumption grow linearly with the number of array elements, parallel implementations of stencil codes are of<br />
paramount importance to research. [6] This is challenging because the computations are tightly coupled (each<br />
cell update depends on neighboring cells) and most stencil codes are memory bound (i.e. the ratio of memory<br />
accesses to calculations is high). [7] Virtually all current parallel architectures have been explored for executing<br />
stencil codes efficiently; [8] at the moment GPGPUs have proven to be most efficient. [9]<br />
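As a minimal sketch of the coupling and memory traffic described above, the following pure-Python example performs one sweep of a 2D 5-point Jacobi stencil. The function name `jacobi_step` and the 5x5 grid are illustrative assumptions, not taken from any of the cited works.<br />

```python
def jacobi_step(grid):
    """Return a new grid after one 5-point Jacobi sweep; edge cells stay fixed."""
    rows, cols = len(grid), len(grid[0])
    # Second buffer: every update must read the *old* values of its neighbors.
    new = [row[:] for row in grid]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Per cell: 3 additions and 1 multiplication,
            # against 4 reads and 1 write.
            new[y][x] = 0.25 * (grid[y - 1][x] + grid[y + 1][x]
                                + grid[y][x - 1] + grid[y][x + 1])
    return new


# Hypothetical setup: 5x5 grid, top edge held at 1.0 as a boundary condition.
grid = [[0.0] * 5 for _ in range(5)]
grid[0] = [1.0] * 5
grid = jacobi_step(grid)
```

Each cell update here costs about as many memory accesses as arithmetic operations, which is the kind of ratio that tends to make such loops memory bound. Because every update reads its neighbors' old values, splitting the grid across processors requires exchanging the cells along the cut.<br />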
Libraries<br />
Due to both the importance of stencil codes to computer simulations and their high computational requirements,<br />
there are a number of efforts that aim to create reusable libraries to support scientists in implementing new<br />
stencil codes. The libraries are mostly concerned with parallelization, but may also tackle other challenges, such<br />
as I/O, steering and checkpointing. They may be classified by their API.<br />
Patch-Based Libraries<br />
This is the traditional design. The library manages a set of n-dimensional scalar arrays, which the user code may access<br />
to perform updates. The library handles the synchronization of the boundaries (dubbed ghost zones or halos). The<br />
advantage of this interface is that the user code may loop over the arrays, which makes it easy to integrate legacy<br />
codes. [10] The disadvantage is that the library cannot handle cache blocking (as this has to be done within the<br />
loops [11]) or wrapping of the code for accelerators (e.g. via CUDA or OpenCL). Notable implementations include<br />
Cactus [12], a physics problem solving environment, and waLBerla [13].<br />
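The division of labor in a patch-based design can be sketched as follows, assuming a 1D domain split into two patches with one ghost cell each. The names `exchange_halos` and `user_update` are hypothetical and do not correspond to the API of Cactus or waLBerla; a real library would exchange halos between processes rather than between two lists.<br />

```python
def exchange_halos(left, right):
    """Library side: refresh each patch's ghost cell from its neighbor."""
    left[-1] = right[1]   # right ghost of left patch <- first owned cell of right patch
    right[0] = left[-2]   # left ghost of right patch <- last owned cell of left patch


def user_update(patch):
    """User side: a plain loop over the array (3-point average).

    The first and last entries (ghost or boundary cells) are only read.
    """
    old = patch[:]
    for i in range(1, len(patch) - 1):
        patch[i] = (old[i - 1] + old[i] + old[i + 1]) / 3.0


# Reference: the whole domain in one array, fixed boundary values at both ends.
reference = [1.0] + [0.0] * 8 + [2.0]

# Two patches: 4 owned cells each, plus one boundary cell and one ghost cell.
left = reference[0:6]    # boundary, c1..c4, ghost (copy of c5)
right = reference[4:10]  # ghost (copy of c4), c5..c8, boundary

for _ in range(10):
    exchange_halos(left, right)  # library synchronizes the boundaries
    user_update(left)            # user code loops over each array
    user_update(right)
    user_update(reference)

# Drop the ghost cells and stitch the patches back together.
combined = left[:-1] + right[1:]
```

After ten steps `combined` matches `reference`, since the halo exchange supplies exactly the neighbor values the single-array loop would have read. The loop inside `user_update` belongs to the user, which is why a library of this kind cannot reorder or block it.<br />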
Cell-Based Libraries<br />
These libraries move the interface to the update of single simulation cells: only the current cell and its neighbors are<br />
exposed to the user code, e.g. via getter/setter methods. The advantage of this approach is that the library can tightly<br />
control which cells are updated in which order, which is useful not only for implementing cache blocking, [9] but also for<br />
running the same code on multi-cores and GPUs. [14] This approach requires users to recompile their source code<br />
together with the library, because otherwise every cell update would require a function call, which would seriously<br />
impair performance. This is only feasible with techniques such as class templates or metaprogramming, which is also<br />
the reason why this design is found only in newer libraries. Examples are Physis [15] and LibGeoDecomp [16].<br />
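The shape of a cell-based interface can be sketched as follows. The names `Neighborhood`, `run_step` and `average` are hypothetical and are not the API of Physis or LibGeoDecomp. Note that this Python version does make one function call per cell, which is precisely the overhead that compiled libraries avoid through templates and inlining, so the sketch illustrates only the interface and not the performance.<br />

```python
class Neighborhood:
    """Read-only view exposing only the current cell and its neighbors."""

    def __init__(self, grid, y, x):
        self._grid, self._y, self._x = grid, y, x

    def get(self, dy, dx):
        # Relative access: (0, 0) is the current cell.
        return self._grid[self._y + dy][self._x + dx]


def run_step(grid, update_cell, tile=2):
    """Library side: decides the traversal order (here tile by tile)."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # edge cells are copied unchanged
    for ty in range(1, rows - 1, tile):
        for tx in range(1, cols - 1, tile):
            # Visit all interior cells of one tile before moving on.
            for y in range(ty, min(ty + tile, rows - 1)):
                for x in range(tx, min(tx + tile, cols - 1)):
                    new[y][x] = update_cell(Neighborhood(grid, y, x))
    return new


def average(hood):
    """User side: the update rule for a single cell, with no loops."""
    return 0.25 * (hood.get(-1, 0) + hood.get(1, 0)
                   + hood.get(0, -1) + hood.get(0, 1))


# Hypothetical setup: 6x6 grid, top edge held at 1.0.
start = [[0.0] * 6 for _ in range(6)]
start[0] = [1.0] * 6

tiled = run_step(start, average, tile=2)        # blocked traversal
row_major = run_step(start, average, tile=100)  # one big tile = plain row-major
```

Both traversals yield the same grid because the user code never sees the loop; the library is free to change the order, the tile size, or the execution target without touching `average`.<br />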
References<br />
[1] Roth, Gerald et al. (1997) Proceedings of SC'97: High Performance Networking and Computing. Compiling Stencils in High Performance<br />
Fortran. (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.1505)<br />
[2] Sloot, Peter M.A. et al. (May 28, 2002) Computational Science - ICCS 2002: International Conference, Amsterdam, The Netherlands, April<br />
21-24, 2002. Proceedings, Part I. (http://books.google.com/books?id=qVcLw1UAFUsC&pg=PA843&dq=stencil+array&sig=g3gYXncOThX56TUBfHE7hnlSxJg#PPA843,M1)<br />
Page 843. Publisher: Springer. ISBN 3540435913.<br />
[3] Fey, Dietmar et al. (2010) Grid-Computing: Eine Basistechnologie für Computational Science<br />
(http://books.google.com/books?id=RJRZJHVyQ4EC&pg=PA51&dq=fey+grid&hl=de&ei=uGk8TtDAAo_zsgbEoZGpBQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCoQ6AEwAA#v=onepage&q&f=true).<br />
Page 439. Publisher: Springer. ISBN 3540797467<br />
[4] Yang, Laurence T.; Guo, Minyi. (August 12, 2005) High-Performance Computing: Paradigm and Infrastructure.<br />
(http://books.google.com/books?id=qA4DbnFB2XcC&pg=PA221&dq=Stencil+codes&as_brr=3&sig=H8wdKyABXT5P7kUh4lQGZ9C5zDk) Page 221.<br />
Publisher: Wiley-Interscience. ISBN 047165471X<br />
[5] Micikevicius, Paulius et al. (2009) 3D finite difference computation on GPUs using CUDA (http://portal.acm.org/citation.cfm?id=1513905)<br />
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. ISBN 978-1-60558-517-8<br />
[6] Datta, Kaushik (2009) Auto-tuning Stencil Codes for Cache-Based Multicore Platforms<br />
(http://www.cs.berkeley.edu/~kdatta/pubs/EECS-2009-177.pdf), Ph.D. Thesis