Abstracts - Dipartimento di Elettronica Applicata
Meta 2010 & FEM 2010 – Rome, 13-15 December 2010
Implementation of a 3D Micromagnetic Code on a Parallel and Distributed Architecture

Carlo Ragusa (1), Bartolomeo Montrucchio (2), Vittorio Giovara (2), Fiaz Khan (2), Omar Khan (2), Maurizio Repetto (1), and Baochang Xie (1, 3)

(1) Politecnico di Torino, Department of Electrical Engineering, Torino, Italy – E-mail: carlo.ragusa@polito.it
(2) Politecnico di Torino, Department of Control and Computer Engineering, Torino, Italy – E-mail: bartolomeo.montrucchio@polito.it
(3) Shanghai Jiaotong University (SJTU), Shanghai 200030, China
We present the implementation of a full micromagnetic code developed on a low-cost, low-latency parallel and distributed architecture based on OpenMP [1] and MPI over Infiniband [2]. Since the most time-consuming part of a micromagnetic code is the magnetostatic field computation, many existing parallel implementations take advantage of Ethernet-based computer clusters [3]. Moreover, in recent years, the availability of low-cost multi-core and multi-processor computers has enabled the parallelization of micromagnetic programs on shared-memory computer systems [4]. In our approach we combine a low-latency Infiniband network with a low-cost multi-processor, multi-core cluster. The hardware architecture is a 16-core cluster composed of two dual-processor computers. The two computers are connected by means of Infiniband network cards linked directly to each other, without a switch.
The general implementation scheme is summarized as follows. First, every standard sequential loop is parallelized to fully exploit the eight cores each single machine offers: by setting up proper shared/private variable lists, the loop is divided among a given number of OpenMP threads, each of which carries out a portion of the iterations. Before OpenMP execution, the loop is also split into two data sets (n in the general case of an n-node cluster); each part of the loop is submitted to a node of the cluster and executed separately. Finally, at the end of the loop, data is exchanged back with MPI and merged so that the two (n) machines can continue working on complete arrays.
References
[1] R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, and R. Menon, Parallel Programming in OpenMP, Morgan Kaufmann Publishers, 2001.
[2] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, Scientific and Engineering Computation Series, The MIT Press, 1999.
[3] Y. Kanai, M. Saiki, K. Hirasawa, T. Tsukamoto, and K. Yoshida, IEEE Trans. on Magnetics, 44, 1602, 2008.
[4] M.J. Donahue, "Parallelizing a micromagnetic program for use on multiprocessor shared memory computers," IEEE Trans. on Magnetics, 45, 3923-3925, 2009.