
Meta 2010 & FEM 2010 – Rome, 13-15 December 2010<br />

Implementation of a 3D Micromagnetic Code on a Parallel and Distributed Architecture

Carlo Ragusa (1), Bartolomeo Montrucchio (2), Vittorio Giovara (2), Fiaz Khan (2), Omar Khan (2), Maurizio Repetto (1), and Baochang Xie (1, 3)

(1) Politecnico di Torino, Department of Electrical Engineering, Torino, Italy – E-mail: carlo.ragusa@polito.it

(2) Politecnico di Torino, Department of Control and Computer Engineering, Torino, Italy – E-mail: bartolomeo.montrucchio@polito.it

(3) Shanghai Jiaotong University (SJTU), Shanghai 200030, China

We present the implementation of a full micromagnetic code developed on a low-cost, low-latency parallel and distributed architecture based on OpenMP [1] and MPI over Infiniband [2]. Since the most time-consuming part of a micromagnetic code is the magnetostatic field computation, many existing parallel implementations take advantage of Ethernet-based computer clusters [3]. Moreover, in recent years the availability of low-cost multi-core and multi-processor computers has enabled the parallelization of micromagnetic programs on shared-memory computer systems [4]. In our approach we use a low-latency Infiniband network coupled with a low-cost multi-processor, multi-core cluster. The hardware architecture consists of a 16-core cluster composed of two dual-processor computers. The two computers are connected by means of Infiniband network cards linked directly to each other, without a switch.

The general implementation scheme can be summarized as follows. First, every standard sequential loop is parallelized to fully exploit the eight cores that each single machine offers: by setting up proper shared/private variable lists, the loop is divided among a given number of OpenMP threads, each of which carries out a portion of the iterations. The loop is then split into two data sets (n in the general case of an n-node cluster) before the OpenMP region executes; each part is submitted to a node of the cluster and executed separately. Finally, at the end of the loop, data are exchanged back with MPI and merged, so that the two (n) machines can continue working on complete arrays.
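As a rough illustration of this two-level scheme, the C sketch below splits a loop across MPI ranks, lets OpenMP spread each rank's share over the local cores, and gathers the partial results so that every node ends up with the complete array. The array names (local, merged), the iteration count, and the per-iteration kernel are illustrative placeholders, not taken from the actual micromagnetic code.

/* Hypothetical sketch of the hybrid MPI + OpenMP loop pattern described above. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n_total = 1 << 20;            /* total number of loop iterations  */
    const int n_local = n_total / nprocs;   /* assumes nprocs divides n_total   */
    const int offset  = rank * n_local;     /* this node's slice of the data    */

    double *local  = malloc(n_local * sizeof(double));
    double *merged = malloc(n_total * sizeof(double));

    /* Each node works only on its own slice; within the node, OpenMP spreads
       the iterations over all available cores (eight per machine here).       */
    #pragma omp parallel for
    for (int i = 0; i < n_local; i++) {
        int g = offset + i;                 /* global index of this iteration   */
        local[i] = (double)g * (double)g;   /* placeholder for the real kernel  */
    }

    /* Exchange the partial results so every node holds the complete array
       before the next step of the computation.                                */
    MPI_Allgather(local, n_local, MPI_DOUBLE,
                  merged, n_local, MPI_DOUBLE, MPI_COMM_WORLD);

    free(local);
    free(merged);
    MPI_Finalize();
    return 0;
}

Compiled with, e.g., mpicc -fopenmp, this reproduces the split/execute/merge cycle described above, with MPI_Allgather standing in for whatever exchange pattern the actual code uses.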

References

[1] R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, and R. Menon, Parallel Programming in OpenMP, Morgan Kaufmann Publishers, 2001.

[2] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, Scientific and Engineering Computation Series, The MIT Press, 1999.

[3] Y. Kanai, M. Saiki, K. Hirasawa, T. Tsukamomo, and K. Yoshida, IEEE Trans. on Magnetics, 44, 1602, 2008.

[4] M.J. Donahue, "Parallelizing a micromagnetic program for use on multiprocessor shared memory computers," IEEE Trans. on Magnetics, 45, 3923-3925, 2009.

