Tree Poisson solver

Tree-based self-gravity solver

R. Wünsch, I. Berentzen
A. P. Whitworth, R. Banerjee

Features:
• two versions, for Flash2 and Flash3
• Barnes & Hut octal tree with monopole moments
• works with 3D Cartesian coordinates; AMR tree needed
• isolated and periodic boundaries
  ◮ periodic: Ewald method
• efficient MPI communication
• interaction lists
  ◮ nearby cells and tree nodes are not tested for the opening-angle criterion
• ported to GPUs (by Ingo Berentzen)

Richard Wünsch, Flash workshop at the Hamburg Observatory, 15th February 2012 1/26


Algorithm overview

• Parameters:
  ◮ tree_limangle (θ_lim) ... REAL ... 1.0 - 0.5
  ◮ tree_ilist ... INTEGER ... 0 or 1
• Algorithm (as in Grid_solvePoisson()), with the relative run time of each call:

    if (grid_changed .eq. 1) then
      call treeComBlkProperties()                          ! nodetype, lrefine, child, neigh, coords; 1%
      if (tree_ilist .eq. 1) call treeFindNeighbours()     ! 0%
    endif
    call gr_treeBuildTree()                                ! 3%
    call gr_treeExchangeTrees()                            ! 3%
    call gr_treePotential(idensvar, ipotvar)               ! 93%
    call gr_treeDestroyTree()                              ! 0%

  ◮ routines that include MPI communication are marked red on the original slide
  ◮ relative times are for a collapse of the BE sphere (512 CPUs, θ_lim = 0.5, 76000 leaf blocks)
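The role of the limiting opening angle θ_lim can be illustrated with the usual Barnes & Hut acceptance test. The following is a minimal Python sketch, assuming the standard criterion (node size divided by distance must be smaller than θ_lim); the function name `node_is_accepted` and its arguments are illustrative and are not taken from the Flash source.

```python
import math


def node_is_accepted(node_size, node_centre, point, theta_lim):
    """Return True if a tree node may be treated as a single monopole.

    Assumed standard Barnes & Hut test (not copied from the solver):
    accept the node when size / distance < theta_lim, otherwise open it.
    """
    distance = math.dist(node_centre, point)
    if distance == 0.0:
        return False  # the point coincides with the node centre: always open
    return node_size / distance < theta_lim


print(node_is_accepted(1.0, (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), 0.5))  # True
print(node_is_accepted(1.0, (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), 0.5))  # False
```

A smaller θ_lim rejects more nodes, so more of the tree is opened: accuracy improves and the tree walk becomes more expensive.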


Communication of grid properties

• treeComBlkProperties()
  ◮ nodetype, lrefine, child, neigh and cell coords in each block
  ◮ memory: MAXBLOCKS × nCPUs × (16×INT + 27×REAL)
  ◮ for MAXBLOCKS = 1000, nCPU = 1000 → 300 MB
  ◮ only active values communicated
• treeFindNeighbours() (needed by interaction lists)
  ◮ finds 26 neighbours in all (incl. diagonal) directions
  ◮ tr_surbox(2, 27, MAXBLOCKS)
    → records block number and CPU of each neighbour
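The memory estimate for treeComBlkProperties() can be checked with a few lines of arithmetic. This sketch assumes 4-byte integers and 8-byte reals, which the slide does not state; the function name is hypothetical.

```python
def grid_property_memory_bytes(maxblocks, ncpus, int_bytes=4, real_bytes=8):
    """MAXBLOCKS x nCPUs x (16 x INT + 27 x REAL), in bytes.

    The sizes of INT and REAL are assumptions (4 and 8 bytes).
    """
    return maxblocks * ncpus * (16 * int_bytes + 27 * real_bytes)


mem = grid_property_memory_bytes(maxblocks=1000, ncpus=1000)
print(f"{mem / 1e6:.0f} MB")  # 280 MB
```

Under these assumptions the result is 280 MB, consistent with the rounded figure of 300 MB quoted above. Because the array grows linearly with the number of CPUs, it becomes a concern for large runs.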


Build Tree

1. build trees in blocks
   ◮ octal tree with log_2(nxb) levels
2. communicate mass and mass-centre position of all leaf blocks
3. build ParentTree on each CPU
   ◮ ParentTree(4, MAXBLOCKS, nCPU)


Block tree in RAM

• for 8 × 8 × 8 blocks (array index marks as shown in the slide figure):
  level 0: m, x_mc, y_mc, z_mc at indices 1, 2, 3, 4
  level 1: 5, 9, 13, 17, 21, 25, 29, 33
  level 2: 36, 68, 100, 132, 164, 196, 228, 260
  level 3: masses only (mc given by cell coordinates), 8^3 = 512 values; 292, 804

Tree size = $8^L + 4 \sum_{i=0}^{L-1} 8^i = 8^L + 4 \, \frac{8^L - 1}{7}$

L ... number of the lowest level (e.g. 3 for 8 × 8 × 8 blocks)

tree nodes identified by multi-index, an integer array of size L: (l_1, l_2, l_3); l_i =
  ◮ 1-8 ... number of node on i-th level
  ◮ 0 ... multi-index (i.e. node) is of level i-1
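The tree-size formula can be checked numerically. The sketch below assumes that levels are stored back to back, with four values (mass and mass-centre position) per node on levels 0 to L−1 and one value (mass) per cell on level L; the function names are illustrative.

```python
def block_tree_size(L):
    """Number of stored values: 8**L masses plus 4 values per higher-level node."""
    return 8**L + 4 * sum(8**i for i in range(L))


def level_offsets(L):
    """1-based index of the first value of each level (assumed contiguous layout)."""
    offsets, start = [], 1
    for level in range(L + 1):
        offsets.append(start)
        values_per_node = 4 if level < L else 1
        start += values_per_node * 8**level
    return offsets


# The closed form 8^L + 4 (8^L - 1) / 7 agrees with the explicit sum
assert block_tree_size(3) == 8**3 + 4 * (8**3 - 1) // 7
print(block_tree_size(3))  # 804
print(level_offsets(3))    # [1, 5, 37, 293]
```

For L = 3 this gives 804 values, with levels 1, 2 and 3 starting at indices 5, 37 and 293. This is consistent with the marks 36, 292 and 804 in the layout above if those are read as the last indices of levels 1, 2 and 3.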


Communication of block trees

1. determine tree levels to be sent
   ◮ distance between a given block and all blocks on a given CPU
2. communicate tree levels
   ◮ all values for a given CPU packed into a single message
3. allocate space for block trees from other CPUs
   ◮ dynamic memory allocation to avoid wasting memory
4. communicate block trees
   ◮ all block trees for a given CPU packed into a single message

[Figure: block-tree levels 0 to 3 exchanged between CPU 0 and CPU 1]


Tree walk

• 5 types of interactions:
  1. cell – cell
  2. cell – block tree node
     ◮ θ_lim checked for each cell
  3. cell – block
     ◮ θ_lim checked once for the whole block
  4. cell – cell (with interaction lists)
     ◮ distance pre-calculated
  5. cell – block tree node (with interaction lists)
     ◮ θ_lim criterion pre-calculated

[ 02-13-2012 15:15:36.682 ] [TREE]: cell-cell distances: 0.548E+08, per zone: 111.4
[ 02-13-2012 15:15:36.692 ] [TREE]: cell-node distances: 0.756E+09, per zone: 1537.1
[ 02-13-2012 15:15:36.733 ] [TREE]: cell-block distances: 0.169E+08, per zone: 34.3
[ 02-13-2012 15:15:36.773 ] [TREE]: IL cell-cell distances: 0.169E+09, per zone: 343.4
[ 02-13-2012 15:15:36.803 ] [TREE]: IL cell-node distances: 0.461E+09, per zone: 938.4
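The distinction between direct cell – cell terms and cell – node terms tested against θ_lim can be seen in a generic monopole tree walk. The following is a self-contained Python sketch of a textbook Barnes & Hut walk, not the solver's implementation: it has no interaction lists, no block-level test and no MPI. The class `Node`, the function `potential` and the unit constant `G` are assumptions made for illustration.

```python
import math

G = 1.0  # gravitational constant in code units (assumption for this sketch)


class Node:
    """Cubic tree node holding only monopole data: total mass and mass centre."""

    def __init__(self, centre, size, points):
        # points: list of (mass, (x, y, z)); positive masses, distinct positions
        self.size = size
        self.mass = sum(m for m, _ in points)
        self.com = tuple(sum(m * p[k] for m, p in points) / self.mass
                         for k in range(3))
        self.children = []
        if len(points) > 1:
            octants = {}
            for m, p in points:
                key = tuple(p[k] >= centre[k] for k in range(3))
                octants.setdefault(key, []).append((m, p))
            quarter = size / 4.0
            for key, sub in octants.items():
                sub_centre = tuple(centre[k] + (quarter if key[k] else -quarter)
                                   for k in range(3))
                self.children.append(Node(sub_centre, size / 2.0, sub))


def potential(node, point, theta_lim):
    """Gravitational potential at `point` from a recursive tree walk."""
    d = math.dist(node.com, point)
    if not node.children:
        # single mass: direct term (the self-interaction at d == 0 is skipped)
        return -G * node.mass / d if d > 0.0 else 0.0
    if d > 0.0 and node.size / d < theta_lim:
        # node accepted by the opening-angle test: one monopole term
        return -G * node.mass / d
    # node too close: open it and walk its children
    return sum(potential(child, point, theta_lim) for child in node.children)


# 4 x 4 x 4 unit masses at the cell centres of a cube with side 4
cells = [(1.0, (i + 0.5, j + 0.5, k + 0.5))
         for i in range(4) for j in range(4) for k in range(4)]
root = Node((2.0, 2.0, 2.0), 4.0, cells)
target = (10.0, 2.0, 2.0)
exact = potential(root, target, theta_lim=0.0)   # never accepts: direct sum
approx = potential(root, target, theta_lim=0.5)
print(exact, approx, abs(approx - exact) / abs(exact))
```

With θ_lim = 0 no node is ever accepted, so the walk reduces to a direct sum over all cells, which corresponds to the θ_lim = 0.0 reference curve used in the accuracy plots.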


Interaction lists

• relative positions of cells and tree nodes are known
• for each cell, one can find:
  ◮ list of cells with which it interacts
  ◮ list of block tree nodes with which it interacts
• lists do not depend on lrefine
• for cells/nodes within a block and 26 surrounding blocks
• makes tree walk faster by ∼ 25%
  ◮ more efficient for smaller sims
• costs memory:
  ◮ 8×8×8 blocks: ∼ 200 MB
  ◮ 16×16×16 blocks: ∼ 700 MB
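One way to see why the lists do not depend on lrefine: if the acceptance test uses only geometric sizes and distances, both scale with the cell size, so their ratio is the same at every refinement level. The sketch below illustrates this under that assumption (geometric node centres, offsets in whole cells); the names `is_accepted` and `build_list` are hypothetical and the table is far simpler than the solver's actual lists.

```python
import math


def is_accepted(offset_cells, node_cells, theta_lim, cell_size):
    """Opening-angle test for a node `node_cells` cells wide whose centre lies
    `offset_cells` (in units of the cell size) away from the target cell."""
    dist = cell_size * math.sqrt(sum(o * o for o in offset_cells))
    size = cell_size * node_cells
    return dist > 0.0 and size / dist < theta_lim


def build_list(node_cells, max_offset, theta_lim, cell_size=1.0):
    """Tabulate the test for all integer offsets up to max_offset per axis."""
    rng = range(-max_offset, max_offset + 1)
    return {(dx, dy, dz): is_accepted((dx, dy, dz), node_cells, theta_lim, cell_size)
            for dx in rng for dy in rng for dz in rng}


# Same table for a coarse block and for one refined three levels deeper
coarse = build_list(node_cells=2, max_offset=8, theta_lim=0.5, cell_size=1.0)
fine = build_list(node_cells=2, max_offset=8, theta_lim=0.5, cell_size=0.125)
print(coarse == fine)  # True
```

Because the table is identical for all cell sizes, it can be computed once and reused for every block, at the price of the memory quoted above.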


Test: Bonnor-Ebert sphere

• mass: M = 1 M_⊙
• temperature: T = 10 K (BES), T_amb = 10^4 K (ambient)
• unstable: ξ_0 = 10 (threshold value is 6.5)
• radius: R = 0.041 pc
• accuracy tests:
  ◮ "uniform grid": lrefine_min = lrefine_max = 5
    → 4096 leaf blocks
  ◮ "AMR grid": lrefine_min = 1, lrefine_max = 5
    → 1240 leaf blocks
    → refinement controlled by Jeans length
• performance tests:
  ◮ "AMR grid": lrefine_min = 1, lrefine_max = 8
    → 76000 leaf blocks (run on 64-512 CPUs)
  ◮ integrated for 10 time-steps
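The leaf-block count of the uniform grid follows from octal refinement. A short check, assuming a single root block at lrefine = 1 (the slide does not state the number of root blocks; the function name is illustrative):

```python
def uniform_leaf_blocks(lrefine, root_blocks=1):
    """Leaf blocks of a fully refined 3D grid: each level splits a block into 8."""
    return root_blocks * 8 ** (lrefine - 1)


print(uniform_leaf_blocks(5))  # 4096
```

This reproduces the 4096 leaf blocks of the "uniform grid" test. The AMR grid with the same lrefine_max needs only 1240 leaf blocks because refinement follows the Jeans length.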



Flash 3: error in Φ (lref_min = lref_max = 5)

[Figure: log10[(Φ − Φ_anl)/|Φ_anl(0)|] (axis range −0.005 to 0.025) versus r [pc] (0 to 0.05). Curves: Tree with θ_lim = 1.0, 0.5, 0.2; Multigrid with mpole_lmax = 0, 8, 15.]


Flash 3: error in Φ (lref_min = lref_max = 5), with direct-sum reference

[Figure: log10[(Φ − Φ_anl)/|Φ_anl(0)|] (axis range −0.005 to 0.025) versus r [pc] (0 to 0.05). Curves: Tree with θ_lim = 1.0, 0.5, 0.2, 0.0; Multigrid with mpole_lmax = 0, 8, 15.]


Flash 3: error in F_r (lref_min = lref_max = 5)

[Figure: log10[(F_r − F_r,anl)/|F_r,anl|] (axis range −0.1 to 0.1) versus r [pc] (0 to 0.05). Curves: Tree with θ_lim = 1.0, 0.5, 0.2; Multigrid with mpole_lmax = 0, 8, 15.]


Flash 3: error in F_r (lref_min = lref_max = 5), with direct-sum reference

[Figure: log10[(F_r − F_r,anl)/|F_r,anl|] (axis range −0.1 to 0.1) versus r [pc] (0 to 0.05). Curves: Tree with θ_lim = 1.0, 0.5, 0.2, 0.0; Multigrid with mpole_lmax = 0, 8, 15.]


Flash 3: error in F_r (lref_min = 1, lref_max = 6)

[Figure: log10[(F_r − F_r,anl)/|F_r,anl|] (axis range −0.1 to 0.1) versus r [pc] (0 to 0.05). Curves: Tree with θ_lim = 1.0, 0.5, 0.2; Multigrid with mpole_lmax = 0, 8, 15.]


Flash 2: error in Φ (lref_min = 1, lref_max = 6)

[Figure: log10[(Φ − Φ_anl)/|Φ_anl(0)|] (axis range −0.005 to 0.025) versus r [pc] (0 to 0.05). Curves: Tree with θ_lim = 1.0, 0.5, 0.2; Multigrid with mpole_lmax = 0, 8, 15.]


Flash 2: error in F_r (lref_min = 1, lref_max = 6)

[Figure: log10[(F_r − F_r,anl)/|F_r,anl|] (axis range −0.3 to 0.3) versus r [pc] (0 to 0.05). Curves: Tree with θ_lim = 1.0, 0.5, 0.2; Multigrid with mpole_lmax = 0, 8, 15.]


Flash 3, BES: time(nCPUs)

[Figure: stacked bar chart of seconds per timestep (0 to 350) versus number of CPUs (64, 128, 256, 512). Bars for each CPU count: tree_1.0, tree_0.5, mg. Components: hydro, tree walk/fft, com/gcell, other.]


Flash 3, BES: relative time

[Figure: stacked bar chart with y-axis labelled "Seconds per timestep" (scale 0 to 1) versus number of CPUs (64, 128, 256, 512). Bars for each CPU count: tree_1.0, tree_0.5, mg. Components: hydro, tree walk/fft, com/gcell, other.]


Flash 2, test: expanding shell

M_shell = 2 × 10^4 M_⊙
T_shell = 10 K
R_shell,0 = 10 pc
V_shell,0 = 2.2 km s^−1
R_shell,max = 23 pc
P_ext = 10^−17, 10^−13 or 5 × 10^−13 dyne cm^−2

[Figure: initial profiles versus r [pc] (0 to 30). Left axis: log(ρ) [g/cm^3] and (log(T) − 32) [K], range −32 to −20. Right axis: v [km/s], range 0 to 2.5. Curves: log ρ, log(T) − 32, v. Annotations: ρ ∝ sech^2, T = 10^4 K, T = 10 K.]


Flash 2, shell: time(nCPUs)

[Figure: stacked bar chart of seconds per evolution (0 to 70,000) versus number of CPUs (64, 128, 256). Bars for each CPU count: tree_1.0, tree_0.5, mg. Components: hydro, tree walk/fft, com/gcell, other.]


Flash 3, BES: speedup

[Figure: speedup, defined as 64 × t_64 / t (axis range 50 to 700), versus number of CPUs (64, 128, 256, 512). Curves: hydro; multigrid; tree, θ_lim = 0.5; tree, θ_lim = 1.0.]
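The speedup plotted here is measured relative to the 64-CPU run, so that ideal scaling gives a value equal to the number of CPUs. A minimal sketch of that definition follows; the timings in the example are made up for illustration and are not the measured values from the plot.

```python
def speedup(t, t_ref, n_ref=64):
    """Speedup relative to a reference run on n_ref CPUs: n_ref * t_ref / t."""
    return n_ref * t_ref / t


# Made-up timings in seconds per time-step, for illustration only
timings = {64: 100.0, 128: 55.0, 256: 32.0, 512: 20.0}
for ncpu, t in timings.items():
    print(ncpu, round(speedup(t, timings[64]), 1))
```

With this normalisation the reference run always has a speedup of 64, and a curve falling below the diagonal indicates a loss of parallel efficiency.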


Flash 3: time(number of blocks)

[Figure: seconds per time-step (logarithmic axis, 0.001 to 1000) versus number of blocks (64, 512, 4096, 32768). Setup: 64 CPUs, lrefine_min = lrefine_max, ilist=1. Curves: hydro; tree, θ_lim = 1.0. Reference lines: N log(N) and N.]


Flash 3, BES: time(θ_lim)

[Figure: seconds per time-step (logarithmic axis, 1 to 10000) versus θ_lim (0.1, 0.2, 0.5, 1). Setup: 64 CPUs, lrefine_min = lrefine_max = 5, ilist=1. Reference slopes: θ_lim^−2 and θ_lim^−3.]


GPU version

• tree walk ported to GPUs by Ingo Berentzen
• 5-10× faster, comparable to hydro

Future

• elimination of MAXBLOCKS × nCPU size arrays
• uniform grid
• quadrupole (higher) moments


Download at:
http://galaxy.asu.cas.cz/~richard/tree-solver/


Thank you!
