
Hybrid MPI and OpenMP programming tutorial - Prace Training Portal


SUN: Running hybrid on Sun Constellation Cluster Ranger (Slide 21/154, Rabenseifner, Hager, Jost)

• Highly hierarchical
• Shared memory:
– Cache-coherent, non-uniform memory access (ccNUMA) 16-way node (blade)
• Distributed memory:
– Network of ccNUMA blades
– Core-to-core, socket-to-socket, blade-to-blade, chassis-to-chassis

SUN: NPB-MZ Class E Scalability on Ranger (Slide 22/154)

[Figure: NPB-MZ Class E scalability on Sun Constellation. MFlop/s (0 to 5,000,000) versus core count (1024, 2048, 4096, 8192) for SP-MZ (MPI), SP-MZ MPI+OpenMP, BT-MZ (MPI), and BT-MZ MPI+OpenMP.]

• Scalability measured in MFlop/s
• MPI+OpenMP outperforms pure MPI
• Use of numactl is essential to achieve scalability
• BT: significant improvement (235%): load-balancing issues solved with MPI+OpenMP
• SP: pure MPI is already load-balanced, but hybrid is 9.6% faster due to the smaller message rate at the NIC; pure MPI cannot be built for 8192 processes!
• Hybrid: SP still scales, BT does not scale

NUMA Control: Process Placement (Slide 23/154)

• Affinity and policy can be changed externally through numactl at the socket and core level.

Commands:
numactl ./a.out
numactl -N 1 ./a.out
numactl -C 0,1 ./a.out

NUMA Operations: Memory Placement (Slide 24/154)

numactl -N 1 -l ./a.out

Memory allocation:
• MPI: local allocation is best
• OpenMP:
– Interleave is best for large, completely shared arrays that are randomly accessed by different threads
– Local is best for private arrays
• Once allocated, a memory structure's placement is fixed
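The numactl recipes above can be sketched as a small launcher script. This is a hedged illustration, not part of the original slides: the binary name ./a.out, the four-socket blade layout, and the helper function placement_cmd are assumptions chosen to mirror Ranger's hierarchy; on a real system the prefix would be passed to the MPI launcher (e.g. one numactl-wrapped task per socket).

```shell
#!/bin/sh
# Sketch: build a numactl prefix that pins one (hypothetical) hybrid
# MPI task per socket and keeps its memory allocations local,
# combining the process-placement (-N) and memory-placement (-l)
# options shown on slides 23 and 24.

placement_cmd() {
    socket="$1"                       # socket index for this MPI task
    # -N <socket> : run the process on the given NUMA node (socket)
    # -l          : allocate memory on the local node ("local is best")
    echo "numactl -N $socket -l ./a.out"
}

# One MPI task on each of the four sockets of a blade:
for s in 0 1 2 3; do
    placement_cmd "$s"
done
```

For large shared OpenMP arrays accessed randomly by all threads, the slides recommend interleaving instead; with numactl that would replace `-l` with `-i all` (interleave across all nodes).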
