07.12.2015 Views

2H 2015

intel-xeon-phi-sw-ecosystem-guide-2h-2015-public3


Comparative Performance

LAMMPS*

Stillinger-Weber Water Benchmark

32 NODES CLUSTER BENCHMARK

APPROVED FOR PUBLIC PRESENTATION

NEW

Application: LAMMPS*

Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov/

Availability:

• Code: In main LAMMPS repository.

• Recipe: Available here.

[Figure: LAMMPS* Stillinger-Weber Water Benchmark Speed Up. Bar chart with a vertical axis from 0 to 3 and two groups: 1 Node (256K molecules) and 32 Nodes (8.2M molecules). Bar labels in extraction order: 3X, 3.41X, 3.05X, 3.6X, 1, 0.9X.]

2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline)

2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package)

2S Xeon E5-2697 v3 + Tesla K40c*, boost off, ECC on

2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package)

“Xeon E5-2697 v3” = Intel® Xeon® processor E5-2697 v3

“Xeon Phi 7120A” = Intel® Xeon Phi coprocessor 7120A

[Chart annotations: "1"; "No testing on Tesla"]

Usage Model: Load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi coprocessor, which computes them concurrently with the CPU.

Highlights: Improved results with Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent:

• Data transfer between host and coprocessor.

• Calculations of neighbor-list, non-bond, bond, and long-range terms.

Same routines in LAMMPS Intel Package also run faster on CPU.
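The dynamic load balancing idea can be illustrated with a minimal Python sketch. This is not the algorithm used in the LAMMPS Intel Package; it is a simple feedback rule under stated assumptions. The names `rebalance` and `simulate`, the `damping` parameter, the clamping bounds, and the device speeds are all hypothetical and chosen only for illustration. The rule estimates each device's throughput from the last step's timings and moves the offloaded fraction toward the split at which host and coprocessor would finish at the same time.

```python
def rebalance(fraction, t_host, t_coproc, damping=0.5):
    """Return an updated offload fraction from the last step's timings.

    fraction  -- share of the work sent to the coprocessor (0 < fraction < 1)
    t_host    -- time the host spent on its share (> 0)
    t_coproc  -- time the coprocessor spent on its share (> 0)
    damping   -- hypothetical smoothing factor to avoid oscillation
    """
    # Throughput (work per unit time) implied by the measured timings.
    host_rate = (1.0 - fraction) / t_host
    coproc_rate = fraction / t_coproc
    # Both devices finish together when work is split in proportion to throughput.
    target = coproc_rate / (host_rate + coproc_rate)
    updated = fraction + damping * (target - fraction)
    # Keep both devices busy so that both timings stay measurable.
    return min(max(updated, 0.05), 0.95)


def simulate(host_speed, coproc_speed, steps=20, fraction=0.5):
    """Run the feedback rule against idealized, noise-free devices."""
    for _ in range(steps):
        t_host = (1.0 - fraction) / host_speed
        t_coproc = fraction / coproc_speed
        fraction = rebalance(fraction, t_host, t_coproc)
    return fraction


if __name__ == "__main__":
    # Hypothetical case: the coprocessor is twice as fast as the host.
    print(f"offload fraction after balancing: {simulate(1.0, 2.0):.4f}")
```

With a coprocessor assumed to be twice as fast as the host, the fraction settles near two thirds, the point where neither device waits for the other.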

Results: Simulation rate increase with Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup at higher node counts.
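As a quick arithmetic check on these figures, the short Python sketch below works out the per-node problem size from the chart labels (256K molecules on 1 node, 8.2M molecules on 32 nodes) and what a 3.6X rate increase means for run time. It assumes K = 1,000 and M = 1,000,000; the function names and the 36-hour baseline are hypothetical and used only for illustration.

```python
def molecules_per_node(total_molecules, nodes):
    """Average number of molecules handled by each node."""
    return total_molecules / nodes


def time_at_speedup(baseline_time, speedup):
    """Run time after a rate increase of `speedup` over the baseline."""
    return baseline_time / speedup


# Problem sizes from the chart labels, assuming K = 1,000 and M = 1,000,000.
single_node = molecules_per_node(256_000, 1)
cluster = molecules_per_node(8_200_000, 32)

if __name__ == "__main__":
    print(f"1 node:   {single_node:,.0f} molecules per node")
    print(f"32 nodes: {cluster:,.0f} molecules per node")
    # Hypothetical 36-hour baseline run, sped up by the reported 3.6X.
    print(f"36 h baseline at 3.6X: {time_at_speedup(36.0, 3.6):.1f} h")
```

The per-node workload is nearly identical in both configurations, so the 32-node measurement scales the problem with the machine (weak scaling) rather than spreading a fixed problem over more nodes.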

For configuration details, go here.

SOURCE: INTEL MEASURED RESULTS AS OF MARCH, 2015

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance

*Other names and brands may be claimed as the property of others.
