2H 2015

intel-xeon-phi-sw-ecosystem-guide-2h-2015-public3

07.12.2015

Comparative Performance: LAMMPS* Rhodopsin Benchmark; 512K Atoms

Chart: LAMMPS* Rhodopsin Benchmark Performance (Mixed Precision), relative performance at 1 node and 32 nodes. Includes external NVIDIA* results.

Configurations compared:
• 2S Intel® Xeon® processor E5-2697 v2 (LAMMPS Baseline)
• 2S Intel® Xeon® processor E5-2697 v2 (LAMMPS IA Package)
• 2S E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A, Turbo Off (LAMMPS IA Package)
• Cray XK7: 1S AMD Opteron* 6274 + NVIDIA Tesla* K20X; Cray Gemini* Interconnect, PCIe* 2.0 (LAMMPS GPU Package); source: http://www.nvidia.com/docs/IO/122634/computational-chemistrybenchmarks.pdf

Bar labels as extracted, relative to baseline = 1: 1.21X, 1.75X, .9X, 1.2X, 1.72X, 1.22X. External results label: 2/K20X + 1S AMD*.

For configuration details, go here.
SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2014

CLUSTER BENCHMARK, 32 NODES
APPROVED FOR PUBLIC PRESENTATION

Application: LAMMPS*

Description: Simulation of molecular systems with classical models. Wide variety of academic, government, and industry users. Popular due to its versatility and support for a wide range of force fields/potential models: Materials Science, Chemistry, Biophysics, Solid Mechanics, Granular Flow, etc. More at http://lammps.sandia.gov/

Availability:
• Code: In main LAMMPS repository.
• Recipe: Available here.

Usage Model: Load balancer offloads part of neighbor-list and non-bond force calculations to Intel® Xeon Phi™ coprocessor for concurrent calculations with CPU.

Highlights: Improved results with Intel® Xeon® processor E5-2697 v2 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent:
• Data transfer between host and coprocessor.
• Calculations of neighbor-list, non-bond, bond, and long-range terms.
Same routines in LAMMPS Intel Package also run faster on CPU.

Results: Up to 1.75X performance improvement utilizing Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization on a single node compared to the baseline configuration. Performance gains continue to hold at 1.72X when scaling up to 32 nodes, outperforming the alternative configuration.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance
*Other names and brands may be claimed as the property of others.
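The Usage Model and Highlights above describe a load balancer that splits work between host and coprocessor so both compute concurrently. The idea can be illustrated with a minimal Python sketch. This is not the LAMMPS Intel Package implementation: the function name `rebalance`, the `damping` parameter, and the simulated device rates are all hypothetical, chosen only to show how an offload fraction might be adjusted from measured timings so that both sides finish at about the same time.

```python
def rebalance(fraction, host_time, coproc_time, damping=0.5):
    """Return an updated offload fraction (hypothetical helper).

    fraction    -- share of the work currently sent to the coprocessor (0 < f < 1)
    host_time   -- seconds the host needed for its share (1 - fraction)
    coproc_time -- seconds the coprocessor needed for its share (fraction)
    damping     -- how far to move toward the estimated ideal split (0 < d <= 1)
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if host_time <= 0.0 or coproc_time <= 0.0:
        raise ValueError("timings must be positive")
    if not 0.0 < damping <= 1.0:
        raise ValueError("damping must be in (0, 1]")

    # Observed throughput of each side, in work units per second.
    host_rate = (1.0 - fraction) / host_time
    coproc_rate = fraction / coproc_time

    # Split at which both sides would finish at the same moment.
    target = coproc_rate / (host_rate + coproc_rate)

    # Move only part of the way to avoid oscillating on noisy timings.
    return fraction + damping * (target - fraction)


if __name__ == "__main__":
    # Assumed rates for the demo: the coprocessor is 3x faster than the host.
    HOST_RATE, COPROC_RATE = 1.0, 3.0
    f = 0.5
    for step in range(10):
        host_time = (1.0 - f) / HOST_RATE
        coproc_time = f / COPROC_RATE
        f = rebalance(f, host_time, coproc_time)
        print(f"step {step}: offload fraction = {f:.4f}")
```

With the assumed rates, the fraction drifts from 0.5 toward 0.75, the point where host and coprocessor take equal time. A real balancer would also have to account for data-transfer time, which the slide lists as overlapping with computation.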

Comparative Performance: LAMMPS* Production Protein Sim.; 474K Atoms

Chart: LAMMPS* Production Protein Simulation Performance (Mixed Precision), relative performance at 1 node and 32 nodes.

Configurations compared:
• 2S Intel® Xeon® processor E5-2697 v2 (LAMMPS Baseline)
• 2S Intel® Xeon® processor E5-2697 v2 (LAMMPS IA Package)
• 2S E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A, Turbo Off (LAMMPS IA Package)

Bar labels as extracted, in the order listed above: 1, 1.18X, 1.9X and 1, 1.07X, 1.7X.

For configuration details, go here.
SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2014

CLUSTER BENCHMARK, 32 NODES

Application: LAMMPS*

Description: Simulation of molecular systems with classical models. Wide variety of academic, government, and industry users. Popular due to its versatility and support for a wide range of force fields/potential models: Materials Science, Chemistry, Biophysics, Solid Mechanics, Granular Flow, etc. More at http://lammps.sandia.gov/.

Availability:
• Code: In main LAMMPS repository.
• Recipe: Available here.

Usage Model: Load balancer offloads part of neighbor-list and non-bond force calculations to Intel® Xeon Phi™ coprocessor for concurrent calculations with CPU.

Highlights: Improved results with Intel® Xeon® processor E5-2697 v2 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent:
• Data transfer between host and coprocessor.
• Calculations of neighbor-list, non-bond, bond, and long-range terms.
Same routines in LAMMPS Intel Package also run faster on CPU.

Results: Up to 1.9X performance improvement utilizing Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization on a single node compared to the baseline configuration. Performance at 4.84X when scaling up to 32 nodes.
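The chart values for the production protein simulation are all relative to the same baseline, so the share of the gain that comes from the coprocessor (as opposed to the optimized CPU code alone) can be estimated by dividing one by the other. The short Python sketch below does that arithmetic. It assumes the extracted labels map to the bars as 1.18X and 1.9X at 1 node and 1.07X and 1.7X at 32 nodes; the dictionary keys and the function name `coprocessor_gain` are illustrative only.

```python
# Speedups over the LAMMPS Baseline, as read from the chart labels
# (assumed bar assignment; see the lead-in).
SPEEDUPS = {
    "1 node": {"ia_package_cpu": 1.18, "ia_package_cpu_plus_phi": 1.9},
    "32 nodes": {"ia_package_cpu": 1.07, "ia_package_cpu_plus_phi": 1.7},
}


def coprocessor_gain(entry):
    """Speedup of CPU + coprocessor over the optimized CPU-only run."""
    return entry["ia_package_cpu_plus_phi"] / entry["ia_package_cpu"]


if __name__ == "__main__":
    for scale, entry in SPEEDUPS.items():
        print(f"{scale}: coprocessor adds about {coprocessor_gain(entry):.2f}X "
              "over the IA Package on CPU alone")
```

Under that assumption, adding the coprocessor gives roughly 1.61X over the optimized CPU-only run on one node and roughly 1.59X at 32 nodes.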

