2H 2015

Comparative Performance: LAMMPS* Rhodopsin Benchmark; 512K Atoms
CLUSTER BENCHMARK, 32 NODES
APPROVED FOR PUBLIC PRESENTATION

[Chart: LAMMPS* Rhodopsin Benchmark Performance (Mixed Precision), relative to LAMMPS Baseline = 1, at 1 Node and 32 Nodes; includes external NVIDIA* results (2/K20X + 1S AMD*).]
Configurations:
• 2S Intel® Xeon® processor E5-2697 v2 (LAMMPS Baseline)
• 2S Intel® Xeon® processor E5-2697 v2 (LAMMPS IA Package)
• 2S E5-2697 v2 + Intel® Xeon Phi coprocessor 7120A, Turbo Off (LAMMPS IA Package)
• Cray XK7: 1S AMD Opteron* 6274 + NVIDIA Tesla* K20X; Cray Gemini* Interconnect, PCIe* 2.0 (LAMMPS GPU Package); source: http://www.nvidia.com/docs/IO/122634/computational-chemistrybenchmarks.pdf

Application: LAMMPS*
Description: Simulation of molecular systems with classical models. Used by a wide variety of academic, government, and industry users. Popular for its versatility and its support for a wide range of force fields/potential models: materials science, chemistry, biophysics, solid mechanics, granular flow, etc. More at http://lammps.sandia.gov/
Availability:
• Code: In the main LAMMPS repository.
• Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi coprocessor so that they run concurrently with calculations on the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent:
• Data transfer between host and coprocessor.
• Calculations of neighbor-list, non-bond, bond, and long-range terms.
The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.75X performance improvement on a single node compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi coprocessors with application optimization. The gain holds at 1.72X when scaling up to 32 nodes, outperforming the alternative configuration.
For configuration details, go here.
SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2014
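To put "gains continue to hold" in numbers, the short Python calculation below compares the reported 32-node speedup (1.72X) with the single-node speedup (1.75X). It assumes that each speedup is measured against the LAMMPS baseline at the same node count; the function name `retained_fraction` is hypothetical and only for illustration.

```python
def retained_fraction(single_node_speedup: float, scaled_speedup: float) -> float:
    """Fraction of the single-node speedup still present at larger scale.

    Assumes both speedups are relative to the baseline configuration
    measured at the same node count.
    """
    return scaled_speedup / single_node_speedup


# Speedups over the LAMMPS baseline reported for the Rhodopsin benchmark.
one_node = 1.75     # 1 node, Xeon + Xeon Phi with the LAMMPS IA Package
thirty_two = 1.72   # 32 nodes, same configuration

print(f"{retained_fraction(one_node, thirty_two):.1%} of the 1-node gain retained")
# prints "98.3% of the 1-node gain retained"
```

Under that assumption, roughly 98% of the single-node advantage survives the move to 32 nodes.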
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance *Other names and brands may be claimed as the property of others.
Comparative Performance: LAMMPS* Production Protein Simulation; 474K Atoms
CLUSTER BENCHMARK, 32 NODES
APPROVED FOR PUBLIC PRESENTATION

[Chart: LAMMPS* Production Protein Simulation Performance (Mixed Precision), relative to LAMMPS Baseline = 1, at 1 Node and 32 Nodes.]
Configurations:
• 2S Intel® Xeon® processor E5-2697 v2 (LAMMPS Baseline)
• 2S Intel® Xeon® processor E5-2697 v2 (LAMMPS IA Package)
• 2S E5-2697 v2 + Intel® Xeon Phi coprocessor 7120A, Turbo Off (LAMMPS IA Package)

Application: LAMMPS*
Description: Simulation of molecular systems with classical models. Used by a wide variety of academic, government, and industry users. Popular for its versatility and its support for a wide range of force fields/potential models: materials science, chemistry, biophysics, solid mechanics, granular flow, etc. More at http://lammps.sandia.gov/
Availability:
• Code: In the main LAMMPS repository.
• Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi coprocessor so that they run concurrently with calculations on the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent:
• Data transfer between host and coprocessor.
• Calculations of neighbor-list, non-bond, bond, and long-range terms.
The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.9X performance improvement on a single node compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi coprocessors with application optimization. Performance reaches 4.84X when scaling up to 32 nodes.
For configuration details, go here.
SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2014
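The usage model above describes a load balancer that splits neighbor-list and non-bond work between the host CPU and the coprocessor. Below is a minimal Python sketch of that general idea, assuming a simple throughput model: after each step, the offload fraction is re-tuned from measured timings so both devices finish together. The device rates, the `damping` parameter, and all names are hypothetical; this is not the LAMMPS Intel Package implementation.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A compute device modeled by a fixed throughput (work items per second)."""
    name: str
    items_per_second: float

    def run(self, items: float) -> float:
        """Return the simulated wall time needed to process `items`."""
        return items / self.items_per_second


def balance_fraction(offload: float, n_items: float,
                     host_time: float, coproc_time: float,
                     damping: float = 0.5) -> float:
    """Update the fraction of work offloaded to the coprocessor.

    Throughputs are estimated from the last step's timings; the ideal
    fraction is the one that makes host and coprocessor finish together.
    `damping` blends old and new fractions to avoid oscillation.
    """
    host_rate = (1.0 - offload) * n_items / host_time
    coproc_rate = offload * n_items / coproc_time
    ideal = coproc_rate / (host_rate + coproc_rate)
    new = (1.0 - damping) * offload + damping * ideal
    # Keep both devices busy so the next step's timings stay measurable.
    return min(max(new, 0.01), 0.99)


def simulate(steps: int = 20, n_items: float = 512_000.0) -> float:
    host = Device("host CPU", items_per_second=1.0e6)       # hypothetical rate
    coproc = Device("coprocessor", items_per_second=1.5e6)  # hypothetical rate
    offload = 0.5  # start with an even split
    for _ in range(steps):
        # In a real code these two pieces would execute concurrently.
        host_time = host.run((1.0 - offload) * n_items)
        coproc_time = coproc.run(offload * n_items)
        offload = balance_fraction(offload, n_items, host_time, coproc_time)
    return offload


if __name__ == "__main__":
    print(f"converged offload fraction: {simulate():.3f}")
```

With the hypothetical rates above (the coprocessor 1.5 times faster than the host), the fraction converges toward 0.6, the split at which both devices finish at the same time. A larger `damping` converges faster but reacts more strongly to noisy timings.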