2H 2015
intel-xeon-phi-sw-ecosystem-guide-2h-2015-public3
Memory Access Analysis
New! Intel® VTune Amplifier 2016

New! Tune data structures for better performance
• Attribute cache misses to data structures

Bandwidth Analysis for Non-Uniform Memory
• See Read & Write contributions to Total Bandwidth
• Easier tuning of multi-socket bandwidth

Seeing total bandwidth can suggest data blocking opportunities that change a bandwidth-bound app into a compute-bound app.
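As a rough illustration of what "data blocking" means here, the sketch below tiles a matrix multiply so that each small block of the operands is reused many times before the loops move on. It is not taken from the slide or from VTune Amplifier; the function names and the `block` parameter are hypothetical, and plain Python only shows the loop structure, not the bandwidth savings a compiled kernel would see.

```python
def matmul_naive(a, b):
    """Reference triple loop: streams through all of b for every row of a."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=32):
    """Blocked (tiled) variant; `block` is an illustrative tile size."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops walk over tiles; inner loops work inside one tile, so the
    # same small pieces of a, b, and c are touched repeatedly while they
    # would still be resident in cache on real hardware.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

Both functions compute the same product; only the traversal order differs. In a compiled language the tile size would typically be chosen so that the working tiles fit in a given cache level, which is an assumption about the target machine rather than something the slide specifies.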
Scalable Profiling for MPI and Hybrid Clusters with MPI Performance Snapshot

• Lightweight – Low-overhead profiling up to 32K ranks
• Scalability – Performance variation at scale can be detected sooner
• Identifying Key Metrics – Shows PAPI counters and MPI/OpenMP* imbalances