2H 2015

intel-xeon-phi-sw-ecosystem-guide-2h-2015-public3

07.12.2015

Memory Access Analysis

New! Intel® VTune Amplifier 2016

New! Tune data structures for better performance
• Attribute cache misses to data structures

Bandwidth Analysis for Non-Uniform Memory
• See Read & Write contributions to Total Bandwidth
• Easier tuning of multi-socket bandwidth

Seeing total bandwidth can suggest data blocking opportunities to change a bandwidth-bound app into a compute-bound app.
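As an illustration of the data blocking idea mentioned on this slide, here is a minimal sketch of loop tiling applied to a matrix multiply. It is not taken from the guide: the function names, the `block` parameter, and its default of 32 are assumptions chosen for the example. Pure Python only shows the loop structure; the bandwidth savings appear when the same structure is used in compiled code, with a block size chosen so that the working tiles fit in cache.

```python
def matmul_naive(a, b, n):
    """Reference triple loop: for each output element, walks a full column of b."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, n, block=32):
    """Tiled variant: works on block x block tiles so each tile is reused
    while it is still cache-resident (block size is a hypothetical default)."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        i_end = min(ii + block, n)        # clamp the last partial tile
        for kk in range(0, n, block):
            k_end = min(kk + block, n)
            for jj in range(0, n, block):
                j_end = min(jj + block, n)
                # Multiply tile a[ii:i_end, kk:k_end] by tile b[kk:k_end, jj:j_end]
                # and accumulate into tile c[ii:i_end, jj:j_end].
                for i in range(ii, i_end):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, k_end):
                        a_ik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, j_end):
                            row_c[j] += a_ik * row_b[j]
    return c
```

Both functions compute the same product; the blocked version only reorders the iteration space so that each tile loaded from memory is used many times before it is evicted, which is what moves a kernel from bandwidth-bound toward compute-bound.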

Scalable Profiling for MPI and Hybrid Clusters with MPI Performance Snapshot

Lightweight – Low-overhead profiling up to 32K ranks

Scalability – Performance variation at scale can be detected sooner

Identifying Key Metrics – Shows PAPI counters and MPI/OpenMP* imbalances
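To make the notion of an imbalance concrete, the sketch below computes one common load-imbalance measure from per-rank timings: the fraction of the slowest rank's time that an average rank spends waiting. This is an assumption for illustration, not necessarily the metric MPI Performance Snapshot reports, and the function name `load_imbalance` is hypothetical.

```python
def load_imbalance(times):
    """Return (max - mean) / max for a list of per-rank times.

    0.0 means perfectly balanced; values near 1.0 mean most ranks
    sit idle waiting for the slowest one. This definition is an
    assumption for illustration, not the tool's documented metric.
    """
    if not times:
        raise ValueError("times must contain at least one rank")
    slowest = max(times)
    if slowest == 0:
        return 0.0  # nothing ran, so nothing is imbalanced
    mean = sum(times) / len(times)
    return (slowest - mean) / slowest
```

For example, four ranks with times of 1, 1, 1, and 5 seconds have a mean of 2 seconds, so an average rank is idle for 3 of the 5 seconds the slowest rank needs.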

