2H 2015
intel-xeon-phi-sw-ecosystem-guide-2h-2015-public3 intel-xeon-phi-sw-ecosystem-guide-2h-2015-public3
Comparative Performance 1 0 BLAST* BLASTn v.30 1 2S Intel® Xeon® processor E5-2697 v2 (BLASTn v.30 baseline) 2S Xeon E5-2697 v2 + Intel® Xeon Phi coprocessor 7120A 2S Xeon E5-2697 v2 + Xeon Phi 7120A (OFS parallelized) 2S Intel® Xeon® processor E5-2697 v3 (BLASTn v.30 baseline) 2S Xeon E5-2697 v3 + Intel® Xeon Phi coprocessor 7120A 2S Xeon E5-2697 v3 + Xeon Phi 7120A (OFS parallelized) “Xeon E5-2697 v2/v3” = Intel® Xeon® processor E5-2697 v2/v3 “Xeon Phi 7120A” = Intel® Xeon Phi coprocessor 7120A For configuration details, go here. BLASTn* v.30 Speed Up 1.26X 1.41X 1.22X 1.49X 1.52X SOURCE: INTEL MEASURED RESULTS AS OF MARCH, 2015 Application: Basic Local Alignment Search Tool (BLASTn) v.30. Description: Searching for alignment in nucleotide query sequences against a known nucleotide db volume set. National Center for Biotechnology Information (NCBI*). More at http://blast.ncbi.nlm.nih.gov/. Availability: • Code: Available here. • Recipe: Available here. Usage Model: #4 (multiple queries multiple db) 100 NCBI queries (concatenated) against db refseq_rna.00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi coprocessor for maximum speedup sweet spot. Experiment was repeated 20 times with the pick of queries randomized for a sweet spot split 80/20 and 59/23/18. Highlights: Throughput for this load sharing model has a small sweet spot for a sufficiently large query set. Results: Compared to the baseline, simulation rate speed up with Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance *Other names and brands may be claimed as the property of others . 1 NODE APPROVED FOR PUBLIC PRESENTATION NEW 18
Comparative Performance 1 0 BLAST* BLASTp v.30 1 2S Intel® Xeon® processor E5-2697 v2 (BLASTp v.30 baseline) 2S Xeon E5-2697 v2 + Intel® Xeon Phi coprocessor 7120A 2S Xeon E5-2697 v2 + Xeon Phi 7120A (OFS parallelized) 2S Intel® Xeon® processor E5-2697 v3 (BLASTp v.30 baseline) 2S Xeon E5-2697 v3 + Intel® Xeon Phi coprocessor 7120A 2S Xeon E5-2697 v3 + Xeon Phi 7120A (OFS parallelized) For configuration details, go here. BLASTp* v.30 Speed Up 1.15X 1.3X 1.21X 1.39X “Xeon E5-2697 v2/v3” = Intel® Xeon® processor E5-2697 v2/v3 “Xeon Phi 7120A” = Intel® Xeon Phi coprocessor 7120A 1.41X SOURCE: INTEL MEASURED RESULTS AS OF MARCH, 2015 Application: Basic Local Alignment Search Tool (BLASTp) v.30 Description: Searching for alignment in protein query sequence against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov/. Availability: • Code: Available here. • Recipe: Available here. • Usage Model: #4 (multiple queries multiple db) 40 NCBI queries (concatenated) against db nr_sorted.00-02 are distributed to Intel® Xeon® processor and Intel® Xeon Phi coprocessor for maximum speedup sweet spot. Experiment was repeated 20 times with the pick of queries randomized for a sweet spot split 33/7 and 28/5/7. Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited due to GAT stage not parallelized. Results: Compared to the baseline, simulation rate speed up with Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance *Other names and brands may be claimed as the property of others . 1 NODE APPROVED FOR PUBLIC PRESENTATION NEW 19
- Page 1 and 2: 2H 2015
- Page 3 and 4: Intel® Modern Code Developer Chall
- Page 5 and 6: New or Updated Proof Points NEW pro
- Page 7 and 8: Intel® Xeon Phi Coprocessors Softw
- Page 9 and 10: Intel® Xeon® Processor E5-2697 v2
- Page 11 and 12: Memory Capacity (GB) Memory Compari
- Page 13 and 14: A Growing Ecosystem: The Intel® Xe
- Page 15 and 16: Comparative Performance LAMMPS* Sti
- Page 17: Comparative Performance Johns Hopki
- Page 21 and 22: Comparative Performance NAMD* 2.10
- Page 23 and 24: Comparative Performance LAMMPS* Liq
- Page 25 and 26: Comparative Performance LAMMPS* Rho
- Page 27 and 28: Comparative Performance LAMMPS* Liq
- Page 29 and 30: Comparative Performance LAMMPS* Rho
- Page 31 and 32: Comparative Performance LAMMPS* Pro
- Page 33 and 34: Comparative Performance AMBER* 14 P
- Page 35 and 36: Comparative Performance AMBER* 14 P
- Page 37 and 38: Comparative Performance Burrows-Whe
- Page 39 and 40: Comparative Performance NWChem* CCS
- Page 41 and 42: Discover and design like never befo
- Page 43 and 44: Comparative Performance miniGhost*
- Page 45 and 46: Comparative Performance Quantum ESP
- Page 47 and 48: Comparative Performance ANSYS Mecha
- Page 49 and 50: Comparative Performance ANSYS Mecha
- Page 51 and 52: Comparative Performance ANSYS Mecha
- Page 53 and 54: Comparative Performance Sandia Mant
- Page 55 and 56: Comparative Increase Autodesk Maya*
- Page 57 and 58: Comparative Performance OpenLB* Cyl
- Page 59 and 60: CLUSTER BENCHMARKS New Data Center
- Page 61 and 62: Comparative Performance Monte Carlo
- Page 63 and 64: Comparative Performance QuantLib* S
- Page 65 and 66: Comparative Performance Monte Carlo
- Page 67 and 68: Comparative Performance Monte Carlo
Comparative Performance<br />
1<br />
0<br />
BLAST*<br />
BLASTn v.30<br />
1<br />
2S Intel® Xeon® processor E5-2697 v2 (BLASTn v.30 baseline)<br />
2S Xeon E5-2697 v2 + Intel® Xeon Phi coprocessor 7120A<br />
2S Xeon E5-2697 v2 + Xeon Phi 7120A (OFS parallelized)<br />
2S Intel® Xeon® processor E5-2697 v3 (BLASTn v.30 baseline)<br />
2S Xeon E5-2697 v3 + Intel® Xeon Phi coprocessor 7120A<br />
2S Xeon E5-2697 v3 + Xeon Phi 7120A (OFS parallelized)<br />
“Xeon E5-2697 v2/v3” = Intel® Xeon® processor E5-2697 v2/v3<br />
“Xeon Phi 7120A” = Intel® Xeon Phi coprocessor 7120A<br />
For configuration details, go here.<br />
BLASTn* v.30 Speed Up<br />
1.26X<br />
1.41X<br />
1.22X<br />
1.49X<br />
1.52X<br />
SOURCE: INTEL MEASURED RESULTS AS OF MARCH, <strong>2015</strong><br />
Application: Basic Local Alignment Search Tool (BLASTn) v.30.<br />
Description: Searching for alignment in nucleotide query<br />
sequences against a known nucleotide db volume set. National<br />
Center for Biotechnology Information (NCBI*). More at<br />
http://blast.ncbi.nlm.nih.gov/.<br />
Availability:<br />
• Code: Available here.<br />
• Recipe: Available here.<br />
Usage Model: #4 (multiple queries multiple db) 100 NCBI queries<br />
(concatenated) against db refseq_rna.00-02 are distributed to the<br />
Intel® Xeon® processor and Intel® Xeon Phi coprocessor for<br />
maximum speedup sweet spot. Experiment was repeated 20 times<br />
with the pick of queries randomized for a sweet spot split 80/20<br />
and 59/23/18.<br />
Highlights: Throughput for this load sharing model has a small<br />
sweet spot for a sufficiently large query set.<br />
Results: Compared to the baseline, simulation rate speed up with<br />
Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi<br />
coprocessor 7120A heterogeneous model is 1.52X. Performance is<br />
also improved on the CPU due to Output Formatting Section (OFS)<br />
parallelization.<br />
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,<br />
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated<br />
purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance *Other names and brands may be claimed as the property of others<br />
.<br />
1 NODE<br />
APPROVED FOR PUBLIC PRESENTATION<br />
NEW<br />
18