08.06.2013 Views

Jim Tomkins

Jim Tomkins

Jim Tomkins

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

The Red Storm Story<br />

Red Storm Update<br />

<strong>Jim</strong> <strong>Tomkins</strong><br />

<strong>Jim</strong> <strong>Tomkins</strong><br />

SOS 11<br />

Key West, Florida<br />

June 12 - 14, 2007<br />

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,<br />

for the United States Department of Energy’s National Nuclear Security Administration<br />

under contract DE-AC04-94AL85000.<br />

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,<br />

for the United States Department of Energy’s National Nuclear Security Administration<br />

under contract DE-AC04-94AL85000.


Upgrade Changes<br />

• SeaStar 2.1 NIC/Router Chips<br />

– 2 x Injection Bandwidth<br />

• 5th Row (25% more compute and service nodes)<br />

• Dual Core Opterons<br />

– 2.5 x More Compute Processors<br />

– 20% Higher Clock Rate<br />

• New OS’s (Linux 2.6, CVN)<br />

– Virtual node model –– 1PE or 2PE / node<br />

• New File Systems<br />

– Lustre 1.4 on Linux 2.6<br />

• Complete: mid-November, 2006


Normally<br />

Classified<br />

I/O and<br />

Service Nodes<br />

Upgraded Red Storm Layout<br />

(27 x 20 x 24 Compute Node Mesh)<br />

Switchable Nodes<br />

Disconnect Cabinets<br />

Disk storage system<br />

not shown<br />

Normally<br />

Unclassified<br />

I/O and<br />

Service Nodes


Upgraded Red Storm


Theoretical Peak Performance<br />

(Compute Nodes Only)<br />

Red Storm Comparison<br />

Red Storm<br />

(initial operations)<br />

41.47 TF 124.42 TF<br />

HPL Performance 36.19 TF 101.400 TF<br />

Red Storm<br />

(post-upgrade)<br />

Compute Nodes / Processors 10368 / 10368 12960 / 25920<br />

Service and I/O Nodes /<br />

Processors<br />

256 + 256<br />

256 + 256<br />

320 + 320<br />

640 + 640<br />

Processor 2.0 GHz Opteron 2.4 GHz Dual Core<br />

Opteron<br />

Total Memory (TB) 33.38 39.19<br />

System Memory B/W (TB/s) 55.2 82.9<br />

Topology 27 x 16 x 24 27 x 20 x 24<br />

Bi-Section B/W (TB/s) 3.7, 6.2, 8.3 4.6, 6.2, 10.4<br />

Power / Cooling (MW) 1.7 2.5<br />

System Size ~3100 ft 2 ~3800 ft 2


Some Performance Data


#2 on Top 500 list at 101.4 TF


Single Core Versus Dual Core


Single Core Versus Dual Core


Single Core Versus Dual Core<br />

1.20<br />

1.00<br />

0.80<br />

0.60<br />

0.40<br />

0.20<br />

0.00<br />

CTH (2000)<br />

SAGE (2048)<br />

UMT2K (3200)<br />

ITS (3200)<br />

Red Storm (SN vs. VN)<br />

SN = 1PE/socket, VN = 2PE/socket<br />

PRTSN (4096)<br />

SAGE (4500)<br />

UMT2K (4500)<br />

ITS (4500)<br />

sPPM (4500)<br />

Application (PEs)<br />

CTH (6480)<br />

PRTSN (6480)<br />

sPPM (6561)<br />

PRTSN (8930)<br />

SN (post-upgrade)<br />

VN (post-upgrade)<br />

CTH (9000)<br />

sPPM (9000)


John Daly’s Production Work<br />

• Recent results have shown that for John’s application we<br />

are getting a large fraction of the second core.<br />

– John’s application code runs on 3000 dual core nodes in<br />

the same time as 5000 single core nodes. This means<br />

that he is getting about 70% efficiency out of using the<br />

second core.


Red Storm Upgraded to ~500TF<br />

• Upgrade Existing System to Quad Core XT4 Specifications<br />

– Replace all Compute Boards<br />

– 2.4 GHz AMD Opteron Quad Core Processors<br />

– New DDR2 Memory (>75 TB)<br />

– Replace Power Supplies<br />

– Replace Cabinet Cooling Fans<br />

– Reuse Seastar Mezzanines, L0s, Cabinets, Cables, PDUs, etc.<br />

• A Few System Parameters<br />

– ~500 TF<br />

– >75 TB of Memory<br />

– >1 PB of Disk Storage<br />

• Timeframe - Early 2008


Red Storm Impact<br />

• Cray and Sandia with the support of NNSA/ASC<br />

have created one of the most successful new<br />

supercomputers ever.<br />

• There are now 17 sites with Red Storm / XT3 / XT4<br />

computer systems.


ARMY/HPC<br />

AWE (England)<br />

CSC (Finland)<br />

CSCS<br />

(Switzerland)<br />

DOD/ERDC<br />

EPSRC (UK)<br />

JAIST (Japan)<br />

NERSC<br />

ORNL<br />

PSC<br />

SNL<br />

Red Storm / XT3 / XT4 Sites<br />

SS-634 (non-US)<br />

SS-635<br />

SS-643<br />

SS-661<br />

U of Tokyo (JST)<br />

U of Western Australia<br />

Worldwide<br />

Count of Sites<br />

(Cumulative)<br />

20<br />

15<br />

10<br />

Number of XT3/XT4 Sites Worldwide<br />

5<br />

0<br />

1H05 1 2H05 2 1H06 3 2H06 4 1H07 5 2H07 6<br />

TFLOPS<br />

800<br />

600<br />

400<br />

200<br />

0<br />

Half Years<br />

TFLOPS available Worldwide<br />

1 2 3 4 5 6<br />

1H05 2H05 1H06 2H06 1H07 2H07<br />

Half Years

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!