Cluster Computing Motivation - SERC


Cluster Computing

• Motivation for clusters
– Higher performance at lower cost
• Comparison of parallel machines and clusters
• Major issues
– Performance related: communication s/w and h/w
– Management related
• Applications of clusters
– Scientific applications (the original purpose)
– Web server
– Google cluster

Motivation

• Applications
– Grand Challenge apps: weather forecasting
• Parallel machines
– Very powerful, high cost
– Partly due to custom h/w and s/w
• Cost-effective replacement?
– Emergence of clusters


Parallel Computers (Cray T3E)

• Hardware
– Processor: Alpha 21164, 450 MHz
– Custom communication h/w: access remote memory without involving the remote processor
– Router: 3D torus (600 MB/s)
• Software
– Custom s/w: get(), put()
• Latency: 1 us; bandwidth: 350 MB/s
[Figure: T3E node, with Alpha 21164, control logic, local memory, and router]

Birth of Clusters (Beowulf)

• Workstation
– Uniprocessor: Intel 486, 66 MHz
– 16 MB RAM
– 10 Mb/s Ethernet
– Standard OS (Linux)
• Beowulf computer (1994)
– Cluster of 16 workstations, 2 Ethernet adapters per node
– Assembled from commodity off-the-shelf components
– Better cost/performance trade-off
– Achieved 1 Mbyte/s out of the 1.25 Mbyte/s channel peak
– Good application scale-up
[Figure: node layout, with CPU and memory on the memory bus and NIC on the I/O bus]
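The T3E's get()/put() primitives survive today in OpenSHMEM, a descendant of Cray's SHMEM library. Below is a minimal sketch of the one-sided style the slide refers to, assuming an OpenSHMEM implementation (compiled with, e.g., oshcc); the buffer name and neighbor pattern are illustrative, not from the slides.

```c
/* Hedged sketch: one-sided put in the style of the T3E's SHMEM
 * get()/put(), written against the OpenSHMEM API. Assumes an
 * OpenSHMEM implementation is installed; run with 2+ PEs. */
#include <stdio.h>
#include <shmem.h>

int main(void) {
    shmem_init();
    int me = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric allocation: every PE allocates the buffer at the same
     * logical address, so remote PEs can address it directly. */
    long *buf = shmem_malloc(sizeof(long));
    *buf = me;
    shmem_barrier_all();

    /* One-sided put: write into the next PE's memory without any
     * involvement of that PE's processor (the T3E selling point). */
    long val = 100 + me;
    shmem_long_put(buf, &val, 1, (me + 1) % npes);

    shmem_barrier_all();
    printf("PE %d of %d: buf = %ld\n", me, npes, *buf);

    shmem_free(buf);
    shmem_finalize();
    return 0;
}
```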


Performance Issues

• Communication characteristics
– Parallel apps: fine-grain communication (small messages)
– NFS: 200-byte messages
• Communication performance
– Latency (small messages)
– Bandwidth (large messages)

Current Trends

• Better uniprocessor performance
• Fast memory
• Faster I/O bus
• Communication h/w
• Top500 supercomputers (ranked by LINPACK): 200+ are clusters
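Latency and bandwidth are conventionally measured with a ping-pong test: time round trips for small messages (latency) and for large messages (bandwidth). A minimal sketch of that methodology follows; for self-containment it runs over a local Unix-domain socketpair rather than a real cluster interconnect, so the absolute numbers are illustrative only.

```c
/* Hedged sketch: ping-pong measurement of latency (small messages)
 * and bandwidth (large messages) between two local processes. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <sys/socket.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Stream sockets may return short reads/writes; loop to completion. */
static void read_full(int fd, char *buf, size_t n) {
    for (size_t got = 0; got < n; ) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r <= 0) exit(1);
        got += (size_t)r;
    }
}
static void write_full(int fd, const char *buf, size_t n) {
    for (size_t put = 0; put < n; ) {
        ssize_t w = write(fd, buf + put, n - put);
        if (w <= 0) exit(1);
        put += (size_t)w;
    }
}

int main(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return 1;

    size_t sizes[] = { 8, 200, 4096, 65536 };   /* small -> large */
    int iters = 1000;
    char *buf = calloc(1, 65536);

    if (fork() == 0) {             /* child: echo everything back */
        for (size_t s = 0; s < 4; s++)
            for (int i = 0; i < iters; i++) {
                read_full(sv[1], buf, sizes[s]);
                write_full(sv[1], buf, sizes[s]);
            }
        exit(0);
    }

    for (size_t s = 0; s < 4; s++) {
        double t0 = now_sec();
        for (int i = 0; i < iters; i++) {
            write_full(sv[0], buf, sizes[s]);
            read_full(sv[0], buf, sizes[s]);
        }
        double rtt = (now_sec() - t0) / iters;
        /* one-way latency = RTT/2; bandwidth = bytes / one-way time */
        printf("%6zu B: latency %.2f us, bandwidth %.1f MB/s\n",
               sizes[s], rtt / 2 * 1e6, sizes[s] / (rtt / 2) / 1e6);
    }
    return 0;
}
```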


Overheads in Communication S/W

• System call overhead
• Interrupts
• Buffer management
• Memory copy
[Figure: user/kernel send path, from application through library, message layer, and device driver; the application buffer is copied into a kernel buffer (one-copy) vs. handed to the NIC directly (zero-copy)]

U-Net (User-Level Networking)

• Objectives
– Achieve latency and bandwidth close to that provided by the h/w (link level)
– Support parallel jobs (with protection)
– Flexible enough for upper-layer protocols (e.g., synchronous vs. asynchronous)
• How?
– Remove the kernel from the critical path of send/recv
– Use virtual-memory hardware for protection
– Provide a simple and efficient interface for applications to access the network
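To see why removing the kernel from the critical path matters, one can time a trivial system call against a plain function call. The following is a rough, Linux-specific sketch; using syscall(SYS_getpid) to defeat library-level caching is an assumption about the build environment, and the baseline should be compiled without aggressive optimization.

```c
/* Hedged sketch: measuring the raw kernel-crossing cost that U-Net
 * removes from the send/recv path. Linux-specific. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#define N 1000000

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static long dummy(long x) { return x + 1; }    /* user-level baseline */

int main(void) {
    volatile long sink = 0;

    double t0 = now_sec();
    for (int i = 0; i < N; i++)
        sink += syscall(SYS_getpid);           /* kernel crossing */
    double t_sys = (now_sec() - t0) / N;

    t0 = now_sec();
    for (int i = 0; i < N; i++)
        sink += dummy(i);                      /* stays in user space */
    double t_fn = (now_sec() - t0) / N;

    printf("syscall: %.1f ns, function call: %.1f ns (ratio %.0fx)\n",
           t_sys * 1e9, t_fn * 1e9, t_sys / t_fn);
    return (int)(sink & 1);                    /* keep sink live */
}
```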


U-Net (Idea and Design)
[Figure: per-process endpoints, each with a communication segment and send/recv/free queues mapped into the application's address space]

U-Net (send/recv)

• Sending a message
– Assemble the message in the communication segment
– Place a queue entry (descriptor) on the send queue
• Receiving a message
– Get the descriptor from the recv queue
– Consume the message from the communication segment
– Place the descriptor on the free queue
• Waiting for a message
– Poll the recv queue (or block via select() or a signal)
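Here is a C sketch of the endpoint structure and the send/recv steps listed above. The field names, queue sizes, and fixed-slot segment layout are illustrative inventions; real U-Net descriptor formats are NIC-specific. The point is that both operations touch only user-mapped memory, never the kernel.

```c
/* Hedged sketch of a U-Net-style endpoint: comm segment plus send,
 * recv, and free queues, all mapped into the application's address
 * space (protection comes from the VM mapping, not kernel mediation). */
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define QLEN     64
#define SEG_SIZE (64 * 1024)

/* A descriptor names a region of the communication segment. */
typedef struct {
    uint32_t offset;   /* where the message lives in the segment */
    uint32_t length;
    uint32_t dest;     /* destination endpoint (send) / source (recv) */
    bool     valid;    /* ownership bit polled by NIC / application */
} descriptor;

typedef struct {
    uint8_t    segment[SEG_SIZE];
    descriptor send_q[QLEN]; unsigned send_tail;
    descriptor recv_q[QLEN]; unsigned recv_head;
    descriptor free_q[QLEN]; unsigned free_tail;
} endpoint;

/* Send: assemble the message in the segment, then post a descriptor
 * on the send queue for the NIC to pick up. */
bool unet_send(endpoint *ep, uint32_t dest, const void *msg, uint32_t len) {
    descriptor *d = &ep->send_q[ep->send_tail % QLEN];
    if (d->valid) return false;                /* send queue full */
    uint32_t off = (ep->send_tail % QLEN) * (SEG_SIZE / QLEN);
    memcpy(&ep->segment[off], msg, len);       /* assemble in segment */
    d->offset = off; d->length = len; d->dest = dest;
    d->valid = true;                           /* hand over to NIC */
    ep->send_tail++;
    return true;
}

/* Recv: poll the recv queue, consume the message from the segment,
 * then recycle the descriptor onto the free queue. */
int unet_recv(endpoint *ep, void *buf, uint32_t maxlen) {
    descriptor *d = &ep->recv_q[ep->recv_head % QLEN];
    if (!d->valid) return -1;                  /* nothing arrived yet */
    uint32_t len = d->length < maxlen ? d->length : maxlen;
    memcpy(buf, &ep->segment[d->offset], len); /* consume message */
    d->valid = false;
    ep->free_q[ep->free_tail++ % QLEN] = *d;   /* back on free queue */
    ep->recv_head++;
    return (int)len;
}
```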


Communication H/W

• Ethernet
– Shared or switched
– 100 Mbps (moderate performance, moderate cost)
– 1 Gbps (better performance, higher cost)
• Myrinet
– Switched
– Latency: < 2 us
– Bandwidth: 1.28 Gbps / 2.56 Gbps

U-Net (Implementation)

• Primitive NIC (inefficient)
– Kernel multiplexes application endpoints and handles send/recv
[Figure: process issues send via system call and trap; kernel drives the NIC's DMA for send and recv descriptors]


U-Net (Implementation)

• NIC with processing capabilities (efficient)
– Descriptor processing and host/NIC DMA handled by the NIC
– NIC demultiplexes incoming messages to endpoints (recv)
[Figure: user-level send and recv descriptors, with command queues and DMA between host and NIC]

Performance (user level vs. TCP)

• Latency and bandwidth measured on a PIII, 933 MHz, 64-bit/66 MHz PCI
• EMP: Gigabit Ethernet; GM: Myrinet (1.28 Gb/s) [Ref. EMP]
[Figure: latency and bandwidth plots comparing user-level messaging with TCP]


Message Passing Interface (MPI)

• Standard API for applications to send/recv messages
• Library for common operations like broadcast
[Figure: layering, from Application through MPI library and ADI down to TCP / U-Net / GM]

TCP vs. MPI/GM/Myrinet

• TCP (P4, 2.0 GHz)
– Latency (1 byte): 32 us
– Bandwidth: 231 Mbytes/s
• UDP
– Latency: 31 us
– Bandwidth: 245 Mbytes/s
• MPI (PIII, 1 GHz)
– Myrinet: 250 Mbytes/s
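Latency figures like the ones above come from a standard 1-byte ping-pong between two ranks. The sketch below shows that measurement plus the broadcast collective the slide mentions, using only core MPI calls; run with two or more processes, e.g. mpirun -np 2.

```c
/* Hedged sketch: 1-byte MPI ping-pong latency plus a broadcast. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* needs >= 2 ranks */

    char byte = 0;
    int iters = 10000;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)   /* one-way latency = round trip / 2 */
        printf("1-byte latency: %.1f us\n",
               (MPI_Wtime() - t0) / iters / 2 * 1e6);

    /* Collective operation provided by the library (slide's example). */
    int root_val = (rank == 0) ? 42 : 0;
    MPI_Bcast(&root_val, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```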


Cluster-Based Web Server (LARD)
[Figure: clients, front-end node, and a pool of back-end servers]

Cluster Server (Implementation)

1. Clients send requests to the front end
2. The front end forwards each request to a back end (connection hand-off)
3. The back end sends the response to the client
[Figure: cluster server request flow]
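The following sketches the basic locality-aware dispatch decision the LARD front end makes, simplified from Pai et al.: keep each target on the back end that already has it cached, and move it only when load is skewed. The thresholds, table sizes, and hash function are illustrative placeholders, not the paper's exact mechanism.

```c
/* Hedged sketch: simplified basic-LARD request distribution. */
#include <string.h>

#define NSERVERS 4
#define NTARGETS 1024
#define T_LOW  25   /* a lightly loaded back end (active connections) */
#define T_HIGH 65   /* an overloaded back end */

static int load[NSERVERS];       /* active connections per back end */
static int assigned[NTARGETS];   /* target -> back end, -1 if none  */

void lard_init(void) {
    memset(load, 0, sizeof load);
    memset(assigned, -1, sizeof assigned);   /* all bytes 0xFF == -1 */
}

static unsigned hash_target(const char *url) {
    unsigned h = 5381;
    while (*url) h = h * 33 + (unsigned char)*url++;
    return h % NTARGETS;
}

static int least_loaded(void) {
    int best = 0;
    for (int s = 1; s < NSERVERS; s++)
        if (load[s] < load[best]) best = s;
    return best;
}

/* Pick a back end for this request. */
int lard_dispatch(const char *url) {
    int t = hash_target(url);
    int server = assigned[t];

    if (server < 0) {
        /* First request for this target: send to least-loaded node,
         * which then caches the target. */
        server = least_loaded();
        assigned[t] = server;
    } else if (load[server] > T_HIGH ||
               (load[server] > T_LOW && load[least_loaded()] < T_LOW)) {
        /* Assigned node overloaded (or load badly skewed): reassign
         * the target, trading one cache refill for balance. */
        server = least_loaded();
        assigned[t] = server;
    }
    load[server]++;   /* decremented when the response completes */
    return server;
}
```

The design point is the trade-off in the reassignment branch: sticking with the assigned node preserves cache locality on the back ends, while the thresholds bound how much load imbalance that stickiness is allowed to cause.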


Performance

• HTTP throughput
[Figure: HTTP throughput of the cluster server]

Related Projects (a few)

• Active Messages (UCB)
• Fast Messages (UIUC, UCSD)
• BIP (INRIA, France)
• U-Net (Cornell)
• GAMMA (Italy)
• NOWLab (Ohio State)
• VIA, InfiniBand, Quadrics (mature products)


References

S. L. Scott. Synchronization and Communication in the T3E Multiprocessor. ASPLOS VII, 1996.
T. Sterling, D. Becker, D. Savarese. BEOWULF: A Parallel Workstation for Scientific Computation. ICPP, 1995.
T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: A User-Level Network Interface for Parallel and Distributed Computing. 15th SOSP, 1995.
P. Shivam, P. Wyckoff, D. K. Panda. EMP: Zero-copy OS-bypass NIC-driven Gigabit Ethernet Message Passing. SC 2001.
V. S. Pai et al. Locality-Aware Request Distribution in Cluster-based Network Servers. ASPLOS VIII, 1998.
http://www.myri.com
IEEE Micro, 1998.
