11.07.2015 Views

1 Distributed Shared Memory Architecture (Scalable ... - SERC


Distributed Shared Memory Multiprocessors

Distributed Shared Memory Architecture (Scalable Multiprocessors)
• Each processor has local memory – non-uniform memory access (NUMA)
• The local memory controller determines whether an access is local or remote
• Performance depends on how well you partition the data

[Figure: Processor 1, Processor 2, ..., Processor n, each with its own memory (Mem), connected by an interconnection network]

Cache Coherence in DSM
• Snooping cache coherence protocols are not suitable for large-scale multiprocessors
  – 16 processors, block size 64 bytes, data cache size 512 KB, with 1 data reference every 1 ns: total bandwidth demand 1 GB/sec to 42 GB/sec
• May not have any hardware cache coherence protocol
  – Cray T3D/E
  – only private data can be cached
  – shared data uncacheable
• Software can maintain cache coherence
• Compiler techniques very limited
• Performance suffers without caching shared data
  – local cache access – 2 cycles
  – remote memory access – 400 cycles

Directory-Based Cache Coherence Protocol
• A directory keeps the state of every memory block
• A directory keeps track of which processor caches contain a copy
• Messages are not broadcast
• Typically the size of the directory is proportional to M*P
• The directory is distributed along with the memory
• Individual caches also hold state information, as in a snooping protocol
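The claim that directory size is proportional to M*P can be made concrete with a little arithmetic. The sketch below assumes that M is the number of memory blocks, that P is the number of processors, and that each directory entry is a plain bit vector with one presence bit per processor (state bits are ignored). None of these details are stated on the slide, and the function names are illustrative only.

```python
def full_map_directory_bits(num_blocks: int, num_processors: int) -> int:
    """Total presence bits: one bit per processor for every memory block.

    Assumption: M = num_blocks, P = num_processors, state bits ignored.
    """
    return num_blocks * num_processors


def directory_overhead(block_size_bytes: int, num_processors: int) -> float:
    """Directory bits per block divided by data bits per block."""
    return num_processors / (block_size_bytes * 8)


if __name__ == "__main__":
    # 64-byte blocks, as in the bandwidth example on the slide.
    for p in (16, 64, 256):
        print(f"P = {p:3d}: overhead = {directory_overhead(64, p):.3f}")
```

Under these assumptions the overhead grows linearly with the processor count, which is why a full bit vector becomes costly on large machines.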


Directory-Based Cache Coherence Protocol

[Figure: Processor 1, Processor 2, ..., Processor n, each with its own memory (Mem) and directory, connected by an interconnection network]

Directory-Based Cache Coherence Protocol – An Example
• The memory states
  – Shared
  – Uncached
  – Exclusive
• The cache states
  – Invalid
  – Shared
  – Exclusive
• Local node – the node that generates the request
• Home node – the node where the memory block resides
• Remote node – the node whose cache contains the latest data

Operation of a simple directory scheme

[Figure: message-flow diagrams among the local (L), home (H), and remote (R) nodes and the sharers S1, S2, S3. Labels per diagram:
  – L, H: 1: Request for data; 2: Response with data
  – L, H, R: 1: Request for data; 2: Intervention; 3: Data; 4: Response with data
  – L, H, R: 1: Request for data; 2: Response; 3: Intervention; 4a: Data; 4b: Data
  – L, H, R: 1: Request for data; 2: Intervention; 3a: Data; 3b: Data
  – H, S1, S2, S3: 1a, 1b, 1c: invalidate; 2a, 2b, 2c: ack]
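The numbered message labels in the "Operation of a simple directory scheme" figures can be read as short message sequences. The sketch below encodes three of them: the two-message read, the four-message read with an intervention, and the invalidate/ack exchange with the sharers. The arrows are lost in this copy, so the source and destination of each message are assumptions inferred from the node definitions (local, home, remote), and the function names are hypothetical.

```python
from enum import Enum


class MemState(Enum):
    UNCACHED = "Uncached"
    SHARED = "Shared"
    EXCLUSIVE = "Exclusive"


def read_messages(state, local, home, owner=None):
    """Return (step, source, destination, label) tuples for a read request.

    Assumption: if the block is Exclusive in a remote cache, the home node
    intervenes at the owner and relays the data back to the requester.
    """
    if state is MemState.EXCLUSIVE:
        return [
            ("1", local, home, "Request for data"),
            ("2", home, owner, "Intervention"),
            ("3", owner, home, "Data"),
            ("4", home, local, "Response with data"),
        ]
    # Uncached or Shared: memory at the home node holds a valid copy.
    return [
        ("1", local, home, "Request for data"),
        ("2", home, local, "Response with data"),
    ]


def invalidation_messages(home, sharers):
    """Home sends an invalidate to every sharer, then each sharer acks."""
    suffixes = [chr(ord("a") + i) for i in range(len(sharers))]
    msgs = [(f"1{x}", home, s, "invalidate") for x, s in zip(suffixes, sharers)]
    msgs += [(f"2{x}", s, home, "ack") for x, s in zip(suffixes, sharers)]
    return msgs


if __name__ == "__main__":
    for m in read_messages(MemState.EXCLUSIVE, "L", "H", "R"):
        print(m)
    for m in invalidation_messages("H", ["S1", "S2", "S3"]):
        print(m)
```

The other two three-node diagrams (with steps 4a/4b and 3a/3b) shorten or reorder this path; they are left out here because their message directions cannot be recovered reliably from the labels alone.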


The possible messages sent among the nodes to maintain coherence

Message type       Source           Destination      Message contents
Read miss          local cache      home directory   P, A
Write miss         local cache      home directory   P, A
Invalidate         home directory   remote cache     A
Fetch              home directory   remote cache     A
Fetch/invalidate   home directory   remote cache     A
Data value reply   home directory   local cache      D
Data write back    remote cache     home directory   A, D

An Example Directory Protocol

[Figure: Cache state transition for an individual cache block in a directory-based system. States: Invalid; Shared (read only); Exclusive (read/write). Transition labels: CPU read, CPU write, CPU read hit, CPU read miss, CPU write hit, CPU write miss, read miss, write miss, invalidate, fetch, fetch invalidate. Action labels: send read miss message, send write miss message, write back block, write back cache block, data write back.]

[Figure: The state transition for the directory. States: Uncached; Shared (read only); Exclusive (read/write). Events: read miss, write miss, data write back. Action labels:
  – Data value reply; Sharers = {P}
  – Data value reply; Sharers = Sharers + {P}
  – Invalidate; Data value reply; Sharers = {P}
  – Fetch; Data value reply; Sharers = Sharers + {P}
  – Fetch/Invalidate; Data value reply; Sharers = {P}
  – Data write back; Sharers = {}]

Organization of the Directory
• Flat memory-based directory scheme
  – Full map
    • directory size is large
    • Limited pointer
    • Caching the directory entries
    • Multiple nodes per directory
• Flat cache-based directory scheme
• Hierarchical directory scheme
• Software – hardware hybrid organization
  – LimitLESS protocol
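The directory state-transition labels (read miss, write miss, data write back, and the Sharers updates) describe a small state machine per memory block. The sketch below is one way to write it down. The figure's arrows are not recoverable from this copy, so the pairing of each action with a source state follows the usual textbook form of this protocol and should be read as an assumption. The class and method names are hypothetical, and skipping the invalidate to the requester itself is a simplification chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Directory state for one memory block (illustrative sketch only)."""
    state: str = "Uncached"               # "Uncached", "Shared", or "Exclusive"
    sharers: set = field(default_factory=set)

    def read_miss(self, p):
        """Handle a read miss from processor p; return (message, target) pairs."""
        msgs = []
        if self.state == "Exclusive":
            (owner,) = self.sharers       # exactly one owner when Exclusive
            msgs.append(("Fetch", owner))
        msgs.append(("Data value reply", p))
        self.sharers = self.sharers | {p}  # Sharers = Sharers + {P}
        self.state = "Shared"
        return msgs

    def write_miss(self, p):
        """Handle a write miss from processor p; return (message, target) pairs."""
        msgs = []
        if self.state == "Shared":
            # Assumption: the requester itself is not sent an invalidate.
            msgs += [("Invalidate", s) for s in sorted(self.sharers - {p})]
        elif self.state == "Exclusive":
            (owner,) = self.sharers
            msgs.append(("Fetch/Invalidate", owner))
        msgs.append(("Data value reply", p))
        self.sharers = {p}                 # Sharers = {P}
        self.state = "Exclusive"
        return msgs

    def data_write_back(self, p):
        """The owner writes the block back; the block becomes Uncached."""
        if self.state != "Exclusive" or self.sharers != {p}:
            raise ValueError("write back is only valid from the exclusive owner")
        self.sharers = set()               # Sharers = {}
        self.state = "Uncached"
        return []


if __name__ == "__main__":
    entry = DirectoryEntry()
    print(entry.read_miss(1), entry.state, entry.sharers)
    print(entry.write_miss(2), entry.state, entry.sharers)
```

The message names match the "Message type" column of the message table, with the target standing in for the destination cache.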

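The "Organization of the Directory" slide lists limited pointers as an alternative to a full map. A rough per-entry comparison is sketched below, assuming that a full-map entry holds one presence bit per processor and that a limited-pointer entry holds a fixed number of pointers, each wide enough to name any processor. State bits and overflow handling are ignored, and the function names are illustrative.

```python
def full_map_entry_bits(num_processors: int) -> int:
    """Full map: one presence bit per processor."""
    return num_processors


def limited_pointer_entry_bits(num_processors: int, num_pointers: int) -> int:
    """Limited pointer: num_pointers fields of ceil(log2(P)) bits each."""
    pointer_width = max(1, (num_processors - 1).bit_length())
    return num_pointers * pointer_width


if __name__ == "__main__":
    for p in (16, 256, 1024):
        print(p, full_map_entry_bits(p), limited_pointer_entry_bits(p, 4))
```

With four pointers the limited-pointer entry stays small as the machine grows, at the cost of needing some fallback when more than four caches share a block.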