The Software Stack for Transactional Memory - Computer Systems ...

The Software Stack for Transactional Memory - Computer Systems ... The Software Stack for Transactional Memory - Computer Systems ...

csl.stanford.edu
from csl.stanford.edu More from this publisher
27.12.2014 Views

The Software Stack for Transactional Memory Challenges and Opportunities Brian D. Carlstrom JaeWoong Chung, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab Stanford University http://tcc.stanford.edu (picture will come here)

<strong>The</strong> <strong>Software</strong> <strong>Stack</strong> <strong>for</strong><br />

<strong>Transactional</strong> <strong>Memory</strong><br />

Challenges and Opportunities<br />

Brian D. Carlstrom<br />

JaeWoong Chung,<br />

Christos Kozyrakis, Kunle Olukotun<br />

<strong>Computer</strong> <strong>Systems</strong> Lab<br />

Stan<strong>for</strong>d University<br />

http://tcc.stan<strong>for</strong>d.edu<br />

(picture will come here)


Parallel Programming<br />

on Shared <strong>Memory</strong><br />

Traditionally done using locks<br />

But locks are hard to use<br />

Semantic problems<br />

Deadlock<br />

Priority inversion<br />

Per<strong>for</strong>mance problems<br />

Simplicity at the expense of concurrency<br />

High concurrency at the expense of simplicity<br />

Pessimistic concurrency<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

2


<strong>Transactional</strong> <strong>Memory</strong><br />

Allows <strong>for</strong> lock-free parallel programming<br />

Transactions mark critical sections<br />

Same properties as database transactions<br />

Atomicity : all or nothing<br />

Isolation : no partial updates<br />

Transactions are easier to use than locks<br />

Coarse-grained non-blocking synchronization<br />

Optimistic concurrency<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

3


Opportunities and Challenges<br />

TM is a promising solution <strong>for</strong> easy and<br />

efficient parallel programming on multi-<br />

core systems<br />

TM brings up both opportunities and<br />

challenges to software stack<br />

Today’s s talk focuses on, but not limited to,<br />

the software stack on top of hardware TM<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

4


Contents<br />

<strong>Transactional</strong> <strong>Memory</strong> Overview<br />

What is TM<br />

Why is it interesting to Multi-core systems<br />

TM example and primitives<br />

<strong>Software</strong> <strong>Stack</strong><br />

Data Structure<br />

Programming Composition<br />

Operating System<br />

Language Implementation<br />

Programming Models<br />

Distributed Transactions<br />

Conclusion<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

5


TM execution model example<br />

(<strong>Transactional</strong> Coherence and Consistency)<br />

CPU 0 CPU 1 CPU 2<br />

...<br />

ld 0xdddd<br />

ld 0xeeee<br />

...<br />

st 0xbeef<br />

Execute<br />

Code<br />

Validation<br />

Commit<br />

...<br />

ld 0xaaaa<br />

ld 0xbbbb<br />

...<br />

0xbeef<br />

Execute<br />

Code<br />

Validation<br />

0xbeef<br />

Execute<br />

Code<br />

Violate<br />

...<br />

ld 0xbeef<br />

...<br />

Commit<br />

Re-<br />

Execute<br />

Code<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

6


Advanced TM primitives<br />

Nesting [isca06]<br />

Core 0 Core 1<br />

Core 0<br />

Core 1<br />

// A is initially 0;<br />

Atomic {<br />

....<br />

Atomic {<br />

// A is initially 0;<br />

Atomic {<br />

....<br />

Open_Atomic {<br />

A++; // 1 A = ;<br />

A++; // 1<br />

….<br />

….<br />

}<br />

}<br />

= A; // 0<br />

A++; // 2<br />

A++; // 2<br />

A = ;<br />

= A; // 1<br />

}<br />

= A; // 2<br />

}<br />

= A; // 2<br />

< Closed Nesting > < Open Nesting ><br />

Callback<br />

Violation Handler and Commit Handler<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

7


Data Structure<br />

(Opportunity)<br />

Coarse-grain non-blocking synchronization<br />

Both ease-to<br />

to-use and per<strong>for</strong>mance<br />

Nesting – reduces violation overhead<br />

Open nesting reduces the frequency of conflicts<br />

Closed nesting reduces the penalty of violation<br />

Callback – provides more programmability<br />

Violation Handler<br />

Automatic clean-up at conflicts<br />

Application specific conflict handler<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

8


Data Structure<br />

(Challenge)<br />

How to hide the advanced techniques <strong>for</strong><br />

novice programmers<br />

TM-based library<br />

like GNU classpath Java library<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

9


Programming Composition<br />

Transaction nesting<br />

(Opportunity)<br />

Flexibility in composing transactions<br />

Speculative parallel loops<br />

To avoid the hassle of setting and wrapping<br />

up multiple threads<br />

T_FOR(..)<br />

{<br />

// loop iteration<br />

}<br />

Iter1<br />

Iter2<br />

IterN<br />

tx1<br />

tx2<br />

txN<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

10


Programming Composition<br />

<strong>Transactional</strong> I/O<br />

I/O buffering<br />

(Challenge)<br />

Defer I/O operations by the commit<br />

Execute the deferred operations at commit<br />

Conditional Waits<br />

Wait() is related to lock objects<br />

Composible conditions <strong>for</strong> atomic regions<br />

Overflows<br />

Deep call stacks make transactions long<br />

Buffer overflow mechanism<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

11


Operating System<br />

(Opportunity)<br />

Non-blocking synchronization<br />

Easier kernel construction<br />

Potential <strong>for</strong> speedup<br />

Atomicity <strong>for</strong> fault-tolerance<br />

tolerance<br />

TM undoes instructions at rollback<br />

Easy check-pointing<br />

Isolation <strong>for</strong> security [sosp03]<br />

TM isolates instructions by commit<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

12


Context-switch<br />

Operating System<br />

(Challenge)<br />

In hardware TMs, transactional states have<br />

affinity to processors<br />

Interrupt [hpca06]<br />

I/O<br />

Swapping in/out transactions<br />

<strong>Software</strong> TM runs on virtual address space<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

13


Programming Models<br />

(Opportunity)<br />

TM-based models tuned <strong>for</strong> parallelism<br />

Atomos [pldi06]<br />

Atomos<br />

Java – old synchronization APIs + new TM primitives<br />

support <strong>for</strong> nesting, callback, and high-level<br />

language construct<br />

X10, Fortress, and Chapel also explore<br />

transactions<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

14


Programming Models<br />

(Challenge)<br />

Many different semantics <strong>for</strong> TM<br />

Different definitions <strong>for</strong> the same term<br />

Strong vs. weak consistency<br />

We prefer strong consistency<br />

No need to worry about possible bugs due to interaction<br />

between transaction code and non-transactional code<br />

APIs <strong>for</strong> application and system programming<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

15


Language Implementation<br />

(Opportunity)<br />

Aggressive JIT compiler optimization<br />

Try unsafe optimization<br />

Constant Propagation<br />

Rollback the computation if there is a problem<br />

Restart with safe code<br />

Speculative Parallelism<br />

Make a code segment run in parallel<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

16


Language Implementation<br />

<strong>Memory</strong> allocation<br />

(Challenge)<br />

Private memory pool or Nesting<br />

Incremental/Concurrent garbage collection<br />

Use violation handlers to deal with conflicts<br />

between collectors and mutators<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

17


Distributed Transactions<br />

(Opportunity)<br />

Integration with distributed transaction<br />

systems<br />

Transaction Service in .Net, J2EE, and<br />

CORBA<br />

Extracting parallelism from distributed<br />

objects with transactional properties<br />

Enterprise Java Beans<br />

TX_REQUIRED, TX_BEAN_MANAGED<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

18


Distributed Transactions<br />

(Challenge)<br />

E-commerce transactions are long<br />

Longer than time quanta<br />

I/O operations<br />

TM virtualization can be helpful<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

19


Conclusion<br />

<strong>Transactional</strong> <strong>Memory</strong> is a promising<br />

solution <strong>for</strong> parallel programming<br />

<strong>Transactional</strong> memory brings up both<br />

opportunities and challenges to software<br />

stack<br />

We hope research <strong>for</strong>ces from many areas<br />

join the ef<strong>for</strong>ts <strong>for</strong> <strong>Transactional</strong> memory<br />

<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />

20

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!