The Software Stack for Transactional Memory - Computer Systems ...
The Software Stack for Transactional Memory - Computer Systems ... The Software Stack for Transactional Memory - Computer Systems ...
The Software Stack for Transactional Memory Challenges and Opportunities Brian D. Carlstrom JaeWoong Chung, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab Stanford University http://tcc.stanford.edu (picture will come here)
- Page 2 and 3: Parallel Programming on Shared Memo
- Page 4 and 5: Opportunities and Challenges TM is
- Page 6 and 7: TM execution model example (Transac
- Page 8 and 9: Data Structure (Opportunity) Coars
- Page 10 and 11: Programming Composition Transactio
- Page 12 and 13: Operating System (Opportunity) Non
- Page 14 and 15: Programming Models (Opportunity) T
- Page 16 and 17: Language Implementation (Opportunit
- Page 18 and 19: Distributed Transactions (Opportuni
- Page 20: Conclusion Transactional Memory is
<strong>The</strong> <strong>Software</strong> <strong>Stack</strong> <strong>for</strong><br />
<strong>Transactional</strong> <strong>Memory</strong><br />
Challenges and Opportunities<br />
Brian D. Carlstrom<br />
JaeWoong Chung,<br />
Christos Kozyrakis, Kunle Olukotun<br />
<strong>Computer</strong> <strong>Systems</strong> Lab<br />
Stan<strong>for</strong>d University<br />
http://tcc.stan<strong>for</strong>d.edu<br />
(picture will come here)
Parallel Programming<br />
on Shared <strong>Memory</strong><br />
Traditionally done using locks<br />
But locks are hard to use<br />
Semantic problems<br />
Deadlock<br />
Priority inversion<br />
Per<strong>for</strong>mance problems<br />
Simplicity at the expense of concurrency<br />
High concurrency at the expense of simplicity<br />
Pessimistic concurrency<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
2
<strong>Transactional</strong> <strong>Memory</strong><br />
Allows <strong>for</strong> lock-free parallel programming<br />
Transactions mark critical sections<br />
Same properties as database transactions<br />
Atomicity : all or nothing<br />
Isolation : no partial updates<br />
Transactions are easier to use than locks<br />
Coarse-grained non-blocking synchronization<br />
Optimistic concurrency<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
3
Opportunities and Challenges<br />
TM is a promising solution <strong>for</strong> easy and<br />
efficient parallel programming on multi-<br />
core systems<br />
TM brings up both opportunities and<br />
challenges to software stack<br />
Today’s s talk focuses on, but not limited to,<br />
the software stack on top of hardware TM<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
4
Contents<br />
<strong>Transactional</strong> <strong>Memory</strong> Overview<br />
What is TM<br />
Why is it interesting to Multi-core systems<br />
TM example and primitives<br />
<strong>Software</strong> <strong>Stack</strong><br />
Data Structure<br />
Programming Composition<br />
Operating System<br />
Language Implementation<br />
Programming Models<br />
Distributed Transactions<br />
Conclusion<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
5
TM execution model example<br />
(<strong>Transactional</strong> Coherence and Consistency)<br />
CPU 0 CPU 1 CPU 2<br />
...<br />
ld 0xdddd<br />
ld 0xeeee<br />
...<br />
st 0xbeef<br />
Execute<br />
Code<br />
Validation<br />
Commit<br />
...<br />
ld 0xaaaa<br />
ld 0xbbbb<br />
...<br />
0xbeef<br />
Execute<br />
Code<br />
Validation<br />
0xbeef<br />
Execute<br />
Code<br />
Violate<br />
...<br />
ld 0xbeef<br />
...<br />
Commit<br />
Re-<br />
Execute<br />
Code<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
6
Advanced TM primitives<br />
Nesting [isca06]<br />
Core 0 Core 1<br />
Core 0<br />
Core 1<br />
// A is initially 0;<br />
Atomic {<br />
....<br />
Atomic {<br />
// A is initially 0;<br />
Atomic {<br />
....<br />
Open_Atomic {<br />
A++; // 1 A = ;<br />
A++; // 1<br />
….<br />
….<br />
}<br />
}<br />
= A; // 0<br />
A++; // 2<br />
A++; // 2<br />
A = ;<br />
= A; // 1<br />
}<br />
= A; // 2<br />
}<br />
= A; // 2<br />
< Closed Nesting > < Open Nesting ><br />
Callback<br />
Violation Handler and Commit Handler<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
7
Data Structure<br />
(Opportunity)<br />
Coarse-grain non-blocking synchronization<br />
Both ease-to<br />
to-use and per<strong>for</strong>mance<br />
Nesting – reduces violation overhead<br />
Open nesting reduces the frequency of conflicts<br />
Closed nesting reduces the penalty of violation<br />
Callback – provides more programmability<br />
Violation Handler<br />
Automatic clean-up at conflicts<br />
Application specific conflict handler<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
8
Data Structure<br />
(Challenge)<br />
How to hide the advanced techniques <strong>for</strong><br />
novice programmers<br />
TM-based library<br />
like GNU classpath Java library<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
9
Programming Composition<br />
Transaction nesting<br />
(Opportunity)<br />
Flexibility in composing transactions<br />
Speculative parallel loops<br />
To avoid the hassle of setting and wrapping<br />
up multiple threads<br />
T_FOR(..)<br />
{<br />
// loop iteration<br />
}<br />
Iter1<br />
Iter2<br />
IterN<br />
tx1<br />
tx2<br />
txN<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
10
Programming Composition<br />
<strong>Transactional</strong> I/O<br />
I/O buffering<br />
(Challenge)<br />
Defer I/O operations by the commit<br />
Execute the deferred operations at commit<br />
Conditional Waits<br />
Wait() is related to lock objects<br />
Composible conditions <strong>for</strong> atomic regions<br />
Overflows<br />
Deep call stacks make transactions long<br />
Buffer overflow mechanism<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
11
Operating System<br />
(Opportunity)<br />
Non-blocking synchronization<br />
Easier kernel construction<br />
Potential <strong>for</strong> speedup<br />
Atomicity <strong>for</strong> fault-tolerance<br />
tolerance<br />
TM undoes instructions at rollback<br />
Easy check-pointing<br />
Isolation <strong>for</strong> security [sosp03]<br />
TM isolates instructions by commit<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
12
Context-switch<br />
Operating System<br />
(Challenge)<br />
In hardware TMs, transactional states have<br />
affinity to processors<br />
Interrupt [hpca06]<br />
I/O<br />
Swapping in/out transactions<br />
<strong>Software</strong> TM runs on virtual address space<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
13
Programming Models<br />
(Opportunity)<br />
TM-based models tuned <strong>for</strong> parallelism<br />
Atomos [pldi06]<br />
Atomos<br />
Java – old synchronization APIs + new TM primitives<br />
support <strong>for</strong> nesting, callback, and high-level<br />
language construct<br />
X10, Fortress, and Chapel also explore<br />
transactions<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
14
Programming Models<br />
(Challenge)<br />
Many different semantics <strong>for</strong> TM<br />
Different definitions <strong>for</strong> the same term<br />
Strong vs. weak consistency<br />
We prefer strong consistency<br />
No need to worry about possible bugs due to interaction<br />
between transaction code and non-transactional code<br />
APIs <strong>for</strong> application and system programming<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
15
Language Implementation<br />
(Opportunity)<br />
Aggressive JIT compiler optimization<br />
Try unsafe optimization<br />
Constant Propagation<br />
Rollback the computation if there is a problem<br />
Restart with safe code<br />
Speculative Parallelism<br />
Make a code segment run in parallel<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
16
Language Implementation<br />
<strong>Memory</strong> allocation<br />
(Challenge)<br />
Private memory pool or Nesting<br />
Incremental/Concurrent garbage collection<br />
Use violation handlers to deal with conflicts<br />
between collectors and mutators<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
17
Distributed Transactions<br />
(Opportunity)<br />
Integration with distributed transaction<br />
systems<br />
Transaction Service in .Net, J2EE, and<br />
CORBA<br />
Extracting parallelism from distributed<br />
objects with transactional properties<br />
Enterprise Java Beans<br />
TX_REQUIRED, TX_BEAN_MANAGED<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
18
Distributed Transactions<br />
(Challenge)<br />
E-commerce transactions are long<br />
Longer than time quanta<br />
I/O operations<br />
TM virtualization can be helpful<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
19
Conclusion<br />
<strong>Transactional</strong> <strong>Memory</strong> is a promising<br />
solution <strong>for</strong> parallel programming<br />
<strong>Transactional</strong> memory brings up both<br />
opportunities and challenges to software<br />
stack<br />
We hope research <strong>for</strong>ces from many areas<br />
join the ef<strong>for</strong>ts <strong>for</strong> <strong>Transactional</strong> memory<br />
<strong>Software</strong> <strong>Stack</strong> <strong>for</strong> <strong>Transactional</strong> <strong>Memory</strong><br />
20