software engineering - Reiner Hartenstein


keynote<br />

<strong>Reiner</strong> <strong>Hartenstein</strong>, TU Kaiserslautern, Germany<br />

http://hartenstein.de/RH-bio.pdf<br />

1<br />

reiner@hartenstein.de<br />

4 December 2009<br />

10 Sep 2009, Stockholm, Sweden<br />

Reiner Hartenstein<br />

The Impact of Reconfigurable Computing on Manycore Programming Trends<br />

9:15 – 10:00<br />

Teaching for Change: an early martyr<br />

„Turing is irrelevant“<br />

punished for blasphemy?<br />

D. L. Parnas (keynote): “Teaching for Change”; 10th Conf. Softw. Engineering Education and Training (CSEET '97)<br />

http://www.sigsoft.org/SEN/parnas.html<br />

The von Neumann model is the emulation of a tape machine (mimicking tape on RAM)<br />

„The von Neumann syndrome“: coined ~ a decade later<br />

Prof. C.V. Ramamoorthy (UC Berkeley), SDPS 2006, San Diego, CA<br />

Critique of von Neumann is not new:<br />

• Dijkstra 1968: The Goto considered harmful<br />
• R. Hartenstein, G. Koch 1975: The universal Bus considered harmful<br />
• Backus 1978: Can programming be liberated from the von Neumann style?<br />
• Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style<br />
• Peter G. Neumann 1985-2003: 216x “Inside Risks” (18 years inside back cover of Comm_ACM)<br />
• Brad Cox 1990: Planning the Software Industrial Revolution<br />
• L. Savain 2006: Why Software is bad …<br />

Opening keynote, the 6th FPGAworld Conference, 10 Sep 2009, Stockholm, Sweden

Outline (1)<br />

• The Power Consumption of Computing<br />

• The Single-Core Approach<br />

• The Multicore Scenario<br />

• The Silver Bullet?<br />

• A CPU-centric Flat World<br />

• The Generalisation of Software Engineering<br />

• Conclusions<br />

http://hartenstein.de<br />

© 2009,<br />

reiner@hartenstein.de<br />

3<br />

Impact of the von Neumann Syndrome<br />

“Dig more coal -- the PCs are coming”, Peter W. Huber, Mark P. Mills, 05.31.99<br />

http://www.forbes.com/forbes/1999/0531/6311070a.html<br />

[1989 from a student at Kaiserslautern]<br />


never run out of energy?<br />

2007: 80% crude oil coming from decline fields<br />

natural gas: similar situation<br />

„6 more Saudi Arabias needed for demand predicted for 2030“ [Fatih Birol, Chief Economist IEA]. https://www.theoildrum.com/<br />

[Figure: typical oil field operation; axis: Production (%), 0 to 100]<br />

[Figure: energy sources: coal, hydro, nuclear, gas, oil; labelled shares: ~ 55 %, > 30 %]<br />

Power consumption by internet: x30 until 2030 if trends continue<br />

G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8–11 Sep 2008<br />

Server Farms at banks of the Columbia river [Randy Katz: IEEE Spectrum, Febr. 2009]:<br />

• at The Dalles: each 6500 m²<br />
• at Quincy: size: 10 Am. football fields, 48 MW, power for 40,000 homes<br />

[Map: WASHINGTON (Quincy), OREGON (The Dalles, Boardman)]<br />

the electricity bill is a key issue<br />


Power Consumption of Computers<br />

... has become an industry-wide issue: incremental improvements are on track, but „we may ultimately need revolutionary new solutions“ [Horst Simon, LBNL, Berkeley] (subject of my talk)<br />

Current trends will lead to unaffordable future operation cost of our cyber infrastructure<br />

Energy cost may overtake IT equipment cost in the near future [Albert Zomaya]<br />


Outline (2)<br />

• The Power Consumption of Computing<br />

• The Single-Core Approach<br />

• The Multicore Scenario<br />

• The Silver Bullet?<br />

• A CPU-centric Flat World<br />

• The Generalisation of Software Engineering<br />

• Conclusions<br />


Single-core approach: free ride on Moore’s Law<br />

the burden of software performance is the task of chip designers*<br />

*) M-&-C created population<br />

[Figure: Software Performance (10^3 to 10^9) vs. year, 1970 to 2008]<br />


Rapid VLSI Design Education Revolution<br />

Carver Mead, Lynn Conway<br />

1980 - 1983<br />

E.I.S. project (Heinz Riesenhuber) [Photo: GMD]<br />

The incubator of the free ride on Moore’s law<br />

massive funding: DARPA; NSF; many national governments; European Union …<br />

Created the missing designer population<br />

The most effective project in the history of modern computer science<br />


The End of Moore’s Law<br />

the end of the single-core era<br />

stop in 2005<br />

soon: the 20 nm wall<br />

traditional instruction-based computing is running out of steam [DAC’09 special session: Computation in the Post-Turing Era]<br />

[Figure: performance (10^3 to 10^9) vs. year, 1970 to 2008]<br />


Growth beyond Moore’s Law?<br />

[Figure: relative performance (10^3 to 10^13) vs. year, 1970 to 2030; marker: the end of the single-core era]<br />


the key issue: processor cores<br />

number of transistors doubles every 2 years<br />
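To make the doubling rule concrete, here is a minimal Python sketch of the growth factor implied by "doubles every 2 years". The function name `growth_factor` and the chosen time spans are illustrative assumptions, not taken from the slides.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical time spans, chosen only for illustration.
    for span in (2, 10, 20, 40):
        print(f"{span:2d} years -> x{growth_factor(span):,.0f}")
```

Under this rule, 20 years give a factor of about one thousand and 40 years a factor of about one million, which is why the charts use logarithmic axes.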


Multimedia in the Multicore Era<br />

Multimedia Performance Needs: application performance needs up to [Pierre Paulin, MPSoC’09]:<br />

• Audio: 800 MIPS<br />
• Graphics: 11 GOPS<br />
• Video: 160 GOPS<br />
• Digital TV: 900 GOPS<br />

[Figure: relative performance vs. year, 1994 to 2030; marker: begin of the multicore era]<br />


Broadband in the Multicore Era<br />

needed performance (MIPS) growing faster than Moore’s law [courtesy E. Sanchez]<br />

[Figure: relative performance vs. year, 1994 to 2030; standards: GSM, GPRS, EDGE, UMTS, next standard; marker: begin of the multicore era]<br />


ICT is at an inflection point<br />

handheld & living room commercially more important than the comparatively small PC market.<br />

„Future prosperity depends on network capacity, ..., efficient pricing, and flexible platforms“<br />

Cheap Revolution: affordable broadband & software performance<br />

MIPS requirement growing faster than Moore’s law [courtesy E. Sanchez]<br />

„Broadband is significant at the inflection point, prompting major market governance changes“<br />

Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.<br />

Cowhey’s & Aronson’s Law: massive funding needed<br />


Funding market governance changes<br />

The Recovery Act: $7.2 billion (http://www.broadbandusa.gov/)<br />

• RUS Broadband Initiatives Program (BIP)<br />
• NTIA Broadband Technology Opportunities Program (BTOP)<br />

other sources:<br />

• DARPA ?<br />
• ARPA-E ?<br />
• EU-FP7 ?<br />
• EFRCs ? (Energy Frontier Research Centers, $777 million)<br />


Outline (3)<br />

• The Power Consumption of Computing<br />

• The Single-Core Approach<br />

• The Multicore Scenario<br />

• The Silver Bullet?<br />

• A CPU-centric Flat World<br />

• The Generalisation of Software Engineering<br />

• Conclusions<br />


Multicore has been around for decades<br />

Dead (Super)Computer Society [Gordon Bell, keynote, ISCA 2000]<br />

• ACRI<br />
• Alliant<br />
• American Supercomputer<br />
• Ametek<br />
• Applied Dynamics<br />
• Astronautics<br />
• BBN<br />
• CDC<br />
• Convex<br />
• Cray Computer<br />
• Cray Research<br />
• Culler-Harris<br />
• Culler Scientific<br />
• Cydrome<br />
• Dana/Ardent/Stellar/Stardent<br />
• DAPP<br />
• Denelcor<br />
• Elexsi<br />
• ETA Systems<br />
• Evans and Sutherland Computer<br />
• Floating Point Systems<br />
• Galaxy YH-1<br />
• Goodyear Aerospace MPP<br />
• Gould NPL<br />
• Guiltech<br />
• ICL<br />
• Intel Scientific Computers<br />
• International Parallel Machines<br />
• Kendall Square Research<br />
• Key Computer Laboratories<br />
• MasPar<br />
• Meiko<br />
• Multiflow<br />
• Myrias<br />
• Numerix<br />
• Prisma<br />
• Tera<br />
• Thinking Machines<br />
• Saxpy<br />
• Scientific Computer Systems (SCS)<br />
• Soviet Supercomputers<br />
• Supertek<br />
• Supercomputer Systems<br />
• Suprenum<br />
• Vitesse Electronics<br />

only 2 or 3 successes<br />

most in 1985-1995 - mainly research<br />

the single core sequential mind set was the winner<br />


Speed-up factors* by GPGPUs (1) (up to ~150x)<br />

[Michael Garland, NVIDIA Research: Parallel Computing on Manycore GPUs; IPDPS, Rome, Italy, June 25-29, 2009]<br />

• The power efficiency is disputable<br />
• this hardware can be used only in certain ways<br />
• such speed-ups by GPGPUs only for embarrassingly parallel applications<br />
• effective only at problems that can be solved using stream processing<br />
• streams provide data parallelism<br />

[Figure: Speedup-Factor (10^0 to 10^3) vs. date, Jan 2007 to Jan 2010; categories: Numerics, Imaging, Bioinformatics, Video; plotted values: 130, 100, 50, 30, 18, 149, 146, 47, 36, 20]<br />

*) migration from x86 singlecore<br />

Speed-up factors* by GPGPUs (2) (up to ~600x)<br />

CUDA ZONE pages [NVIDIA Corp.]: non-reviewed CUDA user submissions<br />

http://www.nvidia.co.uk/object/cuda_home_uk.html#state=home<br />

Compute Unified Device Architecture (CUDA), accelerates BLAS libraries (Basic Linear Algebra Subprograms)<br />

Less flexible than FPGAs.<br />

(GPGPU tool development years earlier than for x86)<br />

• NVIDIA GeForce GTX 275: 240 stream processor cores, minimum power supply recommended: 650–680 watt<br />
• NVIDIA GeForce GTX 295: 480 stream processor cores, minimum power supply recommended: 650–680 watt<br />
• Intel Core2 Quad (desktop PCs): 4 cores<br />
• Intel Xeon "Nehalem-EX" for servers: 8 cores<br />

[Figure: Speedup-Factor (10^0 to 10^3) vs. date, Jan 2007 to Jan 2010; categories: Astrophysics, Bioinformatics, CFD (Computational Fluid Dynamics), Cryptography, DCC (Digital Content Creation), DSP (Digital Signal Processing), EDA, Graphics, Imaging, Numerics, oil & gas, Video & Audio]<br />

*) migration from x86 singlecore<br />


Growth by Multicore<br />

[Figure: relative performance vs. year, 1994 to 2030; marker: begin of the multicore era]<br />


Hastily knitted compilers for the heavy lifting?<br />

new types of bugs introduced<br />

e. g. automatically parallelizing compilation via multi-threading, and many other ad-hoc solutions<br />

John Hennessy: “wait for current generation of programmers to die off and be replaced by programmers brought up on the new paradigm?”<br />

widespread confusion and competing claims, „I would be panicked if I were in industry“ [Jim Turley]<br />


50 years to die off<br />

„software engineering“: the term appeared in the late 1950s<br />

Max Planck: Replacement of false doctrines by new insights needs 50 years, waiting for not only old professors but also their scholars to die off.<br />


Outline (4)<br />

• The Power Consumption of Computing<br />

• The Single-Core Approach<br />

• The Multicore Scenario<br />

• The Silver Bullet?<br />

• A CPU-centric Flat World<br />

• The Generalisation of Software Engineering<br />

• Conclusions<br />


Growth in the Multicore Era<br />

[Figure: relative performance vs. year, 1994 to 2030; marker: begin of the multicore era]<br />


Speed-up factors obtained by Software to Configware migration (up to ~30,000x)<br />

vs. GPU: almost 50x (200x); 2 orders of magnitude<br />

“... design techniques will evolve, by necessity, to satisfy the demands of reconfigurable hardware and software programmability”. J. R. Rattner, DAC 2008<br />

Intel supports direct front side bus access by FPGAs<br />

[Figure: Speedup-Factor by FPGA (10^0 to 10^6); categories: Image processing, Pattern matching, Multimedia; DSP and wireless; Bioinformatics; Astrophysics; crypto; applications: real-time face detection, GRAPE, video-rate stereo vision, pattern recognition (730), BLAST, Reed-Solomon Decoding (2400), MAC, SPIHT wavelet-based image compression (457), FFT, protein identification, DES breaking (28514), Viterbi Decoding, Smith-Waterman pattern matching, DNA & protein sequencing (8723), molecular dynamics simulation, CT imaging; other plotted values: 6000, 900, 52, 40, 1000, 400, 288, 88, 20, 100, 1000, 3000; GPU reference marks for Bioinformatics: ~50x, (200x) [CUDA ZONE; Garland, IPDPS’09]]<br />

+ Pre-FPGA solutions<br />

• Grid-based DRC* („fair comparison“): 15000<br />
• Grid-based DRC: 2000 (no FPGA: DPLA on MoM by TU-KL*)<br />
• Lee Routing: 160 (DPLA by TU-KL*)<br />
• 2-D FIR filter: 39.4 (no FPGA: DPLA by TU-KL*)<br />

1984: 1 DPLA replaced 256 FPGAs<br />

fabricated by E.I.S. Multi University Project<br />

*) TU Kaiserslautern<br />

[Figure: Speedup-Factor (10^0 to 10^6), pre-FPGA solutions, 1985 to 1990, shown together with the FPGA speed-up chart repeated from the Software to Configware migration slide]<br />

Benchmarks: Moore’s law does not indicate microprocessor MIPS<br />

[Figure: SPECfp2000/MHz/Billion Transistors and 0.5xMOPS/MHz/Billion Transistors (0 to 200) vs. year, 1990 to 2000; labels: HP, 1996, 420, 46] [BWRC, UC Berkeley, 2004]<br />

Software vs. FPGA<br />

[Figure: relative performance (10^0 to 10^5) vs. year, 1996 to 2010]<br />


Moore’s law not applicable to all aspects<br />

For multicore*: the Law of More … with drastically declining programmer productivity<br />

*) number of cores doubles every 2 years<br />


Software vs. FPGA (2)<br />

Massive Energy Saving factors: ~10% of speedup factor<br />

[Figure: Speedup-Factor by FPGA (10^0 to 10^6), repeated from the Software to Configware migration slide]<br />


RC*: Demonstrating the intensive Impact<br />

[Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]<br />

SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster<br />

• DNA and Protein sequencing: speed-up factor 8723; savings: power 779, cost 22, size 253<br />
• DES breaking: speed-up factor 28514; savings: power 3439, cost 96, size 1116<br />

much less memory and bandwidth needed<br />

massively saving energy<br />

much less equipment needed<br />

*) RC = Reconfigurable Computing<br />
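The relation between the speed-up factors and the power savings quoted from El-Ghazawi et al. can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the helper name `power_saving_share` are illustrative assumptions; the four numbers are the ones listed on the slide.

```python
# (speed-up factor, power saving factor) as listed on the slide
CASES = {
    "DNA and Protein sequencing": (8723, 779),
    "DES breaking": (28514, 3439),
}


def power_saving_share(speedup: float, power_saving: float) -> float:
    """Power saving factor expressed as a fraction of the speed-up factor."""
    return power_saving / speedup


if __name__ == "__main__":
    for name, (speedup, saving) in CASES.items():
        print(f"{name}: {power_saving_share(speedup, saving):.1%} of the speed-up factor")
```

Both ratios come out at roughly one tenth (about 9% and about 12%), in line with the "~10% of speedup factor" rule of thumb stated on the Software vs. FPGA (2) slide.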


Why such Speed-up Factors ...<br />

... with FPGAs: a much worse technology !<br />

massive wiring overhead<br />

+ massive reconfigurability overhead<br />

+ routing congestion growing with FPGA size<br />

The „Reconfigurable Computing Paradox“<br />

main reason: no von Neumann Syndrome!<br />

more recently also: more „platform FPGAs“<br />


RC versus Multicore („RC“ = Reconfigurable Computing)<br />

• RC: speed-up often higher by orders of magnitude. Sure!<br />
• RC: energy-efficiency often higher: very much, or, by orders of magnitude? (this is the silver bullet)<br />
• We need both: Multicore and RC. Sure!<br />
• Multicore: legacy software, control-intensive applications, etc.<br />


For a Booming Multicore Era<br />

von-Neumann-only is not the silver bullet<br />

Reconfigurable Computing is indispensable!<br />

[Figure: relative performance vs. year, 1994 to 2030; marker: end of the singlecore era]<br />

Outline (5)<br />

• The Power Consumption of Computing<br />

• The Single-Core Approach<br />

• The Multicore Scenario<br />

• The Silver Bullet?<br />

• A CPU-centric Flat World<br />

• The Generalisation of Software Engineering<br />

• Conclusions<br />


CPU-centric flat world (Aristotelian model)<br />

typical programmer qualification: sequential-only mind set, CPU-“centric” but no hardware know-how (kind of tunnel view)<br />

This Software-centric world model is obsolete<br />


Machine Model of the Mainframe Era<br />

machine model: CPU<br />

• resources: property: hardwired; programming source: -<br />
• sequencer: property: programmable; programming source: Software (instruction streams)<br />
• state register: program counter<br />


40 years Software Crisis<br />

Wirth’s Law [Niklaus Wirth]: “software is slowing faster than hardware is accelerating”<br />

Nathan’s Law [Nathan Myhrvold]: Software is a gas. It expands to fill its container ... until being limited by Moore’s Law [& Kryder’s Law]<br />

[Cyril Northcote Parkinson; The Economist: Nov 19th 1955; Oct 1957] formula: bureaucracy growth independent of actual work to be done. In 1955, Parkinson could not have foreseen the impact of software.<br />

Software criticism is not new:<br />

• F. L. Bauer 1968, coined the term „Software Crisis“<br />
• N. N. 1995: THE STANDISH GROUP REPORT<br />
• Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005<br />
• Anthony Berglas 2008: Why it is Important that Software Projects Fail<br />
• Peter G. Neumann 1985-2003: 216x “Inside Risks” (18 years inside back cover of Comm_ACM)<br />
• L. Savain 2006: Why Software is bad<br />


“Software”: the ugliness of this term stands for extremely memory-cycle-hungry instruction streams<br />

Patterson’s Law [Dave Patterson]: bandwidth gap grows 50% / year, has reached >1000x<br />

“The Memory Wall”: coined by Sally McKee (& co-author)<br />
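As a rough plausibility check of the two numbers on this slide, the following Python sketch computes how long a gap growing by 50% per year takes to reach a factor of 1000. It assumes plain annual compounding from an initial gap of 1, which is a simplification introduced here and not stated on the slide; the function names are hypothetical.

```python
import math


def gap_after(years: float, annual_growth: float = 0.5) -> float:
    """Relative gap after `years`, assuming annual compounding from a gap of 1 (assumption)."""
    return (1.0 + annual_growth) ** years


def years_to_reach(gap: float, annual_growth: float = 0.5) -> float:
    """Years needed for the compounded gap to reach the factor `gap`."""
    return math.log(gap) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    print(f"years to reach 1000x at 50%/year: {years_to_reach(1000):.1f}")
```

Under this assumption a gap of more than 1000x corresponds to roughly 17 years of 50% annual growth.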

The von Neumann Syndrome [C.V. Ramamoorthy]<br />

from earlier talks: overhead piles up to code sizes of astronomic dimensions<br />

datastream parallelism<br />

from earlier talks: instruction stream parallelism<br />

stolen from Bob Colwell<br />

memory wall, caches, ...<br />

data meet PU<br />

Machine Model of the PC Era<br />

Machine<br />

model<br />

resources<br />

property<br />

programming<br />

source<br />

property<br />

sequencer<br />

programming source<br />

ASIC<br />

accelerator hardwired - hardwired -<br />

CPU hardwired - programmable Software<br />

(instruction streams)<br />

state<br />

register<br />

program<br />

counter<br />

http://hartenstein.de<br />

© 2009,<br />

reiner@hartenstein.de<br />

“the tail is<br />

wagging<br />

the dog“<br />

Application-Specific Integrated Circuit &<br />

other accelerators: e.g. graphics processor<br />

[ ]<br />

ISIS 1997<br />

Austin, TX<br />


Thomas S. Kuhn 1969:<br />

The Structure of<br />

Scientific Revolutions<br />

Thomas S. Kuhn<br />

Science does not<br />

progress continuously,<br />

… shortcomings in an established<br />

paradigm produce a crisis

that may lead to a revolution

…in which the established paradigm

is overthrown and replaced.

?<br />

?<br />

The von Neumann paradigm?<br />


Outline (6)<br />

• The Power Consumption of Computing<br />

• The Single-Core Approach<br />

• The Multicore Scenario<br />

• The Silver Bullet?<br />

• A CPU-centric Flat World<br />

• The Generalisation of Software Engineering<br />

• Conclusions<br />


From CPU to RPU

machine model right now:

Machine model | resources: property | resources: programming source | sequencer: property | sequencer: programming source | state register
ASIC accelerator | hardwired | - | hardwired | - |
CPU | hardwired | - | programmable | Software (instruction streams) | program counter
RPU accelerator | programmable | Configware (configuration code) | programmable | Flowware (data streams) | data counters

RPU = Reconfigurable Processing Unit (non-von-Neumann)

now accelerators are programmable!

we need 2 more program sources

What Revolution?<br />

Thomas<br />

Kuhn<br />

is right !<br />

“… in which the established paradigm is<br />

overthrown and replaced.”<br />

[Thomas S. Kuhn 1969: The Structure of Scientific Revolutions]<br />

However, it is not the von Neumann paradigm that will be overthrown and replaced.

The CPU-centric world model of Software Engineering will be replaced by removing the tunnel-vision perspective


RC* outside a<br />

CPU-centric<br />

flat world?<br />

For the<br />

Multicore era<br />

we need<br />

a new model<br />

(Copernican)<br />


*) RC = Reconfigurable Computing<br />

Program Performance

„Multicore computers shift the burden of software performance from chip designers to programmers.“ [J. Larus: Spending Moore's Dividend; C_ACM, May 2009]

... performance drops & other problems in moving single-core to multicore ...

Since people have to write code differently anyway, we need a Software Education Revolution ...

... the chance to move RC* from niche to mainstream<br />

Embedded syst. & hdw scene have the right background<br />

to reform the parallelism education of SW programmers<br />

Missing programmer population and methodology: a scenario like before the Mead-&-Conway revolution

*) RC = Reconfigurable Computing<br />


A Heliocentric CS Model

The Generalization of Software Engineering: A Twin Paradigm Dual Dichotomy Approach.

[Figure labels: PE = Program Engineering; SE = Software Engineering; FE = Flowware Engineering*; RPU; special time to space mapping issue]

*) do not confuse with „dataflow“!


A Multicore Submarine Model?<br />

mapping parallelism just into the time domain:<br />

“abstracting” away the space domain is fatal<br />

C is not the silver bullet: it’s inherently serial<br />

There is no easy way to program in parallel<br />


But nobody wants to<br />

learn a new language.<br />

The programmer needs to understand how data flows<br />

through cores, accelerators, interconnect and peripherals<br />

The datastream model of the twin-paradigm approach<br />

helps to understand the space domain and parallelism<br />

The programmer* needs system visualization in the space<br />

domain, to understand performance under parallelism<br />

*) and especially the student

David Parnas

Our Contemporary Computer Machine Model

Machine model | resources: property | resources: programming source | sequencer: property | sequencer: programming source | state register
ASIC accelerator | hardwired | - | hardwired | - |
CPU | hardwired | - | programmable | Software (instruction streams) | program counter (in CPU)
RPU accelerator | programmable | Configware (configuration code) | programmable | Flowware (data streams) | data counters (in RAM)

data counters of reconfigurable address generators in asM (autosequencing) data memory blocks

twin Paradigm Dichotomy: the same language primitives!


Time to Space Mapping

Machine model | resources: property | resources: programming source | sequencer: property | sequencer: programming source | state register
ASIC accelerator | hardwired | - | hardwired | - |
CPU | hardwired | - | programmable | Software (instruction streams) | program counter
RPU accelerator | programmable | Configware (configuration code) | programmable | Flowware (data streams) | data counters

Relativity Dichotomy: loop turns to pipeline

1967

„The biggest payoff will come from Putting Old ideas into Practice and teaching people how to apply them properly.“


How to achieve acceptance<br />

Hardware description languages hidden

Tools usable by users who are not hardware designers

„How to hide the ugliness from the user“ [Herman Schmit]

Courses tailored for students who are not hardware-savvy

[Courtesy Richard Newton]<br />


Software Education (R)evolution:<br />

step by step, not overthrowing the SE scene<br />

by simultaneous dual domain co-education:<br />

traditional qualification in the time domain<br />

+ lean qualification in the space domain<br />

= lean hardware modeling qualification<br />

at a higher level of abstraction<br />

viable methodology for dual rail education<br />

(only a few % of the curricula need to be changed)


We need a Software Education Revolution<br />

2010 - ....<br />

DOS to Windows<br />

took 10 years<br />

partially<br />

re-write<br />

the code<br />

Create the<br />

missing<br />

programmer<br />

population<br />

next most effective project in the history of modern computer science

The incubator of the free ride on Cowhey‘s & Aronson‘s law

massive funding required

RAW 2010: 17th Reconfigurable Architectures Workshop

April 19-20, 2010, Atlanta (Georgia), USA

Run-Time Reconfiguration & Adaptive Computing: Architectures, Algorithms, Technologies

http://www.ece.lsu.edu/vaidy/raw/

in conjunction with: 24th IEEE International Parallel and Distributed Processing Symposium, April 19-23, 2010, Atlanta (Georgia) USA

http://www.ipdps.org/

Manuscript due: October 18, 2009

Notification of acceptance: December 14, 2009

Camera-ready Papers Due: February 1, 2010

the end of the singlecore era

Outline (7)<br />

• The Power Consumption of Computing<br />

• The Single-Core Approach<br />

• The Multicore Scenario<br />

• The Silver Bullet?<br />

• A CPU-centric Flat World<br />

• The Generalisation of Software Engineering<br />

• Conclusions<br />


Conclusions (1)<br />

To maintain a Booming Multicore Era:<br />

possible for 2 or 3 more decades?<br />

Not without Reconfigurable Computing!<br />

[Figure: relative performance versus year, 2004 to 2030]

key motivation: performance and<br />

energy consumption of programs<br />

Conclusions<br />

additional Flowware / Configware skills are<br />

essential qualifications for programmers.<br />

we need to master the heterogeneity of all 3: Singlecore, Multicore, & Reconfigurable Computing

Mead-&-Conway-style<br />

SE Revolution toward<br />

dual-rail education<br />

is urgently needed<br />

SERUM-RC<br />

the key issue:<br />

ease of use!<br />

massive long term R&D funding required, as known from DARPA

A main problem: selecting (or creating) tools for lab courses

We need „une levée en masse“ (a mass mobilization)


thank you for your<br />

patience<br />


END<br />


CV of <strong>Reiner</strong> <strong>Hartenstein</strong><br />

Credited as „The father of Reconfigurable Computing“ (also pre-FPGA era) [1],

Creator of KARL [2], the most successful [3] trailblazer HDL before VHDL came up

EU grant (1980s), 85 million ECU (pre-€): complete EDA framework [4,5] around KARL

1981: visiting professor at UC Berkeley (& coop. w. Xerox PARC)<br />

KARL: a<br />

Pascalish<br />

hardware<br />

language<br />

1983: founder of the German contribution to the Mead-&-Conway VLSI design revolution:<br />

the multi university „E.I.S. project“ (gov. grant: 38 million Deutschmark)<br />

Founder / co-founder of several international annual conference series<br />

IEEE fellow, SDPS fellow, FPL fellow, best paper awards, other awards<br />

Professor (ordinarius emeritus), TU Kaiserslautern<br />

his hobby:<br />

giving keynotes<br />

http://hartenstein.de/keynotes.htm<br />

reiner@hartenstein.de<br />

All acad. degrees from KIT Karlsruhe Institute of Technology (his mentor: Karl Steinbuch)<br />

[1] qu. Viktor Prasanna (with Gerald Estrin as the grandfather of Reconfigurable Computing, who proposed it in 1960 WJCC)<br />

[2] R. <strong>Hartenstein</strong>: Fundamentals of Structured Hardware Design; American Elsevier, 1977 -- Bestseller<br />

[3] for users, usage details, quotations,etc.see: http://www.fpl.uni-kl.de/staff/hartenstein/KARLUsers.html<br />

[4] R. Hartenstein: The History of KARL and ABL; in: J. Mermet (editor): Fundamentals and Standards in Hardware Description Languages; ISBN 0-7923-2513-4, Kluwer (now Springer), September 1993. also see: http://xputers.informatik.uni-kl.de/karl/karl_history_fbi.html

[5] format-checking functional floorplan graphic editor, and textual editors, calculus-based term rewriting floorplan generator, embedded router, automatic test generation, testability analysis, structured logic synthesis, simulator, et al. -- also see [4]

1977 & later<br />

used as a<br />

textbook at<br />

UC Berkeley<br />

(not only here)<br />


backup for discussion<br />


Double Dichotomy

1) Paradigm Dichotomy

von Neumann Machine: instruction stream (Software-Domain) | Datastream Machine: data stream (Flowware-Domain)

2) Relativity Dichotomy

time: Procedure (Software-Domain) | space: Structure (Configware-Domain)


Relativity Dichotomy

time domain (procedure domain), von Neumann Machine, 2 phases:

1) programming instruction streams

2) run time

space domain (structure domain), Datastream Machine, 3 phases:

1) reconfiguration of structures

2) programming data streams

3) run time

(time to time/space)


time-iterative to space-iterative

n time steps, 1 CPU: a time to space mapping gives 1 time step, n DPUs

loop transformation methodology: 70ies and later

example: bubble sort migration

Often the space dimension is limited

n*k time steps, 1 CPU: a time to space/time mapping gives n time steps, k DPUs

Strip mining [D. Loveman, J-ACM, 1977]
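The step counts on this slide can be illustrated with a small sketch. It only simulates the scheduling: a Python list comprehension stands in for k data path units (DPUs) working at the same time, and the names `time_iterative` and `strip_mined` are hypothetical, chosen for this illustration rather than taken from the talk or from Loveman's paper.

```python
def time_iterative(data, op):
    """One CPU: every element costs one time step."""
    results, time_steps = [], 0
    for item in data:
        results.append(op(item))
        time_steps += 1
    return results, time_steps


def strip_mined(data, op, k):
    """k DPUs: the loop is cut into strips of k elements, one strip per time step."""
    results, time_steps = [], 0
    for start in range(0, len(data), k):
        strip = data[start:start + k]
        # Conceptually all k units fire in the same time step.
        results.extend([op(item) for item in strip])
        time_steps += 1
    return results, time_steps


data = list(range(12))                                    # n * k = 12 elements
serial, serial_steps = time_iterative(data, lambda v: v * v)
mapped, mapped_steps = strip_mined(data, lambda v: v * v, k=4)
print(serial_steps, mapped_steps, serial == mapped)
```

With 12 elements and k = 4 the serial version needs n*k = 12 time steps and the strip-mined version n = 3, while both produce the same results.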


time to space mapping

time domain (procedure domain): time algorithm, program loop; n time steps, 1 CPU

Bubble Sort: n x k time steps, 1 „conditional swap“ unit (inputs x, y)

space domain (structure domain): space algorithm, pipeline; 1 time step, n DPUs

Shuffle Sort: k time steps, n „conditional swap“ units (space/time algorithm)

[Figure: a chain of conditional swap units]
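To make the migration from one conditional swap unit to many more concrete, here is a minimal sketch. It does not reproduce the shuffle sort shown on the slide; as a stand-in it uses odd-even transposition sort, a well-known variant of bubble sort in which all swaps of one phase touch disjoint pairs and could therefore run on separate conditional swap units at the same time. All function names are assumptions made for this illustration.

```python
def conditional_swap(x, y):
    """The basic unit: pass the smaller value first."""
    return (x, y) if x <= y else (y, x)


def bubble_sort_time_domain(values):
    """Time algorithm: a single conditional swap unit, reused in every step."""
    v = list(values)
    steps = 0
    for done in range(len(v) - 1):
        for j in range(len(v) - 1 - done):
            v[j], v[j + 1] = conditional_swap(v[j], v[j + 1])
            steps += 1          # one swap-unit activation per time step
    return v, steps


def transposition_sort_space_domain(values):
    """Space/time algorithm: each phase fires many swap units on disjoint pairs."""
    v = list(values)
    steps = 0
    for phase in range(len(v)):
        # Pairs (0,1),(2,3),... in even phases and (1,2),(3,4),... in odd phases
        # do not overlap, so they are conceptually simultaneous.
        for j in range(phase % 2, len(v) - 1, 2):
            v[j], v[j + 1] = conditional_swap(v[j], v[j + 1])
        steps += 1              # the whole phase counts as one time step
    return v, steps


data = [5, 2, 8, 1, 9, 3]
print(bubble_sort_time_domain(data))
print(transposition_sort_space_domain(data))
```

For six values the single-unit version activates the swap unit 15 times in sequence, whereas the parallel version finishes in 6 phases; the price is the additional swap units laid out in space.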



*> Declarations<br />

EastScan is<br />

step by [1,0]<br />

end EastScan;<br />

SouthScan is<br />

step by [0,1]<br />

end SouthScan;

NorthEastScan is<br />

loop 6 times until [*,1]<br />

step by [1,-1]<br />

endloop<br />

end NorthEastScan;<br />

SouthWestScan is<br />

loop 7 times until [1,*]<br />

step by [-1,1]<br />

endloop<br />

end SouthWestScan;<br />

HalfZigZag is<br />

EastScan<br />

loop 3 times<br />

SouthWestScan<br />

SouthScan<br />

NorthEastScan<br />


EastScan<br />

endloop<br />

end HalfZigZag;<br />

Main program:<br />

goto PixMap[1,1]<br />

HalfZigZag;<br />

SouthWestScan<br />

uturn (HalfZigZag)<br />

y<br />

x<br />

JPEG zigzag scan pattern<br />

Flowware language example (MoPL):<br />

programming the datastream<br />
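For readers unfamiliar with MoPL, the address sequence that such a zigzag scan produces can be sketched in Python. This is not a translation of the MoPL program; it is an independent generator for the standard JPEG zigzag order, using 1-based [x, y] addresses like the PixMap on the slide. The function name `zigzag_addresses` is an assumption made for this illustration.

```python
def zigzag_addresses(n=8):
    """Yield 1-based (x, y) addresses of an n x n block in JPEG zigzag order."""
    for diagonal in range(2 * n - 1):          # diagonal index = (x-1) + (y-1)
        first_x = max(0, diagonal - (n - 1))
        last_x = min(diagonal, n - 1)
        columns = range(first_x, last_x + 1)
        if diagonal % 2 == 1:
            # Odd diagonals run south-west: x decreases while y increases.
            columns = reversed(columns)
        # Even diagonals run north-east: x increases while y decreases.
        for x in columns:
            yield (x + 1, diagonal - x + 1)


addresses = list(zigzag_addresses())
print(addresses[:6])
```

The first addresses are [1,1], [2,1], [1,2], [1,3], [2,2], [3,1], which matches the opening moves of the slide's program: an east step, a south-west scan, a south step and a north-east scan. In a datastream machine this sequence would come from a data counter rather than from instructions executed by a CPU.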

[Figure (an animation): the data counter tracing HalfZigZag across the 8 x 8 PixMap; x and y axes numbered 1 to 8]

Programming model: Flowware

• Pros for streaming [Pierre Paulin]

– Streamlined, low-overhead communication

– (More) deterministic behaviour

– Good match for many simple media rich applications

• Cons

– control-dominated applications

– shunt yard problem

[Figure, source: MIT StreamIt: FMDemod, Split, LPF 1 to 3, HPF 1 to 3, Gather, Adder, Speaker]

We have to find out which application types and programming models students should exercise for the flowware approach


Flowware means parallelism<br />

resulting from time to space migration<br />

Flowware: scheduling data streams, derived from a generalization of systolic arrays

supports any free form of pipe network: spiral, zigzag, fork and join, and wilder ones,

unidirectional and fully or partially bidirectional,

FIFOs, stacks, registers, register files, RAM blocks...
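A fork-and-join pipe network of the kind listed here can be imitated in software with Python generators. The sketch below is only an analogy: generators are pulled one item at a time by a single CPU, whereas in a flowware setting each stage would be its own hardware unit. The stage names `moving_average`, `difference` and `fork_join` are hypothetical and chosen for this illustration.

```python
from itertools import tee


def moving_average(stream, taps=2):
    """Low-pass stage: average of the last `taps` items."""
    window = []
    for item in stream:
        window.append(item)
        if len(window) > taps:
            window.pop(0)
        if len(window) == taps:
            yield sum(window) / taps


def difference(stream):
    """High-pass stage: difference between consecutive items."""
    previous = None
    for item in stream:
        if previous is not None:
            yield item - previous
        previous = item


def fork_join(stream):
    """Fork the stream into two pipes, then join them with an adder."""
    left, right = tee(stream)                  # fork
    for low, high in zip(moving_average(left), difference(right)):
        yield low + high                       # join


print(list(fork_join(iter([1, 2, 3, 4]))))
```

For the input 1, 2, 3, 4 the low-pass pipe yields 1.5, 2.5, 3.5 and the high-pass pipe yields 1, 1, 1, so the joined stream is 2.5, 3.5, 4.5.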


Ways to implement an Algorithm<br />

• Hardware<br />

• Software<br />

• Configware<br />

• mixed<br />

RAM-based<br />

von Neumann machine

datastream<br />

machine<br />

singlecore<br />

multicore<br />

manycore<br />

. manycore<br />

per se<br />


Acceleration Mechanisms<br />

•parallelism by multi bank memory architecture<br />

•auxiliary hardware for address calculation<br />

•address calculation before run time<br />

•avoiding multiple accesses to the same data.<br />

•avoiding memory cycles for address computation<br />

•optimization by storage scheme transformations<br />

•optimization by memory architecture transformations<br />
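One item of this list, avoiding multiple accesses to the same data, can be made tangible by counting memory reads. The sketch below compares a naive 3-tap filter, which re-reads every word for each output, with a streaming version that keeps the most recent words in a small window (standing in for registers) and fetches each word once. The function names and the filter coefficients are assumptions made for this illustration.

```python
from collections import deque


def fir_naive(memory, coeffs):
    """Every output re-reads all of its input words from memory."""
    reads, outputs = 0, []
    for i in range(len(memory) - len(coeffs) + 1):
        acc = 0
        for j, c in enumerate(coeffs):
            acc += c * memory[i + j]           # one memory read per tap
            reads += 1
        outputs.append(acc)
    return outputs, reads


def fir_streaming(memory, coeffs):
    """Each word is read once and kept in a register-like sliding window."""
    window = deque(maxlen=len(coeffs))
    reads, outputs = 0, []
    for word in memory:
        reads += 1                             # the only memory access for this word
        window.append(word)
        if len(window) == len(coeffs):
            outputs.append(sum(c * x for c, x in zip(coeffs, window)))
    return outputs, reads


memory = [1, 2, 3, 4, 5, 6, 7, 8]
print(fir_naive(memory, [1, 2, 1]))
print(fir_streaming(memory, [1, 2, 1]))
```

Both versions compute the same six outputs, but the naive one performs 18 memory reads and the streaming one 8; with more taps or wider data the saving in memory cycles grows accordingly.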


New boundary constraints<br />

New boundary constraints are the limiting factor<br />

Legacy scientific applications: predominantly sequential<br />

The entire software ecosystem will need to evolve

(including curricula): O/S, libraries, software

development environments, compilers and languages

additional levels of parallelism: chaining, pipelining,<br />

systolic, super-systolic, wavefront arrays<br />

additional data structures and storage organization:<br />

the new distributed memory discipline<br />


old Paradigms and Methodologies

1946: Machine Paradigm (von Neumann)

1980: Datastreams (Kung, Leiserson)

1989: Anti Machine** Paradigm (TU-KL)

1990: first rDPA* (Rabaey)

1994: higher Anti Machine** Programming Language (Flowware: TU-KL)

1995: super systolic array: rDPA (Kress)

1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...

1997+: Discipline of Distributed Memory Architectures (IMEC …)

1997: first automatically partitioning Configware/Software Co-Compiler (TU-KL)

*) rDPA = reconfigurable Data Path Array

**) datastream machine (flowware machine): not a „dataflow machine“!!


“We urgently need a<br />

Mead-&-Conway-style<br />

new mass movement“<br />

SERUM-RC*<br />


SERUM-RC*<br />

Software Education Revolution<br />

for using Multicore - and RC* (SERUM-RC*)<br />

*) Reconfigurable Computing<br />


“We urgently need a<br />

Mead-&-Conway-dimension<br />

text book on twin-paradigm<br />

programming education“<br />

Teaching computing fundamentals<br />

Ignoring Reconfigurable Computing<br />

in teaching computing fundamentals<br />

within our CS curricula is one of<br />

the biggest mistakes in the history of

information technology application,

wasting billions of dollars.
