
Design of a Hardware/Software RTOS for FPGAs with Processors

The δ Hardware/Software RTOS Generation Framework

Vincent J. Mooney III

http://codesign.ece.gatech.edu

http://www.crest.gatech.edu

Associate Professor, School of Electrical and Computer Engineering

Adjunct Associate Professor, College of Computing

Associate Director, Center for Research on Embedded Systems and Technology

Georgia Institute of Technology

Atlanta, Georgia, USA

FPGA World 2004

HW/SW RTOS Project of the HW/SW Codesign Group at GT

©Vincent J. Mooney III, 2004


Outline

• Trends
• Vision: Hardware/Software Real-Time Operating System
• Custom RTOS Hardware IP Components
• System-on-a-Chip Lock Cache (SoCLC)
• SoC Deadlock Detection Unit (SoCDDU) and SoC Deadlock Avoidance Unit (SoCDAU)
• The δ Hardware/Software RTOS Generation Framework
• Comparison with the RTU Hardware RTOS
• Conclusion


Digital Silicon CMOS VLSI Trends

[Figure: product categories by era]
Yesterday (1980s): memory, processors, gate arrays, ASICs
Today: memory, processors, SoC, reconfigurable, gate arrays, ASICs
Tomorrow: memory, processors, Platform SoC, Custom SoC, reconfigurable, gate arrays, ASICs
Figure annotations: "≤ 9 products" (beside Custom SoC) and "≥ 10 products"


Why Should SoC Customizers/Programmers Worry About RTOSes?

• Trend: logic design → RTL design → processors + custom RTL design → embedded software design!
• The RTOS is the state-of-the-art embedded software manager
• Prediction: the best embedded software designers will work closely with a custom RTOS design => the best SoC hardware/software designs will include tradeoffs in the RTOS


Vision: Dynamic Software/Hardware RTOS Design

Key to System-on-a-Chip architecture optimization and customization


SoC Example: Broadcom BCM1400

• Four Processor Cores
– MIPS64
– 1GHz
– 8-way 1MB Shared L2
• On-chip ZBbus
– maintains coherency
– proprietary
• Off-chip HT/SPI-4 19Gb/s

[Figure: BCM1400 block diagram. Labels: MIPS64 Core0, Core1, Core2 and Core3; L2; Shared Mem Arbiter; Memory Bridge; Packet DMA; SoC Interfaces; ZBbus: 128Gb/s @ 1GHz; three HT/SPI-4 ports (Port0, Port1, Port2).]

[Levy02] M. Levy, “Chip Combines Four 1GHz Cores,” Microprocessor Report, pp. 12-14, October 2002.


Motivational Example: Home 2005

[Figure: home network. Labels: Programmable PDA; Projectors; Projected Light Displays; SoC Device; Central Storage Unit (e.g., PC). Legend: wireless link, wired link.]


Analogy

• Microprocessor design
– Compiler
– Computer architecture
• SoC design
– Dynamic hw/sw RTOS
– SoC architecture


Building Blocks

• SoC Programming Model
– multi-threading, shared mem., message passing, control-data flow graph
• SoC Programming Environment
– δ Hardware/Software RTOS
• Microprocessor Programming Model
– C/C++/Java/other serial language
• Microprocessor Programming Environment
– gcc, various


Approach

• δ Hw/Sw RTOS made up of library components
• Library component = predefined C code, assembly code or HDL code
• Similar to existing RTOSes, except for the HDL code
– ex.: SoC Lock Cache in hardware
• RTOS HDL code can be automatically generated by a custom “IP Generator”
– ex.: PARLAK SoC Lock Cache generator


The δ Hardware/Software RTOS Generation Framework

[Figure: framework flow. The application and user input enter the GUI tool, which draws on the Software RTOS library, the Hardware RTOS library and the Base Architecture library. A compile stage for each system produces an executable HW file and an executable SW file for each configuration. Simulation in Seamless CVE, with VCS and XRAY.]

Configurations shown in the figure:
RTOS1: SW RTOS w/ sem
RTOS2: SW RTOS + SoCLC
RTOS3: SW RTOS w/ deadlock avoid.
RTOS4: SW RTOS + SoCDDU
RTOS5: SW RTOS + SoCLC + SoCDDU
RTOS6: RTU




SoC Lock Cache

• A hardware mechanism that resolves the critical section (CS) interactions among PEs
• Lock variables are moved into a separate “lock cache” outside of the memory
• Improves the performance criteria in terms of lock latency, lock delay and bandwidth consumption


Software/Hardware Architecture

• Multiple application tasks
• Atalanta-RTOS
• Four MPC750s
• SoCLC provides lock synchronization among PEs

[Figure: layered software/hardware diagram. Software labels: Application Software (Tasks), Atalanta-RTOS, Extension. Hardware labels: MPC750A, MPC750B, MPC750C, MPC750D, Memory, Arbitration Logic, SoCLC (SoC Lock Cache).]


Experiment

Example: Database transaction application [1]

[Figure: client-server database example. Client requests long_Req1, long_Req3, short_Req3 and short_Req4 reach a server running transaction1 to transaction4, which access objects O2, O3 and O4. Callouts: long_Req1 is an access of object O2 by transaction1; long_Req3 is an access of object O4 by transaction3.]

[1] M. A. Olson, “Selecting and implementing an embedded database system,” IEEE Computer, pp. 27-34, September 2000.


Experimental Result

Comparison with database application example [2]

• RTOS1 with semaphores and spin-locks
• RTOS2 with SoCLC, no SW semaphores or spin-locks

(clock cycles)    Without SoCLC *    With SoCLC    Speedup
Lock Latency      1200               908           1.32x
Lock Delay        47264              23590         2.00x
Execution Time    36.9M              29M           1.27x

* Semaphores for long critical sections (CSes) and spin-locks for short CSes are used instead of SoCLC.

[2] B. S. Akgul, J. Lee and V. Mooney, “System-on-a-chip processor synchronization hardware unit with task preemption support,” CASES ’01, pp. 149-157, November 2001.




SoC Deadlock Detection Unit (SoCDDU)

• Matrix-based parallel processing approach
• Removable edge reduction technique to reveal cycles (i.e., deadlock)
– Removable edge*: not related to deadlock
• Simple bit-wise Boolean operations
– Easier implementation
– Faster operation, O(min(m,n))
– 2~3 orders of magnitude faster than software
• Novelty relative to previous algorithms
– Does NOT trace exact cycles
– Does NOT require linked lists

[Figure: resource allocation graph with processes P1, P2, P3 and resources Q1, Q2, Q3; a removable edge is marked with *.]

P. Shiu, Y. Tan and V. Mooney, "A Novel Parallel Deadlock Detection Algorithm and Architecture," 9th International Workshop on Hardware/Software Codesign (CODES'01), pp. 30-36, April 2001.
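The removable-edge idea on this slide can be illustrated in software. The Python sketch below is not the SoCDDU hardware and is not claimed to be the authors' exact algorithm; it assumes single-unit resources and a matrix encoding chosen here for illustration (rows are resources, columns are processes, cells are EMPTY, REQUEST or GRANT). A row or column whose edges are all of one kind is a pure source or sink in the graph, so its edges cannot lie on a cycle and are removed. If edges remain once nothing more can be removed, a cycle (deadlock) exists. The loop is sequential, whereas the hardware would evaluate rows and columns in parallel.

```python
from typing import List

EMPTY, REQUEST, GRANT = 0, 1, 2  # hypothetical cell encoding


def detect_deadlock(matrix: List[List[int]]) -> bool:
    """Return True if the resource-allocation matrix contains a cycle.

    Rows are resources, columns are processes. REQUEST means the process
    waits for the resource; GRANT means the resource is held by the process.
    """
    m = [row[:] for row in matrix]  # work on a copy
    rows = len(m)
    cols = len(m[0]) if m else 0

    def is_terminal(cells: List[int]) -> bool:
        # Non-empty, and all edges of one kind: a source or a sink node.
        kinds = {c for c in cells if c != EMPTY}
        return len(kinds) == 1

    while True:
        term_rows = [i for i in range(rows) if is_terminal(m[i])]
        term_cols = [j for j in range(cols)
                     if is_terminal([m[i][j] for i in range(rows)])]
        if not term_rows and not term_cols:
            break  # nothing removable is left
        for i in term_rows:
            m[i] = [EMPTY] * cols
        for j in term_cols:
            for i in range(rows):
                m[i][j] = EMPTY

    # Any surviving edge belongs to a cycle-connected core.
    return any(c != EMPTY for row in m for c in row)


# Rows Q1..Q3, columns P1..P3.
# P1 holds Q1 and waits for Q2, P2 holds Q2 and waits for Q3,
# P3 holds Q3 and waits for Q1: a cycle.
deadlocked = [
    [GRANT, EMPTY, REQUEST],
    [REQUEST, GRANT, EMPTY],
    [EMPTY, REQUEST, GRANT],
]
# Same state, but P3 does not request Q1: no cycle.
safe = [
    [GRANT, EMPTY, EMPTY],
    [REQUEST, GRANT, EMPTY],
    [EMPTY, REQUEST, GRANT],
]
print(detect_deadlock(deadlocked), detect_deadlock(safe))
```

As on the slide, the sketch never traces the cycle itself and needs no linked lists; it only answers whether edges survive the reduction.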


Architecture of the SoC Deadlock Avoidance Unit (SoCDAU)

[Figure: SoCDAU block diagram. Labels: DDU (matrix); DAA Logic w/ FSM; Address decoder; Command registers; Status registers. Signals: Start, Reset, Done, Cell access. Buses: Address, control, Data.]

DDU: Deadlock Detection Unit
FSM: finite state machine


SoCDDU/SoCDAU Experiment - System and Application

• Four resources
– Q1: Video Interface (VI)
– Q2: MPEG
– Q3: DSP
– Q4: Wireless Interface (WI)
• Each process requires two resources except P4
– P1: processing a video stream (needs Q1 + Q2)
– P2: separating/enhancing frames (needs Q2 + Q3)
– P3: extracting special images (needs Q3 + Q1)
– P4: transferring images (needs Q4)
• One active process for each processing element (PE)

[Figure: system diagram; surviving label: HW/SW RTOS.]


Experimental Results of SoCDAU

• Deadlock avoidance simulation: two kinds
– SoCDAU: Synopsys VCS runs compiled Verilog code
– DAA in software: MPC755 runs compiled C code in Seamless CVE
– Example application invokes deadlock avoidance 12 times

[Figure: two resource allocation graphs over processes P1, P2, P3 and resources Q1, Q2, Q3, with events labeled t1 to t5.]


Experimental Results of SoCDAU (cont’d)

• Deadlock avoidance simulation result
• Performance improvement
– 99% algorithm execution time reduction
– 37% speedup of the application execution time (47704 vs. 34791 cycles)

Method of Deadlock Avoidance    Algorithm Exe. Time (cycles)    Normalized Exe. Time    Application Exe. Time (cycles)
SoCDAU Hardware                 7 (average)                     1X                      34791
DAA in Software                 2188 (average)                  312X                    47704
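The percentages on this slide can be rechecked from the cycle counts in the table. The Python sketch below is only an arithmetic check; the function names are illustrative and not from the talk. It separates two measures that are easy to confuse: the fraction of cycles removed, (before - after) / before, and the speedup gain, before / after - 1. With the table's numbers, the 37% figure corresponds to the speedup gain of the application, while the fraction of application cycles removed is about 27%.

```python
def reduction(before: float, after: float) -> float:
    """Fraction of cycles removed, relative to the slower (before) case."""
    return (before - after) / before


def speedup_gain(before: float, after: float) -> float:
    """How much faster the new case is, relative to the faster (after) case."""
    return before / after - 1.0


# Cycle counts copied from the table on this slide.
ALG_SW, ALG_HW = 2188, 7          # algorithm execution time (average cycles)
APP_SW, APP_HW = 47704, 34791     # application execution time (cycles)

print(f"algorithm time ratio:    {ALG_SW / ALG_HW:.1f}x")
print(f"algorithm reduction:     {reduction(ALG_SW, ALG_HW):.1%}")
print(f"application reduction:   {reduction(APP_SW, APP_HW):.1%}")
print(f"application speedup gain: {speedup_gain(APP_SW, APP_HW):.1%}")
```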


Synthesis Results of SoCDAU

• SoCDAU for 5 processes and 5 resources
• Qualcore Logic library with TSMC .25µm technology
• 0.01% of the total SoC area with four PEs and memory

Module Name                     Lines of Verilog HDL Code    Total Area in terms of two-input NAND gates
SoCDDU 5x5 (inside SoCDAU)      203                          364
DAA Logic                       344                          1472

TSMC: Taiwan Semiconductor Manufacturing Company
PE: Processing Element


Hardware Area

                                   SoCLC                           SoCDDU
                                   (for 64 short CS locks +        (for 5 processors x
                                   64 long CS locks)               5 resources)
Total area in semi-custom VLSI     7435 gates using TSMC 0.25µm    364 gates using AMI 0.3µm
                                   standard cell library           standard cell library
Xilinx XC4000E 4003EPC84
  # Seq. logic                     532                             10
  # Other gates                    9036                            559




δ Hardware/Software RTOS Generation Framework and current simulation platform

[Figure: tool flow. User input drives the GUI Tool, which reads the Hardware RTOS Library, the Base Architecture Library and the Software RTOS Library and generates Top.v (HW RTOS), User.h (SW RTOS) and a Makefile. Together with the Application, these pass through HW Compile and SW Compile to give Executable HW (compiled hardware description) and Executable SW. Simulation in Seamless CVE, with Modelsim or VCS and XRAY. Result and feedback return to the user.]


δ Hardware/Software RTOS Generation Framework Goals

To help the user examine which configuration is most suitable for the user’s specific applications

To help the user explore the RTOS design space before chip fabrication as well as after chip fabrication (in which case reconfigurable logic must be available on the chip)

To help the user examine different System-on-a-Chip (SoC) architectures subject to a custom RTOS


Motivation (1/3)

HW/SW RTOS partitioning approach

Previous innovations in HW/SW RTOS components
• System-on-a-Chip Lock Cache (SoCLC)
• System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU)
• System-on-a-Chip Deadlock Detection Unit (SoCDDU) and Deadlock Avoidance Unit (SoCDAU)
• RTU Hardware RTOS


Motivation (2/3)

SoCDMMU: System-on-a-Chip Dynamic Memory Management Unit
• Provides fast, deterministic and yet dynamic memory management of a global on-chip memory
• Achieves flexible, efficient memory utilization
• Provides APIs for applications


Motivation (3/3)

Constraints about using the previous HW/SW RTOS innovations
• Perhaps not enough chip space for all three of them
• All of them may not be necessary

⇒ The δ framework
• Enables automatic generation of different mixes of the previous innovations for different versions of a HW/SW RTOS
• Enables selection of the RTU hardware RTOS
• Can be generalized to instantiate additional HW or SW RTOS components


Our RTOS in Post-Fabrication Scenario

Application(s) run on the SoC using standard RTOS APIs

Atalanta software RTOS
• A multiprocessor SoC RTOS

The RTOS and device drivers are loaded into the L2 cache memory
• All Processing Elements (PEs) share the kernel code and data structures

Hardware RTOS components are downloaded into the reconfigurable logic

[Figure: system diagram; surviving label: HW/SW RTOS.]


Experimental Setup

Six custom RTOSes
• With semaphores and spin-locks, no HW components in the RTOS
• With SoCLC, no SW IPCs
• With deadlock avoidance software, no HW RTOS components
• With SoCDAU
• With SoCLC and SoCDAU
• With RTU

Each with the Base architecture
Each with application(s)
Each executable in Seamless CVE

4 MPC750 processors
Reconfigurable logic
Single bus

[Figure: framework flow, as shown earlier. The application and user input enter the GUI tool, which draws on the Software RTOS library, the Hardware RTOS library and the Base Architecture library. A compile stage for each system produces an executable HW file and an executable SW file for each configuration. Simulation in Seamless CVE, with VCS and XRAY.]

Configurations shown in the figure:
RTOS1: SW RTOS w/ sem
RTOS2: SW RTOS + SoCLC
RTOS3: SW RTOS w/ deadlock avoid.
RTOS4: SW RTOS + SoCDAU
RTOS5: SW RTOS + SoCLC + SoCDAU
RTOS6: RTU


RTU Hardware RTOS

An RTOS in hardware

• Real-Time Unit (RTU)
– scheduling
– IPC
– dynamic task creation
– timers
• Custom hw => upper bound on # tasks
• Reconfigurable hw => can alter max. # tasks, max. # priorities
• Prof. Lennart Lindh, Mälardalens U., Västerås, Sweden
• RealFast, www.realfast.se

[Figure: RTU block diagram. Labels: LOCAL BUS, TDBI, GBI, Accelerator Interface, MsgQLib, RTC, Scheduler, IRQ.]


Methodology

An SoC architecture with the RTU Hardware RTOS

[Figure: SoC block diagram. Labels: MPC755-1, MPC755-2 and MPC755-3, each with L1; RTU in Reconfig. Logic; Arbiter, Intr. Controller, Clock; Memory Controller and Memory.]


Methodology

An SoC architecture with a hardware/software RTOS

[Figure: SoC block diagram. Labels: MPC755-1, MPC755-2 and MPC755-3, each with L1; SoCLC in Reconfig. Logic; Bus Arbiter, Intr. Controller, Clock; Memory (Atalanta RTOS).]


Methodology

δ Framework
– GUI

[Figure: GUI screenshot with callouts: Specialized SW RTOS component; HW RTOS component; Number of CPUs in system; IPC module linking method.]


Implementation

Verilog top file generation example

• Start with RTU description
• Generate instantiation code
– multiple instantiations of same unit if needed (e.g., PEs)
• Add wires and initial statements

[Figure: top file generation steps. Selected components: PEs 1,2,3,…; Memory 1,2,…; SoCLC; Clock; Arbiter.]

(i) IP Library description:
Desc RTU
~~~
enddesc

Generate code:
clock clock_gen (SYSCLK);
cpu_mpc750 cpu1 (…);
\rtu.rtu(struct) rtu_comp (…);
arbiter arb (br_bar, bg_bar…);

(ii) Add wires and initial states; (iii) After instantiation:
wire ADDR;
wire DATA;
wire BR_BAR;
wire BG_BAR;
wire SYSCLK;
…
initial begin … end;
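The three generation steps on this slide can be mimicked with a short text generator. The Python sketch below is an assumption-laden illustration, not the δ framework's actual generator: the function `generate_top`, the `module top;` wrapper and the `(module, instance)` pair format are hypothetical, and port lists are left as a placeholder comment because the slide elides them. Only the module and instance names `clock clock_gen`, `cpu_mpc750 cpuN`, `\rtu.rtu(struct) rtu_comp`, `arbiter arb` and the wire names come from the slide. Wire widths and the contents of the `initial` block are not given there, so none are emitted.

```python
from typing import List, Tuple

# Wire names copied from the slide; widths are not given there.
WIRES = ["ADDR", "DATA", "BR_BAR", "BG_BAR", "SYSCLK"]

# The slide elides the port lists, so they stay elided here.
PORTS_ELIDED = "/* ports elided on the slide */"


def generate_top(num_pes: int, hw_units: List[Tuple[str, str]]) -> str:
    """Return the text of a top-level Verilog file (hypothetical layout).

    hw_units holds (module_name, instance_name) pairs for the selected
    hardware RTOS components.
    """
    if num_pes < 1:
        raise ValueError("at least one PE is required")
    out = ["module top;"]
    # Wires are declared first here; the slide adds them after instantiation.
    out += [f"  wire {w};" for w in WIRES]
    out.append("  clock clock_gen (SYSCLK);")
    # Multiple instantiations of the same unit, one per PE.
    for i in range(1, num_pes + 1):
        out.append(f"  cpu_mpc750 cpu{i} ({PORTS_ELIDED});")
    for module, inst in hw_units:
        out.append(f"  {module} {inst} ({PORTS_ELIDED});")
    out.append(f"  arbiter arb ({PORTS_ELIDED});")
    out.append("endmodule")
    return "\n".join(out) + "\n"


# Three PEs plus the RTU, using the escaped identifier shown on the slide.
top_v = generate_top(3, [(r"\rtu.rtu(struct)", "rtu_comp")])
print(top_v)
```

Passing an empty component list gives a pure-software configuration, and swapping the pair for a different library component changes only one generated line, which is the kind of mix-and-match the framework automates.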


Experimental Results (1/3)

Comparison
• A system with RTU hardware RTOS
• A system with SoCLC hardware and software RTOS
• A system with pure software RTOS

Total Execution Time    6 tasks (in cycles)    Reduction    30 tasks (in cycles)    Reduction
Pure SW *               100398                 0%           379440                  0%
With SoCLC              71365                  29%          317916                  16%
With RTU                67038                  33%          279480                  26%

* A semaphore is used in pure software and a hardware mechanism is used in SoCLC and RTU.


Experimental Results (2/3)

The number of interactions

Times                               6 tasks    30 tasks
Number of semaphore interactions    12         60
Number of context switches          3          30
Number of short locks               10         58


Experimental Results (3/3)

The average number of cycles spent on communication, context switch and computation (6 task case)

cycles            Pure SW    With SoCLC    With RTU
communication     18944      3730          2075
context switch    3218       3231          2835
computation       8523       8577          8421
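The breakdown on this slide shows where the savings come from: computation and context-switch cycles barely change, while communication cycles drop sharply. The Python sketch below recomputes that from the slide's numbers. The helper names are illustrative, and the per-system sums are simply the three averages added together, which should not be confused with the total execution times reported on the earlier results slide.

```python
# Average cycles per category for the 6-task case, copied from this slide.
CYCLES = {
    "Pure SW":    {"communication": 18944, "context switch": 3218, "computation": 8523},
    "With SoCLC": {"communication": 3730,  "context switch": 3231, "computation": 8577},
    "With RTU":   {"communication": 2075,  "context switch": 2835, "computation": 8421},
}


def category_sum(system: str) -> int:
    """Sum of the three average cycle counts for one system."""
    return sum(CYCLES[system].values())


def communication_share(system: str) -> float:
    """Fraction of the summed cycles spent on communication."""
    return CYCLES[system]["communication"] / category_sum(system)


def communication_reduction(system: str) -> float:
    """Fraction of communication cycles removed relative to pure software."""
    base = CYCLES["Pure SW"]["communication"]
    return 1.0 - CYCLES[system]["communication"] / base


for name in CYCLES:
    print(f"{name:11s} sum={category_sum(name):6d} "
          f"comm share={communication_share(name):.0%} "
          f"comm reduction={communication_reduction(name):.0%}")
```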


Hardware Area

Total area, TSMC 0.25μm library from LEDA

SoCLC (64 short CS locks + 64 long CS locks)    7435 gates
RTU for 3 processors                            About 250000 gates


Conclusion

A framework for automatic generation of a custom HW/SW RTOS

Experimental results showing
• a multiprocessor FPGA that utilizes the SoCDAU/SoCDDU has a 37% overall speedup of the application execution time over deadlock avoidance in software
• speedups with the SoCLC, RTU
• additional hw RTOS component in references: SoCDMMU

Future work
• support for heterogeneous processors
• support for multiple bus systems/structures
