29.01.2014 Views

Approximation of Worst-case Execution Time for Preemptive ...

Approximation of Worst-case Execution Time for Preemptive ...

Approximation of Worst-case Execution Time for Preemptive ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Approximation</strong> <strong>of</strong> <strong>Worst</strong>-<strong>case</strong><br />

<strong>Execution</strong> <strong>Time</strong> <strong>for</strong> <strong>Preemptive</strong><br />

Multitasking Systems<br />

Matteo Corti, Roberto Brega, Thomas Gross<br />

ETH Zurich


Outline<br />

• Environment<br />

• Other approaches<br />

• <strong>Worst</strong>-<strong>case</strong> execution time approximation<br />

• Results<br />

• Conclusions<br />

Matteo Corti, ETH Zurich, LCTES 2000 2


Environment: User Needs<br />

• Complex mechatronic applications<br />

• Timing correctness<br />

• Concurrency (RT and non-RT tasks)<br />

• Rapid development: dynamic system<br />

• Modern programming languages<br />

• Modern processors<br />

Matteo Corti, ETH Zurich, LCTES 2000 3


Environment: System<br />

• XOberon:<br />

– Loading/unloading <strong>of</strong> modules (tasks) at runtime<br />

– Deadline driven scheduler with admission testing<br />

– Resources are shared between RT and non-RT tasks<br />

– <strong>Preemptive</strong> scheduling<br />

• Modern RISC processors: PowerPC 604e<br />

• Modern language: Oberon-2<br />

– Automatic garbage collection<br />

– Strong type checking<br />

Matteo Corti, ETH Zurich, LCTES 2000 4


Problem Description<br />

• Admission test<br />

– deadline: determined by the problem<br />

– max. duration: determined by the task and the<br />

system<br />

• <strong>Preemptive</strong> scheduling and processor<br />

complexity hinder a precise computation <strong>of</strong><br />

the worst-<strong>case</strong> execution time (WCET)<br />

• The system is able to stop safely if the given<br />

duration is to small (w/o damaging the robot<br />

or the operator)<br />

Matteo Corti, ETH Zurich, LCTES 2000 5


Issues<br />

• Static program analysis<br />

– automatic loop bounding<br />

– false paths<br />

– infeasible paths<br />

• Instruction length computation<br />

– caches (instruction and data)<br />

– pipelines<br />

Matteo Corti, ETH Zurich, LCTES 2000 6


Other approaches<br />

• Longest path:<br />

– user annotations<br />

– automatic tools (loop bounding, false paths, …)<br />

• Instruction length (w/o preemption):<br />

– cache prediction<br />

– active cache management<br />

– pipelines prediction<br />

• Dynamic systems:<br />

– trial-and-error experimentation<br />

Matteo Corti, ETH Zurich, LCTES 2000 7


Predictor Structure<br />

Source<br />

Compiler<br />

Object<br />

Per<strong>for</strong>mance<br />

Monitor<br />

Host<br />

Compiler<br />

WCET<br />

Run-time<br />

statistics<br />

Target<br />

Matteo Corti, ETH Zurich, LCTES 2000 8


Longest Path ...<br />

instr op op op<br />

instr op op op<br />

instr op op op<br />

instr op op op<br />

br addr<br />

len<br />

b<br />

=<br />

iter<br />

b<br />

⋅<br />

∑<br />

instr<br />

length<br />

instr<br />

Matteo Corti, ETH Zurich, LCTES 2000 9


Block Iterations<br />

• Static program analysis<br />

– loop iteration bounds<br />

• Real-time tasks are relatively well structured<br />

minimal compiler support<br />

– automatic loop bounding <strong>for</strong> simple loops<br />

– user annotations (driver calls, difficult loops,<br />

polymorphism, library calls)<br />

– user hints can be checked at run-time<br />

Matteo Corti, ETH Zurich, LCTES 2000 10


Instruction Length<br />

• Preemption, dynamic set <strong>of</strong> processes no exact<br />

knowledge <strong>of</strong> the cache and pipeline status<br />

• Maximal instruction lengths (caches are always<br />

empty, instructions always stall, ...) are not useful:<br />

the WCET is too high to be used in practice<br />

• Instruction length approximation using run-time<br />

in<strong>for</strong>mation about the processor usage during the<br />

task’s execution<br />

Matteo Corti, ETH Zurich, LCTES 2000 11


Per<strong>for</strong>mance Monitor ...<br />

• The PowerPC 604e provides hardware<br />

assist to monitor and count predefined<br />

events (cache misses, mispredicted<br />

branches, issued instructions, …)<br />

• Processes can be marked <strong>for</strong> runtime<br />

pr<strong>of</strong>iling<br />

• Events book-keeping is done in the scheduler<br />

(small overhead)<br />

• No code instrumentation<br />

Matteo Corti, ETH Zurich, LCTES 2000 12


Per<strong>for</strong>mance Monitor<br />

• Not specifically designed to help in program<br />

analysis:<br />

– event counting is not precise (out-<strong>of</strong>-order<br />

execution)<br />

– many events are not disjoint<br />

– only four different events can be monitored in<br />

parallel<br />

• The instruction length must be<br />

approximated dealing with the<br />

per<strong>for</strong>mance monitor (PM) inaccuracies<br />

Matteo Corti, ETH Zurich, LCTES 2000 13


Statistics Gathering<br />

• Problem: choose representative traces<br />

• Solution:<br />

– pr<strong>of</strong>ile different input sets<br />

– conservative approximation<br />

• The tests confirmed a certain homogeneity<br />

within different execution traces <strong>for</strong> the<br />

same tasks<br />

Matteo Corti, ETH Zurich, LCTES 2000 14


Cycles Per Instruction (CPI) …<br />

• The instruction length can be divided in<br />

several components:<br />

– ICP: infinite cache per<strong>for</strong>mance (CPU busy and stall time)<br />

– FCE: finite cache per<strong>for</strong>mance (effects <strong>of</strong> memory<br />

hierarchy)<br />

CPI = ICP +<br />

FCE<br />

CPI = busy + stall + FCE<br />

execunit + stallunit<br />

CPI =<br />

+ stall<br />

pipeline<br />

+<br />

parallelism<br />

CPI = ...<br />

FCE<br />

Matteo Corti, ETH Zurich, LCTES 2000 15


Cycles Per Instruction (CPI)<br />

• Instruction length components:<br />

– From the processor architecture<br />

• execution time<br />

• miss penalty<br />

– Estimated with help <strong>of</strong> run-time data<br />

• stalls<br />

• cache misses<br />

• instruction parallelism<br />

– Estimated by the program structure<br />

• distance between instructions <strong>of</strong> the same type<br />

Matteo Corti, ETH Zurich, LCTES 2000 16


Testing the Predictor<br />

• First phase: approximation tuning<br />

– simple tests with known WCET (matrix<br />

multiplication, Runge-Kutta, …)<br />

– different components <strong>of</strong> the approximator and <strong>of</strong><br />

the processor can be tested separately<br />

• Second phase: real applications<br />

– longest path and exact WCET unknown<br />

– not all the paths can be tested<br />

Matteo Corti, ETH Zurich, LCTES 2000 17


Results: Simple Tests<br />

3000<br />

2500<br />

+8%<br />

2000<br />

ms 1500<br />

1000<br />

500<br />

-5%<br />

+11% +6% +7% -5% +13%<br />

0<br />

Matrix Mult.<br />

Matrix Mult. FP<br />

Array Max<br />

Array Max FP<br />

Runge-Kutta<br />

Polynomial Eval<br />

Distribution Count<br />

WCET <strong>Approximation</strong><br />

Matteo Corti, ETH Zurich, LCTES 2000 18


Results: <strong>Approximation</strong>s<br />

• <strong>Worst</strong> <strong>case</strong> assumptions about caches and pipeline<br />

produce non usable durations<br />

• Example: no cache approximation (but all other<br />

included)<br />

Test<br />

Matr. Mul.<br />

Array Max.<br />

Pol. Eval.<br />

Measured value<br />

280 ms<br />

520 ms<br />

1252 ms<br />

Full predictor<br />

311 ms<br />

555 ms<br />

1188 ms<br />

No cache hits<br />

1403 ms<br />

1901 ms<br />

3193 ms<br />

Matteo Corti, ETH Zurich, LCTES 2000 19


Results: Real Applications<br />

• LaserPointer: laboratory<br />

machine that moves a laser pen<br />

applied on the tool-center point <strong>of</strong><br />

a 2-joints manipulator<br />

• Hexaglide: a parallel manipulator<br />

with 6 DOF used as a high speed<br />

milling machine<br />

• Robojet: a hydraulically<br />

actuated manipulator used in the<br />

construction <strong>of</strong> tunnels<br />

Matteo Corti, ETH Zurich, LCTES 2000 20


Results: Real Applications<br />

• Only a few loops had to be manually<br />

bounded<br />

Application<br />

LaserPointer<br />

Hexaglide<br />

Robojet<br />

User annotations<br />

Calls Bounds<br />

5 0 / N.a.<br />

4 2 / 258<br />

17 0 / 207<br />

Code Size<br />

1000 LOC<br />

2200 LOC<br />

1600 LOC<br />

Matteo Corti, ETH Zurich, LCTES 2000 21


Results: Real Applications<br />

1200<br />

1000<br />

800<br />

µs 600<br />

400<br />

200<br />

0<br />

+46%<br />

+57%<br />

+17%<br />

-1% +0.3%<br />

+11%<br />

0% 0%<br />

LaserPointer.t1<br />

LaserPointer.t2<br />

LaserPointer.t3<br />

Hexaglide.t1<br />

Hexaglide.t2<br />

Robojet.t1<br />

Robojet.t2<br />

Robojet.t3<br />

Measured Predicted<br />

Matteo Corti, ETH Zurich, LCTES 2000 22


Comments …<br />

• Per<strong>for</strong>mance monitors are not designed to<br />

help in program analysis (coarse-grain<br />

in<strong>for</strong>mation)<br />

• Many CPI components are gathered using<br />

statistical methods<br />

• There is no hard guarantee the result is<br />

correct<br />

• Architecture dependent (different<br />

per<strong>for</strong>mance monitors, and processor<br />

architectures)<br />

Matteo Corti, ETH Zurich, LCTES 2000 23


Comments<br />

• Simple approach: minimal user interaction<br />

needed (suitable <strong>for</strong> application experts)<br />

• No special hardware tools needed<br />

• Useful in complexenvironments with<br />

preemptive multitasking (dynamic<br />

constellation <strong>of</strong> real-time tasks)<br />

• Big and real applications can be analyzed<br />

Matteo Corti, ETH Zurich, LCTES 2000 24


Conclusions<br />

• The WCET can be approximated using<br />

run-time data<br />

– little or no user assistance is required<br />

• Processor’s per<strong>for</strong>mance monitors can<br />

help in program analysis<br />

– better support desirable<br />

• <strong>Approximation</strong>s are good enough <strong>for</strong><br />

many dynamic real-time systems<br />

Matteo Corti, ETH Zurich, LCTES 2000 25

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!