Approximation of Worst-case Execution Time for Preemptive ...
Approximation of Worst-case Execution Time for Preemptive ...
Approximation of Worst-case Execution Time for Preemptive ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>Approximation</strong> <strong>of</strong> <strong>Worst</strong>-<strong>case</strong><br />
<strong>Execution</strong> <strong>Time</strong> <strong>for</strong> <strong>Preemptive</strong><br />
Multitasking Systems<br />
Matteo Corti, Roberto Brega, Thomas Gross<br />
ETH Zurich
Outline<br />
• Environment<br />
• Other approaches<br />
• <strong>Worst</strong>-<strong>case</strong> execution time approximation<br />
• Results<br />
• Conclusions<br />
Matteo Corti, ETH Zurich, LCTES 2000 2
Environment: User Needs<br />
• Complex mechatronic applications<br />
• Timing correctness<br />
• Concurrency (RT and non-RT tasks)<br />
• Rapid development: dynamic system<br />
• Modern programming languages<br />
• Modern processors<br />
Matteo Corti, ETH Zurich, LCTES 2000 3
Environment: System<br />
• XOberon:<br />
– Loading/unloading <strong>of</strong> modules (tasks) at runtime<br />
– Deadline driven scheduler with admission testing<br />
– Resources are shared between RT and non-RT tasks<br />
– <strong>Preemptive</strong> scheduling<br />
• Modern RISC processors: PowerPC 604e<br />
• Modern language: Oberon-2<br />
– Automatic garbage collection<br />
– Strong type checking<br />
Matteo Corti, ETH Zurich, LCTES 2000 4
Problem Description<br />
• Admission test<br />
– deadline: determined by the problem<br />
– max. duration: determined by the task and the<br />
system<br />
• <strong>Preemptive</strong> scheduling and processor<br />
complexity hinder a precise computation <strong>of</strong><br />
the worst-<strong>case</strong> execution time (WCET)<br />
• The system is able to stop safely if the given<br />
duration is to small (w/o damaging the robot<br />
or the operator)<br />
Matteo Corti, ETH Zurich, LCTES 2000 5
Issues<br />
• Static program analysis<br />
– automatic loop bounding<br />
– false paths<br />
– infeasible paths<br />
• Instruction length computation<br />
– caches (instruction and data)<br />
– pipelines<br />
Matteo Corti, ETH Zurich, LCTES 2000 6
Other approaches<br />
• Longest path:<br />
– user annotations<br />
– automatic tools (loop bounding, false paths, …)<br />
• Instruction length (w/o preemption):<br />
– cache prediction<br />
– active cache management<br />
– pipelines prediction<br />
• Dynamic systems:<br />
– trial-and-error experimentation<br />
Matteo Corti, ETH Zurich, LCTES 2000 7
Predictor Structure<br />
Source<br />
Compiler<br />
Object<br />
Per<strong>for</strong>mance<br />
Monitor<br />
Host<br />
Compiler<br />
WCET<br />
Run-time<br />
statistics<br />
Target<br />
Matteo Corti, ETH Zurich, LCTES 2000 8
Longest Path ...<br />
instr op op op<br />
instr op op op<br />
instr op op op<br />
instr op op op<br />
br addr<br />
len<br />
b<br />
=<br />
iter<br />
b<br />
⋅<br />
∑<br />
instr<br />
length<br />
instr<br />
Matteo Corti, ETH Zurich, LCTES 2000 9
Block Iterations<br />
• Static program analysis<br />
– loop iteration bounds<br />
• Real-time tasks are relatively well structured<br />
minimal compiler support<br />
– automatic loop bounding <strong>for</strong> simple loops<br />
– user annotations (driver calls, difficult loops,<br />
polymorphism, library calls)<br />
– user hints can be checked at run-time<br />
Matteo Corti, ETH Zurich, LCTES 2000 10
Instruction Length<br />
• Preemption, dynamic set <strong>of</strong> processes no exact<br />
knowledge <strong>of</strong> the cache and pipeline status<br />
• Maximal instruction lengths (caches are always<br />
empty, instructions always stall, ...) are not useful:<br />
the WCET is too high to be used in practice<br />
• Instruction length approximation using run-time<br />
in<strong>for</strong>mation about the processor usage during the<br />
task’s execution<br />
Matteo Corti, ETH Zurich, LCTES 2000 11
Per<strong>for</strong>mance Monitor ...<br />
• The PowerPC 604e provides hardware<br />
assist to monitor and count predefined<br />
events (cache misses, mispredicted<br />
branches, issued instructions, …)<br />
• Processes can be marked <strong>for</strong> runtime<br />
pr<strong>of</strong>iling<br />
• Events book-keeping is done in the scheduler<br />
(small overhead)<br />
• No code instrumentation<br />
Matteo Corti, ETH Zurich, LCTES 2000 12
Per<strong>for</strong>mance Monitor<br />
• Not specifically designed to help in program<br />
analysis:<br />
– event counting is not precise (out-<strong>of</strong>-order<br />
execution)<br />
– many events are not disjoint<br />
– only four different events can be monitored in<br />
parallel<br />
• The instruction length must be<br />
approximated dealing with the<br />
per<strong>for</strong>mance monitor (PM) inaccuracies<br />
Matteo Corti, ETH Zurich, LCTES 2000 13
Statistics Gathering<br />
• Problem: choose representative traces<br />
• Solution:<br />
– pr<strong>of</strong>ile different input sets<br />
– conservative approximation<br />
• The tests confirmed a certain homogeneity<br />
within different execution traces <strong>for</strong> the<br />
same tasks<br />
Matteo Corti, ETH Zurich, LCTES 2000 14
Cycles Per Instruction (CPI) …<br />
• The instruction length can be divided in<br />
several components:<br />
– ICP: infinite cache per<strong>for</strong>mance (CPU busy and stall time)<br />
– FCE: finite cache per<strong>for</strong>mance (effects <strong>of</strong> memory<br />
hierarchy)<br />
CPI = ICP +<br />
FCE<br />
CPI = busy + stall + FCE<br />
execunit + stallunit<br />
CPI =<br />
+ stall<br />
pipeline<br />
+<br />
parallelism<br />
CPI = ...<br />
FCE<br />
Matteo Corti, ETH Zurich, LCTES 2000 15
Cycles Per Instruction (CPI)<br />
• Instruction length components:<br />
– From the processor architecture<br />
• execution time<br />
• miss penalty<br />
– Estimated with help <strong>of</strong> run-time data<br />
• stalls<br />
• cache misses<br />
• instruction parallelism<br />
– Estimated by the program structure<br />
• distance between instructions <strong>of</strong> the same type<br />
Matteo Corti, ETH Zurich, LCTES 2000 16
Testing the Predictor<br />
• First phase: approximation tuning<br />
– simple tests with known WCET (matrix<br />
multiplication, Runge-Kutta, …)<br />
– different components <strong>of</strong> the approximator and <strong>of</strong><br />
the processor can be tested separately<br />
• Second phase: real applications<br />
– longest path and exact WCET unknown<br />
– not all the paths can be tested<br />
Matteo Corti, ETH Zurich, LCTES 2000 17
Results: Simple Tests<br />
3000<br />
2500<br />
+8%<br />
2000<br />
ms 1500<br />
1000<br />
500<br />
-5%<br />
+11% +6% +7% -5% +13%<br />
0<br />
Matrix Mult.<br />
Matrix Mult. FP<br />
Array Max<br />
Array Max FP<br />
Runge-Kutta<br />
Polynomial Eval<br />
Distribution Count<br />
WCET <strong>Approximation</strong><br />
Matteo Corti, ETH Zurich, LCTES 2000 18
Results: <strong>Approximation</strong>s<br />
• <strong>Worst</strong> <strong>case</strong> assumptions about caches and pipeline<br />
produce non usable durations<br />
• Example: no cache approximation (but all other<br />
included)<br />
Test<br />
Matr. Mul.<br />
Array Max.<br />
Pol. Eval.<br />
Measured value<br />
280 ms<br />
520 ms<br />
1252 ms<br />
Full predictor<br />
311 ms<br />
555 ms<br />
1188 ms<br />
No cache hits<br />
1403 ms<br />
1901 ms<br />
3193 ms<br />
Matteo Corti, ETH Zurich, LCTES 2000 19
Results: Real Applications<br />
• LaserPointer: laboratory<br />
machine that moves a laser pen<br />
applied on the tool-center point <strong>of</strong><br />
a 2-joints manipulator<br />
• Hexaglide: a parallel manipulator<br />
with 6 DOF used as a high speed<br />
milling machine<br />
• Robojet: a hydraulically<br />
actuated manipulator used in the<br />
construction <strong>of</strong> tunnels<br />
Matteo Corti, ETH Zurich, LCTES 2000 20
Results: Real Applications<br />
• Only a few loops had to be manually<br />
bounded<br />
Application<br />
LaserPointer<br />
Hexaglide<br />
Robojet<br />
User annotations<br />
Calls Bounds<br />
5 0 / N.a.<br />
4 2 / 258<br />
17 0 / 207<br />
Code Size<br />
1000 LOC<br />
2200 LOC<br />
1600 LOC<br />
Matteo Corti, ETH Zurich, LCTES 2000 21
Results: Real Applications<br />
1200<br />
1000<br />
800<br />
µs 600<br />
400<br />
200<br />
0<br />
+46%<br />
+57%<br />
+17%<br />
-1% +0.3%<br />
+11%<br />
0% 0%<br />
LaserPointer.t1<br />
LaserPointer.t2<br />
LaserPointer.t3<br />
Hexaglide.t1<br />
Hexaglide.t2<br />
Robojet.t1<br />
Robojet.t2<br />
Robojet.t3<br />
Measured Predicted<br />
Matteo Corti, ETH Zurich, LCTES 2000 22
Comments …<br />
• Per<strong>for</strong>mance monitors are not designed to<br />
help in program analysis (coarse-grain<br />
in<strong>for</strong>mation)<br />
• Many CPI components are gathered using<br />
statistical methods<br />
• There is no hard guarantee the result is<br />
correct<br />
• Architecture dependent (different<br />
per<strong>for</strong>mance monitors, and processor<br />
architectures)<br />
Matteo Corti, ETH Zurich, LCTES 2000 23
Comments<br />
• Simple approach: minimal user interaction<br />
needed (suitable <strong>for</strong> application experts)<br />
• No special hardware tools needed<br />
• Useful in complexenvironments with<br />
preemptive multitasking (dynamic<br />
constellation <strong>of</strong> real-time tasks)<br />
• Big and real applications can be analyzed<br />
Matteo Corti, ETH Zurich, LCTES 2000 24
Conclusions<br />
• The WCET can be approximated using<br />
run-time data<br />
– little or no user assistance is required<br />
• Processor’s per<strong>for</strong>mance monitors can<br />
help in program analysis<br />
– better support desirable<br />
• <strong>Approximation</strong>s are good enough <strong>for</strong><br />
many dynamic real-time systems<br />
Matteo Corti, ETH Zurich, LCTES 2000 25