
Optimization techniques for ARM7 microcontroller-based embedded systems,
demonstrated on a gamma correction algorithm implementation

Kosta Karpuzovski

FEIT Skopje


We will discuss

Motivation
General C code optimization techniques
Optimization of the algorithm
Gamma correction
Fast power calculation
Results
General discussion for improvement


Motivation

Allow the ARM7 processor to do real-time processing
◦ Make the algorithm simple enough for the ARM7TDMI processor to execute it for the whole picture in less than 1/25 s (40 ms)
Green power
◦ Faster execution leaves more time for sleep-mode operation
Lower costs
◦ Use cheaper components in the design


General C code optimization techniques

Optimization of loops by counting down from max to zero
Use of lookup tables instead of calculation
◦ Use

    for (ArrayIndex = ImageLength; ArrayIndex != 0;)
    {
        ArrayIndex--;
        m = ImageArray[ArrayIndex];
        ImageArray[ArrayIndex] = LookupTable[m];
    }

◦ Instead of

    for (ArrayIndex = 0; ArrayIndex < ImageLength; ArrayIndex++)
    {
        /* ... range correction and powf() gamma calculation of the pixel into q ... */
        if (q >= 254.5f)
            q = 255.0f;
        ImageArray[ArrayIndex] = (unsigned int)q;
    }
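A self-contained sketch of the recommended form: a countdown loop that applies a 256-entry lookup table to an 8-bit image in place. The function name `ApplyLookupTable` and the `uint8_t` pixel type are assumptions for illustration. Counting down lets the loop end on a comparison with zero; on ARM the decrement can set the condition flags itself (`SUBS`), so the separate compare may disappear, although whether it does depends on the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replace every pixel by LookupTable[pixel],
 * walking the image from the last element down to index 0. */
void ApplyLookupTable(uint8_t *ImageArray, size_t ImageLength,
                      const uint8_t LookupTable[256])
{
    size_t ArrayIndex;

    for (ArrayIndex = ImageLength; ArrayIndex != 0;)
    {
        ArrayIndex--;                     /* decrement first, then use */
        ImageArray[ArrayIndex] = LookupTable[ImageArray[ArrayIndex]];
    }
}
```

For example, a table with `LookupTable[i] = 255 - i` turns the call into an image inversion without any per-pixel arithmetic.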


General C code optimization techniques (2)

Loop unrolling to minimize branching in the code
Use of do-while instead of a for loop in cases when we know that the loop will be executed at least once
Use of the ternary operator
◦ MinRange = (MinRange > 255) ? MinRange : MinValue; is better than
◦ MinRange = (255 < MinRange) ? MinValue : MinRange;
Use of local variables in loops
◦ Use local variables in all calculations involving global ones, and write the result back to the global variables once all calculations are done
Use of break for premature loop exit


General C code optimization techniques (3)

    /* MinValue is assumed to start at 0xff and MaxValue at 0;
       ImageLength is assumed to be a multiple of 8 (eight units per pass). */
    LoopCounter = ImageLength;
    do {
        InputValue = *ImageArray_p;
        ImageArray_p++;
        MaxValue = (InputValue > MaxValue) ? InputValue : MaxValue;
        MinValue = (InputValue < MinValue) ? InputValue : MinValue;

        InputValue = *ImageArray_p;
        ImageArray_p++;
        MaxValue = (InputValue > MaxValue) ? InputValue : MaxValue;
        MinValue = (InputValue < MinValue) ? InputValue : MinValue;
        ...
        InputValue = *ImageArray_p;
        ImageArray_p++;
        MaxValue = (InputValue > MaxValue) ? InputValue : MaxValue;
        MinValue = (InputValue < MinValue) ? InputValue : MinValue;

        LoopCounter -= 8;
        /* full 0..255 range already found: no need to scan further */
        if ((0 == MinValue) && (0xff == MaxValue))
            break;
    } while (LoopCounter != 0);

    /* write the local results back through the output pointers once */
    *MinValue_p = MinValue;
    *MaxValue_p = MaxValue;
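A complete, compilable sketch of the same idea, assuming 8-bit pixels. The function name `FindMinMax`, the `MINMAX_UNIT` macro and the tail loop for lengths that are not a multiple of 8 are hypothetical additions, not part of the slide. Minimum and maximum live in local variables and are written through the output pointers only once, and the scan stops early when the full 0..255 range has been seen.

```c
#include <stddef.h>
#include <stdint.h>

/* One "find min/max" unit: read a pixel, advance the pointer,
 * update the local MaxValue and MinValue. */
#define MINMAX_UNIT(p)                                    \
    do {                                                  \
        uint8_t v = *(p)++;                               \
        MaxValue = (v > MaxValue) ? v : MaxValue;         \
        MinValue = (v < MinValue) ? v : MinValue;         \
    } while (0)

void FindMinMax(const uint8_t *ImageArray_p, size_t ImageLength,
                uint8_t *MinValue_p, uint8_t *MaxValue_p)
{
    uint8_t MinValue = 0xff;           /* locals, kept in registers */
    uint8_t MaxValue = 0x00;
    size_t Blocks = ImageLength / 8;   /* passes of eight unrolled units */
    size_t Tail = ImageLength % 8;     /* leftover pixels */

    while (Blocks != 0) {
        MINMAX_UNIT(ImageArray_p); MINMAX_UNIT(ImageArray_p);
        MINMAX_UNIT(ImageArray_p); MINMAX_UNIT(ImageArray_p);
        MINMAX_UNIT(ImageArray_p); MINMAX_UNIT(ImageArray_p);
        MINMAX_UNIT(ImageArray_p); MINMAX_UNIT(ImageArray_p);
        Blocks--;
        if ((0 == MinValue) && (0xff == MaxValue)) {
            Tail = 0;                  /* full range found: skip the tail too */
            break;
        }
    }
    while (Tail != 0) {
        MINMAX_UNIT(ImageArray_p);
        Tail--;
    }

    *MinValue_p = MinValue;            /* single write-back */
    *MaxValue_p = MaxValue;
}
```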


General C code optimization techniques (4)

Use of enumerated constants for switch() cases, so that the case values are contiguous and ordered from max down to 0 (dense case values allow the compiler to implement the switch as a jump table)

    typedef enum {
        PARSER_EXECUTE_COMMAND = 0, /**< State to execute command */
        PARSER_WAIT_PAYLOAD,        /**< State to receive payload */
        PARSER_WAIT_COMMAND,        /**< State to receive command */
        PARSER_IDLE,                /**< State for all initializations */
    } ParserState_e;

    static ParserState_e State = PARSER_IDLE;


General C code optimization techniques (5)

    switch (State) {
    case PARSER_IDLE:
        if (NULL == ReceiverContext_p)
            ReceiverContext_p = (ReceiverContext_t *)malloc(sizeof(ReceiverContext_t));
        assert(NULL != ReceiverContext_p);
        InitialiseReceiver(ReceiverContext_p, &State);
        State = PARSER_WAIT_COMMAND;
        /* fall through */
    case PARSER_WAIT_COMMAND:
        ReceiveCommand(ReceiverContext_p, &State, ReceivedChar);
        break;
    case PARSER_WAIT_PAYLOAD:
        ReceiveCommand(ReceiverContext_p, &State, ReceivedChar);
        ReceivePayload(ReceiverContext_p, &State, ReceivedChar);
        if (PARSER_WAIT_PAYLOAD == State)
            break;
        /* fall through: State has left PARSER_WAIT_PAYLOAD */
    case PARSER_EXECUTE_COMMAND:
        ExecuteCommand(ReceiverContext_p, &State);
        break;
    default:
        State = PARSER_IDLE;
        break;
    }


General C code optimization techniques (6)

Keep the number of parameters passed in a function call to 4 or fewer
◦ To allow the compiler to pass all parameters in registers
Keep the number of local variables to 7 or fewer
◦ To allow the compiler to use registers for local calculations
Use compiler invocation settings with the optimization level set to maximum
Optimization of the algorithm
◦ The most important part of the optimization process
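The four-parameter rule follows from the ARM Procedure Call Standard, which passes the first four word-sized arguments in registers r0 to r3 and any further ones on the stack. One way to stay within the limit, sketched here with a hypothetical `RangeParams_t` structure and `BuildRangeTable` function, is to pass related values through a single pointer and copy the fields into locals so the compiler can keep them in registers. Because the function fills a 256-entry table, the division (a software routine on the ARM7TDMI) runs 256 times rather than once per pixel.

```c
#include <stdint.h>

/* Hypothetical parameter block bundling four range limits. */
typedef struct {
    int ImgMin, ImgMax;   /* input range found by the min/max scan */
    int OutMin, OutMax;   /* requested output range */
} RangeParams_t;

/* Two arguments (pointer + table) fit in r0-r1; passing the four limits
 * separately plus the table would push the fifth argument onto the stack. */
void BuildRangeTable(const RangeParams_t *Params, uint8_t Table[256])
{
    /* copy the fields into locals once so they can stay in registers */
    const int ImgMin = Params->ImgMin;
    const int ImgRange = Params->ImgMax - Params->ImgMin;
    const int OutMin = Params->OutMin;
    const int OutRange = Params->OutMax - Params->OutMin;
    int v;

    for (v = 0; v < 256; v++) {
        int r = (ImgRange > 0) ? (v - ImgMin) * OutRange / ImgRange + OutMin
                               : OutMin;
        Table[v] = (uint8_t)(r < 0 ? 0 : (r > 255 ? 255 : r));  /* clamp to 0..255 */
    }
}
```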


Optimization of the algorithm

Use an open-minded approach when solving problems
◦ Read about the subject and what has already been done in the field
◦ Choose the algorithm that will potentially give the best results on the platform you use
◦ Try to find alternative solutions for all computational parts of the algorithm
  Avoid division in the case of ARM controllers (the ARM7TDMI has no hardware divide instruction, so each division becomes a library call)
  Use faster or less memory-demanding solutions than the ones offered by the generic libraries
◦ Find the bottlenecks and then optimize
  Do not optimize code that is executed only once; the gain will probably not match the effort
Specifics of the gamma correction algorithm we are implementing
◦ First we make a range correction
  Input values in the range from InMin to InMax are transferred to the range OutMin to OutMax
◦ We limit the range of values
  Input and output values are in the range between 0 and 255
◦ We saturate the signal
  There are no values bigger than 255
◦ We define the optimization target
  Our main concern is speed of execution
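Written as an equation, the range correction maps an input value $v$ from [InMin, InMax] to [OutMin, OutMax]; this presumably corresponds to the ExtRange/ImgRange step of the gamma correction flowchart, with ExtRange = OutMax − OutMin and ImgRange = InMax − InMin. The numbers in the second line are an arbitrary worked example; after the gamma step, saturation then caps the result at 255.

$$
v' = \frac{(v - \mathrm{InMin})\,(\mathrm{OutMax} - \mathrm{OutMin})}{\mathrm{InMax} - \mathrm{InMin}} + \mathrm{OutMin}
$$

$$
\mathrm{InMin} = 50,\ \mathrm{InMax} = 150,\ \mathrm{OutMin} = 0,\ \mathrm{OutMax} = 255,\ v = 100:
\qquad v' = \frac{50 \cdot 255}{100} + 0 = 127.5
$$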


Optimization of the algorithm (2)


Optimization of the algorithm (3)

Gamma correct (flowchart, as steps):
1. Find minimum/maximum
2. Index = ImageLength
3. While Index != 0:
   Index--;
   Value = ImageArray[Index];
   Value = (Value - ImgMin)*ExtRange/ImgRange + ExtMinValue;
   Value = powf(Value, Gamma);
   if (Value >= 254.5) Value = 255;
   ImageArray[Index] = Value;
4. Exit


Optimization of the algorithm (4)

Fast gamma correct (flowchart, as steps):
1. Find minimum/maximum
2. Create the corrected-range power lookup table
3. Index = ImageLength
4. While Index != 0:
   Index--;
   Value = ImageArray[Index];
   Value = LookupTable[Value];
   ImageArray[Index] = Value;
5. Exit


Optimization of the algorithm (5)

Find minimum/maximum (flowchart, as steps):
1. Check if all elements in the image are passed; if yes, exit
2. One of eight "Find min/max" units unrolled in this function:
   Value = *ImageArray_p;
   ImageArray_p++;
   if (Value > MaxValue) MaxValue = Value;
   if (Value < MinValue) MinValue = Value;
   ...
3. Check if MinValue = 0 and MaxValue = 255; if yes, exit, otherwise return to step 1




Gamma correction

Gamma correction
◦ General definition
  $s = c\,(r - \varepsilon)^{\gamma}$
◦ Simplified formula
  $s = r^{\gamma}$
Fast power calculation
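As a worked example, assume 8-bit values are normalised to 0..1 before applying the simplified formula and scaled back by 255 (an assumption; the slide states only $s = r^{\gamma}$). For $r = 128$ and $\gamma = 2.2$, rewriting the power as a power of two also shows the whole/fractional exponent split that the fast power calculation relies on:

$$
s = 255\left(\frac{128}{255}\right)^{2.2}
  = 255 \cdot 2^{\,2.2\,\log_2(128/255)}
  \approx 255 \cdot 2^{-2.1876}
  = 255 \cdot 2^{-3} \cdot 2^{0.8124}
  \approx 255 \cdot 0.125 \cdot 1.756
  \approx 55.98
$$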


Fast power calculation

Basic idea for fast power calculation
◦ Use of the IEEE-754 float representation format
◦ Bit manipulations
◦ Table lookup
IEEE-754 floating point number representation

    S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
    sign: bit 31, exponent: bits 30-23, mantissa: bits 22-0
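The bit layout above can be read out in C as sketched below. `FloatFields_t` and `SplitFloat` are hypothetical names; `memcpy` is used instead of a pointer cast so that reinterpreting the float's bits is well defined in standard C.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t Sign;      /* bit 31 */
    uint32_t Exponent;  /* bits 30..23, biased by 127 */
    uint32_t Mantissa;  /* bits 22..0, fraction without the implicit leading 1 */
} FloatFields_t;

FloatFields_t SplitFloat(float x)
{
    uint32_t bits;
    FloatFields_t f;

    memcpy(&bits, &x, sizeof bits);   /* copy the raw 32 bits */
    f.Sign = bits >> 31;
    f.Exponent = (bits >> 23) & 0xffu;
    f.Mantissa = bits & 0x7fffffu;
    return f;
}
```

For example, -6.0f is -1.5 · 2^2, so it splits into sign 1, exponent 129 and mantissa 0x400000 (the 0.5 fraction in the top mantissa bit).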


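Fast power calculation (2)

IEEE-754 floating point number representation
◦ $(\pm)\left(1 + \frac{\mathrm{mantissa}}{2^{23}}\right) \cdot 2^{\,\mathrm{exponent} - 127}$
◦ Number math
  $a^{b+c} = a^{b} \cdot a^{c}$
  $2^{\mathrm{WholeNumber} + \mathrm{DecimalPart}} = 2^{\mathrm{WholeNumber}} \cdot 2^{\mathrm{DecimalPart}}$
◦ $2^{\mathrm{DecimalPart}}$ is in the range $1 \le 2^{\mathrm{DecimalPart}} < 2$, since $0 \le \mathrm{DecimalPart} < 1$
  So $2^{\mathrm{WholeNumber}}$ can be written directly into the exponent field, and $2^{\mathrm{DecimalPart}}$, which has exactly the $1 + \mathrm{mantissa}/2^{23}$ form, can be taken from a mantissa table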


Fast power calculation (3)

Mantissa table precision
◦ Average error of 0.01% for an 11-bit table (8 kB, i.e. 2^11 entries of 4 bytes)
◦ With a 9-bit table we experience a shift of 1 in two values in the range 0 to 255


Results

Amount of data (compiler optimization)            | No C code optimizations | C code optimized | Ratio (execution time gain)
--------------------------------------------------|-------------------------|------------------|----------------------------
19200 bytes (forest), high compiler optimization  | 14.6 s                  | 0.083 s          | 176
19200 bytes (tire), high compiler optimization    | 14.67 s                 | 0.085 s          | 173
19200 bytes (forest), no compiler optimization    | 14.6 s                  | 0.1157 s         | 126
19200 bytes (tire), no compiler optimization      | 14.67 s                 | 0.1183 s         | 124


General discussion for improvement

Better results with a board with more RAM to keep both pictures
Optimize the Max/Min function with an additional check whether we have reached 255 or 0
Limit the gamma range from 0.3 to 3 and optimize further
Time under 40 ms
