
Optimisation and Search:
Gradient Descent and the Nelder-Mead Simplex Algorithm


Why minimise a function numerically?

[Figure: a black box computes f(a, b) = y from inputs a and b; the function itself is unknown.]




Why minimise a function numerically?
Background: linear regression

Straight line: f(x) = α_1 x + α_2

Error between f(x_i) given by the model and y_i from the data:

ε_i(α_1, α_2) = f(x_i) − y_i = α_1 x_i + α_2 − y_i

Task: Find the parameters α_1 and α_2 that minimise the sum of squared errors:

E(α_1, α_2) = ∑_{i=1}^{N} ε_i(α_1, α_2)² = ∑_{i=1}^{N} (α_1 x_i + α_2 − y_i)²
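To make the objective concrete, here is a minimal sketch of E in Python; the data arrays x and y are made-up sample points, not from the slides:

```python
import numpy as np

# Hypothetical sample data (x_i, y_i); any two arrays of equal length work.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

def E(alpha1, alpha2):
    """Sum of squared errors for the straight-line model f(x) = alpha1*x + alpha2."""
    residuals = alpha1 * x + alpha2 - y  # eps_i(alpha1, alpha2)
    return np.sum(residuals ** 2)

print(E(2.0, 1.0))  # error of the guess f(x) = 2x + 1
```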


Why minimise a function numerically?
Non-linear regression

• Linear regression:
▫ Fitting function is linear with respect to the parameters
→ can be solved analytically (see Wikipedia)
• Non-linear regression:
▫ Fitting function is non-linear with respect to the parameters (e.g. f(x, α_1, α_2) = sin(α_1 x) + cos(α_2 x))
→ often no analytical solution
→ numerical optimisation or direct search
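For the linear case, the analytical solution can be computed directly; a sketch using NumPy's least-squares solver on the same hypothetical data as above:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix for f(x) = alpha1*x + alpha2: one column of x, one of ones.
A = np.column_stack([x, np.ones_like(x)])

# Solve min ||A @ alpha - y||^2 analytically (normal equations under the hood).
(alpha1, alpha2), *_ = np.linalg.lstsq(A, y, rcond=None)
print(alpha1, alpha2)
```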


Gradient Descent: Example

[Figure: error surface E(α_1, α_2) plotted over the parameters α_1 and α_2.]


Gradient Descent

1. Choose initial parameters α_1 and α_2
2. Calculate the gradient of E at the current parameters
3. Step in the direction of the negative gradient, with a step size proportional to the magnitude of the gradient → you get new parameters α_1 and α_2
4. Check whether the parameters have changed by more than a certain threshold
5. If yes, go to 2; else terminate
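A minimal sketch of steps 1 to 5 for the least-squares objective E above; the step size eta and threshold tol are illustrative choices, and the gradient comes from differentiating E with respect to α_1 and α_2:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

def grad_E(alpha1, alpha2):
    """Gradient of E: partial derivatives with respect to alpha1 and alpha2."""
    r = alpha1 * x + alpha2 - y          # residuals eps_i
    return np.array([2 * np.sum(r * x),  # dE/dalpha1
                     2 * np.sum(r)])     # dE/dalpha2

def gradient_descent(alpha, eta=0.01, tol=1e-8, max_iter=10_000):
    """Steps 1-5: follow the negative gradient until the update is tiny."""
    for _ in range(max_iter):
        step = eta * grad_E(*alpha)      # step proportional to gradient magnitude
        alpha = alpha - step             # move against the gradient
        if np.linalg.norm(step) < tol:   # parameters barely changed: terminate
            break
    return alpha

print(gradient_descent(np.array([0.0, 0.0])))  # approaches the least-squares fit
```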


Gradient Descent: Example

[Figure sequence: successive gradient-descent iterations on the error surface E(α_1, α_2), stepping downhill towards the minimum.]


Nelder-Mead Simplex Algorithm
(for functions of 2 variables)

1. Pick 3 parameter combinations → triangle (the simplex)
2. Evaluate the function for those combinations → f_h, f_s, f_l: highest, second highest and lowest point
3. Update the triangle using the best of the transformations in the figure (update rules below)
4. Check for the end condition
5. Go to 2 or terminate


Nelder-Mead Algorithm: Update Rules

[Figure: the candidate transformations of the triangle: reflection, expansion, contraction and shrink.]

Reflect the worst point f_h through the centroid of the other two corners to get a reflected point with value f_r, then:

• f_l ≤ f_r < f_s: accept the reflected point
• f_r < f_l: reflection is the new best point → try an expansion
• f_s ≤ f_r < f_h: contract towards the reflected point
• f_r ≥ f_h: no improvement → contract inside (or shrink the triangle towards the best point)
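These update rules translate into a short routine. Below is a minimal sketch for functions of 2 variables, assuming the standard coefficients (1 for reflection, 2 for expansion, 0.5 for contraction and shrink); the test function E is a hypothetical quadratic bowl, not the regression objective from the slides:

```python
import numpy as np

def nelder_mead(f, simplex, tol=1e-8, max_iter=500):
    """A minimal Nelder-Mead sketch; simplex is a (3, 2) array of triangle corners."""
    simplex = np.asarray(simplex, dtype=float)
    for _ in range(max_iter):
        # Sort corners so that f_l <= f_s <= f_h (best, second, worst).
        simplex = simplex[np.argsort([f(p) for p in simplex])]
        f_l, f_s, f_h = (f(p) for p in simplex)
        if f_h - f_l < tol:                      # corners agree: terminate
            break
        c = simplex[:2].mean(axis=0)             # centroid of the two best corners
        x_r = c + (c - simplex[2])               # reflect the worst corner
        f_r = f(x_r)
        if f_r < f_l:                            # best so far: try expanding further
            x_e = c + 2 * (c - simplex[2])
            simplex[2] = x_e if f(x_e) < f_r else x_r
        elif f_r < f_s:                          # f_l <= f_r < f_s: accept reflection
            simplex[2] = x_r
        else:                                    # f_r >= f_s: contract
            x_c = c + 0.5 * (x_r - c) if f_r < f_h else c + 0.5 * (simplex[2] - c)
            if f(x_c) < min(f_r, f_h):
                simplex[2] = x_c
            else:                                # nothing helped: shrink towards best
                simplex[1:] = simplex[0] + 0.5 * (simplex[1:] - simplex[0])
    return simplex[0]

# Usage: minimise a quadratic bowl starting from an arbitrary triangle.
E = lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2
print(nelder_mead(E, [[0, 0], [1, 0], [0, 1]]))  # approaches (2, -1)
```

In practice a library routine such as scipy.optimize.minimize(E, x0, method='Nelder-Mead') does the same job with more careful termination tests.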


Nelder-Mead Algorithm: Example

[Figure sequence: the triangle reflecting, expanding and contracting its way down the error surface E(α_1, α_2) towards the minimum.]
