Gradient Descent and the Nelder-Mead Simplex Algorithm
Optimisation and Search:
Gradient Descent and the Nelder-Mead Simplex Algorithm
Why minimise a function numerically?

[Figure: a black box takes inputs a and b and produces an output y = f(a,b) — the function itself is unknown.]
Why minimise a function numerically?
Background: linear regression

Straight line: f(x) = α₁x + α₂

Error εᵢ between f(xᵢ) given by the model and yᵢ from the data:

    εᵢ(α₁, α₂) = f(xᵢ) − yᵢ = α₁xᵢ + α₂ − yᵢ

Task: find the parameters α₁ and α₂ that minimise the sum of squared errors:

    E(α₁, α₂) = Σᵢ₌₁ᴺ εᵢ(α₁, α₂)² = Σᵢ₌₁ᴺ (α₁xᵢ + α₂ − yᵢ)²
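As a concrete check of the definition above, here is a minimal sketch that evaluates E(α₁, α₂) for the straight-line model; the data points are made-up illustrative values:

```python
# Sum of squared errors E(a1, a2) = sum_i (a1*x_i + a2 - y_i)**2
# for the straight-line model f(x) = a1*x + a2.

def sum_squared_errors(a1, a2, xs, ys):
    """Evaluate E(a1, a2) over the data points (xs, ys)."""
    return sum((a1 * x + a2 - y) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                    # lies exactly on y = 2x + 1

print(sum_squared_errors(2.0, 1.0, xs, ys))  # perfect fit: E = 0
print(sum_squared_errors(1.0, 0.0, xs, ys))  # worse fit: larger E
```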
Why minimise a function numerically?
Non-linear regression

• Linear regression:
  ▫ The fitting function is linear with respect to the parameters
  → can be solved analytically (see Wikipedia)
• Non-linear regression:
  ▫ The fitting function is non-linear with respect to the parameters (e.g. f(x, α₁, α₂) = sin(α₁x) + cos(α₂x))
  → often no analytical solution
  → numerical optimisation or direct search
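The analytical solution mentioned above can be sketched for the straight-line case: setting the partial derivatives of E with respect to α₁ and α₂ to zero gives the normal equations, solved here in closed form (the data values are illustrative assumptions):

```python
# Closed-form least-squares fit of a straight line f(x) = a1*x + a2,
# obtained from dE/da1 = 0 and dE/da2 = 0 (the normal equations).

def fit_line(xs, ys):
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a2 = (sy - a1 * sx) / n                         # intercept
    return a1, a2

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]        # roughly y = 2x + 1 with noise
a1, a2 = fit_line(xs, ys)
print(a1, a2)
```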
Gradient Descent: Example

[Figure: surface plot of the error E(α₁, α₂) over the (α₁, α₂) plane.]
Gradient Descent
1. Choose initial parameters α₁ and α₂
2. Calculate the gradient of E at the current parameters
3. Step in the direction of the negative gradient (downhill), with a step size proportional to the magnitude of the gradient
   → you get new parameters α₁ and α₂
4. Check whether the parameters have changed by more than a certain threshold
5. If yes, go to 2; otherwise terminate
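The steps above can be sketched as follows for the linear-regression error E(α₁, α₂); the learning rate, threshold, and data points are illustrative assumptions, not values from the slides:

```python
# Gradient descent on E(a1, a2) = sum_i (a1*x_i + a2 - y_i)**2.

def gradient(a1, a2, xs, ys):
    """Partial derivatives of E with respect to a1 and a2."""
    g1 = sum(2 * (a1 * x + a2 - y) * x for x, y in zip(xs, ys))
    g2 = sum(2 * (a1 * x + a2 - y) for x, y in zip(xs, ys))
    return g1, g2

def gradient_descent(xs, ys, a1=0.0, a2=0.0, lr=0.01, tol=1e-9, max_iter=100_000):
    for _ in range(max_iter):
        g1, g2 = gradient(a1, a2, xs, ys)      # step 2: calculate the gradient
        new_a1 = a1 - lr * g1                  # step 3: step *against* the
        new_a2 = a2 - lr * g2                  #         gradient (downhill)
        # step 4/5: terminate once the parameters barely change
        if abs(new_a1 - a1) < tol and abs(new_a2 - a2) < tol:
            return new_a1, new_a2
        a1, a2 = new_a1, new_a2
    return a1, a2

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]        # exactly y = 2x + 1
a1, a2 = gradient_descent(xs, ys)
print(a1, a2)                    # converges close to (2, 1)
```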
Gradient Descent: Example

[Figure sequence: successive gradient-descent steps on the surface E(α₁, α₂), converging toward the minimum.]
Nelder-Mead Simplex Algorithm
(for functions of 2 variables)
1. Pick 3 parameter combinations → a triangle (the simplex)
2. Evaluate the function for those combinations
   → f_h, f_s, f_l: highest, second-highest and lowest point
3. Update the triangle using the best of the transformations in the figure
4. Check for the end condition
5. Go to 2 or terminate
Nelder-Mead Algorithm: Update Rules

Reflect the highest point through the centroid of the other two vertices to obtain the reflected value f_r, then:
• f_l ≤ f_r < f_s: accept the reflected point
• f_r < f_l: best value so far → try an expansion
• f_s ≤ f_r < f_h: contract towards the reflected point
• f_r ≥ f_h: no improvement → contract towards the highest point, or shrink the triangle towards the lowest point
Nelder-Mead Algorithm: Example

[Figure sequence: successive simplex updates on the surface E(α₁, α₂), the triangle reflecting and contracting toward the minimum.]