Parameter Estimation Methods in Physiological Modeling: An Introduction

Hien Tran
Center for Research in Scientific Computation
Department of Mathematics
NC State University
Overview

- Parameter Estimation: Concepts
- Sensitivity Identifiability
- Minimization (Nonlinear Least-squares Problems)
  - Gradient-free methods
  - Gradient-based methods
- Kalman Filter-based Method

GRAZ 2007
Parameter Estimation: Concepts

Scientists and modelers frequently wish to relate physical/biological parameters characterizing a model, \theta, to collected observations making up some data set, y. We will assume that the fundamental physics/biology is adequately understood, so that a function G may be specified:

G(\theta) = y

Computing G(\theta) might involve solving an ordinary differential equation or a partial differential equation.
Parameter Estimation: Concepts (cont’d)

Example: Drug concentration dynamics

\frac{dx(t)}{dt} = -p_1 x(t) + p_2 u(t),  x(0) = 0
y(t) = p_3 x(t)

x(t) - concentration of a drug
u(t) - test-input injection of a drug
y(t) - temporal measurement of the drug concentration
\theta = (p_1, p_2, p_3) - model parameters

For any known u(t), the explicit solution is given by

y(t) = p_3 p_2 \int_0^t e^{-p_1 (t-s)} u(s)\, ds
Parameter Estimation: Concepts (cont’d)

y(t) = p_3 p_2 \int_0^t e^{-p_1 (t-s)} u(s)\, ds

If the drug is introduced rapidly as a brief pulse of unit magnitude, that is, u(t) = \delta(t), then

y(t) = \underbrace{p_3 p_2 e^{-p_1 t}}_{G(\theta)}

The forward problem is: given \theta, find y. Our focus is on the inverse problem: given y, find \theta.

Inverse problems are hard!
Parameter Estimation: Concepts (cont’d)

Consider a model described by

\int_0^1 g(t,s)\, \theta(s)\, ds = y(t)

a Fredholm integral equation of the first kind. Even in the simplest case, g(t,s) \equiv 1, the system

\int_0^1 \theta(s)\, ds = y(t)

has no solution unless y(t) is a constant! Moreover, when a solution does exist, it is not unique!

Existence. There may be no model that exactly fits the data (the mathematical model is only an approximation, or the data contain noise).

Uniqueness. If exact solutions do exist, they may not be unique, even for an infinite number of exact data points.
Parameter Estimation: Concepts (cont’d)

Even if we do not encounter existence or uniqueness issues, instability is a fundamental feature of inverse problems. Consider the simpler system

A\theta = y

If A is nearly singular (det A \approx 0, or the condition number of the matrix is very large), a small change in the measurements leads to a large change in the model parameters. This issue lies in the mathematical model itself, not in the particular algorithm used to solve the problem. Inverse problems where this situation arises are referred to as ill-posed.

Issues in inverse problems: solution existence, solution uniqueness, and instability of the solution process.
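As a concrete sketch of this instability (with hypothetical numbers), a nearly singular 2x2 system shows how a tiny change in the data y is amplified into an O(1) change in the recovered parameters:

```python
# Instability sketch: nearly singular 2x2 system A*theta = y.
# A tiny (1e-4) perturbation of the data produces an O(1) change
# in theta. Matrix and data values are hypothetical.
A = [[1.0, 1.0],
     [1.0, 1.0001]]

def solve2(A, y):
    # Cramer's rule for a 2x2 linear system.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * y[0] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det]

theta = solve2(A, [2.0, 2.0001])        # exact data  -> theta ~ (1, 1)
theta_pert = solve2(A, [2.0, 2.0002])   # perturbed   -> theta ~ (0, 2)
shift = max(abs(u - v) for u, v in zip(theta, theta_pert))
```

Here a data change of 1e-4 moves the parameter estimate by about 1, regardless of which algorithm solves the system.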
Parameter Estimation: Concepts (cont’d)

Remarks:
For a model describing the relationship between model parameters and data as

G(\theta) = y

in practice, y may be a function of time and/or space, or may be a collection of discrete observations. An important issue is that actual observations always contain some amount of noise (from instrument readings, human errors, numerical round-off, etc.). We can thus envision data as generally consisting of noiseless observations from a “perfect” experiment, y_{true} = G(\theta_{true}), plus a noise component \eta:

y = y_{true} + \eta

Statistical methods for parameter estimation and inference (Prof. Banks)
Sensitivity Identifiability

References:
- J.G. Reid, Structural identifiability in linear time-invariant systems, IEEE Trans. AC 22: 242-246, 1977.
- M.S. Grewal and K. Glover, Identifiability of linear and nonlinear dynamical systems, IEEE Trans. AC 21: 833-837, 1976.
- J.J. DiStefano and C. Cobelli, On parameter and structural identifiability: Nonunique observability/reconstructibility for identifiable systems, other ambiguities and new definitions, IEEE Trans. AC 25: 830-833, 1980.
Sensitivity Identifiability (cont’d)

Consider the simple example of drug concentration dynamics given by

\frac{dx(t)}{dt} = -p_1 x(t) + p_2 u(t),  x(0) = 0
y(t) = p_3 x(t)

For u(t) = \delta(t),

y(t) = p_3 p_2 e^{-p_1 t}

It is clear that only p_1 and the product p_3 p_2 can be determined (and not p_2 or p_3 individually). We say that the model is unidentifiable!
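The unidentifiability can be checked numerically. A minimal sketch (with hypothetical parameter values): two different parameter vectors sharing the same product p_2 p_3 produce exactly the same output at every time point:

```python
import math

# Impulse response of the drug model: y(t) = p3 * p2 * exp(-p1 * t).
# Parameter values are hypothetical, chosen for illustration.
def impulse_response(p1, p2, p3, t):
    return p3 * p2 * math.exp(-p1 * t)

ts = [0.1 * k for k in range(50)]
y_a = [impulse_response(0.5, 2.0, 3.0, t) for t in ts]   # p2 * p3 = 6
y_b = [impulse_response(0.5, 6.0, 1.0, t) for t in ts]   # p2 * p3 = 6 too

# The two outputs agree everywhere, so data alone cannot separate
# p2 from p3: only p1 and the product p2*p3 are identifiable.
max_gap = max(abs(a - b) for a, b in zip(y_a, y_b))
```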
Sensitivity Identifiability (cont’d)

More generally, consider the system-experiment model (or simply, structure)

\frac{dx}{dt} = f(x(t,\theta), u(t), t; \theta),  x(t_0,\theta) = x_0,
y(t,\theta) = g(x(t,\theta); \theta)

x \in \mathbb{R}^n,  \theta \in \mathbb{R}^p,  y \in \mathbb{R}^m

A standard approach to estimating the unknown parameters \theta is in terms of the least-squares error criterion

J(\theta) = \int_{t_0}^{T} [y(t,\theta) - z(t)]^2 \, dt,

where z(t) is the data fitted by the model output y(t,\theta) by optimum choice of \theta.
Sensitivity Identifiability (cont’d)

Structural Identifiability
The given structure is said to be locally identifiable at \theta_0 if J(\theta) has a local minimum at \theta_0. If the minimum is global, the structure is said to be globally identifiable.
These concepts have also become known as (local and global) least-squares identifiability.

Output Distinguishability
This notion addresses the question of whether system outputs obtained with different parameter values can be quantitatively distinguished from each other (a local concept).
It has been shown that local output distinguishability is equivalent to local (structural) identifiability.
Sensitivity Identifiability (cont’d)

Sensitivity Identifiability
This notion is defined in terms of the output sensitivity functions with respect to the parameters, that is,

\frac{\partial y_i}{\partial \theta_j}(t, \theta_0)

(a local concept!)

Define the m \times p sensitivity function matrix

S(t,\theta) = \begin{pmatrix} \frac{\partial y_1}{\partial \theta_1} & \cdots & \frac{\partial y_1}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial \theta_1} & \cdots & \frac{\partial y_m}{\partial \theta_p} \end{pmatrix}
Sensitivity Identifiability (cont’d)

Sensitivity Identifiability (cont’d)
Now, let \Delta\theta = \theta - \theta_0 denote a small perturbation from \theta_0. This gives rise to a small perturbation in the output, \Delta y = y(t,\theta) - y(t,\theta_0). Then, by the chain rule for differentiation, we obtain the following (approximate) relationship

\Delta y = S \Delta\theta

A structure is then said to be sensitivity identifiable if the above equation can be solved uniquely for \Delta\theta. This is the case if and only if rank(S) = p, or equivalently, if and only if det(S^T S) \neq 0.

It is clear that (local) output distinguishability and sensitivity identifiability are equivalent concepts.

How do we compute the sensitivity function matrix S(t,\theta) = \left( \frac{\partial y_i(t,\theta)}{\partial \theta_j} \right)?
Sensitivity Identifiability (cont’d)

Remarks:
The name “sensitivity” in the sensitivity function matrix

S(t,\theta) = \left( \frac{\partial y_i(t,\theta)}{\partial \theta_j} \right)

is used because the elements of the matrix are precisely the sensitivity functions (to be introduced by Prof. Kappel).

Problem:
How do we compute the elements of the sensitivity function matrix S(t,\theta) = \left( \frac{\partial y_i(t,\theta)}{\partial \theta_j} \right)?
Sensitivity Identifiability (cont’d)

Finite differences:

\frac{dy_i}{d\theta_j}(t,\theta_0) \approx \frac{y_i(t,\theta_0 + h e_j) - y_i(t,\theta_0)}{h},

where

e_j = (0, 0, \ldots, 0, \underbrace{1}_{j\text{th}}, 0, \ldots, 0)^T

and the step h is chosen from the machine epsilon \varepsilon (typically h = \sqrt{\varepsilon}).
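The finite-difference scheme can be sketched on the drug model's impulse response (hypothetical parameter values and time grid). The columns of S for p_2 and p_3 come out proportional, exposing the rank deficiency behind the earlier unidentifiability:

```python
import math

# Finite-difference sensitivity matrix for y(t) = p3*p2*exp(-p1*t).
# theta0 and the time grid are hypothetical illustration values.
def y(theta, t):
    p1, p2, p3 = theta
    return p3 * p2 * math.exp(-p1 * t)

theta0 = [0.5, 2.0, 3.0]
ts = [0.2 * k for k in range(1, 21)]
h = math.sqrt(2.2e-16)  # sqrt(machine epsilon), a common step choice

# Forward differences: S[l][j] ~ dy(t_l)/dtheta_j at theta0.
S = [[(y([theta0[i] + (h if i == j else 0.0) for i in range(3)], t)
       - y(theta0, t)) / h
      for j in range(3)]
     for t in ts]

# Columns for p2 and p3 are proportional (only the product p2*p3
# enters the output), so rank(S) < p: not sensitivity identifiable
# in all three parameters.
c2 = [row[1] for row in S]
c3 = [row[2] for row in S]
cos = (sum(a * b for a, b in zip(c2, c3))
       / math.sqrt(sum(a * a for a in c2) * sum(b * b for b in c3)))
```

A cosine of essentially 1 between the two columns signals det(S^T S) \approx 0.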
Sensitivity Identifiability (cont’d)

Direct:
Using the chain rule for differentiation, for

\frac{dx}{dt} = f(x(t,\theta), t; \theta),  y(t,\theta) = g(x(t,\theta); \theta),

the sensitivities satisfy

\frac{d}{dt}\frac{dx}{d\theta} = \frac{\partial f}{\partial x}\frac{dx}{d\theta} + \frac{\partial f}{\partial \theta},
\frac{dy}{d\theta} = \frac{\partial g}{\partial x}\frac{dx}{d\theta} + \frac{\partial g}{\partial \theta},

with initial conditions x(t_0,\theta) = x_0 and \frac{dx}{d\theta}(t_0) = 0.

Remarks: The derivatives \frac{\partial g}{\partial x}, \frac{\partial g}{\partial \theta}, \frac{\partial f}{\partial x}, \frac{\partial f}{\partial \theta} can be computed by hand (tedious and error-prone) or by Automatic Differentiation (TOMLAB/MAD, http://tomopt.com/tomlab/products/mad/).
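The direct method can be sketched on the drug model after an impulse input, where dx/dt = -p_1 x with x(0) = p_2 and y = p_3 x. Differentiating the ODE in p_1 gives the sensitivity equation integrated alongside the state (values and the simple Euler step are hypothetical choices):

```python
import math

# Direct method sketch: integrate the state and the sensitivity
# s1 = dx/dp1 together. From dx/dt = -p1*x, differentiating in p1:
#   ds1/dt = -x - p1*s1,  s1(0) = 0.
# Parameter values, step size, and horizon are hypothetical.
p1, p2, p3 = 0.5, 2.0, 3.0
dt, T = 1e-4, 2.0

x, s1, t = p2, 0.0, 0.0        # x(0) = p2 after the unit impulse
while t < T - 0.5 * dt:
    # Explicit Euler, updating x and s1 simultaneously.
    x, s1 = x + dt * (-p1 * x), s1 + dt * (-x - p1 * s1)
    t += dt

dy_dp1 = p3 * s1                                   # via sensitivity ODE
dy_dp1_exact = -T * p3 * p2 * math.exp(-p1 * T)    # analytic check
```

The numerically integrated sensitivity agrees with the analytic derivative of y(T) = p_3 p_2 e^{-p_1 T} to Euler accuracy.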
Sensitivity Identifiability (cont’d)

Remarks: Relationship to the Fisher Information Matrix
For noisy data,

z(t_l) = y(t_l, \theta) + e(t_l)

z - augmented vector of measurements
e - noise
f(z \mid \theta) - conditional probability density function of z given \theta

The Fisher information matrix (Prof. Kappel’s lecture)

F(\theta) = E\left\{ \left[ \frac{\partial \log f(z \mid \theta)}{\partial \theta} \right] \left[ \frac{\partial \log f(z \mid \theta)}{\partial \theta} \right]^T \right\}

is a measure of the amount of information about the unknown parameters available in the noisy data. It is intimately related to the identifiability question in the broadest sense.

Now, if the noise in the data has zero mean, unit variance, and identical normal distribution at each t_l, and if the errors e(t_l) are uncorrelated, then

F = S^T S.
Sensitivity Identifiability (cont’d)

Remarks:
- Nonsingularity of the Fisher information matrix has been shown to be a necessary and sufficient condition for identifiability for a large class of problems (including stochastic models, with noise also in the dynamical system).
- The Fisher information matrix is also used in the computation of the generalized sensitivity function (Prof. Kappel).
- The Fisher information matrix is a very useful tool for optimizing the design variables in a parameter estimation experiment. In particular, it has been used to optimize sampling schedules and test-inputs in physiological studies.

References
- J.J. DiStefano III, Matching the model and the experiment to the goals: Data limitations, complexity and optimal experiment design for dynamic systems with biochemical signals, J. Cybern. Inf. Sci. 2: 2-4, 1979.
- F. Mori and J.J. DiStefano III, Optimal nonuniform sampling interval and test-input design for identification of physiological systems from very limited data, IEEE Trans. AC 24: 893-900, 1979.
Minimization (Nonlinear Least-squares Problem)

Reference:
C.T. Kelley, Iterative Methods for Optimization, SIAM, 1999.

Consider the structure

\frac{dx}{dt} = f(x(t,\theta), u(t), t; \theta),  x(t_0,\theta) = x_0,
y(t,\theta) = g(x(t,\theta); \theta)

x \in \mathbb{R}^n,  \theta \in \mathbb{R}^p,  y \in \mathbb{R}^m

In the least-squares formulation, the parameters \theta are found by minimizing

J(\theta) = \int_{t_0}^{T} [y(t,\theta) - z(t)]^2 \, dt.

Algorithms fall into two classes:
- Gradient-free methods (sampling methods)
- Gradient-based methods (Quasi-Newton, subset selection)
Minimization (cont’d)

Gradient-free methods (Nelder-Mead algorithm)
The algorithm uses the concept of a simplex, which is a polytope of p + 1 vertices in p dimensions (a line segment in one-parameter space, a triangle in two-parameter space, a tetrahedron in three-parameter space, etc.).

Consider a 2-parameter estimation, \theta = (p_1, p_2); the simplex is a triangle with vertices \theta_1, \theta_2, \theta_3.

Evaluate J(\theta_i) and sort so that

J(\theta_1) \le J(\theta_2) \le \cdots \le J(\theta_{p+1})   (\theta_{p+1} is the worst point)

Idea: Replace the worst point with a point with a lower cost value!
Minimization (cont’d)

Nelder-Mead algorithm (cont’d)
- Compute the centroid of the simplex (not including the worst point):

\bar{\theta} = \frac{1}{p} \sum_{l=1}^{p} \theta_l

- Replace \theta_{p+1} with

\theta_{new} = -\mu\, \theta_{p+1} + (1 + \mu)\, \bar{\theta},  \mu \in \{1, 2, \tfrac{1}{2}, -\tfrac{1}{2}\} = \{\mu_r, \mu_e, \mu_{oc}, \mu_{ic}\}

(reflection, expansion, outside contraction, inside contraction).
- If none of these values is better than the previous worst, the algorithm shrinks the simplex towards the best point:

\theta_i^{new} = \frac{\theta_i + \theta_1}{2}
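The steps above can be sketched in a minimal form (hypothetical quadratic cost and starting simplex; for brevity the outside-contraction case is folded into a single contraction toward the centroid):

```python
# Minimal Nelder-Mead sketch for a 2-parameter cost, with the standard
# coefficients: reflect 1, expand 2, contract 1/2, shrink 1/2.
# The cost J and the initial simplex are hypothetical.
def J(th):
    return (th[0] - 1.0) ** 2 + (th[1] - 2.0) ** 2

simplex = [[0.0, 0.0], [1.5, 0.0], [0.0, 1.5]]
for _ in range(200):
    simplex.sort(key=J)                   # best first, worst last
    best, worst = simplex[0], simplex[-1]
    centroid = [(simplex[0][i] + simplex[1][i]) / 2 for i in range(2)]
    reflect = [centroid[i] + (centroid[i] - worst[i]) for i in range(2)]
    if J(reflect) < J(best):              # very good: try expanding
        expand = [centroid[i] + 2 * (centroid[i] - worst[i]) for i in range(2)]
        simplex[-1] = expand if J(expand) < J(reflect) else reflect
    elif J(reflect) < J(worst):           # accept the reflection
        simplex[-1] = reflect
    else:                                 # contract toward the centroid
        contract = [centroid[i] + 0.5 * (worst[i] - centroid[i]) for i in range(2)]
        if J(contract) < J(worst):
            simplex[-1] = contract
        else:                             # shrink toward the best point
            simplex = [[best[i] + 0.5 * (v[i] - best[i]) for i in range(2)]
                       for v in simplex]

simplex.sort(key=J)
theta_hat = simplex[0]
```

On this convex quadratic the best vertex homes in on the minimizer (1, 2).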
Minimization (cont’d)

Nelder-Mead algorithm (cont’d)
Remarks:
- In theory, the Nelder-Mead algorithm is not guaranteed to converge to a minimum and can stagnate at a suboptimal point. However, in practice, the method performs well and often produces an initial rapid decrease of the objective function value.
- This suggests a hybrid approach: use the Nelder-Mead algorithm initially, then a gradient-based method (Gauss-Newton algorithm) to take advantage of the fast local convergence of the Gauss-Newton method.
- Other sampling methods: Implicit Filtering, DIRECT, genetic algorithms.
Minimization (cont’d)

Gradient-based Methods (Gauss-Newton Algorithm)
For discrete measurements,

J(\theta) = \sum_{i=1}^{k} [y(t_i,\theta) - z_i]^2

Define

Y(\theta) = \begin{pmatrix} y(t_1,\theta) - z_1 \\ \vdots \\ y(t_k,\theta) - z_k \end{pmatrix},  J(\theta) = Y^T(\theta)\, Y(\theta)

Start with some current parameter estimate \theta_c. A new estimate is computed as

\theta_n = \theta_c + \Delta\theta   (\Delta\theta is the correction)

Idea: Compute the correction term from a quadratic expansion of the cost functional J(\theta_{new}) = J(\theta_c + \Delta\theta).
Minimization (cont’d)

Gauss-Newton Algorithm (cont’d)

J(\theta_{new}) = J(\theta_c + \Delta\theta)
              = Y^T(\theta_c + \Delta\theta)\, Y(\theta_c + \Delta\theta)
              \approx \Big[ Y(\theta_c) + \frac{\partial Y}{\partial \theta}(\theta_c)\Delta\theta \Big]^T \Big[ Y(\theta_c) + \frac{\partial Y}{\partial \theta}(\theta_c)\Delta\theta \Big]
              = Y^T(\theta_c)Y(\theta_c) + 2\Delta\theta^T \frac{\partial Y}{\partial \theta}^T(\theta_c)\, Y(\theta_c) + \Delta\theta^T \frac{\partial Y}{\partial \theta}^T(\theta_c)\frac{\partial Y}{\partial \theta}(\theta_c)\, \Delta\theta

This is a quadratic function in \Delta\theta. The minimum is obtained by taking the derivative with respect to \Delta\theta and setting it equal to zero:

\underbrace{\frac{\partial Y}{\partial \theta}^T(\theta_c)\frac{\partial Y}{\partial \theta}(\theta_c)}_{S^T S}\, \Delta\theta = -\frac{\partial Y}{\partial \theta}^T(\theta_c)\, Y(\theta_c)
Minimization (cont’d)

Gauss-Newton Algorithm (cont’d)
Remarks:
- Clearly, the method fails if the matrix S^T S is (almost) singular. A well-known remedy is to modify the correction as

(S^T S + \lambda I)\, \Delta\theta = -\frac{\partial Y}{\partial \theta}^T(\theta_c)\, Y(\theta_c)

The positive parameter \lambda is adjusted based on how near singular the matrix S^T S is (Levenberg-Marquardt algorithm). It is chosen to balance between the steepest-descent step (very slow but certain convergence) and the Gauss-Newton step (fast but uncertain convergence).
- To achieve sufficient decrease in the cost functional, a damped update

\theta_n = \theta_c + s\, \Delta\theta

is used, with the step size s chosen by backtracking (Armijo’s rule) so that J(\theta_n) is sufficiently smaller than J(\theta_c).
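The Levenberg-Marquardt remedy can be sketched on a small zero-residual fit (hypothetical 2-parameter exponential model and synthetic data; the 2x2 system (S^T S + lam I) d = -S^T Y is solved by Cramer's rule):

```python
import math

# Levenberg-Marquardt sketch for fitting y = a*exp(-b*t) to synthetic
# noiseless data. lam is halved after a successful step (toward
# Gauss-Newton) and doubled after a failed one (toward steepest descent).
# True values, time grid, and initial guess are hypothetical.
a_true, b_true = 3.0, 0.7
ts = [0.25 * k for k in range(20)]
z = [a_true * math.exp(-b_true * t) for t in ts]

def residuals(a, b):
    return [a * math.exp(-b * t) - zi for t, zi in zip(ts, z)]

def cost(a, b):
    return sum(r * r for r in residuals(a, b))

a, b, lam = 1.0, 1.0, 1.0
for _ in range(100):
    Y = residuals(a, b)
    # Sensitivity rows: [dY/da, dY/db] at each time point.
    S = [[math.exp(-b * t), -a * t * math.exp(-b * t)] for t in ts]
    g11 = sum(r[0] * r[0] for r in S) + lam
    g12 = sum(r[0] * r[1] for r in S)
    g22 = sum(r[1] * r[1] for r in S) + lam
    r1 = -sum(si[0] * yi for si, yi in zip(S, Y))
    r2 = -sum(si[1] * yi for si, yi in zip(S, Y))
    det = g11 * g22 - g12 * g12
    da, db = (g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det
    if cost(a + da, b + db) < cost(a, b):
        a, b, lam = a + da, b + db, lam / 2   # accept, trust the model more
    else:
        lam *= 2                              # reject, damp the step more
```

Since the data are noiseless, the accepted iterates drive the cost toward zero and (a, b) toward the true values.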
Minimization (cont’d)

Subset Selection
Reference:
- M. Burth, G.C. Verghese, and M. Velez-Reyes, Subset selection for improved parameter estimation in on-line identification of a synchronous generator, IEEE Trans. on Power Systems 14: 218-225, 1999.

Idea:
- Use sensitivity information to partition the parameter set into two subsets: one associated with highly sensitive parameters and one associated with less sensitive parameters.
- When solving for parameters, only update those in the first subset (the highly sensitive parameter set). This way, the problem should be better conditioned!
Kalman-filter Based Method

References:
- R.E. Kalman, A new approach to linear filtering and prediction problems, Trans. of the ASME - Journal of Basic Engineering 82 (Series D): 35-45, 1960.
- F.L. Lewis, Optimal Estimation with an Introduction to Stochastic Control Theory, John Wiley & Sons, 1986.

Kalman Filter (A Hypothetical Example)
Suppose there are 3 contestants on The Price is Right show!

The first contestant’s estimate of the price is z_1, with standard deviation \sigma_{z_1}; f(x \mid z_1) denotes the conditional probability density of the price x given z_1.
Kalman-filter Based Method (cont’d)

The second contestant’s estimate is z_2, with standard deviation \sigma_{z_2} (conditional density f(x \mid z_2)).

At this point, there are two estimates available for predicting the correct price. Question: How do you (the third contestant) combine these data (so that you have a better estimate than either the first or the second contestant)?

If \sigma_{z_1} = \sigma_{z_2}, the best estimate should be the average of the two:

\mu = \frac{\sigma_{z_2}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}\, z_1 + \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}\, z_2,  \frac{1}{\sigma^2} = \frac{1}{\sigma_{z_1}^2} + \frac{1}{\sigma_{z_2}^2}

If \sigma_{z_1} > \sigma_{z_2} (i.e., z_2 is a better estimate), then the formula indicates that we should weight our estimate more toward z_2.
Kalman-filter Based Method (cont’d)

Rewrite the updates as:

\mu = z_1 + K[z_2 - z_1],  K = \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}
\sigma^2 = \sigma_{z_1}^2 - K \sigma_{z_1}^2

Now, suppose that z_1 is the output from your model and z_2 is the measurement. The Kalman filter is a technique that combines the model output with the measurement to arrive at a better estimate of the model output, by considering both the error in the model and the error in the data.
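With hypothetical numbers, the fusion formulas above work out as:

```python
# Fusing two estimates of the price; values are hypothetical.
z1, s1 = 100.0, 4.0     # first estimate, standard deviation 4
z2, s2 = 80.0, 2.0      # second, more certain estimate, std dev 2

K = s1 ** 2 / (s1 ** 2 + s2 ** 2)   # gain: weight toward the better estimate
mu = z1 + K * (z2 - z1)             # fused estimate
var = s1 ** 2 - K * s1 ** 2         # fused variance

# Cross-check against the precision-weighted form 1/var = 1/s1^2 + 1/s2^2.
precision = 1.0 / s1 ** 2 + 1.0 / s2 ** 2
```

Here K = 0.8, so the fused estimate lands much closer to the more certain z_2, and the fused variance (3.2) is smaller than either individual variance.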
Kalman-filter Based Method (cont’d)

Problem: How do we extend this to dynamical systems?
For a linear system,

\frac{dx}{dt} = A(\theta)x + g(t)w(t)
z_k = C(\theta)x(t_k) + v_k

where w(t) and v_k are white noise processes with zero means and covariances Q and R, respectively. First, we define the conditional expected value and the conditional covariance

\hat{x} = E[\, x(t) \mid (z_k : t_k < t)\, ]
P(t) = E[\, (x - \hat{x})(x - \hat{x})^T \mid (z_k : t_k < t)\, ]

The Kalman filter then estimates the true state with a “predictor-corrector” sort of implementation. Note that to also estimate the parameters \theta, we augment the state equations with d\theta/dt = 0.
Kalman-filter Based Method (cont’d)

Predictor (no measurement is used), for t_{k-1} < t < t_k:

\dot{\hat{x}} = A\hat{x}
\dot{P} = PA^T + AP + gQg^T

Corrector (new measurement is used):

\hat{x}_k = \hat{x}_k^- + K_k[z_k - C\hat{x}_k^-]
K_k = P^-(t_k)C^T [C P^-(t_k) C^T + R]^{-1}
P(t_k) = [I - K_k C] P^-(t_k)
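The predictor-corrector structure can be sketched in the simplest scalar, discrete-time setting, mirroring the augmented-parameter case d\theta/dt = 0 above: the state is an unknown constant observed through noisy measurements (noise levels and the true value are hypothetical):

```python
import random

# Scalar discrete-time predictor-corrector sketch for a constant state
# (dtheta/dt = 0). R is the measurement noise variance; Q is a small
# process noise variance. All numerical values are hypothetical.
random.seed(0)
theta_true = 5.0
R = 0.5 ** 2
Q = 1e-6

theta_hat, P = 0.0, 10.0        # initial guess and its variance
for _ in range(500):
    z = theta_true + random.gauss(0.0, 0.5)
    P = P + Q                   # predictor: state unchanged, P grows by Q
    K = P / (P + R)             # corrector: Kalman gain
    theta_hat = theta_hat + K * (z - theta_hat)
    P = (1.0 - K) * P
```

After many corrections the estimate settles near the true constant and the posterior variance P collapses toward its small steady-state value.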
For nonlinear systems, one idea is to linearize the nonlinearities (Extended Kalman Filter). Other ideas, such as Gaussian filters and the Unscented Kalman Filter, attempt to approximate the underlying conditional distribution by applying the nonlinear function to a set of points and recovering information on this distribution from the effect the function has on these points. In the Ensemble Kalman Filter, a Monte Carlo method is used to solve the time evolution equation of the probability density of the model state. Finally, neural networks are a viable approach for model design and parameter estimation (Dr. Spyros Courellis’ lecture).

Reference: G. Evensen, The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dyn. 53: 343-367, 2003.