Parameter Estimation Methods in Physiological Modeling: An Introduction

Hien Tran
Center for Research in Scientific Computation
Department of Mathematics
NC State University
Overview

- Parameter Estimation: Concepts
- Sensitivity Identifiability
- Minimization (Nonlinear Least-squares Problems)
  - Gradient-free methods
  - Gradient-based methods
- Kalman Filter-based Method

GRAZ 2007
Parameter Estimation: Concepts

Scientists and modelers frequently wish to relate physical/biological parameters characterizing a model, \theta, to collected observations making up some data set, y. We will assume that the fundamental physics/biology is adequately understood, so that a function G may be specified:

G(\theta) = y

Computing G(\theta) might involve solving an ordinary differential equation or a partial differential equation.
Parameter Estimation: Concepts (cont’d)

Example: Drug concentration dynamics

\frac{dx(t)}{dt} = -p_1 x(t) + p_2 u(t),  x(0) = 0
y(t) = p_3 x(t)

x(t) - concentration of a drug
u(t) - test-input injection of a drug
y(t) - temporal measurement of the drug concentration
\theta = (p_1, p_2, p_3) - model parameters

For any known u(t), the explicit solution is given by

y(t) = p_3 p_2 \int_0^t e^{-p_1 (t-s)} u(s)\, ds
Parameter Estimation: Concepts (cont’d)

y(t) = p_3 p_2 \int_0^t e^{-p_1 (t-s)} u(s)\, ds

If the drug is introduced rapidly as a brief pulse of unit magnitude, that is, u(t) = \delta(t), then

y(t) = \underbrace{p_3 p_2 e^{-p_1 t}}_{G(\theta)}

The forward problem is: given \theta, find y. Our focus is on the inverse problem: given y, find \theta.

Inverse problems are hard!
Parameter Estimation: Concepts (cont’d)

Consider a model described by

\int_0^1 g(t,s)\, \theta(s)\, ds = y(t)

a Fredholm integral equation of the first kind. Even in the simplest case, g(t,s) \equiv 1, the system

\int_0^1 \theta(s)\, ds = y(t)

has no solution unless y(t) is a constant! Moreover, when a solution does exist, it is not unique!

Existence. There may be no model that exactly fits the data (the mathematical model is only an approximation, or the data contain noise).

Uniqueness. If exact solutions do exist, they may not be unique, even for an infinite number of exact data points.
Parameter Estimation: Concepts (cont’d)

Even if we do not encounter existence or uniqueness issues, instability is a fundamental feature of inverse problems. Consider the simpler system

A\theta = y

If A is nearly singular (det A \approx 0, or the condition number of the matrix is very large), a small change in the measurements leads to a large change in the model parameters. This issue lies in the mathematical model itself, not in the particular algorithm used to solve the problem. Inverse problems where this situation arises are referred to as ill-posed.

Issues in inverse problems: solution existence, solution uniqueness, and instability of the solution process.
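As a concrete sketch of this instability (with hypothetical numbers), a nearly singular 2x2 system shows how a tiny change in the data y is amplified into an O(1) change in the recovered parameters:

```python
# Instability sketch: nearly singular 2x2 system A*theta = y.
# A tiny (1e-4) perturbation of the data produces an O(1) change
# in theta. Matrix and data values are hypothetical.
A = [[1.0, 1.0],
     [1.0, 1.0001]]

def solve2(A, y):
    # Cramer's rule for a 2x2 linear system.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * y[0] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det]

theta = solve2(A, [2.0, 2.0001])        # exact data  -> theta ~ (1, 1)
theta_pert = solve2(A, [2.0, 2.0002])   # perturbed   -> theta ~ (0, 2)
shift = max(abs(u - v) for u, v in zip(theta, theta_pert))
```

Here a data change of 1e-4 moves the parameter estimate by about 1, regardless of which algorithm solves the system.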
Parameter Estimation: Concepts (cont’d)

Remarks:
For a model describing the relationship between model parameters and data as

G(\theta) = y

in practice, y may be a function of time and/or space, or may be a collection of discrete observations. An important issue is that actual observations always contain some amount of noise (from instrument readings, human errors, numerical round-off, etc.). We can thus envision data as generally consisting of noiseless observations from a “perfect” experiment, y_{true} = G(\theta_{true}), plus a noise component \eta:

y = y_{true} + \eta

Statistical methods for parameter estimation and inference (Prof. Banks)
Sensitivity Identifiability

References:
- J.G. Reid, Structural identifiability in linear time-invariant systems, IEEE Trans. AC 22: 242-246, 1977.
- M.S. Grewal and K. Glover, Identifiability of linear and nonlinear dynamical systems, IEEE Trans. AC 21: 833-837, 1976.
- J.J. DiStefano and C. Cobelli, On parameter and structural identifiability: Nonunique observability/reconstructibility for identifiable systems, other ambiguities and new definitions, IEEE Trans. AC 25: 830-833, 1980.
Sensitivity Identifiability (cont’d)

Consider the simple example of drug concentration dynamics given by

\frac{dx(t)}{dt} = -p_1 x(t) + p_2 u(t),  x(0) = 0
y(t) = p_3 x(t)

For u(t) = \delta(t),

y(t) = p_3 p_2 e^{-p_1 t}

It is clear that only p_1 and the product p_3 p_2 can be determined (and not p_2 or p_3 individually). We say that the model is unidentifiable!
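The unidentifiability can be checked numerically. A minimal sketch (with hypothetical parameter values): two different parameter vectors sharing the same product p_2 p_3 produce exactly the same output at every time point:

```python
import math

# Impulse response of the drug model: y(t) = p3 * p2 * exp(-p1 * t).
# Parameter values are hypothetical, chosen for illustration.
def impulse_response(p1, p2, p3, t):
    return p3 * p2 * math.exp(-p1 * t)

ts = [0.1 * k for k in range(50)]
y_a = [impulse_response(0.5, 2.0, 3.0, t) for t in ts]   # p2 * p3 = 6
y_b = [impulse_response(0.5, 6.0, 1.0, t) for t in ts]   # p2 * p3 = 6 too

# The two outputs agree everywhere, so data alone cannot separate
# p2 from p3: only p1 and the product p2*p3 are identifiable.
max_gap = max(abs(a - b) for a, b in zip(y_a, y_b))
```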
Sensitivity Identifiability (cont’d)

More generally, consider the system-experiment model (or simply, structure)

\frac{dx}{dt} = f(x(t,\theta), u(t), t; \theta),  x(t_0,\theta) = x_0,
y(t,\theta) = g(x(t,\theta); \theta)

x \in \mathbb{R}^n,  \theta \in \mathbb{R}^p,  y \in \mathbb{R}^m

A standard approach to estimating the unknown parameters \theta is in terms of the least-squares error criterion

J(\theta) = \int_{t_0}^{T} [y(t,\theta) - z(t)]^2 \, dt,

where z(t) is the data fitted by the model output y(t,\theta) by optimum choice of \theta.
Sensitivity Identifiability (cont’d)

Structural Identifiability
The given structure is said to be locally identifiable at \theta_0 if J(\theta) has a local minimum at \theta_0. If the minimum is global, the structure is said to be globally identifiable.
These concepts have also become known as (local and global) least-squares identifiability.

Output Distinguishability
This notion addresses the question of whether system outputs obtained with different parameter values can be quantitatively distinguished from each other (a local concept).
It has been shown that local output distinguishability is equivalent to local (structural) identifiability.
Sensitivity Identifiability (cont’d)

Sensitivity Identifiability
This notion is defined in terms of the output sensitivity functions with respect to the parameters, that is,

\frac{\partial y_i}{\partial \theta_j}(t, \theta_0)

(a local concept!)

Define the m \times p sensitivity function matrix

S(t,\theta) = \begin{pmatrix} \frac{\partial y_1}{\partial \theta_1} & \cdots & \frac{\partial y_1}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial \theta_1} & \cdots & \frac{\partial y_m}{\partial \theta_p} \end{pmatrix}
Sensitivity Identifiability (cont’d)

Sensitivity Identifiability (cont’d)
Now, let \Delta\theta = \theta - \theta_0 denote a small perturbation from \theta_0. This gives rise to a small perturbation in the output, \Delta y = y(t,\theta) - y(t,\theta_0). Then, by the chain rule for differentiation, we obtain the following (approximate) relationship

\Delta y = S \Delta\theta

A structure is then said to be sensitivity identifiable if the above equation can be solved uniquely for \Delta\theta. This is the case if and only if rank(S) = p, or equivalently, if and only if det(S^T S) \neq 0.

It is clear that (local) output distinguishability and sensitivity identifiability are equivalent concepts.

How do we compute the sensitivity function matrix S(t,\theta) = \left( \frac{\partial y_i(t,\theta)}{\partial \theta_j} \right)?
Sensitivity Identifiability (cont’d)

Remarks:
The name “sensitivity” in the sensitivity function matrix

S(t,\theta) = \left( \frac{\partial y_i(t,\theta)}{\partial \theta_j} \right)

is used because the elements of the matrix are precisely the sensitivity functions (to be introduced by Prof. Kappel).

Problem:
How do we compute the elements of the sensitivity function matrix S(t,\theta) = \left( \frac{\partial y_i(t,\theta)}{\partial \theta_j} \right)?
Sensitivity Identifiability (cont’d)

Finite differences:

\frac{dy_i}{d\theta_j}(t,\theta_0) \approx \frac{y_i(t,\theta_0 + h e_j) - y_i(t,\theta_0)}{h},

where

e_j = (0, 0, \ldots, 0, \underbrace{1}_{j\text{th}}, 0, \ldots, 0)^T

and the step h is chosen from the machine epsilon \varepsilon (typically h = \sqrt{\varepsilon}).
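The finite-difference scheme can be sketched on the drug model's impulse response (hypothetical parameter values and time grid). The columns of S for p_2 and p_3 come out proportional, exposing the rank deficiency behind the earlier unidentifiability:

```python
import math

# Finite-difference sensitivity matrix for y(t) = p3*p2*exp(-p1*t).
# theta0 and the time grid are hypothetical illustration values.
def y(theta, t):
    p1, p2, p3 = theta
    return p3 * p2 * math.exp(-p1 * t)

theta0 = [0.5, 2.0, 3.0]
ts = [0.2 * k for k in range(1, 21)]
h = math.sqrt(2.2e-16)  # sqrt(machine epsilon), a common step choice

# Forward differences: S[l][j] ~ dy(t_l)/dtheta_j at theta0.
S = [[(y([theta0[i] + (h if i == j else 0.0) for i in range(3)], t)
       - y(theta0, t)) / h
      for j in range(3)]
     for t in ts]

# Columns for p2 and p3 are proportional (only the product p2*p3
# enters the output), so rank(S) < p: not sensitivity identifiable
# in all three parameters.
c2 = [row[1] for row in S]
c3 = [row[2] for row in S]
cos = (sum(a * b for a, b in zip(c2, c3))
       / math.sqrt(sum(a * a for a in c2) * sum(b * b for b in c3)))
```

A cosine of essentially 1 between the two columns signals det(S^T S) \approx 0.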
Sensitivity Identifiability (cont’d)

Direct:
Using the chain rule for differentiation, for

\frac{dx}{dt} = f(x(t,\theta), t; \theta),  y(t,\theta) = g(x(t,\theta); \theta),

the sensitivities satisfy

\frac{d}{dt}\frac{dx}{d\theta} = \frac{\partial f}{\partial x}\frac{dx}{d\theta} + \frac{\partial f}{\partial \theta},
\frac{dy}{d\theta} = \frac{\partial g}{\partial x}\frac{dx}{d\theta} + \frac{\partial g}{\partial \theta},

with initial conditions x(t_0,\theta) = x_0 and \frac{dx}{d\theta}(t_0) = 0.

Remarks: The derivatives \frac{\partial g}{\partial x}, \frac{\partial g}{\partial \theta}, \frac{\partial f}{\partial x}, \frac{\partial f}{\partial \theta} can be computed by hand (tedious and error-prone) or by Automatic Differentiation (TOMLAB/MAD, http://tomopt.com/tomlab/products/mad/).
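The direct method can be sketched on the drug model after an impulse input, where dx/dt = -p_1 x with x(0) = p_2 and y = p_3 x. Differentiating the ODE in p_1 gives the sensitivity equation integrated alongside the state (values and the simple Euler step are hypothetical choices):

```python
import math

# Direct method sketch: integrate the state and the sensitivity
# s1 = dx/dp1 together. From dx/dt = -p1*x, differentiating in p1:
#   ds1/dt = -x - p1*s1,  s1(0) = 0.
# Parameter values, step size, and horizon are hypothetical.
p1, p2, p3 = 0.5, 2.0, 3.0
dt, T = 1e-4, 2.0

x, s1, t = p2, 0.0, 0.0        # x(0) = p2 after the unit impulse
while t < T - 0.5 * dt:
    # Explicit Euler, updating x and s1 simultaneously.
    x, s1 = x + dt * (-p1 * x), s1 + dt * (-x - p1 * s1)
    t += dt

dy_dp1 = p3 * s1                                   # via sensitivity ODE
dy_dp1_exact = -T * p3 * p2 * math.exp(-p1 * T)    # analytic check
```

The numerically integrated sensitivity agrees with the analytic derivative of y(T) = p_3 p_2 e^{-p_1 T} to Euler accuracy.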
Sensitivity Identifiability (cont’d)

Remarks: Relationship to the Fisher Information Matrix
For noisy data,

z(t_l) = y(t_l, \theta) + e(t_l)

z - augmented vector of measurements
e - noise
f(z \mid \theta) - conditional probability density function of z given \theta

The Fisher information matrix (Prof. Kappel’s lecture)

F(\theta) = E\left\{ \left[ \frac{\partial \log f(z \mid \theta)}{\partial \theta} \right] \left[ \frac{\partial \log f(z \mid \theta)}{\partial \theta} \right]^T \right\}

is a measure of the amount of information about the unknown parameters available in the noisy data. It is intimately related to the identifiability question in the broadest sense.

Now, if the noise in the data has zero mean, unit variance, and identical normal distribution at each t_l, and if the errors e(t_l) are uncorrelated, then

F = S^T S.
Sensitivity Identifiability (cont’d)

Remarks:
- Nonsingularity of the Fisher information matrix has been shown to be a necessary and sufficient condition for identifiability for a large class of problems (including stochastic models, with noise also in the dynamical system).
- The Fisher information matrix is also used in the computation of the generalized sensitivity function (Prof. Kappel).
- The Fisher information matrix is a very useful tool for optimizing the design variables in a parameter estimation experiment. In particular, it has been used to optimize sampling schedules and test-inputs in physiological studies.

References
- J.J. DiStefano III, Matching the model and the experiment to the goals: Data limitations, complexity and optimal experiment design for dynamic systems with biochemical signals, J. Cybern. Inf. Sci. 2: 2-4, 1979.
- F. Mori and J.J. DiStefano III, Optimal nonuniform sampling interval and test-input design for identification of physiological systems from very limited data, IEEE Trans. AC 24: 893-900, 1979.
Minimization (Nonlinear Least-squares Problem)

Reference:
C.T. Kelley, Iterative Methods for Optimization, SIAM, 1999.

Consider the structure

\frac{dx}{dt} = f(x(t,\theta), u(t), t; \theta),  x(t_0,\theta) = x_0,
y(t,\theta) = g(x(t,\theta); \theta)

x \in \mathbb{R}^n,  \theta \in \mathbb{R}^p,  y \in \mathbb{R}^m

In the least-squares formulation, the parameters \theta are found by minimizing

J(\theta) = \int_{t_0}^{T} [y(t,\theta) - z(t)]^2 \, dt.

Algorithms fall into two classes:
- Gradient-free methods (sampling methods)
- Gradient-based methods (Quasi-Newton, subset selection)
Minimization (cont’d)

Gradient-free methods (Nelder-Mead algorithm)
The algorithm uses the concept of a simplex, which is a polytope of p + 1 vertices in p dimensions (a line segment in one-parameter space, a triangle in two-parameter space, a tetrahedron in three-parameter space, etc.).

Consider a 2-parameter estimation, \theta = (p_1, p_2); the simplex is a triangle with vertices \theta_1, \theta_2, \theta_3.

Evaluate J(\theta_i) and sort so that

J(\theta_1) \le J(\theta_2) \le \cdots \le J(\theta_{p+1})   (\theta_{p+1} is the worst point)

Idea: Replace the worst point with a point with a lower cost value!
Minimization (cont’d)

Nelder-Mead algorithm (cont’d)
- Compute the centroid of the simplex (not including the worst point):

\bar{\theta} = \frac{1}{p} \sum_{l=1}^{p} \theta_l

- Replace \theta_{p+1} with

\theta_{new} = -\mu\, \theta_{p+1} + (1 + \mu)\, \bar{\theta},  \mu \in \{1, 2, \tfrac{1}{2}, -\tfrac{1}{2}\} = \{\mu_r, \mu_e, \mu_{oc}, \mu_{ic}\}

(reflection, expansion, outside contraction, inside contraction).
- If none of these values is better than the previous worst, the algorithm shrinks the simplex towards the best point:

\theta_i^{new} = \frac{\theta_i + \theta_1}{2}
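The steps above can be sketched in a minimal form (hypothetical quadratic cost and starting simplex; for brevity the outside-contraction case is folded into a single contraction toward the centroid):

```python
# Minimal Nelder-Mead sketch for a 2-parameter cost, with the standard
# coefficients: reflect 1, expand 2, contract 1/2, shrink 1/2.
# The cost J and the initial simplex are hypothetical.
def J(th):
    return (th[0] - 1.0) ** 2 + (th[1] - 2.0) ** 2

simplex = [[0.0, 0.0], [1.5, 0.0], [0.0, 1.5]]
for _ in range(200):
    simplex.sort(key=J)                   # best first, worst last
    best, worst = simplex[0], simplex[-1]
    centroid = [(simplex[0][i] + simplex[1][i]) / 2 for i in range(2)]
    reflect = [centroid[i] + (centroid[i] - worst[i]) for i in range(2)]
    if J(reflect) < J(best):              # very good: try expanding
        expand = [centroid[i] + 2 * (centroid[i] - worst[i]) for i in range(2)]
        simplex[-1] = expand if J(expand) < J(reflect) else reflect
    elif J(reflect) < J(worst):           # accept the reflection
        simplex[-1] = reflect
    else:                                 # contract toward the centroid
        contract = [centroid[i] + 0.5 * (worst[i] - centroid[i]) for i in range(2)]
        if J(contract) < J(worst):
            simplex[-1] = contract
        else:                             # shrink toward the best point
            simplex = [[best[i] + 0.5 * (v[i] - best[i]) for i in range(2)]
                       for v in simplex]

simplex.sort(key=J)
theta_hat = simplex[0]
```

On this convex quadratic the best vertex homes in on the minimizer (1, 2).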
Minimization (cont’d)

Nelder-Mead algorithm (cont’d)
Remarks:
- In theory, the Nelder-Mead algorithm is not guaranteed to converge to a minimum and can stagnate at a suboptimal point. However, in practice, the method performs well and often produces an initial rapid decrease of the objective function value.
- This suggests a hybrid approach: use the Nelder-Mead algorithm initially, then a gradient-based method (Gauss-Newton algorithm) to take advantage of the fast local convergence of the Gauss-Newton method.
- Other sampling methods: Implicit Filtering, DIRECT, genetic algorithms.
Minimization (cont’d)

Gradient-based Methods (Gauss-Newton Algorithm)
For discrete measurements,

J(\theta) = \sum_{i=1}^{k} [y(t_i,\theta) - z_i]^2

Define

Y(\theta) = \begin{pmatrix} y(t_1,\theta) - z_1 \\ \vdots \\ y(t_k,\theta) - z_k \end{pmatrix},  J(\theta) = Y^T(\theta)\, Y(\theta)

Start with some current parameter estimate \theta_c. A new estimate is computed as

\theta_n = \theta_c + \Delta\theta   (\Delta\theta is the correction)

Idea: Compute the correction term from a quadratic expansion of the cost functional J(\theta_{new}) = J(\theta_c + \Delta\theta).
Minimization (cont’d)

Gauss-Newton Algorithm (cont’d)

J(\theta_{new}) = J(\theta_c + \Delta\theta)
              = Y^T(\theta_c + \Delta\theta)\, Y(\theta_c + \Delta\theta)
              \approx \Big[ Y(\theta_c) + \frac{\partial Y}{\partial \theta}(\theta_c)\Delta\theta \Big]^T \Big[ Y(\theta_c) + \frac{\partial Y}{\partial \theta}(\theta_c)\Delta\theta \Big]
              = Y^T(\theta_c)Y(\theta_c) + 2\Delta\theta^T \frac{\partial Y}{\partial \theta}^T(\theta_c)\, Y(\theta_c) + \Delta\theta^T \frac{\partial Y}{\partial \theta}^T(\theta_c)\frac{\partial Y}{\partial \theta}(\theta_c)\, \Delta\theta

This is a quadratic function in \Delta\theta. The minimum is obtained by taking the derivative with respect to \Delta\theta and setting it equal to zero:

\underbrace{\frac{\partial Y}{\partial \theta}^T(\theta_c)\frac{\partial Y}{\partial \theta}(\theta_c)}_{S^T S}\, \Delta\theta = -\frac{\partial Y}{\partial \theta}^T(\theta_c)\, Y(\theta_c)
Minimization (cont’d)

Gauss-Newton Algorithm (cont’d)
Remarks:
- Clearly, the method fails if the matrix S^T S is (almost) singular. A well-known remedy is to modify the correction as

(S^T S + \lambda I)\, \Delta\theta = -\frac{\partial Y}{\partial \theta}^T(\theta_c)\, Y(\theta_c)

The positive parameter \lambda is adjusted based on how near singular the matrix S^T S is (Levenberg-Marquardt algorithm). It is chosen to balance between the steepest-descent step (very slow but certain convergence) and the Gauss-Newton step (fast but uncertain convergence).
- To achieve sufficient decrease in the cost functional, a damped update

\theta_n = \theta_c + s\, \Delta\theta

is used, with the step size s chosen by backtracking (Armijo’s rule) so that J(\theta_n) is sufficiently smaller than J(\theta_c).
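The Levenberg-Marquardt remedy can be sketched on a small zero-residual fit (hypothetical 2-parameter exponential model and synthetic data; the 2x2 system (S^T S + lam I) d = -S^T Y is solved by Cramer's rule):

```python
import math

# Levenberg-Marquardt sketch for fitting y = a*exp(-b*t) to synthetic
# noiseless data. lam is halved after a successful step (toward
# Gauss-Newton) and doubled after a failed one (toward steepest descent).
# True values, time grid, and initial guess are hypothetical.
a_true, b_true = 3.0, 0.7
ts = [0.25 * k for k in range(20)]
z = [a_true * math.exp(-b_true * t) for t in ts]

def residuals(a, b):
    return [a * math.exp(-b * t) - zi for t, zi in zip(ts, z)]

def cost(a, b):
    return sum(r * r for r in residuals(a, b))

a, b, lam = 1.0, 1.0, 1.0
for _ in range(100):
    Y = residuals(a, b)
    # Sensitivity rows: [dY/da, dY/db] at each time point.
    S = [[math.exp(-b * t), -a * t * math.exp(-b * t)] for t in ts]
    g11 = sum(r[0] * r[0] for r in S) + lam
    g12 = sum(r[0] * r[1] for r in S)
    g22 = sum(r[1] * r[1] for r in S) + lam
    r1 = -sum(si[0] * yi for si, yi in zip(S, Y))
    r2 = -sum(si[1] * yi for si, yi in zip(S, Y))
    det = g11 * g22 - g12 * g12
    da, db = (g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det
    if cost(a + da, b + db) < cost(a, b):
        a, b, lam = a + da, b + db, lam / 2   # accept, trust the model more
    else:
        lam *= 2                              # reject, damp the step more
```

Since the data are noiseless, the accepted iterates drive the cost toward zero and (a, b) toward the true values.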
Minimization (cont’d)

Subset Selection
Reference:
- M. Burth, G.C. Verghese, and M. Velez-Reyes, Subset selection for improved parameter estimation in on-line identification of a synchronous generator, IEEE Trans. on Power Systems 14: 218-225, 1999.

Idea:
- Use sensitivity information to partition the parameter set into two subsets: one associated with highly sensitive parameters and one associated with less sensitive parameters.
- When solving for parameters, only update those in the first subset (the highly sensitive parameter set). This way, the problem should be better conditioned!
Kalman-filter Based Method

References:
- R.E. Kalman, A new approach to linear filtering and prediction problems, Trans. of the ASME - Journal of Basic Engineering 82 (Series D): 35-45, 1960.
- F.L. Lewis, Optimal Estimation with an Introduction to Stochastic Control Theory, John Wiley & Sons, 1986.

Kalman Filter (A Hypothetical Example)
Suppose there are 3 contestants on The Price is Right show!

The first contestant’s estimate of the price is z_1, with standard deviation \sigma_{z_1}; f(x \mid z_1) denotes the conditional probability density of the price x given z_1.
Kalman-filter Based Method (cont’d)

The second contestant’s estimate is z_2, with standard deviation \sigma_{z_2} (conditional density f(x \mid z_2)).

At this point, there are two estimates available for predicting the correct price. Question: How do you (the third contestant) combine these data (so that you have a better estimate than either the first or the second contestant)?

If \sigma_{z_1} = \sigma_{z_2}, the best estimate should be the average of the two:

\mu = \frac{\sigma_{z_2}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}\, z_1 + \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}\, z_2,  \frac{1}{\sigma^2} = \frac{1}{\sigma_{z_1}^2} + \frac{1}{\sigma_{z_2}^2}

If \sigma_{z_1} > \sigma_{z_2} (i.e., z_2 is a better estimate), then the formula indicates that we should weight our estimate more toward z_2.
Kalman-filter Based Method (cont’d)

Rewrite the updates as:

\mu = z_1 + K[z_2 - z_1],  K = \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}
\sigma^2 = \sigma_{z_1}^2 - K \sigma_{z_1}^2

Now, suppose that z_1 is the output from your model and z_2 is the measurement. The Kalman filter is a technique that combines the model output with the measurement to arrive at a better estimate of the model output, by considering both the error in the model and the error in the data.
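With hypothetical numbers, the fusion formulas above work out as:

```python
# Fusing two estimates of the price; values are hypothetical.
z1, s1 = 100.0, 4.0     # first estimate, standard deviation 4
z2, s2 = 80.0, 2.0      # second, more certain estimate, std dev 2

K = s1 ** 2 / (s1 ** 2 + s2 ** 2)   # gain: weight toward the better estimate
mu = z1 + K * (z2 - z1)             # fused estimate
var = s1 ** 2 - K * s1 ** 2         # fused variance

# Cross-check against the precision-weighted form 1/var = 1/s1^2 + 1/s2^2.
precision = 1.0 / s1 ** 2 + 1.0 / s2 ** 2
```

Here K = 0.8, so the fused estimate lands much closer to the more certain z_2, and the fused variance (3.2) is smaller than either individual variance.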
Kalman-filter Based Method (cont’d)

Problem: How do we extend this to dynamical systems?
For a linear system,

\frac{dx}{dt} = A(\theta)x + g(t)w(t)
z_k = C(\theta)x(t_k) + v_k

where w(t) and v_k are white noise processes with zero means and covariances Q and R, respectively. First, we define the conditional expected value and the conditional covariance

\hat{x} = E[\, x(t) \mid (z_k : t_k < t)\, ]
P(t) = E[\, (x - \hat{x})(x - \hat{x})^T \mid (z_k : t_k < t)\, ]

The Kalman filter then estimates the true state with a “predictor-corrector” sort of implementation. Note that to also estimate the parameters \theta, we augment the state equations with d\theta/dt = 0.
Kalman-filter Based Method (cont’d)

Predictor (no measurement is used), for t_{k-1} < t < t_k:

\dot{\hat{x}} = A\hat{x}
\dot{P} = PA^T + AP + gQg^T

Corrector (new measurement is used):

\hat{x}_k = \hat{x}_k^- + K_k[z_k - C\hat{x}_k^-]
K_k = P^-(t_k)C^T [C P^-(t_k) C^T + R]^{-1}
P(t_k) = [I - K_k C] P^-(t_k)
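The predictor-corrector structure can be sketched in the simplest scalar, discrete-time setting, mirroring the augmented-parameter case d\theta/dt = 0 above: the state is an unknown constant observed through noisy measurements (noise levels and the true value are hypothetical):

```python
import random

# Scalar discrete-time predictor-corrector sketch for a constant state
# (dtheta/dt = 0). R is the measurement noise variance; Q is a small
# process noise variance. All numerical values are hypothetical.
random.seed(0)
theta_true = 5.0
R = 0.5 ** 2
Q = 1e-6

theta_hat, P = 0.0, 10.0        # initial guess and its variance
for _ in range(500):
    z = theta_true + random.gauss(0.0, 0.5)
    P = P + Q                   # predictor: state unchanged, P grows by Q
    K = P / (P + R)             # corrector: Kalman gain
    theta_hat = theta_hat + K * (z - theta_hat)
    P = (1.0 - K) * P
```

After many corrections the estimate settles near the true constant and the posterior variance P collapses toward its small steady-state value.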
For nonlinear systems, one idea is to linearize the nonlinearities (Extended Kalman Filter). Other ideas, such as Gaussian filters and the Unscented Kalman Filter, attempt to approximate the underlying conditional distribution by applying the nonlinear function to a set of points and recovering information on this distribution from the effect the function has on these points. In the Ensemble Kalman Filter, a Monte Carlo method is used to solve the time evolution equation of the probability density of the model state. Finally, neural networks are a viable approach for model design and parameter estimation (Dr. Spyros Courellis’ lecture).

Reference: G. Evensen, The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dyn. 53: 343-367, 2003.