
TRACKING REGION OF INTEREST 

IN MEDICAL SURGERY VIDEO 

Satarupa Mukherjee 

M Tech (Computer Science), Year – II 

Under the guidance of 

Professor Dipti Prasad Mukherjee 

ECSU 

Indian Statistical Institute 

Kolkata

OBJECTIVE

The objective of the project is to track a surgical tool in medical surgery video.

Indian Statistical Institute, August 2008

METHODOLOGY

SHAPES

• How to represent shapes more efficiently

$Q$ = set of n points $(x_i, y_i)$ that more or less represent the object contour

$$Q = (x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n)^T$$

$X$ = affine shape vector (6-dimensional)

• Affine Shape Transformation

$Q \rightarrow X$

Affine Shape Transformation

$$Q = W X + Q_0$$

• Master Template $Q_0$: Set of n points, $Q_0 = (Q_0^x, Q_0^y)^T$

$$W = \begin{pmatrix} \mathbf{1} & \mathbf{0} & Q_0^x & \mathbf{0} & \mathbf{0} & Q_0^y \\ \mathbf{0} & \mathbf{1} & \mathbf{0} & Q_0^y & Q_0^x & \mathbf{0} \end{pmatrix}$$

Inverse Affine Shape Transformation

$$X = W^{+} (Q - Q_0)$$

Here $W^{+}$ is the pseudo-inverse of $W$.

Example: $X = (0, 10, 0, 0, 0, 0)^T$ (the master template translated by 10 units along y)
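The two transformations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's code: the function names and the 4-point square template are hypothetical, and the Moore-Penrose pseudo-inverse from `np.linalg.pinv` is assumed for $W^{+}$.

```python
import numpy as np

def shape_matrix(q0):
    """Build the 2n x 6 matrix W from a template q0 = (x_1..x_n, y_1..y_n)."""
    n = q0.size // 2
    q0x, q0y = q0[:n], q0[n:]
    one, zero = np.ones(n), np.zeros(n)
    top = np.column_stack([one, zero, q0x, zero, zero, q0y])
    bottom = np.column_stack([zero, one, zero, q0y, q0x, zero])
    return np.vstack([top, bottom])

def to_contour(x, q0, w):
    """Affine shape transformation Q = W X + Q0."""
    return w @ x + q0

def to_shape_vector(q, q0, w):
    """Inverse transformation X = W^+ (Q - Q0)."""
    return np.linalg.pinv(w) @ (q - q0)

# Hypothetical template: the four corners of a unit square (not the tool contour).
q0 = np.array([0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0])
w = shape_matrix(q0)
x = np.array([0.0, 10.0, 0.0, 0.0, 0.0, 0.0])   # the example shape vector
q = to_contour(x, q0, w)                         # template shifted by 10 in y
x_back = to_shape_vector(q, q0, w)               # recovers x
```

With a real 32-point contour the same functions apply unchanged; only `q0` grows to 64 entries.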

MOTION MODEL

We shall train our motion model from the first 20 frames.

• Motion Model: Autoregressive Process

$$X_K - \bar{X} = A_2 (X_{K-2} - \bar{X}) + A_1 (X_{K-1} - \bar{X})$$

$X_K$ = affine shape vector in K-th frame

$\bar{X}$ = mean shape vector

$A_1, A_2$ = constant coefficients

• Learning: Estimation of unknown parameters $\bar{X}, A_2, A_1$

Random Motion Exhibited in Video

$$X_K - \bar{X} = \underbrace{A_2 (X_{K-2} - \bar{X}) + A_1 (X_{K-1} - \bar{X})}_{\text{Deterministic Part}} + \underbrace{B_0 w_K}_{\text{Stochastic Part}}$$

• $w_K \sim N(0, I)$

• Noise is added to model the non-uniformity of the tool motion

The motion model looks for the tool in a “cloud” of probable locations.

Non-Uniform Motion Exhibited in Video

$$X_K - \bar{X} = A_2 (X_{K-2} - \bar{X}) + A_1 (X_{K-1} - \bar{X}) + B_0 w_K$$

• What are the unknowns? $\bar{X}, A_2, A_1, B_0$

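Learning Algorithm For Motion Model

Input: Shape Vectors $\{X_1, \dots, X_M\}$ from the first M (= 20) frames

Output: Parameters $\bar{X}, A_2, A_1, B_0$

– Step 1:

• Autocorrelation coefficients $R_i$, $R_{ij}$ and $R'_{ij}$, i, j = 0, 1, 2 are computed:

$$R_i = \sum_{k=3}^{M} X_{k-i}, \qquad R_{ij} = \sum_{k=3}^{M} X_{k-i} X_{k-j}^T, \qquad R'_{ij} = R_{ij} - \frac{1}{M-2} R_i R_j^T$$

• Estimates $\hat{A}_2$, $\hat{A}_1$ and $\hat{D}$:

$$\hat{A}_2 = \left( R'_{02} - R'_{01} {R'_{11}}^{-1} R'_{12} \right) \left( R'_{22} - R'_{21} {R'_{11}}^{-1} R'_{12} \right)^{-1}$$

$$\hat{A}_1 = \left( R'_{01} - \hat{A}_2 R'_{21} \right) {R'_{11}}^{-1}$$

$$\hat{D} = \frac{1}{M-2} \left( R_0 - \hat{A}_2 R_2 - \hat{A}_1 R_1 \right)$$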

– Step 2:

• Mean $\bar{X}$:

$$\hat{\bar{X}} = \left( I - \hat{A}_2 - \hat{A}_1 \right)^{-1} \hat{D}$$

• $B_0$ is estimated as a matrix square root of the covariance matrix $C$ (so that $C = B_0 B_0^T$), where $C$ is:

$$C = \frac{1}{M-2} \left( R_{00} - \hat{A}_2 R_{20} - \hat{A}_1 R_{10} - \hat{D} R_0^T \right)$$
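The two learning steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the project's code: the function name `learn_ar2` is hypothetical, the matrix square root is taken through a symmetric eigen-decomposition (one of several valid square roots), and the demo data are synthetic 1-D values rather than shape vectors from the surgery video.

```python
import numpy as np

def learn_ar2(xs):
    """Estimate (x_mean, a2, a1, b0) from shape vectors xs of shape (M, d)."""
    xs = np.asarray(xs, dtype=float)
    m, d = xs.shape
    # lagged[i] stacks X_{k-i} for k = 3..M (1-based frame index), i = 0, 1, 2
    lagged = [xs[2 - i : m - i] for i in range(3)]
    r = [lag.sum(axis=0) for lag in lagged]                              # R_i
    rr = [[lagged[i].T @ lagged[j] for j in range(3)] for i in range(3)]  # R_ij
    rp = [[rr[i][j] - np.outer(r[i], r[j]) / (m - 2) for j in range(3)]
          for i in range(3)]                                             # R'_ij
    inv11 = np.linalg.inv(rp[1][1])
    a2 = (rp[0][2] - rp[0][1] @ inv11 @ rp[1][2]) @ np.linalg.inv(
        rp[2][2] - rp[2][1] @ inv11 @ rp[1][2])
    a1 = (rp[0][1] - a2 @ rp[2][1]) @ inv11
    d_hat = (r[0] - a2 @ r[2] - a1 @ r[1]) / (m - 2)
    x_mean = np.linalg.solve(np.eye(d) - a2 - a1, d_hat)
    c = (rr[0][0] - a2 @ rr[2][0] - a1 @ rr[1][0] - np.outer(d_hat, r[0])) / (m - 2)
    c = (c + c.T) / 2.0                      # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(c)
    b0 = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return x_mean, a2, a1, b0

# Synthetic 1-D check with hypothetical parameters (not values from the video).
rng = np.random.default_rng(0)
true_mean, true_a1, true_a2, true_b0 = 5.0, 0.5, 0.3, 1.0
series = [true_mean, true_mean]
for _ in range(5000):
    series.append(true_mean + true_a2 * (series[-2] - true_mean)
                  + true_a1 * (series[-1] - true_mean)
                  + true_b0 * rng.standard_normal())
x_mean, a2, a1, b0 = learn_ar2(np.array(series).reshape(-1, 1))
```

The long synthetic series is only there to make the estimates settle; with M = 20 frames and 6-dimensional shape vectors the same formulas apply, but the estimates rest on just 18 lagged samples.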

Learning the parameters of the motion model from $X_1, X_2, \dots, X_{20}$ (frames 1 to 20)

• Motion Model Output: Predicted location of tool in K-th frame

$$X_K - \bar{X} = A_2 (X_{K-2} - \bar{X}) + A_1 (X_{K-1} - \bar{X}) + B_0 w_K$$

• How to search for the tool in/around the location?

– Answer: Observation Model
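As a rough illustration of the motion model output, the sketch below draws predicted shape vectors from the second-order AR process. All parameter values are hypothetical placeholders (in the project they come from the learning step), and `predict_ar2` is an illustrative name.

```python
import numpy as np

def predict_ar2(x_prev1, x_prev2, x_mean, a1, a2, b0, rng):
    """Draw X_K given X_{K-1} (x_prev1) and X_{K-2} (x_prev2)."""
    w = rng.standard_normal(x_mean.shape[0])          # w_K ~ N(0, I)
    deterministic = x_mean + a2 @ (x_prev2 - x_mean) + a1 @ (x_prev1 - x_mean)
    return deterministic + b0 @ w                     # add the stochastic part

# Hypothetical 6-dimensional parameters, for illustration only.
dim = 6
x_mean = np.zeros(dim)
a1, a2 = 1.2 * np.eye(dim), -0.3 * np.eye(dim)
b0 = 0.5 * np.eye(dim)
rng = np.random.default_rng(1)
x_prev1, x_prev2 = np.ones(dim), 0.5 * np.ones(dim)

# Repeated draws give the "cloud" of probable locations around the prediction.
cloud = np.array([predict_ar2(x_prev1, x_prev2, x_mean, a1, a2, b0, rng)
                  for _ in range(100)])
noise_free = predict_ar2(x_prev1, x_prev2, x_mean, a1, a2, np.zeros((dim, dim)), rng)
```

Setting `b0` to zero leaves only the deterministic part, which is a convenient way to check the coefficients.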

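OBSERVATION MODEL

• Motion Model Output:

– Predicted location (X) of tool in K-th frame

– Shape vector $X = (x_1, x_2, \dots, x_6)^T$

$$\begin{pmatrix} Q^x \\ Q^y \end{pmatrix} = \begin{pmatrix} \mathbf{1} & \mathbf{0} & Q_0^x & \mathbf{0} & \mathbf{0} & Q_0^y \\ \mathbf{0} & \mathbf{1} & \mathbf{0} & Q_0^y & Q_0^x & \mathbf{0} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_6 \end{pmatrix} + \begin{pmatrix} Q_0^x \\ Q_0^y \end{pmatrix}$$

The set of 32 pixel co-ordinates $(Q^x, Q^y)^T$ will be used to detect the contour of the tool in the K-th frame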

For each of the 32 pixel co-ordinates in $(Q^x, Q^y)^T$ look for its “substitute”. How?

• On each normal look for an edge pixel

• Take the nearest one

• This is one of the 32 substitutes

• Repeat this for all 32 points

How to draw the normal at $Q_j = (x_j, y_j)^T$

Rotate $(Q_{j+1} - Q_j)$ by π/2:

$$R = \begin{pmatrix} \cos\frac{\pi}{2} & -\sin\frac{\pi}{2} \\ \sin\frac{\pi}{2} & \cos\frac{\pi}{2} \end{pmatrix}$$

Normal $n_j = R \, (Q_{j+1} - Q_j)$

Unit normal at $Q_j$:

$$\hat{n}_j = \frac{n_j}{\|n_j\|} = \left( \frac{n_j^x}{\|n_j\|}, \; \frac{n_j^y}{\|n_j\|} \right)^T$$

The first co-ordinate on the normal is one step from $Q_j$ along $+\hat{n}_j$. All pixel co-ordinates on the normal follow by further steps in the same direction (replacing + by – will give all pixel co-ordinates on the normal on the other side).
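The normal construction and the search for the nearest edge pixel might look like the sketch below. Several things here are assumptions rather than details from the slides: the contour is treated as closed (the last point connects back to the first), consecutive points are distinct, the edge map is a boolean image indexed as `edge_map[y, x]` (for instance the output of an edge detector), and the function names are illustrative.

```python
import numpy as np

# Rotation by pi/2: cos(pi/2) = 0 and sin(pi/2) = 1.
ROT90 = np.array([[0.0, -1.0],
                  [1.0, 0.0]])

def unit_normals(points):
    """Unit normals n_j / ||n_j|| with n_j = R (Q_{j+1} - Q_j) for an (n, 2) contour."""
    pts = np.asarray(points, dtype=float)
    tangents = np.roll(pts, -1, axis=0) - pts     # Q_{j+1} - Q_j, wrapping at the end
    normals = tangents @ ROT90.T                  # apply R to every tangent
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def nearest_edge_on_normal(edge_map, point, unit_normal, length):
    """Walk outwards on both sides of the normal; return the nearest edge pixel or None."""
    height, width = edge_map.shape
    for step in range(length + 1):
        for sign in (1, -1):                      # + side first, then the - side
            x, y = np.rint(point + sign * step * unit_normal).astype(int)
            if 0 <= x < width and 0 <= y < height and edge_map[y, x]:
                return int(x), int(y)
    return None

# Toy data: a unit square contour and a 10 x 10 edge map with a single edge pixel.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
normals = unit_normals(square)
edge_map = np.zeros((10, 10), dtype=bool)
edge_map[7, 3] = True                             # edge pixel at x = 3, y = 7
hit = nearest_edge_on_normal(edge_map, np.array([3.0, 5.0]),
                             np.array([0.0, 1.0]), length=5)
```

Returning `None` when no edge pixel lies within `length` steps lets the caller fall back to a fixed penalty for that contour point.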

For each of the 32 pixel co-ordinates in $(Q^x, Q^y)^T$, look for its “substitute”: how good is the fit?

Observation Model: Parameters

$$\phi(s_i) = \begin{cases} \| r(s_i) - r_f(s_i) \|^2 & \text{if we get an edge pixel} \\ \rho & \text{otherwise} \end{cases}$$

The weights can be calculated in the following way:

$$\pi = \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \phi(s_i) \right\}$$
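The weight calculation can be illustrated with a short sketch. It assumes that the predicted contour points and their substitutes are available as (x, y) pairs, with `None` marking a normal on which no edge pixel was found; the names `fit_errors`, `sample_weight` and `normalise` and all numeric values are hypothetical.

```python
import numpy as np

def fit_errors(predicted, substitutes, rho):
    """phi(s_i): squared distance to the substitute, or rho if no edge pixel was found."""
    errors = []
    for point, sub in zip(predicted, substitutes):
        if sub is None:
            errors.append(float(rho))
        else:
            diff = np.asarray(sub, dtype=float) - np.asarray(point, dtype=float)
            errors.append(float(diff @ diff))
    return np.array(errors)

def sample_weight(phi, sigma):
    """Unnormalised weight exp{-(1 / (2 sigma^2)) * sum_i phi(s_i)}."""
    return float(np.exp(-np.sum(phi) / (2.0 * sigma ** 2)))

def normalise(weights):
    """Scale the weights so that they sum to 1."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum()

# Toy example with two contour points; the second one found no edge pixel.
phi = fit_errors([(0, 0), (1, 1)], [(0, 2), None], rho=9.0)
weight = sample_weight(phi, sigma=1.0)
pi = normalise([weight, 2.0 * weight])
```

A larger σ flattens the weights across shape vectors, while a smaller σ concentrates them on the best fit.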

CONDENSATION ALGORITHM

• Proposed by Isard & Blake, 1997

– Condensation – CONDitional DENSity propagATION, International Journal Of Computer Vision

• Applications so far

– Dancing girl (Isard & Blake, 1997)

– Human hand moving over table (Isard & Blake, 1998)

– Multiple object tracking (McCormick et al, …)

• This presentation addresses

– Condensation Algorithm applied to medical surgery video

• Challenges I faced

– Non-uniform movement of surgical tool

– Too much clutter!!!

– Heavy noise

– Insufficient illumination / specular effects (due to too much illumination)

– Occlusion (tool going inside human organ)

Iteration

From the old sample set $\{s_{k-1}^{(n)}, \pi_{k-1}^{(n)}, c_{k-1}^{(n)}, \; n = 1, 2, \dots, N\}$ at time step $t_{k-1}$, construct a new sample set $\{s_k^{(n)}, \pi_k^{(n)}, c_k^{(n)}, \; n = 1, 2, \dots, N\}$ for time $t_k$.

Construct the n-th of N new samples as follows:

• Select a sample $s_k^{\prime(n)}$ as follows:

– Generate a random number $r \in [0, 1]$, uniformly distributed.

– Find, by binary subdivision, the smallest j for which $c_{k-1}^{(j)} \ge r$.

– Set $s_k^{\prime(n)} = s_{k-1}^{(j)}$.

• Predict by sampling from $p(\chi_k \mid \chi_{k-1} = s_k^{\prime(n)})$ to choose each $s_k^{(n)}$. As the dynamics are governed by a linear AR process, the new sample value may be generated as

$$s_k^{(n)} = A s_k^{\prime(n)} + (I - A) \bar{X} + B w_k^{(n)}$$

where $w_k^{(n)}$ is a vector of standard normal variates, $B B^T$ is the process noise covariance, and

$$A = \begin{pmatrix} 0 & I \\ A_2 & A_1 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 \\ B_0 \end{pmatrix}$$

• Measure and weight the new position in terms of the measured features $Z_k$:

$$\pi_k^{(n)} = p(Z_k \mid X_k = s_k^{(n)})$$

then normalize so that $\sum_n \pi_k^{(n)} = 1$ and store together with the cumulative probability as $(s_k^{(n)}, \pi_k^{(n)}, c_k^{(n)})$ where

$$c_k^{(0)} = 0, \qquad c_k^{(n)} = c_k^{(n-1)} + \pi_k^{(n)}, \qquad n = 1, 2, \dots, N$$
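The selection step of the iteration can be sketched with NumPy. This is a minimal illustration with hypothetical function names and weights; `np.searchsorted` performs the binary search over the cumulative probabilities, and indices are 0-based here whereas the description above counts samples from 1.

```python
import numpy as np

def cumulative_probabilities(weights):
    """Normalise the weights and return c^(1..N), the running sums of pi^(n)."""
    weights = np.asarray(weights, dtype=float)
    cumulative = np.cumsum(weights / weights.sum())
    cumulative[-1] = 1.0          # guard against floating-point round-off
    return cumulative

def smallest_index(cumulative, r):
    """Smallest (0-based) j with cumulative[j] >= r, found by binary search."""
    return int(np.searchsorted(cumulative, r, side="left"))

def select_samples(samples, weights, rng):
    """Draw N samples with replacement, with probability proportional to the weights."""
    cumulative = cumulative_probabilities(weights)
    draws = rng.random(len(samples))                       # uniform in [0, 1)
    picks = [smallest_index(cumulative, r) for r in draws]
    return [samples[j] for j in picks]

# Hypothetical weights for four samples.
cumulative = cumulative_probabilities([0.1, 0.2, 0.3, 0.4])
chosen = select_samples(["s1", "s2", "s3", "s4"], [0.1, 0.2, 0.3, 0.4],
                        np.random.default_rng(0))
```

Samples with large weights tend to be picked several times and samples with small weights may not be picked at all, which is what concentrates the sample set on likely tool locations.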

IMPLEMENTATION

• Training Set

– From each of the first 20 (why 20?) frames

• Locate the tool by hand

• Take 32 (why 32?) points along the border of the tool (called Q)

• The tool in the first frame is considered the master template ($Q_0$)

• Deduce the shape vector $X_i$ for i = 1, 2, …, 20 frames by Affine Shape Space Transformation

$$X = W^{+} (Q - Q_0)$$

• The 32-point Q vector (64 co-ordinates) is reduced to a 6-dimensional shape vector X

• Learning Motion Model (2nd order AR Process)

– From all shape vectors $\{X_1, X_2, \dots, X_{20}\}$ from the 20 frames we learn the 4 parameters of the motion model ($\bar{X}, A_2, A_1, B_0$)

$$X_k - \bar{X} = A_2 (X_{k-2} - \bar{X}) + A_1 (X_{k-1} - \bar{X}) + B_0 w_k$$

• Learning Observation Model (Gaussian Mixtures)

– From all shape vectors $\{X_1, X_2, \dots, X_{20}\}$ from the 20 frames we learn the parameters of the observation model {σ, length of normal}

• Tracking Initialization

– On the 21st frame we hand-picked “two” sets of 32 points

• One set of 32 points (say Q′) lies inside the tool contour

$$X_1 = W^{+} (Q' - Q_0)$$

• One set of 32 points (say Q″) lies outside the tool contour

$$X_{10} = W^{+} (Q'' - Q_0)$$

– In between them we interpolated shape vectors to get n shape vectors in total (n = 10 in our case)

• Tracking Initialization (contd…)

– So we have $\{X_1, X_2, \dots, X_{10}\}$ shape vectors initially

– Initially we assign equal weights $\pi_i = 1/10$ to each of the 10 shape vectors

– Now the iteration step …
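A possible form of this initialization is sketched below. The slides do not say which interpolation was used, so linear interpolation between the inner and outer shape vectors is an assumption here, as are the function name and the example vectors.

```python
import numpy as np

def initial_samples(x_inner, x_outer, n=10):
    """Return n shape vectors from x_inner to x_outer (inclusive) with equal weights."""
    x_inner = np.asarray(x_inner, dtype=float)
    x_outer = np.asarray(x_outer, dtype=float)
    fractions = np.linspace(0.0, 1.0, n)              # 0 -> inner, 1 -> outer
    samples = np.array([(1.0 - t) * x_inner + t * x_outer for t in fractions])
    weights = np.full(n, 1.0 / n)                      # pi_i = 1 / n
    return samples, weights

# Hypothetical 6-dimensional shape vectors for the inner and outer hand-drawn contours.
x_inner = np.array([0.0, 0.0, -0.1, -0.1, 0.0, 0.0])
x_outer = np.array([0.0, 0.0, 0.1, 0.1, 0.0, 0.0])
samples, weights = initial_samples(x_inner, x_outer, n=10)
```

With n = 10 this yields the 2 hand-drawn shape vectors at the ends and 8 interpolated ones in between.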

• Input (from the (K-1)-th frame):

$\{X_1, X_2, \dots, X_{10}\}$ – 10 shape vectors

$\{\pi_1, \pi_2, \dots, \pi_{10}\}$ – 10 weights, each associated with a particular shape vector

• Iteration: (1st step)

Selection

‣ Generate a random number r ∈ (0,1), uniformly distributed

‣ Find the smallest j for which the cumulative weight $c_j = \pi_1 + \dots + \pi_j \ge r$

‣ Select $X_j$

• Iteration: (2nd step)

Prediction (AR2 Motion Model)

‣ Each shape vector from the (K-1)-th frame and its parent from the (K-2)-th frame are used to predict the possible location of the tool in the K-th frame

$$X_K - \bar{X} = A_2 (X_{K-2} - \bar{X}) + A_1 (X_{K-1} - \bar{X}) + B_0 w_K$$

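• Iteration: (3rd step)

Measurement – Construction of $X_{1,2,\dots,10}$ for the K-th frame

‣ Each predicted shape vector X is affine transformed as follows to generate 32 points

$$\begin{pmatrix} Q^x \\ Q^y \end{pmatrix} = \begin{pmatrix} \mathbf{1} & \mathbf{0} & Q_0^x & \mathbf{0} & \mathbf{0} & Q_0^y \\ \mathbf{0} & \mathbf{1} & \mathbf{0} & Q_0^y & Q_0^x & \mathbf{0} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_6 \end{pmatrix} + \begin{pmatrix} Q_0^x \\ Q_0^y \end{pmatrix}$$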

‣ Normal is drawn and edge pixel is sought on each normal. The nearest pixel is chosen

‣ $X = W^{+} (Q - Q_0)$ (X is constructed this way for the K-th frame)

Measurement: Construction of $\pi_{1,2,\dots,10}$ for the K-th frame

‣ Error estimate $\phi(s_i)$

$$\phi(s_i) = \| r(s_i) - r_f(s_i) \|^2 \quad \text{if we get an edge pixel}$$

$$\pi = \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \phi(s_i) \right\}$$

RESULTS

Normals and edge pixels on each normal

Initialization done with 2 hand-drawn shape vectors and 8 more interpolated (from the 2) shape vectors

One may of course initialize the algorithm with many (say 100) shape vectors… but at the cost of computation!

Frame-wise Tracking Results: Frame 19, Frame 23, Frame 34, Frame 47

Survival Of Fittest – Choosing the best shape (Front Tool)

Survival Of Fittest – Choosing the best shape (Back Tool): Frame 5, Frame 8, Frame 10

FITNESS

Measure Of Fit Of Estimated Shape To True Shape

Frame     Area Covered    Area Covered By       Area Of         Misfit =
Number    By Tool         Best Estimated Fit    Overlap         (T-O)/T
          (T sq units)    (E sq units)          (O sq units)    (in %)
19        10296           10227                 9006            12.53
23        10726           10186                 9260            13.67
34        12025           9846                  9541            20.66
47        11231           11134                 8771            21.90
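The misfit column is consistent with (T − O) taken as a percentage of the tool area T. The short check below recomputes it from the areas in the table; the function name is illustrative.

```python
# (frame number, tool area T, estimated-fit area E, overlap area O) from the table
rows = [
    (19, 10296, 10227, 9006),
    (23, 10726, 10186, 9260),
    (34, 12025, 9846, 9541),
    (47, 11231, 11134, 8771),
]

def misfit_percent(tool_area, overlap_area):
    """Share of the true tool area not covered by the overlap, in percent."""
    return 100.0 * (tool_area - overlap_area) / tool_area

misfits = [round(misfit_percent(t, o), 2) for _, t, _, o in rows]
for (frame, _, _, _), value in zip(rows, misfits):
    print(f"Frame {frame}: misfit = {value:.2f} %")
```

Note that this measure does not use E, so it does not penalise an estimated shape that extends beyond the tool.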

FUTURE SCOPE

• Multiple tool detection

• Updating motion parameters

REFERENCES

• Andrew Blake, Michael Isard and David Reynard, Learning to track curves in motion, IEEE, 1994.

• Andrew Blake and Michael Isard, Active Contours, Springer.

• Michael Isard and Andrew Blake, Contour tracking by stochastic propagation of conditional density, 1996.

• Andrew Blake and Michael Isard, 3D position, attitude and shape input using video tracking of hands and lips, Robotics Research Group, University of Oxford.

• David Reynard, Andrew Wildenberg, Andrew Blake and John Merchant, Learning Dynamics of Complex Motions from Image Sequences, 1996.

• A. Blake, M. Isard and D. Reynard, Learning to track the visual motion of contours, Artificial Intelligence, 1995.

• DataSource: www.video.google.com

• Michael Isard and Andrew Blake, Condensation-Conditional Density Propagation for Visual Tracking, International Journal of Computer Vision, 1998.

• Craig, Introduction To Robotics, Addison Wesley.

• J. Canny, A Computational Approach to Edge Detection,

Thank You
