Subspace-based Learning with Grassmann Kernels - VideoLectures

Subspace-based Learning with Grassmann Kernels

Jihun Hamm and Daniel D. Lee

University of Pennsylvania 

July 8, 2008

Subspace structure in data 

• Image data
  • Illumination variation is low-dimensional: empirical [Hallinan94,Epstein95], theoretical [Belhumeur98,Ramamoorthi01,Ramamoorthi02,Basri03]
  • Pose, expression, etc. are also modeled well by subspaces: "Eigenface" [Sirovich87,Kirby90,Turk91]

Set of illumination-subspaces 

[Figure: image sets X1, X2 are mapped by PCA to subspaces with bases Y1, Y2 ∈ R^{D×m}]
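The sets-to-subspaces step in the figure (PCA on each image set) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `subspace_basis`, the `center` flag, and the random stand-in data are all assumptions, and the slides do not say whether the mean is subtracted before PCA.

```python
import numpy as np

def subspace_basis(X, m, center=False):
    """Orthonormal basis (D x m) of the leading m-dimensional subspace of X (D x n)."""
    X = np.asarray(X, dtype=float)
    if center:  # assumption: the slides do not say whether the mean is removed
        X = X - X.mean(axis=1, keepdims=True)
    # Left singular vectors, ordered by decreasing singular value.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :m]

rng = np.random.default_rng(0)
X1 = rng.standard_normal((100, 30))  # hypothetical set: 30 images with D = 100 pixels each
Y1 = subspace_basis(X1, m=5)         # D x m matrix with orthonormal columns
```

Each image set is then represented only by its basis Y, which is the object the later slides treat as a point on the Grassmann manifold.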

Set of pose-subspaces

[Figure: image sets X1, X2 are mapped by PCA to subspaces with bases Y1, Y2 ∈ R^{D×m}]

Linear dynamical model 

• ARMA model of sequential data, e.g., dynamic textures, human actions [Doretto03,Veeraraghavan05,Turaga08]

x(t + 1) = Ax(t) + v(t)
y(t) = Cx(t) + w(t)

x(t): internal state, y(t): observed output, v(t), w(t): noise

• Observability matrix [Cock02]

O(A,C) = [C ; CA ; CA² ; ...] ∈ R^{D×m}

• Each ARMA model spans a unique subspace
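As a rough sketch of how an ARMA model becomes a subspace, the code below stacks C, CA, CA², ... and orthonormalizes the columns. Truncating the stack to a finite number of blocks (`n_blocks`) is an assumption made here for illustration; the helper name `observability_basis` and the random A and C are hypothetical as well.

```python
import numpy as np

def observability_basis(A, C, n_blocks):
    """Stack C, CA, ..., CA^(n_blocks-1) and return the stack and an orthonormal basis."""
    A = np.asarray(A, dtype=float)
    blocks, CAk = [], np.asarray(C, dtype=float)
    for _ in range(n_blocks):
        blocks.append(CAk)
        CAk = CAk @ A                 # next block: C A^(k+1)
    O = np.vstack(blocks)             # truncated observability matrix
    Q, _ = np.linalg.qr(O)            # reduced QR: columns of Q are orthonormal
    return O, Q

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((3, 3))   # hypothetical state transition (3 internal states)
C = rng.standard_normal((4, 3))         # hypothetical output map (4 observed outputs)
O, Q = observability_basis(A, C, n_blocks=5)
```

The matrix Q plays the role of the orthonormal representative Y used in the rest of the talk.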

Set of observability subspaces 

[Figure: image sequences (e.g., "check watch", "scratch head", ...) are mapped to observability matrices O(A1,C1), O(A2,C2), ...]

Subspace-based learning

• Assumption: data consists of linear subspaces

[Figure: subspace 1, subspace 2, ..., subspace N in R^D]

• Model out the known (undesired) variability with linear subspaces
• Learn the unknown (interesting) variability between subspaces

Framework for subspace-based learning 

• The Grassmann manifold G(m, D) is the set of m-dimensional linear subspaces of R^D.
• Applications in signal processing and control [Srivastava00,Henkel05,Baumann07], optimization [Edelman99], and computer vision [Liu04,Lin06,Chang06,Turaga08].

[Figure: span(Yi) and span(Yj) in R^D with principal vectors u1, v1 and principal angles θ1, ..., θm; the same subspaces shown as points Yi, Yj on G(m, D)]

Representation 

• Quotient space representation:

G(m, D) = O(D) / (O(m) × O(D − m))

• Basis representation of an element

An element of G(m, D) is represented by a D × m matrix Y such that Y′Y = I_m, with the equivalence relation:

Y1 ≅ Y2
⇐⇒ span(Y1) = span(Y2)
⇐⇒ ∃R_m ∈ O(m), such that Y1 = Y2R_m
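The equivalence relation can be checked numerically. The sketch below (random data and variable names are illustrative assumptions) builds an orthonormal basis Y2, rotates it by an element of O(m), and confirms that the two matrices span the same subspace by comparing their projection matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
D, m = 6, 2

# Orthonormal basis of a random m-dimensional subspace: Y2'Y2 = I_m.
Y2, _ = np.linalg.qr(rng.standard_normal((D, m)))

# A random element of O(m), obtained from the QR factorization of a random m x m matrix.
R, _ = np.linalg.qr(rng.standard_normal((m, m)))

Y1 = Y2 @ R   # a different matrix that represents the same point of G(m, D)

# span(Y1) = span(Y2) exactly when the orthogonal projectors Y Y' coincide.
same_span = np.allclose(Y1 @ Y1.T, Y2 @ Y2.T)
```

Any quantity defined on G(m, D) therefore has to give the same value for Y1 and Y2.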

Principal angle / canonical correlation

Let Y1 and Y2 be two orthonormal matrices of size D by m, and let u ∈ span(Y1) and v ∈ span(Y2) be unit vectors.

The first principal angle θ1 between span(Y1) and span(Y2), with canonical correlation cos θ1, is

cos θ1 = max_{u ∈ span(Y1)} max_{v ∈ span(Y2)} u′v, subject to ‖u‖ = ‖v‖ = 1.

[Figure: span(Yi) and span(Yj) in R^D with principal vectors u1, v1 and principal angles θ1, ..., θm; the same subspaces shown as points Yi, Yj on G(m, D)]

k-th principal angle

• The k-th principal angle/canonical correlation is:

cos θk = max_{uk ∈ span(Y1)} max_{vk ∈ span(Y2)} uk′vk, subject to
uk′uk = 1, vk′vk = 1,
uk′ui = 0, vk′vi = 0, (i = 1, ..., k − 1).

0 ≤ θ1 ≤ · · · ≤ θm ≤ π/2 and 1 ≥ cos θ1 ≥ · · · ≥ cos θm ≥ 0

• Use SVD for computation: Y1′Y2 = USV′ [Golub96], where U and V are m × m orthogonal matrices and S is the diagonal matrix S = diag(cos θ1 ... cos θm). The principal vectors u1, ..., um and v1, ..., vm are the columns of Y1U and Y2V.
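The SVD recipe translates almost directly into NumPy. The function name `principal_angles` and the small R^3 example are assumptions chosen for illustration; clipping the singular values is a numerical safeguard that is not part of the slides.

```python
import numpy as np

def principal_angles(Y1, Y2):
    """Principal angles (ascending) between span(Y1) and span(Y2).

    Y1 and Y2 are assumed to be D x m matrices with orthonormal columns.
    """
    U, s, Vt = np.linalg.svd(Y1.T @ Y2)   # singular values are cos(theta_k), descending
    s = np.clip(s, 0.0, 1.0)              # guard against round-off outside [0, 1]
    return np.arccos(s), U, Vt.T

# Two planes in R^3 that share the first axis and are tilted by phi around it.
phi = np.pi / 3
Y1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y2 = np.array([[1.0, 0.0], [0.0, np.cos(phi)], [0.0, np.sin(phi)]])
theta, U, V = principal_angles(Y1, Y2)    # theta is approximately [0, pi/3]
```

SciPy users can cross-check against `scipy.linalg.subspace_angles`, which returns the same angles in descending order.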

Principal angles and distance 

• Given: two subspaces Y1 and Y2, and principal angles θ1, ... , θm from SVD
• How to define a good subspace distance from θ1, ... , θm?
• A canonical distance [Edelman99]
  • Arc-length: d²_Arc(Y1, Y2) = Σ_i θi²
• Is this the only distance?

Grassmann distances 

• Projection distance [Edelman99,Wang06]

d²_Proj(Y1, Y2) = Σ_{i=1}^{m} sin² θi = (1/2) ‖Y1Y1′ − Y2Y2′‖²_F

• Binet-Cauchy distance [Wolf03,Vishwanathan04]

d²_BC(Y1, Y2) = 1 − Π_i cos² θi = 1 − det(Y1′Y2)²

• Martin distance between two ARMA models [Martin00]
• Max Corr, Min Corr, Procrustes1/2

Comparison

                   Arc Length     Projection                   Binet-Cauchy
d²(Y1, Y2)         ·              (1/2) ‖Y1Y1′ − Y2Y2′‖²_F     1 − det(Y1′Y2)²
In terms of θ      Σ_i θi²        Σ_i sin² θi                  1 − Π_i cos² θi
Is a metric?       Yes            Yes                          Yes

                   Max Corr             Min Corr                  Procrustes 1          Procrustes 2
d²(Y1, Y2)         2 − 2‖Y1′Y2‖²_2      2‖Y1Y1′ − Y2Y2′‖²_2       ‖Y1U − Y2V‖²_F        ‖Y1U − Y2V‖²_2
In terms of θ      2 sin² θ1            2 sin² θm                 4 Σ_i sin²(θi/2)      4 sin²(θm/2)
Is a metric?       No                   Yes                       Yes                   Yes

• Characteristics
  • d²_MaxCor = 2 sin² θ1 is a rough measure
  • d²_MinCor = 2 sin² θm, d²_Proc2 = 4 sin²(θm/2): too sensitive
  • d²_Arc = Σ_i θi², d²_Proj = Σ_i sin² θi, d²_Proc1 = 4 Σ_i sin²(θi/2), d²_BC = 1 − Π_{i=1}^{m} cos² θi: intermediate
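The comparison can be reproduced numerically. The sketch below computes each squared distance from the principal angles and then recomputes several of them from the matrix expressions, so the two forms can be compared. The function name `grassmann_distances`, the dictionary keys, and the random test subspaces are assumptions made for this illustration.

```python
import numpy as np

def grassmann_distances(Y1, Y2):
    """Squared subspace distances expressed through the principal angles."""
    s = np.clip(np.linalg.svd(Y1.T @ Y2, compute_uv=False), 0.0, 1.0)
    theta = np.arccos(s)                  # ascending: theta[0] = θ1, theta[-1] = θm
    return {
        "arc": np.sum(theta ** 2),
        "proj": np.sum(np.sin(theta) ** 2),
        "bc": 1.0 - np.prod(s ** 2),
        "maxcor": 2.0 * np.sin(theta[0]) ** 2,
        "mincor": 2.0 * np.sin(theta[-1]) ** 2,
        "proc1": 4.0 * np.sum(np.sin(theta / 2) ** 2),
        "proc2": 4.0 * np.sin(theta[-1] / 2) ** 2,
    }

rng = np.random.default_rng(0)
Y1, _ = np.linalg.qr(rng.standard_normal((8, 3)))
Y2, _ = np.linalg.qr(rng.standard_normal((8, 3)))
d = grassmann_distances(Y1, Y2)

# Matrix forms of the same quantities.
U, _, Vt = np.linalg.svd(Y1.T @ Y2)
P_diff = Y1 @ Y1.T - Y2 @ Y2.T
aligned_diff = Y1 @ U - Y2 @ Vt.T
matrix_form = {
    "proj": 0.5 * np.linalg.norm(P_diff, "fro") ** 2,
    "bc": 1.0 - np.linalg.det(Y1.T @ Y2) ** 2,
    "maxcor": 2.0 - 2.0 * np.linalg.norm(Y1.T @ Y2, 2) ** 2,
    "mincor": 2.0 * np.linalg.norm(P_diff, 2) ** 2,
    "proc1": np.linalg.norm(aligned_diff, "fro") ** 2,
    "proc2": np.linalg.norm(aligned_diff, 2) ** 2,
}
```

For each key in `matrix_form`, the value should agree with the corresponding entry of `d` up to round-off.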

Applications 

• Distance-based: e.g., k-NN
  • Mutual Subspace Method [Yamaguchi98]
• Beyond k-NN: subspace-based discriminant analysis
• Previous methods
  • Constrained Mutual Subspace Method [Fukui03]
  • Discriminant Analysis of Canonical Correlations [Kim06]

Some complications 

• Find a discriminative direction w ∈ R^D, so that d(w′Xi, w′Xj) becomes
    small if yi = yj,
    large if yi ≠ yj.
  • A difficult optimization problem
  • Procedures often iterative, and not well justified
• Inconsistency: projection is performed in image space, but distances are computed on Grassmann space

Easier solution 

• Use kernel-induced Hilbert space

[Figure: subspaces span(Yi), span(Yj) with principal angles θ1, ..., θm, the corresponding points Yi, Yj on G(m, D), and their images Ψ(Yi), Ψ(Yj) in a kernel-induced Hilbert space; panel labels X, H1, H2]

No need to 1) project data and 2) measure distances separately

Grassmann kernels 

• Let k : R^{D×m} × R^{D×m} → R be a real-valued symmetric function, k(x1, x2) = k(x2, x1)
• Invariance: k(Y1, Y2) = k(Y1R1, Y2R2), ∀R1, R2 ∈ O(m)
• Positive definiteness: Σ_{i,j} ci cj k(xi, xj) ≥ 0, ∀(x1, ..., xn), ∀(c1, ..., cn), n ∈ N
• d_Proj, d_BC have corresponding Grassmann kernels

Projection kernel 

• Projection embedding [Chikuse06]

The map Ψ : G(m, D) → R^{D×D}, span(Y) ↦ YY′ is an isometric embedding from (G, d_Proj) to (R^{D×D}, ‖·‖_F), up to the constant factor 1/√2 that appears in d²_Proj = (1/2) ‖Y1Y1′ − Y2Y2′‖²_F.

• Natural inner product in R^{D×D}: tr(Y1Y1′Y2Y2′)
• Projection kernel: kProj(Y1, Y2) = tr(Y1Y1′Y2Y2′) = ‖Y1′Y2‖²_F is a Grassmann kernel
• Has a very simple form and requires only O(Dm) multiplications to evaluate.
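A small numerical check of the projection kernel, under assumed random subspaces: the sketch evaluates kProj, confirms that the Gram matrix has no significantly negative eigenvalue, that the value does not change when each basis is rotated by an element of O(m), and that d²_Proj = m − kProj. Helper names such as `random_orthonormal` are hypothetical.

```python
import numpy as np

def k_proj(Y1, Y2):
    """Projection kernel: tr(Y1 Y1' Y2 Y2') = ||Y1'Y2||_F^2."""
    return np.linalg.norm(Y1.T @ Y2, "fro") ** 2

def random_orthonormal(rng, rows, cols):
    """Random matrix with orthonormal columns (illustrative helper)."""
    Q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return Q

rng = np.random.default_rng(0)
D, m, N = 10, 3, 6
Ys = [random_orthonormal(rng, D, m) for _ in range(N)]

# Gram matrix and its smallest eigenvalue (should be >= 0 up to round-off).
K = np.array([[k_proj(a, b) for b in Ys] for a in Ys])
min_eig = np.linalg.eigvalsh(K).min()

# Invariance: rotating each basis inside its own subspace leaves the kernel unchanged.
R1, R2 = random_orthonormal(rng, m, m), random_orthonormal(rng, m, m)
k_rotated = k_proj(Ys[0] @ R1, Ys[1] @ R2)

# Relation to the projection distance: d^2_Proj = m - k_Proj.
d2_proj = 0.5 * np.linalg.norm(Ys[0] @ Ys[0].T - Ys[1] @ Ys[1].T, "fro") ** 2
trace_form = np.trace(Ys[0] @ Ys[0].T @ Ys[1] @ Ys[1].T)
```

In practice only the m × m product Y1′Y2 is needed; the D × D projectors above are formed purely to verify the identities.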

Binet-Cauchy kernel 

• Binet-Cauchy identity [Horn85]

Suppose we choose m rows from a D × m matrix A. Then there are n = C(D, m) square submatrices A^(s1), ..., A^(sn), and

det(A′B) = Σ_s det A^(s) det B^(s)

• Binet-Cauchy embedding

Ψ : G(m, D) → R^n, span(Y) ↦ (det Y^(s1), ..., det Y^(sn)) is an embedding. It is also an isometry from (G, d_BC) to (R^n, ‖·‖_2).

• Binet-Cauchy kernel [Wolf03,Vishwanathan04]

kBC(Y1, Y2) = (det Y1′Y2)² is a Grassmann kernel
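The Binet-Cauchy identity can be verified by brute force for small D and m. In the sketch below, `bc_embedding` enumerates every choice of m rows with `itertools.combinations`; the function names and the random subspaces are assumptions for illustration, and the enumeration is only practical when C(D, m) is small.

```python
import numpy as np
from itertools import combinations
from math import comb

def bc_embedding(Y):
    """Vector of determinants of all m x m row-submatrices of a D x m matrix Y."""
    D, m = Y.shape
    return np.array([np.linalg.det(Y[list(rows), :])
                     for rows in combinations(range(D), m)])

def k_bc(Y1, Y2):
    """Binet-Cauchy kernel: (det Y1'Y2)^2."""
    return np.linalg.det(Y1.T @ Y2) ** 2

rng = np.random.default_rng(0)
D, m = 5, 2
Y1, _ = np.linalg.qr(rng.standard_normal((D, m)))
Y2, _ = np.linalg.qr(rng.standard_normal((D, m)))

psi1, psi2 = bc_embedding(Y1), bc_embedding(Y2)
lhs = np.linalg.det(Y1.T @ Y2)   # det(A'B)
rhs = psi1 @ psi2                # sum over s of det A^(s) det B^(s)
```

The kernel itself never needs the C(D, m)-dimensional vectors: a single m × m determinant gives the same value as the squared inner product of the embeddings.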

Advantages of Grassmann kernel 

• Access to all the kernel-based algorithms for 

Hilbert spaces! 

• Can generate a family of kernels: 

If k1(x, y) and k2(x, y) are PD kernels, then so are 

1. α1k1(x, y) + α2k2(x, y), (α1, α2 > 0), 

2. k1(x, y)k2(x, y), 

3. ∫ k1(x, z)k1(y, z) dz,

4. f(x)k1(x, y)f(y)
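Rules 1, 2 and 4 can be illustrated on Gram matrices: a positive combination, an elementwise product, and a rescaling by f all keep the smallest eigenvalue non-negative up to round-off. The sketch uses the projection and Binet-Cauchy kernels as k1 and k2; the weights, the values of f, and the random subspaces are assumptions chosen only for the demonstration.

```python
import numpy as np

def k_proj(Y1, Y2):
    return np.linalg.norm(Y1.T @ Y2, "fro") ** 2

def k_bc(Y1, Y2):
    return np.linalg.det(Y1.T @ Y2) ** 2

def gram(kernel, Ys):
    """Gram matrix of a kernel over a list of subspace bases."""
    return np.array([[kernel(a, b) for b in Ys] for a in Ys])

rng = np.random.default_rng(0)
D, m, N = 6, 2, 8
Ys = [np.linalg.qr(rng.standard_normal((D, m)))[0] for _ in range(N)]

K1, K2 = gram(k_proj, Ys), gram(k_bc, Ys)
f = rng.uniform(0.5, 2.0, size=N)            # hypothetical values f(x_i)

candidates = {
    "sum": 0.7 * K1 + 0.3 * K2,              # rule 1: positive combination
    "product": K1 * K2,                      # rule 2: elementwise (Schur) product
    "scaled": f[:, None] * K1 * f[None, :],  # rule 4: f(x) k1(x, y) f(y)
}
min_eigs = {name: np.linalg.eigvalsh(K).min() for name, K in candidates.items()}
```

A non-negative `min_eigs` entry on one sample is only a sanity check, not a proof of positive definiteness.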

Extension to nonlinear subspace 

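• 'Doubly kernel' method [Wolf03,Wang06]

[Figure: data X is mapped into a Hilbert space H1, where Kernel PCA yields subspaces span(Yi), span(Yj) with principal angles θ1, ..., θm; the corresponding points Yi, Yj on G(m, D) are then embedded as Ψ(Yi), Ψ(Yj) in a second Hilbert space H2]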

Kernel Fisher Discriminant Analysis 

Training: 

1. Compute the matrix [Ktrain]ij = kProj(Yi, Yj) or kBC(Yi, Yj) for all Yi, Yj in the training set.
2. Solve max_α L(α) by eigen-decomposition.
3. Compute the (C − 1)-dimensional coefficients Ftrain = α′Ktrain.

Testing:

1. Compute the matrix [Ktest]ij = kProj(Yi, Yj) or kBC(Yi, Yj) for all Yi in the training set and Yj in the test set.
2. Compute the (C − 1)-dimensional coefficients Ftest = α′Ktest.
3. Perform 1-NN classification from the Euclidean distance between Ftrain and Ftest.
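The training and testing steps can be sketched end to end on synthetic subspaces. The slides do not spell out L(α), so the code assumes the standard kernel Fisher criterion (between-class over within-class scatter in coefficient space) with a small ridge term; `kfda_fit`, the ridge value, and the synthetic data generator are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def k_proj(Y1, Y2):
    return np.linalg.norm(Y1.T @ Y2, "fro") ** 2

def gram(As, Bs):
    return np.array([[k_proj(a, b) for b in Bs] for a in As])

def kfda_fit(K, labels, reg=1e-3):
    """Kernel FDA coefficients alpha (N x (C-1)) from a training Gram matrix K."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    N = K.shape[0]
    mean_all = K.mean(axis=1)
    M = np.zeros((N, N))    # between-class scatter
    Nw = np.zeros((N, N))   # within-class scatter
    for c in classes:
        Kc = K[:, labels == c]
        nc = Kc.shape[1]
        diff = Kc.mean(axis=1) - mean_all
        M += nc * np.outer(diff, diff)
        H = np.eye(nc) - np.ones((nc, nc)) / nc   # centering matrix
        Nw += Kc @ H @ Kc.T
    # Ridge term (an assumption) keeps the within-class matrix positive definite.
    B = Nw + (reg * np.trace(Nw) / N + 1e-10) * np.eye(N)
    L = np.linalg.cholesky(B)
    # Reduce M a = lambda B a to a symmetric problem for L^{-1} M L^{-T}.
    Z = np.linalg.solve(L, M)
    A = np.linalg.solve(L, Z.T).T
    _, V = np.linalg.eigh((A + A.T) / 2)
    V = V[:, -(len(classes) - 1):]                # top C-1 eigenvectors
    return np.linalg.solve(L.T, V)

def sample_class(rng, base, n, noise=0.1):
    """Noisy orthonormal bases around one class-specific subspace (synthetic data)."""
    return [np.linalg.qr(base + noise * rng.standard_normal(base.shape))[0]
            for _ in range(n)]

rng = np.random.default_rng(0)
D, m, C, n_train, n_test = 20, 2, 3, 10, 5
bases = [rng.standard_normal((D, m)) for _ in range(C)]
train = [Y for b in bases for Y in sample_class(rng, b, n_train)]
test = [Y for b in bases for Y in sample_class(rng, b, n_test)]
y_train = np.repeat(np.arange(C), n_train)
y_test = np.repeat(np.arange(C), n_test)

K_train = gram(train, train)            # training step 1
alpha = kfda_fit(K_train, y_train)      # training step 2
F_train = alpha.T @ K_train             # training step 3
K_test = gram(train, test)              # testing step 1
F_test = alpha.T @ K_test               # testing step 2

# Testing step 3: 1-NN in the (C-1)-dimensional discriminant space.
dists = ((F_test.T[:, None, :] - F_train.T[None, :, :]) ** 2).sum(axis=2)
pred = y_train[dists.argmin(axis=1)]
accuracy = np.mean(pred == y_test)
```

Swapping `k_proj` for a Binet-Cauchy kernel only changes the `gram` helper; the rest of the pipeline depends on the data solely through the Gram matrices.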

Discriminant Analysis Algorithms 

• Baseline: Euclidean FDA 

• Grassmann Discriminant Analysis (GDA): 

• kernel FDA + Proj / BC kernel 

• Others 

• MSM: no dimensionality reduction + MaxCor [Yamaguchi98]

• cMSM: heuristic dimensionality reduction + MaxCor [Fukui03]

• DCC: iterating between 1. NDA and 2. Proc1 [Kim07]

Illum-invariant face recognition 

• Yale face database [Georghiades01]
  • 38 persons x 9 poses x 45 illuminations
  • PCA along the illumination axis
  • 9-fold cross validation
• CMU-PIE database [Sim03]
  • 68 persons x 7 poses x 43 illuminations

[Figure: data arranged along person, illumination, and pose axes]

Pose-invariant object categorization

• ETH-80 database [Leibe02]
  • 8 categories x 10 objects x 41 poses
  • PCA along the pose axis

[Figure: data arranged along category, pose, and object axes]

Video-based action recognition 

• IXMAS database [Weinland06]
  • 11 actions x 11 actors x T frames (x 3 trials)
  • ARMA model with T frames

[Figure: data arranged along action, frame, and person axes]

Results

[Figure: four panels (Yale Face, ETH-80, CMU-PIE, IXMAS), each plotting classification rate (%) on a 0 to 100 scale against subspace dimension (m)]

Conclusion 

• Subspace-based learning: new paradigm for exploiting inherent linear structures in data
• Grassmann manifold as a framework: Projection distances and kernels
• Experiments: superior classification performance with proposed method
• Not limited to image data/FDA method/classification task
• An open question

