
Self-Organizing Maps, Principal Components and Non-negative Matrix Factorization

Karoline Geissler

May 18, 2011



Table of Contents

1 Self-Organizing Maps
2 Principal Components, Curves and Surfaces
    Principal Components
    Principal Curves
    Spectral Clustering
3 Non-negative Matrix Factorization



Self-Organizing Maps

The method can be viewed as a version of K-means clustering.
The prototypes lie on a one- or two-dimensional manifold.
The resulting manifold is referred to as a constrained topological map.
The original high-dimensional observations can be mapped down onto a two-dimensional coordinate system.



The simplest version of the SOM

A two-dimensional grid of K prototypes m_j ∈ R^p.
Parametrize each of the K prototypes by an integer coordinate pair ℓ_j ∈ Q1 × Q2.
The prototypes act like "buttons".
Map the observations x_i down onto the two-dimensional grid.
Find the closest prototype m_k to x_i (Euclidean distance).



We move m_k toward x_i via

    m_k ← m_k + α (x_i − m_k)    (1)

The same move is then applied to all neighbors m_j of m_k, i.e. all j with ‖ℓ_j − ℓ_k‖ < r.

α ... the learning rate
r ... the distance threshold
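As a sketch, this update step can be written in a few lines of NumPy. The names (`som_update`, the grid layout, the initialization) are illustrative, not from the slides:

```python
import numpy as np

# Minimal sketch of the simplest SOM update: K prototypes m_j in R^p,
# indexed by integer grid coordinates l_j; a hard neighborhood of radius r.
rng = np.random.default_rng(0)

q1, q2, p = 5, 5, 3                       # 5 x 5 grid of prototypes in R^p
grid = np.array([(a, b) for a in range(q1) for b in range(q2)], dtype=float)
M = rng.normal(size=(q1 * q2, p))         # prototypes m_j

def som_update(x, M, grid, alpha=0.05, r=1.0):
    """One step: move the closest prototype and its grid neighbors toward x."""
    k = np.argmin(((M - x) ** 2).sum(axis=1))           # winner m_k
    near = np.linalg.norm(grid - grid[k], axis=1) <= r  # ||l_j - l_k|| <= r
    M[near] += alpha * (x - M[near])                    # m_j <- m_j + a(x - m_j)
    return M

x = rng.normal(size=p)
M = som_update(x, M, grid)
```

In a full fit, the observations are presented repeatedly while α and r are decreased.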



Other version of the SOM

The update step

    m_j ← m_j + α h(‖ℓ_j − ℓ_k‖)(x_i − m_j)    (2)

h ... a neighborhood function, which gives more weight to prototypes m_j with indices ℓ_j closer to ℓ_k than to those further away.
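With a neighborhood function the hard threshold becomes a weighted move of all prototypes. A Gaussian kernel is a common choice for h, though that specific form is an assumption here; the slides only require h to decay with grid distance:

```python
import numpy as np

# Sketch of the SOM update with a neighborhood function h.
# h is taken to be a Gaussian kernel on grid distance (an assumed choice).
def som_update_h(x, M, grid, alpha=0.05, sigma=1.0):
    k = np.argmin(((M - x) ** 2).sum(axis=1))     # winning prototype m_k
    d = np.linalg.norm(grid - grid[k], axis=1)    # grid distances ||l_j - l_k||
    h = np.exp(-d ** 2 / (2 * sigma ** 2))        # neighborhood weights
    M += alpha * h[:, None] * (x - M)             # weighted move toward x
    return M
```

Setting h to an indicator of ‖ℓ_j − ℓ_k‖ < r recovers the simplest version.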



Example

Generate 90 data points in three dimensions (near the surface of a half sphere of radius 1).
A 5 × 5 grid of prototypes.
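The simulated data can be sketched as follows. The slides only say the points lie near the surface of the half sphere, so the jitter model below is an assumption:

```python
import numpy as np

# 90 points near the surface of a unit half sphere in R^3.
# The additive jitter is an assumed noise model for "near the surface".
rng = np.random.default_rng(0)
n = 90
v = rng.normal(size=(n, 3))
v[:, 2] = np.abs(v[:, 2])                           # keep the upper half (z >= 0)
X = v / np.linalg.norm(v, axis=1, keepdims=True)    # points on the half sphere
X = X + 0.02 * rng.normal(size=X.shape)             # small jitter off the surface
```

These X would then be fed, one at a time, into the SOM update of the previous slides.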



Figure: Simulated data in three classes



Figure: Left panel is the initial configuration, right panel the final one.



Figure: Wiremesh representation of the fitted SOM model



Reconstruction Error

    Σ_i ‖x_i − m_j(i)‖²

This is the total sum of squares of each data point around its prototype m_j(i), the closest prototype to x_i.
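Computed directly, the reconstruction error sums the squared distance from each point to its closest prototype. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def reconstruction_error(X, M):
    """Sum over data points of the squared distance to the closest prototype."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)   # (N, K) distances
    return d2.min(axis=1).sum()                               # sum ||x_i - m_j(i)||^2
```

The same quantity is what K-means minimizes, which makes the two methods directly comparable.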



Figure: Reconstruction error



Document Organization and Retrieval

SOMs are useful for organizing and indexing large corpora.
Start from a term–document matrix, where each row represents a single document.



Principal Components, Curves and Surfaces

Principal components are a sequence of projections of the data, mutually uncorrelated and ordered in variance.


Principal Components

Principal components provide a sequence of best linear approximations to the given data in R^p, of all ranks q ≤ p.
Observations x_1, ..., x_N and the rank-q linear model

    f(λ) = μ + V_q λ    (3)

μ ... location vector in R^p
V_q ... p × q matrix with q orthogonal unit vectors as columns
λ ... q-vector of parameters


Minimizing the reconstruction error

    min_{μ, {λ_i}, V_q} Σ_{i=1}^N ‖x_i − μ − V_q λ_i‖²    (4)

We obtain

    μ̂ = x̄    (5)

    λ̂_i = V_q^T (x_i − x̄)    (6)


This leaves us to find the orthogonal matrix V_q:

    min_{V_q} Σ_{i=1}^N ‖(x_i − x̄) − V_q V_q^T (x_i − x̄)‖²    (7)

The projection matrix H_q = V_q V_q^T maps each point x_i onto its rank-q reconstruction H_q x_i.
The solution can also be expressed via the singular value decomposition

    X = U D V^T    (8)

X ... the rows contain the centered observations


U ... an N × p orthogonal matrix; its columns u_j are the left singular vectors
V ... a p × p orthogonal matrix; its columns v_j are the right singular vectors
D ... a p × p diagonal matrix with singular values d_1 ≥ d_2 ≥ ... ≥ d_p ≥ 0
The columns of UD are the principal components of X.
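The SVD route can be checked numerically. A small sketch on illustrative data, confirming that the projections λ̂_i equal the first q columns of UD and that H_q is a projection:

```python
import numpy as np

# PCA via the SVD of the centered data matrix, as on the slide: X = U D V^T.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])  # toy data
xbar = X.mean(axis=0)
Xc = X - xbar                                  # center: mu_hat = x_bar
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
q = 2
Vq = Vt[:q].T                                  # p x q, orthonormal columns
lam = Xc @ Vq                                  # lambda_i = Vq^T (x_i - x_bar)
Hq = Vq @ Vq.T                                 # rank-q projection matrix
X_hat = xbar + Xc @ Hq                         # best rank-q approximation
```

The residual of the rank-q fit is exactly the sum of the discarded squared singular values, which is why truncating the SVD solves (7).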


Handwritten Digits

A sample of 130 handwritten 3's.
We consider these images as points x_i in R^256 and compute their principal components via the SVD.


Figure: A sample of 130 handwritten 3's shows a variety of writing styles.


Figure: The first two principal components of the handwritten threes


Principal Curves

A generalization of the principal component line.
First for random variables X ∈ R^p.
f(λ) ... a parameterized smooth curve: a vector function with p coordinates.


For each data value x, let λ_f(x) define the closest point on the curve to x.
The function f(λ) is called a principal curve for the distribution of X if

    f(λ) = E(X | λ_f(X) = λ)    (9)

i.e. f(λ) is the average of all data points that project to it.


Principal Points

A set of k prototypes.
For each point x in the support of a distribution there is a closest prototype (the responsible prototype).
The set of k points that minimize the expected distance from X to its prototype is called the set of principal points.
k = 1 ... the mean vector (for a circular normal distribution)
k = ∞ ... principal curves


Construction of a principal curve of a distribution

f(λ) = [f_1(λ), f_2(λ), ..., f_p(λ)] ... coordinate functions
X^T = (X_1, ..., X_p)

Consider the following alternating steps:
1. f̂_j(λ) ← E(X_j | λ(X) = λ)
2. λ̂_f(x) ← argmin_λ ‖x − f̂(λ)‖²

The first step fixes λ and estimates each coordinate function f̂_j.
The second step fixes the curve and finds the closest point on it to each data point.
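The alternating steps can be sketched with a crude smoother. Here step 1's conditional expectation is approximated by block averages along the current λ ordering, and the curve is discretized into a handful of points; both are assumptions for illustration (in practice a scatterplot smoother is used):

```python
import numpy as np

def principal_curve(X, n_bins=10, n_iter=10):
    """Crude alternating estimate of a principal curve for 2-D (or p-D) data."""
    lam = X[:, 0].copy()                  # initialize lambda (here: 1st coordinate)
    for _ in range(n_iter):
        # Step 1: f_j(lambda) <- average of X_j over points with similar lambda
        blocks = np.array_split(np.argsort(lam), n_bins)
        f = np.array([X[b].mean(axis=0) for b in blocks])
        # Step 2: lambda(x) <- index of the closest curve point to each data point
        d2 = ((X[:, None, :] - f[None, :, :]) ** 2).sum(axis=2)
        lam = d2.argmin(axis=1).astype(float)
    return f, lam
```

Each returned row of f is the average of the data points that project to it, mirroring equation (9).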


Spectral Clustering

A generalization of standard clustering methods, useful for non-convex clusters.


Start with an N × N matrix of pairwise similarities s_ii' ≥ 0 between all observation pairs.
Represent the observations in an undirected similarity graph G = ⟨V, E⟩.
The N vertices v_i represent the observations.
Pairs of vertices are connected by an edge if their similarity is positive.


⟹ a graph-partition problem (we identify connected components with clusters)
We seek a partition of the graph such that edges between different groups have low weight and edges within a group have high weight.
Idea: construct similarity graphs that represent the local neighborhood relationships between the observations.


Mutual K-nearest-neighbor graph

N_K ... a symmetric set of nearby pairs of points
We connect all such pairs and give them edge weight w_ii' = s_ii' (otherwise zero).
We set to zero all the pairwise similarities not in N_K and draw the graph.
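A sketch of the construction. The Gaussian form of the similarities and the convention that a pair is in N_K only when both points list each other among their K nearest neighbors are assumptions; the slides leave both unspecified:

```python
import numpy as np

def mutual_knn_graph(X, K=3, scale=1.0):
    """Edge weights w_ii' = s_ii' for mutual K-nearest-neighbor pairs, else 0."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-d2 / (2 * scale ** 2))             # Gaussian similarities s_ii'
    np.fill_diagonal(d2, np.inf)                   # exclude self-neighbors
    nn = d2.argsort(axis=1)[:, :K]                 # K nearest neighbors of each i
    A = np.zeros(S.shape, dtype=bool)
    A[np.repeat(np.arange(len(X)), K), nn.ravel()] = True
    W = np.where(A & A.T, S, 0.0)                  # keep pairs in N_K, zero the rest
    return W
```

The resulting W is symmetric by construction, which the Laplacian below requires.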


Unnormalized graph Laplacian

A fully connected graph includes all pairwise edges with weights w_ii' = s_ii'.
Adjacency matrix ... the matrix of edge weights W = {w_ii'}
G ... diagonal matrix with diagonal elements g_i = Σ_i' w_ii' (the sum of the weights connected to vertex i)
Unnormalized graph Laplacian:

    L = G − W    (10)


Procedure

Spectral clustering finds the m eigenvectors corresponding to the m smallest eigenvalues of L.
Consider any vector f:

    f^T L f = Σ_{i=1}^N g_i f_i² − Σ_{i=1}^N Σ_{i'=1}^N f_i f_i' w_ii'    (11)

            = ½ Σ_{i=1}^N Σ_{i'=1}^N w_ii' (f_i − f_i')²    (12)

f^T L f is small if pairs of points with large w_ii' have coordinates f_i and f_i' close together.
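The Laplacian identity f^T L f = ½ Σ_{i,i'} w_ii' (f_i − f_i')² is easy to verify numerically on a small random graph (the weights below are illustrative):

```python
import numpy as np

# Numerical check of the quadratic-form identity for L = G - W.
rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2                 # symmetric edge weights w_ii'
np.fill_diagonal(W, 0.0)          # no self-edges
G = np.diag(W.sum(axis=1))        # degrees g_i = sum_i' w_ii'
L = G - W                         # unnormalized graph Laplacian

f = rng.normal(size=n)
quad = f @ L @ f
half_sum = 0.5 * (W * (f[:, None] - f[None, :]) ** 2).sum()
```

Because the right-hand side is a sum of squares, L is positive semi-definite, which is why its smallest eigenvalues carry the cluster structure.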


Since L1 = 0, the constant vector is a trivial eigenvector with eigenvalue zero.
If the graph is connected, it is the only eigenvector with eigenvalue zero.
For a graph with m connected components, L has m eigenvectors with eigenvalue zero (the indicator vectors of the components).


Figure: Toy example illustrating spectral clustering


Non-negative Matrix Factorization

An alternative approach to principal components analysis.
The data and components are assumed to be non-negative.
It is useful for modeling non-negative data such as images.


The N × p data matrix X is approximated by

    X ≈ W H    (13)

W is N × r
H is r × p
We assume that x_ij, w_ik, h_kj ≥ 0.


The matrices W and H are found by maximizing

    L(W, H) = Σ_{i=1}^N Σ_{j=1}^p [x_ij log(WH)_ij − (WH)_ij]    (14)

This is the log-likelihood from a model in which x_ij has a Poisson distribution with mean (WH)_ij.
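The slides stop at the objective; one standard way to maximize it, added here for illustration, is the multiplicative update scheme of Lee and Seung, which does not decrease this Poisson log-likelihood. A sketch on small assumed data:

```python
import numpy as np

# Lee-Seung multiplicative updates for the Poisson/KL objective L(W, H).
rng = np.random.default_rng(0)
N, p, r = 30, 8, 3
X = rng.random((N, r)) @ rng.random((r, p))          # non-negative test data
W = rng.random((N, r)) + 0.1                         # positive initial factors
H = rng.random((r, p)) + 0.1

def loglik(X, W, H, eps=1e-12):
    WH = W @ H
    return float((X * np.log(WH + eps) - WH).sum())

ll = [loglik(X, W, H)]
for _ in range(50):
    WH = W @ H + 1e-12
    W *= (X / WH) @ H.T / H.sum(axis=1)              # update W, stays >= 0
    WH = W @ H + 1e-12
    H *= W.T @ (X / WH) / W.sum(axis=0)[:, None]     # update H, stays >= 0
    ll.append(loglik(X, W, H))
```

Because the updates are multiplicative, non-negativity of W and H is preserved automatically, with no projection step needed.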


Thank you!
