Self-Organizing Maps, Principal Components and Non-negative Matrix Factorization
Karoline Geissler
May 18, 2011
Table of Contents
1 Self-Organizing Maps
2 Principal Components, Curves and Surfaces
  Principal Components
  Principal Curves
  Spectral Clustering
3 Non-negative Matrix Factorization
Self-Organizing Maps

The simplest version of the SOM
Other versions of the SOM
Example
Reconstruction Error
The method can be viewed as a constrained version of K-means clustering.
The prototypes lie in a one- or two-dimensional manifold in the feature space.
The resulting manifold is referred to as a constrained topological map.
The original high-dimensional observations can be mapped down onto a two-dimensional coordinate system.
The simplest version of the SOM
A two-dimensional grid of K prototypes mj ∈ R^p.
Each of the K prototypes is parametrized by an integer coordinate pair ℓj ∈ Q1 × Q2.
The prototypes act like "buttons" placed on the grid.
Map the observations xi down onto the two-dimensional grid:
find the closest prototype mj to xi (in Euclidean distance).
We move the closest prototype mk toward xi via

mk ← mk + α(xi − mk)    (1)

The same update is then applied to all neighbors mj of mk, i.e. all j with ‖ℓj − ℓk‖ < r.

α ... the learning rate
r ... the distance threshold
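The loop above can be sketched in a few lines of NumPy. The function name, the grid size and the fixed α and r are illustrative assumptions; in practice α and r are decreased over the iterations.

```python
import numpy as np

def fit_som(X, grid=(5, 5), alpha=0.05, r=1.5, n_iter=2000, seed=0):
    """Simplest online SOM: for each observation x_i, find the closest
    prototype m_k (Euclidean distance) and move m_k and its grid
    neighbors (all j with ||l_j - l_k|| < r) toward x_i, rule (1)."""
    rng = np.random.default_rng(seed)
    q1, q2 = grid
    # integer grid coordinates l_j of the K = q1*q2 prototypes
    coords = np.array([(a, b) for a in range(q1) for b in range(q2)], float)
    # initialize the prototypes at randomly chosen data points
    M = X[rng.integers(0, len(X), q1 * q2)].astype(float)
    for _ in range(n_iter):
        x = X[rng.integers(0, len(X))]
        k = int(np.argmin(((M - x) ** 2).sum(axis=1)))
        near = np.linalg.norm(coords - coords[k], axis=1) < r
        M[near] += alpha * (x - M[near])   # m_k <- m_k + alpha (x_i - m_k)
    return M, coords
```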
Other versions of the SOM

The update step
mj ← mj + α h(‖ℓj − ℓk‖)(xi − mj)    (2)

h ... neighborhood function, which gives more weight to prototypes mj with indices ℓj closer to ℓk than to those further away.
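A sketch of one update of rule (2); the Gaussian form of h and the parameter values are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def gaussian_h(d, sigma=1.0):
    # more weight for prototypes whose grid index is close to the winner's
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def som_step(M, coords, x, alpha=0.05, sigma=1.0):
    """One smoothed-SOM update: every prototype m_j moves toward x,
    weighted by h(||l_j - l_k||), where m_k is the winning prototype."""
    k = int(np.argmin(((M - x) ** 2).sum(axis=1)))
    h = gaussian_h(np.linalg.norm(coords - coords[k], axis=1), sigma)
    return M + alpha * h[:, None] * (x - M)   # update rule (2)
```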
Karoline Geissler <strong>Self</strong>-<strong>Organizing</strong> <strong>Maps</strong>, <strong>Principal</strong> <strong>Components</strong> <strong>and</strong> <strong>Non</strong>-<strong>negative</strong> M
Example<br />
<strong>Self</strong> <strong>Organizing</strong> <strong>Maps</strong><br />
<strong>Principal</strong> <strong>Components</strong>, Curves <strong>and</strong> Surfaces<br />
<strong>Non</strong>-<strong>negative</strong> Matrix Factorization<br />
The simplest version of the SOM<br />
Other version of the SOM<br />
Example<br />
Reconstruction Error<br />
Generate 90 data points in three dimensions (near the surface of a half-sphere of radius 1).
5 × 5 grid of prototypes.
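Data of this kind can be generated along the following lines; the noise level and the omission of the three-class structure shown in the figures are simplifications.

```python
import numpy as np

def half_sphere_data(n=90, noise=0.05, seed=0):
    """n points near the surface of the unit half-sphere (z >= 0)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # uniform on the sphere
    v[:, 2] = np.abs(v[:, 2])                      # keep the upper half
    return v + noise * rng.normal(size=(n, 3))     # jitter off the surface
```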
Figure: Simulated data in three classes
Figure: Left panel is the initial configuration, right panel the final one.
Figure: Wiremesh representation of the fitted SOM model
Reconstruction Error

Σ_{i} ‖xi − mj(i)‖²

mj(i) ... the prototype closest to xi; the error is the total sum of squares of the data points around their prototypes.
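A sketch of the computation (the function name is an assumption): assign each point to its nearest prototype and sum the squared distances.

```python
import numpy as np

def reconstruction_error(X, M):
    """sum_i ||x_i - m_{j(i)}||^2, where m_{j(i)} is the prototype
    closest to x_i: the total sum of squares around the prototypes."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)  # N x K
    return float(d2.min(axis=1).sum())
```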
Figure: Reconstruction error
Document Organization and Retrieval

SOMs are useful for organizing and indexing large document corpora.
Term-document matrix, where each row represents a single document.
Principal Components, Curves and Surfaces
A sequence of projections of the data,
mutually uncorrelated and ordered in variance.
Principal Components
Principal components provide a sequence of best linear approximations to the given data in R^p, of all ranks q ≤ p.
Observations x1, ..., xN and the rank-q linear model

f(λ) = µ + Vq λ    (3)

µ ... location vector in R^p
Vq ... p × q matrix with q orthogonal unit vectors as columns
λ ... q-vector of parameters
Minimizing the reconstruction error

min_{µ, {λi}, Vq} Σ_{i=1}^{N} ‖xi − µ − Vq λi‖²    (4)

We obtain

µ̂ = x̄    (5)

λ̂i = Vq^T (xi − x̄)    (6)
This leaves us to find the orthogonal matrix Vq<br />
min_{Vq} Σ_{i=1}^{N} ‖(xi − x̄) − Vq Vq^T (xi − x̄)‖²    (7)

The projection matrix Hq = Vq Vq^T maps each point xi onto its rank-q reconstruction Hq xi.
The solution can also be expressed via the singular value decomposition

X = U D V^T    (8)

X ... the rows contain the centered observations
U ... N × p orthogonal matrix; its columns uj are the left singular vectors
V ... p × p orthogonal matrix; its columns vj are the right singular vectors
D ... p × p diagonal matrix with singular values d1 ≥ d2 ≥ ... ≥ dp ≥ 0
The columns of UD are called the principal components of X.
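Putting eqs. (3)-(8) together, a minimal sketch (names are illustrative): center the data, take the SVD, and read off scores and loadings.

```python
import numpy as np

def principal_components(X, q):
    """Rank-q PCA via the SVD of the centered data matrix X_c = U D V^T.
    Scores are the first q columns of UD; loadings the first q columns of V."""
    xbar = X.mean(axis=0)                       # mu_hat = xbar, eq. (5)
    U, d, Vt = np.linalg.svd(X - xbar, full_matrices=False)
    Vq = Vt[:q].T                               # p x q orthogonal columns
    scores = U[:, :q] * d[:q]                   # lambda_hat_i, eq. (6)
    return xbar, Vq, scores
```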
Handwritten Digits

Sample of 130 handwritten 3's.
We consider these images as points xi in R^256 and compute their principal components via the SVD.
Figure: A sample of 130 handwritten 3's shows a variety of writing styles.
Figure: The first two principal components of the handwritten threes
Principal Curves
Generalization of the principal component line.
First defined for random variables X ∈ R^p.
f(λ) ... a parameterized smooth curve; a vector function with p coordinates.
For each data value x, let λf(x) define the closest point on the curve to x.
The function f(λ) is called a principal curve for the distribution of X if

f(λ) = E(X | λf(X) = λ)    (9)

f(λ) is the average of all data points that project to it.
Principal Points

Set of k prototypes.
For each point x in the support of the distribution there is a closest prototype (the responsible prototype).
The set of k points that minimizes the expected distance from X to its prototype is called the set of principal points.
k = 1 ... the mean vector (for a circular normal distribution)
k = ∞ ... principal curves
Construction of a principal curve of a distribution

f(λ) = [f1(λ), f2(λ), ..., fp(λ)] ... coordinate functions
X^T = (X1, ..., Xp)

Consider the following alternating steps:
1. f̂j(λ) ← E(Xj | λ(X) = λ), j = 1, ..., p
2. λ̂f(x) ← argmin_λ ‖x − f̂(λ)‖²

The first step fixes λ and updates the curve; the second fixes the curve and finds the closest point on it to each data point.
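For finite data the conditional expectation in step 1 is replaced by a scatterplot smoother. A crude sketch of the alternation, in which the running-mean smoother, the span and the initialization by the first principal component are all illustrative assumptions:

```python
import numpy as np

def principal_curve(X, n_iter=10, span=0.2):
    """Alternating steps: (1) smooth each coordinate X_j against the
    current projection index lambda (running mean over nearby lambda),
    (2) re-project each point onto the fitted curve."""
    Xc = X - X.mean(axis=0)
    # initialize lambda with the first principal component scores
    lam = Xc @ np.linalg.svd(Xc, full_matrices=False)[2][0]
    k = max(3, int(span * len(X)))
    f = X.copy()
    for _ in range(n_iter):
        order = np.argsort(lam)
        # step 1: f_j(lambda) <- local average of X_j, in lambda order
        f = np.stack([np.convolve(X[order, j], np.ones(k) / k, mode="same")
                      for j in range(X.shape[1])], axis=1)
        # step 2: lambda(x) <- position of the closest curve point
        d2 = ((X[:, None, :] - f[None, :, :]) ** 2).sum(axis=2)
        lam = d2.argmin(axis=1).astype(float)
    return f, lam
```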
Spectral Clustering

For non-convex clusters.
Generalization of standard clustering methods.
N × N matrix of pairwise similarities sii′ ≥ 0 between all observation pairs.
Undirected similarity graph G = ⟨V, E⟩.
The N vertices vi represent the observations.
Pairs of vertices are connected by an edge if their similarity is positive.
⇒ a graph-partition problem (we identify connected components with clusters).
Partition the graph so that edges between different groups have low weight and edges within a group have high weight.
Idea: construct similarity graphs that represent the local neighborhood relationships between the observations.
Mutual K-nearest-neighbor graph
NK ... a symmetric set of nearby pairs of points.
We connect all mutual nearest neighbors and give each edge the weight wii′ = sii′ (all other weights are zero).
We set to zero all pairwise similarities not in NK and draw the graph.
Unnormalized graph Laplacian

A fully connected graph includes all pairwise edges with weights wii′ = sii′.
Adjacency matrix ... the matrix of edge weights W = {wii′}.
G ... diagonal matrix with diagonal elements gi = Σ_{i′} wii′ (the sum of the weights of the edges connected to vertex i).
The unnormalized graph Laplacian is

L = G − W    (10)
Procedure
Spectral clustering finds the m eigenvectors corresponding to the m smallest eigenvalues of L.
Consider any vector f:

f^T L f = Σ_{i=1}^{N} gi fi² − Σ_{i=1}^{N} Σ_{i′=1}^{N} fi fi′ wii′    (11)

= ½ Σ_{i=1}^{N} Σ_{i′=1}^{N} wii′ (fi − fi′)²    (12)

We have a small value of f^T L f if pairs of points with large wii′ have coordinates fi and fi′ close together.
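Identity (12) is easy to check numerically on a random symmetric weight matrix; this is a verification sketch, not part of the slides' derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
# random symmetric non-negative weights with an empty diagonal
W = rng.uniform(size=(6, 6))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

G = np.diag(W.sum(axis=1))          # g_i = sum_i' w_ii'
L = G - W                           # unnormalized graph Laplacian, eq. (10)

f = rng.normal(size=6)
lhs = f @ L @ f                     # f^T L f
rhs = 0.5 * ((f[:, None] - f[None, :]) ** 2 * W).sum()   # eq. (12)
assert np.isclose(lhs, rhs)
```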
Since L1 = 0, the constant vector 1 is a trivial eigenvector of L with eigenvalue zero (and 1^T L 1 = 0).
If the graph is connected, it is the only eigenvector with eigenvalue zero.
A graph with m connected components has a Laplacian L with m eigenvectors of eigenvalue zero (the indicator vectors of the components).
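Assembling the pieces gives the following sketch; the Gaussian similarity, the mutual-kNN rule, the farthest-point initialization and the simple k-means loop are all illustrative choices rather than the slides' prescription.

```python
import numpy as np

def spectral_clusters(X, m, k=10, sigma=1.0):
    """Mutual k-NN similarity graph -> unnormalized Laplacian -> rows of
    the m smallest eigenvectors -> simple k-means on those rows."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-d2 / (2 * sigma ** 2))            # pairwise similarities
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]       # k nearest neighbors
    A = np.zeros((n, n), dtype=bool)
    A[np.arange(n)[:, None], nn] = True
    W = np.where(A & A.T, S, 0.0)                 # keep mutual pairs only
    L = np.diag(W.sum(axis=1)) - W                # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)
    Z = vecs[:, :m]                               # m smallest eigenvectors
    # k-means on the rows of Z, farthest-point initialization
    idx = [0]
    for _ in range(m - 1):
        d = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    C = Z[idx]
    for _ in range(50):
        lab = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.stack([Z[lab == j].mean(axis=0) if (lab == j).any() else C[j]
                      for j in range(m)])
    return lab
```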
Figure: Toy example illustrating spectral clustering
Non-negative Matrix Factorization
An alternative approach to principal components analysis.
The data and the components are assumed to be non-negative.
Useful for modeling non-negative data such as images.
The N × p data matrix X is approximated by

X ≈ WH    (13)

W is N × r
H is r × p
We assume xij, wik, hkj ≥ 0.
The matrices W and H are found by maximizing

L(W, H) = Σ_{i=1}^{N} Σ_{j=1}^{p} [xij log(WH)ij − (WH)ij]    (14)

This is the log-likelihood from a model in which xij has a Poisson distribution with mean (WH)ij.
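A minimal sketch of maximizing (14) with the multiplicative updates of Lee and Seung; the initialization and iteration count are arbitrary choices.

```python
import numpy as np

def nmf(X, r, n_iter=200, seed=0):
    """Factor X (N x p, non-negative) as W H with W: N x r, H: r x p,
    using multiplicative updates that increase the Poisson
    log-likelihood L(W, H) of eq. (14)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    W = rng.uniform(0.1, 1.0, (N, r))
    H = rng.uniform(0.1, 1.0, (r, p))
    for _ in range(n_iter):
        W *= ((X / (W @ H)) @ H.T) / H.sum(axis=1)            # update W
        H *= (W.T @ (X / (W @ H))) / W.sum(axis=0)[:, None]   # update H
    return W, H
```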
Thank you!