slides - SNAP - Stanford University

CS224W: Social and Information Network Analysis 

Jure Leskovec Stanford University 

Jure Leskovec, Stanford University 

http://cs224w.stanford.edu

Task: 

Find coalitions in signed networks 

Incentives: 

European chocolates! 

FFame 

Up to 10% extra credit 

DDue: 

Friday midnight 

No late days! 

11/8/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

Today: 3 methods 

(3) Trawling: Community signatures that can be efficiently extracted 

(4) SSpectral t lgraph h partitioning: titi i LLaplacian l i matrix, ti 22nd deigenvector i t 

(5) Overlapping communities: Clique percolation method 


[Kumar et al. ‘99] 

Searching for small communities in Web graph 

(1) What is the signature of a 

community/discussion in a Web graph 

… 

… 

A dense 2‐layer graph 

Intuition: many people all talking about the same things 

11/8/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 

Use this to define topics: 

Wh What t th the same people l on th the 

left talk about on the right 

4

(2) A more well‐defined well defined problem: 

Enumerate complete bipartite subgraphs K s,t 

Where K Ks,t = s nodes where each links to the 

same t other nodes 



Two points: 

(1) The signature of a community/discussion 

(2) Complete bipartite subgraph K Ks,t Ks,t = graph on s nodes, each links to the same t other nodes 

Plan: 

(A) From (2) get back to (1): 

Via: Any dense enough graph contains a 

smaller K s,t as a subgraph 

(B) How do we solve (2) in a giant graph? 


What similar problems have been solved on big non‐graph non graph data? 

(3) Frequent itemset enumeration [Agrawal‐Srikant ‘99] 


Marketbasket analysis: 

What items are bought together in a store? 

Setting: 

Universe U of n items 

m subsets b t of f UU: S S1, S S2, …, S Sm U 

(Si is a set of items one person bought) 

Frequency threshold f 

Goal: 

Fi Find d all ll subsets b t T s.t. t T S Si of f f f sets t S Si (items in T were bought together f times) 

[Agrawa‐Srikant ‘99] 


Example: 

Universe of items: 

U={1,2,3,4,5} {,,,,} 

Itemsets: 

S1={1,3,5}, S2={2,3,4}, S3={2,4,5}, S4={3,4,5}, S5={1,3,4,5}, S6={2,3,4,5} Minimum support: 

f=3 

Algorithm: Build up the lists 

Insight: for a frequent set of size k, all 

its subsets are also frequent 


U={1,2,3,4,5}, U={12345} f=3 

S1={1,3,5}, S2={2,3,4}, S3={2,4,5}, S S4={3,4,5}, ={345} SS={1345} 5={1,3,4,5}, SS={2345} 6={2,3,4,5} 



For i =1 = 1,…,kk 

Find all frequent sets of size i by 

composing sets of size ii-1 1 that 

differ in 1 element 

Open question: 

Efficiently find only 

maximal frequent sets 




Claim: (3) (itemsets) solves (2) (bipartite graphs) 

How? 

View each node i as a 

set S i of nodes i points to 

K Ks,t = a set t y of f size i t 

that occurs in s sets Si Looking for Ks,t set of 

frequency threshold to s 

and look at layer t – all 

frequent sets of size t 



(2) (1): Informally, Informally every dense enough 

graph G contains a bipartite Ks,t subgraph 

where s and t depend on size (# of nodes) and 

density (avg. degree) of G [Kovan‐Sos‐Turan ‘53] 

Theorem: Let G=(X,Y,E), |X|=|Y|= n with 

g g 

1/ 

t 1 

1/ 

t 

avg. degree: d 

s n 

t 

then G contains K stas a subgraph 

s,t g p 

[Will not prove it here. See online slides] 


For the proof we will need the following fact 

Recall: 

a 

a( 

a 1)...( 

a b 1) 

 

b 

 

 

 

b! 

Let f(x) = x(x-1)(x-2)…(x-k) 

Once x k, f(x) curves upward (convex) 

Suppose a setting: 

g(y) is convex 

Want to minimize g(xi) where xi=x TTo minimize i i i g(x ( i) ) make k each h xi = x/n / 


Node i, i degree d i and 

neighbor set Si Put node i in buckets 

for all size t subsets of 

its neighbors 

Potential right‐hand 

sides of Ks,t (i.e., all 

size t subsets of Si) 11/10/2009 Jure Leskovec, Stanford CS322: Network Analysis 14

Note: As soon as s nodes appear in a bucket 

we have a Ks,t How many buckets node i contributes? 

d i … degree of node i 

i 

What is the total size of all buckets? 

11/10/2009 Jure Leskovec, Stanford CS322: Network Analysis 15

So, So the total height of 

all buckets is… 

Plug in: 

d 

 

s 

1 

/ t 1 1 

1 / t 

n 

 

t 

a 

a( 

a 1)...( 

a b 1) 

 

b 

 

 

b 

b ! 

11/10/2009 Jure Leskovec, Stanford CS322: Network Analysis 16

We have: Total height of all buckets 

How many buckets are there? 

n s 

t 

 

t! 

t 

n 

n 

 

t t! 

 

 

What is the average height of buckets? 

s t 

s 

t n 

t ! height s 

n t 

! So, avg. bucket 

So by pigeonhole principle, principle there must be at least 

one bucket with more than s nodes in it. 


Theoretical result: 


Complete bipartite subgraphs Ks,t are embedded in 

larger dense enough graphs (i (i.e., e the communities) 

i.e., biparite subgraphs as signatures of communities 

Algorithmic result: 

Frequent itemset extraction and dynamic 

programming 

p g g 

SCALABLE!!! 


Undirected graph G(V,E): G(V E): 

Bi‐partitioning Bi partitioning task: 

Divide vertices into two disjoint groups (A,B) 

Questions: 

A 1 

5 B 

2 

3 

How can we define a “good” g ppartition 

of G? 

How can we efficiently identify such a partition? 


19 

4 

2 

6 

1 

3 

4 

5 

6

What makes a good partition? 

Maximize the number of within‐group connections 

Mi Minimize i i th the number b of f bt between‐group 

connections 

2 

1 

3 

11/8/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 20 

4 

5 

6

A 

Express partitioning objectives as a 

function of the “edge cut” of the partition 

CCut: SSet of f edges d with ihonly l one vertex in i a 

group: 

2 

1 

3 

4 

5 

6 

B 

cut(A,B) = 2 


21

Criterion: Minimum Minimum‐cut cut 

Minimise weight of connections between groups 

min minABcut(A,B) A,B cut(A,B) 

Degenerate case: 

Problem: 

Optimal cut Minimum cut 

OOnly l considers id external t lcluster l t connections ti 

Does not consider internal cluster connectivity 


22

Criterion: Normalized‐cut [ [Shi‐Malik, , ’97] ] 

Connectivity between groups relative to the density of 

each group 

Vol(A): The total weight of the edges originating from group AA. 

Why use this criterion? 

Produces more balanced partitions 

How do we efficiently find a good partition? 

Problem: Computing optimal cut is NP‐hard 

[Shi‐Malik] 


23

A: adjacency matrix of undirected G 

Aij =1 if (i,j) is an edge, else 0 

x is a vector in n x is a vector in with components (x x ) 

n with components (x1,…, xn) just a label/value of each node of G 

What is the meaning of AAx? x? 

Entry y j is a sum of labels x i of neighbors of j 


jth j coordinate of Ax: 

Sum of the x‐values 

of neighbors of j 

Make this a new value at node j 

Spectral Graph Theory: 

Analyze the “spectrum” of matrix representing G 

Spectrum: Eigenvectors of a graph, graph ordered by the 

magnitude (strength) of their corresponding 

eigenvalues: 


Suppose all nodes in G have degree d 

and G is connected 

What are some eigenvalues/vectors of G? 

Ax = x What is ? What x? 


What if G is not connected? 

Say G has 2 components, each d‐regular 

What are some eigenvectors? 

x= Put all 1s on A and 0s on B or vice versa 


Adjacency matrix (A): 

n n matrix 

A=[a A [aij] ij], a aij=1 ij 1 if edge between node i and j 

2 

1 

3 

Important properties: 

4 

5 

Symmetric matrix 

Eigenvectors are real and orthogonal 

6 

1 2 3 4 5 6 

1 0 1 1 0 1 0 

2 1 0 1 0 0 0 

3 1 1 0 1 0 0 

4 0 0 1 0 1 1 

5 1 0 0 1 0 1 

6 0 0 0 1 1 0 


28

Degree matrix (D): 

n n diagonal matrix 

D D=[d [d ii], ] d dii = ddegree of f node d i 

2 

1 

3 

4 

5 

6 

1 2 3 4 5 6 

1 3 0 0 0 0 0 

2 0 2 0 0 0 0 

3 0 0 3 0 0 0 

4 0 0 0 3 0 0 

5 0 0 0 0 3 0 

6 0 0 0 0 0 2 


29

Laplacian p matrix () (L): 

n n symmetric matrix 

2 

1 

3 

What is trivial eigenvector/ 

eigenvalue? 

Important properties: 

4 

5 

6 

1 2 3 4 5 6 

1 3 ‐1 ‐1 0 ‐1 0 

2 ‐1 2 ‐1 0 0 0 

3 ‐1 ‐1 3 ‐1 0 0 

4 0 0 ‐1 3 ‐1 ‐1 

5 ‐1 0 0 ‐1 3 ‐1 

6 0 0 0 ‐1 ‐1 2 

L = D - A 

Eigenvalues are non‐negative non negative real numbers 

Eigenvectors are real and orthogonal 


30

For symmetric matrix M: 

 

2 

min 

x 

T 

Mx 

x 

T 

Mx 

x x 

T 

What is the meaning of min xT What is the meaning of min x Lx on G? 


What else do we know about x? 

x is unit vector 

i th l t 1st x is orthogonal to 1 i t (1 1) th 

st eigenvector (1,…,1) thus: 

Then: 

( x 

x 

min 

i j 

2 All lbli f 

x 

2 

i 

All labelings of 

nodes so that 

sum(x i)=0 

 


) 

2

Express p p partition ( (A,B) , ) as a vector 

We can minimize the cut of the partition p by y 

finding a non‐trivial vector x that minimizes: 


33

The minimum value is given by the 2 nd 

smallest eigenvalue λ 2 of the Laplacian 

matrix L 

The optimal solution for x is given by the 

corresponding eigenvector λ2, referred as 

the Fiedler vector 


How to define a “good” good partition of a graph? 

Minimise a given graph cut criterion 

How to efficiently identify such a partition? 

Approximate using information provided by the 

eigenvalues and eigenvectors of a graph 

Spectral l Clustering l i 


35

Three basic stages: 

1. Pre‐processing 

Construct a matrix representation of the graph 

2. Decomposition 

Compute eigenvalues and eigenvectors of the matrix 

Map each point to a lower‐dimensional 

representation based on one or more eigenvectors 

3. Grouping 

Assign points to two or more clusters, based on the 

new representation 


36

p g 

Pre‐processing: 

Build Laplacian 

matrix L of the 

graph g p 

p 

1 2 3 4 5 6 

1 3 

‐1 ‐1 0 ‐1 0 

2 ‐1 2 ‐1 0 0 0 

3 ‐1 ‐1 3 ‐1 0 0 

4 0 0 ‐1 3 ‐1 ‐1 

5 ‐1 0 0 ‐1 3 ‐1 

Decomposition: 3.0 

0.4 0.3 0.1 0.6 ‐0.4 0.5 

Find eigenvalues 

and eigenvectors x 

of the matrix L 

Map vertices to 

corresponding p g 

components of 2 0.0 

1.0 

= X = 

1 

2 

3 

4 

0.3 

3.0 

4.0 

5.0 

0.6 

0.3 

6 0 0 0 ‐1 ‐1 2 

0.4 

0.4 

0.4 

0.4 

0.4 

0.3 

0.6 

‐0.3 

‐0.3 

‐0.6 

‐0.5 

0.4 

0.1 

‐0.5 

‐0.4 

‐0.2 

‐0.4 

0.6 

‐0.2 

‐0.4 

‐0.4 

0.4 

0.4 

0.4 

‐0.4 

‐0.5 

0.0 

‐0.5 

‐0.5 

0.0 

How do we now 

ffind dthe h clusters? l 


5 

6 

‐0.3 

‐0.3 

‐0.6

Grouping: 

Sort components of reduced 1‐dimensional vector 

Identify clusters by splitting the sorted vector in two 

How to choose h a splitting l point? 

Naïve approaches: 

Split p at 0, mean or median value 

More expensive approaches: 

Attempt to minimise normalized cut criterion in 1‐dimension 

1 

2 

3 

4 

5 

6 

03 0.3 

0.6 

0.3 

‐0.33 

‐0.3 

‐0.6 

Split p at 0: 

Cluster A: Positive points 

Cluster B: Negative points 

1 

2 

3 

03 0.3 

0.6 

0.3 


38 

4 

5 

6 

‐0.3 03 

‐0.3 

‐0.6 

A 

B


How do we partition a graph into k clusters? 

Two basic approaches: 

Recursive bi‐partitioning [Hagen et al., ’92] 

Recursively apply bi‐partitioning algorithm in a 

hierarchical divisive manner 

Disadvantages: Inefficient, unstable 

Cluster multiple p eigenvectors g [Shi‐Malik, [ , ’00] ] 

Build a reduced space from multiple eigenvectors 

Commonly used in recent papers 

A preferable f bl approach… h 


k‐eigenvector k eigenvector Algorithm [Ng et al., al ’01] 01] 

Pre‐processing: 

Construct the scaled adjacency matrix A': A : 

1/ 

2 1/ 

2 

A' 

Decomposition: 

D AD 

Find the eigenvalues and eigenvectors of A' 

Build embedded space from the eigenvectors 

corresponding to the k largest eigenvalues 

Grouping: 

Apply k‐means to reduced nk space to get k clusters 


Approximates pp the optimal p cut [ [Shi‐Malik, , ’00] ] 

Can be used to approximate the optimal k‐way normalized cut 

Emphasizes cohesive clusters 

Increases the h unevenness in i the h distribution di ib i of f the h data d 

Associations between similar points are amplified, associations 

between dissimilar points are attenuated 

Th The data dt bbegins i t to “ “approximate i t a clustering” l t i ” 

Well‐separated space 

Transforms data to a new “embedded embedded space”, space , 

consisting of k orthogonal basis vectors 

NB: Multiple eigenvectors prevent instability due to 

information f loss l 


Eigengap: 

The difference between two consecutive eigenvalues 

Most stable clustering is generally given by the 

value k that maximises eigengap: 

p 

Example: 

Eigenvalue 

50 

45 

40 

35 

30 

25 

20 

15 

10 

5 

0 

λ 1 

λ 2 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 

k 

 

k k k1 

max 

 

k 

Choose 

kk=22 11/8/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 43 

2 

1

METIS: 

Heuristic but works really well in practice 

http://glaros.dtc.umn.edu/gkhome/views/metis 

Graclus: 

Based on kernel k‐means 

http://www.cs.utexas.edu/users/dml/Software/graclus.html 

Cluto: 

http://glaros.dtc.umn.edu/gkhome/views/cluto/ 

p //g /g / / / 

Clique percorlation method: 

For finding overlapping clusters 

http://angel.elte.hu/cfinder/ 


Non‐overlapping Non overlapping vs. vs overlapping communities

slides - SNAP - Stanford University

Create successful ePaper yourself

Delete template?

Save as template?