slides - SNAP - Stanford University


CS224W: Social and Information Network Analysis<br />

Jure Leskovec, <strong>Stanford</strong> <strong>University</strong><br />

http://cs224w.stanford.edu


Task:<br />

Find coalitions in signed networks<br />

Incentives:<br />

European chocolates!<br />

Fame<br />

Up to 10% extra credit<br />

Due:<br />

Friday midnight<br />

No late days!<br />

11/8/2010 Jure Leskovec, <strong>Stanford</strong> CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2


Today: 3 methods<br />

(3) Trawling: Community signatures that can be efficiently extracted<br />

(4) Spectral graph partitioning: Laplacian matrix, 2nd eigenvector<br />

(5) Overlapping communities: Clique percolation method<br />



[Kumar et al. ‘99]<br />

Searching for small communities in Web graph<br />

(1) What is the signature of a community/discussion in a Web graph?<br />

A dense 2‐layer graph<br />

Intuition: many people all talking about the same things<br />

Use this to define topics:<br />

What the same people on the left talk about on the right<br />


(2) A more well‐defined problem:<br />

Enumerate complete bipartite subgraphs $K_{s,t}$<br />

Where $K_{s,t}$ = s nodes where each links to the same t other nodes<br />

[Kumar et al. ‘99]<br />


Two points:<br />

(1) The signature of a community/discussion<br />

(2) Complete bipartite subgraph $K_{s,t}$<br />

$K_{s,t}$ = graph on s nodes, each links to the same t other nodes<br />

Plan:<br />

(A) From (2) get back to (1):<br />

Via: Any dense enough graph contains a<br />

smaller $K_{s,t}$ as a subgraph<br />

(B) How do we solve (2) in a giant graph?<br />

[Kumar et al. ‘99]<br />

What similar problems have been solved on big non‐graph data?<br />

(3) Frequent itemset enumeration [Agrawal‐Srikant ‘99]<br />


Marketbasket analysis:<br />

What items are bought together in a store?<br />

Setting:<br />

Universe U of n items<br />

m subsets of U: $S_1, S_2, \ldots, S_m \subseteq U$<br />

($S_i$ is a set of items one person bought)<br />

Frequency threshold f<br />

Goal:<br />

Find all subsets T s.t. $T \subseteq S_i$ for at least f of the sets $S_i$ (items in T were bought together at least f times)<br />

[Agrawal‐Srikant ‘99]<br />


Example:<br />

Universe of items:<br />

U={1,2,3,4,5}<br />

Itemsets:<br />

S1={1,3,5}, S2={2,3,4}, S3={2,4,5}, S4={3,4,5}, S5={1,3,4,5}, S6={2,3,4,5}<br />

Minimum support:<br />

f=3<br />

Algorithm: Build up the lists<br />

Insight: for a frequent set of size k, all<br />

its subsets are also frequent<br />



U={1,2,3,4,5}, f=3<br />

S1={1,3,5}, S2={2,3,4}, S3={2,4,5}, S4={3,4,5}, S5={1,3,4,5}, S6={2,3,4,5}<br />

[Agrawal‐Srikant ‘99]<br />


For i = 1,…,k<br />

Find all frequent sets of size i by<br />

composing sets of size i-1 that<br />

differ in 1 element<br />

Open question:<br />

Efficiently find only<br />

maximal frequent sets<br />

[Agrawal‐Srikant ‘99]<br />
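The level-wise build-up of frequent sets can be sketched in Python as follows. This is a minimal illustration run on the example itemsets S1..S6 with f=3 from the slides; the function name `frequent_itemsets` and its structure are assumptions made for illustration, not code from the cited work.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset contained in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({x for t in transactions for x in t})
    level = [frozenset([x]) for x in items if support(frozenset([x])) >= min_support]
    result = {}
    size = 1
    while level:
        for s in level:
            result[s] = support(s)
        frequent = set(level)
        # Join step: union two frequent sets of the current size that differ in one element.
        candidates = {a | b for a in level for b in level if len(a | b) == size + 1}
        # Prune step: every subset of a frequent set must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, size))}
        level = [c for c in candidates if support(c) >= min_support]
        size += 1
    return result

sets = [{1, 3, 5}, {2, 3, 4}, {2, 4, 5}, {3, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}]
found = frequent_itemsets(sets, min_support=3)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

On this input the only frequent set of size 3 is {3,4,5}, and item 1 drops out at the first level because it appears in only two itemsets.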


[Kumar et al. ‘99]<br />

Claim: (3) (itemsets) solves (2) (bipartite graphs)<br />

How?<br />

View each node i as a set $S_i$ of nodes i points to<br />

$K_{s,t}$ = a set Y of size t that occurs in s sets $S_i$<br />

Looking for $K_{s,t}$: set the frequency threshold to s and look at layer t – all frequent sets of size t<br />


[Kumar et al. ‘99]<br />

(2) → (1): Informally, every dense enough graph G contains a bipartite $K_{s,t}$ subgraph, where s and t depend on the size (# of nodes) and density (avg. degree) of G [Kovari‐Sos‐Turan ‘53]<br />

Theorem: Let G=(X,Y,E), |X|=|Y|=n with avg. degree:<br />

$$\bar{d} = s^{1/t} n^{1-1/t} + t$$

then G contains $K_{s,t}$ as a subgraph<br />

[Will not prove it here. See online <strong>slides</strong>]<br />


For the proof we will need the following fact<br />

Recall:<br />

$$\binom{a}{b} = \frac{a(a-1)\cdots(a-b+1)}{b!}$$

Let f(x) = x(x-1)(x-2)…(x-k)<br />

Once x ≥ k, f(x) curves upward (convex)<br />

Suppose a setting:<br />

g(y) is convex<br />

Want to minimize $\sum_{i=1}^{n} g(x_i)$ where $\sum_{i=1}^{n} x_i = x$<br />

To minimize $\sum_{i=1}^{n} g(x_i)$ make each $x_i = x/n$<br />


Node i, degree $d_i$ and neighbor set $S_i$<br />

Put node i in buckets for all size t subsets of its neighbors<br />

Potential right‐hand sides of $K_{s,t}$ (i.e., all size t subsets of $S_i$)<br />

11/10/2009 Jure Leskovec, <strong>Stanford</strong> CS322: Network Analysis 14


Note: As soon as s nodes appear in a bucket we have a $K_{s,t}$<br />

How many buckets does node i contribute to?<br />

$\binom{d_i}{t}$, where $d_i$ … degree of node i<br />

What is the total size of all buckets?<br />
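The bucket construction can be sketched directly in Python. This is a brute-force illustration on a hypothetical toy graph, not the scalable itemset-based method: it enumerates every size-t subset of each neighbor set, so it is only practical for small degrees. The function name `find_kst` and the graph are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def find_kst(out_neighbors, s, t):
    """Return (left_nodes, right_nodes) pairs where >= s nodes all link to the same t nodes."""
    buckets = defaultdict(set)
    for node, nbrs in out_neighbors.items():
        # A node lands in one bucket per size-t subset of its neighbors.
        for subset in combinations(sorted(nbrs), t):
            buckets[subset].add(node)
    # Any bucket holding at least s nodes is a K_{s,t}.
    return [(sorted(left), list(right))
            for right, left in buckets.items() if len(left) >= s]

# Hypothetical toy graph: a, b and c all point to x and y.
graph = {"a": {"x", "y"}, "b": {"x", "y", "z"}, "c": {"x", "y"}, "d": {"z"}}
print(find_kst(graph, s=3, t=2))
```

Here node b contributes to three buckets (one per pair of its three neighbors), but only the bucket (x, y) collects three nodes.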


So, the total height of all buckets is…<br />

Plug in:<br />

$$\bar{d} = s^{1/t} n^{1-1/t} + t$$

$$\binom{a}{b} = \frac{a(a-1)\cdots(a-b+1)}{b!}$$


We have: Total height of all buckets<br />

$$\ge \frac{s\,n^{t}}{t!}$$

How many buckets are there?<br />

$$\binom{n}{t} \le \frac{n^{t}}{t!}$$

What is the average height of buckets?<br />

$$\text{avg. bucket height} \ge \frac{s\,n^{t}/t!}{n^{t}/t!} = s$$

So by pigeonhole principle, there must be at least one bucket with at least s nodes in it.<br />
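For reference, the counting argument can be written as one chain. This is a sketch following the standard proof of this bound, with $d_i$ the degree of node $i$ and $\bar d$ the average degree; the intermediate bound $\bar d(\bar d-1)\cdots(\bar d-t+1) \ge (\bar d-t)^t$ is a step supplied here, not one that survives in the slide text.

```latex
\begin{align*}
\text{total height}
  &= \sum_{i=1}^{n} \binom{d_i}{t}
   \;\ge\; n \binom{\bar d}{t}
   && \text{(convexity: the sum is smallest when every } d_i = \bar d\text{)} \\
  &= n \,\frac{\bar d(\bar d-1)\cdots(\bar d-t+1)}{t!}
   \;\ge\; n \,\frac{(\bar d - t)^{t}}{t!} \\
  &= n \,\frac{\bigl(s^{1/t} n^{1-1/t}\bigr)^{t}}{t!}
   = \frac{s\,n^{t}}{t!}
   && \text{(plugging in } \bar d = s^{1/t} n^{1-1/t} + t\text{)} \\
\text{number of buckets}
  &= \binom{n}{t} \;\le\; \frac{n^{t}}{t!} \\
\text{average height}
  &\ge \frac{s\,n^{t}/t!}{n^{t}/t!} = s
\end{align*}
```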


Theoretical result:<br />

[Kumar et al. ‘99]<br />

Complete bipartite subgraphs Ks,t are embedded in<br />

larger dense enough graphs (i.e., the communities)<br />

i.e., bipartite subgraphs as signatures of communities<br />

Algorithmic result:<br />

Frequent itemset extraction and dynamic<br />

programming<br />

SCALABLE!!!<br />


Undirected graph G(V,E):<br />

Bi‐partitioning task:<br />

Divide vertices into two disjoint groups (A,B)<br />

[Figure: example graph on nodes 1–6 split into groups A and B]<br />

Questions:<br />

How can we define a “good” partition of G?<br />

How can we efficiently identify such a partition?<br />


What makes a good partition?<br />

Maximize the number of within‐group connections<br />

Minimize the number of between‐group<br />

connections<br />

[Figure: example graph on nodes 1–6]<br />


Express partitioning objectives as a<br />

function of the “edge cut” of the partition<br />

Cut: Set of edges with only one vertex in a<br />

group:<br />

$$cut(A,B) = \sum_{i \in A,\, j \in B} w_{ij}$$

(with $w_{ij}$ the weight of edge (i,j), 1 for an unweighted edge)<br />

[Figure: example graph on nodes 1–6 split into groups A and B]<br />

cut(A,B) = 2<br />


Criterion: Minimum‐cut<br />

Minimise weight of connections between groups<br />

$$\min_{A,B} cut(A,B)$$

Degenerate case:<br />

[Figure: degenerate example contrasting the “Optimal cut” with the “Minimum cut”]<br />

Problem:<br />

Only considers external cluster connections<br />

Does not consider internal cluster connectivity<br />


Criterion: Normalized‐cut [Shi‐Malik, ’97]<br />

Connectivity between groups relative to the density of<br />

each group<br />

$$ncut(A,B) = \frac{cut(A,B)}{vol(A)} + \frac{cut(A,B)}{vol(B)}$$

vol(A): The total weight of the edges originating from group A.<br />

Why use this criterion?<br />

Produces more balanced partitions<br />

How do we efficiently find a good partition?<br />

Problem: Computing optimal cut is NP‐hard<br />

[Shi‐Malik]<br />
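The two criteria can be compared numerically. The sketch below uses the adjacency matrix of the 6-node example graph that appears on the adjacency-matrix slide and assumes the usual definitions: cut(A,B) is the number of edges between the groups, vol(A) is the sum of degrees in A, and the normalized cut is cut/vol(A) + cut/vol(B). The split {1,2,3} versus {4,5,6} is an assumption chosen because it reproduces cut(A,B) = 2; the helper names are illustrative.

```python
import numpy as np

# Adjacency matrix of the 6-node example graph (nodes 1..6 map to indices 0..5).
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

def cut(A, group_a, group_b):
    """Total weight of edges with one endpoint in each group."""
    return A[np.ix_(group_a, group_b)].sum()

def vol(A, group):
    """Total weight of edges originating from the group (sum of degrees)."""
    return A[group, :].sum()

def ncut(A, group_a, group_b):
    c = cut(A, group_a, group_b)
    return c / vol(A, group_a) + c / vol(A, group_b)

balanced = ([0, 1, 2], [3, 4, 5])        # nodes {1,2,3} vs {4,5,6}
lopsided = ([1], [0, 2, 3, 4, 5])        # node 2 alone vs the rest
print(cut(A, *balanced), ncut(A, *balanced))
print(cut(A, *lopsided), ncut(A, *lopsided))
```

Both splits cut two edges, so minimum cut cannot tell them apart, while the normalized cut is clearly lower for the balanced split.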


A: adjacency matrix of undirected G<br />

$A_{ij} = 1$ if (i,j) is an edge, else 0<br />

x is a vector in $\mathbb{R}^n$ with components $(x_1, \ldots, x_n)$<br />

just a label/value of each node of G<br />

What is the meaning of Ax?<br />

Entry $y_j$ is a sum of labels $x_i$ of neighbors of j<br />
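A quick numerical check of this reading of Ax, using the adjacency matrix of the 6-node example graph from the adjacency-matrix slide. The labels in `x` are hypothetical values picked so the sums are easy to read.

```python
import numpy as np

# Adjacency matrix of the 6-node example graph (nodes 1..6 map to indices 0..5).
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

x = np.array([10, 20, 30, 40, 50, 60])   # hypothetical label for each node
y = A @ x                                # y[j] = sum of labels of j's neighbors
print(y)
```

Node 1 has neighbors 2, 3 and 5, so its new value is 20 + 30 + 50 = 100.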


j‐th coordinate of Ax:<br />

Sum of the x‐values<br />

of neighbors of j<br />

Make this a new value at node j<br />

Spectral Graph Theory:<br />

Analyze the “spectrum” of matrix representing G<br />

Spectrum: Eigenvectors of a graph, ordered by the<br />

magnitude (strength) of their corresponding<br />

eigenvalues: $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_n\}$<br />


Suppose all nodes in G have degree d<br />

and G is connected<br />

What are some eigenvalues/vectors of G?<br />

$Ax = \lambda x$. What is $\lambda$? What is x?<br />


What if G is not connected?<br />

Say G has 2 components, each d‐regular<br />

What are some eigenvectors?<br />

x= Put all 1s on A and 0s on B or vice versa<br />

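Both questions can be checked numerically. The sketch below uses hypothetical small graphs: a 4-cycle (connected, 2-regular) and two disjoint triangles (two 2-regular components). It shows that the all-ones vector satisfies Ax = dx, and that in the disconnected case the indicator vector of each component does too. The helper `cycle_adjacency` is an illustrative assumption.

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of a cycle on n nodes (every node has degree 2)."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

# Connected d-regular graph: the all-ones vector is an eigenvector with eigenvalue d.
C4 = cycle_adjacency(4)
ones = np.ones(4, dtype=int)
print(C4 @ ones)                 # every entry equals d = 2

# Two components, each 2-regular: put 1s on one component and 0s on the other.
T = cycle_adjacency(3)
Z = np.zeros((3, 3), dtype=int)
G = np.block([[T, Z], [Z, T]])
x_a = np.array([1, 1, 1, 0, 0, 0])
x_b = 1 - x_a
print(G @ x_a, G @ x_b)          # equal to 2 * x_a and 2 * x_b
```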


Adjacency matrix (A):<br />

n × n matrix<br />

$A=[a_{ij}]$, $a_{ij}=1$ if there is an edge between nodes i and j<br />

1 2 3 4 5 6<br />
1 0 1 1 0 1 0<br />
2 1 0 1 0 0 0<br />
3 1 1 0 1 0 0<br />
4 0 0 1 0 1 1<br />
5 1 0 0 1 0 1<br />
6 0 0 0 1 1 0<br />

Important properties:<br />

Symmetric matrix<br />

Eigenvectors are real and orthogonal<br />


Degree matrix (D):<br />

n × n diagonal matrix<br />

$D=[d_{ii}]$, $d_{ii}$ = degree of node i<br />

1 2 3 4 5 6<br />
1 3 0 0 0 0 0<br />
2 0 2 0 0 0 0<br />
3 0 0 3 0 0 0<br />
4 0 0 0 3 0 0<br />
5 0 0 0 0 3 0<br />
6 0 0 0 0 0 2<br />


Laplacian matrix (L):<br />

n × n symmetric matrix<br />

L = D - A<br />

1 2 3 4 5 6<br />
1 3 ‐1 ‐1 0 ‐1 0<br />
2 ‐1 2 ‐1 0 0 0<br />
3 ‐1 ‐1 3 ‐1 0 0<br />
4 0 0 ‐1 3 ‐1 ‐1<br />
5 ‐1 0 0 ‐1 3 ‐1<br />
6 0 0 0 ‐1 ‐1 2<br />

What is trivial eigenvector/eigenvalue?<br />

Important properties:<br />

Eigenvalues are non‐negative real numbers<br />

Eigenvectors are real and orthogonal<br />
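The three matrices can be built from an edge list in a few lines. The sketch below uses the edges read off the adjacency matrix of the 6-node example graph; variable names are illustrative. It also checks the trivial eigenpair: multiplying L by the all-ones vector gives the zero vector, so (1,…,1) is an eigenvector with eigenvalue 0.

```python
import numpy as np

# Edges of the 6-node example graph, read off its adjacency matrix (1-based node ids).
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1   # undirected graph: symmetric matrix

D = np.diag(A.sum(axis=1))                  # degrees on the diagonal
L = D - A                                   # graph Laplacian

print(L)
print(L @ np.ones(n, dtype=int))            # all zeros: (1,...,1) has eigenvalue 0
print(np.linalg.eigvalsh(L))                # all eigenvalues are non-negative
```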


For symmetric matrix M:<br />

$$\lambda_2 = \min_{x} \frac{x^T M x}{x^T x}$$

(minimum taken over x orthogonal to the eigenvector of the smallest eigenvalue)<br />

What is the meaning of min $x^T L x$ on G?<br />


What else do we know about x?<br />

x is unit vector: $\sum_i x_i^2 = 1$<br />

x is orthogonal to 1st eigenvector (1,…,1) thus: $\sum_i x_i \cdot 1 = \sum_i x_i = 0$<br />

Then:<br />

$$\lambda_2 = \min_{x:\ \sum_i x_i = 0} \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}$$

(minimum over all labelings of nodes so that sum($x_i$)=0)<br />
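The numerator relies on the identity $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$, which is easy to confirm numerically. The sketch below checks it on the 6-node example graph for an arbitrary vector; the random seed and variable names are assumptions made for illustration.

```python
import numpy as np

# Edges of the 6-node example graph (1-based node ids).
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6

A = np.zeros((n, n))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
x = rng.normal(size=n)                       # arbitrary labeling of the nodes

quadratic_form = x @ L @ x
edge_sum = sum((x[i - 1] - x[j - 1]) ** 2 for i, j in edges)
print(quadratic_form, edge_sum)              # the two values agree
```

So minimising $x^T L x$ asks for labels that differ as little as possible across edges.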


Express partition (A,B) as a vector<br />

$$x_i = \begin{cases} +1 & \text{if } i \in A \\ -1 & \text{if } i \in B \end{cases}$$

We can minimize the cut of the partition by<br />

finding a non‐trivial vector x that minimizes:<br />

$$f(x) = \sum_{(i,j) \in E} (x_i - x_j)^2$$


The minimum value is given by the 2nd smallest eigenvalue $\lambda_2$ of the Laplacian matrix L<br />

The optimal solution for x is given by the eigenvector corresponding to $\lambda_2$, referred to as the Fiedler vector<br />
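A minimal sketch of computing the Fiedler vector with NumPy for the 6-node example graph. It assumes `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order, so column 1 of the eigenvector matrix belongs to the second smallest eigenvalue. The sign of an eigenvector is arbitrary, so only the grouping by sign is meaningful, not which side is positive.

```python
import numpy as np

# Adjacency matrix of the 6-node example graph (nodes 1..6 map to indices 0..5).
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
L = np.diag(A.sum(axis=1)) - A

eigenvalues, eigenvectors = np.linalg.eigh(L)   # ascending eigenvalues
fiedler = eigenvectors[:, 1]                    # eigenvector of the 2nd smallest eigenvalue

# Split the nodes by the sign of their Fiedler-vector component (1-based ids).
group_pos = {int(i) + 1 for i in np.flatnonzero(fiedler > 0)}
group_neg = {int(i) + 1 for i in np.flatnonzero(fiedler <= 0)}

print(np.round(eigenvalues, 3))
print(np.round(fiedler, 2))
print(group_pos, group_neg)
```

For this graph the second smallest eigenvalue is 1, and the sign split separates nodes {1,2,3} from {4,5,6}.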


How to define a “good” partition of a graph?<br />

Minimise a given graph cut criterion<br />

How to efficiently identify such a partition?<br />

Approximate using information provided by the<br />

eigenvalues and eigenvectors of a graph<br />

Spectral Clustering<br />


Three basic stages:<br />

1. Pre‐processing<br />

Construct a matrix representation of the graph<br />

2. Decomposition<br />

Compute eigenvalues and eigenvectors of the matrix<br />

Map each point to a lower‐dimensional<br />

representation based on one or more eigenvectors<br />

3. Grouping<br />

Assign points to two or more clusters, based on the<br />

new representation<br />



Pre‐processing:<br />

Build Laplacian matrix L of the graph<br />

1 2 3 4 5 6<br />
1 3 ‐1 ‐1 0 ‐1 0<br />
2 ‐1 2 ‐1 0 0 0<br />
3 ‐1 ‐1 3 ‐1 0 0<br />
4 0 0 ‐1 3 ‐1 ‐1<br />
5 ‐1 0 0 ‐1 3 ‐1<br />
6 0 0 0 ‐1 ‐1 2<br />

Decomposition:<br />

Find eigenvalues λ and eigenvectors x of the matrix L<br />

Λ = (0.0, 1.0, 3.0, 3.0, 4.0, 5.0)<br />

X (one eigenvector per column, in the same order as Λ; one row per node) =<br />
0.4 0.3 ‐0.5 ‐0.2 ‐0.4 ‐0.5<br />
0.4 0.6 0.4 ‐0.4 0.4 0.0<br />
0.4 0.3 0.1 0.6 ‐0.4 0.5<br />
0.4 ‐0.3 0.1 0.6 0.4 ‐0.5<br />
0.4 ‐0.3 ‐0.5 ‐0.2 0.4 0.5<br />
0.4 ‐0.6 0.4 ‐0.4 ‐0.4 0.0<br />

Map vertices to corresponding components of the eigenvector for λ2:<br />

1: 0.3, 2: 0.6, 3: 0.3, 4: ‐0.3, 5: ‐0.3, 6: ‐0.6<br />

How do we now find the clusters?<br />


Grouping:<br />

Sort components of reduced 1‐dimensional vector<br />

Identify clusters by splitting the sorted vector in two<br />

How to choose a splitting point?<br />

Naïve approaches:<br />

Split at 0, mean or median value<br />

More expensive approaches:<br />

Attempt to minimise normalized cut criterion in 1‐dimension<br />

1: 0.3, 2: 0.6, 3: 0.3, 4: ‐0.3, 5: ‐0.3, 6: ‐0.6<br />

Split at 0:<br />

Cluster A: Positive points<br />

Cluster B: Negative points<br />

A: 1 (0.3), 2 (0.6), 3 (0.3)<br />

B: 4 (‐0.3), 5 (‐0.3), 6 (‐0.6)<br />
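The "more expensive" option can be sketched as a sweep: sort the nodes by their component in the 1-dimensional vector, try every split point along that order, and keep the one with the lowest normalized cut. This is an illustrative sketch, assuming normalized cut = cut/vol(A) + cut/vol(B) and a graph with no isolated nodes; the function name `best_sweep_split` is hypothetical. It is run on the 6-node example graph with the vector values shown above.

```python
import numpy as np

# Adjacency matrix of the 6-node example graph (nodes 1..6 map to indices 0..5).
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

def best_sweep_split(A, x):
    """Try every split of the nodes sorted by x; return the one with the lowest normalized cut."""
    order = np.argsort(-np.asarray(x, dtype=float), kind="stable")
    degrees = A.sum(axis=1)
    total = degrees.sum()
    best_score, best_size = np.inf, None
    for size in range(1, len(order)):
        left, right = order[:size], order[size:]
        c = A[np.ix_(left, right)].sum()          # edges crossing the split
        vol_left = degrees[left].sum()
        score = c / vol_left + c / (total - vol_left)
        if score < best_score:
            best_score, best_size = score, size
    group_a = sorted(int(i) + 1 for i in order[:best_size])
    group_b = sorted(int(i) + 1 for i in order[best_size:])
    return group_a, group_b, best_score

x = [0.3, 0.6, 0.3, -0.3, -0.3, -0.6]             # 2nd-eigenvector components from the slide
group_a, group_b, score = best_sweep_split(A, x)
print(group_a, group_b, score)
```

On this graph the sweep agrees with the naive split at 0.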




How do we partition a graph into k clusters?<br />

Two basic approaches:<br />

Recursive bi‐partitioning [Hagen et al., ’92]<br />

Recursively apply bi‐partitioning algorithm in a<br />

hierarchical divisive manner<br />

Disadvantages: Inefficient, unstable<br />

Cluster multiple eigenvectors [Shi‐Malik, ’00]<br />

Build a reduced space from multiple eigenvectors<br />

Commonly used in recent papers<br />

A preferable approach…<br />


k‐eigenvector Algorithm [Ng et al., ’01]<br />

Pre‐processing:<br />

Construct the scaled adjacency matrix A':<br />

$$A' = D^{-1/2} A D^{-1/2}$$

Decomposition:<br />

Find the eigenvalues and eigenvectors of A'<br />

Build embedded space from the eigenvectors<br />

corresponding to the k largest eigenvalues<br />

Grouping:<br />

Apply k‐means to reduced n×k space to get k clusters<br />
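The three stages can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not a reference implementation of the cited algorithm: it scales the adjacency matrix as $D^{-1/2} A D^{-1/2}$, embeds each node using the eigenvectors of the k largest eigenvalues, and groups the rows with a small hand-written k-means that uses a deterministic farthest-point initialisation (an assumption made here so the output is reproducible). It assumes no node has degree 0 and omits any further normalisation of the embedding.

```python
import numpy as np

def k_eigenvector_clusters(A, k, n_iter=50):
    """Cluster the nodes of a graph into k groups via the top-k eigenvectors of D^-1/2 A D^-1/2."""
    A = np.asarray(A, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))           # assumes every degree is > 0
    A_scaled = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    _, vecs = np.linalg.eigh(A_scaled)                   # eigenvalues in ascending order
    X = vecs[:, -k:]                                     # n x k embedding (k largest eigenvalues)

    # k-means with farthest-point initialisation.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Adjacency matrix of the 6-node example graph (nodes 1..6 map to indices 0..5).
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
labels = k_eigenvector_clusters(A, k=2)
print(labels)
```

In practice a library k-means with several random restarts would usually replace the hand-written loop.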


Approximates the optimal cut [Shi‐Malik, ’00]<br />

Can be used to approximate the optimal k‐way normalized cut<br />

Emphasizes cohesive clusters<br />

Increases the unevenness in the distribution of the data<br />

Associations between similar points are amplified, associations<br />

between dissimilar points are attenuated<br />

The data begins to “approximate a clustering”<br />

Well‐separated space<br />

Transforms data to a new “embedded space”,<br />

consisting of k orthogonal basis vectors<br />

NB: Multiple eigenvectors prevent instability due to<br />

information loss<br />


Eigengap:<br />

The difference between two consecutive eigenvalues<br />

Most stable clustering is generally given by the<br />

value k that maximises eigengap:<br />

$$\Delta_k = |\lambda_k - \lambda_{k-1}|$$

Example:<br />

[Figure: plot of eigenvalue (y‐axis, 0–50) against k (x‐axis, 1–20), with λ1 and λ2 marked]<br />

$$\max_k \Delta_k = |\lambda_2 - \lambda_1|$$

Choose k=2<br />
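The selection rule is a one-liner once the eigenvalues are in hand. The sketch below takes the eigengap to be $\Delta_k = |\lambda_k - \lambda_{k-1}|$ and returns the k with the largest gap. The eigenvalue list is hypothetical, chosen only to resemble a spectrum with one dominant gap, and the function name is illustrative.

```python
def choose_k_by_eigengap(eigenvalues):
    """eigenvalues[0] is lambda_1, eigenvalues[1] is lambda_2, ...

    Returns the k (2 <= k <= n) that maximises |lambda_k - lambda_{k-1}|.
    """
    gaps = {
        k: abs(eigenvalues[k - 1] - eigenvalues[k - 2])
        for k in range(2, len(eigenvalues) + 1)
    }
    return max(gaps, key=gaps.get)

# Hypothetical eigenvalues, ordered as lambda_1, lambda_2, ...
lams = [48.0, 22.0, 19.0, 17.5, 16.0, 15.0]
print(choose_k_by_eigengap(lams))
```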


METIS:<br />

Heuristic but works really well in practice<br />

http://glaros.dtc.umn.edu/gkhome/views/metis<br />

Graclus:<br />

Based on kernel k‐means<br />

http://www.cs.utexas.edu/users/dml/Software/graclus.html<br />

Cluto:<br />

http://glaros.dtc.umn.edu/gkhome/views/cluto/<br />

Clique percolation method:<br />

For finding overlapping clusters<br />

http://angel.elte.hu/cfinder/<br />



Non‐overlapping vs. overlapping communities<br />
