
Machine learning in complex networks


Muzeeker
• Wikipedia-based common sense
• Wikipedia used as a proxy for the music user's mental model
• Implementation: filter retrieval using Wikipedia's articles/categories
• Muzeeker.com
• Link prediction to complete the ontological quality of Wikipedia


Network models
• Nodes/vertices and links/edges
  – Directed / undirected
  – Weighted / un-weighted
• Link distributions
  – Random
  – Long tail
  – Hubs and authorities
• Link-induced correlations
  – The rich club
• Communities
  – Link prediction


Motivation for community detection
• Community structure may mark a non-stationary link distribution with "high and low density" sub-networks, hence summarizing with a single "model" could be misleading


Modularity can be predictive for dynamics

M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69, 026113 (2004).


Modularity objective function
The modularity is expressed as a sum over links, such that we penalize missing links in communities - missing is measured relative to a null distribution P⁰_ij.

Q = Σ_ij [ A_ij/(2m) − P_ij ] δ(c_i, c_j)

c_i is the community assignment of node i, 2m = Σ_ij A_ij, and k_i = Σ_j A_ij.

The null is a baseline distribution P_ij = k_i k_j / (2m)²

The value of the modularity lies in the range [−1, 1]. It is positive if the number of edges within groups exceeds the number expected on the basis of chance.

M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69:026113, 2004, cond-mat/0308217.
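As a sanity check, the objective can be sketched in a few lines of NumPy. This is an illustrative toy, not code from the slides; the two-clique example graph is an assumption:

```python
import numpy as np

def modularity(A, c):
    """Newman-Girvan modularity Q for an undirected, unweighted network.

    A : (n, n) symmetric adjacency matrix
    c : length-n array of community labels
    """
    k = A.sum(axis=1)                   # degrees, k_i = sum_j A_ij
    two_m = A.sum()                     # 2m = sum_ij A_ij
    P = np.outer(k, k) / two_m**2       # null model P_ij = k_i k_j / (2m)^2
    delta = (c[:, None] == c[None, :])  # delta(c_i, c_j)
    return ((A / two_m - P) * delta).sum()

# toy example: two 3-cliques joined by a single link
A = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)
A[2, 3] = A[3, 2] = 1
c = np.array([0, 0, 0, 1, 1, 1])
print(round(modularity(A, c), 3))  # → 0.357
```

Splitting along the two cliques gives Q = 5/14 > 0, i.e. more within-group edges than the degree-matched null expects.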


Potts representation
Introduce 0/1 binary variables S_kj coding the community assignment: "node j is member of community k"

δ(c_i, c_j) = Σ_k S_ki S_kj

P(j, i) = A_ij / (2m)

Q = Σ_ij [ A_ij/(2m) − P_ij ] δ(c_i, c_j) = Σ_ij [ A_ij/(2m) − P_ij ] Σ_k S_ki S_kj

Q = (1/2m) Σ_ijk B_ij S_ki S_kj = Tr(S B S′) / (2m)
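A quick numerical check that the Potts trace form reproduces the δ-form of Q. The small random graph is an assumption made for the check, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
# random symmetric 0/1 adjacency matrix, no self-links
A = rng.integers(0, 2, (8, 8))
A = np.triu(A, 1)
A = A + A.T
k = A.sum(axis=1)
two_m = A.sum()
B = A - np.outer(k, k) / two_m      # modularity matrix B_ij = A_ij - k_i k_j/(2m)

c = rng.integers(0, 3, 8)           # arbitrary assignment into K = 3 communities
S = np.zeros((3, 8))
S[c, np.arange(8)] = 1              # Potts indicators: S_kj = 1 iff node j in community k

Q_delta = ((A / two_m - np.outer(k, k) / two_m**2)
           * (c[:, None] == c[None, :])).sum()
Q_trace = np.trace(S @ B @ S.T) / two_m
print(np.isclose(Q_delta, Q_trace))  # → True
```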


Spectral optimization
• Newman relaxes the optimization problem to the simplex

Q = (1/2m) Σ_ijk B_ij S_ki S_kj = Tr(S B S′) / (2m)

L = Tr(S B S′)/(2m) + Tr(Λ S)

B S = S Λ
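The relaxed problem is an eigenproblem for the modularity matrix B, so a two-way split can be read off the leading eigenvector. A minimal sketch of this spectral bisection, again on an assumed two-clique toy graph:

```python
import numpy as np

def spectral_bisection(A):
    """Two-way community split from the leading eigenvector of the
    modularity matrix B = A - k k^T / (2m) (Newman's spectral relaxation)."""
    k = A.sum(axis=1)
    two_m = A.sum()
    B = A - np.outer(k, k) / two_m
    w, V = np.linalg.eigh(B)
    leading = V[:, np.argmax(w)]      # eigenvector of the largest eigenvalue
    return (leading > 0).astype(int)  # its sign pattern gives the bisection

# two 3-cliques joined by a single link: the cliques get opposite labels
A = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)
A[2, 3] = A[3, 2] = 1
c = spectral_bisection(A)
print(c)
```

The overall sign of the eigenvector is arbitrary, so only the grouping (not which label is 0 or 1) is meaningful.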


Combinatorial optimization
• We can use a physics analogy: Simulated Annealing (Kirkpatrick et al. 1983)

P(S | A, T) ∝ exp( Q(S)/T ) = exp( Tr(S B S′) / (2mT) )

• Gibbs sampling is a Monte Carlo realization of a Markov process in which each variable is randomly assigned according to its marginal distribution

P(S_j | S_−j, A, T) = P(S | A, T) / Σ_{S_j} P(S | A, T)

S. Geman, D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (6): 721–741 (1984)


Potts model 1-node
• Discrete probability distribution on states k = 1, …, K

P(S | A, T) ∝ Π_k exp( S_k φ_k / T ),  with Σ_{k=1}^K S_k = 1

r_k = P(S_k = 1 | A, T) = exp(φ_k/T) / Σ_{k′} exp(φ_{k′}/T)


Gibbs sampling

φ_ki = Σ_j (B_ij/(2m)) S_kj = Σ_j (A_ij/(2m)) S_kj − Σ_j (k_i k_j/(2m)²) S_kj

r_ki = exp(φ_ki/T) / Σ_{k′} exp(φ_{k′i}/T)

S_i = potts(r_i)
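This update can be sketched as a naive Gibbs sweep over the nodes: compute the modularity fields φ_ki, softmax them at temperature T, and redraw the node's label. An illustrative toy implementation, not the authors' code; the temperature, sweep count, and example graph are assumptions:

```python
import numpy as np

def gibbs_modularity(A, K, T=0.05, sweeps=100, seed=1):
    """Gibbs sampling of community labels under P(S | A, T) ∝ exp(Q(S)/T).

    Each node i is redrawn from the softmax of its fields phi_ki;
    the symmetric double-count factor is absorbed into T."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    k = A.sum(axis=1)
    two_m = A.sum()
    c = rng.integers(0, K, n)                  # random initial assignment
    for _ in range(sweeps):
        for i in range(n):
            phi = np.empty(K)
            for kk in range(K):
                mem = (c == kk)
                mem[i] = False                 # node i's own term is constant in k
                # phi_ki = sum_j A_ij S_kj/(2m) - sum_j k_i k_j S_kj/(2m)^2
                phi[kk] = A[i, mem].sum() / two_m - k[i] * k[mem].sum() / two_m**2
            r = np.exp((phi - phi.max()) / T)  # r_ki ∝ exp(phi_ki / T)
            r /= r.sum()
            c[i] = rng.choice(K, p=r)          # S_i = potts(r_i)
    return c

# toy example: two 3-cliques joined by a single link
A = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)
A[2, 3] = A[3, 2] = 1
labels = gibbs_modularity(A, K=2)
print(labels)
```

On this toy graph the sampler typically settles on the two cliques; at low T it behaves nearly greedily, at high T the labels stay random.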


Deterministic annealing
• Instead of drawing Gibbs samples according to the marginals we can average instead; this provides a set of self-consistent equations for the means (for 0/1 Bernoulli variables the mean is the probability μ_ki = P(S_ki))

r_ki = exp(φ_ki/T) / Σ_{k′} exp(φ_{k′i}/T)

φ_ki = Σ_j (B_ij/(2m)) r_kj = Σ_j (A_ij/(2m)) r_kj − Σ_j P_ij r_kj

S. Lehmann, L.K. Hansen: Deterministic modularity optimization. European Physical Journal B 60(1) 83-88 (2007).
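The self-consistent equations lend themselves to a compact fixed-point iteration over the mean assignments r. This is a sketch under assumed defaults (temperature, iteration count, toy graph), not the paper's implementation:

```python
import numpy as np

def mean_field_modularity(A, K, T=0.05, iters=200, seed=1):
    """Deterministic annealing: iterate the mean-field equations
    r_ki = softmax_k(phi_ki / T), phi_ki = sum_j B_ij r_kj / (2m)."""
    rng = np.random.default_rng(seed)
    k = A.sum(axis=1)
    two_m = A.sum()
    B = A - np.outer(k, k) / two_m            # B_ij = A_ij - k_i k_j/(2m)
    n = A.shape[0]
    r = rng.dirichlet(np.ones(K), size=n).T   # (K, n) soft assignments, columns sum to 1
    for _ in range(iters):
        phi = (r @ B) / two_m                 # phi_ki = sum_j B_ij r_kj / (2m)
        e = np.exp((phi - phi.max(axis=0)) / T)
        r = e / e.sum(axis=0)
    return r.argmax(axis=0)                   # hard assignment from the means

A = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)
A[2, 3] = A[3, 2] = 1
out = mean_field_modularity(A, K=2)
print(out)
```

The uniform r is always a fixed point (B has zero row sums), so the random initialization is what breaks the symmetry.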


Experimental evaluation
• Create a simple testbed with link probability and "noise"

S. Lehmann, L.K. Hansen: Deterministic modularity optimization. European Physical Journal B 60(1) 83-88 (2007).


S. Lehmann, L.K. Hansen: Determ<strong>in</strong>istic modularity optimizationEuropean Physical Journal B 60(1) 83-88 (2007).


Generative community model (Hofman & Wiggins, 2008)

P(A | S, p, q) = p^c (1−p)^d q^e (1−q)^f

c = ½ Σ_{i, j≠i} Σ_k A_ij S_kj S_ki
d = ½ Σ_{i, j≠i} Σ_k (1 − A_ij) S_kj S_ki
e = ½ Σ_{i, j≠i} A_ij ( 1 − Σ_k S_kj S_ki )
f = ½ Σ_{i, j≠i} (1 − A_ij) ( 1 − Σ_k S_kj S_ki )
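The four counts (within-community links c, within non-links d, between links e, between non-links f) and the resulting likelihood can be written directly. A toy sketch; the example graph and the values of p and q are assumptions:

```python
import numpy as np

def log_likelihood(A, c, p, q):
    """log P(A | S, p, q) for the planted-partition model:
    within-community pairs link with prob. p, between-community with q."""
    same = (c[:, None] == c[None, :])        # delta(c_i, c_j)
    off = ~np.eye(len(c), dtype=bool)        # exclude self-pairs (j != i)
    half = 0.5                               # each unordered pair appears twice
    c_ = half * (A * same * off).sum()       # within-community links
    d_ = half * ((1 - A) * same * off).sum() # within-community non-links
    e_ = half * (A * ~same * off).sum()      # between-community links
    f_ = half * ((1 - A) * ~same * off).sum()# between-community non-links
    return (c_ * np.log(p) + d_ * np.log(1 - p)
            + e_ * np.log(q) + f_ * np.log(1 - q))

A = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)
A[2, 3] = A[3, 2] = 1
c = np.array([0, 0, 0, 1, 1, 1])
print(round(log_likelihood(A, c, p=0.9, q=0.1), 3))  # → -3.778
```

For this graph the counts are c = 6, d = 0, e = 1, f = 8, so the value is 14 log 0.9 + log 0.1.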


Learning parameters of the generative model
• Hofman & Wiggins (2008)
  – "Variational Bayes"
  – Dirichlet/beta prior and posterior distributions for the probabilities
  – Very well determined (overkill)
  – Independent binomials for the assignment variables (misses correlation)
• Here
  – Maximum likelihood for the parameters
  – Gibbs sampling for the assignments

Jake M. Hofman and Chris H. Wiggins, Bayesian Approach to Network Modularity, Phys. Rev. Lett. 100, 258701 (2008).


The community detection threshold
How many links are needed to detect the structure?

SNR = p_in / q = p / ( (C−1) q )

Jörg Reichardt and Michele Leone, (Un)detectable Cluster Structure in Sparse Networks, Phys. Rev. Lett. 101, 078701 (2008).


Experimental design
• Planted solution
  – N = 1000 nodes
  – C_true = 5
  – Quality: mutual information between the planted assignments and the best identified
• Gibbs sampling
  – No annealing
  – Burn-in 200 iterations
  – Averaging 800 iterations
• Parameter learning
  – Q = 10 iterations
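The quality measure, mutual information between the planted and identified assignments, can be computed from the joint label histogram. A sketch in nats (the slides do not state the logarithm base); note it is invariant to permuting the labels:

```python
import numpy as np

def mutual_information(a, b):
    """Mutual information (in nats) between two label vectors:
    I = sum_xy p(x,y) log[ p(x,y) / (p(x) p(y)) ]."""
    I = 0.0
    for x in np.unique(a):
        for y in np.unique(b):
            pxy = np.mean((a == x) & (b == y))   # joint label frequency
            px, py = np.mean(a == x), np.mean(b == y)
            if pxy > 0:
                I += pxy * np.log(pxy / (px * py))
    return I

planted = np.array([0, 0, 0, 1, 1, 1])
perfect = np.array([1, 1, 1, 0, 0, 0])  # label permutation is still a perfect recovery
print(round(mutual_information(planted, perfect), 3))  # → 0.693 (= ln 2)
```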


Community detection – fully informed on number of communities and probabilities

[Figure: mutual information between the planted and identified communities vs. intra-community link probability p, four panels: (N = 1000, C = 10, SNR = 50), (N = 1000, C = 5, SNR = 50), (N = 1000, C = 5, SNR = 5), (N = 1000, C = 5, SNR = 10)]


Now what happens to the phase transition if we learn the parameters … with a too complex model (C > C_true = 5)?

[Figure: mutual information between the planted and identified communities vs. intra-community link probability p, panels (N = 1000, C = 10, SNR = 10) and (N = 1000, C = 10, SNR = 5); bar chart of membership counts across the 10 communities]


Conclusions
• Community detection can be formulated as an inference problem (Hofman & Wiggins, 2008)
• The sampling process for fixed SNR has a phase-transition-like detection threshold (Reichardt & Leone, 2008)
• The phase transition remains (sharpens?) if you learn the parameters of a generative model with unknown complexity
