OASNET: An Optimal Allocation Approach to Influence Maximization ...

An Optimal Allocation Approach to Influence Maximization Problem on Modular Social Network
Tianyu Cao, Xindong Wu, Song Wang, Xiaohua Hu
ACM SAC 2010

Outline
• Social network
– Definition and properties
• Social contagion
• Models of influence
– Linear threshold model
– Independent cascade model
• Influence maximization problem
• Greedy algorithm and the CELF optimized version
• Degree discount algorithm
• OASNET algorithm
• Experiments

Social Network Example: The University Karate Club Network [8]

Social Network Example: Modularity and structure

Social Network Example

Social Network Example: Malaysian Blogosphere Division [9]

Social Network Properties
• Power-law degree distribution: $P(k) = c\,k^{-r}$.
• Positive assortativity: nodes of similar degree tend to connect to each other.
• High clustering coefficient: your friends tend to know each other.
• Short diameter (six degrees of separation).
• Community structure.

Social Contagion
• Yawning is contagious.

Social Contagion
• “I’ll have what she is having.”

Models of Contagion/Diffusion
• Initially some nodes are active.
• Active nodes spread their influence to other nodes, and so on.

Models of Influence
• Two basic classes of diffusion models: threshold [5] and cascade [1].
• General operational view:
– A social network is represented as a directed graph, with each person (customer) as a node.
– Nodes start either active or inactive.
– An active node may trigger activation of neighboring nodes.
– Monotonicity assumption: active nodes never deactivate.

Linear Threshold Model
• A node v has a random threshold $\theta_v \sim U[0,1]$.
• A node v is influenced by each neighbor w according to a weight $b_{v,w}$ such that $\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$.
• A node v becomes active when at least a (weighted) $\theta_v$ fraction of its neighbors are active: $\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$.
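The activation rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `linear_threshold`, the `in_weights` dictionary layout and the optional `thresholds` argument (used to make a run reproducible) are all assumptions made for this example.

```python
import random

def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Run the linear threshold model to completion.

    in_weights[v] maps each in-neighbor w of v to the weight b_{v,w}
    (hypothetical layout; the weights of each v should sum to at most 1).
    """
    rng = rng or random.Random()
    if thresholds is None:
        # theta_v ~ U[0, 1], drawn once per node
        thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in in_weights.items():
            if v in active:
                continue
            # total weight of the currently active in-neighbors of v
            weight = sum(b for w, b in nbrs.items() if w in active)
            if weight >= thresholds[v]:
                active.add(v)  # monotone: v never deactivates
                changed = True
    return active
```

Because activation is monotone, the order in which nodes are scanned does not change the final active set, only the number of sweeps needed to reach it.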

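Example (figure): linear threshold diffusion on a small graph with weighted edges. Legend: inactive node, active node, threshold, active neighbors. The process stops when no further node can be activated.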

Independent Cascade Model
• When node v becomes active, it has a single chance of activating each currently inactive neighbor w.
• The activation attempt succeeds with probability $p_{vw}$.
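One way to simulate this process is a breadth-first sweep in which every newly active node gets exactly one attempt per inactive neighbor. The sketch below assumes a hypothetical `out_probs` dictionary mapping each node to its out-neighbors and their probabilities $p_{vw}$; the function name is also an assumption.

```python
import random

def independent_cascade(out_probs, seeds, rng=None):
    """Run one independent cascade and return the final active set.

    out_probs[v] maps each out-neighbor w of v to p_{vw} (hypothetical layout).
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for v in frontier:
            for w, p in out_probs.get(v, {}).items():
                # v gets a single chance on each currently inactive neighbor
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return active
```

A node leaves the frontier after one step, so no edge is ever tried twice.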

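Example (figure): independent cascade diffusion on a small graph with edge probabilities. Legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The process stops when no new node is activated.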

Problem Setting
• Given:
– a limited budget B for initial advertising (e.g. giving away free samples of a product)
– estimates of the influence between individuals
• Goal:
– trigger a large cascade of influence (e.g. further adoptions of a product)
• Question:
– Which set of individuals should the budget B target?

Influence Maximization Problem
• Influence of node set S: f(S)
– the expected number of active nodes at the end, if set S is the initial active set
• Problem:
– Given a parameter k (budget), find a k-node set S that maximizes f(S).
– This is a constrained optimization problem with f(S) as the objective function.
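Since f(S) is an expectation over random diffusion outcomes, it is usually estimated by averaging repeated simulations. The sketch below does this for the independent cascade model; `simulate_ic`, `estimate_spread`, the `out_probs` layout and the default of 1000 runs are assumptions chosen for illustration.

```python
import random

def simulate_ic(out_probs, seeds, rng):
    """One independent cascade run; returns the number of active nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for v in frontier:
            for w, p in out_probs.get(v, {}).items():
                if w not in active and rng.random() < p:
                    active.add(w)
                    nxt.append(w)
        frontier = nxt
    return len(active)

def estimate_spread(out_probs, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of f(S): mean final cascade size over `runs` runs."""
    rng = random.Random(seed)
    return sum(simulate_ic(out_probs, seeds, rng) for _ in range(runs)) / runs
```

The estimate is noisy; more runs reduce the variance at a proportional cost in time.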

First Solution
• Greedy algorithm [1]
– Start with an empty set S.
– For k iterations: add the node v to S that maximizes f(S + v) - f(S).
• How good (or bad) is it? [1]
– Theorem: the greedy algorithm is a (1 - 1/e)-approximation.
– The resulting set S activates at least a (1 - 1/e) > 63% fraction of the number of nodes that any size-k set could activate.
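The greedy loop can be written against any set function f. The sketch below takes f as a callable so that it stays independent of the diffusion model; in practice f would be a simulation-based estimate, while the usage note relies on a deterministic coverage function that is purely illustrative. The name `greedy_seeds` is an assumption.

```python
def greedy_seeds(nodes, k, f):
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    f is any callable mapping a set of nodes to its (estimated) influence.
    """
    seeds = []
    for _ in range(min(k, len(nodes))):
        current = set(seeds)
        base = f(current)
        # marginal gain of v is f(S + v) - f(S)
        best = max((v for v in nodes if v not in current),
                   key=lambda v: f(current | {v}) - base)
        seeds.append(best)
    return seeds
```

For example, with a hypothetical coverage table `reach = {"a": {1, 2, 3}, "b": {3, 4, 5}, "c": {6}}` and `f = lambda S: len(set().union(*(reach[v] for v in S)))`, the call `greedy_seeds(["a", "b", "c"], 2, f)` selects "a" first and then "b", whose marginal gain of 2 beats the gain of 1 from "c".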

Problems with Greedy Algorithm
• The greedy algorithm in [1] is very slow. Its time complexity is O(knms), where n is the number of nodes, m is the number of edges and s is the number of simulations.
• It does not assume any topological structure of the social network. There may be room for optimization given a certain topological property.

CELF optimized Greedy Algorithm
• The function f(S) is submodular for both the independent cascade model and the linear threshold model [1]: $f(T \cup \{v\}) - f(T) \le f(S \cup \{v\}) - f(S)$ for all $S \subseteq T$.

CELF optimized Greedy Algorithm
• Recall the greedy algorithm.
– For k iterations: add the node v to S that maximizes f(S + v) - f(S).
• CELF optimized greedy algorithm [2]
– Keep an ordered list of the marginal benefits $b_i$ from the previous iteration.
– Re-evaluate $b_i$ only if its value from the previous iteration is larger than the maximum marginal benefit found so far in the current iteration.
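The lazy re-evaluation can be sketched with a priority queue of possibly stale marginal gains. Submodularity guarantees that a stale gain is an upper bound on the current one, so a node whose freshly computed gain is still at the top of the queue can be selected without touching the rest. The function name `celf_seeds` and the returned evaluation counter are assumptions added for illustration.

```python
import heapq

def celf_seeds(nodes, k, f):
    """Lazy greedy selection; returns (seeds, number of marginal-gain evaluations)."""
    seeds, spread = [], f(set())
    # heap entries: (-gain, tie-break index, node, seed count when gain was computed)
    heap = [(-(f({v}) - spread), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap and len(seeds) < k:
        neg_gain, i, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # gain is up to date and no stale bound exceeds it: select v
            seeds.append(v)
            spread -= neg_gain
        else:
            # stale entry: recompute the gain against the current seed set
            gain = f(set(seeds) | {v}) - spread
            evaluations += 1
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds, evaluations
```

With an exact f the selected seeds match plain greedy up to tie-breaking; with a noisy simulation-based f the two can differ slightly because stale estimates are reused.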

CELF optimized Greedy Algorithm
• In the second iteration, the nodes b, e and c do not need to be evaluated because their gain in the first iteration is smaller than the gain of d in the current iteration. [10]

CELF optimized Greedy Algorithm
• Pros
– It produces the same result as the greedy algorithm in much less time. (Experiments show that it is 700 times faster than the original greedy algorithm.)
• Cons
– It does not improve the time complexity. For a large network it is still slow.

Degree Discount Heuristics
• The degree heuristic is a good baseline compared to other heuristics (random, distance centrality).
• If a node u has been selected as a seed, then when considering a new seed v based on its degree, we should not count the edge {v, u} towards the degree of v. [2]

Degree Discount Heuristics
• Node 6 is selected in the second iteration because it has a higher degree than node 5, whose degree is discounted by 1.
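The single-edge discount rule described above can be sketched as follows. The adjacency layout (`adj[v]` is the set of neighbors of v in an undirected graph) and the function name `single_discount_seeds` are assumptions for this example.

```python
def single_discount_seeds(adj, k):
    """Select up to k seeds by degree, discounting edges that lead to chosen seeds.

    adj[v] is the set of neighbors of v in an undirected graph (hypothetical layout).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    seeds = []
    for _ in range(min(k, len(adj))):
        v = max((u for u in adj if u not in seeds), key=lambda u: degree[u])
        seeds.append(v)
        for u in adj[v]:
            degree[u] -= 1  # the edge {u, v} no longer counts towards u
    return seeds
```

A node adjacent to an already chosen hub loses one unit of degree, so a node of equal raw degree elsewhere in the graph is preferred in the next round.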

Degree Discount Heuristics
• There is a more sophisticated discount mechanism when the diffusion probability is small [2].
• In the IC model with propagation probability p, suppose that $d_v = O(1/p)$ and $t_v = o(1/p)$ for a vertex v. The expected number of additional vertices in Star(v) influenced by selecting v into the seed set is $1 + (d_v - 2t_v - (d_v - t_v)\,t_v\,p + o(t_v)) \cdot p$.
• Here $d_v$ is the degree of v and $t_v$ is the number of neighbors of v already selected as seeds, as in [2].
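Dropping the $o(t_v)$ term gives a discounted degree $d_v - 2t_v - (d_v - t_v)\,t_v\,p$ that can be used to rank candidate seeds. The sketch below is an illustration of that rule under this simplification, not a verified copy of the procedure in [2]; all function names and the adjacency layout are assumptions.

```python
def discounted_degree(d, t, p):
    """d_v - 2 t_v - (d_v - t_v) t_v p, with the o(t_v) term dropped."""
    return d - 2 * t - (d - t) * t * p

def expected_gain(d, t, p):
    """Approximate expected number of additional vertices influenced in Star(v)."""
    return 1 + discounted_degree(d, t, p) * p

def degree_discount_ic(adj, k, p):
    """Select up to k seeds by discounted degree; adj[v] is the neighbor set of v."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    seeded_nbrs = {v: 0 for v in adj}   # t_v: neighbors already chosen as seeds
    dd = dict(degree)                   # discounted degrees, initially d_v
    seeds = []
    for _ in range(min(k, len(adj))):
        u = max((v for v in adj if v not in seeds), key=lambda v: dd[v])
        seeds.append(u)
        for v in adj[u]:
            if v not in seeds:
                seeded_nbrs[v] += 1
                dd[v] = discounted_degree(degree[v], seeded_nbrs[v], p)
    return seeds
```

As a worked instance, a vertex with $d_v = 10$, $t_v = 2$ and $p = 0.01$ has discounted degree $10 - 4 - 8 \cdot 2 \cdot 0.01 = 5.84$ and an approximate expected gain of $1 + 5.84 \cdot 0.01 = 1.0584$.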

Results of the above heuristics [2]



Our Motivation
• Make use of the modularity of social networks. Generally a social network exhibits some degree of modularity.
• Design an algorithm that scales better than the greedy algorithm in terms of time complexity.
• Idea: treat the seeds as a resource, and allocate that resource to the communities of a network in an intelligent way.

Growth Function
• The growth function F(k, G, M, Γ) maps the number of initial seeds k, the network G, the diffusion model M and a base strategy Γ for selecting initial active nodes (such as high degree) to the expected number of active nodes when the diffusion process terminates.
• Relation with f(S) in [1]: Γ(k, G) = S, so F(k, G, M, Γ) = f(Γ(k, G)) under model M.
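A growth function can be tabulated by simulation: for each k, let the base strategy pick k seeds and average the final cascade size. The sketch below assumes the independent cascade model as M and a high-degree strategy as Γ; the names `growth_function`, `high_degree`, `simulate_ic` and the default of 200 runs are illustrative assumptions.

```python
import random

def simulate_ic(out_probs, seeds, rng):
    """One independent cascade run; returns the number of active nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for v in frontier:
            for w, p in out_probs.get(v, {}).items():
                if w not in active and rng.random() < p:
                    active.add(w)
                    nxt.append(w)
        frontier = nxt
    return len(active)

def high_degree(out_probs, k):
    """Base strategy Gamma(k, G): the k nodes with the most out-neighbors."""
    return sorted(out_probs, key=lambda v: len(out_probs[v]), reverse=True)[:k]

def growth_function(out_probs, max_k, strategy=high_degree, runs=200, seed=0):
    """Return [F(0), F(1), ..., F(max_k)] estimated by Monte Carlo simulation."""
    rng = random.Random(seed)
    curve = []
    for k in range(max_k + 1):
        seeds = strategy(out_probs, k)
        total = sum(simulate_ic(out_probs, seeds, rng) for _ in range(runs))
        curve.append(total / runs)
    return curve
```

The resulting list is indexed by the number of seeds, which is the form a seed-allocation step over several communities would consume.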

Growth Function (figures)
• Condensed Matter physics network with diffusion probability 0.05
• Condensed Matter physics network with linear threshold model

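Problem Reformulation
• We make a simplifying assumption: communities are disconnected.
• Maximize $D = \sum_{i=1}^{n} F_i(k_i, G_i, M_i, \Gamma_i)$ subject to $\sum_{i=1}^{n} k_i = k$.
• Solution (dynamic programming over the n communities): $\mathrm{OPT}(k, n) = \max_{0 \le i \le k} \{ F_n(i, G_n, M_n, \Gamma_n) + \mathrm{OPT}(k - i, n - 1) \}$.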
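The recurrence above is a resource-allocation dynamic program. The sketch below assumes each growth function is given as a list `curves[j]` with `curves[j][i]` equal to $F_j(i)$, and that the communities can jointly absorb k seeds; the function name and the table layout are assumptions for illustration.

```python
def optimal_allocation(curves, k):
    """Split k seeds over communities to maximize the summed growth functions.

    curves[j][i] is the expected spread in community j with i seeds.
    Returns (best total spread, list of seeds per community).
    """
    n = len(curves)
    neg_inf = float("-inf")
    # opt[j][b]: best total using the first j communities and exactly b seeds
    opt = [[neg_inf] * (k + 1) for _ in range(n + 1)]
    pick = [[0] * (k + 1) for _ in range(n + 1)]
    opt[0][0] = 0.0
    for j in range(1, n + 1):
        curve = curves[j - 1]
        for b in range(k + 1):
            # try giving i seeds to community j, limited by the curve length
            for i in range(min(b, len(curve) - 1) + 1):
                value = curve[i] + opt[j - 1][b - i]
                if value > opt[j][b]:
                    opt[j][b], pick[j][b] = value, i
    # walk the choices back from the last community to recover k_1..k_n
    allocation, b = [0] * n, k
    for j in range(n, 0, -1):
        allocation[j - 1] = pick[j][b]
        b -= pick[j][b]
    return opt[n][k], allocation
```

For two hypothetical curves `[0, 5, 6, 6.5]` and `[0, 4, 7, 9]` with k = 3, the candidate splits score 9, 12, 10 and 6.5, so the program returns a total of 12 with one seed in the first community and two in the second. The table has n(k + 1) cells and each cell tries at most k + 1 splits, so the allocation step costs O(nk²) once the curves are known.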

Adaptation to a Real-World Network
• The assumption that all the communities are disconnected is too restrictive.
• Communities are usually weakly connected.
• This violates the requirement of the reformulated problem.
• Second assumption: the number of cross-community edges is relatively small and they are far from the center of the community, so they will not have a large effect.

OASNET Algorithm
• The steps of the algorithm:
– 1. Use an external community-finding algorithm to find the communities of the network. Extract these communities from the network.
– 2. Simulate the diffusion process on each community to find its corresponding growth function.
– 3. Apply the optimization equation to the n growth functions to find an optimal allocation of the k initial target nodes to the n communities.
– 4. Return the seed set as the union of $\Gamma(k_i, G_i)$.
• Time complexity is O(kms).

Experiment Settings
• The greedy algorithm does not scale to the cases in our experiments (it is too slow for networks with over 30,000 nodes).
• We compare against four heuristics: degree heuristic (M5), equal allocation (M2), random allocation (M4) and proportional allocation (M1).
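The slides do not spell out how the allocation baselines are computed. One plausible reading, sketched below, is that equal allocation splits the k seeds evenly across communities and proportional allocation splits them in proportion to community size. The rounding rules (leftover seeds go to the first communities, or to the largest fractional remainders) are assumptions of this sketch.

```python
def equal_allocation(sizes, k):
    """Split k seeds as evenly as possible over len(sizes) communities."""
    base, extra = divmod(k, len(sizes))
    # the first `extra` communities receive one additional seed (assumed rule)
    return [base + (1 if j < extra else 0) for j in range(len(sizes))]

def proportional_allocation(sizes, k):
    """Split k seeds in proportion to community sizes (largest-remainder rounding)."""
    total = sum(sizes)
    quotas = [k * s / total for s in sizes]
    alloc = [int(q) for q in quotas]
    # hand the leftover seeds to the largest fractional remainders (assumed rule)
    order = sorted(range(len(sizes)), key=lambda j: quotas[j] - alloc[j], reverse=True)
    for j in order[: k - sum(alloc)]:
        alloc[j] += 1
    return alloc
```

Neither function checks that a community is large enough to hold its share, which a real experiment would need to handle.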

Experiment Settings
• Datasets:
– 1. Condensed matter physics collaboration network [3]
– 2. PGP giant component network [4]
– 3. Lederberg citation network
– 4. Zewail citation network

Experiment Results: Condensed Matter physics network with diffusion probability 0.05

Experiment Results: PGP network with diffusion probability 0.1

Experiment Results: PGP network with diffusion probability 0.2

Experiment Results: Lederberg citation network with diffusion probability 0.1

Experiment Results: Lederberg citation network with diffusion probability 0.2

Experiment Results: Zewail citation network with diffusion probability 0.1

Experiment Results: Zewail citation network with diffusion probability 0.2

Experiment Results: Condensed Matter physics network with threshold model

Experiment Results: PGP network with threshold model

Experiment Results: Lederberg citation network with threshold model

Experiment Results: Zewail citation network with threshold model

Conclusions
• The proposed algorithm outperforms the comparison heuristics in most cases.
• The difference between the proposed method and the comparison heuristics is larger when the diffusion probability is higher for the independent cascade model.

References
[1] D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137–146, 2003.
[2] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD, 2009.
[3] M. E. J. Newman. The structure of scientific collaboration networks. PNAS, 98(2):404–409, 2001.
[4] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas. Models of social networks based on social distance attachment. Physical Review E, 70(5):056122, 2004.
[5] M. Granovetter. Threshold models of collective behavior. American Journal of Sociology, 83:1420–1443, 1978.
[6] P. S. Bearman, J. Moody, and K. Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology, Vol. 100, No. 1.
[7] A. Clauset, C. Moore, and M. E. J. Newman. Structural inference of hierarchies in networks, 2006.
[8] W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452–473, 1977.
[9] http://datamining.typepad.com/data_mining/2010/01/malaysian-blogosphere-division.html
[10] J. Leskovec and C. Faloutsos. Tools for large graph mining. WWW 2008 tutorial, Part 2: Diffusion and cascading behavior.

Thanks
