11.07.2015 Views

OASNET: An Optimal Allocation Approach to Influence Maximization ...

OASNET: An Optimal Allocation Approach to Influence Maximization ...

OASNET: An Optimal Allocation Approach to Influence Maximization ...

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>An</strong> <strong>Optimal</strong> <strong>Allocation</strong> <strong>Approach</strong> <strong>to</strong> <strong>Influence</strong><strong>Maximization</strong> Problem on Modular SocialNetworkTianyu Cao, Xindong Wu, Song Wang, Xiaohua HuACM SAC 2010


outline Social network Definition and properties Social contagion Models of <strong>Influence</strong> Linear threshold model Independent cascade model <strong>Influence</strong> maximization problem Greedy algorithm and the CELF optimized version Degree discount algorithm <strong>OASNET</strong> algorithm Experiments2


Social Network ExamplesThe University Karate Club Network[8]4


Social Network ExampleModularity and structure5


Social Network Example6


Social Network ExampleMalaysian Blogosphere Division[9]7


Social Network Properties Power law degree distribution. P(k)=ck ‐r Positive assortativity. Similar degree nodes connecting <strong>to</strong>each other. High clustering coefficient. Your friend tend <strong>to</strong> know eachother. Short diameter. (six degree of separation) Community structure.8


Social Contagion Yawning is contagious.9


Social contagion “I’ll have what she is having.”10


Models of contagion/diffusion Initially some nodes are active: Active nodes spread their influence on the other nodes,and so on …11


Models of <strong>Influence</strong> Two basic classes of diffusion models: threshold[5] andcascade [1] General operational view: A social network is represented as a directed graph, with eachperson (cus<strong>to</strong>mer) as a node Nodes start either active or inactive <strong>An</strong> active node may trigger activation of neighboring nodes Mono<strong>to</strong>nicity assumption: active nodes never deactivate12


Linear Threshold Model A node v has random threshold θ v ~ U[0,1] A node v is influenced dby each neighbor ihb w according <strong>to</strong> aweight b vw such that∑w neighbor of vvw ,≤1 A node v becomes active when at least(weighted) θ v fraction of its neighbors are activebw∑active neighbor ofvbvw ,≥ θv13


ExampleInactive Node0.60.3 0.2 0.2Active NodeThreshold0.4X0.1UActive neighbors0.50.30.2 S<strong>to</strong>p!w0.5v14


Independent Cascade Model When node v becomes active, it has a single chance ofactivating each currently inactive neighbor w. The activation attempt succeeds with probability p vw .15


Example0.60.3 0.2 0.2Inactive NodeActive Node0.4wX0.50.10.30.5vU0.2Newly activenodeSuccessfulattemptUnsuccessfulattemptS<strong>to</strong>p!16


Problem Setting Given Goala limited budget B for initial advertising (e.g. give away freesamples of product)estimates for influence between individualstrigger a large cascade of influence (e.g. further adoptions of aproduct) QuestionWhich set of individuals should B target at?17


<strong>Influence</strong> <strong>Maximization</strong> Problem <strong>Influence</strong> of node set S: f(S) expected number of active nodes at the end, if set S is theinitial active set Problem: Given a parameter k (budget), find a k‐node set S <strong>to</strong> maximizef(S) Constrained optimization problem with f(S) as the objectivefunction18


First Solution• Greedy Algorithm[1]– Start with an empty set S– For k iterations:Add node v <strong>to</strong> S that maximizes f(S +v) ‐ f(S).• How good (bad) it is?[1]– Theorem: The greedy algorithm is a (1 –1/e) approximation.– The resulting set S activates t at least t(1‐ 1/e) > 63% of thenumber of nodes that any size‐k set S could activate.19


Problems with Greedy Algorithm The greedy ago algorithm in[1] is so slow. Its time complexity isO(knms), where n is the number of nodes, m is thenumber of edges and s is the times of simulation. It does not assume any <strong>to</strong>pological structure of the socialnetwork. There might be some space for optimizationgiven a certain <strong>to</strong>pological property.20


CELF optimized Greedy Algorithm The function f(S) is sub‐modular for both the independentcascade model and the linear threshold model.[1]f(TU{v})‐f(T)


CELF optimized Greedy Algorithm Recall the greedy algorithm. For k iterations:Add node v <strong>to</strong> S that maximizes f(S +v) ‐ f(S). CELF optimized greedy algorithm [2] Keep an ordered list of marginal benefits bi from previousiteration Re‐evaluate evaluate bi only if bi in previous iteration is larger than themaximum marginal benefits for current iteration22


CELF optimized Greedy Algorithm23In the second iteration, the nodes b, e, c do notneed <strong>to</strong> be evaluated because their gain in firstiteration is smaller than the gain of d in the currentiteration.[10]


CELF optimized Greedy Algorithm Pros It produces the same result as the greedy algorithm with muchless time. (Experiments show that it is 700 times faster thanthe original greedy algorithm) Cons It does not improve the running time complexity. For a largenetwork it is still slow.24


Degree Discount Heuristics The degree heuristics is a good baseline compared <strong>to</strong>other heuristics(random, distance centrality). If a node u has been selected as a seed, when consideringselecting a new seed v based on its degree, we should notcount the edge {v, u} <strong>to</strong>wards its degree.[2]25


Degree Discount Heuristics26Node 6 is selected in the second iteration because ithas a higher degree than node 5, whose degree isdiscounted by 1.


Degree Discount heuristics There is a more sophisticated discount mechanisms whenthe diffusion probability is small[2]. In the IC model with propagation probability p, supposethat dv=O(1/p) and tv=o(1/p) for a vertex v. The expectednumber of additional vertices in Star(v) influenced byselecting v in<strong>to</strong> the seed set is: 1(d 1+(dv‐2tv‐(dv‐tv)tvp+o(tv))*pt)t (t))*27


Result on the above heuristics[2]28


Result on the above heuristics[2]29


Result on the above heuristics[2]30


Our motivation Make use of the modularity of social networks. Generallya social network exhibits some degree of modularity. Try <strong>to</strong> make an algorithm that scales better than greedyalgorithm in terms of the time complexity. Idea: Treat the seeds as a resource. Try <strong>to</strong> allocate the resources <strong>to</strong>the communities of a network kin an intelligent t way.31


Growth function The growth function F(k, G, M, Γ) maps the number ofinitial seeds k, the network G, the diffusion model M anda base strategy of selecting initial active nodes(like highdegree) <strong>to</strong> the expected number of active nodes whenthe dff diffusion process terminates. Relation with f(S) in [1]. Γ(k, G)=S.32


Growth FunctionCondensed Matter physics networkwith diffusion probability 0.05Condensed Matter physics networkwith linear threshold model33


Problem Reformulation We made a simplified assumptions. Communities aredisconnected. Maximize D=∑F i (k i , G i , M i , Γ i )s.t. ∑k i =k. Solutions: OPT(k,n)=MAX{F n (i, G n , M n , Γ n )+OPT(k‐I,n‐1)} 0≤ i≤k.34


Adaptation <strong>to</strong> a real world network• The assumption that all the communities aredisconnected is <strong>to</strong>o restrictive.• Communities are usually weakly connected.• It violates the requirement of the reformulated problem.• Second assumption: The number of cross communityedges is relatively small and they are far from the centerof the community. It won’t have a large effect.35


<strong>OASNET</strong> algorithm The steps of the algorithm. 1. Use an external community finding algorithm <strong>to</strong> findthe communities of the network. Extract thesecommunities from the network. 2. Simulate the diffusion process on each community <strong>to</strong>find its corresponding growth function. 3. Apply the optimization equation on the n growthfunctions <strong>to</strong> find an optimal allocation of n initial targetnodes in<strong>to</strong> n communities. 4. Return the seed as the union of Γ(k i ,G i ). Time complexity is O(kms).36


Experiment Settings The greedy algorithm does not scale well <strong>to</strong> the cases ofour experiments (<strong>to</strong>o slow for networks over 30,000nodes). We use four heuristics <strong>to</strong> compare. Degree heuristics(M5), equal allocation(M2), randomallocation(M4), proportional allocation(M1).37


Experiment Settings Datasets. 1.Condense matter physics collaboration network[3]; 2.PGP giant component network[4]; 3.Lederberg citation network; 4.Zewail Citation Network.38


Experiment Results39Condensed Matter physics network with diffusion probability 0.05


Experiment Results40PGP Network with diffusion probability 0.1


Experiment Results41PGP Network with diffusion probability 0.2


Experiment Results42Lederberg Citation Network with diffusion probability 0.1


Experiment Results43Lederberg Citation Network with diffusion probability 0.2


Experiment Results44Zewail Citation Network with diffusion probability 0.1


Experiment Results45Zewail Citation Network with diffusion probability 0.2


Experiment Results46Condensed matter physics Network with threshold model


Experiment Results47PGP Network with threshold model


Experiment Results48Lederberg Citation Network with threshold model


Experiment Results49Zewail Citation Network with threshold model


Conclusions The proposed algorithm outperforms comparingheuristics in most cases. The difference between the proposed method andcomparison heuristics are larger when the diffusionprobability is higher for the independent cascade model.50


References [1] D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizingthe spread of influence through a social network. In KDD,pages 137–146, 2003. [2]Wei Chen, Yajun Wang, Siyu Yang. Efficient <strong>Influence</strong><strong>Maximization</strong> In Social network. KDD 2009 [3] M. E. J. Newman. The structure of scientificcollaboration networks. PNAS, 98(2):404–409, 409 2001. [4] M. B. n´a, R. Pas<strong>to</strong>r‐Sa<strong>to</strong>rras, A. D´ıaz‐Guilera, and A.Arenas. Models of social networks based on socialdistance attachment. Physical Review E, 70(5):056122,2004.51


References [5] M Granovetter. Threshold Models of Collective Behavior.American Journal of. Sociology 83 (1978) 1420–1443. [6] Bearman PS, Moody J, S<strong>to</strong>vel K. Chains of affection: Thestructure of adolescent romantic and sexual networks.the American Journal of Sociology, Vol. 100, No. 1. [7] A. Clauset, C. Moore, and M. E. J. Newman. Structuralinference of hierarchies in networks, 2006. [8] Zachary W. (1977). <strong>An</strong> information flow model forconflict and fission in small groups. Journal of<strong>An</strong>thropological Research, 33, 452‐473.52


References [9]http://datamining.typepad.com/data_mining/2010/01/malaysian‐blogosphere‐division.html. [10] Jure Leskovec, Chris<strong>to</strong>s Faloutsos. Tools for largegraph mining WWW 2008 tu<strong>to</strong>rial Part 2: Diffusion andcascading behavior.53


54Thanks

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!